Watcher Boundary
Snapstr AI — Environment Boundary & Agent Loop
┌────────────────────────────────────────────────────────────┐
│  REAL WORLD                                                │
│                                                            │
│  Filesystem   Camera Clips   Mobile Uploads   API Events   │
│                                                            │
└───────────────┬────────────────────────────────────────────┘
                │ (external event)
                ▼
┌────────────────────────────────────────────────────────────┐
│  INPUT WATCHER LAYER                                       │
│                                                            │
│  agent/file_watcher.py                                     │
│                                                            │
│  Detects new input                                         │
│  Normalizes event                                          │
│  Emits ONE episode trigger                                 │
│                                                            │
│  No Gemini                                                 │
│  No memory access                                          │
│  No decisions                                              │
│  No reinforcement                                          │
│                                                            │
│        ─────── HARD REINFORCEMENT BOUNDARY ───────         │
└───────────────┬────────────────────────────────────────────┘
                │ (episode start)
                ▼
┌────────────────────────────────────────────────────────────┐
│  AUTONOMOUS AGENT SYSTEM                                   │
│                                                            │
│  AnalyzerAgent  →  Decision Agents  →  ExecutionAgent      │
│        │                  │                  │             │
│     Gemini 3        Memory (read)    Real World Action     │
│                                                            │
│  Outcome Observation → Reinforcement → Memory Update       │
│                                                            │
│  ReflectionAgent (Gemini 3) → Behavior Change              │
│                                                            │
└────────────────────────────────────────────────────────────┘
The watcher is an environment sensor, not part of the agent. Every episode begins cleanly at the boundary, enabling auditable learning.
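A minimal Python sketch of that boundary, assuming the watcher hands work to the agent through a queue. The `EpisodeTrigger` shape, the function names, and the queue itself are illustrative assumptions, not the actual contents of agent/file_watcher.py.

```python
from dataclasses import dataclass
from pathlib import Path
from queue import Queue
import uuid


@dataclass(frozen=True)
class EpisodeTrigger:
    """The only object allowed to cross the boundary (hypothetical shape)."""
    episode_id: str
    source: str  # e.g. "filesystem", "mobile_upload"
    path: str


def on_external_event(raw_path: str, source: str, out: Queue) -> None:
    """Watcher side: detect, normalize, emit ONE trigger. Nothing else."""
    normalized = Path(raw_path).as_posix()
    out.put(EpisodeTrigger(uuid.uuid4().hex, source, normalized))


def run_episode(trigger: EpisodeTrigger) -> str:
    """Agent side: stand-in for Analyzer -> Decision -> Execution."""
    return f"episode {trigger.episode_id} started for {trigger.path}"


boundary: Queue = Queue()
on_external_event("clips/video1.mp4", "filesystem", boundary)
trigger = boundary.get_nowait()
message = run_episode(trigger)
```

The watcher function never imports or calls anything on the agent side, so the only coupling between the two layers is the trigger object.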
Failure Modes
The architecture is safe by design, not by accident. The scenarios below show controlled failure: Snapstr AI fails safely, learns correctly, and does not contaminate decisions when something goes wrong.
Watcher Failure (Input Chaos)
Scenario
Simulate a broken input source:
- Rapid duplicate events
- Corrupted metadata
- Mixed sources firing simultaneously
Trigger three fake events:
touch video1.mp4
touch video1.mp4
touch video1.mp4
Logs:
[WATCHER] Event detected: video1.mp4
[WATCHER] Emitting episode trigger
There are no other side effects.
“Even if the environment misbehaves, the watcher only emits episode triggers. There is no learning, no reasoning, and no policy impact here.”
Why This Matters
- No hidden state
- No cascading errors
- Clean episode boundaries
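One way to check these properties is to feed deliberately messy events through a stateless normalizer. The sketch below is an assumption about how such a check could look. The event fields and `normalize_event` are hypothetical, and the write-up does not say whether duplicates are collapsed before this point, so this version emits one trigger per event it sees.

```python
def normalize_event(raw: dict) -> dict:
    """Stateless: the output depends only on the raw event passed in."""
    name = raw.get("name")
    return {
        "type": "episode_trigger",
        "file": name if isinstance(name, str) and name else "unknown",
        "source": str(raw.get("source", "unknown")),
    }


chaos = [
    {"name": "video1.mp4", "source": "filesystem"},
    {"name": "video1.mp4", "source": "filesystem"},  # rapid duplicate
    {"name": None, "source": "mobile"},              # corrupted metadata
    {"source": "api"},                               # missing field
]

# The watcher's entire output is this list; it keeps no state between events.
triggers = [normalize_event(event) for event in chaos]
```

Because the function holds no state, a corrupted event cannot change how the next event is handled, which is what keeps errors from cascading.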
Bad Decision Outcome (Learning Stress Test)
Scenario
The agent makes a reasonable decision that performs badly.
Example:
- Public video
- Low engagement
- User manually flips to private
What Happens in Code
[EXECUTOR] Upload complete (public)
[LEARNING] privacy_changed=True
[REINFORCEMENT] Reward = -1.0
Memory Update
[MEMORY] Penalized public decision pattern
“The agent isn’t punished instantly. Only real-world feedback produces reinforcement.”
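The delayed-feedback rule can be sketched as follows. This is an illustration under assumptions: the `-1.0` reward for `privacy_changed` comes from the log above, while the neutral `0.0` case, the function names, and the dictionary-based memory are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def reward_from_outcome(outcome: Optional[dict]) -> Optional[float]:
    """Return a reward only once real-world feedback has been observed."""
    if outcome is None:  # nothing observed yet: no reinforcement
        return None
    if outcome.get("privacy_changed"):
        return -1.0      # matches the [REINFORCEMENT] log line
    return 0.0           # assumption: neutral when the user changed nothing


pattern_scores = defaultdict(float)  # hypothetical memory: pattern -> score


def update_memory(pattern: str, reward: Optional[float]) -> None:
    """Apply a reward to a decision pattern; skip when there is no reward."""
    if reward is None:
        return
    pattern_scores[pattern] += reward


update_memory("public", reward_from_outcome(None))  # right after upload
score_before_feedback = pattern_scores["public"]
update_memory("public", reward_from_outcome({"privacy_changed": True}))
score_after_feedback = pattern_scores["public"]
```

The upload itself produces no score change. Only the later observation that the user flipped the video to private moves the "public" pattern's score.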
Gemini Failure
Scenario
Gemini API fails or times out.
Behavior
[ANALYZER] Gemini unavailable
[ANALYZER] Falling back to conservative defaults
Result
- Privacy defaults to private
- No memory update
- No learning contamination
- Safety first
- No hallucinated state
- No corrupted reinforcement
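A hedged sketch of this fallback path. The model call is injected as a plain callable rather than a real Gemini client, and the `learnable` flag is a hypothetical way to mark results that must not reach memory.

```python
from typing import Callable

CONSERVATIVE_DEFAULTS = {"privacy": "private"}


def analyze(clip: str, call_model: Callable[[str], dict]) -> dict:
    """Return an analysis plus a flag saying whether it may feed learning."""
    try:
        result = call_model(clip)
    except Exception:  # sketch only: treats any failure as "unavailable"
        print("[ANALYZER] Gemini unavailable")
        print("[ANALYZER] Falling back to conservative defaults")
        return {**CONSERVATIVE_DEFAULTS, "learnable": False}
    return {**result, "learnable": True}


def failing_model(clip: str) -> dict:
    """Stand-in for a model call that times out."""
    raise TimeoutError("simulated Gemini timeout")


memory_updates = []
analysis = analyze("video1.mp4", failing_model)
if analysis["learnable"]:  # fallback results never reach memory
    memory_updates.append(analysis)
```

Tagging the fallback result instead of raising lets the episode finish with a private upload while the learning step is skipped.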
Reinforcement Abuse Attempt
Scenario
A user tries to “game” learning by:
- Uploading junk
- Deleting videos repeatedly
Result
[REINFORCEMENT] Deleted video penalty applied
[MEMORY] Confidence reduced, exploration dampened
“Negative reinforcement dominates. The system becomes more conservative, not more reckless.”
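One possible shape for an update where penalties dominate, written as a sketch. The step sizes, starting values, and the `PolicyState` fields are illustrative assumptions, not values taken from Snapstr AI.

```python
from dataclasses import dataclass


@dataclass
class PolicyState:
    confidence: float = 0.8   # illustrative starting values
    exploration: float = 0.2


def apply_reward(state: PolicyState, reward: float) -> PolicyState:
    """Asymmetric update: penalties move the policy more than rewards do."""
    if reward < 0:
        state.confidence = max(0.0, state.confidence + 0.2 * reward)
        state.exploration *= 0.5  # dampen exploration after a penalty
    else:
        state.confidence = min(1.0, state.confidence + 0.05 * reward)
    return state


state = PolicyState()
for _ in range(3):  # three deleted videos in a row
    apply_reward(state, -1.0)
```

With these numbers, repeated deletions shrink both confidence and exploration, so junk uploads push the policy toward conservative behavior rather than toward new risky choices.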
“We didn’t just test success cases — we designed Snapstr AI to fail safely, learn only from real outcomes, and never let sensors or models silently influence policy.”