Snapstr AI
# A policy-driven AI system that learns whether efficiency or revenue is the better optimization strategy—and switches automatically based on real performance data.
Problem
AI content tools typically hard-code a single optimization strategy:
- Speed & efficiency (produce content quickly), or
- Revenue maximization (optimize for high-paying ads and sponsors)
In real creator workflows, the optimal strategy changes over time due to:
- Algorithm shifts
- Sponsorship availability
- Burnout and time constraints
- Rising AI API costs
Most systems cannot adapt without manual retuning.
Solution
We built a policy-driven AI orchestration system that automatically learns which optimization strategy performs better and dynamically switches between them—without changing the underlying architecture.
The system supports two strategies:
- RTM (Revenue per Time): Efficiency-focused
- RPM (Revenue per 1,000 Views): Revenue-focused
Both strategies run through the same agents, analyzers, and orchestration layer. Only the reward definition changes.
Key Design Principle:
Separate “how the system works” from “what the system optimizes for.”
Core Technical Components
1. Policy Toggle (Single Source of Truth)
A single config flag determines system behavior:
- Agent prompts
- Bandit selection logic
- Reinforcement learning rewards
This avoids branching pipelines or duplicated logic.
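A minimal sketch of the single-flag idea, assuming hypothetical names (`Policy`, `POLICY_SETTINGS`, `settings_for`) and placeholder prompt text; the project's actual config layout is not shown in this write-up.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    RTM = "rtm"  # efficiency-focused
    RPM = "rpm"  # revenue-focused


# Hypothetical per-policy settings; the real prompts and keys are assumptions.
POLICY_SETTINGS = {
    Policy.RTM: {
        "prompt_hint": "Prefer options that keep production time and API cost low.",
        "bandit_metric": "revenue_per_cost",
        "reward_fn": "rtm_reward",
    },
    Policy.RPM: {
        "prompt_hint": "Prefer options that maximize expected revenue.",
        "bandit_metric": "total_revenue",
        "reward_fn": "rpm_reward",
    },
}

ACTIVE_POLICY = Policy.RTM  # the single flag


def settings_for(policy: Optional[Policy] = None) -> dict:
    """Every component looks up its behavior here instead of branching on its own."""
    return POLICY_SETTINGS[policy or ACTIVE_POLICY]


print(settings_for()["bandit_metric"])
```

Because prompts, bandit metric, and reward function are all looked up from one table, switching strategy means changing one value rather than touching each component.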
2. Cost-Aware Reward Modeling
The system explicitly tracks:
- Gemini API token usage
- Infrastructure costs
- Estimated production time
This allows optimization to account for real economic tradeoffs, not just views or clicks.
3. Dual Reward Functions
| Mode | Reward Definition |
|---|---|
| RTM | Revenue ÷ Cost |
| RPM | Total Revenue |
Both share a quality gate to prevent low-quality exploitation.
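The two reward definitions and the shared quality gate could look roughly like the sketch below. The `Outcome` fields, the way the three tracked costs are summed, and the `QUALITY_FLOOR` value are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    revenue: float     # revenue attributed to the content item
    api_cost: float    # e.g. Gemini API token spend
    infra_cost: float  # infrastructure spend
    time_cost: float   # estimated production time, converted to money (assumption)
    quality: float     # quality score in [0, 1] (assumption)


QUALITY_FLOOR = 0.5  # hypothetical gate threshold


def reward(mode: str, outcome: Outcome) -> float:
    # Shared quality gate: low-quality items earn nothing under either mode.
    if outcome.quality < QUALITY_FLOOR:
        return 0.0
    if mode == "RTM":
        total_cost = outcome.api_cost + outcome.infra_cost + outcome.time_cost
        return outcome.revenue / total_cost if total_cost > 0 else 0.0
    if mode == "RPM":
        return outcome.revenue
    raise ValueError(f"unknown mode: {mode}")


sample = Outcome(revenue=10.0, api_cost=1.0, infra_cost=1.0, time_cost=3.0, quality=0.9)
print(reward("RTM", sample), reward("RPM", sample))
```

The same outcome scores 2.0 under RTM and 10.0 under RPM, which is why the two modes can prefer different actions from identical data.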
4. Built-In A/B Testing
Each content item is randomly assigned to:
- RTM-optimized workflow
- RPM-optimized workflow
The system logs:
- Revenue generated
- Costs incurred
- Final reward score
This creates live, comparable strategy data.
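A rough sketch of the assignment and logging described above. The class name `ABLog` and its record layout are hypothetical; only the logged fields (revenue, cost, reward) come from the description.

```python
import random
from collections import defaultdict
from statistics import mean

ARMS = ("RTM", "RPM")


class ABLog:
    """Randomly assigns items to a strategy and records each outcome."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.records = []

    def assign(self, item_id: str) -> str:
        # Uniform random assignment keeps the two groups comparable.
        return self._rng.choice(ARMS)

    def log(self, item_id: str, arm: str, revenue: float, cost: float, reward: float) -> None:
        self.records.append(
            {"item_id": item_id, "arm": arm, "revenue": revenue, "cost": cost, "reward": reward}
        )

    def mean_reward_by_arm(self) -> dict:
        grouped = defaultdict(list)
        for record in self.records:
            grouped[record["arm"]].append(record["reward"])
        return {arm: mean(values) for arm, values in grouped.items()}


ab = ABLog(seed=0)
arm = ab.assign("video-001")
ab.log("video-001", arm, revenue=4.0, cost=2.0, reward=2.0)
print(arm, ab.mean_reward_by_arm())
```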
5. Convergence Detection (Key Innovation)
A convergence tracker monitors:
- Reward stability
- Performance deltas between strategies
When variance drops below a threshold, the system:
- Identifies the dominant strategy
- Shifts policy weights automatically
- Maintains a small exploration window
This allows autonomous strategy selection rather than perpetual experimentation.
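One way to express this check is sketched below, under stated assumptions: a fixed recent window, population variance as the stability measure, a minimum gap between mean rewards, and a fixed exploration share. The function names and every threshold are illustrative and assume exactly two strategies.

```python
from statistics import mean, pvariance
from typing import Dict, List, Optional


def detect_dominant(rewards: Dict[str, List[float]], window: int = 5,
                    var_threshold: float = 0.01, min_delta: float = 0.05) -> Optional[str]:
    """Return the dominant strategy, or None while rewards are still unstable."""
    recent_means = {}
    for name, values in rewards.items():
        if len(values) < window:
            return None  # not enough data yet
        tail = values[-window:]
        if pvariance(tail) > var_threshold:
            return None  # reward stream has not stabilized
        recent_means[name] = mean(tail)
    ranked = sorted(recent_means, key=recent_means.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if recent_means[best] - recent_means[runner_up] < min_delta:
        return None  # strategies are too close to call
    return best


def policy_weights(dominant: Optional[str], strategies: List[str],
                   explore: float = 0.1) -> Dict[str, float]:
    """Shift weight to the dominant strategy while keeping a small exploration share."""
    if dominant is None:
        return {s: 1.0 / len(strategies) for s in strategies}
    others = [s for s in strategies if s != dominant]
    weights = {s: explore / len(others) for s in others}
    weights[dominant] = 1.0 - explore
    return weights


history = {
    "RTM": [0.80, 0.82, 0.81, 0.79, 0.80],
    "RPM": [0.50, 0.52, 0.51, 0.49, 0.50],
}
dominant = detect_dominant(history)
print(dominant, policy_weights(dominant, ["RTM", "RPM"]))
```

Keeping a non-zero exploration share means the losing strategy still receives occasional traffic, so a later shift in conditions can be detected.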
Frontend Transparency
A live dashboard shows:
- Current dominant strategy (RTM or RPM)
- Confidence level
- Average reward per strategy
This makes the AI’s decisions:
- Inspectable
- Explainable
- Trustworthy
Why This Is Technically Novel
- Strategy is learned, not hard-coded
- Same architecture supports multiple business objectives
- AI costs are first-class inputs
- The system knows when to stop experimenting
Most AI tools optimize outputs. This system optimizes decision policy.
Practical Impact
For creators and teams:
- Less manual analysis
- Fewer guesswork pivots
- Automatic adaptation to monetization changes
For AI systems:
- Better cost control
- Safer scaling
- More stable optimization behavior
Inspiration
Publishing video content sounds simple—until you actually do it at scale.
Creators, parents, and small teams constantly face the same questions:
- Is this video safe to post publicly?
- Should this be a Short or long‑form?
- When should it be uploaded?
- Did the last decision actually work?
Today, most tools stop at analysis or automation. We wanted to build something fundamentally different:
An autonomous agent that makes decisions, acts in the real world, observes outcomes, and changes its future behavior based on what actually worked.
That vision directly aligns with Gemini 3’s Action Era—AI systems that don’t just answer, but do.
What It Does
Snapstr AI is a multi‑agent, reinforcement‑driven video publishing system.
Given a raw video file, Snapstr AI:
- Analyzes the video using Gemini 3’s multimodal reasoning
- Debates decisions internally across specialized agents
- Publishes the video automatically
- Observes real‑world outcomes (views, engagement, corrections)
- Reinforces or penalizes its own decisions so future behavior improves
Over time, the agent learns your preferences and optimizes for both safety and performance.
The following is a one-to-one mapping from the conceptual steps of the agent loop to specific files, classes, and functions in the Snapstr AI architecture.
Reinforcement-Driven Learning — Code-Level Mapping
Each step below names the file and function responsible for it.
STEP 0 — Trigger: New Video Appears
File
agent/file_watcher.py
Function
def on_new_video(video_path: str):
    video_agent.process(video_path)
Responsibility
- Detects filesystem event
- No intelligence
- No Gemini calls
- No memory access
STEP 1 — Multimodal Analysis (Gemini 3)
Files
agents/analyzer_agent.py
core/gemini_analyzer.py
Call Chain
AnalyzerAgent.run(video_path)
└── GeminiAnalyzer.analyze_video(video_path)
Function
# core/gemini_analyzer.py
def analyze_video(self, video_path: str) -> dict:
    response = self.client.generate_content(
        video=video_path,
        response_schema=VIDEO_ANALYSIS_SCHEMA
    )
    return response
What Happens
- Gemini 3 performs multimodal reasoning
- Output is strict JSON
- No decisions
- No side effects
STEP 2 — Decision Agents Run Independently
Files
agents/privacy_agent.py
agents/format_agent.py
agents/timing_agent.py
Privacy Decision
PrivacyAgent.run(analysis, user_prefs)
Internally:
MemoryAgent.query_privacy_pattern(analysis)
Uses:
- Past rewards
- Hard overrides
- No Gemini calls here
Format Decision
FormatAgent.run(analysis)
Internally:
MemoryAgent.query_format_performance()
Timing Decision
TimingAgent.run()
Internally:
MemoryAgent.query_best_upload_time()
Key Constraint
Decision agents only read memory. They do not write. They do not execute.
STEP 3 — Decision Arbitration
File
agents/decision_merger.py
Function
DecisionMerger.merge(privacy, format_, timing)
Output
{
"privacy": {...},
"format": {...},
"timing": {...},
"overall_confidence": 0.80
}
Why This Matters
- Disagreements are preserved
- Confidence is explicit
- Decisions are inspectable
STEP 4 — Real-World Action
File
agents/execution_agent.py
services/google_services.py
Call Chain
ExecutionAgent.run(video_path, analysis, decisions)
└── GoogleServices.upload_to_youtube(...)
Side Effects
- Video published
- YouTube ID returned
- No learning yet
STEP 5 — Decision Snapshot Stored
Files
agents/memory_agent.py
core/memory_system.py
Function
MemoryAgent.store_decision({
"analysis": analysis,
"decisions": decisions,
"youtube_id": video_id
})
Important: This snapshot is immutable. Learning happens later, not now.
STEP 6 — Outcome Observation (Delayed)
File
agents/learning_agent.py
services/google_services.py
Function
performance = google.get_video_metrics(youtube_id)
LearningAgent.run(youtube_id, performance)
Asynchronous
- Happens hours or days later
- Separates action from consequence
STEP 7 — Reinforcement Scoring (The Learning Signal)
File
core/reinforcement.py
Function
ReinforcementScorer.score(performance)
Where Learning Happens (Mathematically)
Each published video receives a scalar reward:
\[ R = w_v \cdot \text{views} + w_w \cdot \text{watch\_ratio} + w_l \cdot \text{likes} - P \]
Where penalties \(P\) apply for:
- Manual privacy reversals
- Content deletion
- Safety violations
Memory updates reward-weighted statistics:
\[ \text{avg\_reward}_{a} = \frac{\sum R_a}{\text{count}_a} \]
Future decisions bias toward actions with higher expected reward.
Output
reward = 0.075
This is the scalar signal. Everything downstream keys off this number.
STEP 8 — Pattern Update (Behavior Changes)
File
core/memory_system.py
Functions
store_performance(youtube_id, performance)
_update_patterns(reward, record)
Example
stats["count"] += 1
stats["total_reward"] += reward
Crucial point: Future decisions will now change.
STEP 9 — Gemini 3 Reflection (Strategic Learning)
Files
agents/reflection_agent.py
core/gemini_analyzer.py
Trigger
if reward < LOW_REWARD_THRESHOLD:
    ReflectionAgent.run(record)
Gemini Call
GeminiAnalyzer.reflect_on_outcome(
analysis=record["analysis"],
decisions=record["decisions"],
performance=performance
)
Output
{
"reflection": "...",
"suggested_adjustment": "lower privacy bias"
}
Gemini’s Second Role
- Cross-episode reasoning
- Strategic adjustment suggestions
- Not possible with rules alone
STEP 10 — Adjustment Applied
File
core/memory_system.py
Function
apply_adjustment("privacy_confidence_bias", -0.1)
This subtly changes future confidence calculations.
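How `privacy_confidence_bias` enters the confidence calculation is not spelled out here. One plausible reading, sketched purely as an assumption, is an additive offset clamped to [0, 1]. The `MemorySystem` class below is a stand-in, and `adjusted_confidence` is a hypothetical helper.

```python
class MemorySystem:
    """Stand-in for core/memory_system.py; only the adjustment logic is sketched."""

    def __init__(self):
        self.adjustments = {}

    def apply_adjustment(self, key: str, delta: float) -> None:
        # Adjustments accumulate, so repeated reflections compound.
        self.adjustments[key] = self.adjustments.get(key, 0.0) + delta

    def adjusted_confidence(self, base: float, key: str) -> float:
        # Assumption: the bias is added to the raw confidence and clamped to [0, 1].
        return min(1.0, max(0.0, base + self.adjustments.get(key, 0.0)))


memory = MemorySystem()
memory.apply_adjustment("privacy_confidence_bias", -0.1)
print(round(memory.adjusted_confidence(0.82, "privacy_confidence_bias"), 2))
```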
End-to-End Loop Summary
file_watcher
→ AnalyzerAgent (Gemini 3)
→ Decision Agents (memory read)
→ DecisionMerger
→ ExecutionAgent (real world)
→ Memory snapshot
→ LearningAgent (delayed)
→ ReinforcementScorer
→ Memory pattern update
→ ReflectionAgent (Gemini 3)
→ Behavior shift
“Every autonomous decision Snapstr AI makes is logged, executed, scored against real-world outcomes, and then fed back into memory, where Gemini 3-powered reflection alters future agent confidence and behavior.”
Why This Is Not a Wrapper
Snapstr AI is not a prompt wrapper, a static workflow, or a one‑shot automation.
It is a persistent agent loop:
Analyze → Decide → Act → Observe → Learn → Adapt
Every decision is:
- Multi‑agent
- Logged with reasoning
- Scored with real‑world feedback
- Used to shape future decisions
The system becomes meaningfully better the longer it runs.
Multi‑Agent Architecture
Snapstr AI is composed of specialized agents, each with a single responsibility:
- AnalyzerAgent – Uses Gemini 3 to extract structured understanding from video
- PrivacyAgent – Decides public vs private, including hard safety overrides
- FormatAgent – Chooses Shorts vs long‑form based on content and outcomes
- TimingAgent – Determines upload timing using learned performance patterns
- DecisionMerger – Arbitrates agent decisions and confidence
- ExecutionAgent – Acts in the real world (publishing)
- MemoryAgent – Sole authority over long‑term memory
- LearningAgent – Updates behavior based on reinforcement
- ReflectionAgent – Uses Gemini 3 to explain why decisions succeeded or failed
Decision agents never call each other directly. They exchange structured messages and read shared memory through MemoryAgent, which keeps the system auditable and extensible.
agent/file_watcher.py
def on_new_video(video_path: str):
    """
    Entry point for the autonomous agent loop.

    This function initiates a full reinforcement-driven decision cycle:
    - Triggers multimodal analysis (Gemini 3)
    - Enables autonomous multi-agent decision making
    - Leads to real-world action whose outcomes will later be scored
      and used as reinforcement signals to update future behavior.

    No decisions or learning occur here; this function only signals
    the start of an episode in the agent's reinforcement loop.
    """
agents/analyzer_agent.py
def run(self, video_path: str) -> dict:
    """
    Performs multimodal semantic grounding using Gemini 3.

    This step converts raw video into structured, machine-readable
    signals (people, activities, risk indicators) that downstream
    decision agents use as state inputs in a reinforcement-driven system.

    This function does NOT make decisions and does NOT access memory.
    It exists to provide a consistent state representation for
    outcome-based learning across episodes.
    """
core/gemini_analyzer.py
def analyze_video(self, video_path: str) -> dict:
    """
    Uses Gemini 3's multimodal reasoning to extract semantic state
    from raw video input.

    The output of this function represents the environment state
    for a reinforcement learning episode. It is intentionally
    structured so that decision outcomes and rewards can be
    correlated with specific semantic features over time.

    Gemini 3 is used here for reasoning, not generation or formatting.
    """
agents/privacy_agent.py
def run(self, analysis: dict, user_prefs: dict) -> dict:
    """
    Selects a privacy decision (public/private) based on:
    - Current semantic state
    - User hard constraints
    - Historical reward-weighted outcomes

    This agent reads reinforcement-informed patterns from memory
    but does not update them. Its output will later be evaluated
    against real-world outcomes and reinforced or penalized accordingly.
    """
agents/format_agent.py
def run(self, analysis: dict) -> dict:
    """
    Chooses content format (short-form vs long-form).

    This decision is influenced by historical reinforcement signals,
    allowing the agent to favor formats that have previously
    maximized engagement reward for similar content.

    The agent itself does not learn; learning occurs after outcomes
    are observed and rewards are computed.
    """
agents/timing_agent.py
def run(self) -> dict:
    """
    Determines upload timing based on reinforcement-weighted
    historical performance patterns.

    Timing preferences evolve as reinforcement signals accumulate,
    enabling adaptive scheduling behavior over long-running deployments.
    """
agents/decision_merger.py
def merge(self, privacy, format_, timing) -> dict:
    """
    Arbitrates between independent agent decisions.

    This function preserves per-agent confidence so that future
    reinforcement updates can attribute success or failure
    to specific decision components rather than treating the
    outcome as a monolithic action.
    """
agents/execution_agent.py
def run(self, video_path: str, analysis: dict, decisions: dict) -> dict:
    """
    Executes the selected action in the real world (publishing content).

    This function marks the transition from decision-making
    to environment interaction. Outcomes produced by this action
    (engagement, corrections, deletions) will later generate
    reinforcement signals that shape future agent behavior.
    """
agents/memory_agent.py
def store_decision(self, decision_record: dict):
    """
    Stores an immutable snapshot of the agent's decision state.

    This snapshot represents the action taken in a reinforcement
    episode and is later paired with observed outcomes to compute
    reward signals. No learning occurs at this stage.
    """
agents/learning_agent.py
def run(self, youtube_id: str, performance: dict):
    """
    Initiates post-hoc learning after real-world outcomes are observed.

    This function connects delayed environment feedback to prior
    autonomous decisions, enabling reinforcement-driven updates
    to future behavior without retraining models.
    """
core/reinforcement.py
def score(self, performance: dict) -> float:
    """
    Converts real-world performance metrics into a scalar reward.

    This reward serves as the reinforcement signal that determines
    whether past decisions should be strengthened or weakened
    in future decision cycles.

    The scoring function is intentionally interpretable to
    preserve transparency and auditability.
    """
core/memory_system.py
def store_performance(self, youtube_id: str, performance: dict):
    """
    Associates observed outcomes with a prior decision episode
    and applies reinforcement updates.

    This function is the primary learning mechanism of the system:
    it updates reward-weighted patterns that directly influence
    future autonomous decisions.
    """
def _update_patterns(self, reward: float, record: dict):
    """
    Updates decision patterns using reward-weighted aggregation.

    Over time, this mechanism biases future decisions toward
    actions that have historically produced higher reinforcement
    signals, enabling adaptive behavior without model retraining.
    """
agents/reflection_agent.py
def run(self, decision_record: dict):
    """
    Uses Gemini 3 to perform strategic reflection on low- or high-reward outcomes.

    This agent reasons across:
    - The semantic state (analysis)
    - The autonomous decisions taken
    - The observed reinforcement signal

    Its output produces qualitative explanations and quantitative
    adjustment suggestions that further shape future agent behavior.
    """
“Every function that makes or influences a decision is explicitly tied to a reinforcement signal derived from real-world outcomes, and Gemini 3 is used only where reasoning across state, history, and strategy is required.”
Reinforcement‑Driven Learning (Key Innovation)
Snapstr AI does behavioral reinforcement, not model retraining.
Each published video receives a reward score based on real outcomes:
- Views
- Likes
- Watch time
- Penalties if privacy was manually changed
- Strong penalties if content was deleted
These scores directly influence future decisions.
For example:
- If public Shorts consistently outperform → confidence increases
- If public uploads get reversed → privacy confidence decreases
This allows Snapstr AI to adapt safely and autonomously, without fine‑tuning models or collecting sensitive training data.
Example: Reinforcement-Driven Learning
(End-to-end process flow, no abstractions)
Scenario
A user drops a 45-second family video into the watched folder.
The agent has some history, but not much.
Step 1 — AnalyzerAgent (Gemini 3)
Input
video_path = "2026-01-park-play.mp4"
Gemini 3 output (structured, not text blob)
{
"people": [
{ "id": "child_1", "age_estimate": 6 },
{ "id": "adult_1", "age_estimate": 34 }
],
"activity": "playing at a public park",
"risk_signals": ["minor_present"],
"duration_sec": 45,
"summary": "A child playing on playground equipment with a parent nearby.",
"suggested_title": "Afternoon at the Park"
}
Mechanism
- Gemini 3 performs multimodal reasoning
- Output is machine-consumable JSON
- No decisions yet
Step 2 — Competing Decision Agents
PrivacyAgent
{
"decision": "private",
"confidence": 0.82,
"reasoning": "Child detected in video; past similar videos had negative outcomes"
}
FormatAgent
{
"decision": "shorts",
"confidence": 0.91,
"reasoning": "Duration under 60 seconds"
}
TimingAgent
{
"decision": "now",
"confidence": 0.67,
"reasoning": "No strong timing pattern yet"
}
Step 3 — DecisionMerger
{
"privacy": "private",
"format": "shorts",
"timing": "now",
"overall_confidence": 0.80
}
Important
- Decision is not random
- Confidence is computed
- Reasoning is preserved
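The write-up does not say how `overall_confidence` is computed. The example values are consistent with a plain arithmetic mean of the three agent confidences, (0.82 + 0.91 + 0.67) / 3 = 0.80, so the sketch below assumes that rule. The real `DecisionMerger` may weight agents differently.

```python
from statistics import mean


def merge(privacy: dict, format_: dict, timing: dict) -> dict:
    """Combine per-agent decisions; overall confidence is assumed to be the mean."""
    parts = {"privacy": privacy, "format": format_, "timing": timing}
    merged = {name: part["decision"] for name, part in parts.items()}
    merged["overall_confidence"] = round(
        mean(part["confidence"] for part in parts.values()), 2
    )
    return merged


result = merge(
    {"decision": "private", "confidence": 0.82},
    {"decision": "shorts", "confidence": 0.91},
    {"decision": "now", "confidence": 0.67},
)
print(result)
```

A fuller record would also carry each agent's reasoning string, so that later reward attribution can point at a specific decision component.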
Step 4 — ExecutionAgent
The video is uploaded as:
- Private
- YouTube Shorts
- Immediate publish
A record is stored in memory:
{
"video_id": "yt_123",
"analysis": {...},
"decisions": {...},
"timestamp": "2026-01-09"
}
Step 5 — Outcome Observation (Delayed)
48 hours later, Snapstr AI fetches performance:
{
"views": 12,
"likes": 0,
"watch_ratio": 0.21,
"privacy_changed": false,
"deleted": false
}
Step 6 — Reinforcement Scoring
ReinforcementScorer computes:
  views_score      = 0.012
+ watch_time_score = 0.063
+ likes_score      = 0.000
--------------------------------
  score            = 0.075 (low reward)
Key point: This is real-world feedback, not simulated.
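The component scores in this example are consistent with weights of 0.001 per view (12 views → 0.012) and 0.3 on watch ratio (0.21 → 0.063). The sketch below uses those inferred values; the likes weight and both penalty sizes are placeholders chosen for illustration, not figures from the project.

```python
# Weights for views and watch_ratio are inferred from the worked example;
# the likes weight and the penalty values are hypothetical.
WEIGHTS = {"views": 0.001, "watch_ratio": 0.3, "likes": 0.01}
PENALTIES = {"privacy_changed": 0.5, "deleted": 1.0}


def score(performance: dict) -> float:
    """Weighted sum of metrics minus penalties for reversals and deletions."""
    reward = sum(weight * performance.get(metric, 0) for metric, weight in WEIGHTS.items())
    reward -= sum(penalty for flag, penalty in PENALTIES.items() if performance.get(flag))
    return round(reward, 3)


performance = {
    "views": 12,
    "likes": 0,
    "watch_ratio": 0.21,
    "privacy_changed": False,
    "deleted": False,
}
print(score(performance))
```

With these assumptions the example metrics score 0.075, and a deletion or manual privacy reversal pushes the reward negative.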
Step 7 — Pattern Update (This Is the Learning)
Memory updates:
"patterns": {
"privacy": {
"private": {
"count": 7,
"total_reward": 1.82
},
"public": {
"count": 3,
"total_reward": 2.40
}
}
}
Now the average reward is:
- private → 1.82 / 7 = 0.26
- public → 2.40 / 3 = 0.80
Result: Even though private is more common, public performs better.
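The bookkeeping behind these averages can be sketched as follows. The function names are hypothetical; the counts and totals are the ones from the example.

```python
def update_pattern(patterns: dict, dimension: str, action: str, reward: float) -> None:
    """Accumulate count and total reward for one action within a decision dimension."""
    stats = patterns.setdefault(dimension, {}).setdefault(
        action, {"count": 0, "total_reward": 0.0}
    )
    stats["count"] += 1
    stats["total_reward"] += reward


def average_rewards(patterns: dict, dimension: str) -> dict:
    """Average reward per action, skipping actions that have never been tried."""
    return {
        action: stats["total_reward"] / stats["count"]
        for action, stats in patterns[dimension].items()
        if stats["count"] > 0
    }


patterns = {
    "privacy": {
        "private": {"count": 7, "total_reward": 1.82},
        "public": {"count": 3, "total_reward": 2.40},
    }
}
averages = average_rewards(patterns, "privacy")
best = max(averages, key=averages.get)
print({action: round(value, 2) for action, value in averages.items()}, best)
```

A higher average only biases the choice. As described for PrivacyAgent, hard safety overrides and user constraints still take precedence over learned patterns.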
Step 8 — Future Behavior Changes
Next time a similar video appears:
{
"decision": "public",
"confidence": 0.87,
"reasoning": "Similar child-present videos historically performed better when public"
}
This is the learning moment: The agent changed behavior because of outcomes, not rules.
Gemini 3’s Role
Gemini 3 is used as a reasoning engine, not a formatter.
Specifically, Gemini 3:
- Performs multimodal video understanding
- Identifies people, activities, and risk signals
- Generates structured semantic analysis
- Produces reflection narratives explaining success or failure
Its long‑context reasoning allows Snapstr AI to connect past decisions, current context, and future strategy—a core requirement for long‑running agents.
Example: Gemini 3’s Role
(Where Gemini 3 is essential, not replaceable)
Gemini 3 is used in two specific, high-leverage places:
A. Multimodal Semantic Grounding (Before Decisions)
Why Gemini 3 Matters Here
A classical CV model could say:
“There is a person and playground equipment.”
Gemini 3 reasons:
“This is a minor in a public setting, which historically impacts privacy and engagement outcomes.”
Mechanism
analysis = gemini.analyze_video(
video_path,
output_schema=STRICT_JSON_SCHEMA
)
Gemini 3:
- Binds visual context → semantic meaning
- Produces decision-ready signals
- Enables downstream agents to reason symbolically
Without Gemini 3:
- No risk signals
- No structured reasoning
- No explainability
B. ReflectionAgent (After Outcomes)
This is where Gemini 3 becomes strategic.
Trigger Condition
if reward < 0.2:
    reflection_agent.run(...)
Gemini 3 Prompt (Conceptual)
“Given this analysis, decision, and outcome, explain why the decision underperformed and suggest an adjustment.”
Gemini 3 Output
{
"reflection": "Videos featuring minors tend to underperform when private because discoverability is reduced.",
"suggested_adjustment": "Increase confidence threshold for private-only decisions when engagement history is low."
}
Mechanism
Gemini 3 reasons across:
- Past decisions
- Current outcome
- Policy constraints
The result is an actionable strategy adjustment, not filler text.
Memory Update
{
"reflection": "...",
"applied_adjustment": "privacy_confidence_bias -= 0.1"
}
Now future decisions are subtly altered.
“Snapstr AI doesn’t just call Gemini once. Gemini is embedded at two critical points: semantic grounding before decisions and strategic reflection after outcomes. Reinforcement learning closes the loop by turning real-world performance into behavior change.”
This sets Snapstr AI apart from:
- Prompt apps
- Vision demos
- Static pipelines
Snapstr AI uses Gemini 3 for multimodal semantic reasoning and post-hoc reflection, then applies reinforcement scoring on real-world outcomes to continuously reshape future autonomous decisions.
Demo Scenario
In the demo, we show:
- First upload – No prior memory → cautious decisions
- Second upload – Early reinforcement applied
- Third upload – Behavior visibly changed due to learned rewards
We also surface:
- Agent disagreements
- Decision confidence
- Reinforcement scores
- Reflection explanations
Why This Matters
Snapstr AI demonstrates how Gemini 3 enables:
- Autonomous systems that operate over time
- Multi‑agent reasoning instead of single prompts
- Safe learning from real‑world feedback
- Transparent, inspectable decision‑making
This is the kind of system required for the next generation of AI assistants—ones that act responsibly, adapt continuously, and earn user trust.
What’s Next
- Expanded reinforcement signals
- Multi‑persona agents (e.g. parent vs brand vs team)
- Externalized agents running at different cost tiers
- Long‑term preference modeling
Snapstr AI is designed to grow, learn, and evolve. It is part of a creator ecosystem that adapts:
- AWS, NVIDIA: https://devpost.com/software/vidcraft-aws-nvidia-use-ai-to-give-power-to-every-voice
- Gemini, ElevenLabs: https://devpost.com/software/vidcraft-vids-on-your-phone-uploaded-ai-sees-spins-tale
- SpoonOS, SpoonAI: https://devpost.com/software/video-auto-uploader-spoonos-react-agents-mcp-neo-blockchain
- Use it to build a community: https://devpost.com/software/ai-community-building
- Showcase technology: https://devpost.com/software/korea
- Grassroots marketing using the most influential creator: https://devpost.com/software/skop
Built For the Gemini 3 Hackathon
We refactored the system from a sequential pipeline into a multimodal, agentic feedback loop, replacing linear execution with native multimodality and autonomous self-learning.
1. Architectural Pivot: From Linear Pipeline to Closed Loop
The original architecture followed a one-way flow: Analysis → Decision → Execution. We introduced auto-learning to form a closed feedback loop, enabling the system to re-evaluate its own outputs and improve subsequent uploads.
Background Polling
A scheduled task invokes rewatch_and_learn 24–48 hours post-upload, allowing the system to reassess real-world performance and incorporate outcomes into future decisions.
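The delayed re-evaluation could be driven by a small queue like the one sketched here. `AuditQueue` and `pop_due` are hypothetical names, and the 24-hour default is simply the lower bound of the window mentioned above.

```python
from datetime import datetime, timedelta


class AuditQueue:
    """Holds uploads until their post-upload evaluation is due."""

    def __init__(self, delay: timedelta = timedelta(hours=24)):
        self.delay = delay
        self._pending = {}  # video_id -> time at which the audit becomes due

    def schedule(self, video_id: str, uploaded_at: datetime) -> None:
        self._pending[video_id] = uploaded_at + self.delay

    def pop_due(self, now: datetime) -> list:
        # Return and remove every video whose audit time has passed.
        due = sorted(vid for vid, when in self._pending.items() if when <= now)
        for vid in due:
            del self._pending[vid]
        return due


queue = AuditQueue()
uploaded = datetime(2026, 1, 9, 12, 0)
queue.schedule("yt_123", uploaded)
print(queue.pop_due(uploaded + timedelta(hours=6)))   # too early, nothing is due
print(queue.pop_due(uploaded + timedelta(hours=24)))  # the audit is now due
```

A periodic task would call `pop_due` and hand each returned ID to `rewatch_and_learn`.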
2. Logic Upgrade: Multimodal Aesthetic Learning
Visual Fingerprinting
The LearningAgent now captures not only engagement metrics (views, likes, CTR) but also the visual delta between AI-generated recommendations and user-applied edits.
Weighted Euclidean Similarity
Using calculate_similarity, the system detects when a user has meaningfully overridden the AI’s aesthetic.
If a user’s manual color grade produces a higher engagement_score, the MemoryAgent elevates that visual style as the preferred prior for comparable future contexts.
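The signature of `calculate_similarity` is not shown, so the following is only a sketch of what a weighted Euclidean similarity might look like. The feature layout (for instance brightness, contrast, saturation), the weights, and the mapping from distance to a similarity score via 1 / (1 + d) are all assumptions.

```python
import math
from typing import Sequence


def calculate_similarity(a: Sequence[float], b: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Weighted Euclidean similarity in (0, 1]; 1.0 means identical profiles."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("profiles and weights must have the same length")
    distance = math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
    return 1.0 / (1.0 + distance)


ai_profile = (0.50, 0.40, 0.60)    # hypothetical AI-recommended grade
user_profile = (0.50, 0.40, 0.60)  # user kept the recommendation
print(calculate_similarity(ai_profile, user_profile, (1.0, 1.0, 1.0)))
```

A low similarity would indicate that the user meaningfully overrode the recommended look, which is the case the learning step needs to detect.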
3. Agentic Autonomy: Self-Correcting Metadata
This introduces true agentic behavior—initiative without prompts.
Autonomous NLP Adaptation
When improvement_vs_baseline indicates a significant post-edit lift in CTR, the system automatically updates its metadata and NLP patterns for that content category—no manual retraining required.
Code Injection Summary
We extended the VideoOrchestrator.run() method to register every upload for automatic post-hoc evaluation:
# existing upload code...
execution = executor.run(video_path, analysis, decisions)
# NEW: Register this upload for an Auto-Audit in 24 hours
self.auto_learner.schedule_audit(
video_id=execution['youtube_id'],
original_profile=self.color_detector.detect_from_frames(video_path),
context=analysis
)
This is no longer a passive optimization tool—it’s an autonomous channel partner, continuously learning from user intent and real-world performance to compound gains with every video published.
Built With
- gemini-3-pro-preview
- python