Reinforcement Learning System
The reinforcement learning system learns from user feedback and outcome data:
4 Main Components:
core/reinforcement.py - Core RL System
- ReinforcementScorer: Calculates reward from YouTube metrics
- DecisionRewardTracker: Tracks decisions + outcomes
- UserFeedbackLearner: Learns from user corrections
- AdaptiveDecisionMaker: Makes decisions based on learned patterns
- LearningAgent: Orchestrates the learning loop
core/integrated_decision_engine.py - Updated Decision Engine
- Integrates Gemini reasoning with learned patterns
- Records decisions for learning
- Improves decisions based on historical performance
- Respects user overrides first
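The precedence described above (user override first, then learned patterns, then Gemini reasoning) could look roughly like the sketch below. This is an illustration only, not the project's actual code: `Decision`, `decide_privacy`, and the `min_confidence` threshold are hypothetical names and values, and the write-up does not say how the real engine blends Gemini output with learned patterns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    value: str
    source: str  # "user_override", "learned_pattern", or "gemini"


def decide_privacy(
    user_override: Optional[str],
    learned_preference: Optional[str],
    learned_confidence: float,
    gemini_suggestion: str,
    min_confidence: float = 0.6,  # assumption: threshold is not given in the text
) -> Decision:
    """Pick a privacy setting, checking sources in priority order."""
    # 1. An explicit user override always wins.
    if user_override is not None:
        return Decision(user_override, "user_override")
    # 2. Use a learned preference only if it is confident enough.
    if learned_preference is not None and learned_confidence >= min_confidence:
        return Decision(learned_preference, "learned_pattern")
    # 3. Otherwise fall back to the model's suggestion.
    return Decision(gemini_suggestion, "gemini")
```

Returning the `source` alongside the value makes it possible to record which path produced each decision, which is what the later learning step needs.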
scripts/update_learning.py - Feedback Collection
- Fetches YouTube performance data
- Records user feedback & corrections
- Analyzes learning progress
- Usage:
python scripts/update_learning.py --fetch-youtube
scripts/learning_dashboard.py - Visualization
- Interactive dashboard showing:
  - Decision quality trends
  - User feedback patterns
  - Learned preferences
  - Performance improvements
- Multiple views: main, history, feedback, trends
How It Works:
- Record Decision → Agent makes decision with reasoning
- Get Outcome → YouTube performance + user feedback
- Calculate Reward → Score based on views, likes, engagement
- Learn → Update patterns from reward
- Improve → Next similar decision uses learned data
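The loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation in core/reinforcement.py: `calculate_reward`, `PatternStore`, the 50/50 weighting, the 10,000-view cap, and the learning rate are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def calculate_reward(views: int, likes: int, comments: int) -> float:
    """Map raw metrics to a reward in [0, 1]. Weights and caps are assumptions."""
    if views <= 0:
        return 0.0
    reach = min(views / 10_000, 1.0)                  # saturates at 10k views
    engagement = min((likes + comments) / views, 1.0)  # interactions per view
    return 0.5 * reach + 0.5 * engagement


class PatternStore:
    """Keeps a running value estimate per (context, choice) pair."""

    def __init__(self, learning_rate: float = 0.2) -> None:
        self.learning_rate = learning_rate
        self.values: Dict[Tuple[str, str], float] = defaultdict(float)

    def update(self, context: str, choice: str, reward: float) -> None:
        # Move the stored estimate a fraction of the way toward the new reward.
        key = (context, choice)
        self.values[key] += self.learning_rate * (reward - self.values[key])

    def best_choice(self, context: str, choices: Sequence[str]) -> str:
        # Prefer the choice with the highest learned value for this context.
        return max(choices, key=lambda c: self.values[(context, c)])


# Record decision -> get outcome -> calculate reward -> learn -> improve
store = PatternStore()
reward = calculate_reward(views=10_000, likes=500, comments=100)
store.update("tutorial", "public", reward)
next_choice = store.best_choice("tutorial", ["private", "public"])
```

Because the update is an incremental average over stored values, no model is retrained; each new outcome only nudges the estimate for the matching context.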
User Feedback Loop:
- User changes privacy: "private" → "public" = -0.5 reward
- User keeps decision = positive reinforcement
- User rates 5 stars = +0.8 reward
- Patterns accumulated → Inferred preferences
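As a rough sketch of how these signals might be turned into numbers: only the -0.5 override penalty and the +0.8 five-star reward come from the text. The size of the "kept decision" reward, the linear star scale, and the names `override_reward`, `rating_reward`, and `infer_preference` are assumptions for illustration.

```python
from collections import Counter
from typing import Optional, Sequence

OVERRIDE_PENALTY = -0.5  # from the text: user changed the agent's privacy choice
FIVE_STAR_REWARD = 0.8   # from the text: user rated the decision 5 stars
KEPT_REWARD = 0.2        # assumption: the text does not give a magnitude


def override_reward(agent_choice: str, final_choice: str) -> float:
    """Reward the agent if the user kept its choice, penalize a correction."""
    return KEPT_REWARD if agent_choice == final_choice else OVERRIDE_PENALTY


def rating_reward(stars: int) -> float:
    """Assumed linear scale: 3 stars is neutral, 5 stars gives +0.8."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return (stars - 3) * (FIVE_STAR_REWARD / 2)


def infer_preference(final_choices: Sequence[str], min_samples: int = 3) -> Optional[str]:
    """Infer a preference as the most common final choice, once enough data exists."""
    if len(final_choices) < min_samples:
        return None
    value, _count = Counter(final_choices).most_common(1)[0]
    return value
```

For example, three videos whose final privacy ended up `"public"`, `"public"`, `"private"` would yield an inferred preference of `"public"`, while a single correction would not yet be enough to infer anything.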
Result: The agent's decisions improve with every video as it learns your actual preferences, without retraining any models!