
Reinforcement Learning System

The system learns from user feedback and outcome data:

4 Main Components:

  1. core/reinforcement.py - Core RL System

    • ReinforcementScorer: Calculates reward from YouTube metrics
    • DecisionRewardTracker: Tracks decisions + outcomes
    • UserFeedbackLearner: Learns from user corrections
    • AdaptiveDecisionMaker: Makes decisions based on learned patterns
    • LearningAgent: Orchestrates the learning loop
  2. core/integrated_decision_engine.py - Updated Decision Engine

    • Integrates Gemini reasoning with learned patterns
    • Records decisions for learning
    • Improves decisions based on historical performance
    • Respects user overrides first
  3. scripts/update_learning.py - Feedback Collection

    • Fetches YouTube performance data
    • Records user feedback & corrections
    • Analyzes learning progress
    • Usage: python scripts/update_learning.py --fetch-youtube
  4. scripts/learning_dashboard.py - Visualization

    • Interactive dashboard showing:
      • Decision quality trends
      • User feedback patterns
      • Learned preferences
      • Performance improvements
    • Multiple views: main, history, feedback, trends
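
To illustrate the precedence described for the decision engine (user overrides first, then learned patterns, then the model's reasoning), here is a minimal sketch. The function name `decide`, its parameters, and the `min_score` threshold are hypothetical and not taken from core/integrated_decision_engine.py; the model suggestion stands in for the Gemini output.

```python
from typing import Optional


def decide(
    user_override: Optional[str],
    learned: dict[str, float],
    model_suggestion: str,
    min_score: float = 0.2,  # assumed threshold for trusting a learned pattern
) -> tuple[str, str]:
    """Return (choice, source) using an override > learned > model precedence."""
    # 1. An explicit user override always wins.
    if user_override is not None:
        return user_override, "user_override"

    # 2. Use the best learned option if its score is high enough.
    if learned:
        best = max(learned, key=learned.get)
        if learned[best] >= min_score:
            return best, "learned_pattern"

    # 3. Otherwise fall back to the model's suggestion.
    return model_suggestion, "model"
```

Returning the source alongside the choice makes it possible to record why each decision was made, which is what the learning step needs later.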

How It Works:

  1. Record Decision → Agent makes decision with reasoning
  2. Get Outcome → YouTube performance + user feedback
  3. Calculate Reward → Score based on views, likes, engagement
  4. Learn → Update patterns from reward
  5. Improve → Next similar decision uses learned data
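
The five steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in core/reinforcement.py: the weights, saturation points, the `PatternStore` class, and the running-average update are all hypothetical choices made to show the idea.

```python
import math
from collections import defaultdict


def reward_from_metrics(views: int, likes: int, comments: int) -> float:
    """Map raw video metrics to a score in [0, 1] (illustrative weights)."""
    if views <= 0:
        return 0.0
    reach = min(math.log10(views + 1) / 6.0, 1.0)              # ~1M views saturates
    like_rate = min(likes / views / 0.10, 1.0)                 # 10% like rate saturates
    engagement = min((likes + comments) / views / 0.15, 1.0)   # 15% saturates
    return 0.5 * reach + 0.3 * like_rate + 0.2 * engagement


class PatternStore:
    """Keeps a running value estimate per (context, choice) pair."""

    def __init__(self, learning_rate: float = 0.3):
        self.learning_rate = learning_rate
        self.values = defaultdict(float)

    def update(self, context: str, choice: str, reward: float) -> float:
        # Move the stored estimate a fraction of the way toward the new reward.
        key = (context, choice)
        self.values[key] += self.learning_rate * (reward - self.values[key])
        return self.values[key]

    def best_choice(self, context: str, choices: list[str]) -> str:
        # Pick the choice with the highest learned value for this context.
        return max(choices, key=lambda c: self.values.get((context, c), 0.0))
```

A decision is recorded as a (context, choice) pair, the outcome is turned into a reward, `update` folds that reward into the stored pattern, and `best_choice` is what the next similar decision would consult.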

User Feedback Loop:

  • User changes privacy: "private" → "public" = -0.5 reward
  • User keeps decision = positive reinforcement
  • User rates 5 stars = +0.8 reward
  • Accumulated patterns → Inferred preferences
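
A small sketch of how these feedback signals might be turned into rewards and inferred preferences. Only the -0.5 and +0.8 values come from this post; the reward for a kept decision, the linear star scale, the `min_count` threshold, and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Optional

OVERRIDE_REWARD = -0.5   # from the post: user changed the agent's choice
FIVE_STAR_REWARD = 0.8   # from the post: user rated 5 stars
KEEP_REWARD = 0.2        # assumption: the post only says "positive"


def feedback_reward(agent_choice: str, final_choice: str,
                    stars: Optional[int] = None) -> float:
    """Convert one piece of user feedback into a reward."""
    if stars is not None:
        # Assumed linear scale: 5 stars -> +0.8, 3 stars -> 0, 1 star -> -0.8.
        return FIVE_STAR_REWARD * (stars - 3) / 2
    if final_choice != agent_choice:
        return OVERRIDE_REWARD
    return KEEP_REWARD


def infer_preference(corrections: list[tuple[str, str]],
                     min_count: int = 3) -> Optional[str]:
    """Infer a preferred value once the user has chosen it often enough.

    Each correction is an (agent_choice, final_choice) pair.
    """
    counts = Counter(final for _agent, final in corrections)
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value if n >= min_count else None
```

For example, three corrections from "private" to "public" would yield an inferred preference of "public", while a single correction would not yet be treated as a pattern.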

Result: The agent's decisions improve with every video as it learns your actual preferences, without retraining any models.
