YouTube Agent

An agent that extracts the most relevant segments from YouTube videos, personalizes its results over time with reinforcement learning, and decides when to fine-tune its embedding model based on an LLM judge.

Inspiration

We've all been there - spending 45 minutes watching a tutorial just to find a 2-minute explanation buried somewhere in the middle. With over 500 hours of content uploaded to YouTube every minute, finding specific information within videos is incredibly frustrating. Traditional search only works on titles and descriptions, completely ignoring the wealth of knowledge in spoken content. We wanted to build an intelligent system that could understand what you're looking for and learn from your preferences to get better over time.

What it does

YouTube Agent is a Chrome extension that searches inside the spoken content of videos, not just their titles and descriptions:

Voice-Activated Search: Simply say "Find videos about machine learning optimization" and the system searches across multiple YouTube videos
Intelligent Transcript Analysis: Automatically extracts and processes video transcripts, breaking them into timestamped chunks
Smart Ranking: Uses advanced embedding models to rank transcript segments by relevance to your query
Reinforcement Learning: The system learns from your ratings and feedback, continuously improving its recommendations
Personalized Experience: Gets smarter with every interaction, adapting to your preferences and learning patterns
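
As a rough illustration of the transcript step, the sketch below groups caption segments into timestamped chunks. It is a minimal sketch, not the project's actual code: the `Chunk` type, the `chunk_transcript` and `format_timestamp` helpers, the `(start, duration, text)` segment shape, and the 30-second window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # seconds from the start of the video
    end: float
    text: str


def chunk_transcript(segments, window_seconds=30.0):
    """Group (start, duration, text) caption segments into windows.

    A new chunk is started when adding the next segment would make the
    current chunk longer than window_seconds (an assumed default).
    """
    chunks = []
    current_texts = []
    chunk_start = None
    chunk_end = 0.0
    for start, duration, text in segments:
        if chunk_start is None:
            chunk_start = start
        elif start + duration - chunk_start > window_seconds:
            # Close the current chunk and start a new one at this segment.
            chunks.append(Chunk(chunk_start, chunk_end, " ".join(current_texts)))
            current_texts = []
            chunk_start = start
        current_texts.append(text)
        chunk_end = start + duration
    if current_texts:
        chunks.append(Chunk(chunk_start, chunk_end, " ".join(current_texts)))
    return chunks


def format_timestamp(seconds):
    """Render seconds as M:SS for display next to a chunk."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"
```

Keeping the start time on every chunk is what lets the UI link a ranked result back to the exact moment in the video.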

How we built it

Our system combines multiple cutting-edge technologies:

Backend (Python/Flask):

yt-dlp for YouTube video search and transcript extraction
Sentence Transformers for semantic embedding generation
Multi-armed bandit algorithm (epsilon-greedy) for chunk selection optimization
Self-learning pipeline that fine-tunes embedding models based on user feedback
LLM judge using Ollama + Gemma for automated quality assessment
SQLite database for storing user interactions and feedback
Weights & Biases integration for real-time analytics and monitoring
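
To make the ranking idea concrete, here is a minimal sketch of scoring transcript chunks against a query by cosine similarity. The `toy_embed` function is a bag-of-words stand-in so the example runs without downloading a model; in the real system a Sentence Transformers model would take its place. All function names here are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a sentence embedding: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query, chunk_texts, embed=toy_embed, top_k=3):
    """Return up to top_k (score, text) pairs, highest similarity first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), text) for text in chunk_texts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Passing the embedding function in as a parameter is one way to swap a fine-tuned model in later without touching the ranking code.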

Frontend (Chrome Extension):

Voice recognition API for natural language queries
Interactive UI with star ratings for user feedback
Real-time display of ranked transcript chunks with timestamps

Reinforcement Learning Components:

Multi-Armed Bandit: Balances exploration of new content vs exploitation of proven relevant segments
Continuous Learning: Embedding model retrains automatically when quality drops
Quality Monitoring: LLM judge triggers learning cycles when performance degrades
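
The exploration versus exploitation balance can be sketched with a small epsilon-greedy bandit. This is an illustrative sketch only: what counts as an "arm" (for example a candidate chunk or a ranking strategy), the `stars_to_reward` mapping, and the default epsilon of 0.1 are assumptions, not details confirmed by the project.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit with incremental mean reward estimates."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each arm was rewarded
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore: with probability epsilon pick a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Exploit: otherwise pick the arm with the best estimate so far.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def stars_to_reward(stars):
    """Map a 1-5 star rating onto a reward in [0, 1] (assumed mapping)."""
    return (stars - 1) / 4.0
```

A typical loop would call `select_arm()`, show the corresponding result, then call `update(arm, stars_to_reward(rating))` once the user rates it.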

Challenges we ran into

1. Personalization: Implementing a system that learns from user feedback while maintaining fast response times required careful optimization of the embedding and ranking pipelines.

2. Quality Assessment: Determining when the system is performing poorly and needs retraining was challenging - we solved this with an automated LLM judge that evaluates search quality.

3. Voice Integration: Getting reliable voice recognition working smoothly within a Chrome extension required handling various edge cases and browser permissions.

4. Intelligent Model Fine-tuning and Deployment: Building a robust system that automatically retrains embedding models and decides when to deploy a fine-tuned model on a live system was difficult to get right.
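
Challenges 2 and 4 both come down to simple decision rules layered on top of the judge's output. The sketch below shows one possible shape for them, assuming the LLM judge returns a numeric score between 0 and 1 per search. The `QualityMonitor` class, the `should_deploy` function, and every threshold are hypothetical and chosen only for illustration.

```python
from collections import deque


class QualityMonitor:
    """Tracks recent judge scores and flags when retraining looks warranted."""

    def __init__(self, window=20, threshold=0.6, min_samples=5):
        self.scores = deque(maxlen=window)  # keeps only the latest scores
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, judge_score):
        self.scores.append(judge_score)

    def should_retrain(self):
        # Avoid reacting to noise before enough searches have been judged.
        if len(self.scores) < self.min_samples:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold


def should_deploy(baseline_scores, candidate_scores, min_gain=0.02):
    """Deploy the fine-tuned model only if it beats the live one by min_gain.

    Both lists hold judge scores for the same held-out queries.
    """
    if not baseline_scores or len(baseline_scores) != len(candidate_scores):
        raise ValueError("score lists must be non-empty and the same length")
    baseline = sum(baseline_scores) / len(baseline_scores)
    candidate = sum(candidate_scores) / len(candidate_scores)
    return candidate - baseline >= min_gain
```

Requiring a minimum gain before deployment is one way to keep a noisy fine-tuning run from replacing a model that is already working well.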

Accomplishments that we're proud of

🧠 Practical RL Implementation: Successfully deployed reinforcement learning in a real-world application that demonstrably improves with user interaction.

🚀 End-to-End System: Built a complete pipeline from voice input to intelligent video segment recommendations with continuous learning.

📊 Measurable Improvement: The system shows quantifiable improvements in user satisfaction as it learns from feedback over time.

🔬 Novel Architecture: Combined multi-armed bandits, self-learning embeddings, and LLM judges in a cohesive system that adapts to user preferences.

⚡ Performance Optimization: Achieved sub-second response times while processing multiple video transcripts and running complex ML models.

📈 Production Monitoring: Integrated comprehensive analytics with W&B to track system performance, user satisfaction, and learning progress in real-time.

What we learned

Reinforcement Learning in Practice: RL isn't just for games and robotics - it's incredibly powerful for information retrieval and recommendation systems when you have user feedback loops.

Human-in-the-Loop ML: Combining automated quality assessment (the LLM judge) with human feedback creates a more robust learning system and helps guard against degradation.

Exploration vs Exploitation Trade-offs: Balancing trying new content vs showing proven relevant results is crucial for user satisfaction and system learning.

Continuous Learning Challenges: Building systems that learn continuously without catastrophic forgetting requires careful engineering and monitoring.

Voice UI Design: Natural language interfaces need to handle ambiguity gracefully and provide clear feedback to users about what the system understood.

What's next for YouTube Agent

🎯 Advanced RL Algorithms: Implement contextual bandits and deep RL approaches to better understand user preferences and content relationships.

🌐 Multi-Platform Support: Extend beyond YouTube to other video platforms like Vimeo, educational sites, and corporate training platforms.

👥 Collaborative Filtering: Learn from the collective intelligence of all users while maintaining privacy through federated learning approaches.

🧠 Multimodal Understanding: Incorporate visual analysis of video frames alongside transcript analysis for richer content understanding.

📱 Mobile App: Develop native mobile applications with offline capabilities for transcript search and learning.

🏢 Enterprise Integration: Build enterprise features for corporate training, knowledge management, and team learning analytics.

🔬 Research Publication: Document our findings on practical RL applications in information retrieval for the academic community.

🚀 Open Source Community: Release core components as open-source tools to help other developers build learning-enabled search systems.

Note: This is an extension of a project I shipped in a previous hackathon.

Built With

asr
embedding
finetuning
google
llm
python
rl

Updates

Jaspal Singh started this project — Oct 12, 2025 04:24 PM EDT
