Minerva - the edu-youtuber

UI for User Prompt
Job Status UI
These are the frames generated by the AI along with the text and audio.
YouTube Agent

Inspiration

Learning is a complex process, and visual aids make difficult concepts much easier to understand. Education should be accessible to everyone, irrespective of financial background. With that in mind, we set out to build an educational animation generator that offers visual simulations of diverse concepts at costs orders of magnitude lower than those of animation platforms.

What it does

It provides a clean, simple interface that anyone can use to get an animation of a concept they wish to learn. For those on a mission to educate the world, it offers an interface where they can enter a prompt, and the system then periodically creates videos and uploads them to YouTube. We also ensure that every video generated for a given user is unique across all of that user's creations.

How we built it

This system was built as a Python-based backend for automated story-to-video generation. It uses Flask to provide API endpoints for job submission and status checking. Jobs are stored in a MongoDB database (agent_jobs collection) and processed by a worker that generates story scripts, renders graph-based frames, synthesizes narration audio, and composes animated videos. Key technologies include MoviePy (video), NetworkX/Matplotlib (graph rendering), ElevenLabs (TTS), and OpenRouter (script generation). The backend is modular, with clear separation between API, database access, and processing logic, and is configured via environment variables for easy deployment. We used an embedding model with cosine similarity to ensure the uniqueness of the videos generated.
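
The uniqueness check described above can be illustrated with a minimal sketch. It assumes each video script has already been turned into an embedding vector (a plain list of floats here) and that a new script counts as unique when its cosine similarity to every earlier script stays below a cutoff. The function names and the 0.9 threshold are illustrative assumptions, not values taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def is_unique(candidate, previous, threshold=0.9):
    """True if the candidate is below the threshold against every earlier embedding.

    The threshold is a hypothetical value chosen for illustration.
    """
    return all(cosine_similarity(candidate, p) < threshold for p in previous)
```

In a setup like this, a script that fails the check would be regenerated before rendering, so the expensive video steps only run on content that is new for that user.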

Challenges we ran into

LLM Integration: Inconsistent outputs from OpenRouter API requiring robust validation and retry mechanisms
Graph Visualization: Frame inconsistencies and suboptimal node rendering in NetworkX/MoviePy animations, requiring careful tuning of layout algorithms
Authentication Flow: YouTube upload OAuth implementation with token management challenges and redirect URI configuration
Audio-Visual Sync: Aligning ElevenLabs-generated audio with video frame timing to ensure smooth narration throughout the animation
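
One way to handle the inconsistent LLM outputs mentioned above is to validate every response and retry with exponential backoff. The sketch below is an assumption about how that could look, not the project's actual code: the expected JSON shape (a `scenes` list whose items carry a `narration` string) is hypothetical, and `call_model` stands in for whatever function sends the request to the model.

```python
import json
import time


def validate_script(raw):
    """Parse a model response and check the (hypothetical) expected shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    scenes = data.get("scenes") if isinstance(data, dict) else None
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("expected a non-empty 'scenes' list")
    for scene in scenes:
        if not isinstance(scene, dict) or not isinstance(scene.get("narration"), str):
            raise ValueError("each scene needs a 'narration' string")
    return scenes


def generate_with_retry(call_model, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call the model until its output validates, backing off between attempts."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return validate_script(call_model())
        except ValueError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no valid script after {max_attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.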

Accomplishments that we're proud of

End-to-End Automation: Successfully built a complete pipeline from text prompt to published YouTube video without manual intervention
Cost-Effective Solution: Created an educational tool that generates professional-quality animations at a fraction of traditional animation costs
Robust Architecture: Implemented a scalable worker-based system with job queuing and status tracking using MongoDB
Seamless Integration: Successfully integrated multiple complex APIs (OpenRouter, ElevenLabs, YouTube) into a cohesive workflow
User-Friendly Interface: Designed an intuitive frontend that makes advanced AI video generation accessible to non-technical users
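
The job queuing and status tracking mentioned above can be sketched with a small in-memory store. This is only an illustration of the lifecycle: the real system keeps jobs in MongoDB, and the status names used here (`queued`, `processing`, `done`, `failed`) as well as the class and function names are assumptions.

```python
import uuid


class InMemoryJobStore:
    """Stand-in for a database-backed job collection (illustrative only)."""

    def __init__(self):
        self._jobs = {}  # dicts preserve insertion order, so this acts as a FIFO queue

    def submit(self, prompt):
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"prompt": prompt, "status": "queued", "error": None}
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]["status"]

    def prompt(self, job_id):
        return self._jobs[job_id]["prompt"]

    def claim_next(self):
        """Mark the oldest queued job as processing and return its id, or None."""
        for job_id, job in self._jobs.items():
            if job["status"] == "queued":
                job["status"] = "processing"
                return job_id
        return None

    def finish(self, job_id, error=None):
        job = self._jobs[job_id]
        job["error"] = error
        job["status"] = "failed" if error is not None else "done"


def run_worker_once(store, process):
    """Claim one job, run it, and record success or failure."""
    job_id = store.claim_next()
    if job_id is None:
        return None
    try:
        process(store.prompt(job_id))
        store.finish(job_id)
    except Exception as exc:
        store.finish(job_id, error=str(exc))
    return job_id
```

With a real database and more than one worker, the claim step would need to be a single atomic update so that two workers cannot pick up the same job.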

What we learned

Working with Multiple APIs: Gained experience integrating and managing diverse third-party services with different authentication methods and rate limits
Video Processing Pipeline: Learned the intricacies of programmatic video generation, including frame rendering, audio synchronization, and format conversion
Asynchronous Job Processing: Understood the importance of worker-based architecture for handling long-running tasks without blocking user interactions
OAuth 2.0 Implementation: Deepened our knowledge of OAuth flows, token management, and secure credential storage
Graph-Based Visualization: Discovered techniques for creating visually appealing educational diagrams using NetworkX and managing dynamic animations
Error Handling at Scale: Learned to build resilient systems that gracefully handle API failures, timeouts, and inconsistent outputs
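
The audio synchronization point above lends itself to a small worked example. One simple approach, assumed here rather than taken from the project, is to let each frame stay on screen for exactly as long as its narration clip lasts, and to derive every start and end time from the audio durations. The function names, the optional padding, and the 24 fps default are illustrative.

```python
def build_timeline(audio_durations, padding=0.0):
    """Return (start, end) times in seconds for each frame.

    Each frame is shown for the length of its narration clip plus an
    optional pause, so the visuals cannot drift away from the audio.
    """
    timeline = []
    start = 0.0
    for duration in audio_durations:
        if duration <= 0:
            raise ValueError("audio durations must be positive")
        end = start + duration + padding
        timeline.append((start, end))
        start = end
    return timeline


def frame_count(duration, fps=24):
    """Number of video frames needed to cover a clip of the given length."""
    return max(1, round(duration * fps))
```

For example, narration clips of 2.0 and 3.5 seconds give the timeline `[(0.0, 2.0), (2.0, 5.5)]`, and the first clip needs 48 frames at 24 fps.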

What's next for Minerva - the edu-youtuber

Enhanced Visualization Styles: Add support for multiple animation types beyond graph-based visuals, including diagrams, illustrations, and whiteboard-style animations
Interactive Learning Features: Implement quizzes and interactive elements that can be embedded within videos
Multi-Language Support: Expand narration capabilities to support multiple languages using ElevenLabs' multilingual voices
Customization Options: Allow users to control animation style, pacing, voice selection, and visual themes
Content Optimization: Implement AI-driven analytics to optimize video content based on engagement metrics and learning outcomes
Batch Processing: Enable users to generate entire course series or playlists with a single prompt
Community Features: Build a platform where educators can share, remix, and improve educational animations created by others
Advanced Scheduling: Implement intelligent content calendars for automated channel management with optimal posting times
Quality Improvements: Enhance graph rendering with better layouts, smoother transitions, and more professional visual effects

Built With

Updates

Vaibhav Chaudhari started this project — Nov 16, 2025 12:28 PM EST
