Vibe Video: Cinematic Algorithm Director
Tagline: Turn Algorithms into Cinematic Stories.
💡 Inspiration
We grew up watching creators like 3Blue1Brown, whose visualizations of linear algebra and calculus made us fall in love with mathematics. However, we realized that creating even a five-minute Manim video requires hours, if not days, of painstaking Python tinkering, camera positioning, and asset management.
We asked: "What if we could use Gemini to act as the director, researcher, and animator?" The inspiration for Vibe Video was to lower the barrier to technical storytelling, ensuring the next generation of mathematical insights isn't trapped on a whiteboard but visualized with the cinematic "vibe" they deserve.
What it does
Vibe Video is an AI-driven educational production studio that transforms abstract technical concepts into cinematic, mathematical animations. It handles the entire creative lifecycle:
- Deep Research: Automates technical research on complex topics.
- Visual Storyboarding: Translates conceptual knowledge into visual intents.
- Automated Production: Generates frame-by-frame Manim code and professional narrations.
- Intelligent Refinement: A "Magic Box" editor where users can chat with the AI to tweak animations, camera angles, and styles in real-time.
How we built it
Vibe Video is powered by a modular 9-Phase Asynchronous Pipeline:
- Gemini 3 Pro: Acts as the brain for technical research, storyboarding, and code synthesis.
- Manim Engine: The core mathematical rendering engine.
- Next.js & Glassmorphism: For a premium, state-of-the-art web experience.
- Deepgram & Google Voice: For high-fidelity human-like narration.
- Agentic Critique Loop: A secondary AI process that "watches" log output and rendered frames to perform surgical code repairs.
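The agentic critique loop can be sketched roughly as follows. This is a minimal illustration, not the production implementation: `render_scene` and `critique_and_repair` are hypothetical stand-ins for the real Manim invocation and the Gemini repair call, and the "repair" here is hard-coded to one class of error for demonstration.

```python
import re

def render_scene(code: str) -> tuple[bool, str]:
    # Stand-in for invoking Manim and capturing its log output.
    # Here we only simulate one failure mode: a misspelled class name.
    if "Circl(" in code:
        return False, "NameError: name 'Circl' is not defined"
    return True, ""

def critique_and_repair(code: str, log: str) -> str:
    # Stand-in for a Gemini repair call. The real loop would send the
    # code, the log, and rendered frames to the model and apply its patch;
    # this sketch just fixes the undefined name reported in the log.
    match = re.search(r"name '(\w+)' is not defined", log)
    if match and match.group(1) == "Circl":
        return code.replace("Circl(", "Circle(")
    return code

def render_with_repair(code: str, max_attempts: int = 3) -> tuple[bool, str]:
    # Retry loop: render, and on failure let the critique step rewrite
    # the code before the next attempt.
    for _ in range(max_attempts):
        ok, log = render_scene(code)
        if ok:
            return True, code
        code = critique_and_repair(code, log)
    return False, code
```

The key design point is that the repair step sees the same evidence a human debugger would (the log, and in the real system the frames), rather than relying on hand-written validation rules.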
⚡ Challenges we ran into
- Mathematical Hallucinations: Early versions often generated invalid Manim syntax. We solved this by creating a "coordinate-aware" prompting framework and a reference knowledge base.
- Video Compilation Complexity: Orchestrating the final assembly of Manim renders, SVG overlays, and dynamic narrations into a seamless MP4 meant wrangling frame rates, encoding profiles, and FFmpeg concat logs, which proved a significant DevOps hurdle.
- Multimodal Context Feeding: Finding the right balance when feeding Gemini large amounts of research data alongside visual storyboards. When the context window grew too "noisy" with non-visual data, animation quality dropped, so we developed a "context pruning" strategy.
- Audio-Visual Sync: Synchronizing dynamic narrations with variable-length animations required a precise frame-mapping system based on phoneme-per-second calculations.
- Pipeline Scalability: Handling long-form video renders asynchronously required robust state management to keep the UI snappy while heavy processing happened in the background.
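As a rough illustration of the phoneme-per-second frame mapping described above, the sketch below estimates how many frames a narration line should occupy and derives a stretch factor for the matching animation. All constants (`AVG_PHONEMES_PER_WORD`, the default rates) are assumed placeholders, not the project's actual tuned values:

```python
import math

AVG_PHONEMES_PER_WORD = 3.5  # assumed English average, for illustration only

def narration_frames(text: str, fps: int = 30,
                     phonemes_per_second: float = 12.0) -> int:
    # Estimate narration duration from phoneme count, then convert to frames.
    words = len(text.split())
    duration_s = (words * AVG_PHONEMES_PER_WORD) / phonemes_per_second
    return max(1, math.ceil(duration_s * fps))

def stretch_factor(animation_frames: int, target_frames: int) -> float:
    # Rate multiplier that stretches or compresses an animation
    # so its length matches the narration.
    return target_frames / animation_frames
```

In practice the target frame count would drive the animation's run-time parameter so audio and video end together.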
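The FFmpeg assembly step might look something like the sketch below, which only builds the argument list for a concat-demuxer invocation (it does not run FFmpeg). The paths, frame rate, and codec choices are illustrative assumptions, not the project's exact encoding profile:

```python
def build_concat_command(segment_list_path: str, audio_path: str,
                         output_path: str, fps: int = 60) -> list[str]:
    # Concatenate pre-rendered Manim segments listed in a concat file,
    # mux in the narration track, and re-encode to a web-safe profile.
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", segment_list_path,
        "-i", audio_path,
        "-r", str(fps),
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",  # stop at the shorter of video/audio
        output_path,
    ]
```

A command built this way can be handed to `subprocess.run` by the pipeline's background worker.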
Accomplishments that we're proud of
- The Magic Refinement Box: Successfully building a multimodal interaction layer where Gemini can "reason" about video content and apply surgical edits based on natural language.
- Zero-Shot Reliability: Reaching a state where complex mathematical animations (like Dijkstra's algorithm or the Fourier transform) can be rendered correctly on the first try thanks to our self-critique loop.
- Premium Aesthetic: Building a UI that feels like a high-end cinematic tool rather than a standard dashboard.
What we learned
- Multimodal Reasoning is Essential: Gemini 3's ability to understand spatial relationships is what makes this project possible. Traditional LLMs fail at animation because they lack "spatial intuition."
- Asynchronous UX is Key: When dealing with 2-5 minute processing times, the user interface must be extremely communicative. We learned to treat the pipeline status as a primary UX feature.
- Agents Over Code: We learned that agentic error correction (AI fixing its own code) is far more flexible than writing thousands of lines of rigid validation logic.
🔮 What's next for Vibe Video
- Production-Grade Deployment & Scaling: Transitioning the pipeline from a local/dev environment to a robust cloud infrastructure. This involves overcoming significant deployment challenges and optimizing processing costs on Google Cloud Platform to scale Vibe Video into a full-fledged consumer product.
- Real-time Previews: Moving from full renders to an interactive WebGL-based scene preview for instant feedback.
- Multi-Agent Collaborative Directing: Allowing multiple AI experts (a "Director," "Cinematographer," and "Editor") to collaborate on a single project for even higher fidelity.
- Social Integration: Direct-to-platform exports for YouTube Shorts, Instagram Reels, and other short-form platforms, optimized with AI-generated captions and trending palettes.