Inspiration

3Blue1Brown is an amazing creator whose videos make difficult topics feel intuitive through clear explanations and beautiful visuals. As visual learners, we kept wishing the same experience existed for class notes, slides, and papers, where hard material becomes structured, visual, and easier to absorb.


What it does

ani.mate turns uploaded learning documents (.txt, .md, .pdf, .ppt, .pptx) into narrated explainer videos through multiple subagents with different roles:

  • Parses the source document into machine-readable text
  • Builds a high-level outline, then detailed segment plans with beat-level narration
  • Uses Manim, an open-source Python library, to generate scene code for each segment with source-grounded context
  • Synthesizes multilingual narration with ElevenLabs
  • Renders segment videos, syncs audio and visuals, and concatenates a final MP4
  • Streams live progress logs to the dashboard and returns segment artifacts plus metadata
  • Lets users choose language and LLM provider (TerpAI, Gemini, or Claude)
  • Saves outputs to a user library with Firebase cloud sync (with local fallback)

How we built it

Backend (Python):
A FastAPI-driven pipeline orchestrates parsing, planning, narration beat generation, Manim code generation, rendering, and ffmpeg post-processing. The planner runs in stages (document outline -> segment plan -> narration beats), and segments are processed in parallel workers. We use retrieval for both source grounding and Manim style/runtime guidance.
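
The staged planning and parallel segment processing described above can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the names Segment, build_outline, plan_segments, write_beats, and run_pipeline are hypothetical, and the stage bodies are simple stand-ins for what would be LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical container for one planned video segment.
    index: int
    title: str
    beats: list[str] = field(default_factory=list)


def build_outline(text: str) -> list[str]:
    # Stage 1 stand-in: use the first line of each paragraph as an outline entry.
    return [p.strip().splitlines()[0] for p in text.split("\n\n") if p.strip()]


def plan_segments(outline: list[str]) -> list[Segment]:
    # Stage 2 stand-in: one segment per outline entry.
    return [Segment(index=i, title=t) for i, t in enumerate(outline)]


def write_beats(segment: Segment) -> Segment:
    # Stage 3 stand-in: a real system would request narration beats from an LLM.
    segment.beats = [f"Introduce {segment.title}", f"Explain {segment.title}"]
    return segment


def run_pipeline(text: str, max_workers: int = 4) -> list[Segment]:
    segments = plan_segments(build_outline(text))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so segments
        # keep their narrative order even though they run in parallel.
        return list(pool.map(write_beats, segments))
```

Keeping each stage as a separate function means the intermediate outline and segment plan can be logged or inspected before any expensive work starts.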

Reliability layer:
Generated Manim code is verified with validation and repair loops: syntax checks, static linting, render retries with failure context, and merge safeguards that handle audio/video duration mismatches.
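
A validation and repair loop of this kind might look like the sketch below. It covers only the syntax-check step, using Python's standard ast module. The function names and the generate callback (which stands in for an LLM call that receives the previous failure as context) are assumptions made for illustration.

```python
import ast
from typing import Callable, Optional


def syntax_error(code: str) -> Optional[str]:
    """Return a short error description, or None if the code parses."""
    try:
        ast.parse(code)
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def generate_with_repair(
    generate: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> str:
    # `generate` is a hypothetical callback: it receives the previous
    # failure message (or None on the first attempt) and returns new code.
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(feedback)
        feedback = syntax_error(code)
        if feedback is None:
            return code
    raise RuntimeError(f"no valid code after {max_attempts} attempts: {feedback}")
```

The same loop shape extends to lint and render failures: each check returns a failure description that is fed into the next generation attempt.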

API:
Built with FastAPI + uvicorn:

  • Health checks
  • Upload/render endpoint with language/provider controls
  • Optional NDJSON streaming mode for real-time progress
  • File-serving endpoint for generated artifacts
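
For readers unfamiliar with NDJSON, the streaming mode amounts to emitting one JSON object per line. The sketch below shows the encoding and decoding with the standard library only. The event fields (stage, status, segment) are invented for illustration and are not our actual schema.

```python
import json
from typing import Iterable, Iterator


def ndjson_lines(events: Iterable[dict]) -> Iterator[str]:
    for event in events:
        # One compact JSON object per line; the newline is the record delimiter.
        yield json.dumps(event, separators=(",", ":")) + "\n"


def parse_ndjson(stream: Iterable[str]) -> list[dict]:
    # Client side: decode each non-empty line independently.
    return [json.loads(line) for line in stream if line.strip()]
```

Because every line is a complete JSON document, a client can render each progress event as soon as it arrives instead of waiting for the full response. In a FastAPI app, a generator like ndjson_lines could plausibly be wrapped in a streaming response.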

Frontend:
A Next.js app with an Auth0-protected dashboard for uploads, render controls, live pipeline logs, playback, and downloads. It also supports cloud library management via Firebase (and browser-local fallback when cloud is unavailable).


Challenges we ran into

LLM-generated Manim code can be brittle, especially around strict API rules and scene object handling. We addressed this by combining stronger prompts with automatic sanitization, lint checks, and retry-based repair flows.

Audio-visual alignment was another challenge. We moved to beat-structured narration timing: measured or allocated beat durations keep animation pacing aligned with the spoken narration, and ffmpeg normalizes the final segment merges.
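
One way to picture beat-level alignment is sketched below, assuming each beat has a measured narration duration and an estimated animation duration in seconds. The functions beat_waits and beat_start_times are hypothetical helpers for illustration, not the code we shipped.

```python
def beat_waits(audio: list[float], animation: list[float]) -> list[float]:
    """Hold time to add after each beat's animation so it lasts as long as its audio."""
    if len(audio) != len(animation):
        raise ValueError("each beat needs one audio and one animation duration")
    return [max(0.0, a - v) for a, v in zip(audio, animation)]


def beat_start_times(audio: list[float], animation: list[float]) -> list[float]:
    """Start time of each beat when every beat lasts max(audio, animation)."""
    starts, t = [], 0.0
    for a, v in zip(audio, animation):
        starts.append(t)
        t += max(a, v)
    return starts
```

With narration durations of 2.0, 1.5, and 3.0 seconds and animations of 1.0, 2.0, and 3.0 seconds, only the first beat needs a hold (1.0 second), and the beats start at 0.0, 2.0, and 4.0 seconds.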

Supporting real user workflows (auth, storage, download, cloud fallback) also required careful error handling so rendering stays usable even when external services are partially unavailable.


Accomplishments that we're proud of

  • Built an end-to-end document-to-video system with multimodal subagent orchestration
  • Implemented multi-stage planning that preserves narrative flow across segments
  • Added multilingual generation and selectable LLM backends
  • Improved Manim generation reliability with automated validation, sanitization, and retries
  • Shipped a real dashboard workflow with progress streaming, playback, download, and persistent libraries

What we learned

Inference cost is real, and orchestration quality matters as much as model quality. Strong intermediate structures (outline, segment plan, beats, metadata) made the biggest difference in output consistency and debuggability.

We also learned that robustness work (fallbacks, retries, validation, service failure handling) is what turns a cool demo into a usable product.


What's next for ani.mate

  • Customization: user controls for pacing, depth, style, and audience level
  • Interactivity: embedded quizzes, checkpoints, and recap prompts after segments
  • Editing workflows: let users review and tweak plans/beats before full rendering
  • Scalability: job queueing and better multi-user throughput
  • Accessibility: richer language support, captions/transcripts, and learning-style adaptations
