ani.mate

Inspiration

3Blue1Brown is an amazing creator whose videos make difficult topics feel intuitive through clear explanations and beautiful visuals. As visual learners, we kept wishing the same experience existed for class notes, slides, and papers, where hard material becomes structured, visual, and easier to absorb.

What it does

ani.mate turns uploaded learning documents (.txt, .md, .pdf, .ppt, .pptx) into narrated explainer videos through multiple subagents with different roles:

Parses the source document into machine-readable text
Builds a high-level outline, then detailed segment plans with beat-level narration
Uses Manim, an open-source Python animation library, to generate scene code for each segment with source-grounded context
Synthesizes multilingual narration with ElevenLabs
Renders segment videos, syncs audio and visuals, and concatenates a final MP4
Streams live progress logs to the dashboard and returns segment artifacts plus metadata
Lets users choose language and LLM provider (TerpAI, Gemini, or Claude)
Saves outputs to a user library with Firebase cloud sync (with local fallback)
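
As an illustration of the final concatenation step listed above, here is a minimal sketch that prepares input for ffmpeg's concat demuxer. The helper names (concat_list_text, concat_command) and the file names are hypothetical, and stream copy is an assumption: it only works when all segments share the same codec parameters, otherwise a re-encode would be needed.

```python
def concat_list_text(segment_paths: list[str]) -> str:
    """Build the text of a concat-demuxer list file, one `file '...'` line per segment."""
    lines = []
    for path in segment_paths:
        # The concat list format escapes a single quote as '\'' inside a quoted path.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> list[str]:
    """Return the ffmpeg argv that joins the listed segments without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # read the list file with the concat demuxer
        "-i", list_file,
        "-c", "copy",                  # stream copy: assumes matching codecs
        output,
    ]
```

The returned list can be passed to subprocess.run once the list text has been written to disk.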

How we built it

Backend (Python):
A FastAPI-driven pipeline orchestrates parsing, planning, narration beat generation, Manim code generation, rendering, and ffmpeg post-processing. The planner runs in stages (document outline -> segment plan -> narration beats), and segments are processed in parallel workers. We use retrieval for both source grounding and Manim style/runtime guidance.
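
The staged planner and parallel segment workers described above can be sketched roughly as follows. This is a simplified illustration, not our actual code: the function names are hypothetical, the LLM calls are replaced with trivial stand-ins, and a thread pool is assumed as the worker mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Segment:
    title: str
    beats: list[str] = field(default_factory=list)


def build_outline(document_text: str) -> list[str]:
    # Stand-in for the LLM outline stage: first line of each non-empty paragraph.
    return [p.strip().splitlines()[0] for p in document_text.split("\n\n") if p.strip()]


def plan_segment(title: str) -> Segment:
    # Stand-in for the LLM segment-planning stage.
    return Segment(title=title)


def write_beats(segment: Segment) -> Segment:
    # Stand-in for beat-level narration generation.
    segment.beats = [f"Introduce {segment.title}", f"Explain {segment.title}"]
    return segment


def process_segment(title: str) -> Segment:
    # Stages 2 and 3 run per segment, so segments are independent of each other.
    return write_beats(plan_segment(title))


def run_pipeline(document_text: str, max_workers: int = 4) -> list[Segment]:
    outline = build_outline(document_text)  # stage 1 runs once, up front
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in outline order even if workers finish out of order.
        return list(pool.map(process_segment, outline))
```

Keeping results in outline order matters because the segments are later concatenated into one video.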

Reliability layer:
Generated Manim code goes through validation and repair loops: syntax checks, static linting, render retries with failure context, and merge safeguards that handle audio/video duration mismatches.
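
A minimal sketch of such a validate-and-repair loop, using only the standard library. The generate callback is a hypothetical stand-in for the LLM call, and the two checks shown (parse with ast, look for a class with a construct() method) are illustrative assumptions rather than our full lint rules; the render retry step is omitted.

```python
import ast
from typing import Callable, Optional


def check_syntax(code: str) -> Optional[str]:
    """Return an error message if the code does not parse, else None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def has_scene_class(code: str) -> bool:
    """Cheap static check: some class in the code defines a construct() method."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.ClassDef) and any(
            isinstance(item, ast.FunctionDef) and item.name == "construct"
            for item in node.body
        ):
            return True
    return False


def generate_with_repair(
    generate: Callable[[Optional[str]], str], max_attempts: int = 3
) -> str:
    """Call generate(feedback) until the code passes the checks or attempts run out."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(feedback)
        error = check_syntax(code)
        if error is None and not has_scene_class(code):
            error = "no class with a construct() method found"
        if error is None:
            return code
        feedback = error  # failure context is fed into the next attempt
    raise RuntimeError(f"giving up after {max_attempts} attempts: {feedback}")
```

Passing the previous error back into the generator is what lets a retry fix the specific problem rather than simply trying again.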

API:
Built with FastAPI + uvicorn:

Health checks
Upload/render endpoint with language/provider controls
Optional NDJSON streaming mode for real-time progress
File-serving endpoint for generated artifacts
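
To show what the NDJSON streaming mode looks like on the wire, here is a small standard-library sketch. The event fields (type, stage, step, total) are assumptions for illustration, not our exact schema. In FastAPI, a generator like this could be wrapped in a StreamingResponse with a media type such as application/x-ndjson.

```python
import json
from typing import Iterable, Iterator


def ndjson_events(stages: Iterable[str]) -> Iterator[str]:
    """Yield one JSON object per line so a client can parse progress as it arrives."""
    stage_list = list(stages)
    for index, stage in enumerate(stage_list, start=1):
        event = {"type": "progress", "stage": stage, "step": index, "total": len(stage_list)}
        yield json.dumps(event) + "\n"
    # A final marker tells the client the stream ended normally.
    yield json.dumps({"type": "done"}) + "\n"


def parse_ndjson(stream: Iterable[str]) -> list[dict]:
    """Client-side counterpart: decode each non-empty line independently."""
    return [json.loads(line) for line in stream if line.strip()]
```

Because every line is a complete JSON document, the dashboard can render each log entry as soon as its line arrives instead of waiting for the whole response.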

Frontend:
A Next.js app with an Auth0-protected dashboard for uploads, render controls, live pipeline logs, playback, and downloads. It also supports cloud library management via Firebase (and browser-local fallback when cloud is unavailable).

Challenges we ran into

LLM-generated Manim code can be brittle, especially around strict API rules and scene object handling. We addressed this by combining stronger prompts with automatic sanitization, lint checks, and retry-based repair flows.

Audio-visual alignment was another challenge. We moved to beat-structured narration timing, using measured and allocated beat durations to keep animation pacing aligned with spoken narration, and we normalize final segment merges with ffmpeg.
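
One simple way to reconcile allocated beat durations with measured audio is proportional scaling, sketched below. This is an assumed approach for illustration; the function name is hypothetical and our pipeline's exact reconciliation logic may differ.

```python
def fit_beats_to_audio(allocated: list[float], measured_total: float) -> list[float]:
    """Scale planned beat durations so they sum to the measured narration length."""
    planned_total = sum(allocated)
    if planned_total <= 0 or measured_total <= 0:
        raise ValueError("durations must be positive")
    scale = measured_total / planned_total
    # Each beat keeps its share of the segment; only the overall length changes.
    return [duration * scale for duration in allocated]
```

For example, beats planned at 2, 3, and 5 seconds against 12 seconds of recorded narration would each be stretched by a factor of 1.2, so the animation ends when the voice does.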

Supporting real user workflows (auth, storage, download, cloud fallback) also required careful error handling so rendering stays usable even when external services are partially unavailable.

Accomplishments that we're proud of

Built an end-to-end document-to-video system with multimodal subagent orchestration
Implemented multi-stage planning that preserves narrative flow across segments
Added multilingual generation and selectable LLM backends
Improved Manim generation reliability with automated validation, sanitization, and retries
Shipped a real dashboard workflow with progress streaming, playback, download, and persistent libraries

What we learned

Inference cost is real, and orchestration quality matters as much as model quality. Strong intermediate structures (outline, segment plan, beats, metadata) made the biggest difference in output consistency and debuggability.

We also learned that robustness work (fallbacks, retries, validation, service failure handling) is what turns a cool demo into a usable product.

What's next for ani.mate

Customization: user controls for pacing, depth, style, and audience level
Interactivity: embedded quizzes, checkpoints, and recap prompts after segments
Editing workflows: let users review and tweak plans/beats before full rendering
Scalability: job queueing and better multi-user throughput
Accessibility: richer language support, captions/transcripts, and learning-style adaptations