Inspiration
We've all been there: staring at a 50-minute lecture recording at 2× speed, eyes glazing over a static slideshow about derivatives. Traditional educational content, whether it's lecture recordings, textbook PDFs, or even polished YouTube videos, struggles to compete with the short-form, personality-driven content that dominates how our generation actually consumes information. We asked ourselves: what if learning calculus felt more like watching your favorite creator explain it in their own style? What if LeBron James broke down derivatives using basketball court analogies, or Goku powered through linear algebra like it was a training arc?
What it does
Jestify turns any math or science topic into an animated, character-narrated educational video. Users pick a topic, choose a character (LeBron James, Goku, Peter Griffin, or Alysa), select a difficulty level, and Jestify generates a fully produced video complete with:
- Character-voiced narration that teaches using analogies from the character's world (basketball, fighting, etc.)
- Synchronized Manim animations where graphs, equations, and visual explanations appear at exactly the moment they're being discussed
- Professional-quality voiceover using cloned character voices via Fish Audio
The entire video is generated end-to-end with no human intervention, from script to final render.
How we built it
Jestify is a full-stack application with three major layers:
Frontend -- A Next.js + React app styled with Tailwind CSS, featuring a glass-themed UI with character selection, topic input, difficulty picker, and real-time job status polling.
Backend -- A FastAPI server that handles generation requests and dispatches them to an async pipeline via Celery + Redis. The core of the backend is a two-API-call script generation architecture:
- Call 1 (Narration): Claude generates a character-voiced narration script with visual descriptions for each scene, focusing entirely on teaching quality and personality
- Call 2 (Code): Claude receives the narration and generates synchronized ManimCE animation code, using the narration as a timing guide so visuals match speech
This separation was a key architectural decision. Having the LLM do both narration writing and code generation in a single call degraded quality in both. Splitting them lets each call focus on one hard problem.
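The two-call flow can be sketched roughly as follows. This is an illustrative outline, not our exact prompts: `call_llm` stands in for the real Claude client, and it's assumed to return parsed output (a list of scene dicts for Call 1, a code string for Call 2).

```python
# Sketch of the two-call script generation. NARRATION_SYSTEM, CODE_SYSTEM,
# and call_llm are illustrative placeholders for the real prompts/client.

NARRATION_SYSTEM = (
    "You are {character}. Teach {topic} at {level} level using analogies "
    "from your world. For each scene, return the narration plus a "
    "visual_description of what should appear on screen."
)

CODE_SYSTEM = (
    "You write ManimCE (from manim import *) code. Given a scene's "
    "narration and visual_description, emit a Scene class whose "
    "animations follow the narration's timing."
)

def generate_script(call_llm, character, topic, level):
    # Call 1: narration only -- the model focuses on teaching quality
    # and personality, not code.
    scenes = call_llm(
        system=NARRATION_SYSTEM.format(
            character=character, topic=topic, level=level
        ),
        user=f"Write a {level} lesson on {topic}.",
    )
    # Call 2: code only -- the narration acts as the timing guide.
    for scene in scenes:
        scene["manim_code"] = call_llm(
            system=CODE_SYSTEM,
            user=(
                f"Narration: {scene['narration']}\n"
                f"Visuals: {scene['visual_description']}"
            ),
        )
    return scenes
```

Because each call has a single job, the system prompts stay short and focused, which is what made the quality difference in practice.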
Pipeline -- A Python pipeline that executes in parallel:
- Voice synthesis via Fish Audio API with character-specific voice clones
- Manim rendering using ManimCE (Community Edition) to produce animation clips for each scene
- Compositing via FFmpeg to overlay character sprites, narration audio, and animations
- Assembly to stitch all scenes into a final video
The pipeline uses ThreadPoolExecutor for parallel scene rendering and includes an LLM-powered code repair system: if a Manim scene fails to render, Claude analyzes the error and rewrites the code automatically.
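The render loop with repair looks roughly like this; `render_scene` and `repair_with_llm` are placeholders for the real Manim invocation and the Claude repair call, and the retry cap is illustrative.

```python
# Sketch of parallel scene rendering with LLM-based code repair.
from concurrent.futures import ThreadPoolExecutor

MAX_REPAIRS = 2  # illustrative cap on repair attempts per scene

def render_with_repair(scene, render_scene, repair_with_llm):
    code = scene["manim_code"]
    for attempt in range(MAX_REPAIRS + 1):
        try:
            return render_scene(code)
        except Exception as err:
            if attempt == MAX_REPAIRS:
                raise
            # Hand the traceback to the LLM and retry with the fixed code.
            code = repair_with_llm(code, str(err))

def render_all(scenes, render_scene, repair_with_llm, workers=4):
    # Each scene renders on its own thread; results come back in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(render_with_repair, s, render_scene, repair_with_llm)
            for s in scenes
        ]
        return [f.result() for f in futures]
```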
Challenges we ran into
ManimCE compatibility was our biggest technical hurdle. The LLM would frequently generate ManimGL syntax (from manimlib import *, ShowCreation, get_graph) instead of ManimCE (from manim import *, Create, plot). We built validation layers that catch these patterns and retry generation, plus runtime monkey-patches that intercept incompatible calls.
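The validation layer boils down to pattern matching over the generated source before it ever reaches the renderer. A minimal sketch (the real pattern set is larger than the three shown):

```python
# Sketch of the pre-render check that flags ManimGL idioms in
# generated code. Pattern list is illustrative, not exhaustive.
import re

MANIMGL_PATTERNS = {
    r"from\s+manimlib\s+import": "use `from manim import *` (ManimCE)",
    r"\bShowCreation\b": "use `Create` (ManimCE)",
    r"\.get_graph\(": "use `axes.plot(...)` (ManimCE)",
}

def find_manimgl_usage(code: str) -> list[str]:
    """Return hints for each ManimGL incompatibility found in the code."""
    return [hint for pat, hint in MANIMGL_PATTERNS.items()
            if re.search(pat, code)]
```

Any hit triggers a regeneration with the hints fed back into the prompt, so the model is told exactly which API it got wrong.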
LaTeX rendering was another persistent issue. ManimCE's MathTex and Tex classes require a full LaTeX installation (standalone.cls, pdflatex, etc.) which is fragile in containerized environments. We solved this by implementing a __new__-level monkey-patch that silently converts all MathTex/Tex calls into plain Text objects, completely eliminating the LaTeX dependency without changing any generated code.
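The shape of that patch can be demonstrated with stand-in classes; in the real pipeline the same redirect is applied at startup to manim's actual MathTex and Tex classes, returning manim.Text objects.

```python
# Sketch of the __new__-level redirect, shown with stand-in classes
# so the pattern is self-contained (the real targets are manim.MathTex
# and manim.Tex, redirected to manim.Text).

class Text:
    def __init__(self, content: str):
        self.content = content

class MathTex:  # stand-in for manim.MathTex
    def __init__(self, *args, **kwargs):
        raise RuntimeError("pdflatex not found")  # what an unpatched call hits

def patch_to_text(cls, text_cls):
    def __new__(_cls, *args, **kwargs):
        # Build a plain Text from the string args; LaTeX never runs.
        # Since the returned object is not an instance of cls, Python
        # skips cls.__init__ entirely.
        return text_cls(" ".join(a for a in args if isinstance(a, str)))
    cls.__new__ = __new__

patch_to_text(MathTex, Text)
```

Because the interception happens in `__new__`, the generated code keeps calling `MathTex(...)` unmodified and simply receives a `Text` object back.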
Narration-to-visual synchronization was the core creative challenge. Early versions had animations that bore no relation to what was being said. The two-call architecture with a visual_description bridge field solved this: Call 1 describes what should appear on screen, and Call 2 uses that description plus the exact narration text to time animations correctly.
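A scene object passed between the two calls looks something like this (field names other than visual_description are illustrative):

```json
{
  "scene_id": 2,
  "narration": "Think of the derivative as my speed driving to the lane...",
  "visual_description": "Axes with a position curve; a tangent line sweeps along the curve as the narration mentions speed."
}
```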
Token limits and generation reliability required careful prompt engineering. JSON output from the LLM would sometimes be malformed, truncated, or wrapped in markdown code fences. We built robust parsing with multiple fallback strategies (direct JSON, regex extraction, brace matching) and validation with automatic retry loops.
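The fallback chain is straightforward to sketch; the strategy order mirrors the description above, though the real parser carries more validation around it.

```python
# Sketch of the fallback parsing chain for LLM JSON output.
import json
import re

def parse_llm_json(raw: str):
    # 1. Direct parse: the happy path.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Strip markdown code fences and retry.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Brace matching: take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        return json.loads(raw[start:end + 1])
    raise ValueError("no JSON object found in LLM output")
```

If every strategy fails, the pipeline discards the response and retries the generation call rather than trying to salvage the text further.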
Accomplishments that we're proud of
- End-to-end generation from a text topic to a fully rendered, voiced, animated video with zero human intervention
- The two-call architecture that produces noticeably better narration quality and visual synchronization than the single-call approach
- LLM-powered self-healing: when generated Manim code fails to render, Claude reads the traceback and fixes the code automatically
- The MathTex monkey-patch that elegantly eliminates an entire class of LaTeX-related failures at the Python class level
- Parallel pipeline execution that renders all scenes and synthesizes all voice clips simultaneously, reducing total generation time by ~40%
What we learned
- Splitting complex LLM tasks into focused, sequential calls produces dramatically better output than asking the model to do everything at once
- Prompt engineering for code generation is fundamentally different from prompt engineering for creative writing, and they benefit from different system prompts, temperature settings, and validation strategies
- Building reliable systems on top of LLM output requires multiple layers of defense: validation, retry loops, runtime patches, and graceful fallbacks
- ManimCE is powerful but has a steep integration curve, especially when the code is AI-generated and needs to run unsupervised
What's next for Jestify
- More characters and subjects beyond math, expanding into physics, chemistry, and computer science
- User accounts and video history so students can build a personal library of generated lessons
- Interactive mode where students can ask follow-up questions and get new scenes generated on the fly
- Custom character creation letting users define their own character personas, voices, and analogy domains
- Mobile app for on-the-go learning with push notifications when videos finish rendering
Built With
- celery
- claudeapi
- context7api
- docker
- fastapi
- ffmpeg
- fishaudio
- httpx
- javascript
- manimce
- next.js
- pydantic
- python
- react
- redis
- tailwind
- typescript