Inspiration
We've all been there: staring at a 50-minute lecture recording at 2× speed, eyes glazing over a static slideshow about derivatives. Traditional educational content, whether it's lecture recordings, textbook PDFs, or even polished YouTube videos, struggles to compete with the short-form, personality-driven content that dominates how our generation actually consumes information. We asked ourselves: what if learning calculus felt more like watching your favorite creator explain it in their own style? What if LeBron James broke down derivatives using basketball court analogies, or Goku powered through linear algebra like it was a training arc?
What it does
Jestify turns any math or science topic into an animated, character-narrated educational video. Users pick a topic, choose a character (LeBron James, Goku, Peter Griffin, or Alysa), select a difficulty level, and Jestify generates a fully produced video complete with:
- Character-voiced narration that teaches using analogies from the character's world (basketball, fighting, etc.)
- Synchronized Manim animations where graphs, equations, and visual explanations appear at exactly the moment they're being discussed
- Professional-quality voiceover using cloned character voices via Fish Audio
The entire video is generated end-to-end with no human intervention, from script to final render.
How we built it
Jestify is a full-stack application with three major layers:
Frontend -- A Next.js + React app styled with Tailwind CSS, featuring a glass-themed UI with character selection, topic input, difficulty picker, and real-time job status polling.
Backend -- A FastAPI server that handles generation requests and dispatches them to an async pipeline via Celery + Redis. The core of the backend is a two-API-call script generation architecture:
- Call 1 (Narration): Claude generates a character-voiced narration script with visual descriptions for each scene, focusing entirely on teaching quality and personality
- Call 2 (Code): Claude receives the narration and generates synchronized ManimCE animation code, using the narration as a timing guide so visuals match speech
This separation was a key architectural decision. Having the LLM do both narration writing and code generation in a single call degraded quality in both. Splitting them lets each call focus on one hard problem.
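The two-call flow can be sketched roughly as follows. This is an illustrative outline, not our exact prompts: `call_llm` stands in for the real Claude client, and it's assumed to return parsed output (a list of scene dicts for Call 1, a code string for Call 2).

```python
# Sketch of the two-call script generation. NARRATION_SYSTEM, CODE_SYSTEM,
# and call_llm are illustrative placeholders for the real prompts/client.

NARRATION_SYSTEM = (
    "You are {character}. Teach {topic} at {level} level using analogies "
    "from your world. For each scene, return the narration plus a "
    "visual_description of what should appear on screen."
)

CODE_SYSTEM = (
    "You write ManimCE (from manim import *) code. Given a scene's "
    "narration and visual_description, emit a Scene class whose "
    "animations follow the narration's timing."
)

def generate_script(call_llm, character, topic, level):
    # Call 1: narration only -- the model focuses on teaching quality
    # and personality, not code.
    scenes = call_llm(
        system=NARRATION_SYSTEM.format(
            character=character, topic=topic, level=level
        ),
        user=f"Write a {level} lesson on {topic}.",
    )
    # Call 2: code only -- the narration acts as the timing guide.
    for scene in scenes:
        scene["manim_code"] = call_llm(
            system=CODE_SYSTEM,
            user=(
                f"Narration: {scene['narration']}\n"
                f"Visuals: {scene['visual_description']}"
            ),
        )
    return scenes
```

Because each call has a single job, the system prompts stay short and focused, which is what made the quality difference in practice.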
Pipeline -- A Python pipeline that executes in parallel:
- Voice synthesis via Fish Audio API with character-specific voice clones
- Manim rendering using ManimCE (Community Edition) to produce animation clips for each scene
- Compositing via FFmpeg to overlay character sprites, narration audio, and animations
- Assembly to stitch all scenes into a final video
The pipeline uses ThreadPoolExecutor for parallel scene rendering and includes an LLM-powered code repair system: if a Manim scene fails to render, Claude analyzes the error and rewrites the code automatically.
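The render loop with repair looks roughly like this; `render_scene` and `repair_with_llm` are placeholders for the real Manim invocation and the Claude repair call, and the retry cap is illustrative.

```python
# Sketch of parallel scene rendering with LLM-based code repair.
from concurrent.futures import ThreadPoolExecutor

MAX_REPAIRS = 2  # illustrative cap on repair attempts per scene

def render_with_repair(scene, render_scene, repair_with_llm):
    code = scene["manim_code"]
    for attempt in range(MAX_REPAIRS + 1):
        try:
            return render_scene(code)
        except Exception as err:
            if attempt == MAX_REPAIRS:
                raise
            # Hand the traceback to the LLM and retry with the fixed code.
            code = repair_with_llm(code, str(err))

def render_all(scenes, render_scene, repair_with_llm, workers=4):
    # Each scene renders on its own thread; results come back in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(render_with_repair, s, render_scene, repair_with_llm)
            for s in scenes
        ]
        return [f.result() for f in futures]
```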
Challenges we ran into
ManimCE compatibility was our biggest technical hurdle. The LLM would frequently generate ManimGL syntax (from manimlib import *, ShowCreation, get_graph) instead of ManimCE (from manim import *, Create, plot). We built validation layers that catch these patterns and retry generation, plus runtime monkey-patches that intercept incompatible calls.
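The validation layer boils down to pattern matching over the generated source before it ever reaches the renderer. A minimal sketch (the real pattern set is larger than the three shown):

```python
# Sketch of the pre-render check that flags ManimGL idioms in
# generated code. Pattern list is illustrative, not exhaustive.
import re

MANIMGL_PATTERNS = {
    r"from\s+manimlib\s+import": "use `from manim import *` (ManimCE)",
    r"\bShowCreation\b": "use `Create` (ManimCE)",
    r"\.get_graph\(": "use `axes.plot(...)` (ManimCE)",
}

def find_manimgl_usage(code: str) -> list[str]:
    """Return hints for each ManimGL incompatibility found in the code."""
    return [hint for pat, hint in MANIMGL_PATTERNS.items()
            if re.search(pat, code)]
```

Any hit triggers a regeneration with the hints fed back into the prompt, so the model is told exactly which API it got wrong.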
LaTeX rendering was another persistent issue. ManimCE's MathTex and Tex classes require a full LaTeX installation (standalone.cls, pdflatex, etc.) which is fragile in containerized environments. We solved this by implementing a __new__-level monkey-patch that silently converts all MathTex/Tex calls into plain Text objects, completely eliminating the LaTeX dependency without changing any generated code.
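The shape of that patch can be demonstrated with stand-in classes; in the real pipeline the same redirect is applied at startup to manim's actual MathTex and Tex classes, returning manim.Text objects.

```python
# Sketch of the __new__-level redirect, shown with stand-in classes
# so the pattern is self-contained (the real targets are manim.MathTex
# and manim.Tex, redirected to manim.Text).

class Text:
    def __init__(self, content: str):
        self.content = content

class MathTex:  # stand-in for manim.MathTex
    def __init__(self, *args, **kwargs):
        raise RuntimeError("pdflatex not found")  # what an unpatched call hits

def patch_to_text(cls, text_cls):
    def __new__(_cls, *args, **kwargs):
        # Build a plain Text from the string args; LaTeX never runs.
        # Since the returned object is not an instance of cls, Python
        # skips cls.__init__ entirely.
        return text_cls(" ".join(a for a in args if isinstance(a, str)))
    cls.__new__ = __new__

patch_to_text(MathTex, Text)
```

Because the interception happens in `__new__`, the generated code keeps calling `MathTex(...)` unmodified and simply receives a `Text` object back.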
Narration-to-visual synchronization was the core creative challenge. Early versions had animations that bore no relation to what was being said. The two-call architecture with a visual_description bridge field solved this: Call 1 describes what should appear on screen, and Call 2 uses that description plus the exact narration text to time animations correctly.
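A scene object passed between the two calls looks something like this (field names other than visual_description are illustrative):

```json
{
  "scene_id": 2,
  "narration": "Think of the derivative as my speed driving to the lane...",
  "visual_description": "Axes with a position curve; a tangent line sweeps along the curve as the narration mentions speed."
}
```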
Token limits and generation reliability required careful prompt engineering. JSON output from the LLM would sometimes be malformed, truncated, or wrapped in markdown code fences. We built robust parsing with multiple fallback strategies (direct JSON, regex extraction, brace matching) and validation with automatic retry loops.
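The fallback chain is straightforward to sketch; the strategy order mirrors the description above, though the real parser carries more validation around it.

```python
# Sketch of the fallback parsing chain for LLM JSON output.
import json
import re

def parse_llm_json(raw: str):
    # 1. Direct parse: the happy path.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Strip markdown code fences and retry.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Brace matching: take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        return json.loads(raw[start:end + 1])
    raise ValueError("no JSON object found in LLM output")
```

If every strategy fails, the pipeline discards the response and retries the generation call rather than trying to salvage the text further.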
Accomplishments that we're proud of
- End-to-end generation from a text topic to a fully rendered, voiced, animated video with zero human intervention
- The two-call architecture that produces noticeably better narration quality and visual synchronization than the single-call approach
- LLM-powered self-healing: when generated Manim code fails to render, Claude reads the traceback and fixes the code automatically
- The MathTex monkey-patch that elegantly eliminates an entire class of LaTeX-related failures at the Python class level
- Parallel pipeline execution that renders all scenes and synthesizes all voice clips simultaneously, reducing total generation time by ~40%
What we learned
- Splitting complex LLM tasks into focused, sequential calls produces dramatically better output than asking the model to do everything at once
- Prompt engineering for code generation is fundamentally different from prompt engineering for creative writing, and they benefit from different system prompts, temperature settings, and validation strategies
- Building reliable systems on top of LLM output requires multiple layers of defense: validation, retry loops, runtime patches, and graceful fallbacks
- ManimCE is powerful but has a steep integration curve, especially when the code is AI-generated and needs to run unsupervised
What's next for Jestify
- More characters and subjects beyond math, expanding into physics, chemistry, and computer science
- User accounts and video history so students can build a personal library of generated lessons
- Interactive mode where students can ask follow-up questions and get new scenes generated on the fly
- Custom character creation letting users define their own character personas, voices, and analogy domains
- Mobile app for on-the-go learning with push notifications when videos finish rendering
Built With
- celery
- claudeapi
- context7api
- docker
- fastapi
- ffmpeg
- fishaudio
- httpx
- javascript
- manimce
- next.js
- pydantic
- python
- react
- redis
- tailwind
- typescript