ArXiv Animator — by Simple Story

Inspiration

Academic research moves fast — but reading papers doesn't. The average arXiv paper takes 30–60 minutes to properly read, and for researchers outside a specific subfield, dense math and jargon make comprehension even harder. We asked ourselves: what if you could see a paper in under a minute?

We were inspired by the work of 3Blue1Brown (https://www.3blue1brown.com/): the idea that motion and visual metaphor can communicate complex ideas more intuitively than prose. We wanted to automate that: take any arXiv link and turn it into a visual story, instantly.


What it does

Simple Story takes an arXiv paper URL and automatically generates two visual outputs:

  1. An animated video — a Manim-rendered animation that visualizes the paper's core concept (architecture diagrams, data flows, mathematical relationships) as a short explainer clip.
  2. An interactive 3D scene — a Three.js visualization of the same concept as an explorable node-graph, rendered directly in the browser.

On top of that, there's a comic strip generator that turns a paper's key ideas into illustrated panels with AI-generated art — a format that makes research feel approachable and shareable.

The whole pipeline is driven by a single input: paste an arXiv link, hit enter, and watch.


How we built it

Backend (FastAPI + Python)

  • Fetches paper metadata and abstract from the arXiv API
  • Sends the title and abstract to an LLM (Amazon Nova 2 via OpenRouter) with a structured prompt
  • The LLM returns a Manim Python script and a Three.js JSON scene config in one shot
  • The Manim script is executed in a subprocess (python -m manim) inside a Docker container based on manimcommunity/manim:stable, which bundles all required system dependencies (cairo, pango, ffmpeg)
  • If the render fails, the error logs are fed back to the LLM for review and revision, with up to 3 automated fix cycles (sketched after this list)
  • Results are streamed back to the frontend via server-sent events (SSE) so users see live progress
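
To make the loop concrete, here is a minimal sketch of the render-and-revise step. The revise_with_llm callback and the PaperScene scene name are illustrative assumptions, not our exact code:

```python
import subprocess
import tempfile
from pathlib import Path

MAX_FIX_CYCLES = 3  # the review cycle count is environment-configurable

def render_with_review(script: str, revise_with_llm):
    """Render an LLM-generated Manim script; on failure, feed stderr back
    to the LLM for revision. revise_with_llm(script, error_log) -> script
    is a hypothetical callback standing in for our LLM review call."""
    for _ in range(1 + MAX_FIX_CYCLES):
        workdir = Path(tempfile.mkdtemp())
        (workdir / "scene.py").write_text(script)
        proc = subprocess.run(
            ["python", "-m", "manim", "-qm", "scene.py", "PaperScene"],
            cwd=workdir, capture_output=True, text=True, timeout=300,
        )
        if proc.returncode == 0:
            videos = sorted(workdir.glob("media/videos/**/*.mp4"))
            if videos:
                return videos[0]  # Manim writes renders under media/videos/
        # Feed the failure back to the LLM and try again (the "review" step).
        script = revise_with_llm(script, proc.stderr)
    return None
```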

Frontend (Next.js + Three.js)

  • Input form with animated loading states (Framer Motion)
  • Renders the MP4 video served from the backend's static directory
  • Renders the Three.js scene config as an interactive 3D graph in the browser (the config's shape is sketched after this list)
  • Comic generation view with panel-by-panel display
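
For reference, the scene config handed to the frontend is plain JSON describing a node-graph. A hypothetical minimal example, with illustrative field names (the real schema is whatever our prompt specifies), shown here as the Python dict the backend would serialize:

```python
# Hypothetical scene config shape (field names are illustrative):
scene_config = {
    "title": "Encoder-decoder with attention",
    "nodes": [
        {"id": "input", "label": "Input tokens", "position": [-3, 0, 0]},
        {"id": "encoder", "label": "Encoder", "position": [-1, 0, 0]},
        {"id": "decoder", "label": "Decoder", "position": [1, 0, 0]},
        {"id": "output", "label": "Predictions", "position": [3, 0, 0]},
    ],
    "edges": [
        {"source": "input", "target": "encoder"},
        {"source": "encoder", "target": "decoder"},
        {"source": "decoder", "target": "output"},
    ],
}
```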

Infrastructure

  • Dockerized backend using manimcommunity/manim:stable as the base image — avoids any local system dependency setup
  • Environment-driven config for model selection, timeout, and review cycle count
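
The config side is the standard read-from-the-environment pattern; a minimal sketch, with illustrative variable names rather than our exact ones:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Runtime settings read from the environment.
    Variable names here are illustrative, not our exact ones."""
    llm_model: str = os.environ.get("LLM_MODEL", "")  # an OpenRouter model id
    render_timeout_s: int = int(os.environ.get("RENDER_TIMEOUT_S", "300"))
    max_fix_cycles: int = int(os.environ.get("MAX_FIX_CYCLES", "3"))

settings = Settings()
```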

Challenges we ran into

  • LLM-generated code quality — The model frequently produced Manim scripts with subtle bugs: deprecated API calls (ShowCreation instead of Create), VGroups containing non-VMobjects, MathTex calls that fail without a LaTeX install, and elements placed off-screen. We solved this with a detailed system prompt of explicit rules, plus an automated review-and-revise loop in which the LLM diagnoses its own execution errors.
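
To give a flavor of those rules, here is a paraphrased excerpt (wording illustrative, not the exact prompt text):

```python
# Paraphrased excerpt of the system prompt's constraint rules:
MANIM_RULES = """
CRITICAL: NEVER use ShowCreation. It was removed in Manim CE; use Create.
CRITICAL: NEVER put non-VMobjects (e.g. ImageMobject) in a VGroup; use Group.
CRITICAL: NEVER use MathTex or Tex; assume no LaTeX install. Use Text instead.
CRITICAL: Keep every element inside the visible frame (about 14 x 8 units).
"""
```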

  • Structured output reliability — Getting the LLM to return a valid JSON blob containing both a runnable Python script and a Three.js config (with escaped newlines, no comments) required careful prompt engineering and a fallback parser.
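
The fallback parser boils down to progressively more forgiving extraction; a simplified sketch (the real parser handles more edge cases):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Parse the LLM's response, tolerating markdown code fences and
    leading/trailing chatter around the JSON object."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fallback 1: strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", raw, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Fallback 2: take the outermost brace-delimited span.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        return json.loads(raw[start : end + 1])
    raise ValueError("LLM response contained no parsable JSON object")
```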

  • Streaming progress to the frontend — Manim renders can take 30–60 seconds. We implemented SSE streaming so users see live step updates instead of staring at a blank screen.
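
A minimal sketch of the SSE endpoint pattern in FastAPI (step names are illustrative placeholders for the real pipeline stages):

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/generate")
async def generate(arxiv_url: str):
    """Stream pipeline progress to the browser as server-sent events."""
    async def event_stream():
        for step in ("fetching_paper", "calling_llm", "rendering", "done"):
            await asyncio.sleep(0)  # real code awaits the actual stage here
            yield f"data: {json.dumps({'step': step})}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```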


Accomplishments that we're proud of

  • End-to-end automation — paste one URL, get a rendered animation. No manual steps, no configuration per paper.
  • Self-healing code generation — the LLM review loop means the system often recovers from its own mistakes without any human intervention.
  • Dual output formats — the same LLM call produces both a 2D animation script and a 3D scene config, giving two complementary views of the same concept.
  • Reproducible Docker setup — anyone can clone the repo and run docker compose up --build with no system dependency hunting.

What we learned

  • Prompt engineering for code generation is as much about constraints as it is about instructions — the most impactful prompt improvements were "CRITICAL: NEVER use X" rules added in response to observed failure modes.
  • LLMs are surprisingly good at debugging their own code when given the full stderr output — the review loop meaningfully improves success rates.
  • System dependency management is the hidden hard part of shipping ML-adjacent tools. Docker isn't just convenient here — it's necessary for reliability.
  • Streaming responses changes the perceived UX dramatically. A 45-second wait with live status updates feels much shorter than a 45-second spinner.

What's next for ArXiv Animator

  • Better visual quality — fine-tune the Manim prompt for richer, more paper-specific animations (e.g. actual attention heatmaps for transformer papers, loss curve plots for training papers)
  • Multi-section support — currently the animation covers the abstract only; next step is chunking the full paper body and generating multi-scene animations
  • Shareable links — host generated videos with permanent URLs so researchers can share visual summaries on social media or in slide decks
  • Model selection — let users choose between fast/cheap models for a quick sketch vs. slower/smarter models for higher-quality output
  • Browser extension — add a "Visualize" button directly on arxiv.org paper pages

Built With

  • amazon
  • animation
  • fastapi
  • lite
  • manim
  • nextjs
  • nova
  • nova2
  • premier
  • python
  • video