ArXiv Animator — by Simple Story

Inspiration

Academic research moves fast — but reading papers doesn't. The average arXiv paper takes 30–60 minutes to properly read, and for researchers outside a specific subfield, dense math and jargon make comprehension even harder. We asked ourselves: what if you could see a paper in under a minute?

We were inspired by the work of 3Blue1Brown (https://www.3blue1brown.com/): the idea that motion and visual metaphor can communicate complex ideas more intuitively than prose. We wanted to automate that: take any arXiv link and turn it into a visual story, instantly.


What it does

Simple Story takes an arXiv paper URL and automatically generates two visual outputs:

  1. An animated video — a Manim-rendered animation that visualizes the paper's core concept (architecture diagrams, data flows, mathematical relationships) as a short explainer clip.
  2. An interactive 3D scene — a Three.js visualization of the same concept as an explorable node-graph, rendered directly in the browser.

On top of that, there's a comic strip generator that turns a paper's key ideas into illustrated panels with AI-generated art — a format that makes research feel approachable and shareable.

The whole pipeline is driven by a single input: paste an arXiv link, hit enter, and watch.


How we built it

Backend (FastAPI + Python)

  • Fetches paper metadata and abstract from the arXiv API
  • Sends the title and abstract to an LLM (Amazon Nova 2 via OpenRouter) with a structured prompt
  • The LLM returns a Manim Python script and a Three.js JSON scene config in one shot
  • The Manim script is executed in a subprocess (python -m manim) inside a Docker container based on manimcommunity/manim:stable, which bundles all required system dependencies (cairo, pango, ffmpeg)
  • If the render fails, the error logs are fed back to the LLM for review and revision, with up to 3 automated fix cycles (sketched after this list)
  • Results are streamed back to the frontend via server-sent events (SSE) so users see live progress
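
To make the loop concrete, here is a minimal sketch of the render-and-revise step. The revise_with_llm callback and the PaperScene scene name are illustrative assumptions, not our exact code:

```python
import subprocess
import tempfile
from pathlib import Path

MAX_FIX_CYCLES = 3  # the review cycle count is environment-configurable

def render_with_review(script: str, revise_with_llm):
    """Render an LLM-generated Manim script; on failure, feed stderr back
    to the LLM for revision. revise_with_llm(script, error_log) -> script
    is a hypothetical callback standing in for our LLM review call."""
    for _ in range(1 + MAX_FIX_CYCLES):
        workdir = Path(tempfile.mkdtemp())
        (workdir / "scene.py").write_text(script)
        proc = subprocess.run(
            ["python", "-m", "manim", "-qm", "scene.py", "PaperScene"],
            cwd=workdir, capture_output=True, text=True, timeout=300,
        )
        if proc.returncode == 0:
            videos = sorted(workdir.glob("media/videos/**/*.mp4"))
            if videos:
                return videos[0]  # Manim writes renders under media/videos/
        # Feed the failure back to the LLM and try again (the "review" step).
        script = revise_with_llm(script, proc.stderr)
    return None
```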

Frontend (Next.js + Three.js)

  • Input form with animated loading states (Framer Motion)
  • Renders the MP4 video served from the backend's static directory
  • Renders the Three.js scene config as an interactive 3D graph in the browser (the config's shape is sketched after this list)
  • Comic generation view with panel-by-panel display
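
For reference, the scene config handed to the frontend is plain JSON describing a node-graph. A hypothetical minimal example, with illustrative field names (the real schema is whatever our prompt specifies), shown here as the Python dict the backend would serialize:

```python
# Hypothetical scene config shape (field names are illustrative):
scene_config = {
    "title": "Encoder-decoder with attention",
    "nodes": [
        {"id": "input", "label": "Input tokens", "position": [-3, 0, 0]},
        {"id": "encoder", "label": "Encoder", "position": [-1, 0, 0]},
        {"id": "decoder", "label": "Decoder", "position": [1, 0, 0]},
        {"id": "output", "label": "Predictions", "position": [3, 0, 0]},
    ],
    "edges": [
        {"source": "input", "target": "encoder"},
        {"source": "encoder", "target": "decoder"},
        {"source": "decoder", "target": "output"},
    ],
}
```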

Infrastructure

  • Dockerized backend using manimcommunity/manim:stable as the base image — avoids any local system dependency setup
  • Environment-driven config for model selection, timeout, and review cycle count
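
The config side is the standard read-from-the-environment pattern; a minimal sketch, with illustrative variable names rather than our exact ones:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Runtime settings read from the environment.
    Variable names here are illustrative, not our exact ones."""
    llm_model: str = os.environ.get("LLM_MODEL", "")  # an OpenRouter model id
    render_timeout_s: int = int(os.environ.get("RENDER_TIMEOUT_S", "300"))
    max_fix_cycles: int = int(os.environ.get("MAX_FIX_CYCLES", "3"))

settings = Settings()
```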

Challenges we ran into

  • LLM-generated code quality — The model frequently produced Manim scripts with subtle bugs: deprecated API calls (ShowCreation instead of Create), VGroups containing non-VMobjects, MathTex calls that fail without a LaTeX install, and elements placed off-screen. We solved this with a detailed system prompt of explicit rules, plus an automated review-and-revise loop in which the LLM diagnoses its own execution errors.
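
To give a flavor of those rules, here is a paraphrased excerpt (wording illustrative, not the exact prompt text):

```python
# Paraphrased excerpt of the system prompt's constraint rules:
MANIM_RULES = """
CRITICAL: NEVER use ShowCreation. It was removed in Manim CE; use Create.
CRITICAL: NEVER put non-VMobjects (e.g. ImageMobject) in a VGroup; use Group.
CRITICAL: NEVER use MathTex or Tex; assume no LaTeX install. Use Text instead.
CRITICAL: Keep every element inside the visible frame (about 14 x 8 units).
"""
```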

  • Structured output reliability — Getting the LLM to return a valid JSON blob containing both a runnable Python script and a Three.js config (with escaped newlines, no comments) required careful prompt engineering and a fallback parser.
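
The fallback parser boils down to progressively more forgiving extraction; a simplified sketch (the real parser handles more edge cases):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Parse the LLM's response, tolerating markdown code fences and
    leading/trailing chatter around the JSON object."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fallback 1: strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", raw, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Fallback 2: take the outermost brace-delimited span.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        return json.loads(raw[start : end + 1])
    raise ValueError("LLM response contained no parsable JSON object")
```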

  • Streaming progress to the frontend — Manim renders can take 30–60 seconds. We implemented SSE streaming so users see live step updates instead of staring at a blank screen.
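
A minimal sketch of the SSE endpoint pattern in FastAPI (step names are illustrative placeholders for the real pipeline stages):

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/generate")
async def generate(arxiv_url: str):
    """Stream pipeline progress to the browser as server-sent events."""
    async def event_stream():
        for step in ("fetching_paper", "calling_llm", "rendering", "done"):
            await asyncio.sleep(0)  # real code awaits the actual stage here
            yield f"data: {json.dumps({'step': step})}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```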


Accomplishments that we're proud of

  • End-to-end automation — paste one URL, get a rendered animation. No manual steps, no configuration per paper.
  • Self-healing code generation — the LLM review loop means the system often recovers from its own mistakes without any human intervention.
  • Dual output formats — the same LLM call produces both a 2D animation script and a 3D scene config, giving two complementary views of the same concept.
  • Reproducible Docker setup — anyone can clone the repo and run docker compose up --build with no system dependency hunting.

What we learned

  • Prompt engineering for code generation is as much about constraints as it is about instructions — the most impactful prompt improvements were "CRITICAL: NEVER use X" rules added in response to observed failure modes.
  • LLMs are surprisingly good at debugging their own code when given the full stderr output — the review loop meaningfully improves success rates.
  • System dependency management is the hidden hard part of shipping ML-adjacent tools. Docker isn't just convenient here — it's necessary for reliability.
  • Streaming responses changes the perceived UX dramatically. A 45-second wait with live status updates feels much shorter than a 45-second spinner.

What's next for ArXiv Animator

  • Better visual quality — fine-tune the Manim prompt for richer, more paper-specific animations (e.g. actual attention heatmaps for transformer papers, loss curve plots for training papers)
  • Multi-section support — currently the animation covers the abstract only; next step is chunking the full paper body and generating multi-scene animations
  • Shareable links — host generated videos with permanent URLs so researchers can share visual summaries on social media or in slide decks
  • Model selection — let users choose between fast/cheap models for a quick sketch vs. slower/smarter models for higher-quality output
  • Browser extension — add a "Visualize" button directly on arxiv.org paper pages

Built With

  • amazon
  • animation
  • fastapi
  • lite
  • manim
  • nextjs
  • nova
  • nova2
  • premier
  • python
  • video