Inspiration

Dense academic papers are hard to read, especially when the visuals that make a concept click are missing or static. I wanted a way to turn those PDFs into explainers that combine clear narration, interactive 2D/3D visuals, and the pacing of a human instructor. That idea became Paper Animate Studio.

What it does

  • Parses a research paper PDF, extracts text, and sends it to Google Gemini 2.0 Flash for structured analysis.
  • Produces narration segments, animation steps, and emphasis tags that are kept aligned with one another on a shared timeline.
  • Renders rich scenes across CSS, D3, KaTeX, and React Three Fiber, with automatic fallback geometry so every concept is visualized.
  • Provides an interactive player with captions, timeline scrubber, section regeneration, and narration style controls.
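
As a rough illustration of the output described above, the sketch below models a presentation as narration segments plus animation steps, and checks that the two line up. All type and field names here (NarrationSegment, AnimationStep, findMisaligned, and so on) are hypothetical and chosen for the example. They are not the project's actual schema.

```typescript
// Hypothetical shapes for illustration only; the real schema may differ.
interface NarrationSegment {
  id: string;
  text: string;
  emphasis: string[]; // words or phrases to highlight in captions
}

interface AnimationStep {
  segmentId: string; // the narration segment this step plays under
  renderer: "css" | "d3" | "katex" | "three";
  description: string;
}

interface Presentation {
  title: string;
  segments: NarrationSegment[];
  steps: AnimationStep[];
}

interface AlignmentReport {
  unvisualized: string[];      // segment ids that have no animation step
  orphanSteps: AnimationStep[]; // steps that point at a missing segment
}

// Reports every place where narration and animation disagree.
function findMisaligned(p: Presentation): AlignmentReport {
  const segmentIds = new Set(p.segments.map((s) => s.id));
  const coveredIds = new Set(p.steps.map((s) => s.segmentId));
  return {
    unvisualized: p.segments.filter((s) => !coveredIds.has(s.id)).map((s) => s.id),
    orphanSteps: p.steps.filter((s) => !segmentIds.has(s.segmentId)),
  };
}
```

A check like this could run right after the model response is parsed, so a segment with no visual is caught before it reaches the player.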

How I built it

  • Frontend: Next.js 16 + React 19 with Tailwind v4 utility classes, Framer Motion transitions, and GSAP easing curves.
  • 3D pipeline: React Three Fiber & Drei plus a custom ensureThreeVisuals normalizer to clean Gemini output and synthesize camera tracks.
  • Backend routes: /api/analyze handles PDF uploads via formData, rebuilds scripts, and returns a coherent presentation object. /api/generate-animation regenerates code or narration on demand.
  • AI orchestration: Gemini requests run through a retry wrapper, then through narration/step alignment logic and scene enrichment utilities, so that a weak model response still yields a usable presentation.
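
The retry wrapper mentioned above could look something like the following. This is a generic sketch with exponential backoff, not the project's actual code: the name withRetry, the option names, and the default values are all assumptions. The task passed in stands for any async call, such as a request to Gemini.

```typescript
interface RetryOptions {
  attempts?: number;    // total tries, including the first (assumed default: 3)
  baseDelayMs?: number; // wait before the second try; doubles after each failure
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip real waits
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` until it resolves or the attempt budget is spent,
// then rethrows the last error it saw.
async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const { attempts = 3, baseDelayMs = 500, sleep = defaultSleep } = options;
  const maxAttempts = Math.max(1, attempts);
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Back off only if another attempt is coming.
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

Making the sleep function injectable keeps the wrapper easy to test, since a test can pass a no-op and still inspect the delays that would have been used.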

Challenges

  • Gemini sometimes returned sparse or inconsistent data, so I wrote normalization layers that rebuild narration segments, deduplicate elements, and fill gaps with smart defaults.
  • 3D scenes were occasionally empty, so I generate fallback geometry and camera motion to make sure the Three.js renderer always has something meaningful to show.
  • Keeping narration timing in sync with animated steps required recalculating durations from word counts and mirroring those values across the timeline UI.
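
The word-count timing described above can be sketched as follows. The speaking rate of 150 words per minute and the one-second minimum are assumptions picked for the example, and buildTimeline is a hypothetical helper name. The idea is that each segment's duration comes from its word count, and each start time is the running total of the durations before it.

```typescript
interface TimedSegment {
  text: string;
  startMs: number;
  durationMs: number;
}

const WORDS_PER_MINUTE = 150; // assumed narration speed
const MIN_DURATION_MS = 1000; // assumed floor so short or empty segments stay visible

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Derives a duration for each segment from its word count and lays the
// segments end to end, so the same numbers can drive both audio and UI.
function buildTimeline(
  texts: string[],
  wordsPerMinute: number = WORDS_PER_MINUTE,
): TimedSegment[] {
  let cursorMs = 0;
  return texts.map((text) => {
    const spokenMs = Math.round((countWords(text) / wordsPerMinute) * 60000);
    const durationMs = Math.max(MIN_DURATION_MS, spokenMs);
    const segment: TimedSegment = { text, startMs: cursorMs, durationMs };
    cursorMs += durationMs;
    return segment;
  });
}
```

Because the timeline scrubber and the animation steps would both read startMs and durationMs from the same array, regenerating one segment's narration only requires rebuilding this list.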

What I learned

  • Strong guardrails around AI output (schema validation, fallbacks, retries) are essential for production stability.
  • React Three Fiber becomes far more approachable with helper abstractions for geometry presets, easing, and camera choreography.
  • A clear separation between analysis data, narration scripts, and renderer components made it easy to iterate without breaking the player.
  • Deployment hygiene matters: surfacing missing GEMINI_API_KEY before making a request saved a lot of debugging time.
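
The early GEMINI_API_KEY check from the last point might be sketched like this. The function name checkGeminiKey, the result shape, and the 500 status are assumptions made for the example. Only the environment variable name comes from the project.

```typescript
type KeyCheck =
  | { ok: true; apiKey: string }
  | { ok: false; status: number; message: string };

// Validates the key up front so a misconfigured deployment fails with a
// clear message instead of an opaque error from the model request.
function checkGeminiKey(env: Record<string, string | undefined>): KeyCheck {
  const apiKey = env.GEMINI_API_KEY?.trim();
  if (!apiKey) {
    return {
      ok: false,
      status: 500, // assumed status for a server-side configuration error
      message:
        "GEMINI_API_KEY is not set. Add it to the deployment environment before calling Gemini.",
    };
  }
  return { ok: true, apiKey };
}
```

In a route handler, this could be called with process.env as the first step, returning the message and status to the client when ok is false.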

Next steps

  • Add integration tests that cover the full PDF → narration → playback loop.
  • Offer optional MP4 exports by pairing the narration audio with rendered frames.
  • Layer in analytics to track regeneration requests, Gemini error rates, and viewer engagement.

Built With

  • gemini3
  • nextjs