Inspiration
Reading dense academic papers is tough—especially when the visuals that make concepts click are missing or static. I wanted a way to turn those PDFs into lively explainers that combine clear narration, interactive 2D/3D visuals, and the pacing of a human instructor. That sparked Paper Animate Studio.
What it does
- Parses a research paper PDF, extracts text, and sends it to Google Gemini 2.0 Flash for structured analysis.
- Produces narration segments, animation steps, and emphasis tags that share one timeline, so they stay aligned with each other.
- Renders rich scenes across CSS, D3, KaTeX, and React Three Fiber, with automatic fallback geometry so every concept is visualized.
- Provides an interactive player with captions, timeline scrubber, section regeneration, and narration style controls.
How I built it
- Frontend: Next.js 16 + React 19 with Tailwind v4 utility classes, Framer Motion transitions, and GSAP easing curves.
- 3D pipeline: React Three Fiber & Drei plus a custom ensureThreeVisuals normalizer to clean Gemini output and synthesize camera tracks.
- Backend routes: /api/analyze handles PDF uploads via formData, rebuilds scripts, and returns a coherent presentation object. /api/generate-animation regenerates code or narration on demand.
- AI orchestration: Gemini requests run through a retry wrapper, narration/step alignment logic, and scene enrichment utilities to keep content resilient.
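To make the normalizer idea concrete, here is a minimal sketch of how a cleanup pass over model output might look. It is not the project's actual ensureThreeVisuals code: the Scene3D shape, the normalizeScene name, the geometry list, and the default camera positions are all assumptions made for illustration.

```typescript
// Hypothetical data shapes; the real Gemini output schema is not shown in this write-up.
type Vec3 = [number, number, number];
type Geometry = "box" | "sphere" | "torus";

interface SceneElement {
  id: string;
  geometry: Geometry;
  position: Vec3;
}

interface CameraKeyframe {
  time: number; // seconds from scene start
  position: Vec3;
}

interface Scene3D {
  elements: SceneElement[];
  cameraTrack: CameraKeyframe[];
}

const GEOMETRIES: readonly string[] = ["box", "sphere", "torus"];

function isGeometry(value: unknown): value is Geometry {
  return typeof value === "string" && GEOMETRIES.indexOf(value) !== -1;
}

function isVec3(value: unknown): value is Vec3 {
  return (
    Array.isArray(value) &&
    value.length === 3 &&
    value.every((n) => typeof n === "number" && Number.isFinite(n))
  );
}

// Accepts untrusted model output and always returns a renderable scene.
function normalizeScene(raw: unknown, durationSeconds: number): Scene3D {
  const input = (raw ?? {}) as { elements?: unknown; cameraTrack?: unknown };
  const seen = new Set<string>();
  const elements: SceneElement[] = [];

  for (const item of Array.isArray(input.elements) ? input.elements : []) {
    const el = (item ?? {}) as Partial<SceneElement>;
    // Drop entries without a usable id, and any repeated id.
    if (typeof el.id !== "string" || seen.has(el.id)) continue;
    seen.add(el.id);
    elements.push({
      id: el.id,
      geometry: isGeometry(el.geometry) ? el.geometry : "box",
      position: isVec3(el.position) ? el.position : [0, 0, 0],
    });
  }

  // Fallback geometry: an empty scene still gets one centered placeholder.
  if (elements.length === 0) {
    elements.push({ id: "fallback-0", geometry: "sphere", position: [0, 0, 0] });
  }

  const rawTrack = Array.isArray(input.cameraTrack) ? input.cameraTrack : [];
  const track = rawTrack.filter(
    (k): k is CameraKeyframe =>
      typeof k === "object" &&
      k !== null &&
      typeof k.time === "number" &&
      isVec3(k.position)
  );

  // Synthesized camera track: a two-keyframe move toward the origin.
  const cameraTrack: CameraKeyframe[] =
    track.length >= 2
      ? track
      : [
          { time: 0, position: [0, 2, 8] },
          { time: Math.max(durationSeconds, 1), position: [0, 1, 4] },
        ];

  return { elements, cameraTrack };
}
```

Because the function takes `unknown` and returns a fully populated Scene3D, a renderer component downstream would not need its own null checks.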
Challenges
- Gemini sometimes returned sparse or inconsistent data, so I wrote normalization layers that rebuild narration segments, deduplicate elements, and fill gaps with smart defaults.
- 3D scenes would occasionally be empty; generating fallback geometries and camera motion ensured the Three.js renderer always has something meaningful to show.
- Keeping narration timing in sync with animated steps required recalculating durations from word counts and mirroring those values across the timeline UI.
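The word-count timing described above can be sketched roughly as follows. The 150 words-per-minute rate, the 1.5 second floor, and the names buildTimeline and TimedSegment are assumptions for illustration, not values taken from the project.

```typescript
interface NarrationSegment {
  id: string;
  text: string;
}

interface TimedSegment extends NarrationSegment {
  start: number; // seconds from the beginning of the presentation
  duration: number; // seconds
}

const WORDS_PER_MINUTE = 150; // assumed speaking rate
const MIN_SEGMENT_SECONDS = 1.5; // assumed floor so very short captions stay readable

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Derives each segment's duration from its word count and lays the
// segments end to end, so the animation steps and the timeline UI can
// read the same start/duration values.
function buildTimeline(
  segments: NarrationSegment[],
  wordsPerMinute: number = WORDS_PER_MINUTE
): TimedSegment[] {
  let cursor = 0;
  return segments.map((segment) => {
    const spokenSeconds = (countWords(segment.text) * 60) / wordsPerMinute;
    const duration = Math.max(spokenSeconds, MIN_SEGMENT_SECONDS);
    const timed: TimedSegment = { ...segment, start: cursor, duration };
    cursor += duration;
    return timed;
  });
}

const timeline = buildTimeline([
  { id: "intro", text: "Transformers replace recurrence with attention." },
  { id: "next", text: "Why?" },
]);
```

Computing the timeline once and passing it to both the player and the scrubber avoids having two places that can drift apart.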
What I learned
- Strong guardrails around AI output (schema validation, fallbacks, retries) are essential for production stability.
- React Three Fiber becomes far more approachable with helper abstractions for geometry presets, easing, and camera choreography.
- A clear separation between analysis data, narration scripts, and renderer components made it easy to iterate without breaking the player.
- Deployment hygiene matters: surfacing a missing GEMINI_API_KEY before making a request saved a lot of debugging time.
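Two of the guardrails above, the early API key check and the retry wrapper, might look something like the sketch below. The helper names requireEnv and withRetry, the option names, and the exponential backoff schedule are assumptions; the project's real wrapper may differ.

```typescript
// Fails fast with a readable message instead of letting a request
// go out without credentials. The env map is passed in explicitly so
// the helper is easy to test.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

interface RetryOptions {
  retries: number; // extra attempts after the first one
  baseDelayMs: number; // delay before the first retry; doubles each time
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs an async task, retrying with exponential backoff when it throws.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  task: () => Promise<T>,
  { retries, baseDelayMs }: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

In a route handler, one would call `requireEnv("GEMINI_API_KEY", process.env)` before building the request, then wrap the model call in `withRetry`.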
Next steps
- Add integration tests that cover the full PDF → narration → playback loop.
- Offer optional MP4 exports by pairing our narration audio with rendered frames.
- Layer in analytics to track regeneration requests, Gemini error rates, and viewer engagement.
Built With
- gemini3
- nextjs