Inspiration
AI coding agents like Claude Code, Cursor, and Codex are transforming how developers build software. But there's a critical gap: these agents can't process video. When a designer hands a developer a screen recording of an animation and says "build this," the developer must manually watch the video, mentally decompose every motion, timing curve, and state change, then translate that into precise text instructions for their AI coding agent.
This manual translation is:
- Slow - A 3-second animation can take 15-30 minutes to describe accurately
- Error-prone - Human descriptions miss subtle easing curves, stagger patterns, and micro-interactions
- Inconsistent - Two developers describe the same animation differently, producing different results
We built AnimSpec because Gemini 3's multimodal reasoning engine, which can take video as input, follow temporal sequences, and reason about visual patterns, is well suited to this problem.
What it does
AnimSpec converts video animations into structured, implementable specifications that AI coding agents can directly consume. Upload any screen recording of a UI animation, and AnimSpec produces:
15 Output Formats across 4 categories:
Clone - Recreate what you see:
- Clone UI Animation (CSS keyframes with timing)
- Clone UI Component (React + Tailwind reproduction)
- Clone Landing Page (full page layout)
Extract - Pull design assets & specs:
- Copy Design Style (reusable CSS style guide)
- Extract Design Tokens (colors, typography, spacing)
- Figma Motion Spec (Smart Animate properties)
Export - Framework-specific code:
- Remotion Demo Template (video component)
- Tailwind Animate Config (custom keyframes)
- React Native Reanimated (mobile animations)
- Lottie/Rive Export (motion graphics data)
- Interaction State Machine (XState definitions)
Audit - Quality & compliance:
- QA Clone Checklist (acceptance criteria)
- Accessibility Audit (WCAG + prefers-reduced-motion)
- Performance Budget (GPU layer analysis, 60fps)
- Storyboard Breakdown (frame-by-frame)
Two Analysis Modes:
- Standard Mode - Single-pass analysis for quick results
- Agentic Mode - A 4-pass autonomous pipeline:
  - Pass 1: Scene Decomposition (identify all animated elements)
  - Pass 2: Deep Motion Analysis (timing, easing, subtle movements)
  - Pass 3: Code Generation (in your chosen format)
  - Pass 4: Self-Verification (compare output vs. original video, score 0-100)
How we built it
Architecture (see diagram in project media attached)
AnimSpec is a full-stack Next.js 15 application with a serverless architecture:
Video Upload → Client-side FFmpeg WASM (keyframe extraction)
→ Size-based routing (inline / Gemini Files API)
→ Gemini 3 Analysis (streaming SSE)
→ Real-time output display
Gemini 3 Integration (Core)
AnimSpec is built entirely around the Gemini 3 API. It leverages several key Gemini 3 capabilities:
1. Gemini 3 Thinking Mode (thinkingLevel: 'high')
Both gemini-3-flash-preview and gemini-3-pro-preview are used with thinking mode enabled. This is critical for animation analysis because the model needs to reason about:
- Temporal sequences (what happens at which timestamp)
- Spatial relationships (which elements move relative to others)
- Easing curves (is it ease-in, ease-out, or a custom bezier?)
- Stagger patterns (are elements animating sequentially or in parallel?)
The thinking traces are surfaced in the UI so users can see the model's reasoning process in real-time.
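As a rough illustration of the configuration described above, the sketch below builds per-request options for either model. The model ids and `thinkingLevel: 'high'` come from this write-up; the `Quality` type, the `buildRequestOptions` helper, and the `includeThoughts` flag are assumptions added for illustration and may not match the project's actual code.

```typescript
// Model ids and thinkingLevel come from the write-up; all other names are hypothetical.
type Quality = "balanced" | "precise";

interface AnalysisRequestOptions {
  model: string;
  config: {
    thinkingConfig: { thinkingLevel: "high"; includeThoughts: boolean };
  };
}

const MODEL_BY_QUALITY: Record<Quality, string> = {
  balanced: "gemini-3-flash-preview",
  precise: "gemini-3-pro-preview",
};

function buildRequestOptions(quality: Quality): AnalysisRequestOptions {
  return {
    model: MODEL_BY_QUALITY[quality],
    config: {
      thinkingConfig: {
        thinkingLevel: "high",
        // Assumption: thought summaries are requested so the UI can display them.
        includeThoughts: true,
      },
    },
  };
}
```

An object of this shape would then be merged with the video contents and passed to the SDK's generation call.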
2. Multimodal Video Understanding
Using the @google/genai SDK, we send video content directly to Gemini 3 via:
- Inline base64 for videos under 4MB
- Gemini Files API (fileUri) for videos up to 100MB
The model processes the actual video frames — not just screenshots — enabling it to understand motion, timing, and transitions that static images cannot capture.
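The size-based routing above could be sketched roughly as follows. The 4MB and 100MB thresholds come from this write-up; treating 1MB as 1024 × 1024 bytes, and the `chooseRoute`, `inlinePart`, and `filePart` helpers, are assumptions. The part shapes mirror the `inlineData` and `fileData` fields used for media in Gemini requests.

```typescript
// Thresholds from the write-up; the byte interpretation of "MB" is an assumption.
const INLINE_LIMIT_BYTES = 4 * 1024 * 1024;
const FILES_API_LIMIT_BYTES = 100 * 1024 * 1024;

type Route = "inline" | "files-api";

type VideoPart =
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } };

// Decide how a video of the given size should be sent.
function chooseRoute(sizeBytes: number): Route {
  if (sizeBytes <= 0) throw new RangeError("Video is empty");
  if (sizeBytes < INLINE_LIMIT_BYTES) return "inline";
  if (sizeBytes <= FILES_API_LIMIT_BYTES) return "files-api";
  throw new RangeError("Video exceeds the 100MB limit");
}

// Small videos travel as base64 inside the request body.
function inlinePart(base64Data: string, mimeType: string): VideoPart {
  return { inlineData: { mimeType, data: base64Data } };
}

// Larger videos are uploaded first and referenced by URI.
function filePart(fileUri: string, mimeType: string): VideoPart {
  return { fileData: { mimeType, fileUri } };
}
```

Keeping the routing decision in a pure function makes the thresholds easy to unit test without touching the network.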
3. Multi-Pass Agentic Pipeline
Our agentic mode runs 4 sequential Gemini 3 calls, each building on the previous pass's output:
| Pass | Model | Purpose |
|---|---|---|
| 1 | gemini-3-flash-preview | Scene decomposition: fast structural analysis |
| 2 | gemini-3-pro-preview | Deep motion analysis: flagship reasoning for subtle details |
| 3 | gemini-3-pro-preview | Code generation: precise implementation |
| 4 | gemini-3-flash-preview | Self-verification: compare output against original video |
This mirrors the "Marathon Agent" strategic track from the hackathon — an autonomous system that maintains continuity across multi-step reasoning without human supervision.
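A minimal sketch of the sequential pipeline in the table above is shown below. The pass order and model assignment come from the table; the `CallModel` callback, the instruction strings, and the way earlier outputs are concatenated into the next prompt are assumptions, and attaching the video to each call is left out for brevity.

```typescript
// Hypothetical callback standing in for a Gemini request.
type CallModel = (model: string, prompt: string) => Promise<string>;

interface PassSpec {
  name: string;
  model: string;
  instruction: string; // illustrative wording, not the project's prompts
}

interface PassResult {
  name: string;
  model: string;
  output: string;
}

const PASSES: readonly PassSpec[] = [
  { name: "scene-decomposition", model: "gemini-3-flash-preview", instruction: "List every animated element." },
  { name: "motion-analysis", model: "gemini-3-pro-preview", instruction: "Describe timing, easing, and stagger per element." },
  { name: "code-generation", model: "gemini-3-pro-preview", instruction: "Generate the requested output format." },
  { name: "self-verification", model: "gemini-3-flash-preview", instruction: "Compare the output with the video and score it 0-100." },
];

async function runAgenticPipeline(callModel: CallModel, outputFormat: string): Promise<PassResult[]> {
  const results: PassResult[] = [];
  for (const pass of PASSES) {
    // Each pass sees the outputs of all earlier passes as context.
    const context = results.map((r) => `## ${r.name}\n${r.output}`).join("\n\n");
    const prompt = [`Output format: ${outputFormat}`, pass.instruction, context]
      .filter((section) => section.length > 0)
      .join("\n\n");
    const output = await callModel(pass.model, prompt);
    results.push({ name: pass.name, model: pass.model, output });
  }
  return results;
}
```

Injecting `callModel` keeps the orchestration testable with a fake model.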
4. Streaming with Thinking Traces
Responses are streamed via Server-Sent Events (SSE). For each chunk, we parse both the model's text output and its thinking traces, displaying them in parallel in the UI. Users can watch the model reason through complex animations in real-time.
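One way to separate reasoning from output per chunk is sketched below. It assumes each streamed part carries a `text` field and a boolean `thought` flag marking thinking content; the event names `thinking` and `output` and both helper functions are hypothetical and not taken from the project.

```typescript
// Assumed shape of a streamed response part.
interface StreamPart {
  text?: string;
  thought?: boolean;
}

type SseEventName = "thinking" | "output";

interface SseEvent {
  event: SseEventName;
  data: string;
}

// Route each text part to the thinking channel or the output channel.
function partsToEvents(parts: readonly StreamPart[]): SseEvent[] {
  const events: SseEvent[] = [];
  for (const part of parts) {
    if (!part.text) continue; // skip empty or non-text parts
    events.push({ event: part.thought === true ? "thinking" : "output", data: part.text });
  }
  return events;
}

// Serialize one event in SSE wire format; JSON-encoding keeps newlines
// in the payload from breaking the "data:" line framing.
function formatSse(e: SseEvent): string {
  return `event: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`;
}
```

On the client, an `EventSource` or fetch-based reader can then attach separate listeners for the two event names and render them side by side.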
5. Client-Side Frame Grid (Context Enrichment)
Using FFmpeg compiled to WebAssembly, we extract up to 24 keyframes from the video client-side and arrange them in a labeled grid. This grid is sent alongside the video to Gemini 3 as additional visual context, enabling more precise spatial-temporal analysis.
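To make the frame grid concrete, here is a hedged sketch of how sample timestamps and the grid layout might be planned before FFmpeg extracts the frames. Only the 24-frame cap comes from this write-up; the sampling rate, the column count, the even spacing, and the label format are assumptions.

```typescript
interface FrameGridPlan {
  timestamps: number[]; // seconds into the video
  labels: string[];     // text drawn on each grid cell
  columns: number;
  rows: number;
}

const MAX_FRAMES = 24;        // from the write-up
const SAMPLES_PER_SECOND = 8; // assumption
const MAX_COLUMNS = 6;        // assumption

function planFrameGrid(durationSec: number): FrameGridPlan {
  if (!(durationSec > 0)) throw new RangeError("Duration must be positive");
  const count = Math.min(MAX_FRAMES, Math.max(1, Math.floor(durationSec * SAMPLES_PER_SECOND)));
  const step = durationSec / count;
  // Sample the midpoint of each equal-length segment.
  const timestamps = Array.from({ length: count }, (_, i) => (i + 0.5) * step);
  const labels = timestamps.map((t, i) => `#${i + 1} @ ${t.toFixed(2)}s`);
  const columns = Math.min(MAX_COLUMNS, count);
  return { timestamps, labels, columns, rows: Math.ceil(count / columns) };
}
```

Each timestamp would then be passed to FFmpeg as a seek position, and the labels drawn onto the assembled grid image.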
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 19, Next.js 15, Tailwind CSS 4 |
| AI Engine | Google Gemini 3 API (@google/genai SDK) |
| Video Processing | FFmpeg.wasm (client-side keyframe extraction) |
| Auth | Firebase Authentication (Email + Google OAuth) |
| Database | Firebase Firestore |
| Storage | Firebase Storage + Cloudflare R2 |
| Payments | Lemon Squeezy |
| Hosting | Vercel (serverless) |
Code Quality
- Full TypeScript across the entire codebase
- 15 specialized prompt templates for each output format
- Atomic credit transactions via Firestore
- Real-time SSE streaming with error recovery
- Security headers (COOP/COEP for WASM, CSP)
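For the COOP/COEP item in the list above, a minimal sketch of the header rules follows. The two header names and values are the standard ones that enable cross-origin isolation (needed for SharedArrayBuffer); the `crossOriginIsolationHeaders` helper and the catch-all `source` pattern are assumptions, and the project's CSP rules are not reproduced here.

```typescript
interface HeaderRule {
  source: string;
  headers: { key: string; value: string }[];
}

// Returns rules in the shape expected by the Next.js `headers()` config option.
function crossOriginIsolationHeaders(source: string = "/(.*)"): HeaderRule[] {
  return [
    {
      source,
      headers: [
        { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
        { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
      ],
    },
  ];
}
```

In `next.config.ts` this could be wired up as `async headers() { return crossOriginIsolationHeaders(); }`. Note that `require-corp` also constrains third-party resources on the affected pages, so a narrower `source` may be preferable.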
Gemini 3 Integration — Detailed Write-up (~200 words)
AnimSpec is built entirely on the Gemini 3 API, leveraging its multimodal video understanding and extended thinking capabilities. The application uses two Gemini 3 models:
- gemini-3-flash-preview (balanced quality) - Fast, intelligent analysis with thinking mode for structural decomposition and verification passes
- gemini-3-pro-preview (precise quality) - Flagship reasoning for deep motion analysis and code generation where accuracy is paramount
Key Gemini 3 features used:
- Thinking Mode (thinkingConfig: { thinkingLevel: 'high' }) - Enables extended reasoning, critical for decomposing complex animation sequences into precise timing, easing, and spatial relationships. Thinking traces are streamed to users in real-time.
- Video Understanding - Native video input processing (inline base64 and Files API) allows the model to analyze actual motion, not static screenshots. This is fundamental to detecting easing curves, stagger patterns, and micro-interactions.
- Files API - Handles videos up to 100MB with automatic state polling until processing completes.
- Multi-Pass Agentic Pipeline - 4 sequential Gemini 3 calls (decomposition → analysis → generation → verification) that maintain context continuity, with model selection optimized per pass (Flash for structural tasks, Pro for deep reasoning).
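The state polling mentioned for the Files API might look roughly like the sketch below. The `PROCESSING`, `ACTIVE`, and `FAILED` states follow the Files API's file states; the `GetFile` callback (a stand-in for the SDK's file lookup), the default interval, and the attempt cap are assumptions.

```typescript
type FileState = "PROCESSING" | "ACTIVE" | "FAILED";

interface RemoteFile {
  name: string;
  uri: string;
  state: FileState;
}

// Hypothetical lookup function; in practice it would wrap the SDK call.
type GetFile = (name: string) => Promise<RemoteFile>;

interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntilActive(
  getFile: GetFile,
  name: string,
  options: PollOptions = { intervalMs: 2000, maxAttempts: 60 },
): Promise<RemoteFile> {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const file = await getFile(name);
    if (file.state === "ACTIVE") return file; // ready to reference by URI
    if (file.state === "FAILED") throw new Error(`Processing failed for ${name}`);
    if (attempt < options.maxAttempts) await sleep(options.intervalMs);
  }
  throw new Error(`Timed out waiting for ${name}`);
}
```

Bounding the number of attempts keeps a stuck upload from holding a serverless function open indefinitely.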
Gemini 3 is not an add-on — it IS the product. Without Gemini 3's multimodal reasoning, video-to-code translation at this fidelity would not be possible.
Challenges we ran into
Video Size Limits — Gemini's inline data limit required us to build a tiered upload system: inline base64 for small files, Gemini Files API for larger ones, with automatic state polling for processing completion.
Prompt Engineering for 15 Formats — Each output format required its own specialized prompt template. Getting the model to produce syntactically valid CSS keyframes vs. React Native Reanimated code vs. Lottie JSON required extensive iteration on format-specific instructions.
Streaming Thinking Traces — Parsing thinking traces from the SSE stream alongside text content required careful handling to separate reasoning from output and display both simultaneously.
FFmpeg WASM in Next.js — Required COOP/COEP headers for SharedArrayBuffer support, custom CSP rules for WASM execution, and careful lazy-loading to avoid blocking initial page render.
Agentic Pipeline Context Management — Feeding previous pass outputs as context into subsequent Gemini 3 calls while staying within token limits required careful prompt construction.
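As an illustration of the context-management problem, the sketch below trims earlier pass outputs to a token budget before they are fed into the next call. It is not the project's method: the four-characters-per-token estimate, the policy of favoring the most recent pass, and the `fitToBudget` helper are all assumptions.

```typescript
interface PassOutput {
  name: string;
  output: string;
}

const CHARS_PER_TOKEN = 4; // rough heuristic, not an exact tokenizer

// Keep the most recent outputs whole and truncate or drop older ones
// once the character budget is used up. Order of passes is preserved.
function fitToBudget(outputs: readonly PassOutput[], maxTokens: number): PassOutput[] {
  let remaining = Math.max(0, maxTokens) * CHARS_PER_TOKEN;
  const kept: PassOutput[] = [];
  for (const { name, output } of [...outputs].reverse()) {
    if (remaining <= 0) break;
    const slice = output.length <= remaining ? output : output.slice(0, remaining);
    kept.unshift({ name, output: slice });
    remaining -= slice.length;
  }
  return kept;
}
```

A production version would more likely count tokens with the model's own tokenizer and summarize older passes rather than cutting them mid-text.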
Accomplishments that we're proud of
- 15 output formats — From CSS keyframes to React Native to Lottie JSON to accessibility audits, covering the full spectrum of animation implementation needs
- Agentic self-verification — The 4-pass pipeline includes a verification step where Gemini 3 compares its own generated code against the original video and scores accuracy 0-100
- Real-time thinking traces — Users can watch Gemini 3 reason through complex animations, making the AI's decision-making transparent
- Production-ready SaaS — Complete with authentication, credit system, payment processing, and analysis history
- Client-side video processing — FFmpeg WASM extracts keyframes entirely in the browser, reducing server load and enabling richer context for Gemini 3
What we learned
- Gemini 3's thinking mode is transformative for structured output - Enabling thinkingLevel: 'high' dramatically improved the quality of animation specifications, especially for complex multi-element sequences with staggered timing
- Video beats screenshots for motion understanding - Sending actual video to Gemini 3 (rather than frame screenshots) produced significantly better results for detecting easing curves and timing relationships
- Multi-pass pipelines need careful model selection - Using Flash for structural tasks and Pro for deep analysis optimizes both cost and quality
- Frame grids as supplementary context - Providing a labeled frame grid alongside the video helped the model anchor its spatial-temporal reasoning
What's next for AnimSpec
- Batch processing — Analyze multiple animations from a single video upload
- Direct framework integration — One-click export to Figma, Rive, and After Effects
- Custom model fine-tuning — Train on common UI animation patterns for even higher accuracy
- Team collaboration — Share analyses, annotate specifications, and track implementation progress
- Animation diff — Compare two animation videos and highlight differences
Built With
- cloudflarer2
- ffmpeg.wasm
- firebase
- firestore
- gemini-3
- gemini-3-pro-preview
- google-genai-sdk
- lemonsqueezy
- next.js15
- react19
- tailwindcss-4
- typescript
- vercel