AnimSpec - Turn video insights into agent-ready prompts


Inspiration

AI coding agents like Claude Code, Cursor, and Codex are transforming how developers build software. But there's a critical gap: these agents can't process video. When a designer hands a developer a screen recording of an animation and says "build this," the developer must manually watch the video, mentally decompose every motion, timing curve, and state change, then translate that into precise text instructions for their AI coding agent.

This manual translation is:

Slow - A 3-second animation can take 15-30 minutes to describe accurately
Error-prone - Human descriptions miss subtle easing curves, stagger patterns, and micro-interactions
Inconsistent - Two developers describe the same animation differently, producing different results

We built AnimSpec because we realized that Gemini 3's multimodal reasoning engine - capable of seeing, understanding temporal sequences, and reasoning deeply about visual patterns - is well suited to closing this gap.

What it does

AnimSpec converts video animations into structured, implementable specifications that AI coding agents can directly consume. Upload any screen recording of a UI animation, and AnimSpec produces:

15 Output Formats across 4 categories:

Clone - Recreate what you see:

Clone UI Animation (CSS keyframes with timing)
Clone UI Component (React + Tailwind reproduction)
Clone Landing Page (full page layout)

Extract - Pull design assets & specs:

Copy Design Style (reusable CSS style guide)
Extract Design Tokens (colors, typography, spacing)
Figma Motion Spec (Smart Animate properties)

Export - Framework-specific code:

Remotion Demo Template (video component)
Tailwind Animate Config (custom keyframes)
React Native Reanimated (mobile animations)
Lottie/Rive Export (motion graphics data)
Interaction State Machine (XState definitions)

Audit - Quality & compliance:

QA Clone Checklist (acceptance criteria)
Accessibility Audit (WCAG + prefers-reduced-motion)
Performance Budget (GPU layer analysis, 60fps)
Storyboard Breakdown (frame-by-frame)

Two Analysis Modes:

Standard Mode - Single-pass analysis for quick results
Agentic Mode - A 4-pass autonomous pipeline:
- Pass 1: Scene Decomposition (identify all animated elements)
- Pass 2: Deep Motion Analysis (timing, easing, subtle movements)
- Pass 3: Code Generation (in your chosen format)
- Pass 4: Self-Verification (compare output vs. original video, score 0-100)

How we built it

Architecture (see diagram in project media attached)

AnimSpec is a full-stack Next.js 15 application with a serverless architecture:

Video Upload → Client-side FFmpeg WASM (keyframe extraction)
            → Size-based routing (inline / Gemini Files API)
            → Gemini 3 Analysis (streaming SSE)
            → Real-time output display

Gemini 3 Integration (Core)

AnimSpec is built entirely around the Gemini 3 API. It leverages several key Gemini 3 capabilities:

1. Gemini 3 Thinking Mode (thinkingLevel: 'high')

Both gemini-3-flash-preview and gemini-3-pro-preview are used with thinking mode enabled. This is critical for animation analysis because the model needs to reason about:

Temporal sequences (what happens at which timestamp)
Spatial relationships (which elements move relative to others)
Easing curves (is it ease-in, ease-out, or a custom bezier?)
Stagger patterns (are elements animating sequentially or in parallel?)

The thinking traces are surfaced in the UI so users can see the model's reasoning process in real time.
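As a rough illustration of the configuration described above, the sketch below builds a request object with thinking enabled. It is not AnimSpec's actual code: the function and type names are hypothetical, plain types stand in for the SDK's request shape, and `includeThoughts` is an assumption about how thought summaries are requested. Only `thinkingConfig: { thinkingLevel: 'high' }`, the two model IDs, and the balanced/precise quality labels come from this write-up.

```typescript
// Illustrative only: plain types stand in for the SDK's request shape.
type AnalysisModel = "gemini-3-flash-preview" | "gemini-3-pro-preview";
type Quality = "balanced" | "precise";

interface AnalysisRequest {
  model: AnalysisModel;
  config: {
    thinkingConfig: {
      thinkingLevel: "high";
      // Assumption: thought summaries are requested explicitly so the UI can show them.
      includeThoughts: boolean;
    };
  };
}

// Hypothetical helper: maps a quality setting to a model and enables thinking.
function buildAnalysisRequest(quality: Quality): AnalysisRequest {
  return {
    model: quality === "precise" ? "gemini-3-pro-preview" : "gemini-3-flash-preview",
    config: { thinkingConfig: { thinkingLevel: "high", includeThoughts: true } },
  };
}
```

Keeping the request builder separate from the API call makes the model choice easy to unit-test without network access.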

2. Multimodal Video Understanding

Using the @google/genai SDK, we send video content directly to Gemini 3 via:

Inline base64 for videos under 4MB
Gemini Files API (fileUri) for videos up to 100MB

The model processes the actual video frames — not just screenshots — enabling it to understand motion, timing, and transitions that static images cannot capture.
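The size-based routing can be sketched as a small pure function. This is a minimal sketch under stated assumptions: the names are hypothetical, 1 MB is taken to mean 1024 × 1024 bytes, and files above 100MB are assumed to be rejected. Only the 4MB and 100MB thresholds come from this write-up.

```typescript
// Hypothetical names throughout; only the two thresholds come from the write-up.
const MB = 1024 * 1024; // assumption: binary megabytes
const INLINE_LIMIT_BYTES = 4 * MB; // videos under 4MB are sent inline as base64
const FILES_API_LIMIT_BYTES = 100 * MB; // videos up to 100MB go through the Files API

type UploadRoute =
  | { kind: "inline" }
  | { kind: "files-api" }
  | { kind: "rejected"; reason: string };

function chooseUploadRoute(sizeBytes: number): UploadRoute {
  // Catches zero, negative, and NaN sizes in one comparison.
  if (!(sizeBytes > 0)) {
    return { kind: "rejected", reason: "empty or invalid file size" };
  }
  if (sizeBytes < INLINE_LIMIT_BYTES) return { kind: "inline" };
  if (sizeBytes <= FILES_API_LIMIT_BYTES) return { kind: "files-api" };
  return { kind: "rejected", reason: "file exceeds the 100MB limit" };
}
```

For example, a 1MB recording resolves to the inline route, while a 20MB recording resolves to the Files API route.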

3. Multi-Pass Agentic Pipeline

Our agentic mode runs 4 sequential Gemini 3 calls, each building on the previous pass's output:

Pass	Model	Purpose
1	`gemini-3-flash-preview`	Scene decomposition — fast structural analysis
2	`gemini-3-pro-preview`	Deep motion analysis — flagship reasoning for subtle details
3	`gemini-3-pro-preview`	Code generation — precise implementation
4	`gemini-3-flash-preview`	Self-verification — compare output against original video

This mirrors the "Marathon Agent" strategic track from the hackathon — an autonomous system that maintains continuity across multi-step reasoning without human supervision.
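The four-pass flow in the table above can be sketched as a loop that feeds each pass the outputs of the earlier ones. This is an illustration, not the project's code: `callModel` is a hypothetical injected function standing in for the real Gemini call, the prompt wording is invented, and the video attachment that each call would carry is omitted. The pass order and per-pass model choices follow the table.

```typescript
type PipelineModel = "gemini-3-flash-preview" | "gemini-3-pro-preview";

interface Pass {
  name: string;
  model: PipelineModel;
  instruction: string; // hypothetical wording
}

const PASSES: readonly Pass[] = [
  { name: "scene-decomposition", model: "gemini-3-flash-preview", instruction: "Identify all animated elements." },
  { name: "deep-motion-analysis", model: "gemini-3-pro-preview", instruction: "Analyze timing, easing, and subtle movements." },
  { name: "code-generation", model: "gemini-3-pro-preview", instruction: "Generate code in the chosen output format." },
  { name: "self-verification", model: "gemini-3-flash-preview", instruction: "Compare the output against the original video and score it 0-100." },
];

// Hypothetical: injected so the sketch does not depend on any SDK.
type CallModel = (model: PipelineModel, prompt: string) => Promise<string>;

// Prepends all earlier pass outputs so each call builds on the previous ones.
function buildPassPrompt(pass: Pass, previousOutputs: string[]): string {
  const context = previousOutputs
    .map((out, i) => `## Pass ${i + 1} output\n${out}`)
    .join("\n\n");
  return context ? `${context}\n\n${pass.instruction}` : pass.instruction;
}

async function runAgenticPipeline(callModel: CallModel): Promise<string[]> {
  const outputs: string[] = [];
  for (const pass of PASSES) {
    outputs.push(await callModel(pass.model, buildPassPrompt(pass, outputs)));
  }
  return outputs;
}
```

Passing every earlier output forward is the simplest way to keep continuity; a real pipeline would likely trim or summarize earlier outputs to stay within token limits.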

4. Streaming with Thinking Traces

Responses are streamed via Server-Sent Events (SSE). For each chunk, we parse both the model's text output and its thinking traces, displaying them in parallel in the UI. Users can watch the model reason through complex animations in real time.
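Separating reasoning from output can be sketched as a reducer over streamed parts. The sketch assumes each part carries optional `text` and a boolean `thought` flag marking thought summaries; treat that shape, and all names here, as assumptions rather than the project's actual parser.

```typescript
// Minimal stand-in for a streamed response part (assumed shape).
interface StreamPart {
  text?: string;
  thought?: boolean; // true when the part is a thinking trace
}

interface StreamState {
  thinking: string; // accumulated reasoning shown in the trace panel
  output: string; // accumulated answer shown in the result panel
}

// Folds one chunk's parts into the running state without mutating the input.
function appendChunk(state: StreamState, parts: StreamPart[]): StreamState {
  let { thinking, output } = state;
  for (const part of parts) {
    if (!part.text) continue; // skip empty or non-text parts
    if (part.thought) {
      thinking += part.text;
    } else {
      output += part.text;
    }
  }
  return { thinking, output };
}
```

Because the reducer returns a new state object for every chunk, it can be used directly as a state update in a React component that renders both panels side by side.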

5. Client-Side Frame Grid (Context Enrichment)

Using FFmpeg compiled to WebAssembly, we extract up to 24 keyframes from the video client-side and arrange them in a labeled grid. This grid is sent alongside the video to Gemini 3 as additional visual context, enabling more precise spatial-temporal analysis.
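One way to plan such a grid is shown below. This is a sketch under assumptions: the write-up only states the 24-frame cap and that the grid is labeled, so the even spacing, the near-square layout, the label format, and all names are hypothetical. The FFmpeg extraction itself is left out.

```typescript
const MAX_FRAMES = 24; // cap stated in the write-up

interface FrameGridPlan {
  timestamps: number[]; // seconds into the video for each sampled frame
  labels: string[]; // text drawn on each grid cell
  columns: number;
  rows: number;
}

function planFrameGrid(durationSeconds: number, desiredFrames: number = MAX_FRAMES): FrameGridPlan {
  if (!(durationSeconds > 0)) throw new Error("durationSeconds must be positive");
  // Clamp the frame count to the range 1..MAX_FRAMES.
  const count = Math.max(1, Math.min(MAX_FRAMES, Math.floor(desiredFrames)));
  const step = durationSeconds / count;
  const timestamps: number[] = [];
  for (let i = 0; i < count; i++) {
    timestamps.push((i + 0.5) * step); // sample the midpoint of each interval
  }
  // Near-square layout: enough columns to cover sqrt(count), then as many rows as needed.
  const columns = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / columns);
  const labels = timestamps.map((t) => `t=${t.toFixed(2)}s`);
  return { timestamps, labels, columns, rows };
}
```

For a 2-second clip with 4 frames requested, this yields timestamps at 0.25s, 0.75s, 1.25s, and 1.75s in a 2 × 2 grid.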

Tech Stack

Layer	Technology
Frontend	React 19, Next.js 15, Tailwind CSS 4
AI Engine	Google Gemini 3 API (`@google/genai` SDK)
Video Processing	FFmpeg.wasm (client-side keyframe extraction)
Auth	Firebase Authentication (Email + Google OAuth)
Database	Firebase Firestore
Storage	Firebase Storage + Cloudflare R2
Payments	Lemon Squeezy
Hosting	Vercel (serverless)

Code Quality

Full TypeScript across the entire codebase
15 specialized prompt templates for each output format
Atomic credit transactions via Firestore
Real-time SSE streaming with error recovery
Security headers (COOP/COEP for WASM, CSP)
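For context on the COOP/COEP item, the headers that enable cross-origin isolation (needed for SharedArrayBuffer) can be expressed in the rule format that Next.js accepts from its `headers()` config function. This is a sketch, not the project's configuration: the helper name and the catch-all `source` pattern are assumptions, and the CSP rules are omitted because the write-up does not specify them.

```typescript
interface HeaderRule {
  source: string; // path pattern the headers apply to
  headers: { key: string; value: string }[];
}

// Hypothetical helper returning the two headers required for cross-origin isolation.
function crossOriginIsolationHeaders(source: string = "/(.*)"): HeaderRule[] {
  return [
    {
      source,
      headers: [
        { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
        { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
      ],
    },
  ];
}
```

In a Next.js config this would be returned from `async headers()`. Note that `require-corp` also requires third-party resources on those pages to opt in via CORS or CORP headers.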

Gemini 3 Integration — Detailed Write-up (~200 words)

AnimSpec is built entirely on the Gemini 3 API, leveraging its multimodal video understanding and extended thinking capabilities. The application uses two Gemini 3 models:

gemini-3-flash-preview (balanced quality) — Fast, intelligent analysis with thinking mode for structural decomposition and verification passes
gemini-3-pro-preview (precise quality) — Flagship reasoning for deep motion analysis and code generation where accuracy is paramount

Key Gemini 3 features used:

Thinking Mode (thinkingConfig: { thinkingLevel: 'high' }) — Enables extended reasoning, critical for decomposing complex animation sequences into precise timing, easing, and spatial relationships. Thinking traces are streamed to users in real-time.
Video Understanding — Native video input processing (inline base64 and Files API) allows the model to analyze actual motion — not static screenshots. This is fundamental to detecting easing curves, stagger patterns, and micro-interactions.
Files API — Handles videos up to 100MB with automatic state polling until processing completes.
Multi-Pass Agentic Pipeline — 4 sequential Gemini 3 calls (decomposition → analysis → generation → verification) that maintain context continuity, with model selection optimized per pass (Flash for structural tasks, Pro for deep reasoning).

Gemini 3 is not an add-on — it IS the product. Without Gemini 3's multimodal reasoning, video-to-code translation at this fidelity would not be possible.

Challenges we ran into

Video Size Limits — Gemini's inline data limit required us to build a tiered upload system: inline base64 for small files, Gemini Files API for larger ones, with automatic state polling for processing completion.
Prompt Engineering for 15 Formats — Each output format required its own specialized prompt template. Getting the model to produce syntactically valid CSS keyframes vs. React Native Reanimated code vs. Lottie JSON required extensive iteration on format-specific instructions.
Streaming Thinking Traces — Parsing thinking traces from the SSE stream alongside text content required careful handling to separate reasoning from output and display both simultaneously.
FFmpeg WASM in Next.js — Required COOP/COEP headers for SharedArrayBuffer support, custom CSP rules for WASM execution, and careful lazy-loading to avoid blocking initial page render.
Agentic Pipeline Context Management — Feeding previous pass outputs as context into subsequent Gemini 3 calls while staying within token limits required careful prompt construction.
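The state polling mentioned under Video Size Limits can be sketched as a bounded retry loop. This is illustrative only: `getFile` and `sleep` are hypothetical injected functions (in practice `getFile` would wrap the Files API lookup), the state names are assumed to be `PROCESSING`, `ACTIVE`, and `FAILED`, and the attempt count and interval are arbitrary defaults.

```typescript
type FileState = "PROCESSING" | "ACTIVE" | "FAILED"; // assumed state names

interface UploadedFile {
  name: string;
  state: FileState;
  uri?: string;
}

// Hypothetical dependencies, injected so the loop can be tested without network or timers.
type GetFile = (name: string) => Promise<UploadedFile>;
type Sleep = (ms: number) => Promise<void>;

async function waitUntilActive(
  name: string,
  getFile: GetFile,
  sleep: Sleep,
  maxAttempts: number = 30,
  intervalMs: number = 2000,
): Promise<UploadedFile> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const file = await getFile(name);
    if (file.state === "ACTIVE") return file; // ready to reference in a request
    if (file.state === "FAILED") throw new Error(`Processing failed for ${name}`);
    await sleep(intervalMs); // still processing: wait before the next check
  }
  throw new Error(`Timed out waiting for ${name}`);
}
```

Bounding the number of attempts keeps a stuck upload from holding a serverless function open until the platform times it out.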

Accomplishments that we're proud of

15 output formats — From CSS keyframes to React Native to Lottie JSON to accessibility audits, covering the full spectrum of animation implementation needs
Agentic self-verification — The 4-pass pipeline includes a verification step where Gemini 3 compares its own generated code against the original video and scores accuracy 0-100
Real-time thinking traces — Users can watch Gemini 3 reason through complex animations, making the AI's decision-making transparent
Production-ready SaaS — Complete with authentication, credit system, payment processing, and analysis history
Client-side video processing — FFmpeg WASM extracts keyframes entirely in the browser, reducing server load and enabling richer context for Gemini 3

What we learned

Gemini 3's thinking mode is transformative for structured output — Enabling thinkingLevel: 'high' dramatically improved the quality of animation specifications, especially for complex multi-element sequences with staggered timing
Video beats screenshots for motion understanding — Sending actual video to Gemini 3 (rather than frame screenshots) produced significantly better results for detecting easing curves and timing relationships
Multi-pass pipelines need careful model selection — Using Flash for structural tasks and Pro for deep analysis optimizes both cost and quality
Frame grids as supplementary context — Providing a labeled frame grid alongside the video helped the model anchor its spatial-temporal reasoning

What's next for AnimSpec

Batch processing — Analyze multiple animations from a single video upload
Direct framework integration — One-click export to Figma, Rive, and After Effects
Custom model fine-tuning — Train on common UI animation patterns for even higher accuracy
Team collaboration — Share analyses, annotate specifications, and track implementation progress
Animation diff — Compare two animation videos and highlight differences

Built With

cloudflarer2
ffmpeg.wasm
firebase
firestore
gemini-3
google-genai-sdk
lemonsqueezy
next.js15
react19
tailwindcss-4
typescript
vercel

Updates

Sanket Dongre started this project — Feb 09, 2026 04:19 PM EST
