Scenery - AI Video Generation from GitHub Repos

Inspiration

$4.2 billion. That's how much SaaS companies waste annually re-recording product videos that are outdated within weeks.

The math is simple and brutal:

26 million developers build web applications globally
3.5 million SaaS products are actively maintained
Average product ships 47 UI updates per year
Each product video takes 4-8 hours to re-record
92% of companies report their demo videos don't match their current product

That's not a workflow problem. That's an industry-wide crisis hiding in plain sight.

And it's accelerating—fast.

AI has compressed the build cycle from months to days. Gemini in Google AI Studio generates full-stack code in seconds. Claude Code ships features autonomously. Cursor, v0, Bolt, Lovable, Replit Agent—developers are building faster than ever before. A solo founder can ship a complete SaaS in a weekend. Teams deploy daily. Some deploy hourly.

But marketing can't keep up.

Every product video becomes a lie by the next sprint. Every demo is showing yesterday's UI. Every screenshot is from three versions ago. Companies are spending millions on marketing assets that actively mislead customers.

I watched a founder lose a $50,000 investment because his demo showed last week's UI. Eight hours of perfect recording, wasted. He lost to a competitor with an inferior product—but accurate screenshots.

This happens thousands of times per day. The same AI tools that let you build faster are making your marketing obsolete.

No solution exists. Loom can't auto-update recordings. Screen Studio can't sync with your codebase. Traditional video editing treats footage as static files—completely disconnected from the living product it represents.

Developers version-control code. Designers version-control Figma. But marketing videos? Frozen in time. Rotting in Google Drive. Lying to customers.

I asked: What if Gemini's multimodal intelligence could bridge this gap? What if videos were computed artifacts—generated directly from your codebase and automatically updated when you ship?

What if your React components weren't just UI, but the single source of truth for every product video?

That's Scenery. The world's first code-connected video platform—powered by 9 Gemini integrations.

Before Scenery	After Scenery
Record video (8 hrs)	Connect GitHub repo
Ship update	Gemini generates video
Video is now wrong	Ship update
Re-record (8 hrs)	Video auto-updates
Repeat forever	Never re-record again

For 3.5 million SaaS products shipping 47 updates per year, even at the low end of 4 hours per re-record, that's roughly 650 million hours of re-recording eliminated.

Gemini lets you build in days. Scenery—built on Gemini 3—ensures your marketing keeps up.

What it does

Scenery is an AI-powered video generation platform for React applications. You sign in with GitHub, connect a repo, and Scenery:

Clones and analyzes your codebase to discover React components
Renders live previews in a real browser (Playwright) with 4-attempt recovery
Generates professional product videos using a 4-agent Gemini orchestration system
Auto-updates videos when your code changes (videos are code-connected)

The core problem it solves: Product videos go stale the moment you ship an update. Traditional video editing means re-recording and re-exporting every time your UI changes. Scenery keeps videos synchronized with your actual components.

9 Gemini Integrations

Every integration uses structured output with JSON schemas—no prompt-and-pray.

#	Integration	Model	Purpose
1	Component Categorization	Gemini 3 Pro	Classifies components into 27 UI categories
2	Demo Props Generation	Gemini 3 Pro	Generates realistic props from TypeScript interfaces
3	Server→Client Transform	Gemini 3 Pro	Converts async Server Components to client-safe code
4	Tailwind→Inline CSS	Gemini 2.0 Flash	Converts Tailwind classes to inline styles
5	AI Preview Fallback	Gemini 3 Pro	Generates HTML when bundling fails (5000 thinking tokens)
6	Director Agent	Gemini 3 Pro	Plans video narrative with function calling
7	Scene Planner Agent	Gemini 3 Pro	Designs positions, animations, timing (parallel)
8	Refinement Agent	Gemini 3 Pro	Scores quality 0-100, iterates until ≥90
9	TTS Voiceover	Gemini 2.5 Flash TTS	Generates audio with 5 voice options
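
To illustrate what "no prompt-and-pray" can look like in practice, here is a minimal sketch of validating a categorization response before it is used. The field names (`category`, `confidence`) and the short category list are assumptions made for illustration; Scenery itself validates with Zod against 27 categories, and a hand-written type guard stands in here only to keep the sketch dependency-free.

```typescript
// Hypothetical subset of the 27 UI categories; the real list is not given in this write-up.
const CATEGORIES = ["button", "form", "navigation", "card", "modal"] as const;
type Category = (typeof CATEGORIES)[number];

interface Categorization {
  category: Category;
  confidence: number; // assumed field, expected in the range 0..1
}

// Type guard standing in for a Zod schema.
function isCategorization(value: unknown): value is Categorization {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const category = record["category"];
  const confidence = record["confidence"];
  return (
    typeof category === "string" &&
    CATEGORIES.some((known) => known === category) &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1
  );
}

// Parse raw model text; return null instead of throwing on malformed or off-schema output.
function parseCategorization(raw: string): Categorization | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isCategorization(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

Returning `null` rather than throwing lets the caller decide whether to retry, fall back, or skip the component.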

Features

GitHub Integration

Sign in with GitHub OAuth
Paste any public GitHub repo URL
Auto-clones and parses TypeScript/JSX components
Extracts props, types, and interactive elements
Supports Next.js 13/14/15 with App Router
Sync button to pull latest changes

Component Discovery Pipeline

Scanner — Globs .tsx/.jsx files, ignores node_modules/tests/builds
Parser — react-docgen-typescript extracts props (regex fallback for async components)
Analyzer — Gemini categorizes into 27 UI categories with video showcase strategy
Preview — 3-tier rendering: Playwright (95%) → SSR (70%) → AI fallback (50%)
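
The Scanner step can be pictured as a path filter. The sketch below is an assumption about how such a filter might look, not the project's code: the exact ignored directory names and the `.test`/`.spec` suffix rule are guesses based on "ignores node_modules/tests/builds".

```typescript
// Directory names treated as ignorable. Assumed: the write-up only says "node_modules/tests/builds".
const IGNORED_DIRS = new Set(["node_modules", "__tests__", "tests", "dist", "build", ".next"]);

// Returns true for .tsx/.jsx files that are not tests and do not live under an ignored directory.
function isComponentFile(path: string): boolean {
  const segments = path.split("/");
  const file = segments[segments.length - 1] ?? "";
  if (!/\.(tsx|jsx)$/.test(file)) return false;
  if (/\.(test|spec)\.(tsx|jsx)$/.test(file)) return false;
  return !segments.slice(0, -1).some((dir) => IGNORED_DIRS.has(dir));
}
```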

Server Component Detection (263 regex patterns)

Detects and transforms Next.js Server Components across 15 categories:

Category	Examples	Count
Async	`export default async function`, `await`	4
Database ORMs	Prisma, Drizzle, Mongoose, Supabase, pg, mysql2	21
Auth Libraries	next-auth, @clerk/nextjs/server, lucia	8
Node.js Built-ins	fs, path, crypto, child_process	12
Next.js Server	next/headers, cookies(), redirect()	10
File System	readFileSync, writeFile	6
And 9 more...	Email, payment, CMS, analytics, tRPC	~200

When detected → Gemini transforms to client-safe code with mock data.
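
A rough sketch of the detection step follows. It uses a handful of illustrative patterns rather than the full set of 263, and the category keys and the `detectServerCategories` helper are hypothetical names chosen for this example.

```typescript
interface PatternGroup {
  category: string;
  patterns: RegExp[];
}

// A few illustrative patterns per category; the real detector reportedly has 263 across 15 categories.
const SERVER_PATTERNS: PatternGroup[] = [
  { category: "async", patterns: [/export\s+default\s+async\s+function/] },
  { category: "orm", patterns: [/from\s+['"]@prisma\/client['"]/, /from\s+['"]drizzle-orm/] },
  { category: "auth", patterns: [/from\s+['"]next-auth/, /from\s+['"]@clerk\/nextjs\/server['"]/] },
  { category: "nodeBuiltins", patterns: [/from\s+['"](node:)?(fs|path|crypto|child_process)['"]/] },
  { category: "nextServer", patterns: [/from\s+['"]next\/headers['"]/, /\bcookies\(\)/, /\bredirect\(/] },
];

// Returns the categories whose patterns match; an empty array means "looks client-safe".
function detectServerCategories(source: string): string[] {
  return SERVER_PATTERNS
    .filter((group) => group.patterns.some((pattern) => pattern.test(source)))
    .map((group) => group.category);
}
```

A non-empty result would be the trigger for handing the file to Gemini for the Server→Client transform.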

Video Editor

Track Types: Text, Component, Video, Audio, Image, Cursor, Shape, Particles, Gradient, Film Grain, Vignette, Color Grade, Blob

30+ Animation Presets:

Entrance: fade-in, slide-in-*, zoom-in, bounce, elastic, spring-pop, blur-in, flip-in, rotate-in
Exit: fade-out, zoom-out, blur-out, slide-out-*
Emphasis: pulse, shake, wiggle, heartbeat, jello, glow
Motion: float, drift-right, ken-burns-zoom
Filter: color-pop, flash, hue-shift, cinematic-focus

Spring Physics (5 presets):

Preset	Damping	Stiffness	Mass	Use Case
smooth	200	100	1	Professional
snappy	200	200	0.5	Quick, responsive
heavy	200	80	5	Slow, dramatic
bouncy	100	150	1	Playful
gentle	300	60	2	Soft, subtle

6 Particle Effects: confetti, sparks, snow, bubbles, stars, dust

6 Cursor Interactions: click, hover, type, focus, select, check

Text Effects: Letter-by-letter animation, gradient fill, glow, glass backgrounds

Device Frames: phone, laptop, full display modes

Voiceover

Gemini 2.5 Flash TTS with responseModalities: ['AUDIO']
5 prebuilt voices: Kore, Charon, Fenrir, Aoede, Puck
24kHz WAV output synced to timeline
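
For the WAV step, here is a minimal sketch of wrapping raw audio in a WAV container and computing its length for timeline placement. It assumes the TTS response carries raw 16-bit mono PCM at 24 kHz; the function names are hypothetical and the defaults should be adjusted if the actual payload differs.

```typescript
// Wraps raw PCM bytes in a 44-byte RIFF/WAVE header (little-endian, PCM format tag 1).
function pcmToWav(pcm: Uint8Array, sampleRate = 24000, channels = 1, bitsPerSample = 16): Uint8Array {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format: uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);

  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcm, 44);
  return wav;
}

// Clip length in seconds, useful when sizing the audio track on the timeline.
function pcmDurationSeconds(byteLength: number, sampleRate = 24000, channels = 1, bitsPerSample = 16): number {
  return byteLength / (sampleRate * channels * (bitsPerSample / 8));
}
```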

Export

Remotion Lambda (AWS) for serverless rendering
MP4, GIF, WebM output formats
Parallelized rendering (60s video → ~30s render)
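
The speed-up from parallel rendering comes from splitting the video into frame ranges that are rendered concurrently and then stitched together. Remotion Lambda handles this internally; the sketch below only illustrates the idea, and `splitFrames` and the chunk size are hypothetical.

```typescript
interface FrameRange {
  start: number; // inclusive
  end: number; // inclusive
}

// Splits [0, totalFrames) into consecutive ranges of at most framesPerChunk frames.
function splitFrames(totalFrames: number, framesPerChunk: number): FrameRange[] {
  if (totalFrames <= 0 || framesPerChunk <= 0) return [];
  const ranges: FrameRange[] = [];
  for (let start = 0; start < totalFrames; start += framesPerChunk) {
    ranges.push({ start, end: Math.min(start + framesPerChunk, totalFrames) - 1 });
  }
  return ranges;
}
```

For example, a 60-second video at 30 fps is 1800 frames; with an assumed chunk size of 200 frames that yields 9 ranges that can render at the same time.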

How we built it

4-Agent Gemini Orchestration System

User: "Create a product video for our auth components"
                    │
                    ▼
┌─────────────────────────────────────────────────────┐
│  DIRECTOR AGENT (Gemini 3 Pro)                       │
│  Tool: create_video_plan                             │
│  • Plans narrative arc (Hook→Setup→Showcase→CTA)    │
│  • Frame budget: 15% hook, 15% setup,               │
│    55% showcase, 15% CTA                            │
│  • Outputs INTENTS ("dramatic", "professional")     │
└─────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────┐
│  SCENE PLANNER AGENT (parallel per scene)            │
│  Tool: create_detailed_scene                         │
│  • Translates intents → spring animations            │
│  • Element positions (0-1 normalized)               │
│  • Keyframes (RELATIVE to element start)            │
│  • Cursor paths for tutorials                        │
│  • Narration scripts                                 │
└─────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────┐
│  ASSEMBLY AGENT (deterministic, no LLM)              │
│  • Fixes keyframe timing mistakes (rescales >90)    │
│  • Normalizes keyframe format                        │
│  • Organizes tracks by z-order                       │
│  • Validates composition                             │
└─────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────┐
│  REFINEMENT AGENT (Gemini 3 Pro)                     │
│  Tool: analyze_composition                           │
│                                                      │
│  Scoring Weights (0-100):                           │
│  • Visual Composition: 30%                          │
│  • Timing: 25%                                      │
│  • Narrative Flow: 25%                              │
│  • Animation Quality: 15%                           │
│  • Accessibility: 5%                                │
│                                                      │
│  Score < 40 → Regenerate from Director              │
│  Score 40-89 → Apply fixes, re-evaluate             │
│  Score ≥ 90 → Ship it                               │
│                                                      │
│  Max iterations: 5 (picks best version)             │
└─────────────────────────────────────────────────────┘
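
Two pieces of arithmetic in the diagram above are easy to sketch: the Director's frame budget and the Refinement Agent's weighted score. The percentages and weights come from the diagram; the function names, the per-criterion 0-100 subscores, and the rounding choices are assumptions for illustration.

```typescript
const BUDGET = { hook: 0.15, setup: 0.15, showcase: 0.55, cta: 0.15 } as const;
type Section = keyof typeof BUDGET;

// Splits a total frame count across the four narrative sections.
function allocateFrames(totalFrames: number): Record<Section, number> {
  const hook = Math.round(totalFrames * BUDGET.hook);
  const setup = Math.round(totalFrames * BUDGET.setup);
  const cta = Math.round(totalFrames * BUDGET.cta);
  // Showcase takes the remainder so the sections always sum to totalFrames.
  const showcase = totalFrames - hook - setup - cta;
  return { hook, setup, showcase, cta };
}

const WEIGHTS = { visual: 0.3, timing: 0.25, narrative: 0.25, animation: 0.15, accessibility: 0.05 } as const;
type Criterion = keyof typeof WEIGHTS;

// Combines per-criterion subscores (each assumed to be 0-100) into one 0-100 score.
function weightedScore(scores: Record<Criterion, number>): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as Criterion[]) {
    total += scores[key] * WEIGHTS[key];
  }
  return Math.round(total);
}
```

At 30 fps, a 30-second video has 900 frames, so `allocateFrames(900)` gives 135 frames each to hook, setup, and CTA, and 495 to the showcase.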

Component Preview Pipeline (3-tier fallback with 4-attempt recovery)

TIER 1: Playwright Browser (95% accuracy)
  • Transform to pure React (Gemini removes external deps)
  • Bundle with esbuild (50+ library mocks)
  • Render in real Chromium on Fly.io worker
  • 4-attempt recovery with progressive simplification:
    - Attempt 1: fix-props (add missing definitions)
    - Attempt 2: simplify (flat structures, empty arrays)
    - Attempt 3: minimal (aggressive mocking)
    - Attempt 4: placeholder div
    - Attempt 5+: skip component
        │
        │ Fails?
        ▼
TIER 2: Server-Side Rendering (70% accuracy)
  • renderToStaticMarkup via esbuild
  • 5-second timeout
        │
        │ Fails?
        ▼
TIER 3: AI-Only Generation (50% accuracy)
  • Gemini generates HTML from source code
  • thinkingBudget: 5000 tokens for reasoning
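
The tier ordering above boils down to "try each renderer in turn and keep the first usable result". A minimal sketch, with the renderers injected as plain async functions (the `Tier` shape and `renderWithFallback` name are hypothetical, and the real tiers would wrap Playwright, `renderToStaticMarkup`, and a Gemini call respectively):

```typescript
interface Tier {
  name: string;
  render: (source: string) => Promise<string>; // resolves to HTML, or rejects on failure
}

interface PreviewResult {
  html: string;
  tier: string; // which tier produced the HTML
}

// Tries tiers in order; an exception or empty output moves on to the next tier.
async function renderWithFallback(source: string, tiers: Tier[]): Promise<PreviewResult> {
  const errors: string[] = [];
  for (const tier of tiers) {
    try {
      const html = await tier.render(source);
      if (html.trim().length > 0) return { html, tier: tier.name };
      errors.push(`${tier.name}: empty output`);
    } catch (err) {
      errors.push(`${tier.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All preview tiers failed: ${errors.join("; ")}`);
}
```

Recording which tier succeeded makes it possible to surface the expected accuracy of a preview to the user. Timeouts and the per-tier recovery attempts are left out of this sketch.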

Tech Stack

Frontend: Next.js 15, React 19, TypeScript
AI: Gemini 3 Pro (agents), Gemini 2.0 Flash (quick ops), Gemini 2.5 Flash TTS
Video: Remotion 4 + AWS Lambda
Component Rendering: Playwright worker on Fly.io (separate app)
Bundling: esbuild with 50+ virtual module mocks
Database: Supabase (PostgreSQL)
Auth: Supabase Auth with GitHub OAuth
Hosting: Fly.io (2 apps: main + Playwright worker, auto-scale)
State: Zustand + Zundo (undo/redo)
Validation: Zod schemas for all AI outputs

Gemini Features Used

Feature	Implementation
Structured Output	JSON schemas with Zod validation on ALL 9 integrations
Function Calling	`create_video_plan`, `create_detailed_scene`, `analyze_composition`
Thinking Mode	Preview generation fallback (5000 token budget)
TTS	`responseModalities: ['AUDIO']` with 5 voice options
Streaming	Real-time chat responses via Server-Sent Events
Long Context	Full source code analysis (900s timeout)
Gemini 3 Pro	Director, Scene Planner, and Refinement agents; Server→Client transformation; categorization
Gemini 2.0 Flash	Tailwind→CSS conversion (speed optimized)
Gemini 2.5 Flash TTS	Voiceover generation

Challenges we ran into

Server Component Detection: Next.js 13+ Server Components crash in browsers. We built detection with 263 regex patterns across 15 categories (Prisma, Drizzle, NextAuth, Clerk, Node builtins, etc.), then use Gemini to transform to client-safe code with mock data.

Multi-Agent Coordination: Director outputs intents ("dramatic entrance"), Scene Planner translates to specifics (spring configs, exact keyframes). Assembly is pure TypeScript to prevent AI error cascades.

Relative vs Absolute Keyframes: AI kept outputting absolute frames (frame 300 for a fade-in, which at 30 fps is 10 seconds into the video!). We enforce relative keyframes, where frame 0 is the moment THIS element appears, not the start of the video. The Assembly agent auto-fixes violations.
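
A sketch of what such an auto-fix might look like is below. The detection heuristic (a keyframe beyond the element's own duration is treated as absolute) and all type and function names are assumptions; the write-up does not spell out the exact rule the Assembly agent applies.

```typescript
interface Keyframe {
  frame: number; // intended to be relative to the element's start
  opacity: number;
}

interface TimelineElement {
  startFrame: number; // absolute position on the video timeline
  durationInFrames: number;
  keyframes: Keyframe[];
}

// If any keyframe lies beyond the element's duration, assume the model emitted absolute
// frames and shift them back by startFrame; then clamp into [0, durationInFrames].
function normalizeKeyframes(el: TimelineElement): TimelineElement {
  const looksAbsolute = el.keyframes.some((k) => k.frame > el.durationInFrames);
  const keyframes = el.keyframes.map((k) => {
    const relative = looksAbsolute ? k.frame - el.startFrame : k.frame;
    const clamped = Math.min(Math.max(relative, 0), el.durationInFrames);
    return { ...k, frame: clamped };
  });
  return { ...el, keyframes };
}

// Conversion back to timeline coordinates at render time.
function toAbsoluteFrame(el: TimelineElement, k: Keyframe): number {
  return el.startFrame + k.frame;
}
```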

Self-Correction Without Infinite Loops: Weighted scoring with hard thresholds. Score < 40 = regenerate from scratch. Score 40-89 = patch. Score ≥90 = ship. Max 5 iterations, then pick the best version seen.
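
The control flow described here can be sketched as a bounded loop that tracks the best candidate seen. The thresholds and the iteration cap come from the text; the `RefinementSteps` interface and `refine` function are hypothetical, with the model calls injected so the loop itself stays deterministic.

```typescript
interface Candidate<T> {
  composition: T;
  score: number;
}

interface RefinementSteps<T> {
  evaluate: (composition: T) => Promise<number>; // returns a 0-100 score
  patch: (composition: T) => Promise<T>; // apply targeted fixes
  regenerate: () => Promise<T>; // start over from the Director
}

async function refine<T>(initial: T, steps: RefinementSteps<T>, maxIterations = 5): Promise<Candidate<T>> {
  let current = initial;
  let best: Candidate<T> | null = null;
  for (let i = 0; i < maxIterations; i++) {
    const score = await steps.evaluate(current);
    if (best === null || score > best.score) best = { composition: current, score };
    if (score >= 90) break; // good enough to ship
    // Below 40 the composition is discarded; otherwise it is patched in place.
    current = score < 40 ? await steps.regenerate() : await steps.patch(current);
  }
  if (best === null) throw new Error("maxIterations must be at least 1");
  return best;
}
```

Because the loop returns the best candidate rather than the last one, a patch that makes things worse cannot degrade the final result.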

Preview Verification: Playwright sometimes captures loading states or skeletons. Gemini verifies whether the HTML actually represents the component. If it does not, Scenery falls back to AI-only generation.

Rate Limits: We added fail-fast detection for 429/quota errors: no retries on rate limits, and clear user messaging.
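
A minimal sketch of the fail-fast idea: retry transient failures, but surface rate limits immediately. The error shape checked here (a numeric `status` field, or a message mentioning quota) is an assumption about what the client library throws, and `callWithRetry` is a hypothetical helper; backoff delays are omitted for brevity.

```typescript
// Heuristic check for rate-limit errors. Assumed shape: { status: 429 } or a quota-related message.
function isRateLimit(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; message?: unknown };
  if (e.status === 429) return true;
  return typeof e.message === "string" && /quota|rate limit|RESOURCE_EXHAUSTED/i.test(e.message);
}

async function callWithRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (isRateLimit(err)) {
        // Retrying would only burn more quota, so fail immediately with a user-facing message.
        const friendly = new Error("Gemini rate limit reached. Please try again in a few minutes.");
        friendly.name = "RateLimitError";
        throw friendly;
      }
      lastError = err; // transient failure: try again
    }
  }
  throw lastError;
}
```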

Accomplishments we're proud of

9 distinct Gemini integrations with structured output and function calling
4-agent orchestration with self-correcting refinement loop (score ≥90 to ship)
263 Server Component patterns making Scenery work with any Next.js codebase
3-tier preview system with 4-attempt progressive recovery per component
30+ animation presets with 5 spring physics configurations
Code-connected videos that auto-update when repos change

What we learned

Structured output > free-form prompts. JSON schemas with Zod validation = 100% parse success rate.

Function calling > prompts for agents. Explicit tools (create_video_plan) enforce exact output shapes.

Intents > specifics for planning. Director says "dramatic", Scene Planner decides "spring-bounce with bouncy preset".

Deterministic steps prevent cascades. Assembly Agent uses zero LLM—just TypeScript transforms and validation.

Verification catches bad renders. AI checking AI output catches loading states, empty renders, skeletons.

Fail fast on rate limits. Detect 429/quota errors immediately, don't waste retries.

What's next for Scenery

Vue/Svelte support (parser abstraction)
Template marketplace (pre-built video styles)
CI/CD integration for auto-generated videos on deploy
Collaborative editing (multiplayer timeline)
Version history for compositions

Third-Party Integrations

Remotion

Built With

aws-lambda
esbuild
fly.io
gemini-2.5-flash-tts
gemini-3-flash
gemini-3-pro
next.js-15
playwright
react-19
remotion
supabase
tailwind
typescript
zod
zustand

Updates

Athavan Thambimuthu started this project — Feb 09, 2026 03:17 AM EST
