Big Squeeze — Agent-Swarm Filmmaker

Inspiration

Filmmaking is a miracle of collaboration — but it's also slow, expensive, and locked behind a wall of specialized tools and people. A screenwriter types words. A director imagines shots. A cinematographer frames them. Sound designers build worlds from silence. Composers score emotions. Colorists paint light. An editor makes it breathe.

We wanted to know: what if you could collapse that entire production pipeline into a single web page?

The idea was simple: paste a logline. Watch a swarm of AI agents — each with a distinct filmmaking identity — plan, generate, and assemble a short film in real time. No crew, no cameras, no months of post. Just your idea, seven agents, and a live timeline you can watch fill up.

Built for the DevNetwork AI+ML Hack 2026.

What it does

Big Squeeze is a full-stack agent-swarm filmmaker. You type a one-line movie idea (a "logline") like "A getaway driver gets one last job — but the cargo is alive" and press Generate. Behind the scenes, a LangGraph StateGraph orchestrates 7 agents:

Agent	Role	What they produce
Mara Vex (Screenwriter)	Expands the logline into a treatment	Logline + synopsis + 3–5 story beats
Ito Kishida (Director)	Blocks coverage	Ordered shot list with camera moves, moods, durations
Léa Roussel (Cinematographer)	Frames each shot	Text-to-video prompt for AI video models
Tomek Bauer (Sound Designer)	Designs audio	Atmos, foley, and mix notes per shot
Reva Okafor (Composer)	Scores the film	Theme, instrumentation, tempo per shot
Noor Asad (Colorist)	Grades each shot	Palette, contrast, grade direction
Jun Park (Editor)	Reviews the cut	Pacing notes, transitions, final assessment

As the pipeline runs, events stream via Server-Sent Events to a DAW-inspired live timeline UI: you see agents go from IDLE → WORKING → DONE, shots appear on the timeline, and the preview viewport shows each render's progress in real time. When the last shot is done, ffmpeg assembles everything into a downloadable MP4.

How we built it

Stack

Next.js 16 (App Router) — full-stack framework
LangGraph (LangChain) — state machine orchestrating the 7-agent pipeline
Vercel AI SDK — LLM calling with structured output (generateObject / generateText)
Groq / OpenAI / AI Gateway — swappable LLM providers
fal.ai — LTX-2 and Seedance 2.0 for video generation
ffmpeg + rsvg-convert — shot assembly (concat MP4 segments, SVG→PNG→MP4 fallback)
Pure CSS — DAW-inspired dark theme with OKLCH colors, no Tailwind in components
TypeScript (strict mode, ES2022)

Architecture

The pipeline is a LangGraph StateGraph with 8 nodes and 2 conditional routers:

START → Screenwriter → Director → Cinematographer → Renderer → PostProduction
                                                                   │
                                           ┌─────────────────────────┘
                                           ▼
                                     QC Router
                                      ├── retryShot → Cinematographer (retry)
                                      └── advanceShot → Advance Router
                                            ├── Cinematographer (more shots)
                                            └── Editor → END
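The two conditional routers in the diagram can be sketched as plain functions over graph state. This is a minimal sketch, not the project's code. The state fields (shotIndex, shotCount, lastShotOk, retries) and the route names are assumptions. The retry cap of 3 comes from the QC loop described under "What we learned". In LangGraph JS, functions like these are what you would pass to addConditionalEdges.

```typescript
// Hypothetical slice of the pipeline state; field names are illustrative.
interface PipelineState {
  shotIndex: number; // index of the shot that was just rendered
  shotCount: number; // total shots in the Director's shot list
  lastShotOk: boolean; // did QC accept the latest render?
  retries: number; // retries already spent on the current shot
}

const MAX_RETRIES = 3; // QC loop retries a failed shot up to 3 times

type QcRoute = "retryShot" | "advanceShot";
type AdvanceRoute = "cinematographer" | "editor";

// QC Router: send a failed shot back to the Cinematographer until the retry budget runs out.
function qcRouter(state: PipelineState): QcRoute {
  if (!state.lastShotOk && state.retries < MAX_RETRIES) {
    return "retryShot";
  }
  return "advanceShot";
}

// Advance Router: loop back for the next shot, or hand off to the Editor after the last one.
function advanceRouter(state: PipelineState): AdvanceRoute {
  return state.shotIndex + 1 < state.shotCount ? "cinematographer" : "editor";
}
```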

The orchestrator runs the graph as a background promise while a polling loop drains a shared event channel every 50 ms, yielding PipelineEvent objects from an AsyncGenerator. The API route wraps this in a ReadableStream and emits SSE lines (data: {...}\n\n). The Studio UI reads the stream with fetch and a ReadableStreamDefaultReader.
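A rough sketch of that drain-and-stream pattern is below. It assumes a simple PipelineEvent shape and a hypothetical runGraph callback that receives an emit function; the write-up does not show the real graph invocation. The code polls a shared array every 50 ms and wraps the generator in a ReadableStream of SSE frames.

```typescript
// Assumed event shape: the write-up does not show the real PipelineEvent type.
type PipelineEvent = { type: string; payload?: unknown };

const POLL_MS = 50; // drain interval described in the write-up

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Start the graph as a background promise, then drain the shared channel every POLL_MS.
async function* runWithEvents(
  runGraph: (emit: (event: PipelineEvent) => void) => Promise<void>,
): AsyncGenerator<PipelineEvent> {
  const channel: PipelineEvent[] = [];
  let finished = false;
  let failed = false;
  let failure: unknown;

  // Not awaited here: the graph keeps running while the loop below yields events.
  const run = runGraph((event) => channel.push(event)).then(
    () => {
      finished = true;
    },
    (err: unknown) => {
      failed = true;
      failure = err;
      finished = true;
    },
  );

  // Keep draining until the graph is done AND every queued event has been yielded.
  while (!finished || channel.length > 0) {
    while (channel.length > 0) yield channel.shift()!;
    if (!finished) await sleep(POLL_MS);
  }
  await run;
  if (failed) throw failure;
}

// Wrap the generator in a ReadableStream of SSE frames ("data: {...}\n\n").
function toSseStream(events: AsyncGenerator<PipelineEvent>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const next = await events.next();
      if (next.done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(next.value)}\n\n`));
      }
    },
  });
}
```

In a Next.js route handler, this stream would be the body of the returned Response, sent with a text/event-stream content type.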

Dual-path agent design

Every agent has two code paths:

LLM path — calls the configured provider (Groq, OpenAI, or AI Gateway) via the ai SDK. Handles rate-limit retry with exponential backoff, JSON-schema-aware routing, and per-agent model selection (each agent can use a different model).
Fallback path — produces deterministic output using seeded data arrays. This means the entire pipeline runs with zero API keys — perfect for demos, CI, and hackathon judging.
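Here is a sketch of the dual-path idea for a Screenwriter-style agent. Everything named here is hypothetical: the Treatment shape, the TreatmentLlm callback standing in for an ai SDK call, and the small BEATS array standing in for the project's seeded data. Falling back when the LLM call throws is also an assumption. The write-up only says the fallback covers the keyless case.

```typescript
// Hypothetical output shape for the Screenwriter agent (logline + synopsis + beats).
interface Treatment {
  logline: string;
  synopsis: string;
  beats: string[];
}

// Hypothetical LLM path; in the app this would be an ai SDK call. null means no API key.
type TreatmentLlm = (logline: string) => Promise<Treatment>;

// Illustrative seed data; the project's real arrays are richer and not shown.
const BEATS = ["Setup", "Complication", "Turn", "Climax", "Resolution"];

// Fallback path: deterministic output from seeded arrays, no network calls.
function fallbackTreatment(logline: string): Treatment {
  // 3 to 5 beats, chosen from the logline length so different ideas vary a little.
  const count = 3 + (logline.length % 3);
  return {
    logline,
    synopsis: `A short film built around one idea: ${logline}`,
    beats: BEATS.slice(0, count).map((name, i) => `${i + 1}. ${name}`),
  };
}

// Try the LLM path when one is configured, otherwise (or on failure) use the fallback.
async function screenwriter(logline: string, llm: TreatmentLlm | null): Promise<Treatment> {
  if (llm) {
    try {
      return await llm(logline);
    } catch {
      // Assumption: an LLM failure degrades to the fallback instead of failing the run.
    }
  }
  return fallbackTreatment(logline);
}
```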

Provider abstraction

Video generation is abstracted behind a VideoProvider interface:

interface VideoProvider {
  generateShot(input: GenerateShotInput): Promise<ShotRender>;
}

Three implementations:

SimulatedVideoProvider — keyless, produces animated SVGs with gradient backgrounds, text overlays, camera/mood metadata. The pipeline's default.
FalVideoProvider — calls fal.ai's LTX-2 or Seedance 2.0 for real AI video with audio. Supports reference-image conditioning for shot-to-shot consistency.
RunPodVideoProvider — stub for self-hosted inference.

Switch with VIDEO_PROVIDER env var. No code changes.
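A minimal sketch of env-driven provider selection follows. The interface matches the one above. The GenerateShotInput and ShotRender fields, the provider bodies, and the accepted VIDEO_PROVIDER values ("fal", "runpod", anything else meaning simulated) are placeholders, not the project's actual values.

```typescript
// Minimal stand-ins for the project's types; these fields are hypothetical.
interface GenerateShotInput {
  prompt: string;
  durationSec: number;
}
interface ShotRender {
  provider: string;
  uri: string;
}

interface VideoProvider {
  generateShot(input: GenerateShotInput): Promise<ShotRender>;
}

class SimulatedVideoProvider implements VideoProvider {
  async generateShot(input: GenerateShotInput): Promise<ShotRender> {
    // Placeholder: the real provider renders an animated SVG for the shot.
    return { provider: "simulated", uri: `sim://${encodeURIComponent(input.prompt)}` };
  }
}

class FalVideoProvider implements VideoProvider {
  async generateShot(_input: GenerateShotInput): Promise<ShotRender> {
    throw new Error("fal.ai call omitted from this sketch");
  }
}

class RunPodVideoProvider implements VideoProvider {
  async generateShot(_input: GenerateShotInput): Promise<ShotRender> {
    throw new Error("RunPod provider is a stub");
  }
}

// Pick an implementation from VIDEO_PROVIDER; the accepted values here are assumptions.
function createVideoProvider(env: Record<string, string | undefined>): VideoProvider {
  switch ((env.VIDEO_PROVIDER ?? "simulated").toLowerCase()) {
    case "fal":
      return new FalVideoProvider();
    case "runpod":
      return new RunPodVideoProvider();
    default:
      return new SimulatedVideoProvider(); // keyless default
  }
}
```

At startup the app would call createVideoProvider(process.env), so agents and the graph only ever see the VideoProvider interface.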

UI

The Studio is inspired by DAWs (Ableton, Pro Tools) — dark, dense, information-rich:

6-track timeline (Story, V1 Shots, V2 B-Roll, A1 Dialogue, A2 Foley, A3 Score) with playhead, clips, and clickable shot regions
Agent panel — 7 named personas with IDLE/WORKING/DONE status lights
Preview viewport — REC indicator, timecode, camera metadata, render progress sweep
Director's Notes — scrolling log of all agent activity with timestamps
Configurable output — 6 aspect ratios, 4 resolutions, 10 target runtimes
Provider selector — swap between simulated / Seedance / LTX-2 on the fly

Challenges we ran into

Shot-to-shot consistency

AI video models generate each clip independently — characters, settings, and style vary wildly between shots. We tackled this with reference-image conditioning: the first frame of each completed shot is extracted (via ffmpeg) and fed as an image_url to the next shot's generation request. It's not perfect, but it creates a visible visual thread.
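Here is a sketch of the frame-extraction step, assuming ffmpeg is on PATH. The flag -frames:v 1 tells ffmpeg to write a single video frame. The ShotRequest shape is hypothetical apart from the image_url field named above. Uploading the PNG so it has a URL is not shown.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// ffmpeg arguments that decode only the first video frame and write it as a PNG (-y overwrites).
function firstFrameArgs(videoPath: string, pngPath: string): string[] {
  return ["-y", "-i", videoPath, "-frames:v", "1", pngPath];
}

// Run ffmpeg to extract the first frame of a finished shot.
async function extractFirstFrame(videoPath: string, pngPath: string): Promise<string> {
  await execFileAsync("ffmpeg", firstFrameArgs(videoPath, pngPath));
  return pngPath;
}

// Hypothetical request body for the next shot; only image_url comes from the write-up.
interface ShotRequest {
  prompt: string;
  image_url?: string; // reference frame from the previous shot, when one exists
}

function withReference(prompt: string, referenceUrl?: string): ShotRequest {
  return referenceUrl ? { prompt, image_url: referenceUrl } : { prompt };
}
```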

Rate limits and retries

The Groq free tier has an 8,000 TPM (tokens per minute) limit. Our post-production node runs Sound Designer, Composer, and Colorist sequentially (not in parallel) to stay under it. The LLM wrapper retries with exponential backoff and detects rate limits, parsing the "try again in Xs" hint from error messages.
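The retry behavior can be sketched as follows. The message format matched here is the "try again in Xs" hint quoted above. The rate-limit regex, attempt count, and base delay are assumptions, not the project's settings. The sleep function is injectable so the logic can be tested without waiting.

```typescript
const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Pull the provider's suggested wait out of messages like "Please try again in 7.5s."
function parseRetryAfterMs(message: string): number | null {
  const match = /try again in (\d+(?:\.\d+)?)s\b/i.exec(message);
  return match ? Math.ceil(parseFloat(match[1]) * 1000) : null;
}

// Heuristic rate-limit detection; exact wording varies by provider (assumption).
function isRateLimited(err: unknown): boolean {
  return /rate limit|429|try again in/i.test(errorMessage(err));
}

// Retry fn on rate-limit errors: use the server hint if present, else exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      attempt += 1;
      // Non-rate-limit errors and exhausted budgets propagate immediately.
      if (!isRateLimited(err) || attempt >= maxAttempts) throw err;
      const hinted = parseRetryAfterMs(errorMessage(err));
      await sleep(hinted ?? baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```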

Video model quirks

LTX-2 and Seedance 2.0 have completely different parameter schemes:

LTX-2 uses duration (int 6/8/10) and resolution (1080p/1440p/2160p)
Seedance uses duration (string "4"–"15"), resolution (480p/720p), and aspect_ratio

We built model-specific parameter mappers and resolution-validity guards.
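One way such a mapper might look, using the parameter names and ranges listed above. Three choices here are assumptions: snapping LTX-2 durations to the nearest allowed value, clamping Seedance to 4 to 15 seconds, and the fallback resolutions chosen by the guards. Check the exact request schemas on fal.ai before relying on them.

```typescript
type VideoModel = "ltx-2" | "seedance-2.0";

// Shot settings coming from the Director / output config (field names assumed).
interface ShotSpec {
  durationSec: number;
  resolution: string; // e.g. "720p"
  aspectRatio: string; // e.g. "16:9"
}

const LTX_DURATIONS = [6, 8, 10];
const LTX_RESOLUTIONS = ["1080p", "1440p", "2160p"];
const SEEDANCE_RESOLUTIONS = ["480p", "720p"];

// Snap to the closest allowed value (ties keep the shorter duration).
function nearest(allowed: number[], target: number): number {
  return allowed.reduce((best, v) => (Math.abs(v - target) < Math.abs(best - target) ? v : best));
}

function mapShotParams(model: VideoModel, shot: ShotSpec): Record<string, string | number> {
  if (model === "ltx-2") {
    return {
      duration: nearest(LTX_DURATIONS, shot.durationSec), // integer 6, 8 or 10
      // Resolution guard: unsupported tiers fall back to the lowest supported one.
      resolution: LTX_RESOLUTIONS.includes(shot.resolution) ? shot.resolution : "1080p",
    };
  }
  // Seedance: duration is a string between "4" and "15" seconds.
  const seconds = Math.min(15, Math.max(4, Math.round(shot.durationSec)));
  return {
    duration: String(seconds),
    resolution: SEEDANCE_RESOLUTIONS.includes(shot.resolution) ? shot.resolution : "720p",
    aspect_ratio: shot.aspectRatio,
  };
}
```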

ffmpeg even-dimension constraints

libx264 requires even width and height. We added automatic parity correction at every stage — SVG dimension parsing, fallback generation, and output — after spending an embarrassing hour debugging "width 405 not divisible by 2."
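Here is a sketch of the parity fix. It assumes, hypothetically, that the resolution tier fixes frame height and the aspect ratio sets the width. Under that assumption, a 9:16 frame at 720 px tall comes out 405 px wide, exactly the kind of odd width that trips libx264.

```typescript
// Round a pixel dimension down to the nearest even integer (libx264 needs even sizes).
function toEven(n: number): number {
  return Math.max(2, 2 * Math.floor(n / 2));
}

// Assumption: the resolution tier fixes the frame height and the aspect ratio sets the width.
function frameSize(aspectRatio: string, height: number): { width: number; height: number } {
  const [w, h] = aspectRatio.split(":").map(Number);
  if (!(w > 0 && h > 0)) throw new Error(`invalid aspect ratio: ${aspectRatio}`);
  return { width: toEven((height * w) / h), height: toEven(height) };
}
```

When re-encoding an existing clip, the ffmpeg scale filter expression scale=trunc(iw/2)*2:trunc(ih/2)*2 applies the same round-down at encode time.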

Real-time streaming UX

The SSE stream needed to feel alive without overwhelming the browser. We batch events at 50ms polling intervals, parse incremental buffer chunks (SSE messages can split across TCP segments), and handle unmount safety (clean up timers, abort in-flight requests).
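Here is a sketch of the client-side parsing. The buffer holds any partial message until a blank line arrives, and TextDecoder's stream option keeps multi-byte characters intact across chunk boundaries. It only handles \n line endings and data: fields, which is all the server format described earlier emits. A full SSE parser would also handle \r\n, event:, id:, and retry: fields.

```typescript
// Incremental SSE parser: chunks can end mid-message (or mid-character), so keep a buffer.
class SseParser {
  private buffer = "";
  private readonly decoder = new TextDecoder();

  // Feed one chunk; return the data payload of each complete message ("...\n\n").
  push(chunk: Uint8Array): string[] {
    // stream: true holds back a multi-byte UTF-8 sequence split across chunks.
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const payloads: string[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const message = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      const data = message
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data) payloads.push(data);
      end = this.buffer.indexOf("\n\n");
    }
    return payloads;
  }
}

// Read a streaming fetch Response and hand each parsed JSON event to onEvent.
async function readEvents(res: Response, onEvent: (event: unknown) => void): Promise<void> {
  if (!res.body) throw new Error("response has no body");
  const reader = res.body.getReader();
  const parser = new SseParser();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      for (const data of parser.push(value)) onEvent(JSON.parse(data));
    }
  } finally {
    reader.releaseLock();
  }
}
```

For unmount safety, the fetch call that produces the Response can take an AbortController's signal, with abort() called from the component's cleanup function.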

Making the keyless demo feel real

Without real LLM calls, agent outputs needed to be varied, context-relevant, and cinematic — not repetitive. We seeded deterministic fallbacks with rich arrays of cameras, moods, palettes, and themes, and the simulated video provider adds per-shot hue variation via FNV-1a hashing of the prompt text.
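For reference, 32-bit FNV-1a XORs each byte into the hash and then multiplies by the FNV prime. The sketch below hashes the prompt's UTF-8 bytes. Mapping the hash to a hue with modulo 360, and the baseHue offset, are illustrative choices. They are not necessarily what the simulated provider does.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of a string.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis (2166136261)
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // FNV prime (16777619), 32-bit multiply
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Map a shot prompt to a stable hue on the 0-359 color wheel (illustrative mapping).
function hueForPrompt(prompt: string, baseHue = 0): number {
  return (baseHue + (fnv1a32(prompt) % 360)) % 360;
}
```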

Accomplishments that we're proud of

Full pipeline runs with zero credentials — keyless simulation mode means anyone can visit the site and see the full 7-agent pipeline, complete with animated SVGs, live timeline, and a downloadable MP4. No .env setup required.
7 named agents with distinct creative voices — each agent has a name, initials, role, and accent color. The pipeline treats them as actual collaborators, not just function calls.
DAW-grade UI in pure CSS — the timeline, transport controls, shot strip, and agent panel are entirely hand-styled. No component library, no shadcn, no Tailwind in components. 1,287 lines of globals.css with OKLCH color interpolation.
Real-time transparency — you don't just get a film at the end. You watch every creative decision: the writer's beat sheet, the director's shot plan, the cinematographer's prompts, the sound designer's foley notes. The "Director's Notes" log captures it all.
6 aspect ratios, 4 resolutions, 10 runtimes — from 9:16 phone vertical to 2.39:1 cinema scope, 360p to 1080p, 5 seconds to 10 minutes. The pipeline dynamically adjusts shot count and duration scaling.
22 curated preset loglines — daily-seeded random selection of 4 presets so the demo always feels fresh.
Reference-image conditioning — automated ffmpeg frame extraction feeds the previous shot's first frame as a visual reference for the next shot, creating shot-to-shot consistency without manual intervention.
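The daily-seeded preset pick can be sketched as a seeded shuffle. This illustration rests on three assumptions. The seed is derived from the UTC date string. The PRNG is mulberry32, a common small generator swapped in here because the write-up does not name one. A partial Fisher-Yates shuffle draws 4 distinct presets.

```typescript
// Seed from the UTC calendar date ("YYYY-MM-DD"), so every visitor sees the same picks all day.
function daySeed(date: Date): number {
  const key = date.toISOString().slice(0, 10);
  let seed = 0;
  for (let i = 0; i < key.length; i++) {
    seed = (Math.imul(seed, 31) + key.charCodeAt(i)) >>> 0; // 32-bit string hash
  }
  return seed;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Partial Fisher-Yates shuffle: choose `count` distinct presets for the given day.
function pickDailyPresets<T>(presets: readonly T[], count: number, date: Date = new Date()): T[] {
  const rand = mulberry32(daySeed(date));
  const pool = presets.slice();
  const n = Math.min(count, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```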

What we learned

LangGraph is remarkably well-suited for agent orchestration. The StateGraph pattern with typed, mergeable annotations made it natural to add new fields (shot revisions, per-shot results, reference frames) without refactoring existing nodes. The conditional edge system lets us build a QC loop (retry failed shots up to 3 times) that cleanly routes back through the generation pipeline.

Provider abstraction is worth the overhead. The VideoProvider interface let us develop and test the entire pipeline with simulated video, then swap in real fal.ai generation with a single env var change. When Seedance 2.0 came out mid-hackathon, adding support was a new adapter class — zero changes to agents, graph, or UI.

SSE is still the simplest real-time protocol. WebSockets would have been overkill for one-directional event streaming. A ReadableStream + TextEncoder + "data: ...\n\n" format is trivially debuggable (you can curl the endpoint), works through any proxy, and the browser's fetch streaming API handles backpressure automatically.

Fallback-first development is a superpower. Building every agent with a deterministic fallback path meant we could iterate on the pipeline logic, UI, and assembly without waiting for LLM responses or burning API credits. The fallback path is not a downgrade — it's a first-class development and demo strategy.

AI video models are powerful but chaotic. LTX-2 and Seedance produce remarkable clips, but the same prompt can yield very different results from one generation to the next. Shot-to-shot consistency remains the open problem. Our reference-image approach helps but doesn't solve it; the output reads as an animatic or concept visualizer, which is an honest and useful niche.

What's next for Big Squeeze

Near-term

RunPod provider — self-hosted LTX-2 for sovereign/private generation (sponsor-aligned)
Pre-rendered fallback films — cached demo outputs for stage presentations where latency is critical
Audio generation — proper music bed and foley synthesis instead of descriptive notes
Longer film support — beyond 10 minutes with scene-level parallelism

Medium-term

Real-time iteration — pause the pipeline, tweak a shot's prompt or duration, resume
Shot re-ordering and editing — drag-and-drop the shot strip to reorder before assembly
Multi-modal input — accept full screenplays (.fountain, .fdx) and reference images
User accounts and history — saved projects, shared films, public gallery

Long-term

Custom agent personas — let users define their own crew with custom model assignments
Consistent characters — fine-tune or LoRA a video model on generated character images for true visual continuity
Branching pipelines — explore multiple creative directions in parallel and pick the best cut
Collaborative directing — multiple users watch and annotate the same live timeline

Built With

ai-gateway-(vercel)
css-(oklch)
fal.ai
fal.ai-(ltx-2-/-seedance-2.0-text-to-video)
ffmpeg
ffmpeg-(concat,-frame-extraction)
groq
langchain
langgraph
langgraph-(langchain-stategraph)
ltx-2
next.js-16-(app-router)
node.js
node.js-≥24
openai-api
pnpm
pnpm-10
react-19
rsvg-convert
rsvg-convert-(svg→png)
seedance
server-sent-events-streaming
typescript
typescript-(strict)
typescript-6-strict-mode
vercel-ai-sdk
vercel-deployable
zod
zod-(schema-validation)

Updates

Dushyant Kumar Kashyap started this project — May 27, 2026 01:39 PM EDT
