Inspiration

On a working film set, a director juggles seven specialists at once: a DP arguing about framing, a script supervisor flagging continuity breaks, an AD chasing the shot list, a VFX supervisor marking clean plates, a writer reminding everyone what the scene means, an editor asking for coverage that does not exist yet, and a playback op pulling up the last take.

Most of them react after the fact. At lunch. In dailies. By then the shot is gone and the location is struck. The classic mid-shoot panic is some version of "we don't have the reaction, can we pick it up?" And you can't, because the actors have gone home.

HollyProd was built to act inside that window. The "while the actors are still in costume" moment is where AI can actually save a production.

What it does

HollyProd is a real-time, multi-agent AI co-pilot for film directors. A director uploads their script, and the app parses it into scene nodes on a canvas. For each scene, they upload footage from multiple camera angles, as in standard multi-camera coverage.
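As a rough illustration of the script-to-scene-nodes step, here is a minimal splitter sketch. It assumes scripts mark scene breaks with markdown headings, bold titles, or triple-dash separators (the formats mentioned later under Challenges); the function and field names are ours, not the production code.

```python
import re

# A line is a scene break if it is a markdown heading, a bold-only line,
# or a run of three or more dashes. (Illustrative, not the real splitter.)
BREAK = re.compile(r"^(#{1,3}\s+.+|\*\*[^*]+\*\*|-{3,})\s*$")

def split_scenes(script: str) -> list[dict]:
    """Split raw script text into {"title", "body"} scene chunks."""
    scenes, current = [], None
    for line in script.splitlines():
        if BREAK.match(line.strip()):
            # Close out the previous scene before starting a new one.
            if current and current["body"].strip():
                scenes.append(current)
            # Separator lines ("---") carry no title, so fall back to a number.
            title = line.strip().strip("#* -") or f"Scene {len(scenes) + 1}"
            current = {"title": title, "body": ""}
        elif current is not None:
            current["body"] += line + "\n"
        else:
            # Text before any break line becomes an untitled first scene.
            current = {"title": "Scene 1", "body": line + "\n"}
    if current and current["body"].strip():
        scenes.append(current)
    return scenes
```

Each chunk can then back one scene node on the canvas.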

Holly, our AI co-pilot, then:

- Picks the best camera angle and explains its reasoning
- Guards continuity by detecting wardrobe, prop, and lighting mismatches between scenes
- Reimagines the scene live using xAI Grok image editing ("imagine a tiger behind the actor" generates a previs still of the actual frame)
- Answers script questions from the shooting script itself ("what is the emotional intent of this line?")
- Reads the current frame for composition, lighting, and depth
- Learns from wrapped scenes and suggests angles, emotions, and wardrobe notes for the next scene before you roll

The entire interface is voice-first. The director speaks, and the AI responds out loud. No forms, no dropdowns, no prompting.

How we built it

- Electron desktop shell for direct mic and camera access
- React, Vite, Tailwind, and Zustand on the frontend with a cinematic broadcast-monitor UI
- FastAPI and WebSockets on the backend running a five-agent pipeline (Vision, Continuity, Creative Director, VFX and Post, Script and Story), all coordinated by an Orchestrator
- xAI Grok (grok-3 and grok-3-mini) as the primary LLM for all reasoning, synthesis, and scene reimagining
- Gemini Live API for voice transcription from the director's mic
- OpenAI TTS so the AI speaks responses back to the director
- OpenCV for extracting the first frame of every uploaded video clip for analysis
- A per-scene, per-camera-slot footage system (Camera A/B/C/D) with persistent frame storage across playground visits

Challenges we ran into

- Getting the correct video frame into Grok's image editing API, rather than a randomly generated image, required server-side frame extraction with OpenCV and a dedicated server-fetch path to avoid browser CORS issues
- Parsing diverse script formats (markdown headings, bold text, triple-dash separators) into clean, structured scene nodes required a full rewrite of the scene splitter
- Keeping footage isolated per scene, so Camera A from Scene 1 never leaked into Scene 2
- The orchestrator was recommending camera angles that had not been uploaded yet; we fixed this by injecting the live list of available camera slots into the LLM system prompt at request time
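The last fix is simple to sketch: build the orchestrator's system prompt from the live footage state at request time, so the model can only recommend angles that actually exist. Names, data shape, and prompt wording here are illustrative, not our production code.

```python
def build_system_prompt(scene_id: str, footage: dict[str, dict[str, str]]) -> str:
    """Constrain the orchestrator to camera slots with uploaded footage."""
    slots = sorted(footage.get(scene_id, {}))  # e.g. ["A", "C"]
    if slots:
        available = ", ".join(f"Camera {s}" for s in slots)
        constraint = f"Only recommend angles from: {available}."
    else:
        constraint = "No footage uploaded yet; do not recommend any angle."
    return f"You are the on-set orchestrator for scene {scene_id}. {constraint}"

# Example: Scene 1 has footage in slots A and C; Scene 2 has none yet.
footage = {"scene-1": {"A": "a.mp4", "C": "c.mp4"}, "scene-2": {}}
print(build_system_prompt("scene-1", footage))
```

Because the constraint is rebuilt on every request, uploading a new clip immediately widens what the orchestrator is allowed to suggest.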

Accomplishments that we're proud of

- A fully voice-driven director experience where you can ask anything on set and get a grounded, actionable answer in seconds
- A real continuity-error detection flow: a 20-second analysis animation followed by a visual side-by-side frame comparison showing exactly what changed
- Cross-scene intelligence: wrapping a scene triggers AI suggestions for the next scene covering angles, emotions, and wardrobe continuity
- A UI that genuinely looks like a professional broadcast monitor, with camcorder-style tiles, scanlines, vignette overlays, and illuminated camera badges

What we learned

- Multi-agent architectures work best when each agent owns exactly one crew role; fewer, sharper agents beat a larger roster.
- The "while the actors are still in costume" window is the most valuable moment in production. Real-time AI that catches problems during the shoot is a fundamentally different product category from tools that react after the fact.
- xAI Grok's image editing is powerful for on-set previs, but only when the actual scene frame is provided as context rather than a text description.
- Voice-first interfaces feel natural for a director on a noisy set in a way a chat box never will.

What's next for HollyProd

- Real camera ingest via Blackmagic, NDI, and SDI capture cards, so the feed is a live sensor rather than an uploaded clip
- Multi-user on-set mode where the DP, script supervisor, and AD each get a tailored view of the same agents running in parallel
- Integration with StudioBinder and Frame.io to read the shot list and write the take log back automatically
- On-premise deployment for studios that cannot send dailies to third-party clouds
- Director style fine-tuning, so "make this feel like a Fincher shot" is trained on the director's own films and actually converges on that look
