ScriptViz AI - Hackathon Submission

Inspiration

Every great film starts as a text document, but the journey from script to screen is often where the vision gets lost. We realized that writers, directors, and producers suffer from a massive "visualization gap." Describing a "cyberpunk city at dusk" is easy, but ensuring the whole team imagines the same city is incredibly difficult and expensive.

We asked ourselves: What if you could "watch" your movie the moment you finished writing it?

Inspired by the rapid advancements in Generative AI, we wanted to build a bridge between the written word and the visual medium. We built ScriptViz AI to democratize pre-visualization: creators can iterate on their visual storytelling as soon as a draft exists, which we believe can cut weeks of pre-production time and the budget that goes with it.

What it does

ScriptViz AI is an intelligent pre-production engine that transforms raw screenplay text into a comprehensive visual guide.

  • Instant Script Parsing: Users simply upload a PDF screenplay. The system analyzes scene headings, action lines, and dialogue to understand the context of every scene.
  • AI Storyboarding: It automatically generates cinematic, high-quality storyboard panels for every scene, translating text descriptions into visual camera angles using the Gemini Nano Banana model.
  • Smart Script Breakdown: Using NER (Named Entity Recognition) models, the system automatically identifies and categorizes key elements like characters, props, and locations, turning a block of text into structured production data.
  • Visual Consistency via "Artifacts": We implemented a feature called Artifacts to solve the consistency problem. It allows the system to store reference images for specific characters and locations. The AI uses these "master shots" to generate new frames, ensuring the hero looks the same in Scene 1 as they do in Scene 50.

How we built it

We used a modern, scalable tech stack designed for speed and a seamless user experience:

  • Frontend: Built with React and Material UI to create a clean, intuitive, and responsive dashboard experience.
  • Backend: A robust Node.js server handles the API requests and orchestration, integrated with Firebase for real-time data management and secure authentication.
  • Text Processing: We use Google's Gemini model to parse the unstructured script text. Gemini's large context window lets us feed in entire acts of a screenplay at once, so the model can take the full narrative arc into account.
  • Analytics & Breakdown: We integrated NER (Named Entity Recognition) models to scan the script and tag entities, providing a detailed breakdown of who and what is in every scene.
  • Image Generation: The visual engine is powered by the Gemini Nano Banana model. We engineered a pipeline that combines the script's action lines with our Artifacts system (reference images) to create consistent, high-fidelity storyboard frames.
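
As a rough illustration of the image-generation step, the sketch below shows one way action lines and Artifacts could be combined into a single request. It is a minimal sketch, not the project's actual code: the `Artifact`, `Scene`, and `FrameRequest` shapes and the `matchArtifacts` and `buildFrameRequest` helpers are hypothetical names, and the call to the image model itself is left out.

```typescript
// Hypothetical shapes for illustration; none of these names come from the ScriptViz AI codebase.
interface Artifact {
  id: string;
  kind: "character" | "location";
  name: string; // as it appears in the script, e.g. "Maya"
  referenceImageUrl: string; // the stored "master shot"
}

interface Scene {
  heading: string; // e.g. "EXT. ROOFTOP - DUSK"
  actionLines: string[];
}

interface FrameRequest {
  prompt: string;
  referenceImageUrls: string[];
}

// Pick the artifacts whose name appears in the scene text (case-insensitive, whole word).
function matchArtifacts(scene: Scene, artifacts: Artifact[]): Artifact[] {
  const text = [scene.heading, ...scene.actionLines].join(" ").toUpperCase();
  return artifacts.filter((a) => {
    // Escape regex metacharacters so names like "DR. CHO" are matched literally.
    const escaped = a.name.toUpperCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`).test(text);
  });
}

// Combine the scene text with consistency notes and the matching reference images.
function buildFrameRequest(scene: Scene, artifacts: Artifact[]): FrameRequest {
  const matched = matchArtifacts(scene, artifacts);
  const consistencyNotes = matched.map(
    (a) => `Keep ${a.kind} "${a.name}" consistent with its reference image.`
  );
  const prompt = [
    `Storyboard frame for scene: ${scene.heading}.`,
    scene.actionLines.join(" "),
    ...consistencyNotes,
  ].join("\n");
  return { prompt, referenceImageUrls: matched.map((a) => a.referenceImageUrl) };
}
```

The point of the design is that reference images travel with the prompt on every request, so a character mentioned in Scene 50 is generated against the same master shot that was used in Scene 1.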

Challenges we ran into

  • The "Consistency Problem": The biggest hurdle with Generative AI is keeping a character looking the same across different generated images. We had to build the Artifacts system from scratch, creating a way to pass character and location reference data into the image generation pipeline effectively.
  • Parsing Screenplay Formats: Screenplays are technically standard but practically messy. Teaching our parser to correctly identify the difference between a character name, a parenthetical, and a dialogue line in a raw PDF was a significant engineering challenge.
  • Prompt Engineering for Cinema: Translating literary descriptions (e.g., "He looks at her with a burning rage") into visual instructions (e.g., "Close up, male face, angry expression, dramatic lighting, 85mm lens") required extensive trial and error.
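
To make the parsing challenge concrete, here is a minimal sketch of a rule-based line classifier for text already extracted from a PDF. It is an assumption-laden illustration rather than the parser we shipped: the regular expressions, the `LineKind` labels, and the `classifyLines` function are all hypothetical, and real scripts break these rules often (an all-caps line such as "THE END" would be misread as a character cue).

```typescript
// Hypothetical classifier for illustration; real screenplays need many more rules.
type LineKind = "scene_heading" | "character" | "parenthetical" | "dialogue" | "action";

interface ClassifiedLine {
  kind: LineKind;
  text: string;
}

const SCENE_HEADING = /^(INT\.\/EXT\.|INT\.|EXT\.)\s+/;
// An all-caps name, optionally followed by a common extension such as (V.O.).
const CHARACTER_CUE = /^[A-Z][A-Z0-9 .'-]*(?: \((?:V\.O\.|O\.S\.|CONT'D)\))?$/;
const PARENTHETICAL = /^\(.*\)$/;

function classifyLines(raw: string): ClassifiedLine[] {
  const out: ClassifiedLine[] = [];
  // True while we are inside a character's speech block (cue seen, no blank line yet).
  let inDialogueBlock = false;

  for (const rawLine of raw.split(/\r?\n/)) {
    const text = rawLine.trim();
    if (text === "") {
      inDialogueBlock = false; // a blank line ends the current speech block
      continue;
    }

    let kind: LineKind;
    if (SCENE_HEADING.test(text)) {
      kind = "scene_heading";
      inDialogueBlock = false;
    } else if (inDialogueBlock && PARENTHETICAL.test(text)) {
      kind = "parenthetical";
    } else if (inDialogueBlock) {
      kind = "dialogue";
    } else if (CHARACTER_CUE.test(text)) {
      kind = "character";
      inDialogueBlock = true;
    } else {
      kind = "action";
    }
    out.push({ kind, text });
  }
  return out;
}
```

The classifier is stateful on purpose: once PDF extraction has discarded indentation, a parenthetical and a dialogue line can only be told apart from action by the character cue that precedes them.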

Accomplishments that we're proud of

  • The "One-Click" Pipeline: We successfully built a system where a user uploads a raw PDF and gets a full visual feed without any manual intervention.
  • Solving Consistency: We are particularly proud of the Artifacts implementation. Seeing the same character appear consistently across different scenes and lighting conditions was a major breakthrough for the tool's usability.
  • Cinematic Quality: The images aren't just generic cartoons; they look like film stills. We are proud of the "cinematic look" prompt modifiers we developed.
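
The prompt modifiers can be pictured as a structured shot description that gets flattened into a comma-separated prompt. The sketch below is only an illustration under that assumption: `ShotSpec`, `CINEMATIC_MODIFIERS`, and `toVisualPrompt` are hypothetical names, the modifier strings are placeholders rather than the ones we tuned, and the step that turns a literary line into a `ShotSpec` is not shown.

```typescript
// Hypothetical structure for illustration; the fields mirror the kind of visual
// instructions described in this write-up (framing, subject, expression, lighting, lens).
interface ShotSpec {
  framing: string;
  subject: string;
  expression?: string;
  lighting?: string;
  lens?: string;
}

// Placeholder "cinematic look" modifiers, not the project's tuned list.
const CINEMATIC_MODIFIERS: string[] = ["film still", "shallow depth of field"];

// Flatten a shot description plus style modifiers into one prompt string,
// skipping any optional field that was left out.
function toVisualPrompt(shot: ShotSpec, modifiers: string[] = CINEMATIC_MODIFIERS): string {
  const parts = [
    shot.framing,
    shot.subject,
    shot.expression,
    shot.lighting,
    shot.lens,
    ...modifiers,
  ];
  return parts.filter((p): p is string => Boolean(p)).join(", ");
}

const ragePrompt = toVisualPrompt({
  framing: "Close up",
  subject: "male face",
  expression: "angry expression",
  lighting: "dramatic lighting",
  lens: "85mm lens",
});
```

Keeping the style modifiers separate from the per-shot fields means the look can be changed in one place without touching how individual scenes are described.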

What we learned

  • Context is King: We learned that for AI to generate a meaningful storyboard for Scene 5, it often needs context from Scene 4. Isolated generation leads to disjointed storytelling.
  • The Human Element: We discovered that AI isn't replacing the artist; it's accelerating them. The best results came when we added features for users to manually tweak the prompts or update an Artifact after the initial generation.
  • Data Structure Matters: We gained a new appreciation for the importance of structuring unstructured data early in the pipeline to prevent errors downstream.
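
The "Context is King" lesson can be sketched as a small helper that prepends the preceding scene to the prompt for the current one. This is a minimal sketch under assumed names: `SceneSummary`, `withPriorContext`, and the `lookback` parameter are hypothetical and do not describe our actual prompt format.

```typescript
// Hypothetical record for illustration: one structured entry per parsed scene.
interface SceneSummary {
  number: number;
  heading: string;
  summary: string;
}

// Build the text for scene `index`, preceded by up to `lookback` earlier scenes.
function withPriorContext(scenes: SceneSummary[], index: number, lookback = 1): string {
  const current = scenes[index];
  if (!current) {
    throw new RangeError(`No scene at index ${index}`);
  }
  const start = Math.max(0, index - lookback);
  const prior = scenes
    .slice(start, index)
    .map((s) => `Previously (Scene ${s.number}, ${s.heading}): ${s.summary}`);
  return [
    ...prior,
    `Now storyboard Scene ${current.number}, ${current.heading}: ${current.summary}`,
  ].join("\n");
}
```

This only works because the script has already been turned into structured records early in the pipeline, which is the practical side of the "Data Structure Matters" lesson.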

What's next for ScriptViz AI

  • 3D Environment Export: We plan to bridge the gap to 3D by exporting scene data directly into Unreal Engine, allowing for rough 3D blocking and camera movement planning.
  • Animatics (Video) Generation: Moving from static storyboards to AI-generated video clips to better visualize motion, timing, and camera pans.
  • Collaborative Mode: Implementing real-time collaboration so Directors and DPs can annotate, draw over, and edit the storyboards together in a shared workspace.
  • Custom Model Training: Allowing production houses to train the model on their specific concept art style or the likeness of their actual cast.
