Inspiration

Honestly, this started as me just playing around. I'm a YouTuber, and I wanted to create an app that would automate part of my content creation process: closing the massive gap between having a story in your head and actually seeing it come to life visually. Pre-production is brutal. Storyboarding, designing consistent characters, mapping out every single shot: it takes forever, requires a bunch of specialized skills, and honestly it was too much.

The whole idea for ContentGen AI started with me asking myself: what if AI could actually be my creative partner? Not just another tool that spits out generic content, but something that genuinely understands how films work. I wanted to build something where you could dump in a script and it would intelligently break it down into everything you'd need for production—making it possible for anyone to turn their story into something visual, right away.

What it does

ContentGen AI acts as your AI co-director, transforming written scripts into production-ready video content. The platform takes your story and automatically generates a complete storyboard with shot descriptions, camera movements, and cinematic pacing. It creates detailed character profiles to maintain visual consistency across scenes, and offers two powerful video production workflows: an audio-first approach where the app generates narration and syncs visuals to match, or a B-roll sync mode where you upload your own voiceover and manually assign images to specific moments using an intuitive transcript editor.

You can enhance your scripts with richer detail, upload reference images for style consistency, and export a fully edited video—all from your browser. It's designed to bridge the gap between written narrative and visual storytelling without requiring technical film production skills.

How I built it

The entire platform runs on Google's Gemini 3 models, and I basically treat them like a full production crew managed by one AI director.

The Foundation: Everything depends on Gemini 3's JSON Mode with responseSchema. When you submit a story, Gemini acts like a script supervisor, breaking your narrative into structured JSON output with scenes, shot descriptions, pacing notes, camera movements, and exact text references. That structured data becomes the blueprint for everything else.
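
To make the idea concrete, here's a minimal sketch of what that structured request might look like. The schema field names (`scenes`, `shot_description`, `text_reference`, etc.) and the model id are my illustrative guesses, not ContentGen AI's actual schema; the SDK call is shown commented out so the sketch stays self-contained.

```python
import json

# Hypothetical storyboard schema, in the JSON Schema subset that
# Gemini's responseSchema accepts. Field names are placeholders.
STORYBOARD_SCHEMA = {
    "type": "object",
    "properties": {
        "scenes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "shot_description": {"type": "string"},
                    "camera_movement": {"type": "string"},
                    "pacing_note": {"type": "string"},
                    "text_reference": {"type": "string"},
                },
                "required": ["shot_description", "text_reference"],
            },
        }
    },
    "required": ["scenes"],
}

# With the google-genai Python SDK, the call would look roughly like:
#   from google import genai
#   client = genai.Client()
#   resp = client.models.generate_content(
#       model="gemini-2.0-flash",  # placeholder model id
#       contents=script_text,
#       config={"response_mime_type": "application/json",
#               "response_schema": STORYBOARD_SCHEMA},
#   )
#   storyboard = json.loads(resp.text)

def parse_storyboard(raw: str) -> list[dict]:
    """Parse the model's JSON reply and return the scene list."""
    data = json.loads(raw)
    return data["scenes"]

sample = ('{"scenes": [{"shot_description": "Wide shot of a rainy street", '
          '"text_reference": "It was raining."}]}')
print(parse_storyboard(sample)[0]["shot_description"])
```

Because the reply is schema-constrained JSON rather than free prose, every downstream step (image generation, pacing, coverage checks) can consume it without brittle text parsing.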

The Creative Engine: I built features that tap into Gemini's reasoning capabilities. The "Enhance" function enriches scripts with cinematic detail, while "Suggest Style" analyzes tone and recommends artistic direction. For visual consistency, I created a two-part character profile system: one part locks in unchangeable features (facial structure, body type), and the other handles scene-specific variations (emotions, clothing). Gemini uses these profiles to maintain continuity across your storyboard.
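
The two-part profile idea can be sketched as a pair of data structures: one frozen, one mutable per scene. The field names and prompt format below are my own illustration, assuming the locked traits get repeated verbatim in every image prompt while scene-specific traits are swapped out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharacterCore:
    """Unchangeable identity traits, repeated verbatim in every prompt."""
    name: str
    facial_structure: str
    body_type: str

@dataclass
class SceneVariant:
    """Scene-specific traits that change shot to shot."""
    emotion: str
    clothing: str

def character_prompt(core: CharacterCore, scene: SceneVariant) -> str:
    # Locked traits always come first, so the image model anchors on them.
    return (f"{core.name}: {core.facial_structure}, {core.body_type}, "
            f"wearing {scene.clothing}, expression: {scene.emotion}")

mara = CharacterCore("Mara", "angular jaw, green eyes", "tall and wiry")
print(character_prompt(mara, SceneVariant("worried", "a rain-soaked coat")))
```

Freezing the core profile makes it impossible for later pipeline steps to accidentally mutate the traits that guarantee continuity.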

The Production Pipeline: The in-browser video editor offers two workflows. Audio-First Production generates narration using Gemini TTS, uses forced alignment for word-level timestamps, and sequences images to match voiceover pacing. B-Roll Sync lets users upload their own audio, transcribes it, and provides a "Transcript Editor" where you highlight words and assign images to specific moments.
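
The Audio-First sequencing step can be sketched as: take the word-level timestamps from forced alignment, split the words into as many consecutive runs as there are images, and give each image that run's time span. This is a simplified stand-in, not the app's actual pacing logic.

```python
def sequence_images(word_timestamps, images):
    """word_timestamps: list of (word, start_sec, end_sec) tuples.
    Returns [(image, start_sec, end_sec), ...] covering the narration."""
    if not word_timestamps or not images:
        return []
    n = len(images)
    per = max(1, len(word_timestamps) // n)  # words per image
    clips = []
    for i, image in enumerate(images):
        # Last image absorbs any leftover words from integer division.
        chunk = word_timestamps[i * per : (i + 1) * per if i < n - 1 else None]
        if not chunk:
            break
        clips.append((image, chunk[0][1], chunk[-1][2]))
    return clips

words = [("it", 0.0, 0.2), ("was", 0.2, 0.4), ("raining", 0.4, 0.9), ("hard", 0.9, 1.3)]
print(sequence_images(words, ["shot1.png", "shot2.png"]))
```

The same clip tuples can drive both the auto-generated narration workflow and the manual B-Roll Sync assignments, since both ultimately reduce to (image, start, end).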

I built intelligent routing so different models handle different tasks: Gemini 3 Pro does heavy reasoning and image generation, Gemini Flash handles quick tasks, and the specialized TTS model creates narration.
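
Routing like this can be as simple as a task-to-model table. The model ids below are placeholders standing in for whichever Gemini variants the app actually calls.

```python
# Placeholder model ids; the real deployment may use different ones.
ROUTES = {
    "storyboard": "gemini-pro",       # heavy reasoning
    "image": "gemini-pro",            # image generation
    "style_suggest": "gemini-flash",  # quick, cheap tasks
    "narration": "gemini-tts",        # text-to-speech
}

def pick_model(task: str) -> str:
    """Resolve a pipeline task to the model that should handle it."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"unknown task: {task}")

print(pick_model("narration"))
```

Centralizing the mapping means swapping a model for one task (say, upgrading the TTS model) is a one-line change rather than a hunt through the codebase.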

Challenges I ran into

Script Coverage: Early versions would occasionally summarize or skip parts of the script when building storyboards. I fixed this by adding an "unbreakable rule" about Total Narrative Coverage to the system prompt. Now the combined text from all shots must exactly match the original script.
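
A coverage rule like that is cheap to verify programmatically after each generation. Here's a minimal sketch (function names are mine): concatenate every shot's text reference and compare it to the original script after normalizing whitespace and case.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so formatting differences don't count."""
    return re.sub(r"\s+", " ", text).strip().lower()

def covers_script(script: str, shot_texts: list[str]) -> bool:
    """True iff the shots' combined text reproduces the whole script."""
    return normalize(" ".join(shot_texts)) == normalize(script)

print(covers_script("It was raining. She ran.", ["It was raining.", "She ran."]))
```

If the check fails, the storyboard request can be retried automatically instead of silently shipping a storyboard that drops narration.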

Audio-Visual Synchronization: The biggest technical nightmare was synchronizing visuals with the audio timeline. Generating audio, getting precise word-level timestamps, and mapping dozens of clips to those timestamps is complex. I built a robust pipeline with fallback mechanisms (like using Gemini's transcription if the timestamp model fails) and normalization functions to keep everything perfectly synced.
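
One of those normalization functions can be sketched as a single pass that clamps every word span to the audio duration and forces spans to be monotonic, so downstream clip math never sees overlaps or negative lengths. (This is an illustrative stand-in, not the app's exact code.)

```python
def normalize_timestamps(words, duration):
    """words: list of (word, start, end) that may overlap or run past the
    audio length. Returns a cleaned, monotonically increasing list."""
    cleaned, cursor = [], 0.0
    for word, start, end in words:
        start = min(max(start, cursor), duration)  # no overlap, no overrun
        end = min(max(end, start), duration)       # no negative-length spans
        cleaned.append((word, start, end))
        cursor = end
    return cleaned

print(normalize_timestamps([("a", 0.0, 0.5), ("b", 0.4, 0.9), ("c", 0.8, 2.0)], 1.5))
```

Running this after either the timestamp model or the transcription fallback means the rest of the editor can assume the timeline is well-formed regardless of which source produced it.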

Browser Performance: Running a full video editor in the browser is intense. I implemented several optimizations: a custom LRU cache for pre-rendered subtitle bitmaps, requestAnimationFrame for the playback loop, and throttled rendering to ensure smooth playback even with complex effects.
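
The editor itself runs in the browser in JavaScript; the sketch below is a language-neutral illustration of the same LRU idea in Python: once the cache is full, inserting a new pre-rendered subtitle bitmap evicts the least recently used one.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: OrderedDict keeps insertion/recency order."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("sub_0", "bitmap0")
cache.put("sub_1", "bitmap1")
cache.get("sub_0")             # touch sub_0, so sub_1 becomes LRU
cache.put("sub_2", "bitmap2")  # evicts sub_1
print(cache.get("sub_1"))      # None
```

Keeping the hot subtitle bitmaps cached means the requestAnimationFrame playback loop only rasterizes text when a subtitle actually changes, not on every frame.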

Accomplishments that we're proud of

I'm most proud of building a truly collaborative AI tool rather than just an automation system. The Transcript Editor creates an entirely new way for creators to work with AI: it's intuitive, gives users precise control, and feels natural.

The character consistency system was a major breakthrough. Solving the visual continuity problem that plagues AI image generation opened up possibilities for longer-form storytelling that weren't feasible before.

And honestly? Getting a full video production pipeline running entirely in the browser with smooth performance was a technical challenge I'm really happy I conquered.

What I learned

The biggest realization? Gemini isn't just a text generator; it's a reasoning engine. Its ability to follow complex instructions, work with structured data through responseSchema, and handle multimodal analysis makes it fundamentally different from basic AI models.

The real magic isn't in any single AI call—it's in how you architect the workflow around it. Features like the Transcript Editor don't just automate tasks; they create entirely new ways for humans to collaborate with AI.

The most powerful creative tools facilitate partnership between user and AI. Whether it's refining character descriptions or manually assigning images, building a co-director instead of an autopilot produces the most compelling creative work.

What's next for ContentGen AI

The roadmap is ambitious. First, I want to make the app publicly accessible. Then I want to add real-time collaborative editing so teams can work together on projects simultaneously. I'm exploring integration with professional video editing software for users who want to take their ContentGen outputs into more advanced post-production workflows.

I'm also working on expanding the model capabilities—adding support for more cinematic techniques like match cuts, transitions, and complex camera movements. There's potential to integrate live-action reference footage analysis, where users could upload example scenes and have the AI replicate cinematography styles.

Long-term, I see ContentGen AI evolving into a full production suite where you can generate not just storyboards and videos, but also shot lists, production schedules, and even AI-generated music scores tailored to your story's emotional beats. The goal is to make professional-quality video production accessible to anyone with a story to tell.

Built With

  • ai33
  • elevenlab
  • gemini
  • gradio
  • stability