Inspiration

Video editing hasn't fundamentally changed in 20 years. You still drag clips, scrub timelines, and manually adjust every parameter. We asked: what if your editor could just listen?

With Gemini's Live API enabling real-time multimodal interaction, we saw the opportunity to build something genuinely new - an agent that hears your intent, sees your current edit, generates creative assets, and applies changes directly to the timeline.

The app is deployed and fully functional: http://gemininychackathon.vercel.app

Right now it supports only one video per email address, so please do not abuse it.

What it does

StoryLab is a Gemini Live-powered video creation and editing agent with three modes:

Live Edit Mode - Talk to your video. Scout, our editing agent, runs on Gemini Live API with full interruption support. Ask it to add text overlays, swap music, apply visual effects, trim scenes, or fill YouTube metadata - all by voice, in one persistent session.

Screen-Aware Mode - Scout watches your editor at 1 frame/second. When you say "edit this image" or "fix this scene," it knows exactly what's on screen. No clicking, no selecting, no explaining context.

Creative Director Mode - Gemini's interleaved generation produces storyboards, scene images, and narrative direction in one co-created output stream. Not stitched together after the fact - generated together.

StoryLab also includes a full video generation pipeline: voice/text/PDF → Gemini 2.5 Pro script agent → TTS voiceover → image generation with visual QA → FFmpeg composition → Twick render → publish to YouTube Shorts, Instagram, and TikTok.

How we built it

  • Backend: FastAPI (Python) on Google Cloud Run, split into two services - voicevid-api (512 MB memory, 60 s timeout) and voicevid-worker (4 GB memory, 900 s timeout, internal-only)
  • Live Agent: Gemini Live API WebSocket session with 25+ edit tools - music, effects, captions, image generation, timeline operations
  • Script Agent: Gemini 2.5 Pro ReAct agent with 14-turn reasoning loop, quality scoring (threshold ≥ 70/100), and async Cloud Tasks queue
  • Video Pipeline: 7-stage async pipeline whose stages include TTS (gemini-2.5-flash-tts), image generation (Gemini Image 3.1 Flash), visual QA (gemini-2.5-flash), FFmpeg composition, and the Twick renderer (Node.js/Puppeteer on Cloud Run)
  • Creative Director: Gemini 2.0 Flash interleaved text+image generation
  • Frontend: Next.js 15 App Router, Firebase Auth, Twick editor SDK
  • Infrastructure: Cloud Tasks, Cloud Firestore, Cloud Storage, Secret Manager, Cloud Build, Artifact Registry, Vertex AI
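
As an illustration of the script agent's quality gate, here is a minimal sketch of a bounded loop that redrafts until a score clears the threshold. Only the 14-turn budget and the 70/100 threshold come from the description above. The names `run_script_loop`, `ScriptResult`, `draft`, and `score` are hypothetical, and treating each turn as one draft-then-score step is an assumption about how the loop might work, not the actual StoryLab implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_TURNS = 14          # from the write-up: 14-turn reasoning loop
QUALITY_THRESHOLD = 70  # from the write-up: accept scripts scoring >= 70/100


@dataclass
class ScriptResult:
    script: str
    score: int
    turns_used: int
    accepted: bool


def run_script_loop(
    draft: Callable[[str, Optional[str]], str],  # hypothetical: (brief, previous draft) -> new draft
    score: Callable[[str], int],                 # hypothetical: draft -> score in 0..100
    brief: str,
) -> ScriptResult:
    """Redraft until the score clears the threshold or the turn budget runs out."""
    best_script, best_score = "", -1
    previous: Optional[str] = None
    for turn in range(1, MAX_TURNS + 1):
        candidate = draft(brief, previous)
        candidate_score = score(candidate)
        # Remember the best attempt so a fallback exists if nothing passes.
        if candidate_score > best_score:
            best_script, best_score = candidate, candidate_score
        if candidate_score >= QUALITY_THRESHOLD:
            return ScriptResult(candidate, candidate_score, turn, True)
        previous = candidate
    return ScriptResult(best_script, best_score, MAX_TURNS, False)


# Stub callables standing in for model calls, for demonstration only.
def fake_draft(brief: str, previous: Optional[str]) -> str:
    return (previous or brief) + "!"


def fake_score(script: str) -> int:
    return 60 + 5 * script.count("!")


result = run_script_loop(fake_draft, fake_score, "intro")
```

In a real service the two callables would wrap model calls; keeping them as parameters makes the turn budget and threshold testable without network access.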

Challenges we ran into

  • Managing Gemini Live WebSocket state across interruptions and mode switches while keeping tool execution in sync with the editor
  • Building screen-aware context injection at 1fps without overloading the Live session with redundant frames
  • Veo 3 was too expensive for fully animated motion video, so we shifted to motion-picture style outputs instead
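
One way to keep redundant frames out of the Live session is to forward a frame only when it differs enough from the last one sent. The sketch below assumes frames arrive as equal-sized raw grayscale byte buffers (for example, after downscaling). The class name `FrameGate` and both thresholds are hypothetical, and this is not necessarily the approach StoryLab uses.

```python
from typing import Optional


class FrameGate:
    """Forwards a frame only when it differs enough from the last one sent."""

    def __init__(self, min_changed_fraction: float = 0.02, pixel_delta: int = 16):
        # Assumed thresholds: a pixel counts as changed if it moves by more
        # than pixel_delta, and a frame is sent if at least
        # min_changed_fraction of its pixels changed.
        self.min_changed_fraction = min_changed_fraction
        self.pixel_delta = pixel_delta
        self._last_sent: Optional[bytes] = None

    def should_send(self, frame: bytes) -> bool:
        if not frame:
            return False
        last = self._last_sent
        # Always send the first frame, or any frame whose size changed.
        if last is None or len(last) != len(frame):
            self._last_sent = frame
            return True
        changed = sum(1 for a, b in zip(last, frame) if abs(a - b) > self.pixel_delta)
        if changed / len(frame) >= self.min_changed_fraction:
            self._last_sent = frame
            return True
        return False


gate = FrameGate()
blank = bytes([0] * 100)
edited = bytes([0] * 95 + [255] * 5)
noisy = bytes([0] * 95 + [255] * 4 + [250])
decisions = [gate.should_send(f) for f in (blank, blank, edited, noisy)]
```

Comparing against the last frame sent, rather than the last frame captured, prevents slow drift from being dropped indefinitely.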

Accomplishments that we're proud of

  • A truly interruption-friendly live agent that maintains edit context across the full session
  • 25+ edit tools working reliably via voice with zero manual UI interaction
  • End-to-end video generation from voice input to published YouTube Short in one flow
  • Screen-aware editing - the agent understands what's visible without DOM access or APIs
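
To show how a set of voice-invoked edit tools could be wired up, here is a minimal sketch of a name-to-handler registry that turns a model tool call into an editor operation. The tool names, argument shapes, and the `dispatch` helper are all hypothetical. The shape of a real Gemini Live tool call and the Twick editor operations are not shown in this write-up, so a plain dict stands in for both.

```python
from typing import Any, Callable, Dict

ToolHandler = Callable[..., Dict[str, Any]]
TOOLS: Dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Decorator that registers a handler under a tool name."""
    def register(fn: ToolHandler) -> ToolHandler:
        TOOLS[name] = fn
        return fn
    return register


@tool("add_text_overlay")  # hypothetical tool name
def add_text_overlay(text: str, start_s: float, end_s: float) -> Dict[str, Any]:
    return {"op": "add_text", "text": text, "start_s": start_s, "end_s": end_s}


@tool("trim_scene")  # hypothetical tool name
def trim_scene(scene_id: str, start_s: float, end_s: float) -> Dict[str, Any]:
    return {"op": "trim", "scene_id": scene_id, "start_s": start_s, "end_s": end_s}


def dispatch(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run one tool call and return a result the agent can read back."""
    name = call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": handler(**call.get("args", {}))}
    except TypeError as exc:
        # Missing or unexpected arguments are reported instead of raised,
        # so the session can continue and the agent can retry.
        return {"ok": False, "error": str(exc)}


overlay = dispatch(
    {"name": "add_text_overlay", "args": {"text": "Hi", "start_s": 0.0, "end_s": 2.0}}
)
```

Returning errors as data rather than raising keeps a long-running voice session alive when the model produces a malformed call.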

What we learned

  • Gemini Live API's bidirectional streaming is genuinely suited for stateful, long-running creative sessions - not just Q&A
  • Interleaved generation changes how creative workflows feel - assets emerge with the narrative, not after it
  • Building reliable async pipelines on Cloud Run requires careful concurrency design (containerConcurrency: 1 on the worker was critical)

What's next for StoryLab

  • Multi-track editing support with scene-level agent awareness
  • Collaborative sessions - multiple editors, one agent
  • Direct TikTok and Instagram API publishing
  • Agent memory across sessions for brand voice consistency

Built With

  • artifact-registry
  • cloud-build
  • cloud-firestore
  • cloud-storage
  • fastapi
  • ffmpeg
  • firebase-auth
  • gemini-2.0-flash-(interleaved)
  • gemini-2.5-flash
  • gemini-2.5-flash-tts
  • gemini-2.5-pro
  • gemini-image-3.1-flash
  • gemini-live-api
  • google-cloud-run
  • google-cloud-tasks
  • google-genai-sdk
  • next.js
  • node.js
  • puppeteer
  • python
  • secret-manager
  • tailwind-css
  • twick-sdk
  • typescript
  • vertex-ai
  • websocket