Inspiration

Content creation often means piecing together rough notes, images, and voice memos into marketing assets. Inspired by the challenge, we wanted to move beyond simple text-in/text-out interactions and build an agent that "thinks and creates like a creative director": one that takes scattered multimodal inputs and weaves them into a single, cohesive campaign.

What it does

Content Storyteller is a multimodal storytelling platform that generates rich, mixed-media marketing assets in one go. By acting as an AI Creative Director, it seamlessly weaves together text, images, audio, and video in a single, fluid output stream.

Key features include:

  • Interleaved Multimodal Generation: Leverages Gemini to generate copy, visuals, storyboards, voiceover scripts, and short promo videos as a unified package.
  • Trend Analyzer: AI-powered trend discovery across social platforms with one-click handoff to the content pipeline.
  • Smart Pipeline Orchestration: An output-intent inference module that determines which mixed-media assets to generate based on user intent and context.
  • Live Agent Voice Assistant: Integrates voice interaction for natural, real-time creative direction.

How we built it

To meet the mandatory technical requirements, the platform is built entirely on Google Cloud:

  • AI Engine: We utilized Vertex AI (Gemini 2.5 Flash and Pro), specifically leveraging Gemini's native interleaved/mixed output capabilities to generate the cohesive flow of text, images, and video.
  • Backend Compute: An Express API Service and an asynchronous Worker Service, deployed as serverless containers on Cloud Run.
  • State & Storage: Firestore for real-time job state management (streamed to the frontend via SSE) and Cloud Storage for media uploads and final interleaved asset bundles.
  • Messaging: Pub/Sub handles asynchronous job dispatch between the API and Worker.
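The flow between these components can be sketched as follows. This is a minimal illustration, not the project's actual code: `JobStore` and `JobQueue` are hypothetical in-memory stand-ins for Firestore and Pub/Sub, and the job fields, status values, and stage names are assumptions.

```typescript
// Hypothetical job shape; field and status names are assumptions for illustration.
type JobStatus = "queued" | "processing" | "completed";

interface Job {
  id: string;
  prompt: string;
  status: JobStatus;
  completedStages: string[];
}

// In-memory stand-in for the Firestore job collection.
class JobStore {
  private jobs = new Map<string, Job>();

  save(job: Job): void {
    // Store a copy so later mutations by the caller do not leak into stored state.
    this.jobs.set(job.id, { ...job, completedStages: [...job.completedStages] });
  }

  get(id: string): Job | undefined {
    const job = this.jobs.get(id);
    return job ? { ...job, completedStages: [...job.completedStages] } : undefined;
  }
}

// In-memory stand-in for a Pub/Sub topic with a single subscriber.
class JobQueue {
  private pending: string[] = [];

  publish(jobId: string): void {
    this.pending.push(jobId);
  }

  pull(): string | undefined {
    return this.pending.shift();
  }
}

// API side: persist the job as "queued", publish its id, and return immediately.
function submitJob(store: JobStore, queue: JobQueue, id: string, prompt: string): Job {
  const job: Job = { id, prompt, status: "queued", completedStages: [] };
  store.save(job);
  queue.publish(id);
  return job;
}

// Worker side: pull one job id and record progress after every stage.
function runWorkerOnce(store: JobStore, queue: JobQueue, stages: string[]): Job | undefined {
  const jobId = queue.pull();
  if (jobId === undefined) return undefined;
  const job = store.get(jobId);
  if (!job) return undefined;

  job.status = "processing";
  store.save(job);
  for (const stage of stages) {
    // A real worker would call the generation model for this stage here.
    job.completedStages.push(stage);
    store.save(job);
  }
  job.status = "completed";
  store.save(job);
  return job;
}
```

Because the API only writes the job record and publishes its id, a request can return right away while the worker runs the slow generation stages and saves progress after each one.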

Challenges we ran into

Handling asynchronous, multi-stage AI generation workflows to create a truly "fluid output stream" required robust orchestration. We solved this by implementing a Pub/Sub-driven worker service combined with Firestore for state tracking, which allowed the frontend to stream real-time updates as the interleaved content (copy + visuals + video) was being generated.
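As a rough sketch of the streaming side, each job state change can be serialized as one Server-Sent Events frame before it is written to the open HTTP response. The `JobUpdate` shape, the `toSseFrame` helper, and the `job-update` event name are hypothetical; only the frame layout (an `event:` line, a `data:` line, and a terminating blank line) comes from the SSE format itself.

```typescript
// Hypothetical update payload; the real job document fields are not described in this write-up.
interface JobUpdate {
  jobId: string;
  status: string;
  stage?: string;
}

// Serialize one update as an SSE frame: event name, a single JSON data line, blank-line terminator.
function toSseFrame(eventName: string, update: JobUpdate): string {
  // JSON.stringify without an indent argument emits no newlines, so one data line is enough.
  return `event: ${eventName}\ndata: ${JSON.stringify(update)}\n\n`;
}

const frame = toSseFrame("job-update", {
  jobId: "job-1",
  status: "processing",
  stage: "images",
});
```

In an Express handler, the response would typically be sent with a `Content-Type` of `text/event-stream`, and each frame written to it as the job state changes.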

Accomplishments that we're proud of

  • True Interleaved Generation: We successfully leveraged Vertex AI and Gemini 2.5 to move beyond simple text outputs, creating a system that seamlessly weaves together copy, visuals, storyboards, and video briefs into a single cohesive asset bundle.
  • Smart Pipeline Orchestration: We are proud of our "Output-Intent Inference" system, which intelligently scans user prompts, platform defaults, and trend context to dynamically determine the optimal mix of media (e.g., video + image vs. copy-only) on the fly.
  • Live Agent Integration: We implemented the AI Creative Director voice assistant, complete with Vertex AI function calling, native audio output, and an animated equalizer, to provide an interactive user experience.
  • Robust Cloud Architecture: We built a fully serverless, event-driven backend using Cloud Run, Pub/Sub, and Firestore that handles asynchronous, multi-stage AI generation while streaming real-time status updates back to the client.
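To make the output-intent idea concrete, here is a small rule-based sketch. It is an assumption-heavy illustration rather than the system described above: the asset types, the `PLATFORM_DEFAULTS` table, the keyword cues, and the `inferOutputIntent` function are all hypothetical, and the real module may rely on a model instead of fixed rules.

```typescript
type AssetType = "copy" | "image" | "video" | "voiceover";

interface IntentInput {
  prompt: string;
  platform?: string;          // e.g. "instagram"; optional
  trendFormats?: AssetType[]; // formats suggested by trend context; optional
}

// Hypothetical platform defaults, invented for this example.
const PLATFORM_DEFAULTS: Record<string, AssetType[]> = {
  instagram: ["copy", "image", "video"],
  linkedin: ["copy", "image"],
};

// Hypothetical keyword cues that map prompt wording to an asset type.
const PROMPT_CUES: Array<[RegExp, AssetType]> = [
  [/\b(video|reel|promo clip)\b/i, "video"],
  [/\b(image|photo|visual|banner)\b/i, "image"],
  [/\b(voiceover|narration|audio)\b/i, "voiceover"],
];

function inferOutputIntent(input: IntentInput): AssetType[] {
  // An explicit "copy-only" or "text only" request overrides every other signal.
  if (/\b(copy|text)[- ]only\b/i.test(input.prompt)) return ["copy"];

  // Copy is always produced; the other signals only add asset types.
  const assets = new Set<AssetType>(["copy"]);

  const defaults = input.platform
    ? PLATFORM_DEFAULTS[input.platform.toLowerCase()] ?? []
    : [];
  for (const asset of defaults) assets.add(asset);

  for (const asset of input.trendFormats ?? []) assets.add(asset);

  for (const [pattern, asset] of PROMPT_CUES) {
    if (pattern.test(input.prompt)) assets.add(asset);
  }

  return Array.from(assets);
}
```

With these made-up rules, a LinkedIn request that mentions a video would add "video" to the platform's default copy and image, while a prompt that asks for copy only would skip media generation entirely.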

What we learned

  • Mastering Multimodal Prompting: We learned how to effectively structure prompts and utilize Gemini's native interleaved capabilities to ensure the AI maintained a consistent creative voice across text, image, and video generation stages.
  • Asynchronous AI Orchestration: Building this taught us how to manage long-running AI tasks gracefully. We learned how to use Pub/Sub to decouple our API from our worker service, and how to stream Firestore job state to the frontend over Server-Sent Events (SSE) so the UI stays in sync with the backend's generation progress.
  • Infrastructure as Code (IaC): We deepened our understanding of GCP by using Terraform to declaratively provision our entire infrastructure, from Cloud Storage buckets to IAM service accounts and Secret Manager entries.

What's next for Content Storyteller

  • Direct Social Media Integration: Moving beyond the current ZIP export bundle to allow one-click publishing directly to platforms like LinkedIn, X/Twitter, and Instagram using their respective APIs.
  • Autonomous Trend Hijacking: Enhancing the Trend Analyzer to not just discover trends but also autonomously propose and draft complete mixed-media campaigns based on momentum scoring, then send them to the user for final voice approval via the Live Agent.
  • Expanded Asset Types: Integrating more specialized audio and video generation models into our centralized Model Router to render final, high-fidelity video files directly within the platform.
