Inspiration

The Creative Storyteller category challenges us to build agents that act like creative directors — weaving text, images, audio, and video into a single, cohesive stream. It's a powerful vision. However, we believe the next step isn't about AI replacing the storyteller, but rather empowering the human telling the story.

Currently, low-quality AI-generated content (often referred to as "AI slop") is flooding every platform. One-click video generators produce thousands of identical outputs daily, and platforms are already pushing back with regulations. As a result, what becomes truly valuable isn't the sheer volume of AI output, but human judgment, perspective, and voice.

WeaveCastStudio was built on this exact principle. The name "WeaveCast" reflects our core belief: the most compelling stories are woven together by humans and AI. Gemini acts as a tireless co-producer — crawling multilingual sources 24/7, fact-checking via Google Grounding, and generating images, narration, and video briefings. But deciding how to use that material, which story to tell, and what angle to take remains entirely the human broadcaster's craft.

This is especially crucial in journalism. Major newsrooms face growing pressure from external influences, while small and independent media lack the resources to monitor dozens of sources across multiple languages in real time. WeaveCastStudio bridges this gap. A single journalist using WeaveCast can accomplish what used to require an entire newsroom: monitoring global sources, verifying facts, and going live with professional multimedia coverage.

Although we demonstrate this concept through crisis journalism, the architecture is universal. Any YouTuber broadcasting from a home studio — whether covering tech, finance, politics, or gaming — can use WeaveCast to transform their solo operation into a fully supported broadcast, complete with AI-powered research, real-time fact-checking, and voice-controlled content delivery. The storyteller should always be human. The AI is just there to give them superpowers.

What it does

WeaveCastStudio is an AI-powered live broadcast assistant that:

  • Continuously crawls user-defined trusted sources (running 24/7 on GCE).
  • Fact-checks and scores source credibility using Gemini and Google Grounding.
  • Auto-generates video briefings with AI imagery and narration.
  • Lets broadcasters control their stream by voice through the Gemini Live API. For example, "Show me the UN report" → the AI selects and plays the relevant video or image; "What is this Arabic post claiming?" → the AI translates the post into the broadcaster's native language and answers aloud.
  • Inserts breaking news alerts directly into a live broadcast.
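
The voice-control flow in the list above can be sketched as a small dispatcher that receives a function call from the model and resolves it to a prepared asset. This is a minimal illustration, not the project's actual code: the `Asset` and `MediaLibrary` classes, the `show_media` tool name, and the keyword matching are all assumptions made for the example. In the real system, Gemini itself chooses which asset to play.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    path: str
    kind: str            # "image" or "video"
    keywords: set[str]   # lowercase words describing the asset


@dataclass
class MediaLibrary:
    """Hypothetical in-memory index of prepared briefing assets."""
    assets: list[Asset] = field(default_factory=list)

    def find(self, query: str) -> Optional[Asset]:
        # Pick the asset that shares the most keywords with the query.
        words = set(query.lower().split())
        best = max(self.assets, key=lambda a: len(a.keywords & words), default=None)
        if best is None or not (best.keywords & words):
            return None
        return best


def handle_tool_call(name: str, args: dict, library: MediaLibrary) -> dict:
    """Route one model-issued function call to a local action.

    Returns a JSON-serializable result that could be sent back to the model.
    """
    if name != "show_media":  # hypothetical tool name
        return {"ok": False, "error": f"unknown tool: {name}"}
    asset = library.find(args.get("query", ""))
    if asset is None:
        return {"ok": False, "error": "no matching asset"}
    # A real system would switch the broadcast display here.
    return {"ok": True, "path": asset.path, "kind": asset.kind}


library = MediaLibrary([
    Asset("briefings/un_report.mp4", "video", {"un", "report"}),
    Asset("images/region_map.png", "image", {"map", "region"}),
])
result = handle_tool_call("show_media", {"query": "UN report"}, library)
print(result)
```

Returning a structured result for failures as well as successes lets the model tell the broadcaster aloud when nothing matched, instead of failing silently on air.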

How we built it

  • Gemini API (google-genai SDK): Used for analysis, image generation via Nano Banana, and TTS.
  • Gemini Live API: Handles real-time voice interaction during the broadcast.
  • Gemini Function Calling: Automatically triggers and plays the appropriate image or video based on user voice commands.
  • DrissionPage: Manages automated web crawling for source monitoring.
  • FFmpeg: Powers the video composition pipeline.
  • Tkinter + OBS: Handles the live broadcast display and streaming interface.
  • Python server + CSS: Renders on-screen text dynamically during the live stream.
  • Google Compute Engine (GCE): Provides the backend for 24/7 data collection.
  • Google Cloud Storage (GCS): Stores the collected data so it is accessible from any environment.
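
As one illustration of the FFmpeg composition step listed above, the sketch below builds a command that turns a single generated image plus a narration track into a short video. It is an assumption about how such a step could look, not the project's pipeline: the function name and file names are hypothetical, and the encoder choices (libx264, AAC) are common defaults rather than confirmed settings.

```python
import shlex
from pathlib import Path


def build_briefing_command(image: Path, narration: Path, output: Path) -> list[str]:
    """Build an ffmpeg argv that shows one still image for the length of the narration."""
    return [
        "ffmpeg", "-y",                    # overwrite the output if it exists
        "-loop", "1", "-i", str(image),    # repeat the still image as a video stream
        "-i", str(narration),              # narration audio (e.g. TTS output)
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",             # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",                       # stop when the shorter input (the audio) ends
        str(output),
    ]


cmd = build_briefing_command(Path("frame.png"), Path("narration.wav"), Path("briefing.mp4"))
print(shlex.join(cmd))
# To actually render (requires ffmpeg on PATH): subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when file names contain spaces.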

Challenges we ran into

  • Handling asynchronous interrupts between voice commands and video playback.
  • Coordinating real-time Gemini Live API responses with OBS scene transitions.
  • Balancing automated features with the journalist's need for editorial control.
  • Managing hallucinations in images generated by Nano Banana. The model can misrender figures or notations in graphs and maps, which risks misleading viewers. Similarly, the video generation model Veo3 was difficult to use in news reporting settings. We could have simply labeled such output as "reference images" (a common practice in real-world journalism), but for this project demo we chose to prioritize factual accuracy over visual impact.
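
The first challenge above, a voice command arriving while a clip is still playing, can be sketched with plain asyncio task cancellation. This is a simplified illustration under stated assumptions: the `Player` class is hypothetical, `asyncio.sleep` stands in for real playback, and the actual system also has to coordinate with the Gemini Live API and OBS.

```python
import asyncio
from typing import Optional


class Player:
    """Hypothetical playback controller: at most one clip plays at a time."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None
        self.log: list = []

    async def _play(self, clip: str, seconds: float) -> None:
        self.log.append(f"start {clip}")
        try:
            await asyncio.sleep(seconds)  # stands in for real playback
            self.log.append(f"finish {clip}")
        except asyncio.CancelledError:
            self.log.append(f"interrupted {clip}")
            raise

    async def play(self, clip: str, seconds: float) -> None:
        # A new command pre-empts whatever is currently playing.
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task  # let the old clip clean up first
            except asyncio.CancelledError:
                pass
        self._task = asyncio.create_task(self._play(clip, seconds))

    async def wait(self) -> None:
        if self._task is not None:
            await self._task


async def main() -> list:
    player = Player()
    await player.play("intro.mp4", 10.0)
    await asyncio.sleep(0.01)                 # the intro is now playing
    await player.play("un_report.mp4", 0.01)  # a voice command arrives
    await player.wait()
    return player.log


log = asyncio.run(main())
print(log)
```

Awaiting the cancelled task before starting the next one ensures the old clip has released the display before the new clip takes over.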

Accomplishments that we're proud of

We successfully built a fully functional live broadcast system where a journalist can speak naturally, and the AI assistant instantly finds and displays the correct supporting content in real time.

What we learned

  • The Gemini Live API's function calling enabled powerful voice-driven workflows. For instance, given a vague instruction like "Explain this," Gemini displayed multiple relevant images in sequence while providing a coherent verbal explanation. The multi-function call support proved robust.
  • Google Grounding proved invaluable — not just for tracking breaking news, but for understanding the broader context of events spanning the past few weeks.

What's next for WeaveCastStudio

We want to get WeaveCastStudio into the hands of real journalists, streamers, VTubers, and podcasters to gather practical user feedback. Since we run our own YouTube channel, we also plan to dogfood the product in our own broadcasts to continuously iterate and improve its quality.
