Gemini DJ Hackathon Submission

Project Name: Gemini DJ: Interactive YouTube Chat-to-Music Live

Elevator Pitch: Empower your YouTube community to co-create live music and visuals through real-time AI generation.


Inspiration

Streaming music is often a passive experience. We noticed that YouTube live streamers and their communities wanted more than a background playlist: they wanted to participate in the vibe. We were inspired to build a "Living Radio Station" where the AI isn't just a player but a performer that listens to the crowd, reacts to its mood, and co-creates a unique multisensory journey in real time.

What it does

Gemini DJ is a fully autonomous, AI-driven live DJ system.

  1. Interactive Requests: It monitors YouTube Live Chat for music requests or mood descriptions.
  2. AI Decision Making: A "DJ Brain" (Gemini 1.5 Flash) analyzes the chat to decide when and how to shift the music style.
  3. Real-time Generation: It leverages the Lyria RealTime API to generate seamless, high-fidelity music on the fly.
  4. Multimodal Output: Simultaneously, it generates contextual DJ voice announcements (TTS) and dynamic vinyl-style visual art that matches the new track's vibe.
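
As a rough illustration of how one "DJ Brain" decision could fan out into the three outputs listed above, here is a minimal sketch. The decision shape (`shift`, `style`, `trackName`, `djScript`) and the action names are assumptions made for this example; the write-up does not specify the real schema.

```javascript
// Hypothetical decision shape (not taken from the project's code):
// { shift: boolean, style: string, trackName: string, djScript: string }

/**
 * Turn one DJ Brain decision into an ordered list of output actions.
 * Pure function: it only describes what should happen, it calls no APIs.
 */
function planActions(decision) {
  if (!decision.shift) {
    // Keep the current music; the DJ only talks to the chat.
    return [{ type: "speak", text: decision.djScript }];
  }
  return [
    // 1. Steer the music generator toward the new style.
    { type: "setMusicPrompt", prompt: decision.style },
    // 2. Announce the change with the DJ voice (TTS).
    { type: "speak", text: decision.djScript },
    // 3. Refresh the vinyl-style artwork for the new track.
    { type: "updateArtwork", title: decision.trackName, style: decision.style },
  ];
}

const actions = planActions({
  shift: true,
  style: "warm lo-fi hip hop, mellow keys",
  trackName: "Rainy Window",
  djScript: "This one goes out to Alex, slowing things down.",
});
console.log(actions.map((a) => a.type));
```

Keeping the planning step free of side effects would let the audio, voice, and artwork handlers be swapped or tested independently.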

How we built it

We chose a modular, pure-frontend architecture to keep latency low and deployment simple.

  • Brain: Gemini 1.5 Flash handles semantic analysis of chat messages and generates creative track names/DJ scripts.
  • Music: The Lyria RealTime Engine serves as the core instrument, transforming AI prompts into live audio.
  • Visuals & Voice: Gemini’s multimodal capabilities generate the circular record artwork and the DJ's vocal persona.
  • Integration: Built with Vanilla JS and a custom OAuth helper (auth.html) to bridge the gap between secure Google authorization and OBS Browser Sources.
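
One way the hand-off from the OAuth helper to an OBS Browser Source could look, assuming the credentials travel as URL query parameters, is sketched below. The parameter names `access_token` and `api_key` and the overlay URL are hypothetical; the actual names used by auth.html are not given here.

```javascript
/**
 * Read credentials from the overlay URL.
 * Parameter names are assumptions made for this sketch.
 */
function readCredentialsFromUrl(href) {
  const params = new URL(href).searchParams;
  const accessToken = params.get("access_token");
  const apiKey = params.get("api_key");
  if (!accessToken || !apiKey) {
    throw new Error("Missing access_token or api_key in overlay URL");
  }
  return { accessToken, apiKey };
}

/**
 * Return the same URL without the credential parameters, so they can be
 * removed from the address bar once they have been read.
 */
function stripCredentials(href) {
  const url = new URL(href);
  url.searchParams.delete("access_token");
  url.searchParams.delete("api_key");
  return url.toString();
}

// Example overlay URL (hypothetical host and values).
const overlayUrl =
  "https://example.com/overlay.html?scene=main&access_token=abc123&api_key=key456";
const creds = readCredentialsFromUrl(overlayUrl);
const cleanUrl = stripCredentials(overlayUrl);
console.log(creds.accessToken, cleanUrl);

// In a browser the cleaned URL could then be applied with:
//   history.replaceState(null, "", stripCredentials(location.href));
```

Values in a URL can end up in history and logs, so reading them once and then stripping them is a cheap precaution in a design like this.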

Challenges we ran into

  • Real-time Synchronization: Coordinating audio generation with visual updates and TTS voice-overs without a backend server was a complex state-management challenge.
  • OBS Integration: OBS Browser Sources run in a restricted environment. We had to design an "Authorization Helper" to securely pass OAuth tokens and API keys to the browser source via URL parameters.
  • Vibe Preservation: We implemented "contextual cooldown" logic: when a request arrives too soon after a style change, the AI creatively explains why the current vibe still fits the request, which prevents chaotic music shifts while keeping the audience engaged.
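
The cooldown idea can be sketched as a small time gate. This is an illustrative reconstruction, not the project's code: the function names, the request shape, and the 60-second window are all assumptions.

```javascript
/**
 * Create a gate that allows at most one style shift per cooldown window.
 * Times are plain millisecond timestamps passed in by the caller, which
 * keeps the logic deterministic and easy to test.
 */
function createCooldownGate(cooldownMs) {
  let lastShiftAt = -Infinity; // no shift has happened yet
  return {
    canShift(now) {
      return now - lastShiftAt >= cooldownMs;
    },
    recordShift(now) {
      lastShiftAt = now;
    },
    remainingMs(now) {
      return Math.max(0, cooldownMs - (now - lastShiftAt));
    },
  };
}

/**
 * Decide what to do with a chat request: shift the music, or stay on the
 * current vibe and have the DJ explain why it still fits.
 */
function handleRequest(gate, request, now) {
  if (gate.canShift(now)) {
    gate.recordShift(now);
    return { action: "shift", style: request.style };
  }
  return { action: "explain", user: request.user, waitMs: gate.remainingMs(now) };
}

const gate = createCooldownGate(60000); // assumed 60 s window
const first = handleRequest(gate, { user: "Alex", style: "techno" }, 0);
const second = handleRequest(gate, { user: "Sam", style: "jazz" }, 10000);
const third = handleRequest(gate, { user: "Kim", style: "ambient" }, 60000);
console.log(first.action, second.action, third.action);
```

In the "explain" branch, the returned object would be handed to the language model so it can write a reply that acknowledges the user while keeping the current track.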

Accomplishments that we're proud of

  • Serverless Architecture: We successfully built a high-performance AI agent that performs heavy multimodal tasks (Audio/Image/Text) entirely within the user's browser.
  • Seamless Interaction: Seeing the DJ "hear" a user request, reply with their name, and change the music style within seconds feels like magic.
  • Lyria Mastery: We crafted a knowledge-based prompting system that guides the AI to use Lyria's instruments and moods effectively.

What we learned

We took a deep dive into real-time AI audio prompting. We learned how to structure musical prompts for Lyria to keep the output consistent, and how to use LLMs like Gemini as a bridge between vague human requests and machine-precise music parameters (BPM, density, tone).
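
To make the "bridge" idea concrete, the sketch below takes the text an LLM might return for a vague request and turns it into bounded music parameters. Everything specific here is an assumption for illustration: the JSON field names, the use of `brightness` as a stand-in for "tone", the numeric ranges, and the defaults should be checked against the Lyria documentation before use.

```javascript
// Assumed defaults and ranges, chosen for illustration only.
const DEFAULTS = { prompt: "chill lo-fi beats", bpm: 90, density: 0.5, brightness: 0.5 };

/** Coerce a value to a number inside [min, max], or use the fallback. */
function clamp(value, min, max, fallback) {
  const n = Number(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

/**
 * Parse the model's reply (expected to be JSON) into safe parameters.
 * Malformed or missing fields fall back to defaults instead of throwing,
 * so one bad reply cannot interrupt the stream.
 */
function toMusicParams(llmText) {
  let raw;
  try {
    raw = JSON.parse(llmText);
  } catch {
    raw = {};
  }
  if (raw === null || typeof raw !== "object") raw = {};
  const prompt =
    typeof raw.prompt === "string" && raw.prompt.trim() ? raw.prompt.trim() : DEFAULTS.prompt;
  return {
    prompt,
    bpm: Math.round(clamp(raw.bpm, 60, 200, DEFAULTS.bpm)),
    density: clamp(raw.density, 0, 1, DEFAULTS.density),
    brightness: clamp(raw.brightness, 0, 1, DEFAULTS.brightness),
  };
}

// A reply with an out-of-range BPM and a numeric string for density.
const params = toMusicParams('{"prompt":"dreamy synthwave","bpm":400,"density":"0.3"}');
console.log(params);
```

Validating on the client matters in a backend-free design, since nothing else sits between the model's reply and the music engine.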

What's next for Gemini DJ: Interactive YouTube Chat-to-Music Live

  • Enhanced Personalities: Allow streamers to choose different DJ personas (e.g., Chill Lo-fi Girl, Energetic Techno DJ).
  • Multi-platform Support: Expand beyond YouTube to Twitch and TikTok Live.
  • Crowd-Created Tracks: Allow users to export the "best moments" of a stream as high-quality, AI-generated MP3s for social sharing.
  • PWA Development: Transform the web app into a mobile-first Progressive Web App for DJs on the go.

Built With

  • audio
  • css3
  • github
  • html5
  • javascript
  • lyria-realtime-api
  • multimodal-ai
  • real-time
  • serverless
  • web-audio-api
  • youtube-data-api
  • youtube-oauth-2.0