Gemini DJ Hackathon Submission

Project Name: `Gemini DJ: Interactive YouTube Chat-to-Music Live`

Elevator Pitch: `Empower your YouTube community to co-create live music and visuals through real-time AI generation.`

Inspiration

Streaming music is often a passive experience. We noticed that YouTube live streamers and their communities wanted more than a background playlist: they wanted to help shape the vibe. That inspired us to build a "Living Radio Station" where the AI is not just a player but a performer that listens to the crowd, reacts to its moods, and co-creates a unique multisensory experience in real time.

What it does

Gemini DJ is a fully autonomous, AI-driven live DJ system.

Interactive Requests: It monitors YouTube Live Chat for music requests or mood descriptions.
AI Decision Making: A "DJ Brain" (Gemini 1.5 Flash) analyzes the chat to decide when and how to shift the music style.
Real-time Generation: It leverages the Lyria RealTime API to generate seamless, high-fidelity music on the fly.
Multimodal Output: Simultaneously, it generates contextual DJ voice announcements (TTS) and dynamic vinyl-style visual art that matches the new track's vibe.
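As a rough illustration of the chat-monitoring step, here is a minimal sketch that fetches one page of YouTube Live Chat through the YouTube Data API's `liveChatMessages.list` endpoint and keeps only messages that look like DJ requests. The endpoint, the `part=snippet,authorDetails` parameter, and the `nextPageToken` / `pollingIntervalMillis` response fields come from the YouTube Data API. The `!dj` command prefix, the 5-second fallback, and the function names are hypothetical; the real system may instead pass every message to Gemini and let it detect requests and mood descriptions.

```javascript
// Sketch: poll YouTube Live Chat and pull out DJ requests.
// Assumption: requests are marked with a hypothetical "!dj" prefix.
const LIVE_CHAT_URL = "https://www.googleapis.com/youtube/v3/liveChat/messages";

// Turn one liveChatMessages.list response into DJ requests plus polling hints.
function extractChatRequests(response, prefix = "!dj") {
  const requests = [];
  for (const item of response.items ?? []) {
    const snippet = item.snippet ?? {};
    // Only plain text messages carry free-form requests.
    if (snippet.type !== "textMessageEvent") continue;
    const text = (snippet.displayMessage ?? "").trim();
    if (!text.toLowerCase().startsWith(prefix)) continue;
    requests.push({
      author: item.authorDetails?.displayName ?? "someone",
      text: text.slice(prefix.length).trim(),
    });
  }
  return {
    requests,
    nextPageToken: response.nextPageToken,
    // The API reports how long to wait before polling again;
    // 5000 ms is a hypothetical fallback if the field is missing.
    waitMs: response.pollingIntervalMillis ?? 5000,
  };
}

// Fetch one page of chat messages using an OAuth access token.
async function pollLiveChat(liveChatId, accessToken, pageToken) {
  const url = new URL(LIVE_CHAT_URL);
  url.searchParams.set("liveChatId", liveChatId);
  url.searchParams.set("part", "snippet,authorDetails");
  if (pageToken) url.searchParams.set("pageToken", pageToken);
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!res.ok) throw new Error(`liveChatMessages.list failed: ${res.status}`);
  return extractChatRequests(await res.json());
}
```

A caller would wait `waitMs`, call `pollLiveChat` again with the returned `nextPageToken`, and hand each request to the DJ Brain. Honoring `pollingIntervalMillis` rather than a fixed fast timer follows the API's own guidance on polling frequency.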

How we built it

We chose a modular, pure-frontend architecture to keep latency low (no round trip through a server of our own) and to make deployment simple.

Brain: Gemini 1.5 Flash performs semantic analysis of chat messages and generates creative track names and DJ scripts.
Music: The Lyria RealTime Engine serves as the core instrument, transforming AI prompts into live audio.
Visuals & Voice: Gemini's multimodal capabilities generate the circular record artwork and the DJ's vocal persona.
Integration: Built with Vanilla JS and a custom OAuth helper (auth.html) that bridges secure Google authorization and OBS browser sources.
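A minimal sketch of how a helper page like auth.html could hand credentials to the OBS page through the URL. The parameter names (`access_token`, `api_key`, `live_chat_id`) and both function names are hypothetical, not taken from the project; the sketch only relies on the standard `URL` and `URLSearchParams` APIs available in browsers and Node.

```javascript
// Hypothetical parameter names the overlay expects.
const REQUIRED = ["access_token", "api_key", "live_chat_id"];

// auth.html side: after Google sign-in, build the URL to paste into OBS.
// Values go in the fragment (#...), which browsers do not send to the server.
function buildOverlayUrl(baseUrl, config) {
  const url = new URL(baseUrl);
  url.hash = new URLSearchParams(config).toString();
  return url.toString();
}

// Overlay side (inside the OBS browser source): read the values back.
// Accepts both ?query and #fragment parameters; the fragment wins on conflicts.
function readOverlayConfig(href) {
  const url = new URL(href);
  const params = new URLSearchParams(url.search);
  for (const [key, value] of new URLSearchParams(url.hash.slice(1))) {
    params.set(key, value);
  }
  const config = {};
  for (const name of REQUIRED) {
    const value = params.get(name);
    if (!value) throw new Error(`Missing URL parameter: ${name}`);
    config[name] = value;
  }
  return config;
}
```

In the overlay this would be called as `readOverlayConfig(window.location.href)`. Putting the values in the fragment is a design choice of this sketch: fragments are not sent in HTTP requests, so the token stays out of the static host's access logs. It still sits in the OBS scene settings, so the URL should be treated as a secret.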

Challenges we ran into

Real-time Synchronization: Coordinating audio generation with visual updates and TTS voice-overs without a backend server was a complex state-management challenge.
OBS Integration: OBS browser sources run in a restricted environment, so we designed an "Authorization Helper" that passes OAuth tokens and API keys to the overlay via URL parameters.
Vibe Preservation: We implemented "contextual cooldown" logic: when a request arrives while the current style is still settling in, the AI creatively explains why the current vibe already fits the request. This prevents chaotic music shifts while keeping the audience engaged.
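The cooldown idea can be sketched as a small gate with an injectable clock. This is an illustration under assumptions rather than the project's logic: the `VibeCooldown` class, the 60-second default, and the `cooldownPrompt` wording are all hypothetical.

```javascript
// Hypothetical cooldown gate: decide whether a chat request may change the
// music now, or whether the DJ should explain why the current vibe still fits.
class VibeCooldown {
  constructor(cooldownMs = 60_000, now = () => Date.now()) {
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, handy for tests
    this.lastShiftAt = -Infinity; // no shift yet, so the first request passes
  }

  // Returns { action: "shift" } or { action: "explain", remainingMs }.
  decide() {
    const t = this.now();
    const elapsed = t - this.lastShiftAt;
    if (elapsed >= this.cooldownMs) {
      this.lastShiftAt = t;
      return { action: "shift" };
    }
    return { action: "explain", remainingMs: this.cooldownMs - elapsed };
  }
}

// Hypothetical prompt for the "explain" branch: Gemini writes a short DJ line
// that names the requester and ties the request to the music already playing.
function cooldownPrompt(author, request, currentStyle) {
  return (
    `You are a live radio DJ. ${author} asked for "${request}", but the ` +
    `current style is "${currentStyle}". In one or two upbeat sentences, ` +
    `address ${author} by name and explain why the current vibe already fits.`
  );
}
```

When `decide()` returns `"explain"`, the app would send `cooldownPrompt(...)` to Gemini and voice the reply through TTS instead of changing the Lyria prompt.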

Accomplishments that we're proud of

Serverless Architecture: We built an AI agent that orchestrates heavy multimodal tasks (audio, image, and text) entirely from the user's browser, with no backend server of our own.
Seamless Interaction: Seeing the DJ "hear" a user request, reply with their name, and change the music style within seconds feels like magic.
Lyria Mastery: We crafted a knowledge-based prompting system that guides the AI to use Lyria's instruments and moods effectively.
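One way to picture a knowledge-based prompting layer is a small lookup table that expands mood words into Lyria-friendly descriptors. In this sketch the `KNOWLEDGE` table, its entries, and the weighting scheme are hypothetical. The `{ text, weight }` objects are meant to mirror the weighted prompts that Lyria RealTime sessions accept, which should be checked against the current Lyria RealTime documentation.

```javascript
// Hypothetical mini knowledge base: mood keywords -> descriptive prompt terms.
const KNOWLEDGE = {
  chill: ["Lo-Fi Hip Hop", "Rhodes Piano", "Mellow"],
  hype: ["Techno", "TR-909 Drum Machine", "Upbeat"],
  sad: ["Ambient", "Cello", "Melancholic"],
  party: ["Funk", "Slap Bass", "Danceable"],
};

// Build weighted prompts from a free-text request: every matched mood adds its
// descriptors at full weight, and the raw request is kept at a lower weight.
function buildWeightedPrompts(request) {
  const words = request.toLowerCase().split(/[^a-z]+/);
  const prompts = [];
  for (const [mood, descriptors] of Object.entries(KNOWLEDGE)) {
    if (!words.includes(mood)) continue;
    for (const text of descriptors) prompts.push({ text, weight: 1.0 });
  }
  // Keep the listener's own words so unmatched ideas still reach the model.
  prompts.push({ text: request.trim(), weight: prompts.length ? 0.5 : 1.0 });
  return prompts;
}
```

For example, `buildWeightedPrompts("something chill please")` yields the three `chill` descriptors plus the original request at weight 0.5, while an unmatched request such as `"jazz"` is passed through on its own at full weight.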

What we learned

We took a deep dive into real-time AI audio prompting. We learned how to structure musical prompts for Lyria to keep the output consistent, and how to use LLMs like Gemini as a bridge between vague human requests and precise machine parameters (BPM, density, tone).
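The bridge from vague requests to precise parameters could look like the sketch below: Gemini is asked to reply with JSON, and the app validates and clamps whatever comes back before it reaches the music engine. The reply schema (`bpm`, `density`, `tone`), the tone-to-brightness table, and the defaults are hypothetical. The ranges assume Lyria RealTime's `bpm`, `density`, and `brightness` settings and should be verified against its documentation.

```javascript
// Assumed parameter ranges; verify against the current Lyria RealTime docs.
const RANGES = { bpm: [60, 200], density: [0, 1], brightness: [0, 1] };

// Hypothetical mapping from the "tone" word Gemini returns to a brightness value.
const TONE_TO_BRIGHTNESS = new Map([
  ["dark", 0.2],
  ["warm", 0.4],
  ["neutral", 0.5],
  ["bright", 0.8],
]);

const clamp = (x, [lo, hi]) => Math.min(hi, Math.max(lo, x));

// Accept finite numbers or numeric strings; anything else uses the fallback.
function toNumber(value, fallback) {
  const n = typeof value === "string" && value.trim() !== "" ? Number(value) : value;
  return typeof n === "number" && Number.isFinite(n) ? n : fallback;
}

// Turn Gemini's reply (hypothetical schema: "bpm", "density", "tone")
// into clamped music parameters, falling back to defaults on bad input.
function toMusicParams(replyText, defaults = { bpm: 100, density: 0.5, brightness: 0.5 }) {
  let raw;
  try {
    // Models sometimes wrap JSON in markdown code fences; strip them first.
    raw = JSON.parse(replyText.replace(/`{3}(?:json)?/g, "").trim());
  } catch {
    return { ...defaults };
  }
  if (typeof raw !== "object" || raw === null) return { ...defaults };

  const tone = String(raw.tone ?? "").toLowerCase();
  return {
    bpm: Math.round(clamp(toNumber(raw.bpm, defaults.bpm), RANGES.bpm)),
    density: clamp(toNumber(raw.density, defaults.density), RANGES.density),
    brightness: clamp(TONE_TO_BRIGHTNESS.get(tone) ?? defaults.brightness, RANGES.brightness),
  };
}
```

Client-side validation matters in a pure-frontend design: with no server in between, a malformed model reply would otherwise go straight to the audio session.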

What's next for Gemini DJ: Interactive YouTube Chat-to-Music Live

Enhanced Personalities: Allow streamers to choose different DJ personas (e.g., Chill Lo-fi Girl, Energetic Techno DJ).
Multi-platform Support: Expand beyond YouTube to Twitch and TikTok Live.
Crowd-Created Tracks: Allow users to export the "best moments" of a stream as high-quality, AI-generated MP3s for social sharing.
PWA Development: Transform the web app into a mobile-first Progressive Web App for DJs on the go.

Built With

audio
css3
github
html5
javascript
lyria-realtime-api
multimodal-ai
real-time
serverless
web-audio-api
youtube-data-api
youtube-oauth-2.0

Updates

建宏林 started this project — Jan 18, 2026 10:27 PM EST
