Gemini DJ Hackathon Submission
Project Name: Gemini DJ: Interactive YouTube Chat-to-Music Live
Elevator Pitch: Empower your YouTube community to co-create live music and visuals through real-time AI generation.
Inspiration
Streaming music is often a passive experience. We noticed that YouTube live streamers and their communities wanted more than a background playlist: they wanted to participate in the vibe. We were inspired to build a "Living Radio Station" where the AI isn't just a player but a performer that listens to the crowd, reacts to its mood, and co-creates a unique multisensory journey in real time.
What it does
Gemini DJ is a fully autonomous, AI-driven live DJ system.
- Interactive Requests: It monitors YouTube Live Chat for music requests or mood descriptions.
- AI Decision Making: A "DJ Brain" (Gemini 1.5 Flash) analyzes the chat to decide when and how to shift the music style.
- Real-time Generation: It leverages the Lyria RealTime API to generate seamless, high-fidelity music on the fly.
- Multimodal Output: Simultaneously, it generates contextual DJ voice announcements (TTS) and dynamic vinyl-style visual art that matches the new track's vibe.
How we built it
We chose a modular, pure-frontend architecture to ensure the lowest possible latency and ease of deployment.
- Brain: Gemini 1.5 Flash handles semantic analysis of chat messages and generates creative track names/DJ scripts.
- Music: The Lyria RealTime Engine serves as the core instrument, transforming AI prompts into live audio.
- Visuals & Voice: Gemini’s multimodal capabilities generate the circular record artwork and the DJ's vocal persona.
- Integration: Built with Vanilla JS and a custom OAuth helper (auth.html) to bridge the gap between secure Google authorization and OBS Browser Sources.
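As a rough illustration of how an authorization helper could hand credentials to an OBS Browser Source, here is a minimal sketch that builds a source URL and reads the values back on the player page. The parameter names `token` and `key` and both function names are hypothetical; the project's actual helper may differ.

```javascript
// Hypothetical query parameter names; the real helper may use others.
const TOKEN_PARAM = "token";
const KEY_PARAM = "key";

// Build the URL a streamer would paste into an OBS Browser Source.
// URL.searchParams handles percent-encoding of the values.
function buildSourceUrl(baseUrl, accessToken, apiKey) {
  const url = new URL(baseUrl);
  url.searchParams.set(TOKEN_PARAM, accessToken);
  url.searchParams.set(KEY_PARAM, apiKey);
  return url.toString();
}

// Read the credentials back on the player page.
// Throws if either value is missing so the page can show a clear error.
function readCredentials(href) {
  const params = new URL(href).searchParams;
  const accessToken = params.get(TOKEN_PARAM);
  const apiKey = params.get(KEY_PARAM);
  if (!accessToken || !apiKey) {
    throw new Error("Missing credentials in URL");
  }
  return { accessToken, apiKey };
}
```

In the browser, the player page would call `readCredentials(window.location.href)` once at startup.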
Challenges we ran into
- Real-time Synchronization: Coordinating audio generation with visual updates and TTS voice-overs without a backend server was a complex state-management challenge.
- OBS Integration: OBS Browser Sources run in a restricted environment. We had to design a dedicated "Authorization Helper" to securely pass OAuth tokens and API keys via URL parameters.
- Vibe Preservation: We implemented "contextual cooldown" logic: when a request arrives too soon after a style change, the AI creatively explains why the current vibe still fits the request, preventing chaotic music shifts while keeping the audience engaged.
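The "contextual cooldown" idea can be sketched as a small gate that either allows a style shift or asks the DJ to hold the current vibe. This is only an illustration: the 60-second window, the `createVibeGate` name, and the shape of the request and decision objects are all assumptions, not the project's actual code.

```javascript
// Assumed cooldown window; the project's real value is not stated.
const COOLDOWN_MS = 60000;

// Returns a decide(request, nowMs) function that remembers the last shift.
function createVibeGate(cooldownMs = COOLDOWN_MS) {
  let lastShiftAt = -Infinity; // no shift has happened yet

  return function decide(request, nowMs) {
    const elapsed = nowMs - lastShiftAt;
    if (elapsed >= cooldownMs) {
      // Enough time has passed: accept the request and restart the window.
      lastShiftAt = nowMs;
      return { action: "shift", user: request.user, style: request.style };
    }
    // Still cooling down: keep the music and let the DJ reply in character.
    return {
      action: "hold",
      user: request.user,
      style: request.style,
      remainingMs: cooldownMs - elapsed,
    };
  };
}
```

On a `hold` decision, the app would still pass the user's name and requested style to the DJ Brain so the spoken reply can explain why the current vibe fits, rather than silently ignoring the request.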
Accomplishments that we're proud of
- Serverless Architecture: We successfully built a high-performance AI agent that performs heavy multimodal tasks (Audio/Image/Text) entirely within the user's browser.
- Seamless Interaction: Seeing the DJ "hear" a user request, reply with their name, and change the music style within seconds feels like magic.
- Lyria Mastery: We crafted a knowledge-based prompting system that guides the AI to use Lyria's instruments and moods effectively.
What we learned
We dove deep into real-time AI audio prompting. We learned how to structure musical prompts for Lyria to ensure consistency, and how to use LLMs like Gemini as a bridge between vague human requests and machine-precise music parameters (BPM, density, tone).
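To make the "bridge" step concrete, here is a minimal sketch of how an LLM's JSON reply might be validated before it reaches the music engine. The field names, the numeric limits, and the defaults are assumptions chosen for illustration; check them against the Lyria RealTime documentation before relying on them.

```javascript
// Assumed limits and defaults, for illustration only.
const LIMITS = { bpm: [60, 200], density: [0, 1] };
const DEFAULTS = { bpm: 120, density: 0.5, tone: "neutral" };

function clamp(value, [min, max]) {
  return Math.min(max, Math.max(min, value));
}

// Turn the LLM's text reply into safe, bounded music parameters.
// Falls back to defaults when the reply is not valid JSON or a field is unusable.
function normalizeMusicParams(llmText) {
  let raw;
  try {
    raw = JSON.parse(llmText);
  } catch {
    return { ...DEFAULTS };
  }
  if (raw === null || typeof raw !== "object") {
    return { ...DEFAULTS };
  }
  const bpm = Number.isFinite(raw.bpm)
    ? Math.round(clamp(raw.bpm, LIMITS.bpm))
    : DEFAULTS.bpm;
  const density = Number.isFinite(raw.density)
    ? clamp(raw.density, LIMITS.density)
    : DEFAULTS.density;
  const tone =
    typeof raw.tone === "string" && raw.tone.trim() !== ""
      ? raw.tone.trim()
      : DEFAULTS.tone;
  return { bpm, density, tone };
}
```

Clamping on the client keeps a creative but out-of-range reply (for example, a BPM of 300) from producing an invalid request, while the defaults keep the stream playing if the model returns malformed output.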
What's next for Gemini DJ: Interactive YouTube Chat-to-Music Live
- Enhanced Personalities: Allow streamers to choose different DJ personas (e.g., Chill Lo-fi Girl, Energetic Techno DJ).
- Multi-platform Support: Expand beyond YouTube to Twitch and TikTok Live.
- Crowd-Created Tracks: Allow users to export the "best moments" of a stream as high-quality, AI-generated MP3s for social sharing.
- PWA Development: Transform the web app into a mobile-first Progressive Web App for DJs on the go.
Built With
- audio
- css3
- github
- html5
- javascript
- lyria-realtime-api
- multimodal-ai
- real-time
- serverless
- web-audio-api
- youtube-data-api
- youtube-oauth-2.0