Inspiration
Independent music production often stalls at the "blank page" phase. Song-generation tools exist, but they lack a cohesive creative vision. I wanted to build a world-class Creative Producer who acts as a true studio partner: someone who understands music history, tracks current trends, and builds deep narratives before a single note is played.
What it does
The Creative Producer is a multimodal AI agent that provides real-time, face-to-face creative consultation:
Real-time Brainstorming: Using the Gemini Multimodal Live API and Simli, she holds a two-way voice conversation with latency low enough to feel like talking to a person.
Vector-Backed Memory: She cross-references every new idea against a streaming index of 768-dimensional embeddings in Vertex AI. This "Gap Analysis" checks that each proposed album concept is sufficiently distant, by vector similarity, from everything already in the repertoire.
Structured Output: Once a concept is finalized, she "prints" a full production package, including narrative, musical style tags, and visual art prompts.
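The "Gap Analysis" above amounts to a nearest-neighbor check: find the most similar existing concept and reject the candidate if it is too close. The sketch below does this by brute force in memory using cosine similarity; in the project the lookup is served by the Vertex AI index instead. The names `gapAnalysis` and `Concept`, and the 0.85 threshold, are hypothetical illustrations, not the project's actual code or settings.

```typescript
// Hypothetical sketch of a "Gap Analysis" novelty check.
// Assumptions: embeddings are plain number arrays of equal length (768 in the
// project), and NOVELTY_THRESHOLD is an illustrative value, not the real one.
const NOVELTY_THRESHOLD = 0.85;

type Concept = { id: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Finds the closest existing concept and flags the candidate as novel
// when even that closest match is below the similarity threshold.
function gapAnalysis(
  candidate: number[],
  repertoire: Concept[],
  threshold: number = NOVELTY_THRESHOLD,
): { nearestId: string | null; similarity: number; isNovel: boolean } {
  let nearestId: string | null = null;
  let best = -Infinity;
  for (const concept of repertoire) {
    const sim = cosineSimilarity(candidate, concept.embedding);
    if (sim > best) {
      best = sim;
      nearestId = concept.id;
    }
  }
  // With an empty repertoire, any candidate counts as novel.
  if (nearestId === null) return { nearestId: null, similarity: 0, isNovel: true };
  return { nearestId, similarity: best, isNovel: best < threshold };
}
```

Cosine similarity ignores vector length, which suits text embeddings where direction carries the meaning; a real deployment would tune the threshold against known near-duplicate concepts.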
How we built it
To ensure a production-grade experience, I focused on a stateful, high-throughput architecture:
Frontend: Built with React and the Simli-SDK for lifelike avatar interaction.
Agent Core: Utilizes the Gemini Multimodal Live API for real-time reasoning and Gemini 3.1 Pro for complex creative planning.
The "Brain" (Vector Search): I implemented a Vertex AI Streaming Index, which keeps similarity checks at about 6 ms on average, so the producer can "remember" her entire history instantly.
Public/Private Split: The public SimliGemini repository contains the core agent component. It uses mocked data for the recent-concepts display to demonstrate the interface, while keeping a real, live connection to the Vertex AI vector memory.
Key Numbers: 768-dimensional embeddings; about 6 ms average similarity lookup; roughly 100 unique album concepts currently indexed.
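A similarity check against a deployed Vertex AI Vector Search index goes through the REST `findNeighbors` method on the index endpoint. The sketch below only builds the HTTP request so it has no runtime dependencies; the config values, the `buildFindNeighborsRequest` helper, and the default of 5 neighbors are hypothetical placeholders, and you would send the result with `fetch` or any HTTP client using a valid OAuth access token.

```typescript
// Hypothetical request builder for a Vertex AI Vector Search query.
// Assumptions: the index is deployed to a public endpoint; every field in
// FindNeighborsConfig is a placeholder taken from your own deployment.
interface FindNeighborsConfig {
  publicDomain: string; // the index endpoint's public domain name
  projectNumber: string;
  location: string;
  endpointId: string;
  deployedIndexId: string;
}

interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildFindNeighborsRequest(
  cfg: FindNeighborsConfig,
  embedding: number[],
  accessToken: string,
  neighborCount = 5,
): HttpRequest {
  const url =
    `https://${cfg.publicDomain}/v1/projects/${cfg.projectNumber}` +
    `/locations/${cfg.location}/indexEndpoints/${cfg.endpointId}:findNeighbors`;
  const body = {
    deployedIndexId: cfg.deployedIndexId,
    queries: [
      {
        // The query vector; datapointId is just a label for the query.
        datapoint: { datapointId: "query", featureVector: embedding },
        neighborCount,
      },
    ],
  };
  return {
    method: "POST",
    url,
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

The neighbors in the response carry a `distance` whose meaning depends on the distance measure configured on the index (for example dot product or cosine), so a novelty threshold has to be expressed in those units.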
Challenges we ran into
Streaming vs. Batch: We migrated from a static batch index to a streaming index so the producer can learn and update her memory in real time without downtime.
Quota Management: Balancing usage between Gemini 3.1 Pro and Gemini 2.5 Pro within API quota limits.
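With a streaming index, a newly finalized concept can be written to memory immediately through the REST `upsertDatapoints` method, which Vertex AI accepts only on indexes created with the `STREAM_UPDATE` update method. The sketch below builds that request; the project, location, and index IDs, the `buildUpsertRequest` helper, and the dimension guard are hypothetical illustrations rather than the project's code.

```typescript
// Hypothetical request builder for adding a concept to a streaming index.
// Assumptions: the index was created with indexUpdateMethod STREAM_UPDATE and
// 768 dimensions; all IDs are placeholders from your own project.
const EXPECTED_DIMENSIONS = 768;

interface IndexConfig {
  projectId: string;
  location: string;
  indexId: string;
}

interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildUpsertRequest(
  cfg: IndexConfig,
  conceptId: string,
  embedding: number[],
  accessToken: string,
): HttpRequest {
  // Reject vectors that would not match the index's configured dimensions.
  if (embedding.length !== EXPECTED_DIMENSIONS) {
    throw new Error(`expected ${EXPECTED_DIMENSIONS} dims, got ${embedding.length}`);
  }
  const url =
    `https://${cfg.location}-aiplatform.googleapis.com/v1/projects/${cfg.projectId}` +
    `/locations/${cfg.location}/indexes/${cfg.indexId}:upsertDatapoints`;
  // Upserting with an existing datapointId overwrites that datapoint.
  const body = { datapoints: [{ datapointId: conceptId, featureVector: embedding }] };
  return {
    method: "POST",
    url,
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

Using the album concept's own ID as the `datapointId` makes re-printing a revised concept an overwrite rather than a duplicate entry.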
Accomplishments that we're proud of
We successfully created a "human-in-the-loop" experience in which a complex tool call (printing an album) feels like a natural part of the conversation. The synergy between the Google Search tool and the creative persona produces surprisingly deep genre fusions, such as "Post-Digital Cyber-Folk".
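Making "printing an album" a tool call means declaring a function the model can invoke and answering its call so the conversation continues. The sketch below assumes the Live API's `functionDeclarations` tool format and its `toolCall` / `toolResponse` messages; the tool name `print_album`, its fields, and the helper functions are hypothetical, and the exact message shapes should be checked against the current Live API reference.

```typescript
// Hypothetical "print album" tool; name and fields are illustrative only.
const printAlbumDeclaration = {
  name: "print_album",
  description: "Finalize an album concept and output the full production package.",
  parameters: {
    type: "OBJECT",
    properties: {
      title: { type: "STRING", description: "Album title" },
      narrative: { type: "STRING", description: "Story arc of the album" },
      styleTags: { type: "ARRAY", items: { type: "STRING" }, description: "Musical style tags" },
      artPrompt: { type: "STRING", description: "Prompt for the cover art" },
    },
    required: ["title", "narrative", "styleTags", "artPrompt"],
  },
};

// Fragment of the session setup that registers the tool (other fields omitted).
const setupTools = { tools: [{ functionDeclarations: [printAlbumDeclaration] }] };

interface FunctionCall { id: string; name: string; args: Record<string, unknown> }
interface ProductionPackage { title: string; narrative: string; styleTags: string[]; artPrompt: string }
interface ToolResponseMessage {
  toolResponse: { functionResponses: { id: string; name: string; response: Record<string, unknown> }[] };
}

// Validates model-supplied arguments before trusting them.
function parsePackage(args: Record<string, unknown>): ProductionPackage {
  const { title, narrative, styleTags, artPrompt } = args;
  if (
    typeof title !== "string" || typeof narrative !== "string" || typeof artPrompt !== "string" ||
    !Array.isArray(styleTags) || !styleTags.every((t) => typeof t === "string")
  ) {
    throw new Error("print_album called with malformed arguments");
  }
  return { title, narrative, styleTags, artPrompt };
}

// Handles one function call and builds the reply sent back over the socket.
function handleToolCall(call: FunctionCall, onPrint: (p: ProductionPackage) => void): ToolResponseMessage {
  if (call.name !== "print_album") {
    return { toolResponse: { functionResponses: [{ id: call.id, name: call.name, response: { error: "unknown tool" } }] } };
  }
  onPrint(parsePackage(call.args)); // e.g. render the package in the UI
  return { toolResponse: { functionResponses: [{ id: call.id, name: call.name, response: { result: "printed" } }] } };
}
```

Returning a short result lets the model acknowledge the printout in her own voice, which is what keeps the tool call feeling conversational.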
What we learned
We learned that Native Audio tokens are far more expressive than standard TTS, allowing for emotional nuance that is critical for a creative partner. We also gained deep experience in Web Audio API thread management and stateful WebSocket protocols.
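Part of the Web Audio work is turning microphone samples into the format the Live API expects for input audio: raw 16-bit little-endian PCM at 16 kHz. The pure helpers below are a sketch of that conversion, assuming they are called from an AudioWorkletProcessor's `process()` method (which receives 32-bit float samples at the context's sample rate) before the bytes are posted to the main thread, base64-encoded, and sent with a MIME type such as `audio/pcm;rate=16000`. The function names are hypothetical, and the averaging downsampler is a deliberately simple stand-in for a proper low-pass resampler.

```typescript
// Hypothetical audio helpers for sending microphone input to the Live API.

// Naive downsampler: averages each block of input samples (crude low-pass).
function downsample(input: Float32Array, inputRate: number, outputRate: number): Float32Array {
  if (outputRate > inputRate) throw new Error("upsampling not supported");
  if (outputRate === inputRate) return input;
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    out[i] = end > start ? sum / (end - start) : 0;
  }
  return out;
}

// Converts float samples in [-1, 1] to signed 16-bit integers, clamping overflow.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Serializes samples as little-endian bytes regardless of platform endianness.
function toLittleEndianBytes(pcm: Int16Array): Uint8Array {
  const buffer = new ArrayBuffer(pcm.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < pcm.length; i++) view.setInt16(i * 2, pcm[i], true);
  return new Uint8Array(buffer);
}
```

Keeping these as pure functions makes them testable outside the audio thread; inside the worklet, each 128-frame render quantum would pass through `downsample`, `floatTo16BitPCM`, and `toLittleEndianBytes` in that order.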
What's next for Creative Producer for Radio AI
The current state of the Creative Producer marks just the beginning of a fully autonomous music label. Our roadmap focuses on deepening the agent's multimodal capabilities and expanding her creative reach:
Automated Video "Printing" with Veo: We are integrating Veo to automatically generate high-fidelity, 4K music videos for every "printed" album concept. By mapping the semantic mood from our Vertex AI similarity checks directly to video prompts, we will create a 24/7 visual radio experience.
Multi-Lingual Global Personas: Using Gemini Live, the Producer will soon be able to engage with artists in their native languages, from Russian to Thai, while maintaining a consistent creative memory across all of them.
Advanced Repertoire Synthesis: We plan to evolve beyond simple "Gap Analysis" into "Fusion Discovery." The agent will intentionally identify the two most distant concepts in her 768-dimensional memory (e.g., Symphony of Fading Light vs. Neon Oasis) and propose a hybrid genre that has never existed before.
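"Fusion Discovery" as described needs the pair of concepts that are farthest apart in embedding space. With a repertoire of roughly 100 concepts, a brute-force scan of every pair (about 5,000 comparisons) is cheap, so the sketch below simply picks the pair with the lowest cosine similarity. The `mostDistantPair` helper and the use of cosine similarity as the distance measure are assumptions for illustration.

```typescript
// Hypothetical "Fusion Discovery" helper: finds the least similar concept pair.
type Concept = { id: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Checks all n*(n-1)/2 pairs; fine for a repertoire of ~100 concepts.
function mostDistantPair(concepts: Concept[]): { a: string; b: string; similarity: number } | null {
  if (concepts.length < 2) return null;
  let best = { a: concepts[0].id, b: concepts[1].id, similarity: Infinity };
  for (let i = 0; i < concepts.length; i++) {
    for (let j = i + 1; j < concepts.length; j++) {
      const sim = cosineSimilarity(concepts[i].embedding, concepts[j].embedding);
      if (sim < best.similarity) {
        best = { a: concepts[i].id, b: concepts[j].id, similarity: sim };
      }
    }
  }
  return best;
}
```

The two IDs returned would then seed the prompt for a hybrid genre; for a much larger repertoire, an approximate approach would replace the quadratic scan.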
Built With
- audioworklet
- audioworklets
- gemini-multimodal-live-api
- react
- simli-sdk
- typescript
- vector-search
- vertex
- web-audio-api
- webrtc