Inspiration
Music is everywhere, but every musician shares one frustration: you can't always find a band, or even a single instrumentalist, to rehearse with. Singers practice alone. Guitarists jam without rhythm sections. Street performers who'd sound incredible with accompaniment play solo.
When we saw Gemini 3's real-time multimodal reasoning, the idea was instant: what if AI could listen to you and play along — like a bandmate who never cancels?
What it does
JamPilot Music Mate is a real-time AI accompanist. You select a genre (Highlife, Afrobeats, Jazz, Reggae, Blues, Amapiano, Hip-Hop, Gospel), step onto the Live Stage, and start singing or playing.
JamPilot:
- Listens — captures your audio via the Web Audio API and extracts spectral features (peak frequency, spectral centroid, amplitude, zero-crossing rate)
- Reasons — sends features to Gemini 3, which applies music theory to detect your key, tempo, and mood
- Plays along — Tone.js generates genre-authentic accompaniment with proper chord voicings, rhythm patterns, and swing — adapting in real-time as you shift keys or change feel
Beyond live jamming, you can upload a reference track and Gemini will clone its instruments, swing, and style to shape your accompaniment. Every session is recorded (raw vocal + mixed), saved to your browser, and available for playback and download from the My Recordings tab.
How we built it
Single-file web app. Zero dependencies beyond Tone.js and the Gemini API.
The core loop:
$$\text{Mic} \xrightarrow{\text{FFT}} \text{Spectral Features} \xrightarrow{\text{Gemini 3}} \text{Key + Mood} \xrightarrow{\text{Tone.js}} \text{Live Band}$$
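The FFT step of this loop can be sketched as a pure function over an analyser's magnitude array (a minimal sketch, not the app's actual code: the function name is hypothetical, the bin-to-frequency mapping assumes bin *i* covers `i * sampleRate / fftSize` Hz, and zero-crossing rate is omitted since it comes from the time-domain buffer):

```javascript
// Hypothetical sketch: extract spectral features from one FFT frame.
// `magnitudes` is an array of linear magnitudes, one per frequency bin.
function extractFeatures(magnitudes, sampleRate, fftSize) {
  let peakBin = 0, ampSum = 0, weighted = 0;
  for (let i = 0; i < magnitudes.length; i++) {
    if (magnitudes[i] > magnitudes[peakBin]) peakBin = i; // loudest bin
    ampSum += magnitudes[i];
    weighted += magnitudes[i] * (i * sampleRate / fftSize); // A_i * f_i
  }
  return {
    peakHz: peakBin * sampleRate / fftSize,          // peak frequency
    centroidHz: ampSum > 0 ? weighted / ampSum : 0,  // spectral centroid
    amplitude: ampSum / magnitudes.length,           // mean magnitude
  };
}
```

These few numbers, rather than raw audio, are what get serialized into the Gemini prompt.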
- Gemini 3 API (`gemini-2.5-flash`, `thinkingBudget: 0`) — musical key detection with structured JSON output, genre-aware reasoning, and audio file analysis for sound cloning
- Tone.js — FM synths, membrane synths, and noise synths generating guitar, bass, piano, drums, and full band across 8 genres with hand-crafted rhythm patterns and chord progressions
- Web Audio API — real-time FFT analysis + `MediaRecorder` for dual-track recording (vocal + mixed)
- IndexedDB — persistent storage for session recordings; `localStorage` for the API key
- Smart API throttling — adaptive polling (8s → 30s), audio-change detection, 4-model fallback chain (`2.5-flash` → `2.0-flash` → `2.0-flash-lite` → `1.5-flash`), exponential backoff on 429s
Challenges we ran into
The JSON wall. `gemini-2.5-flash` is a thinking model — it kept returning "Here is the JSON..." instead of clean JSON, breaking our pipeline. Four iterations later, we discovered `thinkingConfig: { thinkingBudget: 0 }` disables the thinking phase and unlocks reliable `response_mime_type: 'application/json'`. This single fix changed everything.
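In request-body terms, the fix looks roughly like this (a hedged sketch: `buildKeyDetectionRequest` is a hypothetical helper, and we're assuming the Gemini API's `generationConfig` shape, where `thinkingConfig` sits alongside `responseMimeType`):

```javascript
// Hypothetical sketch: build a key-detection request whose response
// is guaranteed to be parseable JSON, with the thinking phase disabled.
function buildKeyDetectionRequest(features, genre) {
  return {
    contents: [{
      parts: [{
        text: `Genre: ${genre}. Peak: ${features.peakHz} Hz, ` +
              `centroid: ${features.centroidHz} Hz. ` +
              `Return the musical key, tempo, and mood as JSON.`,
      }],
    }],
    generationConfig: {
      responseMimeType: 'application/json',  // no "Here is the JSON..." preamble
      thinkingConfig: { thinkingBudget: 0 }, // skip the thinking phase entirely
    },
  };
}
```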
Free tier rate limits. A naive 4-second polling loop burned through quotas in minutes. We engineered a smart throttling system: skip calls when audio hasn't changed ($\Delta f_{\text{peak}} < 15\%$), slow polling when the key stabilizes, and cascade across 4 models automatically. Reduced API calls by ~70%.
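The throttling decision reduces to a small pure function (a sketch with hypothetical names: the 15% change threshold and the 8s → 30s range come from the text above, but the stretch schedule in between is our assumption):

```javascript
// Hypothetical sketch: decide whether to call the API this tick, and how
// long to wait before the next check. Skips calls when the peak frequency
// barely moved, and stretches the interval as the detected key stabilizes.
function nextPoll(state, peakHz) {
  const changed =
    Math.abs(peakHz - state.lastPeakHz) / state.lastPeakHz >= 0.15;
  const stableCount = changed ? 0 : state.stableCount + 1;
  // 8 s baseline, stretching toward the 30 s ceiling while stable
  const intervalMs = Math.min(30000, 8000 + stableCount * 5500);
  return { call: changed, intervalMs, lastPeakHz: peakHz, stableCount };
}
```

Because the decision is pure, it can be unit-tested without a microphone or an API key.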
Mixed recording. Merging microphone input and Tone.js synthesizer output into one downloadable track required wiring two separate audio graphs through createMediaStreamDestination — not straightforward, but essential for the user experience.
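The wiring reduces to routing both audio graphs into a single `createMediaStreamDestination` node whose stream feeds `MediaRecorder` (a sketch with a hypothetical function name; the context and source nodes are injected so the routing logic stays testable):

```javascript
// Hypothetical sketch: merge the mic source and the Tone.js synth output
// into one MediaStream suitable for MediaRecorder.
function wireMixedRecording(audioCtx, micSource, synthOutput) {
  const mixDest = audioCtx.createMediaStreamDestination();
  micSource.connect(mixDest);   // vocal graph
  synthOutput.connect(mixDest); // synth graph
  return mixDest.stream;        // pass to: new MediaRecorder(stream)
}
```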
Accomplishments that we're proud of
- It actually jams. The moment Gemini detects your key and the band kicks in — hearing AI-generated Highlife guitar respond to your voice in real-time — is genuinely magical.
- 8 culturally authentic genres. Not generic Western chord progressions. Hand-crafted profiles for Highlife, Afrobeats, Amapiano, Gospel, Jazz, Reggae, Blues, and Hip-Hop with genre-specific swing, voicings, and rhythmic feel.
- Simple setup. It opens in any browser; paste in your API key and it works.
- Sound cloning. Upload a reference track, Gemini analyzes it, and your accompaniment adapts — bridging the gap between "AI backing track" and "AI that understands your sound."
- Full recording pipeline. Walk away from every session with a downloadable vocal track and a mixed track. Persistent history across sessions.
What we learned
- Thinking models need `thinkingBudget: 0` for structured output. This isn't documented prominently but is critical for any app that needs reliable JSON from `gemini-2.5-flash`.
- Rate limiting is an architecture problem, not a retry problem. Smart throttling + model fallback chains > simple exponential backoff.
- Genre authenticity requires domain research. Highlife swing ($0.2$) feels completely different from Jazz swing ($0.6$). The math matters: $\text{swing} \in [0, 1]$ directly controls Tone.js groove.
- The Web Audio API is underrated. Computing spectral centroid from raw FFT bins ($\bar{f} = \frac{\sum A_i \cdot f_i}{\sum A_i}$) gives Gemini enough signal to reason about musical key — even from a laptop mic.
What's next for JamPilot Music Mate
- Gemini Live API integration — stream audio directly to Gemini for true real-time (<1s) key detection and mid-phrase adaptation
- Multi-player Live Stage — two musicians in different locations jam together with JamPilot filling in the missing instruments via WebRTC
- AI chord suggestions — Gemini proposes creative modulations mid-session ("try a $\text{Dm7} \rightarrow \text{G7}$ turnaround here")
- Sample-based instruments — replace synths with recorded African percussion, Highlife guitar tones, and church organ for studio-quality output
- Mobile app — React Native build for buskers, street performers, and music students across Africa who rehearse on their phones
- Music education mode — Gemini explains what key you're in, what chords it's playing and why, turning every jam into a theory lesson