Inspiration

Everyone has a melody stuck in their head. Almost no one has the tools to get it out. DAWs assume you already know what a MIDI clip is. Music theory assumes you already know what a ii-V-I is. We wanted to remove every layer between "I can hear it" and "I can play it back."

So we built HumBox around one question: what if the human is the artist and the computer is just the instrument? Not an AI co-writer. Not a generative model filling in your blanks. An instrument, one that happens to understand humming, tapping, and plain English.

What it does

HumBox is a 16-step sequencer with three input modalities, all of them human-driven:

  • 🎤 Hum: hum into your mic. Autocorrelation pitch detection converts your voice to notes in real time and drops them onto the grid.
  • 👆 Tap: tap a rhythm. HumBox extracts your BPM from the intervals and snaps your taps to a 16th-note grid.
  • 💬 Type: tell Claude what you want ("add a warm pad", "flute melody", "melancholic chord progression in A minor"). Claude returns structured JSON that mutates the sequencer state.
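The third modality works because Claude's output is machine-applied rather than free text. A minimal sketch of what applying one structured edit could look like — the field names and `applyClaudeEdit` helper here are illustrative, not HumBox's actual schema:

```javascript
// Hypothetical shape of a structured response; the real schema is defined
// by HumBox's system prompt, so action/field names are illustrative.
function applyClaudeEdit(state, rawResponse) {
  let edit;
  try {
    edit = JSON.parse(rawResponse); // output is strict JSON
  } catch (e) {
    return state; // malformed output: no-op rather than guess
  }
  // Mutate a copy of the sequencer state; never compose new material
  const next = { ...state, tracks: state.tracks.map(t => ({ ...t })) };
  if (edit.action === "set_instrument") {
    const track = next.tracks[edit.track];
    if (track) track.instrument = edit.instrument;
  }
  return next;
}

const state = { tracks: [{ instrument: "acoustic_grand_piano", steps: [] }] };
const out = applyClaudeEdit(
  state,
  '{"action":"set_instrument","track":0,"instrument":"cello"}'
);
```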

Everything plays through 46 real instrument soundfonts (guitars, pianos, strings, winds, brass, synths, world instruments), so your humming can come out as a cello, a trumpet, or a steel drum.

The one rule: Claude never composes

This was the hard line we drew from day one. Claude is infrastructure, not a creative partner. It parses, it does not compose. It never suggests melodies, never fills in gaps, never auto-generates anything without a direct human trigger.

Every sound you hear traces back to a specific human input: a hum, a tap, or a typed instruction. This matters because the hackathon theme is Creative Flourishing, and we believe the difference between amplifying creativity and replacing it is exactly this: does the human still own the idea?

How we built it

Single-file vanilla HTML/CSS/JS. No framework, no bundler, no backend. Runs entirely in the browser.

  • Pitch detection: autocorrelation over a 2048-sample PCM buffer from getUserMedia, with an RMS silence gate and 60–1200 Hz bandpass. We explicitly avoided FFT peak detection; autocorrelation is far more accurate for monophonic pitched input like a human voice.
  • Sequencer: Web Audio API lookahead scheduler, ~120 ms ahead of audioCtx.currentTime, stepping through a 16-slot grid at (60 / bpm) / 4 seconds per step. Lookahead scheduling is the only way to get jitter-free timing in a browser; setTimeout alone drifts audibly.
  • Instruments: soundfont-player with the MusyngKite soundfont set, lazy-loaded per track so we don't block on 46 samples at startup.
  • BPM extraction: rolling mean of the last 8 tap intervals, clamped to 40–200 BPM, with tap timestamps snapped to the nearest 16th note.
  • Claude integration: claude-sonnet-4-20250514, called only on natural-language submission. The system prompt injects the current track state and the exact list of valid chord/instrument IDs so Claude physically cannot hallucinate a note that doesn't exist. Output is strict JSON; parsing is wrapped in try/catch.
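
The pitch-detection bullet above can be sketched in a few lines. This is a minimal, illustrative detector — the RMS threshold and function name are assumptions, not the exact production values:

```javascript
// Minimal autocorrelation pitch detector over a mono PCM buffer.
// Thresholds are illustrative; the 60–1200 Hz band matches the writeup.
function detectPitch(buf, sampleRate) {
  // RMS silence gate: skip frames that are mostly room tone
  let rms = 0;
  for (let i = 0; i < buf.length; i++) rms += buf[i] * buf[i];
  rms = Math.sqrt(rms / buf.length);
  if (rms < 0.01) return null;

  // Only search lags corresponding to 60–1200 Hz
  const minLag = Math.floor(sampleRate / 1200);
  const maxLag = Math.floor(sampleRate / 60);
  let bestLag = -1, bestCorr = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let corr = 0;
    for (let i = 0; i + lag < buf.length; i++) corr += buf[i] * buf[i + lag];
    if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
  }
  return bestLag > 0 ? sampleRate / bestLag : null;
}

// A synthetic 220 Hz sine at 44.1 kHz should come out within a few Hz of 220
const sr = 44100, N = 2048;
const buf = new Float32Array(N);
for (let i = 0; i < N; i++) buf[i] = Math.sin(2 * Math.PI * 220 * i / sr);
const hz = detectPitch(buf, sr);
```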
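
The BPM math is small enough to show whole. A sketch using the parameters stated above (rolling mean of the last 8 intervals, 40–200 BPM clamp); the function name is made up:

```javascript
// Tap-tempo sketch: rolling mean of the last 8 inter-tap intervals,
// converted to BPM and clamped to the 40–200 range.
function bpmFromTaps(timestampsMs) {
  const intervals = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    intervals.push(timestampsMs[i] - timestampsMs[i - 1]);
  }
  const recent = intervals.slice(-8); // rolling window of 8
  if (recent.length === 0) return null;
  const meanMs = recent.reduce((a, b) => a + b, 0) / recent.length;
  const bpm = 60000 / meanMs;
  return Math.min(200, Math.max(40, Math.round(bpm)));
}

// Taps every 500 ms → 120 BPM
const bpm = bpmFromTaps([0, 500, 1000, 1500, 2000]);
```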

Challenges we ran into

  • Humming is noisy. Our first pitch detector triggered on room tone, breath, and lip smacks. We tuned the RMS threshold and added a 600 ms debounce per unique note before anything enters the sequencer.
  • Browser audio scheduling is a minefield. The first version used setTimeout and drifted within a few bars. The fix was the classic two-clock pattern: JavaScript schedules events 100+ ms into the future using audioCtx.currentTime, and Web Audio handles the actual sample-accurate playback.
  • Keeping Claude on a leash. Early prompts let Claude invent instrument names and note names that didn't exist in our library. We solved it by dynamically injecting the live instrument manifest and track state into every system prompt, and making the output schema narrow enough that invalid responses fail loudly rather than silently.
  • Soundfont loading. 46 instruments × ~100 KB each is not a fast cold start. We load one default instrument at boot and fetch the rest on demand when a track picks them.
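
The two-clock fix is mostly one small loop. A sketch of the pattern, with the scheduling logic pulled out as a pure function so it can be demonstrated with a fake clock — `playStep` and the `seq` state object are illustrative names, not HumBox's real API:

```javascript
// Two-clock lookahead scheduler: a JS timer wakes the loop frequently,
// but every note start time comes from the audio clock (audioCtx.currentTime).
function scheduleDue(state, audioNow, lookahead, stepDur, playStep) {
  // Schedule every 16th-note step that falls inside the lookahead window
  while (state.nextTime < audioNow + lookahead) {
    playStep(state.step % 16, state.nextTime); // sample-accurate start time
    state.nextTime += stepDur;
    state.step++;
  }
}

// In the browser this would be driven by something like:
//   setInterval(() => scheduleDue(seq, audioCtx.currentTime, 0.12, stepDur, playStep), 25);
// Here a fake clock value of 0 shows one tick's behavior.
const scheduled = [];
const seq = { nextTime: 0, step: 0 };
const stepDur = (60 / 120) / 4; // 0.125 s per 16th note at 120 BPM
scheduleDue(seq, 0, 0.12, stepDur, (step, t) => scheduled.push(t));
```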

What we learned

  • Real-time pitch detection in the browser is shockingly achievable, provided you pick the right algorithm. Autocorrelation in ~30 lines of JavaScript beats anything FFT-based for this use case.
  • Scoping AI down is often the interesting design choice. It would have been trivial to let Claude compose melodies. Refusing to do that is what makes the tool feel like yours.
  • Web Audio's scheduler model is worth the learning curve. Once you internalize the "schedule ahead, let the audio clock drive time" pattern, glitch-free sequencing becomes straightforward.

What's next for HumBox

  • Export to MIDI and WAV
  • Longer patterns (32 and 64 steps, multi-bar arrangements)
  • Save/load sessions
  • Polyphonic humming, detecting intervals and chords from voice, not just single notes
  • Collaborative mode: two people hum into the same box over WebRTC

Built With

  • anthropic-claude-api
  • autocorrelation
  • claude-sonnet-4
  • css3
  • html5
  • javascript
  • mediadevices-api
  • musyngkite
  • soundfont-player
  • vanilla-js
  • web-audio-api