Inspiration
Suraj is part of an acapella team, where arranging pieces is incredibly time-consuming. Last year our captain spent 30 hours a week over the summer arranging our current set. A tool that turns a lead vocal into notes and generates basic chords automatically would cut that time substantially.
What it does
You hum or upload a recording of a melody. Hum2Harmony transcribes it to MIDI and generates an SATB arrangement for acapella groups and choirs.
How we built it
We split the work into a Next.js frontend and a FastAPI backend. The browser records or uploads audio and talks to the API over HTTP. The backend stores each run in SQLite, converts audio when needed (often with FFmpeg), then runs transcription (NeuralNote when the CLI is available, with Basic Pitch as a fallback) to get a list of notes and times. From there we use Python + librosa/NumPy for tempo and audio helpers, rule-based code for key and chords and SATB-style voice assignment, and mido where we need proper MIDI handling. The user can review or edit notes in the UI, play a preview with Tone.js, download MIDI built with midi-writer-js, and get MusicXML for MuseScore. Optional ElevenLabs-based choir audio sits on top of that pipeline for a richer demo.
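To make the "rule-based code for key" step concrete, here is a minimal sketch of one common approach: build a duration-weighted pitch-class histogram from the transcribed notes and correlate it against the Krumhansl-Kessler major and minor profiles. This is an illustration, not necessarily the rule set Hum2Harmony uses. The `Note` fields and the `estimate_key` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Note:
    """Hypothetical note shape: MIDI pitch plus start/end times in seconds."""
    pitch: int
    start: float
    end: float


# Krumhansl-Kessler key profiles, indexed by semitones above the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_key(notes):
    """Return (tonic_name, mode) for the best-correlating key profile."""
    # Total sounding time per pitch class (C=0 ... B=11).
    histogram = [0.0] * 12
    for note in notes:
        histogram[note.pitch % 12] += max(0.0, note.end - note.start)

    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so its tonic weight lines up with `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(histogram, rotated)
            if best is None or score > best[0]:
                best = (score, NAMES[tonic], mode)
    return best[1], best[2]


# Usage: an ascending C major scale from C4 to C5, half a second per note.
scale = [Note(p, i * 0.5, (i + 1) * 0.5)
         for i, p in enumerate([60, 62, 64, 65, 67, 69, 71, 72])]
print(estimate_key(scale))
```

A short hummed phrase often leaves relative major and minor keys close in score, which is one reason a user-chosen key override is a sensible control to offer.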
Challenges we ran into
Real humming is noisy. Models return extra notes, overlaps, and junk, so we added cleanup and merging to make the melody look like one human line, not a burst of MIDI dust. Browser audio formats do not always match what the transcription tools expect, so FFmpeg and consistent WAV settings mattered. Wiring in NeuralNote meant treating a native CLI as part of the product (paths, builds, failures) instead of only calling a Python package. Keeping one “note shape” everywhere (transcription, DB, frontend, export) avoided bugs but took discipline. Long steps (transcribe, harmonize) forced us to think in terms of status polling and background work instead of one quick request-response.
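The cleanup and merging described above might look roughly like the sketch below. It is one plausible pass, not the project's actual code: drop very short blips, merge same-pitch notes separated by a tiny gap, and trim overlaps so only one note sounds at a time. The `Note` shape, the `clean_melody` name, and both thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    """Hypothetical shared note shape: MIDI pitch plus start/end in seconds."""
    pitch: int
    start: float
    end: float


def duration(note):
    return note.end - note.start


def clean_melody(notes, min_duration=0.08, max_gap=0.05):
    """Reduce raw transcription output to a single non-overlapping line.

    min_duration and max_gap are illustrative defaults, not tuned values.
    """
    # 1. Drop blips that are too short to be intentional notes.
    kept = sorted(
        (n for n in notes if duration(n) >= min_duration),
        key=lambda n: (n.start, n.end),
    )

    cleaned = []
    for note in kept:
        if not cleaned:
            cleaned.append(note)
            continue
        prev = cleaned[-1]
        if note.pitch == prev.pitch and note.start - prev.end <= max_gap:
            # 2. Same pitch, tiny gap or overlap: treat as one held note.
            cleaned[-1] = replace(prev, end=max(prev.end, note.end))
        elif note.start < prev.end:
            # 3. Different pitch overlapping: cut the earlier note short.
            trimmed = replace(prev, end=note.start)
            if duration(trimmed) >= min_duration:
                cleaned[-1] = trimmed
                cleaned.append(note)
            else:
                # Trimming would leave a blip, so keep only the longer note.
                cleaned[-1] = max(prev, note, key=duration)
        else:
            cleaned.append(note)
    return cleaned


# Usage: a split C4, a stray high blip, and a D4 that starts slightly early.
raw = [
    Note(60, 0.0, 0.5),
    Note(60, 0.52, 1.0),
    Note(72, 0.6, 0.62),
    Note(62, 0.9, 1.5),
]
for note in clean_melody(raw):
    print(note)
```

Running a pass like this once, right after transcription, means the database, frontend, and exporters all see the same already-cleaned list.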
Accomplishments that we're proud of
We went from idea to end-to-end flow: hum → notes → arrangement → something you can open in notation software. We combined ML listening with music-theory-style logic so the output feels closer to a real arranger’s workflow than a generic “AI text” toy. The stack is understandable: a clear API, a real database for sessions, and a frontend that can play and export what the user hears. Building something that addresses a real acapella time cost (our captain’s summer) makes the demo feel grounded, not random.
What we learned
The model is only the first step. Most of the “does it feel right?” work is post-processing and UX after transcription. Glue code (env, paths, fallbacks, same JSON shape everywhere) is as important as the clever algorithm. Audio + the web have sharp edges: permissions, formats, and waiting states need clear handling. Shipping beats perfect: a working path from upload to MusicXML teaches more than polishing one isolated piece.
What's next for Hum2Harmony
- Clearer setup: one-command or documented path for NeuralNote + FFmpeg so new users do not get stuck.
- README and copy aligned with NeuralNote-first and current frontend versions.
- Arrangement controls: user-chosen key, simpler vs richer voicing, maybe PDF export.
- Performance: faster passes or streaming updates so the wait feels shorter.
Longer term, we want human-in-the-loop tools that arranging captains would actually trust for a first draft: not a full replacement for craft, but a serious head start on those 30-hour weeks.
Built With
- basic-pitch
- elevenlabs
- fastapi
- librosa
- mido
- neuralnote
- next.js
- numpy
- python
- react
- sqlite
- tailwind
- tone.js
- typescript
- uvicorn