Inspiration

We built Continuum because every music producer knows the pain of "loop-itis"—getting stuck with a brilliant 8-bar loop but having no idea how to turn it into a full track. We wanted to build an AI collaborator that doesn't just generate random audio, but actually listens to your specific idea and helps you break through creative blocks by offering structurally sound, musically coherent next steps.

What it does

When you upload a song, Continuum first analyzes it, detecting its genre, key, tempo, and other details. It then asks Gemini to write several different prompts for continuing the song. These prompts are sent to HeartMuLa, which uses both the text prompt and several seconds of audio context to generate a new continuation. We use Modal to run multiple instances of HeartMuLa in parallel, one per prompt, producing several unique continuations. Finally, Gemini reviews the candidates and picks a winner!

How we built it

We utilized Modal for our serverless GPU backend, allowing us to spin up multiple parallel instances of HeartMuLa (running on A100s) to generate continuations simultaneously. For the audio analysis, we used librosa to extract deep features like RMS energy, spectral centroids, and even Harmonic-Percussive Source Separation (HPSS) to detect vocals. We integrated Gemini as the "brain" of the operation—acting first as a specialized prompt engineer to steer HeartMuLa, and finally as an autonomous critic to score and rank the resulting audio files. Claude and Gemini were also heavily utilized for framework generation and debugging.
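Under the hood, features like RMS energy and spectral centroid are just frame-wise math over the waveform. A minimal numpy sketch of the two (librosa's `feature.rms` and `feature.spectral_centroid` compute the real thing, with proper padding and windowing options; the frame and hop sizes below mirror librosa's defaults):

```python
import numpy as np

def frame_rms(y: np.ndarray, frame: int = 2048, hop: int = 512) -> np.ndarray:
    """Root-mean-square energy per frame: how loud each slice of audio is."""
    n = 1 + max(0, len(y) - frame) // hop
    return np.array([
        np.sqrt(np.mean(y[i * hop : i * hop + frame] ** 2)) for i in range(n)
    ])

def spectral_centroid(y: np.ndarray, sr: int,
                      frame: int = 2048, hop: int = 512) -> np.ndarray:
    """Magnitude-weighted mean frequency per frame: a rough 'brightness' measure."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    win = np.hanning(frame)  # taper each frame to reduce spectral leakage
    n = 1 + max(0, len(y) - frame) // hop
    out = []
    for i in range(n):
        mag = np.abs(np.fft.rfft(y[i * hop : i * hop + frame] * win))
        out.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-9))
    return np.array(out)
```

A pure 440 Hz tone, for example, yields a roughly constant RMS and a centroid near 440 Hz, which is the kind of per-frame profile we feed into the analysis.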

Challenges we ran into

Getting the AI continuation to sound like a natural part of the original track was incredibly difficult. Initially, the transition seam between the original audio and the AI audio caused noticeable "double-beats" and phasing; we had to engineer a precise 1-second fade to fix the phasing. Additionally, HeartMuLa has a strict 30-second context window. To allow users to upload full songs, we couldn't just feed it the whole file. We had to build a smart energy-based extraction algorithm that scans the track for its loudest, most recognizable section (usually the chorus) and uses that as the audio seed.
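The seed-selection idea boils down to computing the energy of every candidate 30-second window and keeping the argmax, which a cumulative sum makes cheap. A minimal sketch of the heuristic (function name and hop size are ours for illustration, not part of HeartMuLa):

```python
import numpy as np

def loudest_window(y: np.ndarray, sr: int,
                   seconds: float = 30.0, hop_s: float = 0.5) -> tuple:
    """Slide a fixed-length window over the track and return the (start, end)
    sample range with the highest total energy -- our 'probably the chorus'
    heuristic for picking a 30-second audio seed."""
    win = int(seconds * sr)
    hop = int(hop_s * sr)
    if len(y) <= win:
        return 0, len(y)
    # Cumulative sum of squared samples lets us score every window in O(1).
    csum = np.concatenate(([0.0], np.cumsum(y.astype(np.float64) ** 2)))
    starts = np.arange(0, len(y) - win + 1, hop)
    window_energy = csum[starts + win] - csum[starts]
    best = int(starts[np.argmax(window_energy)])
    return best, best + win
```

On a track with a quiet verse and a loud chorus, the returned range lands on the window containing the chorus.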

Accomplishments that we're proud of

We are incredibly proud of building a fully parallelized pipeline that slashes generation time. Running multiple heavy audio models simultaneously on Modal and seamlessly stitching the audio together on the server side was a massive win. We're also proud of the multi-agent design—having Gemini act as both the creative director (writing the prompts) and the executive producer (judging the final tracks) makes the system feel genuinely intelligent.
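The fan-out shape of that pipeline is simple to sketch. Here plain Python threads stand in for Modal's serverless map over GPU workers, and `generate_continuation` is a hypothetical stub rather than the real HeartMuLa call:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_continuation(prompt: str) -> str:
    # Placeholder for a remote HeartMuLa inference call on an A100 worker.
    return f"audio for: {prompt}"

def fan_out(prompts: list) -> list:
    """Run one generation per prompt concurrently; results come back
    in prompt order, ready for the Gemini judging pass."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(generate_continuation, prompts))
```

The real deployment swaps the thread pool for Modal functions, so each prompt gets its own GPU container instead of a local thread.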

What we learned

We learned a ton about the intricate quirks of audio-conditioned models. Specifically, we discovered how hyperparameter tuning (like clamping the temperature and guidance scale) is absolute magic for forcing an AI to respect the original audio's genre rather than hallucinating wild new instruments. We also learned a lot about digital signal processing, particularly how to manipulate audio arrays and calculate crossfades purely in code.
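The crossfade math is a good example of the DSP we ended up writing by hand. A minimal numpy sketch of an equal-power crossfade, assuming both clips share the same sample rate (the cos/sin gains keep total power roughly constant through the overlap, which avoids the volume dip a linear fade causes):

```python
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray, sr: int,
              seconds: float = 1.0) -> np.ndarray:
    """Overlap the tail of `a` with the head of `b`, fading a out and b in
    with equal-power (cos/sin) gain curves over `seconds`."""
    n = int(seconds * sr)
    assert len(a) >= n and len(b) >= n, "clips shorter than the fade"
    t = np.linspace(0.0, np.pi / 2, n)
    fade_out, fade_in = np.cos(t), np.sin(t)
    overlap = a[-n:] * fade_out + b[:n] * fade_in
    return np.concatenate([a[:-n], overlap, b[n:]])
```

The result is `len(a) + len(b) - n` samples long, since the two clips share the faded region.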

What's next for Continuum

Next up, we want to integrate Continuum directly into Digital Audio Workstations (DAWs) as a VST plugin so producers can use it natively in their workflow. We also plan to add a feature that allows users to manually select which AI continuation they prefer, using that choice to fine-tune the model's future suggestions.
