Inspiration

We wanted to explore how AI could help musicians and hobbyists create acapella mashups effortlessly. Manually harmonizing songs and arranging SATB parts is time-consuming, so we imagined a tool that could do it automatically while still sounding musical and coherent.

What it does

Vox lets users combine two songs into a single acapella arrangement. It analyzes the melodies and chords, aligns the songs in key and timing, and generates full SATB harmonies using a Transformer model trained on four-part chorales from the CocoChorales dataset. Users can download the resulting mashup as MIDI or audio.

How we built it

We trained a Transformer-based harmonizer on the CocoChorales dataset to predict SATB parts from a lead melody and chord sequence. We built an ETL pipeline to extract melody and chord information from MIDI files, align song lengths, transpose keys, and prepare data for the model. The backend is built in Python with Flask, serving the mashup inference as an API.
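The key-transposition and length-alignment steps of the pipeline could look roughly like the sketch below. It is a minimal illustration, not our actual ETL code: the note representation (pitch, start beat, duration), the function names, and the choice of uniform time-stretching to align lengths are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical note representation: (midi_pitch, start_beat, duration_beats).
Note = Tuple[int, float, float]


def semitone_shift(src_tonic: int, dst_tonic: int) -> int:
    """Smallest signed shift in semitones from one tonic pitch class to another."""
    shift = (dst_tonic - src_tonic) % 12
    # Prefer moving down when the upward shift is more than a tritone.
    return shift - 12 if shift > 6 else shift


def transpose(notes: List[Note], shift: int) -> List[Note]:
    """Move every pitch by `shift` semitones, leaving timing untouched."""
    return [(pitch + shift, start, dur) for pitch, start, dur in notes]


def total_beats(notes: List[Note]) -> float:
    """Length of the song in beats (end of the last-sounding note)."""
    return max((start + dur for _, start, dur in notes), default=0.0)


def stretch_to_length(notes: List[Note], target_beats: float) -> List[Note]:
    """Uniformly scale note timing so the song lasts `target_beats` (an assumed strategy)."""
    length = total_beats(notes)
    if length == 0:
        return list(notes)
    scale = target_beats / length
    return [(pitch, start * scale, dur * scale) for pitch, start, dur in notes]


# Song B is in G (tonic pitch class 7); bring it into C (pitch class 0)
# and stretch it to match a 4-beat song A.
song_b = [(67, 0.0, 1.0), (71, 1.0, 1.0)]
shift = semitone_shift(7, 0)
aligned_b = stretch_to_length(transpose(song_b, shift), 4.0)
```

Choosing the smallest signed shift keeps the transposed melody close to its original register, which matters when the result has to stay within singable SATB ranges.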

Accomplishments that we're proud of

Built an end-to-end pipeline from raw MIDI to fully harmonized SATB mashups. Successfully fused two songs in latent space to generate coherent, singable arrangements. Designed a backend capable of processing and returning mashups in real time.

What's next for Vox

Add support for more than two songs and larger ensemble arrangements. Implement similarity search to suggest mashup combinations automatically. Improve audio rendering quality and provide real-time playback for user experimentation.
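One simple way the similarity search could work is to compare songs by their pitch-class content and rank candidates by cosine similarity. The sketch below is only an illustration of that idea under our own assumptions: the feature (a normalized 12-bin pitch-class histogram), the function names, and the toy library are hypothetical, and a real system might use learned embeddings instead.

```python
import math
from typing import Dict, List, Tuple


def pitch_class_histogram(pitches: List[int]) -> List[float]:
    """Normalized count of each of the 12 pitch classes in a list of MIDI pitches."""
    hist = [0.0] * 12
    for pitch in pitches:
        hist[pitch % 12] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query: List[int], library: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Rank library songs by similarity to the query, most similar first."""
    q = pitch_class_histogram(query)
    scored = [
        (name, cosine_similarity(q, pitch_class_histogram(pitches)))
        for name, pitches in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy example: a C major triad matches the same triad an octave up,
# and shares no pitch classes with the second candidate.
library = {"same_chord": [72, 76, 79], "unrelated": [61, 63, 66]}
ranking = rank_candidates([60, 64, 67], library)
```

Pitch-class histograms ignore octave and rhythm, so this only captures rough harmonic compatibility; tempo and melodic contour would need separate features.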
