Inspiration
We wanted to explore how AI could help musicians and hobbyists create acapella mashups effortlessly. Manually harmonizing songs and arranging SATB parts is time-consuming, so we imagined a tool that could do it automatically while still sounding musical and coherent.
What it does
Vox lets users combine two songs into a single acapella arrangement. It analyzes each song's melody and chords, aligns the two songs in key and timing, and generates full SATB harmonies using a Transformer model trained on the CocoChorales dataset of four-part chorales. Users can download the resulting mashup as MIDI or audio.
How we built it
We trained a Transformer-based harmonizer on the CocoChorales dataset to predict SATB parts from a lead melody and chord sequence. We built an ETL pipeline to extract melody and chord information from MIDI files, align song lengths, transpose keys, and prepare data for the model. The backend is built in Python with Flask, serving the mashup inference as an API.
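As an illustration of the extraction stage of such a pipeline, the sketch below pulls a lead melody and rough harmonic content out of a list of notes. It assumes the MIDI file has already been parsed into `(start_beat, duration_beats, midi_pitch)` tuples; that format, the function names, and the heuristics (keeping the highest pitch at each onset, and summarizing each bar by its pitch classes) are assumptions and not necessarily what Vox does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical note format: (start_beat, duration_beats, midi_pitch)
Note = Tuple[float, float, int]


def skyline_melody(notes: List[Note]) -> List[Note]:
    """Simplified skyline heuristic: keep the highest pitch at each onset."""
    top: Dict[float, Note] = {}
    for start, dur, pitch in notes:
        if start not in top or pitch > top[start][2]:
            top[start] = (start, dur, pitch)
    return [top[start] for start in sorted(top)]


def pitch_classes_per_bar(
    notes: List[Note], beats_per_bar: int = 4
) -> Dict[int, List[int]]:
    """Collect the distinct pitch classes (0-11) sounding in each bar.

    A chord labeler could map each set to a chord symbol; that step is
    omitted here.
    """
    bars = defaultdict(set)
    for start, _dur, pitch in notes:
        bars[int(start // beats_per_bar)].add(pitch % 12)
    return {bar: sorted(pcs) for bar, pcs in sorted(bars.items())}


# Usage: a C major chord with G on top, a passing F, then D and A in bar 2.
notes = [(0, 1, 48), (0, 1, 64), (0, 1, 67), (1, 1, 65), (4, 2, 50), (4, 2, 69)]
melody = skyline_melody(notes)           # [(0, 1, 67), (1, 1, 65), (4, 2, 69)]
harmony = pitch_classes_per_bar(notes)   # {0: [0, 4, 5, 7], 1: [2, 9]}
```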
Challenges we ran into
Accomplishments that we're proud of
Built an end-to-end pipeline from raw MIDI to fully harmonized SATB mashups. Successfully fused two songs in latent space to generate coherent, singable arrangements. Designed a backend capable of processing and returning mashups in real time.
What we learned
What's next for Vox
Add support for more than two songs and larger ensemble arrangements. Implement similarity search to suggest mashup combinations automatically. Improve audio rendering quality and provide real-time playback for user experimentation.
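The similarity search idea could be prototyped along the following lines. This is a sketch under stated assumptions, not a planned design: each song is represented by a hypothetical fixed-length feature vector (for example, a pitch-class histogram), and candidates are ranked by cosine similarity. The function names and the feature choice are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_partners(
    query: str, library: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank every other song in the library by similarity to the query song."""
    scored = [
        (name, cosine_similarity(library[query], vec))
        for name, vec in library.items()
        if name != query
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Usage with made-up three-dimensional feature vectors.
library = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [1.0, 0.0, 0.0],
    "song_c": [0.0, 1.0, 0.0],
}
best = suggest_partners("song_a", library, top_k=1)
```

For a large catalog, an approximate nearest-neighbor index would likely replace the linear scan shown here.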