Music tracks are dynamic, often shifting dramatically in tone and mood over the course of a song. An ominous intro may lead to a raucous chorus and finish with a harmonious, uplifting outro. When people think "more like this!" they may have a very specific section of music in mind, rather than the song as a whole. This project attempts to provide a more specific, music-context-sensitive search that recognizes the dynamic nature of music. Our goal is to help music fans, DJs, video creators, and more discover new music containing elements that pique their interest, fit their use case, or just make them smile.
How It Works
Music recommendation systems usually make you input an entire song. We don't.
Within any song, there's plenty of variation.
The intro sounds different from a verse, which sounds different from the chorus...
Maybe you like the way the bridge of a song sounds, but you don't really like the rest of it.
We created a search algorithm that lets you find sections of other songs that sound like the specific section you like.
Step 1: Break a song into its component parts
We do this algorithmically (with Laplacian Segmentation) by determining when the rate of change in audio from one fraction of a second to the next is high.
These separations often line up with the verse/chorus boundaries of a song, but by emulating how our ears process raw audio, we can build a more nuanced model of how our brains understand the degrees of similarity between segments of a song.
Step 2: Determine the properties of sound for each of these parts
- What notes appear during this audio segment?
- How percussive is this segment?
- How harmonic is it?
We collect data like this for each part of each song using a cocktail of Music Information Retrieval (MIR) algorithms.
Step 3: Group similar audio clips from different songs using machine learning
We feed an audio segment to our search algorithm (currently K-Nearest Neighbors) and it spits out segments of other songs in the music library that match it.
To maximize the effectiveness of our search, we tuned the extent to which the algorithm relies on each audio property, giving more weight to properties that are more predictive of similarity.
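The weighted nearest-neighbor search can be sketched like this. Scaling a feature column by a weight w multiplies its contribution to the squared Euclidean distance by w², which is one simple way to make some properties count more; the helper names and weight values are illustrative, not our production code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_index(features, weights):
    """Index segment feature vectors, scaling each column by its weight."""
    index = NearestNeighbors(metric='euclidean')
    index.fit(np.asarray(features) * weights)
    return index

def find_similar(index, weights, query_vec, k=5):
    """Return (indices, distances) of the k segments nearest the query."""
    dist, idx = index.kneighbors(np.atleast_2d(query_vec * weights),
                                 n_neighbors=k)
    return idx[0], dist[0]
```

Querying with a segment already in the index returns that segment first at distance zero, with its nearest neighbors behind it.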
The roadmap for taking our proof-of-concept to a full-fledged product will require:
During the hackathon, our analysis took place on our own computers, and was limited to the first 100 songs in the 7digital database.
By moving our computing to the cloud using AWS, we'll be able to analyze the entire 7digital catalog (and more!), allowing us to deliver better recommendations across a vastly wider set of songs.
Steps 2 and 3 worked sufficiently well in our prototype on the 100 songs we analyzed, but with an influx of data, we'll need to make significant improvements to the way we model properties of audio as well as the way we search for similarities between the audio properties of different segments.
Alexa and Web UI
The search experience should integrate seamlessly with users' workflows.
We plan to integrate with Amazon Alexa, as well as provide a web interface, to make that happen.
We use an algorithm to segment songs into musically relevant parts and then analyze the segments individually using a cocktail of Music Information Retrieval (MIR) algorithms. A Machine Learning algorithm clusters the segments by similarity, and the results are stored in a database of slices. This database can then be queried for songs in a context-sensitive way.
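A context-sensitive query against such a slice database might look like the following. The schema, song names, and cluster labels here are hypothetical, chosen only to show the shape of the lookup: find other songs with a segment in the same similarity cluster as a chosen slice.

```python
import sqlite3

# Hypothetical slice database: one row per analyzed segment.
conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE slices (
    song TEXT, start_sec REAL, end_sec REAL, cluster INTEGER)""")
conn.executemany(
    "INSERT INTO slices VALUES (?, ?, ?, ?)",
    [('Song A', 0.0, 31.5, 2),
     ('Song A', 31.5, 68.0, 7),
     ('Song B', 12.2, 40.1, 7)])

# Context-sensitive search: other songs with a segment in the same
# cluster as Song A's second slice.
rows = conn.execute("""
    SELECT song, start_sec, end_sec FROM slices
    WHERE cluster = (SELECT cluster FROM slices
                     WHERE song = 'Song A' AND start_sec = 31.5)
      AND song != 'Song A'""").fetchall()
```

The query returns the matching slice of Song B, not the whole song, which is what makes the result section-specific.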
We would like to improve the search system we developed in the following ways:
- Provide a UI (web/Alexa) to enable quick testing and improve demo-ability.
- Move the processing into the cloud (AWS) for massive parallelism.
- Refine the analysis cocktail (algorithms and implementations) to sharpen search results.
The Road We Took
We set out with the goal of creating a simple prototype to see if the algorithm we proposed provided useful information or not. The initial design was relatively complex, including the following components:
- Download the 7digital library into a private S3 bucket for analysis.
- Use the AWS Lambda service to run the analysis functions in parallel.
- Implement a music segmentation algorithm to identify segments using MIR.
- Use Librosa, NumPy, and scikit-learn to analyze and classify audio segments.
- Store the analysis results in a MongoDB or PostgreSQL database in AWS.
- Provide a local web server to both serve a UI to a browser (HTML) and access the AWS database for search queries.
- Add an Alexa skill capable of interfacing with the search engine.
With these in mind, we began breaking up work and distributing it to the team:
- Eric began by implementing the 7digital APIs to access the music.
- Alivia began exploring a solution for the AWS architecture.
- Paulo began working with the MIR and Python tools to get segmentation and analysis working.
Bumps In the Road
Almost immediately we ran into issues with the 7digital APIs. Their use of OAuth 1.0 for authentication meant that third-party tools were difficult to find (most are on OAuth 2.0 now). We started using an implementation prepared by Cloudinary for the Hackathon to speed things up, but had to fix a few issues to make it work for our purposes.
It soon became clear that creating a system that would work entirely in AWS was beyond the scope of the 24-hour Hackathon. As a result, we dropped it as a goal partway through the first day and rescaled the designs to work without it (instead of analyzing all 14k songs in the dataset, we would pick the first 100 to process locally). That said, Alivia did continue to pursue the architecture so that we would have a plan for implementation provided we could continue development.
A misconfigured alarm clock during a nap resulted in Paulo missing a few hours of development. To compensate, Eric switched from finishing a frontend UI implementation to finishing up the audio search implementation. While Eric was able to complete the implementation of the search, the UI (both web and Alexa) had to be dropped. This resulted in a UI-less demo during the pitch, wherein actual results of the search were shown.
What We Learned
We were excited when the first results came back, as they showed an array of matches across different sections of different songs. The prototype successfully demonstrated that a context-based music search system holds promise: it works.
Further, everyone on the team learned more about MIR and music analysis, as well as the various AWS services available. We are excited by what we learned on this project and eager to dig deeper.
Special thanks to Capitol Music Group, 7digital, and Cloudinary for allowing us to use their music and APIs to implement the prototype. A big shoutout to the Amazon AWS team for their interest in our prototype and for their guidance on the vast array of services available in AWS.