Music is one of the few universal joys that can transcend barriers of language or distance. After all, bonds can be made or enriched by a mutual love for certain musical genres or artists. Armed with this knowledge and an enthusiasm for good tunes, our team set out to pioneer new ways to make music we love, while also enabling users to share new, unique melodies with friends -- all free of copyright and access fees.
What it does
Our website allows you to generate music and send it to people! The user selects two .midi files from a set, then enters a phone number and submits what will become the foundation of a new song. The selected .midi files serve as "inspiration" for the music generator, and the phone number provided receives a call that plays a unique audio file for whoever picks up to enjoy!
How we built it
We utilized the Magenta Python library -- a machine learning library built on TensorFlow that extends artificial intelligence into artificial creativity with music and art. We used Magenta's MusicVAE Interpolate method, which takes two melodies as inputs and generates clips that combine their qualities -- whether merging musical ideas or creating a smooth blend between them. Interpolate relies on a Variational Autoencoder (VAE): a model that learns a mapping from MIDI to a compressed latent space where similar musical patterns are clustered together. Each input pattern is represented by a position in this space. Interpolate draws a line between the two positions and returns clips sampled along that line.
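The "line in latent space" idea can be illustrated with a toy sketch in plain Python. The three-dimensional vectors below are made up stand-ins for two encoded melodies; the real VAE uses a much larger learned latent space:

```python
def lerp(z1, z2, t):
    """Linearly interpolate between two latent vectors at position t in [0, 1]."""
    return [a + t * (b - a) for a, b in zip(z1, z2)]

def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps points evenly spaced along the line from z_start to z_end."""
    return [lerp(z_start, z_end, i / (num_steps - 1)) for i in range(num_steps)]

# Toy 3-D "latent codes" standing in for two encoded melodies.
z_a = [0.0, 1.0, -2.0]
z_b = [4.0, -1.0, 2.0]

points = interpolate_latents(z_a, z_b, 5)
print(points[0])   # [0.0, 1.0, -2.0]  (first clip reproduces melody A)
print(points[2])   # [2.0, 0.0, 0.0]   (middle clip blends the two)
print(points[-1])  # [4.0, -1.0, 2.0]  (last clip reproduces melody B)
```

In MusicVAE each of these intermediate points would be decoded back into a note sequence, which is what makes the middle clips sound like blends of the two inputs.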
To train our model, we chose an 8 MB set of classical piano music in the form of MIDI files. These files are broken down into note sequences -- small melody segments -- which are then fed into a recurrent neural network (RNN). RNNs are a class of artificial neural network in which connections between nodes form a directed graph along a sequence. RNNs use their internal memory to process sequences of inputs, and thanks to their pattern-matching capabilities they are also used for tasks such as handwriting and speech recognition.
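The "internal memory" of an RNN boils down to a small recurrence: at each step, the new hidden state is computed from the current input and the hidden state carried over from the previous step. A toy single-unit version in plain Python (the weights here are made up for illustration; a real network has many units and learns its weights during training):

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrence: new hidden state from current input x and previous state h."""
    return math.tanh(w_x * x + w_h * h + b)

def run_rnn(sequence):
    """Process a sequence of inputs, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states

# The hidden state after each input depends on everything seen so far,
# which is what lets the network model melodic context across a sequence.
states = run_rnn([1.0, 0.0, -1.0])
```

Feeding the note sequences through a (much larger) version of this loop is what allows the model to pick up on patterns that span several notes rather than treating each note in isolation.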
The model is then fed two MIDI files to use for generating a new composition. MusicVAE supports several modes of interactive musical creation, including random sampling from the prior distribution, interpolation between existing sequences, and manipulation of existing sequences via attribute vectors or a latent constraint model.
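In Magenta's Python API, the interpolation step looks roughly like the sketch below. This is a sketch rather than our exact code: the config name and checkpoint path are assumptions, and the third-party imports are kept inside the function so the snippet loads even without Magenta installed:

```python
def interpolate_midis(start_midi, end_midi, num_outputs=5):
    """Generate num_outputs note sequences along the latent line between two MIDIs."""
    # Imports kept local so the sketch loads without Magenta installed.
    import note_seq
    from magenta.models.music_vae import configs
    from magenta.models.music_vae.trained_model import TrainedModel

    # Config name and checkpoint path are illustrative assumptions.
    config = configs.CONFIG_MAP['cat-mel_2bar_big']
    model = TrainedModel(config, batch_size=4,
                         checkpoint_dir_or_path='cat-mel_2bar_big.ckpt')

    start = note_seq.midi_file_to_note_sequence(start_midi)
    end = note_seq.midi_file_to_note_sequence(end_midi)

    # interpolate() returns NoteSequences sampled along the latent-space line.
    return model.interpolate(start, end, num_steps=num_outputs, length=32)
```

Each returned NoteSequence can then be written back out as a MIDI file for the next stage of the pipeline.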
The resulting MIDI file is converted to a WAV audio file using the midi2audio Python library, and an outbound call that plays the audio file is placed to the provided phone number using the Twilio REST API.
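That last step looks roughly like the following sketch. It is not our exact code: the phone numbers, credentials, and the public URL hosting the WAV file are placeholders, and the third-party imports live inside the functions so the snippet loads without those libraries installed:

```python
def midi_to_wav(midi_path, wav_path):
    """Render a MIDI file to WAV via midi2audio (which wraps FluidSynth)."""
    from midi2audio import FluidSynth
    FluidSynth().midi_to_audio(midi_path, wav_path)

def build_play_twiml(audio_url):
    """TwiML instructing Twilio to play an audio file to whoever answers."""
    return f"<Response><Play>{audio_url}</Play></Response>"

def place_call(to_number, audio_url):
    """Dial the number and play the generated song (credentials are placeholders)."""
    from twilio.rest import Client
    client = Client("ACCOUNT_SID", "AUTH_TOKEN")
    client.calls.create(
        to=to_number,
        from_="+15550100000",  # a Twilio-owned number (placeholder)
        twiml=build_play_twiml(audio_url),
    )
```

Note that Twilio fetches the audio over HTTP, so the WAV file has to be hosted at a publicly reachable URL before the call is placed.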
Challenges we ran into
Our team ran into a lot of trouble finding the right .midi files. Not only did we need .midi files specifically (not .mp3s, .wavs, or anything else), but the music also couldn't change tempo or time signature at any point, and each file needed to be a single layer. Because of this, just gathering enough data to train our model was troublesome.
We also had to shift goals halfway through our session -- our original objective was to train a model to blend two starkly different genres, such as jazz and EDM. When we found that our .midi files couldn't support this venture, we shifted our focus to classical piano pieces: the genre had many .midi files available for use, and tempo changes or time-signature shifts were no longer an obstacle. Even as we changed goals, we had to make the tough decision to kill a model training process that had been running for nearly 12 hours -- the time was sunk, and it was up to us to salvage what data we could from that run for our music generator.
Accomplishments that we're proud of
How can we deny the most obvious thing we're happy about? The fact that our generator even works makes us proud. It can create cohesive, melodic songs that can't be immediately identified as computer-made, and mobile numbers can be successfully dialed to bring tunes to everyone. Behind this achievement are the great strides that everyone on the team made -- three of us picked up TensorFlow having never used it before, and one of us learned and built the front end after never having made a single website in HTML. We are also glad we could adapt to the changes that were necessary to our goals, enabling us to build a music generator that could be demoed before the end of the hackathon.
What we learned
Of course, we all learned new hard skills in coding and programming. But we also learned that you should never try to train a machine learning model on a local machine... especially a laptop. We are grateful that no hard drives or batteries exploded during this session.
What's next for music.U
We hope to see music.U reach the heights of our original goal -- to be able to blend two totally different genres, and create a catchy song out of their coalescence. Who wouldn't like to see the next radical joining of genres that follows in the footsteps of electroswing?