We wanted to see if we could effectively parse useful information from an audio file and make something fun out of it while learning more about digital signal processing along the way.
What it does
Goatify replaces the melody at every point in a song with an audio clip of a screaming goat shifted to the correct pitch. The resulting audio can be played alone or alongside an instrumental version of the song. Goatify can be accessed at http://goatifymysong.com, where users can upload their own songs, along with instrumental versions, for goatification.
How we built it
We used several Python data science and audio libraries to analyze the input audio and generate the output. Specifically, we used Essentia to extract the melody using a Short-Time Fourier Transform with a Blackman-Harris window function. We then used Librosa to pitch-shift the goat clip to the appropriate pitch, memoizing each shifted clip the first time we encountered it. Finally, we used ffmpeg for various audio processing tasks, including converting file formats and overlaying the instrumental background.
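The memoization step can be sketched roughly as follows. This is a minimal illustration, not our actual code: `GOAT_BASE_HZ`, `semitone_offset`, `shifted_goat`, and `goatify` are hypothetical names, the base pitch of the goat clip is an assumed value, and the expensive pitch shift is replaced by a stand-in that only returns a label.

```python
import math
from functools import lru_cache

# Assumption: pitch of the unshifted goat clip (hypothetical value).
GOAT_BASE_HZ = 440.0


def semitone_offset(target_hz, base_hz=GOAT_BASE_HZ):
    """Nearest whole number of semitones between base_hz and target_hz."""
    return round(12 * math.log2(target_hz / base_hz))


@lru_cache(maxsize=None)
def shifted_goat(n_steps):
    """Stand-in for the expensive pitch shift; each distinct n_steps runs once.

    A real version might call
    librosa.effects.pitch_shift(clip, sr=sr, n_steps=n_steps)
    here and return the shifted samples instead of a label.
    """
    return f"goat shifted {n_steps:+d} semitones"


def goatify(melody_hz):
    """Map each melody frequency to a cached goat clip; 0 Hz means silence."""
    return [shifted_goat(semitone_offset(f)) if f > 0 else None
            for f in melody_hz]
```

Because most melodies reuse a small set of pitches, caching by semitone offset means the slow shift only runs once per distinct note rather than once per frame.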
We then built a Flask web server for our website. It serves an HTML page built with Bootstrap and jQuery. When a user uploads their files, the page uses AJAX to run the goatification on the backend and receives the output audio as a blob, which is played dynamically in an Audio element right in the page. The result can then be downloaded.
Challenges we ran into
Adjusting all of the necessary parameters in the STFT was a challenge, because the songs were only recognizable with a very specific set of parameters. We had to change the hop size, pitch continuity, and voice guessing parameters many times, as well as how our algorithm handled zeroes and repeated pitches, among other things.
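One plausible way to handle zeroes and repeated pitches is sketched below. It assumes the pitch track has one value per frame, that 0 marks a frame with no detected melody, and that pitches are already quantized so repeated frames compare equal. The function name `frames_to_notes` and the hop length in the usage are hypothetical, and this is not necessarily how our algorithm ended up working.

```python
from itertools import groupby


def frames_to_notes(pitches_hz, hop_seconds):
    """Collapse a per-frame pitch track into (pitch_hz, start_s, duration_s).

    Runs of identical pitches become one sustained note, and runs of
    zeroes (no melody detected) are treated as rests and dropped.
    """
    notes = []
    frame = 0
    for pitch, run in groupby(pitches_hz):
        length = len(list(run))
        if pitch > 0:
            notes.append((pitch, frame * hop_seconds, length * hop_seconds))
        frame += length
    return notes


# Two frames of silence, a three-frame note, a rest, then a two-frame note.
print(frames_to_notes([0, 0, 440, 440, 440, 0, 330, 330], hop_seconds=0.5))
```

Merging repeated frames lets one goat clip be played per sustained note rather than retriggered on every frame, which is one source of the "jumbled beeps" effect.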
Accomplishments that we're proud of
Getting songs to sound recognizable after about six hours of only getting jumbled beeps. Eliminating most of the random beeps after a long period of tweaking our code.
What we learned
Determining the melody from an audio file is highly nontrivial, and even for songs with very clear melodies, separating the melody is incredibly difficult.
Windowing functions are very important when doing a Fourier transform on parts of a signal, because otherwise the sharp edges will introduce unwanted frequencies into your data.
Listening to goats scream at various pitches for ten hours straight will drive you insane.
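The point about windowing can be illustrated with a small, standard-library-only experiment. This is a sketch under assumed values (a 128-sample frame, a sine at 10.5 cycles per frame so that it falls between DFT bins, and a 20-bin guard around the peak), using a naive DFT rather than Essentia's implementation. The Blackman-Harris coefficients are the standard 4-term ones.

```python
import cmath
import math


def blackman_harris(n_samples):
    """4-term Blackman-Harris window with the standard coefficients."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    return [
        a0
        - a1 * math.cos(2 * math.pi * n / (n_samples - 1))
        + a2 * math.cos(4 * math.pi * n / (n_samples - 1))
        - a3 * math.cos(6 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]


def dft_magnitudes(frame):
    """Magnitudes of the first half of a naive O(N^2) DFT."""
    n_samples = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n, x in enumerate(frame)))
        for k in range(n_samples // 2)
    ]


def leakage(window, cycles=10.5, guard=20):
    """Largest magnitude at least `guard` bins from the peak, over the peak."""
    n_samples = len(window)
    frame = [w * math.sin(2 * math.pi * cycles * n / n_samples)
             for n, w in enumerate(window)]
    mags = dft_magnitudes(frame)
    peak = max(range(len(mags)), key=mags.__getitem__)
    far = [m for k, m in enumerate(mags) if abs(k - peak) >= guard]
    return max(far) / mags[peak]


N = 128  # assumed frame length for the demo
rect_leak = leakage([1.0] * N)          # no window: sharp frame edges
bh_leak = leakage(blackman_harris(N))   # tapered edges
print(f"rectangular: {rect_leak:.2e}, Blackman-Harris: {bh_leak:.2e}")
```

With no window, the abrupt frame edges spread energy into bins far from the true frequency. The tapered window suppresses that far-off energy at the cost of a wider main peak.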
What's next for Goatify
More animal noises and other noises
Using multiple randomized noises in the same song