Presenting at Table 32!
We wanted to be able to run Google's Deep Dream Neural Network on songs without having to retrain the neural net on song files. So instead, our quest was born to build a conversion from song to picture so we could "listen" to various picture effects, including Deep Dream.
What it does
Our app allows users to upload songs (or any music file), see the image representation of it (or just upload an image), apply different transformations to the generated images, and then listen to how they sound when converted back.
How we built it
Our application is a fairly complicated stack. The front end is built with NodeJS, React, and MongoDB (among many others) to handle requests, file uploads, and user interaction. Once an .mp3 has been uploaded, it is handed off to ffmpeg to convert it to a .wav, which is run through SoX to convert it to a .raw file. The .raw file can then be parsed into a .png (for lossless compression) using ImageMagick, which can be opened, read, and edited by our Python script (using PIL and other manual image transformations we wrote). Python then hands the edited .png file back to ImageMagick to convert it back to a .raw file, from which we convert back to .wav and then .mp3 and return it to the user. Our full stack is the following: NodeJS, Express, Babel, ES2015, Mongoose, React, SocketIO, Webpack, AWS, MongoDB, Python, Bash, FFMPEG, SoX, ImageMagick, PIL, SciKit, Keras, and DeepDream, also known as the NEBEMRSWAMPBFSIPSKD Stack :)
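The key trick in the pipeline is that step: raw PCM bytes can be packed straight into RGB pixels and round-tripped losslessly through PNG. Here's a minimal sketch of that idea in Python with PIL; the function names and the 256-pixel width are our illustrative choices here, not the exact code from the app:

```python
# Sketch of the .raw -> .png packing step: treat the raw PCM byte
# stream as RGB pixel data (3 bytes per pixel), zero-pad it to a
# rectangle, and round-trip it losslessly through a PNG-capable image.
import math
from PIL import Image

def raw_to_image(raw: bytes, width: int = 256) -> Image.Image:
    """Pack raw PCM bytes into an RGB image (lossless, with zero padding)."""
    height = math.ceil(len(raw) / (3 * width))
    padded = raw + b"\x00" * (3 * width * height - len(raw))
    return Image.frombytes("RGB", (width, height), padded)

def image_to_raw(img: Image.Image, original_len: int) -> bytes:
    """Unpack the image back to bytes, trimming the zero padding."""
    return img.tobytes()[:original_len]

if __name__ == "__main__":
    pcm = bytes(range(256)) * 10              # stand-in for real PCM data
    img = raw_to_image(pcm)
    print(image_to_raw(img, len(pcm)) == pcm)  # perfect round trip
```

Because PNG compression is lossless, any pixel you don't touch comes back as exactly the same audio bytes you started with.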
Challenges we ran into
We ran into many initial challenges in the conversion between .mp3 and .png (and getting it to go back). There are many, many intricacies in getting exactly the right bitrate, sampling rate, and dimensionality of the files to ensure that each of them maintains the integrity of the data perfectly. Additionally, wrapping our heads around how pixel-level transformations map to sample-level transformations proved very difficult, because they aren't 1:1!
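To see why the mapping isn't 1:1, consider a toy calculation (assuming 16-bit mono PCM at 44.1 kHz, which is our illustrative assumption here, not necessarily the app's exact settings): each sample is 2 bytes, but each RGB pixel holds 3 bytes, so one pixel spans 1.5 samples and most pixels straddle a sample boundary:

```python
# Toy illustration of the pixel <-> sample dimensional analysis.
# Assumptions (not necessarily the app's exact settings):
SAMPLE_RATE = 44_100   # CD-quality audio, samples per second
BYTES_PER_SAMPLE = 2   # 16-bit PCM
BYTES_PER_PIXEL = 3    # one RGB pixel

def pixel_to_sample_range(pixel_index: int) -> tuple[int, int]:
    """Return the (first, last) sample indices that a pixel's bytes touch."""
    first_byte = pixel_index * BYTES_PER_PIXEL
    last_byte = first_byte + BYTES_PER_PIXEL - 1
    return first_byte // BYTES_PER_SAMPLE, last_byte // BYTES_PER_SAMPLE

def pixel_to_seconds(pixel_index: int) -> float:
    """Convert a pixel position to the song time it represents."""
    samples_per_pixel = BYTES_PER_PIXEL / BYTES_PER_SAMPLE  # 1.5
    return pixel_index * samples_per_pixel / SAMPLE_RATE

print(pixel_to_sample_range(0))  # pixel 0 covers samples 0 and 1
print(pixel_to_sample_range(1))  # pixel 1 straddles samples 1 and 2
```

Editing pixel 1 therefore perturbs a sample that pixel 0 also touches, which is exactly the kind of entanglement that made reasoning about transformations hard.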
Accomplishments that we're proud of
We're proud we were able to successfully convert a song into an image, run it through Google's Deep Dream Neural Network (and other image transformations), and then listen to what it did! We're proud that we now know what "red" sounds like, can listen to a neural network think, and can hear a literal blue-shifted song. We've done some of the coolest scientific dimensional analysis yet, and successfully learned how to convert seemingly unrelated pairs of values, such as converting RGB to seconds, pixels to samples, and more. We're excited we were able to stand up an efficient NodeJS web app on top of our enormous stack that could function fluidly with all of our components.
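One way a "blue shift" like the one above could be implemented (a hedged sketch, not the exact transformation our app uses) is to rotate the channel energy toward blue before converting the image back to audio:

```python
# Sketch of a "blue shift" transformation on the song image:
# rotate the RGB channels so red energy moves toward blue.
# This is an illustrative interpretation, not the app's exact code.
from PIL import Image

def blue_shift(img: Image.Image) -> Image.Image:
    """Rotate channels (R, G, B) -> (G, B, R), pushing energy blueward."""
    r, g, b = img.split()
    return Image.merge("RGB", (g, b, r))

if __name__ == "__main__":
    red_img = Image.new("RGB", (4, 4), (255, 0, 0))
    print(blue_shift(red_img).getpixel((0, 0)))  # (0, 0, 255): pure blue
```

Since every pixel edit is also a sample edit, this channel rotation is audible when the image is converted back through the .png -> .raw -> .wav -> .mp3 path.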
What we learned
We learned a ton about song data, audio files, image files, and how they actually work under the hood. We learned how to deploy React, and how to integrate a full MEN stack with a Python audio stack.
What's next for Synesthesia
Adding new transformations, and giving users additional cool options for what to do with their files.