Have you ever made a video for a school project and then spent hours searching for the perfect background music to go along with it? Well, that's now a thing of the past with fuji, which quickly generates unique music that fits perfectly with any video you submit.
What it does
fuji takes a video file the user uploads, analyzes it, and generates unique background music that fits the pacing and mood of the video's visuals.
How it works
Whenever a user uploads a video, fuji samples individual frames at a fixed time interval and performs k-means clustering on each frame to identify its dominant colors, from which it extracts valence and arousal values (measures of image mood). This is combined with emotion data from Microsoft Azure's Face API for any faces in the frame. Once that's complete, the combined data is fed through a restricted Boltzmann machine (RBM) to generate a MIDI file representative of the video's visuals.
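The dominant-color step can be sketched with a minimal k-means over a frame's pixels. This is an illustrative NumPy-only version, not the project's actual code: the real pipeline would decode frames with a video library (e.g. OpenCV) and may use a library k-means, and the deterministic initialization here is an assumption made for simplicity.

```python
import numpy as np

def dominant_colors(frame, k=3, iters=10):
    """Find k dominant colors in an RGB frame via a minimal k-means.

    `frame` is an (H, W, 3) uint8 array; returns a (k, 3) float array
    of cluster centers (the dominant colors).
    """
    pixels = frame.reshape(-1, 3).astype(float)
    # Deterministic init: pick k colors spread across the frame's
    # unique pixel values (a simplification for this sketch).
    uniq = np.unique(pixels, axis=0)
    centers = uniq[np.linspace(0, len(uniq) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each pixel to its nearest center.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return centers
```

Running this on a frame that is half black and half white with `k=2` recovers the two colors; on real frames the centers become the palette that feeds the valence/arousal estimate.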
How we built it
Challenges we ran into
Because one of us had never used Flask and the other had never used Polymer, we initially struggled to connect the front-end and back-end of the application. However, with some time and effort, we eventually managed to make these two halves communicate seamlessly.
We also got stuck on music generation for a while, and had a lot of trouble running TensorFlow from within Flask, but we were ultimately able to resolve both issues.
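One common source of TensorFlow-in-Flask trouble is loading the model per request (or at import time in a process that later forks). A typical fix is a lazy singleton: load the model once in the serving process and share it across request threads. The sketch below shows that pattern with a stub loader standing in for restoring the real RBM weights; the names are illustrative, not the project's actual code.

```python
import threading

_model = None
_lock = threading.Lock()

def load_model():
    # Stub for the real loader (e.g. restoring the trained RBM's
    # TensorFlow graph and weights from a checkpoint).
    return {"name": "rbm", "loaded": True}

def get_model():
    """Return the shared model, loading it exactly once across threads."""
    global _model
    if _model is None:
        with _lock:
            # Re-check inside the lock so concurrent first requests
            # don't each trigger a load (double-checked locking).
            if _model is None:
                _model = load_model()
    return _model
```

Every request handler then calls `get_model()` instead of constructing its own graph, so all threads reuse one instance.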
Accomplishments that we're proud of
We're primarily proud of our use of computer vision, machine learning, and music generation to create music from scratch based solely on a video. Having each of these moving parts working in tandem took quite a bit of effort, and we're glad we got it to work. Beyond that, we're also proud of the sleek but interactive UI design that's been optimized for user experience.
What we learned
Not only did we learn more about both Polymer and Flask, we also learned a lot about computer vision and machine-learning-based music generation.
What's next for fuji
While our music generation is currently functioning, we think we can improve the process by incorporating more sophisticated forms of frame analysis or by training our RBM on larger, more varied datasets. We'd also like to expand the application to accept other kinds of video files besides MP4s.