Have you ever made a video for a school project and then spent hours searching for the perfect background music to go with it? Well, that's now a thing of the past with fuji, which quickly generates unique music that fits perfectly with any video you submit.

What it does

fuji takes a video file from the user, analyzes it, and generates unique background music that fits the pacing and mood of the video's visuals.

How it works

Whenever a user uploads a video, fuji samples individual frames at a specified time interval and performs k-means clustering on each frame to identify its dominant colors, from which valence and arousal values (measures of image mood) are extracted. This is combined with Microsoft Azure's Face API to account for emotions expressed by any faces in the frame. Once that's complete, the data is fed through a restricted Boltzmann machine to generate a MIDI file representative of the video's visuals.
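The dominant-color step above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual code: it assumes a frame arrives as an RGB NumPy array (in the real pipeline, frames would come from OpenCV), and the function name `dominant_colors` is ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(frame, k=5):
    """Cluster a frame's pixels into k dominant RGB colors.

    frame: (H, W, 3) uint8 array.
    Returns a (k, 3) array of cluster centers, sorted so the most
    dominant (largest) cluster comes first.
    """
    pixels = frame.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)   # pixels per cluster
    order = np.argsort(counts)[::-1]                # biggest cluster first
    return km.cluster_centers_[order].astype(np.uint8)

# Example: a synthetic 10x10 frame that is mostly red with a blue patch.
frame = np.zeros((10, 10, 3), dtype=np.uint8)
frame[..., 0] = 200          # red everywhere
frame[:3, :3] = (0, 0, 220)  # small blue corner

colors = dominant_colors(frame, k=2)  # colors[0] is the red cluster
```

A mapping from these colors to valence/arousal values could then use color psychology heuristics (e.g. warm, saturated colors scoring higher on arousal), though the exact mapping the team used isn't specified here.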

How we built it

The entire front-end of the application was built in Polymer (a JavaScript library from Google for creating custom web components), while the back-end was developed with Flask and Python, with the assistance of moviepy, scikit-learn, TensorFlow, OpenCV, and Microsoft Azure's Face API.
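A Flask upload endpoint connecting the Polymer front-end to the analysis pipeline might look like the sketch below. The route name, field name, and response shape are our assumptions, not the project's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical endpoint: accepts an mp4 from the front-end, would hand it
# to the frame-analysis and RBM pipeline, and returns where to fetch the
# generated MIDI file.
@app.route("/upload", methods=["POST"])
def upload():
    video = request.files.get("video")
    if video is None or not video.filename.endswith(".mp4"):
        return jsonify(error="please upload an mp4 file"), 400
    # ...frame sampling, clustering, and music generation would run here...
    return jsonify(midi_url="/generated/output.mid")
```

The front-end would POST the file as multipart form data and poll or redirect to the returned MIDI URL once generation finishes.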

Challenges we ran into

Because one of us had never used Flask and the other had never used Polymer, we initially struggled to connect the front-end and back-end of the application. However, with some time and effort, we eventually managed to make these two halves communicate seamlessly.

We also got stuck on music generation for a while and had a lot of trouble running TensorFlow from within Flask, but we were ultimately able to resolve both issues.

Accomplishments that we're proud of

We're primarily proud of our use of computer vision, machine learning, and music generation to create music from scratch based solely on a video. Having each of these moving parts working in tandem took quite a bit of effort, and we're glad we got it to work. Beyond that, we're also proud of the sleek yet interactive UI design, which has been optimized for user experience.

What we learned

Not only did we learn more about both Polymer and Flask, we also learned a lot about computer vision and machine-learning-based music generation.

What's next for fuji

While our music generation is currently functional, we think we can improve the process by incorporating more sophisticated forms of frame analysis or by training our RBM on larger, more varied datasets. We'd also like to expand the application to accept video formats other than mp4.
