Inspiration
We're all music lovers, and we all wanted to build with computer vision (CV). So, why not use CV to give us new music recs? Boom.
What it does
MoodSwing is a real-time AI DJ: it uses CV to analyze your facial emotions and Gemini to recommend a relevant song, with reasoning. An ElevenLabs-powered DJ even narrates in any voice style you choose! With direct Spotify API integration, MoodSwing can trigger playback on all of your devices, run another facial analysis near the end of the current song to queue the next one, and even add every recommendation to a personalized playlist so you can revisit it at any time.
How we built it
We combined multiple AI models, libraries, and APIs. YOLOv8 detects faces, and DeepFace identifies emotions. OpenCV then overlays this info on the live webcam feed. Google Gemini generates awesome song recommendations with reasoning, which feed into ElevenLabs to create natural, human-like DJ narration. Finally, we use the Spotify API (via the SpotiPy library) to enable playback on our devices, queue songs, and add to playlists.
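The glue between the stages can be sketched like this: DeepFace reports a dominant emotion label, which we fold into a prompt for Gemini. This is a minimal sketch, assuming DeepFace's usual labels (e.g. "happy") and illustrative prompt wording, not our exact prompt.

```python
def build_recommendation_prompt(dominant_emotion: str, recent_tracks: list[str]) -> str:
    """Turn a detected facial emotion into a song-recommendation prompt for Gemini.

    `dominant_emotion` is the label DeepFace reports (e.g. "happy");
    `recent_tracks` lets us ask Gemini not to repeat itself.
    """
    avoid = ", ".join(recent_tracks) if recent_tracks else "none"
    return (
        f"The listener's current facial emotion is '{dominant_emotion}'. "
        f"Recommend one song on Spotify that fits this mood, with a short "
        f"DJ-style reasoning. Avoid repeating these recent picks: {avoid}."
    )

prompt = build_recommendation_prompt("happy", ["Mr. Blue Sky"])
```

The returned string goes straight to Gemini; Gemini's text reply is then handed to ElevenLabs for narration.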
Challenges we ran into
The auto-queuing was super hard, since every song has a different length. Our solution was to get the track duration from the Spotify API, standardize the length of the ElevenLabs DJ narration, and ensure that the emotion analysis triggered exactly 16-23 seconds before the song ended, so the transition felt seamless, just like Spotify's own crossfades. Getting all of the API integrations to work together was also quite a challenge, since we were building data pipelines directly between our CV system, Gemini, ElevenLabs, and Spotify, all running at once.
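The timing logic boils down to one calculation. Spotify's playback endpoint reports both the track duration and the current progress in milliseconds; subtracting a randomized 16-23 s lead window gives the delay before re-running the emotion analysis. A minimal sketch (the function name and default bounds are ours, for illustration):

```python
import random

def schedule_next_analysis(duration_ms: int, progress_ms: int,
                           lead_lo: float = 16.0, lead_hi: float = 23.0) -> float:
    """Seconds to wait before the next emotion analysis, chosen so it fires
    16-23 s before the current track ends.

    Spotify reports duration and progress in milliseconds, so convert first.
    """
    remaining = (duration_ms - progress_ms) / 1000.0
    lead = random.uniform(lead_lo, lead_hi)  # jitter the trigger point
    return max(0.0, remaining - lead)        # never return a negative delay

# 3:30 track, 30 s in -> fires somewhere 157-164 s from now
delay = schedule_next_analysis(duration_ms=210_000, progress_ms=30_000)
```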
Accomplishments that we're proud of
We have a TON of cool features. We added a face isolation feature for groups! Basically, we wrote an algorithm that detects the largest face in the frame and analyzes it exclusively. We also added a queuing system that triggers an emotion analysis 16-23 seconds before the song is about to end and finds the next song to play seamlessly after the previous one. We're also really proud of the Spotify API integration, since that took a very long time. We got playback working across all of our devices and even created a shared playlist that MoodSwing added songs to while we were working. One last cool thing: we managed to pull the album cover art from Spotify and integrate it into the interface.
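The face isolation trick is simple once you have bounding boxes: pick the box with the largest area and analyze only that face. A minimal sketch, assuming YOLO-style `(x1, y1, x2, y2)` corner coordinates:

```python
def largest_face(boxes):
    """Return the bounding box with the largest area, or None if no faces.

    Each box is (x1, y1, x2, y2), as a YOLO detector would report it;
    the biggest box is usually the person closest to the camera.
    """
    if not boxes:
        return None
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))

faces = [(10, 10, 50, 50), (0, 0, 200, 200), (30, 30, 60, 60)]
target = largest_face(faces)  # -> (0, 0, 200, 200)
```

Only the winning crop is passed to DeepFace, which keeps group shots from producing conflicting emotion readings.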
What we learned
As hackathon beginners, we learned a lot about collaborative programming, including how to use Git. We also learned how to integrate APIs with each other to create a fully functioning, multi-faceted app. And we learned that vibe-coding does not solve all of your problems.
What's next for MoodSwing
We want to add multi-face balancing for groups (averaging the emotional values across everyone in frame), personalized learning to remember user preferences, and more nuanced emotion and genre options. We could take the intensity of the emotion into account, for example, and add new genres like jazz and classical.

