Inspiration
The spark for DJ Vibes came from the observation that the energy of a room is fluid, but background music is often static. We wanted to bridge the gap between human emotion and digital audio. By using AI as a "digital eye," we aimed to create a system that doesn't just play a playlist but actually feels the room, adjusting the tempo, genre, and intensity of the music in real time based on the collective facial expressions and body language of the crowd.
What it does
DJ Vibes is an automated, AI-powered DJ system that reads a room’s "vibe" and generates original music to match it. It uses a continuous feedback loop: a live video feed captures the audience, Gemini analyzes the visual data for sentiment and energy levels, and the Loudly API generates custom 30-second music clips based on those descriptions. The result is a living soundtrack that evolves alongside the party.
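The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the three stage functions are hypothetical stand-ins for the webcam capture, the Gemini analysis, and the Loudly generation call, and they are passed in as plain callables so the loop itself can run without any external service.

```python
from typing import Callable, List


def run_vibe_loop(
    capture_clip: Callable[[], bytes],
    describe_vibe: Callable[[bytes], str],
    generate_track: Callable[[str], str],
    cycles: int,
) -> List[str]:
    """Run the capture -> describe -> generate cycle a fixed number of times.

    Returns the track URLs in the order they were produced.
    """
    track_urls: List[str] = []
    for _ in range(cycles):
        clip = capture_clip()                      # stand-in for a webcam recording
        prompt = describe_vibe(clip)               # stand-in for the vision-model call
        track_urls.append(generate_track(prompt))  # stand-in for music generation
    return track_urls


# Hypothetical stub stages so the sketch runs offline.
def fake_capture() -> bytes:
    return b"clip-bytes"


def fake_describe(clip: bytes) -> str:
    return "high-energy, dancing crowd"


def fake_generate(prompt: str) -> str:
    return "track-for:" + prompt


demo_tracks = run_vibe_loop(fake_capture, fake_describe, fake_generate, cycles=2)
```

Keeping the stages as injected functions means each one can be swapped or tested in isolation, which matters when every stage is a slow network call.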
How we built it
We architected a real-time data pipeline using a mix of high-level orchestration and specialized APIs:
- Multi-modal Intelligence: We used Gemini 2.5 Pro to analyze video clips of the room captured by a laptop webcam, turning visual energy into descriptive text prompts.
- Audio Generation: We used the Loudly API to transform those text prompts into high-quality, 30-second MP3 tracks.
- Orchestration: n8n serves as our central nervous system, handling the webhooks that trigger the video capture, passing the clips to Gemini, and fetching the generated audio from Loudly.
- Frontend: A React-based interface provides the user with start/stop controls and a visual representation of the current "vibe" being detected.
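To make the hand-off between the analysis step and the audio step concrete, here is a minimal sketch of turning a structured vibe description into a text prompt for a music generator. The JSON fields (`mood`, `energy`, `genre`), the 0 to 1 energy scale, and the tempo thresholds are all assumptions made for illustration; the write-up does not specify the format exchanged between the two services.

```python
import json
from dataclasses import dataclass


@dataclass
class Vibe:
    mood: str
    energy: float  # hypothetical scale: 0.0 (calm) to 1.0 (frenetic)
    genre: str


def parse_vibe(raw_json: str) -> Vibe:
    """Parse a (hypothetical) JSON vibe description, clamping energy to [0, 1]."""
    data = json.loads(raw_json)
    energy = min(1.0, max(0.0, float(data["energy"])))
    return Vibe(mood=str(data["mood"]), energy=energy, genre=str(data["genre"]))


def energy_to_tempo(energy: float) -> str:
    """Map an energy score to a coarse tempo word (thresholds are arbitrary)."""
    if energy < 0.33:
        return "slow"
    if energy < 0.66:
        return "mid-tempo"
    return "fast"


def build_music_prompt(vibe: Vibe, seconds: int = 30) -> str:
    """Compose a short text prompt describing the desired track."""
    tempo = energy_to_tempo(vibe.energy)
    return f"{seconds}-second {tempo} {vibe.genre} track, {vibe.mood} mood"


example_prompt = build_music_prompt(
    parse_vibe('{"mood": "euphoric", "energy": 0.9, "genre": "electronic"}')
)
```

Asking the vision model for a small structured record rather than free text makes the downstream prompt predictable and easy to display in a frontend.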
Challenges we ran into
The biggest hurdle was the "live" aspect of the video. We initially explored Vimeo and YouTube Live, but ran into technical roadblocks with webpage-only reading and 24-hour approval delays. We pivoted quickly to local webcam capture, recording a video clip every 30 seconds.
Additionally, automating the audio playback was tricky. n8n could return the URLs of the generated music, but making them auto-play without manual intervention required us to rethink our local server setup and frontend synchronization.
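One possible shape for the server side of that synchronization is a small "latest track" feed that the frontend polls. The sketch below is an assumption about how such a component could look, not a description of the setup we ended up with; `TrackFeed`, `publish`, and `poll` are hypothetical names.

```python
import threading
from typing import Optional, Tuple


class TrackFeed:
    """Holds the most recent track URL together with a sequence number."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seq = 0
        self._url: Optional[str] = None

    def publish(self, url: str) -> int:
        """Called by the workflow when a new track is ready; returns its sequence number."""
        with self._lock:
            self._seq += 1
            self._url = url
            return self._seq

    def poll(self, last_seen: int) -> Optional[Tuple[int, str]]:
        """Return (seq, url) if a track newer than last_seen exists, else None."""
        with self._lock:
            if self._url is not None and self._seq > last_seen:
                return self._seq, self._url
            return None


feed = TrackFeed()
before_publish = feed.poll(0)                      # nothing published yet
feed.publish("https://example.com/track-1.mp3")    # placeholder URL
fresh = feed.poll(0)                               # client has seen nothing so far
already_seen = feed.poll(1)                        # client already has track 1
```

The sequence number lets the client tell a new track from one it is already playing. Browsers generally block audio that starts without a user gesture, so the click on the start control is a natural moment to begin polling and playing.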
Accomplishments that we're proud of
We are incredibly proud of successfully closing the loop between a visual signal and an auditory response. Seeing Gemini correctly identify a "high-energy, dancing crowd" and hearing Loudly immediately produce an upbeat electronic track felt like magic. We also managed to get the automated looping to trigger every minute, creating a truly hands-off DJ experience.
What we learned
This project taught us a lot about the latency challenges of multimodal AI. We learned how to optimize image data for faster API processing and the importance of having a "Plan B" (like local webcam capture) when third-party streaming platforms have restrictive barriers. We also gained deep experience in workflow automation, using n8n to connect disparate APIs into a single cohesive product.
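As an illustration of the kind of optimization meant here, one common way to shrink a request is to send only a handful of evenly spaced frames from each clip instead of the whole recording. The sketch below shows that general technique under stated assumptions; it is not necessarily the optimization we applied, and `sample_evenly` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_evenly(frames: Sequence[T], max_frames: int) -> List[T]:
    """Pick at most max_frames items spread evenly across the sequence.

    The first and last frames are always kept when more than one is requested.
    """
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    n = len(frames)
    if n <= max_frames:
        return list(frames)
    if max_frames == 1:
        return [frames[n // 2]]  # a single frame: take the middle one
    step = (n - 1) / (max_frames - 1)
    return [frames[round(i * step)] for i in range(max_frames)]


# A 30-second clip at an assumed 30 frames per second, reduced to 4 frames.
clip_frame_indices = list(range(900))
selected = sample_evenly(clip_frame_indices, 4)
```

Fewer frames mean a smaller upload and less for the model to process, at the cost of possibly missing a brief change in the room.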
What's next for DJ Vibes
The next step is to move DJ Vibes from a "frame-by-frame" analysis to a true streaming data model to reduce latency between a "vibe shift" and the music change. We also plan to integrate the remaining sponsored tools, build out a more robust local audio server for seamless cross-fading between tracks, and potentially add a "request" feature where users can influence the AI's direction via chat.
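For the cross-fading piece, a standard starting point is an equal-power crossfade, where the outgoing track follows a cosine gain curve and the incoming track a sine curve so the combined power stays roughly constant. The sketch below works on plain lists of samples and is only an illustration of the technique; a real audio server would operate on decoded audio buffers, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def crossfade_gains(progress: float) -> Tuple[float, float]:
    """Return (outgoing_gain, incoming_gain) for progress in [0, 1]."""
    p = min(1.0, max(0.0, progress))
    angle = p * math.pi / 2
    return math.cos(angle), math.sin(angle)


def crossfade(outgoing: Sequence[float], incoming: Sequence[float]) -> List[float]:
    """Mix the tail of the old track with the head of the new one.

    Both inputs cover the overlap region; the shorter one sets the length.
    """
    n = min(len(outgoing), len(incoming))
    mixed: List[float] = []
    for i in range(n):
        progress = i / (n - 1) if n > 1 else 1.0
        g_out, g_in = crossfade_gains(progress)
        mixed.append(outgoing[i] * g_out + incoming[i] * g_in)
    return mixed


# Fade a constant signal out against silence over three samples.
faded = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

Because the squares of the two gains always sum to one, the perceived loudness dips less in the middle of the fade than with a straight linear ramp.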
Built With
- gemini
- loudly
- n8n