Vibe.FM - Project Story
Inspiration
There are moments we can't explain and feelings we can't quite put into words. But music always seems to understand.
We've all been there—scrolling endlessly through playlists, unable to find music that matches our current emotional state. We wanted to build a bridge to that understanding. Not just another recommendation algorithm, but an intelligent system that could interpret the nuance of human emotions and translate them into the perfect soundtrack.
The idea came from a simple question: What if you could just tell an AI how you're feeling, and it would understand you?
We saw an opportunity to combine Google's cutting-edge Agent Development Kit (ADK) with the vast world of music streaming to create something truly special—a multi-agent system that doesn't just recommend music, but understands you.
What it does
Vibe.FM is an intelligent music agent that translates your emotional state into perfectly curated playlists.
Here's the magic:
You describe your vibe through a beautiful, interactive interface with dynamic color-shifting backgrounds. Just type how you're feeling—"a rainy afternoon," "energetic workout," "something tender that feels like starting over."
Our multi-agent system springs into action:
- The OrchestratorAgent analyzes your mood using Gemini's natural language understanding
- The ScoutAgent discovers new music from a database of 8M+ songs
- The PersonalizedAgent curates tracks from your own Spotify history
- The MergerAgent intelligently balances both lists for the perfect mix
In seconds, your playlist is ready and automatically added to your Spotify queue. You get full playback controls right in the app—play, pause, skip, all integrated seamlessly.
The result? A unique soundtrack that blends fresh discovery with the comfort of your personal favorites. Not just a playlist—a perfect capture of your current moment.
How we built it
Architecture Overview
We built a decoupled, cloud-native application with a clear separation of concerns:
Frontend (Next.js + React)
- Interactive UI with Framer Motion animations
- Dynamic color-shifting backgrounds that respond to user input
- Real-time communication with the backend via Axios
- Tailwind CSS for responsive, beautiful design
Backend (FastAPI + Python)
- RESTful API with async support for high performance
- Secure OAuth2 authentication flow with Spotify
- Session-based state management for user profiles
- Server-side caching to optimize repeated requests
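The server-side caching above can be sketched as a small in-memory TTL cache. This is an illustrative stand-in, not the production code; the names `TTLCache` and the example cache key are hypothetical, and a real deployment might reach for Redis or `functools.lru_cache` instead.

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # evict the stale entry lazily on read
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

# Hypothetical usage: cache a user's Spotify top tracks for 10 minutes
cache = TTLCache(ttl_seconds=600)
cache.set("user:123:top_tracks", ["track_a", "track_b"])
print(cache.get("user:123:top_tracks"))  # ['track_a', 'track_b']
```

Lazy eviction on read keeps the sketch stateless-friendly: no background thread is needed, which suits a serverless environment.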
Multi-Agent System (Google ADK + Gemini)
This is where the magic happens. We implemented a true multi-agent architecture:
# Simplified flow
OrchestratorAgent
├─> ScoutAgent (searches 8M+ songs in DuckDB)
├─> PersonalizedAgent (queries user's Spotify data)
└─> MergerAgent (balances and validates final playlist)
Each agent is a specialist:
- OrchestratorAgent: Uses Gemini to interpret natural language mood descriptions and coordinate the workflow
- ScoutAgent: Performs lightning-fast searches across millions of tracks using DuckDB's analytics capabilities
- PersonalizedAgent: Accesses the user's Spotify data (top tracks, artists, playlists) to find familiar favorites
- MergerAgent: Intelligently combines both lists, balancing discovery with comfort
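The orchestration pattern above can be sketched in plain Python. These are hypothetical stand-ins for the real ADK/Gemini agents (the function names and merge heuristic are illustrative, not the production logic); the point is the shape: Scout and Personalized run in parallel, and Merger makes the final call.

```python
import asyncio

# Stand-ins for the real specialists; the actual system uses Google ADK
# agents backed by Gemini.
async def scout_agent(mood: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for a DuckDB catalog search
    return [f"new-track-for-{mood}-1", f"new-track-for-{mood}-2"]

async def personalized_agent(mood: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for Spotify history queries
    return [f"favorite-{mood}-1"]

def merger_agent(discovered: list[str], familiar: list[str]) -> list[str]:
    # Alternate familiar comfort with fresh discovery, then append leftovers.
    merged: list[str] = []
    for fam, new in zip(familiar, discovered):
        merged.extend([fam, new])
    merged.extend(discovered[len(familiar):])
    return merged

async def orchestrate(mood: str) -> list[str]:
    # Both specialists run concurrently; the MergerAgent resolves the mix.
    discovered, familiar = await asyncio.gather(
        scout_agent(mood), personalized_agent(mood)
    )
    return merger_agent(discovered, familiar)

playlist = asyncio.run(orchestrate("rainy-afternoon"))
```

Running the two retrieval agents under `asyncio.gather` mirrors the tree above and avoids the race conditions discussed later: the orchestrator is the only writer of the final list.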
Data Layer
- DuckDB: High-performance analytics database for 8M+ songs with audio features
- Spotify API: Real-time data fetching and playback control via Spotipy
- User-specific tables: We create dedicated tables for each user's liked songs on first use
Deployment
- Google Cloud Run: Serverless, auto-scaling deployment
- Docker: Containerized for consistency across environments
- Stateless design: Each request is independent, perfect for serverless
Key Technical Decisions
Why DuckDB? We needed sub-second queries on millions of songs. DuckDB's columnar storage and analytics optimization made it perfect for our use case.
Why multi-agent? A single agent couldn't balance discovery vs. personalization effectively. By specializing agents, we achieve better results and can optimize each independently.
Why Cloud Run? Auto-scaling, pay-per-use, and fast cold starts made it ideal for a hackathon project that could scale to real users.
Challenges we ran into
1. Agent Coordination Complexity
Getting three agents to work together seamlessly was harder than expected. Our first attempts had:
- Race conditions between parallel agents
- Inconsistent output formats
- Difficulty balancing "new" vs "familiar" music
Solution: We implemented a clear orchestration pattern with structured tool outputs and a dedicated MergerAgent to handle the final decision-making.
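The "structured tool outputs" part of that fix can be sketched as a shared result schema every agent must emit. The `AgentResult` class is a hypothetical simplification of the real format, but it shows the idea: the MergerAgent consumes one validated shape instead of guessing each agent's output.

```python
from dataclasses import dataclass, field

@dataclass
class AgentResult:
    """Hypothetical common output schema for all three agents."""
    agent: str                       # which specialist produced this list
    tracks: list[str] = field(default_factory=list)

    def validate(self) -> "AgentResult":
        # Drop empty IDs and duplicates (preserving order) so the
        # MergerAgent never has to re-clean upstream output.
        seen: set[str] = set()
        cleaned = []
        for t in self.tracks:
            if t and t not in seen:
                seen.add(t)
                cleaned.append(t)
        self.tracks = cleaned
        return self

scout = AgentResult("ScoutAgent", ["a", "b", "a", ""]).validate()
print(scout.tracks)  # ['a', 'b']
```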
2. Spotify Rate Limits
Solution: Fetching Spotify's entire catalog through the API was a non-starter given the rate limits, so we turned to public song catalogs on Kaggle instead. These aren't fully up to date, but they cover our needs well.
3. Prompt Engineering for Mood Understanding
Getting Gemini to consistently understand nuanced emotional descriptions was tricky.
Solution: Extensive prompt iteration with examples and structured outputs. We also added context from the user's music history to improve accuracy.
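One slice of that prompt iteration can be sketched as a few-shot template that forces a structured JSON reply. The examples and key names here are illustrative, not the production prompts (which are longer and were tuned against real Gemini outputs).

```python
import json

# Hypothetical few-shot examples pairing free-text moods with intents
EXAMPLES = [
    {"mood": "a rainy afternoon",
     "intent": {"emotions": ["melancholy", "calm"], "energy": "low"}},
    {"mood": "energetic workout",
     "intent": {"emotions": ["excited"], "energy": "high"}},
]

def build_mood_prompt(user_mood: str) -> str:
    """Assemble instruction + few-shot examples + the user's mood."""
    shots = "\n\n".join(
        f'Mood: "{ex["mood"]}"\nIntent: {json.dumps(ex["intent"])}'
        for ex in EXAMPLES
    )
    return (
        "Interpret the mood description and reply with ONLY a JSON object "
        'with keys "emotions" (list of strings) and "energy" '
        '("low" | "medium" | "high").\n\n'
        f'{shots}\n\nMood: "{user_mood}"\nIntent:'
    )

prompt = build_mood_prompt("something tender that feels like starting over")
```

Ending the prompt at `Intent:` nudges the model to complete the pattern the examples establish, which made the JSON outputs far more consistent to parse.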
Accomplishments that we're proud of
🎯 True Multi-Agent Architecture: We didn't just call it multi-agent—we built a real orchestrated system with specialized agents working in parallel.
🎨 Beautiful UX: The animated, responsive interface makes the AI's work feel magical rather than mechanical.
🔍 8M+ Song Database: We integrated a massive dataset and made it queryable in real-time.
🎵 Perfect Balance: Our agents successfully blend discovery with personalization—users get new music they love, not just what they already know.
☁️ Cloud-Native Design: Fully serverless, auto-scaling, and production-ready on Google Cloud Run.
What we learned
Technical Skills
- Google ADK: Deep dive into building production multi-agent systems
- Gemini API: Effective prompt engineering for natural language understanding
- DuckDB: High-performance analytics on large datasets
- Cloud Run: Serverless deployment patterns and optimization
- Parallel Processing: Coordinating multiple agents for optimal performance
Product Insights
- Users want context-aware recommendations, not just algorithmic ones
- The balance between discovery and familiarity is critical for music satisfaction
- Natural language interfaces for music selection feel more intuitive than filters and toggles
- Visual feedback during AI processing builds trust and engagement
Team Collaboration
- Clear API contracts between frontend and backend enabled parallel development
- Agent specialization made testing and iteration much easier
- Docker made our local environments consistent and deployment smooth
What's next for Vibe.FM
Short-term (Next Month)
Enhanced Multi-Agent Recommendation System
We're evolving from our current 3-agent system to a sophisticated 7-agent architecture for dramatically better playlists:
- A1. Query Understanding Agent (NLU): Convert natural language into structured intent with emotion extraction, genre detection, and constraint parsing
- A2. Emotion-to-Audio Mapper: Translate emotions into precise Spotify audio feature ranges (valence, energy, tempo, danceability)
- A3. Candidate Retriever: Enhanced retrieval with 200-500 candidates using seeds, vector embeddings, and multi-source fetching
- A4. Reranker/Set Builder: Optimize track selection with hard constraints (explicit content, region availability) and soft optimization (diversity, novelty balance)
- A5. Sequencer Agent: Create natural flow with energy curves, BPM transitions, and strategic placement for maximum emotional impact
- A6. Critic/Validator: Final QA pass ensuring regional availability, no duplicates, and intent consistency
- A7. Learning/Profile Agent: Capture user feedback (skips, likes, replays) to personalize future recommendations
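The A2 Emotion-to-Audio Mapper could look something like the sketch below. The mapping table and the intersection rule are hypothetical (the planned agent would derive ranges with Gemini rather than a static lookup), but it shows how a compound mood like "nostalgic but hopeful" collapses to one target window per audio feature.

```python
# Hypothetical static ranges; the real A2 agent would generate these
EMOTION_FEATURES = {
    "nostalgic": {"valence": (0.3, 0.6), "energy": (0.2, 0.5)},
    "hopeful":   {"valence": (0.6, 0.9), "energy": (0.4, 0.7)},
}

def blend_ranges(emotions: list[str]) -> dict[str, tuple[float, float]]:
    """Intersect per-emotion feature ranges into one window per feature."""
    blended: dict[str, tuple[float, float]] = {}
    for emotion in emotions:
        for feature, (lo, hi) in EMOTION_FEATURES[emotion].items():
            cur_lo, cur_hi = blended.get(feature, (0.0, 1.0))
            blended[feature] = (max(cur_lo, lo), min(cur_hi, hi))
    return blended

target = blend_ranges(["nostalgic", "hopeful"])
print(target)  # {'valence': (0.6, 0.6), 'energy': (0.4, 0.5)}
```

The resulting windows would then feed the A3 retriever as hard filters on the candidate pool.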
This architecture will give us:
- Better emotion understanding: "nostalgic but hopeful" accurately mapped to audio features
- Smoother flow: Playlists that build, peak, and resolve like a curated mixtape
- Smarter diversity: MMR (Maximal Marginal Relevance) for optimal novelty vs. familiarity
- Regional intelligence: Proper market filtering and explicit content handling
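The MMR idea mentioned above can be sketched as a greedy loop: each pick maximizes relevance minus its worst-case similarity to tracks already chosen. The relevance scores and artist-based similarity below are toy placeholders, not the planned scoring functions.

```python
def mmr_select(candidates, relevance, similarity, k=5, lam=0.7):
    """Greedy Maximal Marginal Relevance: each step picks the track
    maximizing lam * relevance - (1 - lam) * max similarity to picks."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(track):
            redundancy = max(
                (similarity(track, s) for s in selected), default=0.0
            )
            return lam * relevance[track] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy example: 'a' and 'b' share an artist, so MMR defers 'b'
artists = {"a": "X", "b": "X", "c": "Y"}
sim = lambda t, s: 1.0 if artists[t] == artists[s] else 0.0
rel = {"a": 0.9, "b": 0.85, "c": 0.6}
print(mmr_select(["a", "b", "c"], rel, sim, k=3))  # ['a', 'c', 'b']
```

Tuning `lam` toward 1.0 favors familiarity; lowering it pushes the playlist toward novelty, which is exactly the discovery-vs-comfort dial the MergerAgent needs.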
Additional Features
- Voice Input: Speak your mood instead of typing
- Playlist History: Save and revisit your past moods and their soundtracks
- Collaborative Sessions: Let friends contribute to mood-based playlists
Medium-term (3-6 Months)
Contextual Intelligence
- Activity Detection: Integrate with fitness trackers, calendars, weather APIs for automatic mood inference
- Temporal Patterns: Learn your emotional rhythms throughout the day/week
- Social Features: Share your vibe and discover friends' moods
Multi-Platform Expansion
- Apple Music integration
- YouTube Music support
- Cross-platform synchronization
Advanced Analytics
- Emotional music pattern visualization
- Mood journey tracking over time
- Personalized insights dashboard
Long-term Vision
Predictive & Adaptive
- Anticipatory Playlists: Pre-generate playlists based on time, context, and behavioral patterns
- Live Mood Mixing: Real-time playlist adjustment as your emotional state shifts during listening
- AI Mood Coach: Suggest music to help you reach desired emotional states (e.g., "music to help you focus" or "transition from stress to calm")
Wellness Integration
- Mood Journaling: Combine music with emotional wellness tracking
- Therapeutic Playlists: Collaborate with music therapists for evidence-based emotional support
- Biometric Integration: Heart rate, stress levels, sleep quality to inform recommendations
Enhanced Personalization
- Few-shot Learning: Adapt to individual music taste with minimal feedback
- Explainable AI: Show users exactly why each track was selected
- Cultural Context: Region-specific emotional-music mappings for global accuracy
We believe Vibe.FM is just the beginning. Music has always been humanity's emotional language—we're building the most sophisticated translator, one agent at a time.
Built With
Languages & Frameworks
- Python 3.10+
- JavaScript (ES6+)
- Next.js 14
- React 18
- FastAPI
AI & Machine Learning
- Google ADK (Agent Development Kit)
- Google Gemini API
- Natural Language Processing
Databases & Storage
- DuckDB (8M+ songs with audio features)
- SQLite (user sessions)
- Server-side caching
Cloud & DevOps
- Google Cloud Run
- Docker
- Uvicorn (ASGI server)
APIs & Integrations
- Spotify Web API
- Spotify Web Playback SDK
- Spotify OAuth 2.0
- Spotipy (Python client)
Frontend Libraries
- Framer Motion (animations)
- Tailwind CSS
- Axios (HTTP client)
- React Hooks
- Shadcn (UI components)
Development Tools
- Poetry (Python dependency management)
- npm (Node package management)
- Git & GitHub
- Postman (API testing)
Built with ❤️ for the Cloud Run Hackathon 2025