Inspiration
Our playlists always felt dry for the situation at hand. We wanted to answer a simple question: what does a moment feel like? Not what it looks like or sounds like, but the full sensory vibe. We were inspired by how people share short video clips to capture a mood (golden hour walks, chaotic concert footage, cozy rainy days), yet no tool actually translates that feeling into something tangible: a color palette, a playlist, a visual moodboard. We wanted to build the bridge between a raw video clip and its aesthetic identity.
What it does
ViBerry takes a short video clip (5-30 seconds) and translates its "vibe" into three creative outputs:
- A color palette + font pairing — 6 harmonized colors and a typographic style that match the mood
- A Spotify playlist — 6-8 curated tracks that feel like the video sounds
- An AI-generated moodboard — 4 images that extend the visual world of the clip
The results page dynamically re-themes itself to match the detected vibe — background tint, accent colors, typography, and all — so the experience of viewing your results feels like the video you uploaded.
How we built it
- Next.js + TypeScript for the full-stack app with API routes
- ffmpeg on the server to extract key frames and audio from uploaded videos
- Google Gemini Flash as the multimodal AI brain — we feed it frames + audio in a single call and get back a structured "VibeProfile" with mood analysis, color data, song suggestions, and image prompts
- A two-agent architecture with a structured handoff contract: the Video Agent produces the VibeProfile, then a Spotify Agent (powered by Gemini function calling) takes the mood/energy/era data and curates a real playlist by searching the Spotify API
- Imagen 3 for generating moodboard images from the vibe prompts
- Dynamic CSS theming using custom properties set at runtime from the palette data, with Google Fonts loaded on-the-fly
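The runtime theming step in the last bullet can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `Palette` shape, the `--vibe-*` property names, and the helpers `paletteToCssVars`, `googleFontsUrl`, and `applyTheme` are all hypothetical. The font URL follows the public Google Fonts CSS2 format and ignores weights and styles.

```typescript
// Hypothetical palette shape; the real VibeProfile fields may differ.
interface Palette {
  colors: string[];     // hex strings such as "#e8a87c"
  headingFont: string;  // a font family name
  bodyFont: string;
}

// Map a palette to CSS custom properties (property names are illustrative).
function paletteToCssVars(palette: Palette): Record<string, string> {
  const vars: Record<string, string> = {};
  palette.colors.forEach((hex, i) => {
    vars[`--vibe-color-${i + 1}`] = hex;
  });
  vars["--vibe-font-heading"] = `"${palette.headingFont}", sans-serif`;
  vars["--vibe-font-body"] = `"${palette.bodyFont}", sans-serif`;
  return vars;
}

// Build a Google Fonts CSS2 stylesheet URL for the given families.
function googleFontsUrl(families: string[]): string {
  const params = families
    .map((f) => `family=${f.trim().replace(/\s+/g, "+")}`)
    .join("&");
  return `https://fonts.googleapis.com/css2?${params}&display=swap`;
}

// Minimal view of a style object, so the helper also runs outside a browser.
interface StyleTarget {
  setProperty(name: string, value: string): void;
}

// Write every custom property onto the target style object.
function applyTheme(target: StyleTarget, vars: Record<string, string>): void {
  Object.keys(vars).forEach((name) => target.setProperty(name, vars[name]));
}
```

In a browser, `applyTheme(document.documentElement.style, vars)` would set the properties on the root element. Running it only on the client after mount is one way to sidestep server/client differences, at the cost of a brief unthemed first paint.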
Challenges we ran into
- Spotify's Recommendations API was deprecated (Nov 2024) — we had to completely rethink our music curation approach. Instead of passing numeric audio features to an API, we built an agentic loop where Gemini acts as a music curator, searching Spotify iteratively and building a cohesive playlist.
- Keeping the handoff contract clean between agents was harder than expected. We went through multiple schema iterations before landing on a flat VibeProfile with typed Pick<> subsets for each downstream agent.
- Video processing in a serverless-style environment meant managing temp files, ffmpeg extraction, and cleanup while keeping response times reasonable.
- Hydration mismatches from dynamic fonts and theming that differ between server and client renders.
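To illustrate the flat profile with `Pick<>` subsets mentioned above, here is a minimal sketch. The field names, the 0 to 1 energy scale, and the helper names are assumptions for illustration; the project's real schema is not shown in this write-up.

```typescript
// Hypothetical flat profile produced by the Video Agent.
interface VibeProfile {
  mood: string;
  energy: number;            // assumed scale: 0 (calm) to 1 (intense)
  era: string;
  palette: string[];         // hex colors
  songSuggestions: string[];
  imagePrompts: string[];
}

// Each downstream agent sees only the fields it needs.
type SpotifyAgentInput = Pick<VibeProfile, "mood" | "energy" | "era" | "songSuggestions">;
type MoodboardAgentInput = Pick<VibeProfile, "mood" | "palette" | "imagePrompts">;

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Runtime check at the agent boundary, since model output is untyped JSON.
function isVibeProfile(value: unknown): value is VibeProfile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const energy = v.energy;
  return (
    typeof v.mood === "string" &&
    typeof energy === "number" &&
    energy >= 0 &&
    energy <= 1 &&
    typeof v.era === "string" &&
    isStringArray(v.palette) &&
    isStringArray(v.songSuggestions) &&
    isStringArray(v.imagePrompts)
  );
}

// Narrow the full profile to the Spotify agent's contract.
function toSpotifyInput(p: VibeProfile): SpotifyAgentInput {
  const { mood, energy, era, songSuggestions } = p;
  return { mood, energy, era, songSuggestions };
}
```

Copying fields explicitly in `toSpotifyInput`, rather than passing the whole object, means the narrowed type also holds at runtime: the Spotify agent never receives palette or image data it has no use for.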
Accomplishments that we're proud of
- The results page genuinely transforms based on the video — it's not just data displayed on a page, it's an experience that shifts to match the mood
- The two-agent handoff pattern is clean and extensible — each agent is a pure function that can be tested, swapped, or upgraded independently
- Gemini reliably produces structured, creative output from raw video in a single multimodal call — no prompt chains or retries needed
- The whole thing works end-to-end: upload a video, wait, and get a fully themed page with real Spotify tracks and generated artwork
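The "agent as a pure function" idea can be sketched as a curation loop whose model and search calls are injected. Everything here is hypothetical: `Track`, `CuratorAction`, `CuratorDeps`, and `curatePlaylist` are illustrative names, `decide` stands in for a Gemini function-calling turn, and `search` stands in for a Spotify search request. No real API is called.

```typescript
interface Track {
  id: string;
  title: string;
  artist: string;
}

// One step chosen by the model: search again or stop.
type CuratorAction =
  | { kind: "search"; query: string }
  | { kind: "finish" };

interface CuratorDeps {
  // Stand-in for the model's function-calling decision.
  decide: (brief: string, playlist: Track[]) => Promise<CuratorAction>;
  // Stand-in for a music catalog search.
  search: (query: string) => Promise<Track[]>;
}

// Loop until the model finishes, the playlist is full, or the step cap is hit.
async function curatePlaylist(
  brief: string,
  deps: CuratorDeps,
  targetSize = 6,
  maxSteps = 12
): Promise<Track[]> {
  const playlist: Track[] = [];
  for (let step = 0; step < maxSteps && playlist.length < targetSize; step++) {
    const action = await deps.decide(brief, playlist);
    if (action.kind === "finish") break;
    const results = await deps.search(action.query);
    // Keep the first result that is not already in the playlist.
    const fresh = results.filter((t) => !playlist.some((p) => p.id === t.id));
    if (fresh.length > 0) playlist.push(fresh[0]);
  }
  return playlist;
}
```

Because both dependencies are parameters, the loop can be exercised with stubs in a unit test, and the step cap keeps a misbehaving model from searching forever.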
What we learned
- Multimodal AI is shockingly good at "feeling" a video — Gemini picks up on lighting, movement, color temperature, audio tone, and translates it into coherent creative direction
- Agentic patterns (function calling loops) are worth the complexity when you need an AI to make judgment calls, not just return data
- Designing type-safe handoff contracts between AI agents is a real engineering problem — treating agent boundaries like API boundaries (with schemas, validation, and typed interfaces) keeps things from falling apart
- Dynamic theming is powerful but tricky — small color math decisions (like how much to lighten a palette color for a background) have outsized impact on the feel
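As an example of the kind of color math the last point refers to, one simple way to derive a background tint is to mix a palette color toward white. This is a sketch under that assumption, not necessarily the formula ViBerry uses, and `parseHex`, `toHex`, and `lighten` are hypothetical helper names.

```typescript
// Parse "#rrggbb" into [r, g, b] channel values (0-255).
function parseHex(hex: string): [number, number, number] {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!m) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Format channel values back into a lowercase "#rrggbb" string.
function toHex(rgb: [number, number, number]): string {
  const parts = rgb.map((c) => {
    const s = Math.round(c).toString(16);
    return s.length === 1 ? "0" + s : s;
  });
  return "#" + parts.join("");
}

// Mix a color toward white: amount 0 keeps the color, amount 1 gives white.
function lighten(hex: string, amount: number): string {
  const t = Math.min(1, Math.max(0, amount));
  const [r, g, b] = parseHex(hex);
  return toHex([r + (255 - r) * t, g + (255 - g) * t, b + (255 - b) * t]);
}

// lighten("#000000", 0.5) returns "#808080"
```

The `amount` parameter is the "how much to lighten" decision: values near 0.9 give a faint tint, while smaller values keep more of the original hue and can hurt text contrast.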
What's next for ViBerry
- Live camera mode — point your phone camera at a scene and get a real-time vibe read
- Shareable vibe cards — generate a static image/link you can share on social media
- Vibe history — save past translations and compare how your vibe shifts over time
- Audio-only mode — drop in a song or voice memo instead of a video
- Collaborative vibes — multiple people upload clips from the same moment, merge them into one unified vibe profile
Built With
- architecture
- gemini
- multi-agentic
- node.js
- typescript