ViBerry

Inspiration

We kept ending up with a dry playlist for every situation. We wanted to answer a simple question: what does a moment feel like? Not what it looks like or sounds like, but the full sensory vibe. We were inspired by how people share short video clips to capture a mood (golden hour walks, chaotic concert footage, cozy rainy days), yet no tool actually translates that feeling into something tangible: a color palette, a playlist, a visual moodboard. We wanted to build the bridge between a raw video clip and its aesthetic identity.

What it does

ViBerry takes a short video clip (5-30 seconds) and translates its "vibe" into three creative outputs:

A color palette + font pairing — 6 harmonized colors and a typographic style that match the mood
A Spotify playlist — 6-8 curated tracks that feel like the video sounds
An AI-generated moodboard — 4 images that extend the visual world of the clip

The results page dynamically re-themes itself to match the detected vibe — background tint, accent colors, typography, and all — so the experience of viewing your results feels like the video you uploaded.

How we built it

Next.js + TypeScript for the full-stack app with API routes
ffmpeg on the server to extract key frames and audio from uploaded videos
Google Gemini Flash as the multimodal AI brain — we feed it frames + audio in a single call and get back a structured "VibeProfile" with mood analysis, color data, song suggestions, and image prompts
A two-agent architecture with a structured handoff contract: the Video Agent produces the VibeProfile, then a Spotify Agent (powered by Gemini function calling) takes the mood/energy/era data and curates a real playlist by searching the Spotify API
Imagen 3 for generating moodboard images from the vibe prompts
Dynamic CSS theming using custom properties set at runtime from the palette data, with Google Fonts loaded on-the-fly
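
As a rough illustration of the runtime theming step, the sketch below turns palette data into a map of CSS custom properties and builds a Google Fonts stylesheet URL. The `Palette` shape, the `--vibe-*` variable names, and both function names are assumptions made for this example; they are not taken from the project's code.

```typescript
// Hypothetical palette shape; the project's real fields are not shown in this writeup.
interface Palette {
  colors: string[];    // hex strings such as "#e8a87c"
  headingFont: string; // Google Fonts family name
  bodyFont: string;    // Google Fonts family name
}

// Map a palette to CSS custom properties. The variable names are illustrative.
function paletteToCssVars(palette: Palette): Record<string, string> {
  const vars: Record<string, string> = {};
  palette.colors.forEach((hex, i) => {
    vars[`--vibe-color-${i + 1}`] = hex;
  });
  vars["--vibe-font-heading"] = `"${palette.headingFont}", serif`;
  vars["--vibe-font-body"] = `"${palette.bodyFont}", sans-serif`;
  return vars;
}

// Build a Google Fonts CSS2 stylesheet URL for the given font families.
function googleFontsUrl(families: string[]): string {
  const params = families
    .map((f) => `family=${encodeURIComponent(f).replace(/%20/g, "+")}`)
    .join("&");
  return `https://fonts.googleapis.com/css2?${params}&display=swap`;
}
```

In the browser, each entry of the returned map could be applied with `document.documentElement.style.setProperty(name, value)`, and the URL attached through a `<link rel="stylesheet">` element. Keeping the mapping a plain function means the same values can be computed wherever the page is rendered.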

Challenges we ran into

Spotify's Recommendations API was deprecated (Nov 2024) — we had to completely rethink our music curation approach. Instead of passing numeric audio features to an API, we built an agentic loop where Gemini acts as a music curator, searching Spotify iteratively and building a cohesive playlist.
Keeping the handoff contract clean between agents was harder than expected. We went through multiple schema iterations before landing on a flat VibeProfile with typed Pick<> subsets for each downstream agent.
Video processing in a serverless-style environment: managing temp files, ffmpeg extraction, and cleanup while keeping response times reasonable.
Hydration mismatches from dynamic fonts and theming that differ between server and client renders.
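
The flat VibeProfile with typed Pick<> subsets mentioned above could look something like the sketch below. Every field name here is a guess based on this writeup (mood, energy, era, palette, image prompts), and `pick`, `toSpotifyHandoff`, and `MoodboardHandoff` are hypothetical names; the project's actual schema may differ.

```typescript
// Hypothetical flat profile; field names are assumptions, not the project's schema.
interface VibeProfile {
  mood: string;
  energy: number; // assumed to be normalized to 0..1
  era: string;
  palette: string[];
  fontPairing: { heading: string; body: string };
  imagePrompts: string[];
}

// Each downstream agent only sees the slice of the profile it needs.
type SpotifyHandoff = Pick<VibeProfile, "mood" | "energy" | "era">;
type MoodboardHandoff = Pick<VibeProfile, "mood" | "palette" | "imagePrompts">;

// Copy only the listed keys, so extra fields never leak across the boundary at runtime.
function pick<T, K extends keyof T>(source: T, keys: readonly K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = source[key];
  }
  return out;
}

function toSpotifyHandoff(profile: VibeProfile): SpotifyHandoff {
  return pick(profile, ["mood", "energy", "era"]);
}
```

Because the subsets are derived from one flat interface, renaming or removing a field in `VibeProfile` surfaces as a compile error in every agent that depends on it.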

Accomplishments that we're proud of

The results page genuinely transforms based on the video — it's not just data displayed on a page, it's an experience that shifts to match the mood
The two-agent handoff pattern is clean and extensible — each agent is a pure function that can be tested, swapped, or upgraded independently
Gemini reliably produces structured, creative output from raw video in a single multimodal call — no prompt chains or retries needed
The whole thing works end-to-end: upload a video, wait, and get a fully themed page with real Spotify tracks and generated artwork
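
One way to keep an agent testable and swappable, as described above, is to pass its model and API calls in as dependencies. The sketch below is a simplified curation loop under that assumption: `decide` stands in for a Gemini function-calling turn and `search` for a Spotify search request. All names are hypothetical, and a real curator would presumably let the model choose which search results to keep rather than accepting every new track.

```typescript
// All names are hypothetical; the project's real agent code is not shown here.
interface SpotifyBrief { mood: string; energy: number; era: string }
interface Track { id: string; title: string; artist: string }

// One model decision per turn: run another search, or stop.
type CuratorStep =
  | { kind: "search"; query: string }
  | { kind: "finish" };

interface CuratorDeps {
  // Stand-in for a model turn that looks at the playlist so far.
  decide(brief: SpotifyBrief, playlist: readonly Track[]): Promise<CuratorStep>;
  // Stand-in for a track search against a music catalog.
  search(query: string): Promise<Track[]>;
}

async function curatePlaylist(
  brief: SpotifyBrief,
  deps: CuratorDeps,
  targetSize = 8,
  maxTurns = 10,
): Promise<Track[]> {
  const playlist: Track[] = [];
  const seen = new Set<string>();
  // Bounded loop: stop when the model finishes, the playlist is full, or turns run out.
  for (let turn = 0; turn < maxTurns && playlist.length < targetSize; turn++) {
    const step = await deps.decide(brief, playlist);
    if (step.kind === "finish") break;
    for (const track of await deps.search(step.query)) {
      if (!seen.has(track.id) && playlist.length < targetSize) {
        seen.add(track.id);
        playlist.push(track);
      }
    }
  }
  return playlist;
}
```

With this shape, a test can pass in stub `decide` and `search` functions and check the loop's behavior without any network access.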

What we learned

Multimodal AI is shockingly good at "feeling" a video — Gemini picks up on lighting, movement, color temperature, audio tone, and translates it into coherent creative direction
Agentic patterns (function calling loops) are worth the complexity when you need an AI to make judgment calls, not just return data
Designing type-safe handoff contracts between AI agents is a real engineering problem — treating agent boundaries like API boundaries (with schemas, validation, and typed interfaces) keeps things from falling apart
Dynamic theming is powerful but tricky — small color math decisions (like how much to lighten a palette color for a background) have outsized impact on the feel
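
To make the color math point concrete, here is a minimal sketch of lightening a palette color toward white for use as a background tint. Linear mixing in RGB is an assumption chosen for simplicity, and `parseHex` and `lighten` are hypothetical helper names; the project may use a different color space or formula.

```typescript
// Parse a 6-digit hex color ("#336699" or "336699") into RGB channels.
function parseHex(hex: string): [number, number, number] {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const n = parseInt(match[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Mix a color toward white. amount = 0 keeps the color, amount = 1 gives pure white.
function lighten(hex: string, amount: number): string {
  const t = Math.min(1, Math.max(0, amount)); // clamp to [0, 1]
  const channels = parseHex(hex).map((c) => Math.round(c + (255 - c) * t));
  return "#" + channels.map((c) => c.toString(16).padStart(2, "0")).join("");
}
```

Small changes to `amount` matter: a value near 0.9 yields a barely tinted background, while something closer to 0.6 produces a clearly colored page, which is the kind of decision that changes how the results page feels.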

What's next for ViBerry

Live camera mode — point your phone camera at a scene and get a real-time vibe read
Shareable vibe cards — generate a static image/link you can share on social media
Vibe history — save past translations and compare how your vibe shifts over time
Audio-only mode — drop in a song or voice memo instead of a video
Collaborative vibes — multiple people upload clips from the same moment, merge them into one unified vibe profile

Built With

architecture
gemini
multi-agentic
node.js
typescript

Updates

PushpalPatil Patil started this project — Mar 28, 2026 07:21 PM EDT
