Inspiration

We always seemed to end up with a dry playlist for the situation at hand. We wanted to answer a simple question: what does a moment feel like? Not what it looks like or sounds like, but the full sensory vibe. We were inspired by how people share short video clips to capture a mood (golden hour walks, chaotic concert footage, cozy rainy days), yet no tool actually translates that feeling into something tangible: a color palette, a playlist, a visual moodboard. We wanted to build the bridge between a raw video clip and its aesthetic identity.

What it does

ViBerry takes a short video clip (5-30 seconds) and translates its "vibe" into three creative outputs:

  • A color palette + font pairing — 6 harmonized colors and a typographic style that match the mood
  • A Spotify playlist — 6-8 curated tracks that feel like the video sounds
  • An AI-generated moodboard — 4 images that extend the visual world of the clip

The results page dynamically re-themes itself to match the detected vibe — background tint, accent colors, typography, and all — so the experience of viewing your results feels like the video you uploaded.

How we built it

  • Next.js + TypeScript for the full-stack app with API routes
  • ffmpeg on the server to extract key frames and audio from uploaded videos
  • Google Gemini Flash as the multimodal AI brain — we feed it frames + audio in a single call and get back a structured "VibeProfile" with mood analysis, color data, song suggestions, and image prompts
  • A two-agent architecture with a structured handoff contract: the Video Agent produces the VibeProfile, then a Spotify Agent (powered by Gemini function calling) takes the mood/energy/era data and curates a real playlist by searching the Spotify API
  • Imagen 3 for generating moodboard images from the vibe prompts
  • Dynamic CSS theming using custom properties set at runtime from the palette data, with Google Fonts loaded on-the-fly
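
As a rough illustration of the runtime theming step, the sketch below turns a palette into CSS custom properties and applies them to an element. The `ThemeInput` shape, the `--vibe-*` property names, and both function names are assumptions made for this example; the write-up does not show the actual code. Applying the variables only in the browser (for example from a React effect after mount) is one common way to keep server and client markup identical.

```typescript
// Hypothetical input shape; the real VibeProfile fields are not shown in the write-up.
interface ThemeInput {
  palette: string[];  // hex colors such as "#e07a5f"
  fontFamily: string; // a font family name, e.g. one loaded from Google Fonts
}

// Anything with a CSS style declaration; an HTMLElement satisfies this.
interface Styleable {
  style: { setProperty(name: string, value: string): void };
}

// Pure step: map palette data to CSS custom property names and values.
function buildThemeVars(theme: ThemeInput): Record<string, string> {
  const vars: Record<string, string> = {};
  theme.palette.forEach((hex, i) => {
    vars[`--vibe-color-${i + 1}`] = hex;
  });
  vars["--vibe-font"] = `"${theme.fontFamily}", sans-serif`;
  return vars;
}

// Side-effecting step: write the variables onto an element at runtime.
function applyTheme(vars: Record<string, string>, el: Styleable): void {
  Object.keys(vars).forEach((name) => {
    el.style.setProperty(name, vars[name]);
  });
}
```

In a browser this could be called as `applyTheme(buildThemeVars(theme), document.documentElement)`, with stylesheets reading the values through `var(--vibe-color-1)` and similar.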

Challenges we ran into

  • Spotify's Recommendations API was deprecated (Nov 2024) — we had to completely rethink our music curation approach. Instead of passing numeric audio features to an API, we built an agentic loop where Gemini acts as a music curator, searching Spotify iteratively and building a cohesive playlist.
  • Keeping the handoff contract clean between agents was harder than expected. We went through multiple schema iterations before landing on a flat VibeProfile with typed Pick<> subsets for each downstream agent.
  • Video processing in a serverless-style environment: managing temp files, ffmpeg extraction, and cleanup while keeping response times reasonable.
  • Hydration mismatches from dynamic fonts and theming that differ between server and client renders.
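
To make the handoff contract idea concrete, here is a minimal sketch of a flat profile with `Pick<>` subsets per downstream agent. Every field name below is a guess based on the data the write-up mentions (mood, energy, era, colors, song suggestions, image prompts); the real schema is not shown, so treat this as one possible shape and not the project's actual types.

```typescript
// Hypothetical flat profile; field names are assumptions for illustration.
interface VibeProfile {
  mood: string;
  energy: number; // assumed to be a 0..1 score
  era: string;
  palette: string[];
  fontFamily: string;
  songSuggestions: string[];
  imagePrompts: string[];
}

// Each downstream agent sees only the fields it needs.
type SpotifyAgentInput = Pick<VibeProfile, "mood" | "energy" | "era" | "songSuggestions">;
type MoodboardInput = Pick<VibeProfile, "mood" | "imagePrompts">;

// Narrow the full profile down to the Spotify agent's subset.
function toSpotifyInput(profile: VibeProfile): SpotifyAgentInput {
  const { mood, energy, era, songSuggestions } = profile;
  return { mood, energy, era, songSuggestions };
}

// Runtime check at the agent boundary, since model output arrives untyped.
function isVibeProfile(value: unknown): value is VibeProfile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const isStringArray = (x: unknown): boolean =>
    Array.isArray(x) && x.every((item) => typeof item === "string");
  return (
    typeof v["mood"] === "string" &&
    typeof v["energy"] === "number" &&
    typeof v["era"] === "string" &&
    isStringArray(v["palette"]) &&
    typeof v["fontFamily"] === "string" &&
    isStringArray(v["songSuggestions"]) &&
    isStringArray(v["imagePrompts"])
  );
}
```

Because the subsets are derived from one interface, renaming or retyping a field in `VibeProfile` surfaces as a compile error in every agent that consumes it.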

Accomplishments that we're proud of

  • The results page genuinely transforms based on the video — it's not just data displayed on a page, it's an experience that shifts to match the mood
  • The two-agent handoff pattern is clean and extensible — each agent is a pure function that can be tested, swapped, or upgraded independently
  • Gemini reliably produces structured, creative output from raw video in a single multimodal call — no prompt chains or retries needed
  • The whole thing works end-to-end: upload a video, wait, and get a fully themed page with real Spotify tracks and generated artwork
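
One way an agent can stay testable and swappable is to pass its model and its search backend in as dependencies. The sketch below shows a curation loop in that style. `CuratorDeps`, `nextStep`, `searchTracks`, and the step shapes are all hypothetical stand-ins: in the real project the decision step would presumably be a Gemini function-calling turn and the search a Spotify API request, neither of which is reproduced here.

```typescript
interface Track { id: string; title: string; artist: string }

// What the curator model decides on each turn (hypothetical contract).
type CuratorStep =
  | { kind: "search"; query: string }
  | { kind: "done"; trackIds: string[] };

interface SearchTurn { query: string; results: Track[] }

// Injected dependencies, so tests can supply stubs instead of live APIs.
interface CuratorDeps {
  nextStep(history: SearchTurn[]): Promise<CuratorStep>;
  searchTracks(query: string): Promise<Track[]>;
}

async function curatePlaylist(deps: CuratorDeps, maxTurns = 8): Promise<Track[]> {
  const history: SearchTurn[] = [];
  const seen = new Map<string, Track>();
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = await deps.nextStep(history);
    if (step.kind === "done") {
      // Keep only IDs that a search actually returned, dropping invented ones.
      return step.trackIds
        .map((id) => seen.get(id))
        .filter((t): t is Track => t !== undefined);
    }
    const results = await deps.searchTracks(step.query);
    results.forEach((t) => seen.set(t.id, t));
    history.push({ query: step.query, results });
  }
  // Turn budget exhausted: fall back to the first tracks found.
  return Array.from(seen.values()).slice(0, 8);
}
```

The turn cap bounds latency and cost, and filtering the final list against `seen` means the playlist can only contain tracks that a search really returned.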

What we learned

  • Multimodal AI is shockingly good at "feeling" a video: Gemini picks up on lighting, movement, color temperature, and audio tone, and translates them into coherent creative direction
  • Agentic patterns (function calling loops) are worth the complexity when you need an AI to make judgment calls, not just return data
  • Designing type-safe handoff contracts between AI agents is a real engineering problem — treating agent boundaries like API boundaries (with schemas, validation, and typed interfaces) keeps things from falling apart
  • Dynamic theming is powerful but tricky — small color math decisions (like how much to lighten a palette color for a background) have outsized impact on the feel
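
The kind of color math mentioned above can be as small as a linear mix toward white. The sketch below is one such helper; the function name and the choice of a plain RGB mix are assumptions for illustration, and the project may well use a different formula or color space.

```typescript
// Lighten a 6-digit hex color by mixing each RGB channel toward white.
// amount = 0 returns the original color, amount = 1 returns white.
function mixWithWhite(hex: string, amount: number): string {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const t = Math.min(1, Math.max(0, amount)); // clamp to [0, 1]
  const value = parseInt(match[1], 16);
  const channels = [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
  return (
    "#" +
    channels
      .map((c) => Math.round(c + (255 - c) * t))
      .map((c) => ("0" + c.toString(16)).slice(-2)) // two hex digits per channel
      .join("")
  );
}
```

For example, `mixWithWhite("#000000", 0.5)` gives `#808080`. Moving `amount` between roughly 0.85 and 0.95 is the sort of small change that shifts a background from visibly tinted to nearly white.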

What's next for ViBerry

  • Live camera mode — point your phone camera at a scene and get a real-time vibe read
  • Shareable vibe cards — generate a static image/link you can share on social media
  • Vibe history — save past translations and compare how your vibe shifts over time
  • Audio-only mode — drop in a song or voice memo instead of a video
  • Collaborative vibes: multiple people upload clips from the same moment, and we merge them into one unified vibe profile
