The Inspiration

I came up with the idea because existing tools often fail when a video contains multiple songs or heavy background noise. Most apps try to listen to the entire audio stream, which leads to wrong matches or no result at all. With Melodri, you can still find a song that is buried under other sounds: you mark the exact segment where the song is clearest, giving the AI a focused window to analyze. I wanted to build a tool that gives the user control over exactly what the system should identify.

What it does

  • Precision ID: Identifies music within video/audio files by targeting specific timestamps.
  • Noise Filtering: Handles background noise and overlapping songs by analyzing only a user-defined 30-second window.
  • Deep Links: Delivers track data with direct access to Spotify, YouTube, and Apple Music.
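
A minimal TypeScript sketch of how the 30-second window might be derived from a user-chosen timestamp. The helper name `buildClipWindow` and the clamping policy (shift the window left so it never runs past the end of the file, and use the whole file when it is shorter than 30 seconds) are assumptions for illustration, not Melodri's actual code.

```typescript
// Hypothetical helper: only the 30-second length comes from the write-up.
const CLIP_SECONDS = 30;

interface ClipWindow {
  start: number; // seconds from the beginning of the file
  duration: number; // seconds of audio to send for identification
}

function buildClipWindow(requestedStart: number, mediaDuration: number): ClipWindow {
  if (!Number.isFinite(requestedStart) || !Number.isFinite(mediaDuration) || mediaDuration <= 0) {
    throw new RangeError("start and duration must be finite, and duration must be > 0");
  }
  // Files shorter than one window: analyze the whole file.
  if (mediaDuration <= CLIP_SECONDS) {
    return { start: 0, duration: mediaDuration };
  }
  // Clamp the start so the full window stays inside the file.
  const latestStart = mediaDuration - CLIP_SECONDS;
  const start = Math.min(Math.max(requestedStart, 0), latestStart);
  return { start, duration: CLIP_SECONDS };
}
```

With this policy, asking for 190 s in a 200 s file yields a window starting at 170 s, so the model always receives a full 30 seconds when the file allows it.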

How we built it

  • Frontend: React and Tailwind CSS for a high-contrast, professional UI.
  • Backend: An n8n workflow connects a local worker to the Gemini 3 multimodal engine.
  • Processing: FFmpeg integration for millisecond-accurate "fast-seeking" and audio isolation.
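
The fast-seeking step can be sketched as building an FFmpeg argument list. Placing `-ss` before `-i` makes FFmpeg seek in the input rather than decoding everything up to the timestamp, and `-vn` drops the video stream so only audio is kept. The helper `buildFfmpegArgs` is hypothetical, and the output format (16 kHz mono 16-bit WAV) is an assumed choice to keep the payload small, not necessarily what Melodri sends to Gemini.

```typescript
// Hypothetical helper; every flag is a standard FFmpeg option.
function buildFfmpegArgs(
  inputPath: string,
  outputPath: string,
  startSec: number,
  durationSec: number,
): string[] {
  return [
    "-hide_banner",
    "-ss", startSec.toFixed(3), // before -i: seek in the input (fast seeking)
    "-i", inputPath,
    "-t", durationSec.toFixed(3), // length of the extracted clip
    "-vn", // discard video, keep only audio
    "-ac", "1", // assumed: downmix to mono
    "-ar", "16000", // assumed: 16 kHz sample rate
    "-c:a", "pcm_s16le", // 16-bit PCM, written as WAV by the .wav output name
    "-y", // overwrite the output file if it exists
    outputPath,
  ];
}
```

On a Node worker, the array could be passed to `spawn("ffmpeg", args)` from `node:child_process`. Passing an argument array instead of a shell string also avoids quoting problems with file names that contain spaces.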

Challenges we ran into

  • Resource Limits: Optimized FFmpeg commands to run on limited RAM instances without crashing (OOM prevention).
  • Type Safety: Unified HTMLAudioElement and HTMLVideoElement logic in TypeScript (both extend HTMLMediaElement) for a seamless dual-format experience.
  • Latency: Managed the data handshake between the browser and cloud workers to keep the UI responsive during heavy processing.
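
Because `HTMLAudioElement` and `HTMLVideoElement` both inherit `currentTime`, `duration`, and `pause()` from `HTMLMediaElement`, one function can drive either player. In browser code the parameter could simply be typed as `HTMLMediaElement`; the sketch below uses a small structural interface instead so it also runs outside a browser. `seekPreview` and `MediaLike` are hypothetical names, not taken from Melodri.

```typescript
// The subset of HTMLMediaElement this helper needs. Audio and video
// elements both satisfy it because they inherit these members.
interface MediaLike {
  currentTime: number;
  readonly duration: number;
  pause(): void;
}

// Hypothetical helper: jump the preview player to the chosen window start.
function seekPreview(media: MediaLike, startSec: number): number {
  media.pause();
  // duration is NaN before metadata loads (and Infinity for live streams),
  // so only clamp to the end when the length is actually known.
  const max = Number.isFinite(media.duration) ? media.duration : Infinity;
  media.currentTime = Math.min(Math.max(startSec, 0), max);
  return media.currentTime;
}
```

In the page, `seekPreview(document.querySelector("video")!, 42)` and the same call with an `<audio>` element would both type-check, since each element satisfies `MediaLike`.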

Accomplishments that we're proud of

  • Stable Pipeline: A fully functional bridge from local file upload to AI reasoning and back.
  • High Accuracy: The system successfully isolates and identifies tracks even in complex, noisy environments.

What we learned

  • Media Manipulation: Deepened knowledge of Blob URLs and browser memory management.
  • Multimodal Prompting: Learned how to guide Gemini 3 to focus specifically on auditory patterns and lyrical reasoning.
  • Pipeline Logic: Gained experience in managing asynchronous webhooks and fail-safe error handling.
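
One common shape for fail-safe webhook calls is retry with exponential backoff. The sketch below is a generic pattern, not n8n's API or Melodri's implementation; `withRetry`, `backoffDelays`, and the option names are hypothetical, and the task passed in would be whatever function posts the clip to the workflow's webhook URL.

```typescript
// Hypothetical retry helper for webhook calls; not part of n8n or Melodri.
interface RetryOptions {
  attempts: number; // total tries, including the first one
  baseDelayMs: number; // wait before the second try; doubles after each failure
}

// Waits between consecutive attempts: base, 2*base, 4*base, ...
function backoffDelays({ attempts, baseDelayMs }: RetryOptions): number[] {
  return Array.from({ length: Math.max(attempts - 1, 0) }, (_, i) => baseDelayMs * 2 ** i);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(task: () => Promise<T>, opts: RetryOptions): Promise<T> {
  if (!Number.isInteger(opts.attempts) || opts.attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  const delays = backoffDelays(opts);
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // No wait after the final failed attempt.
      if (attempt < delays.length) await sleep(delays[attempt]);
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

For example, `withRetry(() => fetch(webhookUrl, { method: "POST", body: form }).then((r) => { if (!r.ok) throw new Error("HTTP " + r.status); return r.json(); }), { attempts: 3, baseDelayMs: 500 })` would retry a failed upload twice, waiting 500 ms and then 1 s; `webhookUrl` and `form` are placeholders. Retrying a POST is only safe if the workflow tolerates receiving the same request more than once.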

What's next for Melodri

  • Social Integration: Direct song identification from social media URLs without requiring file uploads.
  • Auto-Detection: Implementing batch processing to automatically tag every song in a long-form video.
  • Enriched Data: Adding live tour dates and synced lyrics to the result cards.
