Inspiration

We wanted to simplify music discovery so anyone can find a song effortlessly, whether they remember the melody, a few lyrics, or can only hum a snippet. Music should be easy to find, no matter how vague your memory is.

What it does

SingN'Seek lets users search for songs by text, lyrics, or humming, combining semantic AI embeddings with Elasticsearch to deliver instant, relevant results in one seamless search experience.

How we built it

We built SingN'Seek as a modular multimodal retrieval system that combines scalable search, AI embeddings, and a clean interface.

We used Google Vertex AI for 768-dimensional text embeddings and Gemini 2.5 Flash Lite for natural-language query parsing. For audio, we used OpenMuQ (MuQ-large-msd-iter) to create 1024-dimensional audio embeddings that capture melody, rhythm, and tone. The search backbone is Elasticsearch, which performs hybrid search by combining BM25 keyword relevance with vector similarity for semantic accuracy. A Streamlit UI provides smooth interaction for both text and audio search.
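To illustrate the query-parsing step, here is a minimal sketch of how a Gemini JSON reply could be normalized into Elasticsearch filters. The prompt text, field names, and function are illustrative assumptions; in the deployed system the raw JSON would come from an actual Gemini 2.5 Flash Lite call.

```python
import json

# Hypothetical prompt: asks the model to return structured filters as JSON.
PARSE_PROMPT = (
    "Extract search filters from the user's music query. "
    'Respond with JSON: {"artist": str|null, "genre": str|null, "free_text": str}'
)

def parse_gemini_filters(raw_response: str) -> dict:
    """Normalize the model's JSON reply into filters for keyword fields.

    Drops nulls and free text, lowercases values to match keyword mappings.
    """
    data = json.loads(raw_response)
    return {k: v.lower() for k, v in data.items()
            if k in ("artist", "genre") and v}

sample = '{"artist": "Adele", "genre": null, "free_text": "sad piano ballad"}'
print(parse_gemini_filters(sample))  # {'artist': 'adele'}
```

Keeping the model's output constrained to a small JSON schema makes the parsing step deterministic even when the user's query is vague.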

Architecture Overview: User input flows through the Streamlit interface. Text and audio are processed through different embedding pipelines. Parsed queries and embeddings are sent to Elasticsearch, which combines BM25 and vector search results to return ranked matches.

Indexing Flow:
1. Parse metadata from the dataset.
2. Embed text fields with Vertex AI.
3. Process audio files with MuQ to generate embeddings.
4. Store the combined documents in Elasticsearch for scalable retrieval.
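The indexing flow above implies an index mapping with BM25-searchable text fields plus two dense-vector fields sized to the two models. A sketch, with illustrative field names (the dimensions match the embeddings described: 768 for Vertex AI text, 1024 for MuQ audio):

```python
# Illustrative Elasticsearch mapping: one document per song, storing
# keyword/text fields for BM25 and two dense vectors for kNN search.
SONG_MAPPING = {
    "mappings": {
        "properties": {
            "title":  {"type": "text"},
            "artist": {"type": "keyword"},
            "genre":  {"type": "keyword"},
            "lyrics": {"type": "text"},
            "text_embedding": {            # Vertex AI output
                "type": "dense_vector", "dims": 768,
                "index": True, "similarity": "cosine",
            },
            "audio_embedding": {           # MuQ output
                "type": "dense_vector", "dims": 1024,
                "index": True, "similarity": "cosine",
            },
        }
    }
}

# With the elasticsearch-py 8.x client this would be created as, e.g.:
# es.indices.create(index="songs", **SONG_MAPPING)
print(SONG_MAPPING["mappings"]["properties"]["text_embedding"]["dims"])  # 768
```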

Search Flow:
1. Receive a text or audio query from the user.
2. Parse natural-language queries with Gemini to extract filters such as artist or genre.
3. Generate Vertex embeddings for text queries and MuQ embeddings for audio queries.
4. Send both to Elasticsearch for hybrid search and ranked scoring.
5. Display results instantly in the Streamlit UI.
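The hybrid step above can be sketched as an Elasticsearch 8.x request that pairs a BM25 `query` with a top-level `knn` clause; field names and the helper function are illustrative assumptions, not the exact production code.

```python
def build_hybrid_query(query_vector, query_text, filters=None, k=10):
    """Sketch of a hybrid search body: BM25 over title/lyrics plus kNN
    over a dense_vector field, with the Gemini-extracted filters applied
    to both sides."""
    filter_clauses = [{"term": {field: value}}
                      for field, value in (filters or {}).items()]
    return {
        "query": {                        # lexical (BM25) side
            "bool": {
                "must": [{"multi_match": {
                    "query": query_text,
                    "fields": ["title^2", "lyrics"],
                }}],
                "filter": filter_clauses,
            }
        },
        "knn": {                          # semantic (vector) side
            "field": "text_embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 5 * k,
            "filter": filter_clauses,
        },
    }

body = build_hybrid_query([0.0] * 768, "sad piano ballad", {"artist": "adele"})
# es.search(index="songs", **body)  # elasticsearch-py 8.x
```

An audio query would use the same shape with `"field": "audio_embedding"` and a 1024-dimensional MuQ vector.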

Key System Highlights:
- Hybrid Retrieval: blends lexical (BM25) and semantic (vector) similarity for robust ranking.
- Multimodal Indexing: each song stores both text and audio vectors for flexible search.
- Noise Tolerance: works with humming, partial clips, or noisy input.
- Scalable Design: built to handle large datasets with horizontal scalability.
- Cloud Native: uses managed services like Vertex AI and Elastic Cloud for reliability.
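One simple way to blend the lexical and semantic ranked lists is reciprocal rank fusion; this is a sketch of the general technique (Elasticsearch also offers built-in score combination), not necessarily the exact fusion used in production.

```python
def rrf_fuse(bm25_ranked, vector_ranked, k=60):
    """Reciprocal rank fusion over two ranked lists of song IDs:
    score(d) = sum over lists of 1 / (k + rank(d)), higher is better."""
    scores = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A song ranked well in both lists floats to the top.
print(rrf_fuse(["a", "b", "c"], ["b", "c", "d"]))  # ['b', 'c', 'a', 'd']
```

RRF is attractive here because it needs no score normalization: BM25 scores and cosine similarities live on different scales, but ranks are directly comparable.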

This architecture allows SingN'Seek to handle natural language, lyrics, and audio-based queries seamlessly in one unified system.

Challenges we ran into

Parsing vague user input and matching audio snippets to the correct song was difficult. Optimizing search relevance and performance across multimodal inputs was also tricky.

Accomplishments that we're proud of

We successfully implemented hybrid search that combines keyword relevance with vector embeddings, enabling humming-based song search with real-time, accurate results.

What we learned

We gained deep insights into multimodal search, semantic embeddings, and integrating AI models with Elasticsearch at scale.

What's next for SingN'Seek

We plan to improve audio-to-text matching, expand language support, and refine the UI and UX for a more seamless global music discovery experience.
