Inspiration
Walking through Paris on a rainy night, I felt an emotion that words couldn't describe and photos couldn't fully capture. I realized that a visual memory is incomplete without its emotional counterpart: sound.
8,200 kilometers away in Beijing, my teammate shared the same feeling. After a long workday, she struggled to find an existing song that matched her specific mood in that moment. Her headphones were ready, but the right melody was missing.
Melody Snap was born from a simple but powerful desire: to use Gemini 3 to interpret the vibe of our reality and generate music in real time, letting us share not just what we see but how we feel, expressed through the language of sound.
What it does
MelodySnap is an image-based AI music generation mobile app that turns any photo into a one-of-a-kind song. Powered by Gemini 3 Flash and Google Lyria RealTime, it lets you simply take a photo; Gemini then transforms the visual mood into a unique song tailored to that exact moment.
Beyond just listening, you can merge the music and photo into a video card and send it as a digital gift with a greeting (e.g., Happy Birthday) to your loved ones.
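Under the hood, pairing a still photo with a generated track is a classic ffmpeg job. Here is a minimal sketch of how such a merge could look; the file names and encoder flags are illustrative, not necessarily what the app ships:

```python
# Sketch: merge a photo and a generated song into a shareable video card
# via ffmpeg (file names and flags are illustrative).
import subprocess

def make_video_card(photo: str, song: str, out: str) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-loop", "1", "-i", photo,  # loop the still image as the video track
            "-i", song,                 # generated audio as the audio track
            "-c:v", "libx264", "-tune", "stillimage",
            "-c:a", "aac",
            "-pix_fmt", "yuv420p",      # broad player compatibility
            "-shortest",                # end the video when the audio ends
            out,
        ],
        check=True,
    )

make_video_card("snap.jpg", "snap_song.wav", "gift_card.mp4")
```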
How we built it
Our team split into two roles: one person handled product design, frontend, and UI; the other person built the backend pipeline.
Frontend: Built with React Native/Expo, using expo-camera for capture and react-native-reanimated for the immersive playback UI.
Backend: A FastAPI service that orchestrates a two-stage AI pipeline: Gemini 3 Flash analyzes the image, then its structured output drives Google Lyria RealTime to compose the music.
Image analysis with Gemini 3 Flash: First, we designed a detailed system prompt that casts Gemini 3 Flash as a "professional music composer," analyzing each photo's color, lighting, and emotion through a structured chain-of-thought (CoT) pipeline. Gemini 3 Flash then outputs strict JSON (style, BPM, instruments, tags, optional lyrics) that feeds directly into Google Lyria RealTime for real-time music generation.
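To make the analysis stage concrete, here is a minimal sketch using the google-genai Python SDK. The model id, prompt wording, and schema fields are illustrative stand-ins, not our exact production values:

```python
# Sketch of the image-analysis stage. The model id, system prompt, and
# JSON schema are simplified placeholders, not production values.
import json

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

SYSTEM_PROMPT = (
    "You are a professional music composer. Analyze the photo's color, "
    "lighting, and emotion step by step, then respond with strict JSON: "
    '{"style": str, "bpm": int, "instruments": [str], "tags": [str], '
    '"lyrics": str | null}'
)

def analyze_image(image_bytes: bytes) -> dict:
    response = client.models.generate_content(
        model="gemini-3-flash",  # placeholder model id
        contents=[types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg")],
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_PROMPT,
            response_mime_type="application/json",  # keeps output machine-parseable
        ),
    )
    return json.loads(response.text)
```

Forcing a JSON response type is what makes the hand-off reliable: the next stage can parse the output without any prompt-format guesswork.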
Music generation with Google Lyria RealTime: The core workflow receives the structured JSON from Gemini 3 Flash, parses it into parameters compatible with Google Lyria RealTime, streams the generated audio in real time over WebSocket, and finally assembles the chunks into a standard WAV file.
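The final assembly step is plain Python. A minimal sketch, assuming the stream yields raw 16-bit PCM chunks at Lyria RealTime's documented 48 kHz stereo output; stream_audio_chunks is a hypothetical stand-in for our WebSocket client:

```python
# Sketch of assembling streamed PCM into a WAV file. `stream_audio_chunks`
# is a hypothetical stand-in for the Lyria RealTime WebSocket client; the
# audio format below assumes Lyria's documented 48 kHz stereo 16-bit PCM.
import wave
from typing import Iterable

SAMPLE_RATE = 48_000  # Hz
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)

def write_wav(chunks: Iterable[bytes], path: str) -> None:
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:       # append each chunk as it streams in
            wav.writeframes(chunk)

# Usage: `params` would be the parameters parsed from Gemini's JSON.
# write_wav(stream_audio_chunks(params), "snap_song.wav")
```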
Challenges we ran into
Per-request music generation latency was our biggest challenge. Our initial integration with a third-party music generation API took about two minutes to produce a track, unacceptable for an app built around capturing real-time emotion.
Accomplishments that we're proud of
By switching to the Google Lyria RealTime API and adding image preprocessing, a Redis cache, and streaming audio processing, we cut total music generation time per request from 2 minutes to under 40 seconds. The complete pipeline is functional end-to-end, with polished UI animations and real-time progress feedback throughout. And one teammate with zero coding background designed the UI and built the entire frontend codebase with an AI IDE.
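The Redis cache was one of the easier wins: identical photos skip the Gemini round-trip entirely. A minimal sketch, where the key scheme and TTL are illustrative assumptions rather than our exact settings:

```python
# Sketch of the analysis cache: identical photos reuse the stored JSON
# instead of calling Gemini again. Key scheme and TTL are illustrative.
import hashlib
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def cached_analysis(image_bytes: bytes) -> dict:
    key = "melodysnap:analysis:" + hashlib.sha256(image_bytes).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)           # cache hit: no Gemini round-trip
    result = analyze_image(image_bytes)  # the analysis stage sketched earlier
    r.set(key, json.dumps(result), ex=24 * 3600)  # keep for a day
    return result
```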
What we learned
What surprised us most is that we can now create mobile apps that genuinely touch people emotionally by leveraging Gemini 3. Gemini 3's multimodal capabilities opened possibilities we hadn't imagined: turning a simple photo into music that captures a feeling is something that didn't exist before. And perhaps the most meaningful takeaway: one of our teammates had limited coding experience, yet was able to bring a long-held idea to life using Gemini 3 and other AI tools. That's the real promise of this technology: it lets more people build things that make the world a little warmer.
What's next for Melody Snap
Context-aware generation: Combine users' real-time location and weather data with the photo to generate the music.
Global Community: build a global community feed where users discover and share music from around the world.
Built With
- fastapi
- gemini3
- lyria
- python
- react-native