Inspiration
The spark came from love, literally. I was born and raised in China, and my American boyfriend has been wanting to learn Chinese to connect with my parents, understand my jokes, and sing along to the C-Pop songs he loves. I searched everywhere for a Christmas gift that fits his needs, but I found that existing Chinese learning resources all provide a systematic and structured learning process. While this is perfect for people who are invested in learning the language from the very start, it is often limited and slow for people who are interested in Chinese culture or have particular use cases. I realized there was a gap for a tool that bridges cultural interest to language learning. So I built him a custom app, Slay Mandarin, using the Gemini ecosystem as his Christmas gift.
What it does
Slay Mandarin is an app powered by the Gemini ecosystem. It features four core pillars:
- Slay Lyrics C-Pop Lookup: Users can upload a screenshot of their music player (e.g., Spotify) to look up songs. The app returns the full lyrics line-by-line (in Chinese, Pinyin, and English translation) along with a "Vibe Analysis" section that explains the backstory of the song. Users can also press individual lines to hear and practice the pronunciation.
- Slay Live Speaking Partner: A live agent that acts as a low-latency speaking partner. It corrects pronunciation in real-time and helps users practice speaking in any contexts without the social pressure of a human tutor.
- Practice Test Infinite HSK Prep: The app generates infinite, adaptive practice tests for the HSK exam (Chinese Proficiency Test) from levels 1 - 6 on the fly. It provides results with detailed explanation of what the question is asking and why a specific answer is wrong.
- Slay Mandarin Dictionary: A look up tool for words, phrases, and short sentences. Beyond standard definitions, it specializes in slang and buzzwords that are trendy in the Chinese network, providing Pinyin, English equivalents, and example sentences.
Note: Search results from HSK practice tests and Dictionary can be saved in My Collection and reviewed as a list or flashcards. Discovered songs can also be saved to the Slay Lyrics library for easy access.
How we built it
I utilized the full Gemini ecosystem to power specific features:
- Gemini-3-pro-image-preview: Used for the 'Visual Search' feature to parse text from music player screenshots.
- Gemini-3-pro + Google Search Grounding: Used to verify song lyrics against real-world sources. This prevents hallucinations and ensures the lyrics are correct and complete.
- Gemini-2.5-flash-native-audio: Powers the Slay Live agent to provide a low-latency, emotionally resonant voice interaction.
- Gemini-3-flash: I forced the model to output structured JSON to generate dynamic HSK quizzes and handle application logic.
- Gemini-2.5-flash-preview-tts: Used for high-fidelity audio in word lookups, HSK listening exercises, and lyrics pronunciation.
I used Google AI Studio for rapid prototyping and prompt engineering, and Cursor to handle the deployment logic to Cloudflare as a PWA.
Challenges we ran into
- Prevent AI Hallucination: One major challenge was ensuring accuracy for song and lyric lookups. Early versions would make up songs and lyrics. I solved this by implementing Google Search Grounding to verify every line against the web and prompted the model to return complete and accurate lyrics.
- Balance Speed vs. Intelligence: Another challenge was managing the wait time for AI responses. This is particularly evident in HSK practice tests, song and lyrics lookups. For HSK practice tests, I used gemini-3-flash instead of gemini-3-pro. Since generating practice questions is a structured logic task rather than a creative one, Flash delivers instant results without sacrificing quality. For song and lyric lookups, I kept gemini-3-pro. While slower, its superior reasoning is required to filter out bad search results and format complex Pinyin correctly. To optimize speed, I tuned the thinkingConfig, reducing lyric retrieval time from ~90 seconds to ~50 seconds.
- The Input Friction & Iteration: I initially designed the search to rely on text input, but quickly hit a wall: music apps don't allow users to copy-paste song titles or artist names. Recognizing this gap, I integrated gemini-3-pro-image to enable image search.
Accomplishments that we're proud of
- From Personal to Universal: I am proud of transforming a personal idea into a functional app. The four features I built successfully address learning gaps that existing apps ignore. While tailored to my boyfriend's specific needs, conversations with friends have confirmed that these struggles are universal, validating that the app has real utility for the wider language-learning community.
- The Multimodal Workflow: The ability to execute a seamless workflow, going from a music player screenshot to verified lyrics and the backstory, is my proudest technical achievement. This would not be possible without Gemini's unique native access to both Google Image and Text Search.
What we learned
- Explicit Communication with AI: I learned that I need to be explicit with the AI. For example, I initially assumed that asking the AI to change one feature would leave the rest intact, but it would sometimes remove elements to "optimize" the code. I learned to always state "Do not change existing code" to maintain stability.
- The Difficulty of Development: I learned that developing a polished app is incredibly hard. Even with AI Studio's help, the process of concreting features, communicating needs, and managing the file setup was a rigorous journey. Google AI Studio made the unattainable possible, but bridging the gap from idea to an app still required significant effort.
What's next for Slay Mandarin
- Technical Upgrade: I will continue exploring caching strategies to further shorten loading times for a better user experience.
- Multi-language Support: The architecture is language-agnostic. This app can be expanded to "Slay Spanish", "Slay Korean" for K-Pop fans, or other languages to benefit more people.
- Scenario Practice: I plan to leverage the Gemini Live API to pre-set specific social scenarios, like asking for directions, greeting friends, meeting partner's family, with the agent characterizing different personas (e.g., seniors, friends, strangers) to help users practice in exact contexts.

Log in or sign up for Devpost to join the conversation.