Inspiration

After moving to Brazil, I found myself in the first country I'd ever lived in where English wasn't the default. There was a language barrier — not just a practical one, but an emotional one. I kept trying to speak Portuguese, kept getting blank stares, and quietly started speaking less.

Then I started taking dance lessons. I picked up Portuguese there — not from drills or flashcards, but from my instructor, the music, and the other students. I greeted the class on my second session. I learned how to ask what song was playing. Something clicked: I was willing to make mistakes here because I cared about the context. The stakes were low. The motivation was high. The language followed.

Most apps treat speaking as the final boss — something you unlock after months of grammar exercises. I wanted to flip that. What if you could start talking on day one, about something you already love?

What it does

Talk with Me matches you with AI chat friends who are native speakers of your target language, passionate about the same things you are — K-Pop, football, anime, cooking, whatever. They don't talk at you in a foreign language. They chat with you in a natural mix of your native language and the language you're learning, the way bilingual friends actually text each other.

Pick your interests and proficiency level, and Gemini generates 3 unique chat personas, including a special cultural figure connected to your chosen language and topics. Each conversation naturally weaves target-language words and phrases into familiar sentences. After every message, learning cards surface the new words with pronunciation, translation, and type (vocabulary / phrase / sentence). You record yourself saying each word, and the app checks your pronunciation before you can reply. Everything you learn gets saved to a personal Notebook. At the end of a session, Gemini writes a short paragraph entirely in the target language using every word you picked up: proof of what you just learned.

How I built it

The entire app runs in the browser with no custom backend server. React 18 + Vite + Tailwind CSS on the frontend, Zustand for state, Supabase (PostgreSQL + Realtime) for persistence.

Gemini powers four distinct features:

  • Persona generation — Gemini 2.0 Flash generates 3 AI chat personas (name, emoji avatar, character description, speaking style) from the user's chosen language and interests, returned as structured JSON
  • Conversational replies — each reply is a JSON object with a message (target language naturally mixed in at the user's proficiency ratio) and learning_points (the specific words/phrases used). All structured output is enforced through prompt design alone, with a fallback parser for edge cases
  • Text-to-speech — Gemini TTS API (gemini-2.5-flash-preview-tts) provides language-native pronunciation. The API returns raw PCM audio; we construct a valid WAV file client-side by synthesizing a 44-byte RIFF header before playback via the Audio element
  • Session summary — at the end of a conversation cycle, Gemini composes a short paragraph using every vocabulary word learned that session, paired with a full translation in the user's native language
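The WAV step in the text-to-speech bullet can be sketched roughly as follows. This is a minimal illustration rather than the app's actual code: the helper name pcmToWav is hypothetical, and the defaults (24 kHz, mono, 16-bit little-endian samples) are assumptions that should be checked against the audio format the API actually reports.

```typescript
// Hypothetical helper: wraps raw little-endian PCM bytes in a 44-byte RIFF/WAVE header.
// The default sample rate, channel count, and bit depth are assumptions, not values from the writeup.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  numChannels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const headerSize = 44;
  const blockAlign = (numChannels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;             // bytes per second of audio
  const wav = new Uint8Array(headerSize + pcm.length);
  const view = new DataView(wav.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // size of everything after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // size of the fmt chunk body for PCM
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // number of PCM bytes that follow
  wav.set(pcm, headerSize);                 // copy the samples after the header
  return wav;
}
```

In the browser, the returned bytes could be wrapped in a Blob with type "audio/wav", turned into an object URL, and handed to an Audio element for playback.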

Pronunciation checking uses the Web Speech API (SpeechRecognition) with Sørensen-Dice bigram similarity scoring. Supabase Realtime keeps chat in sync across sessions.
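For readers unfamiliar with Sørensen-Dice scoring, here is a minimal sketch of how a recognized transcript could be compared with the expected word. The function names, the lowercasing and whitespace normalization, and the handling of strings too short to form a bigram are assumptions for illustration; the app's real scoring and pass threshold are not described in this writeup.

```typescript
// Hypothetical helper: counts overlapping character bigrams in a normalized string.
function bigramCounts(text: string): Map<string, number> {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts = new Map<string, number>();
  for (let i = 0; i < normalized.length - 1; i++) {
    const gram = normalized.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}

// Sørensen-Dice coefficient over bigram multisets: 2 * |A ∩ B| / (|A| + |B|).
// Returns a score between 0 (no shared bigrams) and 1 (identical bigram sets).
function diceSimilarity(expected: string, heard: string): number {
  const a = bigramCounts(expected);
  const b = bigramCounts(heard);
  let sizeA = 0;
  let sizeB = 0;
  let overlap = 0;
  a.forEach((count) => { sizeA += count; });
  b.forEach((count) => { sizeB += count; });
  if (sizeA + sizeB === 0) {
    // Assumption: inputs too short for any bigram fall back to exact comparison.
    return expected.trim().toLowerCase() === heard.trim().toLowerCase() ? 1 : 0;
  }
  a.forEach((count, gram) => {
    overlap += Math.min(count, b.get(gram) ?? 0);
  });
  return (2 * overlap) / (sizeA + sizeB);
}
```

A caller would pass the target word and the SpeechRecognition transcript, then compare the score against a chosen threshold. Bigram overlap tolerates small transcription slips better than strict string equality, which is presumably why it suits a pronunciation check.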

Challenges we ran into

Keeping the language ratio faithful under conversational pressure. The hardest prompt engineering problem wasn't generating good responses — it was stopping the agent from drifting. When a user starts writing more in the target language, the model interprets that as a signal to respond in kind, gradually abandoning the native-language scaffold entirely. We had to add explicit ratio enforcement and level-specific examples directly into the system prompt to hold the line, and even then it required iteration to make the constraint survive across a long conversation history.
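As a rough illustration of what explicit ratio enforcement in a system prompt might look like, the sketch below builds a rule block that is appended to the persona prompt on every request. Everything in it is hypothetical: the level names, the percentages, the wording of the rules, and the function names are placeholders, not the values used in the app.

```typescript
type Level = "beginner" | "intermediate" | "advanced";

// Hypothetical share of each reply written in the target language, in percent.
// The writeup does not state the real ratios.
const TARGET_PERCENT: Record<Level, number> = {
  beginner: 20,
  intermediate: 50,
  advanced: 80,
};

// Builds the ratio rule, including any level-specific example replies supplied by the caller.
function buildRatioRule(
  level: Level,
  nativeLang: string,
  targetLang: string,
  examples: string[] = [],
): string {
  const lines = [
    `LANGUAGE RATIO (hard rule): write about ${TARGET_PERCENT[level]}% of each reply in ${targetLang} and the rest in ${nativeLang}.`,
    `Keep this ratio even if the user writes entirely in ${targetLang}.`,
    `Re-check the ratio before every reply, however long the conversation gets.`,
  ];
  examples.forEach((example, i) => {
    lines.push(`Example ${i + 1} at this level: ${example}`);
  });
  return lines.join("\n");
}

// Appends the rule to the persona prompt so it is present on every request.
function withRatioRule(personaPrompt: string, rule: string): string {
  return `${personaPrompt}\n\n${rule}`;
}
```

Placing the rule last and rebuilding it on every request is one way to keep the constraint visible as the conversation history grows. Whether that alone is enough depends on the model, which matches the iteration described above.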

Making the conversation feel genuinely alive. Early versions of the personas had a habit of steering exchanges toward flat, repetitive patterns — peppering the user with questions without offering anything in return, or looping back to the same topic hooks. A chat buddy who only asks "so what do you think?" gets boring fast. We solved this by injecting far more vivid character detail into the identity prompts: specific opinions, memories, habits, and conversational quirks. The personas needed to feel like they had their own inner world, not just a role to play.

Mixed-language speech recognition is not ready yet. An early goal was full voice input — recording yourself mid-conversation and having your message transcribed. In practice, code-switched audio (a sentence that moves between Portuguese and English mid-clause, for example) breaks most speech recognition engines badly. Accuracy dropped to the point where conversations couldn't progress. For this version we made the call to keep input text-based and focus voice on what it does well: pronunciation playback and practice via Gemini TTS.

What's next for Talk with Me

Full voice conversation. Models that handle mixed-language audio with far better accuracy are already emerging. The most important next step is getting voice input right and enabling audio output from personas.

Personas grounded in the real world. Right now personas exist in a kind of cultural generality. The next version will give them access to live, local context, such as a coffee shop that just opened nearby or a concert coming up this weekend. With this context, personas can bring up current topics and the vocabulary that goes with them.

Persona memory and spaced review. The Notebook already captures every word a user has learned. The next step is closing the loop: personas that remember what's in your notebook and proactively bring those words back into conversation twice a week as a natural review process.

Real human connection. The endgame isn't AI. It's the real person on the other side. Personas will eventually serve as a bridge — matching real bilingual users with each other based on shared interests and complementary language goals, and connecting them to a live chat when both sides are ready.
