Inspiration

Magic AI was inspired by a very real and personal problem.
I wanted to learn a language that many of my friends speak fluently, but I struggled with actually speaking it. I could understand some words and phrases, but when it came time to respond in real conversations, I hesitated. I worried about pronunciation, grammar, and sounding awkward.

Most language-learning tools focus on vocabulary drills or scripted lessons, but they don’t help with the moment that matters most: talking to real people. I wanted something that could practice with me patiently and then support me live while I was speaking with my friends. That gap between learning and real-world conversation is what inspired Magic AI.


What it does

Magic AI is a voice-first conversational system designed to help people confidently communicate across languages.

It has two clearly separated modes:

  • Coach Mode helps users practice speaking a new language. It keeps conversations going naturally, gently corrects mistakes, and encourages users to respond in the target language so they can build confidence.
  • Translator Mode acts as a real-time interpreter between two people. It automatically understands the language being spoken and translates the meaning into the listener’s language, without teaching or interrupting the conversation.

Magic AI listens to natural speech, including stutters, mixed languages, and accents, and responds appropriately. The goal is simple: help users learn privately, then speak confidently in real conversations.


How we built it

Magic AI is built as a polyglot, voice-first pipeline:

  1. Voice input is captured directly from the browser, allowing hands-free interaction.
  2. Server-side speech recognition transcribes audio and detects the spoken language, even when users mix languages or switch mid-sentence.
  3. Conversation intelligence routes each turn based on mode:
    • Coach Mode focuses on learning, corrections, and conversation flow.
    • Translator Mode focuses on accurate, meaning-based translation and safety.
  4. Text-to-speech output speaks responses back in the correct language for the listener.
  5. Session memory keeps track of context so conversations feel continuous instead of fragmented.

The system is designed to reason about meaning, not just words, which is critical for real-world conversations.
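
As a rough illustration of the routing and session-memory steps above, here is a minimal Python sketch. It is not the project's actual code: the names `Mode`, `Turn`, `Session`, `coach_reply`, `translate_reply`, and `handle_turn` are hypothetical, and the two reply functions are placeholders standing in for the real model calls.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    COACH = "coach"
    TRANSLATOR = "translator"


@dataclass
class Turn:
    text: str      # transcript produced by speech recognition
    language: str  # detected language code, e.g. "es"


@dataclass
class Session:
    mode: Mode
    target_language: str  # language being practiced, or the listener's language
    history: list = field(default_factory=list)  # session memory: (input, reply) pairs


def coach_reply(turn, session):
    # Placeholder for a model call that corrects gently and keeps the chat going.
    return f"[coach:{session.target_language}] reply to: {turn.text}"


def translate_reply(turn, session):
    # Placeholder for a model call that translates meaning only, with no teaching.
    return f"[translate:{turn.language}->{session.target_language}] {turn.text}"


# One handler per mode, so behaviors cannot leak from one mode into the other.
HANDLERS = {Mode.COACH: coach_reply, Mode.TRANSLATOR: translate_reply}


def handle_turn(turn, session):
    """Route one transcribed turn by mode and record it in session memory."""
    reply = HANDLERS[session.mode](turn, session)
    session.history.append((turn.text, reply))
    return reply
```

Keeping a single dispatch table is one simple way to enforce the separation: each turn goes through exactly one handler, chosen only by the session's mode.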


Challenges we ran into

  • Language ambiguity: People don’t speak cleanly. Handling stutters, partial sentences, and code-switching required careful logic to avoid guessing or hallucinating details.
  • Performance and cost: Each voice interaction can involve speech recognition, reasoning, and voice synthesis. Optimizing for responsiveness without excessive cost was a major challenge.
  • Reliability: Rate limits, network issues, and model instability can break voice apps easily. We had to design fallback behaviors so the app never feels “stuck.”
  • Mode separation: Learning and translating are fundamentally different tasks. Ensuring Coach Mode and Translator Mode never leak behaviors into each other required strict design discipline.
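
One common way to keep a voice app from feeling stuck is to retry a failing call a few times with backoff and then fall back to a safe response. The sketch below shows that pattern only; it is an assumption about how such a fallback could look, not a description of Magic AI's implementation, and `call_with_fallback` and its parameters are hypothetical names.

```python
import time


def call_with_fallback(primary, fallback, retries=2, base_delay=0.5, sleep=time.sleep):
    """Call primary(); on failure retry with exponential backoff, then use fallback().

    primary and fallback are zero-argument callables. The sleep function is
    injectable so the backoff can be skipped in tests.
    """
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception:
            # Broad catch is deliberate in this sketch: rate limits, timeouts,
            # and model errors are all treated the same way.
            if attempt < retries:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: return a safe response instead of hanging.
    return fallback()
```

For example, `primary` could wrap a speech-synthesis request while `fallback` returns a short pre-written "please say that again" message.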

Accomplishments that we’re proud of

  • Building a fully voice-first experience that works without screens or typing.
  • Successfully separating learning from translation, so each mode feels intentional and trustworthy.
  • Supporting automatic language detection and mixed-language speech.
  • Creating an experience that prioritizes user confidence and real conversations over demos or gimmicks.

What we learned

  • Voice-first systems require much more attention to trust, latency, and failure handling than text-based apps.
  • Translation accuracy alone isn’t enough—knowing when to ask for clarification is just as important.
  • Users care deeply about confidence and flow, not perfect grammar explanations.
  • Designing for real human speech is very different from designing for clean, ideal inputs.
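
The point about asking for clarification can be sketched as a small decision rule. This assumes the speech recognizer exposes some confidence score between 0 and 1, which may not hold for every service; the function name `next_action`, the action labels, and the 0.6 threshold are all hypothetical.

```python
def next_action(transcript, confidence, min_confidence=0.6):
    """Decide whether to translate a turn or ask the speaker for help first."""
    if not transcript.strip():
        # Nothing usable was heard: ask the speaker to repeat.
        return "ask_repeat"
    if confidence < min_confidence:
        # Something was heard but it is uncertain: ask rather than guess.
        return "ask_clarify"
    return "translate"
```

The threshold would need tuning against real recordings; a value that is too high interrupts the conversation, and one that is too low lets guesses through.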

What’s next for Magic AI

Next, we want to make Magic AI more reliable, scalable, and broadly useful by:

  • Improving performance and cost efficiency
  • Strengthening long-term conversational memory
  • Expanding language support
  • Refining pronunciation feedback and real-time translation safety

Magic AI started as a personal solution to learning my friends’ language, but it has grown into a platform that can help anyone feel more confident speaking across languages.

Built With

  • cloud-firestore
  • elevenlabs (natural text-to-speech)
  • flask
  • gemini (multilingual speech recognition and reasoning)
  • google-cloud-run (deployment)
  • javascript
  • jwt-based-authentication
  • python
  • webaudio