Inspiration
I noticed a gap in language learning: apps like Duolingo teach vocabulary but don’t prepare users for real conversations, while tools like ChatGPT allow conversation but assume prior knowledge. For beginners, this creates a frustrating barrier: you can’t learn by speaking if you don’t know what to say. I wanted to build something that removes that fear and lets anyone start speaking from day one.
What it does
Milo is an AI-powered language tutor that enables real-time, guided conversations for beginners. Users can speak naturally, even in English, and Milo converts their input into the target language, responds back, and provides live translations, suggested replies, and corrections. This creates a safe, interactive environment where users learn by doing, not memorizing.
How we built it
I built Milo using a combination of large language models for conversational intelligence and feedback generation, real-time translation pipelines, and voice synthesis (via ElevenLabs) for immersive interaction. The frontend delivers a chat-based experience with live translations, quick reply suggestions, and correction overlays, while the backend orchestrates input processing, translation, and structured AI responses.
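The backend flow described above (input processing, translation, then a structured response) can be sketched roughly as follows. This is a minimal illustration, not Milo's actual code: `TutorTurn`, `run_turn`, and the three injected callables are hypothetical names, and in a real deployment the callables would wrap the language model and translation services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TutorTurn:
    """One conversational turn, shaped for a chat UI (hypothetical structure)."""
    user_text_target: str      # what the learner said, in the target language
    reply_target: str          # tutor reply in the target language
    reply_translation: str     # tutor reply translated back for the learner
    suggestions: List[str] = field(default_factory=list)


def run_turn(
    user_text: str,
    translate: Callable[[str, str, str], str],  # (text, source_lang, dest_lang) -> text
    converse: Callable[[str], str],             # target-language input -> tutor reply
    suggest: Callable[[str], List[str]],        # tutor reply -> quick-reply options
    native: str = "en",
    target: str = "es",
) -> TutorTurn:
    # Step 1: convert the learner's input into the target language.
    user_in_target = translate(user_text, native, target)
    # Step 2: generate the tutor's reply in the target language.
    reply = converse(user_in_target)
    # Step 3: bundle the reply with its translation and suggested answers.
    return TutorTurn(
        user_text_target=user_in_target,
        reply_target=reply,
        reply_translation=translate(reply, target, native),
        suggestions=suggest(reply),
    )
```

Passing the services in as callables is one way to keep the orchestration testable without network access; each step can be swapped for a stub.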
Challenges we ran into
One of my biggest challenges was designing an experience that feels natural while still guiding the user. Pure conversation wasn’t enough because beginners would get stuck, so I had to rethink the flow to include translations, suggestions, and corrections without overwhelming the user. Another challenge was structuring AI outputs (responses, translations, corrections) reliably and in real time for a smooth UX.
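One common way to make model output dependable is to request JSON and validate it before it reaches the UI. The sketch below assumes the model is asked for an object with `reply`, `translation`, and `corrections` keys; those field names and the `parse_tutor_output` helper are illustrative assumptions, not Milo's actual schema.

```python
import json
from typing import Any, Dict


def parse_tutor_output(raw: str) -> Dict[str, Any]:
    """Validate a model response; fall back to plain text if it is malformed."""
    # Fallback keeps the conversation moving: show the raw text as the reply.
    fallback = {
        "reply": raw.strip(),
        "translation": "",
        "corrections": [],
        "valid": False,
    }
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    # Both text fields must be present and must be strings.
    if not all(isinstance(data.get(key), str) for key in ("reply", "translation")):
        return fallback
    # Corrections are optional, but must be a list when present.
    corrections = data.get("corrections", [])
    if not isinstance(corrections, list):
        return fallback
    return {
        "reply": data["reply"],
        "translation": data["translation"],
        "corrections": [str(item) for item in corrections],
        "valid": True,
    }
```

The `valid` flag lets the frontend decide whether to render translation and correction overlays or show only the plain reply.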
Accomplishments that we're proud of
I successfully built a system where a complete beginner can start a conversation in a new language within seconds. Instead of feeling stuck, users are continuously supported through suggestions, corrections, and translations. I'm especially proud of turning a traditionally difficult and intimidating process into something intuitive and engaging.
What we learned
I learned that building a great AI product isn’t just about powerful models; it’s about designing the right user experience around them. The difference between a chatbot and a real product is guidance, feedback, and clarity. I also learned how to structure AI outputs and orchestrate multiple systems (translation, voice, conversation) into a seamless interaction.
What's next for Milo
Next, I plan to add personalization and memory so Milo can adapt to each user’s learning journey, track progress, and revisit mistakes over time. I also want to introduce scenario-based learning (e.g., ordering food, interviews), gamification (XP, streaks), and deeper speaking analysis (fluency and pronunciation scoring). The long-term vision is for Milo to become the most natural and effective way to learn languages through conversation.
Built With
- AI & voice APIs: OpenAI GPT-4o (conversation engine), OpenAI Whisper (speech-to-text), ElevenLabs (multilingual text-to-speech)
- Real-time communication: WebSockets (FastAPI native)
- Database: SQLite via SQLAlchemy
- Authentication: JWT tokens with bcrypt password hashing
- Platform/runtime: Node.js, Python 3.11+
- Frameworks: Next.js 14, React 18, Tailwind CSS
- 3D/graphics: Three.js, React Three Fiber, drei
- FastAPI
- Uvicorn
- TypeScript