Inspiration
Learning a new language is difficult, and most apps focus on memorizing vocabulary and grammar rather than practicing real conversations. Many learners struggle when they finally need to speak in real-world situations, such as ordering food, asking for directions, or buying a ticket.
We wanted to build something that feels closer to actually traveling to another country and talking to people. Inspired by the real-time conversational capabilities of Google Gemini, we decided to create an immersive role-playing experience where users can practice speaking with an AI that behaves like a real person.
That idea became Immergo, a conversational AI language learning experience.
What it does
Immergo is an immersive language learning app that lets users practice speaking through interactive role-play missions.
Instead of passive lessons, users enter real-world scenarios such as:
- Buying a bus ticket
- Ordering coffee
- Meeting a neighbor
- Asking for directions
The AI plays a specific character (for example, a bus driver or a barista) and responds naturally to the user's speech.
Immergo offers two learning modes:
Teacher Mode
- The AI explains phrases and grammar in the user’s native language
- Provides translations and helpful guidance
Immersive Mode
- A strict “No Free Rides” policy
- The AI only accepts the target language
- Forces users to practice real conversational fluency
The app also includes performance scoring, grading the user’s fluency as:
- Tiro (Beginner)
- Proficiens (Intermediate)
- Peritus (Advanced)
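As an illustration of how a numeric fluency score could map to these three levels, here is a minimal sketch. The 0-100 scale and the cut-off values are assumptions made for the example; this write-up does not specify how scores are actually computed.

```python
from bisect import bisect_right

# Hypothetical cut-offs on an assumed 0-100 fluency scale.
THRESHOLDS = [40, 75]
LEVELS = ["Tiro", "Proficiens", "Peritus"]


def grade(score: float) -> str:
    """Map a 0-100 fluency score to one of the three levels."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many thresholds the score has reached.
    return LEVELS[bisect_right(THRESHOLDS, score)]
```

With these assumed cut-offs, a score below 40 is graded Tiro, 40 up to but not including 75 is Proficiens, and 75 or above is Peritus.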
How we built it
Immergo uses a real-time AI architecture combining modern web and AI technologies.
Frontend
- Vanilla JavaScript
- Vite
- Web Audio API for capturing microphone input
- WebSockets for real-time streaming
Backend
- Python with FastAPI
- AI integration through the Gemini Live API (via the Google Gen AI SDK)
Infrastructure
- Real-time audio streaming using WebSockets
- AI responses generated and returned as low-latency speech
- Optional deployment via Google Cloud Run
This architecture allows natural conversations where the AI listens and responds almost instantly.
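The two-way streaming described here can be sketched as a pair of concurrent pumps: one forwards microphone chunks to the model, the other forwards the model's audio back to the browser. The sketch below uses only `asyncio` and a stand-in model so it runs on its own. The names `relay`, `demo`, and `fake_model` are hypothetical, and in the real app the callables would wrap a WebSocket connection and a live model session.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def relay(
    incoming: AsyncIterator[bytes],
    send_to_model: Callable[[bytes], Awaitable[None]],
    model_output: AsyncIterator[bytes],
    send_to_browser: Callable[[bytes], Awaitable[None]],
) -> None:
    """Pump audio in both directions concurrently until both streams end."""

    async def uplink() -> None:
        async for chunk in incoming:
            await send_to_model(chunk)

    async def downlink() -> None:
        async for chunk in model_output:
            await send_to_browser(chunk)

    await asyncio.gather(uplink(), downlink())


async def demo() -> list:
    """Run the relay against a stand-in model that upper-cases each chunk."""
    to_model: asyncio.Queue = asyncio.Queue()
    received = []

    async def mic():
        for chunk in (b"a", b"b", b"c"):
            yield chunk

    async def fake_model():
        # Stand-in for the live model: answers exactly three chunks.
        for _ in range(3):
            chunk = await to_model.get()
            yield chunk.upper()

    async def send_to_browser(chunk: bytes) -> None:
        received.append(chunk)

    await relay(mic(), to_model.put, fake_model(), send_to_browser)
    return received
```

Running `asyncio.run(demo())` collects the stand-in model's replies in order. Running the uplink and downlink as separate tasks means the model can start answering while the user's audio is still arriving.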
Challenges we ran into
One major challenge was handling real-time voice interaction. Streaming audio between the browser and the AI model required careful coordination to avoid delays or broken conversations.
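One piece of that coordination is audio formatting. The Web Audio API produces floating-point samples, while speech models typically consume 16-bit PCM sent in small chunks. The sketch below shows the conversion and chunking in Python for illustration; the 16 kHz sample rate and the 100 ms chunk size are assumptions, not values taken from the project.

```python
import struct
from typing import Iterable, Iterator


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip out-of-range samples before scaling to avoid integer overflow.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split PCM bytes into fixed-duration chunks (the last may be shorter)."""
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]
```

Smaller chunks reduce latency at the cost of more messages, so the chunk duration is a tuning choice rather than a fixed requirement.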
Another challenge was designing AI personas that stay consistent within a scenario. For example, a bus driver should only respond like a bus driver and not suddenly become a teacher.
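One common way to keep a persona consistent is to build the system instruction from a fixed mission description plus mode-specific rules. The following is a minimal sketch of that idea. The `Mission` fields, the rule wording, and `build_system_instruction` are all hypothetical and are not the prompts Immergo actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mission:
    role: str             # e.g. "bus driver"
    setting: str          # e.g. "a city bus stop"
    target_language: str  # the language being practiced


# Hypothetical wording for the two learning modes.
MODE_RULES = {
    "teacher": (
        "When the learner struggles, briefly explain phrases and grammar "
        "in {native} and offer translations, then return to the scene."
    ),
    "immersive": (
        "Respond only in {target}. If the learner uses another language, "
        "stay in character and ask them, in {target}, to try again."
    ),
}


def build_system_instruction(mission: Mission, mode: str, native_language: str = "English") -> str:
    """Compose a system instruction that pins the AI to one role and one mode."""
    if mode not in MODE_RULES:
        raise ValueError(f"unknown mode: {mode!r}")
    rules = MODE_RULES[mode].format(native=native_language, target=mission.target_language)
    return (
        f"You are a {mission.role} at {mission.setting}. "
        f"Speak {mission.target_language} and never break character. "
        f"Only discuss what a {mission.role} would plausibly discuss. "
        f"{rules}"
    )
```

For example, `build_system_instruction(Mission("bus driver", "a city bus stop", "Spanish"), "immersive")` yields an instruction that restricts the AI to the bus driver role and to Spanish-only replies.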
We also had to balance helpful feedback with immersion so that the learning experience still feels like a real conversation rather than a quiz.
Accomplishments that we're proud of
We are proud that Immergo successfully creates a real-time conversational language practice environment rather than a traditional lesson-based app.
Some key achievements include:
- Real-time voice conversations with AI
- Role-play scenarios that simulate real-world situations
- Two learning modes that support both beginners and advanced learners
- A clean architecture using modern web technologies
Most importantly, the experience feels like talking to a real person, which is something traditional lesson-based language tools rarely offer.
What we learned
During this project we learned a lot about:
- Real-time audio streaming in web applications
- Building conversational AI systems
- Designing AI prompts that maintain consistent roles
- Creating immersive user experiences using AI
We also discovered how powerful real-time AI models can be for education and interactive learning.
What's next for Immergo
Our vision is to expand Immergo into a full conversational language learning platform.
Future plans include:
- More missions such as airport travel, hotels, and business meetings
- Support for many more languages
- Multiplayer conversations between learners
- AI pronunciation coaching
- Progress tracking and personalized learning paths
- Mobile apps for iOS and Android
Built With
- bigquery
- gcp
- google-genai-sdk
- javascript
- python