Inspiration
I'm a Canadian living in Finland. I need to learn the Finnish language. It's one of the hardest languages in the world.
The problem is, it's hard to find learning resources that adapt to my level, to keep me challenged and to keep me learning. Things are either too easy or too hard.
But when I first tried out Gemini's speech-to-speech model, I immediately knew this might be what could actually help me. It did a great job speaking to me in my language at my level, and I figured: what if we turn this into an AI agent? What if we keep assessing the user's level throughout the conversation, just like a real human would, and adapt to that level so that they can keep learning?
What it does
So that's why I built LearnFast. It is a team of two AI agents:
- The tutor
- The backup teacher
The backup teacher's job is simply to understand the learner and their entire history. This lets the tutor's voice model focus on speaking to the user in their language, while the backup teacher updates the tutor live during the conversation, based on that history, so that the tutor can constantly adjust in real time.
How we built it
I used a speech-to-speech audio model, specifically Gemini Flash Live 3.1. But since speech-to-speech models have limited context windows, and we need the voice model to keep adjusting the difficulty level to match the user, we have a backup teacher. This is another LLM that takes the full traces from our evals and acts as an LLM judge, evaluating the user's interaction with the main tutor agent in real time. Then, when the voice model calls a tool to log each exchange, it receives a response describing how it can adjust its teaching style for the immediate next turn.
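The log-and-respond loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `BackupTeacher`, `Exchange`, and `log_exchange` are hypothetical, and the `judge` method is a toy heuristic standing in for the real LLM-as-judge call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Exchange:
    tutor_said: str
    learner_said: str


@dataclass
class BackupTeacher:
    """Keeps the learner's full history and turns it into guidance for the tutor."""

    history: list[Exchange] = field(default_factory=list)

    def judge(self, history: list[Exchange]) -> str:
        # Assumption: a real system would send the full history to a second LLM here.
        # This toy rule only looks at the length of the learner's latest answer.
        last_answer = history[-1].learner_said
        if len(last_answer.split()) < 3:
            return "Learner gave a very short answer: slow down and simplify."
        return "Learner is coping well: introduce one new word."

    def log_exchange(self, tutor_said: str, learner_said: str) -> dict:
        # This is what the voice model's tool call would hit after each exchange.
        self.history.append(Exchange(tutor_said, learner_said))
        return {
            "guidance": self.judge(self.history),
            "turns_seen": len(self.history),
        }
```

The voice model only ever sees the short `guidance` string in the tool response, so the long history stays out of its limited context window.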
Challenges we ran into
The key challenge was keeping the latency down. Sometimes, especially when there are a lot of evals to analyze, the tool call might take too long. We solved this by making it so that the LLM judge doesn't have to respond within the same tool call. Instead, it might respond with the analysis of a previous turn. This means that the tutor agent generally adjusts its style not in the very next sentence, but one or two sentences later. In practice, this still gives us the behavior we need: the conversation stays natural, and the voice agent gets time to steer the conversation back to the trouble spots.
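One way to get this "answer with the previous analysis" behavior is to run the judge in the background and have the tool call return whatever guidance has already finished. The sketch below assumes an asyncio-based server; `DeferredJudge`, `slow_analyze`, and the default guidance string are hypothetical names made up for illustration.

```python
import asyncio


class DeferredJudge:
    """Returns the latest finished analysis instead of waiting for a new one."""

    def __init__(self, analyze, default="Keep going at the current level."):
        self._analyze = analyze  # async callable: exchange text -> guidance text
        self._latest = default
        self._latest_seq = 0
        self._next_seq = 0
        self._tasks = set()

    async def _run(self, seq, exchange):
        result = await self._analyze(exchange)
        # Ignore results that finish after a newer analysis has already landed.
        if seq > self._latest_seq:
            self._latest_seq, self._latest = seq, result

    def log_exchange(self, exchange):
        # Start analysing this turn in the background, then answer immediately.
        self._next_seq += 1
        task = asyncio.create_task(self._run(self._next_seq, exchange))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return self._latest

    async def drain(self):
        # Wait for all pending analyses (used here only to make the demo deterministic).
        await asyncio.gather(*self._tasks)


async def slow_analyze(exchange):
    # Stand-in for a slow LLM-as-judge call.
    await asyncio.sleep(0.01)
    return f"Revisit: {exchange}"


async def demo():
    judge = DeferredJudge(slow_analyze)
    first = judge.log_exchange("turn 1")   # nothing analysed yet, so the default comes back
    await judge.drain()
    second = judge.log_exchange("turn 2")  # guidance derived from turn 1
    await judge.drain()
    return first, second
```

Here the tool call never waits on the judge, at the cost of guidance that lags the conversation by a turn or so, which matches the trade-off described above.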
Accomplishments that we're proud of
I genuinely began learning better with this tool than with a lot of the apps I have used. I've already started picking up more new words and retaining them better, and I find myself using my own app more and more to learn Finnish.
What we learned
I learned a lot about speech-to-speech models. This was the first time I had used one, and I'm genuinely impressed with their quality, speed, and accuracy across various languages. It's also impressive that we can build speech-to-speech AI agents that carry out complex tasks while you're speaking with them.
What's next for Learn Fast
The tool is open source, and I will probably share it with anybody who is learning languages. As for me, I'm going to keep using this tool and keep improving it. I hope it can help me pass the Finnish language exam I need for my permanent residency.
