- Self-paced learning process with five pre-defined levels.
- Practice phrase generation powered by Gemini TTS.
- Personalized voice profiling with a confidence score.
- Granular pronunciation analysis that provides vital feedback.
- Additional targeted practice powered by Gemini 3.
Inspiration
Inspired by this YouTube video: This is a Breakthrough....
What it does
First, the user records an audio clip of themselves speaking the practice phrase. The app then returns Gemini's pronunciation analysis results, along with an additional practice phrase and pronunciation guidance. The main advantages of this app are its cost-efficiency and relatively high accuracy.
How we built it
The React SPA was vibe coded in AI Studio. The synthetic benchmark pipeline (Python) was developed in VS Code. The core functionality is built on the Gemini 3 family's native audio understanding.
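As a rough illustration of the analysis step, the sketch below builds a text prompt to accompany the recorded audio and parses a JSON reply into per-word feedback. The prompt wording, the JSON schema (`word`, `issue`, `tip`), and the function names are assumptions made for this example. The actual request to Gemini is omitted, and the app's real prompt and response format may differ.

```python
import json


def build_analysis_prompt(practice_phrase: str) -> str:
    """Text instruction to send alongside the user's audio clip (hypothetical wording)."""
    return (
        "The attached audio is a learner reading this phrase aloud:\n"
        f'"{practice_phrase}"\n'
        "List only words that are clearly mispronounced. "
        'Reply with JSON: {"issues": [{"word": ..., "issue": ..., "tip": ...}]}. '
        'If nothing is clearly wrong, reply with {"issues": []}.'
    )


def parse_analysis(reply_text: str, practice_phrase: str) -> list:
    """Parse the model reply and keep only issues about words in the phrase."""
    # Words of the practice phrase, lowercased and stripped of punctuation.
    phrase_words = {w.strip(".,!?").lower() for w in practice_phrase.split()}
    try:
        issues = json.loads(reply_text).get("issues", [])
    except (json.JSONDecodeError, AttributeError):
        return []  # unparseable reply: report nothing rather than guess
    if not isinstance(issues, list):
        return []
    # Drop feedback on words the user was never asked to say.
    return [
        issue
        for issue in issues
        if isinstance(issue, dict)
        and str(issue.get("word", "")).lower() in phrase_words
    ]
```

Filtering the reply against the practice phrase is one cheap guard against feedback on words that were never spoken. It does not catch false positives on words that are in the phrase.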
Challenges we ran into
- Using synthetic data to run sanity checks and make sure that hallucination is minimal.
- Reducing false positives in the pronunciation analysis results. This issue is quite hard to tackle.
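One way to quantify the false positive problem on synthetic data is sketched below. It assumes each synthetic clip has a known set of deliberately mispronounced words (the ground truth) and that the analyzer returns the set of words it flagged. The `BenchmarkCase` type and the sample data are hypothetical illustrations, not the project's actual pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCase:
    """One synthetic clip: what was actually mispronounced vs. what was flagged."""
    phrase: str
    injected_errors: frozenset[str]  # words deliberately mispronounced (ground truth)
    flagged_words: frozenset[str]    # words the analyzer reported as mispronounced


def false_positive_rate(cases: list[BenchmarkCase]) -> float:
    """Share of correctly pronounced words that were wrongly flagged."""
    false_positives = 0
    correct_words = 0
    for case in cases:
        # Distinct words in the phrase; repeated words are counted once.
        words = set(case.phrase.lower().split())
        correct = words - case.injected_errors
        correct_words += len(correct)
        false_positives += len(case.flagged_words & correct)
    return false_positives / correct_words if correct_words else 0.0


# Hypothetical sample: "fox" is flagged although only "quick" was mispronounced.
sample_cases = [
    BenchmarkCase("the quick brown fox", frozenset({"quick"}), frozenset({"quick", "fox"})),
    BenchmarkCase("she sells sea shells", frozenset(), frozenset()),
]
print(f"False positive rate: {false_positive_rate(sample_cases):.3f}")
```

Tracking this rate across prompt or model changes gives a concrete number to compare, instead of judging individual results by ear.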
Accomplishments that we're proud of
Using synthetic data to test Gemini 3's native audio understanding capability.
What we learned
How to process audio data with Gemini 3, and how to mitigate false positive errors in pronunciation analysis.
What's next for FluentEcho
- Gamify the levels and make practice generation more nuanced.
- Gather data (both synthetic and organic) for rigorous benchmarking of multimodal LLMs' audio understanding.
- Once the false positive issue is solved, get rid of all the phrases.
- Speech training that uses Gemini's visual understanding ability.