Inspiration

Inspired by the YouTube video "This is a Breakthrough....".

What it does

First, the user records an audio clip of themselves speaking the practice phrase. The app then returns a Gemini-generated pronunciation analysis, along with additional practice phrases and pronunciation guidance. The main advantages of this app are its cost-efficiency and relatively high accuracy.

How we built it

The React SPA was vibe-coded in AI Studio. The synthetic benchmark pipeline (Python) was developed in VS Code. The core functionality is built on the Gemini 3 family's native audio understanding.

Challenges we ran into

  1. Using synthetic data to run sanity checks and confirm that hallucination is minimal.
  2. Reducing false positive errors in the pronunciation analysis results, which proved quite hard to tackle.
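A sanity check of this kind can be sketched as follows. This is a minimal illustration and not the project's actual pipeline: it assumes each synthetic clip comes with a known set of deliberately mispronounced words, and that the model's analysis has already been reduced to a set of flagged words. The names `ClipResult` and `score` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClipResult:
    """One synthetic clip (hypothetical structure, for illustration only)."""
    words: list[str]          # words in the practice phrase
    mispronounced: set[str]   # words deliberately mispronounced in the clip
    flagged: set[str]         # words the model reported as mispronounced


def score(results: list[ClipResult]) -> dict[str, float]:
    """Compute word-level error metrics over a batch of synthetic clips."""
    tp = fp = fn = tn = hallucinated = 0
    for r in results:
        phrase = set(r.words)
        # Flags on words that are not in the phrase at all count as hallucinations.
        hallucinated += len(r.flagged - phrase)
        for w in phrase:
            truth = w in r.mispronounced
            pred = w in r.flagged
            if pred and truth:
                tp += 1
            elif pred and not truth:
                fp += 1   # correct word flagged as wrong: a false positive
            elif truth:
                fn += 1   # mispronounced word the model missed
            else:
                tn += 1
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "hallucinated_flags": float(hallucinated),
    }


# Made-up sample data, not real benchmark results.
sample = [
    ClipResult(["the", "quick", "brown", "fox"], {"quick"}, {"quick", "fox", "jumps"}),
    ClipResult(["she", "sells", "sea", "shells"], {"shells"}, set()),
]
metrics = score(sample)
print(metrics)
```

Tracking the false positive rate separately from recall makes it easier to see whether a prompt change reduces spurious flags without also hiding real mispronunciations.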

Accomplishments that we're proud of

Using synthetic data to test Gemini 3's native audio understanding capability.

What we learned

How to process audio data with Gemini 3, and how to mitigate false positive errors in pronunciation analysis.

What's next for FluentEcho

  1. Gamify the levels and add more nuanced practice generation.
  2. Gather data (both synthetic and organic) for rigorous benchmarking of multimodal LLMs' audio understanding.
  3. Once the false positive issue is solved, get rid of all the practice phrases.
  4. Add speech training that uses Gemini's visual understanding ability.