Inspiration

Learning pronunciation is hard when you can't see what you're doing wrong. Traditional language apps only listen to your voice, but pronunciation is also about how you move your mouth. We wanted to build something that actually shows learners the physical mechanics of speech.

What it does

Nounce helps people improve their English pronunciation through video analysis. Users record themselves saying words, and the app analyzes both their audio and visual articulation. It breaks down each syllable, shows how their mouth movements compare to native speakers, and provides specific feedback on what to adjust. The system considers the user's native language to identify common pronunciation challenges and offers personalized recommendations.

Gemini 3 Uses

  • Multimodal video analysis - Processes recorded videos to evaluate mouth shapes, lip positions, and jaw movements synchronized with audio pronunciation
  • Syllable-by-syllable scoring - Analyzes each syllable separately, scoring articulation quality on a 25-100 scale based on visual and audio cues
  • Country-specific insights - Generates tailored feedback based on the user's native language and common pronunciation difficulties
  • Structured linguistic analysis - Extracts IPA notation, phonetic breakdowns, and difficulty ratings for each word (see the sketch below)
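
To show the shape of that structured analysis, here is a minimal sketch of requesting per-syllable JSON from Gemini with the @google/genai SDK. The schema fields, the analyzePronunciation helper, and the model id are illustrative assumptions, not Nounce's exact prompt or code.

```typescript
// Illustrative sketch only: schema fields, helper names, and the model id are assumptions.
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One entry per syllable, scored on the 25-100 scale, plus IPA notation and a
// tip tailored to the learner's native language.
const analysisSchema = {
  type: Type.OBJECT,
  properties: {
    ipa: { type: Type.STRING },
    difficulty: { type: Type.STRING },
    syllables: {
      type: Type.ARRAY,
      items: {
        type: Type.OBJECT,
        properties: {
          text: { type: Type.STRING },
          score: { type: Type.INTEGER },           // 25-100
          articulationNote: { type: Type.STRING }, // what to adjust (lips, jaw, tongue)
        },
        required: ["text", "score", "articulationNote"],
      },
    },
    nativeLanguageTip: { type: Type.STRING },
  },
  required: ["ipa", "syllables", "nativeLanguageTip"],
};

export async function analyzePronunciation(videoUri: string, word: string, nativeLanguage: string) {
  const response = await ai.models.generateContent({
    model: "gemini-flash-latest", // placeholder: substitute the Gemini 3 Flash model id
    contents: [
      { fileData: { fileUri: videoUri, mimeType: "video/mp4" } },
      { text: `Evaluate the pronunciation of "${word}" for a native ${nativeLanguage} speaker.` },
    ],
    config: {
      responseMimeType: "application/json",
      responseSchema: analysisSchema,
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```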

How we built it

We built Nounce with Next.js and TypeScript for the frontend, Supabase for data storage and authentication, and integrated Gemini 3 Flash for AI analysis. The system records user videos, uploads them to Supabase storage, then sends them to Gemini for comprehensive pronunciation evaluation. We also use ElevenLabs for generating native speaker reference audio.
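
As a rough illustration of that pipeline, the sketch below shows the record-upload-analyze handoff in TypeScript. The "videos" bucket, the /api/analyze route, and the submitRecording helper are assumed names, not the project's actual code.

```typescript
// Illustrative sketch: bucket name, route, and helper are assumptions, not Nounce's real code.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

export async function submitRecording(userId: string, word: string, recording: Blob) {
  // 1. Upload the recorded video to Supabase storage.
  const path = `${userId}/${word}-${Date.now()}.webm`;
  const { error } = await supabase.storage
    .from("videos")
    .upload(path, recording, { contentType: "video/webm" });
  if (error) throw error;

  // 2. Hand the stored video off to a server route that runs the Gemini evaluation.
  const res = await fetch("/api/analyze", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ path, word }),
  });
  return res.json();
}
```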

What's next for Nounce

We plan to add progress tracking over time, expand to more languages and accents, and build practice exercises based on identified weak points.

Built With

  • Next.js
  • TypeScript
  • Supabase
  • Gemini 3 Flash
  • ElevenLabs