Inspiration
Millions of ESL learners struggle with pronunciation, especially without access to real-time feedback. Most tools focus on grammar or vocabulary, but speaking confidently is often the hardest part. We wanted to create a tool that is approachable, interactive, and even a little fun, so we built an AI-powered pronunciation coach with Harold the goose as a mascot!
What it does
SpeakSpeakGoose is an AI pronunciation coach that listens to users as they speak and provides instant feedback on how to improve. Users are given a phrase to say or can provide one of their own. The app then analyzes their pronunciation, highlights mistakes, and offers simple, actionable suggestions. The goose mascot makes the experience more engaging and less intimidating, especially for learners who may feel self-conscious practicing speech.
How we built it
We built a simple full-stack application with a focus on speed and usability. On the frontend, we focused on a clean interface with recording functionality and a real-time feedback display. The backend handles audio and text input, processes it, and connects to AI services. Speech-to-text converts the user's speech into text so it can be compared against the expected phrase. Our main focus was the AI feedback: a language model generates clear pronunciation advice based on the differences between expected and actual speech. Together, these pieces form a fast pipeline from input to transcription to analysis and, finally, feedback.
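To illustrate the comparison step between the expected phrase and the transcript, here is a minimal sketch in plain JavaScript. It is not the project's actual code: the helper names `tokenize` and `diffWords` are hypothetical, and it assumes a word-level alignment based on the longest common subsequence, which is only one of several reasonable ways to find the differences.

```javascript
// Hypothetical helper: lowercase the text and split it into words,
// dropping punctuation (ASCII apostrophes are kept for contractions).
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Hypothetical helper: align the expected phrase with the transcript
// using a longest-common-subsequence (LCS) table, then report the words
// that appear only on one side.
function diffWords(expected, actual) {
  const a = tokenize(expected);
  const b = tokenize(actual);

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const missed = []; // expected words the transcript did not contain
  const extra = []; // transcript words that were not expected
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      missed.push(a[i++]);
    } else {
      extra.push(b[j++]);
    }
  }
  while (i < a.length) missed.push(a[i++]);
  while (j < b.length) extra.push(b[j++]);

  return { missed, extra };
}

// Example: the learner said "feather" where "weather" was expected.
console.log(diffWords("The weather is nice today.", "the feather is nice today"));
```

Because speech-to-text returns real words rather than sounds, a diff like this only approximates pronunciation errors: it flags words the recognizer heard differently, which the language model can then turn into advice.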
Challenges we ran into
Handling audio reliably in a short amount of time was difficult, as recording, processing, and transcribing speech introduced latency and debugging challenges. Another challenge was getting useful feedback from the AI. Early outputs were too generic, so we had to refine our prompts to produce clear, specific suggestions. Perhaps the biggest constraint was time, which forced us to rein in our ambition and focus on building a realistic, working core feature.
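As an illustration of what refining a prompt toward specific suggestions can look like, here is a hedged sketch. The function `buildFeedbackPrompt`, its parameters, and the wording of the instructions are all assumptions made for this example, not the prompts we actually shipped. It only builds the prompt string and does not call any AI service.

```javascript
// Hypothetical prompt builder: constrains the model to short, word-specific
// tips instead of generic study advice. All wording here is illustrative.
function buildFeedbackPrompt({ expected, transcript, missed }) {
  const lines = [
    "You are a friendly pronunciation coach for English learners.",
    `Target phrase: "${expected}"`,
    `What the speech-to-text heard: "${transcript}"`,
  ];

  if (missed.length === 0) {
    // Nothing to correct, so ask only for brief encouragement.
    lines.push("The transcript matches the target. Reply with one short sentence of encouragement.");
  } else {
    // Name the problem words explicitly so the advice stays specific.
    lines.push(`Words that were likely mispronounced: ${missed.join(", ")}`);
    lines.push("For each listed word, give one specific tip, such as mouth position or a similar-sounding word.");
    lines.push("Use at most three short bullet points. Do not give general study advice.");
  }

  return lines.join("\n");
}

// Example usage with a single problem word.
console.log(
  buildFeedbackPrompt({
    expected: "The weather is nice today.",
    transcript: "the feather is nice today",
    missed: ["weather"],
  })
);
```

Listing the problem words and capping the response length are the two constraints that push the output away from generic advice in this sketch.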
Accomplishments that we're proud of
We built a working end-to-end experience where users can speak and receive meaningful feedback almost instantly, simplifying a complex problem into a clear and intuitive user experience. We’re also proud of creating something that feels approachable and fun, using the goose mascot to make language learning less intimidating.
What we learned
We learned how to integrate multiple technologies, such as speech recognition, AI, and frontend design, into one cohesive product. We also learned how important it is to scope projects carefully and prioritize a strong core feature. Additionally, we saw how much thoughtful design and personality can improve the impact of a technical project.
What's next for SpeakSpeakGoose
Next, we would expand from single phrases to full conversations and more personalized learning experiences. We would improve accuracy with deeper phonetic analysis and more advanced speech evaluation. Future features could include progress tracking, accent-specific coaching, and gamified practice to keep users engaged. If possible, we also do not want to be limited to English; we would like users to be able to pick other languages and improve their proficiency in those as well. Our most immediate goal, however, is to turn SpeakSpeakGoose into a fun and effective tool that can help anyone build real confidence in speaking English.
Built With
- anthropic
- elevenlabs
- express.js
- gemini
- html
- javascript
- node.js
- postgresql