Inspiration
We built Utterr to address a simple but common problem: people often know what they want to say but struggle with how to say it, especially in high-pressure situations such as interviews, presentations, or formal conversations.
Most existing tools focus on grammar or speech correction, but ignore something crucial: tone and context. Speaking to a professor, pitching an idea, or introducing yourself all require different tones, structures, and delivery styles.
We wanted to create a platform where people can practice real-life conversations in a safe, judgment-free space, turning hesitation into confidence.
What it does
Utterr is a context-aware AI platform that helps users practice and improve their speaking.
- Users provide a scenario (e.g., interview, presentation) and a rough script
- The system refines their content using AI and asks smart clarifying questions
- Users record themselves speaking
- The platform analyzes their delivery, including pacing, tone, pauses, and fluency
It then:
- Gives actionable feedback (where to pause, emphasize, or improve)
- Annotates the script with expressive cues
- Generates an ideal version of the speech for comparison
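The cue-annotation step could work roughly as follows. This is a minimal sketch that assumes Gemini is asked to return cues as JSON in a hypothetical schema (a `word_index` plus a `cue` of `"pause"` or `"emphasize"`); `parse_cues`, `annotate_script`, and the `*word*` / `[pause]` markup are illustrative names and conventions, not Utterr's actual format.

```python
import json

# Hypothetical cue schema the model is asked to return, e.g.
# {"cues": [{"word_index": 1, "cue": "pause"}, {"word_index": 4, "cue": "emphasize"}]}
SUPPORTED_CUES = {"pause", "emphasize"}


def parse_cues(model_output: str) -> dict[int, str]:
    """Turn the model's JSON reply into {word_index: cue}, skipping unknown cues."""
    data = json.loads(model_output)
    cues = {}
    for item in data.get("cues", []):
        cue = item.get("cue")
        if cue in SUPPORTED_CUES:
            cues[int(item["word_index"])] = cue
    return cues


def annotate_script(script: str, cues: dict[int, str]) -> str:
    """Render a script with inline cues: *word* for emphasis, [pause] after a word."""
    out = []
    for i, word in enumerate(script.split()):
        cue = cues.get(i)
        out.append(f"*{word}*" if cue == "emphasize" else word)
        if cue == "pause":
            out.append("[pause]")
    return " ".join(out)


reply = '{"cues": [{"word_index": 1, "cue": "pause"}, {"word_index": 4, "cue": "emphasize"}]}'
print(annotate_script("Thank you all for coming today", parse_cues(reply)))
# Thank you [pause] all for *coming* today
```

Validating the model's reply against a small set of known cues keeps a malformed or creative response from leaking arbitrary markup into the script shown to the user.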
How we built it
We designed Utterr as a two-phase system:
1. Content Refinement (Text Phase)
- Built with the Gemini API
- Handles context understanding, question generation, and iterative refinement
2. Speech Analysis (Audio Phase)
- Input audio is processed using the ElevenLabs API
- Returns detailed speech metadata (timing, pauses, etc.)
- Gemini analyzes this data to generate feedback and annotated scripts
- Output audio is generated with the ElevenLabs API using a human-sounding voice
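As an illustration of the audio phase, the sketch below turns word-level timestamps into pacing and pause metrics and packs them into a prompt for Gemini. It assumes a simplified word shape (`text`, `start`, `end` in seconds); the real ElevenLabs response fields should be checked against its documentation. `Word`, `delivery_metrics`, `build_feedback_prompt`, and the 0.6-second pause threshold are hypothetical, and the actual Gemini API call is omitted.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with start/end times in seconds (assumed, simplified shape)."""
    text: str
    start: float
    end: float


def delivery_metrics(words: list[Word], pause_threshold: float = 0.6) -> dict:
    """Compute speaking rate and flag gaps between words of at least pause_threshold seconds."""
    if not words:
        return {"words_per_minute": 0.0, "total_seconds": 0.0, "long_pauses": []}
    total = words[-1].end - words[0].start
    wpm = len(words) / total * 60 if total > 0 else 0.0
    long_pauses = []
    for prev, cur in zip(words, words[1:]):
        gap = cur.start - prev.end  # silence between the end of one word and the start of the next
        if gap >= pause_threshold:
            long_pauses.append({"after": prev.text, "before": cur.text, "seconds": round(gap, 2)})
    return {"words_per_minute": round(wpm, 1), "total_seconds": round(total, 2), "long_pauses": long_pauses}


def build_feedback_prompt(scenario: str, script: str, metrics: dict) -> str:
    """Hypothetical prompt template that hands the metrics to Gemini as JSON."""
    return (
        f"Scenario: {scenario}\n"
        f"Intended script: {script}\n"
        f"Delivery metrics (JSON): {json.dumps(metrics)}\n"
        "Give specific feedback on pacing and pauses, quoting the words involved."
    )


words = [Word("Hi", 0.0, 0.3), Word("I'm", 0.4, 0.6), Word("Sam", 0.7, 1.0),
         Word("and", 2.0, 2.2), Word("I", 2.3, 2.4), Word("build", 2.5, 3.0)]
metrics = delivery_metrics(words)
print(metrics)
# {'words_per_minute': 120.0, 'total_seconds': 3.0, 'long_pauses': [{'after': 'Sam', 'before': 'and', 'seconds': 1.0}]}
```

Reducing raw timestamps to a few named numbers before prompting keeps the prompt short and lets the model focus on explaining the numbers rather than computing them.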
Tech Stack:
- Backend: Django (Python)
- Frontend: Django templates, JavaScript, CSS
- Database: SQLite
- APIs: Gemini API, ElevenLabs API
Challenges we ran into
- Getting meaningful context: Designing prompts so the AI asks relevant, non-generic questions
- Interpreting speech data: Converting raw audio metadata into useful, human-readable feedback. ElevenLabs provides an API that returns exactly this kind of metadata.
- Text–speech alignment: Handling differences between written scripts and actual spoken delivery. We observed that AI services other than ElevenLabs
- Keeping UX simple: Supporting iterative refinement without overwhelming the user
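One way to approach the text–speech alignment challenge, not necessarily the approach Utterr uses, is a word-level diff between the intended script and the transcript. This minimal sketch assumes a plain transcript string is available from the speech-to-text step and uses Python's standard `difflib.SequenceMatcher`; `tokenize` and `align_script` are hypothetical helper names.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and keep only word characters so punctuation doesn't cause false mismatches."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align_script(script: str, transcript: str):
    """Align intended script words with spoken words; return (similarity, differences)."""
    planned, spoken = tokenize(script), tokenize(transcript)
    matcher = difflib.SequenceMatcher(a=planned, b=spoken, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" = different words, "delete" = skipped script words, "insert" = extra spoken words
        if tag != "equal":
            differences.append({"op": tag, "script": " ".join(planned[i1:i2]),
                                "spoken": " ".join(spoken[j1:j2])})
    return matcher.ratio(), differences


ratio, diffs = align_script("I am excited to join your team.", "Um, I am really excited to join the team")
print(ratio)  # 0.75
for d in diffs:
    print(d)
# {'op': 'insert', 'script': '', 'spoken': 'um'}
# {'op': 'insert', 'script': '', 'spoken': 'really'}
# {'op': 'replace', 'script': 'your', 'spoken': 'the'}
```

Insertions such as "um" point to filler words, while replacements show where the speaker paraphrased the script; both can be passed along to the feedback step instead of being treated as errors.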
Accomplishments that we're proud of
- Building a complete end-to-end pipeline from idea → refined script → spoken analysis
- Creating context-aware feedback, not just generic speech suggestions
- Successfully integrating multiple AI systems into a seamless experience
- Delivering a working prototype within a hackathon timeframe
What we learned
- Context plays a huge role in communication — more than just wording
- Iterative feedback loops significantly improve user outcomes
- Speech is more than words — timing, pauses, and tone matter deeply
- Combining multiple AI tools requires careful orchestration and prompt design
What's next for Utterr
- More realistic AI roleplay conversations (interactive scenarios)
- Personalized coaching based on user progress over time
- Expanded voice and tone customization
- Better real-time feedback during speaking
Our goal is to make Utterr a full communication coach — helping people speak confidently in any situation.
Turn “nevers” into confidence.