Utterr

Inspiration

We built Utterr to address a simple but common problem: people often know what they want to say but struggle with how to say it, especially in high-pressure situations such as interviews, presentations, or formal conversations.

Most existing tools focus on grammar or speech correction but ignore something crucial: tone and context. Speaking to a professor, pitching an idea, and introducing yourself each call for a different tone, structure, and delivery style.

We wanted to create a platform where people can practice real-life conversations in a safe, judgment-free space, turning hesitation into confidence.

What it does

Utterr is a context-aware AI platform that helps users practice and improve their speaking.

Users provide a scenario (e.g., interview, presentation) and a rough script
The system refines their content using AI and asks targeted clarifying questions
Users record themselves speaking
The platform analyzes their delivery, including pacing, tone, pauses, and fluency

It then:

Gives actionable feedback (where to pause, emphasize, or improve)
Annotates the script with expressive cues
Generates an ideal version of the speech for comparison
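Annotating a script with expressive cues can be as simple as the following minimal sketch. The cue vocabulary (`pause`, `emphasize`), the `[pause]` and `*word*` markup, and the word-index mapping are hypothetical choices for illustration, not Utterr's actual format.

```python
# Hypothetical cue vocabulary; the writeup does not specify Utterr's real cue set.
PAUSE = "pause"
EMPHASIZE = "emphasize"


def annotate_script(script: str, cues: dict[int, set[str]]) -> str:
    """Return the script with inline delivery cues.

    `cues` maps a 0-based word index to the cues for that word:
    emphasized words are wrapped in asterisks, and a "[pause]" marker
    is placed after any word that should be followed by a pause.
    """
    annotated = []
    for i, word in enumerate(script.split()):
        word_cues = cues.get(i, set())
        if EMPHASIZE in word_cues:
            word = f"*{word}*"
        annotated.append(word)
        if PAUSE in word_cues:
            annotated.append("[pause]")
    return " ".join(annotated)


if __name__ == "__main__":
    script = "Thank you for having me. I am excited to be here."
    # Pause after "me." (word 4) and stress "excited" (word 7).
    print(annotate_script(script, {4: {PAUSE}, 7: {EMPHASIZE}}))
```

Keeping cues keyed by word index, rather than editing the text directly, lets the same cue list be rendered differently in the UI, for example as highlighted spans in a Django template.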

How we built it

We designed Utterr as a two-phase system:

1. Content Refinement (Text Phase)

Built using the Gemini API
Handles context understanding, question generation, and iterative refinement
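The question-then-refine loop of this phase might be structured roughly as follows. This is a minimal sketch, not Utterr's code: the `SCENARIO_HINTS` table, the `QUESTION:` reply convention, and the `generate` and `ask_user` callables are all hypothetical. `generate` stands in for whatever wrapper sends a prompt to the Gemini API and returns its text, so no real SDK call is shown.

```python
from typing import Callable

# Hypothetical tone guidance per scenario; Utterr's actual prompts are not published.
SCENARIO_HINTS = {
    "interview": "formal but warm; concise answers backed by concrete examples",
    "presentation": "clear structure with a strong opening and a memorable close",
    "introduction": "friendly and brief: name, role, and one memorable detail",
}


def build_prompt(scenario: str, script: str, answers: dict[str, str],
                 allow_questions: bool = True) -> str:
    """Assemble a context-aware refinement prompt from the scenario, draft, and Q&A so far."""
    hint = SCENARIO_HINTS.get(scenario, "appropriate to the audience and setting")
    lines = [f"Scenario: {scenario}", f"Target tone: {hint}", "Draft script:", script.strip()]
    if answers:
        lines.append("Clarifications from the speaker:")
        lines.extend(f"- Q: {q} A: {a}" for q, a in answers.items())
    if allow_questions:
        lines.append("If essential context is missing, reply only with up to 3 lines of the "
                     "form 'QUESTION: <question>'. Otherwise reply with the refined script.")
    else:
        lines.append("Reply with the refined script only; do not ask questions.")
    return "\n".join(lines)


def refine(scenario: str, script: str,
           generate: Callable[[str], str],   # hypothetical wrapper around the LLM call
           ask_user: Callable[[str], str],   # hypothetical UI hook returning the user's answer
           max_rounds: int = 3) -> str:
    """Alternate between clarifying questions and refinement until a script comes back."""
    answers: dict[str, str] = {}
    for round_no in range(max_rounds):
        final = round_no == max_rounds - 1
        reply = generate(build_prompt(scenario, script, answers, allow_questions=not final))
        questions = [line.removeprefix("QUESTION:").strip()
                     for line in reply.splitlines() if line.startswith("QUESTION:")]
        if final or not questions:
            return reply.strip()
        for question in questions:
            answers[question] = ask_user(question)
    return script  # only reached if max_rounds < 1
```

Disallowing questions on the last round is one way to guarantee the loop terminates with a script instead of asking the user indefinitely.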

2. Speech Analysis (Audio Phase)

Input audio is processed using the ElevenLabs API
Returns detailed speech metadata (timing, pauses, etc.)
Gemini analyzes this data to generate feedback and annotated scripts
The ideal version of the speech is voiced with the ElevenLabs API in a natural, human-sounding voice
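Turning timing metadata into pacing and pause figures could look like the sketch below. It assumes word-level timestamps in seconds, given as a list of dicts with `text`, `start`, `end`, and an optional `type` field; that shape is loosely modeled on speech-to-text output with word timestamps and should be checked against the ElevenLabs API reference. The 0.6-second pause threshold is an arbitrary illustrative value.

```python
from dataclasses import dataclass


@dataclass
class DeliveryStats:
    words_per_minute: float
    long_pauses: list[tuple[int, float]]  # (index of the word before the pause, gap in seconds)


def analyze_timing(words: list[dict], pause_threshold: float = 0.6) -> DeliveryStats:
    """Derive speaking rate and long pauses from word-level timestamps given in seconds."""
    # Assumed format: keep only spoken words if the metadata also has spacing/event entries.
    words = [w for w in words if w.get("type", "word") == "word"]
    if not words:
        return DeliveryStats(0.0, [])
    # Overall rate, pauses included, from first word start to last word end.
    duration = words[-1]["end"] - words[0]["start"]
    wpm = len(words) / (duration / 60) if duration > 0 else 0.0
    long_pauses = []
    for i in range(len(words) - 1):
        gap = words[i + 1]["start"] - words[i]["end"]
        if gap >= pause_threshold:
            long_pauses.append((i, round(gap, 2)))
    return DeliveryStats(round(wpm, 1), long_pauses)
```

Compact numbers like these, rather than raw audio, are the kind of input that can be passed to Gemini alongside the script when generating feedback.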

Tech Stack:

Backend: Django (Python)
Frontend: Django templates, JavaScript, CSS
Database: SQLite
APIs: Gemini API, ElevenLabs API

Challenges we ran into

Getting meaningful context: Designing prompts so the AI asks relevant, non-generic questions
Interpreting speech data: Converting raw audio metadata into useful, human-readable feedback. ElevenLabs provides an API that returns the detailed speech metadata we needed.
Text–speech alignment: Handling differences between the written script and the actual spoken delivery. We observed that AI services other than ElevenLabs…
Keeping UX simple: Supporting iterative refinement without overwhelming the user
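One possible approach to text-speech alignment is a word-level diff between the script and the transcript, shown here with Python's standard `difflib.SequenceMatcher`. This is a swapped-in technique for illustration only: the writeup does not say how Utterr aligns the two, and `normalize` and `align` are hypothetical helper names.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase and strip punctuation so 'Hello,' and 'hello' compare equal."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align(script: str, transcript: str) -> list[tuple[str, str, str]]:
    """Return (operation, script words, spoken words) for spans where delivery diverged."""
    planned, spoken = normalize(script), normalize(transcript)
    matcher = difflib.SequenceMatcher(a=planned, b=spoken, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is "replace", "delete" (skipped), or "insert" (added/filler)
            diffs.append((op, " ".join(planned[i1:i2]), " ".join(spoken[j1:j2])))
    return diffs


if __name__ == "__main__":
    print(align("I am excited to join your team.", "I'm um excited to join the team"))
```

For that example, `align` reports two divergent spans: "i am" spoken as "i'm um", and "your" spoken as "the". Punctuation and casing are ignored, so only wording changes, skipped words, and fillers surface.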

Accomplishments that we're proud of

Building a complete end-to-end pipeline from idea → refined script → spoken analysis
Creating context-aware feedback, not just generic speech suggestions
Successfully integrating multiple AI systems into a seamless experience
Delivering a working prototype within a hackathon timeframe