Inspiration

Utterr grew out of a simple but common problem: people often know what they want to say but struggle with how to say it, especially in high-pressure situations like interviews, presentations, or formal conversations.

Most existing tools focus on grammar or speech correction, but ignore something crucial: tone and context. Speaking to a professor, pitching an idea, or introducing yourself all require different tones, structures, and delivery styles.

We wanted to create a platform that helps people practice real-life conversations in a safe, judgment-free space, turning hesitation into confidence.


What it does

Utterr is a context-aware AI platform that helps users practice and improve their speaking.

  • Users provide a scenario (e.g., interview, presentation) and a rough script
  • The system refines their content using AI and asks smart clarifying questions
  • Users record themselves speaking
  • The platform analyzes their delivery, including pacing, tone, pauses, and fluency

It then:

  • Gives actionable feedback (where to pause, emphasize, or improve)
  • Annotates the script with expressive cues
  • Generates an ideal version of the speech for comparison

How we built it

We designed Utterr as a two-phase system:

1. Content Refinement (Text Phase)

  • Built using the Gemini API
  • Handles context understanding, question generation, and iterative refinement
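As a rough illustration of the question-generation step, the sketch below assembles a prompt from the scenario and the rough script. The function name, parameters, and prompt wording are hypothetical and are not taken from the Utterr codebase. The call to the Gemini API is deliberately left out.

```python
def build_clarifying_prompt(scenario: str, draft: str, max_questions: int = 3) -> str:
    """Assemble a prompt that asks the model for scenario-specific questions.

    Hypothetical helper: the wording is an assumption, not Utterr's real prompt.
    """
    return (
        f"Scenario: {scenario.strip()}\n"
        f"Draft script:\n{draft.strip()}\n\n"
        f"Ask at most {max_questions} clarifying questions that are specific to "
        "this scenario and audience. Do not ask generic questions."
    )


prompt = build_clarifying_prompt("job interview", "Hi, I am Sam and I like coding.", 2)
```

Stating the scenario and the question limit explicitly in the prompt is one simple way to steer the model away from generic questions.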

2. Speech Analysis (Audio Phase)

  • Input audio is processed using the ElevenLabs API
  • Returns detailed speech metadata (timing, pauses, etc.)
  • Gemini analyzes this data to generate feedback and annotated scripts
  • Output audio is generated using the ElevenLabs API with a human-sounding voice
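To make the "timing, pauses" metadata concrete, here is a minimal sketch of how pacing and pauses could be derived from word-level timestamps. The `Word` structure, the 0.5-second pause threshold, and the sample data are all assumptions for illustration. They do not reflect the actual ElevenLabs response format or Utterr's implementation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def find_pauses(words, min_gap=0.5):
    """Return (previous_word, next_word, gap_seconds) for each gap >= min_gap."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap >= min_gap:
            pauses.append((prev.text, nxt.text, round(gap, 2)))
    return pauses


def words_per_minute(words):
    """Average speaking rate over the span from the first to the last word."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    return len(words) * 60.0 / duration if duration > 0 else 0.0


# Hypothetical sample: a long gap between "my" and "name".
sample = [
    Word("Hello", 0.0, 0.4),
    Word("my", 0.5, 0.7),
    Word("name", 1.5, 1.9),
    Word("is", 2.0, 2.2),
]
pauses = find_pauses(sample)
rate = words_per_minute(sample)
```

Summary numbers like these could then be passed to the language model as structured input, so feedback can point at specific pauses rather than the raw audio.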

Tech Stack:

  • Backend: Django (Python)
  • Frontend: Django templates, JavaScript, CSS
  • Database: SQLite
  • APIs: Gemini API, ElevenLabs API

Challenges we ran into

  • Getting meaningful context: Designing prompts so the AI asks relevant, non-generic questions
  • Interpreting speech data: Converting raw audio metadata into useful human feedback. ElevenLabs provides an API that returns exactly this metadata.
  • Text–speech alignment: Handling differences between written scripts and actual spoken delivery. We observed that other AI than ElevenLab
  • Keeping UX simple: Supporting iterative refinement without overwhelming the user
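The text–speech alignment challenge can be illustrated with a small sketch that compares the written script against a transcript of what was actually said. It uses Python's standard `difflib` module. The function names and the example sentences are hypothetical, and this is only one possible approach, not necessarily the one Utterr uses.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align(script, transcript):
    """Return (tag, script_words, spoken_words) for every span that differs."""
    a, b = tokenize(script), tokenize(transcript)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


diffs = align(
    "I am excited to join your team",
    "I am um really excited to join the team",
)
```

Each tuple marks words that were inserted, dropped, or replaced while speaking, which gives feedback a specific place in the script to point to.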

Accomplishments that we're proud of

  • Building a complete end-to-end pipeline from idea → refined script → spoken analysis
  • Creating context-aware feedback, not just generic speech suggestions
  • Successfully integrating multiple AI systems into a seamless experience
  • Delivering a working prototype within a hackathon timeframe

What we learned

  • Context plays a huge role in communication — more than just wording
  • Iterative feedback loops significantly improve user outcomes
  • Speech is more than words — timing, pauses, and tone matter deeply
  • Combining multiple AI tools requires careful orchestration and prompt design

What's next for Utterr

  • More realistic AI roleplay conversations (interactive scenarios)
  • Personalized coaching based on user progress over time
  • Expanded voice and tone customization
  • Better real-time feedback during speaking

Our goal is to make Utterr a full communication coach — helping people speak confidently in any situation.


Turn “nevers” into confidence.

https://www.canva.com/design/DAHFXPEitjA/Rde2bI1mgFYPeAL6FDgCXw/edit?utm_content=DAHFXPEitjA&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
