SpeakEasy

Inspiration

SpeakEasy started with a simple problem: speech assessments can be hard to access, expensive, inconsistent, and difficult to track over time.

We were especially motivated by rare disease and neurological communities, where speech changes are often one of the earliest and most frustrating symptoms. Conditions like ataxia are often assessed with clinician-rated scales such as SARA (Scale for the Assessment and Rating of Ataxia), where speech is judged during short in-person visits. Patients may travel long distances, wait weeks between appointments, and still leave without a clear way to measure progress.

We wanted to build something that gives patients more clarity, providers better data, and both sides a smoother path to care.

What it does

SpeakEasy turns a short voice session into useful, measurable insights.

Users complete three guided tasks:

Read a short sentence
Repeat syllables for rhythm and motor speech testing
Have a short natural conversation

From those recordings, SpeakEasy evaluates five core areas:

Fluency
Clarity
Rhythm
Prosody
Pronunciation

The platform then provides:

A composite speech score
Visual charts and progress trends
Personalized strengths and focus areas
A clinician-ready PDF report
Session history tracking
A guardrailed ElevenLabs voice assistant that explains results and gives coaching based only on validated data
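
As a rough illustration of how the five area scores could roll up into the composite speech score, here is a minimal sketch. The 0-100 scale, the equal default weights, and the names `AREAS` and `composite_score` are assumptions made for this example, not SpeakEasy's actual scoring formula.

```python
from typing import Dict, Optional

# The five areas named above. Equal weighting is an assumption for illustration.
AREAS = ("fluency", "clarity", "rhythm", "prosody", "pronunciation")


def composite_score(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Return the weighted mean of the per-area scores (each assumed 0-100)."""
    weights = weights or {area: 1.0 for area in AREAS}

    missing = [area for area in AREAS if area not in scores]
    if missing:
        raise ValueError(f"missing area scores: {missing}")

    total_weight = sum(weights[area] for area in AREAS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Clamp each score to the 0-100 range before weighting.
    weighted = sum(
        min(max(scores[area], 0.0), 100.0) * weights[area] for area in AREAS
    )
    return round(weighted / total_weight, 1)
```

Keeping the weights as a parameter would let a clinician-facing profile emphasize, say, rhythm over prosody without changing the per-area analysis.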

How we built it

We built SpeakEasy as a layered AI healthcare system.

Frontend

React
Vite
Tailwind CSS
Browser recording with the MediaRecorder API

Backend

FastAPI for API orchestration
faster-whisper for transcription
librosa + parselmouth for feature extraction
reportlab + matplotlib for reports and charts

AI Agent Architecture

We used uAgents + Agentverse to coordinate specialized agents.

Assessment Agent

Receives speech metrics from the backend.

Report Agent

Turns technical results into clear summaries and clinician-ready PDFs.

Progress Tracker

Compares new sessions with historical sessions to spot trends over time.
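
One simple way to spot a trend across sessions is to fit a straight line through the composite scores and look at its slope. The sketch below assumes sessions are evenly spaced and uses a hypothetical `tolerance` of 0.5 points per session; neither the function names nor the threshold come from the actual Progress Tracker.

```python
from typing import Sequence


def trend_slope(scores: Sequence[float]) -> float:
    """Least-squares slope of scores against session index (points per session)."""
    n = len(scores)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def describe_trend(
    history: Sequence[float], new_score: float, tolerance: float = 0.5
) -> str:
    """Label the trend after appending the newest session to the history."""
    slope = trend_slope(list(history) + [new_score])
    if slope > tolerance:
        return "improving"
    if slope < -tolerance:
        return "declining"
    return "stable"
```

A slope over the whole history is less sensitive to one noisy recording than a comparison against only the previous session.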

Therapist Agent

Builds structured prompts for an ElevenLabs voice assistant that explains results and gives personalized coaching.

Safety Guardrails

The voice assistant is limited to the data it receives. It cannot invent diagnoses, give unsupported medical advice, or go beyond the validated results.
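
To make the guardrail idea concrete, here is a hedged sketch of a prompt builder that passes the assistant only whitelisted numeric results plus explicit rules. The field names in `ALLOWED_KEYS`, the rule wording, and the function `build_guardrailed_prompt` are illustrative assumptions; they do not reproduce the Therapist Agent's real prompt or any ElevenLabs API.

```python
import json
from typing import Any, Dict

# Hypothetical whitelist: only these validated fields may reach the assistant.
ALLOWED_KEYS = (
    "composite", "fluency", "clarity", "rhythm", "prosody", "pronunciation",
)

RULES = (
    "Only discuss the values listed under VALIDATED RESULTS.",
    "Do not name, suggest, or rule out any medical condition.",
    "Do not give medical advice; refer clinical questions to the user's provider.",
    "If asked about anything not in the results, say you do not have that data.",
)


def build_guardrailed_prompt(validated: Dict[str, Any]) -> str:
    """Build a system prompt containing only whitelisted numeric results."""
    facts = {
        key: validated[key]
        for key in ALLOWED_KEYS
        if isinstance(validated.get(key), (int, float))
        and not isinstance(validated.get(key), bool)
    }
    if not facts:
        raise ValueError("no validated results to share with the assistant")

    lines = ["You are a supportive speech coach.", "", "RULES:"]
    lines += [f"- {rule}" for rule in RULES]
    lines += ["", "VALIDATED RESULTS:", json.dumps(facts, indent=2, sort_keys=True)]
    return "\n".join(lines)
```

Filtering the payload before it reaches the model means free-text or unexpected fields are dropped at the boundary, rather than relying on the model to ignore them.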

Challenges we ran into

1. Turning speech into useful metrics

Speech quality is complex. We had to identify signals that were both measurable and meaningful, including:

Words per minute
Pause frequency
Pronunciation confidence
Pitch variation
Rhythm consistency

Many ideas worked in theory but became noisy on everyday microphones.
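
Two of the signals above, words per minute and pause frequency, can be derived from word-level timestamps alone. The sketch below assumes the transcription step yields a start and end time for each word; the `Word` type, the function names, and the 0.3-second pause threshold are assumptions for illustration rather than the values SpeakEasy uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def _speaking_duration(words: List[Word]) -> float:
    """Seconds from the first word's start to the last word's end."""
    if not words:
        return 0.0
    return max(words[-1].end - words[0].start, 0.0)


def words_per_minute(words: List[Word]) -> float:
    duration = _speaking_duration(words)
    if duration == 0.0:
        return 0.0
    return len(words) * 60.0 / duration


def pauses_per_minute(words: List[Word], min_gap: float = 0.3) -> float:
    """Count silences of at least min_gap seconds between consecutive words."""
    duration = _speaking_duration(words)
    if len(words) < 2 or duration == 0.0:
        return 0.0
    pauses = sum(
        1 for prev, cur in zip(words, words[1:]) if cur.start - prev.end >= min_gap
    )
    return pauses * 60.0 / duration
```

Timestamp-based measures like these depend less on microphone quality than pitch or spectral features do, which is one reason they tend to stay usable on everyday hardware.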

2. Avoiding misleading AI outputs

Because this is healthcare-adjacent, safety mattered from the start. We built strict limits so the assistant could support users without pretending to replace a clinician.

3. Making results feel human

Numbers alone are not enough. We wanted people to feel informed and encouraged, so we focused on visuals, plain-language explanations, and progress tracking.

4. Coordinating multiple agents

Getting several agents to reliably handle reporting, history comparison, and coaching required careful system design and dependable message passing.

What we learned

We learned that strong healthcare technology is not only about model accuracy.

It is also about trust, accessibility, usability, and empathy.

Users do not just want scores. They want answers to questions like:

Am I improving?
What should I work on next?
Can I share this with my provider?
Do I have a clearer path forward?

We also learned that multi-agent systems work best when each agent has a clear role, rather than when one model is asked to do everything.

What's next

We see SpeakEasy growing into:

Remote monitoring for speech therapy patients
Neurological condition progress tracking
Early speech screening for underserved communities
Public speaking and education coaching
Provider dashboards for long-term review

Our long-term vision is simple:

Healthcare should not begin only when someone reaches the clinic. It should begin wherever they live, speak, and grow.

Beyond one-time assessments, we envision a personalized rehabilitation platform where each session helps guide the next. Exercises in fluency, pronunciation, rhythm, and prosody could adapt to each user's progress, creating a continuous cycle of assessment, coaching, and support.