Bright Speech

Inspiration

I, Roaid, was hanging out with a friend who’s a speech and language therapist when I noticed a few pages of symbols in her notes, and honestly, they looked like hieroglyphics. I asked what they were, and that’s when she introduced me to the International Phonetic Alphabet (IPA). I assumed some tool could already convert speech into IPA automatically, but she told me the reality was the opposite: in each session, she has to listen carefully and manually transcribe what the child says, phoneme by phoneme. That moment made the bottleneck obvious. Every hour therapists spend on transcription is an hour not spent on intervention, and the problem is worse with children, whose speech is more variable and harder for standard speech recognition to handle. That’s what sparked the idea: a tool that transcribes speech to IPA automatically, designed specifically around the complexity of pediatric and disordered speech.

What it does

Bright Speech is a privacy-first tool that helps speech and language therapists turn children’s speaking practice into structured phonological evidence. In pediatric therapy, clinicians often rely on repeated word productions and manual IPA transcription to identify sound errors, detect phonological processes, and track progress. Bright Speech reduces that burden by guiding a child through short, targeted tasks (phoneme drills, minimal pairs, and controlled word prompts) and generating phoneme/IPA output for each attempt. Because the target word is known in advance, the system can compare the expected and produced phoneme sequences and summarize clinically relevant patterns such as substitutions, deletions, and insertions, while conservatively flagging low-confidence “atypical productions” for human review rather than pretending certainty. Over time, the app aggregates attempts into therapist-ready reports: which sounds are affected, how often errors occur, where in the word they occur (initial, medial, or final position), and how patterns change across sessions. These reports give therapists clear phonological summaries and reduce manual documentation. By shifting repetitive data collection to supervised home practice and standardizing pattern summaries, Bright Speech aims to support earlier, more focused intervention and increase clinical throughput without replacing the therapist.
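The expected-versus-produced comparison described above can be sketched as a sequence alignment. This is a minimal illustration, assuming both the target and the recognizer output are already lists of IPA phoneme symbols; it uses a standard unit-cost Levenshtein (edit-distance) alignment, which is our illustrative choice and not necessarily the method Bright Speech uses. The names `Edit`, `align`, `word_position` and `summarize` are hypothetical, and distortions are not modelled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edit:
    kind: str                # "match", "substitution", "deletion" or "insertion"
    expected: Optional[str]  # target phoneme (None for insertions)
    produced: Optional[str]  # produced phoneme (None for deletions)
    position: str            # "initial", "medial" or "final" in the target word


def word_position(index: int, length: int) -> str:
    """Label a target-phoneme index as word-initial, word-medial or word-final."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def align(expected: list[str], produced: list[str]) -> list[Edit]:
    """Unit-cost Levenshtein alignment of two phoneme lists, backtraced into edits."""
    n, m = len(expected), len(produced)
    # cost[i][j] = fewest edits turning expected[:i] into produced[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(expected[i - 1] != produced[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion of a target phoneme
                cost[i][j - 1] + 1,             # insertion of an extra phoneme
            )

    # Walk back from the bottom-right cell to recover one optimal edit sequence.
    edits: list[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(expected[i - 1] != produced[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                kind = "substitution" if mismatch else "match"
                edits.append(Edit(kind, expected[i - 1], produced[j - 1],
                                  word_position(i - 1, n)))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append(Edit("deletion", expected[i - 1], None,
                              word_position(i - 1, n)))
            i -= 1
        else:
            # An insertion has no target index; attribute it to the nearest target slot.
            slot = word_position(min(i, n - 1), n) if n else "initial"
            edits.append(Edit("insertion", None, produced[j - 1], slot))
            j -= 1
    edits.reverse()
    return edits


def summarize(edits: list[Edit]) -> Counter:
    """Count non-matching edits by (error type, word position)."""
    return Counter((e.kind, e.position) for e in edits if e.kind != "match")
```

For example, aligning the target /ɹæbɪt/ with a produced /wæbɪt/ yields one word-initial substitution (ɹ → w), and /kæt/ produced as /kæ/ yields one word-final deletion. Counting errors per (type, position) pair is what would let per-attempt results roll up into the position-aware summaries described above.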

How we built it

We built Bright Speech as a structured, word-level MVP that mirrors how therapy is typically administered: controlled elicitation where the expected phonetic target is known. The app displays a target word on screen, records the child’s production (under parent supervision), and runs speech-to-phoneme/IPA analysis to produce a phonetic output that can be compared directly with the expected transcription. This controlled design makes alignment and error detection more reliable than with open-ended speech and enables simple, interpretable reporting of substitutions, deletions, and insertions, while distortions are treated more cautiously using confidence scores and “atypical production” flags.

To preserve trust and support compliance, the system follows privacy-by-design principles: processing is intended to run on-device, no raw audio is stored by default, and only derived phonetic metadata and summary statistics are retained, with clear consent and deletion controls.

We also designed the roadmap around known deployment challenges: capturing dialect and accent during onboarding (e.g., Irish English) to reduce false error flags; validating against therapist inter-rater baselines in trials; iteratively refining distortion handling, given automatic speech recognition (ASR) error rates of around 8–11% on speech from speakers with speech sound disorders (SSD); and exploring integration with existing therapy tools (e.g., Phonological Processes) to support a hybrid workflow.
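The privacy-by-design and confidence-flagging ideas above can be illustrated with a small sketch of what a retained per-attempt record might contain. Everything here is an assumption for illustration: the `AttemptRecord` fields, the `process_attempt` function, the `Recognizer` callable standing in for an on-device phoneme model, and the 0.6 `LOW_CONFIDENCE` threshold are all hypothetical. The point it shows is structural: the stored record has no audio field, and low-confidence phonemes are marked for review rather than scored.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

# Hypothetical cut-off: phonemes recognized below this confidence are flagged as
# "atypical production" for therapist review instead of being scored as errors.
LOW_CONFIDENCE = 0.6

# Hypothetical on-device recognizer: audio bytes -> (phonemes, per-phoneme confidence).
Recognizer = Callable[[bytes], tuple[list[str], list[float]]]


@dataclass(frozen=True)
class AttemptRecord:
    """Derived metadata retained for one attempt. There is deliberately no audio field."""
    target_word: str
    expected_ipa: tuple[str, ...]
    produced_ipa: tuple[str, ...]
    confidences: tuple[float, ...]
    flagged: tuple[int, ...]  # indices into produced_ipa that need human review
    recorded_at: str          # UTC timestamp, ISO 8601


def process_attempt(audio: bytes, target_word: str, expected_ipa: list[str],
                    recognize: Recognizer,
                    threshold: float = LOW_CONFIDENCE) -> AttemptRecord:
    """Recognize one attempt and return only derived metadata; the audio is not stored."""
    produced, confidences = recognize(audio)
    if len(produced) != len(confidences):
        raise ValueError("expected one confidence score per produced phoneme")
    flagged = tuple(i for i, c in enumerate(confidences) if c < threshold)
    return AttemptRecord(
        target_word=target_word,
        expected_ipa=tuple(expected_ipa),
        produced_ipa=tuple(produced),
        confidences=tuple(confidences),
        flagged=flagged,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
```

Keeping the record immutable and audio-free means that whatever storage layer sits behind it only ever sees phonetic metadata; consent and deletion controls would then operate on these records alone.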

Challenges we ran into

We faced three key limitations in this version. First, dataset access: because we had no suitable pediatric disordered-speech dataset, the tool was tested on adult speakers without speech difficulties, which limits how representative the results are for the intended clinical population. Second, accuracy verification: without access to therapist-validated transcriptions, we could not benchmark outputs against a clinician’s IPA or establish inter-rater comparisons, so accuracy claims remain preliminary. Third, dialect handling: we used an American English IPA target set with no mechanism to adapt targets across dialects or IPA conventions, so the system may mishandle regional pronunciations and cannot yet distinguish dialectal variation from clinically relevant error patterns.

Accomplishments that we're proud of

We implemented a working MVP within the hackathon timeframe, and it performed well in preliminary tests. Those early results are encouraging, but they were obtained under constrained conditions: limited data and mostly adult speakers. The MVP should therefore be viewed as a proof of concept that validates feasibility and workflow, not yet as a clinically validated system.

What we learned

This project pushed us to build real domain knowledge and showed us how AI can go beyond its “traditional” use cases when you start from a real workflow problem. It also reinforced the importance of multidisciplinary collaboration: a clinically useful solution is only realistic when engineers, therapists, and researchers design it together. Finally, we learned firsthand that audio AI has hard limits, especially for child and disordered speech, where data scarcity, accent variation, and noisy environments can quickly break models without careful guardrails and validation.

What's next for Bright Speech

Right now, Bright Speech works in a controlled format: it shows a single target word on screen, records the child’s production, and generates the corresponding phonetic output. Going forward, our goal is to evolve this into a full speech-to-IPA system with strong guardrails for accuracy and clinician trust, expanding from isolated words to phrases and connected speech while clearly flagging low-confidence segments for review. In parallel, we want to add longitudinal tracking that records recurring error patterns over time and compares historical sessions with recent ones, so therapists can see whether a child’s speech is improving, stagnating, or regressing in a structured, measurable way.
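One simple way the planned historical-versus-recent comparison could work is sketched below. It assumes each session has already been reduced to counts for one tracked sound (attempts and attempts produced in error); the `SessionSummary` and `trend` names, the three-session recent window, and the 0.05 tolerance band are hypothetical placeholders, not a clinically validated rule.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SessionSummary:
    session_id: str
    attempts: int  # attempts that targeted the tracked sound
    errors: int    # of those, attempts where the sound was produced in error

    @property
    def error_rate(self) -> float:
        return self.errors / self.attempts if self.attempts else 0.0


def trend(sessions: list[SessionSummary], recent: int = 3,
          tolerance: float = 0.05) -> str:
    """Compare the mean error rate of the last `recent` sessions with all earlier ones."""
    if len(sessions) <= recent:
        return "insufficient history"
    before = mean(s.error_rate for s in sessions[:-recent])
    after = mean(s.error_rate for s in sessions[-recent:])
    if after < before - tolerance:
        return "improving"
    if after > before + tolerance:
        return "regressing"
    return "stagnating"
```

The tolerance band keeps small session-to-session fluctuations from being reported as change; in practice the window size and tolerance would need to be set with therapists rather than fixed in code.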