HEARO

Inspiration

Alice, one of our team members, has a younger brother named Eric who was born with congenital hearing loss. Although he underwent cochlear implant surgery, which placed small electronic devices in his ears to provide a sense of sound, learning to speak clearly remains a long and challenging process that requires continuous speech therapy. Eric attends speech therapy twice a week, but each session involves nearly three hours of round-trip travel, and the costs are substantial.

In addition, his parents must consistently support his practice at home, where repeating a single word or sentence hundreds of times is often necessary. Because many parents are not trained in speech instruction, they may unintentionally demonstrate incorrect pronunciations that hinder their child's speech development. Home practice is also emotionally taxing: children can easily lose focus, and parents may become impatient.

While platforms like YouTube offer educational videos, these are generally designed for children with normal hearing. The pacing is often too fast, and the content lacks interaction, making it difficult for children with hearing impairments to follow or stay engaged. Without interactive, personalized feedback, the benefits of these tools are limited.

Families like Alice’s face tremendous emotional and financial strain. Parents are particularly anxious because their child’s communication skills impact future independence and social development. Our team’s research, supported by data from the World Health Organization, shows that over 34 million children worldwide have disabling hearing loss. Speech therapy is essential for these children, yet access to long-term, consistent, and affordable treatment remains a widespread challenge. Additionally, clinicians often manage many patients at once, making it difficult to track individual progress between sessions.

What it does

HEARO is an interactive application that supports listening and speaking practice at home for children with hearing impairments, complementing formal speech therapy. Unlike passive video-based learning, the app encourages active participation: children press buttons, listen carefully, respond out loud, and receive real-time feedback through speech recognition. AI models evaluate their pronunciation accuracy and prompt repeated practice for frequently mispronounced words, reinforcing memory and learning outcomes.

AI is used to provide consistent and accurate pronunciation support. During each exercise, children first hear AI-generated audio or video that models the correct pronunciation. This offers a reliable reference, avoiding the variability that can occur when relying on different speakers such as parents or teachers. After listening, children record their own voice in the app. The recording is analyzed by an AI-based speech assessment model, which returns assessment scores. This immediate feedback helps both children and parents identify errors and make adjustments in real time.

The app continuously tracks learning progress. Words that present ongoing difficulty are automatically reintroduced for extra practice. Progress summaries help parents and clinicians monitor both improvements and persistent challenges, enabling more informed support. By leveraging AI for guidance and assessment, the system reduces frustration, provides motivational rewards, supports flexible scheduling, and lightens the burden on families, all while improving the overall effectiveness of home practice.
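The reintroduction of difficult words described above could be sketched roughly as follows. This is a minimal illustration, not our actual implementation: the class name `PracticeTracker`, the pass score of 80, and the three-attempt window are all hypothetical choices made for the example.

```python
from typing import Dict, List


class PracticeTracker:
    """Keeps per-word scores and picks words that need extra practice."""

    def __init__(self, pass_score: float = 80.0, window: int = 3) -> None:
        self.pass_score = pass_score  # hypothetical threshold on a 0-100 scale
        self.window = window          # number of most recent attempts to average
        self._scores: Dict[str, List[float]] = {}

    def record(self, word: str, score: float) -> None:
        """Store one assessment score for a word."""
        self._scores.setdefault(word, []).append(score)

    def recent_average(self, word: str) -> float:
        """Average of the last `window` scores; 0.0 if the word was never practiced."""
        recent = self._scores.get(word, [])[-self.window:]
        if not recent:
            return 0.0
        return sum(recent) / len(recent)

    def words_to_review(self) -> List[str]:
        """Practiced words whose recent average is below the pass score, weakest first."""
        weak = [w for w in self._scores if self.recent_average(w) < self.pass_score]
        return sorted(weak, key=self.recent_average)
```

For instance, after recording 55 and 65 for "ship", 90 for "sheep", and 70 for "chip", `words_to_review()` returns "ship" and then "chip". Averaging only recent attempts means a word drops off the review list once the child starts pronouncing it well, even if early attempts were weak.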

How we built it

Our app has two login portals: one for patients and one for doctors. The patient portal has five features.

Listening Practice: The app plays an audio clip and presents four similar word options. Immediate feedback is given for correct or incorrect answers. After a correct response, reward animations (such as candy animations or encouraging phrases like “Great job!”) appear to motivate children. Practice can be repeated to reinforce memory.
Speaking Practice: An AI-generated virtual instructor provides clear vocal models for the child to emulate. Upon recording, the AI analyzes the audio to provide instant accuracy scores and targeted feedback for improvement.
Lip Movement Practice (Pilot): While the user speaks, the app captures video to perform a visual-spatial analysis of their lip movements. The AI then provides real-time feedback and visual cues to help the user adjust their articulatory positioning.
My Progress: Users can filter data by week, month, or year and switch between metrics such as accuracy, time spent, points, and completion rate. Visual charts display strengths and areas needing improvement.
My Feedback: Users can view feedback from doctors and reply by typing messages to ask questions or seek clarification.

The doctor portal has four features.

Message Parents: Doctors can directly communicate with parents.
Therapy Plan: Doctors can assign daily practice goals and activities.
Patient’s Progress: Doctors can load different patients and review progress by week, month, or year across accuracy, time, points, and completion metrics.
Send Feedback: Doctors can select a patient, rate performance across four areas using a 1–5 star scale, leave comments, and review patient responses.

Design Considerations

User experience was a central design focus. We incorporated visual progress indicators that show improvement without making children feel like they're being tested. These visuals are intuitive and accessible to both children and doctors. The app is designed to support, not pressure, its users. Encouraging animations, brief practice sessions, real-time interaction, and small rewards help sustain motivation, lengthen attention spans, and make home practice more enjoyable and productive for children with hearing impairments.

Our solution uses AI to provide objective, real-time feedback that extends therapy beyond the clinic. It adopts two main AI technologies: Azure Speech Services for speech assessment and MediaPipe FaceMesh for visual articulation tracking.

For speech practice, the app records a child’s voice and uses Azure Speech Pronunciation Assessment to evaluate pronunciation quality. It analyzes features such as accuracy, fluency, completeness, and prosody, then provides immediate feedback that helps children and parents identify errors and make corrections in real time.

AI is essential because speech is highly variable. Differences in voice, pace, accent, and sound production make rule-based systems too limited for accurate evaluation. Human feedback at home can also be subjective and is often neither precise nor quantitative, especially when parents are not trained in speech instruction. In addition, effective home practice requires sustained attention, patience, and consistency from parents, which can be difficult to maintain over time. By contrast, AI models trained on large and diverse speech datasets can recognize patterns in spoken language and deliver more objective, consistent, and measurable assessments.
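As a rough illustration of how the four assessment dimensions mentioned above (accuracy, fluency, completeness, prosody) might be turned into child-friendly feedback, consider the sketch below. It does not call Azure; it assumes the scores have already been retrieved on a 0-100 scale. The `AssessmentScores` class, the star cut-offs, and the tip messages are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass
class AssessmentScores:
    """Scores on an assumed 0-100 scale, one per assessed dimension."""
    accuracy: float
    fluency: float
    completeness: float
    prosody: float


# Hypothetical, child-friendly tips keyed by the weakest dimension.
TIPS = {
    "accuracy": "Listen to the word again and copy each sound.",
    "fluency": "Try saying it in one smooth go.",
    "completeness": "Try to say every word you hear.",
    "prosody": "Try matching the rise and fall of the voice.",
}


def to_stars(scores: AssessmentScores) -> int:
    """Map the mean of the four scores to 1-5 stars (made-up cut-offs)."""
    mean = (scores.accuracy + scores.fluency
            + scores.completeness + scores.prosody) / 4
    cutoffs = [50, 65, 80, 90]
    return 1 + sum(mean >= c for c in cutoffs)


def weakest_area(scores: AssessmentScores) -> str:
    """Name of the dimension with the lowest score."""
    areas = vars(scores)
    return min(areas, key=areas.get)


def feedback(scores: AssessmentScores) -> str:
    """Encourage first; only point at one thing to improve."""
    if to_stars(scores) >= 4:
        return "Great job!"
    return "Nice try! " + TIPS[weakest_area(scores)]
```

Showing a single tip for the weakest dimension, rather than all four numbers, is one way to keep the emphasis on encouragement over correction.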
For lip movement practice, the app uses MediaPipe FaceMesh to analyze video input from the child and track facial landmarks around the lips and jaw. This allows the system to evaluate how mouth shapes are formed during speech and provide visual feedback on articulation. AI is important because children naturally move their heads and faces while speaking, and a simple algorithm would struggle to track these changes reliably in real time.

AI does not replace therapists. However, therapists are not available for the daily, repetitive practice children need to make progress. Our AI makes that practice more accessible by providing immediate, consistent, and scalable feedback anytime.

Challenges we ran into

Throughout development, we faced several technical and design challenges. One major obstacle involved experimenting with advanced speech assessment models. Limited training data, inaccurate results, and integration difficulties made some models unsuitable for our needs. After careful research and evaluation, we chose to use Azure Speech Services, which offers a reliable, production-ready pronunciation assessment well-suited for consistent home use.

For the lip movement practice feature, we leveraged MediaPipe FaceMesh to track facial and lip landmarks. However, evaluating speech through lip movement alone remains a challenge, as many distinct phonetic sounds share similar visual patterns.

Another challenge was integrating independently developed components (speech analysis, lip-movement detection, progress tracking, and messaging) into a single cohesive application. As features accumulated, small code changes sometimes caused unexpected behavior elsewhere in the system. We addressed this by restructuring the codebase, testing modules incrementally, and using PyCharm and Google Colab to isolate issues. ChatGPT was also used to help interpret error messages, refactor complex logic, and improve overall code organization.
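Returning to the lip movement feature and its limitation noted in this section, the sketch below shows one simple measurement that can be derived from lip landmarks: how open the mouth is relative to its width. It takes already-extracted 2D landmark coordinates rather than calling MediaPipe. The landmark indices (13, 14, 61, 291) are the ones commonly used for the inner lips and mouth corners in FaceMesh, but treat them as an assumption to verify; the function names and thresholds are hypothetical.

```python
import math
from typing import Mapping, Tuple

Point = Tuple[float, float]

# Assumed FaceMesh indices: inner upper lip, inner lower lip, mouth corners.
UPPER_LIP, LOWER_LIP = 13, 14
LEFT_CORNER, RIGHT_CORNER = 61, 291


def mouth_open_ratio(landmarks: Mapping[int, Point]) -> float:
    """Vertical lip gap divided by mouth width.

    Dividing by the width keeps the value roughly stable when the child
    moves closer to or farther from the camera.
    """
    gap = math.dist(landmarks[UPPER_LIP], landmarks[LOWER_LIP])
    width = math.dist(landmarks[LEFT_CORNER], landmarks[RIGHT_CORNER])
    return gap / width if width > 0 else 0.0


def describe_mouth(ratio: float,
                   closed_below: float = 0.05,
                   wide_above: float = 0.35) -> str:
    """Coarse label for a ratio, using made-up thresholds."""
    if ratio < closed_below:
        return "closed"
    if ratio > wide_above:
        return "wide open"
    return "partly open"
```

A measurement like this can tell a closed mouth from an open one, but it cannot separate sounds that look the same on the lips, which is why lip analysis works best as a complement to audio assessment.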
Designing for children introduced additional constraints. Early versions included overly long sessions and excessive feedback, which reduced engagement. Through repeated observation and iteration, we shortened activities, simplified instructions, and emphasized encouragement over correction. These challenges reinforced the importance of testing, organization, and realistic design decisions when building AI tools for real users.

Accomplishments that we're proud of

The innovation lies in applying AI in a practical, human-centered way that connects children, parents, and doctors, making speech therapy more consistent and accessible across different environments. This is one of the greatest challenges faced by families of children with hearing impairments.

Our app transforms home speech practice by merging interactive AI with live feedback. Moving beyond passive video-watching, children actively engage by recording their voices or videos to receive clear, personalized guidance. This active loop is designed to sustain motivation and support faster progress than passive methods. Crucially, it also relieves parents of the pressure of correcting speech, empowering them to focus on encouragement and support.

The app converts practice data into clear visual charts, allowing doctors to track speech development trends over time, beyond isolated clinic visits. This longitudinal data enables clinicians to remotely assess progress, pinpoint persistent challenges, and adjust treatment plans, reducing the need for frequent in-person appointments.

By enabling effective remote monitoring, the app helps mitigate the widespread shortage of speech therapy specialists and the difficulty many families face in securing timely appointments. Doctors can oversee more patients efficiently while maintaining continuity of care, and families benefit from consistent professional guidance even when access to in-person therapy is limited.

What we learned

This project taught us how to apply AI to address real-world problems in the community, and deepened our understanding of how it can support long-term rehabilitation in practical, everyday contexts.

The project also strengthened our commitment to user-centered design. We learned how vital it is to consider the perspectives of children, parents, and clinicians at every stage to ensure the solution is meaningful, responsible, and accessible. For example, although AI-generated feedback reduces inconsistency, we found that presenting feedback in a clear, understandable, and engaging way, especially for children, was crucial to maintaining motivation and supporting learning.

Through building and testing the app, we discovered that AI is most effective as an assistive tool, not a replacement for professional therapy. The pronunciation assessment model delivers consistent feedback, but it does not diagnose or make independent decisions. Human therapists and professional guidance remain essential, with AI serving to support children, parents, and doctors.

Finally, the experience highlighted the value of teamwork. Research, coding, design, and testing required close collaboration. Dividing responsibilities and supporting one another helped us overcome setbacks and avoid burnout. Facing challenges together gave us not only confidence, but also a sense of purpose, showing us that collaboration is not just helpful, but essential for building meaningful technology.

What's next for HEARO

We plan to keep optimizing and iterating on the platform, with user experience remaining the core focus of our design. We have incorporated visual progress indicators that show progress intuitively without making children feel that they are "being tested." These visual elements are easy to grasp, making the application simple to understand and use for both children and clinicians. The fundamental design philosophy behind the application is to support users rather than pressure them. By employing encouraging animations, short practice sessions, real-time interactive features, and timely rewards, we have helped children with hearing impairments sustain their motivation and extend their attention spans, making home-based practice both more enjoyable and more effective.

Built With

Azure Speech Service: speech assessment APIs
Google Colab
MediaPipe Face Mesh: facial landmark and lip movement analysis
Python: used to write our backend
Replit: used to host

Updates

qing li started this project — May 11, 2026 08:31 PM EDT
