Learning to read is one of the most challenging milestones in a child's life, and many children in low- and middle-income countries do not get an equal opportunity to become literate. In countries like South Sudan, classrooms can have a student-teacher ratio of 75:1, so children rarely get the individual attention that reading instruction requires. Furthermore, many young girls do not get the opportunity to learn at all, limiting their freedoms later in life. It is projected that $21 trillion of future income will be lost to growing illiteracy; yet most adults in these countries have access to mobile phones. Introducing Echho.
What it does
Echho is an interactive AI reading tutor.
- **Story Generation**: It creates unique, personalized stories based on any topic the child loves (e.g., "Space Adventure", "Dinosaurs") at their specific reading level.
- **Live Listening**: The child reads the story aloud. Echho listens in real time.
- **Precision Analysis**: It analyzes every phoneme using advanced speech recognition to detect specific errors (e.g., missing the 's' in 'stars' or the 'th' in 'the').
- **Coach-Like Feedback**: Instead of generic scores, it provides specific, child-friendly advice (e.g., "Make a snake sound: ssss") and an encouraging summary.
- **Accessibility**: Everything—including the feedback and tips—can be read aloud by the AI tutor for children who aren't yet reading fluently.
How we built it
We built Echho using a modern web stack powered by three distinct AI layers:
- **Frontend**: Built with React and Tailwind CSS for a clean, accessible interface. We focused heavily on visual feedback, using "pill" style tips and simple icons.
- **Speech Analysis Core**: We utilized Azure Speech Services (Pronunciation Assessment) for the heavy lifting. This gives us granular data down to the syllable and phoneme level.
- **The Brain**: OpenAI (GPT-4o) acts as the pedagogue. It generates the stories and, crucially, transforms Azure's raw technical data into friendly, "teacher-voice" feedback. We developed a specific prompting strategy to prevent vague "AI fluff" and ensure actionable advice.
- **The Voice**: ElevenLabs provides the high-quality Text-to-Speech (TTS) voices (like "Rachel" or "Nicole") that make the tutor feel like a real person, not a robot.
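To illustrate the anti-fluff prompting strategy, here is a minimal sketch of how a feedback prompt could be assembled: the model receives only the concrete trouble sounds plus explicit rules that forbid generic praise. The function name, wording, and rules are our illustration, not Echho's actual prompt.

```typescript
// Build a "teacher-voice" feedback prompt from a list of trouble sounds.
// Illustrative sketch; the real prompt wording is an assumption.
function buildFeedbackPrompt(
  troubleSounds: { word: string; phoneme: string }[]
): string {
  const details = troubleSounds
    .map((t) => `- "${t.phoneme}" in "${t.word}"`)
    .join("\n");
  return [
    "You are a friendly reading tutor for a young child.",
    "The child struggled with these sounds:",
    details,
    "Rules:",
    "- Give one concrete mouth or tongue tip per sound (e.g. 'Make a snake sound: ssss').",
    "- Never use generic praise like 'Great job, keep practicing!'.",
    "- Use only words a six-year-old knows.",
  ].join("\n");
}
```

The key design choice is that the constraints are stated as hard rules with examples, rather than a vague "be specific" instruction.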
Challenges we ran into
- **The "Fluff" Problem**: Initially, our AI feedback was too generic ("Great job, keep practicing!"). It felt like a participation ribbon. We had to iterate heavily on our prompt engineering to force the model to identify specific trouble sounds and give concrete mechanical advice (e.g., "Open your mouth wider").
- **Data Granularity**: Connecting the dots between Azure's complex phoneme JSON output and a simple sentence a 6-year-old can understand was tricky. We had to write custom logic to filter "bad" phonemes and pass them effectively to the LLM.
- **Accessibility vs. UI**: Designing a UI that is readable for children who can't read well yet is a paradox. Adding audio buttons to every piece of text was a crucial pivot we made late in the process.
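The phoneme-filtering step described above could look roughly like this. The interfaces are a simplified shape modeled on Azure's Pronunciation Assessment output; the field names, threshold, and function are illustrative assumptions, not Echho's actual code.

```typescript
// Simplified shape of per-word pronunciation results (assumed, modeled
// loosely on Azure Pronunciation Assessment JSON).
interface PhonemeResult {
  phoneme: string;
  accuracyScore: number; // 0-100
}

interface WordResult {
  word: string;
  phonemes: PhonemeResult[];
}

// Keep only the phonemes the child struggled with, attached to the word
// they appeared in, so the LLM receives a short, concrete list instead
// of the full raw JSON.
function collectTroubleSounds(
  words: WordResult[],
  threshold = 60 // illustrative cutoff
): { word: string; phoneme: string; score: number }[] {
  return words.flatMap((w) =>
    w.phonemes
      .filter((p) => p.accuracyScore < threshold)
      .map((p) => ({ word: w.word, phoneme: p.phoneme, score: p.accuracyScore }))
  );
}
```

Reducing the payload this way also keeps the prompt small, which matters for latency in a real-time loop.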
Accomplishments that we're proud of
- **Specific Phonetic Feedback**: We're really proud that Echho doesn't just say "You mispronounced this." It says, "You had trouble with the 's' sound." That level of specificity is rare.
- **The Simplicity**: The app feels simple and playful, despite the heavy AI processing happening in the background.
- **Seamless Audio Integration**: The tutor reads the story, then the child reads, then the tutor reads the feedback—it creates a complete auditory loop.
What we learned
- **Prompt Engineering for Kids**: Writing prompts for a "child persona" requires specific constraints. You can't just say "be simple"; you have to forbid complex words and mandate playful examples.
- **Audio Latency**: Real-time audio processing requires careful handling of streams to avoid lag that breaks the user's flow.
- **Empathy in Code**: We learned that raw accuracy scores can be discouraging. Converting a "40% accuracy" into "You're getting better at your 'r' sounds!" is a design choice that matters.
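The "empathy layer" described above can be sketched as a tiny mapping from a raw score to an encouraging, specific message. The thresholds and wording here are illustrative assumptions, not Echho's actual copy.

```typescript
// Map a raw phoneme accuracy score (0-100) to child-friendly feedback
// instead of surfacing the number. Thresholds and phrasing are assumed.
function encourage(score: number, sound: string): string {
  if (score >= 80) return `Your '${sound}' sound is super clear!`;
  if (score >= 50) return `You're getting better at your '${sound}' sounds!`;
  return `Let's practice the '${sound}' sound together!`;
}
```

The point is that the child never sees "40%"; every branch is phrased as progress or a shared activity.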
What's next for Echho
- **Gamification**: Adding streaks, badges, and a "Reading Pet" that grows as you read more.
- **Classroom Mode**: A dashboard for teachers to see specific phonemic struggles across their entire class.
- **"Echo" Mode**: Allowing the child to hear their own recording played back immediately after the tutor's correct pronunciation, for immediate self-correction.
Built With
- chatgpt
- elevenlabs
- react
- vite