EchoLearn
The interesting friend who makes your English better without you noticing.
Inspiration
Most English exam prep tools solve the wrong problem.
Standard TOEFL and IELTS preparation creates what researchers call a cognitive double-burden: students are forced to decode the language and engage with unfamiliar academic topics at the same time. The result is lower engagement, higher anxiety, and wasted cognitive bandwidth — not because the student lacks ability, but because the content has nothing to do with their life.
Then there's the emotional problem. Tools like Duolingo are built on guilt mechanics — streaks, punishing owls, notifications designed to shame you back into the app. They treat motivation as a problem to be exploited rather than energy to be channelled.
Duolingo is a clingy ex. EchoLearn is the interesting friend who makes you better without you noticing.
We asked a different question: what if speaking practice started from something the student actually cared about? What if the "tutor" already knew what was happening in your world, and just wanted to talk about it?
That question became EchoLearn.
Southeast Asia is the fastest-growing English exam market in the world. In Vietnam alone, over 7 million students are navigating IELTS and TOEFL preparation — mostly through rote repetition and content that has nothing to do with their lives. EchoLearn is built for them.
What it does
EchoLearn is an immersive speaking practice app where students build a personalised AI Echo — a character with its own world and personality, secretly built from the student's own interests. Each practice session is a ride: the student enters the Echo's world, they have a natural conversation sparked by something real from their interest universe, and the Echo guides them toward better English without it ever feeling like a test.
The Echo mechanic
The student doesn't fill out a profile form. They build a character. They give their Echo a name, choose a world aesthetic (cyberpunk, fantasy, sci-fi, anime), and pick a personality archetype (the Analyst, the Provocateur, the Sage). Every creative choice is secretly capturing their preferences and content universe — covert interest profiling disguised as character creation.
The Echo then uses that profile to find content the student genuinely cares about, and initiates conversation from it.
The first bait
Immediately after the Echo is created — before the student even leaves the app — the Echo says: "I already found something for you." The first conversation starts right there. The student never gets a chance to forget the app exists.
The session: a taxi ride
Every session is modelled on a taxi ride. The student gets in (enters the Echo's world). The Echo has something for them (a piece of real content from their interest universe). They talk naturally. They arrive (session ends, no guilt, no score). The destination — the big one, across many rides — is fluency and exam readiness.
Implicit correction
The Echo never flags grammar errors. It naturally echoes back corrected versions of what the student said within its own replies. Correction feels like conversation, not marking. The student internalises better phrasing without ever experiencing the anxiety of being corrected.
The passport
Progress is not shown as a score. It's visualised as a passport with stamps — each milestone is a new destination reached, a new stamp earned. Weekly, the Echo delivers a Spotify Wrapped-style recap: personal, visual, emotional. "You spoke for 23 minutes this week. Your sentence complexity is growing." Progress as a journey, never as a report card.
How we built it
EchoLearn's architecture connects four systems into a single, seamless loop:
BrightData — the content layer
The Echo's "baits" — the real-world content that sparks each conversation — are powered by BrightData. In production, BrightData scrapes content sources mapped to the student's interest profile: esports news, gaming patch notes, K-pop releases, anime drops. Standard scrapers get blocked by JavaScript-heavy, geo-restricted, and paywalled sources; BrightData's proxy network handles all of them. The result is an Echo that always has something new, because the student's world is always moving.
For demo stability, the content feed is seeded with a real, recent news item. The pipeline architecture is production-ready.
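The production pipeline can be sketched in a few lines. This is a hedged illustration, not the actual EchoLearn source: `SOURCES` is an invented interest-to-source map with placeholder URLs, and the proxy host, port, and credentials stand in for a real Bright Data proxy zone.

```javascript
// Illustrative map from interest-profile tags to scrape targets.
// These URLs are placeholders, not real content sources.
const SOURCES = {
  esports: ["https://example.com/esports-news"],
  kpop: ["https://example.com/kpop-releases"],
};

// Flatten a student's interest profile into concrete scrape targets,
// silently skipping interests we have no source mapping for.
function sourcesForProfile(interests) {
  return interests.flatMap((interest) => SOURCES[interest] ?? []);
}

// Build the proxy URL that a proxy-capable HTTP client would use so
// JS-heavy, geo-restricted, or paywalled sources still resolve.
// Host/port are placeholders for a Bright Data proxy zone.
function proxyUrl({ user, pass, host = "brd.superproxy.io", port = 22225 }) {
  return `http://${user}:${pass}@${host}:${port}`;
}
```

In Node, the returned proxy URL would be handed to whichever HTTP client performs the scrape; the mapping function is what ties the covert interest profile to a live content feed.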
Echo profile → OpenAI system prompt
When a student builds their Echo, three fields are captured: name, world aesthetic, and personality archetype. The moment they hit "Summon", these are assembled into a GPT-4o-mini system prompt that governs every subsequent exchange. The personality archetype maps to a behavioural description (the Analyst is "cool, precise, speaks with calm authority"). The world aesthetic maps to a setting ("you inhabit a neon-lit city where information flows like electricity"). The implicit correction rule is baked in as a hard constraint: never flag errors — weave corrections naturally into your replies.
Three student choices. One complete pedagogical persona.
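The assembly step above can be sketched as a single function. This is a minimal illustration, not the shipped prompt: only the Analyst and cyberpunk descriptions come from this write-up, the Provocateur, Sage, and fantasy entries are invented placeholders, and `buildSystemPrompt` is our name for the step.

```javascript
// Archetype → behavioural description. Only "analyst" is quoted from
// the write-up; the other entries are illustrative placeholders.
const ARCHETYPES = {
  analyst: "cool, precise, speaks with calm authority",
  provocateur: "playful, challenging, loves a counter-argument",
  sage: "warm, patient, speaks in measured, reflective sentences",
};

// World aesthetic → setting line. Only "cyberpunk" is from the write-up.
const AESTHETICS = {
  cyberpunk: "you inhabit a neon-lit city where information flows like electricity",
  fantasy: "you inhabit an ancient forest kingdom humming with old magic",
};

// Three student choices become one system prompt, with the implicit
// correction rule appended last as a hard constraint.
function buildSystemPrompt({ name, aesthetic, archetype }) {
  return [
    `You are ${name}, an AI companion. Personality: ${ARCHETYPES[archetype]}.`,
    `Setting: ${AESTHETICS[aesthetic]}.`,
    "Hard rule: never flag the student's English errors explicitly.",
    "Instead, weave the corrected phrasing naturally into your own replies.",
  ].join(" ");
}
```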
ElevenLabs — the Echo's voice
The Echo doesn't send text replies. It speaks. ElevenLabs' eleven_turbo_v2 model generates the Echo voice in real time, with voice settings tuned for the character: stable, expressive, natural. The voice is what makes the Echo feel like someone, not something. It's also what makes the implicit correction work emotionally — hearing the correct form spoken naturally is meaningfully different from reading it.
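A sketch of the voice call, following the shape of ElevenLabs' public REST API (`POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header). The voice ID, key, and the specific `voice_settings` values are placeholders, and `buildTtsRequest`/`speak` are our names for the steps.

```javascript
const ELEVEN_URL = "https://api.elevenlabs.io/v1/text-to-speech";

// Build the request for ElevenLabs' text-to-speech endpoint.
// voice_settings values here are illustrative, not the tuned ones.
function buildTtsRequest(voiceId, apiKey, text) {
  return {
    url: `${ELEVEN_URL}/${voiceId}`,
    options: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({
        text,
        model_id: "eleven_turbo_v2",
        voice_settings: { stability: 0.5, similarity_boost: 0.75 },
      }),
    },
  };
}

// In the browser: fetch the audio stream and play it as the Echo's reply.
async function speak(voiceId, apiKey, text) {
  const { url, options } = buildTtsRequest(voiceId, apiKey, text);
  const res = await fetch(url, options);
  const blob = await res.blob();
  new Audio(URL.createObjectURL(blob)).play();
}
```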
Web Speech API — the student's voice
The student speaks through the browser's native speech recognition. No external service, no latency overhead. Interim transcripts appear live as the student talks, giving immediate feedback that they're being heard. The mic locks during the Echo's response — mirroring the natural rhythm of real conversation.
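The recognition wiring can be sketched like this. `splitTranscripts` is our name for a small pure helper that separates final from interim text, so interim words can render live while the student is still mid-sentence; the browser setup below it follows the standard `SpeechRecognition` interface.

```javascript
// Separate final from interim transcript fragments so the UI can show
// interim text live and only send final text to the Echo.
function splitTranscripts(results) {
  let finalText = "";
  let interimText = "";
  for (const r of results) {
    if (r.isFinal) finalText += r.transcript;
    else interimText += r.transcript;
  }
  return { finalText, interimText };
}

// Browser-only setup, guarded so the helper above also runs outside a browser.
if (typeof window !== "undefined") {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const rec = new Recognition();
  rec.lang = "en-US";
  rec.interimResults = true; // live transcript as the student talks

  rec.onresult = (event) => {
    const parts = [...event.results].map((res) => ({
      isFinal: res.isFinal,
      transcript: res[0].transcript,
    }));
    const { finalText, interimText } = splitTranscripts(parts);
    // render interimText live; hand finalText to the Echo when the turn ends
  };

  // "Mic lock": stop listening while the Echo speaks, resume after.
  const lockMic = () => rec.stop();
  const unlockMic = () => rec.start();
}
```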
The full loop: student speaks → Web Speech API transcribes → OpenAI generates response → ElevenLabs voices it → student hears Echo speak → conversation continues.
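The full loop reduces to one async turn. `echoTurn` and the three service stand-ins (`transcribe`, `generateReply`, `voiceReply`) are illustrative names; injecting the services keeps the turn logic independent of any one provider.

```javascript
// One conversation turn: student audio → text → LLM reply → spoken reply.
// services.transcribe stands in for Web Speech API, services.generateReply
// for the GPT-4o-mini call, services.voiceReply for ElevenLabs.
async function echoTurn(studentAudio, services, history) {
  const studentText = await services.transcribe(studentAudio);
  history.push({ role: "user", content: studentText });

  const replyText = await services.generateReply(history);
  history.push({ role: "assistant", content: replyText });

  await services.voiceReply(replyText); // the student hears the Echo speak
  return history;
}
```

Keeping the whole history in the message list is what lets the system prompt's implicit-correction rule apply to everything the student has said so far.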
Challenges we ran into
The cognitive double-burden is hard to solve without recreating it
Designing a speaking practice tool that doesn't feel like a test requires resisting every instinct to add structure. Every time we considered adding a score, a feedback panel, or a correction overlay, we had to ask: does this serve the student's fluency, or does it serve our anxiety as designers? The passport and implicit correction model emerged from that discipline — progress should be felt, not measured in the moment.
Keeping the Echo in character under implicit correction
The GPT system prompt needed to simultaneously maintain a distinct personality, sustain a natural conversation, and apply pedagogical correction — all without the student noticing the correction is happening. Getting the prompt architecture right required careful sequencing: persona first, then topic context, then the correction constraint as a hard rule. Early versions broke character whenever a correction was warranted. The final version handles it seamlessly.
UI vs. Echo aesthetic — a design architecture problem
The Echo's world changes per student — one student gets a cyberpunk cityscape, another gets a fantasy forest. But the app's UI chrome (buttons, typography, layout) needs to stay consistent. We had to design a clear separation: the UI is the stage, the Echo's world is the set design. The two layers are visually unified in the demo but architecturally independent — built to support any aesthetic without rebuilding the interface.
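One way to realise that stage/set-design split is to treat the world aesthetic as a token set applied through CSS custom properties, so UI components never change. A hedged sketch — the theme names echo the write-up, but the variable names and colour values are invented:

```javascript
// World aesthetic → CSS custom property values. The UI chrome reads its
// own fixed variables; only --world-* tokens swap per Echo.
const WORLD_THEMES = {
  cyberpunk: { "--world-bg": "#0b0b1f", "--world-accent": "#00e5ff" },
  fantasy: { "--world-bg": "#10210f", "--world-accent": "#d4af37" },
};

// Resolve a theme, falling back to a default for unknown aesthetics.
function themeVars(aesthetic) {
  return WORLD_THEMES[aesthetic] ?? WORLD_THEMES.cyberpunk;
}

// Browser-only application: set the tokens on the root element, leaving
// every interface component untouched.
if (typeof document !== "undefined") {
  for (const [name, value] of Object.entries(themeVars("cyberpunk"))) {
    document.documentElement.style.setProperty(name, value);
  }
}
```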
Accomplishments we're proud of
The demo loop works end-to-end in a single browser tab, no installation required. A judge can go from zero to a live conversation with a speaking AI Echo in under two minutes.
The implicit correction mechanic is invisible in the best possible way. In testing, users did not notice they were being corrected — they just found themselves speaking more carefully as the conversation progressed. That's the whole point.
The character creation flow produces genuine emotional investment. Users name their Echo, choose their world, and spend real time on the personality choice. They're not filling out a form — they're making a character. The app becomes theirs before the first conversation starts.
What we learned
Speaking is the hardest English skill to practice alone — and the one most neglected by existing tools — precisely because it requires another person. The AI voice layer isn't a feature, it's the entire product premise. Without ElevenLabs, EchoLearn is a chatbot. With it, it's a conversation.
We also learned that the most powerful pedagogical choices are the ones the student never consciously experiences. Implicit correction, content relevance, low-pressure session endings — none of these are features the student can point to. They just feel the difference.
What's next for EchoLearn
Live BrightData integration — moving from seeded content to a real-time scraping pipeline, triggering on recency signals from sources mapped to each student's interest profile.
Whisper for speech input — replacing Web Speech API with OpenAI Whisper for significantly better accuracy on Vietnamese-accented English. This matters enormously for our primary user base.
The full passport — expanding the milestone system with a visual passport that fills across sessions, a weekly Wrapped-style recap delivered in the Echo's voice, and an Echo whose world visually expands as fluency grows.
Multi-interest profiles — the current demo maps to a single interest domain (esports). The full product allows the Echo to range across multiple interest universes, so a student who loves both gaming and K-pop has an Echo that moves between both worlds.
IELTS/TOEFL speaking task formats — embedding specific task types (describe a graph, argue a position, tell a personal story) into the conversation arc naturally, so students practice exam formats without ever seeing the exam framing.
Built with
ElevenLabs OpenAI BrightData Web Speech API HTML CSS JavaScript