Inspiration
- Most speaking tools are post-session; we wanted corrective feedback during live speech.
- The core target was low-latency phrase-level coaching, not offline transcript summaries.
- We prioritized signals users can apply immediately (pace, fillers, confidence, drill prompts).
What it does
- Streams browser mic audio (16 kHz mono PCM) to backend for near-real-time ASR.
- On phrase boundaries (Deepgram final or ~800 ms silence), computes WPM, filler count, and low-confidence words.
- Returns short LLM coaching + drill text and AWS Polly TTS, with barge-in (tts_stop) when the user resumes speaking.
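
The per-phrase metrics described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Word` shape, the `FILLERS` set, and the `LOW_CONFIDENCE` threshold are assumptions chosen for the example, on the premise that the ASR result exposes per-word text, timestamps, and a confidence score.

```python
from dataclasses import dataclass

# Assumed values for illustration; the project's real filler list and
# confidence threshold are not stated in this write-up.
FILLERS = {"um", "uh", "er", "ah", "hmm"}
LOW_CONFIDENCE = 0.6


@dataclass
class Word:
    text: str
    start: float       # seconds from stream start
    end: float         # seconds from stream start
    confidence: float  # 0.0 to 1.0


def phrase_metrics(words: list[Word]) -> dict:
    """Compute WPM, filler count, and low-confidence words for one phrase."""
    if not words:
        return {"wpm": 0.0, "filler_count": 0, "low_confidence": []}
    duration = words[-1].end - words[0].start
    # Guard against zero-length phrases (e.g. a single word with equal timestamps).
    wpm = len(words) / duration * 60.0 if duration > 0 else 0.0
    fillers = sum(1 for w in words if w.text.lower().strip(".,!?") in FILLERS)
    low = [w.text for w in words if w.confidence < LOW_CONFIDENCE]
    return {"wpm": round(wpm, 1), "filler_count": fillers, "low_confidence": low}
```

Calling `phrase_metrics` once per phrase boundary keeps the work proportional to phrase length, which fits the phrase-level trigger described above.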
How we built it
- Monorepo architecture: apps/web (Next.js + TypeScript) and apps/server (Flask-SocketIO with eventlet).
- Real-time pipeline orchestration on backend: Deepgram streaming ASR -> phrase metrics -> OpenAI tip/drill generation -> ElevenLabs TTS.
- Separate model/ package wraps CommonAccent XLSR; filters to 6 English accents and re-normalizes probabilities for downstream use.
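
The filter-and-re-normalize step for accent probabilities might look like the sketch below. The function name and the accent labels in the usage example are hypothetical; the write-up does not list the six accents that are kept or how the package names this step.

```python
def filter_and_renormalize(probs: dict[str, float], keep: set[str]) -> dict[str, float]:
    """Drop labels outside `keep` and rescale the rest so they sum to 1."""
    kept = {label: p for label, p in probs.items() if label in keep}
    total = sum(kept.values())
    if total <= 0:
        # No probability mass on the kept labels; rescaling is undefined.
        raise ValueError("no probability mass on the kept labels")
    return {label: p / total for label, p in kept.items()}


# Hypothetical labels, for illustration only.
raw = {"us": 0.4, "england": 0.2, "german": 0.4}
english_only = filter_and_renormalize(raw, keep={"us", "england"})
```

Dividing by the retained mass preserves the relative ordering of the kept accents while giving downstream code a proper distribution.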
Challenges we ran into
- Keeping end-to-end latency (ASR + LLM + TTS) low enough for uninterrupted practice.
- Stabilizing phrase segmentation under variable speaking cadence without premature or delayed triggers.
- Tuning debounce/rate limits to avoid over-calling LLMs while still feeling responsive.
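
One simple way to avoid over-calling the LLM is a minimum-interval gate, sketched below. This is an assumed approach for illustration, not necessarily what the project does; the class name `MinIntervalGate` and the interval value are hypothetical.

```python
import time


class MinIntervalGate:
    """Allow an action at most once per `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last = None    # time of the last allowed call, or None if never called

    def allow(self) -> bool:
        now = self._clock()
        if self._last is not None and now - self._last < self.min_interval_s:
            return False  # too soon since the last allowed call; skip this phrase
        self._last = now
        return True


# Usage: only request a coaching tip if the gate opens.
gate = MinIntervalGate(min_interval_s=2.0)
if gate.allow():
    pass  # call the LLM here
```

A shorter interval feels more responsive but costs more calls; the right value would need tuning against measured latency.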
Accomplishments that we're proud of
- Delivered a working end-to-end real-time coaching loop in a web app (capture -> ASR -> feedback -> TTS).
- Implemented barge-in behavior so users can interrupt playback and continue speaking naturally.
- Added an accent-classifier module with a clean API (classify, get_embedding) for future personalization.
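
The barge-in behavior can be pictured as a small piece of state that tracks whether TTS is playing and emits tts_stop when new speech arrives. The sketch below is a simplified assumption: the `BargeInController` class and its method names are hypothetical, and `emit` stands in for whatever transport sends events to the client.

```python
from typing import Callable


class BargeInController:
    """Track TTS playback and signal the client to stop when the user speaks."""

    def __init__(self, emit: Callable[[str], None]):
        self._emit = emit          # hypothetical callback that sends an event by name
        self.tts_playing = False

    def on_tts_start(self) -> None:
        self.tts_playing = True

    def on_tts_end(self) -> None:
        self.tts_playing = False

    def on_speech_detected(self) -> bool:
        """Return True if playback was interrupted by this speech event."""
        if self.tts_playing:
            self._emit("tts_stop")  # event name taken from the description above
            self.tts_playing = False
            return True
        return False


# Usage with a list standing in for the socket.
sent: list[str] = []
ctrl = BargeInController(emit=sent.append)
ctrl.on_tts_start()
ctrl.on_speech_detected()  # appends "tts_stop" to `sent`
```

Resetting `tts_playing` immediately after emitting keeps repeated speech events from sending duplicate stop signals.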
What we learned
- Real-time UX quality is dominated by orchestration and timing, not any single model component.
- Phrase-level processing is a practical latency/quality trade-off for actionable feedback.
What's next for Eloception
- Add persistent user/session history to track trends in pace, fillers, and confidence over time.
- Expand personalization using target-accent goals and longitudinal performance signals.
- Productionize the stack with auth, better observability, and systematic latency/quality benchmarks.