Inspiration

  • Most speaking tools are post-session; we wanted corrective feedback during live speech.
  • The core target was low-latency phrase-level coaching, not offline transcript summaries.
  • We prioritized signals users can apply immediately (pace, fillers, confidence, drill prompts).

What it does

  • Streams browser mic audio (16 kHz mono PCM) to the backend for near-real-time ASR.
  • On phrase boundaries (Deepgram final or ~800 ms silence), computes WPM, filler count, and low-confidence words.
  • Returns a short LLM coaching tip and drill text, plus AWS Polly TTS audio with barge-in: playback is stopped (tts_stop) when the user resumes speaking.
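
The per-phrase metrics above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Word type, the FILLERS set, and the LOW_CONF threshold of 0.6 are assumptions made for the example, and word timings are assumed to arrive in seconds from the ASR result.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the real filler list and
# confidence threshold are not given in this writeup.
FILLERS = {"um", "uh", "er", "hmm"}
LOW_CONF = 0.6


@dataclass
class Word:
    text: str
    start: float       # seconds from stream start
    end: float         # seconds from stream start
    confidence: float  # ASR confidence in [0, 1]


def phrase_metrics(words):
    """Compute WPM, filler count, and low-confidence words for one phrase."""
    if not words:
        return {"wpm": 0.0, "filler_count": 0, "low_confidence": []}
    duration = words[-1].end - words[0].start
    # Words per minute over the spoken span of the phrase.
    wpm = len(words) / duration * 60.0 if duration > 0 else 0.0
    # Normalize case and trailing punctuation before matching fillers.
    normalized = [w.text.lower().strip(".,!?") for w in words]
    filler_count = sum(1 for t in normalized if t in FILLERS)
    low_confidence = [w.text for w in words if w.confidence < LOW_CONF]
    return {
        "wpm": wpm,
        "filler_count": filler_count,
        "low_confidence": low_confidence,
    }
```

For example, a three-word phrase spanning 2 seconds yields 90 WPM; any word whose confidence falls below the assumed threshold is listed so the coaching prompt can target it.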

How we built it

  • Monorepo architecture: apps/web (Next.js + TypeScript) and apps/server (Flask-SocketIO with eventlet).
  • Real-time pipeline orchestration on the backend: Deepgram streaming ASR -> phrase metrics -> OpenAI tip/drill generation -> ElevenLabs TTS.
  • A separate model/ package wraps CommonAccent XLSR, filters its output to 6 English accents, and re-normalizes the probabilities for downstream use.
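
The filter-and-renormalize step for accent probabilities might look like the sketch below. The function name filter_and_renormalize and the label strings in the usage note are hypothetical; the writeup does not list which 6 accents are kept or how the model/ package exposes them.

```python
def filter_and_renormalize(probs, keep):
    """Keep only the labels in `keep` and rescale so they sum to 1.

    probs: dict mapping accent label -> probability from the classifier.
    keep:  set of labels to retain (hypothetical; the real list is not
           specified in this writeup).
    """
    kept = {label: p for label, p in probs.items() if label in keep}
    total = sum(kept.values())
    if total <= 0:
        # Nothing usable survived the filter; let the caller decide.
        raise ValueError("no probability mass on the kept labels")
    return {label: p / total for label, p in kept.items()}
```

With placeholder labels, filter_and_renormalize({"a": 0.5, "b": 0.25, "c": 0.25}, {"a", "b"}) gives "a" two thirds and "b" one third of the mass, so downstream code always sees a distribution over the retained accents only.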

Challenges we ran into

  • Keeping end-to-end latency (ASR + LLM + TTS) low enough for uninterrupted practice.
  • Stabilizing phrase segmentation under variable speaking cadence without premature or delayed triggers.
  • Tuning debounce/rate limits to avoid over-calling LLMs while still feeling responsive.
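
One way to frame the segmentation and rate-limiting trade-offs above is a small gate object, sketched below. The PhraseGate class and the 3-second minimum interval between LLM calls are assumptions for illustration; only the ~800 ms silence threshold comes from this writeup. Timestamps are passed in explicitly so the logic is easy to test.

```python
class PhraseGate:
    """Decides when a phrase has ended and whether an LLM call is allowed."""

    def __init__(self, silence_s=0.8, min_llm_interval_s=3.0):
        self.silence_s = silence_s                    # ~800 ms, from the writeup
        self.min_llm_interval_s = min_llm_interval_s  # assumed value
        self._last_speech = None  # time of the most recent speech activity
        self._last_llm = None     # time of the most recent allowed LLM call
        self._open = False        # True while a phrase is in progress

    def on_speech(self, now):
        """Record speech activity (e.g. an interim ASR result)."""
        self._last_speech = now
        self._open = True

    def phrase_ended(self, now, is_final=False):
        """Return True once per phrase, on an ASR final or enough silence."""
        if not self._open:
            return False
        silent = now - self._last_speech >= self.silence_s
        if is_final or silent:
            self._open = False  # avoid re-triggering on the same phrase
            return True
        return False

    def allow_llm(self, now):
        """Rate-limit LLM calls to at most one per min_llm_interval_s."""
        if (self._last_llm is not None
                and now - self._last_llm < self.min_llm_interval_s):
            return False
        self._last_llm = now
        return True
```

A shorter silence threshold triggers feedback sooner but risks cutting phrases mid-thought; a longer LLM interval saves calls but makes coaching feel less responsive.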

Accomplishments that we're proud of

  • Delivered a working end-to-end real-time coaching loop in a web app (capture -> ASR -> feedback -> TTS).
  • Implemented barge-in behavior so users can interrupt playback and continue speaking naturally.
  • Added an accent-classifier module with a clean API (classify, get_embedding) for future personalization.

What we learned

  • Real-time UX quality is dominated by orchestration and timing, not any single model component.
  • Phrase-level processing is a practical latency/quality trade-off for actionable feedback.

What's next for Eloception

  • Add persistent user/session history to track trends in pace, fillers, and confidence over time.
  • Expand personalization using target-accent goals and longitudinal performance signals.
  • Productionize the stack with auth, better observability, and systematic latency/quality benchmarks.
