Inspiration

This project started with a simple argument between two cousins who love to box together. After every sparring session, we'd argue about who landed more clean shots — and there was no good way to settle it. No judge, no camera, no scorecard could give us an objective answer. We wanted a system that could feel every punch as it happened, count it accurately, and even talk back to us in real time, like having a coach in the corner who actually knows what's going on.

That simple frustration grew into AccuBox: a wearable impact sensor that detects, scores, and narrates every punch live in a real sportscaster voice, with an AI coach you can talk to in real time, mid-workout.

What it does

AccuBox transforms boxing training into a fully voice-first experience. You punch — and a sportscaster announces every hit. You ask "How am I doing?" — and an AI coach answers based on your live training stats, in a natural human voice.

Under the hood: a custom impact sensor mounted on a boxing target generates an electrical signal on every hit. The signal flows through a custom analog frontend into an ESP32, which streams the data wirelessly via ESP-NOW to a base station connected to your laptop. A Python application runs the AI layer:
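On the laptop side, the Python app just has to slice the forwarded byte stream into fixed-size frames and decode them. A minimal sketch, assuming a hypothetical 6-byte frame of a little-endian millisecond timestamp plus one ADC reading (the real firmware's packet layout may differ):

```python
import struct

# Hypothetical frame layout: uint32 timestamp (ms) + uint16 ADC value.
# This is an illustrative format, not the actual AccuBox wire protocol.
FRAME_FMT = "<IH"
FRAME_SIZE = struct.calcsize(FRAME_FMT)  # 6 bytes

def parse_frame(frame: bytes) -> tuple[int, int]:
    """Decode one sensor frame into (timestamp_ms, adc_value)."""
    if len(frame) != FRAME_SIZE:
        raise ValueError(f"expected {FRAME_SIZE} bytes, got {len(frame)}")
    return struct.unpack(FRAME_FMT, frame)
```

The base station relays ESP-NOW packets over USB serial, so the Python side never touches the radio layer; it only sees a stream of these frames.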

🎙️ Real-time voice announcer powered by ElevenLabs Flash v2.5 — calls out every punch with sub-300ms latency in a deep sportscaster voice. Routine count callouts ("One!", "Two!", "Three!") give way to dynamic reactions: "Combo!" when you string punches together, "Big one!" on a heavy hit, and milestone callouts at 5, 10, 25, and 50 punches.

🎤 Conversational AI coach — push to talk, ask anything. OpenAI Whisper transcribes your voice, GPT-4o-mini reasons over your live training stats (punch count, recent rate, average power, time since last punch), and ElevenLabs speaks the coach's response back to you. The full loop — voice in → STT → LLM → TTS → voice out — completes in 1–2 seconds.

🥊 Live punch detection with a multi-condition algorithm that filters environmental noise from real impacts.

📊 Real-time visualization with peak markers, a running count, and dialogue bubbles for every coach interaction.

The killer feature is that the coach actually sees your data. Ask "How am I doing?" and instead of a generic "Keep going!", it says "23 punches in 45 seconds — solid pace! Push for 30 in a minute!" — grounded in your real session, not platitudes.
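The punch-detection idea above can be sketched as a small stateful filter. This is an illustrative toy, with placeholder thresholds rather than our tuned values: a sample counts as a punch only if it clears an amplitude threshold, rises sharply from the previous sample (rejecting slow vibration and drift), and falls outside a refractory window after the last hit (rejecting sensor ringing):

```python
from dataclasses import dataclass, field

@dataclass
class PunchDetector:
    """Toy multi-condition punch detector. All thresholds are
    illustrative placeholders, not the tuned production values."""
    threshold: float = 0.5       # minimum normalized amplitude
    min_rise: float = 0.2        # minimum jump vs. previous sample
    refractory_ms: int = 150     # dead time after a detected hit
    _last_hit_ms: int = field(default=-10**9, init=False)
    _prev: float = field(default=0.0, init=False)

    def update(self, t_ms: int, sample: float) -> bool:
        """Feed one (timestamp, amplitude) sample; True if it's a punch."""
        is_hit = (
            sample >= self.threshold
            and (sample - self._prev) >= self.min_rise
            and (t_ms - self._last_hit_ms) >= self.refractory_ms
        )
        self._prev = sample
        if is_hit:
            self._last_hit_ms = t_ms
        return is_hit
```

The refractory window is what keeps one physical impact from being counted twice as the target oscillates.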

How we built it

The AI voice layer (the heart of the project). We chose ElevenLabs Flash v2.5 because sub-300ms latency was non-negotiable — anything slower and the live commentary illusion breaks. Voice quality was the other deciding factor: hearing "Combo!" in a deep, sportscaster-quality voice the moment you land three quick punches sells the project in a way no graph could. We picked the Brian voice (nPczCjzI2devNBz1zQrb) for that classic boxing-announcer vibe.
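The announcer's line selection boils down to a small decision function that also assigns the priority used by the speech queue. A sketch of that logic, with illustrative phrases and thresholds (not our exact tuning):

```python
MILESTONES = {5, 10, 25, 50}

def pick_callout(count: int, power: float, secs_since_prev: float) -> tuple[int, str]:
    """Choose (priority, phrase) for a landed punch.
    Lower priority number = more important. Thresholds are placeholders."""
    if count in MILESTONES:
        return 0, f"{count} punches!"   # milestone: must play
    if power > 0.8:
        return 1, "Big one!"            # heavy hit: cuts in line
    if secs_since_prev < 0.6:
        return 1, "Combo!"              # rapid follow-up punch
    return 2, str(count)                # routine count callout
```

Only the resulting phrase is sent to the TTS, so the whole decision adds effectively zero latency.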

We built a priority queue around the TTS so different speech sources don't collide:

  • Priority 0 (must play): coach replies to user questions, milestones
  • Priority 1 (cuts in line): big hits, combos
  • Priority 2 (background): routine count callouts

Pressing space to ask the coach a question instantly mutes whatever lower-priority callout was running, plays the answer, and resumes normal callouts after.

We also built a disk-based cache keyed by (voice_id + text) hash. After the first session, repeated phrases like number callouts and stock reactions never hit the API again — making the system effectively free to run during heavy testing.
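The cache itself is simple; a sketch of the idea, where `synth_fn` stands in for the real ElevenLabs call (the hash function and file naming here are illustrative, not necessarily what we shipped):

```python
import hashlib
from pathlib import Path

class TTSCache:
    """Disk cache for synthesized audio, keyed by a hash of (voice_id, text).
    synth_fn is a placeholder for the actual TTS API call."""
    def __init__(self, cache_dir: str):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, voice_id: str, text: str) -> Path:
        key = hashlib.sha256(f"{voice_id}:{text}".encode()).hexdigest()
        return self.dir / f"{key}.mp3"

    def get_audio(self, voice_id: str, text: str, synth_fn) -> bytes:
        path = self._path(voice_id, text)
        if path.exists():                 # cache hit: no API call
            return path.read_bytes()
        audio = synth_fn(voice_id, text)  # cache miss: synthesize once
        path.write_bytes(audio)
        return audio
```

Since routine callouts are drawn from a small fixed phrase set, the hit rate approaches 100% after the first session.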

The conversational coach. This is where it gets fun. Push-to-talk recording captures your voice, Whisper transcribes it, and we send the transcript to GPT-4o-mini alongside a live snapshot of your training stats.
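That snapshot is essentially a small dict rendered into the system prompt so the model can only answer in terms of the real session. A hedged sketch of the prompt assembly (field names are illustrative, not our exact schema):

```python
def build_coach_prompt(stats: dict, question: str) -> list[dict]:
    """Assemble chat messages for the coach LLM call.
    stats mirrors the live session state; names here are illustrative."""
    system = (
        "You are a boxing coach. Ground every answer in the stats below; "
        "be concrete and brief.\n"
        f"Punch count: {stats['punch_count']}\n"
        f"Punches in last 60s: {stats['recent_rate']}\n"
        f"Average power (0-1): {stats['avg_power']:.2f}\n"
        f"Seconds since last punch: {stats['secs_since_last']:.1f}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned message list goes straight into the chat-completion call, and the reply text is handed to the priority-0 lane of the TTS queue.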

Built With

  • adc
  • c++
  • conversational-ai
  • elevenlabs
  • esp-now
  • esp32
  • gpt-4
  • numpy
  • op-amp
  • openai
  • python
  • real-time
  • sensor
  • sounddevice
  • voice-ai
  • whisper