Elevator Pitch

Physio is a voice-first rehab companion using CV MediaPipe, Raspberry Pi sensor fusion, and AI summaries for guided at-home physical therapy.


Inspiration

Physical therapy is powerful, but the hardest part often happens outside the clinic: repeating exercises consistently, moving through the correct range of motion, holding positions long enough, and understanding whether each movement was actually controlled.

We built Physio to make at-home rehab practice more guided, measurable, and accessible. The goal is not to replace a licensed physical therapist. Physio is a companion layer that helps users follow prescribed exercises, understand movement quality, hear their results, and save structured session history they can review later.

Our core idea was to combine MediaPipe computer vision, Raspberry Pi sensor telemetry, local deterministic movement analysis, Gemini post-session reasoning, and ElevenLabs voice feedback into one physical therapy intelligence pipeline.


What it does

Physio guides users through rehab-style exercises and turns each session into understandable feedback.

The app supports:

  • Elbow Flexion / Extension, using CV MediaPipe to track shoulder, elbow, and wrist landmarks and measure range of motion.
  • Seated Forward Press / Chest Press, using CV MediaPipe plus Raspberry Pi distance sensing to measure push depth, steadiness, and arm extension.
  • Real-time movement analysis, including rep count, exercise phase, hold time, jitter, smoothness, and movement quality.
  • Post-session AI summaries, where Gemini turns structured session metrics into a readable recovery recap.
  • Voice-first feedback, where ElevenLabs reads the session summary aloud like a rehab coach speaking directly to the user.
  • Patient check-ins, where users can report fatigue, tightness, discomfort, or pain.
  • Session history and progress tracking, including rep breakdowns, replay graphs, and therapist-style notes.

The live exercise loop is local and deterministic. Gemini does not guess whether the user performed a rep correctly. The local analyzer owns the movement logic; Gemini explains the results afterward.


How we built it

Physio is a full-stack multimodal rehabilitation system.

The frontend uses React + Vite. The live tracking layer uses CV MediaPipe Tasks Vision to extract shoulder, elbow, wrist, hip, and hand landmarks from the webcam stream. From those landmarks, Physio computes joint angles, landmark confidence, movement phases, and rep metrics.
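As a rough illustration of the joint-angle step, the sketch below computes the angle at the elbow from three 2D landmark positions using the dot product. It is written in Python for brevity and is not Physio's actual frontend code; the function name joint_angle and the (x, y) tuple format are assumptions made for this example.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at vertex b, formed by points a-b-c (each an (x, y) pair)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None  # degenerate case: two landmarks coincide
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_t))

# Hypothetical normalized landmark positions for shoulder, elbow, wrist.
shoulder, elbow, wrist = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
print(joint_angle(shoulder, elbow, wrist))
```

With this convention a fully straight arm reads close to 180 degrees and a deep bend reads as a small angle.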

For the forward press exercise, we added a Raspberry Pi distance sensor. The Pi streams distance values through a WebSocket every 0.05 seconds (20 Hz). This gives Physio a smoother hardware signal for push depth and jitter sensing, while CV MediaPipe handles body geometry and joint alignment. CV MediaPipe is strong, but webcam landmarks are not perfect; lighting, occlusion, and frame rate can create noise. The sensor gives a second modality for linear motion.
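The write-up does not describe the wire format of the telemetry, so the following is only a sketch of how a receiver might parse and sanity-check such a stream. The JSON field names t and distance_cm, the unit, and the helper names are all assumptions for illustration; only the 0.05 second period comes from the text.

```python
import json
from dataclasses import dataclass

SAMPLE_PERIOD_S = 0.05  # from the text: one reading every 0.05 s

@dataclass
class DistanceSample:
    t: float            # seconds on the sender's clock (assumed field)
    distance_cm: float  # assumed field name and unit

def parse_sample(message):
    """Return a DistanceSample, or None if the message is malformed."""
    try:
        data = json.loads(message)
        return DistanceSample(float(data["t"]), float(data["distance_cm"]))
    except (ValueError, KeyError, TypeError):
        return None

def find_gaps(samples, tolerance=1.5):
    """Return (start, end) time pairs where spacing exceeds tolerance * period."""
    gaps = []
    for prev, cur in zip(samples, samples[1:]):
        if cur.t - prev.t > tolerance * SAMPLE_PERIOD_S:
            gaps.append((prev.t, cur.t))
    return gaps
```

Flagging gaps like this would let an analyzer avoid scoring steadiness across a stretch where the sensor dropped out.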

The local analysis engine is the core of Physio. It uses:

  • CV MediaPipe landmark extraction
  • raw and smoothed angle traces
  • Raspberry Pi WebSocket telemetry
  • rolling median smoothing
  • exponential moving average filtering
  • hysteresis thresholds
  • confidence gating
  • phase-based state machines
  • grouped jitter detection
  • rep-level scoring
  • session recording
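To make a few of these pieces concrete, here is a minimal Python sketch that chains confidence gating, a rolling median, and an exponential moving average on one angle trace. The class name, window size, alpha, and confidence cutoff are illustrative assumptions, not Physio's tuned values.

```python
from collections import deque
from statistics import median

class AngleSmoother:
    """Confidence gate -> rolling median -> exponential moving average."""

    def __init__(self, window=5, alpha=0.3, min_confidence=0.5):
        self.window = deque(maxlen=window)  # recent accepted raw angles
        self.alpha = alpha                  # EMA weight on the newest median
        self.min_confidence = min_confidence
        self.value = None                   # last smoothed angle

    def update(self, angle, confidence):
        if confidence < self.min_confidence:
            return self.value  # gate: ignore the frame, hold the last value
        self.window.append(angle)
        med = median(self.window)  # suppresses isolated spikes
        if self.value is None:
            self.value = med
        else:
            self.value = self.alpha * med + (1 - self.alpha) * self.value
        return self.value
```

The median removes single-frame outliers before the moving average sees them, so one bad landmark does not drag the smoothed trace.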

For elbow flexion, a rep must progress through:

start straight -> flexing -> target bend -> hold -> extending -> rep complete

For forward press, a rep must progress through:

start bent -> pushing -> target extension -> hold -> controlled return -> rep complete

This makes Physio more than a threshold checker. It validates the temporal structure of the exercise.
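The elbow flexion sequence above can be sketched as a small state machine with hysteresis and a minimum hold time. This is a simplified Python illustration, not Physio's analyzer: the class name, the angle thresholds, and the one-second hold are assumed values, and the elbow angle is taken to be near 180 degrees when straight.

```python
class ElbowFlexionCounter:
    # Assumed thresholds in degrees; enter/exit pairs give hysteresis.
    STRAIGHT_ENTER = 155.0
    STRAIGHT_EXIT = 145.0
    TARGET_ENTER = 60.0
    TARGET_EXIT = 70.0
    MIN_HOLD_S = 1.0

    def __init__(self):
        self.phase = "start_straight"
        self.reps = 0
        self._target_since = None

    def update(self, t, angle):
        """Feed one (time, smoothed angle) sample; returns the current phase."""
        if self.phase == "start_straight":
            if angle < self.STRAIGHT_EXIT:
                self.phase = "flexing"
        elif self.phase == "flexing":
            if angle <= self.TARGET_ENTER:
                self.phase = "target_bend"
                self._target_since = t
            elif angle >= self.STRAIGHT_ENTER:
                self.phase = "start_straight"  # attempt abandoned, no rep
        elif self.phase in ("target_bend", "hold"):
            if angle > self.TARGET_EXIT:
                # Leaving the target only leads to a rep if the hold was met.
                self.phase = "extending" if self.phase == "hold" else "flexing"
            elif t - self._target_since >= self.MIN_HOLD_S:
                self.phase = "hold"  # minimum hold time satisfied
        elif self.phase == "extending":
            if angle >= self.STRAIGHT_ENTER:
                self.reps += 1  # rep complete
                self.phase = "start_straight"
        return self.phase
```

In this sketch a rushed rep that touches the target and immediately extends falls back to the flexing phase and is never counted.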

The backend uses FastAPI and SQLite to store live packets, recordings, rep breakdowns, session results, patient feedback, Gemini outputs, and progress history. We use Gemini through Vertex AI for post-session analysis and ElevenLabs for text-to-speech and voice check-ins.
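For a sense of what the storage layer could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and the save_session helper are hypothetical; the write-up does not describe the real schema, and the FastAPI routes are left out.

```python
import sqlite3

# Hypothetical two-table schema: one row per session, one row per rep.
SCHEMA = """
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    exercise TEXT NOT NULL,
    started_at TEXT NOT NULL
);
CREATE TABLE reps (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    rep_index INTEGER NOT NULL,
    peak_angle_deg REAL,
    hold_s REAL,
    jitter_events INTEGER
);
"""

def save_session(conn, exercise, started_at, reps):
    """Insert a session and its rep breakdown; returns the new session id."""
    cur = conn.execute(
        "INSERT INTO sessions (exercise, started_at) VALUES (?, ?)",
        (exercise, started_at),
    )
    session_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO reps (session_id, rep_index, peak_angle_deg, hold_s, jitter_events) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (session_id, i, r["peak_angle_deg"], r["hold_s"], r["jitter_events"])
            for i, r in enumerate(reps, start=1)
        ],
    )
    conn.commit()
    return session_id

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript(SCHEMA)
```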


Challenges we ran into

The biggest challenge was making noisy real-time movement data usable.

CV MediaPipe landmarks are powerful, but they are not perfectly stable. A smooth curl can still create small angle spikes, and a shaky movement can sometimes look deceptively smooth after filtering. We had to tune smoothing and jitter detection so the app would not punish normal movement while still catching real instability.
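One way to separate normal movement from real instability is to count direction reversals and only flag clusters of them. The Python sketch below illustrates that idea under assumed parameters (minimum step size, grouping gap, minimum reversals per group); it is not the detector Physio ships.

```python
def jitter_events(samples, min_step=2.0, group_gap_s=0.3, min_reversals=2):
    """samples: list of (t, value). Returns (start_t, end_t) for each jitter group."""
    reversals = []
    prev_dir = 0
    for (_, v0), (t1, v1) in zip(samples, samples[1:]):
        step = v1 - v0
        if abs(step) < min_step:
            continue  # too small to count as a change of direction
        direction = 1 if step > 0 else -1
        if prev_dir and direction != prev_dir:
            reversals.append(t1)
        prev_dir = direction

    # Group reversals that occur close together in time.
    groups = []  # each entry: [start_t, end_t, reversal_count]
    for t in reversals:
        if groups and t - groups[-1][1] <= group_gap_s:
            groups[-1][1] = t
            groups[-1][2] += 1
        else:
            groups.append([t, t, 1])

    # A single reversal is a normal turnaround (top of a curl), not jitter.
    return [(g[0], g[1]) for g in groups if g[2] >= min_reversals]
```

Because a lone reversal is ignored, the natural turnaround at the top of a rep is not penalized, while rapid back-and-forth motion is reported as one grouped event.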

Rep counting was another hard problem. A rehab rep is not just “the angle changed.” The movement has to start correctly, enter the target position, hold, return, and remain trackable. We built deterministic state machines so Physio only counts complete reps.

Sensor fusion was also difficult. The Raspberry Pi sensor streams distance data separately from the CV MediaPipe webcam stream. We had to define clear roles: CV MediaPipe owns biomechanics and joint alignment, the Raspberry Pi sensor owns linear displacement and steadiness, and the local analyzer fuses both into rep metrics.
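Because the two streams arrive separately, fusing them requires pairing samples in time. The sketch below shows one simple approach, nearest-timestamp matching with a maximum allowed skew. It assumes both streams are stamped on a shared clock, which the write-up does not state, and the function and field names are hypothetical.

```python
from bisect import bisect_left

def nearest_sample(sensor_times, sensor_values, t, max_skew_s=0.05):
    """Return the sensor value closest in time to t, or None if too far away.

    sensor_times must be sorted in ascending order.
    """
    if not sensor_times:
        return None
    i = bisect_left(sensor_times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
    best = min(candidates, key=lambda j: abs(sensor_times[j] - t))
    if abs(sensor_times[best] - t) > max_skew_s:
        return None  # no sensor reading close enough to this frame
    return sensor_values[best]

def fuse(frames, sensor_times, sensor_values):
    """frames: list of (t, elbow_angle_deg). Returns one fused record per frame."""
    return [
        {
            "t": t,
            "elbow_angle_deg": angle,
            "distance_cm": nearest_sample(sensor_times, sensor_values, t),
        }
        for t, angle in frames
    ]
```

Frames with no nearby sensor reading keep their vision-derived angle and carry a missing distance, so downstream scoring can decide how to treat them.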

We also had to keep the AI safe. Since Physio is medical-adjacent, Gemini cannot diagnose, prescribe treatment, or replace a physical therapist. It only summarizes structured session data and reminds users to follow their therapist’s plan.


Accomplishments that we're proud of

We are proud that Physio became a complete physical therapy intelligence pipeline instead of just a webcam overlay.

We built:

  • real-time CV MediaPipe pose tracking
  • Raspberry Pi WebSocket distance telemetry
  • sensor-fused forward press analysis
  • elbow flexion range-of-motion tracking
  • deterministic movement state machines
  • smoothing and jitter detection
  • rep-by-rep breakdowns
  • replay graphs with raw vs. smoothed motion
  • Gemini post-session summaries through Vertex AI
  • ElevenLabs spoken feedback
  • patient check-ins
  • therapist-style session notes
  • saved history and progress tracking

The strongest architectural decision was separating movement truth from AI explanation:

local deterministic analyzer = movement correctness
Gemini = human-readable interpretation
ElevenLabs = voice-first accessibility

That made the system more reliable and easier to explain.


What we learned

We learned that real rehab intelligence is not just computer vision and not just generative AI. It needs both signal processing and human-centered communication.

CV MediaPipe gave us body landmarks, but we needed smoothing, hysteresis, confidence gates, and state machines to make those landmarks useful. The Raspberry Pi sensor helped us understand why hardware telemetry can strengthen computer vision for linear exercises like a forward press.

We also learned that generative AI works best when it is grounded. Instead of asking Gemini to judge raw motion, we gave it deterministic metrics and asked it to explain them clearly. ElevenLabs made that explanation easier to consume by letting the AI agent speak directly to the user.

The biggest product lesson was that accessibility is not only about collecting data. It is about making the data understandable.


What's next for Physio

Next, we want to expand Physio into a broader home rehab platform.

Future work includes:

  • more therapist-prescribed exercises
  • better automatic sensor calibration
  • mobile support
  • clinician dashboards
  • longitudinal progress analytics
  • improved hardware packaging
  • therapist-created exercise routines
  • stronger patient-reported outcome tracking
  • clinical review of scoring thresholds and feedback wording

The long-term vision is to help users practice prescribed rehab exercises at home, understand their movement quality, hear their results, and return to their care team with better session data.


Generative AI Usage

Yes. We implemented generative AI using Gemini through Vertex AI.

Gemini is used after the exercise session, not as the live biomechanics engine. During the session, Physio uses CV MediaPipe, Raspberry Pi distance telemetry, smoothing, and deterministic analyzers to compute reps, phase, jitter, range of motion, push depth, and movement quality.

After the session, Physio sends Gemini a structured packet containing aggregate metrics, rep-by-rep data, tracking quality, common issues, patient feedback, and local recommendations. Gemini turns that data into a readable summary with what went well, what to focus on next time, safe encouragement, and a reminder to follow the user’s therapist plan.
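As an illustration of what such a packet might look like, the Python sketch below assembles a subset of the fields described above and wraps them in a grounded prompt. The key names, the helper functions, and the instruction wording are assumptions for this example, and the actual Vertex AI call is intentionally omitted.

```python
import json
from statistics import mean

def build_summary_packet(exercise, reps, tracking_quality, patient_feedback):
    """Collect deterministic metrics into one JSON-serializable packet."""
    return {
        "exercise": exercise,
        "aggregate": {
            "rep_count": len(reps),
            "mean_hold_s": round(mean(r["hold_s"] for r in reps), 2) if reps else 0.0,
            "total_jitter_events": sum(r["jitter_events"] for r in reps),
        },
        "reps": reps,
        "tracking_quality": tracking_quality,
        "patient_feedback": patient_feedback,
    }

# Hypothetical guardrail text placed ahead of the data.
INSTRUCTIONS = (
    "Summarize this rehab session using only the metrics in the JSON below. "
    "Do not diagnose or prescribe treatment. "
    "Remind the user to follow their therapist's plan."
)

def build_prompt(packet):
    """Return the text that would be sent to the model."""
    return INSTRUCTIONS + "\n\n" + json.dumps(packet, indent=2)
```

Keeping the numbers in a structured packet means the model only has to explain values the local analyzer already computed.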

We then pass Gemini’s summary text to ElevenLabs, which reads it aloud so the AI rehab agent can speak directly to the user. This makes the results more accessible than raw charts or technical metrics alone.

Built With

  • computervision
  • elevenlabs
  • fastapi
  • gemini
  • heygen
  • mediapipe
  • motionsensor
  • raspberrypi
  • speech-to-text
  • sqlite
  • text-to-speech
  • vertexai
  • vite
  • websockets