Inspiration

We've watched it happen to people we know: a torn ACL from a bad squat, a rotator cuff injury from months of poor form, chronic back pain from home workouts that were supposed to help. The home fitness boom exploded after the pandemic, but it brought a hidden crisis: millions of people now train alone in their living rooms with zero feedback, zero correction, and zero idea they're one bad rep away from injury.

The deeper problem is that most people simply don't know how to work out correctly. Nobody teaches you what a proper squat looks like, how far your knees should travel, or whether your back is rounding under load. You follow a YouTube video in your bedroom, mirror your best guess, and hope for the best. Without a coach in the room, you're training blind.

AI has transformed smart home thermostats, security cameras, and lighting — but the home mirror? Still just a reflection. Personal trainers charge $80–$150 an hour and can't be in your home every day. Two of us have personally felt this: training alone at home, unsure if our form was right, unable to afford consistent coaching. That frustration became the spark for Pur-Form.


What It Does

Pur-Form transforms any home mirror into an AI-powered personal trainer — a smart home fitness system that watches your form, counts your reps, and delivers instant haptic feedback through a custom wearable on your wrist. No subscriptions. No trainer. No cloud.

The smart mirror runs a complete pose estimation pipeline on an AMD KV260 FPGA fed by a Qualcomm web camera, detecting 14 body keypoints at ~20ms latency — a 50× improvement over cloud-based inference. It analyzes joint angles frame-by-frame, counts reps, detects form errors in real time, and streams a live annotated video feed to any device on your home network.

The custom ESP32 wearable bracelet fuses four sensor streams (accelerometer, gyroscope, temperature, and humidity) and drives a haptic actuator, built around two Arduino Modulino modules:

  • Arduino Modulino Movement (accelerometer + gyroscope) — tracks rep speed, movement intensity, and 3D wrist orientation to catch form errors the camera can't see
  • Arduino Modulino Vibro — delivers precisely timed haptic patterns directly to the wrist, the closest physical point to the action
  • Temperature sensor — monitors body heat for overexertion alerts
  • Humidity sensor — sweat analysis as a real-time proxy for exertion level

When the AI detects bad form, the wristband fires a triple-pulse haptic alert. Every completed rep gets a confirmation buzz. Haptic feedback is the only modality that reaches you exactly when you need it: physically, instantly, without breaking your focus or your rep.

The camera and IMU work together because neither is enough alone. Occluded joints, home lighting variation, and clothing all limit camera accuracy. The wrist IMU catches what the mirror misses — reps performed too fast, wrist rotation invisible from the front, and physiological overload that no camera can detect.


How We Built It

AMD KV260 FPGA Inference Pipeline

We deployed sp_net from the Vitis AI Model Zoo r2.5.0 onto the KV260's DPU (DPUCZDX8G, B4096 configuration); the compiled model's fingerprint matches that DPU exactly, enabling native INT8 execution entirely on FPGA fabric with zero CPU fallback and zero cloud dependency. The visual input comes from a Qualcomm web camera running at native 1920×1080, resized in software to 128×224 before DPU ingestion to avoid GStreamer backend conflicts.

The pipeline runs two DPU subgraphs with a CPU average pooling bridge:

$$\text{Frame} \xrightarrow{\times 0.5} \text{INT8} \xrightarrow{\text{DPU: Conv}} [1,7,4,184] \xrightarrow{\text{CPU: AvgPool}} [1,1,1,184] \xrightarrow{\times 8} \text{INT8} \xrightarrow{\text{DPU: FC}} [1,28] \xrightarrow{\times 4} 14\text{ keypoints}$$
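The CPU bridge between the two DPU subgraphs is just an average pool plus requantization. A minimal NumPy sketch of that step (the real fix-point scale factors come from the compiled model's tensor metadata; the ×8 here mirrors the diagram rather than quoting the actual values):

```python
import numpy as np

def cpu_avgpool_bridge(conv_out, scale=8):
    """Average-pool the DPU conv output [1,7,4,184] down to [1,1,1,184]
    and requantize to INT8 for the FC subgraph. scale=8 mirrors the x8
    step in the pipeline diagram (illustrative, not the compiled value)."""
    pooled = conv_out.astype(np.float32).mean(axis=(1, 2), keepdims=True)
    return np.clip(np.round(pooled * scale), -128, 127).astype(np.int8)
```

In the live pipeline this function sits between the two VART runners, consuming the first subgraph's output buffer and filling the second subgraph's input buffer.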

Joint angles are computed inline using the dot-product formula:

$$\theta = \arccos\left(\frac{\vec{u} \cdot \vec{v}}{|\vec{u}||\vec{v}|}\right)$$

Rep counting uses exponential moving average smoothing with a 0.5s cooldown to prevent double-counting. Supported exercises include squats, bicep curls, and lateral raises, each with specific angle thresholds and form-check rules. Results stream via Flask to any browser on the home network and to the ESP32 wearable for haptic delivery.
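A minimal sketch of that counting logic, with illustrative squat thresholds rather than our tuned per-exercise values:

```python
import time

class RepCounter:
    """Hysteresis-based rep counter: the EMA-smoothed joint angle must cross
    the 'down' threshold and return past the 'up' threshold; a cooldown
    suppresses double counts. Thresholds are illustrative squat values."""
    def __init__(self, down=100.0, up=160.0, alpha=0.3, cooldown=0.5):
        self.down, self.up, self.alpha, self.cooldown = down, up, alpha, cooldown
        self.smoothed = None
        self.in_rep = False
        self.last_rep_t = 0.0
        self.count = 0

    def update(self, angle, now=None):
        now = time.monotonic() if now is None else now
        # Exponential moving average smooths out keypoint jitter.
        self.smoothed = angle if self.smoothed is None else \
            self.alpha * angle + (1 - self.alpha) * self.smoothed
        if not self.in_rep and self.smoothed < self.down:
            self.in_rep = True          # descended into the rep
        elif self.in_rep and self.smoothed > self.up:
            if now - self.last_rep_t >= self.cooldown:
                self.count += 1         # clean rep completed
                self.last_rep_t = now
            self.in_rep = False
        return self.count
```

The two-threshold hysteresis is what makes the cooldown rarely needed in practice: a rep only counts after a full descent and return, so jitter around a single threshold can't fire twice.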

ESP32 Wearable (Espressif + Arduino Modulino)

The wearable is built around an ESP32 microcontroller paired with two Arduino Modulino modules that handle sensing and actuation independently, keeping the firmware clean and modular.

Arduino Modulino Movement houses a 6-axis IMU providing:

  • 3-axis accelerometer (X, Y, Z) — measures the magnitude and direction of every movement. Fast, jerky reps show up immediately as acceleration spikes. Excessive forward lean during a squat registers as a shift in the gravity vector. This catches the kind of sloppy momentum-driven reps that look fine on camera but bypass the muscle entirely.
  • 3-axis gyroscope (roll, pitch, yaw) — measures rotational velocity in three planes. Wrist flare during a bicep curl, lateral tilt during a lateral raise, and twisting under load during a squat all produce distinctive gyro signatures the camera cannot see from the front. The gyro runs continuously and its readings are EMA-smoothed before being fused with visual keypoint data server-side.

Arduino Modulino Vibro is the haptic output engine. It drives a precision vibration motor with configurable intensity and duration, controlled entirely over I2C from the ESP32. We designed two distinct haptic patterns:

  • Single short buzz — rep confirmed. Clean and unambiguous; it fires once per rep, after the joint angle has crossed the threshold down and back up.
  • Triple rapid pulse — form alert. Three 80ms bursts in quick succession, impossible to confuse with a rep confirmation. Fires the moment the FPGA detects a form violation.

The vibration sequencer runs as a non-blocking state machine — no delay() calls anywhere in the critical path. This means the ESP32 never stalls waiting for a buzz to finish, keeping sensor reads and WiFi polling continuous.
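The firmware itself is Arduino C++, but the pattern translates directly. Here is the same timestamp-driven state machine sketched in Python, with time.monotonic() standing in for millis() (pattern timings match the writeup; the class and method names are ours for illustration):

```python
import time

class HapticSequencer:
    """Non-blocking burst sequencer. Each tick() call advances the pattern
    by comparing timestamps -- it never sleeps, so sensor reads and WiFi
    polling stay continuous, exactly as in the ESP32 firmware."""
    def __init__(self):
        self.pattern = []       # remaining (motor_on, duration_s) steps
        self.step_start = 0.0
        self.motor_on = False

    def start(self, pattern, now=None):
        self.pattern = list(pattern)
        self.step_start = time.monotonic() if now is None else now
        self._apply()

    def _apply(self):
        # Motor state follows the head of the pattern; off when done.
        self.motor_on = bool(self.pattern) and self.pattern[0][0]

    def tick(self, now=None):
        now = time.monotonic() if now is None else now
        while self.pattern and now - self.step_start >= self.pattern[0][1]:
            self.step_start += self.pattern[0][1]
            self.pattern.pop(0)
            self._apply()
        return self.motor_on

TRIPLE_PULSE = [(True, 0.08), (False, 0.08)] * 3   # form alert: three 80ms bursts
SINGLE_BUZZ = [(True, 0.15)]                        # rep confirmation
```

On the ESP32, tick() runs every pass through loop(), and motor_on is written to the Modulino Vibro over I2C only when it changes.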

Temperature and humidity sensors complete the physiological picture. Body temperature creeping above 38.5°C triggers a rest alert. Humidity readings serve as a proxy for sweat rate and cumulative exertion — data the camera has no access to whatsoever. Together these turn the wristband into a true biometric sensor, not just a haptic output device.

The bracelet's web server exposes /data for live biometric streaming, /buzz for rep-confirmation haptics, /warn for form-alert haptics, and /wifi for system diagnostics. All sensor data is fused server-side with the FPGA's visual keypoint output to produce alerts neither system could generate independently.
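As an illustration of that fusion step, a toy server-side decision function (thresholds and field names are invented for the example, not our tuned values):

```python
def fuse_alerts(visual_form_ok, accel_peak_g, temp_c, knee_angle_deg):
    """Toy fusion rule combining FPGA keypoint checks with wearable
    biometrics. All thresholds are illustrative placeholders."""
    alerts = []
    if not visual_form_ok:
        alerts.append("form")        # camera caught a bad joint angle
    if accel_peak_g > 2.5:
        alerts.append("momentum")    # jerky, momentum-driven rep (IMU only)
    if temp_c > 38.5:
        alerts.append("rest")        # overexertion; invisible to any camera
    if visual_form_ok and accel_peak_g > 2.5 and knee_angle_deg < 100:
        alerts.append("cheat_rep")   # looks fine on camera, IMU disagrees
    return alerts
```

The last rule is the point of the fusion: neither the camera (which sees good form) nor the IMU alone (which sees only acceleration) could flag that rep by itself.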


Challenges We Ran Into

Loading the FPGA bitstream was our first wall. The KV260 auto-loads k26-starter-kits on boot, occupying the only PL slot. Every load attempt returned Error: -1 until we learned to always run xmutil unloadapp first — something no documentation made obvious. That alone cost hours.

VART's DMA memory model nearly broke the entire pipeline. On aarch64, VART pre-allocates physically contiguous memory for DMA. Allocating new NumPy arrays each frame moves the data to different physical addresses, corrupting DMA pointers and triggering segfaults and bus errors with no useful trace. The fix was strict: pre-allocate every buffer once outside the loop, update it in place with np.copyto, and write the entire inference pipeline inline — no helper function calls inside the loop, ever.
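The resulting loop discipline looks like this (shapes are taken from the pipeline above; the commented-out runner calls stand in for the real VART execute_async/wait pair, which needs the board):

```python
import numpy as np

# Illustrative shapes; the real buffers come from the runner's
# get_input_tensors() / get_output_tensors() metadata on the KV260.
IN_SHAPE, OUT_SHAPE = (1, 128, 224, 3), (1, 7, 4, 184)

# Allocate ONCE, outside the loop, so physical addresses stay stable for DMA.
input_buf = np.zeros(IN_SHAPE, dtype=np.int8)
output_buf = np.zeros(OUT_SHAPE, dtype=np.int8)

def run_frame(frame_int8):
    # WRONG on aarch64: input_buf = frame_int8  (rebinds to new memory,
    # invalidating the DMA mapping). Instead, copy into the same buffer:
    np.copyto(input_buf, frame_int8)
    # jid = runner.execute_async([input_buf], [output_buf]); runner.wait(jid)
    return output_buf
```

The invariant is simple to state and brutal to rediscover by segfault: the id() of every buffer handed to the DPU must never change across frames.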

Integrating all subsystems into one cohesive product was the hardest systems challenge. The KV260 runs vision. The ESP32 manages biometrics and haptics. Flask bridges them over the home network. Getting reliable sub-100ms round-trip communication across all three required non-blocking design at every layer simultaneously.

sp_net outputs no confidence scores. Occluded joints still produce coordinates, generating phantom keypoints. We mitigated this with positional sanity checks, EMA smoothing, and by leaning on the IMU to cover exactly the cases the camera misses.


Accomplishments That We're Proud Of

  • Achieving ~20ms inference latency on FPGA fabric — a 50× improvement over our original cloud pipeline — with zero internet dependency
  • Building a fully integrated hardware-software-ML stack from scratch: FPGA vision, embedded wearable, wireless communication, and haptic feedback all working together as a single system
  • Solving the aarch64 DMA memory corruption problem that took hours of blind debugging with no documentation to guide us
  • Designing meaningful haptic language — distinct patterns for rep confirmation vs. form correction — so the wristband communicates clearly without any screen or audio
  • Creating a smart home product that requires no gym, no subscription, and no cloud to deliver professional-grade coaching in your own home

What We Learned

  • How to deploy quantized neural networks on AMD FPGA fabric using the Vitis AI runtime, XRT, and the XIR graph model format
  • The two-subgraph architecture of sp_net and how to correctly sequence DPU runners with a CPU intermediate layer on the KV260
  • How DMA memory management works on aarch64 and why physical memory contiguity is critical for real-time FPGA inference
  • How to build a fully non-blocking embedded system on ESP32 handling sensors, WiFi, and haptics simultaneously without a single blocking call
  • How to fuse visual pose estimation with IMU data from the Arduino Modulino Movement to achieve accuracy neither modality can reach alone
  • That the hardest part of building a multi-system product is not any single component: it is making all of them work together reliably, in real time, under real conditions

What's Next for Pur-Form

  • Fine-tune sp_net on home workout footage for better accuracy across diverse lighting and room environments
  • Add deadlift and overhead press to the exercise library
  • Miniaturize the wearable into a sleek, production-ready form factor
  • Build a home dashboard for long-term workout history and progress tracking
  • Expand temperature and humidity analysis into a full biometric fatigue model that knows when to tell you to rest

Built With

AMD KV260 (Vitis AI, DPUCZDX8G) · Qualcomm web camera · ESP32 · Arduino Modulino (Movement, Vibro) · Flask · Python · NumPy
