Inspiration

We've all been there — wanting to get better at something physical but having no idea if we're actually doing it right. Most people who work out alone are guessing. They watch YouTube videos, copy what they see, and hope for the best. Bad form doesn't just slow your progress — it causes injuries that make people quit entirely. Personal trainers exist to solve this, but at $50–200 a session, they're not realistic for most people. We wanted to build something that puts a real coach in everyone's pocket, for free, forever.

What it does

FitForm is a fully offline AI coaching app that gives real-time form feedback for squats and basketball jump shots. Using Google LiteRT and MoveNet Lightning running on the Snapdragon 8 Elite's Hexagon NPU, the app analyzes your movement at 30 frames per second and overlays a color-coded skeleton — green for correct form, red when something needs fixing. After each set, you get a full breakdown: rep count, per-rep scores, what went wrong, and coaching cues to improve. Every session is recorded and saved for replay with the skeleton overlay synced to the video. No cloud. No internet. No subscription. Everything runs on the device.

How we built it

FitForm is built in Kotlin, with Jetpack Compose for the UI and CameraX handling the camera pipeline. Pose estimation runs through Google LiteRT using the MoveNet Lightning INT8 model, routed to the Hexagon NPU via Android's NNAPI delegate. We pre-allocated inference buffers to avoid garbage collection pressure on the hot path, warmed up the NPU at startup to remove cold-start latency spikes, and used CameraX's STRATEGY_KEEP_ONLY_LATEST backpressure strategy so that the analyzer always receives the most recent frame and stale frames never queue up in memory. The INT8 quantized model runs at roughly 6 ms per frame on the NPU, compared to 25–40 ms on CPU, at a fraction of the power draw. A fallback chain automatically handles devices without NPU support, and a MockPoseEstimator keeps the app running even when the model file is not loaded.
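To make the keep-only-latest idea concrete, here is a minimal sketch in plain Kotlin of a single-slot frame mailbox. It is not CameraX's implementation and not FitForm's code; `Frame` and `LatestFrameSlot` are hypothetical names used only to illustrate why a slow analyzer never builds up a queue of stale frames.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.atomic.AtomicReference

// Hypothetical frame type: a timestamp standing in for a camera image.
data class Frame(val timestampMs: Long)

// Single-slot mailbox: a new frame replaces any frame not yet consumed,
// so memory use stays constant no matter how slow the consumer is.
class LatestFrameSlot {
    private val slot = AtomicReference<Frame?>(null)
    private val dropped = AtomicInteger(0)

    // Called by the producer (camera side).
    // Returns true if an unconsumed older frame was overwritten.
    fun offer(frame: Frame): Boolean {
        val previous = slot.getAndSet(frame)
        if (previous != null) dropped.incrementAndGet()
        return previous != null
    }

    // Called by the consumer (analysis side).
    // Returns the newest pending frame and clears the slot, or null if empty.
    fun take(): Frame? = slot.getAndSet(null)

    // Number of frames that were replaced before anyone analyzed them.
    fun droppedCount(): Int = dropped.get()
}
```

If three frames are offered before the analyzer calls `take()`, the analyzer sees only the third one and `droppedCount()` reports two. Dropping is the point: for live feedback, an old pose is worse than a skipped one.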

Challenges we ran into

The hardest problem was feedback timing. Our first version graded every frame continuously, so during the transition between squat positions, when the body is mid-movement, the analyzer would flash red even when nothing was wrong. The user was doing everything correctly, but the in-between frames didn't match any valid form pattern. The fix was phase-based grading: detect which phase of the movement the user is in and evaluate form only at the moments that actually matter, such as the bottom of the squat or the peak of the jump shot. That change made the feedback feel like a real coach instead of a broken counter.

We also found that hardcoded angle thresholds don't generalize across body types. A threshold tuned for one athlete fails for someone with different proportions. This pushed us toward a more robust evaluation approach grounded in biomechanical research rather than a single fixed value.
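The phase-based grading described above can be sketched roughly as follows. This is an illustration under stated assumptions, not FitForm's analyzer: the `Point`, `Phase`, and `SquatPhaseGrader` names, the 160 and 100 degree defaults, and the 5 degree turnaround margin are all placeholders, and only the knee angle is considered. The thresholds are constructor parameters precisely because fixed values do not fit every body.

```kotlin
import kotlin.math.PI
import kotlin.math.abs
import kotlin.math.atan2

// A 2D keypoint in image coordinates (hypothetical type).
data class Point(val x: Double, val y: Double)

// Interior angle at `joint`, in degrees, between the segments to a and b.
// For a knee: a = hip, joint = knee, b = ankle.
fun jointAngle(a: Point, joint: Point, b: Point): Double {
    val angleA = atan2(a.y - joint.y, a.x - joint.x)
    val angleB = atan2(b.y - joint.y, b.x - joint.x)
    val degrees = abs(angleA - angleB) * 180.0 / PI
    return if (degrees > 180.0) 360.0 - degrees else degrees
}

enum class Phase { STANDING, DESCENDING, ASCENDING }

// Tracks the knee angle across frames and grades once per rep, at the bottom.
class SquatPhaseGrader(
    private val standingAngle: Double = 160.0,   // assumed: above this, user is standing
    private val depthTarget: Double = 100.0,     // assumed: bottom must reach this or lower
    private val turnaroundMargin: Double = 5.0,  // assumed: rise needed to confirm the bottom
) {
    var phase = Phase.STANDING
        private set
    private var minAngle = 180.0

    // Feed one knee angle per frame.
    // Returns a verdict only on the frame where the bottom is confirmed; null otherwise.
    fun onFrame(kneeAngle: Double): Boolean? {
        var verdict: Boolean? = null
        when (phase) {
            Phase.STANDING -> if (kneeAngle < standingAngle) {
                phase = Phase.DESCENDING
                minAngle = kneeAngle
            }
            Phase.DESCENDING -> {
                if (kneeAngle < minAngle) minAngle = kneeAngle
                // The angle is rising again: the deepest point has passed, so grade it.
                if (kneeAngle > minAngle + turnaroundMargin) {
                    verdict = minAngle <= depthTarget
                    phase = Phase.ASCENDING
                }
            }
            Phase.ASCENDING -> if (kneeAngle >= standingAngle) phase = Phase.STANDING
        }
        return verdict
    }
}
```

Mid-movement frames return null, so nothing can flash red during the transition; the single verdict per rep reflects the deepest angle reached. A very shallow dip still counts as a (failed) rep in this sketch, which a real analyzer would likely filter out.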

Accomplishments that we're proud of

Getting sub-10 ms inference on a live camera feed entirely on-device was the technical milestone we're most proud of. Seeing the NPU badge light up and watching latency numbers tick by at 5–8 ms per frame made the project feel tangible. Beyond the performance, we're proud that the feedback is actually useful: not just a number on a screen, but directional coaching cues that tell you what to fix and why. Building something that a non-technical person could pick up and immediately understand was just as important to us as the ML pipeline underneath it.

What we learned

We learned that on-device AI is only as good as the decisions you make around when and how to evaluate. Raw inference speed matters, but framing — knowing which frames to grade, at which moments, and how to translate keypoint data into something a real person can act on — is what separates a technical demo from a useful product. We also learned how much LiteRT simplifies the path from model to hardware acceleration. Enabling the NNAPI delegate is a few lines of code, and the runtime handles routing to the best available hardware automatically. That abstraction is genuinely powerful and makes the case for LiteRT as the right deployment layer for on-device AI.

What's next for FitForm

The immediate next steps are automatic rep detection to remove the manual START/END buttons, and expanding to additional exercises and sports. Architecturally, the PoseEstimator is built as a swappable interface, so adding new movements requires no changes to the UI or analysis layers. Longer term, the same LiteRT pipeline that powers FitForm today could run on AR glasses for hands-free coaching overlaid in your field of view, embedded IoT gym equipment, or drones for aerial athletic analysis. We're also interested in integrating on-device Gemma 4 via the LiteRT-LM API to generate natural language post-session coaching summaries — moving from structured cues to conversational feedback. FitForm started as a fitness app. The goal is to make it the foundation for on-device AI coaching anywhere movement happens.
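As a rough illustration of the swappable PoseEstimator interface, and of the fallback chain it makes possible, here is a minimal sketch. Only the names PoseEstimator and MockPoseEstimator come from this write-up; the method signature, the `Keypoint` type, and the `createEstimator` helper are assumptions made for the example, and the real backends (NPU, CPU) are left out.

```kotlin
// One detected body keypoint with a confidence score (hypothetical shape).
data class Keypoint(val x: Float, val y: Float, val score: Float)

// The seam the rest of the app depends on; backends are interchangeable.
interface PoseEstimator {
    val name: String
    fun estimate(frame: ByteArray): List<Keypoint>
}

// Stand-in backend that needs no model file.
// Returns 17 fixed keypoints, matching MoveNet's keypoint count.
class MockPoseEstimator : PoseEstimator {
    override val name = "mock"
    override fun estimate(frame: ByteArray): List<Keypoint> =
        List(17) { Keypoint(0.5f, 0.5f, 0.0f) }
}

// Tries each factory in order and returns the first backend that initializes.
// Falls back to the mock if every factory throws.
fun createEstimator(factories: List<() -> PoseEstimator>): PoseEstimator {
    for (factory in factories) {
        try {
            return factory()
        } catch (e: Exception) {
            // This backend is unavailable on the device; try the next one.
        }
    }
    return MockPoseEstimator()
}
```

A caller would pass factories in order of preference, for example an NPU-backed estimator first and a CPU one second. Because the UI and analysis layers see only `PoseEstimator`, they do not need to know which backend was selected.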

The APK is available from the "Try it out" links and on the GitHub release: https://github.com/adamvl7/Google-x-Qualcomm-Hackathon-2026/releases/tag/v1.0
