BachaTrainer

AI-powered bachata dance instructor with real-time pose detection using PyTorch ExecuTorch


Inspiration

Learning to dance bachata is challenging. You watch videos, try to follow along, but have no idea if you're doing it right. Traditional dance classes are expensive and not always accessible. Online tutorials lack feedback.

I asked myself: What if your phone could be your dance instructor?

The vision was simple: point your camera at yourself, follow a reference video, and get instant AI-powered feedback on your movements. No cloud processing, no privacy concerns, no internet required — just you, your phone, and the music.

The Arm AI Developer Challenge was the perfect opportunity to build this. Modern Arm-based phones have incredible AI capabilities with NEON SIMD instructions, but most apps still send data to the cloud. We wanted to prove that sophisticated pose detection can run entirely on-device, privately and efficiently.

Bachata sits in a sweet spot: a real gap in the market, an approachable difficulty level, broad accessibility, and distinct full-body movements that are well suited to pose detection.

Resurrected from a Just Dance-style video game (https://github.com/olincollege/just-dance) using Kiro.


What it does

BachaTrainer is a mobile dance training game that:

  1. Shows a reference dance video — Professional bachata choreography plays on screen
  2. Captures your movements — Your phone's camera records you dancing
  3. Detects your pose in real-time — AI identifies 17 body keypoints using ExecuTorch
  4. Compares your pose to the reference — Calculates joint angles and matches them
  5. Scores you instantly — Perfect, Great, Good, or Miss feedback as you dance
  6. Runs 100% on-device — No cloud, no internet, complete privacy
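Steps 2–5 form a per-frame loop. Here is a minimal sketch of that loop in Python; every function passed in is a stand-in for the app's real camera, inference, and scoring components, and the grade cutoffs are illustrative, not the app's exact constants.

```python
def run_frame(frame_index, capture, infer, reference, compare):
    """One iteration of the game loop: capture -> infer -> compare -> grade."""
    image = capture(frame_index)                             # camera frame
    user_pose = infer(image)                                 # 17 keypoints on-device
    similarity = compare(user_pose, reference[frame_index])  # joint-angle match, 0..1
    # Map similarity to the feedback tiers shown on screen
    if similarity >= 0.9:
        return "Perfect"
    if similarity >= 0.75:
        return "Great"
    if similarity >= 0.5:
        return "Good"
    return "Miss"
```

In the app this loop runs for every camera frame while the reference video plays; the sketch only shows the control flow.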

Key Features

  • Dual Video View — See the reference dancer and yourself side-by-side
  • Real-Time Scoring — Instant feedback on every movement
  • 2 Bachata Songs — "30 Minutos" and "How Deep Is Your Love" by Prince Royce
  • Offline Mode — Works without internet connection
  • Arm Optimized — XNNPACK backend leverages Arm NEON for fast inference

The Experience

┌─────────────────────────────────────────┐
│  ┌─────────────┐  ┌─────────────┐      │
│  │  Reference  │  │   Your      │      │
│  │   Dancer    │  │   Camera    │      │
│  │             │  │             │      │
│  └─────────────┘  └─────────────┘      │
│                                         │
│         Score: 847 | Combo: 12x         │
│         ████████████░░░ 78%             │
│                                         │
│              ⭐ GREAT! ⭐                │
└─────────────────────────────────────────┘

How we built it

Tech Stack

Layer                Technology
-------------------  --------------------------------------
Mobile Framework     React Native + Expo SDK 54
AI Runtime           PyTorch ExecuTorch 1.0 GA
Arm Optimization     XNNPACK (Arm NEON SIMD)
Native Modules       Kotlin (Android), Objective-C++ (iOS)
Pose Preprocessing   YOLOv8s-pose (Ultralytics)
State Management     Zustand
Styling              NativeWind (Tailwind CSS)

The Build Process

Step 1: Preprocessing Pipeline (Python)

We built tools to process dance videos:

  • download_youtube.py — Fetch dance videos from YouTube
  • preprocess_video_yolov8.py — Extract poses using YOLOv8s-pose
  • visualize_tracking.py — Verify the right dancer is being tracked
  • batch_process_yolov8.py — Process multiple videos at once

Each video produces a JSON file with 17 keypoints and 8 joint angles for every frame.
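A joint angle can be computed from three keypoints with basic trigonometry. The sketch below shows the idea and the shape of a per-frame record; the field names and keypoint values are illustrative, the real schema lives in preprocess_video_yolov8.py.

```python
import json
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by segments b->a and b->c."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang

# COCO keypoint order: index 5 = left shoulder, 7 = left elbow, 9 = left wrist.
# Each keypoint is (x, y, confidence); values below are made up for the demo.
keypoints = [(0.0, 0.0, 0.0)] * 17
keypoints[5], keypoints[7], keypoints[9] = (
    (0.40, 0.30, 0.95), (0.45, 0.45, 0.93), (0.55, 0.50, 0.90),
)

frame_record = {
    "frame": 0,
    "keypoints": keypoints,
    "angles": {
        "left_elbow": joint_angle(
            keypoints[5][:2], keypoints[7][:2], keypoints[9][:2]
        ),
    },
}
print(json.dumps(frame_record)[:120])
```

Running this over every frame of a video yields the JSON files the app plays back as reference choreography.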

Step 2: ExecuTorch Native Module

We created custom native modules for both platforms:

// Android - ExecuTorchModule.kt
import org.pytorch.executorch.Module
import org.pytorch.executorch.EValue

val module = Module.load(modelPath)
val inputEValue = EValue.from(inputTensor)     // wrap the input Tensor
val outputs = module.forward(inputEValue)      // Real inference!

The module:

  • Loads the .pte model from app assets
  • Configures XNNPACK backend for Arm optimization
  • Runs inference on camera frames
  • Returns 17 keypoints with confidence scores

Step 3: React Native Integration

// ExecuTorchService.ts
const modelPath = await extractModelToFilesystem('pose.pte');
await ExecuTorchModule.loadModel(modelPath);
await ExecuTorchModule.setDelegate('xnnpack');  // Arm NEON!

const result = await ExecuTorchModule.runInference({ imageData });

Step 4: Scoring Algorithm

We compare poses using joint angles:

  • Calculate 8 angles: arms, elbows, thighs, legs
  • Compare user angles to reference angles
  • Weight by body part importance (arms matter more in bachata)
  • Apply confidence thresholds to filter noise
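The comparison above can be sketched as a weighted angle-similarity score. The weights, thresholds, and grade cutoffs below are illustrative values standing in for the app's tuned constants.

```python
# Arms weighted higher, as noted above; the 8 angles match our pipeline.
ANGLE_WEIGHTS = {
    "left_arm": 1.5, "right_arm": 1.5,
    "left_elbow": 1.2, "right_elbow": 1.2,
    "left_thigh": 1.0, "right_thigh": 1.0,
    "left_leg": 0.8, "right_leg": 0.8,
}
CONF_THRESHOLD = 0.5  # joints detected below this confidence are ignored

def score_pose(user, reference, confidences):
    """Weighted similarity in [0, 1] between user and reference joint angles."""
    total, weight_sum = 0.0, 0.0
    for name, w in ANGLE_WEIGHTS.items():
        if confidences.get(name, 0.0) < CONF_THRESHOLD:
            continue  # noisy or occluded joint: skip it
        diff = abs(user[name] - reference[name])
        total += w * max(0.0, 1.0 - diff / 90.0)  # 90 degrees off = no credit
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

def grade(similarity):
    """Map a similarity score to the on-screen feedback tier."""
    if similarity >= 0.9:
        return "Perfect"
    if similarity >= 0.75:
        return "Great"
    if similarity >= 0.5:
        return "Good"
    return "Miss"
```

Dropping low-confidence joints before averaging is what keeps a briefly occluded arm from tanking an otherwise good score.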

Challenges we ran into

1. ExecuTorch Model Loading

Problem: ExecuTorch expects a filesystem path, but React Native bundles assets differently.

Solution: We extract the model from bundled assets to the device's document directory at runtime:

const modelAsset = Asset.fromModule(require('./pose.pte'));
await modelAsset.downloadAsync();                    // materialize the bundled asset
const sourceFile = new File(modelAsset.localUri!);   // expo-file-system File from its URI
await sourceFile.copy(destFile);

2. Metro Bundler Doesn't Know .pte Files

Problem: Metro refused to bundle our ExecuTorch model file.

Solution: Added .pte to asset extensions in metro.config.js:

config.resolver.assetExts.push('pte');

3. WSL + Android Development

Problem: Running Expo from WSL2 on Windows meant the phone couldn't reach the dev server (different network).

Solution: Used Expo's tunnel mode:

npx expo start --tunnel --dev-client

4. Multi-Person Videos

Problem: Dance videos often have multiple people. Which one do we track?

Solution: YOLOv8s-pose detects all people, and we select the one with highest confidence/largest bounding box. Our visualize_tracking.py tool lets us verify the right dancer is tracked.
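One way to combine "highest confidence" and "largest bounding box" into a single heuristic is to rank detections by confidence times box area, as in this sketch. The detection fields are illustrative, not the exact structure YOLOv8s-pose returns.

```python
def select_dancer(detections):
    """Pick the most likely lead dancer from a list of person detections.

    Each detection is a dict with 'conf' (float) and 'box' = (x1, y1, x2, y2)
    in normalized coordinates.
    """
    def priority(d):
        x1, y1, x2, y2 = d["box"]
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        return d["conf"] * area  # favor confident AND prominent people

    return max(detections, key=priority) if detections else None

people = [
    {"conf": 0.90, "box": (0.1, 0.1, 0.3, 0.5)},   # small background figure
    {"conf": 0.85, "box": (0.4, 0.0, 0.9, 1.0)},   # large foreground dancer
]
print(select_dancer(people)["box"])
```

Because the choice is a heuristic, the visualization tool remains the final check that the right person is being tracked.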

5. Kotlin Type Mismatches

Problem: ExecuTorch's forward() method expects EValue, not Tensor.

Solution: Wrap tensors properly:

val inputEValue = EValue.from(inputTensor)
val outputs = module.forward(inputEValue)
val result = outputs[0]  // Array of EValue

6. expo-file-system API Changes

Problem: expo-file-system v19 completely changed its API (new File, Directory, Paths classes).

Solution: Rewrote the model extraction code to use the new API:

const modelsDir = new Directory(Paths.document, 'models');
const destFile = new File(modelsDir, 'pose.pte');

Accomplishments that we're proud of

🏆 Real ExecuTorch Integration

Not a stub, not mock data — actual PyTorch ExecuTorch 1.0 GA running on-device with XNNPACK backend. The native module loads the model, runs inference, and returns real keypoints.

🎯 State-of-the-Art Pose Detection

YOLOv8s-pose achieves 64.0 AP on COCO — that's 20% better than older MoveNet models. Our preprocessing pipeline extracts high-quality pose data from any dance video.

🔧 Complete Developer Tooling

We didn't just build an app — we built a content creation pipeline:

  • YouTube downloader with audio extraction
  • Batch video processor
  • Pose visualization tool
  • JSON validation scripts

Anyone can add new songs in minutes.

📱 Production-Quality App

  • Clean React Native + Expo architecture
  • Proper error handling with graceful fallbacks
  • TypeScript throughout
  • Native modules for both iOS and Android

🔒 Privacy-First Design

Zero cloud dependencies. Your camera feed never leaves your device. Dance in your underwear — we won't judge (or know).

⚡ Arm Optimization

XNNPACK backend specifically optimized for Arm NEON SIMD instructions. We're not just running on Arm — we're leveraging Arm's AI acceleration.


What we learned

ExecuTorch is Production-Ready

ExecuTorch 1.0 GA is genuinely impressive. The API is clean, the Maven artifacts work out of the box, and XNNPACK provides real performance gains on Arm devices.

React Native + Native Modules = Powerful

The bridge between JavaScript and native code is seamless. We can write our UI in React Native while running optimized Kotlin/Objective-C++ for AI inference.

Pre-Computed Choreography Beats On-Device Extraction

For reference poses, pre-computing with YOLOv8s-pose on a powerful machine gives better accuracy than real-time mobile inference. The hybrid approach (pre-computed reference + real-time user) is the sweet spot.

Arm NEON Makes a Difference

XNNPACK's Arm NEON optimizations provide measurable speedups. Mobile AI isn't just "possible" on Arm — it's fast.

Developer Experience Matters

Building tools like visualize_tracking.py saved hours of debugging. When you can see exactly what the AI is detecting, problems become obvious.

WSL2 Has Networking Quirks

Expo's tunnel mode is a lifesaver when your dev machine and phone are on different networks.


What's next for BachaTrainer

Short-Term (Next Month)

  • [ ] Add more songs — More difficulty levels
  • [ ] Performance benchmarks — Measure actual latency on various Arm devices
  • [ ] App store submission — Google Play and TestFlight

Long-Term (3-6 Months)

  • [ ] Multiplayer mode — Dance battles with friends, leaderboard
  • [ ] Progress tracking — See improvement over time
  • [ ] Custom choreography — Users upload their own dance videos
  • [ ] Wearable support — Apple Watch, fitness bands
  • [ ] Social features — Share scores, challenge friends

Imagine a world where anyone can learn any dance, anywhere, with instant AI feedback. No expensive classes, no judgment, just you and your phone becoming a better dancer one song at a time.
