Physical SCUDA module

SCUDA: Scuba-sign Capturing User-interface Data Analysis

Inspiration

Scuba divers are often put under pressure during their missions. They developed a sign language to allow them to communicate quickly; however, it may be difficult to interpret these signs in harsher conditions and lower lighting. SCUDA solves this by providing a framework for which scuba diving signs can be labeled and developed.

Most machine learning projects focus on training fancy models, but we realized the real bottleneck is getting clean data. We wanted to build a complete data pipeline for sign language gesture recognition—from capturing sensor data, to automatically detecting when gestures happen, to normalizing everything into a format ready for training. The challenge isn't training a model; it's collecting and structuring the right data first.

I also want to put specific thanks to inspiration due to a similar idea that I have been working on, with Jennifer Lee and Elijah Bridges at NCSU on ASL translation. This project is standalone however and does not reference any external resources.

What it does

SCUDA captures hand movements using a wearable IMU sensor (accelerometer + gyroscope + rotation) and automatically:

Detects when a gesture starts and stops in real-time
Records the full motion (100 frames per gesture at 100Hz)
Encodes hand position automatically (up, down, left, right, etc.)
Normalizes all gestures to the same length (100 timesteps) so they can be fed into ML models
Lets users label each gesture with a name (e.g., "WAVE", "HELLO")

The output is a clean CSV dataset where each row is one labeled gesture ready for training.

How we built it

We created a three-part system:

Part 1: Hardware Layer — A tiny microcontroller (XIAO ESP32-C3) connected to an IMU sensor reads motion data at 100Hz and streams it over USB.

Part 2: Segmentation — We wrote code that automatically figures out "is the user gesturing right now?" by looking at how fast the hand is rotating. Once it detects motion, it records frames until motion stops.

Part 3: Data Processing — A Python script reads the raw recordings and:

Filters out noise and failed attempts
Stretches/shrinks each gesture to exactly 100 timesteps (using linear interpolation)
Lets the user label each one interactively
Outputs a clean, normalized dataset

Challenges we ran into

Challenge 1: False Positives from Noise

The problem: The gyroscope is sensitive, so small tremors (hand jitter) were triggering fake gesture detections. We'd get a lot of false positives.

How we fixed it: We required the sensor to stay "active" (moving) for at least 8 frames (~80ms) before committing to a gesture. We also increased the motion threshold from 0.3 to 0.6 rad/s. Dropping false positives significantly.

Challenge 2: Hardware Connection Kept Failing

The problem: The IMU sensor randomly stopped responding ("Failed to find BNO08x chip!"). Very frustrating when testing.

Root causes we found:

The board got moisture on it (user's sweat from testing!)
Loose wires on the I2C connection
Code we wrote was too complex and crashed before initializing the sensor

How we fixed it: Dried the board, made sure wires were firmly seated, and simplified the firmware. Also wrote a quick I2C scanner to diagnose connection issues.

Challenge 3: Gestures Have Different Lengths

The problem: Some gestures last 150ms, others last 2500ms. ML models need fixed-size inputs. We can't just pad with zeros.

How we fixed it: We used linear interpolation to resample every gesture to exactly 100 timesteps. This stretches short gestures and compresses long ones while preserving the motion pattern. All 13 sensor channels get resampled independently.

Challenge 4: Tracking Hand Position Without Markers

The problem: How do we know if the hand moved "left" or "right" without external cameras or motion markers?

How we fixed it: We calculate hand position from the direction of acceleration. If the hand accelerates leftward, we encode that as "LEFT_MIDDLE" (or other discrete positions). This doesn't require calibration—it works from any starting position because we only care about the direction, not the absolute location.

Accomplishments we're proud of

Built a complete end-to-end pipeline — Most projects skip data engineering, but we made the entire capture → normalize → label workflow actually work.
Automatic gesture boundary detection — The firmware figures out when gestures start and stop without any pre-trained models, just physics.
Zero-calibration hand position encoding — Hand position is automatically calculated from motion direction. No special setup needed.
Eliminated false positives — By requiring sustained motion (8 frames), we reduced noise detections.
Normalizes variable-length data — Every gesture becomes a consistent 100 × 13 data matrix ready for ML, regardless of how fast or slow it was performed.

What we learned

Data engineering is the hard part — We spent way more time on segmentation, resampling, and filtering than we would on training any model. Real ML projects are 80% data work.
Simple beats complex — Linear interpolation is better than fancy spline fitting for resampling time series. It's simple, it works, and it's interpretable.
State machines prevent bugs — By explicitly drawing out the "when do we transition between states?", we avoided off-by-one errors that would lose data.
Hardware debugging is real — Not everything is a software problem. Moisture and loose connections taught us to think physically.
Hysteresis matters — Requiring sustained motion (not just one spike) eliminates noise without adding latency. It's a general pattern worth reusing.

What's next

Train ML models (Random Forest, LSTM, etc.) on the normalized dataset
Deploy real-time inference on the device itself
Expand to 20+ gestures for practical sign language
Test across multiple users and see if the model generalizes
Eventually build a real-time sign-to-speech application

Technical Stack

Hardware: Seeed XIAO ESP32-C3 + Adafruit BNO08x IMU
Firmware: Arduino C++ via PlatformIO
Data Processing: Python, NumPy, Pandas
Development: VS Code, Git

Summary

We built the data infrastructure for sign language recognition. The hard part wasn't training a model—it was reliably capturing gestures, filtering noise, normalizing variable-length data, and creating a pipeline users could actually interact with. This is what our data science analysis was.

Built With

cpp
github
platformio
python

Updates

Aaron Wang started this project — Mar 15, 2026 06:14 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.