SCUDA: Scuba-sign Capturing User-interface Data Analysis
Inspiration
Scuba divers are often put under pressure during their missions. They developed a sign language to allow them to communicate quickly; however, it may be difficult to interpret these signs in harsher conditions and lower lighting. SCUDA solves this by providing a framework for which scuba diving signs can be labeled and developed.
Most machine learning projects focus on training fancy models, but we realized the real bottleneck is getting clean data. We wanted to build a complete data pipeline for sign language gesture recognition—from capturing sensor data, to automatically detecting when gestures happen, to normalizing everything into a format ready for training. The challenge isn't training a model; it's collecting and structuring the right data first.
I also want to put specific thanks to inspiration due to a similar idea that I have been working on, with Jennifer Lee and Elijah Bridges at NCSU on ASL translation. This project is standalone however and does not reference any external resources.
What it does
SCUDA captures hand movements using a wearable IMU sensor (accelerometer + gyroscope + rotation) and automatically:
- Detects when a gesture starts and stops in real-time
- Records the full motion (100 frames per gesture at 100Hz)
- Encodes hand position automatically (up, down, left, right, etc.)
- Normalizes all gestures to the same length (100 timesteps) so they can be fed into ML models
- Lets users label each gesture with a name (e.g., "WAVE", "HELLO")
The output is a clean CSV dataset where each row is one labeled gesture ready for training.
How we built it
We created a three-part system:
Part 1: Hardware Layer — A tiny microcontroller (XIAO ESP32-C3) connected to an IMU sensor reads motion data at 100Hz and streams it over USB.
Part 2: Segmentation — We wrote code that automatically figures out "is the user gesturing right now?" by looking at how fast the hand is rotating. Once it detects motion, it records frames until motion stops.
Part 3: Data Processing — A Python script reads the raw recordings and:
- Filters out noise and failed attempts
- Stretches/shrinks each gesture to exactly 100 timesteps (using linear interpolation)
- Lets the user label each one interactively
- Outputs a clean, normalized dataset
Challenges we ran into
Challenge 1: False Positives from Noise
The problem: The gyroscope is sensitive, so small tremors (hand jitter) were triggering fake gesture detections. We'd get a lot of false positives.
How we fixed it: We required the sensor to stay "active" (moving) for at least 8 frames (~80ms) before committing to a gesture. We also increased the motion threshold from 0.3 to 0.6 rad/s. Dropping false positives significantly.
Challenge 2: Hardware Connection Kept Failing
The problem: The IMU sensor randomly stopped responding ("Failed to find BNO08x chip!"). Very frustrating when testing.
Root causes we found:
- The board got moisture on it (user's sweat from testing!)
- Loose wires on the I2C connection
- Code we wrote was too complex and crashed before initializing the sensor
How we fixed it: Dried the board, made sure wires were firmly seated, and simplified the firmware. Also wrote a quick I2C scanner to diagnose connection issues.
Challenge 3: Gestures Have Different Lengths
The problem: Some gestures last 150ms, others last 2500ms. ML models need fixed-size inputs. We can't just pad with zeros.
How we fixed it: We used linear interpolation to resample every gesture to exactly 100 timesteps. This stretches short gestures and compresses long ones while preserving the motion pattern. All 13 sensor channels get resampled independently.
Challenge 4: Tracking Hand Position Without Markers
The problem: How do we know if the hand moved "left" or "right" without external cameras or motion markers?
How we fixed it: We calculate hand position from the direction of acceleration. If the hand accelerates leftward, we encode that as "LEFT_MIDDLE" (or other discrete positions). This doesn't require calibration—it works from any starting position because we only care about the direction, not the absolute location.
Accomplishments we're proud of
Built a complete end-to-end pipeline — Most projects skip data engineering, but we made the entire capture → normalize → label workflow actually work.
Automatic gesture boundary detection — The firmware figures out when gestures start and stop without any pre-trained models, just physics.
Zero-calibration hand position encoding — Hand position is automatically calculated from motion direction. No special setup needed.
Eliminated false positives — By requiring sustained motion (8 frames), we reduced noise detections.
Normalizes variable-length data — Every gesture becomes a consistent 100 × 13 data matrix ready for ML, regardless of how fast or slow it was performed.
What we learned
Data engineering is the hard part — We spent way more time on segmentation, resampling, and filtering than we would on training any model. Real ML projects are 80% data work.
Simple beats complex — Linear interpolation is better than fancy spline fitting for resampling time series. It's simple, it works, and it's interpretable.
State machines prevent bugs — By explicitly drawing out the "when do we transition between states?", we avoided off-by-one errors that would lose data.
Hardware debugging is real — Not everything is a software problem. Moisture and loose connections taught us to think physically.
Hysteresis matters — Requiring sustained motion (not just one spike) eliminates noise without adding latency. It's a general pattern worth reusing.
What's next
- Train ML models (Random Forest, LSTM, etc.) on the normalized dataset
- Deploy real-time inference on the device itself
- Expand to 20+ gestures for practical sign language
- Test across multiple users and see if the model generalizes
- Eventually build a real-time sign-to-speech application
Technical Stack
- Hardware: Seeed XIAO ESP32-C3 + Adafruit BNO08x IMU
- Firmware: Arduino C++ via PlatformIO
- Data Processing: Python, NumPy, Pandas
- Development: VS Code, Git
Summary
We built the data infrastructure for sign language recognition. The hard part wasn't training a model—it was reliably capturing gestures, filtering noise, normalizing variable-length data, and creating a pipeline users could actually interact with. This is what our data science analysis was.
Log in or sign up for Devpost to join the conversation.