Inspiration
https://www.youtube.com/watch?v=ci-yB6EgVW4
What it does
Cadence is a hybrid glove + camera MIDI controller that turns real hand motion into expressive music control.
- Glove sensors (ESP32 over serial/Bluetooth): reads four flex sensors (index/middle/ring/pinky), a thumb touch/pressure ADC channel, optional Hall-effect sensors, and an optional IMU (accelerometer + gyroscope).
- Webcam hand tracking (MediaPipe): tracks a single hand and estimates finger bend + hand position.
Sensor fusion → performance controls:
- Uses velocity-based finger bend detection (fast vs slow EMA) to trigger note-on/off naturally (like “pluck” gestures instead of static thresholds).
- Thumb ADC continuously maps to MIDI CC27 (a smooth, pressure-like knob).
- Madgwick-filtered IMU pitch maps to MIDI CC28, and also biases octave selection so tilting your wrist shifts the register musically.
- Hand X/Y position maps to CC25/CC26 for spatial/FX control.
- Hall sensors can act like a sustain/pedal (CC64) and the IMU axes can drive extra expression (e.g., CC10 pan, CC11 expression).
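As a rough illustration of the continuous mappings above, the sketch below scales a raw reading into the 0..127 range of a MIDI Control Change and packs the three message bytes. The function names, the 0..4095 raw range (a 12-bit ADC), and the calibration endpoints are assumptions made for the example, not values taken from the project.

```python
def normalize(raw, lo, hi, invert=False):
    """Scale a raw reading to 0.0..1.0 using calibrated endpoints (assumed)."""
    if hi == lo:
        return 0.0
    x = (raw - lo) / (hi - lo)
    x = min(1.0, max(0.0, x))          # clamp out-of-range readings
    return 1.0 - x if invert else x


def cc_bytes(channel, controller, value01):
    """Build a 3-byte MIDI Control Change message; channel is 1..16."""
    status = 0xB0 | ((channel - 1) & 0x0F)   # 0xB0 = Control Change, channel 1
    return bytes([status, controller & 0x7F, round(value01 * 127)])


THUMB_CC = 27  # controller number named in the write-up

# Hypothetical mid-range thumb reading on an assumed 12-bit ADC.
msg = cc_bytes(1, THUMB_CC, normalize(2048, 0, 4095))
```

The same two helpers would cover the other controllers (CC25/CC26 for hand X/Y, CC28 for wrist pitch) by changing the controller number and the calibration range.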
Evolving harmonic bed: a background chord pad, driven by a Markov chain over a chord library, plays on MIDI channel 2 while your finger gestures play melody notes on MIDI channel 1.
Result: you can perform melodies, filter sweeps, modulation, expression, and harmonic movement using one hand—with continuous control and “gesture velocity” feel.
How we built it
Hardware / firmware
- Built a glove around an ESP32 that streams sensor data as compact CSV lines (supporting legacy and newer formats).
- Integrated flex sensors, thumb pressure/touch, optional hall sensors, and optional IMU.
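The actual wire format is not documented here, so the following is only a sketch of how one CSV line might be parsed while tolerating two layouts. The field order, the field counts (5 for a legacy line, 11 for a line with IMU data), and the `GloveFrame` and `parse_line` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GloveFrame:
    flex: Tuple[int, ...]                    # four flex readings
    thumb: int                               # thumb ADC reading
    imu: Optional[Tuple[float, ...]] = None  # ax, ay, az, gx, gy, gz (assumed order)


def parse_line(line: str) -> Optional[GloveFrame]:
    """Parse one CSV line; return None for anything malformed."""
    parts = [p.strip() for p in line.strip().split(",")]
    try:
        if len(parts) == 5:                  # assumed legacy layout
            vals = [int(p) for p in parts]
            return GloveFrame(tuple(vals[:4]), vals[4])
        if len(parts) == 11:                 # assumed newer layout with IMU
            ints = [int(p) for p in parts[:5]]
            imu = tuple(float(p) for p in parts[5:])
            return GloveFrame(tuple(ints[:4]), ints[4], imu)
    except ValueError:
        return None                          # non-numeric field
    return None                              # unknown field count
```

Returning `None` instead of raising keeps a reader loop alive when a line arrives truncated or corrupted, which is common on a serial link.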
Python real-time engine
- A dedicated FlexReader thread reads and parses serial/Bluetooth data robustly (multiple regex formats, reconnection handling).
- MediaPipe Hand Landmarker runs from the webcam to estimate finger bends and hand position in real-time.
- A Madgwick AHRS filter fuses accel + gyro to get stable wrist pitch without needing a magnetometer.
- Sensor fusion logic blends glove and camera signals and uses EMA-based velocity detection to produce musical triggers.
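A minimal sketch of the fast-versus-slow EMA idea, assuming bend values already normalized to 0..1: the fast average follows the finger closely, the slow one lags, and the gap between them approximates bend velocity. The class name, the smoothing factors, the thresholds, and the fixed blend weight in `fuse_bend` are illustrative guesses, not the project's tuned values.

```python
class BendTrigger:
    """Emit note events from bend velocity rather than absolute bend."""

    def __init__(self, fast_alpha=0.6, slow_alpha=0.1,
                 on_thresh=0.08, off_thresh=-0.04):
        self.fast_alpha = fast_alpha
        self.slow_alpha = slow_alpha
        self.on_thresh = on_thresh
        self.off_thresh = off_thresh
        self.fast = None
        self.slow = None
        self.active = False

    def update(self, bend):
        """Feed one sample; return 'note_on', 'note_off', or None."""
        if self.fast is None:                # first sample seeds both averages
            self.fast = self.slow = bend
            return None
        self.fast += self.fast_alpha * (bend - self.fast)
        self.slow += self.slow_alpha * (bend - self.slow)
        diff = self.fast - self.slow         # positive while bending quickly
        if not self.active and diff > self.on_thresh:
            self.active = True
            return "note_on"
        if self.active and diff < self.off_thresh:
            self.active = False
            return "note_off"
        return None


def fuse_bend(glove, camera, camera_weight=0.3):
    """Blend glove and camera bend; fall back to glove when vision is lost."""
    if camera is None:
        return glove
    return (1.0 - camera_weight) * glove + camera_weight * camera


trigger = BendTrigger()
samples = [0.1] * 5 + [0.9] * 20 + [0.1] * 5   # rest, quick bend, quick release
events = [trigger.update(s) for s in samples]
```

With these numbers a quick bend fires a single note-on and a quick release fires a single note-off, while a slow drift of the same size never crosses the threshold, which is the behavior a static threshold cannot give.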
MIDI output
- Uses mido + rtmidi to create a virtual MIDI port (“GestureHand MIDI”), so any DAW/synth can receive notes/CC.
- Runs a background chord thread that voices chord tones with slight humanized timing/velocity.
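The chord thread could be approximated along these lines: a small Markov table picks the next chord, and each chord tone gets a slightly randomized delay and velocity. The chord library, the transition weights, and the humanization ranges are invented for this example, and the sketch only produces event tuples rather than sending MIDI.

```python
import random

# Hypothetical chord library (MIDI note numbers) and transition weights.
CHORDS = {
    "C": [60, 64, 67],
    "Am": [57, 60, 64],
    "F": [53, 57, 60],
    "G": [55, 59, 62],
}
TRANSITIONS = {
    "C": {"Am": 0.4, "F": 0.4, "G": 0.2},
    "Am": {"F": 0.6, "G": 0.4},
    "F": {"G": 0.6, "C": 0.4},
    "G": {"C": 0.7, "Am": 0.3},
}


def next_chord(current, rng):
    """Pick the next chord name from the current chord's weighted row."""
    names = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def humanize(notes, rng, max_delay_s=0.02, base_velocity=70, spread=8):
    """Return (delay_seconds, note, velocity) tuples sorted by delay."""
    events = []
    for note in notes:
        delay = rng.uniform(0.0, max_delay_s)
        velocity = base_velocity + rng.randint(-spread, spread)
        events.append((delay, note, max(1, min(127, velocity))))
    return sorted(events)


rng = random.Random(0)          # seeded so a run is repeatable
chord = "C"
progression = [chord]
for _ in range(8):
    chord = next_chord(chord, rng)
    progression.append(chord)
voicing = humanize(CHORDS[progression[-1]], rng)
```

A real chord thread would sleep for each delay and then send the note-on on channel 2; keeping the random source injectable makes the progression reproducible during testing.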
Challenges we ran into
- Noisy, real-world sensor signals: flex sensors drift and jitter, and camera tracking can drop frames or lose the hand.
- Latency + stability tradeoffs: pushing responsiveness without jitter required careful smoothing (slow EMA) and attack detection (fast EMA).
- Different “directions” per sensor: some flex sensors increase voltage when bent and others decrease, so the ring finger's reading had to be inverted.
- IMU alignment + drift: getting a usable pitch signal required calibration, deadbanding gyro noise, and smoothing to avoid twitchy octave jumps.
- Robust parsing + formats: supporting evolving ESP32 message formats while staying backward compatible.
- Musical feel: avoiding machine-gun MIDI spam meant deadbands for CC updates, auto note-off aging, and careful thresholds for velocity triggers.
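Two of the guardrails mentioned above, CC deadbands and auto note-off aging, can be sketched as small helpers. The class names, the 2-step deadband, and the 2-second maximum note age are assumptions chosen for illustration.

```python
class CCDeadband:
    """Suppress CC updates that differ too little from the last value sent."""

    def __init__(self, min_change=2):
        self.min_change = min_change
        self.last = {}                       # controller -> last sent value

    def should_send(self, controller, value):
        prev = self.last.get(controller)
        if prev is not None and abs(value - prev) < self.min_change:
            return False
        self.last[controller] = value
        return True


class NoteAger:
    """Track sounding notes and report those held longer than max_age_s."""

    def __init__(self, max_age_s=2.0):
        self.max_age_s = max_age_s
        self.started = {}                    # note -> start time in seconds

    def note_on(self, note, now):
        self.started[note] = now

    def note_off(self, note):
        self.started.pop(note, None)

    def expired(self, now):
        """Return and forget notes that should be force-released."""
        old = [n for n, t in self.started.items()
               if now - t >= self.max_age_s]
        for n in old:
            del self.started[n]
        return old


db = CCDeadband()
sent = [db.should_send(27, v) for v in [64, 65, 64, 66, 66, 70]]
```

In a live loop the timestamps would come from `time.monotonic()`, and each note returned by `expired` would get an explicit note-off so a missed release gesture cannot leave a note hanging.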
Accomplishments that we're proud of
- Natural gesture-to-note triggering using bend velocity, not just static thresholds—so it feels more like an instrument.
- True hybrid tracking: glove sensors keep working when vision fails, and camera data adds nuance when available.
- Expressive control mapping: thumb pressure (CC27) + wrist pitch (CC28) + hand XY (CC25/26) + sustain (CC64) gives a full performance surface.
- Built-in harmonic engine: the Markov/chord pad makes performances sound “complete” even as a solo player.
- Plug-and-play MIDI: the virtual port works with any DAW or synth that accepts MIDI input, with no special integration.
What we learned
- Filtering matters more than raw sensors: EMAs + deadbands can make cheap sensors feel “premium.”
- Calibration is UX: adding an interactive OpenCV calibration wizard is the difference between “demo” and “instrument.”
- Fusion beats perfection: combining two imperfect systems (glove + camera) produces a result more reliable than either alone.
- Music is about constraints: mapping wrist pitch to a probabilistic octave choice creates variation while staying musically coherent.
- Real-time systems need guardrails: auto-release, reconnection handling, and fallback modes prevent live-performance failure.
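One way to picture the probabilistic octave choice is to let wrist pitch move the center of a bell-shaped weighting over a few octaves and then sample from it. The octave range, the 60-degree tilt limit, the Gaussian weighting, and the function names are assumptions for this sketch; the project may weight octaves differently.

```python
import math
import random


def octave_weights(pitch_deg, octaves=(3, 4, 5, 6),
                   max_tilt_deg=60.0, sharpness=1.5):
    """Weight each octave by its distance from a tilt-dependent center."""
    t = max(-1.0, min(1.0, pitch_deg / max_tilt_deg))   # clamp to -1..1
    center = octaves[0] + (t + 1.0) / 2.0 * (octaves[-1] - octaves[0])
    return [math.exp(-sharpness * (o - center) ** 2) for o in octaves]


def pick_octave(pitch_deg, rng, octaves=(3, 4, 5, 6)):
    """Sample one octave; tilting up makes higher octaves more likely."""
    weights = octave_weights(pitch_deg, octaves)
    return rng.choices(octaves, weights=weights, k=1)[0]


rng = random.Random(1)
level_choice = pick_octave(0.0, rng)    # level wrist: middle octaves favored
```

Because neighboring octaves keep some probability, a steady tilt still produces occasional register changes, while the weighting keeps most notes near the register the wrist is pointing at.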
What's next for Cadence
- Better calibration + profiles: save/load per-user sensor calibration and MIDI mapping presets.
- More gestures: pinch, tap, and pose detection (e.g., “fist = mute,” “pinch = latch chord,” “two-finger point = arpeggiate”).
- Chord interaction: let gestures influence harmony directly (e.g., thumb pressure selects tension, wrist roll selects inversion).
- Timing + quantization options: optional beat-synced triggering for tighter DAW integration.
- Wireless polish: smoother Bluetooth pairing + packet timestamping for consistent latency.
- Packaging: a simple UI for device status, mapping, and synth presets—so anyone can play without editing code.