Inspiration

https://www.youtube.com/watch?v=ci-yB6EgVW4

What it does

Cadence is a hybrid glove + camera MIDI controller that turns real hand motion into expressive music control.

  • Glove sensors (ESP32 over serial/Bluetooth): four flex sensors (pointer/middle/ring/pinky), a thumb touch/pressure ADC, hall sensors, and an IMU (accel + gyro).
  • Webcam hand tracking (MediaPipe): tracks a single hand and estimates finger bend + hand position.
  • Sensor fusion → performance controls:

    • Uses velocity-based finger bend detection (fast vs slow EMA) to trigger note-on/off naturally (like “pluck” gestures instead of static thresholds).
    • Thumb ADC continuously maps to MIDI CC27 (a smooth, pressure-like knob).
    • Madgwick-filtered IMU pitch maps to MIDI CC28, and also biases octave selection so tilting your wrist shifts the register musically.
    • Hand X/Y position maps to CC25/CC26 for spatial/FX control.
    • Hall sensors can act like a sustain/pedal (CC64) and the IMU axes can drive extra expression (e.g., CC10 pan, CC11 expression).
  • Evolving harmonic bed: runs a Markov / chord-library chord pad on MIDI channel 2, while your finger gestures play melody notes on MIDI channel 1.

Result: with one hand you can perform melodies, filter sweeps, modulation, expression, and harmonic movement, all with continuous control and a “gesture velocity” feel.
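
The velocity-based trigger described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `BendTrigger`, the smoothing factors, and the thresholds are all assumptions, and bend values are assumed to be normalized to 0..1.

```python
from dataclasses import dataclass


@dataclass
class BendTrigger:
    """Fires note events from bend velocity (fast EMA minus slow EMA)."""

    fast_alpha: float = 0.5      # assumed smoothing factor for the fast EMA
    slow_alpha: float = 0.05     # assumed smoothing factor for the slow EMA
    on_threshold: float = 0.15   # hypothetical velocity needed for note-on
    off_threshold: float = 0.03  # hypothetical velocity below which the note ends
    fast: float = 0.0
    slow: float = 0.0
    active: bool = False
    primed: bool = False

    def update(self, bend: float):
        """Feed one normalized bend sample; return 'on', 'off', or None."""
        if not self.primed:
            # Seed both averages with the first sample to avoid a false trigger.
            self.fast = self.slow = bend
            self.primed = True
            return None
        self.fast += self.fast_alpha * (bend - self.fast)
        self.slow += self.slow_alpha * (bend - self.slow)
        velocity = self.fast - self.slow  # large only while the finger moves quickly
        if not self.active and velocity > self.on_threshold:
            self.active = True
            return "on"
        if self.active and velocity < self.off_threshold:
            self.active = False
            return "off"
        return None
```

With these assumed values, a quick flick produces a note-on followed by a note-off once the slow average catches up, while a slow curl of the same size never crosses the on-threshold.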

How we built it

  • Hardware / firmware

    • Built a glove around an ESP32 that streams sensor data as compact CSV lines (supporting legacy and newer formats).
    • Integrated flex sensors, thumb pressure/touch, optional hall sensors, and optional IMU.
  • Python real-time engine

    • A dedicated FlexReader thread reads and parses serial/Bluetooth data robustly (multiple regex formats, reconnection handling).
    • MediaPipe Hand Landmarker runs from the webcam to estimate finger bends and hand position in real-time.
    • A Madgwick AHRS filter fuses accel + gyro to get stable wrist pitch without needing a magnetometer.
    • Sensor fusion logic blends glove and camera signals and uses EMA-based velocity detection to produce musical triggers.
  • MIDI output

    • Uses mido + rtmidi to create a virtual MIDI port (“GestureHand MIDI”), so any DAW/synth can receive notes/CC.
    • Runs a background chord thread that voices chord tones with slight humanized timing/velocity.
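
As an illustration of parsing more than one CSV line format, here is a small sketch. The actual ESP32 line layouts are not given in this writeup, so both formats below (a 5-field "legacy" line and an 11-field line with IMU values) and the names `parse_line` and `FORMATS` are hypothetical.

```python
import re
from typing import Optional

_NUM = r"-?\d+(?:\.\d+)?"


def _csv_pattern(n_fields: int):
    """Build a regex matching exactly n comma-separated numbers."""
    return re.compile(r"^\s*" + ",".join([f"({_NUM})"] * n_fields) + r"\s*$")


# Hypothetical layouts, newest first:
#   imu:    flex1..flex4, thumb, ax, ay, az, gx, gy, gz
#   legacy: flex1..flex4, thumb
FORMATS = [("imu", _csv_pattern(11)), ("legacy", _csv_pattern(5))]


def parse_line(line: str) -> Optional[dict]:
    """Return a parsed frame, or None for partial or corrupt lines."""
    for name, pattern in FORMATS:
        match = pattern.match(line)
        if match is None:
            continue
        values = [float(v) for v in match.groups()]
        frame = {"format": name, "flex": values[0:4], "thumb": values[4]}
        if name == "imu":
            frame["accel"] = values[5:8]
            frame["gyro"] = values[8:11]
        return frame
    return None
```

Returning `None` for anything that does not match lets a reader thread drop truncated lines (common right after a reconnect) instead of raising.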

Challenges we ran into

  • Noisy, real-world sensor signals: flex sensors drift and jitter, and camera tracking can drop frames or lose the hand.
  • Latency + stability tradeoffs: pushing responsiveness without jitter required careful smoothing (slow EMA) and attack detection (fast EMA).
  • Different “directions” per sensor: some flex sensors increase voltage when bent while others decrease, so the ring-finger reading had to be inverted.
  • IMU alignment + drift: getting a usable pitch signal required calibration, deadbanding gyro noise, and smoothing to avoid twitchy octave jumps.
  • Robust parsing + formats: supporting evolving ESP32 message formats while staying backward compatible.
  • Musical feel: avoiding machine-gun MIDI spam meant deadbands for CC updates, auto note-off aging, and careful thresholds for velocity triggers.
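
Two of the guardrails mentioned above, CC deadbands and auto note-off aging, might look something like this. It is a sketch under assumptions: the class names, the deadband of 2 CC steps, and the 2-second maximum note age are illustrative, not values from the project.

```python
class CCDeadband:
    """Suppress CC updates that differ too little from the last value sent."""

    def __init__(self, deadband: int = 2):  # assumed width in CC steps
        self.deadband = deadband
        self.last_sent = {}

    def filter(self, control: int, value: float):
        """Return the clamped CC value to send, or None to skip this update."""
        value = max(0, min(127, int(round(value))))
        last = self.last_sent.get(control)
        if last is not None and abs(value - last) < self.deadband:
            return None
        self.last_sent[control] = value
        return value


class NoteAger:
    """Track sounding notes and report those held longer than max_age_s."""

    def __init__(self, max_age_s: float = 2.0):  # assumed timeout
        self.max_age_s = max_age_s
        self.started = {}

    def note_on(self, note: int, now: float) -> None:
        self.started[note] = now

    def note_off(self, note: int) -> None:
        self.started.pop(note, None)

    def expired(self, now: float) -> list:
        """Remove and return notes that should be force-released."""
        old = [n for n, t in self.started.items() if now - t >= self.max_age_s]
        for n in old:
            del self.started[n]
        return old
```

The caller would send a note-off for every note returned by `expired()`, so a missed release gesture cannot leave a note stuck.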

Accomplishments that we're proud of

  • Natural gesture-to-note triggering using bend velocity, not just static thresholds—so it feels more like an instrument.
  • True hybrid tracking: glove sensors keep working when vision fails, and camera data adds nuance when available.
  • Expressive control mapping: thumb pressure (CC27) + wrist pitch (CC28) + hand XY (CC25/CC26) + sustain (CC64) together give a full performance surface.
  • Built-in harmonic engine: the Markov/chord pad makes performances sound “complete” even as a solo player.
  • Plug-and-play MIDI: the virtual port works with any DAW or synth that accepts MIDI input, with no special integration.
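
For readers curious how a Markov chord pad with humanized voicing can work, here is a minimal sketch. The chord library, the transition probabilities, and the humanization ranges are invented for illustration and are not the project's actual tables.

```python
import random

# Hypothetical chord library: name -> MIDI note numbers (C major area).
CHORDS = {
    "C": [48, 52, 55],
    "Am": [45, 48, 52],
    "F": [41, 45, 48],
    "G": [43, 47, 50],
}

# Hypothetical transition table: current chord -> {next chord: probability}.
TRANSITIONS = {
    "C": {"Am": 0.4, "F": 0.3, "G": 0.3},
    "Am": {"F": 0.5, "G": 0.3, "C": 0.2},
    "F": {"G": 0.5, "C": 0.4, "Am": 0.1},
    "G": {"C": 0.7, "Am": 0.3},
}


def next_chord(current: str, rng: random.Random) -> str:
    """Pick the next chord using the weighted transitions of the current one."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def humanize(notes, rng: random.Random, base_velocity: int = 70):
    """Return (note, velocity, delay_seconds) tuples with slight random offsets."""
    voiced = []
    for note in notes:
        velocity = max(1, min(127, base_velocity + rng.randint(-8, 8)))
        delay = rng.uniform(0.0, 0.02)  # assumed: up to 20 ms of strum spread
        voiced.append((note, velocity, delay))
    return voiced
```

A background thread could call `next_chord` every bar and send each voiced note on channel 2 after its delay; passing in a seeded `random.Random` keeps a run reproducible.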

What we learned

  • Filtering matters more than raw sensors: EMAs + deadbands can make cheap sensors feel “premium.”
  • Calibration is UX: adding an interactive OpenCV calibration wizard is the difference between “demo” and “instrument.”
  • Fusion beats perfection: combining two imperfect systems (glove + camera) produces a result more reliable than either alone.
  • Music is about constraints: mapping wrist pitch to a probabilistic octave choice creates variation while staying musically coherent.
  • Real-time systems need guardrails: auto-release, reconnection handling, and fallback modes prevent live-performance failure.
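
One way to turn wrist pitch into a probabilistic octave choice is to weight each octave by its distance from the tilt position. The sketch below is an assumption about how this could be done, not the project's implementation: the Gaussian weighting, the ±60° range, the 25° spread, and the function names are all illustrative.

```python
import math
import random


def octave_weights(pitch_deg, octaves=(3, 4, 5), spread_deg=25.0, max_pitch_deg=60.0):
    """Return one probability per octave, peaked where the wrist is pointing."""
    clamped = max(-max_pitch_deg, min(max_pitch_deg, pitch_deg))
    scale = (len(octaves) - 1) / (2 * max_pitch_deg)
    center = (clamped + max_pitch_deg) * scale  # position along the octave list
    sigma = spread_deg * scale                  # spread in the same units
    raw = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(len(octaves))]
    total = sum(raw)
    return [w / total for w in raw]


def pick_octave(pitch_deg, rng, octaves=(3, 4, 5)):
    """Sample an octave; tilting up makes higher octaves more likely."""
    return rng.choices(octaves, weights=octave_weights(pitch_deg, octaves), k=1)[0]
```

With the wrist level, the middle octave dominates but neighbors still appear occasionally, which is where the variation comes from; at full tilt the choice is nearly deterministic.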

What's next for Cadence

  • Better calibration + profiles: save/load per-user sensor calibration and MIDI mapping presets.
  • More gestures: pinch, tap, and pose detection (e.g., “fist = mute,” “pinch = latch chord,” “two-finger point = arpeggiate”).
  • Chord interaction: let gestures influence harmony directly (e.g., thumb pressure selects tension, wrist roll selects inversion).
  • Timing + quantization options: optional beat-synced triggering for tighter DAW integration.
  • Wireless polish: smoother Bluetooth pairing + packet timestamping for consistent latency.
  • Packaging: a simple UI for device status, mapping, and synth presets—so anyone can play without editing code.
