Inspiration

We were inspired by the desire to make gaming more immersive and accessible through natural hand gestures. Traditional keyboard and mouse controls can feel disconnected from the driving experience, so we wanted to create an intuitive hand-tracking system that lets players literally "hold the wheel" using just their webcam. The idea of transforming your hands into a virtual steering wheel—where you can steer by tilting your hands and brake with an open palm—felt like a natural evolution of how humans interact with vehicles.

What it does

Drivable is a hand-tracking controller system for Roblox driving games that uses computer vision to translate hand gestures into keyboard inputs. The system detects both hands in real-time through your webcam and maps their movements to game controls:

  1. Steering: Hold both hands in front of the camera as if gripping a steering wheel. Raising your left hand steers left, raising your right hand steers right, and keeping both hands level drives straight. The system calculates the angle between your hands to provide proportional steering control.
  2. Acceleration: Engages automatically whenever both hands are detected, mimicking holding the wheel.
  3. Braking: Show an open palm with either hand to activate the brake.
  4. Start/Stop Control: Press 's' to start detection and 'x' to stop, so you always control when the system is active.

The application displays real-time visual feedback showing hand landmarks, steering angle, brake status, and gesture classifications directly on the camera feed.
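To make that mapping concrete, here is a minimal sketch of the decision logic, assuming a simple HandState structure and a signed steering angle that is positive when the left hand is higher; the names and conventions are illustrative, not the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class HandState:
    detected: bool    # was this hand found in the frame?
    open_palm: bool   # is it showing an open palm?

def choose_keys(left: HandState, right: HandState, angle_deg: float,
                dead_zone_deg: float = 12.5) -> set[str]:
    """Map the detected hand state to the WASD keys sent to the game.

    angle_deg is the signed wheel tilt: positive when the left hand is higher.
    """
    keys: set[str] = set()
    if not (left.detected and right.detected):
        return keys                        # no "wheel" grip -> release everything

    keys.add('w')                          # both hands visible -> accelerate
    if left.open_palm or right.open_palm:
        keys.add('s')                      # open palm on either hand -> brake
    if angle_deg > dead_zone_deg:
        keys.add('a')                      # left hand raised -> steer left
    elif angle_deg < -dead_zone_deg:
        keys.add('d')                      # right hand raised -> steer right
    return keys
```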

How we built it

We built Drivable using Python with several key technologies:

  1. MediaPipe Hands: Google's ML solution for real-time hand tracking and landmark detection
  2. OpenCV: For webcam access, image processing, and visual overlay rendering
  3. pynput: To simulate keyboard inputs (WASD keys) that Roblox games recognize
  4. NumPy: For efficient numerical computations
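For readers unfamiliar with the stack, here is a rough sketch of how these pieces fit together: a webcam loop that feeds frames to MediaPipe Hands and draws the detected landmarks. The confidence values and window name are illustrative, not necessarily the ones we shipped.

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2,
                                 min_detection_confidence=0.7,
                                 min_tracking_confidence=0.5)
drawer = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                    # mirror the image so it behaves like a wheel
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        for lm in results.multi_hand_landmarks:
            drawer.draw_landmarks(frame, lm, mp.solutions.hands.HAND_CONNECTIONS)
    cv2.imshow("Drivable", frame)
    if cv2.waitKey(1) & 0xFF == ord('x'):         # 'x' quits this demo loop
        break

cap.release()
cv2.destroyAllWindows()
```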

The architecture consists of three modular components:

  1. HandDetector: Handles MediaPipe integration, gesture recognition (open palm detection, fist detection), and angle calculations based on hand landmark positions
  2. KeyboardController: Manages keyboard input simulation with proper state tracking to prevent key conflicts
  3. HandSteeringApp: Coordinates the camera feed, hand detection pipeline, and keyboard control with smoothing algorithms for stable steering
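As an example of the state tracking inside the KeyboardController component, here is a simplified sketch using pynput; the class and method names are stand-ins rather than the actual implementation:

```python
from pynput.keyboard import Controller

class KeySender:
    """Press/release WASD keys while remembering what is already held down."""

    def __init__(self):
        self._kb = Controller()
        self._held: set[str] = set()

    def apply(self, wanted: set[str]) -> None:
        for key in wanted - self._held:      # newly requested keys -> press once
            self._kb.press(key)
        for key in self._held - wanted:      # no longer requested -> release
            self._kb.release(key)
        self._held = set(wanted)

    def release_all(self) -> None:
        """Cleanup hook so stopping detection never leaves a key stuck down."""
        self.apply(set())
```

Calling apply() once per frame with the currently wanted key set keeps presses and releases paired, which is what prevents stuck keys.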

We implemented a height-comparison algorithm that calculates the steering angle based on the vertical difference between your hands, with a configurable threshold (12.5°) for the "straight ahead" dead zone. A moving average filter smooths the steering input to prevent jittery movements.
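A minimal sketch of that calculation, assuming the wrist landmarks are compared and remembering that MediaPipe's normalized y axis points downward; the helper names are ours, and only the 12.5° dead zone and 10-frame window come from the description above:

```python
import math
from collections import deque

STRAIGHT_DEAD_ZONE_DEG = 12.5
SMOOTHING_WINDOW = 10
recent_angles = deque(maxlen=SMOOTHING_WINDOW)   # moving-average buffer

def steering_angle(left_wrist, right_wrist) -> float:
    """Signed 'wheel tilt' in degrees; positive when the left hand is higher."""
    dy = right_wrist.y - left_wrist.y            # y grows downward, so left higher -> positive
    dx = abs(right_wrist.x - left_wrist.x) or 1e-6
    return math.degrees(math.atan2(dy, dx))

def smoothed_angle(raw_angle: float) -> float:
    """Average the last few frames to suppress jitter."""
    recent_angles.append(raw_angle)
    return sum(recent_angles) / len(recent_angles)

def steering_command(angle: float) -> str:
    if abs(angle) <= STRAIGHT_DEAD_ZONE_DEG:
        return "straight"
    return "left" if angle > 0 else "right"
```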

Challenges we ran into

  1. Gesture Recognition Accuracy: Getting reliable open palm detection required careful tuning of MediaPipe's confidence thresholds and landmark comparisons. We had to account for different hand orientations and lighting conditions (a landmark-based sketch follows this list).
  2. Steering Calibration: Finding the right balance between responsive steering and stability was tricky. We experimented with various angle thresholds and smoothing window sizes before settling on a 10-frame moving average.
  3. Hand Positioning Consistency: Users naturally move their hands slightly, so we needed robust logic to determine which hand is "left" vs "right" based on x-position rather than MediaPipe's handedness classification.
  4. Keyboard Input Timing: Ensuring smooth key press/release cycles without conflicts or stuck keys required careful state management and cleanup procedures.
  5. Real-time Performance: Balancing detection accuracy with frame rate to maintain responsive control while running MediaPipe's neural networks.
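As referenced in challenge 1, here is a hedged sketch of the kind of landmark comparisons involved (it also shows the x-position trick from challenge 3); the exact landmarks and thresholds in Drivable may differ:

```python
FINGER_TIPS = (8, 12, 16, 20)    # index, middle, ring, pinky fingertip landmark indices
FINGER_PIPS = (6, 10, 14, 18)    # the corresponding PIP joints

def is_open_palm(landmarks) -> bool:
    """Treat the hand as an open palm when all four fingertips sit above their PIP joints.

    landmarks is MediaPipe's 21-point list; y is normalized image space, smaller = higher,
    so this heuristic assumes a roughly upright hand.
    """
    return all(landmarks[tip].y < landmarks[pip].y
               for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))

def assign_left_right(hand_a, hand_b):
    """Decide 'left' vs 'right' purely from horizontal position (landmark 0 is the wrist)."""
    return (hand_a, hand_b) if hand_a[0].x < hand_b[0].x else (hand_b, hand_a)
```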

Accomplishments that we're proud of

  1. Successfully created an intuitive, natural control scheme that feels like actually driving
  2. Achieved smooth, responsive steering with proportional control based on hand angle
  3. Built a modular, well-documented codebase that separates concerns cleanly
  4. Implemented robust gesture recognition that works reliably across different users and environments
  5. Created comprehensive visual feedback that shows users exactly what the system is detecting
  6. Added proper start/stop controls so users can easily toggle the system on and off
  7. Made the system work seamlessly with Roblox without requiring any game modifications

What we learned

  1. How to work with MediaPipe's hand tracking API and interpret landmark coordinates for gesture recognition
  2. The importance of smoothing and filtering in real-time control systems to prevent jittery inputs
  3. How coordinate systems and geometry (atan2, angle calculations) can be used to create intuitive control mappings
  4. Techniques for simulating keyboard inputs programmatically across different operating systems
  5. The value of visual debugging—overlaying detection data on the camera feed was crucial for development and user feedback
  6. State management strategies for handling asynchronous input systems (hand detection vs keyboard output)
  7. How to balance detection sensitivity with accuracy through parameter tuning

What's next for Drivable

  1. Advanced Gesture Controls: Add more gestures like hand rotation for turn signals, pinch gestures for gear shifting, or finger counting for speed presets
  2. Calibration System: A personal calibration mode that adapts to individual hand sizes and positioning preferences

Built With

python, mediapipe, opencv, pynput, numpy