Inspiration
Hands are busy, eyes must stay focused, but information is digital. During presentations, hospital work, and hands-on tasks, stopping to touch a mouse or keyboard breaks focus and flow, and may be dangerous or infeasible. Touch interfaces require stopping work. Voice fails in noisy environments. Controllers are impractical when hands are full. Traditional mice demand repetitive, forceful movements that can lead to repetitive strain injuries such as mouse tendinitis, and they exclude people with motor disabilities.
We realized that humans already know how to interact naturally, such as pointing, pinching, and using gestures, so why force humans to adapt to machines? Instead, Pineapple Vision Pro adapts machines to people.
What it does
Pineapple Vision Pro turns any camera into a gesture-control interface driven by natural mid-air hand movements: no touch, voice, or physical controller required.
It is useful across many environments: presentations, where it eliminates awkward pauses, and sterile settings like hospitals, where minimizing contamination is critical.
Pineapple Vision Pro supports continuous, natural gestures designed for low-effort control. By reducing force and repetition, it lets users with prosthetic hands, tremor, limited dexterity, or fatigue interact comfortably through mid-air gestures.
Features:
Point – Move the cursor
Pinch – Click
Two-finger swipe – Swipe left/right
Two-hand finger stretch – Zoom in/out
Rotate wrist – Scroll up/down
Clap – Toggle the laser pointer on/off
How we built it
Step 1: Hand Tracking - MediaPipe processes each frame and gives us 21 3D landmark points per hand (fingertips, knuckles, wrist, etc.).
Step 2: Gesture Recognition - We analyze the landmark positions to detect gestures:
- Finger counting
- Distance measurement
- Hand velocity
- Angle calculation
- Two-hand tracking
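As an illustration of the landmark math behind these checks, a pinch can be detected by thresholding the distance between MediaPipe's thumb-tip (index 4) and index-fingertip (index 8) landmarks. This sketch uses plain (x, y, z) tuples in place of MediaPipe's landmark objects, and the 0.05 threshold is an assumed value to tune per camera:

```python
import math

# MediaPipe hand-landmark indices (wrist is 0; these are fixed by the model).
THUMB_TIP = 4
INDEX_TIP = 8

def is_pinch(landmarks, threshold=0.05):
    """Detect a pinch: thumb tip and index fingertip close together.

    landmarks: sequence of 21 (x, y, z) tuples in normalized image
    coordinates, as MediaPipe Hands produces. The threshold is an
    illustrative assumption, not a value from the project.
    """
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) < threshold
```

The same pattern (pick landmarks, compute a distance, angle, or velocity, compare to a threshold) covers the other checks in the list above.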
Step 3: AprilTag Screen Calibration - We use 4 AprilTag markers placed at the screen corners (IDs 0, 1, 2, 3). The system:
- Detects all 4 tags in the camera view
- Computes a homography matrix that transforms the camera's trapezoidal view of the screen into rectangular screen coordinates
- Maps any hand position in camera space to exact pixel coordinates on screen
- For multi-monitor setups, we calibrate each screen independently and detect which display you're pointing at based on cursor trajectory.
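The homography step above can be sketched with NumPy alone (in production, OpenCV's cv2.findHomography does the same job). Given four detected tag positions and the four screen corners, a direct linear solve yields the 3x3 matrix; the coordinates below are made-up examples, not calibration data from the project:

```python
import numpy as np

def compute_homography(src, dst):
    """Solve for the 3x3 homography mapping 4 src points to 4 dst points.

    Builds the standard direct-linear-transform system with h22 fixed
    to 1, which needs the 4 correspondences to be non-degenerate.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def map_point(H, pt):
    """Map a camera-space point to screen pixels via the homography."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Assumed example: tag centers form a trapezoid in the camera image,
# screen is 1920x1080.
camera_corners = [(100, 50), (540, 60), (600, 420), (80, 400)]
screen_corners = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]
H = compute_homography(camera_corners, screen_corners)
```

After this, any fingertip position in camera coordinates runs through map_point to become a cursor position.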
Step 4: System Control - PyAutoGUI executes the actions:
- Move cursor to calculated screen position
- Send click/drag commands
- Trigger scroll and swipe events
- Switch between monitors
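A minimal sketch of that dispatch layer, under stated assumptions: the gesture names, the screen size, and the swipe-as-hotkey mapping are all illustrative, not the project's actual bindings. To stay testable without a display, the dispatcher returns the (function name, args) pair it would invoke; the real loop would look those names up on the pyautogui module (moveTo, click, scroll, hotkey are real PyAutoGUI functions) and query pyautogui.size() for the screen bounds.

```python
def clamp_to_screen(x, y, width=1920, height=1080):
    # Keep the homography-mapped position inside screen bounds before
    # handing it to pyautogui.moveTo. 1920x1080 is an assumed example.
    return (min(max(int(round(x)), 0), width - 1),
            min(max(int(round(y)), 0), height - 1))

def dispatch(gesture, *args):
    """Translate a recognized gesture into the PyAutoGUI call to make.

    Returns (function_name, args) so the mapping is testable headless.
    Gesture names mirror the feature list; bindings are assumptions.
    """
    if gesture == "point":                      # move the cursor
        return ("moveTo", clamp_to_screen(*args))
    if gesture == "pinch":                      # click
        return ("click", ())
    if gesture == "rotate_wrist":               # scroll; sign = direction
        return ("scroll", args)
    if gesture == "two_finger_swipe":           # e.g. slide navigation
        return ("hotkey", ("ctrl", "right" if args[0] > 0 else "left"))
    raise ValueError(f"unknown gesture: {gesture}")
```

Keeping the gesture-to-action table in one place like this makes it easy to rebind gestures per application.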
Challenges we ran into
Initially, we used a fist gesture to rotate the model, inspired by 3D CAD interactions. However, this approach struggled with finger occlusion: hidden fingers caused unstable detection and geometric inconsistencies. To address this, we switched to a thumbs-up gesture, which provides a clear and reliable reference point for orientation.
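The thumbs-up orientation can be read from the wrist-to-thumb-tip vector with a single atan2, the kind of angle calculation listed under Step 2. Landmark indices follow MediaPipe (0 = wrist, 4 = thumb tip); the y-down image convention is the usual one for camera frames, and the function itself is an illustrative sketch rather than the project's exact code:

```python
import math

WRIST, THUMB_TIP = 0, 4  # MediaPipe hand-landmark indices

def thumb_angle_deg(landmarks):
    """Orientation of the thumbs-up gesture, in degrees.

    landmarks: 21 (x, y) points in image coordinates (y grows downward).
    0 degrees = thumb pointing right, 90 = pointing straight up.
    """
    wx, wy = landmarks[WRIST][:2]
    tx, ty = landmarks[THUMB_TIP][:2]
    # Negate dy because image y increases downward.
    return math.degrees(math.atan2(-(ty - wy), tx - wx))
```

Because the thumb tip is rarely occluded in a thumbs-up pose, this angle stays stable where the fist-based estimate did not.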
Accomplishments that we're proud of
- Works on any camera
- AprilTag-based screen calibration
- End-to-end gesture control system
- Accessibility- and convenience-first design
What we learned
- OpenCV hand tracking
- Using AprilTags
- Using TCP protocols to transfer data over network connections
- Using Raspberry Pi for hardware connection