Inspiration

Global average screen time is about 6 hours and 40 minutes a day, which leaves less room for the physical activity people need to stay healthy. And honestly? It makes sense, because the apps eating up our time are fun. Take Subway Surfers: it has been downloaded over 4 billion times. The game lifts your mood when you need a little break, but none of it moves your body. So we asked ourselves: what if it did?

What it does

Dashcam turns your body into the controller for Subway Surfers. Instead of swiping a phone, you physically run in place, step left, step right, jump, or duck, and Subway Surfers responds in real time.

A webcam feed is run through on-device pose detection to classify your movement, and those moves are streamed to a browser running the actual Unity WebGL build of Subway Surfers: San Francisco.

How we built it

We host a vendored Unity 2019.4 WebGL build of Subway Surfers and drive it by dispatching synthetic KeyboardEvents (with the right keyCode/which) at the game. Everything else exists to produce those keypresses.

Pose detection. Each webcam frame goes through MediaPipe pose landmarks, then through a custom feet-intent lane controller. The key design decision: we trigger lane changes on the earliest reliable lateral foot movement (~80 ms) rather than waiting for the torso to cross a lane line. A two-stage state machine handles this: feet signal intent, hips confirm arrival. Direction comes from foot velocity (so crossover steps work), with One-Euro/EMA filtering for jitter, per-player calibration, and suppression of false lane triggers while the player is mid-jump or mid-duck.
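The two-stage idea can be sketched roughly as follows. This is a minimal illustration, not our actual controller: the class name, the thresholds, the coordinate convention (x grows toward the player's right, in normalized image units), and the use of a plain EMA in place of a One-Euro filter are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    IDLE = "idle"      # no lane change in progress
    INTENT = "intent"  # feet triggered a move; waiting for hips to arrive


@dataclass
class FeetIntentLaneController:
    # Hypothetical thresholds; real values would come from per-player calibration.
    speed_threshold: float = 0.8     # lateral foot speed (units/s) that signals intent
    hip_confirm_shift: float = 0.10  # hip displacement that confirms arrival
    ema_alpha: float = 0.6           # weight of the newest velocity sample
    stage: Stage = Stage.IDLE
    _prev_foot_x: Optional[float] = None
    _velocity: float = 0.0
    _hip_at_intent: float = 0.0
    _direction: int = 0

    def update(self, foot_x: float, hip_x: float, dt: float,
               suppressed: bool = False) -> Optional[str]:
        """Feed one frame; returns "left"/"right" when a lane change fires."""
        if self._prev_foot_x is None:
            self._prev_foot_x = foot_x
            return None

        # Stage 0: EMA-smoothed lateral foot velocity to damp landmark jitter.
        raw = (foot_x - self._prev_foot_x) / dt
        self._prev_foot_x = foot_x
        self._velocity = self.ema_alpha * raw + (1 - self.ema_alpha) * self._velocity

        # Stage 2: while a move is pending, only the hips can close it out.
        if self.stage is Stage.INTENT:
            shift = (hip_x - self._hip_at_intent) * self._direction
            if shift >= self.hip_confirm_shift:
                self.stage = Stage.IDLE
            return None

        # Stage 1: feet signal intent, unless the player is mid-jump or ducking.
        if suppressed or abs(self._velocity) < self.speed_threshold:
            return None
        self._direction = 1 if self._velocity > 0 else -1
        self._hip_at_intent = hip_x
        self.stage = Stage.INTENT
        return "right" if self._direction > 0 else "left"
```

The action is emitted as soon as the feet move, and the controller then refuses to fire again until the hips have caught up, which is what keeps a single step from turning into two lane changes.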

The transport. A phone/camera client sends already-classified actions (not video) over WebSocket to a relay, which fans them out to the display. We built two layers:

  • A lightweight Python relay (input_server.py, :8765) — a pure fan-out hub that bridges the phone client to the game display
  • A hardened Node session server (server/, :8080/ws) — zod-validated message envelopes, 4-digit pairing codes (Jackbox-style), a fixed 30 Hz authoritative tick, token-bucket rate limiting, and ws ping/pong liveness
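To illustrate the token-bucket rate limiting mentioned above, here is a small sketch. The session server itself is Node; this example is written in Python to match the relay, and the class name, capacity, and refill rate are assumptions rather than the values we shipped.

```python
import time
from typing import Optional


class TokenBucket:
    """Allows short bursts up to `capacity`, then a sustained `refill_per_sec`."""

    def __init__(self, capacity: float = 20.0, refill_per_sec: float = 60.0,
                 now: Optional[float] = None) -> None:
        # Hypothetical defaults: a burst of 20 messages, 60 messages/s sustained.
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        """Return True if a message may pass, consuming `cost` tokens."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a setup like this, each connection would hold its own bucket and drop (or close on) messages for which allow() returns False, so one noisy client cannot flood the display.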

Continuous state, reconstructed from heartbeats. Running and lane aren't fire-and-forget events — they're sustained states. The phone re-asserts them as heartbeats; the server reconstructs one authoritative InputState and applies timeouts. The avatar runs only while high-knees keep arriving.
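A rough sketch of how sustained state can be rebuilt from heartbeats is shown below. The InputState name comes from our server; everything else (the tracker class, the message kinds, the timeout values, the lane encoding, and the choice to fall back to the center lane on timeout) is assumed for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputState:
    running: bool = False
    lane: int = 0  # assumed encoding: -1 left, 0 center, 1 right


class HeartbeatTracker:
    def __init__(self, run_timeout: float = 0.4, lane_timeout: float = 1.0) -> None:
        # Hypothetical timeouts, in seconds.
        self.run_timeout = run_timeout
        self.lane_timeout = lane_timeout
        self._last_run: Optional[float] = None
        self._last_lane: Optional[float] = None
        self._lane = 0

    def on_heartbeat(self, kind: str, now: float, lane: Optional[int] = None) -> None:
        """Record a re-asserted state from the phone client."""
        if kind == "run":
            self._last_run = now
        elif kind == "lane" and lane is not None:
            self._lane = lane
            self._last_lane = now

    def tick(self, now: float) -> InputState:
        """Called on the fixed server tick; expires states that stopped arriving."""
        running = (self._last_run is not None
                   and now - self._last_run <= self.run_timeout)
        lane_fresh = (self._last_lane is not None
                      and now - self._last_lane <= self.lane_timeout)
        # Assumption: a stale lane falls back to center.
        return InputState(running=running, lane=self._lane if lane_fresh else 0)
```

Because the state is derived on every tick rather than toggled by start/stop events, a dropped "stop" message cannot leave the avatar running forever: the run state simply expires.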

Challenges we ran into

  • Fine-tuning the pose detection sensitivity: too aggressive and every micro-movement triggers a lane change, too conservative and the controls feel unresponsive. We also had to suppress false lane triggers during jumps and ducks
  • Getting synthetic keyboard input to reliably drive a compiled WASM binary with no exposed API surface. We had no hooks and no callbacks, so we had to reverse-engineer exactly which events the game expected
  • Modeling movement as continuous sustained state rather than discrete events

Accomplishments that we're proud of

  • Getting a player on any network to control Subway Surfers entirely through their body movements and having it actually feel fun
  • The two-stage intent model (feet signal, hips confirm) feels good to play: the character responds at the speed of intention, not just the speed of completion
  • A phone becomes a fully functional game controller with zero installs or additional hardware — just open a link, pair with a code, and start moving

What we learned

  • Latency is a UX problem as much as a networking problem — even objectively fast systems feel broken if the feedback loop isn't tight. Optimizing for perceived responsiveness changed how we thought about every layer of the pipeline
  • Pose estimation at the edge cases is extremely hard. Mid-jump, crouching, and partial occlusion are the moments where false triggers occur most. Most of our iteration time went into making the model fail gracefully rather than making it more accurate on clean inputs
  • Synthetic input into a compiled black box teaches you a lot about how browsers actually handle events. Things that "should" work according to the spec sometimes don't

What's next for Dashcam

  • Expanding the move set. The current lane/jump/roll moves are just a starting point, and more expressive gesture recognition for power-ups or menu navigation would make the experience feel much richer
  • Supporting more games. The input pipeline is largely game-agnostic; Subway Surfers was a proof of concept, not a limitation
  • Tighter mobile calibration flow so new players can get set up in under 30 seconds without needing to tweak sensitivity manually
  • Exploring multiplayer modes beyond the current architecture — leaderboards, async challenges, shared runs.