Inspiration

Over 70 million people use sign language as a primary means of communication, yet learning ASL remains slow, isolating, and rarely fun. Rhythm games like Guitar Hero turn skill-building into flow through tight feedback loops and dopamine-driven repetition. We asked: what if mastering fingerspelling felt like playing along to a song? SignRunner fuses accessibility with arcade game design — turning muscle-memory drilling into something you actually want to do again.

What it does

SignRunner is a real-time, computer-vision ASL fingerspelling game. Your webcam tracks your hand, a neural network running in a local inference service classifies the sign you're forming, and you play along to a scrolling note-highway: Guitar Hero, but in sign language. Pick a track from the curated or community library, sign each letter as it hits the target zone, and chain combos scored on accuracy and reaction time. It ships with a persistent global leaderboard, a community-authored song library, and a Dev Mode beatmap studio for turning any track into a playable chart.
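
As a rough illustration of how accuracy and reaction time could combine into a combo score, here is a minimal TypeScript sketch. The timing windows, point values, multiplier cap, and the names `judge` and `applyHit` are illustrative assumptions, not SignRunner's actual tuning.

```typescript
// Hypothetical scoring sketch: every constant and name here is an assumption.
type Judgement = "perfect" | "good" | "miss";

interface ScoreState {
  score: number;
  combo: number;
}

const PERFECT_MS = 150; // assumed window for a "perfect" hit
const GOOD_MS = 400; // assumed window for a "good" hit

// Grade a note from the signed letter and the timing error in milliseconds.
function judge(expected: string, signed: string, offsetMs: number): Judgement {
  if (signed !== expected) return "miss";
  const error = Math.abs(offsetMs);
  if (error <= PERFECT_MS) return "perfect";
  if (error <= GOOD_MS) return "good";
  return "miss";
}

// Return the next score state; a miss keeps the score and resets the combo.
function applyHit(state: ScoreState, judgement: Judgement): ScoreState {
  if (judgement === "miss") return { score: state.score, combo: 0 };
  const base = judgement === "perfect" ? 100 : 50;
  const combo = state.combo + 1;
  // One extra multiplier step per 10 consecutive hits, capped at 4x.
  const multiplier = Math.min(1 + Math.floor(combo / 10), 4);
  return { score: state.score + base * multiplier, combo };
}
```

Keeping `applyHit` a pure function of the previous state makes the combo logic easy to unit-test separately from rendering.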

How we built it

SignRunner is a full-stack, ML-in-the-loop application that closes a real-time perception → game-state loop in tens of milliseconds. The browser streams compressed frames to an inference microservice that returns structured predictions driving the game engine each tick.

| Layer | Technology | Role |
|---|---|---|
| Frontend / Game Engine | Next.js 15 (App Router, Turbopack), React 19, TypeScript | SSR/CSR hybrid app + custom requestAnimationFrame note-highway, scoring, and combo state machine |
| Styling / Design System | Tailwind CSS 4, shadcn/ui, WebGL shader backdrop, Framer Motion | Cohesive dark/gold design language, GPU-accelerated visuals, micro-interactions |
| API Layer | tRPC v11, Zod | End-to-end type-safe RPC with runtime schema validation, zero codegen |
| Data / Persistence | MongoDB (replica set), Prisma ORM | Songs, players, leaderboard; transactional writes for community uploads |
| Computer Vision | MediaPipe Hands, OpenCV | 21-point hand-landmark extraction + feature-mask preprocessing |
| ML Inference | PyTorch, MobileNetV2 (transfer-learned), FastAPI + Uvicorn | A–Z classification at ~30–50 ms/frame with softmax confidence gating |
| Inference Transport | HTTP frame streaming (JPEG), CORS microservice | Decoupled, horizontally scalable perception service returning {letter, confidence, handDetected} |
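
To make the `{letter, confidence, handDetected}` payload concrete, the following sketch shows one way the client could validate and gate a response before it reaches the game loop. It uses a hand-written type guard to stay dependency-free. The field types, the `MIN_CONFIDENCE` value, and the function names are assumptions inferred from the field names, not the project's actual schema.

```typescript
// Assumed shape of the inference response; the field types are guesses
// based on the field names {letter, confidence, handDetected}.
interface Prediction {
  letter: string | null;
  confidence: number;
  handDetected: boolean;
}

const MIN_CONFIDENCE = 0.8; // hypothetical gating threshold

// Runtime check that an unknown JSON value has the expected shape.
function isPrediction(value: unknown): value is Prediction {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const letter = v["letter"];
  const confidence = v["confidence"];
  return (
    (typeof letter === "string" || letter === null) &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 &&
    typeof v["handDetected"] === "boolean"
  );
}

// Return the letter only when a hand is visible and confidence clears the gate.
function acceptedLetter(value: unknown): string | null {
  if (!isPrediction(value)) return null;
  if (!value.handDetected || value.letter === null) return null;
  return value.confidence >= MIN_CONFIDENCE ? value.letter.toUpperCase() : null;
}
```

Malformed or low-confidence responses collapse to `null`, so the game loop only ever needs to handle "a letter" or "nothing this frame".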

The pipeline: capture → JPEG encode → MediaPipe landmark detection → feature-mask tensor → MobileNetV2 forward pass → confidence-thresholded prediction → game-loop hit detection against the beatmap timeline.
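
The last pipeline step, hit detection against the beatmap timeline, can be sketched roughly as follows. The `Note` shape, the `HIT_WINDOW_MS` tolerance, and the function names are hypothetical; the real chart format is not described here.

```typescript
// Hypothetical beatmap note: a letter due at a time offset from song start.
interface Note {
  letter: string;
  timeMs: number;
  hit: boolean;
}

const HIT_WINDOW_MS = 400; // assumed tolerance on either side of the target

// Scan the notes (assumed sorted by timeMs) for the first unhit note that
// matches the letter and lies within the hit window. Marks it as hit and
// returns the timing error in ms (negative = early), or null if none qualifies.
function tryHit(notes: Note[], letter: string, nowMs: number): number | null {
  for (const note of notes) {
    if (note.hit || note.letter !== letter) continue;
    const offset = nowMs - note.timeMs;
    if (Math.abs(offset) <= HIT_WINDOW_MS) {
      note.hit = true;
      return offset;
    }
  }
  return null;
}

// Notes whose window has fully passed without a hit count as misses.
function countMissed(notes: Note[], nowMs: number): number {
  return notes.filter((n) => !n.hit && nowMs - n.timeMs > HIT_WINDOW_MS).length;
}
```

In a game loop, `tryHit` would be called whenever an accepted prediction arrives, with `nowMs` taken from the song clock rather than wall-clock time so that pauses do not shift the chart.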

Challenges we ran into

  • Real-time latency vs. accuracy: keeping end-to-end inference fast enough to feel like a rhythm game while staying robust to noisy, low-light webcam frames and motion blur.
  • Dependency hell in the CV stack: newer MediaPipe wheels silently dropped the legacy solutions API and broke hand tracking — we had to pin the exact compatible build.
  • Database topology: Prisma's MongoDB connector mandates a replica set for transactional integrity, a non-obvious setup hurdle for local dev.
  • Codebase unification: merging a working game/ML backend with a separate design system into one clean, non-redundant monorepo.
  • Deterministic UX: compacting elaborate, animated game screens into single-viewport, no-scroll layouts without sacrificing polish.
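
One common way to trade a little latency for robustness against noisy frames is to vote over the last few per-frame predictions. The sketch below is a generic majority-vote smoother, not necessarily what SignRunner does; the class name, window size, and vote threshold are assumptions.

```typescript
// Generic majority-vote smoother over a sliding window of recent predictions.
// Window size and vote threshold are illustrative assumptions.
class PredictionSmoother {
  private recent: (string | null)[] = [];
  private windowSize: number;
  private minVotes: number;

  constructor(windowSize = 5, minVotes = 3) {
    this.windowSize = windowSize;
    this.minVotes = minVotes;
  }

  // Push the latest per-frame letter (null = no confident prediction) and
  // return the most frequent letter if it has at least `minVotes` votes.
  push(letter: string | null): string | null {
    this.recent.push(letter);
    if (this.recent.length > this.windowSize) this.recent.shift();

    const votes: Record<string, number> = {};
    let best: string | null = null;
    let bestCount = 0;
    for (const l of this.recent) {
      if (l === null) continue;
      const count = (votes[l] ?? 0) + 1;
      votes[l] = count;
      if (count > bestCount) {
        best = l;
        bestCount = count;
      }
    }
    return bestCount >= this.minVotes ? best : null;
  }
}
```

A larger window suppresses more flicker but delays recognition by roughly one frame interval per extra vote required, which matters when the hit window is only a few hundred milliseconds.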

Accomplishments that we're proud of

  • A genuinely real-time, on-device ASL classifier wired into a playable game loop — webcam to score in the tens-of-milliseconds range, no cloud round-trip.
  • A production-grade full stack that actually runs: live inference microservice, replica-set-backed leaderboard, community uploads, and an in-app beatmap authoring tool.
  • A cohesive, shipped design — custom logo, per-song generative artwork, and a shader-driven aesthetic consistent from landing page to gameplay.

What we learned

  • How to fuse computer vision (MediaPipe + a transfer-learned CNN) with frame-accurate game-loop timing, where every millisecond of inference budget is visible to the player.
  • The realities of shipping ML locally: dependency pinning, confidence calibration, and graceful degradation when a camera or signal is missing.
  • How much disciplined information architecture and design systems matter for making software feel finished, not merely functional.

What's next for SignRunner

  • Extend from fingerspelling to full ASL word- and phrase-level recognition with temporal (sequence) models.
  • Real licensed song audio with beat-synced charts and adaptive difficulty.
  • In-browser WASM/WebGPU inference to eliminate the server dependency entirely and run fully edge-side.
  • Multiplayer races, daily challenges, and a richer creator economy around community charts.
  • Co-design and accessibility evaluation with the Deaf and hard-of-hearing community to evolve SignRunner from a game into a credible learning tool.
