SignRunner

Inspiration

Over 70 million people use sign language as a primary means of communication, yet learning ASL remains slow, isolating, and rarely fun. Rhythm games like Guitar Hero turn skill-building into flow through tight feedback loops and dopamine-driven repetition. We asked: what if mastering fingerspelling felt like playing along to a song? SignRunner fuses accessibility with arcade game design — turning muscle-memory drilling into something you actually want to do again.

What it does

SignRunner is a real-time, computer-vision ASL fingerspelling game. Your webcam tracks your hand, a neural network running in a local inference service classifies the sign you're forming, and you play along to a scrolling note highway: Guitar Hero, but in sign language. Pick a track from the curated or community library, sign each letter as it hits the target zone, and chain combos scored on accuracy and reaction time. It ships with a persistent global leaderboard, a community-authored song library, and a Dev Mode beatmap studio for turning any track into a playable chart.
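
As a rough illustration of combos scored on accuracy and reaction time, the sketch below judges one note at a time. It is not SignRunner's actual scoring code: the names (`ScoreState`, `applyNote`) and every constant are assumptions chosen for the example.

```typescript
// Hypothetical scoring sketch. All names and constants are illustrative.
interface ScoreState {
  score: number;
  combo: number;
}

const HIT_WINDOW_MS = 300; // assumed: largest timing error that still counts as a hit
const BASE_POINTS = 100;   // assumed: points for a perfectly timed hit at 1x
const MAX_MULTIPLIER = 4;  // assumed: cap on the combo multiplier

// Returns the next state after one note is judged.
// `timingErrorMs` is the absolute gap between when the sign was recognized
// and when the note reached the target zone, or null if the note was missed.
function applyNote(state: ScoreState, timingErrorMs: number | null): ScoreState {
  if (timingErrorMs === null || timingErrorMs > HIT_WINDOW_MS) {
    return { score: state.score, combo: 0 }; // a miss breaks the combo
  }
  const combo = state.combo + 1;
  // The multiplier grows by 1 for every 10 consecutive hits, up to the cap.
  const multiplier = Math.min(MAX_MULTIPLIER, 1 + Math.floor(combo / 10));
  // Faster reactions earn more: linear falloff from full to half points.
  const timingFactor = 1 - 0.5 * (timingErrorMs / HIT_WINDOW_MS);
  const points = Math.round(BASE_POINTS * timingFactor * multiplier);
  return { score: state.score + points, combo };
}
```

Keeping the function pure (state in, state out) makes the scoring rules easy to unit-test apart from rendering and inference.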

How we built it

SignRunner is a full-stack, ML-in-the-loop application that closes a real-time perception → game-state loop in tens of milliseconds. The browser streams compressed frames to an inference microservice that returns structured predictions driving the game engine each tick.
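
A minimal sketch of how the client might validate each structured prediction before it reaches the game engine. The `{letter, confidence, handDetected}` shape comes from the stack description; the rest is assumed, including the `Prediction` and `parsePrediction` names and the choice to allow a `null` letter. A hand-written guard is used here so the example has no dependencies; a Zod schema could express the same checks.

```typescript
// Hypothetical client-side guard for the inference response.
interface Prediction {
  letter: string | null; // "A".."Z", or null when nothing was classified (assumed)
  confidence: number;    // softmax probability in [0, 1]
  handDetected: boolean;
}

// Returns a typed Prediction, or null if the payload is malformed.
function parsePrediction(raw: unknown): Prediction | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { letter, confidence, handDetected } = raw as Record<string, unknown>;

  if (typeof handDetected !== "boolean") return null;
  // The negated range check also rejects NaN.
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) return null;
  // Accept null, or exactly one uppercase letter.
  if (letter !== null && !(typeof letter === "string" && /^[A-Z]$/.test(letter))) return null;

  return { letter: letter as string | null, confidence, handDetected };
}
```

Dropping malformed payloads at the boundary means the game loop only ever sees well-typed predictions, so a bad frame degrades to "no input" rather than a crash.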

| Layer | Technology | Role |
| --- | --- | --- |
| Frontend / Game Engine | Next.js 15 (App Router, Turbopack), React 19, TypeScript | SSR/CSR hybrid app + custom `requestAnimationFrame` note-highway, scoring, and combo state machine |
| Styling / Design System | Tailwind CSS 4, shadcn/ui, WebGL shader backdrop, Framer Motion | Cohesive dark/gold design language, GPU-accelerated visuals, micro-interactions |
| API Layer | tRPC v11, Zod | End-to-end type-safe RPC with runtime schema validation, zero codegen |
| Data / Persistence | MongoDB (replica set), Prisma ORM | Songs, players, leaderboard; transactional writes for community uploads |
| Computer Vision | MediaPipe Hands, OpenCV | 21-point hand-landmark extraction + feature-mask preprocessing |
| ML Inference | PyTorch, MobileNetV2 (transfer-learned), FastAPI + Uvicorn | A–Z classification at ~30–50 ms/frame with softmax confidence gating |
| Inference Transport | HTTP frame streaming (JPEG), CORS microservice | Decoupled, horizontally scalable perception service returning `{letter, confidence, handDetected}` |
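
To make the `requestAnimationFrame` note-highway concrete, here is a small sketch of how a note's on-screen position could be derived from the current song time. The names (`Note`, `noteY`) and the pixel and timing constants are assumptions for illustration, not values from the project.

```typescript
// Hypothetical note-highway positioning. Constants are illustrative.
interface Note {
  letter: string;
  timeMs: number; // when the note should reach the target zone
}

const TRAVEL_MS = 2000; // assumed: scroll time from spawn point to target zone
const SPAWN_Y = 0;      // assumed: pixel row where notes appear
const TARGET_Y = 600;   // assumed: pixel row of the target zone

// Returns the note's y position at `songTimeMs`, or null if it is off screen.
function noteY(note: Note, songTimeMs: number): number | null {
  // progress is 0 at spawn and 1 when the note sits on the target zone
  const progress = 1 - (note.timeMs - songTimeMs) / TRAVEL_MS;
  if (progress < 0 || progress > 1.2) return null; // not yet visible, or well past the target
  return SPAWN_Y + progress * (TARGET_Y - SPAWN_Y);
}
```

A render callback scheduled with `requestAnimationFrame` would call `noteY` for each upcoming note on every frame. Computing the position from song time, rather than adding a per-frame delta, keeps notes aligned with the chart even when frames are dropped.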

The pipeline: capture → JPEG encode → MediaPipe landmark detection → feature-mask tensor → MobileNetV2 forward pass → confidence-thresholded prediction → game-loop hit detection against the beatmap timeline.
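
The last two pipeline stages, confidence thresholding and hit detection against the beatmap timeline, can be sketched as follows. This is an illustrative outline under assumed names (`BeatNote`, `FramePrediction`, `judge`) and assumed thresholds, not the project's implementation.

```typescript
// Hypothetical gating + hit detection. Thresholds are illustrative.
interface BeatNote {
  letter: string;
  timeMs: number; // when the note reaches the target zone
  hit: boolean;   // set once the note has been scored
}

interface FramePrediction {
  letter: string | null;
  confidence: number;
  handDetected: boolean;
}

const MIN_CONFIDENCE = 0.8; // assumed softmax threshold
const WINDOW_MS = 300;      // assumed timing tolerance around each note

// Returns the note that this prediction hits (and marks it), or null.
function judge(notes: BeatNote[], p: FramePrediction, songTimeMs: number): BeatNote | null {
  // Confidence gate: ignore frames with no hand or a weak classification.
  if (!p.handDetected || p.letter === null || p.confidence < MIN_CONFIDENCE) return null;

  // Among unhit notes with the same letter, pick the one closest in time.
  let best: BeatNote | null = null;
  for (const n of notes) {
    if (n.hit || n.letter !== p.letter) continue;
    const err = Math.abs(n.timeMs - songTimeMs);
    if (err > WINDOW_MS) continue;
    if (best === null || err < Math.abs(best.timeMs - songTimeMs)) best = n;
  }
  if (best !== null) best.hit = true; // each note can only be scored once
  return best;
}
```

Marking a note as hit prevents the same held sign from scoring repeatedly across consecutive frames.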

Challenges we ran into

Real-time latency vs. accuracy: keeping end-to-end inference fast enough to feel like a rhythm game while staying robust to noisy, low-light webcam frames and motion blur.
Dependency hell in the CV stack: newer MediaPipe wheels silently dropped the legacy solutions API and broke hand tracking — we had to pin the exact compatible build.
Database topology: Prisma's MongoDB connector mandates a replica set for transactional integrity, a non-obvious setup hurdle for local dev.
Codebase unification: merging a working game/ML backend with a separate design system into one clean, non-redundant monorepo.
Deterministic UX: compacting elaborate, animated game screens into single-viewport, no-scroll layouts without sacrificing polish.
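
One common way to trade a little latency for robustness against noisy frames is to smooth per-frame predictions with a short majority vote. The sketch below illustrates that general technique; it is an assumption for the sake of example, not a description of what SignRunner does, and `LetterSmoother` with its default sizes is hypothetical.

```typescript
// Hypothetical sliding-window majority vote over per-frame letters.
class LetterSmoother {
  private readonly recent: (string | null)[] = [];
  private readonly size: number;
  private readonly minVotes: number;

  constructor(size: number = 5, minVotes: number = 3) {
    this.size = size;         // assumed: number of recent frames to keep
    this.minVotes = minVotes; // assumed: agreeing frames needed to report a letter
  }

  // Adds one frame's letter (null = nothing confident) and returns the
  // letter with at least `minVotes` votes in the window, else null.
  push(letter: string | null): string | null {
    this.recent.push(letter);
    if (this.recent.length > this.size) this.recent.shift();

    const counts: Record<string, number> = {};
    let best: string | null = null;
    let bestCount = 0;
    for (const l of this.recent) {
      if (l === null) continue;
      const c = (counts[l] ?? 0) + 1;
      counts[l] = c;
      if (c > bestCount) {
        best = l;
        bestCount = c;
      }
    }
    return bestCount >= this.minVotes ? best : null;
  }
}
```

With these defaults a single misclassified frame cannot flip the reported letter, at the cost of waiting for three agreeing frames before a sign registers.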

Accomplishments that we're proud of

A genuinely real-time, locally hosted ASL classifier wired into a playable game loop: webcam to score in the tens-of-milliseconds range, with no cloud round-trip.
A production-grade full stack that actually runs: live inference microservice, replica-set-backed leaderboard, community uploads, and an in-app beatmap authoring tool.
A cohesive, shipped design — custom logo, per-song generative artwork, and a shader-driven aesthetic consistent from landing page to gameplay.

What we learned

How to fuse computer vision (MediaPipe plus a transfer-learned CNN) with frame-accurate game-loop timing, where every millisecond of inference budget is visible to the player.
The realities of shipping ML locally: dependency pinning, confidence calibration, and graceful degradation when a camera or signal is missing.
How much disciplined information architecture and design systems matter for making software feel finished, not merely functional.

What's next for SignRunner

Extend from fingerspelling to full ASL word- and phrase-level recognition with temporal (sequence) models.
Real licensed song audio with beat-synced charts and adaptive difficulty.
In-browser WASM/WebGPU inference to remove the inference-server dependency and run entirely on the player's device.
Multiplayer races, daily challenges, and a richer creator economy around community charts.
Co-design and accessibility evaluation with the Deaf and hard-of-hearing community to evolve SignRunner from a game into a credible learning tool.