Inspiration
Over 70 million people use sign language as a primary means of communication, yet learning ASL remains slow, isolating, and rarely fun. Rhythm games like Guitar Hero turn skill-building into flow through tight feedback loops and dopamine-driven repetition. We asked: what if mastering fingerspelling felt like playing along to a song? SignRunner fuses accessibility with arcade game design — turning muscle-memory drilling into something you actually want to do again.
What it does
SignRunner is a real-time, computer-vision ASL fingerspelling game. Your webcam tracks your hand, a neural network running in a local inference service classifies the sign you're forming, and you play along to a scrolling note highway: Guitar Hero, but in sign language. Pick a track from the curated or community library, sign each letter as it hits the target zone, and chain combos scored on accuracy and reaction time. It ships with a persistent global leaderboard, a community-authored song library, and a Dev Mode beatmap studio for turning any track into a playable chart.
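As a rough illustration of scoring on accuracy and reaction time, the sketch below judges one note by how far the sign landed from its target time and grows a combo multiplier on consecutive hits. The timing windows, point values, multiplier step, and the names `judge` and `scoreHit` are all assumptions made for this example, not SignRunner's actual tuning.

```typescript
// Hypothetical scoring sketch: every constant below is an illustrative assumption.
type Judgment = "perfect" | "good" | "miss";

interface HitResult {
  judgment: Judgment;
  points: number;
  combo: number;
}

const PERFECT_WINDOW_MS = 80; // assumed window for a "perfect"
const GOOD_WINDOW_MS = 200; // assumed window for a "good"
const BASE_POINTS: Record<Judgment, number> = { perfect: 100, good: 50, miss: 0 };
const MAX_MULTIPLIER = 4; // assumed cap on the combo multiplier

// Classify a hit by how far (in ms) the sign landed from the note's target time.
function judge(offsetMs: number, correctLetter: boolean): Judgment {
  if (!correctLetter) return "miss";
  const distance = Math.abs(offsetMs);
  if (distance <= PERFECT_WINDOW_MS) return "perfect";
  if (distance <= GOOD_WINDOW_MS) return "good";
  return "miss";
}

// Score one note. A miss resets the combo; otherwise the combo grows and the
// multiplier rises by 1 for every 10 consecutive hits, up to the cap.
function scoreHit(offsetMs: number, correctLetter: boolean, prevCombo: number): HitResult {
  const judgment = judge(offsetMs, correctLetter);
  if (judgment === "miss") return { judgment, points: 0, combo: 0 };
  const combo = prevCombo + 1;
  const multiplier = Math.min(1 + Math.floor(combo / 10), MAX_MULTIPLIER);
  return { judgment, points: BASE_POINTS[judgment] * multiplier, combo };
}
```

Under these assumed values, a correct sign 30 ms off target with no prior combo earns 100 points, and a wrong letter resets the combo to zero whatever the timing.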
How we built it
SignRunner is a full-stack, ML-in-the-loop application that closes a real-time perception → game-state loop in tens of milliseconds. The browser streams compressed frames to an inference microservice that returns structured predictions driving the game engine each tick.
| Layer | Technology | Role |
|---|---|---|
| Frontend / Game Engine | Next.js 15 (App Router, Turbopack), React 19, TypeScript | SSR/CSR hybrid app + custom requestAnimationFrame note-highway, scoring, and combo state machine |
| Styling / Design System | Tailwind CSS 4, shadcn/ui, WebGL shader backdrop, Framer Motion | Cohesive dark/gold design language, GPU-accelerated visuals, micro-interactions |
| API Layer | tRPC v11, Zod | End-to-end type-safe RPC with runtime schema validation, zero codegen |
| Data / Persistence | MongoDB (replica set), Prisma ORM | Songs, players, leaderboard; transactional writes for community uploads |
| Computer Vision | MediaPipe Hands, OpenCV | 21-point hand-landmark extraction + feature-mask preprocessing |
| ML Inference | PyTorch, MobileNetV2 (transfer-learned), FastAPI + Uvicorn | A–Z classification at ~30–50 ms/frame with softmax confidence gating |
| Inference Transport | HTTP frame streaming (JPEG), CORS microservice | Decoupled, horizontally scalable perception service returning {letter, confidence, handDetected} |
The pipeline: capture → JPEG encode → MediaPipe landmark detection → feature-mask tensor → MobileNetV2 forward pass → confidence-thresholded prediction → game-loop hit detection against the beatmap timeline.
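The last two pipeline stages, confidence gating and hit detection against the beatmap timeline, might look roughly like the sketch below. The `Prediction` fields come from the response shape in the table; the `BeatNote` type, the threshold, the hit window, and the function names are hypothetical and chosen only for illustration.

```typescript
// Shape returned by the inference service, per the table above.
interface Prediction {
  letter: string;
  confidence: number;
  handDetected: boolean;
}

// Hypothetical beatmap note: the letter to sign and when it crosses the target zone.
interface BeatNote {
  letter: string;
  timeMs: number;
  hit: boolean;
}

const CONFIDENCE_THRESHOLD = 0.8; // assumed gate on softmax confidence
const HIT_WINDOW_MS = 200; // assumed tolerance around a note's target time

// Accept a prediction only if a hand is visible and the model is confident enough.
function gatePrediction(p: Prediction): string | null {
  if (!p.handDetected || p.confidence < CONFIDENCE_THRESHOLD) return null;
  return p.letter.toUpperCase();
}

// Find the closest unhit note with a matching letter inside the hit window,
// mark it as hit, and return it. Returns null when nothing matches.
function findHit(notes: BeatNote[], p: Prediction, nowMs: number): BeatNote | null {
  const letter = gatePrediction(p);
  if (letter === null) return null;
  let best: BeatNote | null = null;
  for (const note of notes) {
    if (note.hit || note.letter !== letter) continue;
    const distance = Math.abs(note.timeMs - nowMs);
    if (distance > HIT_WINDOW_MS) continue;
    if (best === null || distance < Math.abs(best.timeMs - nowMs)) best = note;
  }
  if (best !== null) best.hit = true;
  return best;
}
```

Calling `findHit` once per game tick with the latest prediction keeps hit detection a pure function of the chart, the prediction, and the song clock, which makes it easy to unit test away from the camera.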
Challenges we ran into
- Real-time latency vs. accuracy: keeping end-to-end inference fast enough to feel like a rhythm game while staying robust to noisy, low-light webcam frames and motion blur.
- Dependency hell in the CV stack: newer MediaPipe wheels silently dropped the legacy solutions API and broke hand tracking, so we had to pin the exact compatible build.
- Database topology: Prisma's MongoDB connector mandates a replica set for transactional integrity, a non-obvious setup hurdle for local dev.
- Codebase unification: merging a working game/ML backend with a separate design system into one clean, non-redundant monorepo.
- Deterministic UX: compacting elaborate, animated game screens into single-viewport, no-scroll layouts without sacrificing polish.
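One common way to keep inference latency from distorting rhythm-game timing is to judge each sign at the moment its frame was captured and to drop results that arrive too late. The sketch below shows that idea. It is not taken from the SignRunner codebase, and the latency budget, window size, and names (`FrameTiming`, `signTimeMs`, `LatencyMonitor`) are assumptions.

```typescript
// Hypothetical latency handling: the budget and window size are assumed values.
const MAX_LATENCY_MS = 150; // assumed: results older than this are discarded
const WINDOW_SIZE = 30; // assumed: number of recent samples to average

interface FrameTiming {
  capturedAtMs: number; // when the webcam frame was grabbed
  receivedAtMs: number; // when the prediction for that frame came back
}

// Judge a sign at the moment its frame was captured, not when the response
// arrived. Returns null when the result is too stale to judge fairly.
function signTimeMs(timing: FrameTiming): number | null {
  const latency = timing.receivedAtMs - timing.capturedAtMs;
  if (latency < 0 || latency > MAX_LATENCY_MS) return null;
  return timing.capturedAtMs;
}

// Rolling mean of recent latencies, e.g. for a debug overlay.
class LatencyMonitor {
  private samples: number[] = [];

  record(timing: FrameTiming): void {
    this.samples.push(timing.receivedAtMs - timing.capturedAtMs);
    if (this.samples.length > WINDOW_SIZE) this.samples.shift();
  }

  averageMs(): number {
    if (this.samples.length === 0) return 0;
    return this.samples.reduce((sum, v) => sum + v, 0) / this.samples.length;
  }
}
```

Both timestamps should come from the same clock, for example `performance.now()` in the browser, so that the subtraction is meaningful.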
Accomplishments that we're proud of
- A genuinely real-time, on-device ASL classifier wired into a playable game loop — webcam to score in the tens-of-milliseconds range, no cloud round-trip.
- A production-grade full stack that actually runs: live inference microservice, replica-set-backed leaderboard, community uploads, and an in-app beatmap authoring tool.
- A cohesive, shipped design — custom logo, per-song generative artwork, and a shader-driven aesthetic consistent from landing page to gameplay.
What we learned
- Fusing computer vision (MediaPipe + a transfer-learned CNN) with frame-accurate game-loop timing — where every millisecond of inference budget is visible to the player.
- The realities of shipping ML locally: dependency pinning, confidence calibration, and graceful degradation when a camera or signal is missing.
- How much disciplined information architecture and design systems matter for making software feel finished, not merely functional.
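Graceful degradation can be modeled as a small classifier that turns the camera state and the latest prediction into an explicit input state the UI can react to. This is a minimal sketch under assumed names: the `InputState` values, the hint text, and the confidence cutoff are hypothetical, and only the `Prediction` fields come from the service's response shape.

```typescript
// Hypothetical input states the game UI could distinguish.
type InputState = "no-camera" | "no-response" | "no-hand" | "low-confidence" | "ok";

// Response shape of the inference service: {letter, confidence, handDetected}.
interface Prediction {
  letter: string;
  confidence: number;
  handDetected: boolean;
}

const MIN_CONFIDENCE = 0.8; // assumed cutoff; would need calibration on real data

// Map the camera state and the latest prediction (null if the service did not
// answer) to a single state, checking the most fundamental failure first.
function classifyInput(cameraAvailable: boolean, prediction: Prediction | null): InputState {
  if (!cameraAvailable) return "no-camera";
  if (prediction === null) return "no-response";
  if (!prediction.handDetected) return "no-hand";
  if (prediction.confidence < MIN_CONFIDENCE) return "low-confidence";
  return "ok";
}

// Illustrative player-facing hints for each state.
const HINTS: Record<InputState, string> = {
  "no-camera": "Camera unavailable. Check browser permissions.",
  "no-response": "Recognition service is not responding.",
  "no-hand": "Move your hand into the frame.",
  "low-confidence": "Hold the sign steady and check your lighting.",
  ok: "",
};

function hintFor(state: InputState): string {
  return HINTS[state];
}
```

Typing `HINTS` as `Record<InputState, string>` means the compiler rejects the build if a new state is added without a matching hint.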
What's next for SignRunner
- Extend from fingerspelling to full ASL word- and phrase-level recognition with temporal (sequence) models.
- Real licensed song audio with beat-synced charts and adaptive difficulty.
- In-browser WASM/WebGPU inference to eliminate the server dependency entirely and run fully edge-side.
- Multiplayer races, daily challenges, and a richer creator economy around community charts.
- Co-design and accessibility evaluation with the Deaf and hard-of-hearing community to evolve SignRunner from a game into a credible learning tool.
Built With
- api
- apis
- browser
- codebase
- css
- database
- next.js
- prediction
- prisma
- react
- tailwind
- trpc
- typescript
- webcam