Inspiration
"It is never my custom to use words lightly... how precious words are." — Nelson Mandela
Over 70 million people worldwide are deaf, hard-of-hearing, or non-verbal. Most struggle daily to communicate with those who don't know sign language. Existing solutions require:

💸 Expensive hardware ($500-$2000)
🌐 Constant internet connection
🔓 Privacy sacrifice (video uploads to servers)
I asked myself: What if anyone could communicate instantly using just their webcam—no downloads, no servers, no barriers?
SignLand was born from this question. I wanted to build a tool that respects privacy, works offline, and gives everyone a voice.
What it does
SignLand translates sign language and hand gestures into spoken words in real-time, entirely in your browser.
✨ Core Features:

| Feature | Description | Impact |
| --- | --- | --- |
| 👍 Gesture Recognition | Common signs (thumbs up, peace, "I love you") | Instant phrase communication |
| 🔤 ASL Alphabet | Spell words letter-by-letter | Unlimited vocabulary |
| 🤖 Smart Mode | AI refines gestures into natural sentences | Professional grammar |
| 🌍 10+ Languages | English, Spanish, Hindi, French, German, etc. | Global accessibility |
| 🔒 Privacy-First | 100% local processing | Zero video uploads |
| ⚡ <400ms Latency | Faster than a blink | Real conversation flow |
| 📱 Offline PWA | Install as mobile/desktop app | Works without internet |
🎬 Real-World Use Cases:

✓ Non-verbal students participating in classrooms
✓ Deaf individuals ordering at restaurants
✓ Emergency communication when voice isn't possible
✓ Cross-language accessibility bridges
✓ Silent environments (libraries, hospitals)
How we built it
🏗️ System Architecture:

```
┌─────────────┐
│   Camera    │  30 FPS video stream
│  (Webcam)   │
└──────┬──────┘
       │
       ▼
┌─────────────────────────┐
│    MediaPipe Hands      │  Detects 21 hand landmarks
│  (Hand Tracking AI)     │  (x, y, z coordinates)
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│   Gesture Classifier    │  Recognizes ASL letters
│   (Custom Algorithm)    │  & common gestures
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│      Text Buffer        │  Builds words from
│     (Word Builder)      │  detected letters
└──────────┬──────────────┘
           │
     ┌─────┴─────┐
     │           │
     ▼           ▼
┌────────┐  ┌──────────────┐
│  Fast  │  │  Smart Mode  │  (Optional)
│  Mode  │  │  Gemini API  │  Grammar refinement
└───┬────┘  └──────┬───────┘
    │              │
    └──────┬───────┘
           │
           ▼
┌─────────────────────────┐
│    Web Speech API       │  Text-to-speech
│  (Speech Synthesis)     │  in 10+ languages
└──────────┬──────────────┘
           │
           ▼
     🔊 Spoken Output
```
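The pipeline above can be sketched as a handful of typed stages. This is an illustrative reconstruction, not the actual SignLand code; the type and class names are assumptions:

```typescript
// One 3-D hand landmark as reported by MediaPipe Hands (21 per hand).
type Landmark = { x: number; y: number; z: number };

// Each pipeline stage from the diagram, expressed as a function type.
type Classify = (landmarks: Landmark[]) => { letter: string; confidence: number } | null;
type Refine = (text: string) => Promise<string>; // Smart Mode (Gemini) or identity (Fast Mode)
type Speak = (text: string) => void;             // Web Speech API wrapper

// A minimal word buffer: collects confident letters, emits a word on flush.
class WordBuffer {
  private letters: string[] = [];
  push(letter: string): void {
    this.letters.push(letter);
  }
  flush(): string {
    const word = this.letters.join("");
    this.letters = [];
    return word;
  }
}
```

In this shape, Fast Mode is simply a `Refine` that resolves with its input unchanged, which is why it can run fully offline.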
💻 Tech Stack:

Frontend:
- Next.js 16 (App Router) + React 19
- TypeScript (type-safe development)
- Tailwind CSS + Shadcn/UI (beautiful, accessible components)
- Framer Motion (smooth animations)

AI/ML:
- MediaPipe Hands (Google's 21-landmark hand tracking)
- Google Gemini API (grammar refinement, context understanding)
- Custom gesture classifier (angle-based + confidence scoring)

Speech & PWA:
- Web Speech API (text-to-speech synthesis)
- next-pwa (service workers for offline support)
- WebAssembly (5MB cached MediaPipe models)

Infrastructure:
- Vercel (global edge deployment)
- Clerk (secure authentication)
- GitHub (version control + CI/CD)
Challenges I ran into
⚠️ Challenge #1: Similar ASL Letters
Problem: Letters like M vs. N and A vs. S looked nearly identical to the classifier.
Initial accuracy: 65% ❌
My Solution:
- Added precise angle measurements between fingers
- Required gestures to stay consistent for 3 consecutive frames
- Added a 500ms cooldown between detections

Final accuracy: 92% ✅
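The angle measurement can be sketched as a small helper over MediaPipe's 3-D landmarks. This is a minimal sketch of the standard joint-angle computation, not the actual SignLand classifier:

```typescript
type Point = { x: number; y: number; z: number };

// Angle (degrees) at joint b, formed by the segments b→a and b→c.
// Applied at a finger's middle joint, this distinguishes a curled
// finger (small angle) from an extended one (close to 180°), which is
// what separates look-alike letters such as A vs. S.
function jointAngle(a: Point, b: Point, c: Point): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y, z: a.z - b.z };
  const v2 = { x: c.x - b.x, y: c.y - b.y, z: c.z - b.z };
  const dot = v1.x * v2.x + v1.y * v2.y + v1.z * v2.z;
  const m1 = Math.hypot(v1.x, v1.y, v1.z);
  const m2 = Math.hypot(v2.x, v2.y, v2.z);
  return (Math.acos(dot / (m1 * m2)) * 180) / Math.PI;
}
```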
⚠️ Challenge #2: Mobile Performance
Problem: MediaPipe lagged on phones (only 15 FPS).

My Solution:
- Desktop: high resolution (1280x720)
- Mobile: lower resolution (640x480)

Result: Smooth 30 FPS on all devices ✅
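The device-dependent resolution switch can be sketched as a small constraints helper (an illustrative sketch; in the browser the result would be passed to `navigator.mediaDevices.getUserMedia({ video: ... })`, and the mobile check itself is an assumption):

```typescript
type VideoConstraints = {
  width: { ideal: number };
  height: { ideal: number };
  frameRate: { ideal: number };
};

// Higher resolution on desktop for landmark accuracy; lower on mobile
// so MediaPipe can hold 30 FPS. Values are the ones quoted in the text.
function cameraConstraints(isMobile: boolean): VideoConstraints {
  return isMobile
    ? { width: { ideal: 640 }, height: { ideal: 480 }, frameRate: { ideal: 30 } }
    : { width: { ideal: 1280 }, height: { ideal: 720 }, frameRate: { ideal: 30 } };
}

// In the browser, isMobile could come from a user-agent check, e.g.
// /Mobi|Android/i.test(navigator.userAgent) — an assumption, not SignLand's code.
```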
⚠️ Challenge #3: False Detections
Problem: Random hand movements triggered wrong gestures (40% false positives!).
My Solution:
- Only accept gestures with 85%+ confidence
- Gesture must appear in 3 consecutive frames
- 500ms cooldown between detections
Result: False positives dropped to 5% ✅
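The three filtering rules combine naturally into one small stabilizer. The thresholds come from the text above, but the class itself is an illustrative reconstruction, not the actual SignLand code:

```typescript
// Filters raw classifier output: a gesture is accepted only if it scores
// at least the confidence threshold, repeats for the required number of
// consecutive frames, and the cooldown has elapsed since the last accept.
class GestureStabilizer {
  private streak = 0;
  private lastLabel = "";
  private lastAcceptedAt = -Infinity;

  constructor(
    private minConfidence = 0.85,
    private requiredFrames = 3,
    private cooldownMs = 500,
  ) {}

  // Feed one frame's top prediction; returns the label once stable, else null.
  feed(label: string, confidence: number, nowMs: number): string | null {
    if (confidence < this.minConfidence) {
      this.streak = 0; // low-confidence frame breaks the streak
      return null;
    }
    this.streak = label === this.lastLabel ? this.streak + 1 : 1;
    this.lastLabel = label;
    const stable = this.streak >= this.requiredFrames;
    const cooledDown = nowMs - this.lastAcceptedAt >= this.cooldownMs;
    if (stable && cooledDown) {
      this.streak = 0;
      this.lastAcceptedAt = nowMs;
      return label;
    }
    return null;
  }
}
```

At 30 FPS, three consecutive frames cost roughly 100ms, which is how this filtering can coexist with the sub-400ms latency figure.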
⚠️ Challenge #4: Offline vs. AI Dilemma
Problem: The Gemini API makes grammar perfect, but it needs an internet connection.

My Solution: Built two modes users can choose between:

| Mode | How It Works | Best For |
| --- | --- | --- |
| Fast Mode | 100% offline, instant | Quick phrases, privacy-critical situations |
| Smart Mode | Uses Gemini API online | Professional communication, natural grammar |

⚠️ Challenge #5: Cross-Browser Voice Quality
Problem: Speech sounded different in every browser:
Chrome: Perfect ⭐⭐⭐⭐⭐
Firefox: Robotic ⭐⭐⭐
Safari: Broken ⭐⭐
My Solution:
- Detect the voices available in each browser
- Rank them by quality
- Always pick the best available voice

Result: Consistent experience everywhere ✅
🏆 Accomplishments that we're proud of
📊 What I Achieved:

⚡ Sub-400ms latency (gesture → speech)
🎯 92% ASL accuracy (tested with real users)
📱 100% mobile responsive (works on any screen)
🔒 Zero server uploads (your video never leaves your device)
🌐 10+ languages supported
📦 Only 5MB download (PWA with offline support)
🚀 Production-ready (deployed and battle-tested)
What we learned
🧠 Technical Skills:

✅ Real-time ML in browsers is powerful:
- WebAssembly makes desktop-class AI possible
- Service Workers enable true offline functionality

✅ MediaPipe optimization requires balance:
- High resolution = better accuracy, but slower
- Low resolution = faster, but less accurate
Solution: Adjust based on device type