Inspiration
"It is never my custom to use words lightly... how precious words are." — Nelson Mandela
Over 70 million people worldwide are deaf, hard-of-hearing, or non-verbal. Most struggle daily to communicate with those who don't know sign language. Existing solutions require:

💸 Expensive hardware ($500-$2000)
🌐 Constant internet connection
🔓 Privacy sacrifice (video uploads to servers)
I asked myself: What if anyone could communicate instantly using just their webcam—no downloads, no servers, no barriers?
SignLand was born from this question. I wanted to build a tool that respects privacy, works offline, and gives everyone a voice.
What it does
SignLand translates sign language and hand gestures into spoken words in real-time, entirely in your browser.
✨ Core Features:

| Feature | Description | Impact |
| --- | --- | --- |
| 👍 Gesture Recognition | Common signs (thumbs up, peace, "I love you") | Instant phrase communication |
| 🔤 ASL Alphabet | Spell words letter-by-letter | Unlimited vocabulary |
| 🤖 Smart Mode | AI refines gestures into natural sentences | Professional grammar |
| 🌍 10+ Languages | English, Spanish, Hindi, French, German, etc. | Global accessibility |
| 🔒 Privacy-First | 100% local processing | Zero video uploads |
| ⚡ <400ms Latency | Faster than a blink | Real conversation flow |
| 📱 Offline PWA | Install as mobile/desktop app | Works without internet |
🎬 Real-World Use Cases:

✓ Non-verbal students participating in classrooms
✓ Deaf individuals ordering at restaurants
✓ Emergency communication when voice isn't possible
✓ Cross-language accessibility bridges
✓ Silent environments (libraries, hospitals)
How we built it
🏗️ System Architecture:

```
┌─────────────┐
│   Camera    │  30 FPS video stream
│  (Webcam)   │
└──────┬──────┘
       │
       ▼
┌─────────────────────────┐
│    MediaPipe Hands      │  Detects 21 hand landmarks
│  (Hand Tracking AI)     │  (x, y, z coordinates)
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│   Gesture Classifier    │  Recognizes ASL letters
│   (Custom Algorithm)    │  & common gestures
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│      Text Buffer        │  Builds words from
│     (Word Builder)      │  detected letters
└──────────┬──────────────┘
           │
     ┌─────┴─────┐
     │           │
     ▼           ▼
┌────────┐  ┌──────────────┐
│  Fast  │  │  Smart Mode  │  (Optional)
│  Mode  │  │  Gemini API  │  Grammar refinement
└───┬────┘  └──────┬───────┘
    │              │
    └──────┬───────┘
           │
           ▼
┌─────────────────────────┐
│    Web Speech API       │  Text-to-speech
│  (Speech Synthesis)     │  in 10+ languages
└──────────┬──────────────┘
           │
           ▼
     🔊 Spoken Output
```
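The pipeline above can be sketched as a handful of typed stages. This is an illustrative reconstruction, not the actual SignLand code; the type and class names are assumptions:

```typescript
// One 3-D hand landmark as reported by MediaPipe Hands (21 per hand).
type Landmark = { x: number; y: number; z: number };

// Each pipeline stage from the diagram, expressed as a function type.
type Classify = (landmarks: Landmark[]) => { letter: string; confidence: number } | null;
type Refine = (text: string) => Promise<string>; // Smart Mode (Gemini) or identity (Fast Mode)
type Speak = (text: string) => void;             // Web Speech API wrapper

// A minimal word buffer: collects confident letters, emits a word on flush.
class WordBuffer {
  private letters: string[] = [];
  push(letter: string): void {
    this.letters.push(letter);
  }
  flush(): string {
    const word = this.letters.join("");
    this.letters = [];
    return word;
  }
}
```

In this shape, Fast Mode is simply a `Refine` that resolves with its input unchanged, which is why it can run fully offline.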
💻 Tech Stack:

Frontend:
- Next.js 16 (App Router) + React 19
- TypeScript (type-safe development)
- Tailwind CSS + Shadcn/UI (beautiful, accessible components)
- Framer Motion (smooth animations)

AI/ML:
- MediaPipe Hands (Google's 21-landmark hand tracking)
- Google Gemini API (grammar refinement, context understanding)
- Custom gesture classifier (angle-based + confidence scoring)

Speech & PWA:
- Web Speech API (text-to-speech synthesis)
- next-pwa (service workers for offline support)
- WebAssembly (5MB cached MediaPipe models)

Infrastructure:
- Vercel (global edge deployment)
- Clerk (secure authentication)
- GitHub (version control + CI/CD)
Challenges I ran into
⚠️ Challenge #1: Similar ASL Letters
Problem: Letters like M vs. N and A vs. S looked nearly identical to the classifier.
Initial accuracy: 65% ❌
My Solution:
- Added precise angle measurements between fingers
- Required gestures to stay consistent for 3 consecutive frames
- Added a 500ms cooldown between detections

Final accuracy: 92% ✅
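The angle measurement can be sketched as a small helper over MediaPipe's 3-D landmarks. This is a minimal sketch of the standard joint-angle computation, not the actual SignLand classifier:

```typescript
type Point = { x: number; y: number; z: number };

// Angle (degrees) at joint b, formed by the segments b→a and b→c.
// Applied at a finger's middle joint, this distinguishes a curled
// finger (small angle) from an extended one (close to 180°), which is
// what separates look-alike letters such as A vs. S.
function jointAngle(a: Point, b: Point, c: Point): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y, z: a.z - b.z };
  const v2 = { x: c.x - b.x, y: c.y - b.y, z: c.z - b.z };
  const dot = v1.x * v2.x + v1.y * v2.y + v1.z * v2.z;
  const m1 = Math.hypot(v1.x, v1.y, v1.z);
  const m2 = Math.hypot(v2.x, v2.y, v2.z);
  return (Math.acos(dot / (m1 * m2)) * 180) / Math.PI;
}
```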
⚠️ Challenge #2: Mobile Performance
Problem: MediaPipe lagged on phones (only 15 FPS).

My Solution:
- Desktop: high resolution (1280x720)
- Mobile: lower resolution (640x480)

Result: Smooth 30 FPS on all devices ✅
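The device-dependent resolution switch can be sketched as a small constraints helper (an illustrative sketch; in the browser the result would be passed to `navigator.mediaDevices.getUserMedia({ video: ... })`, and the mobile check itself is an assumption):

```typescript
type VideoConstraints = {
  width: { ideal: number };
  height: { ideal: number };
  frameRate: { ideal: number };
};

// Higher resolution on desktop for landmark accuracy; lower on mobile
// so MediaPipe can hold 30 FPS. Values are the ones quoted in the text.
function cameraConstraints(isMobile: boolean): VideoConstraints {
  return isMobile
    ? { width: { ideal: 640 }, height: { ideal: 480 }, frameRate: { ideal: 30 } }
    : { width: { ideal: 1280 }, height: { ideal: 720 }, frameRate: { ideal: 30 } };
}

// In the browser, isMobile could come from a user-agent check, e.g.
// /Mobi|Android/i.test(navigator.userAgent) — an assumption, not SignLand's code.
```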
⚠️ Challenge #3: False Detections
Problem: Random hand movements triggered wrong gestures (40% false positives!).
My Solution:
- Only accept gestures with 85%+ confidence
- Gesture must appear in 3 consecutive frames
- 500ms cooldown between detections
Result: False positives dropped to 5% ✅
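The three filtering rules combine naturally into one small stabilizer. The thresholds come from the text above, but the class itself is an illustrative reconstruction, not the actual SignLand code:

```typescript
// Filters raw classifier output: a gesture is accepted only if it scores
// at least the confidence threshold, repeats for the required number of
// consecutive frames, and the cooldown has elapsed since the last accept.
class GestureStabilizer {
  private streak = 0;
  private lastLabel = "";
  private lastAcceptedAt = -Infinity;

  constructor(
    private minConfidence = 0.85,
    private requiredFrames = 3,
    private cooldownMs = 500,
  ) {}

  // Feed one frame's top prediction; returns the label once stable, else null.
  feed(label: string, confidence: number, nowMs: number): string | null {
    if (confidence < this.minConfidence) {
      this.streak = 0; // low-confidence frame breaks the streak
      return null;
    }
    this.streak = label === this.lastLabel ? this.streak + 1 : 1;
    this.lastLabel = label;
    const stable = this.streak >= this.requiredFrames;
    const cooledDown = nowMs - this.lastAcceptedAt >= this.cooldownMs;
    if (stable && cooledDown) {
      this.streak = 0;
      this.lastAcceptedAt = nowMs;
      return label;
    }
    return null;
  }
}
```

At 30 FPS, three consecutive frames cost roughly 100ms, which is how this filtering can coexist with the sub-400ms latency figure.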
⚠️ Challenge #4: Offline vs. AI Dilemma
Problem: The Gemini API makes grammar perfect, but it needs an internet connection.

My Solution: Built two modes users can choose between:

| Mode | How It Works | Best For |
| --- | --- | --- |
| Fast Mode | 100% offline, instant | Quick phrases, privacy-critical situations |
| Smart Mode | Uses Gemini API online | Professional communication, natural grammar |

⚠️ Challenge #5: Cross-Browser Voice Quality
Problem: Speech sounded different in every browser:
Chrome: Perfect ⭐⭐⭐⭐⭐
Firefox: Robotic ⭐⭐⭐
Safari: Broken ⭐⭐
My Solution:
- Detect the voices available in each browser
- Rank them by quality
- Always pick the best available voice

Result: Consistent experience everywhere ✅
🏆 Accomplishments that we're proud of
📊 What I Achieved:

⚡ Sub-400ms latency (gesture → speech)
🎯 92% ASL accuracy (tested with real users)
📱 100% mobile responsive (works on any screen)
🔒 Zero server uploads (your video never leaves your device)
🌐 10+ languages supported
📦 Only 5MB download (PWA with offline support)
🚀 Production-ready (deployed and battle-tested)
What we learned
🧠 Technical Skills:

✅ Real-time ML in browsers is powerful:
- WebAssembly makes desktop-class AI possible
- Service Workers enable true offline functionality

✅ MediaPipe optimization requires balance:
- High resolution = better accuracy, but slower
- Low resolution = faster, but less accurate
Solution: Adjust based on device type