Inspiration
Roughly 2 million people worldwide are DeafBlind — they cannot rely on sight or sound. Existing assistive technologies serve one sense at a time: screen readers for the blind, captioning for the deaf. Almost nothing fuses both modalities into a single touch-first interface.
I wanted to build a system where a DeafBlind user could feel their environment in real time:
- A smoke alarm → long pulse on the wrist
- A doorbell → distinct rhythm
- A sign-language wave → Braille word on a display
The goal was a working prototype — not just a concept.
What It Does
Sankalp is a real-time multimodal accessibility engine with three parallel inputs:
- Microphone → Whisper (speech) + YAMNet (sound classification)
- Webcam → MediaPipe (sign-language gesture detection)
- Knowledge queries → Wolfram Alpha + Gemini
All inputs are converted into a unified SemanticEvent:
type, content, urgency (0–10), source, timestamp
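Concretely, the event could be a small Pydantic model. A minimal sketch assuming Pydantic v2; the field names follow the list above, while the defaults and validation bounds are my assumptions, not the project's exact schema:

```python
from datetime import datetime, timezone
from pydantic import BaseModel, Field


class SemanticEvent(BaseModel):
    """Unified event emitted by every input pipeline (sketch)."""
    type: str        # e.g. "speech", "sound", "sign", "knowledge"
    content: str     # human-readable payload ("doorbell", "fire alarm", ...)
    urgency: int = Field(ge=0, le=10)   # 0 = ambient, 10 = emergency
    source: str      # "microphone", "webcam", or "query"
    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```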
Outputs
- 6-dot animated Braille grid (Grade-1 UEB; encoding sketched below)
- Haptic vibration patterns (via phone)
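Grade-1 (uncontracted) UEB boils down to a character-to-dots table. The dot numbers below are standard Braille; the table and function names are illustrative, not the project's actual encoder:

```python
# Grade-1 (uncontracted) Braille: each character maps to a set of raised
# dots numbered 1-6 (1-3 down the left column, 4-6 down the right).
BRAILLE_DOTS = {
    "a": {1},          "b": {1, 2},       "c": {1, 4},
    "d": {1, 4, 5},    "e": {1, 5},       "f": {1, 2, 4},
    "g": {1, 2, 4, 5}, "h": {1, 2, 5},    "i": {2, 4},
    "j": {2, 4, 5},    " ": set(),        # ...rest of the alphabet omitted
}

def encode_grade1(text: str) -> list[set[int]]:
    """One dot-set per character; unmapped characters are skipped."""
    return [BRAILLE_DOTS[ch] for ch in text.lower() if ch in BRAILLE_DOTS]
```

The animated SVG grid can then raise exactly those dots for each cell.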
Haptic Grammar (Core Innovation)
| Pattern (ms) | Meaning | Trigger |
|---|---|---|
| [2000] | Emergency | urgency ≥ 9 |
| [120,80,120,80,400] | Name being called | speech |
| [100,80,100,80,100] | Doorbell / knock | sound |
| [600] | Sign detected | vision |
| [80] | Default notification | other |
Rule: Urgency overrides everything.
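In code the grammar collapses to a lookup plus one override. A rough sketch using the SemanticEvent above; the pattern values are copied from the table, and keying purely off event type (rather than, say, detecting the user's name in speech) is a simplification:

```python
# Vibration patterns in ms (on, off, on, ...), matching the table above.
PATTERNS = {
    "emergency": [2000],
    "speech":    [120, 80, 120, 80, 400],
    "sound":     [100, 80, 100, 80, 100],
    "sign":      [600],
    "default":   [80],
}

def haptic_pattern(event: SemanticEvent) -> list[int]:
    """Urgency overrides everything; otherwise map event type to a pattern."""
    if event.urgency >= 9:
        return PATTERNS["emergency"]
    return PATTERNS.get(event.type, PATTERNS["default"])
```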
How I Built It
Backend (Python + FastAPI)
- Event system with SemanticEvent
- Audio pipeline (Whisper + YAMNet)
- Vision pipeline (MediaPipe)
- Braille encoder (custom Python)
- Haptic encoder (pattern mapping)
- WebSocket system with real-time streaming
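The streaming piece can be a plain FastAPI WebSocket endpoint that pushes each event to the UI as JSON. A minimal sketch assuming a single connected client and an asyncio.Queue that the pipelines feed; the route path and queue name are placeholders:

```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
event_queue: asyncio.Queue = asyncio.Queue()   # pipelines put SemanticEvents here

@app.websocket("/ws/events")
async def stream_events(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            event = await event_queue.get()                     # next SemanticEvent
            await ws.send_json(event.model_dump(mode="json"))   # Pydantic v2 -> JSON-safe dict
    except WebSocketDisconnect:
        pass  # frontend disconnected; stop streaming
```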
Frontend (Next.js + TypeScript)
- Animated Braille grid (SVG)
- Haptic visualizer
- Event feed with urgency colors
- Knowledge query panel
- Emergency full-screen alert
Challenges
- liblouis issues on Windows → built custom Braille encoder
- No tflite-runtime support → switched to ai-edge-litert
- MediaPipe API changes → migrated to Tasks API
- Gemini rate limits → built retry + limiter system (sketched after this list)
- Real-time tradeoff → freshness > completeness
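Conceptually the retry and limiter layer is just a minimum gap between calls plus exponential backoff. A sketch with placeholder timings, not the project's actual tuning:

```python
import asyncio
import time

MIN_INTERVAL = 4.0   # seconds between Gemini calls (placeholder free-tier pacing)
_last_call = 0.0

async def call_with_retry(fn, *args, retries: int = 3, **kwargs):
    """Pace calls to MIN_INTERVAL, then retry with exponential backoff on failure."""
    global _last_call
    for attempt in range(retries):
        wait = MIN_INTERVAL - (time.monotonic() - _last_call)
        if wait > 0:
            await asyncio.sleep(wait)
        _last_call = time.monotonic()
        try:
            return await fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)   # back off 1s, 2s, 4s, ...
```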
Accomplishments
- Unified SemanticEvent architecture
- Designed Haptic Grammar (core innovation)
- 51 tests passing
- Fully working end-to-end real-time system
What I Learned
- Stub-first development works best
- ML libraries change fast (breaking changes)
- Urgency > confidence in accessibility systems
- Free-tier APIs are enough if optimized
What’s Next
- Expand sign-language recognition (20+ signs)
- Add Grade-2 Braille
- Support real Braille hardware (Bluetooth)
- Build wearable version (wristband)
- Enable full offline privacy mode
Built With
- ai-edge-litert
- asyncio
- fastapi
- faster-whisper
- framer-motion
- gemini
- google-genai
- httpx
- mediapipe
- next.js
- numpy
- opencv
- pydantic
- pytest
- python
- react
- sounddevice
- tailwindcss
- typescript
- uvicorn
- web-vibration-api
- webrtc
- websockets
- wolfram-technologies
- yamnet