SignSense AI
A real-time, empathetic, and educational bridge between the hearing and Deaf communities
💡 Inspiration
Communication is a fundamental human right, yet spontaneous interactions between hearing individuals and Deaf signers are still blocked by an invisible “silence barrier.” In moments ranging from ordering coffee to medical emergencies, existing tools fail to capture emotion, urgency, and intent—core components of sign language.
We set out to build more than a translator. SignSense AI interprets meaning from facial expressions, body language, and gesture intensity, while also educating hearing users on how sign language communicates nuance.
🏗️ What We Built
SignSense AI is a bidirectional, real-time communication system that translates between sign language and text while teaching users why a sign means what it means.
It functions as an empathy bridge, not just a conversion tool.
⚙️ How We Built It
Multimodal Interpretation (Sign → Text)
- Powered by Google Gemini 3 Flash
- Analyzes video streams for:
  - Handshape and motion
  - Facial expressions (eyebrows, gaze)
  - Gesture speed and amplitude
- Detects emotional sentiment and urgency level
- Outputs structured JSON → text (see the sketch below)
This allows the system to distinguish between:
“I need help” vs “I NEED HELP NOW”
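For illustration, here is a minimal sketch of what that structured output could look like. The field names (`text`, `emotion`, `urgency`) and the urgency threshold are hypothetical placeholders, not the project's actual schema:

```typescript
// Hypothetical shape of the structured JSON produced by the
// interpretation step (field names are illustrative).
interface SignInterpretation {
  text: string;                                // plain rendering, e.g. "I need help"
  emotion: "neutral" | "happy" | "distressed"; // detected sentiment
  urgency: number;                             // 0 (casual) .. 1 (emergency)
}

// Escalate the rendered text when urgency is high, mirroring how
// gesture speed and amplitude escalate meaning in sign language.
function renderWithUrgency(r: SignInterpretation): string {
  return r.urgency > 0.8 ? `${r.text.toUpperCase()}!` : r.text;
}

// renderWithUrgency({ text: "I need help", emotion: "distressed", urgency: 0.9 })
// => "I NEED HELP!"
```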
Procedural 3D Signing Engine (Text → Sign)
- Built with Three.js
- Converts typed input into ASL gloss choreography (see the sketch after this list)
- Drives a fully articulated 3D avatar using:
  - Skeletal animation
  - Inverse kinematics
  - Procedural finger control (15+ joints per hand)
- No pre-recorded animations: every sign is generated dynamically.
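As a sketch of how that conversion might work, the snippet below maps a gloss string to an ordered list of target poses. The `SIGN_POSES` dictionary, bone names, and rotation values are hypothetical placeholders, not the actual engine:

```typescript
// Target joint rotations (in radians) for named bones of the hand rig.
type JointTargets = Record<string, { x: number; y: number; z: number }>;

// Illustrative sign dictionary: each gloss token maps to a target pose
// and a transition duration. Real signs would need many keyframes.
const SIGN_POSES: Record<string, { targets: JointTargets; durationMs: number }> = {
  HELP: { targets: { index_01_R: { x: 1.1, y: 0, z: 0.2 } }, durationMs: 600 },
  NEED: { targets: { index_01_R: { x: 0.3, y: 0, z: 0.0 } }, durationMs: 500 },
};

// Turn a gloss string like "HELP NEED" into an ordered choreography
// that the animation loop can play back one pose at a time.
function choreograph(gloss: string) {
  return gloss.split(/\s+/).map((token) => SIGN_POSES[token]).filter(Boolean);
}
```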
Joint motion is computed using linear interpolation:
$$ \theta_t = \text{Lerp}(\theta_\text{current}, \theta_\text{target}, \alpha) $$
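In Three.js terms, that per-frame update might look like the sketch below; the bone-name lookup and the choice of `alpha` are assumptions, not the engine's actual code:

```typescript
import * as THREE from "three";

// Same shape as in the choreography sketch above.
type JointTargets = Record<string, { x: number; y: number; z: number }>;

// Per-frame update: ease every bone of the rig toward its target pose.
// alpha is the interpolation factor (0..1) applied each frame.
function updateJoints(skeleton: THREE.Skeleton, targets: JointTargets, alpha: number): void {
  for (const bone of skeleton.bones) {
    const t = targets[bone.name];
    if (!t) continue;
    // theta_t = Lerp(theta_current, theta_target, alpha), per axis
    bone.rotation.x = THREE.MathUtils.lerp(bone.rotation.x, t.x, alpha);
    bone.rotation.y = THREE.MathUtils.lerp(bone.rotation.y, t.y, alpha);
    bone.rotation.z = THREE.MathUtils.lerp(bone.rotation.z, t.z, alpha);
  }
}
```

A small, constant `alpha` gives exponential easing toward the target, which keeps motion smooth even when target poses change mid-transition.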
📚 What We Learned
- Translation alone is insufficient—speed, size, and facial expression change meaning
- Deaf communication relies heavily on non-manual markers
- Embedding education into interaction improves accessibility and understanding
🚧 Challenges
- Video Latency: Solved by optimizing Gemini Flash inference and enforcing fast, structured outputs
- 3D Articulation: Mapping abstract AI instructions to a fully procedural hand rig required intensive math and debugging
- Accessibility UX: Balancing fast conversations with deep educational insights
🔁 Key Features
- Sign → Text: Video input → text + emotion + urgency
- Text → Sign: Input → ASL gloss → real-time 3D signing
- Dual Modes:
  - Quick Conversation for speed
  - Understand & Learn for education and cultural context
🏆 Accomplishments
- Built a fully procedural 3D signing avatar
- Created a multimodal pipeline that understands emotion and urgency
- Enabled real-time, empathetic communication for critical use cases
🚀 What’s Next
- Google Veo 3 Integration: Enable more realistic, expressive signing motion for higher-fidelity responses
- Smarter Signing AI: Improve text → sign accuracy, including speed, emphasis, and facial markers
- Mobile Support: Make SignSense AI fully accessible on smartphones for everyday use
- Voice Integration: Add a voice option for generating replies (Voice → Sign)
- Real-World Testing: Validate performance in real environments like clinics and public services
