Inspiration

Communication is a fundamental human right, yet 70 million Deaf people worldwide face daily barriers. Traditional text-to-speech tools fail to capture the nuance, emotion, and spatial nature of sign language. Inspired by the multimodal capabilities of Gemini 2.0 Flash, we asked: "Can we build a bridge that not only translates words but understands the physical world?" We wanted to move beyond static dictionary lookups to create a truly intelligent, context-aware interpreter that feels like a human companion.

What it does

SignBridge is a comprehensive AI platform that bridges the gap between Deaf and hearing communities:

- Real-Time Translation: Converts spoken language (via voice or text) into fluid, expressive 3D sign language animations.
- Spatial Awareness (Unique Feature): Using the camera, it detects real-world objects (like a "cup" on your desk) and lets the avatar reference them spatially, pointing to the actual item during conversation.
- AI Teaching Agent: A personalized tutor that watches you sign via webcam and provides instant feedback on your handshape, movement, and timing.
- Multi-Dialect Support: Seamlessly switches between ASL (American), BSL (British), and ISL (Indian) sign languages, respecting their grammatical differences.

How we built it

We built a modern, full-stack application centered on the Gemini API:

- Frontend: React + Vite for a snappy UI, with Three.js / React Three Fiber rendering our high-fidelity 3D avatar.
- Backend: Node.js and Express handling real-time WebSocket connections via Socket.io.
- AI Core: Gemini 2.0 Flash is the brain. We use it for:
  - Vision: Object detection for spatial awareness.
  - Linguistics: Translating English text into gloss sequences (e.g., converting "How are you?" to "YOU HOW?").
  - Analysis: Comparing user webcam frames against reference signs for the teaching mode.
- Deployment: A hybrid architecture, with the frontend and the backend API services deployed as separate cloud components.

Challenges we ran into

- Latency vs. Accuracy: Real-time translation needs to feel instant. We optimized our Gemini prompts and implemented caching so the avatar starts signing while the user is still speaking.
- Spatial Mapping: Mapping 2D camera coordinates into the 3D avatar's world space was complex. We had to create a coordinate transformation system so that when the avatar points "left," it actually points to the object on the user's left.
- Socket Connectivity: Keeping bidirectional communication stable for real-time video frames (used by the teaching mode) required fine-tuning our WebSocket configuration, especially across different deployment environments.

Accomplishments that we're proud of

- First-of-its-kind Spatial Awareness: Seeing the avatar correctly point to a "book" on our desk for the first time was a magic moment. It proves that AI translation can be grounded in physical reality.
- True Multimodality: Successfully integrating text, voice, and video streams into a single seamless experience powered by one model (Gemini).
- Inclusive Design: Supporting multiple dialects (ASL, BSL, ISL) from day one, so we aren't building for just one region.

What we learned

- Prompt Engineering is Key: We learned how to craft specific prompts that get Gemini to output precise animation-data JSON instead of generic text.
- The Power of Flash: Gemini 2.0 Flash's speed is a game-changer for real-time accessibility tools. It enables interactions that felt impossible with slower models.
- Accessibility is Complex: Sign language isn't just "hand words"; it's facial expressions, body posture, and use of space. Capturing this requires deep respect for Deaf culture and linguistic nuance.

What's next for SignBridge - AI-Powered Sign Language Translator

- Mobile App: Porting the experience to a native mobile app for on-the-go use.
- Two-Way Translation: Enhancing the sign-to-text capabilities to allow full conversations where the system reads the user's signing fluently.
- AR Integration: Using augmented-reality glasses to overlay translations directly in the user's field of view.
- Community Contributions: Opening the platform for the Deaf community to contribute regional signs and refine the avatar's expressions.
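The spatial-mapping challenge described above can be sketched in miniature. This is a hypothetical illustration, not SignBridge's actual code: the function name, scene dimensions, and sign conventions are assumptions, and the exact mirroring depends on whether the camera feed is flipped before detection.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Map an object's normalized position in the camera frame ([0, 1] from the
// frame's top-left corner) into the avatar's world space. The webcam faces
// the user while the avatar faces out of the screen, so the x-axis is
// mirrored here so the avatar gestures toward the user's actual left;
// camera y grows downward while scene y grows upward, so y is flipped too.
function cameraToAvatarSpace(
  nx: number,          // normalized x in the camera frame, 0..1
  ny: number,          // normalized y in the camera frame, 0..1
  sceneWidth = 2.0,    // assumed width of the avatar's scene
  sceneHeight = 1.5,   // assumed height of the avatar's scene
  depth = 1.0          // assumed distance of the object plane from the avatar
): Vec3 {
  return {
    x: (0.5 - nx) * sceneWidth,   // mirror horizontally around the center
    y: (0.5 - ny) * sceneHeight,  // flip vertically around the center
    z: depth,
  };
}
```

In a real pipeline the normalized coordinates would come from Gemini's object-detection output, and the result would drive the avatar's pointing gesture via inverse kinematics in Three.js.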

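The caching idea from the latency section can also be sketched: memoize gloss translations so a repeated phrase never waits on a second model round-trip. This is a toy version under assumed names; `translate` stands in for the real Gemini call.

```typescript
type GlossFn = (sentence: string) => string[];

// Wrap a (slow) translation function with an in-memory cache keyed by the
// normalized sentence. Repeated phrases return instantly from the cache,
// so the avatar can begin signing without another round-trip.
function withGlossCache(translate: GlossFn): GlossFn {
  const cache = new Map<string, string[]>();
  return (sentence: string) => {
    const key = sentence.trim().toLowerCase();
    const hit = cache.get(key);
    if (hit) return hit;           // cache hit: skip the model entirely
    const gloss = translate(key);  // cache miss: call the model once
    cache.set(key, gloss);
    return gloss;
  };
}
```

Keying on a normalized form means "How are you?" and "how are you?" share one cache entry; a production version would bound the cache size and expire stale entries.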