Inspiration
My younger brother is deaf, and I have seen how difficult it can be for him to join spontaneous conversations with people around him. I wanted to build something that removes that barrier and lets him participate in real time conversations.
The Gemini Live API’s multimodal capabilities made that vision possible. It gave us a way to process live video for sign language interpretation, convert speech to text, and generate natural audio responses for real time conversation. That is what led to SignBridge Live.
What it does
SignBridge Live enables real-time, two-way communication between deaf and hearing users:
- Deaf user signs on camera → the app interprets and outputs speech/text in real time.
- Hearing user speaks → the app transcribes the speech into text and converts it into sign-ready output for an avatar in real time.
The result is a seamless conversation flow where both people can participate naturally.
How we built it
We built SignBridge Live as a full-stack, real-time multimodal system:
- Frontend: React + TypeScript + Vite with custom hooks for microphone capture, audio playback, avatar rendering, and WebSocket transport.
- Backend: FastAPI with a low-latency WebSocket pipeline for streaming audio/video chunks.
- AI orchestration: Google ADK (LlmAgent + Runner) integrated with the Gemini Live API on Vertex AI to manage real-time gloss generation, sign interpretation, and ambient audio understanding.
- Speech services: Cloud Speech-to-Text for transcription and Cloud Text-to-Speech for spoken output.
- Sign pipeline: Gloss/animation planning service to drive sign-language avatar behavior.
- Realtime collaboration: Session create/join/leave flow, broadcast to participants in a session, reconnect-safe client state, and QR/shareable session links.
- Infrastructure: Dockerized services with Google Cloud–ready architecture (Cloud Run + Vertex AI path).
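The session create/join/leave and broadcast flow above can be sketched in plain Python. This is an illustrative minimal version, not our actual implementation; the `SessionManager` name and API are hypothetical, and in the real app each participant wraps a WebSocket connection.

```python
import secrets


class SessionManager:
    """Tracks conversation sessions and the participants in each.

    A participant is any object with a send(message) method
    (a WebSocket wrapper in the real app).
    """

    def __init__(self):
        self._sessions = {}  # session_id -> set of participants

    def create(self):
        # Short random ID, easy to share as a QR code or link.
        session_id = secrets.token_urlsafe(6)
        self._sessions[session_id] = set()
        return session_id

    def join(self, session_id, participant):
        if session_id not in self._sessions:
            raise KeyError(f"unknown session {session_id}")
        self._sessions[session_id].add(participant)

    def leave(self, session_id, participant):
        # Leaving must not close the session for remaining participants.
        members = self._sessions.get(session_id)
        if members:
            members.discard(participant)

    def broadcast(self, session_id, message, exclude=None):
        # Fan a message out to everyone in the session, optionally
        # skipping the sender.
        for p in self._sessions.get(session_id, ()):
            if p is not exclude:
                p.send(message)
```

The key design point is that `leave` only discards a participant rather than tearing down the session, so one user disconnecting never ends an active conversation.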
Challenges we ran into
- Keeping end-to-end latency low while preserving transcript quality required a careful chunking and buffering strategy.
- WebSocket lifecycle issues in React StrictMode caused connection handshakes and cleanup edge cases.
- Making session behavior reliable for multi-user join/leave without accidentally closing active conversations.
- Avoiding hallucinations in sign interpretation by sending real multimodal frame bytes and enforcing strict JSON outputs.
- Dependency compatibility during ADK migration (especially FastAPI/version constraints) while keeping the stack stable.
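The chunking/buffering trade-off from the first challenge can be illustrated with a buffer that flushes either when enough audio has accumulated (better transcript quality) or when a time budget expires (bounded latency). This is a simplified sketch; the thresholds and class name are illustrative, not the values we actually tuned to.

```python
import time


class ChunkBuffer:
    """Accumulates small audio chunks and releases them in batches."""

    def __init__(self, max_bytes=32_000, max_age_s=0.5, clock=time.monotonic):
        self.max_bytes = max_bytes      # flush when this much audio is queued
        self.max_age_s = max_age_s      # ...or when the oldest chunk is this old
        self._clock = clock             # injectable for testing
        self._chunks = []
        self._size = 0
        self._first_at = None

    def add(self, chunk: bytes):
        if self._first_at is None:
            self._first_at = self._clock()
        self._chunks.append(chunk)
        self._size += len(chunk)

    def ready(self) -> bool:
        # Flush on size OR age, whichever comes first: size keeps
        # transcription context rich, age keeps latency bounded.
        if self._first_at is None:
            return False
        return (self._size >= self.max_bytes
                or self._clock() - self._first_at >= self.max_age_s)

    def flush(self) -> bytes:
        data = b"".join(self._chunks)
        self._chunks, self._size, self._first_at = [], 0, None
        return data
```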
Accomplishments that we're proud of
- Built a functioning bidirectional communication loop (hearing → deaf and deaf → hearing).
- Implemented real-time modes for Hearer, Deaf, and Radio/Ambient interpretation.
- Added robust WebSocket behavior: reconnect handling, pending message flush, session acknowledgment flow, and session broadcast.
- Designed a clear user experience for shared conversations with session IDs and quick join flows.
- Delivered a well-structured architecture with modular services, stream handlers, and documented deployment paths.
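The reconnect handling and pending-message flush mentioned above follow a standard pattern: queue outgoing messages while the socket is down and replay them in order once it reopens. A minimal sketch, with an illustrative class name and a pluggable transport in place of the real WebSocket:

```python
from collections import deque


class ReconnectingSender:
    """Queues messages while disconnected; flushes them on reconnect."""

    def __init__(self, transport_send):
        self._send = transport_send  # e.g. the socket's send in the real client
        self._connected = False
        self._pending = deque()

    def send(self, message):
        if self._connected:
            self._send(message)
        else:
            # Preserve ordering so the transcript replays correctly.
            self._pending.append(message)

    def on_open(self):
        self._connected = True
        while self._pending:
            self._send(self._pending.popleft())

    def on_close(self):
        self._connected = False
```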
What we learned
- In multimodal systems, data fidelity matters as much as model quality: how frames/audio are packaged directly affects output quality.
- Real-time accessibility products need strong interruption handling and state synchronization, not just model accuracy.
- Prompt design with strict schemas dramatically improves reliability for downstream animation/audio pipelines.
- Small UX details (auto-rejoin, transcript retention, clear mode boundaries) make a huge difference for real users.
- Building for accessibility is most effective when grounded in lived experience and constant practical testing.
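The strict-schema lesson above reduces to a simple rule: reject anything the model returns that does not match the expected shape, before it reaches the animation/audio pipeline. A minimal sketch; the `gloss`/`confidence` field names are illustrative, not our production schema.

```python
import json

# Fields required in every interpretation result; anything else is
# rejected rather than passed downstream. Names are illustrative.
REQUIRED = {"gloss": str, "confidence": float}


def parse_interpretation(raw: str) -> dict:
    """Parse a model response, enforcing a strict schema.

    Raises ValueError on malformed JSON, missing or extra keys, or
    wrong types, so hallucinated output never drives the avatar.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from e
    if not isinstance(obj, dict) or set(obj) != set(REQUIRED):
        raise ValueError("unexpected structure or keys")
    for key, typ in REQUIRED.items():
        if not isinstance(obj[key], typ):
            raise ValueError(f"{key} must be {typ.__name__}")
    return obj
```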
What's next for SignBridge Live
- Open source SignBridge Live.
- Add a sign language avatar that can seamlessly render sign language from audio.
- Add multilingual support for both speech and sign interpretation contexts.
- Expand shared conversation features for classrooms, families, and public service scenarios.
- Launch pilot deployments with deaf communities and gather structured usability feedback.
- Add production-grade observability, safety checks, and performance benchmarking to consistently stay under real-time latency targets.