Inspiration
For the 450 million Deaf and hard-of-hearing people worldwide, navigating a hearing-centric world often means being a spectator in their own lives. Whether it's a medical emergency, a complex administrative task, or a casual chat at a cafe, the barrier isn't just "hearing": it's nuance, speed, and emotion.
Current accessibility tools are stuck in the past. They are static "transcription" apps that output robotic, cold speech. They force the user to type out long sentences while the world waits impatiently.
We wanted to build the Universal Translator for Context. We asked: What if an AI could predict what you need to say before you even type it? What if the interface itself were alive, adapting to the conversation in real time? OmniBridge AI was born from the desire to replace static menus with a "Polymorphic Interface" that gives Deaf users the speed and emotional depth they deserve.
What it does
OmniBridge AI is a real-time multimodal communication bridge. It acts as an intelligent mediator between a Deaf user and the hearing world.
- It Understands Context (The Brain): Using Google Gemini 3, the app listens to the conversation and analyzes the visual environment. It understands if you are at a doctor's office, in a shop, or asking for directions.
- It Generates the UI (The Interface): Instead of navigating through complex menus, the app uses Generative UI to instantly create the exact response buttons needed for the current moment. The interface morphs continuously as the conversation evolves (an illustrative payload is sketched after this list).
- It Speaks with Emotion (The Voice): When the user selects a response, ElevenLabs converts it into high-fidelity, human-like speech. It’s not just about saying the words; it’s about conveying the right tone—whether it’s urgency, politeness, or relief.
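To make "Generative UI" concrete, here is a hypothetical example of the kind of structured payload the model can return for a single moment in a conversation. The field names and values below are illustrative, not our exact schema:

```ts
// Hypothetical example of a generated-UI payload: the model describes the
// buttons for the current moment, and React simply renders whatever arrives.
interface GeneratedReply {
  label: string;            // text shown on the button
  speech: string;           // sentence sent to ElevenLabs when tapped
  tone: "urgent" | "polite" | "neutral";
}

interface GeneratedScreen {
  context: string;          // e.g. "pharmacy", "train-station"
  replies: GeneratedReply[];
}

// What the model might return while the user is at a pharmacy counter:
const example: GeneratedScreen = {
  context: "pharmacy",
  replies: [
    { label: "Refill, please", speech: "I'd like to refill this prescription.", tone: "polite" },
    { label: "Allergic to penicillin", speech: "Please note I'm allergic to penicillin.", tone: "urgent" },
    { label: "Written instructions?", speech: "Could you write the dosage instructions down?", tone: "polite" },
  ],
};
```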
How we built it
We prioritized a "No-Build" architecture to ensure the application is lightweight, incredibly fast, and easily deployable as a PWA anywhere in the world.
🧠 AI Orchestration (Google Cloud)
- The Logic: We utilized `gemini-3-flash-preview` via the Google GenAI SDK. We chose "Flash" specifically for its blazing-fast inference speed (<500 ms), which is critical for maintaining a natural conversational flow.
- Prompt Engineering: We designed a dynamic system prompt that acts as a "Conversation Architect," injecting conversation context and enforcing strict JSON schemas for the generated UI components (see the sketch after this list).
- Multimodality: We leverage Gemini's vision capabilities to analyze incoming video/image data, allowing the AI to "see" documents or objects and suggest relevant questions.
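As a rough sketch of how that looks in code, assuming the `@google/genai` SDK and a response schema along the lines of the payload shown earlier (the prompt and schema in the app are more elaborate):

```ts
// Minimal sketch: ask Gemini for structured reply options instead of free text.
// The model name is the one from our stack; the schema fields are illustrative.
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "YOUR_GEMINI_API_KEY" }); // placeholder key

export async function generateReplyOptions(transcript: string) {
  const response = await ai.models.generateContent({
    model: "gemini-3-flash-preview",
    contents: transcript,
    config: {
      systemInstruction:
        "You are a Conversation Architect for a Deaf user. " +
        "Return 3-5 short reply options that fit the current situation.",
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          context: { type: Type.STRING }, // e.g. "pharmacy", "train-station"
          replies: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                label: { type: Type.STRING },  // button text
                speech: { type: Type.STRING }, // sentence for ElevenLabs
                tone: { type: Type.STRING },   // "urgent" | "polite" | "neutral"
              },
            },
          },
        },
      },
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```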
🗣️ Voice Synthesis (ElevenLabs)
- The Voice: We integrated the ElevenLabs API using the `eleven_turbo_v2_5` model. This model is optimized for ultra-low-latency streaming, meaning the voice starts playing almost the instant the user taps a button.
- Streaming: Audio is handled via Blob streaming to prevent UI freezes (a sketch follows this list).
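A minimal sketch of that Blob-based playback path, assuming the standard ElevenLabs text-to-speech REST endpoint; the voice ID and API key are placeholders, and error handling is trimmed for brevity:

```ts
// Sketch: convert a selected reply to speech and play it without blocking the UI.
const VOICE_ID = "YOUR_VOICE_ID";              // placeholder
const ELEVENLABS_API_KEY = "YOUR_API_KEY";     // placeholder
const ELEVEN_URL = `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`;

export async function speak(text: string): Promise<void> {
  const res = await fetch(ELEVEN_URL, {
    method: "POST",
    headers: {
      "xi-api-key": ELEVENLABS_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,
      model_id: "eleven_turbo_v2_5", // the low-latency model from our stack
    }),
  });

  // Play the returned audio as a Blob so the main thread never freezes.
  const blob = await res.blob();
  const audio = new Audio(URL.createObjectURL(blob));
  await audio.play();
}
```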
💻 Frontend & Architecture
- Framework: Built with React 19, imported natively as ES Modules from `esm.sh` (a snippet follows this list). This allowed us to iterate rapidly without complex build steps.
- UX/UI: Styled with Tailwind CSS using a "Glassmorphism" design system to create a modern, distraction-free interface.
- PWA: The app is a fully offline-capable Progressive Web App, so it remains usable even on slow or unreliable mobile networks.
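For illustration, the "No-Build" idea boils down to letting the browser resolve React itself. A stripped-down entry module might look like this (the exact `esm.sh` URLs are illustrative):

```ts
// Sketch of a no-build entry point: React is loaded straight from a CDN as
// native ES Modules, so there is no bundler or compile step at all.
import React from "https://esm.sh/react@19";
import { createRoot } from "https://esm.sh/react-dom@19/client";

function App() {
  return React.createElement("h1", null, "OmniBridge AI");
}

createRoot(document.getElementById("root")!).render(React.createElement(App));
```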
Challenges we ran into
- The "GenUI" Consistency: Making a Large Language Model (LLM) act as a reliable UI designer was difficult. We had to implement robust error handling to ensure the JSON returned by Gemini was always valid and renderable by React, even when the conversation became chaotic.
- The Latency Battle: A conversation happens in milliseconds. Chaining an LLM (Gemini) and a TTS engine (ElevenLabs) adds latency at every step. We optimized this by parallelizing requests and using the fastest available models (Flash and Turbo).
- Context Management: Teaching the AI to know when a topic has changed (e.g., moving from "Greetings" to "Business") required fine-tuning our sliding window context strategy.
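As an example of the defensive handling mentioned above, here is a sketch of the guard we describe between Gemini's raw output and the renderer, reusing the hypothetical `GeneratedScreen` shape from the earlier sketch (our production checks are stricter):

```ts
// Sketch: validate whatever Gemini returns before React ever sees it, and fall
// back to a safe generic screen when the payload is malformed.
function parseGeneratedScreen(raw: string): GeneratedScreen {
  const fallback: GeneratedScreen = {
    context: "general",
    replies: [
      { label: "Please repeat", speech: "Could you please repeat that?", tone: "polite" },
      { label: "One moment", speech: "One moment, please.", tone: "neutral" },
    ],
  };

  try {
    const data = JSON.parse(raw);
    // Keep only replies that carry the fields the renderer actually needs.
    const replies = Array.isArray(data?.replies)
      ? data.replies.filter(
          (r: any) => typeof r?.label === "string" && typeof r?.speech === "string"
        )
      : [];
    return replies.length > 0
      ? { context: String(data.context ?? "general"), replies }
      : fallback;
  } catch {
    return fallback; // malformed JSON: never crash the UI mid-conversation
  }
}
```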
Accomplishments that we're proud of
- True Polymorphism: We successfully built an interface that is never the same twice. Watching the buttons morph from "Medical Options" to "Navigation Tools" in real time feels like a genuine breakthrough in UX design.
- Seamless Integration: Combining the reasoning power of Google Gemini with the emotional realism of ElevenLabs creates a user experience that feels less like a "tool" and more like a "human extension."
- Performance: Achieving a functional "No-Build" architecture that loads instantly and orchestrates complex AI inference directly from the browser.
What we learned
- Speed is Accessibility: For a Deaf person, the "awkward silence" while an app loads is a major friction point. Optimizing for milliseconds is not just a technical goal; it's an empathy goal.
- Context is King: A translation tool without context is useless. The power of Gemini lies in its ability to understand the situation, not just transcribe the words.
- Generative UI is the Future: We learned that pre-coded interfaces are limiting. The future of accessibility lies in interfaces that build themselves based on user needs.
What's next for OmniBridge AI
- Wearable Ecosystem: Porting the "GenUI" cards to smartwatches (WearOS) for a hands-free experience.
- On-Device AI: Implementing Gemini Nano to handle basic conversation logic directly in the browser, enabling privacy-first, offline communication.
- Emotional Input: Allowing the user to select an "Emotional State" (e.g., Angry, Happy) that dynamically adjusts the ElevenLabs voice output parameters.
Built With
- aistudio
- css3
- eleven-turbo-v2-5
- elevenlabs
- gemini-3-flash-preview
- gemini-3-pro
- google-ai-studio
- google-cloud
- html5
- react
- tailwindcss