SIGNAL: Real-Time Agentic AI Co-Pilot for Deaf and Hard-of-Hearing Developers


Inspiration

Modern software engineering happens in meetings.

Architecture reviews. Standups. Incident calls. Design debates.
Critical decisions are made in seconds, often implicitly through fast-paced speech, interruptions, and unstated assumptions.

For deaf and hard-of-hearing developers, these meetings are not merely inconvenient; they are structurally inaccessible.

Most existing solutions stop at transcription. But captions are literal, laggy, and context-blind. They miss urgency, intent, contradictions, and expectations. By the time a caption is read, the moment to respond has already passed, and participation is lost.

When Gemini 3 launched with low-latency multimodal reasoning and live audio understanding, it unlocked a new question for us:

What if accessibility wasn’t about hearing the words, but understanding the meaning, responding at the right moment, and contributing instantly?

SIGNAL is the answer.

It’s built on the belief that true accessibility is cognitive, not surface-level, and that AI should help people participate as equals, not just keep up.


The Problem SIGNAL Addresses

Modern software engineering is driven by live, spoken collaboration.
Key technical decisions are made rapidly through discussion, interruption, and implicit agreement, often without explicit summaries or written follow-ups.

For deaf and hard-of-hearing developers, this creates a systemic disadvantage:

  • Captions are delayed and literal.
  • Important intent (urgency, expectation, contradiction) is lost.
  • By the time context is understood, the moment to respond has passed.
  • Participation becomes passive instead of equal.

The core problem is not a lack of transcription;
it is a lack of real-time semantic understanding and technical response support.


The Solution SIGNAL Provides

SIGNAL transforms live spoken meetings into semantic awareness, visual explanation, and actionable code.

Instead of transcribing everything, SIGNAL:

  • Detects when something important is happening.
  • Understands what it means in a technical context.
  • Helps the user respond at the right moment with the right artifacts.

SIGNAL acts as a real-time AI co-pilot that listens, reasons, draws, and codes, allowing deaf and hard-of-hearing developers to participate confidently, professionally, and on equal footing.


How SIGNAL Uses Gemini 3

SIGNAL is built around Gemini 3’s low-latency, multimodal intelligence, orchestrating specialized agents for different meeting needs.

Gemini 3 is used as a live reasoning engine:

  • Gemini 3 Flash Preview powers ultra-low-latency semantic signals such as:

    • Decision detection
    • Requests for input
    • Risk acceptance
    • Architectural contradictions
  • Gemini 3 Pro Preview (Reasoning Agent) handles deeper contextual reasoning, including:

    • Multi-minute conversation memory
    • Intent inference
    • Context-aware response generation
    • Professional, role-aligned suggested replies
  • Gemini 3 Pro Preview (Code Agent) is a specialized polyglot agent that:

    • Listens for technical specifications (schemas, APIs, entities)
    • Instantly generates production-ready code
    • Outputs simultaneously in Java, Python, and Go to bridge cross-functional team gaps
  • Gemini 3 Pro Image Preview (Nano Banana Pro) generates real-time visual explanations for:

    • Architecture discussions
    • System flows
    • Abstract technical concepts
    • Diagram-based reasoning

Together, these models allow SIGNAL to operate in real time—listening, understanding, visualizing, and coding while the meeting is still happening.
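As a rough illustration of this split, here is a minimal sketch of how an orchestration layer can route a transcript chunk between the fast and deep agents. The GeminiAgent interface and the signal labels are hypothetical stand-ins, not our actual implementation:

```java
// Hypothetical sketch: routing one transcript chunk through the agent mesh.
// The GeminiAgent interface and the label strings below are illustrative only.
import java.util.List;

interface GeminiAgent {
    // Each agent wraps one Gemini 3 model behind a single method.
    String process(String transcriptChunk);
}

class AgentMesh {
    private final GeminiAgent signalAgent;    // gemini-3-flash-preview: fast semantic signals
    private final GeminiAgent reasoningAgent; // gemini-3-pro-preview: deeper context + replies
    private final GeminiAgent codeAgent;      // gemini-3-pro-preview: polyglot code generation
    private final GeminiAgent visualAgent;    // gemini-3-pro-image-preview: diagrams

    AgentMesh(GeminiAgent signal, GeminiAgent reasoning, GeminiAgent code, GeminiAgent visual) {
        this.signalAgent = signal;
        this.reasoningAgent = reasoning;
        this.codeAgent = code;
        this.visualAgent = visual;
    }

    List<String> handle(String transcriptChunk) {
        // Flash runs on every chunk; the heavier Pro agents run only when
        // the cheap pass flags a high-impact moment.
        String signal = signalAgent.process(transcriptChunk);
        if (signal.contains("INPUT_EXPECTED") || signal.contains("DECISION")) {
            return List.of(signal, reasoningAgent.process(transcriptChunk));
        }
        if (signal.contains("SPEC_DETECTED")) {
            return List.of(signal, codeAgent.process(transcriptChunk));
        }
        if (signal.contains("ARCHITECTURE")) {
            return List.of(signal, visualAgent.process(transcriptChunk));
        }
        return List.of(signal);
    }
}
```

The point of the split is cost and speed: the cheap Flash pass is always on, and the expensive Pro agents wake up only when something worth acting on is detected.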


What it does

SIGNAL is a real-time AI co-pilot that runs alongside live meetings
(Google Meet, Zoom, Microsoft Teams) and transforms spoken technical conversations into semantic signals, visual diagrams, and live code. It is designed specifically for deaf and hard-of-hearing engineers.

Instead of showing raw captions, SIGNAL continuously answers the questions that matter most in a live meeting:

  • Is a decision being made right now?
  • Am I expected to respond?
  • What does this architecture look like?
  • How do we implement this data structure?

Core Capabilities

1. Real-Time Semantic Awareness (Beyond Transcription)

SIGNAL listens to the live audio stream and uses Gemini 3 Pro Preview to infer intent, not just words.

It detects and classifies high-impact moments such as:

  • Decision points (“Let’s go with…”, “We’ll ship it this way”)
  • Requests for input (“Any objections?”, “Thoughts from backend?”)
  • Risky engineering choices (“We’ll hotfix it for now”)
  • Architectural contradictions (“Earlier we agreed on X, now Y”)

These moments are surfaced instantly as short, glanceable semantic signals, not paragraphs of text.
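For illustration, a classification pass over the rolling transcript window could look something like the sketch below, assuming the Google Gen AI Java SDK (com.google.genai); the prompt text and labels are examples, not SIGNAL’s exact prompts:

```java
// Illustrative only: classify a rolling transcript window into semantic signals.
// Assumes the Google Gen AI Java SDK; prompt and labels are examples, not SIGNAL's real ones.
import com.google.genai.Client;
import com.google.genai.types.GenerateContentResponse;

public class SignalClassifier {
    private final Client client = new Client(); // reads the API key from the environment

    public String classify(String recentTranscript) {
        String prompt = """
            You are monitoring a live engineering meeting for a deaf developer.
            Classify the latest utterances into at most one of:
            DECISION, INPUT_EXPECTED, RISK_ACCEPTED, CONTRADICTION, NONE.
            Reply with the label and a one-line glanceable summary.

            Transcript window:
            """ + recentTranscript;

        GenerateContentResponse response =
            client.models.generateContent("gemini-3-flash-preview", prompt, null);
        return response.text();
    }
}
```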


2. Context-Aware Alerts (Situational Awareness)

When a high-impact moment occurs, SIGNAL displays concise alerts like:

  • “🟢 Your input is expected”
  • “⚠️ Technical debt decision detected, risk: medium”
  • “🔁 This contradicts earlier database design”
  • “✅ Decision being finalized”

This allows users to react during the meeting, not after it ends.
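Concretely, one way such an alert can be pushed to the UI over the WebSocket channel is sketched below; the payload fields and the /topic/signals destination are assumptions for illustration, not the exact wire format:

```java
// Sketch of a glanceable alert payload pushed over STOMP/WebSocket.
// Field names and the destination are illustrative assumptions.
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.stereotype.Service;

@Service
public class AlertPublisher {

    // Simple immutable payload for one semantic signal.
    public record SignalAlert(String type, String icon, String message, String severity) {}

    private final SimpMessagingTemplate messagingTemplate;

    public AlertPublisher(SimpMessagingTemplate messagingTemplate) {
        this.messagingTemplate = messagingTemplate;
    }

    public void publishRiskAlert() {
        SignalAlert alert =
            new SignalAlert("RISK_ACCEPTED", "⚠️", "Technical debt decision detected", "medium");
        // The frontend subscribes to this topic and renders the alert card.
        messagingTemplate.convertAndSend("/topic/signals", alert);
    }
}
```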


3. ✨ AI-Generated Response Assist

When SIGNAL detects that the user’s input is required, it doesn’t stop at alerting them.

Using Gemini 3, SIGNAL generates a suggested response based on:

  • The last several minutes of conversation
  • Technical context and prior decisions
  • The user’s role (backend, frontend, infra, etc.)
  • The tone of the meeting (exploratory vs decisive)

A persistent response card appears:

💬 Suggested Response (Ready to Send)
“I’m a bit concerned about the hotfix approach. Given the recent cache issues, this could increase rollback risk. Would it make sense to add a temporary guard or feature flag?”

Key design principles:

  • Never auto-sends; user remains in control.
  • Clearly marked as AI-suggested.
  • Can be copied or sent with a hand gesture.
  • Does not disappear until dismissed.

This transforms SIGNAL from passive awareness into active participation.
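As a sketch of how that context comes together, the suggested reply can be framed by a prompt assembled from the transcript window, the user’s role, and the meeting tone; the wording below is illustrative, not our production prompt:

```java
// Illustrative prompt assembly for a suggested reply; not SIGNAL's actual prompt.
public class ResponseAssist {

    public String buildPrompt(String transcriptWindow, String userRole, String meetingTone) {
        return """
            A deaf developer needs a short, professional reply they can paste into meeting chat.
            Their role: %s
            Meeting tone: %s
            Ground the reply in the prior decisions below, raise concerns constructively,
            and keep it under three sentences.

            Recent conversation:
            %s
            """.formatted(userRole, meetingTone, transcriptWindow);
    }
}
```

The assembled prompt is sent to Gemini 3 Pro Preview, and the returned text is shown on the persistent response card until the user dismisses it.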


4. Gesture-Based Interaction (Hands-Free Support)

SIGNAL integrates MediaPipe hand-gesture recognition.

  • Raising a thumb copies the suggested response instantly.
  • Users can paste it directly into meeting chat.
  • Alternatively, they can read it aloud if they prefer.

This removes friction and preserves timing—the most critical factor in live discussions.


5. 💻 Real-Time Polyglot Code Generation (The Engineering Bridge)

Software meetings often involve cross-functional teams (e.g., Backend, Data Science, DevOps) discussing the same data structures in different languages.

When SIGNAL detects technical specifications (e.g., "We need a User entity with an ID and nullable email"), it triggers the Gemini 3 Code Agent.

  • What happens: The agent instantly generates ready-to-use code implementing that entity.
  • The Polyglot Advantage: It generates the code in Java, Python, and Go simultaneously.
  • The Impact: This bridges the gap between teams, allowing the deaf and hard-of-hearing developer to instantly see the exact code being discussed in their preferred language, and even share it back to the team.
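For example, for the spoken spec above (“a User entity with an ID and nullable email”), the Java variant might look roughly like this (illustrative output, not a verbatim agent result):

```java
// Example of the kind of Java output the Code Agent could produce for
// "a User entity with an ID and nullable email" (illustrative, not verbatim agent output).
import java.util.Optional;

public class User {
    private final long id;
    private final String email; // nullable per the spoken spec

    public User(long id, String email) {
        this.id = id;
        this.email = email;
    }

    public long getId() {
        return id;
    }

    public Optional<String> getEmail() {
        return Optional.ofNullable(email);
    }
}
```

The Python and Go variants mirror the same fields, so each team reads the structure in its own language.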

6. Visual Understanding with Gemini 3 Pro Image Preview (Nano Banana Pro)

When a discussion references:

  • Architecture diagrams
  • System flows
  • Feature explanations
  • Abstract technical concepts

SIGNAL uses Gemini 3 Pro Image Preview (Nano Banana Pro) to generate high-level visual explanations in real time.

Instead of missing context, users receive a clear visual representation that makes abstract discussions immediately understandable.
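A rough sketch of how the visual agent can be asked for a diagram is shown below; the ImageAgent interface is a hypothetical stand-in for the actual Gemini 3 Pro Image Preview call:

```java
// Hypothetical sketch: asking the visual agent for a diagram of the topic under discussion.
// ImageAgent stands in for the real Gemini 3 Pro Image Preview call.
interface ImageAgent {
    byte[] generateDiagram(String prompt); // returns encoded image bytes
}

class VisualExplainer {
    private final ImageAgent imageAgent;

    VisualExplainer(ImageAgent imageAgent) {
        this.imageAgent = imageAgent;
    }

    byte[] explain(String discussionSummary) {
        String prompt = """
            Draw a clean, high-level architecture diagram for the system being discussed.
            Use labeled boxes and arrows only, no decorative detail, readable at a glance.

            Discussion summary:
            """ + discussionSummary;
        return imageAgent.generateDiagram(prompt);
    }
}
```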


How we built it

We engineered SIGNAL as a real-time, multimodal AI system designed specifically for live technical conversations. The architecture prioritizes continuous understanding and parallel execution, enabling the system to listen, reason, visualize, and code simultaneously.

Architecture Overview

  • Backend

    • Java & Spring Boot: Robust event-driven architecture.
    • Async Agent Orchestration: We use Spring @Async to run multiple Gemini agents (Visual, Code, Semantic) in parallel without blocking the audio stream (see the sketch after this overview).
    • WebSocket Streaming: Pushes signals to the frontend with minimal latency.
  • Infrastructure (Enterprise-Grade)

    • Google Cloud Run: We fully containerized the application using Docker and deployed it to Google Cloud Run.
    • Why this matters: A fully managed serverless platform gives us automatic scaling (handling high traffic during team meetings) and high availability, while secrets stay out of the codebase via environment variables.
  • AI Layer (The Gemini Mesh)

    • Gemini 3 Pro Preview: The core reasoning brain for intent detection and code generation.
    • Gemini 3 Flash Preview: Handles high-velocity, low-latency signal detection.
    • Gemini 3 Pro Image Preview (Nano Banana Pro): The dedicated visual agent for architecture diagrams.
  • Frontend

    • React.js & Next.js: A high-performance UI.
    • Syntax Highlighting: Dynamic code blocks for the polyglot code generation feature.
    • Minimalist Design: Optimized for "glanceability" during busy meetings.
    • React-Toast: Toast notifications confirming when a suggested response has been copied via hand gesture.
  • Interaction

    • MediaPipe: For real-time, privacy-first hand gesture recognition.

Live audio is processed continuously, semantic signals are inferred in real time, and artifacts (diagrams/code) are generated asynchronously only when high-impact triggers are detected.
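The async fan-out mentioned above can be sketched as follows; bean and method names are illustrative rather than our exact classes:

```java
// Sketch of the parallel agent fan-out with Spring @Async.
// Method and bean names are illustrative; the real orchestration differs in detail.
import java.util.concurrent.CompletableFuture;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {}

@Service
public class AgentOrchestrator {

    // Each agent runs on its own worker thread so the audio pipeline is never blocked.
    @Async
    public CompletableFuture<String> runSemanticAgent(String transcriptChunk) {
        return CompletableFuture.completedFuture(/* call gemini-3-flash-preview */ "signal");
    }

    @Async
    public CompletableFuture<String> runCodeAgent(String transcriptChunk) {
        return CompletableFuture.completedFuture(/* call gemini-3-pro-preview */ "code");
    }

    @Async
    public CompletableFuture<byte[]> runVisualAgent(String transcriptChunk) {
        return CompletableFuture.completedFuture(/* call gemini-3-pro-image-preview */ new byte[0]);
    }
}
```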


Architecture Diagram

Signal System Architecture


Challenges we ran into

1. Latency vs Accuracy

Live meetings leave no room for delay.
We had to balance semantic depth with ultra-low latency, using Gemini 3 Flash for instant signals and reserving Gemini 3 Pro for deeper reasoning tasks like code generation.

2. Orchestrating Multiple Agents

Managing race conditions between the "Visual Agent" and "Code Agent" was complex. We implemented atomic debouncing and thread-safe logic in our Spring Boot backend to ensure that mentioning "Database Architecture" didn't trigger duplicate or conflicting AI tasks.
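The debouncing idea boils down to "only one expensive task per topic per time window." A simplified, thread-safe sketch (class and field names are illustrative, not the exact implementation):

```java
// Sketch of per-topic debouncing: only the first trigger inside the window wins.
// Names and the 30-second window are illustrative choices.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TriggerDebouncer {
    private static final long WINDOW_MS = 30_000; // ignore repeat triggers for 30 seconds
    private final Map<String, Long> lastFired = new ConcurrentHashMap<>();

    /** Returns true only for the first trigger of a topic inside the window. */
    public boolean tryAcquire(String topicKey) {
        long now = System.currentTimeMillis();
        boolean[] acquired = {false};
        // compute() runs atomically per key, so two agents racing on the same
        // topic ("database architecture") cannot both pass the check.
        lastFired.compute(topicKey, (key, last) -> {
            if (last == null || now - last >= WINDOW_MS) {
                acquired[0] = true;
                return now;
            }
            return last;
        });
        return acquired[0];
    }
}
```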

3. Intent Detection Is Hard

Detecting implicit decisions and expectations required careful prompt design and contextual memory across several minutes of conversation.

4. Visual Understanding in Real Time

Generating helpful images without distracting from the meeting required tight control over when and how visuals appear.


Accomplishments that we’re proud of

  • Infrastructure Mastery: Successfully deploying a containerized Spring Boot application to Google Cloud Run, proving the system is production-ready and scalable.
  • Multimodal Synthesis: Successfully combining Audio-to-Text, Audio-to-Image, and Audio-to-Code in a single real-time pipeline.
  • Polyglot Engineering: Building a tool that doesn't just "hear" code but writes it in 3 languages simultaneously.
  • Real-Time Participation: Enabling deaf developers to not just follow along, but to lead.
  • Gesture Control: Integrating seamless hands-free interaction.
  • Design: Creating a UI that prioritizes situational awareness over information overload.

Most importantly: SIGNAL gives deaf and hard-of-hearing developers their moment back.


What we learned

  • Accessibility is fundamentally about timing and agency.
  • Raw transcripts are insufficient for real-world collaboration.
  • Multimodal AI is most powerful when it listens, reasons, and acts.
  • Low-latency reasoning unlocks entirely new categories of assistive technology.
  • The most helpful AI fades into the background until it matters.

What’s next for SIGNAL

  • Role-specific customization (backend vs infra vs frontend).
  • Team-level decision memory and post-meeting summaries.
  • Deeper integration with meeting platforms (Google Meet API).
  • Expanding the Polyglot Agent to support more languages (Rust, TypeScript).
  • User testing with deaf and hard-of-hearing engineers.

SIGNAL doesn’t help people hear better.
It helps them participate fully and confidently in real time.

This is accessibility in the Action Era. Welcome to SIGNAL.

Built With

  • docker
  • gemini-3-flash-preview
  • gemini-3-pro-preview
  • google-cloud
  • google-cloud-run
  • java
  • mediapipe
  • nano-banana-pro
  • nextjs
  • react
  • react-toast
  • spring-boot
  • vertex-ai
  • websocket