CodeBridge — Project Story
Inspiration
Pair programming is one of the best ways to write software. Two people, one screen, ideas flowing back and forth. But it's built around voice. When you say "refactor that function" or "what if we try a different approach," you're sharing more than words — you're sharing tone, pace, and the energy of the moment.
For deaf developers, that flow gets interrupted. Text chat is slow. Interpreters are expensive and often don't know the difference between a "function" and a "method." The magic of pair programming — that feeling of thinking together — gets lost in translation.
We wanted to change that.
The moment it clicked
The idea came from a simple question: What if we could bridge the gap without losing the speed? Not by replacing voice with something worse — by making both sides feel natural. The hearing developer speaks, and the deaf developer sees it in real time as captions. The deaf developer types or signs, and the hearing developer hears it as speech. Both see the same code, edit it together, and stay in sync. No more waiting for messages to type out. No more "what did they mean by that?" The conversation flows.
What it does
CodeBridge is a real-time communication agent for deaf and hearing developers. It sits between two developers and keeps the conversation flowing in both directions.
- Hearing developer speaks → The deaf developer sees live captions.
- Deaf developer types or signs → The hearing developer hears it as natural speech.
- Both share one code editor → When one types, the other sees it instantly.
It's not a translator. It's a bridge. The goal is to keep both people in the same flow — thinking together, building together — without slowdowns or friction.
How we built it
We used Google's Gemini Live API and the GenAI SDK as the backbone. The architecture has four agents (a rough sketch of how they fit together follows the list):
- Voice Agent — Processes speech and turns it into captions.
- Vision Agent — Interprets sign language and gestures from the camera.
- Context Agent — Tracks the shared code editor (planned: resolve "this function" to actual line numbers).
- Bridge Agent — Fuses all inputs and produces context-rich captions and speech.
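To make the flow concrete, here is a minimal sketch of the fusion step, assuming each agent pushes events onto its own asyncio queue; the names (AgentEvent, bridge) and the queue-based wiring are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch (assumed names, not our production code): each agent
# publishes events to its own asyncio queue and the Bridge Agent merges them
# into one ordered stream that becomes captions or TTS requests.
import asyncio
from dataclasses import dataclass


@dataclass
class AgentEvent:
    source: str  # "voice", "vision", or "context"
    text: str    # caption text, recognized sign, or editor context


async def bridge(queues: dict[str, asyncio.Queue]) -> None:
    """Fuse events from all agents into a single outgoing stream."""
    merged: asyncio.Queue = asyncio.Queue()

    async def forward(name: str, queue: asyncio.Queue) -> None:
        while True:
            text = await queue.get()
            await merged.put(AgentEvent(source=name, text=text))

    tasks = [asyncio.create_task(forward(n, q)) for n, q in queues.items()]
    try:
        while True:
            event = await merged.get()
            # This is where the real Bridge Agent would attach editor context
            # and fan the result out as captions and speech.
            print(f"[{event.source}] {event.text}")
    finally:
        for task in tasks:
            task.cancel()
```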
The frontend is React with Monaco Editor and Yjs for real-time code sync. The backend is FastAPI with WebSockets for media and agent streams. We deployed everything to Google Cloud Run with Terraform.
Challenges we ran into
Yjs code sync not working. The WebsocketServer from ypy-websocket must run as an async context manager — we weren't starting it, so every WebSocket connection failed. Fix: use async with websocket_server in the FastAPI lifespan.
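A minimal sketch of the fix, assuming ypy-websocket's WebsocketServer plus FastAPI's lifespan hook; the _WebsocketShim adapter and the /yjs/{room} route are simplified stand-ins for the app's actual wiring.

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI, WebSocket
from ypy_websocket import WebsocketServer

websocket_server = WebsocketServer()


@asynccontextmanager
async def lifespan(app: FastAPI):
    # The fix: WebsocketServer only processes rooms while it is running
    # as an async context manager, so start it for the app's lifetime.
    async with websocket_server:
        yield


app = FastAPI(lifespan=lifespan)


class _WebsocketShim:
    """Adapter exposing the interface ypy-websocket expects (assumed wrapper)."""

    def __init__(self, websocket: WebSocket, path: str):
        self._websocket = websocket
        self._path = path

    @property
    def path(self) -> str:
        return self._path

    def __aiter__(self):
        return self

    async def __anext__(self) -> bytes:
        try:
            return await self.recv()
        except Exception:
            raise StopAsyncIteration

    async def send(self, message: bytes) -> None:
        await self._websocket.send_bytes(message)

    async def recv(self) -> bytes:
        return await self._websocket.receive_bytes()


@app.websocket("/yjs/{room}")
async def yjs_room(websocket: WebSocket, room: str):
    await websocket.accept()
    # Hand the connection to the running Yjs server; rooms are keyed by path.
    await websocket_server.serve(_WebsocketShim(websocket, room))
```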
Cloud Run multi-instance. When we scaled to 2 instances, users hit different servers and the shared code editor stopped syncing. Fix: set max_instance_count=1 so all users share the same Yjs server.
Cold starts. With min_instance_count=0, the first WebSocket connection timed out during the 30+ second cold start. Fix: set min_instance_count=1 to keep an instance warm.
Voice captions only in one tab. Captions from the Web Speech API were shown locally instead of being sent to the backend. Fix: call sendClientCaption so the backend broadcasts to all connected tabs.
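On the backend, the broadcast side of that fix looks roughly like this; a sketch under assumed names (/ws/captions, the connections set), not the exact endpoint we ship.

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

# Every connected tab; a caption received from one client is rebroadcast to
# the rest, so captions no longer stay trapped in the tab that produced them.
connections: set[WebSocket] = set()


@app.websocket("/ws/captions")
async def captions(websocket: WebSocket):
    await websocket.accept()
    connections.add(websocket)
    try:
        while True:
            caption = await websocket.receive_json()
            for client in list(connections):
                if client is not websocket:
                    await client.send_json(caption)
    except WebSocketDisconnect:
        connections.discard(websocket)
```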
Accomplishments that we're proud of
- Full bidirectional flow — Speech → captions, text → TTS, sign → captions, all in real time.
- Real-time code sync — Two developers (or two tabs) see the same code, updated live.
- One-command deploy — Terraform + Docker script gets the app on Cloud Run in minutes.
- Vertex AI integration — Switched from an API key to Vertex AI billed through GCP, so we're not constrained by free-tier limits.
- Billing alerts — Email notifications when spend hits $60 so we stay in control.
What we learned
- ypy-websocket requires the WebsocketServer to run as an async context manager — the docs matter.
- Cloud Run is stateless — for shared state (like Yjs rooms), you need a single instance or an external store like Redis.
- Cold starts hurt real-time — for WebSockets, keeping min_instance_count=1 is worth it.
- Debug endpoints help — /debug/yjs and /debug/caption made troubleshooting much faster (a minimal sketch follows this list).
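A debug endpoint can be as small as this; a hypothetical sketch, since the fields /debug/caption actually exposes are specific to our backend.

```python
from fastapi import FastAPI

app = FastAPI()

# In the real backend this state lives next to the WebSocket handlers.
recent_captions: list[dict] = []


@app.get("/debug/caption")
async def debug_caption():
    # Enough state to see whether captions reach the backend and get
    # broadcast, without digging through Cloud Run logs.
    return {"count": len(recent_captions), "latest": recent_captions[-5:]}
```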
What's next for gemeni-codebridge
- Full ASL support — Video sequences instead of single frames; ASL-specific models for fingerspelling and complex signs.
- Code-aware references — Resolve "this function" to actual line numbers in the shared editor.
- Private sessions — Unique URLs (e.g. /s/abc123) so each pair gets their own room.
- Session summaries — AI-generated recap of decisions and action items.
- Confidence indicators — Show alternatives when sign recognition is uncertain.
The core idea stays the same: pair programming should work for everyone. No barriers. No slowdowns. Just two people, one screen, building something together.
Built With
- gemeni
- python

