Inspiration

I'm a data analyst from Lusaka, Zambia, not a full-time developer. This afternoon I was outside testing this app on my brother's phone (mine is too slow for real-time AI). I pointed the camera at my gate and asked: "Is my gate locked? Do I look safe?" It described the gate, the car outside, and the area around me, and gave me step-by-step safety advice based on what it actually saw. I built SightLine in one week. That moment outside is why.

There are hundreds of millions of people who don't see clearly: not completely blind, but living with low vision, cataracts, glaucoma, or age-related sight loss. People who hold their phone close to read a label, who can't make out small print, who left their glasses at home. They deal with small, frustrating moments every single day. SightLine helps with that.

What it does

You open it in your mobile browser, press START, point your camera at anything, and ask out loud what you want to know. Gemini responds immediately with a clear voice description. No download. No account. Works on any phone in any browser. Things I tested this week:

• Pointed at a medicine bottle and a strip of pills - Gemini described both accurately. When I asked if it was safe to take them, it told me clearly that it can't give medical or prescription advice and that I should consult a doctor. That's the right response: the app helps you identify what you're holding, not replace a doctor.
• Pointed at a laptop screen - Gemini read the text on screen and described what was open.
• Pointed outside at my gate - Gemini described the surroundings and gave me safety advice when I asked.

You can ask follow-up questions. It holds the conversation context. You can also switch between front and back camera mid-session.
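The camera switch is plain browser API work. A minimal sketch of how a client can do it (simplified; the `flipCamera` helper and `videoEl` element here are illustrative, not SightLine's actual code):

```typescript
// Sketch: switch between front ("user") and back ("environment") cameras
// mid-session by requesting a new stream and swapping it into the preview.
// `videoEl` is assumed to be the <video> element showing the camera feed.
let facing: "user" | "environment" = "environment";

async function flipCamera(videoEl: HTMLVideoElement): Promise<void> {
  facing = facing === "environment" ? "user" : "environment";

  // Stop the old tracks before asking for the other camera.
  (videoEl.srcObject as MediaStream | null)?.getTracks().forEach((t) => t.stop());

  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: facing },
  });
  videoEl.srcObject = stream;
  await videoEl.play();
}
```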

How we built it

• Frontend: Next.js 14 on Google Cloud Run
• Backend: FastAPI with WebSocket on Google Cloud Run
• AI: Gemini 2.0 Flash Live via Google GenAI SDK on Vertex AI (us-east4)
• Audio: real-time PCM16 audio from the browser, streamed over WebSocket
• Video: JPEG frames captured every 1.5 seconds and sent alongside the audio
• Infrastructure: Cloud Build for CI, min-instances=1 so there are no cold starts
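The video half of that pipeline is simple on the browser side. A rough TypeScript sketch of grabbing a JPEG frame from the camera preview every 1.5 seconds and shipping it over the WebSocket; the JSON envelope and function names are assumptions for illustration, not SightLine's exact protocol:

```typescript
// Sketch: capture a JPEG frame from the live camera preview every 1.5 s
// and send it to the backend over the already-open WebSocket.
const FRAME_INTERVAL_MS = 1500;

function startFrameLoop(videoEl: HTMLVideoElement, ws: WebSocket): number {
  const canvas = document.createElement("canvas");

  return window.setInterval(() => {
    if (ws.readyState !== WebSocket.OPEN || videoEl.videoWidth === 0) return;

    canvas.width = videoEl.videoWidth;
    canvas.height = videoEl.videoHeight;
    canvas.getContext("2d")!.drawImage(videoEl, 0, 0);

    // toDataURL gives "data:image/jpeg;base64,<payload>"; keep only the payload.
    const jpegBase64 = canvas.toDataURL("image/jpeg", 0.7).split(",")[1];
    ws.send(JSON.stringify({ type: "video_frame", data: jpegBase64 }));
  }, FRAME_INTERVAL_MS);
}
```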

Audio and video go to Gemini continuously. Voice audio comes back and plays in real time.
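The audio half runs in both directions at once: the mic goes up as PCM16 chunks, and Gemini's voice comes back as raw PCM to be played gaplessly. A simplified sketch of what that can look like in the browser; the sample rates (16 kHz up, 24 kHz down) and the message handling are assumptions, and a production client would use an AudioWorklet rather than the deprecated ScriptProcessorNode used here for brevity:

```typescript
// Sketch: stream microphone audio as 16-bit PCM over the WebSocket and play
// back the raw PCM audio the backend relays from Gemini.
async function startAudio(ws: WebSocket): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const micCtx = new AudioContext({ sampleRate: 16000 });
  const source = micCtx.createMediaStreamSource(stream);

  // Convert Float32 samples from the mic into Int16 PCM chunks.
  const processor = micCtx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    if (ws.readyState !== WebSocket.OPEN) return;
    const float32 = e.inputBuffer.getChannelData(0);
    const pcm16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      pcm16[i] = Math.max(-1, Math.min(1, float32[i])) * 0x7fff;
    }
    ws.send(pcm16.buffer); // raw PCM16 chunk
  };
  source.connect(processor);
  processor.connect(micCtx.destination);

  // Play PCM16 audio chunks coming back from the backend.
  const playCtx = new AudioContext({ sampleRate: 24000 });
  let playHead = playCtx.currentTime;
  ws.binaryType = "arraybuffer";
  ws.onmessage = (event) => {
    if (!(event.data instanceof ArrayBuffer)) return; // JSON control frames handled elsewhere
    const pcm16 = new Int16Array(event.data);
    const buffer = playCtx.createBuffer(1, pcm16.length, 24000);
    const channel = buffer.getChannelData(0);
    for (let i = 0; i < pcm16.length; i++) channel[i] = pcm16[i] / 0x8000;

    const node = playCtx.createBufferSource();
    node.buffer = buffer;
    node.connect(playCtx.destination);
    // Schedule chunks back to back so playback stays gapless.
    playHead = Math.max(playHead, playCtx.currentTime);
    node.start(playHead);
    playHead += buffer.duration;
  };
}
```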

Challenges we ran into

Three things nearly stopped this project.

First, IAM permissions. Getting Cloud Run and Cloud Build the right access to Artifact Registry took half a day; the documentation doesn't give you the exact combination you need upfront.

Second, Next.js environment variables. Next.js bakes environment variables in at build time, not at run time, so if your backend URL isn't present during the build, the frontend will keep trying to connect to localhost in production. I wasted hours on this and eventually fixed it by hardcoding the WebSocket URL directly.

Third, the audio pipeline. Getting clean real-time audio from a phone browser, stopping it from picking up Gemini's own voice and feeding it back, and keeping the session alive without dropping: that's three problems at once. I fixed it with a speaking state that mutes the mic while Gemini talks, a short cooldown before reopening it, and a WebSocket keepalive every 20 seconds.

On top of that, I'm in Lusaka, Zambia, and my server is in Virginia, USA. That distance adds real latency to every response. The app works despite it, and when Gemini Live reaches African GCP regions it will be noticeably faster.
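For the audio-pipeline fix, here is a simplified TypeScript sketch of the speaking state, cooldown, and keepalive described above. The message types ("audio_start", "turn_complete", "ping") and the timing constants are illustrative assumptions rather than the exact production protocol:

```typescript
// Sketch of the echo-prevention and keepalive logic.
const COOLDOWN_MS = 500;      // short pause after Gemini stops talking
const KEEPALIVE_MS = 20_000;  // ping every 20 s so the WebSocket doesn't idle out

let isGeminiSpeaking = false;

function shouldSendMic(): boolean {
  // The audio callback checks this before forwarding a PCM chunk,
  // so the mic is effectively muted while Gemini's voice is playing.
  return !isGeminiSpeaking;
}

function handleControlMessage(msg: { type: string }): void {
  if (msg.type === "audio_start") {
    isGeminiSpeaking = true; // Gemini started talking: mute the mic
  } else if (msg.type === "turn_complete") {
    // Re-open the mic only after a short cooldown, so the tail end of
    // Gemini's own voice doesn't get fed back into the model.
    setTimeout(() => { isGeminiSpeaking = false; }, COOLDOWN_MS);
  }
}

function startKeepalive(ws: WebSocket): number {
  return window.setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ type: "ping" }));
    }
  }, KEEPALIVE_MS);
}
```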

Accomplishments that we're proud of

SightLine works. It's live, it's deployed on Google Cloud Run, and a real person can open it on their phone right now and use it. For someone who came into this as a data analyst with no prior experience in Vertex AI, Cloud Run, or real-time audio streaming, getting here in one week feels significant. During testing I held up two medicines, a bottle and a strip of pills. Gemini described both accurately. When I asked if it was safe to take them, it told me it can't give medical advice and to consult a doctor. That's responsible AI behavior, not just impressive tech. I also pointed the camera at my gate outside and asked if I looked safe. Gemini described the scene and gave me safety advice based on what it actually saw. That moment made the whole project feel real.

What we learned

I came into this as a data analyst. I left with real experience in Vertex AI, Cloud Run, Docker, WebSocket audio streaming, and Cloud Build pipelines, all learned under deadline pressure while working my regular job. The AI part was honestly not the hardest bit. The infrastructure, the browser APIs, and the edge cases took most of the time.

What's next for SightLine

• Alerts for hazards without needing to ask
• Audio-guided onboarding so the app is usable without reading the screen
• Bigger, higher-contrast UI for users with low vision
• Support for local Zambian languages
• Native mobile app for better performance

Built With

  • fastapi
  • gemini-live-api
  • google-cloud-run
  • next.js
  • python
  • typescript
  • vertex-ai
  • web-audio-api
  • websocket