Inspiration

Many hands-on tasks become difficult when help isn’t available at the moment it’s needed. We wanted to explore how live multimodal AI could act as a true “second pair of eyes” — not just answering questions, but actively guiding users in real time while they work.

What it does

Second Pair of Eyes is a real-time, hands-free AI agent that watches what the user sees and listens to what they say, then provides immediate spoken guidance. The agent can be interrupted, redirected, and asked to clarify, making the interaction feel natural and human-like.

How we built it

The agent is built using Gemini’s live multimodal capabilities and is hosted on Google Cloud. A web interface captures live inputs, which are processed by a cloud-hosted backend using the Google GenAI SDK to generate real-time responses.
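As a concrete illustration of one step in such a pipeline: the browser's Web Audio API delivers Float32 microphone samples, while Gemini's live audio input expects 16-bit little-endian PCM at 16 kHz, so the backend (or frontend) has to convert between the two. A minimal sketch of that conversion — the helper names and naive resampling are ours for illustration, not the project's actual code:

```python
import struct

# Browsers typically deliver Float32 samples in [-1.0, 1.0] via the
# Web Audio API; Gemini's live audio input expects 16-bit little-endian
# PCM at 16 kHz. These helpers (names are illustrative) bridge the two.

def float32_to_pcm16(samples):
    """Clamp Float32 samples to [-1, 1] and pack them as 16-bit LE PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))
        ints.append(int(s * 32767))
    return struct.pack("<" + "h" * len(ints), *ints)

def downsample(samples, src_rate, dst_rate=16000):
    """Naive decimation from the browser's sample rate down to 16 kHz.
    Good enough for a sketch; production code would low-pass filter first."""
    step = src_rate / dst_rate
    out, i = [], 0.0
    while int(i) < len(samples):
        out.append(samples[int(i)])
        i += step
    return out

# Example: one 48 kHz browser chunk becomes a 16 kHz PCM payload
# ready to stream to the Live API session.
chunk = [0.0, 0.5, -0.5] * 160          # 480 samples, ~10 ms at 48 kHz
pcm = float32_to_pcm16(downsample(chunk, 48000))
```

Each resulting `pcm` payload can then be streamed over the live session as realtime audio input.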

Challenges

Designing an agent that behaves proactively — rather than like a simple chatbot — was a key challenge. Ensuring low-latency interaction and handling interruptions reliably required careful architectural choices.
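To make the interruption challenge concrete: when the user speaks over the agent, any queued agent audio has to be discarded immediately, or the user keeps hearing stale guidance. A stripped-down sketch of that barge-in logic — the class name and dict-shaped events are illustrative stand-ins, not the project's actual code:

```python
from collections import deque

class PlaybackQueue:
    """Minimal barge-in handler: buffers agent audio chunks for playback
    and discards everything pending when the user interrupts."""

    def __init__(self):
        self._chunks = deque()
        self.interrupted_count = 0

    def on_server_event(self, event):
        # We model server events as plain dicts for illustration; the key
        # behavior is flushing the queue the moment a barge-in is signaled.
        if event.get("interrupted"):
            self._chunks.clear()          # stop stale guidance immediately
            self.interrupted_count += 1
        elif "audio" in event:
            self._chunks.append(event["audio"])

    def next_chunk(self):
        """Pop the next chunk to play, or None if nothing is pending."""
        return self._chunks.popleft() if self._chunks else None

q = PlaybackQueue()
q.on_server_event({"audio": b"chunk-1"})
q.on_server_event({"audio": b"chunk-2"})
q.on_server_event({"interrupted": True})   # user spoke over the agent
q.on_server_event({"audio": b"chunk-3"})   # fresh response after barge-in
```

After the barge-in, only the post-interruption chunk remains in the queue, which is what makes the interaction feel responsive rather than chatbot-like.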

Learnings

This project showed us that the future of AI interaction lies in shared context and real-time understanding. When an AI can see, listen, and respond instantly, it becomes a collaborator rather than a tool.

Built With

  • fastapi
  • google-cloud-run
  • google-gemini-live-api
  • google-genai-sdk
  • html
  • javascript
  • python
  • vertex-ai
  • web-audio-api