- Screenshot of the Argus service running with a green checkmark and the live URL.
- Screenshot of Argus mid-response with the camera feed visible and the kitchen response in the transcript.
- Screenshot of the full app screen with the waveform active and a greeting message visible in the transcript.
- Screenshot showing multiple ⚙ tool_name lines in the transcript, capturing the agentic behavior in real time.
Inspiration
The best moments to have AI help you are the worst moments to type. Hands covered in flour, staring at a leaking pipe under the sink, pushing a grocery cart and trying to remember what you already have at home. Every existing AI assistant makes you stop what you're doing, pick up your phone, and type out a question.
We wanted to build something different — an AI that fits into your life instead of interrupting it. Something that sees what you see, hears what you hear, and already knows who you are before you say a word. That's Argus.
The name comes from Argus Panoptes — the hundred-eyed giant of Greek mythology, the all-seeing guardian. It felt right.
What We Built
Argus is a real-time AI life companion that runs on your phone. It uses your camera and microphone to observe the world around you and responds by voice — no typing, no menus, no friction.
Seven specialized agents:
- 🍳 Kitchen — Sees your ingredients, suggests recipes, sets timers by voice
- 🛒 Shopping — Voice-controlled shopping list with add, check-off, and query
- 🔧 Fix-It — Point at something broken, get a step-by-step repair guide
- 🧠 Memory — Remembers your preferences, allergies, and goals across sessions
- 🌤️ Weather — Real-time weather grounded to your actual location
- 🍽️ Restaurant — Finds restaurant info and websites by voice
- 🔍 Web Search — Grounds answers in live web data, no hallucinations

When you connect, Argus speaks first — greeting you with the weather, the time, and anything it remembers about you. You don't have to say a word.
How We Built It
Frontend: A single-page PWA that streams 16kHz PCM audio and 640×480 JPEG frames over WebSocket to the backend every 2 seconds. No native app required — works in any mobile browser.
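To make that loop concrete, here's a minimal sketch of the capture side in browser JS. The endpoint URL, the message framing (JSON for frames, raw binary for audio), and the element lookup are assumptions for illustration, not the actual wire protocol:

```js
const ws = new WebSocket("wss://example.com/ws"); // hypothetical endpoint

// --- Video: grab a 640x480 JPEG from the camera feed every 2 seconds ---
const video = document.querySelector("video");
const canvas = document.createElement("canvas");
canvas.width = 640;
canvas.height = 480;

setInterval(() => {
  if (ws.readyState !== WebSocket.OPEN || video.readyState < 2) return;
  canvas.getContext("2d").drawImage(video, 0, 0, 640, 480);
  // toDataURL yields "data:image/jpeg;base64,<payload>"; send only the payload
  const jpegBase64 = canvas.toDataURL("image/jpeg", 0.7).split(",")[1];
  ws.send(JSON.stringify({ type: "frame", data: jpegBase64 }));
}, 2000);

// --- Audio: capture mic input at 16 kHz and convert to 16-bit PCM chunks ---
navigator.mediaDevices.getUserMedia({ audio: true, video: true }).then((stream) => {
  video.srcObject = stream;
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    const f32 = e.inputBuffer.getChannelData(0);
    const i16 = new Int16Array(f32.length);
    for (let i = 0; i < f32.length; i++) {
      i16[i] = Math.max(-1, Math.min(1, f32[i])) * 0x7fff; // float -> 16-bit PCM
    }
    if (ws.readyState === WebSocket.OPEN) ws.send(i16.buffer);
  };
  source.connect(processor);
  processor.connect(ctx.destination);
});
```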
Backend: Node.js + Express on Google Cloud Run. A WebSocket server bridges the client to the Gemini Live API session. Tool calls are dispatched with Promise.all for parallel execution and sent back to Gemini via sendToolResponse.
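A condensed sketch of that fan-out, assuming a `handlers` dispatch table keyed by tool name. The response shapes follow the @google/genai Live API, but treat the exact field names as illustrative:

```js
// Run every tool the model requested concurrently, then reply in one message.
async function handleToolCall(session, toolCall, handlers) {
  const functionResponses = await Promise.all(
    toolCall.functionCalls.map(async (fc) => ({
      id: fc.id,
      name: fc.name,
      response: await handlers[fc.name](fc.args), // hypothetical dispatch table
    }))
  );
  // Hand all results back to the model at once
  session.sendToolResponse({ functionResponses });
}
```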
AI: We use gemini-2.5-flash-native-audio-preview-12-2025 — native audio in, native audio out. No text-to-speech pipeline, no latency tax. The voice is the model.
Memory: Google Cloud Firestore stores per-user documents — preferences, shopping lists, daily logs, and observation history. Each browser gets a UUID from localStorage, giving every user their own persistent memory with zero authentication overhead.
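A sketch of the memory layer: the browser mints a UUID once with crypto.randomUUID() and keeps it in localStorage, and the server keys Firestore documents on it. The collection and field names below are assumptions:

```js
import { Firestore } from "@google-cloud/firestore";

const db = new Firestore();

// Persist one preference without clobbering the rest of the user's document
async function savePreference(userId, key, value) {
  await db.collection("users").doc(userId).set({ [key]: value }, { merge: true });
}

// Load everything Argus remembers about this user (empty for first-timers)
async function loadMemory(userId) {
  const snap = await db.collection("users").doc(userId).get();
  return snap.exists ? snap.data() : {};
}
```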
Grounding: Three real-time grounding sources:
- Open-Meteo for weather (no API key required; sketched after this list)
- DuckDuckGo Instant Answer API for web search (no API key required)
- Restaurant lookup with Google Maps fallback
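The Open-Meteo call is the simplest of the three. Its current_weather query flag is part of the public API; the wrapper below is an illustrative sketch:

```js
// Fetch current conditions for a coordinate pair; no API key needed.
async function getWeather(latitude, longitude) {
  const url = new URL("https://api.open-meteo.com/v1/forecast");
  url.search = new URLSearchParams({ latitude, longitude, current_weather: "true" });
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Open-Meteo returned ${res.status}`);
  const { current_weather } = await res.json();
  return {
    temperatureC: current_weather.temperature,
    windKmh: current_weather.windspeed,
  };
}
```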
Infrastructure: Fully automated with Terraform IaC and a single deploy script (deploy-cloudrun.sh). Cloud Run scales to zero when idle — no idle cost.
Challenges
Silent Firestore failures. The Firestore SDK constructs successfully even with missing credentials — it only fails on the first actual read/write, and those errors were being silently swallowed in catch blocks. The memory appeared to work but nothing was actually being saved. Fixed with upfront credential detection (file existence check locally, ADC on Cloud Run) and a test read on initialization to catch failures at startup instead of silently mid-session.
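A minimal sketch of that startup check. The health-check document path is an assumption; Cloud Run genuinely sets the K_SERVICE environment variable, which is what the detection keys on here:

```js
import fs from "node:fs";
import { Firestore } from "@google-cloud/firestore";

async function initFirestore() {
  const keyPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
  const onCloudRun = !!process.env.K_SERVICE; // Cloud Run sets this; rely on ADC there
  if (!onCloudRun && keyPath && !fs.existsSync(keyPath)) {
    throw new Error(`Credentials file not found: ${keyPath}`);
  }
  const db = new Firestore();
  // Test read: fail loudly at startup instead of silently mid-session
  await db.collection("health").doc("startup").get();
  return db;
}
```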
Async tool calls in a synchronous callback. The Gemini SDK's onmessage callback isn't awaited by the SDK itself. Tool calls are async operations, but the callback context is synchronous — unhandled promise rejections were causing silent failures. Solved with an async IIFE pattern inside the callback with a .catch() handler.
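A minimal sketch of that pattern, reusing the hypothetical handleToolCall and handlers from the backend sketch above; the connect options are illustrative:

```js
import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

let session; // assigned below; the callback closes over it
session = await ai.live.connect({
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  config: { responseModalities: [Modality.AUDIO] },
  callbacks: {
    onmessage: (message) => {
      // The SDK does not await this callback, so wrap the async work in an
      // IIFE and attach .catch() so rejections cannot vanish silently.
      (async () => {
        if (message.toolCall) {
          await handleToolCall(session, message.toolCall, handlers);
        }
      })().catch((err) => console.error("Tool call failed:", err));
    },
  },
});
```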
IP geolocation behind Cloud Run. Cloud Run sits behind Google's load balancer, so req.socket.remoteAddress always returns an internal IP. The real client IP comes in via X-Forwarded-For. Had to parse and sanitize the header, with fallback to environment variable defaults for local development.
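A sketch of that extraction as an Express helper; the fallback env-var name is an assumption for local development:

```js
function getClientIp(req) {
  const forwarded = req.headers["x-forwarded-for"];
  if (forwarded) {
    // The header may hold a chain "client, proxy1, proxy2"; the first entry is the client
    const first = forwarded.split(",")[0].trim();
    // Basic sanitization: accept only plausible IPv4/IPv6 characters
    if (/^[0-9a-fA-F.:]+$/.test(first)) return first;
  }
  return process.env.DEFAULT_CLIENT_IP || req.socket.remoteAddress; // local fallback
}
```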
Accomplishments
- Persistent memory that actually works in production, verified across sessions
- Proactive greeting — Argus speaks first on connect, no user prompt required
- 14 functional tools across 7 agents, all live in production
- Auto-geolocation — weather is always accurate with zero user setup
- One-command deployment with Terraform + shell script
- Zero extra API keys beyond Gemini — judges and users can run it immediately
What We Learned
The hardest part of building with the Gemini Live API wasn't the API itself — it was the surrounding infrastructure. Streaming audio at the right sample rate, sequencing tool calls correctly, handling session lifecycle, making Firestore actually persist in a Cloud Run environment — these are the unsexy problems that determine whether a demo works or doesn't.
We also learned that the most impressive AI experience isn't the most complex one. The single moment that lands hardest in demos is when Argus speaks first. One proactive greeting communicates more about what the system is capable of than five minutes of documentation.
What's Next
- Native mobile app (React Native) with background listening
- Wearable form factor — Argus as a clip-on camera companion
- Calendar and email awareness for proactive reminders
- Proactive push notifications ("You have eggs expiring tomorrow")
- Image memory — storing and recalling visual context across sessions
Built With
- DuckDuckGo Instant Answer API
- Express.js
- Gemini 2.5 Flash Native Audio
- Gemini Live API
- Google Cloud Build
- Google Cloud Firestore
- Google Cloud Run
- JavaScript
- Node.js
- Open-Meteo API
- PWA
- Terraform
- WebSockets