🏆 Agnostic: Redefining Interaction for Field Technicians

“What today lives in a smartphone, tomorrow will live in smart glasses.”

💡 Inspiration: The Human-AI Symbiosis in the Physical World

When we looked at the current landscape of Artificial Intelligence, we noticed a profound imbalance: AI is rapidly transforming cognitive and desk jobs, leaving many professionals worried about their future. However, physical, unpredictable manual trades, like repairing an HVAC system, fixing a car engine, or rewiring a house, remain strictly human domains. Robots simply aren't ready for the chaos of the real world.

We identified three critical problems that inspired Agnostic:

  • The Confidence Ceiling for Technicians: Many independent technicians leave high-paying jobs on the table simply because they lack the specific knowledge or confidence to tackle an unfamiliar brand or complex system, and they can't stop to type into a chatbot with dirty hands.
  • The Hobbyist Barrier: DIY enthusiasts want to learn and fix things at home (like restoring an old car or fixing an appliance), but fear making dangerous mistakes. They need an expert looking over their shoulder with infinite patience.
  • The AI Job Displacement Paradigm: If AI is replacing desk jobs, can AI be the very tool that reskills and empowers displaced workers to enter highly skilled manual labor? We believe it can.

We formulated a simple equation:

$$ \text{Workforce}_{\text{augmented}} = \text{Human}_{\text{adaptability}} + \text{AI}_{\text{knowledge}} $$

This inspired us to build Agnostic, an AI companion disguised as an app today, but designed for the smart glasses of tomorrow.

⚙️ What it does

Agnostic is a Multimodal Live Agent built for field technicians and hobbyists. It breaks the "text box" paradigm completely.

When a technician arrives at a job site, they open the app, clip their phone to a vest (or prop it up), and put in their earbuds. From then on, the interaction is 100% hands-free.

  • See, Hear, and Speak: Agnostic streams audio bidirectionally using the Gemini Live API and captures video frames at 1 FPS. It sees what you see, and talks to you like a colleague.
  • Socratic Learning Coach: For hobbyists, instead of just saying "connect the blue wire to the top pin", Agnostic acts as a patient tutor. It asks guiding questions, uses analogies, and verifies your understanding before proceeding.
  • Visual Precision: If you don't know where the "top pin" is, a specialized Vision Agent analyzes the frame and draws a Proactive Bounding Box directly on your screen.
  • Generative Visual Guides: Need an assembly diagram? Agnostic uses Imagen 3 (via Nano Banana) to generate an overlaid image showing exactly how cables should be routed, using your own camera feed as the base.
  • Mandatory Visual Safety Gate: Before any repair step begins, Agnostic activates a dedicated Safety Agent that performs a visual inspection through the camera. This agent doesn't just warn: it blocks the user from proceeding until it visually confirms that the working conditions are safe. No visual confirmation, no next step. Period. This is a hard gate, not a suggestion.
  • Live Logistics: When a part is identified as broken, Agnostic immediately searches the real world, identifies the part, finds live prices, and pushes interactive MercadoLibre links to your screen via WebSocket.

🧠 The Collective Hive Mind (The Evolution of DIY YouTube Tutorials)

How do hobbyists learn today? They watch a 20-minute YouTube video and hope the guy on screen has the exact same model or wiring as they do. Agnostic is the natural evolution of the YouTube tutorial. Every successful repair intervention is analyzed by our AI: the problem, the visual context, and the solution are converted into vector embeddings and stored in a shared RAG knowledge base.
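The store-and-retrieve loop behind the hive mind can be sketched in a few lines. This is a minimal in-memory illustration with toy 3-dimensional vectors and invented names (`HiveMind`, `store`, `retrieve`), assuming cosine-similarity nearest-neighbour lookup; the real system uses Gemini embeddings and a managed vector database.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class HiveMind:
    """Hypothetical stand-in for the shared RAG knowledge base."""

    def __init__(self):
        self.interventions = []  # list of (embedding, solution) pairs

    def store(self, embedding, solution):
        # Called only after a repair is verified as successful.
        self.interventions.append((embedding, solution))

    def retrieve(self, query_embedding, k=1):
        # Return the k most similar verified solutions.
        ranked = sorted(self.interventions,
                        key=lambda item: cosine(item[0], query_embedding),
                        reverse=True)
        return [solution for _, solution in ranked[:k]]

# Toy embeddings stand in for real model outputs.
hive = HiveMind()
hive.store([1.0, 0.0, 0.1], "Replace the HVAC capacitor (40/5 µF).")
hive.store([0.0, 1.0, 0.0], "Reseat the washer drain pump connector.")
print(hive.retrieve([0.9, 0.1, 0.0]))
# → ['Replace the HVAC capacitor (40/5 µF).']
```

Because only verified interventions are stored, retrieval grounds the agent in solutions that actually worked, rather than in the model's unconstrained generation.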
When you encounter a problem, Agnostic acts as a hyper-personalized, interactive tutorial that actually looks at your specific machine while drawing from the verified solutions of thousands of technicians worldwide. This grounding drastically reduces hallucinations and turns the platform into a true collective intelligence.

🛠️ How we built it

We built Agnostic using a highly orchestrated, multi-agent architecture powered by the Google GenAI SDK and Google Cloud.

The Brain (ADK Orchestra): Instead of a massive prompt, we used the Agent Development Kit (ADK) to build a hierarchy of 12+ specialized agents.

  • A Root Agent determines intent and routes to specialists (Electrical, HVAC, Appliance).
  • A Safety Verification Agent acts as a mandatory visual gate: it must confirm safe conditions through the camera before allowing the user to proceed.
  • A Logistics Agent connects to external search tools and our Firestore inventory.
  • A RAG Knowledge Agent stores and retrieves verified repair knowledge from a shared vector database, grounding every answer in real-world interventions.

The Nervous System (FastAPI + WebSockets): We hosted the backend on Google Cloud Run and implemented a custom dual-channel architecture:

  • A direct raw WebSocket connection from Flutter to the BidiGenerateContentConstrained Gemini Live endpoint for low-latency 16 kHz PCM audio streaming.
  • A parallel WebSocket/HTTP relay to our Python backend to handle tool execution, ADK orchestration, and UI pushes.

The Body (Flutter UI): A native Android app built with Flutter serves as the visual overlay. It captures the camera and microphone, while passively rendering tool results pushed asynchronously from the server (bounding boxes, alerts, images, links).
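The heart of the dual-channel design is a dispatch rule: binary frames are raw PCM audio and pass through untouched, while JSON text frames are tool results and UI events for the overlay. A minimal sketch of that rule (illustrative names, not the actual server code):

```python
import json

def dispatch(frame, audio_out, ui_out):
    """Route one WebSocket frame on the relay.

    Binary frames carry raw 16 kHz PCM audio and are forwarded as-is;
    text frames carry JSON tool/UI events for the Flutter overlay.
    """
    if isinstance(frame, (bytes, bytearray)):
        # Audio path: never parse, never buffer, just forward.
        audio_out.append(bytes(frame))
        return "audio"
    event = json.loads(frame)
    if event["type"] in {"bounding_box", "alert", "image", "link"}:
        # UI path: pushed asynchronously to the client overlay.
        ui_out.append(event)
        return "ui"
    raise ValueError(f"unknown event type: {event['type']}")

audio, ui = [], []
dispatch(b"\x00\x01" * 160, audio, ui)  # one PCM chunk
dispatch(json.dumps({"type": "bounding_box",
                     "label": "L terminal",
                     "box": [120, 80, 40, 40]}), audio, ui)
print(len(audio), ui[0]["label"])  # → 1 L terminal
```

Keeping the audio path free of parsing is what preserves the conversational feel: only the slower tool/UI channel pays the cost of JSON handling and backend round trips.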

🚧 Challenges we ran into

  • The WebSocket Juggling Act: Maintaining a stable Gemini Live bidirectional stream while concurrently intercepting tool calls and executing long-running ADK chains (like image generation) without timing out the connection. We had to build a robust Snapshot-on-Demand system.
  • Spatial Blindness in Image Generation: When prompting an image generation model to "draw a wire to the L terminal", it often hallucinates if the terminal letter isn't highly legible in the image. We introduced a Vision Precision Agent that converts visual data into explicit spatial coordinates (e.g., "the top-left screw") to feed the image editor, massively improving accuracy.
  • Barge-in Mechanics: Handling user interruptions naturally required careful state management of our audio PCM playback queue in Flutter.

🏆 Accomplishments that we're proud of

  • True Hands-Free Interaction: The UI is completely reactive; the technician only uses their voice.
  • The Socratic Mentor: Watching Gemini successfully act as a patient teacher rather than an instruction manual is deeply rewarding. It fundamentally changes how humans learn physical skills.
  • The Safety Gate That Actually Blocks: Most AI safety features are passive warnings that users ignore. Our Safety Agent is a mandatory visual checkpoint: it blocks the next repair step until it visually confirms safe conditions through the camera. This is a hard architectural decision that prioritizes human life over UX convenience.
  • The Hive Mind Works: Seeing a technician in Argentina benefit from a repair solution first discovered by a technician in Mexico, with minimal hallucination risk because the knowledge comes from a verified, real-world intervention, is the future we're building.
  • Zero-Latency Feel: By piping the audio payload straight from the device mic to the Gemini server, the conversation flows like a human phone call.
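The "hard gate" behavior can be sketched as a tiny state machine: the step index simply does not advance until the visual inspection passes. Everything here (`RepairSession`, the `gloves_visible` check) is an invented simplification; in the real system the verdict comes from the Safety Agent's camera inspection via Gemini vision.

```python
class SafetyGateError(Exception):
    """Raised when the Safety Agent refuses to let the user proceed."""

class RepairSession:
    def __init__(self, steps, inspect):
        self.steps = steps
        self.inspect = inspect  # frame -> bool (visual safety verdict)
        self.current = 0

    def next_step(self, frame):
        # Advance only if the safety inspection of `frame` passes.
        if not self.inspect(frame):
            # Blocking, not warning: the step index does not move.
            raise SafetyGateError(
                f"unsafe conditions before step {self.current + 1}")
        step = self.steps[self.current]
        self.current += 1
        return step

# Hypothetical verdict function standing in for the vision check.
gloves_on = lambda frame: frame.get("gloves_visible", False)
session = RepairSession(["open panel", "replace capacitor"], gloves_on)

try:
    session.next_step({"gloves_visible": False})
except SafetyGateError:
    pass  # user is blocked; session.current is still 0

print(session.next_step({"gloves_visible": True}))  # → open panel
```

Because the exception is the only way out of an unsafe frame, there is no code path that lets a step run without a positive visual verdict, which is exactly the "no visual confirmation, no next step" guarantee.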
📚 What we learned

  • Agentic hierarchies beat monolithic prompts: Splitting tasks into specialized ADK agents (one for safety, one for vision, one for electrical repairs) produces dramatically more reliable results than asking one LLM to do everything.
  • Grounding is crucial for hardware: Finding a part online isn't enough; matching its specs to the exact model in the camera feed requires rigorous cross-referencing.
  • The future is multimodal out of the box: The sheer power of the Gemini Live API convinced us that the era of typing into text boxes is coming to an end.

🚀 What's next for Agnostic

Because of our modular ADK architecture, the future scales horizontally:

  • IoT Integration: The next iteration of Agnostic won't just look at the machine; it will talk to it, reading error codes via Bluetooth/Wi-Fi to give the agent diagnostic data before the tech even opens the casing.
  • Expanding the Hive Mind: As the collective knowledge base grows with thousands of verified interventions across the electrical, HVAC, plumbing, and automotive domains, Agnostic will become the world's most reliable repair assistant, not because of a larger language model, but because of real human experience encoded into its memory.
  • The Hardware Jump: Transitioning the Flutter mobile interface into native applications for AR smart glasses, fulfilling the ultimate vision: a completely invisible, highly intelligent companion for the physical world.

$$ \text{Agnostic: the bridge between Artificial Intelligence and manual labor.} $$

Built With

  • agent-development-kit
  • cloud-firestore
  • fastapi
  • flutter
  • gemini-2.0-flash-live
  • gemini-3.1-flash
  • google-cloud
  • google-cloud-run
  • google-gemini
  • google-genai-sdk
  • google-search
  • mercadolibre-api
  • python
  • websockets