Inspiration
Every day, millions of people work with their hands - assembling furniture, repairing electronics, wiring circuits, fixing plumbing. Their hands are occupied, often dirty or gloved. The current options are terrible: pause what you're doing, pick up your phone with greasy fingers, type a one-handed query, scroll through irrelevant results, then try to remember the answer while you put the phone down and go back to work. Repeat 50 times per project.
We realized the text box doesn't just fail here - it's physically impossible to use when your hands are full. We wanted to build an AI that sees what you're working on and just helps - like having an experienced mentor looking over your shoulder.
What it does
GHOSTHAND is an AI-powered building mentor that sees what you're working on through uploaded photos and guides you in real time. Three specialist agents work together through one natural voice:
- SPOTTER (Safety Monitor): Proactively checks every image for hazards - wrong wiring, exposed components, reversed polarity, sharp edges. Safety alerts always come first.
- GUIDE (Instructor): Gives clear, numbered step-by-step instructions. Tracks your project progress so it always knows what you've already done and what comes next.
- LOOKUP (Parts Expert): Identifies the components, tools, and materials you show it. Uses Google Search to find datasheets, specifications, prices, and where to buy replacement parts.
The orchestrator coordinates all three agents and speaks like a calm, experienced craftsperson - "nice, that looks solid" instead of "the assembly meets quality standards."
How we built it
GHOSTHAND is built on Google's Agent Development Kit (ADK) with a multi-agent architecture:
Root Agent (GHOSTHAND): The orchestrator that receives user input and delegates to the right specialist agent. Uses Gemini 2.5 Flash for fast, accurate vision understanding.
Sub-Agent Architecture: Three specialized agents, each with its own tools:
- SPOTTER uses a custom safety assessment tool
- GUIDE uses custom progress tracking tools (save/recall)
- LOOKUP uses Google Search for grounded, factual responses
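The write-up does not show GUIDE's save/recall tools, so the following is only a minimal sketch of how such progress-tracking tools might look. To stay self-contained, it stores progress in a plain dict standing in for session state; in ADK, a function tool could instead receive a `ToolContext` and read or write its `state`. The names `save_progress`, `recall_progress`, and the `completed_steps` key are hypothetical.

```python
from typing import Any

# Stand-in for ADK session state (assumption: in ADK this would be tool_context.state).
SessionState = dict[str, Any]

PROGRESS_KEY = "completed_steps"  # hypothetical state key


def save_progress(state: SessionState, step: str) -> dict[str, Any]:
    """Record a completed build step (hypothetical GUIDE tool)."""
    steps: list[str] = state.setdefault(PROGRESS_KEY, [])
    if step not in steps:  # saving the same step twice is a no-op
        steps.append(step)
    return {"status": "saved", "completed_count": len(steps)}


def recall_progress(state: SessionState) -> dict[str, Any]:
    """Return the completed steps so GUIDE can work out what comes next."""
    steps = list(state.get(PROGRESS_KEY, []))
    return {"completed_steps": steps, "last_step": steps[-1] if steps else None}


if __name__ == "__main__":
    state: SessionState = {}
    save_progress(state, "Mount the motherboard standoffs")
    save_progress(state, "Seat the CPU")
    print(recall_progress(state))
```

Returning small dicts gives the model a structured result it can turn into a natural reply such as "you've already seated the CPU, so next comes the cooler."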
Tool Separation: Each agent has only one type of tool to comply with Gemini's tool-mixing constraints. Google Search lives only on LOOKUP. Custom function tools live only on SPOTTER and GUIDE.
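The one-tool-type-per-agent rule can be checked mechanically. This sketch models the agent tree in plain Python rather than with ADK classes; `ToolKind`, `AgentSpec`, and `find_mixed_agents` are hypothetical names, and in a real ADK project the same check would walk each agent's actual tool list and sub-agents.

```python
from dataclasses import dataclass, field
from enum import Enum


class ToolKind(Enum):
    GOOGLE_SEARCH = "google_search"  # built-in grounding tool
    FUNCTION = "function"            # custom Python function tool


@dataclass
class AgentSpec:
    name: str
    tools: list[ToolKind] = field(default_factory=list)
    sub_agents: list["AgentSpec"] = field(default_factory=list)


def find_mixed_agents(agent: AgentSpec) -> list[str]:
    """Recursively list agents that mix Google Search with function tools."""
    mixed = [agent.name] if len(set(agent.tools)) > 1 else []
    for child in agent.sub_agents:
        mixed.extend(find_mixed_agents(child))
    return mixed


# Hypothetical model of the GHOSTHAND layout described above.
ghosthand = AgentSpec(
    name="ghosthand",
    sub_agents=[
        AgentSpec("spotter", [ToolKind.FUNCTION]),
        AgentSpec("guide", [ToolKind.FUNCTION, ToolKind.FUNCTION]),  # save + recall
        AgentSpec("lookup", [ToolKind.GOOGLE_SEARCH]),
    ],
)

if __name__ == "__main__":
    print(find_mixed_agents(ghosthand))
```

Running a check like this in a unit test could catch a misplaced tool before deployment rather than at request time.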
Deployment: Containerized with Docker and deployed to Google Cloud Run via Cloud Build and Artifact Registry. Vertex AI provides the Gemini model inference.
Vision Pipeline: Users upload photos via the ADK web interface. Gemini 2.5 Flash natively understands the image - reading text, identifying objects, assessing spatial relationships - no separate vision API needed.
Challenges we ran into
Tool mixing constraints: Gemini doesn't allow mixing Google Search with custom function tools on the same agent. We solved this by giving each sub-agent only one tool type and routing through the orchestrator.
Live streaming audio encoding: Browser audio capture produces 48 kHz PCM, but the Live API expects 16 kHz. Getting real-time voice working required a deep understanding of WebSocket audio pipelines. We worked around this by supporting both a text+image mode and a live audio mode.
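Converting 48 kHz to 16 kHz is a 3:1 reduction, so one simple approach is to average each group of three samples. This is a minimal sketch, assuming the audio has already been converted to mono 16-bit little-endian PCM (browsers often deliver Float32 samples, so that conversion is an assumed earlier step). `downsample_48k_to_16k` is a hypothetical helper, not part of ADK or the Live API.

```python
import array
import sys


def downsample_48k_to_16k(pcm: bytes) -> bytes:
    """Downsample 48 kHz mono 16-bit little-endian PCM to 16 kHz.

    Each output sample is the rounded mean of three input samples: a crude
    moving-average low-pass filter followed by 3:1 decimation.
    """
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a dangling odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # interpret the input as little-endian on any host
    # Leftover samples (fewer than 3) are discarded here; a streaming
    # version would carry them over into the next chunk instead.
    usable = len(samples) - len(samples) % 3
    out = array.array("h")
    for i in range(0, usable, 3):
        out.append(round((samples[i] + samples[i + 1] + samples[i + 2]) / 3))
    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian bytes
    return out.tobytes()
```

For example, a 20 ms browser chunk of 960 samples becomes 320 samples. A moving average is a weak anti-aliasing filter; a windowed-sinc or polyphase resampler would give cleaner speech at the cost of more code.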
Model selection for streaming: Live-compatible models (gemini-live-*) only work through the Live API, not the standard generateContent API. We had to carefully select different models for streaming vs text interactions.
Sub-agent model compatibility: In live streaming mode, ALL agents including sub-agents must use live-compatible model IDs. Standard models like gemini-2.5-flash crash when ADK tries to connect them to the Live API.
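Both constraints above boil down to "pick one model family per mode and apply it to every agent." This sketch centralizes that choice and flags mismatches before any connection is attempted. `gemini-2.5-flash` comes from the write-up; `LIVE_MODEL` is a hypothetical placeholder for whichever live-capable ID you use, and `Mode`, `models_for_all_agents`, and `check_agent_models` are hypothetical names. An explicit allowlist is used instead of matching on a name prefix, since model naming can change.

```python
from enum import Enum

TEXT_MODEL = "gemini-2.5-flash"
LIVE_MODEL = "gemini-live-PLACEHOLDER"  # hypothetical: substitute a real live model ID
LIVE_MODELS = {LIVE_MODEL}              # explicit allowlist of live-capable IDs

AGENT_NAMES = ("ghosthand", "spotter", "guide", "lookup")


class Mode(Enum):
    TEXT = "text"  # text + image requests via generateContent
    LIVE = "live"  # streaming audio via the Live API


def models_for_all_agents(mode: Mode) -> dict[str, str]:
    """Assign one model per agent; in live mode the sub-agents need a live model too."""
    model = LIVE_MODEL if mode is Mode.LIVE else TEXT_MODEL
    return {name: model for name in AGENT_NAMES}


def check_agent_models(mode: Mode, models: dict[str, str]) -> list[str]:
    """Return names of agents whose model does not match the mode's API."""
    needs_live = mode is Mode.LIVE
    return [name for name, model in models.items() if (model in LIVE_MODELS) != needs_live]
```

For instance, swapping SPOTTER back to `gemini-2.5-flash` in a live configuration would be reported as `["spotter"]` at startup instead of failing when ADK connects it to the Live API.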
Accomplishments that we're proud of
GHOSTHAND reads real text off real objects: In our very first test, it identified a voltage tester, read "GERMANY 220-250V" off the label, spotted keys on a "ROYALTON" keyring, and described the decorative metal keychain - all from a single photo.
Multi-agent coordination works seamlessly: Three agents with different specializations and different tool types coordinate through one orchestrator to produce unified, natural responses. The user never sees the agent boundaries.
Safety-first architecture: SPOTTER checks every image for hazards before any other agent responds. This isn't a feature - it's a design principle baked into the orchestrator's instructions.
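In GHOSTHAND the safety-first ordering lives in the orchestrator's instructions. As a hedged alternative, not the project's actual method, the same rule could also be enforced deterministically in code, so the model cannot skip it. In this sketch the agents are stubbed as plain callables rather than ADK agents, and `Finding` and `compose_reply` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    agent: str
    text: str
    is_hazard: bool = False


# A specialist takes raw image bytes and returns its findings (stubbed here).
Specialist = Callable[[bytes], list[Finding]]


def compose_reply(image: bytes, spotter: Specialist, others: list[Specialist]) -> list[str]:
    """Run SPOTTER before anyone else and put any hazards at the top of the reply."""
    hazards = [f for f in spotter(image) if f.is_hazard]
    lines = [f"Safety first: {f.text}" for f in hazards]
    for agent in others:
        lines.extend(f.text for f in agent(image))
    return lines
```

Enforcing the order in code trades some flexibility for a guarantee: the check always runs, whereas a prompt-only rule depends on the model following it.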
Grounded facts via Google Search: LOOKUP doesn't guess at specifications or prices. Its factual claims come from Google Search, which guards against hallucinated datasheets or wrong pinout information - errors that could be genuinely dangerous in a hardware context.
Deployed and live on Google Cloud: GHOSTHAND runs on Cloud Run with Vertex AI inference, containerized and reproducible from a single deployment script.
What we learned
Tool architecture matters more than prompt engineering: The biggest technical challenge wasn't writing good prompts - it was structuring which tools go on which agent to avoid API conflicts.
Multi-agent systems need clear delegation rules: Without explicit rules like "SPOTTER checks safety FIRST on every image", the orchestrator would sometimes skip safety checks. The instruction hierarchy is critical.
Gemini's vision is remarkably good: We expected it to identify large objects. We didn't expect it to read tiny text on a voltage tester label, identify brand names on keys, and describe the material of a keychain - all in one pass.
Google ADK dramatically simplifies agent orchestration: The framework handles session management, tool calling, and agent routing. We focused on agent behavior, not infrastructure plumbing.
Start simple, add complexity: Our final working version started as a single agent with no tools. We added sub-agents and tools one at a time, testing after each addition. This incremental approach saved hours of debugging.
What's next for GHOSTHAND
- Live video streaming: Real-time camera feed so GHOSTHAND continuously watches as you work, instead of requiring photo uploads
- Voice interaction: Voice commands using the Gemini Live API's native audio, for truly hands-free building guidance
- AR overlay: Annotated arrows and labels overlaid on the camera feed showing exactly where to connect wires or place components
- Project templates: Pre-loaded step-by-step guides for common projects (build a PC, wire a light switch, assemble IKEA furniture) that GHOSTHAND can follow along with
- Community sharing: Share your build progress and tips with other GHOSTHAND users