Inspiration
As a tech enthusiast who has interacted with land surveyors, I know that field equipment like RTK rovers and Total Stations is incredible, until it throws an error code in the middle of a hot afternoon with no manual in sight. "The Remote Hands" was born from the need for a hands-free, voice-activated technical assistant that can "see" what the surveyor sees and provide instant troubleshooting without leaving the tripod.
What it does
The Remote Hands is a multimodal AI field agent. Using the Gemini 2.5 Flash Live API, it maintains a real-time video and audio link with the surveyor. It can:
- Identify Equipment: Recognize specific surveying instruments through the phone's camera.
- Voice Troubleshooting: Provide hands-free technical support while the surveyor's hands are busy with the gear.
- Agentic Research: Use a LangGraph-powered ReAct agent to search the live web for technical manuals and specific error codes, synthesizing a solution in seconds.
How we built it
- Backend: Python / FastAPI hosted on Google Cloud Run.
- AI Intelligence: Gemini 2.5 Flash (Live Native Audio) for the multimodal conversation and Gemini 2.0 Flash via LangChain for the reasoning agent.
- Orchestration: LangGraph with a DuckDuckGo Search tool for real-time internet access.
- Frontend: A responsive HTML5/JavaScript mobile interface using WebSockets for low-latency audio/video streaming.
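To make the orchestration bullet concrete, here is a minimal, dependency-free sketch of the ReAct pattern (decide, call a tool, observe, repeat). It is not the LangGraph API and not our production code: `Step`, `react_loop`, `fake_decide`, and `fake_search` are hypothetical names, and the model and search tool are replaced by stubs so the control flow can be run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One decision from the model (hypothetical structure for this sketch)."""
    kind: str  # "search" to call the tool, "final" to answer
    text: str  # the search query, or the final answer


def react_loop(
    question: str,
    decide: Callable[[str, List[str]], Step],
    search: Callable[[str], str],
    max_steps: int = 3,
) -> str:
    """Alternate between model decisions and tool calls until an answer appears."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = decide(question, observations)
        if step.kind == "final":
            return step.text
        # The model asked for a search: run the tool and feed the result back.
        observations.append(search(step.text))
    # Cap the number of tool calls so the voice conversation is not left waiting.
    return "I could not find an answer in time."


def fake_decide(question: str, observations: List[str]) -> Step:
    """Stub standing in for the reasoning model (assumption, not a real LLM call)."""
    if not observations:
        return Step("search", question + " manual")
    return Step("final", "Based on the manual: " + observations[-1])


def fake_search(query: str) -> str:
    """Stub standing in for the web search tool (assumption, no network access)."""
    return "result for '" + query + "'"


if __name__ == "__main__":
    print(react_loop("total station error code", fake_decide, fake_search))
```

The `max_steps` cap is the part that matters for a voice interface: a bounded number of tool calls gives a predictable worst-case wait before the assistant says something.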
Challenges we ran into
Integrating real-time, bi-directional PCM audio with a reasoning agent was the ultimate "final boss." We had to fix audio sample rate mismatches, manage WebSocket state, and make sure the "Internet Brain" (LangGraph) could respond fast enough to keep the voice conversation natural.
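A sample rate mismatch of the kind mentioned above can be handled by resampling the PCM stream before forwarding it. The sketch below is one simple way to do that with linear interpolation on mono 16-bit little-endian PCM. The function name `resample_pcm16` and the 48 kHz to 16 kHz rates in the usage lines are assumptions for illustration; check the Live API documentation for the rates it actually expects.

```python
import struct


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample mono 16-bit little-endian PCM using linear interpolation."""
    count = len(pcm) // 2
    if src_rate == dst_rate or count == 0:
        return pcm
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])
    out_len = count * dst_rate // src_rate
    last = count - 1
    out = []
    for i in range(out_len):
        # Position of output sample i on the input timeline.
        pos = i * src_rate / dst_rate
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring input samples.
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return struct.pack("<%dh" % len(out), *out)


if __name__ == "__main__":
    # 48 input samples at an assumed 48 kHz capture rate become 16 at 16 kHz.
    chunk = struct.pack("<48h", *range(48))
    print(len(resample_pcm16(chunk, 48000, 16000)) // 2)
```

Linear interpolation is cheap enough to run per chunk, but when downsampling it does not low-pass filter the signal first, so some aliasing is possible. A production pipeline would likely use a proper resampler.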
Accomplishments that we're proud of
We established a stable, multimodal "Live" link that lets an AI verbally guide a human through a physical task while researching the web in the background.
What we learned
Integrating the Live API, the LangGraph agent, and the FastAPI backend required careful debugging and clear architecture planning.
What's next for The Remote Hands
We plan to add retrieval-augmented generation (RAG) backed by a vector database containing the full PDF manual libraries of major manufacturers like Leica, Trimble, and Topcon, for even more precise, offline-capable troubleshooting.
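As a rough idea of the retrieval step in that plan, the sketch below ranks manual chunks against a question using word-count vectors and cosine similarity. This is a stand-in: a real vector database would store learned embeddings, and the chunk texts here are made-up placeholders, not excerpts from any manufacturer's manual. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Placeholder chunks written for this example only.
    manual_chunks = [
        "Battery: charge the battery pack before field use.",
        "Leveling: center the bubble using the foot screws.",
        "Radio link: check antenna connection if corrections are lost.",
    ]
    print(top_chunks("how do I level the bubble", manual_chunks))
```

Because the index lives with the application rather than on the web, the same lookup could run without a connection, which is where the offline-capable part would come from.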