Fixit: The Proactive AI Handyman for Snap Spectacles
Our Inspiration
The spark for Fixit came from our personal experience as international students. Moving to a new country meant setting up a home from scratch without a local support system or family nearby to help with simple repairs. We found that traditional video tutorials are nearly impossible to follow when your hands are covered in grease or you're balancing on a ladder. We needed an expert who could see what we see and talk us through the process in real-time.
What it does
Fixit is a proactive AI Handyman for Snap Spectacles. It provides a hands-free, heads-up repair experience by streaming live video and audio to an AI agent.
Real-time Guidance: It identifies tools and components, offering step-by-step instructions through spatial audio.
Proactive Safety: It doesn't just wait for questions; it monitors the environment for hazards like exposed wires or incorrect tool usage and interrupts the user with immediate [SAFETY_ALERT] warnings.
Visual Aid: It can send reference diagrams or highlighted markers back to the Spectacles' display to ensure the user is looking at the right part.
How we built it
We architected a high-speed pipeline connecting the wearable hardware to a sophisticated cloud brain:
Frontend (Lens Studio): We used TypeScript to build a custom bridge that captures 16kHz PCM audio and JPEG video frames, streaming them over WebSockets.
Backend (FastAPI): A Python-based agent manages the session state and handles concurrent processing of user queries and safety monitoring.
AI Brain: We utilized Gemini 3 Pro Preview, leveraging its large multimodal context window and the new thinking_config for deep reasoning. We dialed the thinking level down for the safety loop, ensuring interventions happen in sub-second timeframes.
TTS Optimization: Because of Spectacles' payload limits, we built a custom chunking and queuing system in JavaScript to deliver smooth, natural-sounding voice responses.
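The backend's concurrent handling of user queries and safety monitoring (described above) can be sketched with two asyncio tasks sharing a session: one answers questions as they arrive, the other scans incoming frames for hazards. This is a simplified stand-in for the real agent; the queue contents and hazard check are placeholders.

```python
import asyncio

async def handle_queries(queries: asyncio.Queue, log: list) -> None:
    """Answer user questions as they arrive (placeholder logic)."""
    while (q := await queries.get()) is not None:  # None is a shutdown sentinel
        log.append(f"answer:{q}")

async def monitor_safety(frames: asyncio.Queue, log: list) -> None:
    """Scan video frames for hazards, concurrently with the query loop."""
    while (frame := await frames.get()) is not None:
        if "hazard" in frame:  # stand-in for the real vision check
            log.append("[SAFETY_ALERT]")

async def run_session() -> list:
    queries, frames, log = asyncio.Queue(), asyncio.Queue(), []
    tasks = [
        asyncio.create_task(handle_queries(queries, log)),
        asyncio.create_task(monitor_safety(frames, log)),
    ]
    await frames.put("frame:hazard:exposed-wire")
    await queries.put("which screwdriver?")
    # Sentinels shut both loops down cleanly.
    await queries.put(None)
    await frames.put(None)
    await asyncio.gather(*tasks)
    return log

session_log = asyncio.run(run_session())
```

Because both loops run in one event loop, a hazard detection never waits behind a long-running query.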
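The chunking idea behind the TTS optimization is straightforward: split the agent's response into payload-sized pieces at word boundaries, then queue the pieces for playback. Our production version is JavaScript on the Lens side; this Python sketch shows only the splitting logic, and the 160-byte limit is illustrative, not the actual Spectacles constraint.

```python
def chunk_for_tts(text: str, max_bytes: int = 160) -> list[str]:
    """Split a response into payload-sized chunks at word boundaries.

    Assumes individual words fit under max_bytes; the limit is illustrative.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # still fits: keep growing this chunk
        else:
            if current:
                chunks.append(current)  # flush the full chunk
            current = word  # start a new chunk with the overflowing word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized and enqueued independently, so playback starts before the full response has been processed.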
Challenges we ran into
The biggest technical hurdle was maintaining low latency while processing high-resolution multimodal data. We also faced a significant pivot regarding deployment: our original architecture included a complex vector database for manual retrieval, but we found it difficult to host such a resource-heavy stack in a cost-efficient manner. We had to strip back non-essential services and optimize our code into a "Thin Client" model that prioritized the raw reasoning power of Gemini 3 over auxiliary microservices.
Accomplishments that we're proud of
We are incredibly proud of the Proactive Safety Loop. It represents a shift from "AI as a chatbot" to "AI as a guardian." Seeing the agent successfully identify a "danger" in the video feed and interrupt the user's speech with a life-saving warning was our biggest "eureka" moment. We also succeeded in creating a seamless voice-to-voice experience on hardware with very strict memory and data constraints.
What we learned
We learned the true value of multimodal reasoning. Gemini 3 Pro doesn't just "see" an image; it understands the spatial relationship between a tool and a hand. We also learned how to build a "glanceable" UI, realizing that in an MR handyman context, less is more: audio is the primary interface, and visuals should be reserved for critical confirmation.
What's next for Fixit
The next step is moving from 2D overlays to spatial 3D anchors, allowing Fixit to "draw" directly on the physical object the user is holding. We also plan to use Gemini's 1M+ token context window to ingest a library of thousands of official appliance manuals, letting Fixit provide factory-spec repair paths for any device it sees, from a 1990s dishwasher to the latest smart fridge, and to pair those instructions with richer in-lens visual aids for better user understanding.
Built With
- javascript
- python
- render
- spectacles
- typescript
- websockets