💡 Inspiration
We’ve all been there: staring at a "drawer of shame" filled with cracked phones and broken gadgets. For most people, the barrier to DIY repair isn't a lack of tools; it's fear of the unknown. One wrong tug on a fragile ribbon cable can turn a simple screen swap into a permanent paperweight.
I built Gemini-Mechanic because I believe everyone should have the confidence to fix their own gear. My goal was to create an AI agent that doesn't just "read a manual" to you, but actually sees the device in your hands and guides you through the process step by step, making hardware repair accessible, safe, and sustainable.
🚀 What it does
Gemini-Mechanic is an immersive, multimodal repair assistant that acts as a master technician over your shoulder.
- Visual Component Recognition: Point your camera at a circuit board to instantly identify parts like capacitors, connectors, or specific screw types.
- Live Guided Troubleshooting: Leveraging the Gemini Multimodal Live API, the agent provides real-time voice instructions. If you’re stuck, you can ask, "Wait, which cable do I pull first?" and it will guide you visually.
- Safety Guardrails: The agent proactively flags hazards, such as swollen Li-ion batteries or high-voltage areas, so the user can stop and deal with them before continuing the repair.
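To make the guardrail idea concrete, here is a minimal sketch of how hazard warnings could pre-empt a repair step. It is an illustration, not the project's actual code: the `Detection` type, the hazard labels, the warning text, and the confidence threshold are all assumptions, and it presumes the model's output has already been parsed into labeled detections.

```python
from dataclasses import dataclass

# Hypothetical hazard labels and warning text; the project's real label set
# is not documented, so these are placeholders for illustration.
HAZARD_WARNINGS = {
    "swollen_battery": "Stop: the battery looks swollen. Do not press, bend, or puncture it.",
    "high_voltage": "Caution: high-voltage area. Unplug the device before going further.",
}


@dataclass
class Detection:
    """One item the model reported seeing (assumed structure)."""
    label: str
    confidence: float


def plan_response(detections, next_step, min_confidence=0.3):
    """Return the messages to speak next.

    If any hazard is detected at or above min_confidence, only the hazard
    warnings are returned and the repair step is withheld. The threshold is
    deliberately low (an assumed value) so the agent errs toward caution.
    """
    warnings = [
        HAZARD_WARNINGS[d.label]
        for d in detections
        if d.label in HAZARD_WARNINGS and d.confidence >= min_confidence
    ]
    if warnings:
        # dict.fromkeys removes duplicate warnings while keeping their order.
        return list(dict.fromkeys(warnings))
    return [next_step]
```

With this shape, a frame that contains both a ribbon connector and a swollen battery yields only the battery warning; the "lift the connector" step is held back until the hazard is no longer reported.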
🛠️ How we built it
The project is built with a focus on low-latency interaction and high-precision spatial reasoning:
- The Brain: Gemini 1.5 Pro handles the complex task of identifying tiny hardware components and reasoning through repair steps.
- The Senses: I integrated the Multimodal Live API to allow for bidirectional video and audio streaming, enabling a hands-free "see-and-speak" experience.
- The Frontend: A responsive web application built with React, optimized for mobile use at a workbench.
- The Backend: Developed on Windows using Python, leveraging Google Cloud Vertex AI for robust model orchestration and API management.
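The write-up does not describe how media travels between the React frontend and the Python backend. As one plausible sketch, each camera frame or audio chunk could be wrapped in a small JSON envelope before being relayed onward. The field names (`kind`, `seq`, `mime_type`, `data`) and both helper functions are hypothetical, and nothing here calls the Gemini or Vertex AI APIs.

```python
import base64
import json


def encode_chunk(kind, payload, mime_type, seq):
    """Wrap raw media bytes in a JSON text message (hypothetical envelope)."""
    if kind not in ("video", "audio"):
        raise ValueError(f"unknown chunk kind: {kind}")
    return json.dumps({
        "kind": kind,            # "video" frame or "audio" chunk
        "seq": seq,              # sequence number so the receiver can spot gaps
        "mime_type": mime_type,  # e.g. "image/jpeg"
        # JSON cannot carry raw bytes, so the payload is base64-encoded.
        "data": base64.b64encode(payload).decode("ascii"),
    })


def decode_chunk(message):
    """Inverse of encode_chunk: returns (kind, seq, mime_type, payload_bytes)."""
    obj = json.loads(message)
    return obj["kind"], obj["seq"], obj["mime_type"], base64.b64decode(obj["data"])
```

Base64 inflates each payload by roughly a third, so a binary transport would be the leaner choice if latency becomes the bottleneck; the JSON form is shown only because it is easy to inspect while debugging.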
🚧 Challenges we ran into
- The Macro Problem: Most webcams and phone cameras struggle with the extreme close-ups needed for circuit boards. I had to refine the prompting to help Gemini reason through slightly blurry or low-light images common in DIY workspaces.
- Real-Time Latency: In a "Live" environment, timing is everything. Balancing the "Thinking" time of a large model with the need for immediate user feedback was a significant hurdle that required optimizing the video stream.
- Precision and Safety: Teaching the AI to distinguish between very similar-looking ribbon cables required careful context setting to keep the instructions accurate and safe for the user.
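One common way to ease both the blur and latency problems is to filter frames before they are sent. The sketch below is not what Gemini-Mechanic does (the blur issue was handled through prompting, as noted above); it swaps in a client-side gate that drops frames arriving too soon after the last one and frames that score as too blurry. The `FrameGate` class, the half-second interval, and the sharpness threshold of 50 are assumptions; sharpness is estimated with the variance of a 4-neighbour Laplacian over a grayscale image given as a list of rows.

```python
from statistics import pvariance


def sharpness(gray):
    """Variance of a 4-neighbour Laplacian over a 2D grayscale image.

    `gray` is a list of equal-length rows of pixel intensities. Sharp images
    have strong edges, so their Laplacian responses vary widely; blurry or
    flat images score close to zero.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


class FrameGate:
    """Decides whether a frame is worth sending (hypothetical helper)."""

    def __init__(self, min_interval_s=0.5, min_sharpness=50.0):
        # Both defaults are illustrative values, not tuned measurements.
        self.min_interval_s = min_interval_s
        self.min_sharpness = min_sharpness
        self._last_sent = None

    def should_send(self, gray, timestamp_s):
        # Rate limit: skip frames that arrive too soon after the last sent one.
        if (self._last_sent is not None
                and timestamp_s - self._last_sent < self.min_interval_s):
            return False
        # Quality check: skip frames that are too blurry to be useful.
        if sharpness(gray) < self.min_sharpness:
            return False
        self._last_sent = timestamp_s
        return True
```

A rejected blurry frame does not reset the timer, so the next sharp frame can go out immediately instead of waiting another full interval.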
🎉 Accomplishments that we're proud of
- Hands-Free Interaction: Successfully creating a workflow where a user never has to touch their screen with "greasy repair hands"—they can just talk to the agent.
- Safety First: Successfully implementing a detection system that prioritizes battery safety and electrical hazards.
- Democratizing Repair: Taking a complex task (like an iPhone 8 screen repair) and making it feel achievable for a beginner through the power of Multimodal AI.
🎓 What we learned
Building Gemini-Mechanic taught me that the future of AI isn't just in a chat box; it's in embodiment. Giving an AI "eyes" and a "voice" that work in tandem with a user’s physical actions bridges the gap between digital knowledge and physical skill. I also deepened my understanding of building high-utility, mobile-first tools on Windows using Google Cloud's ecosystem.
🔮 What's next for Gemini-Mechanic
The dream is a world where no gadget is "unfixable." Future updates will include:
- Expanded Knowledge: Adding diagnostics for household appliances and automotive repairs.
- Parts Marketplace: Integrating a feature to automatically link the user to the exact replacement parts identified during the diagnostic phase.
- Community Data: Allowing expert repairers to "teach" the agent new tricks to keep up with the latest hardware releases.