💡 Inspiration

We’ve all been there: staring at a "drawer of shame" filled with cracked phones and broken gadgets. For most people, the barrier to DIY repair isn't a lack of tools; it's the fear of the unknown. One wrong tug on a fragile ribbon cable can turn a phone that only needed a screen swap into a permanent paperweight.

I built Gemini-Mechanic because I believe everyone should have the confidence to fix their own gear. My goal was to create an AI agent that doesn't just "read a manual" to you, but actually sees the device in your hands and guides you through each step, making hardware repair accessible, safe, and sustainable.

🚀 What it does

Gemini-Mechanic is an immersive, multimodal repair assistant that acts as a master technician over your shoulder.

  • Visual Component Recognition: Point your camera at a circuit board to instantly identify parts like capacitors, connectors, or specific screw types.
  • Live Guided Troubleshooting: Using the Gemini Multimodal Live API, the agent provides real-time voice instructions. If you’re stuck, you can ask, "Wait, which cable do I pull first?" and it will answer based on what it sees through your camera.
  • Safety Guardrails: The agent proactively identifies hazards, such as swollen Li-ion batteries or high-voltage areas, ensuring the user stays safe throughout the repair.
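
A minimal sketch of how a safety guardrail like this could sit in front of every spoken instruction. The label names, warning texts, and the 0.5 confidence threshold are assumptions for illustration only; they are not the project's actual detection system, and the detections here would come from the vision model rather than being hard-coded.

```python
from dataclasses import dataclass

# Hypothetical hazard labels and warning texts (assumed, not from the project).
BLOCKING_HAZARDS = {
    "swollen_battery": "Stop. The battery looks swollen. Do not press, bend, or puncture it.",
    "high_voltage_area": "Stop. This area can hold a dangerous charge. Unplug the device first.",
}


@dataclass(frozen=True)
class Detection:
    """One object the vision model reports seeing, with its confidence (0..1)."""
    label: str
    confidence: float


def safety_gate(detections, threshold=0.5):
    """Return warnings for blocking hazards at or above the threshold, strongest first."""
    hits = [
        d for d in detections
        if d.label in BLOCKING_HAZARDS and d.confidence >= threshold
    ]
    hits.sort(key=lambda d: d.confidence, reverse=True)
    warnings, seen = [], set()
    for d in hits:
        if d.label not in seen:  # report each hazard type only once
            seen.add(d.label)
            warnings.append(BLOCKING_HAZARDS[d.label])
    return warnings


def next_instruction(step_text, detections):
    """Speak the repair step only if no blocking hazard is in view."""
    warnings = safety_gate(detections)
    return warnings[0] if warnings else step_text
```

With this shape, a hazard always overrides the repair step: `next_instruction("Lift the display", [Detection("swollen_battery", 0.8)])` returns the battery warning instead of the step.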

🛠️ How we built it

The project is built with a focus on low-latency interaction and high-precision spatial reasoning:

  • The Brain: Gemini 1.5 Pro handles the complex task of identifying tiny hardware components and reasoning through repair steps.
  • The Senses: I integrated the Multimodal Live API to allow for bidirectional video and audio streaming, enabling a hands-free "see-and-speak" experience.
  • The Frontend: A responsive web application built with React, optimized for mobile use at a workbench.
  • The Backend: Developed on Windows using Python, leveraging Google Cloud Vertex AI for robust model orchestration and API management.
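
The "see-and-speak" loop comes down to two things happening at once: frames going up while replies come back. The sketch below shows only that concurrency pattern with Python's `asyncio`. `FakeLiveSession` and its methods are hypothetical stand-ins invented for this example; they are not the Multimodal Live API or the Vertex AI SDK, and a real session would replace them.

```python
import asyncio


class FakeLiveSession:
    """Hypothetical stand-in for a streaming session: one text reply per frame."""

    def __init__(self):
        self._replies = asyncio.Queue()

    async def send_frame(self, frame: bytes) -> None:
        # A real session would upload the frame; here we just queue a reply.
        await self._replies.put(f"saw {len(frame)} bytes")

    async def close(self) -> None:
        await self._replies.put(None)  # sentinel: no more replies

    async def receive(self):
        while True:
            reply = await self._replies.get()
            if reply is None:
                return
            yield reply


async def run_session(session, frames):
    """Send frames and collect replies concurrently, as a live session would."""
    heard = []

    async def uplink():
        for frame in frames:
            await session.send_frame(frame)
        await session.close()

    async def downlink():
        async for reply in session.receive():
            heard.append(reply)

    await asyncio.gather(uplink(), downlink())
    return heard


async def demo():
    # Create the session inside the running event loop.
    return await run_session(FakeLiveSession(), [b"abc", b"de"])
```

Running `asyncio.run(demo())` drives both directions until the uplink closes the session. Keeping uplink and downlink as separate tasks means a slow reply never blocks the camera feed.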

🚧 Challenges we ran into

  • The Macro Problem: Most webcams and phone cameras struggle with the extreme close-ups needed for circuit boards. I had to refine the prompting to help Gemini reason through slightly blurry or low-light images common in DIY workspaces.
  • Real-Time Latency: In a "Live" environment, timing is everything. Balancing the "Thinking" time of a large model with the need for immediate user feedback was a significant hurdle that required optimizing the video stream.
  • Precision and Safety: Teaching the AI to distinguish between very similar-looking ribbon cables required careful context setting, because a wrong instruction at this step can damage the device or put the user at risk.
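
One common way to trade video bandwidth for responsiveness is to cap how many frames per second are sent to the model. The sketch below illustrates that idea only; it is an assumption about what "optimizing the video stream" could look like, not the project's actual approach, and `FrameThrottle` and the 2 fps cap are hypothetical.

```python
class FrameThrottle:
    """Let through at most `max_fps` frames per second and drop the rest."""

    def __init__(self, max_fps: float):
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self.min_interval = 1.0 / max_fps
        self._last_sent = None  # timestamp (seconds) of the last frame sent

    def should_send(self, timestamp: float) -> bool:
        """Return True if enough time has passed since the last sent frame."""
        if self._last_sent is None or timestamp - self._last_sent >= self.min_interval:
            self._last_sent = timestamp
            return True
        return False


def select_frames(timestamps, max_fps):
    """Return the subset of capture timestamps that would actually be sent."""
    throttle = FrameThrottle(max_fps)
    return [t for t in timestamps if throttle.should_send(t)]
```

For example, `select_frames([0.0, 0.1, 0.2, 0.5, 0.6, 1.0], max_fps=2)` keeps only the frames at 0.0, 0.5, and 1.0 seconds. A circuit board on a workbench changes slowly, so a low cap loses little information while leaving the model more time per frame.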

🎉 Accomplishments that we're proud of

  • Hands-Free Interaction: Creating a workflow where a user never has to touch their screen with "greasy repair hands"; they can just talk to the agent.
  • Safety First: Implementing a detection system that prioritizes battery safety and electrical hazards.
  • Democratizing Repair: Taking a complex task (like an iPhone 8 screen repair) and making it feel achievable for a beginner through the power of Multimodal AI.

🎓 What we learned

Building Gemini-Mechanic taught me that the future of AI isn't just in a chat box; it's in embodiment. Giving an AI "eyes" and a "voice" that work in tandem with a user’s physical actions bridges the gap between digital knowledge and physical skill. I also deepened my understanding of building high-utility, mobile-first tools on Windows using Google Cloud's ecosystem.

🔮 What's next for Gemini-Mechanic

The dream is a world where no gadget is "unfixable." Future updates will include:

  • Expanded Knowledge: Adding diagnostics for household appliances and automotive repairs.
  • Parts Marketplace: Integrating a feature to automatically link the user to the exact replacement parts identified during the diagnostic phase.
  • Community Data: Allowing expert repairers to "teach" the agent new tricks to keep up with the latest hardware releases.
