Gemini-Mechanic

💡 Inspiration

We’ve all been there: staring at a "drawer of shame" filled with cracked phones and broken gadgets. For most, the barrier to DIY repair isn't a lack of tools—it's the fear of the unknown. One wrong tug on a fragile ribbon cable can turn a simple screen swap into a permanent paperweight.

I built Gemini-Mechanic because I believe everyone should have the confidence to fix their own gear. My goal was to create an AI agent that doesn't just "read a manual" to you, but actually sees the device in your hands and guides you through each step, making hardware repair accessible, safe, and sustainable.

🚀 What it does

Gemini-Mechanic is an immersive, multimodal repair assistant that acts as a master technician over your shoulder.

Visual Component Recognition: Point your camera at a circuit board to instantly identify parts like capacitors, connectors, or specific screw types.
Live Guided Troubleshooting: Leveraging the Gemini Multimodal Live API, the agent provides real-time voice instructions. If you’re stuck, you can ask, "Wait, which cable do I pull first?" and it will guide you visually.
Safety Guardrails: The agent proactively flags hazards, such as swollen Li-ion batteries or high-voltage areas, to help keep the user safe throughout the repair.
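
To make the guardrail idea concrete, here is a minimal Python sketch of how hazard detections could be ordered ahead of the next repair instruction. It is an illustration under stated assumptions, not the project's actual logic: the labels in SEVERITY, the Detection type, the plan_response function, and the 0.5 confidence cutoff are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical hazard labels and severities (3 = stop work immediately).
# The project's real hazard categories are not documented.
SEVERITY = {
    "swollen_battery": 3,
    "high_voltage": 3,
    "sharp_glass": 2,
    "esd_sensitive": 1,
}


@dataclass
class Detection:
    label: str         # label reported by the vision model
    confidence: float  # model confidence in [0, 1]


def plan_response(detections, next_step, min_confidence=0.5):
    """Return the lines to speak: hazard warnings first, then the repair step.

    A severity-3 hazard replaces the repair step with a stop instruction.
    """
    # Keep only known hazards that the model is reasonably sure about.
    hazards = [
        d for d in detections
        if d.label in SEVERITY and d.confidence >= min_confidence
    ]
    # Most severe first; ties broken by higher confidence.
    hazards.sort(key=lambda d: (-SEVERITY[d.label], -d.confidence))

    lines = [f"Warning: {d.label.replace('_', ' ')} detected." for d in hazards]
    if any(SEVERITY[d.label] >= 3 for d in hazards):
        lines.append("Stop and make the device safe before continuing.")
    else:
        lines.append(next_step)
    return lines
```

The design choice here is that a severe hazard withholds the next instruction entirely rather than appending a warning to it, so the user cannot miss it while following along by voice.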

🛠️ How we built it

The project is built with a focus on low-latency interaction and high-precision spatial reasoning:

The Brain: Gemini 1.5 Pro handles the complex task of identifying tiny hardware components and reasoning through repair steps.
The Senses: I integrated the Multimodal Live API to allow for bidirectional video and audio streaming, enabling a hands-free "see-and-speak" experience.
The Frontend: A responsive web application built with React, optimized for mobile use at a workbench.
The Backend: Developed on Windows using Python, leveraging Google Cloud Vertex AI for robust model orchestration and API management.

🚧 Challenges we ran into

The Macro Problem: Most webcams and phone cameras struggle with the extreme close-ups needed for circuit boards. I had to refine the prompting to help Gemini reason through slightly blurry or low-light images common in DIY workspaces.
Real-Time Latency: In a "Live" environment, timing is everything. Balancing the "Thinking" time of a large model with the need for immediate user feedback was a significant hurdle that required optimizing the video stream.
Precision and Safety: Teaching the AI to distinguish between very similar-looking ribbon cables required careful context setting so that its instructions stay accurate and safe for the user.
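
The blur and latency challenges above can also be approached on the client side, before a frame ever reaches the model. The sketch below is one possible technique, not the one described in this write-up (which relied on prompt refinement and unspecified stream optimization): it drops frames that arrive too soon after the last one or that look too blurry, using the variance of a simple Laplacian as a sharpness score. FrameGate, its thresholds, and the plain list-of-lists grayscale format are assumptions made for illustration.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D grayscale grid.

    Low values suggest a blurry frame, because few sharp edges are present.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    # Skip the border so every pixel has four neighbours.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses) if responses else 0.0


class FrameGate:
    """Decide whether a frame is worth sending to the model."""

    def __init__(self, min_interval_s=0.5, min_sharpness=100.0):
        # Both thresholds are illustrative guesses, not tuned values.
        self.min_interval_s = min_interval_s
        self.min_sharpness = min_sharpness
        self._last_sent = None  # timestamp of the last accepted frame

    def should_send(self, gray, now_s):
        too_soon = (
            self._last_sent is not None
            and now_s - self._last_sent < self.min_interval_s
        )
        if too_soon:
            return False  # rate limit: keeps the stream light
        if laplacian_variance(gray) < self.min_sharpness:
            return False  # too blurry to be useful
        self._last_sent = now_s
        return True
```

In use, the capture loop would call should_send with each downscaled grayscale frame and a monotonic timestamp, and forward only the frames that pass. A real implementation would more likely compute the sharpness score with an image library than in pure Python.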

🎉 Accomplishments that we're proud of

Hands-Free Interaction: Successfully creating a workflow where a user never has to touch their screen with "greasy repair hands"—they can just talk to the agent.
Safety First: Successfully implementing a detection system that prioritizes battery safety and electrical hazards.
Democratizing Repair: Taking a complex task (like an iPhone 8 screen repair) and making it feel achievable for a beginner through the power of Multimodal AI.

🎓 What we learned

Building Gemini-Mechanic taught me that the future of AI isn't just in a chat box; it's in embodiment. Giving an AI "eyes" and a "voice" that work in tandem with a user’s physical actions bridges the gap between digital knowledge and physical skill. I also deepened my understanding of building high-utility, mobile-first tools on Windows using Google Cloud's ecosystem.

🔮 What's next for Gemini-Mechanic

The dream is a world where no gadget is "unfixable." Future updates will include:

Expanded Knowledge: Adding diagnostics for household appliances and automotive repairs.
Parts Marketplace: Integrating a feature to automatically link the user to the exact replacement parts identified during the diagnostic phase.
Community Data: Allowing expert repairers to "teach" the agent new tricks to keep up with the latest hardware releases.

Updates

Maame Afua A.P Fordjour started this project — Mar 03, 2026 03:05 PM EST
