Inspiration
As students living abroad on a budget, when something breaks - whether it's a bicycle derailleur, a pair of glasses, or a door handle - we panic. Professional repairs are often too expensive, and buying a replacement contributes to the global problem of e-waste. But what if Gemini could instantly generate a custom manual for the exact broken object sitting in front of us, and assist us every step of the way? This is how the idea for FixIt came to be.
What it does
FixIt is an autonomous repair agent built around a multi-stage agentic workflow:
- Instant Diagnosis: Snap a photo of any broken item. FixIt identifies the object, diagnoses the likely issue, and assesses safety risks immediately.
- Visual Defect Highlighting: The AI doesn't just tell you what's wrong; it draws on your photo to pinpoint the exact defect.
- Adaptive Repair Plans: It generates a step-by-step guide tailored to your specific model.
- AI Technical Illustrations: For complex steps, it generates clear, high-contrast technical diagrams (like IKEA instructions but for your specific mess).
- Further Assistance: Whenever you get stuck, you can snap another picture of the object within FixIt to get further assistance.
- Fact-Checked by the Web: Using Google Search Grounding, it verifies its advice against real user manuals and forums to reduce hallucinations.
- Community: Once a repair has been completed successfully, you get the choice to share it with our community for future reuse!
How we built it
We built FixIt as a modern full-stack application. We used Python (FastAPI) on the backend and Vite/React on the frontend, together with a SQLite database for our prototype. We also used Tailwind CSS for rapid styling.
The AI Brain: We used the Gemini API in multiple parts of our project:
- Gemini 3 Flash Preview handles the rapid image analysis and multi-turn troubleshooting. It also moderates any content before it is posted to the community.
- Gemini 2.5 Flash Image generates the technical illustrations for each step.
- Google Search Grounding is dynamically invoked to fetch manual URLs and verify specs.
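To give a flavor of how a grounded diagnosis request looks, here is a minimal sketch using the google-genai Python SDK. The function and helper names are our own for illustration, the model ID string is an assumption for "Gemini 3 Flash Preview", and a `GEMINI_API_KEY` environment variable is assumed to be set:

```python
def build_diagnosis_prompt(item_hint=None):
    """Pure helper: assemble the text part of the multimodal request."""
    prompt = (
        "Identify the object in the photo, diagnose the likely defect, "
        "and flag any safety risks before suggesting a repair."
    )
    if item_hint:
        prompt += f" The user says it is a {item_hint}."
    return prompt


def diagnose(image_bytes, item_hint=None):
    """Send the user's photo to Gemini with Google Search grounding enabled."""
    # Requires `pip install google-genai`; Client() reads GEMINI_API_KEY
    # from the environment.
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-3-flash-preview",  # assumed model ID
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            build_diagnosis_prompt(item_hint),
        ],
        config=types.GenerateContentConfig(
            # Lets the model search for real manuals and forum threads
            # instead of guessing specs from memory.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    return response.text
```

The same pattern, minus the search tool, is reused for the multi-turn troubleshooting and for content moderation before a community post goes live.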
Challenges we ran into
Our main struggle early on was visual consistency - it was hard to force the step-by-step image generation to stick to the user's specific model of the item, and early outputs were full of "image hallucinations". To mitigate this, we pass the initial photo snapped by the user as a reference image for every generated illustration, and we use Google Search Grounding through the Gemini 3 API to look up the manual for the specific model being fixed.
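The reference-image trick above can be sketched as follows, again with the google-genai SDK. The function names are our own, the model ID string is an assumption for "Gemini 2.5 Flash Image", and an API key in the environment is assumed:

```python
def build_illustration_prompt(step_text):
    """Pure helper: the instruction sent alongside the reference photo."""
    return (
        "Using the object in the attached photo as the exact visual reference, "
        "draw a clear, high-contrast, IKEA-style technical diagram for this "
        f"repair step: {step_text}"
    )


def illustrate_step(reference_image_bytes, step_text):
    """Generate one step's illustration, anchored to the user's original photo
    so the depicted object matches their specific model."""
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.5-flash-image",  # assumed model ID
        contents=[
            types.Part.from_bytes(
                data=reference_image_bytes, mime_type="image/jpeg"
            ),
            build_illustration_prompt(step_text),
        ],
    )
    # Return the bytes of the first inline image part, if one was produced.
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            return part.inline_data.data
    return None
```

Feeding the original photo into every generation call was the single biggest win for keeping the diagrams consistent across steps.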
Accomplishments that we're proud of
We're proud of developing an app that we can use in our everyday lives. FixIt takes some of the stress out of living on a student budget, letting you focus on what really matters. We're also particularly proud of finding multiple ways to make FixIt's output more visually consistent and trustworthy.
What we learned
We learned that multimodal AI is the future of technical support. We also learned how important it is to make the output as deterministic as possible for tasks like repair guidance - grounding the model in search data and carefully written prompts to verify its outputs adds a layer of trust that pure generative models sometimes lack.
What's next for FixIt
- Video Generation: Our original dream was to generate video loops for each step. However, every problem we encountered with image generation was even more pronounced when we attempted video: the output was inconsistent with the original object, and the clips often contained "hallucinations". As video generation models mature, we believe these issues will become manageable and plan to integrate video into FixIt.
- AR Overlay: Once video generation is more feasible for FixIt, we plan to use the camera feed to overlay the instructions directly onto the object in real-time.
Built With
- css
- fastapi
- html
- javascript
- python
- react
- sqlite
- tailwind
- vite