Inspiration

Home devices such as fire alarms, washing machines, and thermostats often malfunction, leaving users unsure how to diagnose or fix the problem. We noticed that not everyone has immediate access to an expert or a detailed manual. Inspired by the real-world need for simple, quick troubleshooting, we built FixIt Scholar, an AI-powered assistant that helps anyone diagnose, visualize, and repair common household device problems.

What it does

FixIt Scholar analyzes an image, video, audio clip, or short text description of a malfunctioning device. It:

Diagnoses the problem using Google's Gemini 2.5 Flash model.

Generates a detailed, step-by-step repair guide.

Creates simple sketch-style visualizations for each repair step.

Locates nearby contractors if professional help is needed.

Suggests replacement parts and finds nearby stores that carry them.

It returns everything as structured JSON, so the results are easy to consume and integrate into other applications.
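To illustrate what such structured output could look like, here is a minimal sketch using Python dataclasses. The field names (`device`, `diagnosis`, `steps`, `contractors`), the simplified place record passed to the hypothetical `contractor_from_place` helper, and the default values are all assumptions rather than FixIt Scholar's actual schema; the defaults stand in for the "controlled default values" the backend uses so that missing data never appears as null.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical default: 0.0 here means "no rating available",
# so the JSON never contains null for a missing rating.
DEFAULT_RATING = 0.0


@dataclass
class Contractor:
    name: str
    address: str
    rating: float = DEFAULT_RATING


@dataclass
class RepairReport:
    device: str
    diagnosis: str
    steps: list[str] = field(default_factory=list)
    contractors: list[Contractor] = field(default_factory=list)


def contractor_from_place(place: dict) -> Contractor:
    """Build a Contractor from a raw place record, filling gaps with defaults."""
    rating: Optional[float] = place.get("rating")
    return Contractor(
        name=place.get("name") or "Unnamed business",
        address=place.get("address") or "Address not listed",
        rating=float(rating) if rating is not None else DEFAULT_RATING,
    )


def to_json(report: RepairReport) -> str:
    # asdict() recurses into nested dataclasses and lists of them.
    return json.dumps(asdict(report), indent=2)
```

Normalizing each record at the point it enters the report keeps the defaulting logic in one place, so downstream consumers never need to special-case missing values.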

How we built it

We used Google Gemini 2.5 Flash for problem diagnosis and repair guide generation.

We used the Gemini 2.0 Flash image generation model to create simple repair illustrations.

We integrated the Google Maps Places API to find nearby contractors and stores.

We designed a modular Python backend that combines:

Media input parsing (image, video, audio)

Diagnosis & sketch generation

Location services for contractors and parts

Clean JSON output for easy API usage.

We used Click to build a command-line interface for easy testing and extension.
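A modular backend like the one described above needs a way to send each input to the right parser. The sketch below assumes routing by file extension with Python's standard `mimetypes` module; `detect_media_kind`, `route`, and the handler dictionary are hypothetical names, not the project's actual code.

```python
import mimetypes
from pathlib import Path
from typing import Callable

# Hypothetical handler signature: takes a file path and returns a text
# summary that the diagnosis step can consume.
Handler = Callable[[Path], str]


def detect_media_kind(path: Path) -> str:
    """Classify an input file as image, video, audio, or text by its MIME type."""
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        raise ValueError(f"Unrecognized file type: {path.name}")
    major = mime.split("/", 1)[0]  # e.g. "image/jpeg" -> "image"
    if major in {"image", "video", "audio", "text"}:
        return major
    raise ValueError(f"Unsupported media type: {mime}")


def route(path: Path, handlers: dict[str, Handler]) -> str:
    """Dispatch a file to the handler registered for its media kind."""
    kind = detect_media_kind(path)
    handler = handlers.get(kind)
    if handler is None:
        raise ValueError(f"No handler registered for {kind!r} input")
    return handler(path)
```

Keeping handlers in a dictionary means a new media type can be supported by registering one more function, without touching the routing logic.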

Challenges we ran into

API Stability: The generative models sometimes returned output that was not valid JSON, so we implemented retries and strict parsing logic.

Timeouts with Media Inputs: Sending large video files caused request timeouts, so we optimized how input media is streamed.

Location Handling: Auto-detecting the user's location from their IP address, and falling back to manual input when detection fails, took some fine-tuning.

Data Cleanliness: We had to ensure that missing information (such as unavailable ratings) did not appear as "null" or "unknown" in the output; instead, we used controlled default values.
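The retry-and-strict-parsing approach from the API Stability item can be sketched as follows. This is a minimal illustration, not the project's code: `generate` is a hypothetical stand-in for whatever function calls the Gemini model and returns its text, and the fence-stripping step assumes the model sometimes wraps its JSON in markdown code fences.

```python
import json
import re
from typing import Callable, Optional

# Matches output wrapped in a markdown fence (optionally tagged "json").
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_strict_json(text: str, required_keys: set[str]) -> dict:
    """Parse model output as a JSON object and check that required keys exist."""
    cleaned = text.strip()
    match = _FENCE.match(cleaned)
    if match:
        cleaned = match.group(1)
    data = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


def generate_json_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Call a text generator until it returns valid JSON or attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            return parse_strict_json(raw, required_keys)
        except ValueError as err:  # JSONDecodeError is a subclass of ValueError
            last_error = err
    raise RuntimeError(f"No valid JSON after {max_attempts} attempts") from last_error
```

Treating malformed JSON and missing keys as the same kind of failure keeps the retry decision in a single `except` clause.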

Accomplishments that we are proud of

Achieved full multimodal support — images, videos, audio, and text.

Produced fully structured JSON responses without manual intervention.

Linked local contractor and parts discovery directly to the diagnosis.

Recovered gracefully when the AI initially failed to produce a diagnosis; automatic retries improved success rates.

What we learned

How to design for multimodal AI — merging vision, audio, and text seamlessly.

Refining prompts so the model reliably returns strict JSON.

Building resilient agentic architectures with fallback mechanisms.

Handling real-world issues like API errors, media processing bottlenecks, and geolocation fallbacks.
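The geolocation fallback mentioned above can be expressed as a small function that takes the detection and prompting steps as callables. This is a sketch under assumptions: `detect_from_ip` and `ask_user` are hypothetical hooks (for example, a call to an IP geolocation service and a CLI prompt), and treating an `OSError` as "detection failed" is an illustrative choice, not necessarily what the project does.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Location:
    latitude: float
    longitude: float
    source: str  # "ip" or "manual", useful for logging and debugging


def resolve_location(
    detect_from_ip: Callable[[], Optional[tuple[float, float]]],
    ask_user: Callable[[], tuple[float, float]],
) -> Location:
    """Try IP-based detection first; fall back to manual input if it fails."""
    try:
        coords = detect_from_ip()
    except OSError:  # urllib's URLError and socket timeouts are OSError subclasses
        coords = None
    if coords is not None:
        return Location(coords[0], coords[1], source="ip")
    lat, lng = ask_user()
    return Location(lat, lng, source="manual")
```

Passing both steps in as callables keeps the fallback logic testable without network access or an interactive terminal.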

What's next for FixIt Scholar

Add AR visualizations to overlay repair instructions live onto broken devices.

Integrate appliance manuals for more specialized help.

Build a mobile app version for broader accessibility.

Add voice conversation support using Gemini's multimodal audio APIs.

Support multi-language outputs to reach non-English speakers.
