About the Project

Inspiration

Small problems in our everyday environments often go unnoticed until they become bigger issues. Cluttered workspaces, safety hazards, overdue maintenance, and inefficient setups are common, yet people don't always know how to identify or fix them quickly.

With the rapid progress of multimodal AI, we wondered: what if your phone camera could act like a diagnostic assistant for the real world?

This idea led to Fixer.ai, an AI-powered assistant that analyzes your surroundings using the camera and suggests actionable fixes.

The goal was to create a simple tool that can turn visual input from the real world into practical recommendations, making AI genuinely useful in everyday situations.


What it Does

Fixer.ai is a mobile web application that uses AI to detect problems in a user's environment and recommend solutions.

The workflow is simple:

  1. The user opens the Fixer.ai web app on their mobile device.
  2. The camera preview appears.
  3. The user points the camera at a scene and taps Scan Problem.
  4. The captured image is sent to a backend running on Google Cloud Run.
  5. The backend sends the image to Gemini 2.5 Flash for multimodal analysis.
  6. Gemini identifies potential issues and generates fix recommendations.
  7. The results are displayed in the user interface.

Instead of just describing what it sees, the AI focuses on identifying problems, rating their severity, and suggesting concrete fixes.


How We Built It

Fixer.ai was built as a lightweight cloud-based system using the following components:

Frontend

  • HTML
  • CSS
  • JavaScript
  • Browser Camera API (getUserMedia)

The frontend provides a mobile-friendly interface that captures images from the user's camera and sends them to the backend.
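In outline, the capture-and-upload flow might look like the following sketch. The element handling, the multipart field name, and the /analyze endpoint path are illustrative assumptions, not the project's actual identifiers:

```javascript
// Start the rear camera preview in a <video> element.
async function startCamera(videoEl) {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'environment' }, // prefer the back camera on phones
    audio: false,
  });
  videoEl.srcObject = stream;
  await videoEl.play();
}

// Grab the current video frame as a JPEG Blob via an offscreen canvas.
function captureFrame(videoEl) {
  const canvas = document.createElement('canvas');
  canvas.width = videoEl.videoWidth;
  canvas.height = videoEl.videoHeight;
  canvas.getContext('2d').drawImage(videoEl, 0, 0);
  return new Promise((resolve) => canvas.toBlob(resolve, 'image/jpeg', 0.9));
}

// Send the captured frame to the backend as multipart form data.
// The '/analyze' path and 'image' field name are assumptions.
async function scanProblem(videoEl, endpoint) {
  const blob = await captureFrame(videoEl);
  const form = new FormData();
  form.append('image', blob, 'scan.jpg');
  const res = await fetch(endpoint, { method: 'POST', body: form });
  return res.json();
}
```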

Backend

  • Node.js
  • Express.js
  • Multer (for handling image uploads)

The backend processes incoming images and forwards them to the Gemini API.

AI Integration

  • Gemini 2.5 Flash
  • Google Generative Language API

Gemini performs multimodal reasoning on the uploaded image and generates structured fix recommendations.
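A request to Gemini of this kind might look like the following sketch, assuming the REST generateContent endpoint of the Generative Language API with the API key passed via a GEMINI_API_KEY environment variable; the prompt wording is illustrative, not the project's actual prompt:

```javascript
const GEMINI_URL =
  'https://generativelanguage.googleapis.com/v1beta/models/' +
  'gemini-2.5-flash:generateContent';

// Build the multimodal request body: one text part (the instructions)
// plus one inline image part (base64-encoded JPEG).
function buildRequest(imageBase64, prompt) {
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: 'image/jpeg', data: imageBase64 } },
        ],
      },
    ],
  };
}

// Send the request and pull the generated text out of the first candidate.
async function analyzeWithGemini(imageBase64, prompt) {
  const res = await fetch(`${GEMINI_URL}?key=${process.env.GEMINI_API_KEY}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRequest(imageBase64, prompt)),
  });
  const data = await res.json();
  return data.candidates[0].content.parts[0].text;
}
```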

Cloud Infrastructure

  • Google Cloud Run

The backend is deployed on Cloud Run, a serverless environment that handles provisioning and scales automatically with traffic.

Source Control

  • GitHub

The entire project codebase is managed in a public GitHub repository.


Challenges We Ran Into

1. Handling Camera Input Across Devices

Accessing the mobile camera reliably across different browsers required careful handling of the browser's media APIs and fallback mechanisms.

2. Formatting AI Responses

The raw responses from Gemini often contained structured JSON wrapped in code blocks. Additional backend processing was needed to convert the AI output into clean, user-friendly text.
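That clean-up step can be sketched as a small helper that strips a Markdown code fence if present and then attempts to parse the remainder as JSON, falling back to plain text; this is an illustrative sketch, not the project's exact implementation:

```javascript
function parseGeminiOutput(raw) {
  // Remove surrounding Markdown code fences like ```json ... ```
  const stripped = raw
    .replace(/^\s*```(?:json)?\s*/i, '')
    .replace(/\s*```\s*$/, '')
    .trim();
  try {
    return JSON.parse(stripped);
  } catch {
    return { text: stripped }; // not JSON: return the cleaned plain text
  }
}
```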

3. Deploying the Backend to Cloud Run

Configuring the backend container correctly for Cloud Run required ensuring the application listened on the correct port and handled environment variables properly.
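The port convention boils down to reading the PORT environment variable that Cloud Run injects into the container, defaulting to 8080. A minimal sketch:

```javascript
// Cloud Run expects the container to listen on the port named in the
// PORT environment variable; 8080 is the conventional default.
function resolvePort(env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) ? port : 8080;
}

// Used at startup, e.g.: app.listen(resolvePort(process.env));
```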

4. Ensuring a Smooth Mobile Experience

Since the application runs entirely in a mobile browser, the UI needed to remain simple, responsive, and easy to use while still demonstrating the AI functionality clearly.


What We Learned

Building Fixer.ai provided several valuable insights:

  • Multimodal AI is powerful for real-world applications. Gemini's ability to analyze both images and text makes it well suited for practical problem detection.
  • Serverless infrastructure simplifies deployment. Google Cloud Run made it easy to deploy and run the backend without managing servers.
  • Prompt engineering is critical. Carefully designing prompts ensured the AI produced structured outputs such as problem descriptions, severity ratings, and fix steps.
  • Simple interfaces can unlock powerful AI capabilities. A lightweight mobile web interface was enough to demonstrate the full potential of the system.
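To illustrate the prompt-engineering point, a structured prompt of the kind described might look like the following; this is a hypothetical example, and the wording Fixer.ai actually uses may differ:

```javascript
// Hypothetical scan prompt asking Gemini for structured JSON output.
const SCAN_PROMPT = `
You are a diagnostic assistant. Analyze the attached photo and respond
with JSON only, using this shape:
{
  "problems": [
    { "description": "...",
      "severity": "low | medium | high",
      "fix_steps": ["..."] }
  ]
}
If no problems are visible, return {"problems": []}.
`.trim();
```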

What's Next for Fixer.ai

There are several possible directions to expand the project:

  • Real-time video analysis instead of single image scans
  • Augmented reality overlays highlighting detected issues
  • A fix history log to track previously identified problems
  • Multilingual support for broader accessibility
  • Integration with smart home or IoT devices

Fixer.ai demonstrates how multimodal AI can move beyond chat interfaces and become a practical assistant for the physical world.
