Inspiration

Unplanned equipment downtime costs industrial manufacturers $50 billion annually. Caterpillar's CAT Inspect app collects over 6 million inspections a year where operators walk around a machine, snap a photo, and manually grade each component red, yellow, or green. No computer vision. No analysis. Completely subjective. If a hydraulic hose was weeping oil three weeks ago on a different machine and eventually burst into a $2,800 repair plus environmental cleanup, that knowledge dies in a PDF somewhere. The next operator inspecting a similar hose on a different machine has no way of knowing. Meanwhile, 75% of hydraulic system failures trace back to issues that were visible but went unactioned, and the construction industry faces a 500,000-worker shortage with 70% of institutional knowledge lost when experienced technicians retire. CAT Inspect captures rich visual data but performs zero AI analysis on it. CAT's new AI Assistant provides conversational intelligence but doesn't touch inspection photos. We saw the gap between these two products and asked: what if every inspection photo was automatically analyzed, cross-referenced against every past inspection across the entire fleet, and checked against real federal safety regulations — instantly?

What it does

Inspector is an AI-powered inspection assistant that transforms a single photo into a data-backed diagnosis with fleet-wide institutional memory.

How we built it

We built the backend with FastAPI and used Gemini 3.0 models to analyze inspection photos. Fleet Memory runs on Actian VectorAI DB as our vector database, storing inspection embeddings for semantic similarity search. On top of it we built a hybrid search pipeline that fuses:

- CLIP image embeddings for visual similarity (finding photos that look like the current issue)
- Text embeddings from inspection descriptions for contextual matching

The frontend is vanilla JavaScript.
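The hybrid pipeline above can be sketched as follows. This is a minimal, illustrative version using NumPy only: the embeddings are random stand-ins for real CLIP and text-encoder outputs, and the in-memory matrix stands in for the Actian VectorAI index; all names and dimensions are assumptions, not the actual implementation.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length so cosine similarity is a dot product."""
    return v / np.linalg.norm(v)

def fuse(image_emb: np.ndarray, text_emb: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """Weighted concatenation of the two modalities, then re-normalized.
    `alpha` controls how much visual vs. textual similarity matters."""
    fused = np.concatenate([alpha * l2_normalize(image_emb),
                            (1 - alpha) * l2_normalize(text_emb)])
    return l2_normalize(fused)

def search(index: np.ndarray, query: np.ndarray, k: int = 3) -> np.ndarray:
    """Return row indices of the k most similar stored inspections."""
    scores = index @ query  # rows of `index` are unit vectors, so this is cosine
    return np.argsort(scores)[::-1][:k]

# Tiny demo with synthetic embeddings (512-d "CLIP", 384-d "text").
rng = np.random.default_rng(0)
index = np.stack([fuse(rng.normal(size=512), rng.normal(size=384))
                  for _ in range(10)])
query = fuse(rng.normal(size=512), rng.normal(size=384))
top = search(index, query)
print(top)
```

In a real deployment the fused vectors would be written to and queried from the vector database rather than held in a NumPy array, but the fusion and ranking logic is the same shape.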

Challenges we ran into

Hybrid embedding fusion: Getting CLIP image embeddings and text embeddings to play nicely together in the same vector space required careful normalization and weighting. Pure image search returned visually similar but contextually irrelevant results (e.g., any photo with yellow tint). Pure text search missed visual patterns. Finding the right fusion ratio was iterative.
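The failure mode described above (pure image search surfacing any yellow-tinted photo) comes down to how the two similarity signals are blended. A hedged sketch of the tunable blend, with synthetic similarity scores chosen purely for illustration:

```python
def hybrid_score(img_sim: float, txt_sim: float, w_img: float) -> float:
    """Blend per-modality cosine similarities with a tunable ratio.
    w_img = 1.0 reproduces pure image search; w_img = 0.0 pure text search."""
    return w_img * img_sim + (1 - w_img) * txt_sim

# A "yellow-tint" match: visually similar, contextually unrelated.
img_sim, txt_sim = 0.92, 0.15

pure_image = hybrid_score(img_sim, txt_sim, w_img=1.0)  # false positive ranks high
blended = hybrid_score(img_sim, txt_sim, w_img=0.5)     # demoted by weak text match
print(pure_image, blended)
```

Finding the value of `w_img` that demotes look-alikes without drowning out genuine visual matches was exactly the iterative tuning the paragraph above describes; this requires that both similarity scores come from unit-normalized embeddings, or one modality's magnitude silently dominates.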

Accomplishments that we're proud of

I am proud that the fleet memory actually works. Uploading a photo of oil-weeping hoses and watching it surface a past inspection from a completely different machine that shows the escalation to catastrophic failure is genuinely powerful. This is intelligence that doesn't exist in any current Caterpillar product.

What we learned

The gap is real. Our research confirmed that CAT Inspect performs zero AI analysis on inspection photos and Caterpillar's new AI Assistant doesn't touch the inspection workflow. This isn't a solution looking for a problem; it's a clear, unfilled gap in a $50B problem space.

Vector databases are the unlock. The moment we had CLIP embeddings in VectorAI and could query "show me everything that looks like this" across fleet history, the product went from "another AI wrapper" to something with genuine compound intelligence. Every inspection makes the next one smarter.

Accuracy over impressiveness. We spent hours validating compliance citations, verifying Cat part numbers, and confirming maintenance intervals against real data. The extra effort to make everything verifiable was worth it.

What's next for Inspector

Multi-photo inspections: allowing 2–4 angles per component so the AI catches things a single angle misses, like a crack only visible from the side. We'd also like more data from the CAT team to keep improving our analysis. Long term: every inspection makes the fleet smarter, every operator benefits from the institutional knowledge of every other operator, and no machine ever breaks down from something that was already visible.
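One way the multi-angle feature could work, sketched as a hypothetical aggregation rule (not yet implemented): grade each photo independently with the existing red/yellow/green scale, then report the worst grade, so a defect visible from only one side still escalates the component.

```python
# Severity ordering for the red/yellow/green grades used in CAT Inspect.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def aggregate(grades: list[str]) -> str:
    """Return the most severe grade across all photographed angles
    of a single component. `grades` has one entry per angle."""
    return max(grades, key=lambda g: SEVERITY[g])

# Three angles look fine, but a side view catches a crack.
print(aggregate(["green", "green", "red", "yellow"]))  # prints "red"
```

Worst-case aggregation is deliberately conservative: for safety-critical inspections, a single bad angle should override several clean ones.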

Built With

fastapi, gemini, actian-vectorai, clip, javascript
