Inspiration
Field inspections are one of the most critical parts of keeping heavy equipment operational, but the workflow hasn't changed much in decades. Technicians walk around a machine with a clipboard or a clunky mobile form, manually tapping through every component while their hands are dirty, their eyes are on the equipment, and their attention is split. One missed field, one rushed entry, one overlooked component can mean a machine goes back into operation with an unresolved issue. When we looked at the CAT Inspect challenge, that friction was the thing that stood out most. The problem isn't that technicians don't know how to do their job. It's that the tool they're given to document it actively gets in the way of doing it. A form that demands your full attention is the wrong interface for someone whose full attention should be on the machine in front of them.
Meet Butterfly.
Instead of asking a technician to stop, look at a screen, and manually log every finding, we wanted to let them just walk and talk. Butterfly listens to what the technician describes and simultaneously watches through the camera, analyzing what it sees in real time to independently verify each component's condition. If a technician says a hydraulic hose looks fine, but the camera catches visible cracking, Butterfly flags it, becoming every inspector's second set of eyes and ears.
The goal was to turn a burdensome documentation process into something that feels less like paperwork and more like a peer on every walkthrough: one that never gets tired, never rushes, and never lets something slip through.
What it does
Butterfly is a multimodal AI co-pilot that transforms the entire inspection workflow from start to finish. When a technician begins a walk-around, Butterfly activates and stays live throughout the inspection. The technician just talks, describing each component in plain language, and Butterfly updates the inspection form in real time without any manual input. Every field populates automatically as the conversation happens, visible on screen as it occurs.

Butterfly doesn't just take the technician's word for it. As components are described, it simultaneously analyzes the live camera feed, independently evaluating what it sees against what the technician reports. If there's a discrepancy, such as a component flagged as passing that shows visible damage on camera, Butterfly catches it. Technicians can also upload video footage of a specific component for deeper analysis, which is how we demonstrate this capability in our demo.

When the technician wants a second opinion or has a question mid-inspection, they can just say "Hey Butterfly" and ask anything: what a finding means, what's still left to inspect, what the current status of a specific component is, or anything they're confused about. Butterfly's live voice agent persists memory across sessions, answers every question in one or two sentences, and gets out of the way so the technician can keep moving.

Once the inspection is complete, Butterfly generates a full PDF report documenting every component, its status, all recorded notes, and timestamps, ready to hand off to a supervisor or file for compliance. For every component flagged as FAIL or MONITOR, Butterfly scrapes the CAT Product Shop and surfaces direct links to the exact replacement parts the machine needs, turning an inspection finding into an immediate service action.
A cracked hydraulic hose goes from inspection finding to replacement part on order with a single tap.
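The "Hey Butterfly" interaction described above can be sketched as a simple gate over the continuous speech transcripts: utterances without the wake word are treated as inspection narration, and everything after the wake word is routed to the voice agent. This is an illustrative sketch only; the regex and function name are our own, not Butterfly's actual implementation.

```typescript
// Hypothetical wake-word gate over continuous speech transcripts.
const WAKE_WORD = /\bhey,?\s+butterfly\b/i;

// Returns the question following the wake word, or null when the
// utterance should be treated as ordinary inspection narration.
function extractQuery(transcript: string): string | null {
  const match = WAKE_WORD.exec(transcript);
  if (!match) return null;
  return transcript.slice(match.index + match[0].length).trim();
}
```

Keeping this split on the client means narration never waits on the voice agent, which helps the "responds and gets out of the way" feel.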
How we built it
Butterfly is a full-stack multimodal AI inspection engine built on React and TypeScript, powered by Gemini 2.5 Flash and deployed for real-time field use.
The inspection experience runs on a hands-free feedback loop. The operator walks the machine, speaking naturally while the phone streams live video. In parallel, continuous speech recognition captures transcripts with wake word control, periodic video frame capture builds a visual evidence timeline, and Gemini 2.5 Flash processes both streams through a multimodal pipeline that maps language, visual context, and jobsite slang directly into a structured 38-item CAT 320 inspection form using tool calling for guaranteed schema output.
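Tool calling with a guaranteed schema means every model response must land in a declared function signature rather than free text. A minimal sketch of what that contract and the form-merge step might look like; the tool name, item IDs, and field shapes here are illustrative assumptions, not the actual CAT 320 schema:

```typescript
type ItemStatus = "PASS" | "MONITOR" | "FAIL";

interface FormUpdate {
  itemId: string;  // e.g. one of the 38 checklist item IDs (illustrative)
  status: ItemStatus;
  note?: string;   // free-text finding transcribed from speech
}

// Hypothetical function declaration handed to the model so its output
// is constrained to this shape instead of free text.
const updateInspectionItemTool = {
  name: "update_inspection_item",
  description: "Record the condition of one checklist item",
  parameters: {
    type: "object",
    properties: {
      itemId: { type: "string" },
      status: { type: "string", enum: ["PASS", "MONITOR", "FAIL"] },
      note: { type: "string" },
    },
    required: ["itemId", "status"],
  },
};

// Merge one tool call into the in-progress form state immutably,
// so the UI can re-render each field as it fills in.
function applyUpdate(
  form: Record<string, FormUpdate>,
  update: FormUpdate
): Record<string, FormUpdate> {
  return { ...form, [update.itemId]: update };
}
```

Constraining outputs this way is what lets the form populate field by field while the technician is still mid-sentence, with no free-text parsing in between.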
The AI does more than transcribe. Every finding is cross-validated across three sources: what the inspector says, what the camera sees, and what VisionLink telemetry reports. If a verbal pass conflicts with sensor data, Butterfly flags it with evidence, including value, unit, and timestamp, creating an auditable inspection record.
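That three-way cross-check can be sketched as a pure function over one finding. The field names and the `withinLimits` flag are illustrative assumptions about how the telemetry layer is shaped:

```typescript
type Status = "PASS" | "MONITOR" | "FAIL";

interface TelemetryEvidence {
  value: number;
  unit: string;
  timestamp: string;
  withinLimits: boolean; // assumed: precomputed against the sensor's limits
}

interface Finding {
  itemId: string;
  verbal: Status;                // what the inspector said
  vision?: Status;               // what the camera model concluded
  telemetry?: TelemetryEvidence; // what VisionLink reported
}

interface Discrepancy {
  itemId: string;
  reason: string;
  evidence?: TelemetryEvidence;
}

// Flag a finding when the sources disagree, attaching sensor evidence
// (value, unit, timestamp) so the record stays auditable.
function crossValidate(f: Finding): Discrepancy | null {
  if (f.vision && f.vision !== f.verbal) {
    return { itemId: f.itemId, reason: `camera assessed ${f.vision}, inspector said ${f.verbal}` };
  }
  if (f.telemetry && f.verbal === "PASS" && !f.telemetry.withinLimits) {
    return { itemId: f.itemId, reason: "verbal pass conflicts with sensor data", evidence: f.telemetry };
  }
  return null;
}
```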
The architecture uses a React 18 frontend with real-time streaming, Supabase Edge Functions for multimodal analysis and post-inspection intelligence, Firecrawl for parts lookup, and a sequential processing queue to prevent data loss during long sessions. All outputs use typed function calling to ensure strict schema compliance.
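The sequential queue is what guards against data loss on long sessions: analysis chunks can settle out of order on a flaky jobsite connection, so each one is chained behind the last. A minimal promise-chain sketch of the idea; the class and method names are our own illustration, not the production code:

```typescript
// Each enqueued task runs only after every earlier task has settled,
// so form updates from a long session apply in order.
class SequentialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(task);
    // Swallow failures inside the chain so one bad chunk doesn't block
    // everything behind it; callers still see the rejection on `run`.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}
```

Because the chain absorbs rejections, a single failed frame upload degrades one finding instead of stalling the rest of the inspection.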
After submission, the full session feeds into a debrief engine that generates health scores, safety clearance decisions, root cause analysis, prioritized work orders, predictive maintenance timelines, and parts recommendations. Butterfly transforms a routine walkaround into structured, actionable maintenance intelligence.
Accomplishments that we're proud of
We delivered a complete ground-up redesign of the CAT inspection app, built around the concerns of real inspectors: an interface designed for real field conditions and a fully hands-free workflow. The UI/UX addresses the reviews of the existing design without ever sacrificing clarity or control. On the intelligence layer, we achieved highly accurate, real-time form completion across all of our data sources: when one source fails or detects a failure in the inspection, another steps in to verify, all built on the publicly available CAT schemas. Our system produces structured inspections with evidence tagging and safety metrics, going beyond typical AI summaries. Lastly, we are especially proud that the system operates as a continuous reasoning engine rather than a post-hoc AI wrapper, demonstrating a production-ready architecture that we are genuinely excited to present.
What we learned
We learned A LOT building Butterfly. From building voice agents that understand and hook into external apps, to tuning our speech input for the lowest possible latency, to creating a visual analysis system that maximizes inspector accuracy on every single component, we pushed ourselves beyond what we thought was possible and created a product technicians can ACTUALLY USE in the field.
What's next for Butterfly: Realtime Intelligence for CAT Inspections
Butterfly is built to scale without rebuilding. Right now, we go deep on one machine type — getting the inspection logic accurate, the voice pipeline reliable, and the outputs structured and deployable. That foundation is intentionally schema-driven, meaning adding support for a new machine type is as simple as dropping in a new checklist schema and documentation pack. The same capture and reasoning loop handles the rest automatically.
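Under that schema-driven design, a new machine type is just data. A sketch of what the drop-in might look like; the type names, item IDs, and registry are illustrative, not our actual schema format:

```typescript
interface ChecklistItem { id: string; label: string; category: string; }
interface MachineSchema { model: string; items: ChecklistItem[]; }

// An illustrative slice of a CAT 320 checklist pack (items truncated).
const cat320: MachineSchema = {
  model: "CAT 320",
  items: [
    { id: "hydraulic_hoses", label: "Hydraulic hoses", category: "Hydraulics" },
    { id: "track_tension", label: "Track tension", category: "Undercarriage" },
  ],
};

// The capture and reasoning loop reads whichever schema is active here;
// supporting another machine means registering another schema object,
// with no changes to the pipeline itself.
const schemaRegistry = new Map<string, MachineSchema>([[cat320.model, cat320]]);
```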
On the ecosystem side, our outputs are already structured in a way that maps naturally onto CAT's existing telematics infrastructure. The integration path to VisionLink and Product Link is straightforward — our telemetry layer is already shaped to match it, which means fleet-wide deployment doesn't require rearchitecting anything.
The capture layer is also fully modular. The same pipeline that currently ingests camera and audio from a phone can take input from Meta smart glasses instead — giving technicians a truly hands-free inspection experience with zero changes to the backend or report format. Same reasoning, same outputs, no phone required.
Long term, Butterfly becomes a predictive intelligence layer. By syncing ECM data from the machine directly to the cloud, every inspection feeds into a living dashboard that tracks fleet health over time, surfaces patterns across machines, and flags issues before they become failures. The inspection stops being a one-time event and becomes part of a continuous analytics loop across an entire fleet.
Built With
- elevenlabs
- fastapi
- firecrawl
- gemini
- react
- supabase
- typescript
