Inspiration

As hackers, we were inspired by the passion of the inspectors we spoke with, especially how much they wanted to make inspections as efficient as possible at a time when inspection numbers are down. This app was created to make it as easy as possible for an inspector to walk around a CAT excavator, find out what is wrong, and document it.

What it does

CATalytic Vision uses our own AI Localizer to determine whether a given part is within the camera frame, and an AR mesh highlights the portion of the excavator being inspected. The pictures are then sent to our VLM, which checks that specific part for issues and grades it as Pass, Monitor, or Fail. While this is happening, the inspector can speak freely about anything relevant they notice about that part. When finished, the inspector simply says the keyword "next" to move on to the next part, without touching the screen at all. At the end, we use the OpenAI API to summarize the transcriptions into notes for a full inspection report that the user can easily access.
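The hands-free flow above can be sketched as a tiny state machine: each recognized utterance is either the "next" keyword, which advances to the next part, or a free-form note attached to the part currently being inspected. This is a minimal illustration with hypothetical names, not our actual app code:

```kotlin
// Possible grades the VLM can assign to an inspected part.
enum class Grade { PASS, MONITOR, FAIL }

// Tracks which part is being inspected and collects spoken notes per part.
class InspectionSession(private val parts: List<String>) {
    private var index = 0
    val notes = mutableMapOf<String, MutableList<String>>()

    val currentPart: String? get() = parts.getOrNull(index)

    /** Handles one recognized utterance; returns true while parts remain. */
    fun onUtterance(text: String): Boolean {
        val part = currentPart ?: return false
        if (text.trim().equals("next", ignoreCase = true)) {
            index++  // advance hands-free, no screen touch needed
        } else {
            notes.getOrPut(part) { mutableListOf() }.add(text)
        }
        return currentPart != null
    }
}
```

In the real app, the utterances would come from Android's speech recognizer and the collected notes would be the transcription sent to the OpenAI API for summarization.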

How we built it

We built the UI in Figma and then used Android Studio to turn it into a functional mobile application. Using our knowledge of Kotlin and the Claude Code coding agent, we took the output of our AI Localizer (trained on over 10,000 images of the different components) to render a mesh overlay and labels for the specific part in view. Our cloud-hosted VLM, trained first on the base images and then on additional images generated by Gemini, calculates the grade of each component. We used ARCore and Android's speech recognition for the voice components, and the OpenAI API to turn the transcriptions into a well-made report.
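One way to picture how localizer output could drive the overlay: keep only detections above a confidence threshold and highlight the highest-confidence one. This is a hedged sketch with hypothetical names (`Detection`, `partToHighlight`), not the exact pipeline:

```kotlin
// One detection from the localizer: a part label, a confidence score,
// and a bounding box in normalized coordinates [left, top, right, bottom].
data class Detection(val label: String, val confidence: Float, val box: FloatArray)

// Filter out weak detections and pick the best remaining one to overlay.
fun partToHighlight(detections: List<Detection>, minConfidence: Float = 0.5f): Detection? =
    detections.filter { it.confidence >= minConfidence }
              .maxByOrNull { it.confidence }
```

The winning detection's box would anchor the AR mesh and its label would be drawn on screen; the same crop could then be sent to the VLM for grading.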

Challenges we ran into

One of the big challenges was understanding exactly what data the AI Localizer needed from the camera. At first, an inverted bitmap was sending a fully black image to the model, so the picture wasn't being analyzed properly. Logging the detection confidence values with Logcat and then lowering the minimum detection threshold allowed us to find and fix the issue.
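A simple sanity check would have caught the all-black frames before they reached the model: average the pixel brightness and flag frames with essentially no signal. This is a hypothetical helper for illustration, not code from the app:

```kotlin
// Flags a frame as blank when its mean RGB brightness is near zero,
// which is what a fully black (e.g. badly inverted) bitmap looks like.
fun looksBlank(pixels: IntArray, threshold: Int = 8): Boolean {
    if (pixels.isEmpty()) return true
    val mean = pixels.map { p ->
        val r = (p shr 16) and 0xFF  // ARGB packed int: extract each channel
        val g = (p shr 8) and 0xFF
        val b = p and 0xFF
        (r + g + b) / 3
    }.average()
    return mean < threshold
}
```

Wired in before inference (e.g. on pixels from `Bitmap.getPixels`), a check like this turns a silent "no detections" failure into an obvious, loggable one.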

Accomplishments that we're proud of

We are extremely proud that we managed to build such a polished UI in such a short amount of time, and that our trained models significantly outperform GPT on this task: our AI Localizer reaches over 85% accuracy in detecting part locations, and our VLM grades components with strong accuracy.

What we learned

We learned a lot about integrating backend AI models with our frontend development, and it was the first time any of us had worked with a mobile AR library.

What's next for CATalytic Vision

Next, we would love to support more excavator parts for analysis, add iOS compatibility, and build a UI feature that exports generated reports directly into the PDF checklist, ready for printing.

Built With

  • kotlin