Cat Ready

Voice-first, multimodal pre-op inspection assistant for heavy equipment. Built for HackIllinois 2026 in collaboration with Caterpillar.

Inspiration

Pre-start inspections are mandatory for safety and compliance, but the tools operators use get in the way. Paper checklists get skipped or filled out in the cab. Digital forms force excessive tapping and scrolling. Either way, there’s little proof that someone actually walked the machine and looked. We wanted a path where the operator can move around the equipment, speak naturally, and only pull out the phone when they need to document something with a photo—while the system turns that into a structured, auditable record. The inspiration was simple: make inspections executable the way people already want to do them—by talking and, when it matters, showing.

What it does

Cat Ready turns a daily safety and maintenance checklist into a guided, voice-first flow with optional photos.

Checklist-driven flow. The app walks the operator through a single equipment profile (e.g. CAT 982 Medium Wheel Loader) step by step—tires, bucket, drivetrain, fluids, cab, engine, and so on—so nothing is skipped.
Voice-first capture. For each step, the operator records a short utterance in the browser (e.g. “Tires look good, no cuts or low pressure”). That audio is sent to the backend, transcribed with speech-to-text, and stored with the step.
Optional images. When something is wrong or uncertain, the operator can attach one or more photos. Those are sent with the same step and analyzed separately.
AI evaluation. The backend combines the transcript and any image descriptions, then uses an LLM to decide PASS, FAIL, or UNSURE and to produce a short, human-readable reason. That gives both a clear outcome and an audit trail.
Results and QR. The inspection is persisted. The operator gets a results view with overall status and step-by-step outcomes, plus a QR code that can be scanned later to pull up the same record—useful for handoff and compliance.
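
The results view reports an overall status alongside the step-by-step outcomes. This write-up does not say how the overall status is derived, so the following is a minimal sketch under one plausible assumption: any FAIL fails the inspection, otherwise any UNSURE makes it UNSURE, otherwise it passes. The names StepResult and overallStatus are hypothetical.

```typescript
// Hypothetical types; the project's actual schema is not shown in this write-up.
type Status = "PASS" | "FAIL" | "UNSURE";

interface StepResult {
  step: string;   // checklist step, e.g. "tires"
  status: Status;
  reason: string; // short, human-readable explanation
}

// Assumed roll-up rule: FAIL dominates, then UNSURE, then PASS.
// An inspection with no recorded steps is treated as UNSURE rather than PASS.
function overallStatus(steps: StepResult[]): Status {
  if (steps.length === 0) return "UNSURE";
  if (steps.some((s) => s.status === "FAIL")) return "FAIL";
  if (steps.some((s) => s.status === "UNSURE")) return "UNSURE";
  return "PASS";
}
```

Treating an empty inspection as UNSURE is a design choice in this sketch: it avoids reporting a pass when nothing was actually inspected.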

How we built it

Backend: Node.js and Express for a REST API that accepts multipart requests (audio + images). We use SQLite (better-sqlite3) for inspections and steps, and the OpenAI API for three jobs: Whisper (speech-to-text), GPT-4o vision (image description), and GPT-4o-mini (PASS/FAIL/UNSURE evaluation). Multer handles file uploads; we keep orchestration in small service modules (e.g. stt.js, vision.js, llm.js, processStep.js). The backend is deployed as a single Express app on Vercel with Root Directory set to backend.
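
The evaluation step has to end in exactly one of PASS, FAIL, or UNSURE plus a reason. As a sketch of how a service such as llm.js might normalize model output, the helper below assumes the model was asked to reply with a JSON object holding status and reason fields. That reply shape and the name parseVerdict are assumptions, not taken from the project's code. Anything unparseable falls back to UNSURE so that a malformed reply never becomes a silent PASS.

```typescript
type VerdictStatus = "PASS" | "FAIL" | "UNSURE";

interface Verdict {
  status: VerdictStatus;
  reason: string;
}

const VERDICT_STATUSES: readonly VerdictStatus[] = ["PASS", "FAIL", "UNSURE"];

// Assumed reply shape: {"status": "PASS" | "FAIL" | "UNSURE", "reason": "..."}.
function parseVerdict(raw: string): Verdict {
  const fallback: Verdict = { status: "UNSURE", reason: "Could not parse model output." };
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fallback; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return fallback;
  const { status, reason } = data as { status?: unknown; reason?: unknown };
  if (typeof status !== "string") return fallback;
  // Accept any casing and surrounding whitespace, e.g. " pass ".
  const normalized = status.trim().toUpperCase();
  const match = VERDICT_STATUSES.find((s) => s === normalized);
  if (match === undefined) return fallback;
  const cleanReason = typeof reason === "string" ? reason.trim() : "";
  return { status: match, reason: cleanReason !== "" ? cleanReason : "No reason given." };
}
```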

Frontend: Next.js 15 with the App Router, React 19, and TypeScript. We use Tailwind for layout and Framer Motion for transitions. The inspect flow is a linear sequence: machine selection → step-by-step capture (record, optional photos, submit) → results with QR. We also generate a QR code for the completed inspection and provide a scanner so users can open a past inspection by scanning. The frontend talks to the backend only over HTTP using NEXT_PUBLIC_API_URL. It’s deployed as a separate Vercel project with Root Directory set to frontend.

Deployment: Two Vercel projects from the same repo, one for the Next.js app and one for the Express API, each with its own Root Directory. The frontend reads the backend URL from the NEXT_PUBLIC_API_URL env var, so a single config value links the two deployments.
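
Because the frontend reaches the backend only through NEXT_PUBLIC_API_URL, one small helper can own URL construction. The sketch below is an assumption about how that might look: apiUrl is a hypothetical name, and the base is passed in explicitly so the function does not depend on the build environment.

```typescript
// Joins a base URL (e.g. the value of NEXT_PUBLIC_API_URL) and a path,
// tolerating a trailing slash on the base and a missing leading slash on the path.
function apiUrl(base: string, path: string): string {
  const cleanBase = base.trim().replace(/\/+$/, "");
  if (cleanBase === "") {
    throw new Error("API base URL is not set (expected NEXT_PUBLIC_API_URL)");
  }
  const cleanPath = path.replace(/^\/+/, "");
  return `${cleanBase}/${cleanPath}`;
}
```

In the Next.js app, a call might look like apiUrl(process.env.NEXT_PUBLIC_API_URL ?? "", "/inspections"), where the /inspections path is only an illustrative route. Failing loudly on an empty base makes a missing env var show up at the first request instead of as a confusing relative-URL fetch.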

Challenges we ran into

Backend on Vercel. We first tried a catch-all API route (api/[[...path]].js) that required and re-exported the Express app. Vercel failed with “No entrypoint found in output directory: N/A.” It was treating the project as an API-routes-only build and looking for an entrypoint in the build output, but we have no build step. We fixed it by dropping the api/ handler and using zero-config Express instead: app.js at the backend root with module.exports = app, empty buildCommand in backend/vercel.json, and the Express framework preset. We also hit “Function Runtimes must have a valid version” when we set runtime: "nodejs20.x" in vercel.json. For Node, the right approach is to leave runtime out of vercel.json and use the Node version from Project Settings or from package.json engines. We removed the runtime from the config and kept engines.node in backend/package.json.

Two deployments, one repo. With both a backend and a frontend in the repo, it wasn’t obvious which one Root Directory should point to. The answer was to use two Vercel projects, one for the Next.js app and one for the Express API, each with its own Root Directory. The frontend doesn’t discover the backend on its own; it needs the API URL. The frontend already read NEXT_PUBLIC_API_URL, so we set that variable in the frontend project’s environment to the backend’s deployment URL. No code change, just configuration.

Repo organization. We moved root-level media/ and images/ under archive/ so the tree stays clean. We updated the backend media path and README accordingly.

Serverless limits. On Vercel, the filesystem is read-only except /tmp, and instances are ephemeral. SQLite and writing uploads to disk work for local dev and demos but won’t persist on Vercel. We kept the current setup so the app runs end-to-end locally and can be demoed; production would need a hosted DB and object storage, which we’ve left as a clear next step.

Accomplishments that we're proud of

End-to-end voice + vision pipeline. We got STT (Whisper), image description (GPT-4o vision), and LLM evaluation (GPT-4o-mini) working together in one flow so operators can speak and optionally attach photos and get back a structured PASS/FAIL/UNSURE with reasons.
Checklist-driven UX. The app doesn’t replace the checklist; it makes it executable. Operators follow the same sections and steps they’re used to, with the system doing the logging and interpretation.
Deployed both sides. We have the Next.js frontend and the Express backend each deployed on Vercel from the same repo, with a simple env var linking them. Local dev and production share the same codebase.
QR in both directions. We generate a QR code for completed inspections and provide a scanner to open past inspections, which supports handoff and audit without extra apps.
Clean separation of concerns. Backend services (STT, vision, LLM) are in small modules; the frontend only talks to the API. That keeps the stack understandable and makes it straightforward to swap in a different DB or storage later.
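
The QR round trip can be as simple as encoding a link to the results page and reading the inspection ID back out after a scan. The sketch below assumes the QR holds a URL of the form <origin>/results/<id>. That route and the names buildQrPayload and parseQrPayload are hypothetical. Rendering the code itself would be left to a component such as qrcode.react, which is listed under Built With.

```typescript
// Assumed payload format: "<origin>/results/<inspectionId>".
function buildQrPayload(origin: string, inspectionId: string): string {
  const cleanOrigin = origin.replace(/\/+$/, "");
  return `${cleanOrigin}/results/${encodeURIComponent(inspectionId)}`;
}

// Returns the inspection ID from scanned text, or null if the text
// does not look like a results link.
function parseQrPayload(scanned: string): string | null {
  const match = /\/results\/([^/?#]+)\/?(?:[?#].*)?$/.exec(scanned.trim());
  const id = match === null ? undefined : match[1];
  if (id === undefined) return null;
  try {
    return decodeURIComponent(id);
  } catch {
    return null; // malformed percent-encoding
  }
}
```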

What we learned

Vercel and Express. Zero-config Express (app at root, no api/ wrapper) works better for a single Express app than a catch-all API route when you don’t have a build step. Let Vercel detect the app and avoid setting runtime in vercel.json for Node; use engines in package.json or Project Settings instead.
Monorepo with two deploy targets. One repo can feed two Vercel projects by setting Root Directory per project. The frontend needs an explicit API URL (e.g. NEXT_PUBLIC_API_URL); once that’s set, the two deployments communicate over HTTP without any proxy.
Voice + images in one step. Combining transcript and image descriptions in a single LLM call gives a coherent PASS/FAIL/UNSURE and reason. Keeping the pipeline as transcribe → describe images → evaluate made the backend easy to reason about and debug.
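
As an illustration of combining the transcript and image descriptions into one evaluation request, the following sketch assembles a single prompt string. The prompt wording and the names StepInput and buildEvaluationPrompt are assumptions for illustration; the project's actual prompt is not shown in this write-up.

```typescript
interface StepInput {
  stepName: string;            // checklist step, e.g. "Tires"
  transcript: string;          // speech-to-text output for the operator's utterance
  imageDescriptions: string[]; // one description per attached photo; may be empty
}

// Builds one prompt that carries both the spoken report and the photo evidence,
// so the model judges them together rather than in separate calls.
function buildEvaluationPrompt(input: StepInput): string {
  const images =
    input.imageDescriptions.length === 0
      ? "(no photos attached)"
      : input.imageDescriptions.map((d, i) => `Photo ${i + 1}: ${d}`).join("\n");
  return [
    `Inspection step: ${input.stepName}`,
    `Operator said: "${input.transcript.trim()}"`,
    "Photo evidence:",
    images,
    "Decide PASS, FAIL, or UNSURE and give a one-sentence reason.",
  ].join("\n");
}
```

Keeping prompt assembly in a pure function like this means it can be checked without calling any model, which fits the transcribe, describe, evaluate ordering described above.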

What's next for Cat Ready

Production-grade persistence. Move off SQLite and local file writes for the deployed backend: use a hosted database (e.g. Turso, PlanetScale, or Vercel Postgres) and object storage (e.g. Vercel Blob or S3) for uploads so inspections and media persist and scale on Vercel.
More equipment and checklists. Expand beyond the single CAT 982 flow to support multiple models and configurable checklists so different sites or fleets can use their own procedures.
Stronger verification and compliance. Add timestamps, location (where relevant), and clearer audit trails so inspections are easier to defend in compliance reviews, with optional integration into existing fleet or maintenance systems.
Refinements to the AI pipeline. Tune prompts and models for domain-specific language and edge cases (e.g. heavy accents, noisy environments, ambiguous photos) and optionally support custom PASS/FAIL criteria per step or per customer.

Built With

better-sqlite3
clsx
cors
dotenv
express.js
framer-motion
javascript
lucide-react
multer
next.js-15
node.js
openai-api
openai-gpt-4o
openai-gpt-4o-mini
openai-whisper
pnpm
postcss
qrcode.react
react-19
sqlite
tailwind-css
tailwind-merge
tailwindcss-animate
turbopack
typescript
vercel

Updates

AviSrikumeran sri-kumeran started this project — Mar 01, 2026 06:48 AM EST
