About the Project

Inspiration

Working in the food service industry, I've seen firsthand how quality control bottlenecks slow down operations. QC stations rely on human checkers manually verifying every order—a process that's slow, error-prone, and doesn't scale. I wondered: what if AI could handle this automatically?

The challenge was that running continuous video processing on edge devices like Raspberry Pi causes overheating and poor performance. I needed a solution that kept the edge lightweight while leveraging powerful AI in the cloud.

What I Learned

This project taught me the power of hybrid cloud architecture. By splitting responsibilities—lightweight frame capture on the edge, heavy AI inference in the cloud—I achieved the best of both worlds:

$$\text{Total Latency} = T_{\text{capture}} + T_{\text{upload}} + T_{\text{inference}} < 5\text{s}$$
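That budget can be sanity-checked with simple arithmetic (the stage timings below are illustrative figures, not measurements):

```python
# Rough latency budget for one QC check (illustrative figures, not benchmarks).
BUDGET_S = 5.0

stage_latency_s = {
    "capture": 0.01,    # single-frame ffmpeg grab on the Pi
    "upload": 0.5,      # frame POST to the cloud API (network-dependent)
    "inference": 2.0,   # DETR ResNet-50 on Workers AI
}

total = sum(stage_latency_s.values())
assert total < BUDGET_S, f"over budget: {total:.2f}s"
print(f"total latency ≈ {total:.2f}s (budget {BUDGET_S}s)")
```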

I also discovered spec-driven development with Kiro. Writing formal requirements and correctness properties before coding forced me to think through edge cases early. Property-based testing with fast-check then verified these properties held across thousands of random inputs.
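fast-check lives on the TypeScript side of the project, but the idea translates anywhere. Here is the same style of check sketched with Python's stdlib `random`, against a hypothetical bounding-box-clamping helper (`clamp_box` is illustrative, not the project's actual code): generate thousands of random inputs and assert the property holds for all of them.

```python
import random

def clamp_box(box, width, height):
    """Clamp an (xmin, ymin, xmax, ymax) bounding box to the frame bounds."""
    xmin, ymin, xmax, ymax = box
    xmin, xmax = sorted((max(0, min(xmin, width)), max(0, min(xmax, width))))
    ymin, ymax = sorted((max(0, min(ymin, height)), max(0, min(ymax, height))))
    return (xmin, ymin, xmax, ymax)

# Properties: any clamped box lies inside the frame, and clamping is
# idempotent (clamping twice gives the same result as clamping once).
W, H = 1920, 1080
for _ in range(1000):
    box = tuple(random.uniform(-500, 2500) for _ in range(4))
    c = clamp_box(box, W, H)
    assert 0 <= c[0] <= c[2] <= W and 0 <= c[1] <= c[3] <= H
    assert clamp_box(c, W, H) == c
```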

Key technical insights:

  • DETR (DEtection TRansformer) models work surprisingly well for food item detection
  • Cloudflare Workers AI eliminates cold starts that plague traditional serverless AI
  • D1's SQLite-based approach simplifies schema migrations dramatically

How I Built It

The system has three main components:

  1. RPi Vision Client (Python) — Captures single frames from RTSP cameras using ffmpeg, then immediately releases the connection. This "capture-and-release" pattern keeps CPU usage under 15%.

  2. Cloud Vision API (Cloudflare Workers) — Receives frames, runs object detection via Workers AI (DETR ResNet-50), stores thumbnails in R2, and logs results to D1.

  3. HeySalad QC Web App (React + TypeScript) — Displays real-time detection results with bounding box overlays, manages stations, and generates printable QR code mats.

The detection pipeline:

Camera → RPi (ffmpeg) → Workers AI → D1/R2 → React Dashboard
         ~10ms          ~2s           ~50ms    real-time

Challenges I Faced

Challenge 1: Raspberry Pi Overheating

My initial approach used OpenCV with continuous video streaming. The Pi would thermal throttle within minutes. Solution: switched to periodic single-frame capture with ffmpeg subprocess calls, reducing CPU from 80%+ to <15%.
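A minimal sketch of that capture-and-release approach (function names and the exact ffmpeg flags are illustrative; the project's client may differ in detail):

```python
import subprocess

def build_capture_cmd(rtsp_url: str, out_path: str) -> list[str]:
    """Build an ffmpeg command that grabs exactly one frame, then exits."""
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",   # TCP tends to be more reliable than UDP for RTSP
        "-i", rtsp_url,
        "-frames:v", "1",           # capture a single video frame
        "-q:v", "2",                # high JPEG quality
        "-y", out_path,             # overwrite the previous snapshot
    ]

def capture_frame(rtsp_url: str, out_path: str, timeout_s: float = 10.0) -> None:
    # The subprocess exits after one frame, so the RTSP connection is held
    # only briefly -- the "capture-and-release" pattern. No long-lived
    # decoder means no sustained CPU load and no thermal throttling.
    subprocess.run(
        build_capture_cmd(rtsp_url, out_path),
        check=True, timeout=timeout_s, capture_output=True,
    )
```

Because each capture is an independent short-lived process, a crash or stalled stream affects only one frame, not the whole client.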

Challenge 2: Workers AI Model Selection

Not all vision models support object detection with bounding boxes. After testing several options, DETR ResNet-50 (@cf/facebook/detr-resnet-50) provided the right balance of accuracy and speed for food item detection.

Challenge 3: Confidence Threshold Tuning

Set the threshold too low and you get false positives (a "sandwich" detected in an empty frame); set it too high and real items get missed. I settled on a configurable threshold with a default of 0.5, allowing operators to tune per station:

$$P(\text{detection}) = \begin{cases} 1 & \text{if confidence} > \theta \\ 0 & \text{otherwise} \end{cases}$$
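In code, that rule is just a filter over the model's detections. A sketch (the `label`/`score` field names follow the Workers AI object-detection response shape as I understand it; treat them as assumptions):

```python
def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence clears the station's threshold.

    `detections` is a list of dicts like {"label": ..., "score": ...},
    mirroring the object-detection response shape.
    """
    return [d for d in detections if d["score"] > threshold]

raw = [
    {"label": "sandwich", "score": 0.91},
    {"label": "bowl", "score": 0.34},   # likely a false positive
]
print(filter_detections(raw, threshold=0.5))  # only the sandwich survives
```

Exposing `threshold` as a per-station setting means a brightly lit station can run stricter than one with glare or steam.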

Challenge 4: CORS and Authentication

The RPi client needed to authenticate without exposing keys in browser requests. I implemented API key authentication via X-API-Key header for machine-to-machine communication, separate from the browser-based frontend.
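On the Pi side, that pattern amounts to one extra header on each upload. A sketch (the endpoint URL and key handling are placeholders; in practice the key comes from environment/config, never from browser-visible code):

```python
import urllib.request

API_URL = "https://example.com/api/frames"  # placeholder endpoint
API_KEY = "station-secret"                  # loaded from env/config in practice

def build_frame_request(jpeg_bytes: bytes) -> urllib.request.Request:
    """Build the machine-to-machine frame upload with the X-API-Key header."""
    return urllib.request.Request(
        API_URL,
        data=jpeg_bytes,
        headers={
            "X-API-Key": API_KEY,           # M2M auth, never shipped to browsers
            "Content-Type": "image/jpeg",
        },
        method="POST",
    )
```

Keeping machine credentials in a header on server-to-server calls, while the browser frontend authenticates separately, means a leaked frontend bundle never exposes the station key.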
