LensGuard — "See what changed—only the parts that matter."
A general-purpose, edge-ready visual difference engine that aligns images across time, detects and classifies meaningful changes, and learns from operator feedback.
Inspiration
Manufacturing lines, labs, clinics, and field inspections depend on catching small visual deviations before they become big problems—misaligned parts, missing labels, hairline cracks, contamination, or equipment wear. Existing tools often either over-alert on harmless variation (lighting, camera angle, shadows) or miss subtle, evolving defects. LensGuard was created to deliver a robust, explainable, and open-source visual difference engine that runs on low-cost edge devices and scales to the cloud. The goal: show only the changes that matter, reduce false alarms, and continuously learn from operator feedback.
What it does
LensGuard detects and classifies meaningful changes between time-separated images of the same scene.
Key capabilities
- Alignment & normalization: Automatic geometric registration and photometric normalization so comparisons are like-for-like.
- Hybrid change detection: Fast classical CV proposes candidates; deep models validate and refine changes.
- Unsupervised anomaly segmentation: Finds novel changes without labeled defect data.
- Optional supervised refinement: Learns known change types (e.g., scratch, dent, missing component) for higher precision.
- Temporal tracking: Tracks changes over time to suppress flicker and highlight trends (e.g., a growing crack).
- Explainable overlays: Heatmaps, masks, and boxes overlaid on originals.
- Human-in-the-loop: Approve/deny and relabel, powering an active learning loop.
- Edge-ready & cloud-scalable: Runs on Raspberry Pi/Jetson or servers; supports batch and real-time streams.
Typical scenarios
- Manufacturing: Missing screws/parts, surface defects, misprints, label/logo compliance.
- Brand compliance & retail: Shelf planogram verification, signage integrity.
- Infrastructure: Corrosion, spalling, leaks, and cracks in periodic inspections.
- Compliance audits: Safety markers, seals, configuration drift.
How we built it
Methodologies & techniques
1) Data ingest & storage
- Sources: USB/RTSP cameras, drones/robots, or bulk folders.
- Metadata: timestamp, camera ID, site, task ID.
- Storage: MinIO (S3-compatible) for images/artifacts; SQLite on edge and PostgreSQL/TimescaleDB on server for metadata and time series.
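A minimal sketch of this ingest path, assuming a local MinIO endpoint and an illustrative `frames` table; the bucket name, credentials, and metadata fields are placeholders, not LensGuard's actual schema:

```python
# Ingest sketch: store a frame in MinIO (S3 API) and its metadata in SQLite.
# Endpoint, credentials, and the frames schema below are illustrative only.
import sqlite3
import time
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # assumed local MinIO endpoint
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

db = sqlite3.connect("lensguard.db")
db.execute("""CREATE TABLE IF NOT EXISTS frames (
    ts REAL, camera_id TEXT, site TEXT, task_id TEXT, object_key TEXT)""")

def ingest(jpeg_bytes: bytes, camera_id: str, site: str, task_id: str) -> None:
    ts = time.time()
    key = f"{site}/{camera_id}/{ts:.3f}.jpg"
    s3.put_object(Bucket="frames", Key=key, Body=jpeg_bytes)  # image artifact
    db.execute("INSERT INTO frames VALUES (?, ?, ?, ?, ?)",
               (ts, camera_id, site, task_id, key))           # metadata row
    db.commit()
```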
2) Preprocessing (alignment & normalization)
- Geometric registration: ORB/AKAZE features + RANSAC homography; ArUco/AprilTag markers for hard scenes (see the sketch after this list). Homography warp: $$ s\begin{bmatrix}x'\\ y'\\ 1\end{bmatrix}= \begin{bmatrix} h_{11}&h_{12}&h_{13}\\ h_{21}&h_{22}&h_{23}\\ h_{31}&h_{32}&h_{33} \end{bmatrix} \begin{bmatrix}x\\ y\\ 1\end{bmatrix} $$
- Photometric normalization: CLAHE, gray-world white balance, optional gamma/exposure matching.
- Denoising: Bilateral or fast non-local means; morphology to clean small blobs.
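A condensed OpenCV sketch of the registration and normalization steps; feature count, match cap, and CLAHE parameters are illustrative defaults rather than tuned values:

```python
# Registration + photometric normalization sketch (OpenCV).
import cv2
import numpy as np

def register(ref_gray: np.ndarray, cur_gray: np.ndarray) -> np.ndarray:
    """Warp cur_gray onto ref_gray via ORB features + RANSAC homography."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(ref_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)[:500]
    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    h, w = ref_gray.shape
    return cv2.warpPerspective(cur_gray, H, (w, h))

def normalize(gray: np.ndarray) -> np.ndarray:
    """CLAHE to equalize local contrast before differencing."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```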
3) Change proposal (fast classical CV)
- Structural similarity (SSIM) and absolute/gradient differences to create coarse masks and ROIs. SSIM: $$ \mathrm{SSIM}(x,y)= \frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)} {(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)} $$
- Morphological filtering to eliminate noise and enforce coherent regions.
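A sketch of this SSIM-plus-morphology proposal stage; the SSIM threshold, kernel size, and minimum blob area are placeholders to be calibrated per site:

```python
# Coarse change-mask sketch: per-pixel SSIM map (scikit-image) + morphology
# (OpenCV) to produce a clean mask and candidate ROIs for the deep stage.
import cv2
import numpy as np
from skimage.metrics import structural_similarity

def propose_changes(ref: np.ndarray, cur: np.ndarray, thresh: float = 0.6):
    # full=True returns the per-pixel SSIM map alongside the scalar score
    score, ssim_map = structural_similarity(ref, cur, full=True)
    mask = (ssim_map < thresh).astype(np.uint8) * 255   # low SSIM = changed
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # merge regions
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    rois = [tuple(stats[i, :4]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] > 50]             # (x, y, w, h)
    return mask, rois
```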
4) Semantic refinement (deep learning)
- Unsupervised anomaly detection: Anomalib models (PatchCore/PaDiM/STFPM) trained on “normal” images produce anomaly heatmaps and scores.
- Optional supervised segmentation: Lightweight UNet/DeepLab with MobileNet/EfficientNet backbones trained on labeled change masks to improve precision and classify change types.
Training loss (example, BCE + Dice + CE): $$ \mathcal{L} = \lambda_1\,\mathrm{BCE}(M,\hat{M}) + \lambda_2\Big(1-\frac{2\sum M\hat{M}}{\sum M+\sum \hat{M}}\Big) + \lambda_3\,\mathrm{CE}(y,\hat{y}) $$
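One possible PyTorch reading of the loss above, assuming mask logits and targets of shape [B, 1, H, W] and a separate change-type classification head; the lambda weights are illustrative:

```python
# Combined segmentation/classification loss sketch: BCE + Dice + CE.
import torch
import torch.nn.functional as F

def combined_loss(mask_logits, mask_gt, cls_logits, cls_gt,
                  l1=1.0, l2=1.0, l3=0.5, eps=1e-6):
    bce = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    p = torch.sigmoid(mask_logits)
    dice = 1 - (2 * (p * mask_gt).sum() + eps) / (p.sum() + mask_gt.sum() + eps)
    ce = F.cross_entropy(cls_logits, cls_gt)  # change-type classification head
    return l1 * bce + l2 * dice + l3 * ce
```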
5) Temporal logic
- Blob tracking via IoU/centroid association; require persistence in $N$ of $M$ frames to suppress one-off noise.
- Trend metric (growth rate) to prioritize evolving issues: $$ r=\frac{1}{\Delta t}\,\frac{A_t - A_{t-\Delta t}}{A_{t-\Delta t}} $$
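A sketch of the N-of-M persistence check and the growth-rate metric above; the `Track` structure, IoU threshold, and window sizes are illustrative:

```python
# Persistence sketch: a region only alerts after it is matched (IoU above a
# threshold) in at least n of the last m frames, suppressing one-off noise.
from collections import deque
from dataclasses import dataclass, field

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter + 1e-9)

@dataclass
class Track:
    box: tuple
    hits: deque = field(default_factory=lambda: deque(maxlen=10))  # m = 10

    def update(self, frame_boxes, n=6):
        matched = any(iou(self.box, b) > 0.3 for b in frame_boxes)
        self.hits.append(matched)
        return sum(self.hits) >= n     # persistent enough -> raise alert

def growth_rate(area_now, area_prev, dt):
    """Relative area growth per unit time: r from the formula above."""
    return (area_now - area_prev) / (area_prev * dt + 1e-9)
```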
6) Serving & UX
- FastAPI backend (REST/WebSockets).
- Web viewer with before/after slider, overlays, alert list, and timeline scrubber.
- Feedback controls to approve/deny, relabel, and send patches to Label Studio.
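A minimal FastAPI sketch of the result and feedback endpoints; route names and payload fields are illustrative, not LensGuard's actual API:

```python
# Serving sketch: fetch a detected change, record operator approve/deny.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
CHANGES: dict[str, dict] = {}   # in-memory stand-in for the real store

class Feedback(BaseModel):
    change_id: str
    verdict: str                 # "approve" | "deny"
    label: str | None = None     # optional relabel, forwarded to Label Studio

@app.get("/changes/{change_id}")
def get_change(change_id: str):
    return CHANGES.get(change_id, {"error": "not found"})

@app.post("/feedback")
def post_feedback(fb: Feedback):
    CHANGES.setdefault(fb.change_id, {})["feedback"] = fb.model_dump()
    return {"status": "recorded"}
```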
7) Active learning & MLOps
- Label Studio for annotation; DVC for dataset versioning; MLflow for experiments and model registry.
- Scheduled retraining pipelines; threshold calibration via precision-recall curves (see the sketch after this list).
- Monitoring with Prometheus/Grafana (FPS, latency, false-alarm rate, review throughput).
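A sketch of the precision-recall calibration step, assuming anomaly scores and approve/deny labels collected from operator review; picking the F1-maximizing cutoff is one reasonable choice:

```python
# Threshold-calibration sketch: choose the anomaly-score cutoff that
# maximizes F1 on a reviewed validation set.
import numpy as np
from sklearn.metrics import precision_recall_curve

def calibrate_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    # precision/recall have one more entry than thresholds; drop the last
    return float(thresholds[np.argmax(f1[:-1])])
```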
Technology stack (all open-source or free)
- CV & imaging: OpenCV, scikit-image, Kornia, NumPy
- Deep learning: PyTorch, torchvision, segmentation-models-pytorch, Anomalib, Albumentations
- Serving & UI: FastAPI, Uvicorn, Jinja2/Bootstrap or React/Vite, Socket.IO/WebSockets
- Annotation: Label Studio or CVAT
- Model runtime: ONNX Runtime (CPU/ARM), TensorRT (Jetson), NVIDIA Triton (optional)
- Storage & data: MinIO (S3), SQLite (edge), PostgreSQL/TimescaleDB (server)
- Pipelines & ops: Docker, docker-compose, Prefect/GitHub Actions
- MLOps: MLflow, DVC
- Observability: Prometheus, Grafana, Loki (optional)
- Messaging (optional): Redis Streams or MQTT (Mosquitto)
Datasets (free sources to bootstrap)
- Industrial anomalies: MVTec AD; KolektorSDD/KSDD2; DAGM 2007; Severstal Steel Defect (Kaggle)
- Change detection pairs: LEVIR-CD/LEVIR-CD+; CDD; VL-CMU-CD; PCD
- Infrastructure cracks: SDNET2018; CrackForest; DeepCrack
- Own data: 100–1,000 normal images per scene for unsupervised; 100–1,000 labeled pairs for supervised refinement.
Edge feasibility
- Raspberry Pi 4/5: Run CV proposals every frame and gate deep inference to ROIs (see the sketch after this list); ONNX Runtime on CPU; input 256–384 px for ~3–5 FPS.
- Jetson Nano/Xavier/Orin: Export to ONNX, optimize with TensorRT INT8; 10–15+ FPS with gating and lightweight backbones.
- x86 without GPU: ONNX Runtime/OpenVINO EP; suitable for periodic inspections or batch jobs.
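A sketch of the ROI-gated ONNX Runtime pattern described above; the model path, input layout, and 256 px patch size are assumptions:

```python
# Edge inference sketch: run the ONNX anomaly model only on regions the
# classical stage proposed, keeping per-frame deep compute proportional
# to the number of candidate changes.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("anomaly.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def score_rois(frame: np.ndarray, rois: list[tuple]) -> list[float]:
    scores = []
    for (x, y, w, h) in rois:
        patch = cv2.resize(frame[y:y + h, x:x + w], (256, 256))
        # HWC uint8 -> NCHW float32 in [0, 1]
        blob = patch.astype(np.float32)[None].transpose(0, 3, 1, 2) / 255.0
        heatmap = session.run(None, {input_name: blob})[0]
        scores.append(float(heatmap.max()))
    return scores
```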
Challenges we ran into
- Lighting and shadows: Uncontrolled illumination caused spurious differences; mitigated with photometric normalization, gradient-based diffs, and temporal persistence.
- Viewpoint drift: Small camera shifts created false changes; feature-based registration plus optional fiducials reduced misalignment.
- Data scarcity: Labeled defect data was limited; starting with unsupervised anomaly detection delivered utility on day one, with supervised refinement added later through operator feedback.
- Edge constraints: Limited compute and memory on edge devices were addressed with a gated pipeline, model quantization, and ROI-focused inference.
- Operational calibration: Thresholds and filters had to be tuned per site; calibration workflows and dashboards made this tuning manageable.
Accomplishments that we’re proud of
- Low false alarms without losing recall: Hybrid ensemble (CV proposals + unsupervised anomaly + optional supervised refiner) outperformed any single approach.
- Edge readiness: Quantized ONNX/TensorRT models and ROI-gated inference achieved real-time performance on Pi/Jetson.
- Explainability: Clear overlays and per-region scores increased trust and sped up validation.
- Continuous improvement: Feedback loop with Label Studio, DVC, and MLflow adapted the system to each site.
- All open-source: No licenses, no lock-in, highly customizable.
What we learned
- Alignment and normalization matter more than model size. Good geometry and photometrics dramatically cut false positives pre-ML.
- Hybrid beats purist. Classical CV is fast and cheap; deep learning is robust—together they deliver practical accuracy at low cost.
- Temporal memory is powerful. Persistence checks and trend analysis reduce noise and turn single alerts into actionable insights.
- Small models, smart routing. Lightweight backbones and ROI-first pipelines make edge inference practical.
What’s next for LensGuard
- Few-shot change typing: CLIP-style embeddings to classify change types with minimal labels per site.
- Multi-modal fusion: Optional thermal or depth streams to separate real material changes from illumination effects.
- Fleet learning: Federated or privacy-preserving updates to share robustness across sites without sharing raw images.
- Proactive analytics: Growth rates, predicted time-to-threshold, and risk scoring for maintenance planning.
- Auto-setup assistant: Camera placement guidance, fiducial usage, lighting checks, and automated calibration scoring.
- Integrations: Webhooks and connectors for MES/SCADA/CMMS to open tickets on high-priority changes automatically.
Why it’s better than many existing solutions
- Robust out-of-the-box: Alignment, normalization, classical proposals, deep models, and temporal tracking handle lighting, angle, and noise without heavy manual tuning.
- Known and unknown changes: Unsupervised anomaly detection finds novel issues; supervised heads classify known defect types.
- Edge-first design: Real-time pipelines on affordable hardware; no mandatory cloud or expensive licenses.
- Explainable and operator-centric: Human-readable overlays, scores, and change types increase trust and speed up validation.
- Self-improving: Built-in active learning increases precision the longer it runs.
- Open and extensible: 100% open-source stack avoids lock-in and enables rapid customization for new domains.
Scalability and feasibility
- Scale up: Containerized services, optional Triton for multi-model serving, and k3s/Kubernetes for many cameras/sites. TimescaleDB supports time-series metadata at fleet scale.
- Feasible for small teams: Clear MVP path, commodity hardware, and open-source tools let a small team pilot in weeks and expand iteratively.
- Cost control: Edge inference reduces bandwidth/cloud spend; open-source stack avoids per-seat or per-camera licensing fees.