LensGuard — "See what changed—only the parts that matter."
A general-purpose, edge-ready visual difference engine that aligns images across time, detects and classifies meaningful changes, and learns from operator feedback.
Inspiration
Manufacturing lines, labs, clinics, and field inspections depend on catching small visual deviations before they become big problems—misaligned parts, missing labels, hairline cracks, contamination, or equipment wear. Existing tools often either over-alert on harmless variation (lighting, camera angle, shadows) or miss subtle, evolving defects. LensGuard was created to deliver a robust, explainable, and open-source visual difference engine that runs on low-cost edge devices and scales to the cloud. The goal: show only the changes that matter, reduce false alarms, and continuously learn from operator feedback.
What it does
LensGuard detects and classifies meaningful changes between time-separated images of the same scene.
Key capabilities
- Alignment & normalization: Automatic geometric registration and photometric normalization so comparisons are like-for-like.
- Hybrid change detection: Fast classical CV proposes candidates; deep models validate and refine changes.
- Unsupervised anomaly segmentation: Finds novel changes without labeled defect data.
- Optional supervised refinement: Learns known change types (e.g., scratch, dent, missing component) for higher precision.
- Temporal tracking: Tracks changes over time to suppress flicker and highlight trends (e.g., a growing crack).
- Explainable overlays: Heatmaps, masks, and boxes overlaid on originals.
- Human-in-the-loop: Approve/deny and relabel, powering an active learning loop.
- Edge-ready & cloud-scalable: Runs on Raspberry Pi/Jetson or servers; supports batch and real-time streams.
Typical scenarios
- Manufacturing: Missing screws/parts, surface defects, misprints, label/logo compliance.
- Brand compliance & retail: Shelf planogram verification, signage integrity.
- Infrastructure: Corrosion, spalling, leaks, and cracks in periodic inspections.
- Compliance audits: Safety markers, seals, configuration drift.
How we built it
Methodologies & techniques
1) Data ingest & storage
- Sources: USB/RTSP cameras, drones/robots, or bulk folders.
- Metadata: timestamp, camera ID, site, task ID.
- Storage: MinIO (S3-compatible) for images/artifacts; SQLite on edge and PostgreSQL/TimescaleDB on server for metadata and time series.
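A minimal sketch of this ingest path, assuming a local MinIO endpoint and an illustrative `frames` table; the bucket name, credentials, and metadata fields are placeholders, not LensGuard's actual schema:

```python
# Ingest sketch: store a frame in MinIO (S3 API) and its metadata in SQLite.
# Endpoint, credentials, and the frames schema below are illustrative only.
import sqlite3
import time
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # assumed local MinIO endpoint
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

db = sqlite3.connect("lensguard.db")
db.execute("""CREATE TABLE IF NOT EXISTS frames (
    ts REAL, camera_id TEXT, site TEXT, task_id TEXT, object_key TEXT)""")

def ingest(jpeg_bytes: bytes, camera_id: str, site: str, task_id: str) -> None:
    ts = time.time()
    key = f"{site}/{camera_id}/{ts:.3f}.jpg"
    s3.put_object(Bucket="frames", Key=key, Body=jpeg_bytes)  # image artifact
    db.execute("INSERT INTO frames VALUES (?, ?, ?, ?, ?)",
               (ts, camera_id, site, task_id, key))           # metadata row
    db.commit()
```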
2) Preprocessing (alignment & normalization)
- Geometric registration: ORB/AKAZE features + RANSAC homography; ArUco/AprilTag markers for hard scenes (see the sketch after this list). Homography warp: $$ s\begin{bmatrix}x'\\ y'\\ 1\end{bmatrix}= \begin{bmatrix} h_{11}&h_{12}&h_{13}\\ h_{21}&h_{22}&h_{23}\\ h_{31}&h_{32}&h_{33} \end{bmatrix} \begin{bmatrix}x\\ y\\ 1\end{bmatrix} $$
- Photometric normalization: CLAHE, gray-world white balance, optional gamma/exposure matching.
- Denoising: Bilateral or fast non-local means; morphology to clean small blobs.
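A condensed OpenCV sketch of the registration and normalization steps; feature count, match cap, and CLAHE parameters are illustrative defaults rather than tuned values:

```python
# Registration + photometric normalization sketch (OpenCV).
import cv2
import numpy as np

def register(ref_gray: np.ndarray, cur_gray: np.ndarray) -> np.ndarray:
    """Warp cur_gray onto ref_gray via ORB features + RANSAC homography."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(ref_gray, None)
    k2, d2 = orb.detectAndCompute(cur_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)[:500]
    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    h, w = ref_gray.shape
    return cv2.warpPerspective(cur_gray, H, (w, h))

def normalize(gray: np.ndarray) -> np.ndarray:
    """CLAHE to equalize local contrast before differencing."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```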
3) Change proposal (fast classical CV)
- Structural similarity (SSIM) and absolute/gradient differences to create coarse masks and ROIs. SSIM: $$ \mathrm{SSIM}(x,y)= \frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)} {(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)} $$
- Morphological filtering to eliminate noise and enforce coherent regions.
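A sketch of this SSIM-plus-morphology proposal stage; the SSIM threshold, kernel size, and minimum blob area are placeholders to be calibrated per site:

```python
# Coarse change-mask sketch: per-pixel SSIM map (scikit-image) + morphology
# (OpenCV) to produce a clean mask and candidate ROIs for the deep stage.
import cv2
import numpy as np
from skimage.metrics import structural_similarity

def propose_changes(ref: np.ndarray, cur: np.ndarray, thresh: float = 0.6):
    # full=True returns the per-pixel SSIM map alongside the scalar score
    score, ssim_map = structural_similarity(ref, cur, full=True)
    mask = (ssim_map < thresh).astype(np.uint8) * 255   # low SSIM = changed
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # merge regions
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    rois = [tuple(stats[i, :4]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] > 50]             # (x, y, w, h)
    return mask, rois
```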
4) Semantic refinement (deep learning)
- Unsupervised anomaly detection: Anomalib models (PatchCore/PaDiM/STFPM) trained on “normal” images produce anomaly heatmaps and scores.
- Optional supervised segmentation: Lightweight UNet/DeepLab with MobileNet/EfficientNet backbones trained on labeled change masks to improve precision and classify change types.
Training loss (example, BCE + Dice + CE): $$ \mathcal{L} = \lambda_1\,\mathrm{BCE}(M,\hat{M}) + \lambda_2\Big(1-\frac{2\sum M\hat{M}}{\sum M+\sum \hat{M}}\Big) + \lambda_3\,\mathrm{CE}(y,\hat{y}) $$
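One possible PyTorch reading of the loss above, assuming mask logits and targets of shape [B, 1, H, W] and a separate change-type classification head; the lambda weights are illustrative:

```python
# Combined segmentation/classification loss sketch: BCE + Dice + CE.
import torch
import torch.nn.functional as F

def combined_loss(mask_logits, mask_gt, cls_logits, cls_gt,
                  l1=1.0, l2=1.0, l3=0.5, eps=1e-6):
    bce = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    p = torch.sigmoid(mask_logits)
    dice = 1 - (2 * (p * mask_gt).sum() + eps) / (p.sum() + mask_gt.sum() + eps)
    ce = F.cross_entropy(cls_logits, cls_gt)  # change-type classification head
    return l1 * bce + l2 * dice + l3 * ce
```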
5) Temporal logic
- Blob tracking via IoU/centroid association; require persistence in $N$ of $M$ frames to suppress one-off noise.
- Trend metric (growth rate) to prioritize evolving issues: $$ r=\frac{1}{\Delta t}\,\frac{A_t - A_{t-\Delta t}}{A_{t-\Delta t}} $$
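A sketch of the N-of-M persistence check and the growth-rate metric above; the `Track` structure, IoU threshold, and window sizes are illustrative:

```python
# Persistence sketch: a region only alerts after it is matched (IoU above a
# threshold) in at least n of the last m frames, suppressing one-off noise.
from collections import deque
from dataclasses import dataclass, field

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter + 1e-9)

@dataclass
class Track:
    box: tuple
    hits: deque = field(default_factory=lambda: deque(maxlen=10))  # m = 10

    def update(self, frame_boxes, n=6):
        matched = any(iou(self.box, b) > 0.3 for b in frame_boxes)
        self.hits.append(matched)
        return sum(self.hits) >= n     # persistent enough -> raise alert

def growth_rate(area_now, area_prev, dt):
    """Relative area growth per unit time: r from the formula above."""
    return (area_now - area_prev) / (area_prev * dt + 1e-9)
```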
6) Serving & UX
- FastAPI backend (REST/WebSockets).
- Web viewer with before/after slider, overlays, alert list, and timeline scrubber.
- Feedback controls to approve/deny, relabel, and send patches to Label Studio.
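A minimal FastAPI sketch of the result and feedback endpoints; route names and payload fields are illustrative, not LensGuard's actual API:

```python
# Serving sketch: fetch a detected change, record operator approve/deny.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
CHANGES: dict[str, dict] = {}   # in-memory stand-in for the real store

class Feedback(BaseModel):
    change_id: str
    verdict: str                 # "approve" | "deny"
    label: str | None = None     # optional relabel, forwarded to Label Studio

@app.get("/changes/{change_id}")
def get_change(change_id: str):
    return CHANGES.get(change_id, {"error": "not found"})

@app.post("/feedback")
def post_feedback(fb: Feedback):
    CHANGES.setdefault(fb.change_id, {})["feedback"] = fb.model_dump()
    return {"status": "recorded"}
```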
7) Active learning & MLOps
- Label Studio for annotation; DVC for dataset versioning; MLflow for experiments and model registry.
- Scheduled retraining pipelines; threshold calibration via precision-recall curves (see the sketch after this list).
- Monitoring with Prometheus/Grafana (FPS, latency, false-alarm rate, review throughput).
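A sketch of the precision-recall calibration step, assuming anomaly scores and approve/deny labels collected from operator review; picking the F1-maximizing cutoff is one reasonable choice:

```python
# Threshold-calibration sketch: choose the anomaly-score cutoff that
# maximizes F1 on a reviewed validation set.
import numpy as np
from sklearn.metrics import precision_recall_curve

def calibrate_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    # precision/recall have one more entry than thresholds; drop the last
    return float(thresholds[np.argmax(f1[:-1])])
```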
Technology stack (all open-source or free)
- CV & imaging: OpenCV, scikit-image, Kornia, NumPy
- Deep learning: PyTorch, torchvision, segmentation-models-pytorch, Anomalib, Albumentations
- Serving & UI: FastAPI, Uvicorn, Jinja2/Bootstrap or React/Vite, Socket.IO/WebSockets
- Annotation: Label Studio or CVAT
- Model runtime: ONNX Runtime (CPU/ARM), TensorRT (Jetson), NVIDIA Triton (optional)
- Storage & data: MinIO (S3), SQLite (edge), PostgreSQL/TimescaleDB (server)
- Pipelines & ops: Docker, docker-compose, Prefect/GitHub Actions
- MLOps: MLflow, DVC
- Observability: Prometheus, Grafana, Loki (optional)
- Messaging (optional): Redis Streams or MQTT (Mosquitto)
Datasets (free sources to bootstrap)
- Industrial anomalies: MVTec AD; KolektorSDD/KSDD2; DAGM 2007; Severstal Steel Defect (Kaggle)
- Change detection pairs: LEVIR-CD/LEVIR-CD+; CDD; VL-CMU-CD; PCD
- Infrastructure cracks: SDNET2018; CrackForest; DeepCrack
- Own data: 100–1,000 normal images per scene for unsupervised; 100–1,000 labeled pairs for supervised refinement.
Edge feasibility
- Raspberry Pi 4/5: Run CV proposals every frame and gate deep inference to ROIs (see the sketch after this list); ONNX Runtime on CPU; input 256–384 px for ~3–5 FPS.
- Jetson Nano/Xavier/Orin: Export to ONNX, optimize with TensorRT INT8; 10–15+ FPS with gating and lightweight backbones.
- x86 without GPU: ONNX Runtime/OpenVINO EP; suitable for periodic inspections or batch jobs.
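A sketch of the ROI-gated ONNX Runtime pattern described above; the model path, input layout, and 256 px patch size are assumptions:

```python
# Edge inference sketch: run the ONNX anomaly model only on regions the
# classical stage proposed, keeping per-frame deep compute proportional
# to the number of candidate changes.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("anomaly.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def score_rois(frame: np.ndarray, rois: list[tuple]) -> list[float]:
    scores = []
    for (x, y, w, h) in rois:
        patch = cv2.resize(frame[y:y + h, x:x + w], (256, 256))
        # HWC uint8 -> NCHW float32 in [0, 1]
        blob = patch.astype(np.float32)[None].transpose(0, 3, 1, 2) / 255.0
        heatmap = session.run(None, {input_name: blob})[0]
        scores.append(float(heatmap.max()))
    return scores
```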
Challenges we ran into
- Lighting and shadows: Uncontrolled illumination caused spurious differences; mitigated with photometric normalization, gradient-based diffs, and temporal persistence.
- Viewpoint drift: Small camera shifts created false changes; feature-based registration plus optional fiducials reduced misalignment.
- Data scarcity: Labeled defect data was limited; starting with unsupervised anomaly detection delivered utility on day one, with supervised refinement added later through operator feedback.
- Edge constraints: Limited compute and memory on edge devices were addressed with a gated pipeline, model quantization, and ROI-focused inference.
- Operational calibration: Thresholds and filters had to be tuned per site; calibration workflows and dashboards made this tuning manageable.
Accomplishments that we’re proud of
- Low false alarms without losing recall: Hybrid ensemble (CV proposals + unsupervised anomaly + optional supervised refiner) outperformed any single approach.
- Edge readiness: Quantized ONNX/TensorRT models and ROI-gated inference achieved real-time performance on Pi/Jetson.
- Explainability: Clear overlays and per-region scores increased trust and sped up validation.
- Continuous improvement: Feedback loop with Label Studio, DVC, and MLflow adapted the system to each site.
- All open-source: No licenses, no lock-in, highly customizable.
What we learned
- Alignment and normalization matter more than model size. Good geometry and photometrics dramatically cut false positives pre-ML.
- Hybrid beats purist. Classical CV is fast and cheap; deep learning is robust—together they deliver practical accuracy at low cost.
- Temporal memory is powerful. Persistence checks and trend analysis reduce noise and turn single alerts into actionable insights.
- Small models, smart routing. Lightweight backbones and ROI-first pipelines make edge inference practical.
What’s next for LensGuard
- Few-shot change typing: CLIP-style embeddings to classify change types with minimal labels per site.
- Multi-modal fusion: Optional thermal or depth streams to separate real material changes from illumination effects.
- Fleet learning: Federated or privacy-preserving updates to share robustness across sites without sharing raw images.
- Proactive analytics: Growth rates, predicted time-to-threshold, and risk scoring for maintenance planning.
- Auto-setup assistant: Camera placement guidance, fiducial usage, lighting checks, and automated calibration scoring.
- Integrations: Webhooks and connectors for MES/SCADA/CMMS to open tickets on high-priority changes automatically.
Why it’s better than many existing solutions
- Robust out-of-the-box: Alignment, normalization, classical proposals, deep models, and temporal tracking handle lighting, angle, and noise without heavy manual tuning.
- Known and unknown changes: Unsupervised anomaly detection finds novel issues; supervised heads classify known defect types.
- Edge-first design: Real-time pipelines on affordable hardware; no mandatory cloud or expensive licenses.
- Explainable and operator-centric: Human-readable overlays, scores, and change types increase trust and speed up validation.
- Self-improving: Built-in active learning increases precision the longer it runs.
- Open and extensible: 100% open-source stack avoids lock-in and enables rapid customization for new domains.
Scalability and feasibility
- Scale up: Containerized services, optional Triton for multi-model serving, and k3s/Kubernetes for many cameras/sites. TimescaleDB supports time-series metadata at fleet scale.
- Feasible for small teams: Clear MVP path, commodity hardware, and open-source tools let a small team pilot in weeks and expand iteratively.
- Cost control: Edge inference reduces bandwidth/cloud spend; open-source stack avoids per-seat or per-camera licensing fees.