Inspiration
In industries from manufacturing to civil engineering, manual visual inspection is a massive bottleneck. It's slow, expensive, subjective, and prone to human error. A tiny crack missed on an F1 car, a subtly non-compliant brand logo, or the slow degradation of a bridge can all lead to disastrous or costly outcomes.
We were inspired by the gap between simple "spot the difference" tools and what these high-stakes industries actually need. It's not enough to know that something changed; you need to know what changed and how. Our goal was to build a general-purpose engine that doesn't just find a "diff" but understands it.
What it does
Q3 is a general-purpose visual comparison engine that detects, highlights, and classifies changes across time-series images.
Our engine performs a three-step process:
Alignment: It first aligns the new image with a baseline or previous image, correcting for slight shifts in camera angle, zoom, or lighting.
Detection: It performs a deep comparison to create a "heat map" of what has visually changed.
Classification: This is our core feature. Instead of just showing a red circle, it classifies the change. It can distinguish between "Surface Rust," "New Crack," "Missing Component," "Color Fading," or "Object Added."
This provides users with a scannable, automated log of meaningful changes, complete with a "what," "where," and "when" for every visual alteration.
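As a rough illustration of what one entry in such a log could look like, here is a minimal sketch using only the Python standard library. The names ChangeRecord, asset_id, and scan_log are hypothetical and chosen for this example; they are not part of Q3 itself.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One detected change (hypothetical schema, for illustration only)."""
    label: str             # "what": e.g. "New Crack"
    box: tuple             # "where": (x, y, width, height) in baseline pixels
    captured_at: datetime  # "when": capture time of the newer image
    asset_id: str          # hypothetical identifier for the monitored asset

    def to_json(self):
        data = asdict(self)
        data["box"] = list(self.box)
        data["captured_at"] = self.captured_at.isoformat()
        return json.dumps(data)


def scan_log(records):
    """Return one-line summaries of the records, newest first."""
    ordered = sorted(records, key=lambda r: r.captured_at, reverse=True)
    return [
        f"{r.captured_at:%Y-%m-%d %H:%M} {r.asset_id}: {r.label} at {r.box}"
        for r in ordered
    ]


if __name__ == "__main__":
    rust = ChangeRecord("Surface Rust", (10, 20, 30, 40),
                        datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "bridge-7")
    crack = ChangeRecord("New Crack", (5, 5, 12, 60),
                         datetime(2024, 2, 1, 9, 30, tzinfo=timezone.utc), "bridge-7")
    for line in scan_log([rust, crack]):
        print(line)
```

Keeping the record immutable and JSON-serializable would make it easy to store the log and render it in a dashboard, but the exact fields are an assumption.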
How we'll build it
Frontend: We'll use React and Tailwind CSS to create a clean, responsive dashboard. This will allow users to upload image streams, manage their monitored assets, and view the comparison results with clear, annotated overlays.
Backend: A FastAPI (Python) server will manage the API endpoints, job queues, and user data.
Core CV Engine:
Image Registration: We will use OpenCV with feature-matching algorithms (like SIFT) to precisely align images before comparison. This is critical to avoid false positives from camera shake.
Difference Detection: We will move beyond simple pixel subtraction, using the Structural Similarity Index (SSIM) to find meaningful changes in luminance, contrast, and structure, which is more robust to lighting variations than raw pixel differences.
Change Classification: For the classification "hack," we will use a multi-modal large language model (LLM). We will crop the "changed" regions from our SSIM map and pass them to the Gemini API with a prompt like: "Analyze this image patch. Describe the primary visual anomaly in 1-3 words (e.g., 'crack', 'rust', 'stain', 'missing object')." This will allow us to build a powerful, general-purpose classifier in hours, not months.