Inspiration
Insurance fraud is a massive industry problem, costing an estimated $308.6 billion annually. A common form of fraud involves exaggerating the severity of an accident or falsifying the time of the event to align with active insurance coverage. Today, verifying these claims requires manual review by adjusters, which is slow and error-prone. We were inspired to build an automated model that can instantly ingest the two most common pieces of evidence, the police/incident report and CCTV footage, and mathematically determine whether they tell the same story.
What it does
CV Crash is a three-stage automated audit pipeline designed to detect inconsistencies between what a claim says happened and what actually happened.
- Text Analysis: It parses text incident reports to extract the reported timestamp and the claimed severity (e.g., "Severe," "Rear-end").
- Video Analysis: It analyzes raw CCTV footage using Computer Vision to detect the exact frame of impact and classify the visual severity of the crash.
- The Audit: It calculates a consistency score by comparing the reported details against the visual evidence. If the time gap (ΔT) exceeds our threshold (τ), it flags the claim as INCONSISTENT. If the visual damage does not match the reported severity description, it flags a MISMATCH.
How we built it
Data Engineering & Synthesis: We scraped CCTV footage from YouTube, curating a mix of clear 1080p clips and noisy, blurry, low-light footage to simulate the raw video evidence insurers actually receive. We then synthesized corresponding police incident reports for these clips, intentionally fabricating specific details in some reports to test our auditor's ability to catch lies.
The CV Pipeline
- Frame Extraction: We used OpenCV to break raw MP4 files into individual frames.
- Collision Detection: We deployed a YOLOv8-Nano model to scan the stream.
- Severity Classification: We implemented a dual approach: standard rule-based logic combined with a MobileNetV2 neural network.
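The collision-detection step can be illustrated with a small, pure-Python sketch: once the YOLO pass produces vehicle bounding boxes per frame, a sudden overlap between two boxes suggests contact. The `(x1, y1, x2, y2)` box format, the `find_impact_frame` helper, and the IoU threshold are illustrative assumptions, not the project's actual code.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def find_impact_frame(frames, threshold=0.1):
    """Return the first frame index where any two detected boxes overlap
    beyond the threshold; frames is a list of (index, [boxes])."""
    for idx, boxes in frames:
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) >= threshold:
                    return idx
    return None
```

In practice the threshold has to be tuned per camera angle, since partially occluding vehicles overlap in 2D without touching.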
The NLP Pipeline
- Printed Text Extraction: We used PyTesseract to parse standard typed reports, extracting specific entities like severity, timestamps, and crash descriptions.
- Handwritten Text Extraction: We deployed Microsoft's TrOCR (Transformer-based OCR) to interpret complex handwritten notes that traditional OCR engines fail to process.
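After OCR (PyTesseract or TrOCR) returns raw text, the entity-extraction step can be sketched with simple pattern matching. The `extract_entities` name, the regexes, and the severity vocabulary are assumptions for illustration.

```python
import re

SEVERITY_TERMS = ("minor", "moderate", "severe")


def extract_entities(report_text: str) -> dict:
    """Pull the reported clock time and severity term out of OCR text."""
    # Match a 24-hour HH:MM timestamp, e.g. "14:30".
    time_match = re.search(r"\b([01]?\d|2[0-3]):([0-5]\d)\b", report_text)
    # Take the first recognized severity keyword, case-insensitively.
    severity = next(
        (term for term in SEVERITY_TERMS if term in report_text.lower()), None
    )
    return {
        "timestamp": time_match.group(0) if time_match else None,
        "severity": severity,
    }
```

A real report parser would need to tolerate OCR noise (e.g. "l4:3O"), which is part of why TrOCR was needed for handwriting.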
The Consistency Logic
We implemented a weighted scoring algorithm to ensure fair auditing:
- Pre-Flight Check: The system first checks Collision_Detected. If the AI sees no crash, the score is immediately set to 0.
- Time Audit (50 pts): We calculate the absolute difference ΔT = |T_Report − T_Actual|. If ΔT ≤ 5 seconds, the system awards 50 points.
- Severity Audit (50 pts): We normalize the text strings and compare the reported severity against the visual classification. A match awards the remaining 50 points.
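The weighted scoring above can be sketched as follows. The dictionary field names and return shape are assumptions reconstructed from the description, not the project's actual implementation.

```python
def scoring_consistency(report: dict, video: dict, tau: float = 5.0) -> dict:
    """Score a claim against the video evidence (0-100)."""
    # Pre-flight check: no detected collision means the claim scores 0.
    if not video.get("collision_detected"):
        return {"score": 0, "status": "NO_COLLISION_DETECTED"}

    score = 0
    # Time audit (50 pts): reported vs. detected impact time.
    delta_t = abs(report["time_s"] - video["impact_time_s"])
    if delta_t <= tau:
        score += 50
    # Severity audit (50 pts): normalized string comparison.
    if report["severity"].strip().lower() == video["severity"].strip().lower():
        score += 50

    status = "CONSISTENT" if score == 100 else "MISMATCH"
    return {"score": score, "status": status, "time_gap_s": delta_t}
```

Keeping the two audits at 50 points each means a claim can fail on timing alone or severity alone, and the returned breakdown says which.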
Challenges we ran into
- Data Alignment: Creating a "ground truth" for the NLP side was difficult. We had to manually create incident reports that sounded authentic (like the detailed "Urban intersection in a Russian-speaking region" report) while intentionally injecting subtle errors to test our Auditor's detection capabilities.
- Time Normalization: The text report gives us "Clock Time" (e.g., "14:30 PM"), but the video gives us "Duration Time" (e.g., "Frame 450 at 15s"). We had to build robust normalization logic in the ReportProcessor to map these two timelines together.
- Visual Ambiguity: In clips like the "Nighttime Highway" scenario, low light made it difficult for the standard YOLO model to detect the collision moment immediately. We had to tweak confidence thresholds to avoid false positives from passing cars.
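The "Clock Time vs. Duration Time" normalization can be sketched like this: anchor the report's clock time to the clip's known start time so both timelines become seconds-from-clip-start. The helper names and the clip-start metadata field are assumptions; this is a simplified stand-in for the ReportProcessor logic.

```python
from datetime import datetime


def to_clip_seconds(report_clock: str, clip_start_clock: str) -> float:
    """Map a report's 'HH:MM:SS' clock time onto the video timeline."""
    fmt = "%H:%M:%S"
    report_t = datetime.strptime(report_clock, fmt)
    start_t = datetime.strptime(clip_start_clock, fmt)
    return (report_t - start_t).total_seconds()


def frame_to_seconds(frame_index: int, fps: float) -> float:
    """Convert a detected impact frame (e.g. frame 450) into seconds."""
    return frame_index / fps
```

With both values on the same axis, the ΔT comparison in the audit is a plain subtraction.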
Accomplishments that we're proud of
- Robust Scoring Logic: We are proud of the scoring_consistency function. It doesn't just guess; it provides a structured breakdown (e.g., "Time gap (12s) exceeds threshold"), making the output explainable to a human user.
- The Pipeline Works: Watching the system ingest a raw YouTube video and a text file, and correctly output a JSON object with status: "MISMATCH" because we faked the report, was a huge win.
What we learned
What's next for CV Crash Insurance Fraud Detection
-3D Reconstruction: Moving from 2D bounding boxes to 3D volume estimation to calculate repair costs automatically.
-Geospatial Verification: extracting location data from the report (e.g., "Intersection of Multi-lane street") and verifying it against GPS tags in the video metadata.