IronScan ๐Ÿ—๏ธ

AI-Powered Construction Verification & Spatial Intelligence

IronScan transforms raw job site footage into measurable, navigable digital twins, automatically flagging deviations between BIM models and as-built conditions before they become expensive to fix.


📋 What It Does

IronScan takes body camera footage from construction workers and a BIM model to produce:

  • Deviation Report: Per-frame and per-element analysis of discrepancies between design and built reality, generated by Gemini 2.0 Flash comparing BIM renders against site photos.
  • Annotated Keyframes: Site photos with markers showing severity, location, and recommended actions.
  • Fused Point Cloud: A colored .ply file built from depth-estimated frames, creating a dense mathematical map of the site.
  • HTML Report: A comprehensive dashboard featuring compliance scores, priority issues, per-element tables, and linkable comparison images.

Result: A web dashboard that lets project managers and architects monitor compliance and resolve issues remotely, reducing costly site visits.


🎬 Clean Video Enhancement

Before processing, footage passes through Clean Video, an AI-powered enhancement engine that transforms low-quality footage into production-ready data: higher resolution, higher frame rate, sharper and cleaner frames.

🧠 How It Works

We use a deep learning pipeline combining:

  • RIFE: AI frame interpolation (24/30 FPS → 60 FPS).
  • Super-resolution: Upscaling (720p โ†’ 1080p / 4K).
  • Neural Denoising: Removing compression artifacts and grain.

โš™๏ธ Processing Pipeline

  1. Frame Extraction: Video is broken into individual frames.
  2. Denoising: Artifacts and grain are removed.
  3. Interpolation: Intermediate frames are generated for smoother motion.
  4. Super-Resolution: Deep neural networks upscale frame detail.
  5. Reconstruction: Enhanced frames are stitched via FFmpeg, preserving audio/timing.
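The reconstruction step could be driven from Python roughly as sketched below. The frame-naming pattern, output path, and 60 fps rate are illustrative assumptions, and the FFmpeg command is only assembled here, not executed:

```python
def build_ffmpeg_cmd(frames_pattern: str, source_video: str,
                     output_path: str, fps: int = 60) -> list[str]:
    """Assemble an FFmpeg invocation that stitches enhanced frames into a
    video while copying the audio track from the original footage."""
    return [
        "ffmpeg",
        "-framerate", str(fps),
        "-i", frames_pattern,           # enhanced frames, e.g. frame_%06d.png
        "-i", source_video,             # original clip, used only for its audio
        "-map", "0:v", "-map", "1:a?",  # video from frames, audio if present
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "copy",                 # remux audio without re-encoding
        output_path,
    ]
```

The resulting list can be handed straight to `subprocess.run(cmd, check=True)`; copying the audio stream (`-c:a copy`) avoids a lossy re-encode and preserves the original timing.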

๐Ÿ— Architecture

Stage 1: Frame Selection & Enhancement

Every frame is scored for edge sharpness using a Laplacian transform. Frames below the threshold are discarded. Survivors are enhanced (resharpening, dehazing, motion blur reduction) to ensure the cleanest possible input for AI models.
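As a concrete illustration of the sharpness gate, here is a minimal variance-of-Laplacian sketch in pure NumPy (a real pipeline would typically use OpenCV's `cv2.Laplacian(gray, cv2.CV_64F).var()`). The threshold value is an illustrative assumption, not the pipeline's default:

```python
import numpy as np

def sharpness_score(gray: np.ndarray) -> float:
    """Variance of the Laplacian response: higher means sharper edges."""
    img = gray.astype(np.float64)
    # 3x3 four-neighbour Laplacian kernel, applied via array slicing.
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] +
           img[1:-1, :-2] + img[1:-1, 2:] -
           4.0 * img[1:-1, 1:-1])
    return float(lap.var())

def keep_frame(gray: np.ndarray, blur_threshold: float = 100.0) -> bool:
    # Frames scoring below the threshold are treated as blurry and discarded.
    return sharpness_score(gray) >= blur_threshold
```

A flat, featureless frame scores near zero while a frame full of edges scores orders of magnitude higher, which is what makes the single scalar threshold workable.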

Stage 2: Depth Estimation

Depth Anything V2 runs on each enhanced frame to produce a per-pixel depth map. Relative depth values are calibrated to metric scale using a known ceiling height as a scene reference.
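The ceiling-height calibration can be sketched as follows. The linear-scale assumption and the function names are illustrative, not the pipeline's actual code; monocular relative depth may in practice require an affine (scale plus shift) fit rather than a single scale factor:

```python
import numpy as np

def calibrate_depth(rel_depth: np.ndarray,
                    ceiling_mask: np.ndarray,
                    ceiling_height_m: float) -> np.ndarray:
    """Scale a relative depth map to metres using a known ceiling height.

    rel_depth: per-pixel relative depth from the monocular model.
    ceiling_mask: boolean mask of pixels known to lie on the ceiling.
    """
    # Median is more robust than mean against stray mask pixels.
    rel_ceiling = float(np.median(rel_depth[ceiling_mask]))
    scale = ceiling_height_m / rel_ceiling
    return rel_depth * scale
```

After calibration, every pixel's depth is in metres, so distances measured in the reconstructed cloud correspond to real site dimensions.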

Stage 3: Point Cloud Reconstruction

Depth maps from multiple frames are back-projected into 3D space and fused into a single dense point cloud via voxel grid deduplication, creating a .ply file of the as-built geometry.
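The back-projection and voxel deduplication can be sketched as below. The pinhole intrinsics and the 5 cm voxel size are illustrative assumptions, and fusing multiple frames would additionally require transforming each frame's points into world space using its camera pose, which is omitted here:

```python
import numpy as np

def backproject(depth_m: np.ndarray, fx: float, fy: float,
                cx: float, cy: float) -> np.ndarray:
    """Back-project a metric depth map into camera-space 3D points
    using the pinhole camera model."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def voxel_dedup(points: np.ndarray, voxel_size: float = 0.05) -> np.ndarray:
    """Keep one representative point per occupied voxel."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]
```

Deduplication is what keeps the fused cloud dense but bounded: overlapping frames contribute millions of near-duplicate points, and snapping them to a voxel grid collapses each cluster to a single sample.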

Stage 4: BIM Deviation Analysis

The fused cloud and BIM model are sent to Gemini via API. The model compares geometries and labels structural differences (missing, misplaced, or dimensionally incorrect elements).
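The response-handling side of this step might look roughly like the sketch below. The prompt wording, the JSON schema, and the parsing helper are all assumptions for illustration; only the missing / misplaced / dimensionally-incorrect categories come from the description above:

```python
import json

DEVIATION_PROMPT = (
    "Compare the BIM render with the as-built site photo. For each "
    "structural element, report deviations as a JSON list of objects with "
    "fields: element, category (missing | misplaced | "
    "dimensionally_incorrect), severity (low | medium | high), "
    "and recommended_action."
)

def parse_deviation_report(response_text: str) -> list[dict]:
    """Parse the model's JSON reply, keeping only well-formed entries.

    Hypothetical response format; the actual schema depends on the prompt.
    """
    entries = json.loads(response_text)
    allowed = {"missing", "misplaced", "dimensionally_incorrect"}
    return [e for e in entries
            if e.get("category") in allowed and "element" in e]
```

Validating the categories on the way in keeps a single malformed model reply from corrupting the downstream deviation report.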


🛠 Tech Stack

| Layer | Technology |
| --- | --- |
| Depth Estimation | Depth Anything V2 (vits/vitb/vitl) |
| Vision LLM | Google Gemini 2.0 Flash |
| BIM Parsing | trimesh |
| BIM Rendering | NumPy software rasterizer (no GPU/OpenGL) |
| Image Processing | OpenCV |
| ML Framework | PyTorch (CUDA / Apple MPS) |
| Point Cloud Output | PLY binary |
| Frontend | React + TypeScript |
| Backend | FastAPI + Three.js (3D Viewer) |

🚀 Usage

Command Examples

```shell
# Process video input
python pipeline.py --model site.obj --video footage.mp4 --outdir results/

# Test a single image
python pipeline.py --model site.obj --image frame.png --outdir results/

# Use a stricter sharpness filter for blurry footage
python pipeline.py --model site.obj --video footage.mp4 --blur-threshold 80
```

Camera Poses

Edit camera_poses.py before running. For video, add timestamped keypoints; poses are linearly interpolated between them. For a single image, set IMAGE_POSE.
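The linear interpolation between timestamped keypoints can be sketched as below. The `(t_seconds, x, y, z)` keypoint format is an illustrative assumption; the real camera_poses.py format may differ (e.g. it may also carry orientation):

```python
import numpy as np

def interpolate_pose(keypoints: list[tuple], t: float) -> np.ndarray:
    """Linearly interpolate a camera position at time t.

    keypoints: list of (t_seconds, x, y, z) tuples sorted by time
    (assumed format). Times outside the range clamp to the endpoints,
    matching np.interp's behaviour.
    """
    times = np.array([k[0] for k in keypoints], dtype=float)
    coords = np.array([k[1:] for k in keypoints], dtype=float)
    return np.array([np.interp(t, times, coords[:, i]) for i in range(3)])
```

With keypoints at 0 s and 2 s, querying 1 s returns the midpoint of the two positions, which is how a sparse set of hand-placed poses covers every extracted frame.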