Inspiration

Pipeline operators spend 40+ hours manually aligning In-Line Inspection (ILI) data for a single pipeline segment. Misalignment causes unnecessary excavations costing $50K–$500K each — and missing a real defect can be catastrophic. We wanted to see if we could combine classical signal processing algorithms with agentic AI to do the entire analysis in seconds, and have the system explain its reasoning in plain English like a senior integrity engineer would.

What it does

PipelineAI ingests ILI inspection CSVs from three runs spanning 15 years (2007, 2015, 2022) and automatically:

  • Aligns them using Dynamic Time Warping to eliminate ±10% odometer drift between inspection tools
  • Clusters spatially proximate anomalies into ASME B31G interaction zones using DBSCAN
  • Matches 5,115 anomalies across runs via the Hungarian algorithm (96.5% match rate)
  • Tracks 362 three-way chains — the same defect followed across all three inspections
  • Computes growth rates, acceleration, and risk scores per 49 CFR Part 192
  • Narrates the lifecycle of the highest-risk chains using a 6-agent AI storytelling system — e.g. "This defect was dormant for 8 years then accelerated to 2.86 pp/yr. It will breach the 80% critical threshold in 3.2 years."
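
The growth figures quoted in that narrative come down to simple arithmetic. The sketch below shows one way to compute per-interval growth rate, acceleration, and a linear years-to-threshold projection. The function names and the sample depths are hypothetical, chosen only so the numbers land near the quoted example, and linear extrapolation is an assumption rather than a description of the project's actual model.

```python
from typing import Optional

CRITICAL_DEPTH_PCT = 80.0  # critical threshold quoted in the narrative example


def growth_rate(depth_a: float, depth_b: float, year_a: int, year_b: int) -> float:
    """Average growth in percentage points of wall loss per year."""
    return (depth_b - depth_a) / (year_b - year_a)


def years_to_critical(depth_now: float, rate: float,
                      threshold: float = CRITICAL_DEPTH_PCT) -> Optional[float]:
    """Linear extrapolation to the threshold; None if the defect is not growing."""
    if depth_now >= threshold:
        return 0.0
    if rate <= 0:
        return None
    return (threshold - depth_now) / rate


# Hypothetical three-way chain: depth (% wall loss) per inspection year.
depths = {2007: 50.8, 2015: 50.8, 2022: 70.8}

rate_early = growth_rate(depths[2007], depths[2015], 2007, 2015)  # dormant period
rate_late = growth_rate(depths[2015], depths[2022], 2015, 2022)   # active period
acceleration = rate_late - rate_early                             # change in pp/yr
remaining = years_to_critical(depths[2022], rate_late)

print(f"{rate_late:.2f} pp/yr, {remaining:.1f} years to {CRITICAL_DEPTH_PCT:.0f}%")
```

A chain with a non-positive late-interval rate returns None rather than a projection, which keeps dormant defects out of the years-to-critical ranking.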

The React webapp lets engineers upload CSVs, see instant KPIs, interact with a pipeline heatmap and 3D pipe segment viewer, drill into highest-risk anomalies, and read AI-generated chain narratives — all in under 2 minutes.

How we built it

Backend (Python + FastAPI): An 11-step analysis pipeline whose main stages are DTW alignment → piecewise-linear distance correction → DBSCAN clustering → Hungarian matching → growth analysis → risk scoring → 6-agent AI storytelling. The agents (Alignment, Matching, Validator, Explainer, Trend, Projection) run as an AutoGen RoundRobinGroupChat backed by Gemini 2.0 Flash. The pipeline is exposed via FastAPI with async background analysis and webhook endpoints.

Frontend (React + Vite + TypeScript): Drag-and-drop upload → real-time backend processing → Zustand state management → interactive dashboard with 5 KPI cards, a pipeline heatmap (distance × clock position, color-coded by depth), a Three.js 3D pipe segment viewer rendering multiple anomalies per joint, risk score histogram (Recharts), high-risk anomaly table with regulatory compliance details, AI insights panel with expandable chain narratives, and a filter sidebar with Radix sliders and checkboxes. All styled with Tailwind CSS using the RCP design system.

Key algorithms: DTW with Sakoe-Chiba band constraint, piecewise-linear interpolation for distance correction, Hungarian algorithm (scipy linear_sum_assignment) for optimal matching, DBSCAN with circular clock-position handling for interaction zones, and XGBoost with SHAP for ML growth prediction.
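
To make the first of these concrete, here is a minimal pure-Python sketch of DTW with a Sakoe-Chiba band. It assumes the two runs are compared on joint lengths (the distance between consecutive girth welds), which is our guess at a reasonable feature and not necessarily what the project uses. The function name `dtw_band` and the sample values are hypothetical.

```python
import math
from typing import List, Tuple


def dtw_band(a: List[float], b: List[float],
             band: int) -> Tuple[float, List[Tuple[int, int]]]:
    """DTW restricted to cells with |i - j| <= band (Sakoe-Chiba constraint).

    Returns the total alignment cost and the warping path as index pairs.
    """
    n, m = len(a), len(b)
    # Widen the band if needed so the end cell stays reachable
    # when the sequences have different lengths.
    band = max(band, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both

    # Backtrack from the end cell along the cheapest predecessors.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Hypothetical joint lengths (ft) from two runs of the same segment.
run_a = [40.0, 39.5, 20.1, 40.2]
run_b = [40.3, 39.9, 20.0, 40.0]
total_cost, weld_path = dtw_band(run_a, run_b, band=1)
print(total_cost, weld_path)
```

The band keeps the search near the diagonal, which cuts the work from O(n·m) to roughly O(n·band) and prevents a joint from being matched to one far away along the line.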

Challenges we ran into

Odometer drift was the biggest technical hurdle. ILI tools can disagree on measured distance by up to ±10% between runs, so a defect at 5,000 ft in 2007 might appear at 5,400 ft in 2022. We solved this with DTW alignment on girth weld reference points followed by piecewise-linear correction, achieving an RMSE of 8.2 ft.
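
The correction step can be sketched as follows, assuming the alignment has already paired each girth weld in the later run with its counterpart in the reference run. The function name `correct_distance` and the weld positions are hypothetical. They are chosen so that the 5,400 ft reading from the example maps back to 5,000 ft.

```python
from bisect import bisect_right
from typing import Sequence


def correct_distance(x: float,
                     src_welds: Sequence[float],
                     ref_welds: Sequence[float]) -> float:
    """Map an odometer reading from the source run onto the reference run.

    src_welds and ref_welds hold the positions of the same girth welds in
    each run, in increasing order. Readings between two welds are linearly
    interpolated; readings outside the weld range are extrapolated from the
    nearest segment.
    """
    if len(src_welds) != len(ref_welds) or len(src_welds) < 2:
        raise ValueError("need at least two paired weld positions")
    # Index of the segment that contains x, clamped to the valid range.
    k = bisect_right(src_welds, x) - 1
    k = max(0, min(k, len(src_welds) - 2))
    x0, x1 = src_welds[k], src_welds[k + 1]
    y0, y1 = ref_welds[k], ref_welds[k + 1]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical paired weld positions (ft): 2022 run drifts ~8% long vs 2007.
welds_2022 = [0.0, 2160.0, 5400.0, 8100.0]
welds_2007 = [0.0, 2000.0, 5000.0, 7500.0]

corrected = correct_distance(5400.0, welds_2022, welds_2007)
midpoint = correct_distance(3780.0, welds_2022, welds_2007)
print(corrected, midpoint)
```

Because each segment between welds gets its own scale factor, drift that varies along the line is corrected locally instead of with a single global stretch.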

Clock position wrap-around — 12 o'clock and 1 o'clock are adjacent but numerically distant. We converted to circular sine/cosine coordinates for DBSCAN so clustering correctly handles the 12→1 boundary.
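
A minimal sketch of that conversion, using only the standard library. The helper names are hypothetical, and how the resulting coordinates are scaled against axial distance before clustering is left out because the writeup does not specify it.

```python
import math
from typing import Tuple


def clock_to_xy(clock_hours: float) -> Tuple[float, float]:
    """Embed a clock position (hours, 12 at top) on the unit circle."""
    angle = 2.0 * math.pi * (clock_hours % 12.0) / 12.0
    return math.sin(angle), math.cos(angle)


def clock_gap_hours(a: float, b: float) -> float:
    """Shortest angular separation between two clock positions, in hours."""
    d = abs(a - b) % 12.0
    return min(d, 12.0 - d)


# On the raw scale 12 and 1 are 11 apart; on the circle they are neighbors.
across_boundary = math.dist(clock_to_xy(12.0), clock_to_xy(1.0))
ordinary_step = math.dist(clock_to_xy(3.0), clock_to_xy(4.0))
print(across_boundary, ordinary_step, clock_gap_hours(12.0, 1.0))
```

Feeding the (sin, cos) pair to a Euclidean-distance clusterer gives every one-hour step the same chord length, so the 12→1 boundary is no longer a special case.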

Three-way chain linking — matching A→B and B→C doesn't trivially give A→B→C chains when match confidence varies. We built a chain propagation algorithm that links through pairwise matches while preserving confidence at each step.
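
One way such a propagation step could look is sketched below. It assumes each pairwise matching stage yields a one-to-one mapping with a confidence per match, and it scores a chain by its weakest link. The `Chain` type, the `link_chains` function, the `min_conf` cutoff, and the weakest-link rule are all assumptions for illustration, not the project's actual algorithm.

```python
from typing import Dict, List, NamedTuple, Tuple

# Pairwise match table: anomaly id in the earlier run -> (id in later run, confidence)
MatchTable = Dict[str, Tuple[str, float]]


class Chain(NamedTuple):
    ids: Tuple[str, str, str]        # anomaly ids in runs A, B, C
    confidences: Tuple[float, float]  # confidence of A→B and B→C, kept separately
    chain_confidence: float           # weakest link (an assumed scoring rule)


def link_chains(ab: MatchTable, bc: MatchTable,
                min_conf: float = 0.5) -> List[Chain]:
    """Join A→B and B→C matches into A→B→C chains."""
    chains: List[Chain] = []
    for a_id, (b_id, conf_ab) in ab.items():
        if b_id not in bc:
            continue  # defect not matched in the third run, so no full chain
        c_id, conf_bc = bc[b_id]
        weakest = min(conf_ab, conf_bc)
        if weakest < min_conf:
            continue  # one hop is too uncertain to trust the whole chain
        chains.append(Chain((a_id, b_id, c_id), (conf_ab, conf_bc), weakest))
    return chains


# Hypothetical pairwise matches between three runs.
matches_ab = {"A1": ("B1", 0.95), "A2": ("B2", 0.90), "A3": ("B3", 0.40)}
matches_bc = {"B1": ("C1", 0.80), "B3": ("C3", 0.90)}

chains = link_chains(matches_ab, matches_bc)
print(chains)
```

Keeping both per-hop confidences on the chain lets a reviewer see which hop is the uncertain one, instead of only a single combined score.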

Frontend-backend data contract — ensuring Python Pydantic models (snake_case with camelCase aliases) perfectly matched TypeScript interfaces required careful schema design with model_dump(by_alias=True) and validation on both sides.

Accomplishments that we're proud of

  • 96.5% match rate across 5,115 anomalies with only 8.2 ft alignment RMSE, beating the industry thresholds of 95% match rate and 10 ft RMSE
  • 362 three-way chains successfully tracked across 15 years, with 239 identified as accelerating
  • 6 specialized AI agents that produce lifecycle narratives that read like what a senior pipeline engineer would write, including regulatory citations and years-to-critical projections
  • Full-stack demo in 2 minutes: upload 3 CSVs → instant KPIs → interactive heatmap → 3D pipe viewer → AI narratives → risk table with 49 CFR compliance
  • $2.5M–$25M estimated cost savings per analysis through reduction of unnecessary excavations
  • 158 hours saved per pipeline segment compared to manual alignment (99% time reduction)

What we learned

The 70/30 split between proven algorithms and AI augmentation was the right architecture. DTW, Hungarian, and DBSCAN provide deterministic, auditable results that engineers trust — the AI agents add the interpretive layer that makes the data actionable. A single LLM prompt can't match the quality of 6 specialized agents each contributing domain expertise to tell a defect's 15-year story.

We also learned that the data contract between backend and frontend is worth getting right early — once the Pydantic-to-TypeScript schema was locked in, every new visualization component just worked.

What's next for PipelineAI

  • Live agent calls from the webapp (currently using cached analysis output)
  • Multi-pipeline portfolio view for operators managing hundreds of segments
  • GIS map integration for excavation planning with GPS coordinates
  • Automated regulatory reports — one-click PDF generation for 49 CFR compliance submissions
  • Mobile alerts for field engineers when new critical defects are detected
  • Continuous learning — retrain the XGBoost growth model as new inspection data arrives
