RCP Anomaly Detection — Automated ILI Alignment, Matching, and Growth Analytics

Inspiration

Pipelines are lifelines. Yet, inspection data collected years apart rarely lines up: odometer drift, differing references, and evolving corrosion make apples‑to‑apples comparisons hard. Engineers still match defects by eye—slow, costly, and inconsistent. We were inspired to automate the way experts think: align runs, match the same physical defects, quantify growth, and surface new or missing anomalies so teams can act faster and safer.

What it does

  • Aligns two inspection runs (e.g., 2007→2015 and 2015→2022) to a common coordinate system, correcting distance drift.
  • Matches the same physical defects across runs with confidence bands (High/Med/Low).
  • Computes growth metrics (depth, length, width), normalized by time between inspections.
  • Flags new defects, missing defects, and uncertain matches for review.
  • Delivers an interactive, single‑page dashboard (no iframes) and an iframe version for stakeholder sharing.

How we built it

The end‑to‑end workflow is orchestrated in ili_eda.ipynb and produces web dashboards and a presentation.

Data preparation and alignment

  • Standardize units and feature types; normalize clock positions and handle missing fields.
  • Use anchors (girth welds, valves, fittings) to correct drift and map distances to a common axis.
  • Linear drift correction (conceptual): $$x_2^{*} = a\,x_2 + b$$ Parameters $(a,b)$ are estimated to minimize anchor residuals. For segments needing more flexibility, piecewise or non‑linear transforms can be applied.
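The linear correction above can be sketched with an ordinary least-squares fit over anchor pairs. The anchor distances below are made up for illustration; the project's real anchors come from the weld/fitting tables:

```python
import numpy as np

# Hypothetical anchor distances: the same girth welds observed in both runs.
# x1: anchor odometer readings in Run 1 (the reference axis)
# x2: the same anchors' readings in Run 2 (the drifted axis)
x1 = np.array([100.0, 550.0, 1020.0, 1490.0, 2010.0])
x2 = np.array([101.5, 556.0, 1031.0, 1505.0, 2030.5])

# Least-squares fit of x1 ≈ a*x2 + b minimizes the anchor residuals
a, b = np.polyfit(x2, x1, deg=1)

def correct(x):
    """Map a Run-2 distance onto the Run-1 axis."""
    return a * x + b

residuals = x1 - correct(x2)
print(f"a={a:.5f}, b={b:.3f}, max |residual|={np.abs(residuals).max():.3f}")
```

With clean anchors the residuals should be well under the matching distance window; large residuals signal that a piecewise or non-linear transform is needed for that segment.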

Anomaly matching and confidence

  • For each defect in Run 2, search candidate defects in Run 1 within a distance window; compare corrected distance, clock, type compatibility, and size.
  • Example similarity score (conceptual): $$S = w_d \exp\!\left(-\tfrac{|x_1 - x_2^{*}|}{\tau_d}\right) + w_c \cos(\Delta\theta) + w_s \exp\!\left(-\tfrac{\lVert s_1 - s_2 \rVert}{\tau_s}\right)$$
  • Assign best matches; flag uncertain/new/missing. Confidence bands are thresholded (e.g., High: $c\ge0.8$, Med: $c\ge0.5$, Low otherwise).
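A minimal scoring sketch of the similarity formula and the confidence bands above. The weights, decay scales, and sizes are illustrative placeholders, not the project's tuned values:

```python
import math

# Illustrative weights and decay scales (tau); not tuned project values
W_D, W_C, W_S = 0.5, 0.2, 0.3
TAU_D, TAU_S = 0.5, 0.25

def similarity(x1, x2_star, theta1, theta2, size1, size2):
    """Score a Run-1/Run-2 candidate pair.

    x1, x2_star : axial distances (Run 2 already drift-corrected)
    theta1/2    : clock positions in radians
    size1/2     : (length, width, depth) tuples
    """
    dist_term = W_D * math.exp(-abs(x1 - x2_star) / TAU_D)
    clock_term = W_C * math.cos(theta1 - theta2)
    size_gap = math.dist(size1, size2)  # Euclidean norm of the size difference
    size_term = W_S * math.exp(-size_gap / TAU_S)
    return dist_term + clock_term + size_term

def band(c):
    """Threshold a confidence score into High/Med/Low."""
    return "High" if c >= 0.8 else "Med" if c >= 0.5 else "Low"

# A perfectly coincident, identically sized pair scores W_D + W_C + W_S = 1.0
s = similarity(100.0, 100.0, 0.0, 0.0, (1.0, 0.5, 0.1), (1.0, 0.5, 0.1))
```

In practice the weights would be chosen so that distance dominates, with clock and size breaking ties between nearby candidates.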

Growth analytics

  • For matched defects, compute growth normalized by elapsed time $\Delta t$: $$g_{\text{depth}} = \frac{d_2 - d_1}{\Delta t},\quad g_{\text{length}} = \frac{\ell_2 - \ell_1}{\Delta t},\quad g_{\text{width}} = \frac{w_2 - w_1}{\Delta t}$$
  • Highlight fastest‑growing anomalies for prioritization.
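The growth computation is a straightforward vectorized pass over the matched-pairs table; column names and values below are illustrative:

```python
import pandas as pd

# Hypothetical matched-pairs table (suffix _1 = earlier run, _2 = later run)
matched = pd.DataFrame({
    "depth_1": [0.10, 0.20], "depth_2": [0.18, 0.22],
    "length_1": [1.0, 2.0],  "length_2": [1.4, 2.1],
    "width_1": [0.5, 0.8],   "width_2": [0.6, 0.8],
})
dt_years = 8.0  # elapsed time between inspections, e.g., 2007 -> 2015

# g = (later - earlier) / elapsed time, per the formulas above
for dim in ("depth", "length", "width"):
    matched[f"g_{dim}"] = (matched[f"{dim}_2"] - matched[f"{dim}_1"]) / dt_years

# Rank by depth growth to surface the fastest-growing anomalies first
prioritized = matched.sort_values("g_depth", ascending=False)
```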

Interactive dashboards and outputs

  • Interactive metrics (depth, length, width) with web‑side histogram bin toggles (20/30/50/80).
  • Separate color scales per scatter; neat, aligned layout.
  • Alignment UI shows full matched pairs (no sampling), confidence filters, and show/hide unmatched.
  • Outputs:
    • Inline single‑page dashboard: outputs/dashboard_all_in_one_inline.html
    • Iframe consolidated dashboard: outputs/dashboard_all_in_one.html
    • Alignment pages: outputs/ili_alignment_2007_to_2015.html, outputs/ili_alignment_2015_to_2022.html
    • Winter‑themed deck: outputs/presentation_company_overview_winter_v2.pptx

Challenges we ran into

  • Cross‑run drift and inconsistent reference reporting required robust alignment and careful anchor selection.
  • Varying column names and missing fields caused runtime issues; we added resilient column selection and defaults (e.g., confidence fallback).
  • Inline embedding initially failed due to notebook scope and a confidence mapping bug—fixed with local helper definition, safe Series mapping, and robust distance column heuristics.
  • Iframe path visibility concerns were avoided by shipping a self‑contained single‑page dashboard.
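The resilient column selection mentioned above can be sketched as a small fallback helper; the candidate names and defaults are hypothetical, not the project's exact heuristics:

```python
import pandas as pd

# Hypothetical candidate names for the distance column across vendor formats
DIST_CANDIDATES = ["log_dist_ft", "odometer", "distance", "abs_distance"]

def pick_column(df: pd.DataFrame, candidates, default=None):
    """Return the first matching column, else a constant fallback Series."""
    for name in candidates:
        if name in df.columns:
            return df[name]
    # Broadcast the default so downstream code keeps working unchanged
    return pd.Series(default, index=df.index)

df = pd.DataFrame({"odometer": [10.0, 20.0], "depth_pct": [12, 30]})
dist = pick_column(df, DIST_CANDIDATES)        # finds "odometer"
conf = pick_column(df, ["confidence"], "Low")  # fallback confidence band
```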

Accomplishments that we're proud of

  • Full matched‑pair alignment visualizations with confidence filters and unmatched toggles.
  • A polished, single‑page interactive dashboard that stakeholders can use without setup.
  • Clear, AI‑style summaries embedded into the experience to accelerate understanding.
  • A winter‑themed executive deck suitable for company demos.

What we learned

  • Alignment across inspections is as much data hygiene as math—unit/format normalization matters.
  • Modular separation (alignment → matching → growth → UI) keeps the system understandable and testable.
  • Small UI details (bin toggles, per‑plot color bars, tidy layout) dramatically improve analyst speed.
  • Build for resilience: default confidence bands, flexible column selection, and a self‑contained inline page prevent demo‑time surprises.

What's next for RCP Anomaly Detection

  • A validation dataset will let us strengthen the pipeline and benchmark matching accuracy.
  • Today’s approach is largely rule‑based due to limited data; with past trends and labels, we’ll evolve to an ML matching pipeline (learned similarity + confidence estimation).
  • Add unmatched counts in subtitles; consider tabbed layouts and richer filtering.
  • Publish internally with SSO and automate refresh with each new ILI run.
  • Scale to multiple segments and add health scoring for risk‑based prioritization.

Built with

  • Languages & Libraries: Python 3.13, Pandas, Plotly (graph_objects/express), python‑pptx
  • Environment & Tools: Jupyter Notebook (ili_eda.ipynb), VS Code, virtualenv on macOS
  • UI/Controls: Web‑side Plotly buttons (bin toggles), confidence/unmatched filters
  • Optional AI: OpenAI summaries (with local fallback)
  • Data: CSVs per run (final results, matches/new/missing/aligned)

Try it locally (optional)

# Open the single-file dashboard
open /Users/satvikaakati/Desktop/Hack/outputs/dashboard_all_in_one_inline.html

# Open the winter-themed presentation
open /Users/satvikaakati/Desktop/Hack/outputs/presentation_company_overview_winter_v2.pptx
