Inspiration

The U.S. operates around 200,000 miles of high-voltage transmission lines that have to be inspected continuously to prevent outages. Today that inspection bottleneck sits with a human watching drone footage frame by frame, at roughly 25 miles per analyst per day. Vegetation contact is the leading cause of large-scale grid outages in North America and is regulated under NERC FAC-003-4, yet spotting an encroaching tree or a shattered porcelain disk in hours of aerial footage is exactly the kind of task fatigue degrades fastest.

We started GridSight from a single bet: video foundation models can replace frame-by-frame human review for the two anomaly types that drive most of the regulatory and reliability pressure on transmission operators (insulator damage and vegetation encroachment), provided the system is honest about its confidence and grounded in real regulatory thresholds rather than heuristic "anomaly scores."

What it does

GridSight ingests the two standard outputs of any drone inspection: a video file and a per-second telemetry stream (CSV or DJI SRT). It produces georeferenced, severity-scored findings ready to drop into a utility work-order system.

  • Marengo 3.0 indexes the curated video as a single multimodal asset. Natural-language queries ("missing or shattered porcelain insulator", "tree branches close to or touching power line conductors") surface candidate timestamps.
  • Pegasus 1.2 describes a 15-second evidence clip around each candidate with a structured JSON schema: component_type, condition (intact / damaged / contaminated / unclear), specific_defects, and a confidence tier.
  • A rules engine maps each finding to a severity tier anchored in the NERC FAC-003-4 Minimum Vegetation Clearance Distance (MVCD; 4.3 ft at 345 kV) and standard insulator failure modes. A combined_confidence field fuses Marengo similarity and Pegasus confidence for sorting (a sample finding record is sketched after this list).
  • Telemetry lookup attaches GPS, AGL altitude, heading, and ground speed to every finding.
  • Outputs: findings.json, findings.csv, findings.geojson, evidence clips, plus a Next.js dashboard with a Leaflet map, severity-coded pins, telemetry inspector, evidence playback, severity-heatmap timeline, and a downloadable findings-list PDF (browser-printable, no backend).
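
For concreteness, one finding record might look roughly like the sketch below. Only the fields named above (component_type, condition, specific_defects, the two confidence signals, severity, and the telemetry attributes) come from the pipeline description; the remaining key names and all values are illustrative assumptions.

```python
# Hypothetical shape of one record in findings.json (values and most key names are illustrative).
finding = {
    "finding_id": "F-0007",                      # assumed identifier scheme
    "timestamp_s": 412.0,                        # position of the candidate in the source video
    "component_type": "insulator",
    "condition": "damaged",                      # intact / damaged / contaminated / unclear
    "specific_defects": ["shattered porcelain disk"],
    "marengo_similarity": 0.78,                  # retrieval signal
    "pegasus_confidence": "high",                # description confidence tier
    "combined_confidence": 0.81,                 # fusion of the two signals, used for sorting
    "severity": "high",                          # healthy assets get "no_action"
    "telemetry": {                               # looked up from the per-second stream
        "lat": 40.1234, "lon": -88.5678,
        "agl_alt_m": 42.0, "heading_deg": 181.0, "ground_speed_mps": 5.2,
    },
    "evidence_clip": "clips/F-0007.mp4",         # the 15-second evidence clip
}
```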

A deliberate design choice: the pipeline emits a record for every observed asset, including healthy ones (severity: no_action). Filtering by condition happens in the dashboard, not in the pipeline before output. That keeps Marengo's false-positive surface visible instead of hidden, and it leaves the architecture compatible with full-inventory monitoring as a Workflow 03 extension.

How we built it

Two top-level units that talk only through static files on disk:

Python pipeline (pipeline/): seven stages (ingest, Marengo index, Marengo detect, clip extraction, Pegasus describe, severity + locate, exports), running against AWS Bedrock Runtime in us-east-1. Marengo uses the async pattern (start_async_invoke → poll → S3 embeddings); Pegasus uses the sync pattern (invoke_model) per clip. ffmpeg handles clip extraction. A real DJI SRT parser lives at scripts/srt_to_csv.py so a judge from the drone industry could hand us a real export and our pipeline would process it.
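
For orientation, the two invocation patterns look roughly like the sketch below, assuming boto3; the model IDs and request payload shapes are placeholders, not the actual Marengo/Pegasus schemas.

```python
import json
import time

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Async pattern (Marengo): start the indexing job, poll until it finishes, read results from S3.
job = bedrock.start_async_invoke(
    modelId="MARENGO_MODEL_ID",  # placeholder
    modelInput={"inputVideo": {"s3Uri": "s3://bucket/inspection.mp4"}},           # assumed shape
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://bucket/embeddings/"}},
)
arn = job["invocationArn"]
while bedrock.get_async_invoke(invocationArn=arn)["status"] == "InProgress":
    time.sleep(10)
# ...then fetch the embeddings written under the S3 output prefix.

# Sync pattern (Pegasus): one invoke_model call per 15-second evidence clip.
resp = bedrock.invoke_model(
    modelId="PEGASUS_MODEL_ID",  # placeholder
    body=json.dumps({"prompt": "Describe the component and its condition as JSON.",
                     "clip_s3_uri": "s3://bucket/clips/F-0007.mp4"}),             # assumed shape
)
description = json.loads(resp["body"].read())
```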

Next.js dashboard (app/): App Router, TypeScript, react-leaflet, Tailwind. Reads app/public/data/*.json and app/public/clips/*.mp4 as static assets: no API layer between Python and the UI, no live AWS calls in the dashboard, no auth, no backend. Anyone can clone the repo and run npm run dev to see the canonical run without Bedrock access. A single POST /api/reanalyze route handler spawns the pipeline as a detached subprocess for pre-stage demo prep, and the dashboard polls a run_status.json file for progress.
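
The progress protocol between the detached pipeline process and the polling dashboard is just a file. On the Python side it could look like the sketch below; the field names inside run_status.json are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path

STATUS_PATH = Path("app/public/data/run_status.json")  # the file the dashboard polls

def write_status(stage: str, completed: int, total: int) -> None:
    """Write progress atomically so the dashboard never reads a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=STATUS_PATH.parent)
    with os.fdopen(fd, "w") as f:
        json.dump({"stage": stage, "completed": completed, "total": total}, f)
    os.replace(tmp, STATUS_PATH)  # atomic rename

write_status("pegasus_describe", 3, 12)
```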

Validation harness: temporal IoU matching with greedy highest-IoU-first pairing; both a strict-IoU read and a clip-normalized read (ground-truth windows shorter than 15 s are expanded around their midpoint to match the product's fixed clip length); full FP/FN attribution; severity calibration on matched pairs.
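
A minimal sketch of that matching logic, assuming predictions and ground truth are (start, end) windows in seconds:

```python
def iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) windows."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def expand_to_clip(window: tuple[float, float], clip_len: float = 15.0) -> tuple[float, float]:
    """Clip-normalized read: grow short ground-truth windows around their midpoint."""
    if window[1] - window[0] >= clip_len:
        return window
    mid = (window[0] + window[1]) / 2
    return (mid - clip_len / 2, mid + clip_len / 2)

def greedy_match(preds, truths, threshold=0.5):
    """Pair predictions with ground truth, highest IoU first; leftovers become FPs and FNs."""
    scored = sorted(
        ((iou(p, t), i, j) for i, p in enumerate(preds) for j, t in enumerate(truths)),
        reverse=True,
    )
    used_p, used_t, matches = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break
        if i not in used_p and j not in used_t:
            used_p.add(i); used_t.add(j); matches.append((i, j, score))
    false_pos = [i for i in range(len(preds)) if i not in used_p]
    false_neg = [j for j in range(len(truths)) if j not in used_t]
    return matches, false_pos, false_neg
```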

Severity rules: NERC FAC-003-4 MVCD values expressed as multiples of 4.3 ft (the 345 kV distance), plus insulator failure modes from industry references. Voltage-class agnostic by design; switching to 230 / 500 / 765 kV is a one-line config change.
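
To make the tiering concrete, a sketch is below. The 230/500/765 kV distances, the multiplier thresholds, and the tier names are illustrative stand-ins; only the 4.3 ft figure at 345 kV is stated above.

```python
MVCD_FT = {230: 3.0, 345: 4.3, 500: 6.8, 765: 9.9}   # illustrative except for the 345 kV value

def vegetation_severity(clearance_ft: float, voltage_kv: int = 345) -> str:
    """Tier a vegetation finding by its clearance expressed as a multiple of the MVCD."""
    mvcd = MVCD_FT[voltage_kv]            # switching voltage class is the one-line config change
    ratio = clearance_ft / mvcd
    if ratio <= 1.0:
        return "critical"                 # at or inside the minimum clearance distance
    if ratio <= 2.0:
        return "high"
    if ratio <= 4.0:
        return "monitor"
    return "no_action"
```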

Challenges we ran into

  • Pegasus drifted off the JSON format more than once. We landed on an explicit schema-with-example prompt plus a fallback parser that handles trailing commas and stray prose around the JSON (a sketch of that recovery step follows this list).
  • Marengo recall was variable for vegetation. We addressed it by running multiple query phrasings per anomaly type and merging timestamps that fall within a 10-second window (merge sketch after this list). Vegetation F1 climbed from 0.15 (strict IoU) to 0.62 (clip-normalized) once we measured what the product actually ships.
  • YouTube footage strips drone telemetry, so spatial context had to come from somewhere. Rather than hand-wave, we picked a real Illinois transmission corridor, generated per-second telemetry in the standard format along it, disclosed the simulation in the README and demo, and shipped a real DJI SRT parser so the production-compatibility claim is concrete.
  • Ground-truth windows were as short as 1–4 seconds against our fixed 15-second evidence clips, which made strict IoU ≥ 0.5 unreachable for short defects even with perfect localization. We added a clip-normalized companion metric (publicly documented, reproducible) and reported both numbers. The strict-IoU number stays in the report as the conservative boundary score.
  • react-leaflet doesn't play well with Next.js SSR. Standard fix: dynamic(() => import(...), { ssr: false }).
  • Pipeline ↔ dashboard contract drift is the kind of bug that hides until demo day. We wrote a TypeScript Finding interface in app/types/findings.ts so any shape mismatch surfaces at npm run dev rather than as a blank panel.
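
The fallback JSON recovery from the first bullet, as a rough sketch of the general approach (not the exact parser):

```python
import json
import re

def parse_pegasus_json(raw: str) -> dict:
    """Best-effort recovery of a JSON object from model output that drifted off format."""
    start, end = raw.find("{"), raw.rfind("}")               # drop any prose around the object
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model output")
    candidate = raw[start : end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        cleaned = re.sub(r",\s*([}\]])", r"\1", candidate)   # strip trailing commas, retry
        return json.loads(cleaned)
```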
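
And the 10-second merge from the second bullet, again as a sketch; representing each merged group by its mean timestamp is an assumption.

```python
def merge_candidates(timestamps_s: list[float], window_s: float = 10.0) -> list[float]:
    """Collapse hits from multiple query phrasings that land within one window of each other."""
    groups: list[list[float]] = []
    for t in sorted(timestamps_s):
        if groups and t - groups[-1][-1] <= window_s:
            groups[-1].append(t)                  # same underlying candidate
        else:
            groups.append([t])                    # start a new candidate
    return [sum(g) / len(g) for g in groups]      # one representative timestamp per group

merge_candidates([12.0, 14.5, 19.0, 87.2])        # -> [15.166..., 87.2]
```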

Accomplishments that we're proud of

  • End-to-end flow from 13:32 of raw 1080p footage to a georeferenced, severity-scored, evidence-clipped findings table, running in about 6 minutes cold-cache on Bedrock.
  • Two-pass video understanding (Marengo retrieves → Pegasus describes) wired against both Bedrock invocation patterns, with on-disk caches so that iterating on severity rules and exports takes seconds.
  • A dashboard that doesn't need AWS credentials to run. Judges can clone the repo and see the canonical demo in one command.
  • A validation report that publishes the unflattering numbers: a strict-IoU F1 of 0.17 alongside the clip-normalized 0.42, with every FP and FN attributed, instead of cherry-picking one read.
  • An operational impact brief grounded in the challenge brief's own throughput, cost, and per-incident numbers. Not "10×" hand-waving.
  • A real DJI SRT parser, not a stub.

What we learned

  • Retrieve-then-describe beats describe-everything. Marengo's per-clip embedding catches motion and scene context that single-frame CV misses, and using Pegasus only on the candidate clips keeps cost and latency tractable.
  • Asset-centric beats anomaly-only. Recording healthy assets in the same schema as defects made the validation harness simpler, kept the false-positive surface visible, and left a clean substrate for full-inventory monitoring and SCADA correlation.
  • Static files are an underrated contract. No API layer between the pipeline and the dashboard meant no contract drift, no spinners on stage, and no "demo gods, please" before the live presentation.
  • Anchor severity in regulation, not heuristics. Tiering vegetation against MVCD multiples, not similarity scores, made every severity claim defensible to a domain expert.
  • Measure what the product actually ships. Strict IoU ≥ 0.5 against 1-second ground-truth windows wasn't the right read for a system whose output is fixed-length 15-second clips. Adding the clip-normalized metric and being transparent about both was the honest answer.

What's next for GridSight

  • Workflow 03 maintenance correlation. Joining findings against a maintenance-history CSV by asset ID, with a composite risk score = visual_severity_weight × maintenance_recency_weight (see the sketch after this list). The asset-centric data model already supports it.
  • Broader inventory queries. Adding inventory queries to the active set turns the dashboard from "we found problems" into "we provide visibility into your full asset base." About 30–60 minutes of pipeline work plus a coverage-summary header in the dashboard.
  • Cross-run predictive maintenance. Hashing each finding's GPS to a stable asset identifier (one possible approach sketched below) and comparing condition assessments across inspection dates to surface degradation progression.
  • Real drone footage with native telemetry. The DJI SRT parser already handles it; we'd love to point GridSight at an actual utility's raw inspection export.
  • Field-crew route optimization layered on the existing GeoJSON output, dispatching ground crews by severity-weighted urgency.
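
A sketch of the Workflow 03 join and composite score from the first item; the CSV column names, severity weights, and five-year normalization are assumptions.

```python
import csv

SEVERITY_WEIGHT = {"no_action": 0.1, "monitor": 0.4, "high": 0.7, "critical": 1.0}   # assumed tiers

def maintenance_recency(path: str) -> dict[str, float]:
    """Map asset_id -> recency weight; the longer since last service, the closer to 1.0."""
    weights = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):                          # assumed columns
            years = float(row["years_since_service"])
            weights[row["asset_id"]] = min(1.0, years / 5.0)
    return weights

def composite_risk(finding: dict, recency: dict[str, float]) -> float:
    """risk = visual_severity_weight * maintenance_recency_weight."""
    visual = SEVERITY_WEIGHT.get(finding["severity"], 0.1)
    return visual * recency.get(finding["asset_id"], 0.5)      # neutral default with no history
```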
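
And one plausible way to hash GPS into a stable asset identifier for the cross-run item: snap coordinates to a coarse grid so repeated passes over the same structure produce the same ID. The grid size and hashing scheme are assumptions.

```python
import hashlib

def asset_id(lat: float, lon: float, grid_deg: float = 0.0005) -> str:
    """Snap coordinates to a ~50 m grid, then hash the cell into a short stable identifier."""
    cell = f"{round(lat / grid_deg) * grid_deg:.4f},{round(lon / grid_deg) * grid_deg:.4f}"
    return hashlib.sha1(cell.encode()).hexdigest()[:10]

# Two fixes a few metres apart on the same tower land in the same cell, hence the same ID.
asset_id(40.12341, -88.56779) == asset_id(40.12338, -88.56781)   # -> True
```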
