Aegis

Know the storm before your bits flip. Aegis is a system that turns live space weather and device telemetry into actionable radiation risk for edge compute—so teams can throttle writes, checkpoint work, and ride out solar events instead of discovering corruption after the fact.

We combine NOAA L1 feeds (solar wind, magnetometer, differential protons, X-rays), a LightGBM forecast trained on years of GOES-18 + ACE/DSCOVR history, and a transparent risk layer that maps environment + wear into severity tiers and recommended mitigations. A Next.js dashboard shows fleet health at a glance; device drill-downs surface live telemetry, factor breakdowns, and forecast curves. The dashboard polls an ESP32 gateway for live Geiger counts and per-chip NVS wear data, persisting snapshots into Postgres on every forecast cycle.

Scope: Aegis forecasts environmental radiation severity and operational risk—it does not claim to predict individual bit flips. That boundary keeps the science honest and the product defensible.


Why it matters

Solar storms and elevated particle flux raise the odds of silent data errors, especially on constrained flash and unshielded nodes. Operators rarely get a single “radiation dial” tied to now and the next few hours. Aegis closes that gap with:

  • Minute-resolved context from the same differential channels the model was trained on (not a mismatched integral feed).
  • Multi-horizon outlook (e.g. 2h curve from the Flask service, 6h/12h scalars from merged NOAA + optional hosted ML).
  • Risk + actions the UI can explain—tiers, factors, and ready-to-use mitigation copy for demos (a rough tier-mapping sketch follows this list).
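
To make the tiers concrete, here is a rough Python sketch of an environment + wear mapping. The weights, thresholds, and tier names below are invented for illustration; the real logic lives in the dashboard's risk layer.

# Hypothetical sketch: weights, thresholds, and tier names are invented here;
# see the dashboard risk layer for the actual mapping.
def risk_tier(flux_score: float, wear_score: float) -> tuple[str, dict]:
    """Combine normalized environment (0-1) and NVS wear (0-1) into a severity
    tier, keeping the factor breakdown so the UI can explain the result."""
    factors = {"environment": 0.7 * flux_score, "wear": 0.3 * wear_score}
    score = sum(factors.values())
    if score < 0.25:
        return "low", factors
    if score < 0.5:
        return "elevated", factors
    if score < 0.75:
        return "high", factors
    return "severe", factors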

Architecture (high level)

flowchart LR
  subgraph ingest [Ingest]
    NOAA[NOAA SWPC JSON]
    ESP[ESP32 gateway]
  end
  subgraph compute [Compute]
    Flask[Flask forecast :3002]
    Next[Next.js dashboard :3000]
  end
  subgraph optional [Optional]
    DBX[Databricks serving]
    Remote[REMOTE_FORECAST_URL]
  end
  NOAA --> Next
  Flask --> Next
  Remote --> Next
  DBX --> Next
  ESP --> Next
  • dashboard/ — Next.js 15 app: fleet grid, device pages, GET /api/forecast, GET /api/risk, tRPC + Drizzle scaffold. Polls the ESP32 gateway on each forecast cycle to persist Geiger + NVS wear data.
  • forecast_service/ — Flask POST /forecast returning a 120-step proton trajectory from the baseline model bundle in artifacts/.
  • training/ — LightGBM baseline + onset-classifier scaffolding; Databricks copies under databricks/.
  • data/cleaned_data/ — Pipelines for ml_training_data_v2 / v3 (Parquet); large artifacts via Git LFS.

Quick start

1. Clone with Git LFS

Large datasets (ml_training_data_v2.csv, Parquet files) are stored with Git LFS. After cloning:

git lfs install
git lfs pull

2. Install everything

From the repo root:

sh install-all.sh

This installs dashboard npm dependencies and Python deps for forecast_service.

3. Environment files

Create .env files from the examples (values are local-dev defaults; adjust as needed). A filled-in sample appears after the lists below:

  • Dashboard — copy dashboard/.env.example to dashboard/.env

Dashboard essentials

  • DATABASE_URL — Postgres URL (Drizzle / legacy demos). The forecast routes can run without heavy DB usage, but the template expects it to be set.
  • FLASK_FORECAST_URL — Defaults to http://localhost:3002 so the app can merge the 2h model output with live NOAA data.

Optional env (see dashboard/.env.example)

  • ESP32_GATEWAY_BASE_URL — URL of the ESP32 gateway; polled each forecast cycle for Geiger CPM and per-chip NVS wear.
  • REMOTE_FORECAST_URL — Hosted endpoint serving GET /api/forecast-compatible JSON for the 6h/12h blocks.
  • DATABRICKS_FORECAST_URL + DATABRICKS_TOKEN — Wire in Databricks model serving that accepts dataframe_records.
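
For reference, a local-dev dashboard/.env might look like the sketch below. Every value is an illustrative placeholder (the Postgres URL in particular is an assumption); dashboard/.env.example remains the authoritative template.

# Illustrative local-dev values; dashboard/.env.example is the real template.
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/aegis  # assumed local Postgres, adjust
FLASK_FORECAST_URL=http://localhost:3002

# Optional integrations, uncomment and fill in as needed:
# ESP32_GATEWAY_BASE_URL=http://<gateway-ip>
# REMOTE_FORECAST_URL=https://<host>/api/forecast
# DATABRICKS_FORECAST_URL=https://<workspace-host>/serving-endpoints/<endpoint>/invocations
# DATABRICKS_TOKEN=<personal-access-token>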

4. Run the stack

One terminal (both services):

sh start-all.sh

This starts:

  • the Next.js dashboard on http://localhost:3000
  • the Flask forecast service on http://localhost:3002

Press Ctrl+C to stop both.

Or run à la carte

# Terminal A — dashboard
cd dashboard && npm run dev

# Terminal B — forecast API (from the repo root)
python -m forecast_service.app

How to use the app (demo flow)

Fleet overview

  1. Open http://localhost:3000.
  2. You’ll see a fleet grid of cards (demo nodes), an alert banner, and a top bar with sync hints.
  3. Each card shows risk tier, flip probability, wear, and service status (telemetry / forecast / wear).

Device detail

  1. Click any card (or go directly to /device/demo-node-01, demo-node-02, or demo-node-03).
  2. Left column: risk summary, live telemetry (radiation / magnetic), wear detail.
  3. Right column: live forecast and live risk panels (fed from the merged forecast + risk APIs when the backend is up), factor breakdown, forecast chart, risk history, recommended actions.
  4. Bottom: computation demo strip for the “workload under stress” narrative.

Note: Fleet and device body content still flows from dashboard/src/lib/mock-data.ts for fast UI iteration. Forecast/risk panels use the live /api/forecast and /api/risk integration path described in dashboard/README.md. Swap the mock imports for tRPC/DB queries when you wire up persistent devices.

API smoke tests (for judges / integration)

With the dashboard running:
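
# Assumes the default ports from this README; response shapes are documented
# in dashboard/README.md and forecast_service/README.md.
curl -s http://localhost:3000/api/forecast   # merged forecast (2h curve + 6h/12h blocks)
curl -s http://localhost:3000/api/risk       # risk tiers + factor breakdowns

# Optional: query the Flask model directly. The '{}' body is a placeholder;
# see forecast_service/README.md for the expected payload.
curl -s -X POST http://localhost:3002/forecast -H 'Content-Type: application/json' -d '{}'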

More detail: dashboard/README.md, forecast_service/README.md.


npm from the repo root

There is no single root node_modules; each package is separate. After install-all.sh:

  • npm run dev — Next.js dev server (dashboard/)
  • npm run build — Production build
  • npm test — Dashboard Vitest suite
  • npm run lint — ESLint (dashboard)

Data & ML (subsystem overview)

  • Raw sources (local, gitignored): ACE/DSCOVR solar wind, GOES-18 SGPS proton CSVs, GOES-18 XRS NetCDF—see CLAUDE.md for paths and the v3 feature table.
  • Canonical training file: data/cleaned_data/ml_training_data_v3.parquet (v2 + XRS derivatives).
  • Scripts: clean_dataset_v2.py, fetch_xray.py, merge_xray_v3.py, etc., with a suggested venv under data/.venv/ (pandas, numpy, pyarrow, xarray, netCDF4, tqdm).
  • Training: training/aegis_baseline.py — per-horizon regression on forward-max log10(J_gt_10MeV) with chronological splits (validation includes the May 2024 Gannon storm window). A label-construction sketch follows this list.
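
To make the target concrete: for each horizon, the label is the maximum of log10 >10 MeV proton flux over the next N minutes. A minimal pandas sketch, assuming minute-resolution rows and a hypothetical j_gt_10mev column name (the real implementation is training/aegis_baseline.py):

import numpy as np
import pandas as pd

def forward_max_label(df: pd.DataFrame, horizon_minutes: int) -> pd.Series:
    """Forward-looking max of log10 flux; column name and flux floor are assumptions."""
    log_flux = np.log10(df["j_gt_10mev"].clip(lower=1e-6))  # floor avoids log10(0)
    # Positional rolling max over the *next* horizon_minutes samples:
    # reverse, take a trailing rolling max, then reverse back.
    return log_flux[::-1].rolling(horizon_minutes, min_periods=1).max()[::-1]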

Team

He++ — Sean · Deep · Evin · Nico · Ethan

(Subsystem notes and API contracts live in claude/CLAUDE.md and CLAUDE.md.)


License / hackathon

Built for a hackathon demo: verify assumptions before any production deployment; space-weather products and model outputs are not a substitute for mission-critical radiation hardening analysis.
