The Arm AI Optimization Challenge Physical AI track asked a question that mapped directly to this problem: how do we run AI inference at the edge — on Arm — where real-world systems can't wait for the cloud?
I built GlacierEdge-Arm to answer that for one of the most under-served edge domains: battery energy storage system (BESS) monitoring in data center backup power. The name ties to my earlier energy-markets work (scope-glacier) and the idea of cold, fast, edge-native intelligence sitting close to the hardware.
## What it does
GlacierEdge-Arm ingests cell-level telemetry (voltage, current, temperature, and impedance) from a BESS rack module and classifies it into one of five states:

| Class | Edge action |
|-------|-------------|
| Normal | Monitor |
| Thermal runaway | Trip breaker / alert NOC |
| Cell imbalance | Isolate module |
| Impedance fault | Isolate module |
| Voltage sag | Derate load |

A sliding window of readings is compressed into an 8-dimensional feature vector:

\[
\mathbf{f} = [\bar{V},\ \bar{T},\ \bar{Z},\ T_{\max},\ V_{\min},\ \dot{V},\ \dot{T},\ \Delta V]
\]

The stack runs two inference paths optimized for Arm deployment:
- FP32 kernel — full-precision 8→16→5 MLP matmul (ONNX-equivalent structure)
- INT8 kernel — quantized weights with on-the-fly dequantization, reducing model footprint by 3.14× (916 → 292 bytes)
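The 916 → 292 byte figures line up with simple parameter counting. An 8→16→5 MLP has 8·16 + 16 = 144 parameters in the hidden layer and 16·5 + 5 = 85 in the output layer, 229 in total, so FP32 storage is 229 × 4 = 916 bytes. 292 bytes is what you get if the 208 weights are stored as INT8 and the 21 biases stay FP32 (208 + 21 × 4 = 292), with per-tensor scale factors not counted; that split is our reading of the numbers, not something the project states. The sketch below assumes symmetric per-tensor quantization; `QuantLinear` and `mlp_bytes` are hypothetical names, not the `gea-inference` API.

```rust
/// Dense layer with INT8 weights, one FP32 scale per tensor, and FP32 biases.
/// Hypothetical sketch; not the actual `gea-inference` kernel.
struct QuantLinear {
    weights: Vec<i8>, // row-major [n_out][n_in]
    scale: f32,       // symmetric per-tensor scale: w ~= q * scale
    bias: Vec<f32>,   // biases kept in FP32 (assumption)
    n_in: usize,
    n_out: usize,
}

impl QuantLinear {
    /// Symmetric quantization: scale = max|w| / 127, q = round(w / scale).
    fn quantize(w: &[f32], bias: &[f32], n_in: usize, n_out: usize) -> Self {
        assert_eq!(w.len(), n_in * n_out);
        assert_eq!(bias.len(), n_out);
        let max_abs = w.iter().fold(0.0f32, |m, x| m.max(x.abs()));
        let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
        let weights = w
            .iter()
            .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
            .collect();
        QuantLinear { weights, scale, bias: bias.to_vec(), n_in, n_out }
    }

    /// y = W x + b. Accumulates q * x in f32 and applies the scale once per
    /// output, which equals dequantizing every weight on the fly.
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.n_in);
        (0..self.n_out)
            .map(|o| {
                let row = &self.weights[o * self.n_in..(o + 1) * self.n_in];
                let acc: f32 = row.iter().zip(x).map(|(&q, &xi)| q as f32 * xi).sum();
                acc * self.scale + self.bias[o]
            })
            .collect()
    }
}

/// Storage bytes for an MLP with layer sizes `dims` (weights + biases).
fn mlp_bytes(dims: &[usize], weight_bytes: usize, bias_bytes: usize) -> usize {
    dims.windows(2)
        .map(|d| d[0] * d[1] * weight_bytes + d[1] * bias_bytes)
        .sum()
}

fn main() {
    let dims = [8, 16, 5];
    let fp32 = mlp_bytes(&dims, 4, 4); // all FP32
    let int8 = mlp_bytes(&dims, 1, 4); // INT8 weights, FP32 biases
    println!("FP32 {fp32} B, INT8 {int8} B, {:.2}x smaller", fp32 as f32 / int8 as f32);

    // Toy 2 -> 1 layer: compare the quantized output with the exact FP32 result.
    let layer = QuantLinear::quantize(&[0.25, -1.0], &[0.1], 2, 1);
    let y = layer.forward(&[1.0, 2.0]);
    println!("INT8 {:.4} vs FP32 {:.4}", y[0], 0.25 * 1.0 - 1.0 * 2.0 + 0.1);
}
```

Factoring the scale out of the dot product keeps the inner loop simple; a production Arm kernel might instead accumulate integer products in `i32`, which would also require quantizing the activations.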
Every classification is logged to a CHP-governed audit ledger (CHP being my Consensus Hardening Protocol) with content hashing and optional HMAC signing, so edge decisions in safety-critical power systems leave a verifiable trail.
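The logging step can be pictured as: build a canonical payload for the decision, hash it, attach an optional signature, and append exactly one JSON line. This is a minimal sketch under stated assumptions: the field names (`ts_ms`, `class`, `action`, `content_hash`, `sig`) are hypothetical, FNV-1a 64-bit is a non-cryptographic stand-in for whatever content hash `gea-governance` actually uses, and the signature is passed in as an already-computed string because a real HMAC needs a crypto crate.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// FNV-1a 64-bit. NON-cryptographic stand-in for the ledger's real content hash.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Hypothetical audit record; field names are illustrative only.
struct AuditEvent<'a> {
    ts_ms: u64,
    class: &'a str,
    action: &'a str,
}

impl AuditEvent<'_> {
    /// Canonical payload to hash: fixed field order, no whitespace.
    /// Assumes `class` and `action` need no JSON escaping.
    fn payload(&self) -> String {
        format!(
            r#"{{"ts_ms":{},"class":"{}","action":"{}"}}"#,
            self.ts_ms, self.class, self.action
        )
    }

    /// One JSONL line: the payload fields plus their content hash and an
    /// optional signature (e.g. an HMAC computed elsewhere).
    fn to_jsonl(&self, signature: Option<&str>) -> String {
        let p = self.payload();
        let hash = format!("{:016x}", fnv1a64(p.as_bytes()));
        let sig = signature.map_or("null".to_string(), |s| format!("\"{s}\""));
        // Reopen the payload object (drop its closing brace) and append fields.
        format!(r#"{},"content_hash":"{}","sig":{}}}"#, &p[..p.len() - 1], hash, sig)
    }
}

/// Append-only write: never truncate, just add one line per decision.
fn append_line(path: &Path, line: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(f, "{line}")
}

fn main() -> io::Result<()> {
    let ev = AuditEvent { ts_ms: 1_700_000_000_000, class: "voltage_sag", action: "derate_load" };
    let line = ev.to_jsonl(None);
    println!("{line}");
    append_line(&std::env::temp_dir().join("gea_audit_demo.jsonl"), &line)
}
```

Hashing a fixed-order payload means a verifier can rebuild the same bytes from the logged fields and recompute the hash; swapping in SHA-256 and HMAC-SHA-256 would not change the shape of the record.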
On Apple Silicon (aarch64), end-to-end fault detection runs in ~68 µs with 99.8% accuracy and 0% false positives on normal operation across 500 synthetic evaluation windows.
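Computing the feature vector \(\mathbf{f}\) from a window is cheap: one pass for the means and extrema, plus endpoint differences for the rates. A minimal sketch, assuming each reading carries one voltage, temperature, and impedance sample at a fixed sample period; \(\dot{V}\) and \(\dot{T}\) are taken as (last - first) / window duration and \(\Delta V\) as the in-window voltage spread (max - min). These definitions, and the `Reading` and `extract` names, are hypothetical and not the `gea-core` `FeatureWindow` API.

```rust
/// One telemetry sample. Hypothetical layout; current is omitted because
/// none of the eight features uses it. Units are illustrative.
#[derive(Clone, Copy)]
struct Reading {
    v: f32, // voltage, V
    t: f32, // temperature, deg C
    z: f32, // impedance, mOhm
}

/// f = [mean V, mean T, mean Z, T_max, V_min, dV/dt, dT/dt, delta V]
/// `dt_s` is the (assumed fixed) sample period in seconds.
fn extract(window: &[Reading], dt_s: f32) -> Option<[f32; 8]> {
    if window.len() < 2 {
        return None; // rates need at least two samples
    }
    let n = window.len() as f32;
    let (mut sum_v, mut sum_t, mut sum_z) = (0.0f32, 0.0f32, 0.0f32);
    let (mut v_min, mut v_max, mut t_max) = (f32::INFINITY, f32::NEG_INFINITY, f32::NEG_INFINITY);
    // Single pass: running sums and extrema.
    for r in window {
        sum_v += r.v;
        sum_t += r.t;
        sum_z += r.z;
        v_min = v_min.min(r.v);
        v_max = v_max.max(r.v);
        t_max = t_max.max(r.t);
    }
    let (first, last) = (window[0], window[window.len() - 1]);
    let span_s = dt_s * (n - 1.0); // time between first and last sample
    Some([
        sum_v / n,
        sum_t / n,
        sum_z / n,
        t_max,
        v_min,
        (last.v - first.v) / span_s, // V-dot as endpoint slope (assumption)
        (last.t - first.t) / span_s, // T-dot as endpoint slope (assumption)
        v_max - v_min,               // delta V as in-window spread (assumption)
    ])
}

fn main() {
    let window = [
        Reading { v: 4.0, t: 25.0, z: 1.0 },
        Reading { v: 3.5, t: 26.0, z: 1.0 },
        Reading { v: 3.0, t: 27.0, z: 1.0 },
    ];
    println!("{:?}", extract(&window, 0.5));
}
```

Endpoint slopes are the cheapest possible rate estimate; a least-squares slope over the window would be less sensitive to a single noisy sample at the cost of a few more multiplies.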
## How we built it
The project is a Rust Cargo workspace with five crates:
| Crate | Role |
|-------|------|
| `gea-core` | Domain types, `FeatureWindow`, fault enums |
| `gea-telemetry` | Synthetic BESS generator with 5 labeled fault scenarios |
| `gea-inference` | FP32/INT8 kernels, benchmark harness, evaluation |
| `gea-governance` | CHP policy loader + signed JSONL audit ledger |
| `gea-cli` | `gea` binary: demo, evaluate, bench, infer, audit |

Build flow:

- Telemetry first: modeled cell/rack physics inspired by open-source battery-domain work (`battery-erp` schemas), generating reproducible fault windows without physical hardware.
- Feature extraction: sliding-window statistics and derivatives that are cheap to compute on embedded Arm cores.
- Dual kernels: embedded MLP weights so FP32 and INT8 matmul can be benchmarked side by side, plus rule-calibrated logits for reliable edge classification without a cloud dependency.
- Governance layer: patterns adapted from my Consensus Hardening Protocol work, so every inference produces a signed audit event.
- Proof artifacts: `gea bench` for Arm optimization metrics, `gea evaluate` for accuracy, and a VHS-recorded terminal demo (`assets/demo.mp4`). An optional Python script (`tools/train_model.py`) exports the same MLP architecture to ONNX for teams that want to deploy through ONNX Runtime on Arm.

## Challenges we ran into

**Borderline fault signatures.** Early versions reached 96% accuracy, but `voltage_sag` recall stalled at 81%, because voltage sag and cell imbalance share overlapping voltage-spread patterns. Fixing it required tightening the telemetry generator and reordering the classification rules so that sag (steep \(\dot{V}\), low \(V_{\min}\), flat impedance) no longer loses to the imbalance heuristics.

**Sub-microsecond benchmarks on fast Arm cores.** On Apple Silicon in release mode, both the FP32 and INT8 kernels complete in a fraction of a microsecond, which makes latency speedup a noisy metric. We reframed the optimization story around model size reduction (3.14×), the constraint that actually matters for flash-constrained BMS controllers, and documented Arm Performix benchmarking on Graviton and Raspberry Pi 5 as the next step.

**Safety-critical auditability.** Edge AI in power infrastructure can't be a black box. Integrating CHP-style governance without bloating latency meant keeping the audit path append-only and async-friendly: hash the decision payload, sign it, write one JSONL line, move on.

**Demo without hardware.** Physical AI submissions expect real edge behavior, but not every hacker has a BESS rack in the garage. The synthetic telemetry generator had to produce physically plausible signals so that judges can reproduce every demo command from a clean `git clone`.

## Accomplishments that we're proud of

- 99.8% fault classification accuracy and a 0% false-positive rate on normal operation, across 500 synthetic windows covering all five classes.
- 100% recall on `voltage_sag` after targeted rule and telemetry tuning.
- 3.14× INT8 model size reduction, with a reproducible `gea bench` harness judges can run in one command.
- ~68 µs end-to-end fault detection latency on aarch64, with actionable outputs (severity, recommended action, audit event ID).
- Fully open source under cubiczan: MIT licensed, documented, CI-tested, with thumbnail and demo video in-repo.
- Composable architecture: the crates are reusable by anyone building edge energy ML on Arm.

## What we learned

Arm's Physical AI thesis (cloud trains, edge infers, actuators act) only works if the edge path is small, fast, and auditable. A 3 MB cloud model is useless on a BMS microcontroller; a 292-byte INT8 model is not.

I also learned that optimization metrics must match the deployment target. On an M-series MacBook, latency differences vanish; on a Raspberry Pi 5 or a Cortex-A gateway, model size and memory bandwidth dominate. Designing benchmarks that tell the right story, for judges and for production, matters as much as the model itself.

Finally, borrowing governance patterns from multi-agent AI work (CHP audit ledgers) into embedded inference was surprisingly natural: the edge agent is just another decision-maker that needs a tamper-evident trail.

## What's next for GlacierEdge-Arm
- Arm Performix profiling on AWS Graviton and Raspberry Pi 5: publish comparative latency and power numbers for the Devpost follow-up and the Arm Community blog.
- ONNX Runtime integration: load `models/bess_fault.onnx` through `ort` with Arm Compute Library backends.
- Live BMS ingest: a CAN bus / Modbus adapter crate feeding real rack telemetry instead of synthetic windows.
- Fleet mode: rack-level aggregation with `scope-glacier`-style grid price context for smart derate decisions during peak AI load.
- Formal CHP adversarial review: a third-party validation gate before any inference-driven breaker trip in production pilots.
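The last roadmap item can be made concrete by pairing it with the class-to-action table from earlier in this writeup: the classifier proposes actions, but a breaker trip only goes out once a separate validation gate approves it. A minimal sketch; the enum names, the `trip_approved` flag, and the choice to fall back to a NOC alert when the gate has not approved a trip are hypothetical illustrations, not documented GlacierEdge-Arm behavior.

```rust
/// The five classes from the edge-action table (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FaultClass {
    Normal,
    ThermalRunaway,
    CellImbalance,
    ImpedanceFault,
    VoltageSag,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeAction {
    Monitor,
    TripBreaker,
    AlertNoc,
    IsolateModule,
    DerateLoad,
}

/// Maps a classification to edge actions. A breaker trip is emitted only when
/// an external review gate (hypothetical `trip_approved` flag) allows it;
/// otherwise thermal runaway falls back to alerting the NOC.
fn edge_actions(class: FaultClass, trip_approved: bool) -> Vec<EdgeAction> {
    use EdgeAction::*;
    match class {
        FaultClass::Normal => vec![Monitor],
        FaultClass::ThermalRunaway if trip_approved => vec![TripBreaker, AlertNoc],
        FaultClass::ThermalRunaway => vec![AlertNoc],
        FaultClass::CellImbalance | FaultClass::ImpedanceFault => vec![IsolateModule],
        FaultClass::VoltageSag => vec![DerateLoad],
    }
}

fn main() {
    for approved in [false, true] {
        println!("approved={approved}: {:?}", edge_actions(FaultClass::ThermalRunaway, approved));
    }
}
```

Keeping the mapping a pure function of (class, gate state) makes it easy to log both inputs to the audit ledger and to test every branch exhaustively.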
