SmartWatt: Adaptive Load Forecasting for India

Project Story — About the Project

Inspiration

In our department building, HVAC, lighting, and lab machines often stay on after hours. Cutting power by floor reduces waste but breaks experiments, overnight computations, and early classes. That small pain reflects a national need: reduce building energy use without blunt shutdowns. This motivated SmartWatt, an AI system that forecasts short-term load and triages anomalies in a way operators can trust.

SmartWatt — Few‑shot Load Forecasting & Calibrated Triage

What it does

SmartWatt delivers short‑term (24‑hour) building‑load forecasts from 168 hours of context, with few‑shot transfer on Indian buildings.
Uses pretrained IBM Granite Time Series – TinyTimeMixer (TTM) backbones (both r1 and r2 checkpoints), fine‑tuned on our windowed data.
Produces robust submissions via seed ensembling inside each family and cross‑family blending (R1+R2) for better generalization to the hidden leaderboard.
Keeps the model simple and stable within hackathon runtime limits: encoder frozen, decoder + head trained, RPT disabled, batch size 32 with gradient accumulation 2 (GA=2).

Approach

Data → Features → Windows

Construct 192‑step windows per building/region with roles (input: 168 / target: 24).
Impute missing metadata values with XGBoost.
Add calendar features: hour, dayofweek, month, hour_sin, hour_cos.
Engineer around 25 features, including isweekend, area per person, people per sqft, fans per room, deviation from region area, is area missing, and is inverter missing.
One‑hot region using train‑only categories (prevents leakage); median‑impute numeric controls.
Enforce float32 everywhere to avoid object → torch dtype issues.
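
A minimal sketch of the windowing, calendar-feature, and leak-safe one-hot steps listed above, assuming hourly data held in a pandas DataFrame. The column names (`timestamp`, `load`, `region`) and the helper names are hypothetical and are not taken from the project code.

```python
import numpy as np
import pandas as pd

CONTEXT, HORIZON = 168, 24  # 192-step window = 168 input + 24 target


def add_calendar_features(df: pd.DataFrame, ts_col: str = "timestamp") -> pd.DataFrame:
    """Append hour, dayofweek, month and a cyclic encoding of the hour."""
    out = df.copy()
    ts = pd.to_datetime(out[ts_col])
    out["hour"] = ts.dt.hour
    out["dayofweek"] = ts.dt.dayofweek
    out["month"] = ts.dt.month
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    return out


def one_hot_region(train: pd.DataFrame, other: pd.DataFrame, col: str = "region"):
    """One-hot encode with categories seen in train only.

    Regions that appear only in `other` become all-zero rows, so no
    information from validation/test leaks into the encoding.
    """
    cats = sorted(train[col].dropna().unique())

    def encode(df: pd.DataFrame) -> pd.DataFrame:
        as_cat = pd.Series(pd.Categorical(df[col], categories=cats), index=df.index)
        return pd.get_dummies(as_cat, prefix=col).astype("float32")

    return encode(train), encode(other)


def make_window(values: np.ndarray, start: int):
    """Split one 192-step slice into (input[168], target[24]) as float32."""
    window = np.asarray(values[start:start + CONTEXT + HORIZON], dtype=np.float32)
    if window.shape[0] != CONTEXT + HORIZON:
        raise ValueError("not enough points for a full window")
    return window[:CONTEXT], window[CONTEXT:]
```

Casting to float32 at the point where windows are built keeps object-typed columns from reaching the tensor conversion step.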

Preprocessing

Use TimeSeriesPreprocessor to format series; rely on TTM’s built‑in normalization (no manual global scaling).
The collator pads/crops inputs to the checkpoint context length (CL = 512) and pads the masks to match; RPT kept off after ablation.
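
The pad/crop step can be sketched as follows. This is an illustration in NumPy under stated assumptions, not the project's collator: the function name `pad_or_crop_left` is hypothetical, padding is assumed to be zeros on the left, and the mask marks observed positions with True.

```python
import numpy as np


def pad_or_crop_left(x, context_length: int = 512):
    """Fit a (time, channels) array to `context_length` steps.

    Longer inputs keep only the most recent steps. Shorter inputs are
    zero-padded on the left. Returns (values, observed_mask), where the
    mask is True for real observations and False for padding.
    """
    x = np.asarray(x, dtype=np.float32)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as a single channel
    t, c = x.shape
    if t >= context_length:
        return x[-context_length:], np.ones((context_length, c), dtype=bool)
    pad = context_length - t
    values = np.concatenate([np.zeros((pad, c), dtype=np.float32), x], axis=0)
    mask = np.concatenate(
        [np.zeros((pad, c), dtype=bool), np.ones((t, c), dtype=bool)], axis=0
    )
    return values, mask
```

With a 168-step input, the result has 344 padded steps followed by the 168 observed ones, and the mask lets the model ignore the padding.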

Modeling

Load TinyTimeMixerForPrediction from ibm-granite/granite-timeseries-ttm-r1 and …-r2.
Align input channels only; prune the head from the checkpoint forecast length (FL) to 24 without touching backbone geometry.

Two‑phase fine‑tuning

Phase‑1 (head‑only): lr_head = 1e-3, ~8 epochs, early‑stop on val MSE.
Phase‑2 (decoder+head): lr_head = 8e-4 (also tried 6e-4), lr_dec = 2e-4 (also 1.5e-4), 20–30 epochs, cosine + 7–10% warmup, head dropout = 0.2.
Encoder frozen (optional “micro‑unfreeze” A/B for 3–5 epochs at very low LR).
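
To make the Phase‑2 schedule concrete, here is a small sketch of cosine decay with linear warmup applied to the two learning rates quoted above. The exact warmup shape and the function names are assumptions; the project may well have used a library scheduler instead.

```python
import math

# Phase-2 base learning rates quoted in the text.
PHASE2_LRS = {"head": 8e-4, "decoder": 2e-4}


def cosine_with_warmup(step: int, total_steps: int, base_lr: float,
                       warmup_frac: float = 0.1) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = max(1, int(round(warmup_frac * total_steps)))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


def phase2_lrs(step: int, total_steps: int, warmup_frac: float = 0.1) -> dict:
    """Learning rate per parameter group at a given optimizer step."""
    return {
        name: cosine_with_warmup(step, total_steps, lr, warmup_frac)
        for name, lr in PHASE2_LRS.items()
    }
```

Both groups follow the same curve and differ only in scale, so the decoder always trains at a quarter of the head's rate with these values.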

Validation & Ensembling

Save per‑seed validation predictions and test submissions.
Convex MSE blending across seeds within each family (weights on simplex; weak seeds get down‑weighted automatically).
R1↔R2 family blend via convex MSE on validation, then apply the learned weights to test.
We evaluated horizon/region calibration; it helped NLL offline but did not improve MSE LB, so the final pipeline is MSE‑only blending.
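
Convex MSE blending can be sketched as a small constrained least-squares problem: find non-negative weights that sum to one and minimise validation MSE. The version below uses projected gradient descent in NumPy; the solver choice and the names `project_to_simplex` and `fit_convex_blend` are assumptions for illustration, not the project's implementation.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_convex_blend(preds, y, n_iter: int = 2000) -> np.ndarray:
    """Weights on the simplex that minimise MSE of the blended prediction.

    preds: (n_models, ...) validation predictions, one entry per seed or family.
    y:     validation targets with the same trailing shape.
    """
    P = np.asarray(preds, dtype=np.float64).reshape(len(preds), -1)
    y = np.asarray(y, dtype=np.float64).ravel()
    n = y.size
    G = P @ P.T / n                      # MSE(w) = w'Gw - 2 b'w + const
    b = P @ y / n
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G)[-1] + 1e-12)  # 1 / Lipschitz constant
    w = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_iter):
        grad = 2.0 * (G @ w - b)
        w = project_to_simplex(w - step * grad)
    return w


def blend(preds, w) -> np.ndarray:
    """Apply learned weights to a stack of predictions (validation or test)."""
    return np.tensordot(w, np.asarray(preds, dtype=np.float64), axes=1)
```

The same routine would serve both levels: fit weights across seeds within a family, blend, then fit again across the two family-level blends and reuse those weights on the test predictions.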

Inference

CL‑aware pad/crop; predict; inverse‑scale via preprocessor stats; clip ≥ 0.
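
The last two inference steps can be illustrated with a short sketch. It assumes the stored statistics are a mean and standard deviation from standard scaling, which the text does not state; the function name `postprocess` is hypothetical.

```python
import numpy as np


def postprocess(pred_scaled, mean: float, std: float) -> np.ndarray:
    """Undo standard scaling with stored stats, then clip negatives to zero.

    Assumes scaled = (raw - mean) / std at preprocessing time.
    """
    pred = np.asarray(pred_scaled, dtype=np.float32) * std + mean
    return np.clip(pred, 0.0, None)  # load cannot be negative
```

Clipping after the inverse transform matters: a small negative value in scaled space can still map to a valid positive load once the mean is added back.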

Challenges

Per‑window vs global scaling: solved by using TTM’s internal scaler + strict dtype control.
Shape & context mismatches: handled with left‑pad to 512 and observed‑masking in the collator.
Seed stability & leakage: region dummies fit on train only; early stopping on eval MSE.
LB sensitivity: NLL‑oriented calibration hurt MSE on LB → removed.

Expected Impact

Operational: tighter day‑ahead forecasts for chiller scheduling/shiftable loads; fewer false alerts.
₹/CO₂: forecast accuracy enables tariff‑weighted scheduling and avoided peaks, the foundation for ₹/day and kg‑CO₂/day reporting.
Scalable rollout: small trainable head/decoder; fast few‑shot onboarding across buildings/regions.

What we learned / Novelty

Seed + family blending matters: R1 and R2 make complementary errors; convex blending consistently beats either family alone.
RPT off > on for this dataset/splits.
Strict preprocessing hygiene (float32, no object dtypes, leak‑safe region dummies) prevents silent degradations.
Simple beats clever on MSE LB: per‑horizon/region calibration that helps NLL didn’t help MSE → removed.

What’s next

Depth of ensemble (low‑risk, likely lift): expand to 5 seeds per family (e.g., R1: 40/42/50/17/73; R2: 40/42/44/19/79), convex‑blend per family → blend families; also vary data_seed to decorrelate loaders.
Targeted micro‑unfreeze (optional A/B): unfreeze last encoder block for 3–5 epochs with tiny LR (1e-5 → 5e-5) and ES patience = 2–3; stop if no gain.
Residual stacking (low‑medium risk): train a tiny residual regressor on validation residuals using exogenous features (hour/region) and apply to test. Freeze if public LB drops.
Model diversity: add PatchTSMixer (from tsfm_public) as a third family for a 3‑way blend (R1/R2/PTM).
Repro pack: one‑click script to train seeds → save val/test → run convex blends → emit final CSV.