Inspiration

When we first saw the ReverieHacks Datathon challenge, we wanted to take on something that felt both technical and real. Electricity theft caught our eye because it is not just a data problem but a social and economic one. We looked through public repositories and eventually found the State Grid Corporation of China dataset. It covers over forty-two thousand customers, each with about a thousand columns of daily consumption readings. It felt overwhelming at first, but that scale was exactly why it inspired us: solving theft detection on data this messy and this large could actually make a difference in practice.

What it does

Our project, PowerWatch, predicts which customers are most likely to be stealing electricity. It does more than throw out a label. The model outputs calibrated probabilities, sets cost-aware thresholds based on inspection economics, and adjusts group-specific thresholds to reduce fairness gaps across usage profiles. In short, the pipeline goes from raw smart-meter data to reliable, auditable decisions that could be deployed.

How we built it

We built everything step by step in Jupyter notebooks. The data was heavy, so we engineered features that made sense for zero-inflated time series: means, medians, standard deviations, monthly seasonality, weekday versus weekend behavior, last-30-day stats, and peak ratios. We compared logistic regression and XGBoost, then chose XGBoost because it dominated on precision-recall. After that, we added isotonic calibration to turn raw scores into probabilities. We tuned thresholds using a profit curve with $300 recovery value and $50 inspection cost. Finally, we audited fairness by consumption quartiles and introduced per-group thresholds to align true positive rates.
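The threshold tuning step can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes every flagged customer costs one $50 inspection and every confirmed theft recovers $300, so net profit is 300 * TP - 50 * (TP + FP). The function names, the candidate grid, and the toy data are all hypothetical.

```python
def profit_at_threshold(y_true, p_hat, threshold, recovery=300.0, cost=50.0):
    """Net profit if every customer with p_hat >= threshold is inspected.

    Assumed profit model: each inspection costs `cost`, and each inspection
    that finds theft (y == 1) recovers `recovery`.
    """
    flagged = [y for y, p in zip(y_true, p_hat) if p >= threshold]
    true_positives = sum(flagged)
    return recovery * true_positives - cost * len(flagged)


def best_threshold(y_true, p_hat, grid, recovery=300.0, cost=50.0):
    """Return the candidate threshold in `grid` with the highest net profit."""
    return max(
        grid,
        key=lambda t: profit_at_threshold(y_true, p_hat, t, recovery, cost),
    )


# Toy data (hypothetical): 1 = confirmed theft, 0 = honest customer.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p_hat = [0.90, 0.05, 0.10, 0.40, 0.20, 0.02, 0.15, 0.30, 0.12, 0.60]

grid = [0.05, 0.16, 0.50]
chosen = best_threshold(y_true, p_hat, grid)
chosen_profit = profit_at_threshold(y_true, p_hat, chosen)
```

Under this assumed profit model, an inspection breaks even when the calibrated theft probability reaches 50/300, roughly 0.167, which is in line with a cost-optimal threshold near 0.16.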

Challenges we ran into

We ran into many small but frustrating issues. At one point, missing artifacts such as chosen_threshold.txt broke the fairness plots. Later, pd.qcut raised errors because the arrays we passed in were not one-dimensional, so we had to rework the grouping logic carefully. Another challenge was the space limit in Codespaces when rebuilding Docker containers, which forced us to move parts of the workflow into local notebooks. And throughout, we had to balance interpretability and fairness without tanking utility. Getting calibration and fairness to work together without breaking the evaluation scripts was a genuine puzzle.
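The pd.qcut issue is easy to reproduce and to guard against. The sketch below is an assumption about how the problem could arise (a column vector of shape (n, 1) instead of a flat array), not a record of the exact bug; the helper name consumption_quartiles is hypothetical.

```python
import numpy as np
import pandas as pd


def consumption_quartiles(mean_usage):
    """Assign each customer to a consumption quartile (0 = lowest usage).

    pd.qcut expects one-dimensional input, so the array is flattened first.
    duplicates="drop" merges identical bin edges, which can occur when many
    customers share the same value (for example, zero-inflated usage).
    """
    flat = np.asarray(mean_usage, dtype=float).ravel()
    return pd.qcut(flat, q=4, labels=False, duplicates="drop")


# A column vector of shape (8, 1), for instance the result of a reshape
# or of selecting a single-column 2-D slice.
usage = np.arange(1.0, 9.0).reshape(-1, 1)
groups = consumption_quartiles(usage)
```

Flattening inside the helper keeps the fairness audit independent of how upstream code happens to shape its arrays.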

Accomplishments that we're proud of

We are proud that the final system does not just achieve good numbers, but tells a balanced story. On the held-out set, average precision is 0.3359, about four times the random baseline. ROC-AUC is 0.7772. Calibration lowers the Brier score from 0.1134 to 0.0699, which beats the prevalence baseline. With a cost-optimal threshold at 0.16, we reach precision 0.3015 and recall 0.4930. Most importantly, the Equal Opportunity fairness gap shrinks dramatically: the TPR gap falls from 0.6075 to 0.0649, an 89% reduction, with only a small utility trade-off. That balance of performance, reliability, and fairness is something we are genuinely proud of.
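For readers unfamiliar with the Brier comparison, here is a small sketch of what a prevalence baseline means. It assumes the baseline is a constant prediction equal to the theft rate, which gives a Brier score of prevalence * (1 - prevalence). The helper names and toy numbers are hypothetical and are not the project's data.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def prevalence_baseline(y_true):
    """Brier score of always predicting the observed positive rate."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


# Toy labels with a 10% positive rate.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
baseline = prevalence_baseline(y_true)  # 0.1 * 0.9 = 0.09

# A hypothetical model that ranks the one positive above the rest.
p_hat = [0.60] + [0.05] * 9
model_brier = brier_score(y_true, p_hat)
```

A calibrated model is only adding information if its Brier score falls below this constant-prediction baseline, which is the comparison the reported 0.0699 refers to.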

What we learned

We learned how important it is to structure a project around both metrics and principles. Average precision and ROC curves matter, but so do calibrated probabilities and fairness audits. We also learned that debugging machine learning pipelines is as much about engineering discipline as it is about modeling. Small errors like array shapes or missing files can derail progress if not caught early. On the positive side, we learned that fairness post-processing is possible without destroying business value, as long as thresholds are chosen carefully.
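The idea behind fairness post-processing with per-group thresholds can be sketched as below. This is one simple way to align true positive rates, assumed for illustration: for each group, pick the highest threshold that still catches a target share of that group's known thefts. It is not necessarily the procedure used in the project, and the group names, target rate, and toy scores are hypothetical.

```python
import math


def true_positive_rate(y_true, p_hat, threshold):
    """Share of actual positives whose score is at or above the threshold."""
    positives = [p for y, p in zip(y_true, p_hat) if y == 1]
    return sum(p >= threshold for p in positives) / len(positives)


def threshold_for_tpr(y_true, p_hat, target_tpr):
    """Highest threshold that flags at least `target_tpr` of the positives."""
    positives = sorted((p for y, p in zip(y_true, p_hat) if y == 1), reverse=True)
    k = max(1, math.ceil(target_tpr * len(positives)))
    return positives[k - 1]


def tpr_gap(rates):
    """Equal Opportunity gap: spread between best and worst group TPR."""
    return max(rates.values()) - min(rates.values())


# Toy groups (hypothetical): labels and calibrated scores per usage group.
groups = {
    "low_usage": ([1, 1, 0, 0, 1, 1], [0.10, 0.20, 0.05, 0.12, 0.08, 0.30]),
    "high_usage": ([1, 1, 0, 0, 1, 1], [0.50, 0.70, 0.20, 0.10, 0.15, 0.90]),
}

# One global threshold for everyone.
global_threshold = 0.16
rates_before = {
    g: true_positive_rate(y, p, global_threshold) for g, (y, p) in groups.items()
}
gap_before = tpr_gap(rates_before)

# Per-group thresholds chosen to reach the same target TPR in each group.
target = 0.75
thresholds = {g: threshold_for_tpr(y, p, target) for g, (y, p) in groups.items()}
rates_after = {
    g: true_positive_rate(y, p, thresholds[g]) for g, (y, p) in groups.items()
}
gap_after = tpr_gap(rates_after)
```

In this toy data the low-usage threshold drops from 0.16 to 0.10, which closes the gap but also flags one additional honest customer. That extra inspection is the kind of utility cost that has to be weighed against the fairness gain.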

What's next

This project showed us the potential of fairness-aware AI for infrastructure. Next steps would include drift monitoring to catch changes in theft behavior over time, active learning loops to incorporate inspection feedback, and integration with real-time dashboards so operators can see probabilities, costs, and fairness impacts in one place. There is also room to explore richer features, such as tariff type or outage data, to push detection further. We think PowerWatch could evolve into a deployable platform for utilities.
