GridGuard - Smart Power Management System

System architecture
Circuit diagram

Inspiration

Every month, my family receives an electricity bill and nobody in the house has any real understanding of where the money is going. The bill says we used "X units" — but which appliance? Which time of day? Why was it so much higher last month? There is zero transparency.

Beyond billing confusion, our homes are protected only by passive circuit breakers that react after a fault has already caused damage. There is no early warning, no intelligence, and no data about what is actually happening inside our walls. In India, outdated electrical infrastructure combined with this complete lack of consumer visibility leads to preventable electrical fires, wasted energy, and needlessly high bills every single month.

I wanted to build a system that gives every household the power of a professional electrician and an AI energy auditor — for the cost of a few electronic components.

What it does

GridGuard is a full-stack, edge-to-cloud smart power management system that solves both the safety and transparency problems simultaneously.

Real-Time Edge Safety: An ESP32-S3 microcontroller samples live voltage and current 50 times per second. A custom TensorFlow Lite machine learning model runs inference directly on the chip to classify the power signature as normal, overloaded, or a spike. When a fault is detected, a 5V safety relay physically breaks the circuit in approximately 340 milliseconds — completely independent of internet connectivity. Your home is protected even if the cloud is down.

Live Cloud Dashboard: All telemetry is streamed via MQTT and processed by a Python Flask backend running on Google Compute Engine. A real-time glassmorphism web dashboard displays live voltage, current, real power consumption, a 7-day usage history chart, and a continuously updated estimated electricity bill in Indian Rupees (₹).

AI Energy Auditor (Gemini API): This is the system's most capable feature. The backend compiles historical sensor data, fault logs, and usage statistics, and sends them to the Google Gemini Pro API with a carefully engineered prompt that constrains it to act as a Senior Energy Analyst rather than a generic chatbot. The result is a detailed, plain-language energy report grounded in the household's own statistics. A matplotlib chart is generated alongside the analysis, and the combined chart-and-text report is delivered automatically to the user's Telegram app.
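
As a rough illustration of the prompt-grounding idea, the sketch below assembles a role-constrained prompt with the household data embedded as JSON. The function name, field names, and numbers are hypothetical and are not taken from the project's code; sending the prompt to the Gemini API is deliberately left out.

```python
import json


def build_audit_prompt(stats: dict, faults: list) -> str:
    """Assemble a role-constrained prompt with the data embedded as JSON."""
    data_block = json.dumps({"stats": stats, "faults": faults},
                            indent=2, sort_keys=True)
    return "\n".join([
        "You are a Senior Energy Analyst reviewing one household's data.",
        "Use ONLY the numbers in the DATA block. If a figure is missing, say so.",
        "Respond with: 1) summary, 2) anomalies, 3) recommended actions.",
        "DATA:",
        data_block,
    ])


# Illustrative values only (hypothetical, not measured data).
prompt = build_audit_prompt(
    {"total_kwh_7d": 42.5, "peak_power_w": 1850.0},
    [{"time": "2026-02-27T19:04:00", "state": "FAULT"}],
)
print(prompt)
```

Keeping the data in a clearly delimited block and telling the model to use only that block is one common way to discourage generic advice.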

AI Chat Assistant: An embedded floating AI chat window on the dashboard lets users ask any question about their own data, such as "why was my bill high this week?" and receive a precise, context-aware answer grounded in their actual readings from the SQLite database.

How we built it

Firmware (C++ / PlatformIO): The ESP32-S3 firmware handles ADC sampling of the ZMPT101B voltage sensor and ACS712 current sensor, computes RMS values, runs the TFLite model for fault classification, controls the relay, and publishes JSON telemetry to the HiveMQ MQTT broker every 2 seconds.
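
The RMS step can be sketched on the host side in Python (the firmware itself is C++). This is a minimal sketch assuming the sensor output sits on a mid-scale DC bias that must be removed first; the bias of 2048 counts and the calibration factor of 0.325 V per count are hypothetical values chosen for illustration.

```python
import math


def rms(samples, offset, scale):
    """Root-mean-square of ADC samples after removing the DC bias.

    offset: ADC count corresponding to zero volts/amps (assumed known).
    scale:  physical units per ADC count (assumed calibration factor).
    """
    if not samples:
        raise ValueError("need at least one sample")
    centered = [(s - offset) * scale for s in samples]
    return math.sqrt(sum(x * x for x in centered) / len(centered))


# One full cycle of a synthetic sine wave riding on a 2048-count bias.
counts = [2048 + 1000 * math.sin(2 * math.pi * n / 100) for n in range(100)]
v_rms = rms(counts, offset=2048, scale=0.325)
print(round(v_rms, 2))  # peak of 325 V divided by sqrt(2)
```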

ML Pipeline (Python / Keras): Over 13,320 labeled data points were collected across three conditions: normal operation, graduated overload, and sudden spike. A two-layer neural network was trained using Keras, then converted to .tflite format and quantized to int8 to fit within the microcontroller's RAM. Final accuracy: 94.7%.
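
To show what int8 quantization does to each value, the sketch below implements the affine mapping (real value ≈ (q - zero_point) × scale) in plain Python. It is not the converter the project used; the function names and the example range are assumptions for illustration only.

```python
def quant_params(lo, hi):
    """Scale and zero point that map the real range [lo, hi] onto int8."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / 255.0
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Real value to int8, clamped to [-128, 127]."""
    return max(-128, min(127, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """int8 back to an approximate real value."""
    return (q - zero_point) * scale


scale, zp = quant_params(0.0, 2.55)   # hypothetical activation range
q = quantize(1.0, scale, zp)
print(q, dequantize(q, scale, zp))
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step inside the calibrated range.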

Backend (Python / Flask): The server subscribes to MQTT, saves each reading to SQLite with accurate energy calculations using time-delta integration, and exposes a REST API. The Gemini API integration uses structured prompt engineering to ground the AI's responses in real data rather than producing generic advice.
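
A minimal sketch of the storage side, using Python's built-in sqlite3 module with an in-memory database. The table name, column names, and helper functions are hypothetical; the project's actual schema is not shown in this writeup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE readings (
           ts TEXT NOT NULL,      -- ISO 8601 timestamp
           voltage REAL,
           current REAL,
           power REAL,
           energy_kwh REAL        -- energy added since the previous reading
       )"""
)


def save_reading(conn, ts, voltage, current, power, energy_kwh):
    """Insert one telemetry reading using bound parameters."""
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
        (ts, voltage, current, power, energy_kwh),
    )
    conn.commit()


def daily_totals(conn):
    """Total kWh per calendar day, for a usage-history chart."""
    return conn.execute(
        "SELECT substr(ts, 1, 10) AS day, SUM(energy_kwh) "
        "FROM readings GROUP BY day ORDER BY day"
    ).fetchall()


save_reading(conn, "2026-02-27T10:00:00", 230.0, 4.0, 900.0, 0.25)
save_reading(conn, "2026-02-27T11:00:00", 231.0, 8.0, 1800.0, 0.5)
save_reading(conn, "2026-02-28T09:00:00", 229.0, 2.0, 450.0, 0.125)
print(daily_totals(conn))
```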

Cloud Infrastructure (GCP / Docker): The entire backend is containerized using Docker and Docker Compose with a persistent volume for the SQLite database. A custom gcp-deploy.sh bash script fully automates provisioning a Google Compute Engine VM, enabling firewall rules, installing Docker, cloning the repository, and starting the container — pure infrastructure-as-code from a single command.

Frontend (HTML / CSS / JS): Vanilla JavaScript with Chart.js renders the real-time power line graph and 7-day bar chart. The UI uses a custom glassmorphism dark-mode design system with CSS animations for a premium, modern feel.

Challenges we ran into

Running ML on bare metal: Getting TensorFlow Lite to run reliably on the ESP32-S3 required careful tensor arena sizing and quantization tuning. The model had to fit within ~300KB of available RAM while still achieving high fault detection accuracy.

Eliminating false positives: Early versions of the Edge AI were too aggressive and tripped the safety relay during harmless momentary voltage fluctuations (like a refrigerator compressor starting). Tuning the confidence threshold and adding a fault-confirmation window was critical to making the system actually usable.
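
One way to combine a confidence threshold with a fault-confirmation window is sketched below. The class name and the specific numbers (threshold 0.8, 4 of the last 5 inferences) are assumptions for illustration, not the tuned values used in the firmware.

```python
from collections import deque


class FaultConfirmer:
    """Trip only when enough recent inferences agree that a fault is present."""

    def __init__(self, threshold=0.8, window=5, required=4):
        self.threshold = threshold          # minimum fault confidence to count
        self.required = required            # votes needed inside the window
        self.recent = deque(maxlen=window)  # rolling record of recent votes

    def update(self, fault_confidence):
        """Record one inference result; return True if the relay should trip."""
        self.recent.append(fault_confidence >= self.threshold)
        return sum(self.recent) >= self.required


# A single momentary spike (e.g. a compressor starting) does not trip.
blip = FaultConfirmer()
blip_results = [blip.update(p) for p in [0.1, 0.95, 0.1, 0.1, 0.1]]

# A sustained fault trips on the fourth consecutive confident reading.
sustained = FaultConfirmer()
sustained_results = [sustained.update(p) for p in [0.9, 0.9, 0.9, 0.9]]
print(blip_results, sustained_results)
```

The trade-off is latency: a longer window rejects more transients but delays the trip by that many inference cycles.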

Thread-safe real-time data: Combining Flask's request-handling threads with the MQTT client's background thread, all sharing the same Python global state dictionary, required careful use of threading locks to prevent race conditions on live data.
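
The locking pattern can be sketched as follows. The class and function names are hypothetical; the idea is simply that every read and write of the shared dictionary happens while holding the same lock, and readers receive a copy rather than the live object.

```python
import threading


class LiveState:
    """Shared telemetry state guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {"voltage": 0.0, "current": 0.0, "samples": 0}

    def update(self, voltage, current):
        """Called from the MQTT thread for each incoming message."""
        with self._lock:
            self._data["voltage"] = voltage
            self._data["current"] = current
            self._data["samples"] += 1

    def snapshot(self):
        """Called from request handlers; returns a copy, never the live dict."""
        with self._lock:
            return dict(self._data)


state = LiveState()


def mqtt_worker(n):
    # Stand-in for the MQTT callback: push n fake readings.
    for i in range(n):
        state.update(voltage=230.0, current=i * 0.001)


threads = [threading.Thread(target=mqtt_worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state.snapshot()["samples"])  # 4000: no increments are lost
```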

Accurate energy calculation from uneven samples: Computing kWh from power samples arriving at inconsistent intervals required time-delta integration rather than simple averaging, to avoid significant cumulative billing errors over hours of operation.
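
A minimal sketch of time-delta integration is shown below. It uses the trapezoid rule between consecutive samples, which is an assumption; the project may use a simpler rectangle rule. The function name and sample layout are hypothetical.

```python
def integrate_kwh(samples):
    """Energy in kWh from (timestamp_seconds, power_watts) pairs sorted by time.

    Each interval contributes its average power times its actual duration,
    so uneven sample spacing does not distort the total.
    """
    total_watt_seconds = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        total_watt_seconds += 0.5 * (p0 + p1) * dt
    return total_watt_seconds / 3_600_000.0  # 1 kWh = 3.6 million watt-seconds


# A constant 1000 W load for one hour, sampled at uneven intervals.
readings = [(0, 1000.0), (2, 1000.0), (7, 1000.0), (3600, 1000.0)]
print(integrate_kwh(readings))  # 1.0
```

By contrast, averaging the four power samples and multiplying by a fixed 2-second interval per sample would badly underestimate this hour of usage.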

Docker Compose version compatibility: The Google Cloud VM had an older version of docker-compose (v1.29.2) installed, which has a known ContainerConfig bug with newer Docker image formats. Resolving this on a remote headless server with limited debugging tools was a significant operational challenge.

Project Enhancements

Since the initial prototype, GridGuard was improved in multiple areas to make it more reliable, more practical to demonstrate, and easier to scale.

1. Smarter Edge Intelligence

Upgraded from simple threshold-only behavior to a hybrid safety approach: ML-based state classification plus hard overcurrent protection.
Added structured class mapping for operating states (IDLE, LEVEL_1, LEVEL_2, FAULT) to improve interpretability during live demos.
Improved edge deployment readiness by using a lightweight TFLite model suitable for microcontroller inference.
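
The hybrid approach described above can be sketched in Python as a decision function in which the hard overcurrent limit always takes priority over the model's output. The 10 A limit, the 0.8 threshold, and the function name are hypothetical; only the four state names come from the project.

```python
STATES = ("IDLE", "LEVEL_1", "LEVEL_2", "FAULT")
HARD_LIMIT_A = 10.0  # hypothetical overcurrent limit in amperes


def should_trip(class_probs, current_rms_a, fault_threshold=0.8):
    """Return (trip, reason) from model probabilities and measured current."""
    # Hard protection runs first and does not depend on the model at all.
    if current_rms_a >= HARD_LIMIT_A:
        return True, "OVERCURRENT"
    best = max(range(len(STATES)), key=lambda i: class_probs[i])
    state = STATES[best]
    trip = state == "FAULT" and class_probs[best] >= fault_threshold
    return trip, state


print(should_trip([0.70, 0.20, 0.05, 0.05], 12.0))  # hard limit wins
print(should_trip([0.05, 0.05, 0.05, 0.85], 3.0))   # model reports a fault
print(should_trip([0.10, 0.70, 0.10, 0.10], 3.0))   # normal load, no trip
```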

2. Better Data & Model Workflow

Added ML dataset recording and labeling flow to collect real telemetry for training.
Introduced data preprocessing and synthetic fault augmentation to improve model robustness for rare fault patterns.
Streamlined model export so retrained models can be redeployed to firmware with minimal manual work.
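
Synthetic fault augmentation can take many forms; the sketch below shows one simple possibility, injecting a short multiplicative spike into copies of normal current windows. The function name, spike gain, and spike width are assumptions and do not describe the project's actual augmentation.

```python
import random


def augment_with_spikes(normal_windows, n_synthetic, spike_gain=3.0, seed=0):
    """Create labeled FAULT windows by injecting short spikes into normal ones."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    synthetic = []
    for _ in range(n_synthetic):
        window = list(rng.choice(normal_windows))  # copy, keep the original intact
        start = rng.randrange(len(window))
        width = rng.randint(1, 3)
        for i in range(start, min(start + width, len(window))):
            window[i] *= spike_gain
        synthetic.append((window, "FAULT"))
    return synthetic


normal = [[1.0] * 10, [1.0] * 10]  # placeholder windows of steady current
faults = augment_with_spikes(normal, n_synthetic=5)
print(len(faults), faults[0][1])
```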

3. Stronger Safety & Fault Handling

Implemented automatic relay cutoff logic for high-risk conditions, reducing dependence on cloud latency.
Added clearer fault-state transitions and logging for post-event diagnosis.
Improved reliability of protection behavior through overload/spike simulation scenarios.

4. Cloud Analytics Improvements

Expanded backend analytics from live-only monitoring to include historical trend analysis and statistical summaries.
Added deeper AI-assisted analysis flows for consumption patterns, anomaly interpretation, and action-oriented recommendations.
Improved report quality with structured outputs that are easier for judges and users to read.

5. Reporting & Alerting Upgrades

Enhanced notification/report pipeline to support rich, multi-step delivery (chart + analysis text).
Improved clarity of system events by separating warning/error and info logs.
Added practical utilities for seeding, verification, and diagnostics to make demos repeatable and stable.

6. Demo Readiness & Presentation Quality

Refined dashboard UX for clearer real-time visibility of voltage/current/power/state.
Added scenario simulation tooling so core features can be demonstrated even when the full hardware setup is not available.
Consolidated documentation and run instructions for faster onboarding and smoother judging flow.

Accomplishments that we're proud of

Successfully running a real-time neural network on a microcontroller, with the relay breaking the circuit in under 350 ms and no cloud round trip in the protection path.
Building an end-to-end system that spans from low-level C++ firmware all the way to a deployed cloud AI API, entirely as a solo developer.
Achieving 94.7% fault detection accuracy on a model small enough to run on an embedded device.
Engineering Gemini API prompts that produce genuinely insightful, data-grounded energy analysis rather than generic AI responses — making the AI actually useful for consumers who know nothing about electricity.
Deploying the full backend with a single automated script to Google Cloud, with the database safely persisted via Docker volumes.

What we learned

How to design, train, quantize, and deploy a TensorFlow Lite model for real microcontroller hardware — the entire embedded ML pipeline from scratch.
How to architect a robust, production-grade event-driven IoT pipeline using MQTT, threading, and REST APIs without sacrificing reliability.
How to write effective structured prompts for large language models to generate specific, data-grounded outputs instead of hallucinated generic responses.
The operational realities of cloud deployment: containerization, persistent storage, firewall configuration, SSH key management, and debugging on remote headless Linux servers.

What's next for GridGuard - Smart Power Management System

Appliance-level identification: Train a more granular ML model to recognize the power signature of specific appliances (refrigerator, washing machine, AC unit) so the system can report not just that power was consumed, but exactly which device consumed it.
Multi-circuit monitoring: Scale beyond a single circuit to monitor every room in a home simultaneously from a unified dashboard.
Dynamic tariff integration: Connect to Indian state electricity board APIs to use real-time, slab-based tariff rates (the tiered pricing structure used in India) for hyper-accurate, bill-matching cost calculations.
Predictive maintenance alerts: Use historical fault pattern analysis to detect slow degradation in appliances before they fail catastrophically.
Native mobile app: Build an iOS and Android companion app with live push notifications for fault events and weekly AI-generated energy summaries.
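
To make the slab-based tariff idea concrete, here is a minimal sketch of a tiered bill calculation. The slab widths and rates are invented for illustration and do not reflect any state electricity board's actual tariff.

```python
# (units in this slab, rate in ₹ per unit); None means "all remaining units".
# Hypothetical values, not a real tariff.
SLABS = [(100, 3.0), (200, 5.0), (None, 8.0)]


def slab_bill(units):
    """Bill in ₹ where each slab's rate applies only to the units inside it."""
    if units < 0:
        raise ValueError("units must be non-negative")
    remaining, total = units, 0.0
    for size, rate in SLABS:
        used = remaining if size is None else min(remaining, size)
        total += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return total


print(slab_bill(350))  # 100*3 + 200*5 + 50*8 = 1700.0
```

Because the marginal rate rises with usage, a flat per-unit estimate can drift noticeably from the real bill once consumption crosses a slab boundary.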

Built With

c++
chart.js
compute-engine
docker
esp32-s3
flask
google-cloud
google-gemini-api
matplotlib
mqtt
python
sqlite
telegram-bot-api
tensorflow-lite

Created by

I mainly contributed to the machine learning workflow of this project. My role was to help build the dataset pipeline, prepare clean training samples from the recorded voltage/current/power readings, and organize labels for the operating states (IDLE, LEVEL_1, LEVEL_2, FAULT). I spent most of my time improving data quality and consistency, because we found early on that model behavior depends heavily on how well the real sensor data is cleaned and labeled.

I also worked on training and evaluating the classification model, testing different settings, and checking whether predictions were stable across normal and fault-like patterns. I supported the model conversion/deployment step so the final trained model could run efficiently on-device in a lightweight format. A key part of my work was helping ensure the model was not just accurate in training, but practical for edge inference with limited resources.

This project taught me a lot about real-world ML beyond theory: class imbalance, noisy inputs, and the tradeoff between model size and reliability. I’m proud that my contribution helped make the AI part usable in an actual embedded power-monitoring system.

Vainavi Arun Kumar
Mohammed Rayan

Updates

Mohammed Rayan started this project — Feb 28, 2026 04:59 PM EST
