Inspiration

One of the biggest reasons we built PulseGrid AI was that we realized how little we actually understood about cloud infrastructure when we started. Before this project, even the word "cloud" felt abstract to us. We knew it mattered, but we had never really broken down what it meant operationally, how failures spread through it, or how companies actually respond when things start going wrong.

As we learned more, we noticed a gap: a lot of infrastructure tools are great at detecting that something is broken, but much weaker at helping people understand why it is happening, how it is spreading, and what should happen next. That became the core motivation behind PulseGrid AI.

We wanted to build something that could make infrastructure incidents more understandable, especially for people who are still learning, while still feeling useful and relevant in a real operational setting.

What it does

PulseGrid AI is an incident reasoning and failure-modeling platform for cloud infrastructure.

A user can describe a live incident, describe a hypothetical weakness, pick a prebuilt scenario, or upload organizational context such as remediation plans or architecture notes. PulseGrid then transforms that input into a structured dashboard that helps teams understand what is happening, estimate how the issue may spread, and generate mitigation guidance for response and prevention.

At the center of the project is our seven-layer Failure Propagation Chain, which models incidents from upstream pressure and structural fragility all the way to telemetry, user-visible degradation, and business impact.
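
To make the idea of an ordered chain concrete, here is a minimal sketch of how findings could be attached to layers and read from cause to impact. It is an illustration, not PulseGrid's actual code: the `Layer`, `Finding`, `order_chain`, and `downstream_of` names are hypothetical, and the enum lists only the five layers named above rather than guessing at the other two.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    """Layers named in this write-up, in propagation order.

    The full chain has seven layers; the two not named here are left out
    on purpose. The numeric values only encode ordering.
    """
    UPSTREAM_PRESSURE = 1
    STRUCTURAL_FRAGILITY = 2
    TELEMETRY = 3
    USER_VISIBLE_DEGRADATION = 4
    BUSINESS_IMPACT = 5


@dataclass
class Finding:
    """One observation about an incident, pinned to a layer (hypothetical type)."""
    layer: Layer
    summary: str


def order_chain(findings):
    """Sort findings so they read from upstream cause to downstream impact."""
    return sorted(findings, key=lambda finding: finding.layer)


def downstream_of(layer):
    """Return the layers that sit after the given layer in the chain."""
    return [candidate for candidate in Layer if candidate > layer]


# Example incident details are invented for illustration.
findings = [
    Finding(Layer.BUSINESS_IMPACT, "Checkout conversions dropping"),
    Finding(Layer.UPSTREAM_PRESSURE, "Traffic spike from a marketing campaign"),
    Finding(Layer.TELEMETRY, "p99 latency alerts firing"),
]
for finding in order_chain(findings):
    print(f"{finding.layer.name}: {finding.summary}")
```

Keeping the layers in an ordered enum means the dashboard can always present findings in the same cause-to-impact order, regardless of the order in which they were extracted.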

How we built it

We built PulseGrid as a lightweight full-stack application with a Python backend and a single-page frontend.

On the backend, we created the incident reasoning flow, structured signal extraction, scenario logic, blast-radius modeling, and mitigation generation. On the frontend, we built an interface that progressively constructs the dashboard as the user provides more information, so the analysis feels live instead of delayed.
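
As a rough illustration of what blast-radius modeling can look like, the sketch below walks a service dependency graph breadth-first from a failed service and records how many hops away each affected service is. This is an assumed approach for explanation only, not a description of our backend: the `DEPENDENTS` map, the service names, and the `blast_radius` function are all hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each key is a service, and its value lists the
# services that depend on it directly. All names are illustrative.
DEPENDENTS = {
    "database": ["auth-api", "orders-api"],
    "auth-api": ["web-frontend"],
    "orders-api": ["web-frontend", "billing-worker"],
    "web-frontend": [],
    "billing-worker": [],
}


def blast_radius(dependents, failed_service):
    """Return {service: hops} for every service reachable from the failure.

    Uses breadth-first search, so each service is recorded with its shortest
    distance (in dependency hops) from the failed service.
    """
    hops = {failed_service: 0}
    queue = deque([failed_service])
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            if downstream not in hops:  # skip services already reached
                hops[downstream] = hops[current] + 1
                queue.append(downstream)
    return hops


radius = blast_radius(DEPENDENTS, "database")
for service, distance in radius.items():
    print(f"{service}: {distance} hop(s) from the failure")
```

Hop distance is only a crude proxy for impact. A fuller model would likely weight edges by how critical each dependency is, which is the kind of deterministic structure that can sit underneath AI-assisted reasoning.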

A big part of this project was also the way we built it. This was one of the first times we seriously experimented with tools like Claude Code / Claude Cowork and ChatGPT Codex as development partners. That was a learning experience on its own. We were not just building PulseGrid; we were also learning how to collaborate with AI tools to move faster, debug faster, and think through architecture decisions in real time.

Challenges we ran into

One of our biggest challenges was that we were learning the space while building in it. We were trying to create a cloud incident reasoning system while still developing our own understanding of how cloud systems, dependencies, and failure propagation actually work. That meant a lot of reading, a lot of reframing, and a lot of moments where we had to slow down and make sure we actually understood the concepts we were trying to represent.

Another challenge was avoiding the trap of making the project feel like just another chatbot. We wanted PulseGrid to feel like an actual reasoning system with structure behind it, not just a generic prompt box with a fancy answer.

We also had to figure out how to balance deterministic logic with AI-assisted reasoning, how to make the UI feel live while the diagnosis was being built, and how to support both guided inputs and open-ended incident descriptions.

Accomplishments that we're proud of

We are proud that we took a topic we did not deeply understand at the beginning and turned it into something structured, visual, and interactive.

We are proud that PulseGrid goes beyond simple alerting and tries to present infrastructure failure as a connected system of causes, propagation, and impact. We are also proud of the seven-layer Failure Propagation Chain framework, the live dashboard experience, the blast-radius reasoning, and the ability to bring uploaded organizational context into the analysis.

On a personal level, we are also proud that this project pushed us to experiment with new tools, especially Claude and Codex, and taught us how AI can be used as a real collaborator during development rather than just a question-answering tool.

What we learned

We learned a lot about cloud infrastructure, incident response, dependency risk, and how technical failures can ripple into user and business impact.

We also learned that building trust in an AI-assisted product is hard. It is not enough for the system to sound intelligent; it has to reason in a way that feels disciplined, useful, and grounded.

Beyond the product itself, we learned how to work with AI coding tools in a more serious way. This project gave us hands-on experience using them not just for snippets, but for iteration, debugging, refactoring, and product thinking.

What's next for PulseGrid AI

Our next step is making PulseGrid more evidence-driven and context-aware.

We want to improve diagnosis generation so it comes more from first principles and less from rigid scenario buckets. We also want to improve how uploaded context is incorporated into wizard prefill and overall reasoning, strengthen the distinction between root conditions and downstream amplification, and make the mitigation guidance even more adaptive.

Long term, we want PulseGrid to help teams not only detect failure, but understand it, respond to it, and reduce the chance of repeat failure.
