SpikeTrace

[Image: architecture diagram]

Inspiration

Traditional observability tools tell you what broke. SpikeTrace asks a harder question: what did it cost? Not just in dollars, but in carbon and customer trust.

SpikeTrace was born to connect SRE, Finance, and ESG into one investigative loop—a true Triple Bottom Line commander that treats a carbon spike with the same urgency as a revenue drop.

What it does

SpikeTrace acts as an AI-powered Commander that investigates every incident through three lenses: users, carbon, and revenue.

For an e-commerce giant like Amazon, an incident isn’t just a 500 error. It’s:

A customer who couldn’t buy a gift
Revenue that never landed
Servers burning energy on pointless retries

SpikeTrace connects those dots instantly.

The SpikeTrace Ecosystem

[Image: flowchart of the SpikeTrace ecosystem]

At the center is the SpikeTrace Commander Agent — the brain that translates natural language into deep telemetry correlation.

1️⃣ Observability Correlation (Powered by ES|QL)

The agent queries across four dimensions:

User Impact → Failed checkouts, latency spikes, rising error rates
Environmental Impact → Wasted CO₂, excess runtime, inefficient retries
Deployments & Logs → Correlates failures to specific releases (spiketrace-deployments-*)
Business Impact → Orders affected and real revenue loss

It doesn’t just detect a spike — it explains it in business terms.
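To make the correlation step concrete, here is a minimal Python sketch of the kind of group-by aggregation described above. SpikeTrace itself does this with ES|QL against Elasticsearch; this sketch only mimics the idea in memory. The `Event` record, its field names, and the sample values are all hypothetical and are not taken from the project's indices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """One hypothetical telemetry record; field names are illustrative only."""
    version: str             # deployment version that served the request
    failed: bool             # True if the checkout failed
    order_value: float       # value of the attempted order, in dollars
    retries: int             # retries triggered by this request
    co2_g_per_retry: float   # assumed grams of CO2 per wasted retry


def impact_by_version(events):
    """Group events by deployment version and total the three impacts."""
    totals = defaultdict(lambda: {"failed_checkouts": 0,
                                  "revenue_lost": 0.0,
                                  "wasted_co2_g": 0.0})
    for e in events:
        row = totals[e.version]
        if e.failed:
            row["failed_checkouts"] += 1
            row["revenue_lost"] += e.order_value
        # Every retry is counted as wasted energy, whether or not the order failed.
        row["wasted_co2_g"] += e.retries * e.co2_g_per_retry
    return dict(totals)


# Made-up sample data: one healthy release and one failing release.
events = [
    Event("v1.4.0", False, 40.0, 0, 0.5),
    Event("v1.5.0", True, 25.0, 3, 0.5),
    Event("v1.5.0", True, 75.0, 5, 0.5),
]
report = impact_by_version(events)

# Pick the release with the largest revenue loss as the prime suspect.
worst = max(report, key=lambda v: report[v]["revenue_lost"])
print(worst, report[worst])
```

Grouping by release is what lets a single result row tie user, carbon, and revenue impact to one deployment, which is the business-terms explanation the agent is after.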

2️⃣ Autonomous Remediation

Once impact is quantified, SpikeTrace moves from insight to action:

Creates a ticket in Jira Service Management
Notifies the team instantly in Slack

Investigation → Decision → Execution. All in one conversational flow.
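As a rough illustration of the hand-off from insight to action, the Python sketch below builds the two request bodies such a flow might send. SpikeTrace uses Elastic workflow connectors for this, so the code is not the project's implementation. It assumes a Jira REST v2-style "create issue" body and a Slack incoming-webhook body; the project key `OPS`, the ticket key `OPS-123`, the `Incident` issue type, and the impact figures are hypothetical.

```python
import json


def build_jira_issue(project_key, summary, description):
    """Body in the shape of a Jira REST v2 'create issue' request."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "description": description,
            "issuetype": {"name": "Incident"},  # assumed issue type name
        }
    }


def build_slack_message(summary, ticket_key):
    """Minimal Slack incoming-webhook body: a single text field."""
    return {"text": f":rotating_light: {summary} (ticket: {ticket_key})"}


# Hypothetical quantified impact for one release.
impact = {"version": "v1.5.0", "failed_checkouts": 2,
          "revenue_lost": 100.0, "wasted_co2_g": 4.0}

summary = (f"{impact['version']}: {impact['failed_checkouts']} failed checkouts, "
           f"${impact['revenue_lost']:.2f} lost, "
           f"{impact['wasted_co2_g']:.1f} g CO2 wasted")

issue = build_jira_issue("OPS", summary, json.dumps(impact, indent=2))
message = build_slack_message(summary, "OPS-123")

# No network calls here; the payloads are only printed for inspection.
print(json.dumps(issue, indent=2))
print(message["text"])
```

Keeping payload construction separate from sending makes the remediation step easy to review before any ticket or notification actually goes out.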

How we built it

Elastic Stack → Indexed synthetic logs and carbon metrics
Elastic Agent Builder → Created the “SpikeTrace Commander”
Gemini via Vertex AI → Advanced reasoning + multi-tool orchestration
Workflow Connectors → Chained Jira + Slack automation
Strands Framework → A2A bridge between local frontend and Elastic Agent
Kibana Dashboard → Visualized CO₂ emissions and latency spikes in one pane

Challenges

The hardest part? Multi-dimensional correlation.

Mapping an UpstreamTimeout log to:

A specific carbon emission increase
A deployment version
A quantified revenue loss

All without LLM hallucinations.

We solved it with strict grounding, structured prompts, and precise ES|QL aggregation—forcing the agent to reason only from real telemetry.
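One simple way to picture strict grounding is a post-check that rejects any figure the agent cites that does not appear in the query results. The Python sketch below is an assumption about how such a check could look, not the project's actual guardrail; the function names, row fields, and sample answers are hypothetical.

```python
import re


def numbers_in(text):
    """Extract numeric literals (e.g. '100', '4.0') from free text."""
    return {float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)}


def ungrounded_numbers(answer, rows):
    """Return numbers the answer cites that appear in no telemetry row."""
    known = {
        float(v)
        for row in rows
        for v in row.values()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    return numbers_in(answer) - known


# Hypothetical aggregation result that the agent is allowed to cite.
rows = [{"version": "v1.5.0", "failed_checkouts": 2,
         "revenue_lost": 100.0, "wasted_co2_g": 4.0}]

good = "Release caused 2 failed checkouts and $100 in lost revenue."
bad = "Release caused 2 failed checkouts and $950 in lost revenue."

print(ungrounded_numbers(good, rows))  # empty set: every figure is backed by a row
print(ungrounded_numbers(bad, rows))   # contains 950.0, which no row supports
```

The check is deliberately naive: it would misread digits inside version strings such as v1.5.0, so a real guardrail would need to parse structured output rather than free text.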

What we’re proud of

🌱 Triple Bottom Line Logic — Carbon waste prioritized alongside revenue
⚡ Zero-Touch Remediation — Discovery to Jira ticket in one turn
🔗 Seamless A2A Integration — Frontend to Elastic Agent via Strands

What we learned

Sustainability isn’t separate from reliability. Sustainability is an observability problem.

When systems fail or retry endlessly, they don’t just hurt SLAs—they burn energy and revenue.

Building SpikeTrace proved that AI agents shouldn’t just explain incidents. They should quantify impact and trigger action.

What’s next

🔮 Predictive Remediation: Use Elastic ML to forecast carbon spikes before they happen
☁️ Multi-Cloud Intelligence: Correlate incidents across AWS, GCP, and Azure

SpikeTrace is evolving from an incident investigator… into a proactive sustainability commander.

Built With

a2a
elastic
elastic-agent-builder
elasticsearch
kibana
python
strands-framework

Updates

Aishwarya Nathani started this project — Feb 27, 2026 11:14 AM EST
