Inspiration
Every finance and ops team we've talked to has the same frustrating problem: their monitoring tools are great at detecting issues but terrible at learning from mistakes. A Datadog alert fires, someone triages it, marks it as a false positive—and next week, the exact same alert fires again. Nothing was learned. Nothing improved. We wanted to build an agent that actually closes this loop—one that remembers what you told it and visibly changes its behavior the next time it runs.
What it does
OpsIQ is a self-improving multi-agent system for operational intelligence. It autonomously ingests signals from Datadog and Lightdash, detects billing anomalies (duplicate refunds, underbilling, refund spikes, suspicious credits), and reasons about them using Groq's LLM at every decision point. It takes governed actions through Airia pipelines—creating remediation cases, dispatching alerts, and routing approval tasks—while scoring risk with Modulate sentiment analysis.
The real magic is the self-improvement loop. When you mark a case as a false positive, the Memory Agent reasons about what went wrong and adjusts specific detection thresholds. Rerun triage and the output is measurably different: confidence scores drop, impact estimates shrink, and the LLM explains exactly why. This isn't a mockup — it's a working closed-loop system where every piece of feedback makes the agent smarter.
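The loop described above can be sketched in miniature: triage consults a memory of past feedback before scoring, so a rerun after a false-positive mark produces lower confidence and a smaller impact estimate, with an explanation attached. All names here (`feedback_memory`, `triage`, the field names) are hypothetical illustrations, not OpsIQ's actual interfaces.

```python
# Toy illustration of "rerun triage and the output is measurably different".
# The memory below is what a false-positive mark would have written.
feedback_memory = {"suspicious_credit": {"fp_penalty": 0.10}}  # hypothetical store

def triage(anomaly):
    # Look up any learned penalty for this anomaly type (0.0 if no feedback yet).
    penalty = feedback_memory.get(anomaly["type"], {}).get("fp_penalty", 0.0)
    confidence = anomaly["raw_confidence"] * (1 - penalty)  # confidence drops
    impact = anomaly["raw_impact"] * (1 - penalty)          # impact estimate shrinks
    return {
        "confidence": round(confidence, 2),
        "impact_usd": round(impact, 2),
        "why": f"adjusted by {penalty:.0%} after false-positive feedback",
    }

result = triage({"type": "suspicious_credit", "raw_confidence": 0.9, "raw_impact": 1200})
print(result["confidence"], result["impact_usd"])  # 0.81 1080.0
```

In OpsIQ the "why" string would come from the LLM's reasoning trace rather than a template, but the shape of the loop is the same: feedback mutates state, and the next run reads that state.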
How we built it
We designed a 6-agent architecture on FastAPI + Streamlit: a Monitor Agent for signal ingestion, a Triage Agent for anomaly detection and scoring, an Orchestrator that uses Groq LLM at 5 reasoning points, a Memory Agent for feedback-driven learning, an Evaluator Agent for quality scoring, and an Analyst Agent for natural language Q&A with charts. Data flows through DuckDB for analytics and SQLite for persistent state. All 4 sponsor tools (Datadog, Lightdash, Airia, Modulate) are integrated as adapters with graceful mock/real mode fallback. We wrote 156 tests covering every layer of the stack.
Challenges we ran into
The hardest part was making the self-improvement loop actually work end-to-end. It's easy to claim "the system learns from feedback" — it's much harder to make the LLM's reasoning translate into concrete threshold changes that produce measurably different output on rerun. We had to carefully design bounded parameter adjustments (false positive penalty capped at 50%, detection windows narrowing by 0.5h per feedback) so the system improves conservatively without overcorrecting.
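The bounded-adjustment idea can be expressed in a few lines. The 50% penalty cap and the 0.5h window step come from the writeup above; the per-feedback step size and the window floor are assumptions added so the example is complete.

```python
# Bounded, conservative parameter updates: repeated feedback converges
# to the caps instead of overcorrecting.
PENALTY_STEP, PENALTY_CAP = 0.10, 0.50      # step size is an assumption; cap is from the writeup
WINDOW_STEP_H, WINDOW_FLOOR_H = 0.5, 1.0    # 0.5h step from the writeup; floor is an assumption

def apply_false_positive(penalty, window_h):
    # Each false positive nudges both knobs, clamped to their bounds.
    penalty = min(penalty + PENALTY_STEP, PENALTY_CAP)
    window_h = max(window_h - WINDOW_STEP_H, WINDOW_FLOOR_H)
    return penalty, window_h

p, w = 0.0, 6.0
for _ in range(10):  # even ten false positives in a row can't blow past the bounds
    p, w = apply_false_positive(p, w)
print(p, w)  # 0.5 1.0
```

Clamping with `min`/`max` is what makes the learning "conservative": the system can only drift toward the bounds, never past them, no matter how much feedback arrives.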
Getting 4 sponsor APIs to work together in a single autonomous pipeline was also tricky — each has different auth patterns, response formats, and failure modes. Building the graceful degradation system was essential for keeping the demo reliable regardless of which API keys are available.
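One common shape for the graceful degradation described above is an adapter that checks for a credential and falls back to canned data when none is present. This is a hedged sketch, not OpsIQ's actual code: the class name, method names, and mock payload are all illustrative.

```python
# Sketch of a mock/real adapter with graceful fallback.
class DatadogAdapter:
    def __init__(self, api_key=None):
        # In a real deployment this would come from os.getenv("DATADOG_API_KEY").
        self.api_key = api_key

    def fetch_alerts(self):
        # Real mode only when a key is configured; otherwise degrade gracefully.
        if self.api_key:
            return self._fetch_real()
        return self._fetch_mock()

    def _fetch_real(self):
        # Placeholder: the actual Datadog API call would live here.
        raise NotImplementedError("real-mode call not shown in this sketch")

    def _fetch_mock(self):
        # Canned demo data keeps the pipeline runnable without credentials.
        return [{"monitor": "refund_spike", "status": "alert", "mock": True}]

alerts = DatadogAdapter().fetch_alerts()
print(alerts[0]["mock"])  # True when no API key is configured
```

Because every adapter exposes the same `fetch_*` interface in both modes, the rest of the pipeline never needs to know which sponsor keys are actually available.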
Accomplishments that we're proud of
The self-improvement loop is real and visible — not a mockup. You can mark a case, rerun triage, and watch the numbers change. LLM reasoning traces are fully transparent — you can read exactly how the AI thinks at every step. We achieved 156 passing tests with zero failures, and all 4 sponsor integrations work together in a single autonomous pipeline rather than being bolted on separately. The 3-minute demo flow tells a complete story from signal detection to self-improvement.
What we learned
Building a truly self-improving agent requires much more than just calling an LLM. The key insight was that the improvement loop needs to be bounded and conservative — small threshold adjustments beat large ones, and every change needs a human-readable explanation. An AI that can explain why it changed its behavior is far more trustworthy than one that silently gets better. We also learned that LLM reasoning is most valuable not for generating text, but for making decisions — analyzing signals, deciding priorities, choosing actions.
What's next for OpsIQ: Self-Improving Operational Intelligence Agent
We want to add real-time signal streaming via WebSockets, ML-based anomaly detection alongside our rule-based detectors, multi-tenant support with per-organization memory and thresholds, and integrations with more data sources like Stripe, Salesforce, and Snowflake. The long-term vision is an ops agent that doesn't just respond to problems — it anticipates them, learns from every interaction, and gets measurably better every week.