RegWatch - Multi-Agent Compliance Detection

RegWatch: When AI Agents Argue, Compliance Gets Better

The "Oh Crap" Moment That Started It All

Picture this: A fintech company gets slapped with a $2M fine because someone missed a single paragraph in a 47-page Basel III update. Their compliance officer spent 4 hours every week manually checking regulatory sites, cross-referencing internal systems, and still... they missed it.

That's when we thought: What if we had AI agents that literally argue with each other about compliance?

The Problem Nobody Talks About

Compliance monitoring is boring, manual, and terrifying. Every week, companies need to:

Check 50+ regulatory sources (GDPR, PCI-DSS, Basel III, SOX, you name it)
Figure out which internal systems are affected
Assign work to engineering teams
Document everything for auditors

Time spent: 4 hours per framework. Margin for error: Zero. Consequences of missing something: Career-ending.

And here's the kicker—traditional automation doesn't work because regulations are written in legalese, and mapping them to specific infrastructure components requires human judgment. Or does it?

Our Idea: Make Disagreement a Feature

Most AI systems give you one answer. We built a system where two AI agents deliberately challenge each other.

Here's how it works:

Detection Agent scans new regulations and says: "Hey, this Basel III update about capital requirements affects 7 of our components."
Reviewer Agent goes full skeptic mode: "Hold up. Let me re-evaluate each one. You said Trade Surveillance is affected with 68% confidence, but that system monitors market abuse, not capital ratios. That's a false positive. I'm only 25% confident. REJECTED."
The disagreement gets logged. When agents differ by 15 percentage points or more, humans step in. Otherwise, approved findings are ready to be ticketed in Jira.

The magic? That 43% disagreement on Trade Surveillance prevented an engineering team from wasting 16 hours on unnecessary work.

What We Built

We used the Elastic Agent Builder Hackathon (Jan 22 - Feb 27, 2026) as our deadline and built:

The Data Foundation

17 real regulations indexed in Elasticsearch (GDPR articles, PCI-DSS requirements, Basel III circulars)
32 product components with metadata (owners, tech stack, compliance tags)
Semantic embeddings (384 dimensions) for smart matching
All running on Elastic Cloud Serverless (zero cost!)
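For semantic matching, each indexed document needs a vector field sized to the embedding model. Here is a minimal sketch of what such a mapping might look like, written as a Python dict. Only the 384-dimension size comes from our setup as described above; the field names (component_name, owner, compliance_tags, description, description_embedding) and the cosine similarity choice are assumptions for illustration.

```python
# Hypothetical mapping for the product_configs index.
# Field names and the similarity metric are illustrative assumptions;
# only the 384-dim embedding size is taken from the write-up.
EMBEDDING_DIMS = 384

product_configs_mapping = {
    "properties": {
        "component_name": {"type": "keyword"},
        "owner": {"type": "keyword"},
        "compliance_tags": {"type": "keyword"},
        "description": {"type": "text"},
        "description_embedding": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "similarity": "cosine",
        },
    }
}


def check_embedding(vector: list) -> list:
    """Reject vectors whose length does not match the mapping before indexing."""
    if len(vector) != EMBEDDING_DIMS:
        raise ValueError(f"expected {EMBEDDING_DIMS} dims, got {len(vector)}")
    return vector
```

Checking the vector length before indexing catches a mismatched embedding model early, instead of at query time.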

The Agents (Built in Agent Builder)

Detection Agent:

Queries regulatory_circulars index using ES|QL
Finds regulations published in the last 7 days
Searches product_configs for semantic matches
Calculates confidence scores: ES relevance (40%) + framework tags (35%) + category overlap (25%)
Returns findings with confidence ≥ 0.50
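The weighted score above can be sketched in a few lines of Python. This is a sketch, not the agent's actual prompt logic: it assumes each signal has already been normalized to the range 0 to 1 (raw Elasticsearch relevance scores are not), and the names Signals, detection_confidence, and is_reported are hypothetical.

```python
from dataclasses import dataclass

# Weights and threshold as listed for the Detection Agent.
WEIGHTS = {"es_relevance": 0.40, "framework_tags": 0.35, "category_overlap": 0.25}
MIN_CONFIDENCE = 0.50


@dataclass
class Signals:
    """Per-component signals, each ASSUMED to be normalized to [0, 1]."""
    es_relevance: float
    framework_tags: float
    category_overlap: float


def detection_confidence(s: Signals) -> float:
    """Weighted sum of the three signals, rounded to two decimals."""
    score = (
        WEIGHTS["es_relevance"] * s.es_relevance
        + WEIGHTS["framework_tags"] * s.framework_tags
        + WEIGHTS["category_overlap"] * s.category_overlap
    )
    return round(score, 2)


def is_reported(s: Signals) -> bool:
    """A finding is returned only if it clears the minimum confidence."""
    return detection_confidence(s) >= MIN_CONFIDENCE
```

For example, a component with strong text relevance (0.8) and a matching framework tag (1.0) but no category overlap (0.0) scores 0.32 + 0.35 + 0 = 0.67 and is reported.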

Reviewer Agent:

Takes Detection Agent's findings
Re-evaluates INDEPENDENTLY (doesn't trust the first answer)
Uses stricter scoring rules:
- Component only logs but doesn't process? -0.20
- Framework tag present but doesn't handle specific requirement? -0.30
- Regulation mentions action component doesn't perform? -0.40
Calculates delta: |Reviewer Score - Detection Score|
Decision logic:
- Delta < 15% AND reviewer ≥ 0.70 → APPROVED
- Delta ≥ 15% → ESCALATED (human review)
- Reviewer < 0.50 → REJECTED (false positive), regardless of delta
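The Reviewer Agent's rules can be sketched in Python. This is a minimal sketch under two assumptions: rejection is checked first (consistent with the Trade Surveillance example, where a 0.25 reviewer score is rejected despite a large delta), and a finding that matches none of the three rules falls back to human review. The function names and penalty labels are hypothetical.

```python
# Penalties and thresholds as listed for the Reviewer Agent.
PENALTIES = {
    "logs_only": 0.20,                # component only logs but doesn't process
    "tag_without_requirement": 0.30,  # framework tag present, requirement not handled
    "action_not_performed": 0.40,     # regulation mentions an action the component doesn't perform
}
DELTA_THRESHOLD = 0.15
APPROVE_MIN = 0.70
REJECT_BELOW = 0.50


def reviewer_score(base: float, flags: list) -> float:
    """Apply the stricter penalties to the reviewer's own base estimate."""
    score = base - sum(PENALTIES[f] for f in flags)
    return max(0.0, round(score, 2))


def decide(detection: float, reviewer: float) -> tuple:
    """Return (status, delta). Check order is an assumption, see the text above."""
    delta = round(abs(reviewer - detection), 2)
    if reviewer < REJECT_BELOW:
        return "REJECTED", delta
    if delta >= DELTA_THRESHOLD:
        return "ESCALATED", delta
    if reviewer >= APPROVE_MIN:
        return "APPROVED", delta
    # Not covered by the listed rules (small delta, reviewer between 0.50 and 0.70):
    # ASSUMED to go to a human.
    return "ESCALATED", delta
```

With the Trade Surveillance numbers, decide(0.68, 0.25) returns ("REJECTED", 0.43).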

The Workflow (Elastic Workflows - Tech Preview)

We orchestrated both agents using the brand-new Elastic Workflows feature (literally released during the hackathon!):

steps:
  - name: run_detection_agent
    type: ai.agent
    with:
      agent_id: detection-agent
      message: "Find regulations from last 7 days..."

  - name: run_reviewer_agent
    type: ai.agent
    with:
      agent_id: reviewer-agent
      message: "Review these findings: {{ steps.run_detection_agent.output }}"

Agents automatically pass data to each other. No manual intervention.

The Automation Scripts

Data ingestion: Generates realistic regulations every 4 hours
Notifications: Sends email/Slack alerts to compliance officers every 5 minutes
Webhook server: FastAPI endpoint receives compliance alerts

Total workflow time: Detection → Review → Notification = 8 minutes

The "Holy Sh*t" Moment

We were testing the system with a Basel III regulation about minimum capital requirements. Detection Agent flagged "Trade Surveillance and Market Abuse Monitoring" with 68% confidence.

Reviewer Agent came back with: 0.25 confidence

Delta: 0.43, or 43 percentage points (way above our 15-point threshold)

Reasoning: "Basel III PILLAR1-2025-01 is about minimum capital requirements. It mandates banks maintain specific capital ratios. Trade Surveillance monitors trading activity for market abuse (insider dealing, manipulation). The disconnect: Capital requirements ≠ Market abuse monitoring."

We just prevented a false positive. No manual review. No wasted engineering effort. Just two AI agents doing what they're built to do—argue until they find the truth.

What We Learned (The Hard Way)

1. Elastic Workflows Literally Dropped Mid-Hackathon

Workflows shipped in tech preview on January 22, 2026, day 1 of the hackathon. Documentation was sparse. We had to reverse-engineer YAML syntax from example workflows on GitHub. The step type agent-builder.chat didn't work. We tried ai-assistant, agent.chat, then finally found ai.agent in a blog post. Worth it.

2. Agents Hallucinate When They're Too Confident

Our first Detection Agent had 0.30 as the minimum confidence threshold. It found "matches" everywhere. We raised it to 0.50, and false positives dropped 60%. Lesson: Make your agents earn their confidence.

3. Disagreement Resolution Is the Killer Feature

Initially, we thought the innovation was "automated compliance monitoring." Nope. Every demo we showed, people lit up when they saw the 43% disagreement. That's when we realized: autonomous validation is more valuable than autonomous detection.

4. ES|QL Is Ridiculously Powerful

Being able to query Elasticsearch with natural SQL-like syntax inside agents? Game-changer. Detection Agent uses this to find recent regulations:

FROM regulatory_circulars 
| WHERE published_date > NOW() - 7 days
| KEEP regulation_id, title, framework, severity

No complex query DSL. Just clean, readable queries.
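The same query can also be run from a script through the ES|QL REST endpoint (POST /_query), which is handy for testing it outside Agent Builder. This is a sketch using only the Python standard library, not how the agent itself executes the query; the helper names build_esql_request and rows_as_dicts are hypothetical, and the base URL and API key are placeholders you would supply.

```python
import json
import urllib.request

ESQL_QUERY = """
FROM regulatory_circulars
| WHERE published_date > NOW() - 7 days
| KEEP regulation_id, title, framework, severity
"""


def build_esql_request(base_url: str, api_key: str,
                       query: str = ESQL_QUERY) -> urllib.request.Request:
    """Build (but do not send) a POST request to the ES|QL query endpoint."""
    body = json.dumps({"query": query.strip()}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",
        },
    )


def rows_as_dicts(response: dict) -> list:
    """ES|QL returns parallel 'columns' and 'values'; zip them into row dicts."""
    names = [col["name"] for col in response["columns"]]
    return [dict(zip(names, row)) for row in response["values"]]
```

To send the request, pass it to urllib.request.urlopen, parse the body with json.load, and hand the result to rows_as_dicts.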

Challenges We Faced

Challenge 1: Agent Builder in Serverless vs. Hosted

We initially used a hosted deployment. Agent Builder wasn't visible. Spent 2 hours debugging. Turns out, Agent Builder is only in Serverless projects during tech preview. Switched deployment. Problem solved.

Challenge 2: Passing Data Between Agents in Workflows

Agents return complex JSON objects. Getting Reviewer Agent to read Detection Agent's output was tricky. The template syntax {{ steps.detection.output }} returned [object Object]. We tried {{ steps.detection.output.message | json }} with mixed results. Final solution: simplified the Detection Agent's output format.
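To illustrate the idea of a simplified output format, here is a sketch that reduces a nested response to a compact JSON string of flat findings, which a template can interpolate as plain text. The nested input shape and the function name simplify_findings are invented for illustration; they are not the actual agent output.

```python
import json


def simplify_findings(raw: dict) -> str:
    """Flatten nested findings into a compact JSON string.

    The input shape (findings -> regulation/component sub-objects) is a
    hypothetical example, not the real Detection Agent response.
    """
    flat = [
        {
            "regulation_id": f["regulation"]["id"],
            "component": f["component"]["name"],
            "confidence": f["confidence"],
        }
        for f in raw.get("findings", [])
    ]
    return json.dumps(flat, separators=(",", ":"))
```

Because the result is already a string, there is no object for the template engine to stringify on its own.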

Challenge 3: Realistic Test Data

We couldn't use real company data (privacy issues). So we built a synthetic data generator that creates:

Realistic regulation text with frameworks, severity, effective dates
Product components with tech stacks, owners, compliance tags
Semantic embeddings that actually make sense

Took a full day. Worth it for the demo.
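A stripped-down sketch of the regulation side of such a generator, using only the standard library. The topic list, ID format, and date ranges are made up for illustration, and embeddings are left out; our real generator is more involved.

```python
import random
from datetime import date, timedelta

FRAMEWORKS = ["GDPR", "PCI-DSS", "Basel III", "SOX"]
SEVERITIES = ["low", "medium", "high", "critical"]
# Hypothetical topics, only here to make titles look plausible.
TOPICS = ["capital requirements", "data retention", "access logging", "incident reporting"]


def make_regulation(rng: random.Random, today: date) -> dict:
    """Create one synthetic regulation document."""
    framework = rng.choice(FRAMEWORKS)
    published = today - timedelta(days=rng.randint(0, 30))
    effective = published + timedelta(days=rng.randint(30, 180))
    prefix = framework.replace(" ", "").upper()
    return {
        "regulation_id": f"{prefix}-{published.year}-{rng.randint(1, 99):02d}",
        "title": f"{framework} update on {rng.choice(TOPICS)}",
        "framework": framework,
        "severity": rng.choice(SEVERITIES),
        "published_date": published.isoformat(),
        "effective_date": effective.isoformat(),
    }


def generate_regulations(n: int, seed: int = 42, today: date = None) -> list:
    """Generate n documents; a fixed seed and date make the output reproducible."""
    rng = random.Random(seed)
    today = today or date.today()
    return [make_regulation(rng, today) for _ in range(n)]
```

Seeding the generator keeps demo runs repeatable, so the same regulations show up every time you rebuild the index.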

Challenge 4: Making Disagreement Visible

Users need to SEE the disagreement. We built:

A visual diff showing both confidence scores side-by-side
Delta percentage calculation
Color coding (green = approved, yellow = escalated, red = rejected)
Reasoning explanations in plain English

This turned abstract AI behavior into something compliance officers could trust.
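As a rough text-only sketch of that view, the function below puts both scores side by side, computes the delta in percentage points, and tags the row with its status color. The function name render_finding and the output layout are hypothetical; the real UI is visual.

```python
# Color per status, as in the list above.
STATUS_COLORS = {"APPROVED": "green", "ESCALATED": "yellow", "REJECTED": "red"}


def render_finding(component: str, detection: float, reviewer: float,
                   status: str, reasoning: str) -> str:
    """Format one finding as a three-line text summary."""
    delta_points = round(abs(reviewer - detection) * 100)
    return (
        f"[{STATUS_COLORS[status]}] {status}: {component}\n"
        f"  Detection: {detection:.2f} | Reviewer: {reviewer:.2f} | "
        f"Delta: {delta_points} points\n"
        f"  Why: {reasoning}"
    )
```

Calling it with the Trade Surveillance numbers (0.68 and 0.25, status REJECTED) yields a red row showing a 43-point delta.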

The Tech Stack

Elasticsearch 9.3.0 (Serverless) - Data storage, semantic search
Elastic Agent Builder (GA) - Agent creation and management
Elastic Workflows (Tech Preview) - Agent orchestration
ES|QL - Query language for agents
Python - Automation scripts (data ingestion, notifications)
FastAPI - Webhook server
SentenceTransformers - Generating 384-dim embeddings
Cost: $0 (Elastic Cloud trial + serverless)

The Numbers That Matter

Metric	Before	After	Improvement
Time per framework	4 hours	8 minutes	97% faster
False positives caught	Manual QA	Automatic	43% delta example
Engineering hours saved	N/A	16 hrs	Per rejected finding
Regulatory sources checked	3-5	50+	10x coverage
Human involvement	100%	~15% (escalations only)	85% reduction

What's Next

If we had more time (or funding), here's what we'd build:

AWS Resource Mapping: Turn "update authentication component" into "enable MFA on arn:aws:iam::123456789:role/auth-service"
Jira Integration: Approved findings automatically create tickets with:
- Component owner assigned
- Severity-based priority
- Link to regulation source
- Estimated effort based on historical data
Feedback Loop: When humans override agent decisions, feed that back as training data to improve confidence scoring
Multi-Framework Correlation: "This GDPR update + that PCI-DSS change = you need to update TWO systems, not one"
Regulatory Change Prediction: "Basel IV is in draft. Here's what's likely to be affected when it passes."

Why This Matters

Compliance isn't sexy. But it's the difference between a thriving fintech and a bankrupt one. Every week, companies pay millions in fines for missing regulatory updates that were publicly available.

RegWatch doesn't just automate compliance monitoring—it makes disagreement systematic, trackable, and valuable.

When two AI agents argue about whether a component is affected, and one catches what the other missed, that's not a bug. That's the whole point.

Try It Yourself

GitHub:
Demo Video:
Elastic Cloud: You can replicate this with a free trial

Built with coffee, determination, and the Elastic Agent Builder Hackathon deadline looming. 🚀

Team: Xyphor
Hackathon: Elastic Agent Builder (Jan 22 - Feb 27, 2026)
Built with: Agent Builder (GA), Elastic Workflows (Tech Preview), ES|QL, Python

Built With

elastic-agent-builder
elastic-cloud-serverless
elastic-workflows
elasticsearch-9.3.0
es|ql
fastapi
huggingface-transformers
json
kibana
markdown
microsoft-teams-api
n8n-(workflow-automation)
pydantic
python-3.11
requests
rest-apis
sentencetransformers
slack-api
smtp
webhooks
yaml

Updates

- Xyphor started this project — Feb 27, 2026 03:54 AM EST
