RegWatch - Multi-Agent Compliance Detection

RegWatch: When AI Agents Argue, Compliance Gets Better

Elastic Cloud Agent Builder Workflows

The "Oh Crap" Moment That Started It All

Picture this: A fintech company gets slapped with a $2M fine because someone missed a single paragraph in a 47-page Basel III update. Their compliance officer spent 4 hours every week manually checking regulatory sites, cross-referencing internal systems, and still... they missed it.

That's when we thought: What if we had AI agents that literally argue with each other about compliance?

The Problem Nobody Talks About

Compliance monitoring is boring, manual, and terrifying. Every week, companies need to:

  • Check 50+ regulatory sources (GDPR, PCI-DSS, Basel III, SOX, you name it)
  • Figure out which internal systems are affected
  • Assign work to engineering teams
  • Document everything for auditors

Time spent: 4 hours per framework. Margin for error: Zero. Consequences of missing something: Career-ending.

And here's the kicker—traditional automation doesn't work because regulations are written in legalese, and mapping them to specific infrastructure components requires human judgment. Or does it?

Our Idea: Make Disagreement a Feature

Most AI systems give you one answer. We built a system where two AI agents deliberately challenge each other.

Here's how it works:

  1. Detection Agent scans new regulations and says: "Hey, this Basel III update about capital requirements affects 7 of our components."

  2. Reviewer Agent goes full skeptic mode: "Hold up. Let me re-evaluate each one. You said Trade Surveillance is affected with 68% confidence, but that system monitors market abuse, not capital ratios. That's a false positive. I'm only 25% confident. REJECTED."

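  3. The disagreement gets logged. If the Reviewer scores a finding below 50%, it's rejected as a false positive. If the two agents differ by 15 percentage points or more, humans step in. Findings the Reviewer confirms at 70% or higher, within 15 points of the Detection Agent, are approved and go straight to Jira.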

The magic? That 43% disagreement on Trade Surveillance prevented an engineering team from wasting 16 hours on unnecessary work.

What We Built

We used the Elastic Agent Builder Hackathon (Jan 22 - Feb 27, 2026) as our deadline and built:

The Data Foundation

  • 17 real regulations indexed in Elasticsearch (GDPR articles, PCI-DSS requirements, Basel III circulars)
  • 32 product components with metadata (owners, tech stack, compliance tags)
  • Semantic embeddings (384 dimensions) for smart matching
  • All running on Elastic Cloud Serverless (zero cost on the free trial!)
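If you want to replicate the data foundation, here is a minimal Python sketch of how a 384-dimension embedding field and a semantic match could be expressed as Elasticsearch request bodies. It is not the project's code. The field names (`component_name`, `description_embedding`, and so on) and the `k`/`num_candidates` defaults are hypothetical. It also assumes that each component stores one vector in a `dense_vector` field with cosine similarity.

```python
EMBEDDING_DIMS = 384  # matches the 384-dim SentenceTransformers vectors

# HYPOTHETICAL mapping for product_configs; field names are illustrative only.
PRODUCT_CONFIGS_MAPPING = {
    "mappings": {
        "properties": {
            "component_name": {"type": "keyword"},
            "owner": {"type": "keyword"},
            "compliance_tags": {"type": "keyword"},
            "description": {"type": "text"},
            "description_embedding": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIMS,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}


def knn_search_body(query_vector: list[float], k: int = 5, num_candidates: int = 50) -> dict:
    """Build a top-level kNN search body returning the components nearest to a regulation embedding."""
    if len(query_vector) != EMBEDDING_DIMS:
        raise ValueError(f"expected {EMBEDDING_DIMS} dimensions, got {len(query_vector)}")
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    return {
        "knn": {
            "field": "description_embedding",  # hypothetical field name
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["component_name", "owner", "compliance_tags"],
    }
```

The mapping would go into the index-creation request (`PUT /product_configs`) and the search body into `POST /product_configs/_search`. The regulation's vector has to come from the same embedding model as the components, or the distances are meaningless.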

The Agents (Built in Agent Builder)

Detection Agent:

  • Queries regulatory_circulars index using ES|QL
  • Finds regulations published in the last 7 days
  • Searches product_configs for semantic matches
  • Calculates confidence scores: ES relevance (40%) + framework tags (35%) + category overlap (25%)
  • Returns findings with confidence ≥ 0.50
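The scoring rule above is a weighted sum followed by a threshold. Here is a minimal Python sketch of that rule, not the agent's actual implementation. It assumes each signal has already been normalized to [0, 1]. How the real agent normalizes the ES relevance score or measures tag and category matches isn't described, so those inputs are placeholders.

```python
from dataclasses import dataclass

# Weights for the Detection Agent's confidence score, as listed above.
WEIGHT_ES_RELEVANCE = 0.40
WEIGHT_FRAMEWORK_TAGS = 0.35
WEIGHT_CATEGORY_OVERLAP = 0.25
MIN_CONFIDENCE = 0.50


@dataclass
class MatchSignals:
    es_relevance: float      # ASSUMPTION: ES _score already normalized to [0, 1]
    framework_tags: float    # ASSUMPTION: 1.0 if the component carries the regulation's framework tag, else 0.0
    category_overlap: float  # ASSUMPTION: fraction of the regulation's categories the component shares


def detection_confidence(s: MatchSignals) -> float:
    """Weighted sum of the three signals; with inputs in [0, 1] the result stays in [0, 1]."""
    return (
        WEIGHT_ES_RELEVANCE * s.es_relevance
        + WEIGHT_FRAMEWORK_TAGS * s.framework_tags
        + WEIGHT_CATEGORY_OVERLAP * s.category_overlap
    )


def filter_findings(candidates: dict[str, MatchSignals]) -> dict[str, float]:
    """Score every candidate component and keep those at or above the 0.50 threshold."""
    scored = {name: detection_confidence(sig) for name, sig in candidates.items()}
    return {name: round(c, 2) for name, c in scored.items() if c >= MIN_CONFIDENCE}
```

With made-up inputs, a component that carries the framework tag but has weak category overlap, e.g. `MatchSignals(0.7, 1.0, 0.2)`, still scores 0.68. The tag alone contributes 0.35, so a tagged component needs little else to cross 0.50. That is the kind of match the Reviewer's "framework tag present but requirement not handled" penalty is designed to push back on.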

Reviewer Agent:

  • Takes Detection Agent's findings
  • Re-evaluates INDEPENDENTLY (doesn't trust the first answer)
  • Uses stricter scoring rules:
    • Component only logs the data but doesn't process it? -0.20
    • Framework tag present, but the component doesn't handle the specific requirement? -0.30
    • Regulation mentions an action the component doesn't perform? -0.40
  • Calculates delta: |Reviewer Score - Detection Score|
  • Decision logic (rejection is checked first):
    • Reviewer < 0.50 → REJECTED (false positive)
    • Delta ≥ 15% → ESCALATED (human review)
    • Delta < 15% AND reviewer ≥ 0.70 → APPROVED
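Here is a minimal Python sketch of the Reviewer's penalties and decision rules. It is an illustration, not the agent's code. The Reviewer's starting estimate (`base`) is hypothetical, since the write-up doesn't say how the Reviewer forms it before applying penalties. The rejection check runs before the escalation check, which matches the Trade Surveillance example: a 43-point delta, yet rejected without human review. The rules leave one band undefined (delta under 0.15 with a Reviewer score between 0.50 and 0.70). Escalating that band is an assumption flagged in the code.

```python
from dataclasses import dataclass

# Penalties from the Reviewer Agent's stricter rules.
PENALTY_LOGS_ONLY = 0.20                # component only logs the data, doesn't process it
PENALTY_TAG_WITHOUT_REQUIREMENT = 0.30  # framework tag present, specific requirement not handled
PENALTY_ACTION_NOT_PERFORMED = 0.40     # regulation mentions an action the component doesn't perform

REJECT_BELOW = 0.50
ESCALATE_DELTA = 0.15
APPROVE_AT_LEAST = 0.70


@dataclass
class ReviewFlags:
    logs_only: bool = False
    tag_without_requirement: bool = False
    action_not_performed: bool = False


def reviewer_score(base: float, flags: ReviewFlags) -> float:
    """Subtract each triggered penalty from a HYPOTHETICAL base estimate, floored at 0."""
    score = base
    if flags.logs_only:
        score -= PENALTY_LOGS_ONLY
    if flags.tag_without_requirement:
        score -= PENALTY_TAG_WITHOUT_REQUIREMENT
    if flags.action_not_performed:
        score -= PENALTY_ACTION_NOT_PERFORMED
    return round(max(0.0, score), 2)


def decide(detection: float, reviewer: float) -> tuple[str, float]:
    """Return (decision, delta). Rejection is checked before escalation."""
    delta = round(abs(reviewer - detection), 4)  # rounding avoids float noise at the 0.15 boundary
    if reviewer < REJECT_BELOW:
        return "REJECTED", delta
    if delta >= ESCALATE_DELTA:
        return "ESCALATED", delta
    if reviewer >= APPROVE_AT_LEAST:
        return "APPROVED", delta
    # ASSUMPTION: the rules don't cover delta < 0.15 with 0.50 <= reviewer < 0.70;
    # this sketch sends that band to a human too.
    return "ESCALATED", delta
```

Feeding in the Trade Surveillance numbers, `decide(0.68, 0.25)` returns `("REJECTED", 0.43)`.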

The Workflow (Elastic Workflows - Tech Preview)

We orchestrated both agents using the brand-new Elastic Workflows feature (literally released during the hackathon!):

steps:
  - name: run_detection_agent
    type: ai.agent
    with:
      agent_id: detection-agent
      message: "Find regulations from last 7 days..."

  - name: run_reviewer_agent
    type: ai.agent
    with:
      agent_id: reviewer-agent
      message: "Review these findings: {{ steps.run_detection_agent.output }}"

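The workflow feeds the Detection Agent's output straight into the Reviewer Agent's prompt through the {{ steps.run_detection_agent.output }} template, so no one has to hand findings from one agent to the other.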

The Automation Scripts

  • Data ingestion: Generates realistic synthetic regulations every 4 hours
  • Notifications: Sends email/Slack alerts to compliance officers every 5 minutes
  • Webhook server: FastAPI endpoint receives compliance alerts

Total workflow time: Detection → Review → Notification = 8 minutes

The "Holy Sh*t" Moment

We were testing the system with a Basel III regulation about minimum capital requirements. Detection Agent flagged "Trade Surveillance and Market Abuse Monitoring" with 68% confidence.

Reviewer Agent came back with: 0.25 confidence

Delta: 43% (way above our 15% threshold)

Reasoning: "Basel III PILLAR1-2025-01 is about minimum capital requirements. It mandates banks maintain specific capital ratios. Trade Surveillance monitors trading activity for market abuse (insider dealing, manipulation). The disconnect: Capital requirements ≠ Market abuse monitoring."

We just prevented a false positive. No manual review. No wasted engineering effort. Just two AI agents doing what they're built to do—argue until they find the truth.

What We Learned (The Hard Way)

1. Elastic Workflows Literally Dropped Mid-Hackathon

Workflows shipped as a tech preview on January 22, 2026, day 1 of the hackathon. Documentation was sparse. We had to reverse-engineer YAML syntax from example workflows on GitHub. The step type agent-builder.chat didn't work. We tried ai-assistant, agent.chat, then finally found ai.agent in a blog post. Worth it.

2. Agents Hallucinate When They're Too Confident

Our first Detection Agent had 0.30 as the minimum confidence threshold. It found "matches" everywhere. We raised it to 0.50, and false positives dropped 60%. Lesson: Make your agents earn their confidence.

3. Disagreement Resolution Is the Killer Feature

Initially, we thought the innovation was "automated compliance monitoring." Nope. Every demo we showed, people lit up when they saw the 43% disagreement. That's when we realized: autonomous validation is more valuable than autonomous detection.

4. ES|QL Is Ridiculously Powerful

Being able to query Elasticsearch with natural SQL-like syntax inside agents? Game-changer. Detection Agent uses this to find recent regulations:

FROM regulatory_circulars 
| WHERE published_date > NOW() - 7 days
| KEEP regulation_id, title, framework, severity

No complex query DSL. Just clean, readable queries.
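To sanity-check the same query outside Agent Builder, here is a minimal stdlib-only Python sketch. It assumes you call Elasticsearch's ES|QL REST endpoint (`POST /_query`) with an API key. `es_url` and `api_key` are placeholders you would supply. ES|QL returns column metadata and row values separately, so the helper zips them back into one dict per regulation.

```python
import json
import urllib.request


def recent_regulations_query(days: int = 7) -> str:
    """The Detection Agent's ES|QL query, with the look-back window as a parameter."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return (
        "FROM regulatory_circulars\n"
        f"| WHERE published_date > NOW() - {days} days\n"
        "| KEEP regulation_id, title, framework, severity"
    )


def run_esql(es_url: str, api_key: str, query: str) -> dict:
    """POST the query to the ES|QL endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{es_url}/_query",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"ApiKey {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def rows_as_dicts(response: dict) -> list[dict]:
    """ES|QL returns 'columns' and 'values' separately; zip them into one dict per row."""
    names = [col["name"] for col in response["columns"]]
    return [dict(zip(names, row)) for row in response["values"]]
```

A typical call would be `rows_as_dicts(run_esql(url, key, recent_regulations_query()))`. Keeping the parser separate from the HTTP call makes it easy to test against a saved response.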

Challenges We Faced

Challenge 1: Agent Builder in Serverless vs. Hosted

We initially used a hosted deployment. Agent Builder wasn't visible. Spent 2 hours debugging. Turns out, Agent Builder is only in Serverless projects during tech preview. Switched deployment. Problem solved.

Challenge 2: Passing Data Between Agents in Workflows

Agents return complex JSON objects, so getting the Reviewer Agent to read the Detection Agent's output was tricky. The template {{ steps.run_detection_agent.output }} rendered as [object Object]. We tried {{ steps.run_detection_agent.output.message | json }} with mixed results. Final solution: simplify the Detection Agent's output format.
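Workflows' own templating can't be reproduced here, but the fix boils down to handing the Reviewer a flat string rather than a nested object. Here is a minimal Python sketch of that idea. The output shape (a `findings` list with `component`, `regulation_id`, and `confidence` keys) is hypothetical and is not Agent Builder's actual response format.

```python
import json


def flatten_findings(detection_output: dict) -> str:
    """Keep only the fields the Reviewer needs and serialize them as compact JSON text."""
    slim = [
        {
            "component": f["component"],
            "regulation_id": f["regulation_id"],
            "confidence": f["confidence"],
        }
        for f in detection_output.get("findings", [])  # HYPOTHETICAL output shape
    ]
    return json.dumps(slim, separators=(",", ":"), sort_keys=True)


def reviewer_message(detection_output: dict) -> str:
    """Build the Reviewer's prompt from a plain string, never from the raw object."""
    return "Review these findings: " + flatten_findings(detection_output)
```

Dropping debug metadata and nested blocks before interpolation keeps the prompt short. It also stops the template engine from receiving anything it can't turn into text.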

Challenge 3: Realistic Test Data

We couldn't use real company data (privacy issues). So we built a synthetic data generator that creates:

  • Realistic regulation text with frameworks, severity, effective dates
  • Product components with tech stacks, owners, compliance tags
  • Semantic embeddings that actually make sense

Took a full day. Worth it for the demo.
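Here is a stripped-down Python sketch of what such a generator might look like. It is not the project's generator. The topic lists, ID format, and date ranges are all made up for illustration. Seeding `random.Random` makes runs reproducible, which helps when a demo needs the same regulations every time. The `published_date` field lines up with the 7-day ES|QL filter.

```python
import random
from datetime import date, timedelta

FRAMEWORKS = ["GDPR", "PCI-DSS", "Basel III", "SOX"]
SEVERITIES = ["low", "medium", "high", "critical"]
# HYPOTHETICAL topics per framework, for illustration only.
TOPICS = {
    "GDPR": ["data subject access requests", "breach notification timelines"],
    "PCI-DSS": ["cardholder data encryption", "multi-factor authentication"],
    "Basel III": ["minimum capital ratios", "liquidity coverage"],
    "SOX": ["financial reporting controls", "audit log retention"],
}


def synthetic_regulation(rng: random.Random, today: date) -> dict:
    """Generate one fake regulation published within the last two weeks."""
    framework = rng.choice(FRAMEWORKS)
    topic = rng.choice(TOPICS[framework])
    published = today - timedelta(days=rng.randint(0, 13))
    effective = published + timedelta(days=rng.choice([30, 90, 180]))
    return {
        # HYPOTHETICAL ID format, e.g. "BASELIII-2026-07"
        "regulation_id": f"{framework.replace(' ', '').upper()}-{published.year}-{rng.randint(1, 99):02d}",
        "title": f"Updated requirements on {topic}",
        "framework": framework,
        "severity": rng.choice(SEVERITIES),
        "published_date": published.isoformat(),
        "effective_date": effective.isoformat(),
        "body": f"Regulated entities must review their controls for {topic} before {effective.isoformat()}.",
    }
```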

Challenge 4: Making Disagreement Visible

Users need to SEE the disagreement. We built:

  • A visual diff showing both confidence scores side-by-side
  • Delta percentage calculation
  • Color coding (green = approved, yellow = escalated, red = rejected)
  • Reasoning explanations in plain English

This turned abstract AI behavior into something compliance officers could trust.
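The UI itself isn't part of this write-up. As a rough text-only stand-in, here is a small Python sketch that renders the same four elements: both scores, the delta, the color, and the reasoning. The card layout and function name are hypothetical.

```python
STATUS_COLORS = {"APPROVED": "green", "ESCALATED": "yellow", "REJECTED": "red"}


def render_disagreement(component: str, detection: float, reviewer: float,
                        status: str, reasoning: str) -> str:
    """Text-only side-by-side view: both scores, the delta, a status color, and the why."""
    delta = abs(reviewer - detection)
    color = STATUS_COLORS[status]
    return (
        f"[{color.upper()}] {status}: {component}\n"
        f"  Detection {detection:.0%} | Reviewer {reviewer:.0%} | Delta {delta:.0%}\n"
        f"  Reasoning: {reasoning}"
    )
```

For the Trade Surveillance case this produces a `[RED] REJECTED` card showing 68% vs. 25% with a 43% delta.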

The Tech Stack

  • Elasticsearch 9.3.0 (Serverless) - Data storage, semantic search
  • Elastic Agent Builder (GA) - Agent creation and management
  • Elastic Workflows (Tech Preview) - Agent orchestration
  • ES|QL - Query language for agents
  • Python - Automation scripts (data ingestion, notifications)
  • FastAPI - Webhook server
  • SentenceTransformers - Generating 384-dim embeddings
  • Cost: $0 (Elastic Cloud trial + serverless)

The Numbers That Matter

Metric                        Before       After                      Improvement
Time per framework            4 hours      8 minutes                  97% faster
False positives caught        Manual QA    Automatic                  see the 43% delta example
Engineering hours saved       N/A          16 hrs                     per rejected finding
Regulatory sources checked    3-5          50+                        10x coverage
Human involvement             100%         ~15% (escalations only)    85% reduction

What's Next

If we had more time (or funding), here's what we'd build:

  1. AWS Resource Mapping: Turn "update authentication component" into "enable MFA on arn:aws:iam::123456789012:role/auth-service"

  2. Jira Integration: Approved findings automatically create tickets with:

    • Component owner assigned
    • Severity-based priority
    • Link to regulation source
    • Estimated effort based on historical data

  3. Feedback Loop: When humans override agent decisions, feed that back as training data to improve confidence scoring

  4. Multi-Framework Correlation: "This GDPR update + that PCI-DSS change = you need to update TWO systems, not one"

  5. Regulatory Change Prediction: "Basel IV is in draft. Here's what's likely to be affected when it passes."
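For the Jira item (2), here is a minimal Python sketch of the ticket body an approved finding could produce for Jira's create-issue REST endpoint (`POST /rest/api/2/issue`). Several parts are hypothetical: the severity-to-priority mapping, the finding's field names, and the `COMP` project key in the usage note. Assigning by `accountId` follows Jira Cloud conventions. Effort estimation is left out because it would depend on historical data this write-up doesn't describe.

```python
# HYPOTHETICAL severity -> priority mapping, using Jira's default priority names.
SEVERITY_TO_PRIORITY = {"critical": "Highest", "high": "High", "medium": "Medium", "low": "Low"}


def jira_issue_payload(project_key: str, finding: dict) -> dict:
    """Build a create-issue body for an approved finding (field names in `finding` are hypothetical)."""
    framework_label = finding["framework"].replace(" ", "-").lower()  # Jira labels can't contain spaces
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Task"},
            "summary": f"[{finding['framework']}] {finding['regulation_id']} affects {finding['component']}",
            "description": (
                f"Source: {finding['source_url']}\n"
                f"Reviewer confidence: {finding['reviewer_confidence']:.0%}"
            ),
            "priority": {"name": SEVERITY_TO_PRIORITY.get(finding["severity"], "Medium")},
            "assignee": {"accountId": finding["owner_account_id"]},  # component owner
            "labels": ["regwatch", framework_label],
        }
    }
```

A call like `jira_issue_payload("COMP", finding)` would then be POSTed as JSON with the Jira instance's credentials. Unknown severities fall back to Medium rather than failing.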

Why This Matters

Compliance isn't sexy. But it's the difference between a thriving fintech and a bankrupt one. Every week, companies pay millions in fines for missing regulatory updates that were publicly available.

RegWatch doesn't just automate compliance monitoring—it makes disagreement systematic, trackable, and valuable.

When two AI agents argue about whether a component is affected, and one catches what the other missed, that's not a bug. That's the whole point.

Try It Yourself

  • GitHub:
  • Demo Video:
  • Elastic Cloud: You can replicate this with a free trial

Built with coffee, determination, and the Elastic Agent Builder Hackathon deadline looming. 🚀


Team: Xyphor
Hackathon: Elastic Agent Builder (Jan 22 - Feb 27, 2026)
Built with: Agent Builder (GA), Elastic Workflows (Tech Preview), ES|QL, Python

Built With

  • elastic-agent-builder
  • elastic-cloud-serverless
  • elastic-workflows
  • elasticsearch-9.3.0
  • es|ql
  • fastapi
  • huggingface-transformers
  • json
  • kibana
  • markdown
  • microsoft-teams-api
  • n8n-(workflow-automation)
  • pydantic
  • python-3.11
  • requests
  • rest-apis
  • sentencetransformers
  • slack-api
  • smtp
  • webhooks
  • yaml