Inspiration

Legal eDiscovery is a $15B industry where attorneys manually review thousands of documents to find the needle in the haystack — the one email that contradicts a deposition, the access log that proves someone lied under oath. A single case can take months and cost millions in billable hours.

We asked: what if a swarm of specialized AI agents could do this autonomously?
Not one general-purpose chatbot, but seven purpose-built investigators, each with its own tools, expertise, and role, coordinated through a structured investigation protocol. This is the kind of multi-agent orchestration that Elastic Agent Builder was designed for.


What It Does

ARGUS orchestrates 7 AI agents through a 6-phase investigation pipeline that mirrors how real legal investigation teams work:

1. Intake & Classification

A Document Classifier agent scans all evidence, tags relevance, and identifies key entities.

2. Relationship Mapping

A Relationship Mapper agent analyzes communication patterns between all parties, flagging suspicious external contacts.

3. Timeline Construction

A Timeline Builder agent constructs a chronological narrative, identifying escalation points and correlated events.

4. Pattern Detection & Sentiment Analysis (Parallel)

Two agents run simultaneously:

  • Pattern Detector — Finds anomalous file access patterns (after-hours downloads, bulk transfers, deletion spikes).
  • Sentiment Analyzer — Tracks behavioral shifts in communication tone.

5. Contradiction Hunting

A Contradiction Hunter agent cross-references sworn deposition testimony against documentary evidence, identifying potential perjury.

6. Synthesis & Final Report

A Lead Investigator agent reviews all findings and produces a structured legal report with case strength assessment and recommended actions.


Data Interaction

Each agent uses ES|QL queries and Index Search tools to interrogate real Elasticsearch data:

  • Emails
  • File access logs
  • Chat messages
  • Depositions
  • Contracts
  • Personnel records

Results stream in real-time to a dashboard featuring:

  • Relationship network graph
  • Evidence timeline
  • Contradiction panel
  • Live agent activity feed

Demo Case Results

Case: NovaTech v. Marcus Chen — suspected IP theft

ARGUS Output (≈10 minutes):

  • 69 findings
  • 12 mapped relationships
  • 16 key timeline events
  • 4 material contradictions between testimony and documents

How We Built It

Backend

  • FastAPI server orchestrating agents via the Agent Builder Converse API.
  • Agents deployed with specialized system instructions and curated toolsets.
  • Structured JSON parsing and typed WebSocket event emission:
    • relationship_found
    • timeline_event
    • contradiction_found
    • anomaly_detected
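As a rough illustration of what typed event emission could look like, the sketch below models three of the four event types as dataclasses and wraps them in a JSON envelope. The payload field names are the ones this writeup gives for relationships, contradictions, and anomalies. The envelope keys (`type`, `agent`, `data`) and the `to_message` helper are assumptions made for illustration, not the project's actual wire format. `timeline_event` is left out because its fields are not described here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RelationshipFound:
    source: str
    target: str
    link_type: str


@dataclass
class ContradictionFound:
    claim: str
    evidence_against: str
    deposition_ref: str


@dataclass
class AnomalyDetected:
    metric_value: float
    metric_unit: str


# Maps each payload class to the event name sent over the WebSocket.
EVENT_NAMES = {
    RelationshipFound: "relationship_found",
    ContradictionFound: "contradiction_found",
    AnomalyDetected: "anomaly_detected",
}


def to_message(agent: str, payload) -> str:
    """Serialize a payload into a JSON envelope (hypothetical wire format)."""
    return json.dumps({
        "type": EVENT_NAMES[type(payload)],  # KeyError for unknown payload types
        "agent": agent,
        "data": asdict(payload),
    })
```

Keeping the event name in one lookup table means the frontend can switch on `type` alone, and an unregistered payload class fails loudly on the server instead of reaching the dashboard.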

Agents & Tools

  • 7 agents
  • 30 tools total
    • 24 ES|QL tools for structured queries
    • 6 Index Search tools for semantic evidence discovery

Frontend

  • Next.js with real-time WebSocket streaming
  • Force-directed network graph visualization
  • Live-updating timeline, evidence board, and contradiction panels
  • Automatic final report generation with case strength scoring

Data Layer

Six Elasticsearch indices containing:

  • Emails
  • Documents
  • File access logs
  • Chat messages
  • Depositions
  • People directory

Synthetic but realistic dataset modeling a corporate IP theft scenario:

  • 750+ emails
  • 2000+ file access records
  • Depositions
  • Internal documents

Challenges We Ran Into

Multi-Agent Orchestration Is a State Management Nightmare

Coordinating 7 agents across 6 phases, with Phase 4 running two agents in parallel, meant solving real concurrency problems. Each agent maintains its own conversation state via the Converse API, emits different event types (relationship_found, timeline_event, contradiction_found, anomaly_detected), and writes findings back to Elasticsearch while streaming to the frontend via WebSocket. Three problems required careful architectural decisions: race conditions between parallel agents writing to the same findings index, conversation ID management across API calls, and a frontend state machine that correctly handles interleaved events from concurrent agents. Fire-and-forget was not an option, because later phases depend on earlier agents having indexed their findings.
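The phase ordering described above can be sketched with `asyncio`: phases run in sequence, agents within a phase run concurrently, and findings are merged only after the whole phase completes. This is a minimal sketch, not the project's orchestrator. `run_agent` is a hypothetical stand-in for a real agent conversation, and the agent identifiers and the `agent_done` event type are made up for the example.

```python
import asyncio

# Phase layout from the writeup; agents in the same inner list run concurrently.
PHASES = [
    ["document_classifier"],
    ["relationship_mapper"],
    ["timeline_builder"],
    ["pattern_detector", "sentiment_analyzer"],
    ["contradiction_hunter"],
    ["lead_investigator"],
]


async def run_agent(name: str, context: list, events: asyncio.Queue) -> str:
    """Hypothetical stand-in for one agent conversation."""
    await asyncio.sleep(0)  # placeholder for the real network call
    await events.put({"agent": name, "type": "agent_done"})
    return f"{name}: summary built from {len(context)} earlier findings"


async def run_pipeline():
    events: asyncio.Queue = asyncio.Queue()
    findings = []
    for agents in PHASES:
        # Every agent in a phase sees the same snapshot of earlier findings.
        snapshot = list(findings)
        results = await asyncio.gather(
            *(run_agent(name, snapshot, events) for name in agents)
        )
        # Findings are merged only after the whole phase finishes, so the
        # next phase never starts against a partially written set.
        findings.extend(results)
    # Drain the queue; a real server would forward these to WebSocket clients.
    drained = []
    while not events.empty():
        drained.append(events.get_nowait())
    return findings, drained


if __name__ == "__main__":
    all_findings, all_events = asyncio.run(run_pipeline())
    for line in all_findings:
        print(line)
```

Awaiting `asyncio.gather` at each phase boundary is what replaces fire-and-forget: the Contradiction Hunter cannot start until both Phase 4 agents have returned.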

Structured Output From Unstructured Reasoning

The fundamental tension: agents need freedom to reason deeply over evidence, but the dashboard needs typed, structured data. Rigid output schemas would have crippled the agents' investigative reasoning. Our solution was a dual-output protocol: agents produce rich analytical narratives and then emit a structured JSON block at the end. Making this reliable across 7 agents with different output schemas (relationships need source/target/link_type, contradictions need claim/evidence_against/deposition_ref, anomalies need metric_value/metric_unit) required iterating on prompt engineering, building a multi-strategy JSON extraction pipeline, and implementing graceful fallbacks that still populate the dashboard when an agent deviates from the format.
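One way a multi-strategy extraction step might work is sketched below. The three strategies (fenced block, bare JSON object in the prose, then a fallback record) are an assumption about what "multi-strategy" could mean, not the project's actual pipeline. The `parse_failed` and `summary` keys are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence, built here to avoid nesting fences
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_structured(text: str) -> dict:
    """Pull a JSON object out of an agent's narrative response."""
    # Strategy 1: fenced blocks, last one first, since the JSON comes at the end.
    for block in reversed(FENCE_RE.findall(text)):
        try:
            value = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value

    # Strategy 2: bare JSON objects embedded in the prose. Scan left to right
    # and keep the last top-level object that decodes cleanly.
    decoder = json.JSONDecoder()
    pos, found = 0, None
    while (start := text.find("{", pos)) != -1:
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1
            continue
        found = value
        pos = end  # skip past the parsed object so nested braces are not rescanned
    if found is not None:
        return found

    # Strategy 3: graceful fallback so the dashboard still gets something.
    return {"parse_failed": True, "summary": text.strip()[:200]}
```

Scanning left to right in strategy 2 matters: starting from the last `{` would return an inner nested object instead of the full payload.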

Entity Resolution Across Independent Agents

Each agent independently discovers and references people — the Relationship Mapper finds "Marcus Chen" in email headers, the Pattern Detector sees "marcus-chen" in file access logs, and the Contradiction Hunter reads "Mr. Chen" in depositions. Without a shared entity resolution layer, the network graph fragments into disconnected nodes for the same person. We built a normalization pipeline that handles slug conversion, partial name matching, and last-name fallback to maintain a consistent entity graph across all agents — critical because the entire value proposition of multi-agent investigation is connecting findings across agents.
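The three normalization steps named above (slug conversion, partial name matching, last-name fallback) can be illustrated with a small resolver. This is a sketch under assumptions: it presumes a known list of canonical names is available up front, for example from the people directory index. The `EntityResolver` class, the honorific list, and the second person in the usage example are all hypothetical.

```python
import re
from typing import Optional

HONORIFICS = {"mr", "mrs", "ms", "dr"}


def slugify(name: str) -> str:
    """Lowercase, drop honorifics, join remaining tokens with hyphens (ASCII letters only)."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return "-".join(t for t in tokens if t not in HONORIFICS)


class EntityResolver:
    def __init__(self, known_people):
        # canonical slug -> display name
        self.canonical = {slugify(p): p for p in known_people}

    def resolve(self, mention: str) -> Optional[str]:
        slug = slugify(mention)
        if not slug:
            return None
        # 1. Exact slug match: "Marcus Chen" and "marcus-chen" collapse here.
        if slug in self.canonical:
            return self.canonical[slug]
        # 2. Partial match: every token of the mention appears in exactly one name.
        tokens = set(slug.split("-"))
        hits = [n for s, n in self.canonical.items() if tokens <= set(s.split("-"))]
        if len(hits) == 1:
            return hits[0]
        # 3. Last-name fallback, accepted only when it is unambiguous.
        last = slug.split("-")[-1]
        hits = [n for s, n in self.canonical.items() if s.split("-")[-1] == last]
        return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    resolver = EntityResolver(["Marcus Chen", "Sarah Lin"])
    for mention in ["Marcus Chen", "marcus-chen", "Mr. Chen"]:
        print(mention, "->", resolver.resolve(mention))
```

Returning `None` on ambiguous matches is a deliberate choice in this sketch: merging two different people into one graph node is a worse error than leaving a mention unresolved.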

Designing the Investigation Protocol Itself

The hardest challenge wasn't code — it was designing which agents exist, what tools each one gets, what order they run in, and what runs in parallel. Give an agent too many tools and it wastes turns exploring irrelevant data. Give it too few and it can't cross-reference evidence. We went through multiple iterations of the phase structure before landing on the current 6-phase pipeline, where each phase's output becomes the next phase's context. The decision to run Pattern Detection and Sentiment Analysis in parallel (Phase 4) while keeping Contradiction Hunting sequential (Phase 5, after all evidence is indexed) was a deliberate architectural choice — contradictions require all prior findings to be queryable.


Accomplishments We’re Proud Of

  • Zero simulation, zero hardcoded data — All findings come from real agents querying real indices.
  • 100% JSON parse rate — Every agent response produces structured dashboard data.
  • Parallel agent execution — True concurrency in Phase 4.
  • Genuinely useful investigations — Real coordinated patterns and contradictions discovered.

What We Learned

Agent Builder excels at multi-agent architectures. The combination of:

  • ES|QL tools → precision
  • Index Search tools → exploration

…gives agents both surgical accuracy and discovery capability.

Key Insight: Don’t fight the LLM’s natural format. Let agents produce rich markdown, then extract structured JSON blocks. You get human-readable reports and machine-parseable data.


What’s Next for ARGUS

General-Purpose Case Support

Dynamic prompt generation from case metadata to support any investigation type.

Real Document Ingestion

Upload PDFs, CSVs, PSTs, XLSX files — automatically parsed and indexed.

Agent Memory Across Phases

Later agents receive earlier summaries, building cumulative investigative context.

Confidence Calibration

Track corroborated findings across agents to build evidence chain scoring.
