Devil's Advocate and Jury

Inspiration

Work in high-stakes domains (research, law, journalism, policy) breaks down when models “sound right” but fabricate citations or overstate what their sources actually say.
Traditional RAG improves retrieval, but it doesn’t verify that the model’s claims accurately reflect the retrieved text.
We wanted a single pipeline that debates a claim from both sides, forces every argument to be sourced, and then audits those sources automatically.

What it does

The Devil’s Advocate & The Jury is an evidence-validation debate system:

Input: a topic, hypothesis, or policy statement + selected knowledge base (medical, legal, academic, news, etc.)
Debate: two agents argue opposing sides using only documents retrieved from Elasticsearch:
- A1 (Proponent): builds the strongest evidence-backed case for the claim
- A2 (Devil’s Advocate): rebuts, probes for gaps, surfaces counter-evidence and bias
Verification: a Jury agent re-checks every citation by fetching the cited documents and validating whether the quoted/claimed support is real.
Output: a transparent verdict report:
- winner (or synthesis if both converge)
- score breakdown (logic, evidence quality, rebuttal strength, citation accuracy)
- citation audit trail (supported / weakly supported / unsupported / missing)
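
As a rough illustration of the report described above, the sketch below models the score breakdown and audit trail as plain data classes. The field names, the 0 to 1 score scale, the unweighted mean, and the `synthesis_margin` threshold are all assumptions made for this example; the write-up does not specify how scores are combined or how convergence is decided.

```python
from dataclasses import dataclass, field
from enum import Enum


class AuditStatus(Enum):
    SUPPORTED = "supported"
    WEAKLY_SUPPORTED = "weakly supported"
    UNSUPPORTED = "unsupported"
    MISSING = "missing"


@dataclass
class CitationAudit:
    doc_id: str
    chunk_id: str
    status: AuditStatus


@dataclass
class AgentScore:
    # Each dimension is assumed to lie on a 0..1 scale.
    logic: float
    evidence_quality: float
    rebuttal_strength: float
    citation_accuracy: float

    def total(self) -> float:
        # Unweighted mean; the real weighting is not given in the write-up.
        return (self.logic + self.evidence_quality
                + self.rebuttal_strength + self.citation_accuracy) / 4


@dataclass
class VerdictReport:
    scores: dict[str, AgentScore]
    audit_trail: list[CitationAudit] = field(default_factory=list)
    synthesis_margin: float = 0.05  # assumed stand-in for "both converge"

    def winner(self) -> str:
        # Rank agents by total score, highest first.
        ranked = sorted(self.scores, key=lambda a: self.scores[a].total(), reverse=True)
        gap = self.scores[ranked[0]].total() - self.scores[ranked[1]].total()
        return "synthesis" if gap < self.synthesis_margin else ranked[0]
```

With A1 scored at (0.8, 0.9, 0.6, 1.0) and A2 at (0.7, 0.5, 0.8, 0.4), `winner()` returns `"A1"`; two near-identical totals fall inside the margin and return `"synthesis"`.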

Core guarantee: a no-hallucination contract under which misquoted or invented citations are detected and penalized.

How we built it

Designed a knowledge-base index optimized for retrieval + audit:
- Hybrid search: BM25 + semantic retrieval
- Chunked documents: each source split into passage chunks for precise citation
- Stored metadata for provenance (title, author, publication date, source URL, domain, license, etc.)
Added citation-friendly fields so agents can reference exact material:
- text, title, authors, year, source, url
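
One possible shape for such an index is sketched below as a Python dictionary in the form Elasticsearch expects for a mapping. The six citation fields come from the list above; `doc_id`, `chunk_id`, `domain`, `license`, and the `text_tokens` field for ELSER output are hypothetical names, and the right field type for ELSER tokens depends on the Elasticsearch version in use.

```python
# Hypothetical mapping; only the six citation fields are named in the write-up.
CITATION_FIELDS = {"text", "title", "authors", "year", "source", "url"}

KB_MAPPING = {
    "mappings": {
        "properties": {
            # Identifiers used in citations and by the Jury's re-fetch step.
            "doc_id": {"type": "keyword"},
            "chunk_id": {"type": "keyword"},
            # Passage text, analyzed for BM25 keyword search.
            "text": {"type": "text"},
            # Assumed field holding ELSER token weights for semantic search.
            "text_tokens": {"type": "sparse_vector"},
            # Provenance metadata.
            "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "authors": {"type": "keyword"},
            "year": {"type": "integer"},
            "source": {"type": "keyword"},
            "url": {"type": "keyword"},
            "domain": {"type": "keyword"},
            "license": {"type": "keyword"},
        }
    }
}


def missing_citation_fields(mapping: dict) -> set:
    """Return citation fields that the mapping does not define."""
    return CITATION_FIELDS - set(mapping["mappings"]["properties"])
```

Keeping the identifiers as `keyword` fields makes exact lookups by `doc_id` and `chunk_id` cheap, which matters because the audit step fetches by ID rather than by relevance.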

Enabled ELSER-powered semantic search so agents can retrieve relevant evidence even when user phrasing doesn’t match keywords.
Used hybrid ranking to keep “exact legal/statutory wording” strong while still catching semantic paraphrases.
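
The write-up does not say how the keyword and semantic result lists were combined. Reciprocal rank fusion (RRF) is one common way to do it, and the sketch below shows the idea in plain Python; the function name and the constant `k=60` are choices made for this example, not details from the project.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) per item, so a chunk that ranks
    well in either the BM25 list or the semantic list rises to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


bm25_hits = ["c1", "c2", "c3"]        # exact wording matches
semantic_hits = ["c3", "c1", "c4"]    # paraphrase matches
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

Here `fused` is `["c1", "c3", "c2", "c4"]`: chunks found by both retrievers outrank chunks found by only one, while exact-match hits such as `c2` are still retained.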

Implemented three personas as distinct agents:
- A1: evidence-first constructive case builder
- A2: adversarial rebuttal + bias/assumption detector
- Jury: citation auditor + multi-lens scorer (Empiricist / Philosopher / Economist / Historian / Humanist / Religious Scholar)
Exposed a single constrained tool to A1/A2:
- search_knowledge_base(query, top_k, filters) returning chunks + IDs
Enforced a structured citation format that embeds IDs:
- [Source: title | doc_id=... | chunk_id=... | year=...]
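
A structured format like this is useful mainly because it can be parsed mechanically before the audit. The following is a minimal sketch of such a parser, assuming IDs contain no spaces or pipe characters and that the year is four digits; the `Citation` class and `extract_citations` function are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches: [Source: title | doc_id=... | chunk_id=... | year=...]
CITATION_RE = re.compile(
    r"\[Source:\s*(?P<title>[^|\]]+?)\s*"
    r"\|\s*doc_id=(?P<doc_id>[^|\]\s]+)\s*"
    r"\|\s*chunk_id=(?P<chunk_id>[^|\]\s]+)\s*"
    r"\|\s*year=(?P<year>\d{4})\s*\]"
)


@dataclass(frozen=True)
class Citation:
    title: str
    doc_id: str
    chunk_id: str
    year: int


def extract_citations(text: str) -> list:
    """Return every well-formed citation found in an agent's turn."""
    return [
        Citation(m["title"], m["doc_id"], m["chunk_id"], int(m["year"]))
        for m in CITATION_RE.finditer(text)
    ]


turn = (
    "Wait times fell after the pilot "
    "[Source: Pilot Evaluation | doc_id=d42 | chunk_id=d42-3 | year=2021]."
)
citations = extract_citations(turn)
```

Citations that omit a required field do not match at all, so a malformed reference can be treated the same way as a missing one.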

Built a turn-based workflow that chains calls without a custom application server.

For each cited doc_id/chunk_id, the Jury re-fetches the authoritative text using ES|QL (or an equivalent retrieval step).
Validates support using a layered approach:
- Existence check: cited IDs must exist
- Quote/claim alignment: compare claimed support to retrieved chunk text
- Semantic consistency: similarity check to detect paraphrase misrepresentation
Applies penalties:
- fabricated/missing source → heavy citation accuracy penalty
- weak or non-supporting quote → partial penalty + flagged in audit
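
The layered check and penalties above could look roughly like the sketch below. It runs against an in-memory dictionary rather than Elasticsearch, uses token overlap as a crude lexical stand-in for the semantic similarity check, and the penalty sizes and `weak_threshold` are illustrative assumptions rather than the project's real values.

```python
import re

# Assumed penalty sizes: heavy for missing sources, partial otherwise.
PENALTIES = {
    "supported": 0.0,
    "weakly supported": 0.25,
    "unsupported": 0.5,
    "missing": 1.0,
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def token_overlap(claim: str, chunk: str) -> float:
    """Fraction of the claim's word tokens that also occur in the chunk."""
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    chunk_tokens = set(re.findall(r"\w+", chunk.lower()))
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & chunk_tokens) / len(claim_tokens)


def audit_citation(chunk_id, claimed_text, chunks, weak_threshold=0.6):
    """Return (status, penalty) for one citation."""
    chunk = chunks.get(chunk_id)
    quote = normalize(claimed_text)
    if chunk is None:
        status = "missing"                 # existence check failed
    elif quote and quote in normalize(chunk):
        status = "supported"               # quote appears verbatim
    elif token_overlap(claimed_text, chunk) >= weak_threshold:
        status = "weakly supported"        # paraphrase with partial overlap
    else:
        status = "unsupported"
    return status, PENALTIES[status]


chunks = {"c1": "The pilot program reduced wait times by 12 percent in 2021."}
```

For the toy chunk above, quoting "reduced wait times by 12 percent" is supported, "The program cut wait times by 12 percent" is weakly supported, "Costs rose sharply for patients" is unsupported, and any citation of an unknown ID such as `c9` is missing. Lexical overlap cannot detect a paraphrase that reuses the source's words while reversing its meaning, which is why a real semantic check is needed at that layer.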

Challenges we ran into

Citation precision: getting agents to cite the exact chunk that supports the exact claim, not just “something nearby.”
Chunking tradeoffs: chunks that are too small lose context, while chunks that are too large reduce retrieval accuracy and make audits ambiguous.
Hallucination-resistant prompting: ensuring agents never “fill gaps” when retrieval is thin—especially in adversarial debate.
Bias and source quality: “more citations” isn’t better if they’re low-quality or ideologically skewed.
Scoring fairness: balancing evidence volume vs. evidence strength, and penalizing unsupported confidence.
Latency/throughput: multi-turn debate + per-citation verification can be expensive without batching and caching.
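
The chunking tradeoff mentioned above is easiest to explore with a chunker whose size and overlap are explicit parameters. The sketch below splits on whitespace and assigns chunk IDs of the form `doc_id-n`; the function name, the ID scheme, and the default sizes are assumptions for illustration, not the project's settings.

```python
def chunk_document(doc_id, text, size=200, overlap=40):
    """Split text into overlapping word windows with stable chunk IDs.

    Overlap repeats the tail of one chunk at the head of the next, so a
    sentence that straddles a boundary still appears whole in one chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append({
            "doc_id": doc_id,
            "chunk_id": f"{doc_id}-{len(chunks)}",
            "text": " ".join(words[start:start + size]),
        })
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_document("d1", sample, size=4, overlap=1)
```

With ten words, `size=4`, and `overlap=1`, this yields three chunks, and the last word of each chunk is repeated as the first word of the next. Smaller `size` values give tighter citations at the cost of context; larger ones do the reverse.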

Accomplishments that we're proud of

End-to-end pipeline: debate → verification → scored verdict in a single orchestrated workflow.
Hard anti-hallucination mechanism: citations are not just included—they’re audited against the original indexed text.
Configurable Jury lenses: the same debate can be evaluated through empirical, logical, economic, historical, and ethical frames.
Transparent outputs: users can see exactly which claims were supported, weakly supported, or rejected.
Hybrid retrieval with ELSER: improved recall for semantically related evidence while preserving keyword fidelity for technical domains.

What we learned

Verification is not a UX feature; it is the product. Without an audit layer, RAG still allows confident misrepresentation.
Good retrieval depends as much on index design and chunking as it does on embeddings.
Multi-agent debate increases coverage, but without guardrails it can amplify confident errors—so post-hoc verification must be first-class.
Scoring systems need to reward “correct uncertainty” and penalize “unsupported certainty,” not just rhetorical strength.

What's next for Devil's Advocate and Jury

Stronger citation alignment: highlight exact supporting spans (offset-level), not just chunk-level references.
Source quality weighting: automatic signals for peer-review status, jurisdictional authority, publisher reputation, recency, and conflicts of interest.
Bias reporting module: structured disclosure (who funded, editorial stance, selection effects) rather than a vague “bias flag.”
Active ingestion: on-demand crawling/upload + immediate indexing when debate needs missing coverage.
User-facing UI in Kibana: interactive verdict report with clickable citations, side-by-side quote comparison, and per-claim audit status.
Zotero/Mendeley integration: one-click knowledge base population and citation export.
Multilingual support: multilingual embeddings + cross-language retrieval for global policy and legal research.
Voice briefing mode: debate and verdict delivered as an audio summary with distinct agent voices.
