Inspiration
Defense contracts are among the most complex legal documents in existence — hundreds of clauses, buried liabilities, ambiguous language, and consequences that can cost millions or compromise national security. Yet most procurement officers and legal teams still rely on manual review, which is slow, inconsistent, and dangerously prone to missing hidden risks.
We were inspired by one question: what if an AI system could read a defense contract the way a team of 8 specialized experts would — simultaneously, exhaustively, and without ever hallucinating a finding?
That became Defence Contract Risk Score.
What it does
Defence Contract Risk Score is an autonomous multi-agent AI system that analyzes defense contracts end-to-end and surfaces every risk — including hidden ones buried in legal language.
Upload any defense contract PDF and the system:
- Parses every clause automatically
- Routes each clause to 7 specialized AI agents (Compliance, Security, IP, Financial, Interaction, Missing-Clause, and Verifier)
- Detects surface-level AND hidden risks in each clause
- Generates an immediate mitigation action for every risk found
- Flags missing clauses that should exist but don't (a risk in itself)
- Produces a 10-dimension danger score per clause
- Delivers a final traffic-light verdict: GREEN (Allow), YELLOW (Escalate), or RED (Block)
Crucially, no risk is reported unless it has been verified against the actual contract text. This built-in anti-hallucination layer keeps ungrounded findings out of the final report.
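The scoring and verdict steps listed above can be sketched roughly as follows. This is a minimal illustration, not our production logic: the 0 to 10 scale, the two thresholds, and the choice to take the worst dimension rather than an average are all assumptions made for the example, and the names `clause_danger` and `contract_verdict` are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    GREEN = "Allow"
    YELLOW = "Escalate"
    RED = "Block"


# Hypothetical cut-offs on an assumed 0-10 scale.
ESCALATE_AT = 4.0
BLOCK_AT = 7.0


def clause_danger(scores: dict[str, float]) -> float:
    """Collapse one clause's per-dimension scores into a single number.

    Takes the maximum so one severe dimension is not averaged away.
    """
    if not scores:
        raise ValueError("a clause needs at least one dimension score")
    return max(scores.values())


def contract_verdict(clauses: list[dict[str, float]]) -> Verdict:
    """Map the most dangerous clause in the contract to a traffic light."""
    worst = max((clause_danger(c) for c in clauses), default=0.0)
    if worst >= BLOCK_AT:
        return Verdict.RED
    if worst >= ESCALATE_AT:
        return Verdict.YELLOW
    return Verdict.GREEN
```

Using the maximum instead of a mean reflects the idea that a single dangerous clause should be enough to escalate or block a contract, however benign the rest of it looks.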
How we built it
We designed an 8-agent pipeline orchestrated through LangGraph:
Router Agent — receives the contract PDF, extracts text via pdfplumber, segments it into clause types, and dispatches each to the right agent.
Compliance Agent — checks clauses against defense regulations and procurement standards.
Security Agent — identifies data handling, operational security, and information disclosure risks.
IP Agent — detects intellectual property traps, ownership ambiguities, and licensing landmines.
Financial Agent — uncovers hidden penalties, payment triggers, indemnification traps, and liability caps.
Interaction Agent — reviews obligations between parties, conflict-of-interest clauses, and termination conditions.
Missing-Clause Agent — identifies what is absent from the contract that should legally or strategically be present.
Verifier Agent — the anti-hallucination layer. Every finding from every agent is cross-checked against the source text before it reaches the user. If a risk cannot be grounded in the actual contract, it is dropped.
The frontend dashboard was built with Streamlit and deployed on Hugging Face Spaces, powered by Claude (claude-sonnet-4-20250514) via the Anthropic API.
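To make the Router's segment-and-dispatch step concrete, here is a simplified sketch in plain Python. It deliberately leaves out pdfplumber, LangGraph, and the model call: clause splitting is assumed to work on numbered headings, and routing uses a hypothetical keyword table instead of an LLM classifier. Letting a clause reach more than one agent is also an assumption of this sketch. The Missing-Clause and Verifier agents are omitted because they do not consume individual routed clauses in the same way.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical keyword hints per agent, for illustration only.
AGENT_KEYWORDS = {
    "compliance": ("regulation", "export", "comply"),
    "security": ("classified", "confidential", "disclosure"),
    "ip": ("intellectual property", "license", "patent"),
    "financial": ("payment", "penalty", "indemnif", "liability"),
    "interaction": ("terminat", "conflict of interest", "obligation"),
}


def split_clauses(text: str) -> list[str]:
    """Split text at lines that start with a clause number such as '1.' or '2.3'."""
    parts = re.split(r"(?m)^(?=\d+(?:\.\d+)*\.?\s)", text)
    return [p.strip() for p in parts if p.strip()]


def route(clauses: list[str]) -> dict[str, list[str]]:
    """Return a mapping of agent name to the clauses that agent should review."""
    routed = defaultdict(list)
    for clause in clauses:
        lowered = clause.lower()
        for agent, keywords in AGENT_KEYWORDS.items():
            if any(k in lowered for k in keywords):
                routed[agent].append(clause)
    return dict(routed)
```

For example, `route(split_clauses(contract_text))` yields one clause list per agent, which is the shape each specialist stage would receive as input.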
Challenges we ran into
Hallucination in high-stakes contexts — LLMs can confidently report risks that do not exist in the text. In defense contexts, a false positive can block a valid contract; a false negative can pass a dangerous one. Building the Verifier agent to reliably ground every finding in source text was the hardest engineering problem we solved.
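The grounding idea can be illustrated with a deliberately strict check: a finding survives only if the text it claims to quote actually appears in the clause it cites. This is a sketch under assumptions, not the Verifier agent itself. The `Finding` fields and the verbatim-match rule are hypothetical, and a substring match only shows that the quote exists, not that the stated risk follows from it.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    clause_id: str
    risk: str
    evidence: str  # text the agent claims to be quoting from the clause


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify(findings: list[Finding], clauses: dict[str, str]) -> list[Finding]:
    """Keep only findings whose evidence appears verbatim in the cited clause."""
    grounded = []
    for finding in findings:
        source = _normalize(clauses.get(finding.clause_id, ""))
        quote = _normalize(finding.evidence)
        # Drop findings with empty evidence, unknown clauses, or unmatched quotes.
        if quote and quote in source:
            grounded.append(finding)
    return grounded
```

Treating every upstream finding as untrusted until this check passes means a fabricated quote or a reference to a clause that does not exist is dropped rather than shown to the user.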
Hidden risk detection — Surface risks are easy. The real challenge was teaching agents to reason about what a clause implies beyond what it literally says — for example, a payment clause that appears fair but contains a force majeure carve-out that effectively eliminates liability for the counterparty.
Missing-clause logic — Detecting what is not in a document requires the system to reason from a known baseline of what should be present. Building that baseline for defense contracts — where requirements vary by jurisdiction, contract type, and classification level — required significant prompt engineering.
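At its core, reasoning from a baseline is a set difference between what should be present and what was found. The sketch below assumes clauses have already been labeled with a type; the contract types and clause names in `BASELINE` are invented for illustration and are not a real regulatory checklist.

```python
from __future__ import annotations

# Hypothetical baselines keyed by contract type. Real baselines would also
# vary by jurisdiction and classification level.
BASELINE = {
    "supply": {"termination", "liability_cap", "export_control", "confidentiality"},
    "services": {"termination", "liability_cap", "confidentiality", "ip_ownership"},
}


def missing_clauses(contract_type: str, present: set[str]) -> list[str]:
    """Return the baseline clause types that the contract does not contain."""
    try:
        required = BASELINE[contract_type]
    except KeyError:
        raise ValueError(f"no baseline for contract type {contract_type!r}") from None
    return sorted(required - present)
```

Clause types found in the contract but absent from the baseline are ignored here, since the goal is only to report what is missing.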
Agent coordination without cascading errors — One agent's wrong output feeding into the next agent multiplies errors. We implemented strict output schemas per agent so each stage validates its input before processing.
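A strict per-agent schema can be sketched with a validating dataclass: anything that does not parse into the expected shape is rejected before the next stage sees it. The field names and the three-level severity scale below are assumptions chosen for the example, not the schemas we shipped.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

SEVERITIES = {"low", "medium", "high"}  # hypothetical scale


@dataclass(frozen=True)
class AgentFinding:
    agent: str
    clause_id: str
    risk: str
    severity: str
    mitigation: str

    def __post_init__(self) -> None:
        for name in ("agent", "clause_id", "risk", "mitigation"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")


def parse_findings(raw: str) -> list[AgentFinding]:
    """Parse an agent's JSON output, raising ValueError on anything off-schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of findings")
    findings = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each finding must be a JSON object")
        try:
            findings.append(AgentFinding(**item))
        except TypeError as exc:  # missing or unexpected keys
            raise ValueError(f"finding does not match schema: {exc}") from exc
    return findings
```

Failing loudly at the boundary keeps a malformed output from one agent from being silently reinterpreted by the next.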
Accomplishments that we're proud of
Zero hallucination policy — Our Verifier agent successfully filters ungrounded findings before they reach the user. Every risk shown in the dashboard links back to the exact clause that triggered it.
Hidden risk detection — The system caught financial and IP risks in test contracts that manual reviewers had previously missed, including a buried indemnification clause that shifted full liability to the buyer.
Missing-clause detection — The Missing-Clause agent starts from what defense contracts of this type should contain, rather than from the clauses the document happens to include, and then checks the contract against that baseline. This is a capability most contract AI tools don't have.
End-to-end in under 60 seconds — A contract that would take a legal team hours to review is fully analyzed, risk-scored across 10 dimensions, and presented with mitigation actions in under a minute.
Actionable output, not just alerts — Every detected risk comes paired with a specific recommended mitigation action, making the output usable immediately without requiring legal expertise to interpret.
What we learned
Anti-hallucination is an architecture decision, not a prompt trick — You cannot instruct an LLM to "not hallucinate." You have to build a separate verification step that treats every prior output as untrusted until grounded.
Specialization beats generalization for complex documents — A single general-purpose "contract reviewer" prompt performs far worse than 7 agents, each focused on one domain. Narrow focus produces sharper findings.
Absence is as dangerous as presence — The most valuable insight from building the Missing-Clause agent: what a contract deliberately omits is often more telling than what it includes.
Multi-agent orchestration requires rigid interfaces — Agents that pass freeform text to each other cascade errors. Structured output schemas between every agent stage were essential for reliability.
What's next for DEFENCE CONTRACT RISK SCORE
Jurisdiction-aware compliance — Expanding the Compliance Agent to understand DFARS, ITAR, NATO STANAG, and country-specific defense procurement regulations, not just general contract law.
Clause-level negotiation suggestions — Beyond flagging risks, the system will draft alternative clause language that a procurement officer can propose directly to the counterparty.
Multi-contract comparison — Analyze a new contract against a library of previously reviewed contracts to detect unusual deviations that may indicate intentional manipulation.
Classification-aware processing — Handle contracts with varying classification levels with appropriate data handling and on-premise deployment options for classified environments.
Human-in-the-loop escalation workflow — Integrate with enterprise approval systems so that RED-flagged contracts automatically trigger a legal review workflow with the exact risk summary pre-populated.
Built With
- claude
- code