AI Judge — Project Story

What Inspired Us

Conflict is universal. But access to neutral, informed resolution is not.

Most people have been in a situation where they needed someone to just tell them: who is right here, and what does the law actually say? A landlord keeping a security deposit. A neighbor damaging your property. A freelancer not getting paid. These disputes are small enough that hiring a lawyer makes no economic sense, but large enough to destroy relationships and cause real financial harm.

The spark came from one observation: small claims judges routinely tell both parties, before the hearing even begins, to go outside and try to settle. That moment, with two frustrated people in a hallway with no legal knowledge and no neutral voice, is where many conflicts either get resolved or permanently break down.

We wanted to build that neutral voice. Not just for small claims court, but for any conflict between two parties anywhere — neighbors, tenants, co-workers, community groups, families. The courthouse is the last resort. We're building the step before it.

What We Built

AI Judge is a two-party civil mediation platform powered by Claude.

Both parties submit their account of a dispute privately — their story, evidence, and what resolution they're seeking. The AI Judge then:

Analyzes both sides simultaneously
Identifies what each party is legally right and wrong about, with a correctness percentage (merit score) for each side
Cites real applicable statutes for the jurisdiction (e.g. California Civil Code §1950.5 for security deposits)
Delivers a structured ruling with a specific, actionable resolution
Allows either party to submit a rebuttal with new arguments or evidence
Re-evaluates the ruling after each rebuttal, weighing the new arguments against the opposing party's claims
Provides a direct communication channel between parties, with the Judge intervening if the conversation turns hostile

The key design principle: this is not a chatbot giving advice to one person. It is an adversarial, structured analysis of two competing accounts — where each party sees exactly what they got wrong, backed by law.

How We Built It

The entire application is a single HTML file — no backend, no database, deployable instantly on GitHub Pages or any static host.

Stack:

Vanilla HTML, CSS, JavaScript (zero frameworks)
Anthropic Claude API (claude-sonnet-4-20250514) called directly from the client
Custom system prompt engineering for anti-hallucination legal reasoning
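
A minimal sketch of what a direct client-side call could look like, assuming the public Anthropic Messages endpoint and its browser opt-in header. The helper names (buildJudgeRequest, askJudge), the max_tokens value, and the way the API key reaches the page are illustrative assumptions, not the project's actual code.

```javascript
// Assumed constants: the endpoint is the public Messages API; the model name is the one listed above.
const ANTHROPIC_URL = "https://api.anthropic.com/v1/messages";
const MODEL = "claude-sonnet-4-20250514";

// Hypothetical helper: builds the fetch() arguments for one Judge call.
// Kept separate from the network call so it can be inspected without an API key.
function buildJudgeRequest(apiKey, systemPrompt, userMessage) {
  return {
    url: ANTHROPIC_URL,
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // Opt-in header the API expects for requests made directly from a browser.
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model: MODEL,
        max_tokens: 2048, // assumed budget for one ruling
        system: systemPrompt,
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}

// Hypothetical helper: sends the request and returns the text of the first content block.
async function askJudge(apiKey, systemPrompt, userMessage) {
  const { url, options } = buildJudgeRequest(apiKey, systemPrompt, userMessage);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Claude API request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.content[0].text;
}
```

Splitting request construction from the network call keeps the prompt logic easy to check in isolation, which matters when the whole app lives in one file.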

The core of the system is the Judge's system prompt. The critical constraint we enforced:

"Never hallucinate laws. Only cite statutes you are certain exist. If unsure of the exact statute number, cite the general legal principle and explicitly flag 'verify exact statute.' Be specific with amounts and timelines when the law specifies them. When one party is wrong, say so directly. Do not soften clear legal violations."

The JSON response schema enforces structure — correctness percentages, point-by-point right/wrong lists, law citations with relevance explanations, and a concrete resolution with dollar amounts where applicable.
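
To illustrate the kind of structure described above, here is a sketch of a client-side check on the model's reply. The field names (partyA, partyB, correctnessPercent, right, wrong, laws, citation, relevance, resolution, summary) are hypothetical stand-ins; the project's real schema may differ.

```javascript
// Hypothetical ruling shape; field names are illustrative, not the project's actual schema.
// Returns { ok, errors, ruling } so the caller can retry or show a fallback on failure.
function validateRuling(rawText) {
  let ruling;
  try {
    ruling = JSON.parse(rawText);
  } catch (err) {
    return { ok: false, errors: ["response is not valid JSON"], ruling: null };
  }
  if (ruling === null || typeof ruling !== "object") {
    return { ok: false, errors: ["response is not a JSON object"], ruling: null };
  }

  const errors = [];

  // Each party needs a 0-100 percentage and point-by-point right/wrong lists.
  for (const key of ["partyA", "partyB"]) {
    const party = ruling[key];
    if (!party || typeof party !== "object") {
      errors.push(`${key} is missing`);
      continue;
    }
    const pct = party.correctnessPercent;
    if (typeof pct !== "number" || pct < 0 || pct > 100) {
      errors.push(`${key}.correctnessPercent must be a number from 0 to 100`);
    }
    for (const list of ["right", "wrong"]) {
      if (!Array.isArray(party[list])) {
        errors.push(`${key}.${list} must be an array`);
      }
    }
  }

  // Every law citation must come with an explanation of its relevance.
  if (!Array.isArray(ruling.laws)) {
    errors.push("laws must be an array");
  } else {
    ruling.laws.forEach((law, i) => {
      if (!law || typeof law.citation !== "string" || typeof law.relevance !== "string") {
        errors.push(`laws[${i}] needs citation and relevance strings`);
      }
    });
  }

  if (!ruling.resolution || typeof ruling.resolution.summary !== "string") {
    errors.push("resolution.summary must be a string");
  }

  return { ok: errors.length === 0, errors, ruling };
}
```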

On rebuttal rounds, the prompt explicitly instructs the model to reference the opposing party's original claims when evaluating the rebuttal — so it's not just responding to one side, it's actively weighing both positions against each other in real time.
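
One way the rebuttal message could be assembled so the opposing party's original claims are always in front of the model. This is a sketch under assumptions: buildRebuttalMessage, its argument names, and the section labels are hypothetical, and the instruction wording is not the project's actual prompt.

```javascript
// Hypothetical helper: builds the user message for a rebuttal round.
// originalAccounts is assumed to be an object like { A: "...", B: "..." }.
function buildRebuttalMessage({ rebuttingParty, rebuttalText, originalAccounts, previousRuling }) {
  if (rebuttingParty !== "A" && rebuttingParty !== "B") {
    throw new Error('rebuttingParty must be "A" or "B"');
  }
  const opposingParty = rebuttingParty === "A" ? "B" : "A";

  return [
    "PREVIOUS RULING (JSON):",
    JSON.stringify(previousRuling),
    "",
    // The opposing account is restated so the rebuttal is never judged in isolation.
    `PARTY ${opposingParty} ORIGINAL ACCOUNT:`,
    originalAccounts[opposingParty],
    "",
    `PARTY ${rebuttingParty} REBUTTAL:`,
    rebuttalText,
    "",
    "INSTRUCTIONS:",
    `Weigh the rebuttal against Party ${opposingParty}'s original claims.`,
    "State what changed in the ruling and why, or explain why nothing changed.",
  ].join("\n");
}
```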

Challenges We Faced

1. Preventing hallucinated laws

This was the hardest problem. A confidently cited fake statute is worse than no citation at all, because it gives false confidence. Our solution was layered: an explicit prohibition in the system prompt, an instruction to flag uncertainty, and a requirement that the model explain how each law applies to the specific facts rather than just cite it. This pushes the model toward grounded reasoning rather than pattern-matched, legal-sounding output.
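
The prompt-level safeguards can be paired with a simple client-side screen. The sketch below sorts returned citations into three buckets based on the "verify exact statute" flag quoted in the system prompt and on whether an application explanation is present. The function name, bucket names, and field names (citation, relevance) are assumptions for illustration. It cannot confirm that a statute actually exists; it only surfaces what the model itself flagged or left unexplained.

```javascript
// Flag text taken from the system prompt quoted earlier.
const VERIFY_FLAG = "verify exact statute";

// Hypothetical helper: sorts citations by how much scrutiny they need before display.
// "unflagged" means the model expressed no doubt, not that the statute was verified.
function screenCitations(laws) {
  const result = { unflagged: [], needsVerification: [], rejected: [] };

  for (const law of laws) {
    const citation = String(law.citation ?? "").trim();
    const relevance = String(law.relevance ?? "").trim();

    // A citation with no explanation of how it applies is dropped outright.
    if (!citation || !relevance) {
      result.rejected.push(law);
      continue;
    }

    const flagged = `${citation} ${relevance}`.toLowerCase().includes(VERIFY_FLAG);
    if (flagged) {
      result.needsVerification.push(law);
    } else {
      result.unflagged.push(law);
    }
  }
  return result;
}
```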

2. True adversarial balance

Most LLM applications serve one user. AI Judge serves two opposing users at once. The prompt had to be designed so that neither party's framing dominated the analysis: the Judge had to be genuinely neutral while still being willing to say "you are legally wrong about this." Getting that balance, firm but fair, took significant iteration.
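
One possible way to keep framing symmetric is to render both submissions through the same fixed template with neutral labels before they reach the model. This is an illustrative sketch, not the project's confirmed approach: formatParty, formatDispute, and the submission fields (story, evidence, requested) are hypothetical names based on what each party is described as submitting.

```javascript
// Hypothetical helper: renders one party's submission with a fixed template,
// so neither side gets extra headings, ordering, or emphasis.
function formatParty(label, submission) {
  return [
    `=== PARTY ${label} ===`,
    `Account: ${submission.story}`,
    `Evidence: ${submission.evidence}`,
    `Requested resolution: ${submission.requested}`,
  ].join("\n");
}

// Hypothetical helper: combines both submissions into one user message.
function formatDispute(jurisdiction, partyA, partyB) {
  return [
    `Jurisdiction: ${jurisdiction}`,
    formatParty("A", partyA),
    formatParty("B", partyB),
    "Evaluate both accounts under the same standard.",
  ].join("\n\n");
}
```

Neutral labels ("Party A", "Party B") avoid role words that might carry a default sympathy one way or the other.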

3. Rebuttal rounds that actually update

Early versions of the rebuttal prompt produced rulings that barely changed regardless of what the rebuttal said. The fix was forcing the model to explicitly state what changed and why, and to reference the opposing party's position when evaluating new arguments. This created real back-and-forth rather than a static repeated ruling.
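
A sketch of how a static, repeated ruling could be caught on the client, assuming the revised ruling carries an explicit list of changes. The field names (changes, what, why, noChangeReason, correctnessPercent) and the function checkRulingUpdate are hypothetical.

```javascript
// Hypothetical check: compares a revised ruling with the previous one and
// reports cases where the model did not account for the rebuttal.
function checkRulingUpdate(previous, revised) {
  const scoreShift = {
    partyA: revised.partyA.correctnessPercent - previous.partyA.correctnessPercent,
    partyB: revised.partyB.correctnessPercent - previous.partyB.correctnessPercent,
  };
  const scoresMoved = scoreShift.partyA !== 0 || scoreShift.partyB !== 0;
  const changes = Array.isArray(revised.changes) ? revised.changes : [];
  const problems = [];

  // Every listed change must say both what changed and why.
  if (!changes.every((c) => c && c.what && c.why)) {
    problems.push("each change needs both 'what' and 'why'");
  }
  // Scores that move with no stated change are unexplained.
  if (scoresMoved && changes.length === 0) {
    problems.push("scores changed without an explanation");
  }
  // An identical ruling must at least say why the rebuttal failed.
  if (!scoresMoved && changes.length === 0 && !revised.noChangeReason) {
    problems.push("ruling is unchanged and gives no reason for rejecting the rebuttal");
  }

  return { scoreShift, problems };
}
```

When problems is non-empty, the caller could re-prompt the model rather than show a ruling that ignores the rebuttal.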

4. Scope without scope creep

AI Judge applies to disputes at every scale — personal, neighborhood, workplace, civic. The temptation was to build different flows for each case type. We resisted this. The insight is that the underlying structure is always the same: two sides, applicable law, factual disagreement, need for a neutral voice. One well-designed flow handles all of it.

What We Learned

The most important thing we learned is that access to neutral expertise is itself a form of equity.

When a landlord keeps a tenant's deposit, the landlord often has experience navigating these disputes. The tenant usually doesn't. That information asymmetry, not bad faith, is what drives most unresolved conflict. AI Judge narrows that gap: both parties know their legal position before they even start negotiating.

We also learned that structure is more valuable than sympathy in conflict resolution. People in disputes don't need to feel heard as much as they need to understand — clearly, specifically — where they stand. The structured ruling format (correctness %, point-by-point, law citations, concrete resolution) did more to move people toward agreement than any amount of empathetic framing.