Attest

## 💡 Inspiration

The challenge gave us a manufacturer whose data couldn't be trusted four days before a regulatory audit — duplicates, impossible values, orphaned references, unit errors. But the person accountable for that audit isn't an engineer. The brief named them perfectly: "the compliance officer who has never opened a database."

Almost every data tool we'd seen speaks to engineers — SQL, dashboards, trace logs. The person who actually has to sign off can't read any of it. So we built Attest: it finds, fixes, and explains broken data in plain English, where every decision is traceable to a concrete reason — never "the model said so."

## 🔍 What it does

Attest runs five agents over the raw records and turns them into a worklist a non-technical officer can act on cold:

- Issue Detector: flags every problem, each with a concrete reason
- Risk Prioritizer: sorts worst-first, and says why each ranks where it does
- Remediation Planner: decides fix / flag / escalate
- Audit Reporter: writes a one-page, signable, downloadable summary
- Remediation Executor: auto-applies the safe fixes, and lets a human describe a fix in plain language that it turns into a previewable rule

On the benchmark (5,000 records) it surfaced 285 issues — 99 critical — and safely auto-removed 130 duplicates (5,000 → 4,870 rows).
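As a rough illustration of what a "safe" duplicate removal can look like, here is a minimal sketch. It assumes a duplicate means an exact match on every field, which the write-up does not specify, and the function name `drop_exact_duplicates` is hypothetical. The removed rows are returned alongside the kept ones so nothing disappears without a trace.

```python
def drop_exact_duplicates(records):
    """Keep the first copy of each record; return (kept, removed).

    Assumption: a duplicate is a record whose fields all match an
    earlier record exactly. Field values must be hashable.
    """
    seen = set()
    kept, removed = [], []
    for record in records:
        # Sort the items so field order does not affect the comparison.
        key = tuple(sorted(record.items()))
        if key in seen:
            removed.append(record)
        else:
            seen.add(key)
            kept.append(record)
    return kept, removed
```

Returning the removed rows keeps the fix reviewable: the row counts before and after can be reconciled against the list of what was dropped.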

## 🛠️ How we built it

A Python pipeline where agents never call each other directly — each reads the previous agent's output from a shared memory layer (Cognee) and writes its own back, so the handoffs are real and inspectable. The detection agents are deterministic on purpose: rules make every finding auditable. We used Claude only where judgment helps — the narrative summary and the human-in-the-loop fix automation.
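The handoff pattern described above can be sketched roughly as follows. `SharedMemory` is a hypothetical in-memory stand-in, not the Cognee API, and the two agents, their keys, and the severity numbers are illustrative only. The point is that each agent touches only the memory layer, and every read and write is logged so the handoff can be inspected.

```python
from typing import Any, Callable


class SharedMemory:
    """Hypothetical stand-in for the shared memory layer (not Cognee's API)."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}
        self.log: list[tuple[str, str]] = []  # (action, key) for inspection

    def write(self, key: str, value: Any) -> None:
        self._store[key] = value
        self.log.append(("write", key))

    def read(self, key: str) -> Any:
        self.log.append(("read", key))
        return self._store[key]


def detect(memory: SharedMemory) -> None:
    # Deterministic rules: every issue carries a concrete reason.
    issues = []
    for r in memory.read("raw_records"):
        if r["part"] is None:
            issues.append({"record_id": r["id"], "severity": 2,
                           "reason": "missing part reference"})
        if r["weight"] < 0:
            issues.append({"record_id": r["id"], "severity": 3,
                           "reason": "negative weight"})
    memory.write("issues", issues)


def prioritize(memory: SharedMemory) -> None:
    # Reads the detector's output from memory, never from the detector itself.
    issues = memory.read("issues")
    memory.write("worklist", sorted(issues, key=lambda i: -i["severity"]))


def run_pipeline(memory: SharedMemory,
                 agents: list[Callable[[SharedMemory], None]]) -> None:
    for agent in agents:
        agent(memory)
```

Because agents share no direct references, swapping the stand-in for a real memory backend should only require matching the `read` and `write` methods.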

Severity is explicit, not a black box:

$$ \text{severity} = b_c + \mathbb{1}[\text{shipped}] $$

where $b_c$ is the base severity for the issue category, bumped by one when the bad record has already shipped (the error likely escaped before the audit). Unit/weight errors are caught relative to each part's own history — we flag a record when

$$ w > 5\,\tilde{w}_p \quad\text{or}\quad w < \tfrac{1}{5}\,\tilde{w}_p $$

for part $p$ with median weight $\tilde{w}_p$. The front end is a FastAPI app with a guided flow — See the data → Watch the agents → Review & sign → Take action — where you can click any agent to inspect its tool call, input, output, and handoff.
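The severity rule and the weight check described above can be sketched in a few lines. The category names and base values in `BASE_SEVERITY` are hypothetical (the write-up does not list them), and the function names are illustrative; only the "plus one if shipped" bump and the 5× / ⅕× median thresholds come from the text.

```python
from statistics import median

# Hypothetical base severities; the real per-category values are not given.
BASE_SEVERITY = {"duplicate": 1, "orphaned_reference": 2, "unit_error": 3}


def severity(category: str, shipped: bool) -> int:
    """Base severity for the category, plus one if the record already shipped."""
    return BASE_SEVERITY[category] + (1 if shipped else 0)


def weight_outliers(weights: list[float], factor: float = 5.0) -> list[float]:
    """Flag weights above factor * median or below median / factor.

    `weights` is assumed to hold the recorded weights for a single part.
    """
    m = median(weights)
    return [w for w in weights if w > factor * m or w < m / factor]
```

Comparing against each part's own median, rather than a global threshold, means a heavy part and a light part are each judged on their own scale.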

## 📚 What we learned

- Detection is the easy part; legibility is the product. Our first UI dumped 285 rows and confused everyone. The win was turning it into a guided, plain-language story.
- Deterministic beats clever for trust. Auditable reasons matter more than model magic when someone has to sign the result.
- Humans and models check each other. When we fed the fix-automation a wrong instruction ("divide weights by 1000"), the model inspected the data ($609.4 / 61.5 \approx 9.9\times$) and corrected us: "that's ~10×, not 1000×; divide by 10." Real human-in-the-loop.
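The arithmetic behind that correction can be written as a small deterministic check. This is a sketch of the reasoning only, not how the model reached its answer; the function names are hypothetical, and it assumes unit errors are off by a power of ten.

```python
import math


def suggest_divisor(suspect: float, typical: float) -> float:
    """Snap the ratio suspect / typical to the nearest power of ten."""
    ratio = suspect / typical
    return 10.0 ** round(math.log10(ratio))


def check_divisor(proposed: float, suspect: float, typical: float):
    """Return (is_consistent, suggested) for a human-proposed divisor."""
    suggested = suggest_divisor(suspect, typical)
    return proposed == suggested, suggested


# With the numbers from the example above:
# check_divisor(1000, 609.4, 61.5) -> (False, 10.0)
```

A check like this could run before a previewable rule is applied, so a mistyped instruction is caught even when no model is in the loop.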

## 🧗 Challenges we faced

- Dependency hell. Installing the Cognee SDK silently downgraded our anthropic client and broke every LLM call with an httpx … proxies error. We isolated a clean virtual environment, made the Cognee SDK optional, and kept the shared-memory interface so the demo never breaks.
- Precision vs. recall. With ~850 seeded issues, we chose precision + explainability over chasing recall: a confident, correct, signable worklist beats a noisy one.
- Making collaboration visible. It took deliberate design to route every handoff through memory so a judge can actually see Agent N+1 using Agent N's work.
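One way to make an SDK optional while keeping the same interface is sketched below. It only checks whether the `cognee` package is installed and never calls into it; `InMemoryStore`, `make_store`, and the `backend_factory` parameter are hypothetical names, and the adapter that would wrap the real SDK is deliberately left out because its API is not shown here.

```python
import importlib.util

# True only if the optional package is installed; nothing is imported yet.
HAVE_COGNEE = importlib.util.find_spec("cognee") is not None


class InMemoryStore:
    """Fallback store exposing the same read/write interface the agents use."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data[key]


def make_store(backend_factory=None):
    """Use the SDK-backed store when available, else the in-memory fallback.

    `backend_factory` is a hypothetical callable that would build an
    adapter around the real SDK; it is only used if the package exists.
    """
    if HAVE_COGNEE and backend_factory is not None:
        return backend_factory()
    return InMemoryStore()
```

With this shape, a broken or missing install degrades to the in-memory store instead of taking the whole pipeline down.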

## Built With

- claude
- cognee
- fastapi
- python

Updates

AUW160150 Rodela started this project — Jun 07, 2026 05:40 PM EDT
