## Inspiration
The challenge gave us a manufacturer whose data couldn't be trusted four days before a regulatory audit: duplicates, impossible values, orphaned references, unit errors. But the person accountable for that audit isn't an engineer. The brief named them perfectly: "the compliance officer who has never opened a database."
Almost every data tool we'd seen speaks to engineers: SQL, dashboards, trace logs. The person who actually has to sign off can't read any of it. So we built Attest: it finds, fixes, and explains broken data in plain English, where every decision is traceable to a concrete reason, never "the model said so."
## What it does
Attest runs five agents over the raw records and turns them into a worklist a non-technical officer can act on cold:
- Issue Detector: flags every problem, each with a concrete reason
- Risk Prioritizer: sorts worst-first, and says why each ranks where it does
- Remediation Planner: decides fix / flag / escalate
- Audit Reporter: writes a one-page, signable, downloadable summary
- Remediation Executor: auto-applies the safe fixes, and lets a human describe a fix in plain language that it turns into a previewable rule
On the benchmark (5,000 records) it surfaced 285 issues (99 critical) and safely auto-removed 130 duplicates (5,000 → 4,870 rows).
## How we built it
A Python pipeline where agents never call each other directly: each reads the previous agent's output from a shared memory layer (Cognee) and writes its own back, so the handoffs are real and inspectable. The detection agents are deterministic on purpose: rules make every finding auditable. We used Claude only where judgment helps: the narrative summary and the human-in-the-loop fix automation.
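The handoff pattern described above can be sketched in a few lines. This is a minimal illustration, not Attest's actual code: `SharedMemory` is a hypothetical in-process stand-in for the memory layer (it does not use the Cognee API), and the agent functions, keys, and the "weight must be positive" rule are assumptions made up for the example.

```python
from typing import Any


class SharedMemory:
    """Hypothetical stand-in for the shared memory layer: a keyed store plus an access log."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}
        self.log: list[tuple[str, str, str]] = []  # (agent, action, key)

    def write(self, agent: str, key: str, value: Any) -> None:
        self._store[key] = value
        self.log.append((agent, "write", key))

    def read(self, agent: str, key: str) -> Any:
        self.log.append((agent, "read", key))
        return self._store[key]


def detect(mem: SharedMemory) -> None:
    """Deterministic rule: every finding carries a concrete reason."""
    records = mem.read("detector", "records")
    issues = [
        {"id": r["id"], "reason": "weight must be positive"}
        for r in records
        if r["weight"] <= 0
    ]
    mem.write("detector", "issues", issues)


def plan(mem: SharedMemory) -> None:
    """Reads the detector's output from memory; never calls detect() directly."""
    issues = mem.read("planner", "issues")
    actions = [
        {"id": i["id"], "action": "flag", "because": i["reason"]} for i in issues
    ]
    mem.write("planner", "plan", actions)


def run_pipeline(records: list[dict]) -> SharedMemory:
    mem = SharedMemory()
    mem.write("loader", "records", records)
    for agent in (detect, plan):  # agents only share state through mem
        agent(mem)
    return mem
```

Because every read and write goes through the store, `mem.log` doubles as an inspectable trace of which agent consumed which output.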
Severity is explicit, not a black box:
$$ s = b_c + \mathbb{1}[\text{shipped}] $$
where $b_c$ is the base severity for the issue category, bumped by one when the bad record has already shipped (the error likely escaped before the audit). Unit/weight errors are caught relative to each part's own history: we flag a record when
$$ w > 5\,\tilde{w}_p \quad\text{or}\quad w < \tfrac{1}{5}\,\tilde{w}_p $$
for part $p$ with median weight $\tilde{w}_p$. The front end is a FastAPI app with a guided flow (See the data → Watch the agents → Review & sign → Take action) where you can click any agent to inspect its tool call, input, output, and handoff.
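The severity rule and the weight check described above are simple enough to sketch directly. This is an illustration under assumptions: the category names and base values in `BASE_SEVERITY` are hypothetical, the function names are ours, and whether the median includes the record being tested is not stated in the write-up (here it is computed from a separate history list).

```python
from statistics import median

# Hypothetical base severities per issue category (illustrative values only).
BASE_SEVERITY = {"duplicate": 1, "orphaned_reference": 2, "unit_error": 3}


def severity(category: str, shipped: bool) -> int:
    """Base severity for the category, bumped by one if the record already shipped."""
    return BASE_SEVERITY[category] + (1 if shipped else 0)


def is_weight_outlier(weight: float, part_history: list[float], factor: float = 5.0) -> bool:
    """Flag a weight more than `factor` times above or below the part's median weight."""
    m = median(part_history)
    return weight > factor * m or weight < m / factor
```

For a part whose history is `[60.0, 61.5, 63.0]`, the median is 61.5, so the accepted band is 12.3 to 307.5; a recorded weight of 609.4 falls outside it and is flagged.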
## What we learned
- Detection is the easy part; legibility is the product. Our first UI dumped 285 rows and confused everyone. The win was turning it into a guided, plain-language story.
- Deterministic beats clever for trust. Auditable reasons matter more than model magic when someone has to sign the result.
- Humans and models check each other. When we fed the fix-automation a wrong instruction ("divide weights by 1000"), the model inspected the data ($609.4 / 61.5 \approx 9.9\times$) and corrected us: "that's ~10×, not 1000×; divide by 10." Real human-in-the-loop.
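The arithmetic behind that correction can be reproduced deterministically. The sketch below is not how the model reasoned; it is a hypothetical helper that does the same sanity check, assuming 609.4 is the suspicious weight and 61.5 is the part's typical weight, and that unit errors are off by a power of ten.

```python
import math


def suggest_divisor(observed: float, typical: float) -> float:
    """Return the power of ten nearest to observed / typical."""
    ratio = observed / typical
    return 10.0 ** round(math.log10(ratio))


def instruction_matches_data(proposed_divisor: float, observed: float, typical: float) -> bool:
    """True when the human's proposed divisor agrees with what the data suggests."""
    return proposed_divisor == suggest_divisor(observed, typical)
```

With the numbers from the example, the ratio is about 9.9, its base-10 logarithm rounds to 1, and the suggested divisor is 10, so a proposed divisor of 1000 is rejected.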
## Challenges we faced
- Dependency hell. Installing the Cognee SDK silently downgraded our `anthropic` client and broke every LLM call with an `httpx … proxies` error. We isolated a clean virtual environment, made the Cognee SDK optional, and kept the shared-memory interface so the demo never breaks.
- Precision vs. recall. With ~850 seeded issues, we chose precision + explainability over chasing recall: a confident, correct, signable worklist beats a noisy one.
- Making collaboration visible. It took deliberate design to route every handoff through memory so a judge can actually see Agent N+1 using Agent N's work.
## Built With
- claude
- cognee
- fastapi
- python