Inspiration
Every day, powerful people make promises. Politicians promise policy changes. CEOs promise product launches. Founders promise revenue targets. And then... nothing happens. Nobody checks. The promise dissolves into the noise. PolitiFact has been tracking promises manually since 2008 - 8 reporters, 533 promises, 17 years. I asked: what if an AI agent could do what 8 reporters do, but for 10,000 people, in real time, automatically?
What it does
Receipts is an autonomous AI agent that tracks whether powerful people keep their promises. You type any public figure's name. The agent:
- Searches the live web for their public statements
- Extracts concrete promises with deadlines
- Stores every promise in a structured database
- Checks whether deadlines were met using live evidence
- Publishes a permanent public verdict page — ✅ Kept / ❌ Broken / ⚠️ Partial / ❓ Unclear
- Runs autonomously every 24 hours to find new promises and update verdicts
The output is public accountability infrastructure. Pages are permanently citable, AI-agent-discoverable, and updated without human intervention.
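To make the steps above concrete, here is a minimal sketch of the kind of record the agent could store per promise. The field names, the `Promise` and `Verdict` types, and the `is_due` helper are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Verdict(Enum):
    # The four verdict labels shown on the public pages.
    KEPT = "Kept"
    BROKEN = "Broken"
    PARTIAL = "Partial"
    UNCLEAR = "Unclear"


@dataclass
class Promise:
    # Hypothetical fields; the real table layout is not described in the write-up.
    person: str
    text: str
    deadline: Optional[date]
    source_url: str
    verdict: Optional[Verdict] = None  # None until the agent has checked it


def is_due(promise: Promise, today: date) -> bool:
    """True when the deadline has passed and no verdict has been recorded yet."""
    return (
        promise.deadline is not None
        and promise.deadline < today
        and promise.verdict is None
    )
```

A promise becomes eligible for a verdict check only once its deadline is in the past, which keeps the daily run from judging commitments that are still open.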
How we built it
The pipeline chains 6 tools, 5 of them from hackathon sponsors:
- Nimble - searches the live web for public statements and evidence
- Claude (Anthropic) - extracts promises from raw text and determines verdicts
- ClickHouse - stores every promise, deadline, and verdict at scale
- Senso - saves AI-agent-discoverable drafts to the knowledge base
- Datadog - traces every LLM call with full cost and token observability
- GitHub Pages - serves public verdict pages automatically
An autonomous scheduler runs every 24 hours (or every 10 minutes in demo mode), scanning all tracked people for new promises, checking expired deadlines, updating verdicts, rebuilding pages, and pushing to GitHub, all without human intervention.
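The scheduler loop could look roughly like the sketch below. The `scan`, `check`, and `publish` callables are hypothetical stand-ins for the real pipeline stages, and the function names are assumptions made for illustration.

```python
import time
from typing import Callable, Sequence

FULL_INTERVAL_S = 24 * 60 * 60  # every 24 hours
DEMO_INTERVAL_S = 10 * 60       # every 10 minutes in demo mode


def interval_seconds(demo_mode: bool) -> int:
    """Pick the sleep interval between cycles."""
    return DEMO_INTERVAL_S if demo_mode else FULL_INTERVAL_S


def run_cycle(
    people: Sequence[str],
    scan: Callable[[str], None],
    check: Callable[[str], None],
    publish: Callable[[], None],
) -> int:
    """Run one pass: scan and check each person, then publish once."""
    for person in people:
        scan(person)   # look for new promises
        check(person)  # evaluate expired deadlines and update verdicts
    publish()          # rebuild pages and push them
    return len(people)


def run_forever(people, scan, check, publish, demo_mode=False, sleep=time.sleep):
    """Repeat cycles indefinitely, sleeping between them."""
    while True:
        run_cycle(people, scan, check, publish)
        sleep(interval_seconds(demo_mode))
```

Publishing once per cycle rather than once per person keeps the number of pushes to GitHub small, though the actual implementation may batch differently.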
Challenges we ran into
Dynamic promise extraction — News articles don't label promises as promises. Getting Claude to reliably distinguish "we hope to improve" from "we will launch by Q4" required careful prompt engineering and multiple iterations.
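As a rough illustration of that distinction, the sketch below builds an extraction prompt and filters the model's JSON reply down to items with both a commitment and a deadline. The prompt wording, the JSON shape, and the function names are assumptions, not the prompts actually used; the call to Claude itself is left out.

```python
import json
from typing import Dict, List

# Hypothetical prompt; the real prompt went through several iterations.
PROMPT_TEMPLATE = """You extract promises from news text.
Return a JSON array. Each item has "text" and "deadline".
Include only concrete commitments with a stated deadline
(for example "we will launch by Q4").
Skip aspirations and hedged language (for example "we hope to improve").
If there are none, return [].

Article:
{article}
"""


def build_prompt(article_text: str) -> str:
    """Insert the article text into the extraction prompt."""
    return PROMPT_TEMPLATE.format(article=article_text)


def parse_promises(raw_response: str) -> List[Dict[str, str]]:
    """Keep only well-formed items that have both a text and a deadline."""
    try:
        items = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # malformed reply: treat as no promises found
    if not isinstance(items, list):
        return []
    kept = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text = str(item.get("text") or "").strip()
        deadline = str(item.get("deadline") or "").strip()
        if text and deadline:
            kept.append({"text": text, "deadline": deadline})
    return kept
```

Filtering after the model call gives a second line of defense: even if a hedged statement slips through, it is dropped when no deadline accompanies it.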
Nimble content depth — Initial searches returned titles and descriptions but not full article text. Solved by chaining Nimble's search API with their extract API to fetch and parse full HTML content.
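The search-then-extract chaining can be sketched as follows. The `search` and `extract` callables are hypothetical wrappers around the two Nimble endpoints; their signatures and the result fields (`url`, `title`) are assumptions, not Nimble's documented client interface.

```python
from typing import Callable, Dict, List


def fetch_full_articles(
    query: str,
    search: Callable[[str], List[Dict[str, str]]],
    extract: Callable[[str], str],
    limit: int = 5,
) -> List[Dict[str, str]]:
    """Search for a query, then fetch the full text of each result URL."""
    articles = []
    for result in search(query)[:limit]:
        url = result.get("url")
        if not url:
            continue  # nothing to extract without a URL
        articles.append(
            {
                "url": url,
                "title": result.get("title", ""),
                "text": extract(url),  # full page content, not just the snippet
            }
        )
    return articles
```

Passing the two stages in as functions makes it easy to swap in fakes during testing or a cached fetcher when the live service is slow.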
Senso publishing — The REST API wasn't accessible from our network. Pivoted to the Senso CLI's engine draft command which worked instantly and reliably.
Deduplication — Running the agent multiple times accumulated similar promises. Built a ClickHouse deduplication query using argMax to keep the most recent verdict per unique promise.
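A query of this shape might look like the sketch below. The table name `promises` and the columns `person`, `promise_text`, `verdict`, and `checked_at` are assumed for illustration; only the use of `argMax` comes from the description above. The pure-Python function mirrors the same logic so it can be tested without a database.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical schema. argMax(verdict, checked_at) returns the verdict from
# the row with the largest checked_at within each group.
DEDUP_QUERY = """
SELECT
    person,
    promise_text,
    argMax(verdict, checked_at) AS latest_verdict,
    max(checked_at) AS last_checked
FROM promises
GROUP BY person, promise_text
"""


def latest_verdicts(rows: Iterable[dict]) -> Dict[Tuple[str, str], str]:
    """In-memory equivalent: keep the most recent verdict per (person, promise)."""
    latest: Dict[Tuple[str, str], dict] = {}
    for row in rows:
        key = (row["person"], row["promise_text"])
        if key not in latest or row["checked_at"] > latest[key]["checked_at"]:
            latest[key] = row
    return {key: row["verdict"] for key, row in latest.items()}
```

Grouping on exact promise text only collapses identical rows; catching reworded versions of the same promise would need some normalization step before the grouping key is built.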
Accomplishments that we're proud of
- Fully autonomous pipeline — the agent runs without any human input
- 41 promises tracked across 6 public figures on day one
- All 5 sponsor tools integrated meaningfully, not just superficially
- Live public website with real verdict pages anyone can visit and cite
- Built solo in a single hackathon day
What we learned
The hardest part of building accountability infrastructure isn't the technology - it's defining what counts as a promise. Vague statements, hedged language, and context-dependent commitments required more nuance than expected. Claude handles this surprisingly well with the right prompting. I also learned that agentic systems need to fail gracefully. Nimble timeouts, Senso API quirks, and ClickHouse deduplication issues all required real-time debugging and fallback strategies.
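One simple pattern for failing gracefully is to retry a flaky call a few times and then fall back to an alternative. The sketch below assumes hypothetical `primary` and `fallback` callables (for example, a REST call and a CLI-based path); it is not the project's actual error handling.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 2,
    delay_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try primary up to `attempts` times, then return fallback()."""
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            # Pause between attempts, but not after the last one.
            if attempt < attempts - 1:
                sleep(delay_s)
    return fallback()
```

Catching every exception is deliberately broad here; a production version would likely narrow it to timeouts and connection errors so that real bugs still surface.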
What's next for Receipts
- Agent payment rails (x402) — charge other AI agents micropayments per API query, making Receipts self-sustaining infrastructure
- Email alerts — notify journalists when a tracked deadline approaches or a verdict changes
- Confidence scoring UI — show users how certain the agent is about each verdict
- Expanded coverage — track corporate earnings promises, climate pledges, and startup pitch claims
- "Wikipedia for broken promises" — free to read, pay-per-query for machines