Loom: https://www.loom.com/share/76d58a0a84754b2e988931d0db5cedbc
Inspiration
- The Problem: Fewer than 1 in 20 adult cancer patients ever enroll in a clinical trial. The ultimate bottleneck isn't a shortage of trials—it's matching. Currently, a research coordinator must read 40-page protocols one at a time, hand-matching them against patients while critical eligibility windows quietly close. Meanwhile, the global trial landscape changes weekly.
- The Question: We kept coming back to one core thought: What would it look like if an autonomous agent ran the research office—not just searching, but acting, negotiating, and even transacting on what it learned?
- The Solution: Clinical trial accrual served as our perfect proving ground: high human stakes, unstructured data everywhere, and a workflow where "autonomous" translates into a tangible, real-world impact.
What it does
Cerebro acts as an autonomous research office for a cancer center, operating across three surfaces:
1. Coordinator View
Ingests live trials from ClinicalTrials.gov, builds an evidence-backed knowledge graph, and automatically matches each patient against every protocol. Instead of a binary match, it explains why: detailing which criteria are confirmed, which need a manual chart check, and which rule them out—with every single claim cited directly back to the registry. The human coordinator simply reviews and approves the agent-drafted outreach email.
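The three-way outcome described above (confirmed, needs a chart check, ruled out) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cerebro's actual matcher: the `Criterion` record, the patient dictionary, the field names, and the `NCT00000000` citation string are all hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Status(Enum):
    CONFIRMED = "confirmed"
    NEEDS_CHART_CHECK = "needs_chart_check"
    RULED_OUT = "ruled_out"


@dataclass(frozen=True)
class Criterion:
    label: str                      # criterion text shown to the coordinator
    patient_field: str              # hypothetical key in the patient record
    is_met: Callable[[Any], bool]   # True when the patient value satisfies it
    citation: str                   # pointer back to the registry (placeholder format)


def evaluate(criterion: Criterion, patient: dict) -> Status:
    value = patient.get(criterion.patient_field)
    if value is None:
        # Missing data is never guessed; it is routed to a human chart check.
        return Status.NEEDS_CHART_CHECK
    return Status.CONFIRMED if criterion.is_met(value) else Status.RULED_OUT


def match_patient(criteria: list[Criterion], patient: dict) -> dict[str, list[dict[str, str]]]:
    """Group every criterion by outcome, keeping its citation attached."""
    report: dict[Status, list[dict[str, str]]] = {status: [] for status in Status}
    for criterion in criteria:
        report[evaluate(criterion, patient)].append(
            {"criterion": criterion.label, "citation": criterion.citation}
        )
    return {status.value: items for status, items in report.items()}


# Illustrative data only; NCT00000000 is a placeholder, not a real trial.
criteria = [
    Criterion("Age >= 18 years", "age", lambda v: v >= 18, "NCT00000000: eligibility"),
    Criterion("ECOG performance status 0-1", "ecog", lambda v: v <= 1, "NCT00000000: eligibility"),
    Criterion("No prior immunotherapy", "prior_immunotherapy", lambda v: v is False, "NCT00000000: eligibility"),
]
patient = {"age": 54, "ecog": None, "prior_immunotherapy": True}
report = match_patient(criteria, patient)
```

The point of the design is that an absent value produces its own bucket instead of being coerced into a pass or a fail, which is what lets the coordinator see exactly which items still need a manual check.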
2. Participant View
The patient's own AI agent autonomously negotiates enrollment terms (such as per-visit stipends, on-site vs. telehealth preferences, and scheduling logistics) with the trial's agent. This negotiation strictly abides by the Institutional Review Board (IRB)-approved compensation band, which is hard-coded into the server logic rather than relying on brittle prompt instructions.
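A server-side band check of the kind described above might look like the sketch below. The band values, the `CompensationBand` type, and the `validate_offer` function are assumptions made for illustration; the real band would come from the IRB-approved protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompensationBand:
    min_usd: int
    max_usd: int


# Hypothetical values for illustration only.
IRB_BAND = CompensationBand(min_usd=50, max_usd=150)


def validate_offer(stipend_usd: int, band: CompensationBand = IRB_BAND) -> int:
    """Reject any agent-proposed per-visit stipend outside the approved band.

    The check runs in server code on every negotiation round, so no prompt
    wording on either agent's side can push an offer past the limits.
    """
    if not band.min_usd <= stipend_usd <= band.max_usd:
        raise ValueError(
            f"stipend {stipend_usd} USD is outside the approved band "
            f"[{band.min_usd}, {band.max_usd}]"
        )
    return stipend_usd
```

Raising instead of silently clamping keeps an out-of-band proposal visible in the trace, which fits the traceability goal stated for each negotiation round.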
3. The Market
Cerebro packages its proprietary trial intelligence into heavily cited briefs and sells them directly to other autonomous agents over the x402 protocol, settling transactions instantly in stablecoin. It is an agent that independently researches, transacts, and gets paid.
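The pay-per-brief exchange can be pictured as a plain HTTP 402 round trip: the first request is refused with payment requirements, and the retry carries proof of payment. The sketch below models only that control flow as a pure function. The `X-PAYMENT` header name reflects our understanding of the x402 convention and should be treated as an assumption; the price, the payload fields, and the `verify_payment` stub are hypothetical, and a real facilitator would verify a signed stablecoin payment.

```python
import base64
import json
from http import HTTPStatus

PRICE_USD = "0.05"  # hypothetical price per brief


def verify_payment(payload: dict) -> bool:
    """Stand-in for a facilitator check (assumption: real checks verify a signature)."""
    return payload.get("amount") == PRICE_USD and bool(payload.get("signature"))


def handle_brief_request(headers: dict) -> tuple[int, dict]:
    """Return (status, body) for a request to a paid brief endpoint."""
    requirements = {"accepts": [{"price_usd": PRICE_USD}]}
    token = headers.get("X-PAYMENT")
    if token is None:
        # First contact: tell the buying agent what payment is required.
        return HTTPStatus.PAYMENT_REQUIRED, requirements
    try:
        payload = json.loads(base64.b64decode(token))
    except ValueError:
        # Covers bad base64, bad UTF-8, and bad JSON (all ValueError subclasses).
        return HTTPStatus.BAD_REQUEST, {"error": "malformed payment header"}
    if not isinstance(payload, dict) or not verify_payment(payload):
        return HTTPStatus.PAYMENT_REQUIRED, requirements
    return HTTPStatus.OK, {"brief": "cited brief content goes here"}
```

A buying agent would call the endpoint once, read the requirements from the 402 body, then retry with the encoded payment attached.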
Safety Architecture: Every match, negotiation round, and graph edge is fully traceable. A clinician must explicitly verify and approve before any action is finalized.
How we built it
Cerebro runs on a tightly integrated four-service stack launched via a single command: docker compose up.
| Service Component | Technology | Core Functionality |
|---|---|---|
| TrialBridge KG | FastAPI | Knowledge-graph service storing evidence-backed trial/patient facts; runs the core matcher engine (reasons, blockers, missing-info). |
| Backend | Python / Node.js | Handles corpus ingestion, LLM-written cited briefs, the market loop, and the x402 payment seam. |
| App Server | Node.js / Python | Orchestrates patient seeding, execution of the negotiation engine, and the system replay journal. |
| Frontend | Next.js / Vite | Built with a custom "Sequoia" design system, force-directed graph visualizations, and generative UI streamed live via OpenUI (the agent renders answers in UI components, not prose). |
| Ingest Agent | TrueFoundry | Pulls real registry data dynamically and safely routes LLM calls through the TrueFoundry gateway. |
- The Architectural Throughline: A unified matching engine feeding multiple entry points. Every system journal entry is written in the exact same database transaction as the fact itself, making the "nothing is staged" guarantee literally true.
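The "same transaction" guarantee can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table layout and the `record_fact` helper are assumptions for illustration; the source does not say which database or schema Cerebro uses.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: one table for facts, one for journal entries.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS facts (
            id INTEGER PRIMARY KEY,
            subject TEXT NOT NULL,
            claim TEXT NOT NULL,
            citation TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS journal (
            id INTEGER PRIMARY KEY,
            fact_id INTEGER NOT NULL REFERENCES facts(id),
            event TEXT NOT NULL
        );
        """
    )
    return conn


def record_fact(conn, subject, claim, citation, event="fact_recorded"):
    """Write a fact and its journal entry together, or write neither."""
    with conn:  # commits on success, rolls back if either INSERT raises
        cur = conn.execute(
            "INSERT INTO facts (subject, claim, citation) VALUES (?, ?, ?)",
            (subject, claim, citation),
        )
        conn.execute(
            "INSERT INTO journal (fact_id, event) VALUES (?, ?)",
            (cur.lastrowid, event),
        )
    return cur.lastrowid
```

Because both inserts share one transaction, a fact can never exist without its journal entry, and a journal entry can never describe a fact that was not stored.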
Challenges we ran into
- Keeping it Honest: Our hardest rule was self-imposed: zero fake data. Every single surface had to rely on a live API call, enforced by a rigorous build test that fails if any view attempts to import a fixture. We tore out hardcoded graphs, templated emails, and canned numbers multiple times to maintain this integrity.
- The Citation Gate vs. LLM Nondeterminism: The agent's quote validation initially crashed ingestion regularly because LLM-extracted quotes would differ from the raw registry text by a single bullet point or Unicode character (like ≥). We solved this by developing formatting-tolerant matching algorithms and per-claim salvage pipelines without compromising our citation guarantees.
- Network & Payment Seams: Getting a true 402 handshake working over local Docker networking, and then deploying it over Render's hostnames, required envsubst-templated Nginx, HTTPS upstreams, and SNI.
- Live Pipeline Race Conditions: The application originally seeded matches once at boot, meaning subsequent on-demand ingests never propagated. We refactored the architecture to actively watch the corpus and re-match on the fly without requiring a system restart.
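As a rough illustration of the formatting-tolerant quote check and per-claim salvage mentioned in the citation-gate item above, the sketch below normalizes both the quote and the registry text before comparing them. The normalization rules, the symbol table, and the function names are assumptions, not Cerebro's actual pipeline.

```python
import re
import unicodedata

# Leading bullet markers and indentation at the start of any line.
_BULLETS = re.compile(r"^[\s\-\*\u2022\u00b7]+", re.MULTILINE)
# Symbols an LLM commonly rewrites when quoting registry text (assumed list).
_SYMBOLS = {"\u2265": ">=", "\u2264": "<=", "\u2019": "'", "\u2013": "-"}


def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)
    for symbol, replacement in _SYMBOLS.items():
        text = text.replace(symbol, replacement)
    text = _BULLETS.sub("", text)
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_is_supported(quote: str, source: str) -> bool:
    """True when the quote appears in the source once formatting noise is removed."""
    return normalize(quote) in normalize(source)


def salvage_claims(claims: list[dict], source: str) -> tuple[list[dict], list[dict]]:
    """Keep claims whose quote is supported; set the rest aside instead of failing the batch."""
    kept = [c for c in claims if quote_is_supported(c["quote"], source)]
    dropped = [c for c in claims if not quote_is_supported(c["quote"], source)]
    return kept, dropped
```

Only formatting is relaxed here. A quote whose wording or numbers differ from the registry text still fails the check, so the tolerance does not weaken the citation requirement itself.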
Accomplishments that we're proud of
- Genuinely Live, Not a Mockup: We built a platform leveraging real ClinicalTrials.gov data, a production knowledge graph, LLM-authored briefs, and a functioning HTTP 402 payment handshake. We even integrated a live network terminal directly into the market view so judges can cross-check every single request against their own browser developer tools.
- Two Products, One Engine: The identical underlying corpus, graph, and monitoring backend successfully power both the clinical workflow for doctors and the autonomous agent-to-agent data economy.
- Safety as a Core Feature: Instead of relying on hand-waving disclaimers, the IRB compensation band boundaries and citation guardrails are enforced natively in server code. Clinicians trust Cerebro because its "unknown / needs chart check" designation intentionally combats LLM overconfidence.
- Frictionless Deployment: Going from empty volumes to an advanced, multi-agent live demo in a single terminal command.
What we learned
- "Autonomous" lives in information space first. The most credible version of an AI agent acting in medicine is one that independently searches, reasons, and drafts—while strictly routing final outbound actions to a human-in-the-loop gate. That gate is what earns clinical trust.
- The knowledge graph is the ultimate unlock. Parsing a clinical protocol deeply once and matching it against many patients dynamically is the true product insight. It makes the agent's reasoning completely explainable, which is essential in healthcare.
- Hard constraints beat soft prompts. Enforcing safety boundaries and citation requirements directly in the codebase (rather than relying on system prompts) is the only reliable way to put a nondeterministic LLM into a high-stakes workflow.
- Honesty is an architectural decision. When system journals and clinical facts are written together atomically, the question "Is this real?" always has a traceable, one-click answer.
What's next for Cerebro
- Continuous Surveillance: Transitioning the agent to re-scan the global trial landscape nightly across an entire active patient panel, proactively flagging alerts like: "Patient #14 now matches a trial that opened this Tuesday." This turns autonomy from a demo feature into a continuous utility.
- Live On-Chain Settlement: Upgrading the simulated facilitator to a real CDP wallet on Base, moving the entire agent-to-agent data economy fully on-chain.
- Deeper Eligibility Parsing: Progressing from regex heuristics to full clinical NLP and KG-native criteria extraction to achieve institutional, protocol-grade accuracy.
- EHR Integration: Connecting directly to live Electronic Health Record systems so that "needs chart check" data fields can automatically resolve themselves, paving the way for real-world coordinator pilots at active cancer centers.
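The nightly re-scan idea in the first item above comes down to comparing two snapshots of the match set and alerting on what is new. A minimal sketch, assuming matches are stored as hypothetical (patient_id, trial_id) pairs:

```python
Match = tuple[str, str]  # (patient_id, trial_id); hypothetical identifiers


def new_match_alerts(previous: set[Match], current: set[Match]) -> list[str]:
    """Return one alert per match present tonight that was absent last night."""
    return [
        f"Patient {patient_id} now matches trial {trial_id}"
        for patient_id, trial_id in sorted(current - previous)
    ]
```

Sorting the difference keeps the alert order stable from run to run, which makes nightly output easier to compare.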
Built With
- python
- render
- vite