## Inspiration

Drug-safety operations in U.S. hospitals still run on faxes in 2026. When the FDA issues a recall, the pharmacy team manually verifies it against multiple regulatory sites, hunts for affected patients in the EHR, and drafts three different role-specific notifications. The process takes days to weeks while patients keep taking the recalled medication. The Institute of Medicine's 1999 report To Err Is Human estimated that 44,000–98,000 Americans die every year from preventable medical errors, a total that includes medication errors.

The clinching insight came from a 2024 study where Mass General Brigham tried to automate consumer-level recall notification with Epic MyChart + FDA's Healthy Citizen API — and abandoned operational deployment because false-positive notifications caused unacceptable patient anxiety.

That's the unsolved problem. Verification — not detection — is what's missing. Anyone can ingest an RSS feed; nobody trusts the output enough to act on it. We wanted to build the autonomous verification layer that closes that gap.

## What it does

Reflex is an always-on agent swarm that turns FDA drug-safety signals into verified, cited, routed operational deliverables — in seconds, not weeks.

  • An autonomous monitor polls OpenFDA's Drug Enforcement Reports every 60 seconds, deduplicates against ClickHouse, and fires the swarm on novel recalls — with no human input.
  • An 11-agent swarm runs in three tiers (Ingest → Decide → Synthesize) wired with asyncio.gather:
    1. Inbound normalizes the recall payload
    2. Scout fans out three parallel web searches via NimbleWay (FDA, EMA, PubMed)
    3. Triage classifies severity using the FDA 21 CFR §7.3 rubric
    4. Recon finds historical analogs in ClickHouse
    5. Verify + Counter runs a confirmation pass AND an adversarial counter-evidence pass — actively hunting manufacturer rebuttals, sponsored studies, and regulator clarifications. If a meaningful contradiction surfaces, the verdict becomes requires_human.
    6. Cohort runs SQL against the patient store to identify affected patients + high-risk subgroup
    7. Substitute uses NVIDIA BioNeMo ESM2-650M protein embeddings to rank therapeutic alternatives by molecular-target cosine similarity (1280-d vectors)
    8. Routing + Comms drafts three role-specific communications (pharmacist memo, clinician alert, patient letter)
    9. Writer composes the canonical safety brief with citation enforcement
    10. Auditor HEAD-checks every citation URL
    11. Publisher ships the brief to Senso's cited.md (with a git-mirror fallback to GitHub raw)
  • A canvas-based "Agent Theater" renders all 11 agents + 6 source nodes (FDA, EMA, PubMed, ClickHouse, Senso, BioNeMo) with cursors that physically traverse between them on each tool call. The Counter agent spawns a red cursor when it surfaces a conflict.
  • A conversational voice agent with browser SpeechRecognition + SpeechSynthesis lets you literally talk to the swarm. It exposes 9 real tools to NVIDIA NIM Llama 3.3 70B (send memos, alert clinicians, notify patients, trigger workflows, publish briefs, run premium sub-briefs, list recalls, navigate, check wallet) — so saying "Take next steps for me" actually chains the three communication dispatches.
  • Premium sub-briefs are paywalled via the x402 protocol with real on-chain settlement on Base Sepolia through Coinbase CDP — the agent itself pays from a burner wallet, demonstrating agent-to-agent commerce.
  • A 2D + 3D molecule preview ships in every brief: PubChem renders the recalled drug's chemical structure; 3Dmol.js renders the target protein's cartoon directly from RCSB PDB.
  • A live cost telemetry dashboard at /pricing shows the actual NIM token spend, NimbleWay call counts, and rate-limit health in real time.
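
The monitor's "fire only on novel recalls" step can be sketched as below. This is a minimal illustration, not the project's code: the in-memory `seen` set stands in for the ClickHouse `monitor_seen` table, the helper name `novel_recalls` is hypothetical, and keying on `recall_number` is an assumption about which field of the enforcement record is used for deduplication.

```python
from typing import Iterable


def novel_recalls(batch: Iterable[dict], seen: set[str]) -> list[dict]:
    """Return records whose recall_number is new, marking each one as seen.

    `seen` is a stand-in for a persistent dedup store (assumption).
    """
    fresh = []
    for record in batch:
        key = record.get("recall_number")
        # Skip records without a key and records already processed,
        # including duplicates inside the same polled batch.
        if key is None or key in seen:
            continue
        seen.add(key)
        fresh.append(record)
    return fresh
```

Each polling cycle would pass the latest batch through this filter and start one workflow per returned record, so a recall that appears in consecutive polls triggers the swarm only once.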

## How we built it

Architecture: Single Python process with a single asyncio loop. The orchestrator is asyncio.gather over coroutines — no Redis, no Celery, no Kafka. For a hackathon, every additional moving part is a demo risk.
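
A rough sketch of that orchestration pattern follows. The agent names come from the list in "What it does", but the grouping of agents into tiers and the `run_agent` placeholder are illustrative assumptions; the real agents call tools and LLMs.

```python
import asyncio


async def run_agent(name: str, context: dict) -> tuple[str, str]:
    # Placeholder agent (hypothetical): yields control once to mimic I/O.
    await asyncio.sleep(0)
    return name, f"{name} done ({len(context)} upstream results)"


# Assumed grouping for Ingest -> Decide -> Synthesize; the actual split may differ.
TIERS = [
    ["inbound", "scout"],
    ["triage", "recon", "verify_counter", "cohort", "substitute"],
    ["routing_comms", "writer", "auditor", "publisher"],
]


async def run_swarm(tiers: list[list[str]]) -> dict:
    context: dict = {}
    for tier in tiers:
        # Agents inside one tier run concurrently; tiers run in order.
        results = await asyncio.gather(*(run_agent(n, context) for n in tier))
        # Results become visible to the next tier only after the whole tier finishes.
        context.update(results)
    return context


if __name__ == "__main__":
    print(asyncio.run(run_swarm(TIERS)))
```

Because `asyncio.gather` returns results in argument order, each tier's output is deterministic even though the agents interleave on the single event loop.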

Backend: FastAPI + asyncio.

Reasoning engine: All LLM calls go through one file (apps/api/tools/reasoning.py) using the OpenAI SDK pointed at NVIDIA NIM (integrate.api.nvidia.com/v1). Defaults: meta/llama-3.3-70b-instruct for text + tool calling, meta/llama-3.2-90b-vision-instruct for PDF/image entity extraction. The vendor SDK is isolated to this one file, so swapping providers (or adding Anthropic) means changing a single boundary.

Protein embeddings: apps/api/tools/biology.py calls NVIDIA BioNeMo ESM2-650M for protein sequence → 1280-d mean-pooled embeddings, then computes cosine similarity client-side.
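
The client-side part of that step (mean pooling and cosine ranking) can be illustrated with plain Python. This is a sketch under assumptions: the function names are hypothetical, and short toy vectors stand in for the 1280-d ESM2 embeddings.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-residue vectors into one fixed-size sequence embedding."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def rank_candidates(
    target: list[float], candidates: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Sort candidate targets by similarity to the recalled drug's target."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Cosine similarity ignores vector magnitude, which is why it is a common choice for comparing pooled embeddings of sequences with different lengths.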

Observability: Wrapped uvicorn with ddtrace-run so every NIM call automatically appears as a Datadog LLM Observability span — zero per-call wiring.

State: ClickHouse Cloud holds adverse_events, agent_traces, published_briefs, patients, x402_transactions, workflows, monitor_seen, and the new outbox audit table. We discovered the cluster endpoint via the ClickHouse Cloud management API and provisioned the SQL password with a sha256+double-sha1 PATCH call.
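
The two hashes mentioned above can be computed with the standard library. Treat this as a sketch: the payload keys `newPasswordHash` (base64 of the SHA-256 digest) and `newDoubleSha1Hash` (hex of SHA-1 applied twice) reflect our reading of the ClickHouse Cloud API and should be checked against the current API reference before use. The helper name is hypothetical.

```python
import base64
import hashlib


def clickhouse_password_hashes(password: str) -> dict[str, str]:
    """Build the hash fields for a password-update request body (assumed schema)."""
    raw = password.encode("utf-8")
    # SHA-256 digest, base64-encoded.
    sha256_b64 = base64.b64encode(hashlib.sha256(raw).digest()).decode("ascii")
    # SHA-1 of the raw SHA-1 digest, hex-encoded (the MySQL-protocol style hash).
    double_sha1_hex = hashlib.sha1(hashlib.sha1(raw).digest()).hexdigest()
    return {"newPasswordHash": sha256_b64, "newDoubleSha1Hash": double_sha1_hex}
```

Sending hashes rather than the plaintext means the password itself never crosses the management API.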

Publishing: Senso v2 API (apiv2.senso.ai/api/v1, X-API-Key auth) creates a question, drafts content, and attempts publish. The draft always succeeds (visible in the Senso dashboard); a git push mirror to docs/cited/<slug>.md guarantees a public GitHub-raw URL even if the destination toggle isn't set.

Payments: eth-account + Base Sepolia RPC. We build the ERC-20 transfer calldata manually (function selector + 32-byte address + 32-byte amount) instead of pulling in heavy web3.py. Real on-chain receipts via BaseScan.
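
For illustration, the manual calldata layout looks roughly like this. The helper name and validation are ours, not the project's; the selector `a9059cbb` is the standard first four bytes of keccak256("transfer(address,uint256)"). Signing and broadcasting the transaction are left out.

```python
# Standard ERC-20 selector for transfer(address,uint256).
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def encode_erc20_transfer(recipient: str, amount: int) -> bytes:
    """Return calldata: 4-byte selector + 32-byte address + 32-byte amount."""
    address = bytes.fromhex(recipient.removeprefix("0x"))
    if len(address) != 20:
        raise ValueError("recipient must be a 20-byte hex address")
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    # ABI encoding left-pads the 20-byte address with zeros to 32 bytes
    # and stores the amount as a big-endian 32-byte integer.
    return TRANSFER_SELECTOR + address.rjust(32, b"\x00") + amount.to_bytes(32, "big")
```

The result is always 68 bytes. It goes in the transaction's `data` field, with `to` set to the token contract rather than the recipient.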

Frontend: Next.js 14 App Router + Tailwind + a 600-line Canvas Agent Theater with a strict performance rule — React state is NEVER read inside the requestAnimationFrame loop. The RAF loop only touches useRef containers; SSE events push into a ref-held queue that the loop drains per frame. Hard cap of 30 concurrent cursors with FIFO recycling.

Voice: Browser SpeechRecognition → POST /api/v1/chat (NIM with OpenAI-spec tools, max 5 tool-call rounds per turn) → SpeechSynthesis. Tool results feed back as role: "tool" messages so the model can chain actions.
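
The bounded tool-calling loop on the server side can be sketched as follows. The `model` callable is a stand-in for the NIM chat call and is assumed to return an OpenAI-style assistant message as a dict; `run_turn` and `ROUND_LIMIT_MESSAGE` are hypothetical names.

```python
import json
from typing import Callable

MAX_ROUNDS = 5
ROUND_LIMIT_MESSAGE = "Stopped after reaching the tool-call round limit."


def run_turn(
    model: Callable[[list[dict]], dict],
    tools: dict[str, Callable[..., object]],
    messages: list[dict],
) -> str:
    """Run one user turn, executing requested tools for at most MAX_ROUNDS rounds."""
    for _ in range(MAX_ROUNDS):
        reply = model(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            # No tool requested: the reply text is the final spoken answer.
            return reply.get("content") or ""
        for call in calls:
            fn = tools[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"] or "{}")
            result = fn(**args)
            # Feed the result back so the model can chain the next action.
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    return ROUND_LIMIT_MESSAGE
```

The round cap keeps a model that repeatedly requests tools from looping forever within a single voice turn.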

Molecule preview: PubChem PUG REST for 2D PNG; 3Dmol.js loaded from CDN renders the target protein PDB as a spectrum-colored, spinning cartoon.

## Challenges we ran into

  • NIM free-tier rate limiting was brutal — HTTP 429s when 8+ agents fired in parallel. Solution: a global asyncio.Semaphore(1) plus round-robin across two API keys to double the effective budget, plus deterministic fallbacks on every LLM-dependent agent so workflows always complete even under sustained 429s.
  • Senso's publish endpoint requires the destination to be selected_for_generation: true — which can only be toggled in the dashboard. We made every brief succeed via a parallel git-mirror path so the public URL always resolves while the Senso draft still gets created.
  • ClickHouse provisioning with only API keys (no SQL password) required discovering the service endpoint via the management API and PATCHing a sha256+double-sha1 password hash.
  • Real-time canvas at 60fps under SSE bursts required strict discipline: zero React state inside the RAF loop, a ref-held event queue, cursor pool recycling.
  • Voice feedback loop — the assistant's TTS was being picked up by the mic. Fixed by queuing speech that arrives while the assistant is speaking and flushing after.
  • Browser SpeechRecognition's flakiness with continuous mode (random aborted / no-speech errors) required auto-restart logic in onend.
  • Mermaid on GitHub parses [/landing] and (parens) inside node labels as new shapes — broke half our diagrams until we quoted every label with tricky chars.
  • Next.js 14 vs 15 params API — App Router pages use plain params objects in 14, not Promises like in 15. Crashed the workflow route until fixed.
  • Light-mode contrast — Tailwind opacity utilities (text-ice/90) don't auto-react to CSS variable swaps. Required [class*="text-ice"] attribute selectors with !important to force readable dark text on light backgrounds.
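
The rate-limit workaround from the first bullet above (a global semaphore, round-robin API keys, deterministic fallbacks) could look roughly like this. The class and exception names are hypothetical, and `request` stands in for the actual NIM call.

```python
import asyncio
import itertools
from typing import Awaitable, Callable


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 raised by the LLM client (assumption)."""


class ThrottledClient:
    def __init__(self, api_keys: list[str], max_concurrency: int = 1):
        # Construct inside a running event loop so the semaphore binds to it
        # on older Python versions.
        self._keys = itertools.cycle(api_keys)
        self._gate = asyncio.Semaphore(max_concurrency)

    async def call(
        self,
        request: Callable[[str], Awaitable[str]],
        fallback: Callable[[], str],
    ) -> str:
        async with self._gate:  # Only max_concurrency calls are in flight at once.
            key = next(self._keys)  # Alternate keys on successive calls.
            try:
                return await request(key)
            except RateLimitError:
                # Deterministic fallback keeps the workflow moving under 429s.
                return fallback()
```

With `max_concurrency=1`, parallel agents queue for the gate instead of bursting the API, trading some latency for predictable completion.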

## Accomplishments that we're proud of

  • Seven sponsor tools, every one doing real work in the demo: NimbleWay (real SERP calls), Senso (real drafts in the dashboard), ClickHouse (real cohort SQL matching 18 patients), NVIDIA NIM (real triage + adversarial counter-evidence), NVIDIA BioNeMo (real 1280-d protein embeddings), Datadog LLM Observability (auto-instrumented), x402 + Coinbase CDP (real Base Sepolia receipts).
  • The adversarial counter-evidence pass works on the actual demo recall. NIM correctly surfaced the Apotex investor-relations statement contradicting the FDA's NDMA finding on metformin and flipped the verdict to requires_human instead of broadcasting a possible false positive. That's the entire premise of the spec, validated end to end in the live demo.
  • The voice agent is genuinely agentic. Saying "Take next steps for me" triggers a tool-calling loop that chains send_pharmacist_memo (1 recipient) + send_clinician_alert (2 recipients) + send_patient_letters (5 recipients) — all logged in the ClickHouse outbox, all visible in the activity feed.
  • The Substitute agent's BioNeMo output is biologically plausible. For metformin (target: AMPK / PRKAA1), it ranks Sitagliptin (DPP4) at cosine similarity 0.933 and Glipizide (KCNJ11) at 0.895 — both reasonable diabetes alternatives by mechanism family.
  • Every action is auditable. Outbox table records every memo/alert/letter/payment with workflow_id, recipient count, body, and trigger source. Activity feed streams these in real time on the landing page.
  • A real Base Sepolia transaction settles before the sub-brief unlocks — visible on BaseScan, not a mock.

## What we learned

  • The hardest problem in pharmacovigilance isn't detection; it's trust in the verdict. A single-LLM ingest pipeline can't clear the false-positive bar that stopped the Mass General Brigham deployment. An explicit adversarial counter-evidence agent changes the trust dynamics.
  • Protein embeddings can guide therapeutic substitution — not perfectly, but well enough to surface mechanism-family alternatives that a clinician can sanity-check. This is a real and novel application of biology foundation models in a clinical-ops loop.
  • Browser SpeechSynthesis + SpeechRecognition + OpenAI-spec tool calling on NIM is a complete agentic voice stack with zero infrastructure. No LiveKit, no Whisper download, no gRPC. Works in 200 lines of TypeScript.
  • Auto-instrumentation beats per-call wiring every time. Running uvicorn under ddtrace-run captured every NIM call as a Datadog LLM Observability span without changing any agent code.
  • Single-process asyncio is the right scope for a hackathon agent system. Adding Celery / Redis / Kafka would have added zero capability and a lot of demo risk.
  • CSS variables + [class*=] attribute selectors are the cleanest way to retrofit a dark-only Tailwind app for a light-mode toggle without rewriting every component.

## What's next for Reflex AI

  • Real FHIR/EHR connectors (Cerner Millennium, Epic Care Everywhere) to replace the synthetic patient fixture.
  • HIPAA + 21 CFR Part 11 (electronic records/signatures) compliance, a prerequisite for any healthcare SaaS at scale. Architecture is already audit-trail-shaped.
  • Production Coinbase CDP on Base mainnet for real revenue, plus dual Stripe rail.
  • MAUDE (medical devices) coverage in addition to FAERS (drugs).
  • EMA + MHRA + Health Canada + TGA as first-class data sources for rest-of-world (ROW) expansion.
  • agentic.market listing so other agents can discover and subscribe to the Reflex brief feed as a paid service.
  • NVIDIA Parakeet ASR swap for the browser SpeechRecognition path — better accuracy, especially for medical terminology.
  • CMS CRUSH RFI alignment: the federal "detect and deploy" initiative explicitly solicits AI tools for healthcare fraud, waste, and abuse — Reflex's verification layer is a direct fit.
  • Peer-reviewed paper on multi-agent verification vs. single-LLM ingest in pharmacovigilance, citing Mass General Brigham 2024 as the precedent failure we beat.
  • A $5–10M Seed round. Comp set: Aletheia ($30M Series A, regulatory monitoring), Tessellate ($14M Seed, pharmacovigilance ops). The pharmacovigilance market is $13.7B today, projected to hit $34.2B by 2032 at 16.3% CAGR.
