## Inspiration
Drug-safety operations in U.S. hospitals still run on faxes in 2026. When the FDA issues a recall, the pharmacy team manually verifies it against multiple regulatory sites, hunts for affected patients in the EHR, and drafts three different role-specific notifications. The process takes days to weeks while patients keep taking the recalled medication. The Institute of Medicine's 1999 report To Err Is Human estimated that 44,000–98,000 Americans die every year from preventable medical errors, medication errors among them.
The clinching insight came from a 2024 study where Mass General Brigham tried to automate consumer-level recall notification with Epic MyChart + FDA's Healthy Citizen API — and abandoned operational deployment because false-positive notifications caused unacceptable patient anxiety.
That's the unsolved problem. Verification — not detection — is what's missing. Anyone can ingest an RSS feed; nobody trusts the output enough to act on it. We wanted to build the autonomous verification layer that closes that gap.
## What it does
Reflex is an always-on agent swarm that turns FDA drug-safety signals into verified, cited, routed operational deliverables — in seconds, not weeks.
- An autonomous monitor polls OpenFDA's Drug Enforcement Reports every 60 seconds, deduplicates against ClickHouse, and fires the swarm on novel recalls, with no human input.
- An 11-agent swarm runs in three tiers (Ingest → Decide → Synthesize) wired with `asyncio.gather`:
  - Inbound normalizes the recall payload
  - Scout fans out three parallel web searches via NimbleWay (FDA, EMA, PubMed)
  - Triage classifies severity using the FDA 21 CFR §7.3 rubric
  - Recon finds historical analogs in ClickHouse
  - Verify + Counter runs a confirmation pass AND an adversarial counter-evidence pass, actively hunting manufacturer rebuttals, sponsored studies, and regulator clarifications. If a meaningful contradiction surfaces, the verdict becomes `requires_human`.
  - Cohort runs SQL against the patient store to identify affected patients + high-risk subgroup
  - Substitute uses NVIDIA BioNeMo ESM2-650M protein embeddings to rank therapeutic alternatives by molecular-target cosine similarity (1280-d vectors)
  - Routing + Comms drafts three role-specific communications (pharmacist memo, clinician alert, patient letter)
  - Writer composes the canonical safety brief with citation enforcement
  - Auditor HEAD-checks every citation URL
  - Publisher ships the brief to Senso's `cited.md` (with a git-mirror fallback to GitHub raw)
- A canvas-based "Agent Theater" renders all 11 agents + 6 source nodes (FDA, EMA, PubMed, ClickHouse, Senso, BioNeMo) with cursors that physically traverse between them on each tool call. The Counter agent spawns a red cursor when it surfaces a conflict.
- A conversational voice agent with browser SpeechRecognition + SpeechSynthesis lets you literally talk to the swarm. It exposes 9 real tools to NVIDIA NIM Llama 3.3 70B (send memos, alert clinicians, notify patients, trigger workflows, publish briefs, run premium sub-briefs, list recalls, navigate, check wallet) — so saying "Take next steps for me" actually chains the three communication dispatches.
- Premium sub-briefs are paywalled via the x402 protocol with real on-chain settlement on Base Sepolia through Coinbase CDP — the agent itself pays from a burner wallet, demonstrating agent-to-agent commerce.
- A 2D + 3D molecule preview ships in every brief: PubChem renders the recalled drug's chemical structure; 3Dmol.js renders the target protein's cartoon directly from RCSB PDB.
- A live cost telemetry dashboard at `/pricing` shows the actual NIM token spend, NimbleWay call counts, and rate-limit health in real time.
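The monitor's deduplication step can be sketched in a few lines. This is a minimal illustration, not the project's code: the in-memory `seen` set stands in for the ClickHouse `monitor_seen` table, the function name `novel_recalls` is hypothetical, and the only field assumed from openFDA enforcement records is `recall_number`.

```python
from typing import Iterable


def novel_recalls(reports: Iterable[dict], seen: set[str]) -> list[dict]:
    """Return reports not seen before and record their recall numbers.

    `seen` is a stand-in for a persistent dedupe table (assumption).
    """
    fresh = []
    for report in reports:
        key = report.get("recall_number")
        # Skip malformed records and anything already processed.
        if key is None or key in seen:
            continue
        seen.add(key)
        fresh.append(report)
    return fresh
```

Each polling cycle would pass the latest batch through this filter and fire the swarm only on what comes back.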
## How we built it
Architecture: Single Python process with a single asyncio loop. The orchestrator is `asyncio.gather` over coroutines, with no Redis, no Celery, and no Kafka. For a hackathon, every additional moving part is a demo risk.
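A tiered orchestrator of this kind can be sketched as follows. This is an illustration under assumptions: the agent functions, the shared `context` dict, and the names `run_tier` and `run_workflow` are hypothetical and not taken from the Reflex codebase. Agents within a tier run concurrently; tiers run in order.

```python
import asyncio


async def run_tier(agents, context):
    """Run every agent in one tier concurrently and merge their outputs."""
    results = await asyncio.gather(*(agent(context) for agent in agents))
    merged = dict(context)
    for result in results:
        merged.update(result)
    return merged


async def run_workflow(tiers, recall):
    """Run tiers sequentially; each tier sees the merged output of earlier ones."""
    context = {"recall": recall}
    for agents in tiers:
        context = await run_tier(agents, context)
    return context


# Hypothetical stand-in agents, one per tier for brevity.
async def inbound(ctx):
    return {"drug": ctx["recall"]["product"].lower()}


async def triage(ctx):
    return {"severity": "class_ii"}


async def writer(ctx):
    return {"brief": f"{ctx['drug']}: {ctx['severity']}"}


TIERS = [[inbound], [triage], [writer]]
```

Calling `asyncio.run(run_workflow(TIERS, {"product": "Metformin"}))` walks the three tiers and returns the merged context.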
Backend: FastAPI + asyncio.
Reasoning engine: All LLM calls go through one file (`apps/api/tools/reasoning.py`) using the OpenAI SDK pointed at NVIDIA NIM (`integrate.api.nvidia.com/v1`). Defaults: `meta/llama-3.3-70b-instruct` for text + tool calling, `meta/llama-3.2-90b-vision-instruct` for PDF/image entity extraction. The vendor SDK is isolated to this one file so swapping providers (or adding Anthropic) is a single boundary.
Protein embeddings: `apps/api/tools/biology.py` calls NVIDIA BioNeMo ESM2-650M for protein sequence → 1280-d mean-pooled embeddings, then computes cosine similarity client-side.
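The client-side ranking step reduces to cosine similarity over fixed-length vectors. The sketch below assumes the embeddings have already been fetched; the function names and the toy 2-d vectors in the usage note are illustrative only, and real inputs would be 1280-d.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def rank_alternatives(target, candidates):
    """Sort candidate names by similarity to the target embedding, best first."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_alternatives([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]})` places `"a"` ahead of `"b"`.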
Observability: Wrapped uvicorn with ddtrace-run so every NIM call automatically appears as a Datadog LLM Observability span — zero per-call wiring.
State: ClickHouse Cloud holds adverse_events, agent_traces, published_briefs, patients, x402_transactions, workflows, monitor_seen, and the new outbox audit table. We discovered the cluster endpoint via the ClickHouse Cloud management API and provisioned the SQL password with a sha256+double-sha1 PATCH call.
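The two password hashes mentioned above can be computed with the standard library. This is a sketch under assumptions: the field names `newPasswordHash` (base64 of the SHA-256 digest) and `newDoubleSha1Hash` (hex of SHA-1 applied twice) reflect our reading of the ClickHouse Cloud API and should be checked against its current reference; the helper name is hypothetical.

```python
import base64
import hashlib


def clickhouse_password_hashes(password: str) -> dict[str, str]:
    """Build the hash pair for a password-reset PATCH body (field names assumed)."""
    raw = password.encode("utf-8")
    # SHA-256 digest, base64-encoded.
    sha256_b64 = base64.b64encode(hashlib.sha256(raw).digest()).decode("ascii")
    # SHA-1 of the raw SHA-1 digest, hex-encoded.
    double_sha1_hex = hashlib.sha1(hashlib.sha1(raw).digest()).hexdigest()
    return {"newPasswordHash": sha256_b64, "newDoubleSha1Hash": double_sha1_hex}
```

The plaintext password never leaves the client this way; only the hashes go into the request body.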
Publishing: Senso v2 API (`apiv2.senso.ai/api/v1`, `X-API-Key` auth) creates a question, drafts content, and attempts publish. The draft always succeeds (visible in the Senso dashboard); a git push mirror to `docs/cited/<slug>.md` guarantees a public GitHub-raw URL even if the destination toggle isn't set.
Payments: eth-account + Base Sepolia RPC. We build the ERC-20 transfer calldata manually (function selector + 32-byte address + 32-byte amount) instead of pulling in heavy web3.py. Real on-chain receipts via BaseScan.
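Manual calldata for an ERC-20 `transfer(address,uint256)` call is short enough to show in full. The sketch below is an illustration, not the project's code: the function name is hypothetical, and signing and broadcasting the transaction (for example with eth-account) are left out.

```python
# First 4 bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def erc20_transfer_calldata(recipient: str, amount: int) -> bytes:
    """Encode selector + left-padded address + big-endian uint256 amount."""
    address = bytes.fromhex(recipient.removeprefix("0x"))
    if len(address) != 20:
        raise ValueError("recipient must be a 20-byte hex address")
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    return TRANSFER_SELECTOR + address.rjust(32, b"\x00") + amount.to_bytes(32, "big")
```

The result is always 68 bytes: 4 for the selector and 32 for each argument. The amount is in the token's smallest unit, so the caller has to account for the token's decimals.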
Frontend: Next.js 14 App Router + Tailwind + a 600-line Canvas Agent Theater with a strict performance rule: React state is NEVER read inside the `requestAnimationFrame` loop. The RAF loop only touches `useRef` containers; SSE events push into a ref-held queue that the loop drains per frame. Hard cap of 30 concurrent cursors with FIFO recycling.
Voice: Browser SpeechRecognition → POST `/api/v1/chat` (NIM with OpenAI-spec tools, max 5 tool-call rounds per turn) → SpeechSynthesis. Tool results feed back as `role: "tool"` messages so the model can chain actions.
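The bounded tool-calling loop on the server side might look like the sketch below. It is an illustration under assumptions: `complete` is a hypothetical stand-in for the NIM chat call that takes the message list and returns an assistant message dict, `tools` maps tool names to plain Python callables, and the message shapes follow the OpenAI chat-completions tool-calling format.

```python
import json

MAX_TOOL_ROUNDS = 5


def run_turn(complete, tools, messages):
    """Drive one user turn, letting the model call tools for a bounded number of rounds."""
    for _ in range(MAX_TOOL_ROUNDS):
        reply = complete(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]  # Plain answer: the turn is over.
        for call in calls:
            function = tools[call["function"]["name"]]
            arguments = json.loads(call["function"]["arguments"])
            result = function(**arguments)
            # Feed the result back so the model can chain further actions.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    return "Stopped after reaching the tool-call limit."
```

The cap keeps a confused model from looping forever while still leaving room for a chain such as memo, alert, and letters within one turn.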
Molecule preview: PubChem PUG REST for 2D PNG; 3Dmol.js loaded from CDN renders the target protein PDB as a spectrum-colored, spinning cartoon.
## Challenges we ran into
- NIM free-tier rate limiting was brutal: HTTP 429s when 8+ agents fired in parallel. Solution: a global `asyncio.Semaphore(1)` plus round-robin across two API keys to double the effective budget, plus deterministic fallbacks on every LLM-dependent agent so workflows always complete even under sustained 429s.
- Senso's publish endpoint requires the destination to be `selected_for_generation: true`, which can only be toggled in the dashboard. We made every brief succeed via a parallel git-mirror path so the public URL always resolves while the Senso draft still gets created.
- ClickHouse provisioning with only API keys (no SQL password) required discovering the service endpoint via the management API and PATCHing a sha256+double-sha1 password hash.
- Real-time canvas at 60fps under SSE bursts required strict discipline: zero React state inside the RAF loop, a ref-held event queue, cursor pool recycling.
- Voice feedback loop: the assistant's TTS was being picked up by the mic. Fixed by queuing speech that arrives while the assistant is speaking and flushing after.
- Browser SpeechRecognition's flakiness with continuous mode (random `aborted`/`no-speech` errors) required auto-restart logic in `onend`.
- Mermaid on GitHub parses `[/landing]` and `(parens)` inside node labels as new shapes, which broke half our diagrams until we quoted every label with tricky chars.
- Next.js 14 vs 15 params API: App Router pages use plain `params` objects in 14, not `Promise`s like in 15. Crashed the workflow route until fixed.
- Light-mode contrast: Tailwind opacity utilities (`text-ice/90`) don't auto-react to CSS variable swaps. Required `[class*="text-ice"]` attribute selectors with `!important` to force readable dark text on light backgrounds.
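The rate-limit mitigation from the first challenge above combines three pieces: a concurrency gate, key rotation, and a deterministic fallback. The sketch below is an assumption-laden illustration: `ThrottledClient`, `RateLimited`, and the `request`/`fallback` callables are hypothetical names, and a real client would raise on an HTTP 429 from the provider SDK.

```python
import asyncio
import itertools


class RateLimited(Exception):
    """Raised by a request callable when the provider answers HTTP 429 (assumed)."""


class ThrottledClient:
    def __init__(self, api_keys, max_concurrency=1):
        self._keys = itertools.cycle(api_keys)            # Round-robin key rotation.
        self._gate = asyncio.Semaphore(max_concurrency)   # Global concurrency cap.

    async def call(self, request, fallback):
        """Run `request(key)` under the gate; use `fallback()` if rate limited."""
        async with self._gate:
            key = next(self._keys)
            try:
                return await request(key)
            except RateLimited:
                # Deterministic path so the workflow still completes.
                return fallback()
```

Creating the client inside the running event loop avoids loop-binding issues with `asyncio.Semaphore` on older Python versions.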
## Accomplishments that we're proud of
- Seven sponsor tools, every one doing real work in the demo: NimbleWay (real SERP calls), Senso (real drafts in the dashboard), ClickHouse (real cohort SQL matching 18 patients), NVIDIA NIM (real triage + adversarial counter-evidence), NVIDIA BioNeMo (real 1280-d protein embeddings), Datadog LLM Observability (auto-instrumented), x402 + Coinbase CDP (real Base Sepolia receipts).
- The adversarial counter-evidence pass works on the actual demo recall: NIM correctly surfaced the Apotex investor-relations statement contradicting the FDA's NDMA finding on metformin and flipped the verdict to `requires_human` instead of broadcasting a possible false positive. That's the entire premise of the spec validated in production.
- The voice agent is genuinely agentic. Saying "Take next steps for me" triggers a tool-calling loop that chains `send_pharmacist_memo` (1 recipient) + `send_clinician_alert` (2 recipients) + `send_patient_letters` (5 recipients), all logged in the ClickHouse outbox, all visible in the activity feed.
- The Substitute agent's BioNeMo output is biologically plausible. For metformin (target: AMPK / PRKAA1), it ranks Sitagliptin (DPP4) at cosine similarity 0.933 and Glipizide (KCNJ11) at 0.895, both reasonable diabetes alternatives by mechanism family.
- Every action is auditable. Outbox table records every memo/alert/letter/payment with workflow_id, recipient count, body, and trigger source. Activity feed streams these in real time on the landing page.
- A real Base Sepolia transaction settles before the sub-brief unlocks — visible on BaseScan, not a mock.
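The verdict flip described in the first accomplishment above can be pictured as a small decision rule over the two evidence passes. Everything in this sketch is hypothetical except the `requires_human` verdict name: the `Evidence` fields, the `weight` score, the threshold, and the other verdict labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str
    supports_recall: bool
    weight: float  # Hypothetical relevance score in [0, 1].


def decide(confirmations, counters, threshold=0.5):
    """Combine the confirmation pass and the adversarial pass into one verdict."""
    if not confirmations:
        return "unverified"
    # Any sufficiently strong contradiction escalates to a human reviewer.
    if any(not e.supports_recall and e.weight >= threshold for e in counters):
        return "requires_human"
    return "verified"
```

The asymmetry is deliberate in this sketch: a single meaningful contradiction outweighs any number of confirmations, since a false broadcast is the costlier mistake.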
## What we learned
- The hardest problem in pharmacovigilance isn't detection; it's trust in the verdict. A single-LLM ingest pipeline can't clear the false-positive bar that halted the Mass General Brigham deployment. An explicit adversarial counter-evidence agent changes the trust dynamics.
- Protein embeddings can guide therapeutic substitution — not perfectly, but well enough to surface mechanism-family alternatives that a clinician can sanity-check. This is a real and novel application of biology foundation models in a clinical-ops loop.
- Browser SpeechSynthesis + SpeechRecognition + OpenAI-spec tool calling on NIM is a complete agentic voice stack with zero infrastructure. No LiveKit, no Whisper download, no gRPC. Works in 200 lines of TypeScript.
- Auto-instrumentation beats per-call wiring every time. Running uvicorn under `ddtrace-run` captured every NIM call as a Datadog LLM Observability span without changing any agent code.
- Single-process asyncio is the right scope for a hackathon agent system. Adding Celery / Redis / Kafka would have added zero capability and a lot of demo risk.
- CSS variables + `[class*=]` attribute selectors are the cleanest way to retrofit a dark-only Tailwind app for a light-mode toggle without rewriting every component.
## What's next for Reflex AI
- Real FHIR/EHR connectors (Cerner Millennium, Epic Care Everywhere) to replace the synthetic patient fixture.
- HIPAA + 21 CFR Part 11 (electronic records/signatures) certification — prerequisite for any healthcare SaaS at scale. Architecture is already audit-trail-shaped.
- Production Coinbase CDP on Base mainnet for real revenue, plus dual Stripe rail.
- MAUDE (medical devices) coverage in addition to FAERS (drugs).
- EMA + MHRA + Health Canada + TGA as first-class data sources for ROW expansion.
- agentic.market listing so other agents can discover and subscribe to the Reflex brief feed as a paid service.
- NVIDIA Parakeet ASR swap for the browser SpeechRecognition path — better accuracy, especially for medical terminology.
- CMS CRUSH RFI alignment: the federal "detect and deploy" initiative explicitly solicits AI tools for healthcare fraud, waste, and abuse — Reflex's verification layer is a direct fit.
- Peer-reviewed paper on multi-agent verification vs. single-LLM ingest in pharmacovigilance, citing Mass General Brigham 2024 as the precedent failure we beat.
- A $5–10M Seed round. Comp set: Aletheia ($30M Series A, regulatory monitoring), Tessellate ($14M Seed, pharmacovigilance ops). The pharmacovigilance market is $13.7B today, projected to hit $34.2B by 2032 at 16.3% CAGR.
## Built With
- ai
- api
- apis
- asyncio
- base
- bionemo
- canvas
- clickhouse
- cloud
- coinbase
- css
- data
- datadog
- ddtrace
- developer
- esm2
- events
- fastapi
- frameworks
- github
- html
- httpx
- javascript
- languages
- lapdog
- llama
- llm
- ml
- nim
- nimbleway
- nvidia
- observability
- openai
- openfda
- other
- payments
- pdb
- platform
- platforms
- pubchem
- pydantic
- python
- rcsb
- react
- render
- sdk
- senso
- server-sent
- speech
- tailwind
- tech
- typescript
- uvicorn
- vision
- web
- x402