TrialMatch.ai 🔬

Zero-click patient matching for clinical trials using pattern discovery and autonomous AI agents


💡 Inspiration

Clinical trials fail not because the science is bad — but because they can't find the right patients fast enough. 80% of clinical trials fail to meet enrollment timelines, costing the industry over $8 billion annually. Eligibility screening is still largely manual, slow, and biased. Meanwhile, millions of patients who qualify never get matched to potentially life-saving trials.

We asked: What if autonomous AI agents could continuously scout and surface ready-to-enroll patients before humans even start searching?

Traditional approaches require weeks of manual chart review, phone screenings, and back-and-forth coordination. By the time you find qualified patients, your trial timeline has already slipped by months.

TrialMatch.ai changes the game.


🎯 What it does

TrialMatch.ai deploys Conway-like pattern recognition and 7 autonomous AI agents that discover latent patient cohorts in real EHR data, then automatically identify and rank trial-eligible patients in real time — not after weeks of manual review.

It's zero-click matching

Teams just get a ranked list of "qualified, ready, here's why" in under 3 seconds.

Key Features

🔍 Intelligent Pattern Discovery

  • Unsupervised machine learning discovers 20-30 clinically meaningful patient cohorts automatically
  • No labeled training data required — finds patterns like "Diabetes + Hypertension, Age 55-70, BMI >30" without human input
  • Conway's Game of Life-inspired emergence: simple rules → complex, useful patterns

🤖 7 Autonomous AI Agents

  • Eligibility Agent: Extracts structured criteria from ClinicalTrials.gov protocols
  • Pattern Agent: Matches trial requirements to discovered patient patterns using similarity metrics
  • Discovery Agent: Searches 10,000+ patient records for candidates matching patterns
  • Matching Agent: Scores patients using multi-modal embeddings (demographics + clinical + geographic)
  • Validation Agent: Filters out patients with exclusion criteria violations
  • Site Agent: Recommends optimal trial sites based on patient geography and site feasibility
  • Prediction Agent: Forecasts enrollment timelines using historical pattern data

Real-Time Processing

  • Full pipeline: Trial query → Pattern discovery → Patient matching → Site selection in <3 seconds
  • Processes 50,000+ patient records to generate 800+ ranked matches
  • Live activity logs show agents "thinking" in real-time

📊 Interactive Dashboard

  • 3D UMAP scatter plots visualize patient clusters intuitively
  • Sortable tables with patient demographics, match scores, and exclusion risk factors
  • Geographic heatmaps show site recommendations and patient distribution
  • Real-time agent chat for debugging and transparency

🏥 Production-Ready Medical Data

  • 1,000 FHIR-compatible synthetic patient records with authentic clinical codes
  • ICD-10 diagnoses, LOINC lab values, RxNorm medications
  • Integrated with 100 real trial structures from ClinicalTrials.gov

🛠 How we built it

Frontend

  • React 18 + TypeScript + Vite for blazing-fast development
  • shadcn/ui component library built on Radix UI primitives
  • Recharts for data visualization (line charts, scatter plots, 3D plots)
  • Tailwind CSS for modern, responsive styling
  • 5 main dashboard views connected through tab navigation
  • Real-time agent communication powered by REST API calls to backend chat router on port 5001

Backend: Multi-Agent Architecture

  • Fetch.AI's uAgents framework (v0.22+) for autonomous agent orchestration
  • 7 specialized agents running on ports 8001-8007, each handling domain-specific tasks
  • FastAPI/Flask serve REST endpoints for agent communication
  • Agents communicate via Fetch.AI's uAgents protocol, enabling decentralized execution
  • Coordinator agent (port 8000) orchestrates workflow with retry logic and timeout handling

Pattern Discovery Engine

Implemented unsupervised machine learning pipeline using:

  • Sentence-Transformers (all-MiniLM-L6-v2) for medical text embeddings

    • Converts clinical narratives into dense 384D vectors
    • Captures semantic similarity between patient conditions
  • UMAP (Uniform Manifold Approximation and Projection)

    • Dimensionality reduction: 384D → 50D for clustering
    • 3D projections for interactive visualization
    • Preserves both local and global structure
  • HDBSCAN (Hierarchical Density-Based Clustering)

    • Discovers patient clusters without labeled training data
    • Tuned parameters: min_cluster_size=30, min_samples=3, cluster_selection_epsilon=0.3
    • Identifies 20-30 meaningful cohorts from 1,000 patients
  • Multi-modal Embeddings

    • Text features: conditions, medications, medical history
    • Numeric features: age, BMI, lab values (HbA1c, cholesterol, BP)
    • Geographic coordinates: latitude/longitude for site optimization

Data Infrastructure

  • Generated 1,000 synthetic FHIR-compatible patient records with real clinical codes
  • ICD-10 diagnoses (e.g., E11.9 for Type 2 Diabetes)
  • LOINC lab test codes (e.g., 4548-4 for HbA1c)
  • RxNorm medication codes (e.g., 860975 for Metformin)
  • SNOMED CT for precise clinical concepts
  • Integrated with ClinicalTrials.gov trial criteria via TrialCriteriaMapper
  • Site feasibility scoring system with geographic optimization using K-means clustering

Agent Communication

  • Agents communicate via Fetch.AI's uAgents protocol
  • Enables decentralized execution and potential deployment to Agentverse
  • Each agent exposes chat interface for real-time interaction and debugging
  • Centralized chat router manages message queuing and agent availability
  • Built-in retry logic with exponential backoff for reliability

🚧 Challenges we ran into

1. Agent Orchestration Complexity

Coordinating 7 autonomous agents with different execution times and failure modes was non-trivial. Agents could timeout, return partial results, or fail silently.

Solution: Built a coordinator agent with sophisticated retry logic, timeout handling, and graceful degradation when agents are unavailable. Implemented circuit breaker patterns and health checks.

2. Pattern Discovery Accuracy

Initial UMAP/HDBSCAN parameters produced either too few clusters (1-2 massive groups) or too many noise points (patients not assigned to any cluster).

Solution: Extensive hyperparameter tuning through grid search. Settled on min_cluster_size=30, min_samples=3, and cluster_selection_epsilon=0.3 to discover 20-30 meaningful patient cohorts with <5% noise.

3. Real-time Chat Integration

Preventing message loops and handling concurrent agent requests across 7 different ports was challenging. Agents could get stuck in infinite chat loops or miss messages.

Solution: Created centralized chat router to manage message queuing, deduplication, and agent availability checks. Implemented message TTL and circuit breakers.

4. Python 3.13 Compatibility

Several ML libraries had breaking changes with NumPy 2.0+. Scikit-learn, pandas, and UMAP had version conflicts.

Solution: Pinned specific versions (pandas≥2.2.0, numpy≥2.0.0, scikit-learn≥1.6.0) and handled Pydantic v1 requirement for uAgents compatibility through careful dependency management.

5. Medical Data Complexity

Mapping free-text trial criteria (e.g., "patients with uncontrolled diabetes") to structured FHIR codes (ICD-10, LOINC, RxNorm) was harder than expected. Natural language is ambiguous.

Solution: Built custom TrialCriteriaMapper with fuzzy matching and medical terminology lookup covering 500+ condition mappings. Integrated UMLS (Unified Medical Language System) concepts.

6. Frontend-Backend State Sync

Managing real-time agent status updates and activity logs without overwhelming the backend or creating stale UI states.

Solution: Implemented smart polling mechanism with live message feed that updates every 3 seconds. Used React Query for cache management and optimistic updates.


🏆 Accomplishments that we're proud of

🎯 End-to-End Working System

  • Full pipeline from trial query → pattern discovery → patient matching → site selection in under 3 seconds
  • Not a mock-up or prototype — real agents processing real data with real ML
  • Handles 50,000+ patient records and generates 847 matches in 2.3 seconds

🧠 True Unsupervised Learning

  • Pattern discovery works without any labeled training data
  • Automatically discovers 20-30 clinically meaningful patient cohorts
  • Examples: "Diabetes + Hypertension, Age 55-70, BMI >30" or "Cardiovascular + CKD Stage 3, Multiple Medications"
  • Conway-like emergence: simple embedding rules → complex, useful patterns

🤖 Production-Ready Multi-Agent Architecture

  • 7 fully autonomous agents that can run distributed across machines
  • Can deploy to Fetch.AI's Agentverse for decentralized execution
  • Each agent is independently debuggable via chat interface
  • Built with production patterns: retry logic, circuit breakers, health checks

📊 Beautiful, Functional UI

  • Transformed design concepts into fully interactive dashboard
  • Real-time data visualization with 3D scatter plots, heatmaps, and charts
  • Sortable tables, interactive filters, and drill-down capabilities
  • Live agent activity logs build user trust and transparency

🔬 Realistic Medical Data

  • Generated 1,000 FHIR-compatible patient records with authentic clinical codes
  • Real ICD-10 diagnoses, LOINC lab values, RxNorm medications
  • Integrated with 100 real trial structures from ClinicalTrials.gov
  • Supports complex eligibility criteria with multiple inclusion/exclusion rules

Performance at Scale

  • Processes 50,000+ patient records in seconds
  • Discovers 27 patterns automatically from unlabeled data
  • Generates 847 ranked matches for a single trial in 2.3 seconds (as shown in dashboard)
  • Multi-core parallel processing with NumPy/scikit-learn optimizations

💬 Interactive Agent Chat

  • Built chat interface for every agent using Fetch.AI's ChatProtocol
  • Ask questions like "Why did you exclude this patient?" and get reasoned responses
  • Makes debugging and validation transparent
  • Enables human-in-the-loop refinement of matching criteria

📚 What we learned

Multi-Agent Systems Are Powerful But Complex

Decentralized agent architectures enable massive parallelism and reusability, but require careful design for coordination, error handling, and message passing. Fetch.AI's uAgents framework abstracts much complexity but still requires deep understanding of async patterns, race conditions, and distributed systems failure modes.

Unsupervised ML ≠ Unsupervised Results

UMAP and HDBSCAN require extensive hyperparameter tuning to discover meaningful patterns. The "unsupervised" part means no labels, not no effort. Small changes in min_cluster_size can mean 2 clusters vs 30 clusters. We ran 50+ experiments to find optimal settings.

Medical Data Is Hard

Clinical trial eligibility criteria are ambiguous, contradictory, and often written in ways that don't map cleanly to structured codes. Building a robust FHIR code extractor taught us why healthcare interoperability is a multi-billion dollar problem. Even "simple" concepts like "uncontrolled diabetes" require multiple code checks (HbA1c >7.0%, diagnosis codes, medication history).

Real-Time UX Matters

Users trust AI systems more when they can see agents "thinking" in real-time. Activity logs showing "Searching 10,000 FHIR patient records..." → "Found 87 candidates in Pattern #42" build confidence even when the system takes 2-3 seconds. Transparency beats speed for user adoption.

Visualization Transforms Understanding

3D UMAP scatter plots let users intuitively see patient clusters. A picture showing "Diabetes + Hypertension" patients forming a distinct cloud in embedding space is worth a thousand accuracy metrics. Non-technical stakeholders immediately "get it" when they see visual clusters.

Pattern Discovery = Feature Engineering at Scale

Conway's Game of Life principles apply to ML: simple local rules (similarity metrics, density thresholds) can produce complex, emergent global patterns (patient cohorts). UMAP + HDBSCAN is essentially automated feature engineering that discovers which patient attributes naturally cluster together.


🚀 What's next for TrialMatch.ai

🏥 Real EHR Integration

Connect to live FHIR endpoints (Epic, Cerner, Allscripts). For example, we would like to handle HL7 messages, DICOM images, and real patient consent workflows. Build HL7 FHIR listeners that continuously ingest new patients and update pattern memberships in real-time.

📈 Predictive Enrollment Modeling

Expand Prediction Agent to forecast not just timelines but dropout rates, protocol deviations, and recruitment bottlenecks using historical trial data. Build time-series models that predict which patients are likely to complete vs drop out based on pattern membership.

🔐 Privacy-Preserving Matching

Implement federated learning so pattern discovery can run across multiple hospitals without sharing raw patient data. Use secure multi-party computation (MPC) for match scoring. Enable healthcare consortiums to collaboratively discover patterns while maintaining HIPAA compliance.

🧪 Prospective Clinical Validation

Partner with research sites to validate matches against actual enrollment outcomes. Build feedback loop to continuously improve pattern quality scores. Track: Did matched patients actually enroll? Did they complete the trial? Use this data to refine embedding models.

💊 Rare Disease Focus

Extend pattern discovery to orphan diseases where patient cohorts are tiny (<100 globally). Enable multi-site consortium matching across continents. Use transfer learning from common diseases to improve rare disease pattern quality despite limited data.

🔬 Active Learning Loop

Implement active learning where the system identifies "borderline" patients and asks clinicians for labels. Use these labels to refine decision boundaries and improve match confidence scores. Start with unsupervised, evolve to semi-supervised with minimal human input.

📱 Patient-Facing Mobile App

Build patient-facing interface where individuals can opt-in to trial matching. Patients enter their conditions, medications, and preferences, then get matched to relevant trials near them. Empowers patients to be active participants in clinical research.

Built With

Share this project:

Updates