Samyama's Clinical Trial Enrollment Agent

Complete serverless architecture: 11 AWS services orchestrating AI agent pipeline with Bedrock, HealthLake, Textract & Comprehend Medical
AI-powered eligibility results with Mistral Large 2: 25% match score, criterion-by-criterion breakdown with confidence scores & reasoning
6-phase autonomous AI pipeline: PDF upload → Textract OCR → Comprehend Medical → Classification → Mistral Large 2 parsing → DynamoDB
Protocol processing complete: 88.1% OCR confidence, 100% extraction, 94.1% overall confidence with 2 criteria parsed by Mistral Large 2
Professional PDF report with Samyama branding: 50% match score, detailed criteria analysis table, ready for IRB submission & audit trails
Principal Investigator overview: 53 active trials, enrollment metrics, match confidence distribution, and pending review queue dashboard
Clinical Research Coordinator dashboard: 247 patients screened, 42 active matches, 78% success rate with real-time confidence scoring

Inspiration

85% of clinical trials fail due to slow patient enrollment. This staggering statistic represents billions of dollars in lost drug development costs and, more importantly, delayed access to potentially life-saving treatments for patients who need them most. As healthcare technologists, we witnessed firsthand how Clinical Research Coordinators (CRCs) spend weeks manually reviewing patient records against complex trial eligibility criteria. This process is:

Time-consuming: Weeks of manual chart review per protocol
Error-prone: Subjective interpretation of eligibility criteria
Expensive: Millions in delayed trial timelines
Inconsistent: Variable screening quality across coordinators

We asked ourselves: What if AI could automate this entire process?

With the AWS AI Agent Global Hackathon 2025 and the power of Amazon Bedrock with Mistral Large 2, we saw an opportunity to revolutionize clinical trial enrollment with autonomous AI agents.

What it does

Samyama's Clinical Trial Enrollment Agent is an intelligent, autonomous system that accelerates patient-trial matching from weeks to minutes. The system serves three distinct user personas:

🔬 For Clinical Research Coordinators (CRC)

Search patients from AWS HealthLake FHIR datastore (11 resource types)
Run AI-powered eligibility checks with one click
View explainable results with confidence scores and criterion-by-criterion analysis
Export professional PDF reports for IRB submission

📋 For Study Administrators

Upload trial protocol PDFs via drag-and-drop interface
Monitor 6-phase AI processing pipeline in real-time:
1. Document Upload → S3
2. Text Extraction → AWS Textract
3. Medical Entity Recognition → Amazon Comprehend Medical
4. Criteria Classification → Custom ML
5. Structured Data Parsing → Mistral Large 2 on Amazon Bedrock
6. DynamoDB Caching
Manage 50+ protocols with status tracking and analytics

👨‍⚕️ For Principal Investigators (PI)

Dashboard overview of all active trials and enrollment metrics
Match confidence distribution (High/Medium/Low buckets)
Review and approve patient matches with complete audit trails
Track enrollment progress across multiple trials

🧠 AI-Powered Matching

The system uses Mistral Large 2 on Amazon Bedrock to:

Parse complex eligibility criteria from unstructured protocol PDFs
Reason through patient FHIR data (demographics, conditions, medications, labs)
Generate confidence-scored matches (0-100%) with explanations
Provide criterion-by-criterion analysis showing exactly why a patient qualifies or doesn't

How we built it

Architecture Overview

We built a serverless, event-driven architecture using 11 AWS services orchestrated through AWS CDK:

Frontend (React 18 + TypeScript)

React 18.3 with TypeScript 5.8 for type safety
Tailwind CSS 3.4 + shadcn/ui for modern, accessible UI components
AWS Amplify Auth for Cognito integration
Recharts for data visualization (match confidence, enrollment trends)
jsPDF for client-side PDF report generation
Deployed: CloudFront CDN → S3 static hosting → Custom domain (enrollment.samyama.care)

Backend (Python 3.11 + AWS CDK)

Built with 10 AWS Lambda functions orchestrating the AI pipeline:

1️⃣ Protocol Processing Pipeline

 PDF Upload → S3 Trigger → Lambda Chain
 ├─ AWS Textract: OCR extraction from protocol PDFs
 ├─ Amazon Comprehend Medical: Medical entity recognition (conditions, medications, procedures)
 ├─ Section Classifier: ML model to identify inclusion/exclusion criteria sections
 └─ Mistral Large 2 (Bedrock): Parse criteria into structured JSON with medical coding

2️⃣ Patient Matching Pipeline

Patient Selection → FHIR Search → Eligibility Evaluation
 ├─ AWS HealthLake: Query FHIR R4 patient data (11 resource types)
 ├─ Data Aggregator: Consolidate Patient, Condition, Observation, Medication resources
 └─ Mistral Large 2 (Bedrock): Reason through criteria with explainable confidence scores

3️⃣ Authentication & Authorization

 - Amazon Cognito: 3 user pools (CRC, StudyAdmin, PI)
 - Lambda Authorizer: JWT validation for API Gateway
 - IAM roles: Fine-grained permissions per persona

4️⃣ Data Persistence

 - 3 DynamoDB tables: Protocols, Patients, Matches (DAX caching for sub-millisecond reads)
 - S3 buckets: Protocol PDFs, processed documents, audit logs
 - CloudWatch: Centralized logging and monitoring

Key Technologies
 - Amazon Bedrock (Mistral Large 2): mistral.mistral-large-2402-v1:0 - Our primary AI reasoning engine
 - AWS HealthLake: FHIR R4 datastore with 11 resource types
 - AWS Textract: PDF text extraction with table detection
 - Amazon Comprehend Medical: Medical NER (Named Entity Recognition)
 - Infrastructure as Code: AWS CDK (Python) for reproducible deployments

## Challenges we ran into

1. Mistral Large 2 Prompt Engineering

Challenge: Getting Mistral Large 2 to consistently output structured JSON for eligibility criteria parsing while
maintaining medical accuracy.

Solution: We developed a multi-shot prompting strategy with:
 - Medical terminology glossaries (SNOMED CT, LOINC, RxNorm)
 - Example criteria mappings with expected JSON schema
 - Chain-of-thought reasoning to explain each criterion's medical logic
 - Retry logic with exponential backoff for malformed responses

2. FHIR R4 Data Complexity

Challenge: AWS HealthLake FHIR data spans 11 resource types with nested references. Aggregating all relevant patient data for eligibility evaluation was complex.

Solution: Built a recursive FHIR resolver that:
 - Follows resource references (Patient → Condition → Observation)
 - Caches resolved resources in DynamoDB to avoid redundant HealthLake queries
 - Flattens nested data into a single patient profile for Mistral Large 2

3. Real-Time Protocol Processing

Challenge: Protocol PDFs can be 100+ pages. Processing through Textract → Comprehend Medical → Bedrock can take 60-90 seconds, risking API Gateway timeouts.

Solution: Implemented asynchronous processing with:
 - S3 event triggers → Step Functions state machine
 - WebSocket API for real-time progress updates to frontend
 - DynamoDB Streams for status tracking across pipeline stages

4. Explainable AI for Clinical Trust

Challenge: Clinicians need to understand why the AI made a matching decision. Black-box scores aren't acceptable in healthcare.

Solution: We engineered Mistral Large 2 prompts to return:
 - Criterion-by-criterion breakdown (Met/Not Met/Uncertain)
 - Confidence scores per criterion (0-100%)
 - Medical reasoning ("Patient HbA1c of 7.8% meets inclusion criteria of 7-10%")
 - FHIR resource citations (which Observation or Condition was used)

 5. Multi-Persona Authentication

Challenge: Three user types with different permissions and workflows sharing the same infrastructure.

Solution: AWS Cognito user groups with:
 - Group-based JWT claims in access tokens
 - Lambda Authorizer checking group membership
 - DynamoDB row-level security (RLS) based on persona

## Accomplishments that we're proud of

✅ Fully functional end-to-end system deployed to production (enrollment.samyama.care)✅ 52 real clinical trial protocols
 processed and cached✅ Synthetic patient data (Synthea-generated) for realistic FHIR testing✅ Sub-3-second eligibility 
 checks with explainable AI reasoning✅ Professional PDF reports ready for IRB submission✅ Three complete persona 
 workflows (CRC, StudyAdmin, PI) with role-based dashboards✅ 100% serverless - zero infrastructure management, scales to
 zero✅ HIPAA-eligible architecture (though using synthetic data for demo)

Impact Potential

 If deployed at scale, our system could:
 - Reduce trial enrollment time by 70-90% (weeks → days)
 - Save millions in trial costs per protocol
 - Accelerate drug development timelines by months
 - Improve patient access to life-saving treatments

## What we learned

Technical Learnings

 1. Mistral Large 2 excels at medical reasoning - Its ability to understand complex clinical criteria and reason through
 patient data exceeded our expectations
 2. AWS HealthLake + FHIR R4 provides a standardized, interoperable data layer perfect for healthcare AI
 3. CDK Infrastructure as Code made our architecture reproducible and version-controlled
 4. Serverless scales beautifully - From zero to 100 concurrent users with no config changes

Healthcare Domain Learnings

 1. Clinical trial eligibility is incredibly complex - A single criterion like "HbA1c 7-10%" requires understanding:
   - Medical codes (LOINC 4548-4)
   - Unit conversions (% vs mmol/mol)
   - Temporal logic (most recent test within 6 months)
 2. Explainability is non-negotiable in healthcare - Clinicians need to audit every AI decision
 3. FHIR adoption is growing but data quality varies widely across EHR systems

Team Learnings

 1. Start with the user workflow - We built persona-specific interfaces first, then architected the backend
 2. Iterate on prompts early - 60% of our development time was perfecting Mistral Large 2 prompts
 3. Demo data matters - High-quality synthetic data (Synthea) made demos convincing

## What's next for Samyama's Clinical Trial Enrollment Agent

 Near-Term Enhancements (Next 3 months)

 1. Multi-modal AI - Add image analysis for radiology/pathology criteria using Bedrock's vision models
 2. Proactive matching - Notify CRCs when new patients matching active trials are admitted
 3. Real-time FHIR subscriptions - Use HealthLake webhooks for live patient data updates
 4. Adverse event monitoring - Extend to post-enrollment safety tracking

 Long-Term Vision (6-12 months)

 1. FDA validation - Clinical study to measure impact on enrollment rates
 2. EHR integrations - SMART on FHIR apps for Epic, Cerner, Athenahealth
 3. Multi-site coordination - Federated matching across hospital networks
 4. Patient-facing portal - Let patients search trials they qualify for
 5. Regulatory submission support - Auto-generate regulatory documents for FDA/EMA

 Commercialization

 We envision this as a SaaS platform for:
 - Academic medical centers conducting 50+ trials concurrently
 - Contract Research Organizations (CROs) managing multi-site trials
 - Pharmaceutical companies accelerating Phase 2/3 enrollment

 Pricing model: Per-protocol-per-month subscription ($500-2000 based on trial size)

 ---
 Built for the AWS AI Agent Global Hackathon 2025 🚀

 ---

Built With

amazon-api-gateway
amazon-bedrock
amazon-cloudfront-cdn
amazon-cloudwatch
amazon-cognito
amazon-comprehend-medical
amazon-dynamodb
amazon-web-services
aws-amplify
aws-cdk
aws-healthlake
aws-lambda
aws-textract
fhir-r4
jspdf
mistral-large-2
python-3.11
react-18
recharts
shadcn/ui
tailwind-css
typescript
vite

Updates

Abhishekreddy31 Singireddy started this project — Oct 19, 2025 10:18 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.