BioStrata: AI-Powered Clinical Trial Analysis
Inspiration
About 90% of drug candidates that enter clinical trials "fail," but some of those treatments actually work well for specific patient subgroups whose benefit gets averaged out in the overall results. Finding those subgroups manually can take a team of PhD statisticians weeks of expensive work.
We spoke with Henry Wei, MD, at Regeneron, who challenged us to build something bigger than a subgroup finder. He described an AI that could replicate the full workflow of a professional biostatistician: write the statistical code, run the analysis, check its own work, and present the findings in plain English to a medical director who has never touched a p-value.
That conversation became BioStrata.
What It Does
Upload any raw clinical trial CSV. Press one button. BioStrata does the rest:
- Agent 1 (Biostatistician): Reads your data, writes R code, and runs subgroup analyses, logistic regression, SHAP feature importance, and Kaplan-Meier survival analysis
- Agent 2 (Manager): Reviews Agent 1's output as a senior statistician would, flags issues, and quality-checks the results before anyone sees them
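The Kaplan-Meier curves rest on the product-limit estimator. At each distinct event time t_i, the survival estimate is multiplied by (1 - d_i/n_i), where d_i is the number of events at t_i and n_i is the number of patients still at risk just before t_i. BioStrata generates this step in R. The TypeScript below is only a minimal illustrative sketch with hypothetical names (`kaplanMeier`, `Subject`, `KMPoint`). It omits confidence intervals and group comparisons such as the log-rank test.

```typescript
// Hypothetical sketch of the Kaplan-Meier (product-limit) estimator.
// Illustration only: BioStrata's pipeline generates R for this step.

interface Subject {
  time: number;   // follow-up time, e.g. in months
  event: boolean; // true = event observed, false = censored
}

interface KMPoint {
  time: number;     // distinct event time t_i
  atRisk: number;   // n_i: patients at risk just before t_i
  events: number;   // d_i: events observed at t_i
  survival: number; // S(t_i) after applying this step
}

function kaplanMeier(subjects: Subject[]): KMPoint[] {
  // Sort by time so the risk set only ever shrinks as we walk forward.
  const sorted = [...subjects].sort((a, b) => a.time - b.time);
  const curve: KMPoint[] = [];
  let atRisk = sorted.length;
  let survival = 1;
  let i = 0;
  while (i < sorted.length) {
    const t = sorted[i].time;
    let events = 0;
    let leaving = 0;
    // Collect every patient (event or censored) whose time equals t.
    while (i < sorted.length && sorted[i].time === t) {
      if (sorted[i].event) events++;
      leaving++;
      i++;
    }
    if (events > 0) {
      // Product-limit step: S(t) = S(t-) * (1 - d/n).
      survival *= 1 - events / atRisk;
      curve.push({ time: t, atRisk, events, survival });
    }
    // Patients censored at t count as at risk at t, then leave with the events.
    atRisk -= leaving;
  }
  return curve;
}
```

Consider six patients with events at months 1, 3, 3, and 6 and censoring at months 2 and 5. Their survival estimates are 5/6, 5/12, and 0 at months 1, 3, and 6. Patients censored at the same time as an event stay in that event's risk set, which is the usual convention.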
Results appear in two modes:
- Medical Director mode: Plain-English summary of what was found, who it worked for, and what to do next. No statistics knowledge needed.
- Statistician mode: Full technical output including p-values, confidence intervals, survival curves, efficacy tables, demographics tables, and downloadable R code.
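To make the two modes concrete, here is a minimal sketch of how one subgroup finding might be rendered for each audience. It computes the difference in response rates between treatment and control within a single subgroup, with a 95% Wald confidence interval. The Wald interval is a simple textbook choice made for illustration, not necessarily what BioStrata's generated R code uses. The result is then formatted either as plain English or as technical output. All names here, such as `riskDifference` and `renderFinding`, are hypothetical.

```typescript
// Hypothetical sketch: one subgroup finding rendered for both report modes.
// Uses a simple Wald interval for illustration; not BioStrata's actual method.

interface SubgroupCounts {
  label: string; // e.g. "biomarker-positive"
  treatedResponders: number;
  treatedTotal: number;
  controlResponders: number;
  controlTotal: number;
}

interface Finding {
  label: string;
  diff: number;  // treated response rate minus control response rate
  lower: number; // lower bound of the 95% CI
  upper: number; // upper bound of the 95% CI
}

type Mode = "medicalDirector" | "statistician";

// Risk difference with a 95% Wald (normal-approximation) confidence interval.
function riskDifference(c: SubgroupCounts): Finding {
  const p1 = c.treatedResponders / c.treatedTotal;
  const p0 = c.controlResponders / c.controlTotal;
  const diff = p1 - p0;
  const se = Math.sqrt(
    (p1 * (1 - p1)) / c.treatedTotal + (p0 * (1 - p0)) / c.controlTotal
  );
  return { label: c.label, diff, lower: diff - 1.96 * se, upper: diff + 1.96 * se };
}

function renderFinding(f: Finding, mode: Mode): string {
  const pct = (x: number) => `${(x * 100).toFixed(1)}%`;
  if (mode === "statistician") {
    return `${f.label}: risk difference ${pct(f.diff)} (95% CI ${pct(f.lower)} to ${pct(f.upper)})`;
  }
  // Plain English: only claim an effect when the whole interval is on one side of zero.
  if (f.lower > 0) {
    return `In the ${f.label} group, the treatment helped: about ${Math.round(f.diff * 100)} more patients out of every 100 responded than on control.`;
  }
  if (f.upper < 0) {
    return `In the ${f.label} group, patients did worse on the treatment than on control.`;
  }
  return `In the ${f.label} group, there was no clear difference between treatment and control.`;
}
```

A Wald interval behaves poorly with small subgroups or with response rates near 0% or 100%. A production report would likely use a better interval and adjust for the fact that many subgroups are tested.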
How We Built It
- Frontend: React with custom SVG Kaplan-Meier curves and a Medical Director / Statistician toggle
- Backend: Node.js async job pipeline with seven stages from upload to final report
- AI: Google Vertex AI with Gemini 2.5 Flash powering both agents
- Stats: Dynamically generated R code executed server-side
- Database: Supabase for job persistence and analysis history
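The writeup does not name the seven pipeline stages, so the stage names in the sketch below are hypothetical placeholders. It shows one common way to structure an async job pipeline in TypeScript. A job record's status and current stage are updated as each stage runs in order, so the frontend can poll for progress. An in-memory `Map` stands in for the Supabase table and every stage is a stub; this is not BioStrata's code.

```typescript
// Hypothetical sketch of a staged async job pipeline.
// Stage names are placeholders; the writeup does not list the real seven stages.

type JobStatus = "queued" | "running" | "done" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  stage: string | null; // stage currently running, or the one that failed
  artifacts: Record<string, unknown>; // output of each completed stage
  error?: string;
}

interface Stage {
  name: string;
  run: (job: Job) => Promise<unknown>;
}

// In-memory store standing in for a database table (e.g. in Supabase).
const jobs = new Map<string, Job>();

const stageNames = [
  "parseUpload", "inferStructure", "generateRCode", "executeR",
  "reviewResults", "writeSummary", "buildReport",
];

// Stub stages: a real stage would call the LLM, run R, and so on.
const stages: Stage[] = stageNames.map((name) => ({
  name,
  run: async (job: Job) => `${name} output for ${job.id}`,
}));

async function runJob(id: string): Promise<Job> {
  const job: Job = { id, status: "queued", stage: null, artifacts: {} };
  jobs.set(id, job); // stored immediately so the client can start polling
  job.status = "running";
  for (const stage of stages) {
    job.stage = stage.name; // a real system would write this update to the DB
    try {
      job.artifacts[stage.name] = await stage.run(job);
    } catch (err) {
      job.status = "failed";
      job.error = err instanceof Error ? err.message : String(err);
      return job;
    }
  }
  job.status = "done";
  return job;
}
```

Each stage writes its output onto the job record. If a stage fails, the completed stages' artifacts stay in place and the record shows which stage failed, which makes retries and debugging easier.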
Challenges
Getting Gemini to generate consistently valid R code required careful prompt engineering. Subtle mistakes, such as indexing survival curves incorrectly or including the model intercept in SHAP scores, had to be prevented with explicit, mandatory code patterns in the prompt. We tested each pattern iteratively before wiring it into the pipeline.
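The intercept pitfall is easiest to see for a linear model. Take a logistic regression and assume its features are independent. The exact SHAP value of feature j for patient x, measured in log-odds, is then phi_j = beta_j * (x_j - E[x_j]). The intercept beta_0 is not a feature. It is absorbed into the base value (the average prediction), and the SHAP values plus the base value add back up to the patient's predicted log-odds. The TypeScript sketch below computes these values. It uses hypothetical names and is not the R pattern BioStrata actually uses.

```typescript
// Hypothetical sketch: exact SHAP values for a logistic regression, in log-odds
// space, assuming independent features (the linear-model special case).

type Row = Record<string, number>;

interface LogisticModel {
  intercept: number; // beta_0: part of the base value, never a feature
  coefficients: Row; // beta_j for each real feature
}

// Column means of the training data, used as the SHAP reference point.
function featureMeans(rows: Row[], features: string[]): Row {
  const means: Row = {};
  for (const f of features) {
    means[f] = rows.reduce((sum, r) => sum + r[f], 0) / rows.length;
  }
  return means;
}

// Predicted log-odds: beta_0 + sum_j beta_j * x_j.
function logOdds(model: LogisticModel, x: Row): number {
  let z = model.intercept;
  for (const [f, beta] of Object.entries(model.coefficients)) z += beta * x[f];
  return z;
}

// phi_j = beta_j * (x_j - mean_j); the intercept gets no SHAP value.
function linearShap(model: LogisticModel, x: Row, means: Row) {
  const shap: Row = {};
  for (const [f, beta] of Object.entries(model.coefficients)) {
    shap[f] = beta * (x[f] - means[f]);
  }
  // Base value = log-odds at the feature means = average predicted log-odds.
  const baseValue = logOdds(model, means);
  return { shap, baseValue };
}
```

Summing the SHAP values and adding the base value reproduces the patient's predicted log-odds exactly. If the intercept were also reported as its own attribution, the sum would overshoot by beta_0 and this additivity check would fail.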
The other challenge was generalizing to any trial. Every CSV has different column names and structures, so BioStrata infers each trial's structure dynamically and can work with whatever dataset a researcher uploads.
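The writeup does not say how the structure is inferred, and the LLM agent may do it entirely. As a hypothetical illustration, a cheap deterministic first pass could guess each column's role from its header and values, and an agent could then confirm the mapping. The regular expressions, role names, and thresholds below are assumptions, not BioStrata's rules.

```typescript
// Hypothetical sketch: guess clinical-trial column roles from CSV headers and values.
// The patterns, role names, and checks are illustrative assumptions only.

type Role = "subjectId" | "time" | "event" | "treatmentArm" | "covariate";

// Checked in order; the first role whose name pattern AND value check pass wins.
const namePatterns: [Role, RegExp][] = [
  ["subjectId", /^(usubjid|subjid|subject_?id|patient_?id|id)$/i],
  ["time", /time|days|weeks|months|duration/i],
  ["event", /event|death|died|progress/i],
  ["treatmentArm", /\barm\b|treat|trt|randomi[sz]/i],
];

const normalize = (v: string) => v.trim().toLowerCase();

const valueChecks: Partial<Record<Role, (values: string[]) => boolean>> = {
  // Follow-up time: every value is a non-negative number.
  time: (values) =>
    values.every((v) => v.trim() !== "" && Number.isFinite(Number(v)) && Number(v) >= 0),
  // Event flag: only binary codes appear.
  event: (values) =>
    values.every((v) => ["0", "1", "true", "false", "yes", "no"].includes(normalize(v))),
  // Treatment arm: a handful of distinct labels (2 to 4 assumed here).
  treatmentArm: (values) => {
    const n = new Set(values.map(normalize)).size;
    return n >= 2 && n <= 4;
  },
};

function inferRoles(header: string[], rows: string[][]): Record<string, Role> {
  const roles: Record<string, Role> = {};
  header.forEach((col, j) => {
    const values = rows.map((r) => r[j] ?? "");
    const match = namePatterns.find(([role, pattern]) => {
      const check = valueChecks[role];
      return pattern.test(col) && (check === undefined || check(values));
    });
    roles[col] = match ? match[0] : "covariate";
  });
  return roles;
}
```

Names alone are unreliable. For example, a column called `progression_months` holds a time, not an event flag. That is why the sketch checks values as well, and why an agent or a human should still confirm the mapping. A censoring indicator (1 = censored) is also coded the opposite way from an event indicator, which is another reason to confirm.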
What We Learned
The goal is not to build a subgroup finder. The goal is to democratize biostatistical expertise, making world-class clinical trial analysis available to any researcher regardless of budget or team size. AI does not replace the statistician. It makes one available to everyone.
Built With
- gemini
- javascript
- node.js
- postgresql
- r
- react
- supabase
- tailwind-css