BioStrata: AI-Powered Clinical Trial Analysis

Inspiration

Roughly 90% of clinical trials "fail," but many of the treatments work well for specific patient subgroups whose response gets averaged out in the overall results. Finding those subgroups manually can take a team of PhD statisticians weeks of expensive work.

We spoke with Henry Wei, MD, at Regeneron, who challenged us to build something bigger than a subgroup finder. He described an AI that could replicate the full workflow of a professional biostatistician: write the statistical code, run the analysis, check its own work, and present the findings in plain English to a medical director who has never touched a p-value.

That conversation became BioStrata.

What It Does

Upload any raw clinical trial CSV. Press one button. BioStrata does the rest:

Agent 1 (Biostatistician): Reads your data, writes R code, and runs subgroup analysis, logistic regression, SHAP feature importance, and Kaplan-Meier survival analysis
Agent 2 (Manager): Reviews Agent 1's output like a senior statistician would, flags issues, and QCs the results before anyone sees them

Results appear in two modes:

Medical Director mode: Plain-English summary of what was found, who it worked for, and what to do next. No statistics knowledge needed.
Statistician mode: Full technical output including p-values, confidence intervals, survival curves, efficacy tables, demographics tables, and downloadable R code.
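
For readers who have not met the Kaplan-Meier curves mentioned above, here is a minimal Python sketch of the estimator itself. It is an illustration only, not BioStrata's generated R code; the function name and the sample data are made up for the example.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return the survival curve as [(time, survival)] steps.

    durations: follow-up time for each patient
    events:    1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = [(0, 1.0)]  # the curve starts at time 0 with everyone event-free
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # drop both events and censored patients
    return curve

# Made-up data: five patients, two of them censored.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for time, surv in curve:
    print(f"t={time:>2}  S(t)={surv:.2f}")
```

Censored patients never cause a drop in the curve, but they do shrink the risk set, which is why later events produce larger steps.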

How We Built It

Frontend: React with custom SVG Kaplan-Meier curves and a Medical Director / Statistician toggle
Backend: Node.js async job pipeline with seven stages from upload to final report
AI: Google Vertex AI with Gemini 2.5 Flash powering both agents
Stats: Dynamically generated R code executed server-side
Database: Supabase for job persistence and analysis history
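
The staged job pipeline can be pictured with a short sketch. This one uses Python's asyncio rather than Node.js, and the seven stage names are hypothetical: the write-up only states that there are seven stages running from upload to final report. The job dictionary and the progress callback are likewise assumptions for illustration.

```python
import asyncio

# Hypothetical stage names; only the count (seven) comes from the write-up.
STAGES = [
    "upload", "infer_structure", "generate_code", "run_analysis",
    "review", "summarize", "final_report",
]

async def run_stage(name, job):
    """Placeholder for real work; records that the stage ran."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    job["completed"].append(name)

async def run_job(job_id, on_progress=None):
    """Run the stages in order, tracking status so a client can poll it."""
    job = {"id": job_id, "status": "running", "completed": []}
    try:
        for index, name in enumerate(STAGES, start=1):
            await run_stage(name, job)
            if on_progress:
                on_progress(job_id, index, len(STAGES))
        job["status"] = "done"
    except Exception as exc:
        # A failed stage stops the job and keeps the error for the report.
        job["status"] = "failed"
        job["error"] = str(exc)
    return job

job = asyncio.run(run_job("demo-1"))
print(job["status"], job["completed"])
```

Keeping the status and completed stages on the job record is what would let a store such as Supabase persist progress between requests.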

Challenges

Getting Gemini to generate consistently valid R code required careful prompt engineering. Subtle mistakes, such as indexing survival curves incorrectly or counting the intercept as a feature in SHAP scores, only went away once we put explicit, mandatory code patterns in the prompt. We tested every pattern iteratively before wiring it into the pipeline.
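
The intercept issue is easy to show in a small sketch. The write-up does not say how BioStrata computes SHAP values, so this assumes the closed form for a linear model with independent features, where a feature's SHAP value on the log-odds scale is its coefficient times its deviation from the feature mean. The intercept belongs to the baseline, not to any feature, so it is dropped before ranking. The function name and the numbers are invented for the example; "(Intercept)" is the name R's coef() gives the intercept term.

```python
from statistics import mean

def linear_shap_importance(coefficients, rows, intercept_name="(Intercept)"):
    """Mean |SHAP| per feature for a linear (log-odds) model.

    coefficients: term name -> fitted coefficient, intercept included
    rows:         list of dicts mapping feature name -> numeric value
    """
    # The intercept is part of the base value, so it is never ranked.
    features = [name for name in coefficients if name != intercept_name]
    means = {f: mean(row[f] for row in rows) for f in features}
    return {
        f: mean(abs(coefficients[f] * (row[f] - means[f])) for row in rows)
        for f in features
    }

# Invented coefficients and patients, for illustration only.
coefs = {"(Intercept)": -1.2, "age": 0.04, "treated": 0.9}
rows = [
    {"age": 50, "treated": 1}, {"age": 60, "treated": 0},
    {"age": 70, "treated": 1}, {"age": 60, "treated": 0},
]
importance = linear_shap_importance(coefs, rows)
print(sorted(importance, key=importance.get, reverse=True))
```

If the intercept were left in the list, its large constant would dominate the ranking while explaining nothing about individual patients.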

The other challenge was generalizing across trials. Every CSV has different column names and structure, so BioStrata infers the trial structure dynamically instead of expecting a fixed schema. That lets it work on whatever dataset a researcher uploads.
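
To make the column-inference problem concrete, here is a deliberately simple Python sketch that guesses column roles from header names. It is a keyword heuristic chosen for illustration, not BioStrata's method (which the write-up does not describe). The role names, hint lists, and sample header are all hypothetical.

```python
import csv
import io

# Hypothetical keyword hints; not taken from BioStrata.
ROLE_HINTS = {
    "treatment": ("arm", "treat", "group", "trt"),
    "outcome": ("response", "outcome", "responder"),
    "time": ("time", "days", "months", "duration"),
    "event": ("event", "death", "status", "censor"),
}

def infer_roles(header):
    """Map each role to the first column whose name contains one of its hints."""
    roles = {}
    for column in header:
        lowered = column.lower()
        for role, hints in ROLE_HINTS.items():
            if role not in roles and any(h in lowered for h in hints):
                roles[role] = column
                break  # a column is assigned to at most one role
    return roles

# Invented CSV with an unfamiliar naming scheme.
sample = "PatientID,Age,TRT_ARM,Responder,OS_Months,DeathEvent\n1,54,A,1,12.5,0\n"
header = next(csv.reader(io.StringIO(sample)))
print(infer_roles(header))
```

A fixed keyword list like this breaks quickly on real trial data, which is the kind of brittleness that motivates inferring structure dynamically.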

What We Learned

The goal is not to build a subgroup finder. The goal is to democratize biostatistical expertise, making world-class clinical trial analysis available to any researcher regardless of budget or team size. AI does not replace the statistician. It makes one available to everyone.