🌍 Inspiration

Over 80% of clinical trials fail to meet their enrollment targets, often not because eligible patients don't exist but because they never find out they qualify. The discovery process is broken: patients face walls of medical jargon, dense eligibility criteria written for clinicians, and search tools that require them to already know what they're looking for.

I've spent over a decade in clinical data management, working at companies like Medidata on the systems that power clinical trials. I've seen firsthand how much effort goes into running a trial, and how enrollment bottlenecks can delay or kill studies that could save lives. The problem isn't supply. It's navigation.

TrialConnect was born from a simple question: what if a patient could describe their situation in plain language and get a ranked, explainable list of trials they actually qualify for in under 60 seconds?

💡 What It Does

TrialConnect is a guided AI concierge for clinical trial discovery. Instead of a blank search box, patients are walked through a 4-step onboarding wizard:

Choose your condition: plain language, no medical codes required

Set your location: GPS or typed address, with a radius slider

Build your profile: age, sex, current medications, and an optional medical document upload from which Gemini extracts these details automatically

Launch: one button triggers a semantic search across 4,400+ real trials from ClinicalTrials.gov

Results are ranked by a combination of semantic relevance, exact condition matching, and proximity to the nearest trial site. Each trial has a dedicated detail page with the full eligibility text, a map of trial sites, and a one-click AI eligibility check. Powered by Gemini 2.5 Flash, the check compares the patient's profile against the trial's inclusion and exclusion criteria and explains the match in plain English.
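The ranking can be sketched as a weighted blend of those three signals. This is a minimal illustration, not TrialConnect's actual formula: the weights, the linear proximity decay inside the search radius, and the helper names (`haversine_km`, `nearest_site_km`, `rank_score`) are all assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))  # clamp float error


def nearest_site_km(patient: LatLon, sites: Sequence[LatLon]) -> Optional[float]:
    """Distance to the closest trial site, or None if the trial lists no sites."""
    return min((haversine_km(patient, s) for s in sites), default=None)


def rank_score(
    semantic: float,               # vector similarity, assumed normalized to [0, 1]
    exact_condition: bool,         # patient's condition appears verbatim in the trial's conditions
    distance_km: Optional[float],  # output of nearest_site_km
    radius_km: float,              # value of the radius slider
    weights: Tuple[float, float, float] = (0.6, 0.25, 0.15),  # hypothetical weights
) -> float:
    w_sem, w_exact, w_geo = weights
    if distance_km is None or distance_km > radius_km:
        proximity = 0.0  # no site inside the radius: no proximity credit
    else:
        proximity = 1.0 - distance_km / radius_km  # 1.0 at the patient's location, 0.0 at the edge
    return w_sem * semantic + w_exact * float(exact_condition) + w_geo * proximity
```

Scoring a multi-site trial by its closest site means a large international study is not penalized for its distant sites as long as one is within the patient's radius.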

🏗️ How I Built It

The stack was chosen to demonstrate deep integration of both MongoDB Atlas and Google Cloud AI, not just surface-level API calls.

MongoDB Atlas is the core data layer:

4,468 trials seeded from the ClinicalTrials.gov REST API (v2), each embedded with text-embedding-005 (768 dimensions)

Atlas Vector Search powers the semantic matching pipeline

Geospatial queries rank trials by Haversine distance to the nearest site

Aggregation pipelines power the live /api/stats endpoint

Collections for users, patient dossiers, and promoted trials
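For a stats endpoint like /api/stats, a single `$facet` stage can compute several breakdowns in one round trip. This sketch assumes each trial document has an `overall_status` string and a `phases` array; the real schema, and whether the endpoint uses `$facet` at all, are assumptions.

```python
from typing import Any, Dict, List


def build_stats_pipeline() -> List[Dict[str, Any]]:
    """Aggregation pipeline returning a total count plus breakdowns by status and phase."""
    return [
        {
            "$facet": {
                "total": [{"$count": "n"}],
                "by_status": [
                    {"$group": {"_id": "$overall_status", "count": {"$sum": 1}}},
                    {"$sort": {"count": -1}},
                ],
                "by_phase": [
                    {"$unwind": "$phases"},  # a trial can list more than one phase
                    {"$group": {"_id": "$phases", "count": {"$sum": 1}}},
                    {"$sort": {"count": -1}},
                ],
            }
        }
    ]


def shape_stats(facet_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the single document produced by $facet into a JSON-friendly dict."""
    # $count emits no document for empty input, so "total" may be an empty list.
    total = facet_doc["total"][0]["n"] if facet_doc["total"] else 0
    return {
        "total_trials": total,
        "by_status": {row["_id"]: row["count"] for row in facet_doc["by_status"]},
        "by_phase": {row["_id"]: row["count"] for row in facet_doc["by_phase"]},
    }
```

With PyMongo this would be called as `shape_stats(next(trials.aggregate(build_stats_pipeline())))`, where `trials` is a hypothetical `Collection` handle.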

Google Cloud / Gemini handles all AI inference:

Gemini 2.5 Flash for eligibility matching: full inclusion/exclusion criteria analysed against the patient profile

Gemini 2.5 Flash for document extraction: upload a PDF or image of a medical record and the profile fields are populated automatically

text-embedding-005 via Vertex AI for generating and querying trial embeddings

Vertex AI Agent Builder for the conversational AI chatbot embedded on every results page
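The document-extraction step ultimately has to turn model output into profile fields, and a defensive parser helps because models sometimes wrap JSON in markdown fences or return implausible values. This sketch assumes Gemini is asked to reply with a JSON object containing `age`, `sex`, and `medications`; the key names, `ExtractedProfile`, and the validation rules are hypothetical.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtractedProfile:
    age: Optional[int] = None
    sex: Optional[str] = None
    medications: List[str] = field(default_factory=list)


# Strips a leading ``` or ```json fence and a trailing ``` fence, if present.
_FENCE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)
_ALLOWED_SEX = {"female", "male"}  # hypothetical allowed values


def parse_extraction(raw: str) -> ExtractedProfile:
    """Parse model output into a profile, dropping any field that fails validation."""
    text = _FENCE.sub("", raw.strip())
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return ExtractedProfile()  # leave the form empty rather than guess
    if not isinstance(data, dict):
        return ExtractedProfile()

    profile = ExtractedProfile()
    age = data.get("age")
    if isinstance(age, int) and not isinstance(age, bool) and 0 <= age <= 120:
        profile.age = age
    sex = data.get("sex")
    if isinstance(sex, str) and sex.strip().lower() in _ALLOWED_SEX:
        profile.sex = sex.strip().lower()
    meds = data.get("medications")
    if isinstance(meds, list):
        profile.medications = [m.strip() for m in meds if isinstance(m, str) and m.strip()]
    return profile
```

Rejecting a field outright, rather than coercing it, keeps the patient in control: anything the parser drops simply stays blank for them to fill in.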

Backend: Python 3.11, Flask, PyMongo, deployed on Google Cloud Run via Docker

Frontend: Bootstrap 5, Vanilla JS, Leaflet.js for trial site maps

Auth: Google OAuth 2.0 + local auth with Werkzeug password hashing

⚡ Challenges I Faced

Startup performance on Cloud Run. With multiple Gunicorn workers, initializing the MongoDB Atlas connection pool, the Vertex AI SDK, and the Gemini agent at import time pushed container startup past Cloud Run's startup probe window. The fix was dropping to a single worker with multiple threads, which cut cold-start time dramatically.
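A Gunicorn configuration along those lines might look like the following `gunicorn.conf.py`. The thread count is an illustrative guess rather than the project's actual setting; the overall shape mirrors the `--workers 1 --threads 8 --timeout 0` flags used in Google's Cloud Run Python samples, and Cloud Run supplies the listening port through the `PORT` environment variable.

```python
# gunicorn.conf.py: Gunicorn reads module-level variables from this file.
import os

# Cloud Run injects the port to listen on; 8080 is its default.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# One process means the heavy clients (MongoDB pool, Vertex AI SDK, Gemini agent)
# are initialized once per container instead of once per worker.
workers = 1

# Concurrency comes from threads instead. With threads > 1, Gunicorn swaps the
# default sync worker for the threaded "gthread" worker. 8 is an illustrative value.
threads = 8

# 0 disables Gunicorn's worker timeout so Cloud Run's request timeout governs.
timeout = 0
```

It would be loaded with `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is a placeholder for the Flask application's module and variable.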

Embedding quality vs. cost. Early tests with shorter trial descriptions produced poor vector matches. Enriching the embedded document to include condition names, phase, sponsor, and an inclusion criteria summary significantly improved recall, but it also increased seeding time and Atlas storage requirements.
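The enrichment amounts to assembling several fields into the text that gets embedded. The sketch below is one way to do it; the field names and the 1,500-character cap on the inclusion criteria are assumptions, not the project's actual values.

```python
from typing import Any, Dict

MAX_CRITERIA_CHARS = 1500  # hypothetical cap to keep documents within embedding input limits


def build_embedding_text(trial: Dict[str, Any]) -> str:
    """Assemble the enriched text sent to the embedding model for one trial."""
    parts = [
        f"Title: {trial.get('title', '')}",
        f"Conditions: {', '.join(trial.get('conditions', []))}",
        f"Phase: {', '.join(trial.get('phases', [])) or 'N/A'}",
        f"Sponsor: {trial.get('sponsor', '')}",
        f"Summary: {trial.get('brief_summary', '')}",
    ]
    inclusion = trial.get("inclusion_criteria", "").strip()
    if inclusion:
        parts.append(f"Inclusion criteria: {inclusion[:MAX_CRITERIA_CHARS]}")
    # Drop empty "Label: " lines so they don't dilute the embedding.
    return "\n".join(p for p in parts if not p.endswith(": "))
```

Labelling each field keeps the structure visible to the embedding model, and the truncation is where the recall-versus-cost trade-off gets tuned.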

Eligibility language is hard. Clinical trial eligibility criteria are written in a highly technical register. Getting Gemini to produce a patient-friendly match explanation (not just a yes/no) required careful prompt engineering, including explicit instructions to explain in plain language why each criterion was met or missed.
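One way to encode those instructions is a prompt template that asks for a per-criterion verdict plus a plain-language reason. The wording, the `MET`/`NOT_MET`/`UNCLEAR` labels, and the JSON reply shape below are illustrative assumptions, not the prompt TrialConnect actually uses.

```python
import json
from typing import Any, Dict

ELIGIBILITY_INSTRUCTIONS = """You are helping a patient understand whether they may qualify for a clinical trial.
For EACH inclusion and exclusion criterion:
- label it MET, NOT_MET, or UNCLEAR based only on the patient profile;
- explain why in one or two sentences a non-specialist can follow, avoiding jargon.
Never give medical advice. If information is missing, answer UNCLEAR and name what is missing.
Reply with JSON: {"overall": "...", "criteria": [{"criterion": "...", "status": "...", "why": "..."}]}"""


def build_eligibility_prompt(profile: Dict[str, Any], inclusion: str, exclusion: str) -> str:
    """Combine fixed instructions, the patient profile, and the trial's criteria text."""
    return "\n\n".join([
        ELIGIBILITY_INSTRUCTIONS,
        "PATIENT PROFILE:\n" + json.dumps(profile, indent=2, sort_keys=True),
        "INCLUSION CRITERIA:\n" + inclusion.strip(),
        "EXCLUSION CRITERIA:\n" + exclusion.strip(),
    ])
```

Requiring an UNCLEAR option with a named gap nudges the model away from guessing, and it gives the UI something actionable to show when the profile is incomplete.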

Multi-step session state. The 4-step onboarding wizard needed to persist partial state across steps without a database write on every click. Flask server-side sessions with careful serialization of the profile object solved this cleanly.
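That pattern can be sketched as a small profile object that is merged step by step and stored in the session as a plain dict. `WizardProfile`, `save_step`, and the `wizard_profile` session key are hypothetical names. Because a Flask `session` behaves like a dict, a plain `dict` stands in for it here.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List, MutableMapping, Optional

SESSION_KEY = "wizard_profile"  # hypothetical session key


@dataclass
class WizardProfile:
    condition: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None
    radius_km: float = 50.0
    age: Optional[int] = None
    sex: Optional[str] = None
    medications: List[str] = field(default_factory=list)


_FIELD_NAMES = {f.name for f in fields(WizardProfile)}


def load_profile(session: MutableMapping[str, Any]) -> WizardProfile:
    """Rebuild the profile from the session, ignoring keys the dataclass doesn't know."""
    stored: Dict[str, Any] = session.get(SESSION_KEY, {})
    return WizardProfile(**{k: v for k, v in stored.items() if k in _FIELD_NAMES})


def save_step(session: MutableMapping[str, Any], **updates: Any) -> WizardProfile:
    """Merge one wizard step's fields into the stored profile and write it back."""
    profile = load_profile(session)
    for name, value in updates.items():
        if name not in _FIELD_NAMES:
            raise KeyError(f"unknown profile field: {name}")
        setattr(profile, name, value)
    session[SESSION_KEY] = asdict(profile)  # plain dict of JSON-friendly values
    return profile
```

Reassigning the whole dict, rather than mutating the stored one in place, matters in Flask: the session only detects top-level assignments unless `session.modified` is set explicitly.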

🏆 What I'm Proud Of

The guided onboarding wizard is the feature I'm most proud of. It turns what would otherwise be a cold, intimidating search interface into a warm, step-by-step experience. The document upload and auto-extraction step, where a patient uploads a medical record and watches their profile fields fill in automatically, is the moment in the demo that shows what AI-augmented healthcare UX can really feel like.

I'm also proud of the depth of the MongoDB integration: getting vector search, geospatial ranking, and aggregation pipelines to work together in a single query pipeline was not trivial.

📚 What I Learned

Prompt design for medical AI is its own discipline. Tone, structure, and explainability constraints in the Gemini prompts mattered as much as the underlying model capability.

Cold-start architecture matters on serverless. Initializing heavy SDK clients (Vertex AI, MongoDB) lazily, on first use rather than at module import time, would be a key refactor for a production version.
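A minimal sketch of that refactor is a thread-safe lazy holder that builds a client on first use. `Lazy`, `_make_mongo_client`, and the `MONGODB_URI` environment variable are hypothetical names; `pymongo.MongoClient` is imported inside the factory so importing the module stays cheap.

```python
import os
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Build an expensive object on first use, exactly once, even across threads."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._ready = False
        self._lock = threading.Lock()

    def get(self) -> T:
        if not self._ready:  # fast path once initialized: no lock taken
            with self._lock:
                if not self._ready:  # re-check: another thread may have won the race
                    self._value = self._factory()
                    self._ready = True
        return self._value  # type: ignore[return-value]


def _make_mongo_client():
    # Deferred import keeps module import (and container startup) fast.
    from pymongo import MongoClient
    return MongoClient(os.environ["MONGODB_URI"])


# Nothing connects at import time; the first request that calls mongo.get() pays the cost.
mongo = Lazy(_make_mongo_client)
```

`functools.lru_cache` on a zero-argument factory is a shorter alternative, but its documentation notes the wrapped function may run more than once if several threads call it before the first call finishes, which matters with a multi-threaded Gunicorn worker.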

MongoDB Atlas Vector Search is genuinely powerful for this kind of hybrid retrieval: combining semantic embedding similarity with structured field filters (condition, status, country) in a single $vectorSearch pipeline stage is elegant and fast.
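A sketch of such a stage follows. The index name, the `embedding`, `status`, `conditions`, and `countries` field names, and the 10x candidate oversampling are assumptions; any field used in `filter` must also be declared as a `filter` field in the Atlas vector index definition.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    conditions: Optional[List[str]] = None,
    statuses: Sequence[str] = ("RECRUITING",),
    countries: Optional[List[str]] = None,
    limit: int = 20,
    index_name: str = "trial_vector_index",  # hypothetical index name
) -> List[Dict[str, Any]]:
    """Semantic search with structured pre-filters applied inside $vectorSearch."""
    clauses: List[Dict[str, Any]] = [{"status": {"$in": list(statuses)}}]
    if conditions:
        clauses.append({"conditions": {"$in": conditions}})
    if countries:
        clauses.append({"countries": {"$in": countries}})
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",  # 768-dim text-embedding-005 vector
                "queryVector": list(query_vector),
                # Oversample candidates for better recall, within the 10,000 cap.
                "numCandidates": min(limit * 10, 10_000),
                "limit": limit,
                "filter": {"$and": clauses},
            }
        },
        # Expose the similarity score so it can feed the ranking blend.
        {"$addFields": {"score": {"$meta": "vectorSearchScore"}}},
        {"$project": {"embedding": 0}},  # don't ship 768 floats back to the client
    ]
```

The pipeline would be run with PyMongo's `collection.aggregate(...)`. Because the filter is applied before the nearest-neighbour search rather than after it, a narrow filter still returns up to `limit` matching trials instead of a post-filtered handful.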

🔮 What's Next

Patient accounts and saved searches: bookmark trials, set alerts for new matches

EHR integration: FHIR-compliant profile import from hospital systems

Sponsor dashboard: trial sponsors can promote studies and see match analytics

Multi-language support: the current pipeline is English-only, and Gemini's multilingual capabilities should make it straightforward to extend

Regulatory compliance layer: GDPR-compliant consent management for EU patient data
