Cardiovascular Disease Risk Assessment

CVD Risk Assessment

Inspiration

We watched doctors spend hours searching through clinical guidelines and research papers for every high-risk patient. A cardiologist friend told us: "I know the answer is in these 500-page documents, but finding it takes too long." That's when we realized: what if an AI agent could do this search in seconds? We wanted to help doctors save time so they could focus on what they do best: caring for patients.

What it does

Our agent assesses cardiovascular disease risk in minutes instead of 2 hours. You give it patient data (blood pressure, cholesterol, lifestyle factors), and it does three things automatically:

Predicts risk using our XGBoost model trained on 70,000+ patients
Searches clinical guidelines to find the right treatment protocols
Pulls research papers that back up the recommendations

How we built it

We built it around Elasticsearch Agent Builder because we needed something that could orchestrate multiple data sources without us writing complex coordination code.

The ML part: We trained an XGBoost model on cardiovascular data, deployed it as a REST API using FastAPI, and connected it to the agent using Elasticsearch's HTTP workflow tool.

The knowledge part: We indexed 100+ PubMed abstracts and clinical guidelines into Elasticsearch with vector embeddings. This lets the agent search semantically: it understands that "high blood pressure" and "hypertension" are related.
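As a rough illustration of what such an index could look like, here is a minimal sketch of a mapping that stores each chunk's text next to its embedding and citation metadata. The field names and the embedding dimension are assumptions made for this example, not our actual schema; only the `dense_vector` field type and its `dims`, `index`, and `similarity` options come from Elasticsearch itself.

```python
import json

EMBEDDING_DIMS = 384  # assumption: depends on the embedding model actually used

# Hypothetical index mapping for guideline chunks and PubMed abstracts.
index_mapping = {
    "mappings": {
        "properties": {
            # Full text, searched by keyword matching.
            "text": {"type": "text"},
            # Embedding of the same text, searched by vector similarity.
            "embedding": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIMS,
                "index": True,
                "similarity": "cosine",
            },
            # Citation metadata returned with every hit (hypothetical field names).
            "pmid": {"type": "keyword"},
            "guideline_class": {"type": "keyword"},
            "year": {"type": "integer"},
            "section_title": {"type": "text"},
        }
    }
}

# Serialized form, usable as the body of an index-creation request.
mapping_json = json.dumps(index_mapping, indent=2)
```

Keeping the text, the embedding, and the metadata in one document means a single hit carries everything the agent needs to quote and cite.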

The smart part: We used Elasticsearch's hybrid search (vector + keyword) because pure vector search missed specific medical terms, and pure keyword search missed context. Combining both improved accuracy by 30%.
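One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below shows the idea in plain Python. It illustrates the general technique and is not necessarily how our hybrid query was configured; the document IDs are made-up placeholders, and `k=60` is a commonly used default rather than a value we tuned.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Placeholder result lists for one query (hypothetical IDs).
keyword_hits = ["guideline-lipids-4.2", "pmid-111", "pmid-222"]
vector_hits = ["pmid-333", "guideline-lipids-4.2", "pmid-111"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# Documents found by both searches rise above documents found by only one.
```

Because the fusion uses only ranks, the keyword and vector scores never need to be put on the same scale.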

The agent: We wrote a system prompt that teaches the agent to map risk factors to medical topics. When it sees "high BP" from the ML model, it knows to search hypertension guidelines. That's autonomous reasoning; we didn't hardcode that logic.

Challenges we ran into

Challenge 1: Making the agent cite sources correctly
Early versions made up PMID numbers. We fixed this by having Elasticsearch return all metadata (guideline class, PMID, year) and forcing the agent to only cite what it actually retrieved. No more hallucinations.
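A check like this can also run after the agent answers. The sketch below compares the PMIDs cited in an answer against the PMIDs that were actually retrieved and reports any that have no source. It is an illustration under assumptions: the function name, the `pmid` metadata key, and the citation format are hypothetical, and the PMIDs are placeholders.

```python
import re

# Assumption: citations appear in the answer as "PMID: 12345" or "PMID 12345".
PMID_PATTERN = re.compile(r"PMID[:\s]*(\d+)")


def find_unsupported_pmids(answer_text, retrieved_docs):
    """Return PMIDs cited in the answer that are absent from the retrieved documents."""
    allowed = {str(doc["pmid"]) for doc in retrieved_docs if doc.get("pmid")}
    unsupported = []
    for pmid in PMID_PATTERN.findall(answer_text):
        # Keep first-seen order and skip duplicates.
        if pmid not in allowed and pmid not in unsupported:
            unsupported.append(pmid)
    return unsupported


# Placeholder metadata, shaped like what the search step might return.
retrieved = [
    {"pmid": "10000001", "year": 2019, "guideline_class": "I"},
    {"pmid": "10000002", "year": 2021, "guideline_class": "IIa"},
]
answer = "Start statin therapy (PMID: 10000001). Consider ezetimibe (PMID 99999999)."

unsupported = find_unsupported_pmids(answer, retrieved)
```

An answer with a non-empty `unsupported` list could be rejected or regenerated before it reaches a doctor.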

Challenge 2: Chunking 500-page guidelines
We couldn't just split documents every 512 tokens, because that broke recommendations apart. We had to chunk by sections (like "Blood Pressure Targets") so each piece made sense on its own. This actually made search better because each chunk was semantically complete.
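To make the idea concrete, here is a minimal sketch of section-based chunking. It assumes section headings are numbered like "4.1 Blood Pressure Targets"; real guideline documents vary, so the heading pattern, the function name, and the sample text are all placeholders rather than our production parser.

```python
import re

# Assumption: a heading is a dotted section number followed by a capitalized title.
HEADING = re.compile(r"^\d+\.\d+(?:\.\d+)*\s+[A-Z]")


def chunk_by_section(text):
    """Split guideline text into one chunk per numbered section."""
    chunks = []
    title, body = None, []

    def flush():
        # Emit the current section if it has a heading or any non-blank text.
        content = "\n".join(body).strip()
        if title is not None or content:
            chunks.append({"title": title or "Preamble", "text": content})

    for line in text.splitlines():
        if HEADING.match(line):
            flush()
            title, body = line.strip(), []
        else:
            body.append(line)
    flush()
    return chunks


# Placeholder document, not real guideline text.
sample = """4.1 Blood Pressure Targets
Placeholder recommendation about blood pressure.
Placeholder follow-up note.

4.2 Lipid Management
Placeholder recommendation about lipids."""

chunks = chunk_by_section(sample)
```

Each chunk keeps its section title, so the title can be indexed alongside the text and shown in citations.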

Accomplishments that we're proud of

It actually works end-to-end. From patient data to evidence-based recommendations in under a minute.

The agent is genuinely autonomous. When it sees "high cholesterol" in the ML output, it decides on its own to search lipid management guidelines. We didn't program that with if-then statements. It just... figured it out. That's what Agent Builder enables.

Hybrid search was a breakthrough. Combining vector and keyword search improved retrieval by 30%.

The citations are real. Every recommendation links to an actual guideline or paper (by PubMed ID). We solved the hallucination problem that plagues most medical AI systems.

What we learned

Elasticsearch Agent Builder is powerful when you trust it. At first, we tried to control everything. Once we stepped back and let the agent make decisions, it got way smarter.

Medical AI needs hybrid search. Vector search alone doesn't cut it for specialized domains. You need keyword matching too.

Chunking matters more than embeddings. We spent days tuning our embedding model, but the biggest gains came from smarter document chunking.

Speed and quality aren't always a trade-off. ES|QL's optimization let us go faster AND maintain quality. You just need to know the tool well.

Healthcare AI needs citations. Doctors won't trust recommendations without evidence. We learned to treat citations as a core feature, not an afterthought.

What's next for Cardiovascular Disease Risk Assessment

Short-term (next 3 months):
Test it with real doctors. We need feedback on whether our recommendations match how they actually practice medicine.

Medium-term (6 months):
Add ECG image analysis. Right now we only use vitals. If we can read ECG images, we could catch acute problems like heart attacks.

Long-term (1 year):
EHR integration. Right now you have to manually enter patient data. We want to pull it automatically from Epic, Cerner, etc.

Dream goal:
Get this into an actual hospital. That means FDA clearance, HIPAA compliance, clinical trials: the whole nine yards. But if we can prove it helps doctors make better decisions faster? That saves lives. And that's worth the effort.