Explainable AI-Based Cardiovascular Risk Assessment System

ROC Curve for Calibrated LightGBM — the primary model used in CardioShieldAI
Confusion matrix for Calibrated LightGBM — achieving 80% recall, minimizing missed high-risk patients
Calibration curve for Calibrated LightGBM — predicted probabilities closely follow the ideal diagonal, confirming reliability
Top 15 clinical features ranked by LightGBM importance — age, cholesterol and blood pressure are top drivers
ROC Curve for XGBoost baseline model — AUC shows strong discriminative ability

Inspiration

Cardiovascular disease is the #1 cause of death globally, yet most patients are flagged too late. Traditional risk scores like the Framingham Score rely on static, hand-crafted formulas that miss complex non-linear interactions between clinical features.

I was inspired by a simple but powerful question: Can AI catch a cardiac event before it happens — and actually explain its reasoning to a doctor?

Clinicians don't just need a number. They need trust. A black-box model that outputs "high risk" without any reasoning is useless in a real clinical setting. I wanted to bridge the gap between raw clinical data and transparent, actionable predictions that a doctor can act on.

What it does

CardioShieldAI is a hybrid clinical decision support system that:

Accepts patient data via manual input or OCR-extracted medical reports (PDF / scanned documents)
Runs a calibrated LightGBM model to predict the probability of a future major cardiac event
Generates SHAP-based explainability — showing exactly which features (e.g., cholesterol, age, blood pressure) drove the risk score up or down
Produces a natural-language clinical summary via a locally-run LLM (Ollama), mimicking a clinical consultant's reasoning
Scores data quality and handles missing values gracefully

The output is not just a risk score — it's a complete, explainable clinical report.
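The write-up does not say how the data-quality score is computed. Below is a minimal sketch, assuming a hypothetical set of required fields with illustrative plausibility ranges (not clinical reference ranges). It combines a completeness ratio with range checks. Missing values are left as NaN rather than imputed, because LightGBM treats NaN as missing by default and learns which branch such values follow at each split.

```python
import math

# Hypothetical required fields with illustrative plausibility ranges (NOT clinical reference ranges).
REQUIRED_FIELDS = {
    "age": (18, 110),
    "systolic_bp": (70, 250),
    "cholesterol": (80, 500),
}

def quality_report(record: dict) -> tuple[float, list[str], dict]:
    """Return (completeness in [0, 1], list of issues, model-ready feature dict)."""
    issues, features, present = [], {}, 0
    for field, (low, high) in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            issues.append(f"{field}: missing")
            features[field] = float("nan")  # left as NaN for LightGBM's native missing-value handling
            continue
        present += 1
        if not low <= value <= high:
            # Flag implausible values for review; the value itself is passed through unchanged.
            issues.append(f"{field}: {value} outside plausible range [{low}, {high}]")
        features[field] = float(value)
    return present / len(REQUIRED_FIELDS), issues, features

print(quality_report({"age": 54, "systolic_bp": 300}))
```

In this example the record gets a completeness of 2/3, one range warning for systolic blood pressure, and a missing-value flag for cholesterol.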

How we built it

The system is built in six layers:

1. Input Layer: two modes, manual entry (structured form) and OCR upload (Tesseract + PDF parser) for automated extraction from medical reports.
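After Tesseract has produced raw text (for example via `pytesseract.image_to_string(image)`), structured values still have to be pulled out of it. Here is a minimal regex sketch of that step. The three patterns and the field names `cholesterol`, `systolic_bp`, `diastolic_bp` and `age` are hypothetical; the project's actual extraction pipeline is not described.

```python
import re

# Hypothetical patterns for three fields; real reports need many more spellings, units and layouts.
CHOLESTEROL = re.compile(r"(?:total\s+)?cholesterol\s*[:=]?\s*(\d+(?:\.\d+)?)\s*mg\s*/\s*dl", re.IGNORECASE)
BLOOD_PRESSURE = re.compile(r"\b(?:blood\s+pressure|bp)\s*[:=]?\s*(\d{2,3})\s*/\s*(\d{2,3})", re.IGNORECASE)
AGE = re.compile(r"\bage\s*[:=]?\s*(\d{1,3})\b", re.IGNORECASE)

def extract_fields(ocr_text: str) -> dict:
    """Pull structured values out of raw OCR text; fields that are not found are omitted."""
    text = ocr_text.replace("\u00a0", " ")  # OCR output often contains non-breaking spaces
    fields = {}
    if m := CHOLESTEROL.search(text):
        fields["cholesterol"] = float(m.group(1))
    if m := BLOOD_PRESSURE.search(text):
        fields["systolic_bp"], fields["diastolic_bp"] = int(m.group(1)), int(m.group(2))
    if m := AGE.search(text):
        fields["age"] = int(m.group(1))
    return fields

report = "Patient Age: 54\nBP: 142/88 mmHg\nTotal Cholesterol: 210 mg/dL"
print(extract_fields(report))
```

The word boundary in `AGE` keeps words such as "Page" or "Stage" from matching. Fields that are absent are simply left out, so downstream code can treat them as missing.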

2. Machine Learning Core: a LightGBM gradient-boosted tree ensemble trained on cardiovascular risk features. This binary classifier predicts whether a patient will have a future cardiac event:

$$\hat{y} = \mathbb{1}\left[P(\text{cardiac event} \mid \mathbf{x}) \geq \tau\right]$$

where $\mathbf{x}$ is the patient's feature vector and $\tau$ is the decision threshold.
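The threshold $\tau$ the project uses is not stated. This sketch applies the decision rule to toy labels and probabilities (illustrative, not project data) to show why $\tau$ matters clinically: lowering it catches more true events at the cost of more false alarms.

```python
def confusion_at_threshold(y_true, probs, tau):
    """Apply y_hat = 1[p >= tau] and count true positives, false positives and false negatives."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        y_hat = 1 if p >= tau else 0
        if y_hat and y:
            tp += 1
        elif y_hat and not y:
            fp += 1
        elif not y_hat and y:
            fn += 1
    return tp, fp, fn

def recall_precision(y_true, probs, tau):
    tp, fp, fn = confusion_at_threshold(y_true, probs, tau)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Toy labels (1 = cardiac event) and predicted probabilities; illustrative only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1]
for tau in (0.5, 0.35):
    print(tau, recall_precision(y_true, probs, tau))
```

On this toy data, moving $\tau$ from 0.5 to 0.35 raises recall from 0.50 to 0.75 while precision drops from about 0.67 to 0.60.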

3. Probability Calibration: raw ensemble probabilities are often overconfident. We applied isotonic regression calibration so that a calibrated output of $P = 0.8$ means roughly 80% of similar patients experienced an event:

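$$P_{\text{calib}}(y=1 \mid s) = g(s)$$

where $s$ is the raw LightGBM score and $g$ is the non-decreasing function fitted by isotonic regression on held-out data.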
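Isotonic regression is usually fitted with the pool-adjacent-violators (PAV) algorithm. The dependency-free sketch below shows the idea; in practice scikit-learn's `IsotonicRegression` or `CalibratedClassifierCV(method="isotonic")` provide it, and the project's exact calibration setup is not described. This version returns a step function and clips scores outside the fitted range, whereas scikit-learn interpolates linearly between fitted points.

```python
from collections import defaultdict

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: fit a non-decreasing step function g(s) ~ P(y=1 | s)."""
    # Aggregate tied scores so each unique score starts as one block.
    totals = defaultdict(lambda: [0.0, 0])
    for s, y in zip(scores, labels):
        totals[s][0] += y
        totals[s][1] += 1
    blocks = []  # each block: [sum_of_labels, count, highest_score_in_block]
    for s in sorted(totals):
        y_sum, n = totals[s]
        blocks.append([y_sum, n, s])
        # Merge backwards while the previous block's mean exceeds this one (a monotonicity violation).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            tail_sum, tail_n, tail_upper = blocks.pop()
            blocks[-1][0] += tail_sum
            blocks[-1][1] += tail_n
            blocks[-1][2] = tail_upper
    return [(upper, y_sum / n) for y_sum, n, upper in blocks]

def calibrate(steps, s):
    """Evaluate the step function; scores beyond the fitted range are clipped to the end values."""
    for upper, prob in steps:
        if s <= upper:
            return prob
    return steps[-1][1]

# Toy raw scores and outcomes (illustrative only).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcomes = [0, 1, 0, 0, 1, 1]
steps = fit_isotonic(raw_scores, outcomes)
print(steps)
print(calibrate(steps, 0.35))
```

The scores 0.2, 0.3 and 0.4 violate monotonicity (one event followed by two non-events), so PAV pools them into a single block with calibrated probability 1/3. This pooling is also why isotonic calibration can become coarse on small calibration sets.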

| Model | Accuracy | Recall | F1 score |
|---|---|---|---|
| LightGBM (raw) | 0.6972 | 0.7965 | 0.7555 |
| Calibrated LightGBM | 0.6969 | 0.8006 | 0.7563 |
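The table reports accuracy, recall and F1 but not precision. Assume F1 is the usual harmonic mean of precision $P$ and recall $R$ for the positive (cardiac event) class, $F_1 = 2PR/(P+R)$. Solving for precision gives $P = F_1 R / (2R - F_1)$, so precision can be recovered from the reported numbers, up to their four-decimal rounding:

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P (assumes F1 is the standard positive-class F1)."""
    return f1 * recall / (2 * recall - f1)

for name, recall, f1 in [("LightGBM (raw)", 0.7965, 0.7555), ("Calibrated LightGBM", 0.8006, 0.7563)]:
    print(f"{name}: implied precision = {implied_precision(f1, recall):.4f}")
```

Both models come out near 0.72 precision (about 0.7185 raw and 0.7166 calibrated). The small recall gain after calibration is therefore paired with a marginally lower precision.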

4. SHAP Explainability: each prediction is broken down using SHapley Additive exPlanations (SHAP), which gives every feature a fair, game-theoretic attribution, its Shapley value:

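$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\left[f(S \cup \{i\}) - f(S)\right]$$

where $F$ is the full feature set and $f(S)$ is the model's expected output when only the features in $S$ are known.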
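The formula can be evaluated directly by enumerating every coalition $S$ of the other features. This is exponential in the number of features; for tree ensembles such as LightGBM, `shap.TreeExplainer` computes exact SHAP values in polynomial time instead. In the brute-force sketch below, the toy model, the patient and the baseline are all hypothetical. $f(S)$ is approximated by filling absent features from a single baseline patient, which simplifies how SHAP averages over background data.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

# Toy risk model with an age x cholesterol interaction (illustrative, not the project's model).
def toy_model(x):
    score = 0.03 * x["age"] + 0.01 * x["cholesterol"] + 0.02 * x["systolic_bp"]
    if x["age"] > 60 and x["cholesterol"] > 200:
        score += 0.5
    return score

patient = {"age": 65, "cholesterol": 240, "systolic_bp": 150}
baseline = {"age": 50, "cholesterol": 190, "systolic_bp": 120}  # hypothetical reference patient

def value_fn(present):
    """f(S): features in S come from the patient, the rest from the baseline."""
    x = {k: (patient[k] if k in present else baseline[k]) for k in patient}
    return toy_model(x)

phi = shapley_values(value_fn, list(patient))
print({k: round(v, 4) for k, v in phi.items()})
```

The attributions come out as 0.70 for age, 0.75 for cholesterol and 0.60 for blood pressure. The 0.5 interaction bonus is split evenly between age and cholesterol. The values sum to $f(\text{patient}) - f(\text{baseline}) = 2.05$, which is the additivity property that makes SHAP breakdowns easy to read.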

5. LLM Clinical Reasoning: a locally-run LLM (via Ollama) takes the risk score and the top SHAP contributors and generates a structured clinical narrative, constrained to those model outputs rather than free-form generation.
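The prompt and validation layers are not published. Below is a minimal sketch of the grounding pattern. The prompt wording, the `llama3` model name and the numeric check are hypothetical; the `/api/generate` endpoint with `model`, `prompt` and `stream` fields is Ollama's documented REST interface. The prompt carries only the risk score and the top SHAP contributors. A crude output check then rejects any summary that quotes a number not present in the prompt.

```python
import json
import re
import urllib.request

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def build_prompt(risk: float, contributions: dict, top_k: int = 3) -> str:
    """Build a prompt that contains only the model's outputs (hypothetical wording)."""
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    lines = [f"- {name}: SHAP {value:+.2f} ({'raises' if value > 0 else 'lowers'} risk)" for name, value in top]
    return (
        "You are assisting a clinician. Use ONLY the facts below; do not add labs, diagnoses or numbers.\n"
        f"Predicted probability of a major cardiac event: {risk:.0%}\n"
        "Top contributing features:\n" + "\n".join(lines) + "\n"
        "Write a short summary for the clinician."
    )

def numbers_are_grounded(summary: str, prompt: str) -> bool:
    """Crude output check: every number in the summary must already appear in the prompt."""
    allowed = set(NUMBER.findall(prompt))
    return all(n in allowed for n in NUMBER.findall(summary))

def ask_ollama(prompt: str, model: str = "llama3") -> str:
    """Call a local Ollama server (requires Ollama running; the model name is an assumption)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        "http://localhost:11434/api/generate", data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

A summary that invents a lab value, such as "LDL of 160 mg/dL", fails `numbers_are_grounded` and could be regenerated or flagged. Matching numbers alone does not catch fabricated diagnoses, so this check would only be one of several validation layers.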

6. Full-Stack Deployment

Backend: FastAPI (Python)
Frontend: React + CSS
ML Stack: LightGBM, scikit-learn, SHAP, Tesseract OCR

Challenges we ran into

OCR reliability: Medical reports vary wildly in format and scan quality. Extracting structured values like Cholesterol: 210 mg/dL required heavy post-processing and regex pipelines.
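Calibration stability: isotonic calibration can overfit on small datasets. Isotonic regression is monotone by construction, so it never reverses the raw ranking, but on a small calibration split it collapses many raw scores into a few flat steps; the split size needed careful tuning to keep calibrated scores distinguishable.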
SHAP latency: Computing full Shapley values adds response time. We resolved this by returning the primary risk score immediately and computing SHAP asynchronously.
LLM hallucination: Getting the local LLM to produce grounded clinical summaries without fabricating labs or diagnoses required careful prompt engineering and output validation layers.
Synthetic data constraints: Real cardiovascular datasets are hard to access due to privacy regulations, which limits real-world generalizability.
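The write-up does not show how the asynchronous SHAP step is wired into FastAPI. Below is a minimal sketch of the pattern using only the standard library's `asyncio`. `predict_risk`, `explain` and the in-memory `JOBS` store are hypothetical stand-ins. The prediction handler returns the risk immediately along with a job id. The blocking SHAP computation runs in a worker thread, and a second handler reports it once it finishes.

```python
import asyncio
import time
import uuid

# Hypothetical in-memory job store; works for a single process only.
JOBS: dict[str, asyncio.Task] = {}

def predict_risk(features: dict) -> float:
    """Stand-in for the fast calibrated-model call (hypothetical)."""
    return 0.82

def explain(features: dict) -> dict:
    """Stand-in for the slower SHAP computation (hypothetical); sleeps to simulate latency."""
    time.sleep(0.05)
    return {"age": 0.41, "cholesterol": 0.30}

async def handle_predict(features: dict) -> dict:
    risk = predict_risk(features)
    job_id = str(uuid.uuid4())
    # Run the blocking SHAP computation in a worker thread so the event loop stays responsive.
    JOBS[job_id] = asyncio.create_task(asyncio.to_thread(explain, features))
    return {"risk": risk, "explanation_job": job_id}

async def handle_explanation(job_id: str) -> dict:
    task = JOBS[job_id]
    if not task.done():
        return {"status": "pending"}
    return {"status": "done", "shap": task.result()}

async def demo():
    first = await handle_predict({"age": 65})
    pending = await handle_explanation(first["explanation_job"])  # risk is already available here
    await JOBS[first["explanation_job"]]
    done = await handle_explanation(first["explanation_job"])
    return first, pending, done

first, pending, done = asyncio.run(demo())
print(first["risk"], pending, done)
```

In a FastAPI app these two coroutines would map onto two routes that the frontend polls. A multi-worker deployment would need shared job storage or a task queue instead of the module-level dict.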

Accomplishments that we're proud of

Built a full end-to-end clinical AI pipeline — from raw OCR input to a calibrated, explainable risk report
Achieved 80% recall after calibration — critical in a clinical setting where missing a true positive (a real cardiac event) is far more dangerous than a false alarm
Successfully integrated three AI layers (ML model + SHAP + LLM) into a coherent, doctor-facing output
Ran an ablation (raw vs. calibrated LightGBM) showing that calibration preserves accuracy (0.6972 vs. 0.6969) while slightly improving recall (0.7965 to 0.8006) and F1 (0.7555 to 0.7563)
Built a working OCR pipeline that handles multi-format medical documents

What we learned

Calibration matters more than accuracy in healthcare ML. A well-calibrated model with slightly lower accuracy is far more trustworthy than an overconfident one.
Explainability drives clinical adoption. In testing, clinicians engaged far more with the SHAP breakdown than with the raw risk score alone.
LLMs as reasoning assistants, not oracles. Grounding LLM output in structured model results (rather than free-form generation) is the right pattern for clinical AI.
OCR in healthcare is genuinely hard. Even state-of-the-art OCR struggles with clinical documents; structured extraction pipelines are non-trivial to build reliably.
The full ML lifecycle — data → model → calibration → explanation → communication — is far more complex than optimizing a single metric.
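To make "well-calibrated" concrete, here is a small sketch using toy data, not project results. Two models have identical accuracy at a 0.5 threshold. One outputs the true group event rates; the other pushes them toward 0 and 1. The Brier score and a simple equal-width-bin expected calibration error (ECE) separate the two even though accuracy does not. The five-bin setting is an arbitrary choice.

```python
def accuracy(y, p, tau=0.5):
    return sum((pi >= tau) == bool(yi) for yi, pi in zip(y, p)) / len(y)

def brier_score(y, p):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def expected_calibration_error(y, p, n_bins=5):
    """Bin by predicted probability; average |observed rate - mean prediction| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        bins[min(int(pi * n_bins), n_bins - 1)].append((yi, pi))
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(yi for yi, _ in b) / len(b)
            predicted = sum(pi for _, pi in b) / len(b)
            ece += len(b) / len(y) * abs(observed - predicted)
    return ece

# Toy data: two groups of four patients with true event rates of 75% and 25% (illustrative only).
y = [1, 1, 1, 0, 1, 0, 0, 0]
calibrated = [0.75] * 4 + [0.25] * 4
overconfident = [0.99] * 4 + [0.01] * 4

for name, p in [("calibrated", calibrated), ("overconfident", overconfident)]:
    print(name, accuracy(y, p), round(brier_score(y, p), 4), round(expected_calibration_error(y, p), 4))
```

Both models reach 0.75 accuracy. The calibrated one has a Brier score of 0.1875 and an ECE of 0, while the overconfident one scores 0.2451 and 0.24.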

What's next for Explainable AI-Based Cardiovascular Risk Assessment System

Real hospital data integration: validating on real patient populations using clinical datasets such as MIMIC-III, which is distributed through PhysioNet
ECG signal analysis — adding a 1D CNN / Transformer branch to process raw ECG waveforms as a second modality
Multi-modal fusion — combining structured labs, ECG signals, and cardiac imaging for a richer risk profile
EHR integration — deploying as a hospital-grade decision support system with HL7/FHIR compatibility
Mobile clinical interface — a lightweight bedside app for point-of-care risk assessment

Built With

CardioShieldAI is built primarily in Python, with a FastAPI backend and a React (JavaScript/HTML/CSS) frontend. The machine learning core uses LightGBM with scikit-learn for training and calibration, SHAP for explainability, and pandas and NumPy for data processing. Medical report extraction is handled via Tesseract OCR with a custom PDF parsing pipeline. Clinical reasoning is powered by a locally-run LLM through Ollama. The full project is version-controlled and hosted on GitHub.

Updates

Shankesh Raja started this project — May 21, 2026 09:34 AM EDT
