Inspiration

This project grew out of a desire to understand how early antiretroviral treatments affect immune recovery and long-term outcomes in patients with HIV/AIDS. Using data from the AIDS Clinical Trials Group (ACTG) Study 175, we explored how treatment choice, baseline immune health, and patient characteristics interact, and how data science can uncover clinically meaningful patterns in clinical trial data.


What We Built

We conducted an end-to-end analysis combining statistical methods, machine learning, and model interpretability to study treatment effectiveness and predict clinical failure.

Our workflow included:

  • Exploratory analysis of immune markers (CD4, CD8, and CD4/CD8 ratio)
  • Comparison of treatment regimens over time
  • Survival analysis using Kaplan–Meier curves
  • Predictive modeling of treatment failure risk
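
To make the survival-analysis step concrete, here is a minimal sketch of the Kaplan–Meier estimator in plain Python. The durations and event flags are hypothetical toy values, not ACTG 175 data, and the function name is illustrative. In practice a survival library such as lifelines would normally be used rather than hand-written code.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the failure event was observed, 0 if the patient was censored
    """
    failures = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # remove both failures and censored patients
    return curve


# Hypothetical toy data: follow-up time in days and event indicator.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>3}  S(t)={s:.3f}")
```

Censored patients never lower the curve directly. They only shrink the risk set, which is what separates this estimator from a naive fraction of patients still event-free.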

We trained two main models:

  • Logistic Regression as an interpretable baseline
  • Random Forest to capture non-linear relationships and feature interactions

To ensure transparency, we used SHAP values to explain which features most strongly influenced model predictions.
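
SHAP values approximate Shapley values from cooperative game theory. As a rough illustration of the idea, the sketch below computes exact Shapley values for a tiny hypothetical risk model by averaging each feature's marginal contribution over all feature orderings. The model, its coefficients, the feature names, and the single baseline patient are all assumptions made for this example. The SHAP library itself works against a background dataset and uses faster algorithms.

```python
from itertools import permutations
from math import exp


def predict_risk(x):
    """Hypothetical risk model; coefficients are illustrative, not fitted."""
    z = -0.5 - 0.004 * x["cd4"] + 0.8 * x["prior_treatment"] + 0.02 * x["age"]
    return 1 / (1 + exp(-z))


def shapley_values(model, instance, baseline):
    """Exact Shapley values relative to a single baseline input."""
    features = list(instance)
    totals = dict.fromkeys(features, 0.0)
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            # Switch one feature from its baseline value to the patient's value
            # and record how much the prediction moves.
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical patient and reference ("baseline") patient.
patient = {"cd4": 200, "prior_treatment": 1, "age": 40}
baseline = {"cd4": 350, "prior_treatment": 0, "age": 35}

contributions = shapley_values(predict_risk, patient, baseline)
for name, phi in contributions.items():
    print(f"{name:>16}: {phi:+.4f}")
```

The contributions sum to the difference between the patient's predicted risk and the baseline's, which is the property that makes per-feature attributions readable as a decomposition of a single prediction.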


Key Insights

  • Combination therapies (zidovudine + didanosine, ZDV + ddI; zidovudine + zalcitabine, ZDV + Zal) consistently outperformed monotherapies in immune recovery and survival probability.
  • CD8 levels, used here as a marker of inflammation, were easier to predict than CD4 recovery, suggesting that the two markers follow different biological dynamics.
  • Prior treatment exposure was associated with significantly higher failure risk, which is consistent with the impact of drug resistance.
  • Counterfactual analysis showed that remaining on treatment could substantially reduce predicted failure risk for high-risk patients.
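
The counterfactual comparison in the last point can be sketched as a model-based "what if": change one input, hold the rest fixed, and compare predicted risks. The logistic coefficients and feature names below are hypothetical and not fitted to ACTG 175. The result describes how the model responds, not a causal treatment effect.

```python
from math import exp

# Hypothetical coefficients for illustration only.
COEFFICIENTS = {
    "off_treatment": 1.2,    # 1 if the patient stopped treatment early
    "prior_treatment": 0.6,  # 1 if previously exposed to antiretrovirals
    "cd4_per_100": -0.3,     # baseline CD4 count divided by 100
}
INTERCEPT = -1.0


def predicted_risk(patient):
    """Predicted failure probability from the hypothetical logistic model."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in patient.items())
    return 1 / (1 + exp(-z))


def counterfactual_risk(patient, feature, new_value):
    """Predicted risk with one feature changed and all others held fixed."""
    altered = {**patient, feature: new_value}
    return predicted_risk(altered)


# Hypothetical high-risk patient who went off treatment.
patient = {"off_treatment": 1, "prior_treatment": 1, "cd4_per_100": 2.0}

observed = predicted_risk(patient)
stayed_on = counterfactual_risk(patient, "off_treatment", 0)
print(f"observed risk:       {observed:.3f}")
print(f"counterfactual risk: {stayed_on:.3f}")
print(f"absolute reduction:  {observed - stayed_on:.3f}")
```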

What We Learned

Through this project, we learned how to:

  • Handle clinical datasets responsibly while avoiding data leakage
  • Balance interpretability and performance in healthcare models
  • Evaluate models beyond accuracy, focusing on minority-class risk
  • Use explainable AI tools to turn black-box models into actionable insights
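
One common way to avoid leakage is to split the data first and derive preprocessing statistics from the training portion only. The sketch below shows that ordering with a simple standardization step. The CD4 values, split fraction, and helper names are hypothetical, and this is not necessarily the exact pipeline used in the project.

```python
import random
from statistics import mean, stdev


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn standardization parameters from the given values only."""
    return mean(values), stdev(values)


def apply_scaler(values, params):
    """Standardize values with previously learned parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical baseline CD4 counts.
cd4_counts = [120, 250, 310, 400, 180, 520, 290, 360]

train, test = train_test_split(cd4_counts)
params = fit_scaler(train)                 # statistics come from training data only
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)   # test data reuses the training statistics
print(train_scaled)
print(test_scaled)
```

Fitting the scaler on the full dataset before splitting would let information from the test patients influence the training features, which can make evaluation results look better than they are.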

Challenges

  • Class imbalance made treatment failure prediction difficult and required careful metric selection.
  • Ensuring model outputs aligned with clinical intuition was non-trivial.
  • Feature engineering had to be done cautiously to preserve validity and interpretability.
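
To illustrate why accuracy alone is misleading under class imbalance, the sketch below compares two sets of predictions with identical accuracy but very different recall on the failure class. The labels are hypothetical and the helper name is illustrative. scikit-learn provides equivalent metrics for real use.

```python
def minority_class_report(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 2 treatment failures among 10 patients.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

always_negative = [0] * 10                       # never predicts failure
some_signal = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]     # catches one of the two failures

baseline_report = minority_class_report(y_true, always_negative)
model_report = minority_class_report(y_true, some_signal)
print(baseline_report)
print(model_report)
```

Both prediction sets score the same accuracy, yet the first one never identifies a failing patient, which is why recall and F1 on the minority class are the more informative numbers here.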

Takeaway

This project demonstrates how data science can be used not just to predict outcomes, but to understand them. By combining statistical analysis, machine learning, and explainability, we show how historical clinical trial data can still inform better treatment insights and patient risk stratification.
