Inspiration

HIV/AIDS remains a critical global health challenge, and disparities in healthcare access and outcomes can place certain populations at greater risk. We were motivated by the social impact of healthcare data and the opportunity to analyze a real clinical trial dataset rather than simulated data. The AIDS Clinical Trials Group Study 175 (ACTG175) provided a meaningful setting to explore how patient demographics, immune system markers, and treatment strategies relate to health outcomes. Our goal was to use data science and statistical analysis to better understand disease progression, treatment effectiveness, and potential inequities in care.

What it does

This project explores clinical trial data from ACTG175 to analyze demographic disparities, immune system health, and treatment effectiveness among patients diagnosed with AIDS. We investigate whether outcomes differ across demographic groups, examine trends in immune markers such as CD4 and CD8 cell counts over time, and evaluate how treatments influence immune recovery. We also model changes in patient health using statistical analysis and machine learning to better understand which factors contribute most to treatment response and disease outcomes.

How we built it

We used the ACTG175 dataset from the UC Irvine Machine Learning Repository and conducted our analysis in Python. After merging the features and outcome labels into a single dataset, we cleaned the data and then ran a series of statistical tests and models on it.
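As a rough illustration of the merge-and-clean step, here is a minimal sketch using pandas. The tiny tables, their values, and the column names (`cd40`, `cd420`, `trt`, `cid`) are assumptions made for this example, and the specific cleaning rules are hypothetical rather than a record of exactly what we did.

```python
import pandas as pd

# Toy stand-ins for the feature table and the outcome labels (values are invented).
features = pd.DataFrame(
    {
        "age": [34, 45, 29, 52],
        "cd40": [422, 310, 380, 250],    # assumed name: baseline CD4 count
        "cd420": [477, 290, 410, None],  # assumed name: CD4 count at a follow-up visit
        "trt": [0, 1, 2, 3],             # assumed name: treatment arm code
    }
)
targets = pd.DataFrame({"cid": [0, 1, 0, 1]})  # assumed name: outcome label

# Features and labels are assumed to share row order, so align them on the index.
merged = pd.concat([features, targets], axis=1)

# Hypothetical cleaning: drop exact duplicates and rows missing a key measurement.
cleaned = (
    merged.drop_duplicates()
    .dropna(subset=["cd40", "cd420"])
    .reset_index(drop=True)
)

# Derived column for later analysis: change in CD4 from baseline to follow-up.
cleaned["cd4_change"] = cleaned["cd420"] - cleaned["cd40"]
```

Aligning on the index only works when both tables are in the same row order; with a shared patient identifier, a keyed `merge` would be the safer choice.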

Our exploratory data analysis included demographic subgroup comparisons, correlation analysis, and visualizations such as histograms, scatter plots, and reference-threshold charts to interpret immune health. We applied chi-square tests to examine associations between demographic variables, treatment groups, and clinical outcomes. To model treatment effectiveness, we fit a linear regression model and a Random Forest regression model, using baseline demographics, immune markers, and treatment indicators as predictors. Model performance was evaluated using R² and RMSE, and feature importance was analyzed to identify the most influential predictors.
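The testing and modeling workflow described above can be sketched roughly as follows, using SciPy and scikit-learn. This is an illustration on synthetic data, not our actual pipeline: the column names, the choice of follow-up CD4 count as the regression target, the train/test split, and the hyperparameters are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column names and effect sizes are invented.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame(
    {
        "age": rng.integers(18, 65, n),
        "gender": rng.integers(0, 2, n),
        "trt": rng.integers(0, 4, n),    # treatment arm code
        "cd40": rng.normal(350, 80, n),  # baseline CD4 count
    }
)
df["cd420"] = df["cd40"] + 15 * (df["trt"] > 0) + rng.normal(0, 40, n)
df["cid"] = (rng.random(n) < 0.25).astype(int)  # binary clinical outcome

# Chi-square test of independence between treatment arm and outcome.
table = pd.crosstab(df["trt"], df["cid"])
chi2, p_value, dof, expected = chi2_contingency(table)

# Predictors: baseline demographics, immune marker, one-hot treatment indicators.
X = pd.get_dummies(df[["age", "gender", "cd40", "trt"]], columns=["trt"], dtype=float)
y = df["cd420"]  # assumed target: follow-up CD4 count
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    scores[name] = {"r2": float(r2_score(y_test, pred)), "rmse": rmse}

# Impurity-based feature importances from the Random Forest, largest first.
importances = pd.Series(
    models["forest"].feature_importances_, index=X.columns
).sort_values(ascending=False)
```

Evaluating both models on the same held-out split keeps the R² and RMSE values directly comparable. Impurity-based importances can favor continuous predictors, so they are best read as a rough ranking rather than as effect sizes.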

Challenges we ran into

One major challenge was interpreting unfamiliar medical variables correctly based on the short descriptions provided, and ensuring our analysis aligned with their clinical meaning. Additionally, many variables showed weak or complex relationships, making it important to carefully interpret visualizations and avoid forcing relationships that are not supported by the dataset.

Accomplishments that we're proud of

We successfully transformed a complex clinical trial dataset into meaningful insights about immune health, treatment effectiveness, and patient risk factors.

What we learned

How to perform exploratory data analysis on real clinical data
How to visualize data to communicate insights
The importance of choosing appropriate statistical analysis models
How to build and evaluate machine learning models
That real-world healthcare data is complex and often resists simple explanations

What's next for ExploringAIDS

This project highlights how data science can help us better understand healthcare outcomes. Even simple analyses can reveal meaningful patterns about immune health, treatment effectiveness, and patient risk factors. With richer data and more advanced models, similar approaches could support future medical research and clinical decision-making.
