Inspiration
This project was inspired by the need to understand which socioeconomic factors contributed most to COVID-19 deaths in different communities. The pandemic revealed that health disparities, such as differences in access to healthcare, income, and pre-existing conditions, played a significant role in who was most affected. We set out to use data on these factors to identify predictors of mortality and improve responses to future health crises.
What it does
This project uses data on socioeconomic factors and COVID-19 deaths by ZIP code in Chicago to build a predictive model. It aims to identify key factors that influence death rates, helping to guide future public health responses.
How we built it
We gathered data from the City of Chicago Health and Human Services Department and the U.S. Census. Using multiple linear regression (MLR) and Lasso regression, we refined the model to focus on key predictors, improving its predictive power while addressing issues like multicollinearity.
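To illustrate why Lasso helps narrow a model down to key predictors, here is a minimal sketch on synthetic data. It is not the project's actual code: the feature names (income, uninsured, noise_feature), the penalty alpha=0.3, and the from-scratch coordinate descent solver are all assumptions made for the example. In practice a library implementation such as scikit-learn's Lasso would be the usual choice.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = soft_threshold(rho, alpha) / z
    return beta

# Synthetic stand-in for ZIP-code-level data (hypothetical features, made-up values)
rng = np.random.default_rng(0)
n = 60
income = rng.normal(size=n)
uninsured = rng.normal(size=n)
noise_feature = rng.normal(size=n)  # has no real effect on the outcome
X = standardize(np.column_stack([income, uninsured, noise_feature]))
death_rate = -1.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)
y = death_rate - death_rate.mean()

# Ordinary multiple linear regression for comparison
ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Lasso shrinks every coefficient and can set weak ones exactly to zero
lasso_beta = lasso_coordinate_descent(X, y, alpha=0.3)

print("OLS coefficients:  ", np.round(ols_beta, 3))
print("Lasso coefficients:", np.round(lasso_beta, 3))
```

The ordinary least squares fit assigns a small nonzero weight to every column, while the L1 penalty shrinks all coefficients and drops the uninformative one. That selection behavior is what makes Lasso useful when many candidate predictors overlap.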
Challenges we ran into
One significant challenge was handling missing data, particularly for ZIP code 60666 (Chicago O’Hare International Airport), which lacked complete Census data. Multicollinearity among the predictors also complicated model development and required additional diagnostics, such as the variance inflation factor (VIF), to ensure reliable results. Additionally, while the model showed reasonable predictive power overall, none of the individual predictors was statistically significant, which suggests that the factors we chose might not fully capture the complexity of COVID-19 mortality.
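The two data issues above can be sketched as follows. This is a hedged illustration on synthetic data, not the project's code: the column names (median_income, poverty_rate, pct_over_65) are hypothetical, and dropping incomplete rows is only one possible way to handle a record like the one for ZIP code 60666. The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor for each column of X."""
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)

# Synthetic predictors: poverty_rate is built to track median_income closely
rng = np.random.default_rng(1)
n = 60
median_income = rng.normal(size=n)
poverty_rate = -0.95 * median_income + rng.normal(scale=0.2, size=n)
pct_over_65 = rng.normal(size=n)
X = np.column_stack([median_income, poverty_rate, pct_over_65])

# Append one incomplete record, then keep only rows with no missing values
X_with_gap = np.vstack([X, [np.nan, np.nan, 0.5]])
complete = X_with_gap[~np.isnan(X_with_gap).any(axis=1)]

vifs = vif(complete)
for name, value in zip(["median_income", "poverty_rate", "pct_over_65"], vifs):
    print(f"{name}: VIF = {value:.2f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity. In this sketch the two correlated columns score high while the independent one stays near 1.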
Accomplishments that we're proud of
The recent COVID-19 pandemic had devastating global impacts, and understanding the factors that influenced mortality rates is essential for managing health crises and preparing for future pandemics. By analyzing data collected during this time, we can uncover patterns that may help predict outcomes in future outbreaks.
What we learned
Through this project, we learned how socioeconomic and health-related factors relate to COVID-19 mortality rates. We also explored data science techniques such as multiple linear regression (MLR) and Lasso regression, learning how to refine a model to improve prediction accuracy. This process emphasized the importance of addressing multicollinearity and avoiding overfitting in predictive models.
What's next for COVID-19 Risk Prediction by Socioeconomic Factors
Our model demonstrated moderate predictive power for COVID-19 death rates, but further analysis and refined methods are needed to uncover more significant factors. Moving forward, we aim to explore alternative modeling techniques that could offer better insights into the complex relationship between socioeconomic factors and health outcomes. This project underscores the need for health equity and highlights how data can guide policy to reduce disparities in future health crises.