Taking care of your lungs should be as easy as checking the weather.

In our health-conscious society, where information spreads easily, people have a right to know how air pollution affects respiratory health. Just as we are accustomed to checking everyday data such as the weather, the goal of this project is to make air quality data equally accessible and to inform people of the health risks in their environment and the precautions they can take.

What it does

It analyzes air pollution data together with pneumonia and influenza mortality rates to determine how the two are correlated.

How we built it

Using SAS, we modeled the relationships between mortality rates and air pollution, finding some relationships that were statistically significant and others that were not. We then explored how this data could be presented to end users.

Challenges we ran into

Immediately, our inexperience with machine learning was a problem we had to overcome. Lacking solid foundations in Python and its machine learning libraries, we experimented with TensorFlow and PyTorch. While this was a valuable experience, we realized that in the interest of time it made more sense to focus on linear regression and on the correlations between pollutants and respiratory diseases, since these concepts are the fundamentals on which a machine learning algorithm could be built.

Accomplishments that we're proud of

We were able to use a large amount of data in our project: over 100,000 data points went into the linear models we produced, which genuinely surprised us.

What we learned

While completing this project, we learned that not every analysis produces the results intuition suggests. When we modeled CO levels against mortality rates, we found no statistically significant correlation between the two. This discovery highlighted how important it is to present air pollutant information in layman's terms. Although all the information we used, and intend to use going forward, is publicly available, it does not exist in an accessible form.
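One possible way to put readings in layman's terms, assumed here rather than taken from the project, is to map an Air Quality Index (AQI) value onto the plain-language categories published by the U.S. EPA. The function name `describe_aqi` is hypothetical; the breakpoints are the EPA's category boundaries.

```python
# Upper bound (inclusive) of each EPA AQI category; anything above 300 is Hazardous.
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def describe_aqi(aqi):
    """Translate a numeric AQI value into its plain-language EPA category."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_CATEGORIES:
        if aqi <= upper:
            return label
    return "Hazardous"


print(describe_aqi(42))   # Good
print(describe_aqi(175))  # Unhealthy
```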

What's next for Respiratory Disease and Air Quality Alert Generation

One of the major successes of our project is that we collected over 100,000 data points and used linear regression to determine the correlation between pollution and respiratory disease. Building on this, we plan to add far more data points covering other factors, such as cases of other respiratory illnesses like asthma, as well as more data on pollution levels. We intend to continue the project by using the linear regressions we generated as a basis for machine learning that forecasts dangerous air pollutant levels and finds connections in big data that humans might not readily notice. We would then push that information out to a health-conscious audience so they are aware of the risks hidden in life's simplest pleasure: breathing.

Built With

  • sas

Updates


The program downloaded CSV datasets for offline analysis and determined which datasets might be correlated with each other. The data was then analyzed in SAS to determine the correlations between these values. A GUI is being built to display some of the graphs produced by the SAS programs. Future plans include more datasets covering other measurements, such as different molecules and weather conditions including temperature and wind. There are also plans to use these linear regression graphs for machine learning.
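Before anything can be correlated, the pollution and mortality files have to be lined up record by record. A minimal Python sketch of that join follows. The column names (`county`, `year`, `pm25`, `rate`) and the sample rows are hypothetical, and inline strings stand in for the downloaded CSV files; with real files, pass `open(path, newline="")` to `csv.DictReader` instead.

```python
import csv
import io

# Hypothetical sample data standing in for downloaded CSV files.
POLLUTION_CSV = """county,year,pm25
Alpha,2016,9.5
Alpha,2017,10.1
Beta,2016,12.3
"""
MORTALITY_CSV = """county,year,rate
Alpha,2016,14.8
Beta,2016,17.2
Beta,2017,16.9
"""


def load(csv_text, value_column):
    """Map each (county, year) key to the numeric value in value_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["county"], row["year"]): float(row[value_column]) for row in reader}


def pair_datasets(pollution_text, mortality_text):
    """Inner-join the two datasets on (county, year), keeping only shared keys."""
    pollution = load(pollution_text, "pm25")
    mortality = load(mortality_text, "rate")
    shared = sorted(pollution.keys() & mortality.keys())
    return [(key, pollution[key], mortality[key]) for key in shared]


pairs = pair_datasets(POLLUTION_CSV, MORTALITY_CSV)
print(pairs)
# [(('Alpha', '2016'), 9.5, 14.8), (('Beta', '2016'), 12.3, 17.2)]
```

Rows present in only one file (here Alpha 2017 and Beta 2017) are dropped, so every remaining pair can feed directly into a correlation or regression.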
