Suppose you work as a healthcare policy maker and you want to pass laws that reduce the number of people who develop diseases such as diabetes or cancer. To decide which policy changes to implement, you would like an estimate of how each change affects the risk of these diseases in the general population.

What it does

We offer an interactive data map of the US on which policy makers can test how proposed policy changes reduce the risk of selected diseases. For example, a policy maker can interactively see how the risk of diabetes decreases when people exercise more or stop smoking.

How we built it

The CDC dataset contains hundreds of features, so we started with a feature selection step. We then trained a Naive Bayes classifier on the preprocessed dataset to predict how a change in policy (e.g., a smoking ban) affects the target variables.
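To illustrate the idea, here is a minimal sketch of a Bernoulli Naive Bayes classifier used to compare a baseline population against a "smoking ban" scenario. It is not our actual pipeline: the toy records, the feature names (smoker, exercises, has_diabetes), and the helper functions are all made up for illustration, and the feature selection step is left out.

```python
# Hypothetical toy records: (smoker, exercises, has_diabetes). Not CDC data.
ROWS = [
    (1, 0, 1), (1, 0, 1), (1, 1, 0), (1, 0, 1),
    (0, 1, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0),
]


def fit_naive_bayes(rows, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing (alpha)."""
    n_features = len(rows[0]) - 1
    class_counts = {0: 0, 1: 0}
    # ones[c][j] = number of rows of class c in which feature j equals 1
    ones = {0: [0] * n_features, 1: [0] * n_features}
    for *features, label in rows:
        class_counts[label] += 1
        for j, value in enumerate(features):
            ones[label][j] += value
    priors = {c: class_counts[c] / len(rows) for c in (0, 1)}
    likelihoods = {
        c: [(ones[c][j] + alpha) / (class_counts[c] + 2 * alpha)
            for j in range(n_features)]
        for c in (0, 1)
    }
    return priors, likelihoods


def predict_risk(model, features):
    """Return P(disease = 1 | features) under the fitted model."""
    priors, likelihoods = model
    scores = {}
    for c in (0, 1):
        score = priors[c]
        for j, value in enumerate(features):
            p = likelihoods[c][j]
            score *= p if value == 1 else 1.0 - p
        scores[c] = score
    return scores[1] / (scores[0] + scores[1])


def average_risk(model, population):
    """Mean predicted risk over a list of feature tuples."""
    return sum(predict_risk(model, f) for f in population) / len(population)


model = fit_naive_bayes(ROWS)
population = [row[:2] for row in ROWS]
baseline = average_risk(model, population)

# Simulated policy: everyone becomes a non-smoker, other features unchanged.
no_smoking = average_risk(model, [(0, exercises) for _, exercises in population])

print(f"baseline risk: {baseline:.3f}, smoking ban: {no_smoking:.3f}")
```

The comparison only changes the smoking feature and re-scores the same population, so the difference between the two averages reflects the association the model learned rather than a measured effect of the policy.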

Challenges we ran into

From an accessibility point of view, we needed to make sure that our tool can be used by people who are not experts in Machine Learning. From a more technical point of view, we had to deal with a large amount of missing data in the CDC dataset.
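One common way to handle missing survey answers can be sketched as follows. This is an assumed strategy shown for illustration, not necessarily the one we used: drop columns where more than half of the values are missing, then fill the remaining gaps with the most frequent value. The column names, the 0.5 threshold, and the `clean` helper are hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for an unanswered survey question


def missing_fraction(rows, column):
    """Fraction of rows in which `column` has no value."""
    return sum(row[column] is MISSING for row in rows) / len(rows)


def clean(rows, columns, max_missing=0.5):
    """Drop sparse columns, then impute remaining gaps with the column mode."""
    kept = [c for c in columns if missing_fraction(rows, c) <= max_missing]
    modes = {}
    for c in kept:
        observed = [row[c] for row in rows if row[c] is not MISSING]
        modes[c] = Counter(observed).most_common(1)[0][0]
    cleaned = [
        {c: modes[c] if row[c] is MISSING else row[c] for c in kept}
        for row in rows
    ]
    return kept, cleaned


# Hypothetical toy records, not CDC data.
rows = [
    {"smoker": 1, "exercise": None, "income": None},
    {"smoker": 0, "exercise": 1, "income": None},
    {"smoker": None, "exercise": 1, "income": None},
    {"smoker": 0, "exercise": 0, "income": 3},
]
kept, cleaned = clean(rows, ["smoker", "exercise", "income"])
print(kept)
print(cleaned)
```

Here the "income" column is dropped because three of its four values are missing, while the single gaps in "smoker" and "exercise" are filled with the most common answer in each column.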

Accomplishments that we're proud of

We believe that our tool is highly accessible to non-experts in Machine Learning, which matters because policy makers are not usually trained in Machine Learning. We also believe that using Machine Learning to "test laws" before they are implemented in reality is an interesting direction.

What we learned

Predicting how policy changes would affect their target variables turned out to be an interesting problem. Working with health data is exciting and can help reduce health care costs and improve the lives of millions of people.

What's next for 2017 CDC Survey Data Interactive Map

To further improve our predictions, we could use more sophisticated classifiers than the ones we used. Furthermore, it would be interesting to make the policy changes more fine-grained: right now we assume that smoking is forbidden completely, whereas it would be more realistic to consider the case in which the fraction of smokers is reduced by, say, 30%.
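A fine-grained scenario like this could be simulated along the following lines. This is only a sketch of a possible next step: the `reduce_smokers` helper, the record layout, and the fixed risk values in `toy_risk` are hypothetical placeholders standing in for a trained classifier.

```python
import random


def reduce_smokers(population, fraction, seed=0):
    """Return a copy in which `fraction` of current smokers become non-smokers."""
    rng = random.Random(seed)  # seeded so the scenario is reproducible
    smoker_idx = [i for i, person in enumerate(population) if person["smoker"] == 1]
    n_quit = round(fraction * len(smoker_idx))
    quitters = set(rng.sample(smoker_idx, n_quit))
    return [
        {**person, "smoker": 0} if i in quitters else dict(person)
        for i, person in enumerate(population)
    ]


def toy_risk(person):
    """Placeholder risk model with made-up values; a real one would be learned."""
    return 0.30 if person["smoker"] else 0.10


def average_risk(population):
    return sum(toy_risk(p) for p in population) / len(population)


# Hypothetical population: 10 smokers and 10 non-smokers.
population = [{"smoker": 1} for _ in range(10)] + [{"smoker": 0} for _ in range(10)]
scenario = reduce_smokers(population, 0.30)

before = average_risk(population)
after = average_risk(scenario)
print(f"average risk before: {before:.3f}, after 30% fewer smokers: {after:.3f}")
```

Choosing the people who quit at random is itself an assumption; a real policy might reach some groups more than others, which could be modeled by sampling with weights.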
