Diseases are correlated: some links you can easily think of, some you cannot! Arthritis, for example, was recently found to increase the risk of diabetes. Such relationships can be too obscure to be found by intuition, even for researchers and medical practitioners.

What it does

We combine a few data sources to find which disease and lifestyle-habit correlations are being "ignored" or understudied by researchers. The BRFSS clinical survey data gives us insight into which diseases are correlated; PubMed and Google News serve as good sources for gauging research and news coverage.

Tech Specification

Please see our DEMO at

How we built it

We used numpy, sklearn, and pandas to derive the insights, and d3.js and viz for the visualization.
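As a rough illustration of the idea, here is a minimal pandas sketch that compares how strongly two conditions are correlated in survey data with how much coverage the pair receives. Everything in it is an assumption made for the example: the toy survey columns, the article counts, the function name `understudied_pairs`, and the percentile-rank discrepancy score are hypothetical and are not the project's actual data or metric.

```python
import pandas as pd

# Hypothetical toy survey: one row per respondent, 1 = condition reported.
# Real BRFSS data has far more rows and differently named columns.
survey = pd.DataFrame({
    "arthritis": [1, 1, 0, 0, 1, 0, 1, 0],
    "diabetes":  [1, 1, 0, 0, 1, 0, 0, 0],
    "asthma":    [0, 1, 0, 1, 0, 1, 0, 1],
})

# Hypothetical article counts per condition pair (made-up numbers standing in
# for PubMed or Google News search results).
coverage = {
    ("arthritis", "diabetes"): 12,
    ("arthritis", "asthma"): 300,
    ("asthma", "diabetes"): 450,
}


def understudied_pairs(survey: pd.DataFrame, coverage: dict) -> pd.DataFrame:
    """Rank condition pairs by (correlation rank - coverage rank)."""
    # Pearson correlation between the 0/1 columns.
    corr = survey.corr()
    rows = [
        {"pair": f"{a} / {b}", "abs_corr": abs(corr.loc[a, b]), "articles": n}
        for (a, b), n in coverage.items()
    ]
    out = pd.DataFrame(rows)
    # Assumed metric: a pair scores high when its correlation ranks high
    # among all pairs but its coverage ranks low.
    out["discrepancy"] = (
        out["abs_corr"].rank(pct=True) - out["articles"].rank(pct=True)
    )
    return out.sort_values("discrepancy", ascending=False).reset_index(drop=True)


result = understudied_pairs(survey, coverage)
print(result)
```

Pairs at the top of the resulting table are strongly correlated in the survey but thinly covered, which is the kind of "missing" link the project looks for. Percentile ranks are used here only because correlations and article counts live on very different scales; other normalizations would work too.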

Challenges we ran into

The BRFSS clinical survey is SO COMPLICATED (yet so powerful)!

Accomplishments that we're proud of

  • We integrated multiple data sources to produce unique, meaningful insights!
  • We built beautiful, visually striking visualizations to present our findings!
  • We got so many things done in such a short time!
  • We worked as a team!

What we learned

  • Don't eat too many sweets, says the dataset.
  • A dataset can always go beyond itself!

What's next for What's Missing?

  • Grow in size and diversity beyond the current 450k-row BRFSS data
  • Leverage NLP to extract semantic-level correlations from Google News and PubMed
  • Develop a more sophisticated metric for measuring the discrepancy between survey correlations and coverage
