We set out to track the spread of disease through tweets and their associated geolocations, aiming to study how diseases travel among populations and give society advance warning of incoming epidemics.
At some point, we realized that if Google had attempted a similar project only to scrap it, we were unlikely to succeed with our inferior machine learning and data. The data we collected, however, is still valuable for understanding the needs of society, so we decided to map it against other useful statistics to help social infrastructure such as hospitals select socially optimal expansion points.
What it does
As a purely data-visualization app, ours scrapes socioeconomic and disease-related information from online sources such as Wolfram Alpha and Twitter. For Twitter, the app uses Gnip to fetch a real-time stream of tweets matching keywords that indicate disease, such as coughing, sneezing, or fever. For Wolfram Alpha, the app draws on Wolfram's vast database to collect information on every city in the nation and compares it with Wolfram's data on hospitals. After collecting this information, the app plots it in a concise, informative, and interactive interface that researchers and hospital administrators can use to study the data and plan for the future.
How we built it
We built a Node/Express backend that fetches streams of tweets containing keywords pertaining to sickness, and a Python backend that queries Wolfram Alpha for information on hospitals and cities and parses it for socioeconomic data. The maps were built with Leaflet.js, and the charts with D3.js.
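The keyword filter at the heart of the tweet pipeline can be sketched roughly like this; the keyword list, helper names, and tweet shape here are illustrative assumptions, not our exact production values:

```python
# Illustrative sketch of keyword-based tweet filtering.
# SICKNESS_KEYWORDS and the tweet dict shape are assumptions for this example.

SICKNESS_KEYWORDS = {"cough", "coughing", "sneeze", "sneezing", "fever", "flu", "sick"}

def is_sickness_tweet(text):
    """Return True if the tweet text contains any sickness keyword."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    return not words.isdisjoint(SICKNESS_KEYWORDS)

def filter_stream(tweets):
    """Keep only tweets (dicts with 'text' and 'geo') that match a keyword."""
    return [t for t in tweets if is_sickness_tweet(t["text"])]

sample = [
    {"text": "Ugh, this fever is the worst", "geo": (40.7, -74.0)},
    {"text": "Beautiful day in the park", "geo": (34.0, -118.2)},
]
print(len(filter_stream(sample)))  # prints 1
```

In the real app this filtering runs over the live Gnip stream rather than a static list, and the surviving geotagged tweets are what the Leaflet map plots.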
Challenges we ran into
- We ran into a number of situations where the libraries we were working with did not cooperate, whether through poor documentation, lack of language support, or restrictions on queries.
- This was our first time deploying to Azure, and we didn't quite know what we were doing.
Accomplishments that we are proud of
Successfully collecting massive amounts of data from various databases.
What we learned
- How to work with Azure.
- How to work with Wolfram Alpha and Twitter.
- How to visualize data elegantly.
What's next
- Using AlchemyAPI or another NLP library to determine whether or not a tweet is truly pertaining to sickness
- Using a neural net to predict future patterns of disease spread and to determine optimal locations for social infrastructure
- Normalizing Twitter data based on overall Twitter traffic
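The normalization idea above can be sketched as follows; the city names and counts are made-up illustration values, and the function name is our own for this example:

```python
# Sketch of normalizing sickness-tweet counts by overall Twitter traffic,
# so high-volume cities don't look artificially sick. Values are made up.

def sickness_rate(sick_counts, total_counts):
    """Per-city fraction of tweets that match sickness keywords."""
    return {
        city: sick_counts.get(city, 0) / total
        for city, total in total_counts.items()
        if total > 0
    }

sick = {"Boston": 120, "Austin": 30}
total = {"Boston": 60000, "Austin": 10000}
rates = sickness_rate(sick, total)
# Austin's raw count is lower, but its normalized rate (0.003) is higher
# than Boston's (0.002), which is the signal we actually care about.
```

Without this step, raw counts would mostly mirror population and tweet volume rather than disease prevalence.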