Inspiration
I've always been interested in political science, and this project grew from that interest. One day I was looking at the results of the Democratic primary so far and began to wonder how the race would shape up over the next few weeks. I wanted a quick and simple way to measure public opinion, and I realized I might be able to do this myself by analyzing Twitter data. I'm also deeply interested in natural language processing, so doing this as a personal project let me delve into a field I care about and create something I think is cool and would personally want to use.
What it does
Mine The Issues is an app that lets users customize a Twitter query through simple text prompts, performs a search using the Streaming API with those custom parameters, and analyzes the data to provide insight into what people are saying about whatever topic the user chooses.
How I built it
The app was built in Python. I used several libraries and APIs to create this project: the Tweepy library for the Twitter Streaming API, Yahoo PlaceFinder to add geolocation functionality to the app, and TextBlob and NLTK to perform data mining, tokenization, and natural language processing on the data set.
Challenges I ran into
One of my biggest challenges was that Twitter's Streaming API does not support filtering by both location and search terms at the same time (it returns tweets that match either filter rather than both), which I really wanted my users to be able to do. I was able to hack my way around this by writing custom functions to filter my data by location, which required the use of Yahoo's PlaceFinder API. Another major problem was optimizing my sentiment analysis algorithm to classify results as accurately as possible. For this I used several techniques to eliminate "low information" features (essentially noise), which increased the accuracy of my classifier by over 10% to around 93%.
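To illustrate the location workaround described above, here is a minimal sketch of filtering on the client side so that a tweet must match both a search term and a location. It is not the project's actual code. The tweet shape (a dict with "text" and a "coordinates" pair of longitude and latitude), the bounding-box format, and every function name are assumptions made for this example; real Twitter payloads are more deeply nested, and the box for a place would have to come from a geocoding service such as PlaceFinder.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple

# Hypothetical bounding box format: (west_lon, south_lat, east_lon, north_lat).
BoundingBox = Tuple[float, float, float, float]


def in_bounding_box(lon: float, lat: float, box: BoundingBox) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def matches_terms(text: str, terms: Sequence[str]) -> bool:
    """Return True if any search term appears in the text, ignoring case."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def filter_tweets(
    tweets: Iterable[Dict], terms: Sequence[str], box: BoundingBox
) -> Iterator[Dict]:
    """Yield only tweets that match a term AND fall inside the box."""
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if coords is None:
            # Tweets without geodata cannot be placed, so they are skipped.
            continue
        lon, lat = coords
        if matches_terms(tweet["text"], terms) and in_bounding_box(lon, lat, box):
            yield tweet
```

In a setup like this, the stream would be opened with only one of the two filters (for example the search terms), and each incoming tweet would be passed through `filter_tweets` before analysis.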
What I learned
First and foremost, I learned Python. I had never used Python before this week, so the fact that I was able to complete a pretty major project in the language is really exciting for me. I also learned how to use various APIs and libraries, including PlaceFinder, Tweepy, and TextBlob. What I was most excited to learn, however, was natural language processing. Implementing the sentiment analysis tools I used in this program required significant research into the theory of natural language processing (e.g., tokenization, n-grams, co-occurrences, information gain, and feature values), something I plan to continue in the future.
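As a rough illustration of the information gain idea mentioned above, the sketch below scores each word by how much knowing whether it is present reduces uncertainty about a document's label, then keeps the highest-scoring words. It is an assumption-laden example and not the project's classifier: the input shape (a list of token-set and label pairs) and the function names are invented here, and only the standard library is used.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical document shape: (set of tokens, sentiment label).
Doc = Tuple[Set[str], str]


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (in bits) of a label distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs: List[Doc], word: str) -> float:
    """Label entropy minus the entropy left after splitting on the word."""
    base = entropy(Counter(label for _, label in docs).values())
    present = Counter(label for tokens, label in docs if word in tokens)
    absent = Counter(label for tokens, label in docs if word not in tokens)
    remaining = sum(
        (sum(part.values()) / len(docs)) * entropy(part.values())
        for part in (present, absent)
    )
    return base - remaining


def top_features(docs: List[Doc], k: int) -> List[str]:
    """Return the k words with the highest information gain."""
    vocabulary = set().union(*(tokens for tokens, _ in docs))
    # Sort by descending gain, breaking ties alphabetically for stability.
    ranked = sorted(vocabulary, key=lambda w: (-information_gain(docs, w), w))
    return ranked[:k]
```

Words that appear equally often under every label score close to zero and behave as noise, so dropping them leaves the classifier with only the features that help separate the classes.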
What's next for Mine The Issues
I want to do several things with Mine The Issues: 1) Expand it to include more social media services (Facebook, Google+, Instagram, etc.) 2) Include image mining 3) Move pieces to the cloud (storage in AWS DynamoDB, querying with AWS Elasticsearch, data visualization with Kibana) 4) Optimize the streaming and sentiment analysis algorithms for accuracy and speed 5) Create a GUI (the plan right now is to use Node.js) and make this a fully functional web application, with a mobile version to come afterwards