We were drawn to the idea of gauging market and general sentiment on specific topics (such as stocks) by analyzing massive public data sources like Twitter and Glassdoor. We had only elementary ML and Python experience, and we thought this hackathon would be a great opportunity to tackle such a hard problem.
We started with a full team of enthusiasts, but our numbers dropped quickly as we went on. In the end it was just three of us coding (and reading up on things) and one guy trying to work out the math. Finding datasets we could work with proved to be one of the biggest challenges, especially since Glassdoor did not offer an API for streaming reviews. We had to write a crawler in Python to extract the information page by page (super nightmare!). We also managed to plug into Twitter to get real-time tweets on topics ranging from Microsoft and Google all the way to Donald Trump! Humans do express "sentiment" in a remarkably creative number of ways! We ultimately managed to put together a set of scripts to:
- Pull data from sources like Glassdoor and Twitter.
- Train our classifier on a sample dataset extracted in the first step.
- Run a simple naive Bayes classification algorithm on the larger dataset.
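The train-then-classify steps above can be sketched roughly as follows. This is a minimal illustration of a multinomial naive Bayes classifier with add-one smoothing, not our actual hackathon scripts: the `tokenize` helper, the `NaiveBayes` class, the "positive"/"negative" labels, and the four hand-written sample texts are all assumptions made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # every word seen in training

    def train(self, samples):
        """samples is an iterable of (text, label) pairs."""
        for text, label in samples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sample standing in for the labelled reviews and tweets.
training_sample = [
    ("great place to work, love the team", "positive"),
    ("terrible management and awful hours", "negative"),
    ("love the product, great support", "positive"),
    ("awful experience, terrible service", "negative"),
]

clf = NaiveBayes()
clf.train(training_sample)
print(clf.predict("great team and great support"))
print(clf.predict("awful hours"))
```

Working in log-probabilities avoids numeric underflow when many small word probabilities are multiplied together, which matters once the texts get longer than a tweet.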
With some improvement, we believe this project could be useful in many academic and business settings: learning the key concepts of a new topic, getting new perspectives on a subject, and assessing customer responses to a product, just to name a few. This project proved to be a great learning experience for us, and we plan to keep adding incremental enhancements as we learn more!