Advances in natural language processing techniques and the rise of Web 2.0 have made sentiment analysis a popular area of study. We wanted to see how sentiment analysis can be applied to stock trading to give casual traders automated, timely information.

What it does

Hindsight lets users view the sentiment of news articles about a company over a chosen date range and see how that sentiment changes over time. Users can then check whether the way others write about the company correlates with the value of its stock.
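
One way to put a number on "some correlation" is a Pearson coefficient between a daily sentiment series and the daily closing price. This is a minimal sketch and not necessarily what Hindsight computes; the function name `pearson` and the sample values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical values aligned by date (illustrative only, not real data).
daily_sentiment = [0.42, 0.55, 0.61, 0.48, 0.70]
closing_price = [101.2, 103.5, 104.1, 102.0, 106.3]
print(round(pearson(daily_sentiment, closing_price), 3))
```

A value near 1 or -1 would suggest the two series move together or in opposition over the chosen interval; it says nothing about which one leads the other.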

How we built it

We used the NASDAQ API provided for Cal Hacks 3.0 to obtain historical stock prices for the companies we were interested in. Since the API is rate-limited and can be slow, we parsed the XML data it returned in Python and stored the results in a PostgreSQL database.
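
The parse-once-and-cache step could look roughly like the sketch below. The XML layout, the table name `stock_price`, and the sample values are all assumptions, since the real API response is not shown here. To stay self-contained it uses the standard library's `sqlite3` as a stand-in for PostgreSQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML shape and made-up values; the real API response may differ.
SAMPLE_XML = """
<prices symbol="XYZ">
  <price date="2016-01-04" close="100.0"/>
  <price date="2016-01-05" close="101.5"/>
</prices>
""".strip()


def parse_prices(xml_text):
    """Turn the XML document into (symbol, day, close) rows."""
    root = ET.fromstring(xml_text)
    symbol = root.get("symbol")
    return [
        (symbol, price.get("date"), float(price.get("close")))
        for price in root.iter("price")
    ]


def store_prices(conn, rows):
    """Cache rows locally so the rate-limited API is only hit once per day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock_price ("
        "symbol TEXT, day TEXT, close REAL, PRIMARY KEY (symbol, day))"
    )
    # Re-inserting the same (symbol, day) overwrites rather than duplicates
    conn.executemany("INSERT OR REPLACE INTO stock_price VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
store_prices(conn, parse_prices(SAMPLE_XML))
print(conn.execute("SELECT * FROM stock_price ORDER BY day").fetchall())
```

Keying the table on symbol and day means a repeated fetch for the same range is harmless, which helps when retrying after a rate-limit error.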

To obtain historical news articles, we scraped Google News for links related to the companies we were interested in, including articles from years ago. We used tools such as Selenium and Beautiful Soup for this. We then stored the text from these pages in Elasticsearch to take advantage of its full-text search capabilities.
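
The link-collection step can be sketched as follows. To keep the example dependency-free it uses the standard library's `html.parser` in place of Beautiful Soup, and it works on a made-up HTML snippet instead of a live results page; `LinkCollector`, `extract_links`, and the `news.example.com` URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/articles/1"
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    # Drop duplicates while preserving first-seen order
    return list(dict.fromkeys(collector.links))


# Made-up markup for illustration only.
SAMPLE_HTML = """
<div><a href="/articles/1">First</a>
<a href="https://news.example.com/articles/2">Second</a>
<a href="/articles/1">First again</a><a>no href</a></div>
"""
print(extract_links(SAMPLE_HTML, "https://news.example.com"))
```

In a setup like the one described, the HTML would come from a browser session driven by Selenium, and each collected link would then be fetched and its text indexed.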

To obtain sentiment scores, we utilized the Microsoft Cognitive Services Text Analytics API.
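
Scoring a batch of articles mostly comes down to shaping a request body and reading the scores back. The sketch below assumes the JSON shapes of the Text Analytics sentiment endpoint from around that time: a `documents` list of `id`, `language`, and `text` going in, and `id` and `score` coming back. Check these against the current API reference before relying on them. The helper names and sample response are hypothetical, and no network call is made.

```python
import json


def build_sentiment_request(articles, language="en"):
    """Build a request body from a mapping of article id -> article text."""
    return {
        "documents": [
            {"id": str(article_id), "language": language, "text": text}
            for article_id, text in articles.items()
        ]
    }


def parse_sentiment_response(body):
    """Return {article id: score} from a JSON response body (assumed shape)."""
    payload = json.loads(body)
    return {doc["id"]: doc["score"] for doc in payload.get("documents", [])}


# Made-up articles and a made-up response, for illustration only.
articles = {1: "The company posted strong results.", 2: "Shares fell sharply."}
request_body = build_sentiment_request(articles)
sample_response = '{"documents": [{"id": "1", "score": 0.9}, {"id": "2", "score": 0.1}]}'
print(json.dumps(request_body, indent=2))
print(parse_sentiment_response(sample_response))
```

Keeping the article id in each document makes it straightforward to join scores back to the stored article and its publication date.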

Users access our service through a single-page web application served by Flask and built with modern front-end libraries including React, Redux, and Immutable.js.

Challenges we ran into

We had significant trouble getting the historical news articles. At first we used the Event Registry service, but we found its results had low relevance for some companies (e.g. an article that mentioned the word "Apple" once in an unrelated context was treated as being about Apple). We decided to use Google News instead, since its website lets us browse historical news articles and its results are highly relevant. However, Google had shut down its News API, so scraping a small subset of content was the only choice left to us.

Accomplishments that we're proud of

We built a working web app that touches a broad spectrum of technologies in less than two days. The problem is quite challenging: it involved wrangling lots of data, dealing with heavy rate limiting, and working with many unfamiliar frameworks.

What we learned

We learned to use the Redux framework in conjunction with React and learned about the capabilities of Microsoft Cognitive Services.

What's next for Hindsight

We started working on streaming tweets related to particular companies using Kafka. We want to explore using a frequently updated source like Twitter to provide possible signals for how the market is doing now and in the near future.
