While walking through the beautiful UCLA campus, we brainstormed how to combine the machine learning APIs from Google Cloud with the wealth of financial data provided by BlackRock's API. To investigate the effect of media on public opinion, the most straightforward approach seemed to be looking at the effect of news articles on stock prices.
What it does
A web scraper collects an average of 20 articles a day for the past 30 days. Each article is passed through the entity sentiment analysis of the Google NLP API to find the salience of the relevant entity, and through the sentiment analysis to find the overall document sentiment. After calculating a value for several articles, we take their mean to predict whether a stock will rise or fall. We then compare the prediction against the stock data provided by BlackRock to score it.
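The averaging and scoring step can be sketched roughly as follows. This is a minimal illustration and not our exact code: the function names are hypothetical, the rule "positive mean means rise" is an assumed reading of the mean measure, and the open and close prices stand in for whatever fields the BlackRock data supplies.

```python
from statistics import mean
from typing import List, Optional


def predict_direction(scores: List[float]) -> Optional[str]:
    """Predict 'rise' or 'fall' from the mean of per-article sentiment scores.

    Returns None when there are no scores or the mean is exactly zero.
    Assumption: a positive mean is read as 'rise', a negative one as 'fall'.
    """
    if not scores:
        return None
    avg = mean(scores)
    if avg > 0:
        return "rise"
    if avg < 0:
        return "fall"
    return None


def actual_direction(open_price: float, close_price: float) -> Optional[str]:
    """Label the observed price move over the same period."""
    if close_price > open_price:
        return "rise"
    if close_price < open_price:
        return "fall"
    return None


def prediction_score(predicted: List[Optional[str]],
                     actual: List[Optional[str]]) -> float:
    """Fraction of periods where the prediction matched the observed move.

    Periods with no prediction or no price movement are skipped.
    """
    pairs = [(p, a) for p, a in zip(predicted, actual)
             if p is not None and a is not None]
    if not pairs:
        return 0.0
    hits = sum(1 for p, a in pairs if p == a)
    return hits / len(pairs)
```

For example, `predict_direction([0.4, -0.1, 0.3])` gives `"rise"` because the mean of the three scores is positive.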
How we built it
A web scraper written in Python pulled articles from the Google News API. The news articles were then passed to the entity sentiment analysis to check the salience of the relevant keyword (for example, if the company under consideration is 'Apple', then the keyword with the highest salience should be 'Apple'). If the salience was high enough, we checked the sentiment analysis to see whether the article was positive or negative. The result was then compared against data from BlackRock's API.
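The salience check described above might look something like the sketch below. It assumes the entity names, salience values (0.0 to 1.0) and document sentiment score (-1.0 to 1.0) have already been pulled out of the Google NLP API response. The `Entity` and `ArticleAnalysis` containers, the `article_signal` function and the 0.3 threshold are all hypothetical placeholders rather than values we actually used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    name: str
    salience: float  # 0.0 to 1.0, as reported by entity analysis


@dataclass
class ArticleAnalysis:
    entities: List[Entity]
    document_score: float  # -1.0 (negative) to 1.0 (positive)


# Hypothetical cut-off; a real value would need tuning.
SALIENCE_THRESHOLD = 0.3


def article_signal(analysis: ArticleAnalysis, company: str,
                   threshold: float = SALIENCE_THRESHOLD) -> Optional[float]:
    """Return the document sentiment if the article is really about `company`.

    The article counts only when the most salient entity matches the company
    name and its salience reaches the threshold; otherwise return None.
    """
    if not analysis.entities:
        return None
    top = max(analysis.entities, key=lambda e: e.salience)
    if top.name.lower() != company.lower() or top.salience < threshold:
        return None
    return analysis.document_score
```

Articles that return `None` are simply dropped before averaging, so a story that only mentions the company in passing does not influence the prediction.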
Challenges we ran into
The project has many moving parts. Our five members started as two groups of two and one individual who then came together to form a single team, so building team chemistry and trust was an underlying hurdle.
Figuring out the different parts of the problem, dividing them up, and eventually combining the pieces turned out to be a monumental task. We ran into issues finding a proper algorithm for salience and sentiment analysis (we still feel it can be improved). One issue outside our control was the internet trouble the venue was having due to its size.
Other issues included finding a proper way to visualize the data, finding relevant news articles, building better scrapers, improving the quality of news sources, and finding a reliable metric for prediction.
Accomplishments that we're proud of
At the beginning of the hackathon, we could have chosen a far simpler project, but in the end we are proud that we chose this idea and completed as much of it as we could: an exercise in challenging the unknown.
What we learned
A lot, in summary: the Google API for natural language processing, BlackRock's API for financial analysis, HTTP requests through Python, web scraping and, most importantly, keeping a calm head through problematic situations.
What's next for Senti_Stock
Figuring out a proper algorithm for the values provided by the Google News API, finding better machine learning models, and building a better front end to visualize our data.