Stock price prediction is a hot topic in today's financial intelligence world, but it has never been easy. In the short term, there is evidence that momentum exists in stock prices, so technical analysis, which uses past price data to predict future prices, has some merit. Over the long term, however, price trends become elusive. We therefore introduce external inputs into the prediction to improve its accuracy. Market news, a standard source of information for equity research professionals, was a natural choice for us.

What it does

Instead of analyzing market news manually, our project quantifies features of market news so that they can be used as inputs to our prediction model. With these external inputs, our predicted stock prices are closer to actual prices than those predicted from past prices alone.
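The page does not say which news features were extracted. As one hedged illustration of turning news into numbers, a simple lexicon-based score maps each headline to a value in [-1, 1] and averages the scores per trading day. The word lists and the functions `sentiment_score` and `daily_feature` are hypothetical, not the project's actual features.

```python
import re

# Hypothetical word lists, for illustration only; the project's real
# feature definitions are not described on this page.
POSITIVE = {"beat", "growth", "upgrade", "profit", "surge", "strong"}
NEGATIVE = {"miss", "loss", "downgrade", "lawsuit", "plunge", "weak"}


def sentiment_score(headline: str) -> float:
    """(positive - negative) / matched words, in [-1, 1]; 0.0 if no word matches."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_feature(headlines: list[str]) -> float:
    """Average headline score for one trading day (0.0 when there is no news)."""
    if not headlines:
        return 0.0
    return sum(sentiment_score(h) for h in headlines) / len(headlines)
```

A real pipeline would likely use richer features (several scores per day, per-stock filtering), but each day still ends up as a numeric vector that the prediction model can consume.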

How we built it

  • Stage 1: Data cleaning and normalization
  • Stage 2: Model identification: identifying a linear dynamic ARX model and determining the system order by goodness of fit
  • Stage 3: Model validation
  • Stage 4: Visualization
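Stages 2 and 3 can be sketched as follows, assuming a single stock's daily prices `y` (1-D array) and a matrix of daily news features `u` (shape `(T, m)`), with news from day t-1 and earlier used to predict the price on day t. The model is fit by ordinary least squares, and the orders `na` (price lags) and `nb` (news lags) are chosen by one-step-ahead RMSE on a chronological hold-out split. All function names, the 0.7 train fraction and the lag limits are assumptions for illustration, not the project's code.

```python
import numpy as np


def build_regressors(y, u, na, nb, start):
    """Row for day t holds [y[t-1..t-na], u[t-1..t-nb] flattened]; targets are y[start:].

    Using a common start >= max(na, nb) gives every model order the same targets.
    """
    rows = [
        np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1].ravel()])
        for t in range(start, len(y))
    ]
    return np.array(rows), y[start:]


def fit_arx(X, target):
    """Ordinary least-squares estimate of the stacked ARX coefficients."""
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return theta


def select_order(y, u, max_na=7, max_nb=7, train_frac=0.7):
    """Grid-search (na, nb) by hold-out RMSE; returns (rmse, na, nb, theta).

    The start index is max(max_na, max_nb). With the defaults, passing max_nb=0
    keeps start at 7, so the price-only AR baseline is scored on the same targets.
    """
    start = max(max_na, max_nb)
    best = None
    for na in range(1, max_na + 1):
        for nb in range(0, max_nb + 1):
            X, target = build_regressors(y, u, na, nb, start)
            split = int(train_frac * len(target))  # chronological split, no shuffling
            theta = fit_arx(X[:split], target[:split])
            err = X[split:] @ theta - target[split:]
            rmse = float(np.sqrt(np.mean(err ** 2)))
            if best is None or rmse < best[0]:
                best = (rmse, na, nb, theta)
    return best
```

Comparing `select_order(y, u, max_nb=0)[0]` (past prices only) with `select_order(y, u)[0]` (prices plus news) is one way to run the news-versus-price-only comparison this page describes. A multi-stock or multi-output version would replace the scalar coefficients with matrices.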

Challenges we ran into

  1. Our raw data is about 7 gigabytes, which is too large to analyze efficiently in the limited time available.
  2. With a 7-day look-back period, the model has to operate in a high-dimensional space: 28^2 = 784.
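The page does not spell out where these counts come from. One reading, offered as an assumption: the model tracks 4 daily series jointly (for example, price plus news features), so a 7-day look-back stacks 7 × 4 = 28 values. A dense linear map on that stacked window needs 28 × 28 coefficients, while a lag-structured ARX-style model needs only one 4 × 4 matrix per lag. Here $\mathbf{z}_t$ (the 4 series on day $t$), $A_i$ and the residual $\mathbf{e}_t$ are notation introduced for this sketch.

```latex
\begin{aligned}
d &= 7 \times 4 = 28, \qquad \text{dense map } \mathbb{R}^{d} \to \mathbb{R}^{d}: \; d^2 = 28^2 = 784 \text{ coefficients},\\
\mathbf{z}_t &= \sum_{i=1}^{7} A_i \, \mathbf{z}_{t-i} + \mathbf{e}_t, \qquad A_i \in \mathbb{R}^{4 \times 4}: \; 7 \times 4^2 = 112 \text{ coefficients}.
\end{aligned}
```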

Accomplishments that we're proud of

  1. We selected and cleaned the data, using PCA (Principal Component Analysis) to reduce the data size significantly, to approximately 20 thousand.
  2. We applied a linear ARX (autoregressive with exogenous inputs) model in the prediction stage, which reduced the dimensionality from 28^2 = 784 to 7*4^2 = 112.
  3. We successfully turned the sentiment expressed in the news into concrete numbers, making the prediction process more objective and less affected by human emotion.
  4. The model can automatically search and analyze large volumes of news, helping investors respond to market changes in a more timely manner.
  5. Most importantly, our results show that using news as an extra input clearly increases stock price prediction accuracy compared with prediction from past prices alone.
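The page does not describe exactly how PCA was applied. A minimal sketch, assuming the cleaned data is a numeric matrix with one row per observation and one column per feature, centers the columns, computes principal directions with NumPy's SVD and keeps the top `k`. The function name `pca_reduce` and the parameter `k` are hypothetical. In practice the Stage 1 normalization (for example, scaling each column to unit variance) would usually come first. Note that PCA shortens each record (fewer columns); it does not by itself drop rows.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto its top-k principal components.

    Returns (scores, components, mean, explained_ratio):
      scores          -- (n_rows, k) reduced representation
      components      -- (k, n_cols) orthonormal principal directions
      mean            -- column means removed before the SVD
      explained_ratio -- share of total variance captured by each kept component
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA requires centered columns
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T
    explained_ratio = S[:k] ** 2 / np.sum(S ** 2)
    return scores, components, mean, explained_ratio
```

A common way to pick `k` is to keep the smallest number of components whose `explained_ratio` values sum past a chosen threshold; `scores @ components + mean` gives the approximate reconstruction of the original rows.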

What we learned

When doing a project, there may be several ways to optimize the model, and therefore the outcomes. It is important to analyze them critically and find the best one. It's always good to strive for perfection :)

Also, preparing for a project sometimes takes longer than actually completing it. It is important to start planning early, with a clear idea of the workload and timeline, so that you don't have to rush at the last minute.

What's next for stock price prediction with news

We hope to expand our data (both news features and stocks) to further test the validity of our model. If possible, we would like to pitch it to investment firms for real-world use :)
