Inspiration

Financial projections are heavily influenced by current events, yet most statistical models perform inconsistently when fed real-time data.

What it does

Stock market visualization and classification based on real-time data.

Given an industry (e.g., the airline industry) and a time frame, we:

  1. Import stock data from the Google Finance API
  2. Plot the time-course data in 3D using Plot.ly, with a heat map and a time axis
  3. Use the NYT API to search NYT/AP/Reuters articles that mention the given stock over that time frame
  4. Build and train a Bayesian machine learning classifier on articles from the 10 largest positive and negative 1-day swings
  5. Apply the trained classifier to new incoming data to predict an increase or decrease
  6. Given plain text uploaded by the user to an Amazon EC2 server, output either 'positive' or 'negative' based on the sentiment of the news
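The writeup does not say how the "customized Bayesian classifier" works internally. As a minimal sketch of steps 4 through 6, assuming a plain multinomial naive Bayes over lowercase word tokens with add-one smoothing, training on labeled headlines and classifying new text as 'positive' or 'negative' could look like this. The class `HeadlineNaiveBayes`, the `tokenize` helper, and the toy headlines are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class HeadlineNaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of documents per class (priors)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        """Return the class with the highest log-posterior score for `text`."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (self.total_words[c] + vocab_size))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy training data, made up for illustration (not the project's dataset).
headlines = [
    "Airline reports record profits and strong demand",
    "Carrier beats earnings expectations",
    "Airline grounds fleet after safety crash",
    "Carrier shares plunge on fuel costs and losses",
]
labels = ["positive", "positive", "negative", "negative"]

model = HeadlineNaiveBayes().fit(headlines, labels)
print(model.predict("Strong demand lifts airline profits"))  # positive
print(model.predict("Fuel costs and losses hit carrier"))    # negative
```

Working in log space avoids numeric underflow when multiplying many small word probabilities; a library implementation such as scikit-learn's `MultinomialNB` would be a reasonable drop-in alternative.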

How we built it

Using Python, we used the Google Finance API to retrieve stock data over a user-specified time frame. We stored the data and generated heat map representations from it. From there, we found the 10 largest positive and negative 1-day swings, where a day's swing is its closing price minus its opening price. We then built and trained a Bayesian machine learning classifier on Google News API headlines from each of those days, and evaluated it on days with known outcomes, reaching 75% accuracy. Finally, we hosted our results on an AWS instance for presentation purposes.
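The swing selection described above (close minus open, then the largest moves in each direction) can be sketched as follows. This assumes the daily bars have already been fetched into `(date, open, close)` tuples; no Google Finance call is shown, and the function name `top_swings` and the sample prices are hypothetical. The usage passes `k=2` only to keep the output short; the project used the top 10.

```python
from typing import List, Tuple

Bar = Tuple[str, float, float]  # (date, open price, close price)


def top_swings(bars: List[Bar], k: int = 10):
    """Return the k largest positive and k largest negative 1-day swings,
    where a day's swing is its close minus its open (rounded to cents)."""
    swings = [(date, round(close - open_, 2)) for date, open_, close in bars]
    ranked = sorted(swings, key=lambda item: item[1])  # most negative first
    positive = [s for s in reversed(ranked) if s[1] > 0][:k]
    negative = [s for s in ranked if s[1] < 0][:k]
    return positive, negative


# Made-up daily prices for illustration.
bars = [
    ("2017-01-03", 50.0, 52.5),
    ("2017-01-04", 52.5, 51.0),
    ("2017-01-05", 51.0, 51.2),
    ("2017-01-06", 51.2, 48.2),
]
up, down = top_swings(bars, k=2)
print(up)    # [('2017-01-03', 2.5), ('2017-01-05', 0.2)]
print(down)  # [('2017-01-06', -3.0), ('2017-01-04', -1.5)]
```

Rounding to two decimals keeps floating-point noise (for example 51.2 - 51.0) out of the reported swings.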

Challenges we ran into

  1. The NYT API limits queries to 5 per second, which restricted how much data we could reasonably collect. To work around this, we used Google News search results for our examples.
  2. File size limitations; EC2 instance hosting still needs work
  3. We need access to more Reuters data (we contacted Reuters Corpora at NIST for the RCV1 training data but have not received a response yet)
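A common way to respect a per-second quota like the NYT limit above is a client-side throttle that spaces requests at least 1/rate seconds apart. This is a minimal sketch, not the team's code; `Throttle` and `fetch_page` are hypothetical names, and `fetch_page` merely stands in for a real API request.

```python
import time


class Throttle:
    """Space calls so that at most `rate` of them happen per second."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.last_call = None

    def wait(self):
        """Sleep just long enough to keep calls min_interval apart."""
        now = time.monotonic()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()


def fetch_page(page):
    """Hypothetical placeholder for a single NYT API request."""
    return {"page": page}


throttle = Throttle(rate=5)  # 5 queries per second, per the limit above
start = time.monotonic()
results = []
for page in range(6):
    throttle.wait()
    results.append(fetch_page(page))
elapsed = time.monotonic() - start
print(f"{len(results)} requests in {elapsed:.2f} s")
```

Six requests at 5 per second take roughly one second, since only the first call goes out without waiting.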

Accomplishments that we're proud of

  1. 75% predictive accuracy on article headlines and body text using a customized Bayesian classifier
  2. Working, interactive plots of the data in real time

What we learned

  1. Using custom plotting and finance APIs, and retrieving financial data from a server with RESTful calls
  2. Creating heat map representations from plotting data
  3. Bayesian machine learning
  4. Using an EC2 instance to host an MVP

What's next for FinancialViz

  1. Automation of news article detection
       • Can already collect articles automatically, but still needs to classify them by financial relevance
       • Needs a way to classify companies that are in direct competition with each other
  2. Work on the front end for the tools
       1. Link each point in the plot to its corresponding maximum-impact article (WIP)

Our product

  • Groups companies in direct competition with each other
  • Uses machine learning to detect stock behavior from articles in real time
  • Classification aids in predicting future market behavior

Prototype

  • Analyzes the airline industry
  • Manually accepts news articles
  • Uses machine learning to detect stock behavior from submitted articles
  • Classification aids in predicting future market behavior

Methodology

  • Test data was collected manually based on the derivative (rate of change) of the stock price over a 24-hour period
  • Relevant articles were collected manually and labeled according to that derivative value
  • A standard 60/20/20 train/validation/test split was used
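The labeling and split in the list above can be sketched as follows, assuming each article is paired with the stock price before it and 24 hours later, so that the "derivative" is approximated by a finite difference. The helper names `label_articles` and `split_60_20_20`, the fixed shuffle seed, the choice to skip flat days, and the sample data are all hypothetical.

```python
import random
from typing import List, Tuple


def label_articles(articles: List[Tuple[str, float, float]]):
    """Label each (text, price_before, price_after_24h) by the sign of its
    24-hour rate of change, a finite-difference stand-in for the derivative."""
    labeled = []
    for text, before, after in articles:
        rate = (after - before) / 24.0  # price change per hour
        if rate == 0:
            continue  # assumption: flat days carry no label and are skipped
        labeled.append((text, "positive" if rate > 0 else "negative"))
    return labeled


def split_60_20_20(items, seed=0):
    """Shuffle a copy of items and split it 60% train, 20% validation, 20% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding surprises
    n_val = n * 20 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Made-up articles: price moves from -4.5 to +4.5 over 24 hours.
articles = [(f"headline {i}", 100.0, 100.0 + (i - 4.5)) for i in range(10)]
labeled = label_articles(articles)
train, val, test = split_60_20_20(labeled)
print(len(train), len(val), len(test))  # 6 2 2
```

Holding out a validation set separate from the test set lets classifier settings be tuned without touching the data used for the final accuracy figure.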
