When a retail investor, or even an investor at a fund, decides to make an investment, what do they do? They look at the company's trends through its technical indicators to see whether it is under- or overvalued in the short and long term, and then they turn to the news to see what the company's recent endeavors are. If the investor finds articles that reflect positively on the company, they are likely to invest, and vice versa, as is every other investor looking into that company.
What it does
Eureka AI is a neural network trained on a compilation of over seven thousand news articles and technical indicators. Once trained, it can analyze current news articles and indicators to predict whether a stock's price will go up or down, and it buys, sells, or shorts accordingly. When choosing between positive, neutral, and negative changes in the stock price, we achieved 61.44% accuracy on some training runs, compared with the roughly 33% expected from a random classifier.
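To make the three-class setup and the 33% baseline concrete, here is a minimal sketch that labels price changes and measures how often uniform random guessing is right. The `label_change` helper, the 0.5% neutral band, and the synthetic price changes are all assumptions for illustration, not the thresholds or data used in the project.

```python
import random


def label_change(pct_change, threshold=0.5):
    """Map a percentage price change to one of three classes.

    The 0.5% neutral band is a hypothetical threshold.
    """
    if pct_change > threshold:
        return "positive"
    if pct_change < -threshold:
        return "negative"
    return "neutral"


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Synthetic price changes, for illustration only.
rng = random.Random(0)
changes = [rng.uniform(-2.0, 2.0) for _ in range(3000)]
actual = [label_change(c) for c in changes]

# A classifier that guesses uniformly among the three classes.
classes = ["positive", "neutral", "negative"]
guesses = [rng.choice(classes) for _ in actual]
baseline = accuracy(guesses, actual)
print(f"random baseline accuracy: {baseline:.3f}")
```

Uniform guessing lands near one third no matter how the true labels are distributed, which is why 33% is a reasonable floor to compare a three-class model against.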
How we built it
We gathered the news data from the Event Registry API and preprocessed it with the Natural Language Toolkit (NLTK) to determine each article's polarity, subjectivity, word count, and sentence count. We then used pandas to load the results into a dataframe, where we coupled the news data with pricing information pulled from the IEXFinance API and technical indicators calculated with the Technical Analysis library.
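The coupling step described above could look roughly like the following pandas sketch. The column names, the `XYZ` ticker, and every value are made up for illustration; the real schema, and how multiple articles per day were combined, are not stated in this writeup, so the per-day averaging here is an assumption.

```python
import pandas as pd

# Hypothetical per-article features; all values are invented.
news = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-02", "2019-01-02", "2019-01-03"]),
    "ticker": ["XYZ", "XYZ", "XYZ"],
    "polarity": [0.4, -0.1, 0.2],
    "subjectivity": [0.5, 0.3, 0.6],
    "word_count": [320, 150, 410],
    "sentence_count": [14, 7, 19],
})

# Hypothetical daily closing prices for the same ticker.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-02", "2019-01-03"]),
    "ticker": ["XYZ", "XYZ"],
    "close": [100.0, 102.0],
})

# Collapse multiple articles per day into one row of averaged features
# (an assumed aggregation choice).
daily_news = news.groupby(["date", "ticker"], as_index=False).mean()

# Attach the price row for each (date, ticker) pair.
dataset = daily_news.merge(prices, on=["date", "ticker"], how="inner")
print(dataset)
```

An inner join drops days that have news but no price (weekends, holidays) and vice versa, which keeps incomplete rows out of the training set.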
After gathering the training data set, we fed the dataframe into a scikit-learn-backed neural network, which could then classify future buy, stay, or sell scenarios with about 61 percent accuracy.
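As a rough illustration of a scikit-learn neural network for a three-way buy, stay, or sell decision, the sketch below trains an `MLPClassifier` on synthetic data. The features, the label rule, the layer size, and the train/test split are all assumptions; the writeup does not say which scikit-learn estimator or hyperparameters were actually used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature table (news scores + indicators).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))

# Synthetic labels: 0 = sell, 1 = stay, 2 = buy, derived from the first
# feature so that there is a signal for the network to learn.
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the inputs, then fit a small multilayer perceptron.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

# Accuracy on rows the model never saw during training.
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Scoring on a held-out split gives a number that is directly comparable to the random baseline, since it reflects predictions on data the network was not fitted to.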
Challenges we ran into
Data mining for this project proved challenging. Most APIs we found required a premium subscription to make enough calls per second to gather the number of data points we needed (AlphaVantage, for example, caps calls at 5 per minute). As a result, we had to rely mainly on open-source data sources, which proved either to be unclean or to be processed as noise.
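One way to stay under a cap like 5 calls per minute is a small sliding-window rate limiter, sketched below with the standard library only. The `RateLimiter` class and the `fetch_all` helper are hypothetical and are not part of the project's code; `fetch` stands in for whatever function performs the actual API request.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any sliding window of period seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)


def fetch_all(symbols, fetch, limiter):
    """Call fetch(symbol) for each symbol without exceeding the limit."""
    results = {}
    for symbol in symbols:
        limiter.wait()
        results[symbol] = fetch(symbol)
    return results
```

With `RateLimiter(max_calls=5, period=60.0)`, the first five requests go out immediately and the sixth waits until a minute has passed since the first, which is also why gathering thousands of data points at this rate takes hours.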
Accomplishments that we're proud of
The preprocessing mechanisms we developed filtered through 250,000 data points and determined news scores and indicators for each, while filtering out the noise to leave thousands of clean data points. The resulting model reached 61.44% accuracy on some training runs when choosing between positive, neutral, and negative changes in the stock price, compared with the roughly 33% expected from a random classifier.
What we learned
Each stage of processing took several hours, so developing better preprocessing techniques could be the focus of our next endeavors. Many variables affect stock price movement, but with advancements in machine learning and AI, we may be able to predict future trends from current, clean data sets.
What's next for Eureka Indicator
Get funding for cleaner data sets and write a more robust machine learning algorithm that takes in more technical indicators and handles noise effectively enough to generate an alpha of 7% or more.