Inspiration
Our team talked to numerous mentors and found a lot of interesting projects, but many of them were already being pursued by other people. We wanted to create something different, something that would challenge our skill sets and push us to learn more. We started talking with companies like Optum, Caterpillar, and Capital One to get a sense of the prevalent challenges in open source. After our discussions, we decided to pursue financial technology because that area was new to all of us. HISA, HackIllinois Stock Analytics, was not only a project to demonstrate our abilities, but also a project that let us dive deep into a core application of computer science.
What it does
Our Python package provides intuitive real-time stock data access and predictions of future trends based on sentiment analysis of past tweets about the target company. We use the Alpha Vantage API to get raw stock data and provide a wrapper that easily reshapes the data to fit the user's needs. We further augment the stock data with sentiment analysis of previous tweets about the stock's company. The sentiments are categorized as positive, negative, or neutral with NLTK, and these results are then fed into a linear regression model alongside stock prices for the same time frame. This gives the package the ability to predict future trends based on detected Twitter sentiment.
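To make the stock data step concrete, here is a minimal sketch of requesting and reshaping intraday prices. It is not HISA's actual wrapper: the helper names `build_intraday_url` and `parse_closes` are hypothetical, the response shape is our understanding of Alpha Vantage's `TIME_SERIES_INTRADAY` JSON output, and the sample payload is made up so the example runs without a network call or API key.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_intraday_url(symbol, api_key, interval="5min"):
    """Build the query URL for an intraday time series request."""
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": interval,
        "apikey": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def parse_closes(payload):
    """Return (timestamp, close) pairs, oldest first, from a decoded response."""
    # The series key includes the interval, e.g. "Time Series (5min)".
    series_key = next(key for key in payload if key.startswith("Time Series"))
    return sorted(
        (timestamp, float(bar["4. close"]))
        for timestamp, bar in payload[series_key].items()
    )


# Made-up sample payload in the assumed response shape (not real market data).
SAMPLE = json.loads("""
{
  "Meta Data": {"2. Symbol": "MSFT"},
  "Time Series (5min)": {
    "2018-02-23 16:00:00": {"1. open": "94.00", "4. close": "94.10"},
    "2018-02-23 15:55:00": {"1. open": "93.90", "4. close": "94.00"}
  }
}
""")

closes = parse_closes(SAMPLE)
```

In real use the payload would come from fetching the URL returned by `build_intraday_url` with your own API key and decoding the JSON body.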
How we built it
We built it in Python with some help from APIs. The Alpha Vantage API gave us access to stock data, and an open source Python Twitter scraper gave us access to more tweets than the free Twitter API. The tweets were fed into NLTK for sentiment analysis. We then processed the sentiment results together with the stock data into feature vectors, which made up the dataset we used to train our regression models.
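The step from scored tweets to feature vectors could look roughly like the sketch below. Everything in it is an assumption for illustration: each tweet is assumed to already carry a sentiment score in [-1, 1] (for example the `compound` value from NLTK's VADER analyzer), the 0.05 cutoff is the conventional VADER threshold rather than a value taken from our code, and `label_from_score` and `hourly_features` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime


def label_from_score(score, threshold=0.05):
    """Map a sentiment score in [-1, 1] to positive, negative, or neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def hourly_features(scored_tweets, hourly_close):
    """Combine tweets and prices into one feature row per hour.

    scored_tweets: iterable of (datetime, score) pairs.
    hourly_close: dict mapping the start of an hour to a closing price.
    Returns (hour, [positive, negative, neutral fractions], close) tuples.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for timestamp, score in scored_tweets:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[hour][label_from_score(score)] += 1

    rows = []
    for hour in sorted(counts):
        if hour not in hourly_close:
            continue  # no price for this hour, so it cannot be a training row
        bucket = counts[hour]
        total = sum(bucket.values())
        fractions = [
            bucket["positive"] / total,
            bucket["negative"] / total,
            bucket["neutral"] / total,
        ]
        rows.append((hour, fractions, hourly_close[hour]))
    return rows
```

Using fractions rather than raw counts keeps hours with many tweets comparable to quieter ones; raw counts would be an equally reasonable choice.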
Challenges we ran into
We originally started off using the Twitter API to gain access to tweets. However, this came with many drawbacks, such as access to at most 7 days' worth of tweets and only 100 tweets every 15 minutes. This initially forced us to train our models on limited data. We later switched to an open source scraping library to get all the data we needed.
We also had trouble converting the sentiment results from NLTK into a format that was intuitive to use with our models and reusable for testing and other purposes. In the end we decided to write the results to CSV files and then process them further into larger matrices of feature vectors.
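As an illustration of that CSV round trip, the sketch below writes per-hour sentiment rows to CSV and reads them back as a feature matrix and a target vector. The column names in `FIELDS` and the helpers `write_rows` and `read_matrix` are hypothetical, not the layout our package actually uses, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical column layout: one row per hour.
FIELDS = ["hour", "positive", "negative", "neutral", "close"]


def write_rows(rows, handle):
    """Write a list of dicts keyed by FIELDS as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_matrix(handle):
    """Read the CSV back into a feature matrix X and a target vector y."""
    features, targets = [], []
    for row in csv.DictReader(handle):
        features.append(
            [float(row["positive"]), float(row["negative"]), float(row["neutral"])]
        )
        targets.append(float(row["close"]))
    return features, targets


# Made-up rows, purely for illustration.
rows = [
    {"hour": "2018-02-24 10:00", "positive": 0.5, "negative": 0.25,
     "neutral": 0.25, "close": 94.0},
    {"hour": "2018-02-24 11:00", "positive": 0.75, "negative": 0.0,
     "neutral": 0.25, "close": 94.5},
]

buffer = io.StringIO(newline="")  # stands in for open(path, "w", newline="")
write_rows(rows, buffer)
buffer.seek(0)
X, y = read_matrix(buffer)
```

Keeping the CSV as the intermediate format means the same files can be inspected by hand, reused in tests, or loaded into other tools.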
Accomplishments that we're proud of
We are proud of delving deep into a subject that was completely unfamiliar to us and getting some very good results. We were able to successfully train a linear regression model on sentiment data and stock prices. The model outputs a predicted stock price for a company after being fed sentiment data from the past hour. Furthermore, learning about and contributing to open source was a first for some of us, and it was a great experience!
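For readers new to the idea, here is a minimal sketch of fitting a line that maps an hour's sentiment to a price, using ordinary least squares in plain Python. It is a simplification under stated assumptions and not our trained model: it uses a single hypothetical feature (net sentiment, the positive fraction minus the negative fraction), and `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a price from the past hour's net sentiment."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic training data, chosen to lie exactly on a line for clarity.
net_sentiment = [-0.5, 0.0, 0.5, 1.0]
prices = [99.0, 100.0, 101.0, 102.0]

model = fit_line(net_sentiment, prices)
next_price = predict(model, 0.25)
```

With several features per hour, the same idea extends to multiple linear regression, which is usually easier to do with a numerical library than by hand.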
What we learned
We learned a lot about finance and open source. In terms of finance, we understand the basics of financial technology and how computer science drastically improves it. We also learned about the fundamentals and importance of open source. Open source APIs greatly contributed to our idea, and we are excited to give something back to the community as well.
What's next for HISA (HackIllinois Stock Analysis)
We hope to continue expanding on our project. We wish to add more graphical utilities such as plots and interactive diagrams. We also want to continue to increase the accuracy of the models that we use.