Inspiration

All of us are really interested in data science and machine learning, so we wanted to work on a project that uses ML algorithms at this hackathon. While brainstorming, we found a massive, information-rich dataset of stock prices, so we decided to analyze it in depth and build a prediction model.

What it does

Given today's opening and closing stock prices (and the historical trend since the company's founding), the regression model predicts the next day's stock price.
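As a rough illustration of how such a prediction problem can be framed, the sketch below pairs each day's opening and closing prices with the following day's price. It assumes the target is the next day's closing price, which the description above does not specify. The rows, the values, and the `make_training_pairs` helper are hypothetical and are not taken from our project code.

```python
# Hypothetical daily rows: (date, open, close). Values are made up for illustration.
rows = [
    ("2017-03-27", 10.0, 10.5),
    ("2017-03-28", 10.5, 10.2),
    ("2017-03-29", 10.2, 10.8),
    ("2017-03-30", 10.8, 11.0),
]

def make_training_pairs(rows):
    """Pair each day's (open, close) with the following day's close.

    Assumption: the prediction target is the next day's closing price.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # ISO dates sort chronologically
    features, targets = [], []
    for today, tomorrow in zip(ordered, ordered[1:]):
        features.append((today[1], today[2]))  # today's open and close
        targets.append(tomorrow[2])            # tomorrow's close
    return features, targets

X, y = make_training_pairs(rows)
```

The last day has no "next day" yet, so it yields no training pair; its open and close are what a fitted model would receive to predict the following day.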

How we built it

We collected a large amount of data from websites and filtered it by each company's ticker symbol. We then analyzed and visualized the data, and finally built linear and logistic regression models, which we combined into a single model using ensemble learning. To present our results effectively, we created interactive plots and a website (served on localhost).
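The first two steps can be sketched as follows: group the combined records by ticker symbol, then fit a one-feature linear regression (today's close against the next day's close) using ordinary least squares. This is a simplified stand-in, not our actual model: the tickers, prices, and the `group_by_ticker` and `fit_line` helpers are hypothetical, and the logistic model and the ensemble step are left out.

```python
from collections import defaultdict

# Hypothetical combined records: (ticker, date, open, close). Values are made up.
records = [
    ("AAA", "2017-03-27", 10.0, 11.0),
    ("BBB", "2017-03-27", 50.0, 49.0),
    ("AAA", "2017-03-28", 11.0, 12.0),
    ("AAA", "2017-03-29", 12.0, 13.0),
    ("BBB", "2017-03-28", 49.0, 48.5),
    ("AAA", "2017-03-30", 13.0, 14.0),
]

def group_by_ticker(records):
    """Split the combined records into one date-sorted list per ticker."""
    by_ticker = defaultdict(list)
    for ticker, date, open_price, close_price in records:
        by_ticker[ticker].append((date, open_price, close_price))
    for rows in by_ticker.values():
        rows.sort()  # tuples sort by date first
    return dict(by_ticker)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

aaa_rows = group_by_ticker(records)["AAA"]
closes = [row[2] for row in aaa_rows]

# Fit today's close -> next day's close, then predict the day after the last row.
slope, intercept = fit_line(closes[:-1], closes[1:])
prediction = slope * closes[-1] + intercept
```

On real data a library such as scikit-learn would normally replace the hand-written fit; the closed-form version is used here only to keep the sketch dependency-free.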

Challenges we ran into

While looking for additional features for the logistic model, we wanted to integrate Watson's NLP technology by detecting the tone of news headlines and Twitter posts and using those tones as features in our regression models. However, extracting a large volume of headlines from the API took a long time, and converting the necessary information into a cleanly formatted CSV file took a lot of time and effort. We did end up with a CSV file, but the data set was too small for training (compared to the 30 years of numerical data we got from websites), so we couldn't incorporate Watson into our project.
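For reference, the CSV conversion step could look roughly like the sketch below, which averages per-headline tone scores by day so they can later be joined to daily prices. The headlines, the scores, the column names, and the `daily_tone_csv` helper are all hypothetical; no Watson API calls are shown, and the scores are assumed to have already been obtained.

```python
import csv
import io
from collections import defaultdict

# Hypothetical (date, headline, tone_score) tuples. The scores are made up;
# in practice they would come from a tone analysis service.
scored_headlines = [
    ("2017-03-27", "Company beats earnings estimates", 0.8),
    ("2017-03-27", "Analysts raise concerns over debt", 0.2),
    ("2017-03-28", "New product launch announced", 0.6),
]

def daily_tone_csv(scored):
    """Aggregate per-headline scores into one CSV row per day."""
    scores_by_date = defaultdict(list)
    for date, _headline, score in scored:
        scores_by_date[date].append(score)

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "headline_count", "mean_tone"])
    for date in sorted(scores_by_date):
        scores = scores_by_date[date]
        mean_tone = round(sum(scores) / len(scores), 3)
        writer.writerow([date, len(scores), mean_tone])
    return buffer.getvalue()

csv_text = daily_tone_csv(scored_headlines)
```

One row per day keeps the file aligned with daily price data, although days without any headlines would still need to be handled when the two sources are merged.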

Accomplishments that we're proud of

Our ensemble regression model performed really well, and our final website looks polished.

What we learned

By attending several workshops, we got a lot of inspiration; the ideas were not necessarily related to our topic, but they were really useful and exciting. Also, after building the stock price model, we realized that stock prices are heavily correlated with their historical trends, so the next day's price is not extremely hard to predict.

What's next for HackPrinceton2017sp

As mentioned above, we would like to improve our model by using Watson's tone classification technology to account for the social media posts and press coverage that affect stock prices. To do that, we would need to extract a large set of news headlines and Twitter posts and clean the data for analysis.
