Stock Related Tweet Sentiment Analysis vs Stock Performance

Inspiration

Browsing Kaggle, we saw tons of interesting and fun data sets. What caught our eye, however, was a huge data set of tweets about Tesla, Amazon, Microsoft, Apple, and Google. Due to the whole GME drama, interest in stocks and investing has been at an all-time high. This led us to believe that a stock-oriented project was the best way to merge current events with our own personal interests.

What it does

The project takes two dates that define a time range, plus a ticker symbol from the list TSLA, GOOGL, AMZN, or AAPL. It plots the rolling average of the sentiment scores of the tweets about that company from that time period, alongside the company's stock performance over the same period, laid out so that you can compare the two and look for similar patterns. The point of the project is to see whether the stock's performance had some influence on the sentiment of the tweets.

How we built it

We worked mainly in JupyterLab. We imported the necessary libraries: VADER sentiment analysis, pandas, and pyplot. Next we imported the data sets of tweets, stock data, and the tweet-to-company relation. We then created our own DataFrame, which precalculates the sentiment scores for each tweet in the data set and scales each score based on the weight of the tweet in terms of likes and retweets, using a homemade algorithm. We take the dates as input and use pd.to_datetime for both the input and the Unix epoch time in the data. A slice of the scored tweets for the specified company is taken for the specified time period. We first plot the sentiment compound scores with a rolling average, then the company stock chart, then the positive and negative sentiment scores underneath the stock chart, and lastly a positive sentiment vs negative sentiment graph.
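The scaling, date handling, and rolling average described above can be sketched roughly as follows. This is a minimal illustration, not our actual notebook code: the column names (ticker, post_date, compound, likes, retweets), the helper names, and especially the log-based weighting formula are assumptions made for the example, and the toy rows stand in for the precalculated VADER compound scores.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the precalculated per-tweet table.
# Column names and values are assumptions for illustration only.
tweets = pd.DataFrame({
    "ticker": ["TSLA", "TSLA", "TSLA", "AMZN", "TSLA"],
    # Unix epoch seconds: 2015-01-01 through 2015-01-04 (UTC midnight)
    "post_date": [1420070400, 1420156800, 1420243200, 1420243200, 1420329600],
    "compound": [0.5, -0.2, 0.8, 0.1, 0.4],
    "likes": [0, 3, 10, 1, 0],
    "retweets": [0, 1, 5, 0, 0],
})


def weighted_sentiment(df):
    """Scale each compound score by engagement (hypothetical weighting)."""
    engagement = df["likes"] + df["retweets"]
    # A tweet with no engagement keeps weight 1; log1p damps viral outliers.
    weight = 1.0 + np.log1p(engagement)
    return df["compound"] * weight


def slice_for_ticker(df, ticker, start, end):
    """Return the tweets for one ticker inside [start, end], oldest first."""
    # Epoch seconds in the data and date strings from the user both go
    # through pd.to_datetime so they can be compared directly.
    dates = pd.to_datetime(df["post_date"], unit="s")
    start_ts, end_ts = pd.to_datetime(start), pd.to_datetime(end)
    mask = (df["ticker"] == ticker) & (dates >= start_ts) & (dates <= end_ts)
    out = df.loc[mask].copy()
    out["date"] = dates[mask]
    out["weighted"] = weighted_sentiment(out)
    return out.sort_values("date")


selected = slice_for_ticker(tweets, "TSLA", "2015-01-01", "2015-01-03")

# Rolling average over the last 2 tweets; min_periods=1 avoids a leading NaN.
rolling = (
    selected.set_index("date")["weighted"]
    .rolling(window=2, min_periods=1)
    .mean()
)
```

The resulting rolling series can be passed straight to pyplot. The window size here is tiny only because the toy data is tiny; on the full data set a much larger window would be needed to get a readable curve.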

Challenges we ran into

We ran into many challenges. One was efficiency: some processes would have taken several minutes if we hadn't implemented solutions such as pre-processing the sentiment data and rejecting all 0 values in it. Another was plotting several plots together, where we ran into issues with scaling, centering, and the rolling average.
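As a rough illustration of the zero-rejection and rolling-average steps mentioned above, here is a small sketch. The column names and the daily grouping are assumptions for the example rather than a description of our exact notebook; the point is only that neutral scores of exactly 0 are dropped before any averaging, and that a centered window keeps the smoothed curve aligned with the dates it summarizes.

```python
import pandas as pd

# Toy precalculated scores; names and values are illustrative assumptions.
scores = pd.DataFrame({
    "date": pd.to_datetime(
        ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-03", "2015-01-03"]
    ),
    "compound": [0.0, 0.6, -0.4, 0.0, 0.2],
})

# Reject tweets whose compound score is exactly 0: they carry no sentiment
# signal and only add rows to every later step.
nonzero = scores[scores["compound"] != 0.0]

# Collapse to one value per day so the sentiment line has the same
# granularity as a daily stock chart.
daily = nonzero.groupby("date")["compound"].mean()

# center=True places each average at the middle of its window instead of the
# trailing edge, which avoids shifting the curve to the right.
smoothed = daily.rolling(window=3, center=True, min_periods=1).mean()
```

Filtering first also shrinks the table that every later slice and plot has to touch, which is where most of the time savings would come from.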

Accomplishments that we're proud of

This is our first hackathon, and we are proud that we survived it in one piece. We are also proud that we turned this idea into a reality and that we built an algorithm that makes sense with the data.

What we learned

We learned the importance of teamwork (as cheesy as that sounds) and important coding topics such as sentiment analysis, pyplot, and even teaching processes in Python.

What's next for Stock Related Tweet Sentiment Analysis vs Stock Performance

Next would be expanding the set of companies it supports, as well as expanding the tweet data, since we are currently limited to dates between 2015 and 2019.

Built With

jupyterlab
kaggle
matplotlib
nltk
pandas
python
vader-lexicon

Updates

Tim Kraemer started this project — Feb 14, 2021 05:32 AM EST
