Inspiration
Climate change is a pressing issue, yet people have long held, and continue to hold, sharply different opinions about it. Reddit is a common venue for expressing those opinions, so Fallie Mae provided this dataset for analyzing sentiment in posts and comments across a range of subreddits.
What it does
Our notebook processes, cleans, and analyzes the Reddit posts and comments in the dataset to surface the topics most frequently discussed in relation to climate change.
How we built it
We worked in a Kaggle notebook for easy access to the dataset hosted on Kaggle, and used pandas to create and edit dataframes of our data. We also used spaCy, n-gram frequency distributions, and Plotly to analyze and visualize the results.
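As a rough illustration of what an n-gram frequency distribution looks like, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the `tokenize` and `ngram_counts` helpers and the sample comments are made up for this example, and the regex tokenizer is a simple stand-in for a real tokenizer such as spaCy's.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(texts, n=2):
    """Count how often each n-gram (tuple of n consecutive tokens) appears across texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n staggered copies of the token list to form sliding windows
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up sample comments, not taken from the dataset
sample_comments = [
    "Climate change is real and climate change is urgent.",
    "Renewable energy can slow climate change.",
]

bigram_counts = ngram_counts(sample_comments, n=2)
top_bigrams = bigram_counts.most_common(3)
print(top_bigrams)
```

The resulting counts can be turned into a dataframe and plotted, which is roughly the role the frequency distribution plays in an analysis like ours.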
Challenges we ran into
The main challenge was learning these new packages for text analysis. Text data requires parsing strings and getting a program to handle the many complexities of written English.
Accomplishments that we're proud of
We are proud of producing plots of data clusters that show how frequently discussed topics relate to one another.
What we learned
We learned how to use several packages and techniques for text data analysis: spaCy, n-gram frequency distributions, and Plotly.
What's next for Climate Change Sentiment Analysis on Reddit
Next, we would like to bring in graph data and build the clusters as graphs in Neo4j, which should represent them better and give us a clearer view of our results.
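One possible starting point for that graph, sketched under our own assumptions: count how often pairs of topic keywords appear in the same post, and treat each pair as a weighted edge. The `TOPICS` set, the `cooccurrence_edges` helper, and the sample posts below are hypothetical, and the whitespace split is a deliberate simplification. The output is a plain list of (source, target, weight) rows that could later be loaded into a graph database.

```python
from collections import Counter
from itertools import combinations

# Hypothetical topic keywords, chosen only for illustration
TOPICS = {"carbon", "energy", "policy", "weather"}


def cooccurrence_edges(texts, topics):
    """Count how many texts mention each pair of topics together."""
    edges = Counter()
    for text in texts:
        words = set(text.lower().split())
        # Sort so each pair is always stored in the same order
        present = sorted(words & topics)
        edges.update(combinations(present, 2))
    return edges


# Made-up sample posts, not taken from the dataset
sample_posts = [
    "carbon policy should fund clean energy",
    "energy policy debates ignore extreme weather",
    "carbon taxes are a policy tool",
]

edges = cooccurrence_edges(sample_posts, TOPICS)

# Flatten to (source, target, weight) rows for export
edge_rows = [(a, b, w) for (a, b), w in sorted(edges.items())]
for row in edge_rows:
    print(row)
```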