Inspiration
This off-season was highly tumultuous for the NBA, and we wanted to know how the internet felt about players across the league.
What it does
Our project collects all the mentions of a given NBA player in the r/NBA subreddit across the 100 most popular posts in a given week, and attaches a sentiment value to each individual mention. Combining this data with the upvote and downvote data for each comment, we can get a realistic idea of not only how one person feels about a player, but how the rest of the subreddit feels about that person's opinion. Finally, we calculate a net sentiment score and polarity score for each player.
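The scoring step can be illustrated with a minimal sketch. The write-up does not give the actual formulas, so everything here is an assumption made for illustration: the `Mention` record, the `summarize` function, the rule that a comment's weight is its net vote score (with a floor of 1), net sentiment as the weighted mean, and polarity as the weighted standard deviation of the sentiment values.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Mention:
    player: str
    sentiment: float  # per-mention sentiment in [-1, 1], e.g. a VADER compound score
    score: int        # comment upvotes minus downvotes


def summarize(mentions):
    """Return {player: (net_sentiment, polarity)}.

    Assumed formulas, not taken from the project:
      weight        = max(score, 1), so every comment counts at least once
      net_sentiment = weighted mean of the sentiment values
      polarity      = weighted standard deviation around that mean
    """
    by_player = {}
    for m in mentions:
        by_player.setdefault(m.player, []).append(m)

    summary = {}
    for player, items in by_player.items():
        weights = [max(m.score, 1) for m in items]
        total = sum(weights)
        net = sum(w * m.sentiment for w, m in zip(weights, items)) / total
        variance = sum(w * (m.sentiment - net) ** 2 for w, m in zip(weights, items)) / total
        summary[player] = (net, sqrt(variance))
    return summary
```

Under these assumptions, a player praised in a well-upvoted comment and criticized in a lightly upvoted one ends up with a positive net sentiment and a high polarity, which is the kind of split the two scores are meant to separate.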
How we built it
We used Python and the Reddit API to scrape both NBA and r/NBA data, ending up with a collection of NBA players and a collection of r/NBA posts from the past week. We then used Python again to find and process the mentions of every NBA player, storing the mentions back into the collection of NBA players. Once the data set was fully processed, we exposed it through a simple Express API and built a single-page web app in React.
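The mention-finding step might look roughly like the sketch below. It is not the project's actual code: the `aliases` mapping, the `build_matcher` and `find_mentions` helpers, and the choice of case-insensitive whole-word matching are all assumptions made for illustration.

```python
import re


def build_matcher(aliases):
    """Compile one regex that matches any known alias as a whole word.

    `aliases` is a hypothetical mapping of lowercase alias -> canonical player name.
    """
    # Longest aliases first, so "kawhi leonard" is tried before "kawhi".
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(a) for a in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def find_mentions(text, aliases, matcher=None):
    """Return the sorted canonical names of all players mentioned in `text`."""
    matcher = matcher or build_matcher(aliases)
    return sorted({aliases[m.group(1).lower()] for m in matcher.finditer(text)})


# Illustrative data only.
aliases = {
    "lebron james": "LeBron James",
    "lebron": "LeBron James",
    "kawhi leonard": "Kawhi Leonard",
    "kawhi": "Kawhi Leonard",
}
mentioned = find_mentions("LeBron is great, but Kawhi Leonard changed the league.", aliases)
```

Plain string matching like this only says that a player was named somewhere in the comment; it does not say which part of the comment is about that player.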
Challenges we ran into
Although VADER Sentiment is well equipped for analyzing social media posts, we often found that it was unable to pick up subtleties of r/NBA culture or to handle certain logical negations. It also offered very little in terms of subject analysis, which would have been useful when analyzing a mention that includes multiple players in different contexts.
What's next for nba-rank
We would love to think about how this concept could be applied to different contexts and at larger scales. At a larger scale, we envision the data processing being carried out in an entirely serverless architecture using Lambda functions. We'd also want to think about how to better identify subjects and context within mentions, so that we can more accurately attach mentions to individual players.