Inspiration

The inspiration for this project comes from our passion for basketball and fantasy sports. NBA players who have been playing at a high level for a while can typically be expected to keep doing so, and the same holds for players at the other end of the spectrum. In between, however, are players who show flashes of improvement from season to season and have the potential to become stars the following season. These "swing players" are of huge interest to us because drafting them gives teams a great chance to win their league. We therefore set out to predict which "swing players" would show major improvement in the 2025-26 season, validating our results in real time as the season unfolds.

What it does

The Hex notebook we developed does the following:

  1. Imports the project data: NBA statistics, Bluesky posts about each player, Wikipedia mentions of each player, and each player's contract information.
  2. Filters out experienced players.
  3. Engineers new features that characterize swing players, such as season-over-season stat improvements, whether the player is in the final year of a contract, and the sentiment of posts about the player.
  4. Normalizes all features.
  5. Computes a breakout score for each player as a function of the engineered features, and ranks players by that score.

How we built it

We started this research project by scraping data from basketball-reference.com. We gathered relevant player stats and narrowed our dataset to players in their second through fourth years. For each player, we took the season-over-season differences in the relevant statistics, multiplied each difference by a weight, summed the weighted terms, and normalized the result to produce a breakout score. The higher the score, the more likely the player is to break out the following season.
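The scoring idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the notebook's actual code: the field names (`ppg_prev`, `mpg_curr`, `season_number`, and so on), the weights, and the sample numbers are all hypothetical, and the sketch min-max normalizes each feature before taking the weighted sum, which is one of several reasonable orderings.

```python
# Hypothetical sketch: field names, weights, and sample numbers are
# illustrative assumptions, not the project's actual data or tuning.

def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def breakout_scores(players, weights, min_year=2, max_year=4):
    """Rank players by a weighted sum of min-max normalized features."""
    # Keep only players inside the experience window (second to fourth year).
    eligible = [p for p in players if min_year <= p["season_number"] <= max_year]
    if not eligible:
        return []

    # Season-over-season improvement for each tracked stat.
    features = {}
    for stat in ("ppg", "mpg"):
        features[stat + "_diff"] = [
            p[stat + "_curr"] - p[stat + "_prev"] for p in eligible
        ]
    # Contract-year flag and a precomputed sentiment value are used directly.
    features["contract_year"] = [1.0 if p["contract_year"] else 0.0 for p in eligible]
    features["sentiment"] = [p["sentiment"] for p in eligible]

    # Normalize every feature so the weights are comparable across features.
    normalized = {name: min_max(vals) for name, vals in features.items()}

    scored = []
    for i, player in enumerate(eligible):
        score = sum(weights[name] * normalized[name][i] for name in weights)
        scored.append((player["name"], round(score, 3)))
    # Highest breakout score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


players = [
    {"name": "Player A", "season_number": 3, "ppg_prev": 10.0, "ppg_curr": 16.0,
     "mpg_prev": 20.0, "mpg_curr": 30.0, "contract_year": True, "sentiment": 0.6},
    {"name": "Player B", "season_number": 2, "ppg_prev": 8.0, "ppg_curr": 10.0,
     "mpg_prev": 18.0, "mpg_curr": 20.0, "contract_year": False, "sentiment": 0.2},
    {"name": "Player C", "season_number": 9, "ppg_prev": 25.0, "ppg_curr": 26.0,
     "mpg_prev": 34.0, "mpg_curr": 35.0, "contract_year": False, "sentiment": 0.9},
    {"name": "Player D", "season_number": 4, "ppg_prev": 12.0, "ppg_curr": 16.0,
     "mpg_prev": 24.0, "mpg_curr": 30.0, "contract_year": False, "sentiment": 0.4},
]
weights = {"ppg_diff": 0.4, "mpg_diff": 0.3, "contract_year": 0.2, "sentiment": 0.1}

ranking = breakout_scores(players, weights)
for name, score in ranking:
    print(name, score)
```

Because the assumed weights sum to 1 and every feature is scaled to [0, 1], the resulting score also falls in [0, 1], which makes players easy to compare. The veteran in the sample data (Player C) is dropped by the experience filter before any scoring happens.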

Challenges we ran into

One challenge was gaining access to the Reddit API, since we would have liked to scrape posts about the players in our dataset. Reddit is a huge platform with a large, active fanbase in its subreddits, and we wanted that data in our calculations for an even better prediction model. Another issue we are still running into is sentiment analysis of what was said about each player: some of the text is not correctly classified as positive or negative.

Accomplishments that we're proud of

We are proud that our findings line up with promising young players who are currently breaking out in the NBA, so for the most part our breakout score model seems to be working well.

What we learned

We learned that there is some correlation between opportunity and a player's breakout probability. We rarely found a player who received more minutes but whose performance declined. However, some of our predictions may be outliers because injuries and players' off-the-court issues are not factored into our research, since those datasets are not easily attainable.

What's next for Predicting NBA Breakout Players

We plan to build a more robust prediction model and tune the weights. We also intend to add more relevant criteria to our calculations, such as a minimum number of games played, so that we can pinpoint true breakouts.
