We were inspired to build this project because YouTube has such a large influence on our generation. We wondered whether data from YouTube videos could be used to predict their future success. We also hoped to create a model that would help YouTubers estimate how successful their videos would be, so they could better appeal to their audiences.
What it does
Our program takes view, like, and comment counts from a database of past trending YouTube videos and pairs them with the current data for the same videos, obtained through a web scraper. We then use these two sources to build a model that predicts a video's current number of views from its past data.
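As a rough illustration of how the two sources could be paired, here is a minimal Python sketch. It is not our actual script: the column names (`video_id`, `views`, `likes`, `comment_count`) and the helper names `join_by_video_id` and `rows_to_csv_text` are assumptions made for this example.

```python
import csv
import io

# Output columns for the combined table (names are assumptions for this sketch).
FIELDNAMES = ["video_id", "past_views", "past_likes", "past_comments", "current_views"]


def join_by_video_id(past_rows, current_rows):
    """Pair each past record with the scraped record for the same video."""
    current_by_id = {row["video_id"]: row for row in current_rows}
    joined = []
    for past in past_rows:
        current = current_by_id.get(past["video_id"])
        if current is None:
            continue  # no current data, e.g. the video could not be scraped
        joined.append({
            "video_id": past["video_id"],
            "past_views": int(past["views"]),
            "past_likes": int(past["likes"]),
            "past_comments": int(past["comment_count"]),
            "current_views": int(current["views"]),
        })
    return joined


def rows_to_csv_text(rows):
    """Serialize the joined rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Videos that appear in the past data but have no scraped counterpart are skipped here rather than filled with placeholder values, so every row in the output has both a past and a current view count.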
How we built it
We first built a web scraper with Selenium that iterates through the videos in the database and collects the current data for each one. Next, we wrote a script that organizes all the necessary information into a .csv file so that we could parse our data in a consistent format. We then ran a linear regression on the data to see which variables had the biggest impact on the future number of views. Finally, we created an algorithm that tests different combinations of weights and variables to find the model that best fits our data.
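The modeling steps can be sketched roughly as follows. This is a simplified illustration in plain Python, not our exact algorithm: it fits an ordinary least squares regression for every combination of candidate variables and keeps the combination with the lowest mean squared error on held-out rows. The function names, the hold-out split, and the choice of mean squared error as the selection criterion are all assumptions made for this example.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; variables may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(rows, features, target):
    """Return [intercept, w1, w2, ...] minimizing squared error on rows."""
    design = [[1.0] + [float(row[f]) for f in features] for row in rows]
    y = [float(row[target]) for row in rows]
    k = len(features) + 1
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row, features):
    """Apply the fitted intercept and weights to one row."""
    return weights[0] + sum(w * float(row[f]) for w, f in zip(weights[1:], features))


def mean_squared_error(weights, rows, features, target):
    """Average squared difference between predictions and the target."""
    errors = [(predict(weights, r, features) - float(r[target])) ** 2 for r in rows]
    return sum(errors) / len(errors)


def best_feature_subset(train, test, candidates, target):
    """Try every non-empty subset of candidates; return (error, subset, weights)."""
    best = None
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            weights = fit_least_squares(train, subset, target)
            error = mean_squared_error(weights, test, subset, target)
            if best is None or error < best[0]:
                best = (error, subset, weights)
    return best
```

Scoring each combination on rows that were not used for fitting is one way to avoid always favoring the largest set of variables. With raw view counts in the millions, the normal equations can become numerically unstable, so rescaling the inputs or using a library solver would be a sensible change in practice.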
Challenges we ran into
We had trouble deciding which dataset to analyze and what to do with the data. For a while we considered looking at how titles affected the number of views, an idea we didn't end up using. We also struggled to understand some of the statistics behind our modeling. Finally, we had trouble obtaining some of the data with our web scraper.
Accomplishments that we're proud of
We successfully implemented a web scraper that collects data from YouTube videos and stores it in .csv files. We also used version control to collaborate with one another and build our project efficiently. Finally, we made an algorithm that searches for the parameters that give the best results for our model.
What we learned
We learned how to improve the runtime of our scripts, how to use GitHub effectively, and how to carry out a data science project.
What's next for Modeling Youtube Success
We want to collect more data, run our model over a larger dataset, and improve the model by exploring other possible parameters.