Artist Popularity Predictor and Playlist Filter Instructions

Authors: Sam Chao, Shane Dang, and Zulkifli Sales

Things to do before running the Python files:

The necessary datasets are provided so that you can reproduce our results. If you need to download the dataset yourself, follow the instructions below:

Go to https://corgis-edu.github.io/corgis/csv/music/ and download the "music.csv" under the 'Download' section.
Import the "music.csv" into the IDE or Edstem Workspace
- This file is a large subset of the original Million Song Dataset.
  - This subset contains 10k rows of information with 35 columns.
- We provide a 'test.csv' because this subset is still large.
  - This file contains a subset of 20 songs that is used for testing purposes.
Install and/or import this Python package for musiclibrarymanipulator.py
- pandas as pd
Import the MusicLibraryManipulator class from musiclibrarymanipulator.py
- Files that require this are:
  - artist_predictor.py
  - manipulator_tester.py
  - plots.py
Install and/or import these Python packages for artist_predictor.py
- pandas as pd
- From sklearn.tree import:
  - DecisionTreeRegressor
  - DecisionTreeClassifier
- From sklearn.model_selection import:
  - train_test_split
- From sklearn.metrics import:
  - mean_squared_error
  - accuracy_score
Install and/or import these Python packages for manipulator_tester.py
- pandas as pd
- from cse163_utils import:
  - assert_equals
Install and/or import these Python packages for plots.py
- pandas as pd
- plotly.express as px
- plotly.graph_objects as go
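
Before running any of the files, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of the project. Note that sklearn is installed from pip under the name scikit-learn. The sketch assumes cse163_utils is a helper file that sits in the working directory rather than a pip package.

```python
from importlib.util import find_spec

# Top-level import names taken from the package lists above.
REQUIRED = ["pandas", "sklearn", "plotly", "cse163_utils"]


def missing_packages(names):
    """Return the names that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not importable yet:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Anything reported as not importable needs to be installed, or placed in the workspace, before the corresponding file is run.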

Introduction to our project:

* musiclibrarymanipulator is a Python class that represents the filtering/sorting system used within music libraries.
* artist_predictor module is a machine learning predictor. Its functions train DecisionTreeClassifier and DecisionTreeRegressor models and report the accuracy score and mean squared error, respectively.
* manipulator_tester module is a tester file for the musiclibrarymanipulator class functions and operations.
* plots module creates data visualizations for functions and data organization that can be done using the musiclibrarymanipulator class.

Useful information while running the project:

Observe the data
- after downloading from the weblink provided above, open music.csv and analyze the data
  - as you can see, this is song data containing columns/variables that pertain to each song
- open test.csv
  - because music.csv may be too large a library to work with in most cases, this file is a tiny subset of data pulled from music.csv
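
A quick way to observe the data is to load it with pandas and look at its shape and columns. This is only a sketch: the column names and rows below are made up so that the example runs on its own. In the workspace you would pass "test.csv" or "music.csv" to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for test.csv: hypothetical column names and made-up rows.
SAMPLE_CSV = io.StringIO(
    "artist_name,tempo,year\n"
    "Artist A,120.0,2005\n"
    "Artist B,98.5,1999\n"
    "Artist C,140.2,2010\n"
)

# In the workspace, replace SAMPLE_CSV with the path "test.csv".
library = pd.read_csv(SAMPLE_CSV)

print(library.shape)             # (number of rows, number of columns)
print(library.columns.tolist())  # the variables recorded for each song
print(library.head())            # the first few songs
```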
Take a look at musiclibrarymanipulator.py
- You are free to analyze the content to see what each function does and how the class is initialized, but THIS FILE SHOULD NOT BE ALTERED OR CHANGED!
  - if you take a look, you can see that the functions apply different sorting and filtering methods to a dataframe and return that dataframe as an object
  - take some time to understand what each operation does and how you may use it.
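
To illustrate the general idea of methods that filter or sort a dataframe and return an object, here is a small sketch. TinyLibrary and its method names are hypothetical stand-ins written for this example. They are not the MusicLibraryManipulator class, and its actual methods may look different.

```python
import pandas as pd


class TinyLibrary:
    """Hypothetical stand-in, NOT the project's MusicLibraryManipulator."""

    def __init__(self, frame):
        self._frame = frame

    def filter_min(self, column, minimum):
        """Keep rows whose value in `column` is at least `minimum`."""
        return TinyLibrary(self._frame[self._frame[column] >= minimum])

    def sort_by(self, column, ascending=True):
        """Order rows by the values in `column`."""
        return TinyLibrary(self._frame.sort_values(column, ascending=ascending))

    def to_frame(self):
        """Return the underlying dataframe with a fresh index."""
        return self._frame.reset_index(drop=True)


# Made-up songs used only for this illustration.
songs = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "year": [1995, 2003, 2008, 2001],
    "tempo": [90, 120, 100, 140],
})

# Because each method returns a new object, calls can be chained.
result = (
    TinyLibrary(songs)
    .filter_min("year", 2000)
    .sort_by("tempo", ascending=False)
    .to_frame()
)
print(result)
```

Returning a new object from each method is what makes filter/sort chains possible, since the next call operates on the previous result.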
Take a look at manipulator_tester.py
- This is the main file that will be run when experimenting with and reproducing the results of our program
  - within this test file, you will see that we have shown how output may be manipulated for each function that is called for a music library object
  - you will also find a method called 'test observation'. This method is what you will be trying to mimic in your own efforts to produce output and results
- Try messing around and creating your own filter/sort chains in the 'playground' method located near the bottom of the file.
  - you are able to change between test.csv and music.csv
- NOTE: Memory is limited, and running out of it can cause the kernel to be killed. To prevent this, using test.csv and commenting out methods you are not using (such as the test_operations method, which uses the music.csv dataset and a lot of memory) is highly recommended
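
If you do want to work with music.csv under tight memory, one general pandas option is to read only the columns and rows you need. This is a sketch and not something the project files are known to do; load_subset and the column names are hypothetical, and the inline CSV text stands in for the real file.

```python
import io

import pandas as pd


def load_subset(source, columns, max_rows):
    """Read only the given columns and the first `max_rows` rows of a CSV."""
    return pd.read_csv(source, usecols=columns, nrows=max_rows)


# Made-up CSV text standing in for music.csv.
csv_text = (
    "artist_name,tempo,year,extra\n"
    "Artist A,120.0,2005,x\n"
    "Artist B,98.5,1999,y\n"
    "Artist C,140.2,2010,z\n"
)

subset = load_subset(io.StringIO(csv_text), ["artist_name", "year"], 2)
print(subset)
```

In the workspace, the first argument would be the path "music.csv" rather than an in-memory string.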
Take a look at artist_predictor.py
- This file uses a DecisionTreeClassifier and DecisionTreeRegressor as the main modes of predicting the dataset.
  - There are two DecisionTreeRegressor models that produce the mean squared error, which is useful for judging how good a predictor each one is for numerical data.
    - The closer the mean squared error is to 0.0, the better.
  - There is one DecisionTreeClassifier that produces the accuracy score, which is the fraction of test-set labels that the trained model predicts correctly.
    - The closer the accuracy score is to 1.0, the better.
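
The two metrics described above can be sketched with scikit-learn as follows. The data here is synthetic and generated only for illustration; the features, targets, and split size are assumptions and do not reflect what artist_predictor.py actually uses.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic data (an assumption): two features, one numeric target, one label.
rng = np.random.default_rng(0)
features = rng.uniform(0, 1, size=(200, 2))
numeric_target = 3.0 * features[:, 0] + features[:, 1]
label_target = (features[:, 0] > 0.5).astype(int)

# Hold out 20% of the rows for testing.
(X_train, X_test,
 y_num_train, y_num_test,
 y_lab_train, y_lab_test) = train_test_split(
    features, numeric_target, label_target, test_size=0.2, random_state=0
)

# Regressor: mean squared error on the held-out rows (closer to 0.0 is better).
regressor = DecisionTreeRegressor(random_state=0).fit(X_train, y_num_train)
mse = mean_squared_error(y_num_test, regressor.predict(X_test))

# Classifier: fraction of held-out labels predicted correctly (closer to 1.0 is better).
classifier = DecisionTreeClassifier(random_state=0).fit(X_train, y_lab_train)
accuracy = accuracy_score(y_lab_test, classifier.predict(X_test))

print("mean squared error:", mse)
print("accuracy score:", accuracy)
```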
Take a look at plots.py
- Upon running this file, you are prompted to download the plots as HTML files to your local disk.
- When running plots.py:
  - The following plots are created:
    - artist_loc_and_pop
    - plot_ordinary_least_squares
    - plot_pop_rock_2000_2010
    - popular_genres_tempo_by_year
- NOTE: We have provided the HTML files that our plot functions have created in the workspace/directory in a folder labeled "html_plots".
  - To view a plot, single click on its file, go to the navbar, and select the open web preview option. The visualization you selected is displayed in a web previewer to the right of the code on your screen.
    - Opening the downloaded HTML files produces the same result as the web preview.

Updates

Zulkifli Sales started this project — Mar 16, 2022 07:57 PM EDT
