What it does
The model predicts the rate of penetration.
Inspiration
We thought this would be a fun opportunity to apply some of the data science skills we have learned over time for Kaggle.
How we built it
Data cleaning: we ran an exploratory data analysis to check for outliers and missing data and to explore the statistics and distribution of each variable, and we compared each of our features against our label, the rate of penetration. Once we had cleaned the data and removed the outliers, we used a standard scaler in our preprocessing step to scale the data. Our features (x) are all the quantitative variables, excluding the categorical ones, and our label (y) is the "rate of penetration".
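The cleaning and scaling steps can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not our actual pipeline: the 1.5 x IQR outlier rule, the two feature columns, and the sample values are all assumptions made for the example. The scaling itself is ordinary standardization, which subtracts each column's mean and divides by its population standard deviation.

```python
import statistics

def iqr_filter(rows, col, k=1.5):
    """Keep rows whose value in column `col` lies within k * IQR of the quartiles.

    The IQR rule is an assumed outlier criterion, chosen only for illustration.
    """
    values = [r[col] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[col] <= high]

def fit_scaler(rows):
    """Compute the per-column mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds

def transform(rows, means, stds):
    """Standardize every value: (value - column mean) / column std."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

# Hypothetical quantitative features; the last row is an obvious outlier.
rows = [
    [10.0, 100.0],
    [12.0, 110.0],
    [11.0, 105.0],
    [13.0, 120.0],
    [12.5, 115.0],
    [11.5, 108.0],
    [95.0, 112.0],
]

clean = iqr_filter(rows, col=0)      # drops the outlier row
means, stds = fit_scaler(clean)      # statistics come from the cleaned data only
scaled = transform(clean, means, stds)
```

In a setup with a separate validation or test set, the means and standard deviations would be fitted on the training rows and then reused to transform the other sets, so that no information leaks from the held-out data.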
Moving forward, we used k-fold cross-validation to check our model performance, fitting a light gradient boosting tree model. To increase performance, we tuned the hyperparameters with Hyperopt, a library for Bayesian optimization. After getting the final parameters, we trained our model with them and then ran it on the test set.
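The evaluation loop described here can be sketched in plain Python. To keep the example self-contained, two things are swapped in and should not be read as our implementation: a one-feature ridge-style linear fit stands in for the gradient boosting model, and a small fixed list of candidate values stands in for the Bayesian search. The data is synthetic, and `alpha` is a hypothetical hyperparameter. What carries over is the structure: score each candidate by its average validation RMSE across k folds, then keep the best one.

```python
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge_1d(x, y, alpha):
    """Stand-in model: one-feature linear fit with an L2 penalty on the slope."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx

def cv_rmse(x, y, alpha, k=5):
    """Average validation RMSE over k folds for one hyperparameter value."""
    scores = []
    for held_out in kfold_indices(len(x), k):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_ridge_1d(
            [x[i] for i in train], [y[i] for i in train], alpha
        )
        preds = [slope * x[i] + intercept for i in held_out]
        scores.append(rmse([y[i] for i in held_out], preds))
    return sum(scores) / k

# Synthetic data: a noisy linear relationship (illustrative only).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [3.0 * v + 5.0 + rng.gauss(0, 1.0) for v in x]

# A fixed candidate list replaces the Bayesian search in this sketch.
candidates = [0.0, 1.0, 10.0, 100.0, 1000.0]
scores = {a: cv_rmse(x, y, a) for a in candidates}
best_alpha = min(scores, key=scores.get)
```

A Bayesian optimizer follows the same pattern, except that it proposes each new candidate based on the scores seen so far instead of walking a fixed list.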
As a result, we got an average RMSE of less than 10 on the training data and about 20 to 25 on the validation data.
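For reference, RMSE (root mean squared error) is defined as below. The symbols are notation introduced here: $y_i$ is the measured rate of penetration for sample $i$, $\hat{y}_i$ is the model's prediction, and $n$ is the number of samples.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

RMSE is expressed in the same units as the label, and a validation RMSE noticeably higher than the training RMSE, as here, usually indicates some overfitting to the training data.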
Challenges we ran into
Dealing with categorical column values.
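One common way to make categorical columns usable by a numeric model is one-hot encoding. The sketch below shows the idea in plain Python; it is one possible approach, not something we applied in this project, and the `formation` column and its values are hypothetical.

```python
def one_hot(values):
    """Map each category to a 0/1 indicator vector, with categories in sorted order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the position of this row's category
        encoded.append(row)
    return categories, encoded

# Hypothetical categorical column.
formation = ["shale", "sandstone", "shale", "limestone"]
categories, encoded = one_hot(formation)
```

Each row ends up with exactly one 1, in the position of its category, so the encoded columns can be appended to the quantitative features.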