What it does

Predicts the rate of penetration (ROP), the speed at which a drill bit advances.

Inspiration

We thought this would be a fun chance to apply the data science skills we have built up over time for Kaggle.

How we built it

Data cleaning: we ran an exploratory data analysis to check for outliers and missing data and to explore the statistics and distributions, comparing each feature against our label, the rate of penetration. Once we had cleaned the data and removed the outliers, we scaled the data with a standard scaler in our preprocessing step. Our features (x) are all the quantitative variables, with the categorical variables left out, and our label (y) is the "rate of penetration".
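
The preprocessing described above could look roughly like the sketch below. It is not our exact code: the column names and values are made up for illustration, and the 1.5 * IQR rule is only one common way to remove outliers, assumed here because the write-up does not say which rule was used.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical columns and values; the real dataset's columns are not listed here.
df = pd.DataFrame({
    "weight_on_bit": [10.0, 12.0, 11.0, 13.0, 12.5, 11.5, 95.0],
    "rotary_speed": [100.0, 110.0, 105.0, 120.0, 115.0, 108.0, 112.0],
    "formation": ["shale", "sand", "shale", "sand", "shale", "sand", "shale"],
    "rate_of_penetration": [30.0, 35.0, 32.0, 40.0, 38.0, 33.0, 36.0],
})

label = "rate_of_penetration"

# Features are the quantitative columns only; categorical columns are left out.
numeric_features = df.drop(columns=[label]).select_dtypes(include="number").columns

# Assumed outlier rule: drop rows where any numeric feature falls outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1 = df[numeric_features].quantile(0.25)
q3 = df[numeric_features].quantile(0.75)
iqr = q3 - q1
in_range = (
    (df[numeric_features] >= q1 - 1.5 * iqr)
    & (df[numeric_features] <= q3 + 1.5 * iqr)
).all(axis=1)
clean = df[in_range]

# Scale each feature to zero mean and unit variance.
X = StandardScaler().fit_transform(clean[numeric_features])
y = clean[label].to_numpy()
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the validation and test splits, so that no information leaks from held-out data.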

Next, we used k-fold cross-validation to check model performance, fitting a LightGBM (light gradient boosting tree) model. To improve performance, we tuned the hyperparameters with Hyperopt, a library for Bayesian optimization. After settling on the final parameters, we trained the model with them and then ran it on the test set.

As a result, we got an average RMSE of less than 10 on the training set and about 20 to 25 on the validation set.
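
For reference, RMSE is the square root of the mean squared difference between the actual and predicted values, so it is expressed in the same units as the rate of penetration. A small sketch with made-up numbers (not values from our data):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values purely for illustration.
actual = [30.0, 40.0, 50.0]
predicted = [33.0, 36.0, 50.0]
error = rmse(actual, predicted)
```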

Challenges we ran into

Dealing with categorical column values.
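
One common way to bring categorical columns into a model is one-hot encoding. The sketch below is an illustration of that option, not something we did in this project, and the `formation` column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical categorical column alongside one numeric feature.
df = pd.DataFrame({
    "formation": ["shale", "sand", "shale", "limestone"],
    "weight_on_bit": [10.0, 12.0, 11.0, 13.0],
})

# Replace the categorical column with one indicator column per category.
encoded = pd.get_dummies(df, columns=["formation"])
```

LightGBM can also split on pandas columns of `category` dtype directly, which may be worth trying instead of one-hot encoding when a column has many distinct values.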
