What it does
Given a dataset, our goal was to find a model that predicts the rate of penetration (ROP) when drilling a well.
How we built it
We first cleaned the data by analyzing its variability and removing outliers based on that analysis. We then ran a correlation analysis to decide which variables to use in our model. Next, we trained and tested several models, including kNN, random forest, and XGBoost. In the end, we chose XGBoost after hyperparameter tuning.
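The cleaning and correlation steps could look roughly like the sketch below. It is an illustration only, not our actual pipeline: the 1.5 x IQR fence is one common outlier rule and is assumed here, the data is synthetic, and the column names (`weight_on_bit`, `rpm`, `noise_channel`, `rop`) are hypothetical.

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """Return a boolean mask that is True for values inside the IQR fences."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

def rank_by_correlation(features, target):
    """Return (name, Pearson r) pairs sorted by |r| with the target, descending."""
    scores = {
        name: float(np.corrcoef(col, target)[0, 1])
        for name, col in features.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Synthetic stand-in data; the real drilling dataset is not reproduced here.
rng = np.random.default_rng(0)
n = 200
weight_on_bit = rng.normal(20.0, 3.0, n)
rpm = rng.normal(120.0, 10.0, n)
noise_channel = rng.normal(0.0, 1.0, n)  # unrelated to the target by construction
rop = 2.0 * weight_on_bit + 0.1 * rpm + rng.normal(0.0, 1.0, n)
rop[:3] = 1000.0  # inject a few obviously bad readings

# Step 1: drop rows whose target value falls outside the IQR fences.
keep = iqr_mask(rop)

# Step 2: rank the remaining candidate features by correlation with the target.
features = {
    "weight_on_bit": weight_on_bit[keep],
    "rpm": rpm[keep],
    "noise_channel": noise_channel[keep],
}
ranking = rank_by_correlation(features, rop[keep])
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute correlation only captures linear relationships, so a low score is a hint to look closer rather than a reason to drop a variable outright.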
Challenges we ran into
The most difficult parts were deciding what to do with outliers, deciding which variables were most important, and tuning parameters to get the best-fitting model.
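To give a feel for the parameter tuning step, here is a minimal sketch of a cross-validated search. It uses a small hand-written kNN regressor in place of XGBoost so that it runs with NumPy alone; the helper names (`knn_predict`, `cv_rmse`, `tune_k`), the candidate values of `k`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Predict each query target as the mean of its k nearest training targets."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    dists = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def cv_rmse(x, y, k, n_folds=5, seed=0):
    """Mean root-mean-squared error of kNN over shuffled k-fold splits."""
    order = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(order, n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_predict(x[train_idx], y[train_idx], x[test_idx], k)
        errors.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return float(np.mean(errors))

def tune_k(x, y, candidates):
    """Return (best k, scores) where best k has the lowest cross-validated RMSE."""
    scores = {k: cv_rmse(x, y, k) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic regression problem standing in for the drilling data.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(150, 2))
y = np.sin(4.0 * x[:, 0]) + x[:, 1] + rng.normal(0.0, 0.2, 150)

best_k, scores = tune_k(x, y, candidates=[1, 3, 5, 9, 15])
print("best k:", best_k)
```

The same pattern (score every candidate setting on held-out folds, keep the one with the lowest error) carries over to tree-based models, where the candidates would be settings such as tree depth or learning rate instead of `k`.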
What we learned
We learned how to preprocess and process data to arrive at a good model. We also learned new machine learning algorithms and data science techniques, as well as new Python libraries.