The process of drilling new wells, especially offshore, is extremely challenging and costly. After reaching the seabed more than 3,000 feet underwater, rigs in the Gulf of Mexico must drill through an additional 20,000 feet of rock. In these extreme environments, where temperatures and pressures far exceed regular drilling conditions, specialized equipment and teams are required.

These operations can involve hundreds of people and large amounts of equipment, and daily drilling costs are very high. Cutting drilling time by even a few hours per well can yield significant savings and, as more wells are drilled, a growing competitive advantage.

What it does

We built several models to predict the rate of penetration (ROP) from controllable drilling parameters.

The random forest model:

  • Selected hyperparameters through cross-validation
  • Obtained the best predictions at max_depth = 15, n_estimators = 100, min_samples_split = 2 (scikit-learn parameter names)
  • Achieved a best RMSE of 16 on the validation data
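
The hyperparameter search above could be sketched roughly as follows with scikit-learn's GridSearchCV. This is a minimal illustration, not our exact code: the data is synthetic, and the candidate grid, the number of folds, and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 200 samples, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=200)

# Candidate values to try; this grid is hypothetical.
param_grid = {
    "max_depth": [5, 15],
    "n_estimators": [50, 100],
    "min_samples_split": [2, 10],
}

# 3-fold cross-validation, scored by (negated) RMSE.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(best_params, round(best_rmse, 2))
```

scikit-learn maximizes scores, which is why the RMSE scorer is negated and the sign is flipped back when reading the result.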

The neural net model:

  • 5-layer feed-forward network
  • Input layer size 310
  • 3 hidden layers of size 2, 5, and 5
  • Output layer size 1
  • The last layer uses a linear activation because this is a regression problem; all other layers use ReLU
  • Experimented with nodes per layer, batch size, and optimizer
  • With a relatively small dataset, there is a risk of overfitting
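
As an illustration of the architecture listed above, here is a small sketch using scikit-learn's MLPRegressor, which applies ReLU to the hidden layers and an identity (linear) output for regression. The choice of MLPRegressor, the synthetic data, the batch size, the optimizer, and the early-stopping settings are all assumptions for the example rather than a record of what we ran.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption): 300 samples with 310 input features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 310))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=300)

net = MLPRegressor(
    hidden_layer_sizes=(2, 5, 5),  # three hidden layers, as listed above
    activation="relu",             # hidden-layer activation
    solver="adam",                 # optimizer (hypothetical choice)
    batch_size=32,                 # hypothetical choice
    early_stopping=True,           # hold out data to limit overfitting
    validation_fraction=0.2,
    max_iter=200,
    random_state=0,
)
net.fit(X, y)

# Weight matrix shapes show the layer sizes: 310 -> 2 -> 5 -> 5 -> 1.
layer_shapes = [w.shape for w in net.coefs_]
print(layer_shapes, net.out_activation_)
```

Early stopping is one common way to address the overfitting risk on a small dataset; it halts training once the held-out validation score stops improving.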

In the end, we ensembled the two models, giving greater weight to the random forest.
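
One simple way to combine two regressors like this is a weighted average of their predictions. The sketch below assumes that approach; the function name and the 0.7 weight are hypothetical, chosen only to show the random forest counting for more than the neural net.

```python
import numpy as np

def weighted_ensemble(rf_pred, nn_pred, rf_weight=0.7):
    """Blend two prediction arrays; rf_weight=0.7 is a hypothetical value."""
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must be between 0 and 1")
    rf_pred = np.asarray(rf_pred, dtype=float)
    nn_pred = np.asarray(nn_pred, dtype=float)
    # The neural net gets whatever weight the random forest does not use.
    return rf_weight * rf_pred + (1.0 - rf_weight) * nn_pred

# Made-up predictions from each model for two samples.
blended = weighted_ensemble([100.0, 80.0], [90.0, 100.0])
print(blended)  # approximately [97. 86.]
```

In practice the weight would be tuned on validation data rather than fixed by hand.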

How we built it

We built this project primarily with Python, scikit-learn, pandas, NumPy, and seaborn.

Challenges we ran into

  • Dealing with categorical variables
  • Feature selection
  • Connecting the backend to the frontend
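
For the categorical-variable challenge, one common approach is one-hot encoding with pandas. The snippet below is only a sketch of that technique: the column names and values are made up for illustration and are not taken from our dataset.

```python
import pandas as pd

# Hypothetical columns: one numeric parameter and one categorical feature.
df = pd.DataFrame({
    "rpm": [120, 150, 90],
    "bit_type": ["PDC", "roller_cone", "PDC"],
})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["bit_type"], dtype=int)
print(encoded.columns.tolist())
```

Each category becomes its own column, so a few categorical features with many levels can quickly widen the model's input.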

Accomplishments that we're proud of

Our models achieve RMSEs ranging from 16 to 20 on the validation data. We also produced visualizations that help us understand our data and improve our models.
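
For reference, RMSE (root mean squared error) is the square root of the average squared difference between predictions and true values, expressed in the same units as the target. A minimal sketch, with made-up numbers rather than our results:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors of -10, 20, and -10 give a mean squared error of 200.
example_rmse = rmse([100, 120, 80], [110, 100, 90])
print(round(example_rmse, 3))
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.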

What we learned

We learned how to use a variety of models and data visualization techniques to come up with the best predictor possible.
