Random Forest models for predicting oil drill failures

Inspiration

We don't want oil to be expensive. To make oil cheaper, we need to reduce the cost of drilling, and that means reducing maintenance costs.

What it does

It predicts equipment failure based on aggregated data from 150 sensors.
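As a rough illustration of what "aggregated" sensor data could look like, the sketch below collapses each sensor's raw readings into a few summary features. The sensor names, the choice of mean/min/max, and the helper `aggregate_readings` are all assumptions made for this example; the write-up does not say how the 150 sensors were actually aggregated.

```python
from statistics import mean


def aggregate_readings(readings_by_sensor):
    """Collapse each sensor's readings into mean/min/max features.

    `readings_by_sensor` maps a (hypothetical) sensor name to a list of
    numeric readings taken over one observation window.
    """
    features = {}
    for sensor, values in sorted(readings_by_sensor.items()):
        features[f"{sensor}_mean"] = mean(values)
        features[f"{sensor}_min"] = min(values)
        features[f"{sensor}_max"] = max(values)
    return features


# Toy window with two made-up sensors; the real data had 150.
window = {
    "sensor_001": [0.9, 1.1, 1.0],
    "sensor_002": [40.0, 42.0, 47.0],
}
features = aggregate_readings(window)
print(features)
```

Each observation window then becomes one row of features, which is the shape a random forest expects.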

How we built it

We trained our model on a smaller subset of the data, enriched with positive outcomes so that it could fit the skewed data better. By bootstrapping this enriched training data, we built a random forest model that predicted the response in the testing data with over 94% accuracy.
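The two sampling steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not our actual pipeline: the helper names `enrich_with_positives` and `bootstrap_sample`, the 3:1 negative-to-positive ratio, and the toy data are all made up for the example. In a random forest, each tree is fit on its own bootstrap sample like the one drawn here.

```python
import random


def enrich_with_positives(rows, labels, neg_per_pos=1, seed=0):
    """Keep every positive row plus a random sample of the negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), neg_per_pos * len(pos))
    keep = pos + rng.sample(neg, n_neg)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) examples with replacement (one such sample per tree)."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy skewed data: 1000 rows, only 2% positive (assumed numbers).
rng = random.Random(42)
labels = [1 if i < 20 else 0 for i in range(1000)]
rows = [[rng.random() for _ in range(3)] for _ in labels]

X_enriched, y_enriched = enrich_with_positives(rows, labels, neg_per_pos=3)
X_boot, y_boot = bootstrap_sample(X_enriched, y_enriched, rng)

print(len(y_enriched), sum(y_enriched))  # 80 rows, 20 of them positive
```

Enrichment raises the share of positives from 2% to 25% in this toy case, so the trees see enough positive examples to learn from.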

Challenges we ran into

The data set was extremely large and difficult to process efficiently in a short time frame. The biggest challenge was finding a suitably powerful model that was not computationally expensive to build.

Accomplishments that we're proud of

We processed the data with a structured, general workflow that is designed to scale and to carry over to other problems involving large-scale data analysis.

What we learned

We learned how to manage qualitative data with large n (many observations) and small p (few predictors) while maintaining good ROC performance without overfitting or sacrificing scalability.
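To show why ROC matters on skewed data, here is a small sketch that computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative. The functions `roc_auc` and `accuracy` and the toy labels are assumptions for illustration, and the pairwise loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of examples classified correctly at the given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy skewed data: 95 negatives, 5 positives (assumed numbers).
labels = [0] * 95 + [1] * 5

# A useless model that gives every example the same low score.
constant_scores = [0.0] * 100
print(accuracy(labels, constant_scores))  # 0.95
print(roc_auc(labels, constant_scores))   # 0.5

# A model that ranks every positive above every negative.
good_scores = [0.1] * 95 + [0.9] * 5
print(roc_auc(labels, good_scores))       # 1.0
```

The constant model reaches 95% accuracy just by following the class imbalance, while its AUC of 0.5 shows it cannot separate the classes at all.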

What's next for Random Forest models for predicting oil drill failures

We want to make sure that the future remains bright for the oil industry and so we hope that our model can increase longevity of vital oil industry infrastructure.

Updates

Johnathan Lo started this project — Oct 20, 2019 12:51 PM EDT
