Inspiration
We don't want oil to be expensive. Making oil cheaper means reducing the cost of drilling, which in turn means reducing maintenance costs.
What it does
It predicts equipment failure based on aggregated data from 150 sensors.
How we built it
We trained our model on a smaller subset of the data, enriched with positive outcomes so that it would fit better on the skewed data. By bootstrapping this enriched training data, we built a random forest model that predicted the response in the test data with over 94% accuracy.
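The enrichment and bootstrapping steps could look roughly like the sketch below. It is not our original code: the toy data, the function names `enrich_positives` and `bootstrap_sample`, and the `neg_per_pos` ratio are all illustrative assumptions, and it treats a label of 1 as a positive outcome. It uses only the Python standard library; in a random forest, each tree would be fit on one such bootstrap replicate.

```python
import random


def enrich_positives(rows, labels, neg_per_pos=1, seed=0):
    """Keep every positive row and only a random subset of the negatives.

    neg_per_pos is a hypothetical knob: how many negatives to keep
    for each positive, which controls how skewed the subset stays.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), neg_per_pos * len(pos))
    keep = pos + rng.sample(neg, n_keep)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def bootstrap_sample(rows, labels, seed=0):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy stand-in for the sensor data: 1000 readings, 2% positives.
data_rng = random.Random(42)
labels = [1 if i < 20 else 0 for i in range(1000)]
rows = [[data_rng.gauss(y, 1.0) for _ in range(3)] for y in labels]

small_rows, small_labels = enrich_positives(rows, labels, neg_per_pos=3)
boot_rows, boot_labels = bootstrap_sample(small_rows, small_labels, seed=1)
```

Downsampling the negatives keeps the training set small, which also helps with build time. The right ratio of negatives to positives is a tuning choice that this sketch does not settle.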
Challenges we ran into
The data set was extremely large and difficult to process efficiently in a short time frame. The biggest challenge was finding a suitably powerful model that was not computationally expensive to build.
Accomplishments that we're proud of
We processed the data with a structured, general workflow designed to scale and to generalize to other situations that involve large-scale data analysis.
What we learned
We learned how to manage qualitative data with large n and small p while maintaining good ROC values without overfitting or sacrificing scalability.
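As a small illustration of why ROC values matter on skewed data, the sketch below computes the area under the ROC curve as the fraction of positive/negative pairs that a model ranks correctly, and compares it with plain accuracy. The helper names and the toy labels are assumptions made for this example, not part of our pipeline.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive gets a higher score
    than a random negative, with ties counted as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of rows classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy skewed labels: 2 positives among 100 rows.
toy_labels = [1] * 2 + [0] * 98
always_negative = [0.0] * 100  # a "model" that never predicts a positive

print(accuracy(toy_labels, always_negative))  # 0.98
print(roc_auc(toy_labels, always_negative))   # 0.5
```

A model that never predicts a positive still scores 98% accuracy here, while its ROC area of 0.5 shows it ranks no better than chance. The pairwise loop is quadratic, so a rank-based version would be a better fit for a data set as large as ours.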
What's next for Random Forest models for predicting oil drill failures
We want the future to remain bright for the oil industry, so we hope that our model can increase the longevity of vital oil industry infrastructure.