Inspiration

The challenge presented by ConocoPhillips fit the team’s backgrounds and was a real-world problem with real consequences. Although we all come from different backgrounds, a shared passion for problem solving united us and pushed us through the confusion and sleep deprivation toward a solution that would accurately predict equipment failures.

What it does

Our model predicts which pumps will experience underground equipment failure, using the random forest algorithm.

How we built it

To build the program, we first surveyed the data to understand what we had been given. Values of “na” throughout the data were coerced to numeric values and treated as zero. We then applied and compared several classification models using a 10-fold cross-validation procedure: Logistic Regression, Decision Tree, K-Nearest Neighbours, Linear Discriminant Analysis, and Gaussian Naïve Bayes. Based on their performance, we selected the Decision Tree as our final model. We evaluated it with a confusion matrix, F1 score, and accuracy. The model was saved and then applied to the test set.
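A minimal sketch of the “na” handling and the evaluation metrics described above. The function names are illustrative assumptions, the labels are made-up toy values, and plain Python stands in for whatever libraries the project used, so this is not the project's code.

```python
def coerce_na(value):
    """Return value as a float; "na" and other non-numeric text become 0.0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy over all samples and F1 score for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return accuracy, f1


# Toy sensor readings with missing values marked as "na".
cleaned = [coerce_na(v) for v in ["1.5", "na", "42"]]

# Toy labels: 1 = underground failure, 0 = surface failure.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
accuracy, f1 = accuracy_and_f1(*counts)
```

Reporting F1 alongside accuracy matters here because F1 depends only on the positive class's true positives, false positives, and false negatives, so a model that ignores the rare class scores poorly on it even when its accuracy looks high.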

What we learned

We learned how to collaborate and combine our knowledge and ideas to create the most efficient solution for detecting machine failures. Each of us brought a unique skill that added to the success of our program and to the knowledge of our teammates.

Challenges we ran into

Perhaps the biggest challenge was the imbalance between the classes: for 59,000 instances of class “0” (surface failure) there were only 1,000 instances of class “1” (underground failure). Models trained on such data can achieve remarkable accuracy yet be woefully ineffective in real life, because they effectively predict only one class. To compensate, we used bootstrapping to upsample the minority class.
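The imbalance and the upsampling step can be illustrated with a short sketch. It assumes a binary problem, uses row indices as stand-ins for the real feature rows, and the helper name `upsample_minority` is hypothetical. It shows the general bootstrap technique, not the project's exact code.

```python
import random


def upsample_minority(rows, labels, minority_label, seed=0):
    """Sample minority rows with replacement until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [row for row, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    extra_needed = max(majority_count - len(minority), 0)
    # Bootstrap draw: the same minority row may be picked many times.
    extra = rng.choices(minority, k=extra_needed) if minority else []
    return list(rows) + extra, list(labels) + [minority_label] * len(extra)


# Class counts taken from the text: 59,000 of class 0 and 1,000 of class 1.
labels = [0] * 59000 + [1] * 1000
rows = list(range(len(labels)))  # stand-in row ids, not real features

# A model that always predicts class 0 looks accurate but finds no class 1 cases.
baseline_accuracy = labels.count(0) / len(labels)  # about 0.983
baseline_recall = 0 / labels.count(1)              # 0.0

balanced_rows, balanced_labels = upsample_minority(rows, labels, minority_label=1)
```

In practice, upsampling is usually applied only to the training portion of each split, so that duplicated minority rows do not leak into the data used for evaluation.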
