ConocoPhillips Challenge - Team Ahhggies

Inspiration and Teamwork

Garey: Learning data science in a team environment was an awesome experience. My teammate Angel is knowledgeable in programming, statistics, and machine learning, and he really showed his leadership skills. I'm brand new to machine learning, and he did a great job coaching me and introducing me to the field. I can't say enough about Texas A&M. They did an outstanding job hosting the first-ever Datathon, and they continue to show why A&M is the best school in the world!

Angel: Having worked with machine learning before, I enjoyed seeing what real companies expect from you. I had a lot of fun applying what I already knew, as well as learning throughout the Datathon, to help the team.

What it does

Our model predicts whether an error is a surface error or an underground error. The trained model can be saved and run again later.
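Saving and reloading a trained classifier can look like the minimal sketch below. It assumes scikit-learn and joblib; the writeup does not say which serialization method was used. The synthetic data, the label encoding (0 = surface, 1 = underground), and the filename are all hypothetical.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the label encoding is hypothetical:
# 0 = surface error, 1 = underground error.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Persist the fitted model to disk (hypothetical filename) ...
model_path = "error_location_model.joblib"
joblib.dump(model, model_path)

# ... and load it back later to run new predictions without retraining.
reloaded = joblib.load(model_path)
same = bool(np.array_equal(model.predict(X), reloaded.predict(X)))
print(same)  # True
```

Only load model files you created yourself, because joblib files are unpickled on load.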

How we built it

Before running any classifiers, we had to convert the 'na' cells into values the classifier could read. We used an imputer to replace each 'na' value with the mean of its column. Next, we tested classifiers that can rank features by importance. Random Forest gave the best results, so we used it for the rest of the project. We then worked out how to get predictions from the model and write them to the output file. After our first test runs, we fine-tuned the classifier by adjusting n_estimators and max_features, and that approach produced our best result.
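The steps above map naturally onto a scikit-learn pipeline. The following is an illustrative sketch under assumptions, not the team's actual code: the data are synthetic, and the column names, the 'label' target (0 = surface, 1 = underground), the parameter grid, and the output filename are hypothetical. GridSearchCV stands in for the manual tuning described. The sketch also shows one detail worth knowing: pandas does not treat a lowercase 'na' string as missing by default, so it has to be passed through na_values.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
feature_names = ["sensor_a", "sensor_b", "sensor_c", "sensor_d"]  # hypothetical

# Build a synthetic CSV in which roughly 10% of feature cells are the string 'na'.
raw = pd.DataFrame(rng.normal(size=(200, 4)), columns=feature_names)
raw["label"] = (raw["sensor_a"] + raw["sensor_b"] > 0).astype(int)
raw[feature_names] = raw[feature_names].mask(rng.random((200, 4)) < 0.1)
csv_text = raw.to_csv(index=False, na_rep="na")

# Lowercase 'na' is not in pandas' default missing-value list, so declare it.
train = pd.read_csv(io.StringIO(csv_text), na_values=["na"])
X_train = train[feature_names]
y_train = train["label"]

# Mean imputation (per column) followed by a random forest.
pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestClassifier(random_state=0),
)

# Tune the two hyperparameters mentioned in the writeup (grid values are arbitrary).
param_grid = {
    "randomforestclassifier__n_estimators": [50, 100],
    "randomforestclassifier__max_features": ["sqrt", 0.5],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)
best = search.best_estimator_
print(search.best_params_)

# Rank features by the forest's impurity-based importances.
forest = best.named_steps["randomforestclassifier"]
ranking = sorted(zip(feature_names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Predict on new rows (also containing NaNs) and write the output file.
new_rows = pd.DataFrame(rng.normal(size=(20, 4)), columns=feature_names)
new_rows = new_rows.mask(rng.random(new_rows.shape) < 0.1)
predictions = best.predict(new_rows)
pd.DataFrame({"row": range(len(predictions)), "label": predictions}).to_csv(
    "predictions.csv", index=False
)
```

Keeping the imputer inside the pipeline means the column means are learned only from the training folds during cross-validation and are reapplied automatically at prediction time.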

Challenges we ran into

Once we had gotten everything we could out of fine-tuning, our results stopped improving. Another problem was that training took a long time to complete. At the start, we worked with SVC, which took an hour to train and gave only okay results.
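A rough way to spot slow candidates before committing to a long run is to time their fits on growing subsamples. This is a sketch on synthetic data: the sample sizes and n_estimators value are arbitrary assumptions. scikit-learn's SVC documentation notes that its fit time scales at least quadratically with the number of samples, whereas RandomForestClassifier can build its trees in parallel with n_jobs=-1.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def time_fit(model, X, y):
    """Return the wall-clock seconds spent fitting model on (X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Synthetic data; the real challenge data are not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

candidates = {
    "SVC": SVC(),
    "RandomForest": RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0),
}

# Time each candidate on subsamples of increasing size to see how fit time grows.
timings = {}
for n in (1000, 2000, 4000):
    for name, model in candidates.items():
        seconds = time_fit(model, X[:n], y[:n])
        timings[(name, n)] = seconds
        print(f"{name:>12} n={n:>5}: {seconds:.2f}s")
```

If fit time roughly quadruples each time the sample size doubles, a full-size run will be far slower than the small test suggests.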

Accomplishments that we're proud of

Honestly, I am really proud that our results were good enough for the top 10. It took us a while to get there, but in the end we did our best to reach results we are satisfied with.

What we learned

We learned how to clean data and how to train a model on it correctly. The hardest part of this challenge was improving our results. Thankfully, that pushed us to learn more about the algorithm as we tried to get the most out of it.