Inspiration

I'm from Houston, Texas, and Wes is from Philadelphia, Pennsylvania, two places where oil accidents occur frequently. We wanted to find out whether there is a way to better protect people and the environment near oil pipelines.

What it does

Identifies risk-mitigation strategies for oil pipelines to protect nearby people and the environment.

How we built it

We did extensive exploratory analysis with visualization, and we built a randomForest model to predict the total cost of an accident. The randomForest model also let us see which features mattered most in determining total cost. Beyond total cost, we looked at the causes most associated with human fatalities and injuries, as well as with environmental damage. In economics, firms tend to optimize only for profit, but the costs to people and the environment matter too. With these data, we developed strategies that let firms optimize for profit while also minimizing the risk of human injuries, fatalities, and environmental damage.
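To illustrate what "feature importance" means here, the sketch below shows permutation importance: shuffle one feature at a time and measure how much the prediction error grows. This is the general idea behind one of the importance measures that random forests report. The project itself was built in R; this Python version is only an illustration, and the stand-in model, the two features, and the generated rows are all assumptions made up for the example, not our fitted model or our data.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of a model over a set of rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    scores = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled values.
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(model, shuffled, targets) - baseline)
    return scores

# Hypothetical stand-in for a fitted model (NOT a random forest):
# predicted cost depends on feature 0 and ignores feature 1.
def stand_in_model(row):
    return 100.0 * row[0] + 0.0 * row[1]

# Made-up rows with two numeric features each.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [stand_in_model(r) for r in rows]

scores = permutation_importance(stand_in_model, rows, targets, n_features=2)
print(scores)  # feature 0 gets a positive score, feature 1 scores zero
```

A feature the model relies on gets a large score, while a feature it ignores scores zero, which is how an importance ranking separates the drivers of total cost from the rest.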

Challenges we ran into

The dataset was cumbersome and had many missing values: only 5 of the 2,795 records were complete. We also only had data points for systems that failed, so we couldn't predict failure itself, but we could see which factors contributed most to cost.
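A check like the one behind the "complete records" count can be sketched in a few lines. The Python below is only an illustration (the project was built in R), and the field names and the four records are hypothetical, not taken from our dataset.

```python
# Hypothetical records; None marks a missing value. Field names are illustrative.
records = [
    {"cause": "corrosion", "barrels_lost": 120.0, "total_cost": 50000.0},
    {"cause": None, "barrels_lost": 15.0, "total_cost": 8000.0},
    {"cause": "equipment failure", "barrels_lost": None, "total_cost": None},
    {"cause": "excavation damage", "barrels_lost": 300.0, "total_cost": None},
]

def count_complete(rows):
    """Number of rows with no missing (None) values."""
    return sum(1 for row in rows if all(v is not None for v in row.values()))

def missing_share(rows):
    """Fraction of rows that are missing each field."""
    fields = rows[0].keys()
    return {f: sum(1 for r in rows if r[f] is None) / len(rows) for f in fields}

complete = count_complete(records)
shares = missing_share(records)
print(complete, shares)
```

Looking at the per-field share of missing values, rather than only the count of complete rows, helps decide which columns to drop or impute before modeling.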

Accomplishments that we're proud of

We produced many interesting and informative visualizations of accident geography and feature importance.

What we learned

Cleaning and pre-processing data is tedious, yet very important.

What's next for Oil Spill Accident Mitigation

We'd like to analyze accidents over time to see whether they occur more often in particular periods. If, for example, more accidents occur in the early hours of the work day, oil firms would know to be more attentive then. Because we only had data points for oil spill accidents and failures, it would also be interesting to add data for pipelines operating normally, so that we could develop systems to predict failure.
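The time-of-day analysis could start with something as simple as counting accidents per hour. The Python sketch below is illustrative only (the project was built in R); the timestamps and their format are hypothetical and not drawn from our data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical accident timestamps; the format is an assumption.
timestamps = [
    "2015-03-02 06:45",
    "2015-07-19 07:10",
    "2016-01-11 06:05",
    "2016-05-30 14:30",
]

def accidents_by_hour(stamps, fmt="%Y-%m-%d %H:%M"):
    """Count accidents for each hour of the day (0-23)."""
    return Counter(datetime.strptime(s, fmt).hour for s in stamps)

counts = accidents_by_hour(timestamps)
busiest_hour, busiest_count = counts.most_common(1)[0]
print(busiest_hour, busiest_count)
```

The same grouping works for day of week or month by swapping `.hour` for `.weekday()` or `.month`.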

Built With

  • r
  • ggmap
  • ggplot
  • randomforest