Inspiration
Sanitary Sewer Overflows (SSOs) are events during which untreated sewage is released by a sanitary sewer into the environment before arriving at a wastewater treatment facility.[1] SSOs pose serious health risks. Human health can be compromised through exposure to microbial pathogens originating from fecal matter, including viral, bacterial, and protozoan pathogens.[2] Environmental health may also suffer as pollutants from SSOs end up in surface waters, where they exert oxygen demand and where high nutrient loads accelerate eutrophication.[3] SSOs should be prevented to avoid these health consequences, yet the pathway to prevention is not simple, especially since SSOs have long been regarded as unpredictable. Without prediction capability, clean-up crews can only react to spill reports, often arriving at the spill site when contaminated water is already seeping into the ground or making its way to surface water.
What it does
I developed a model that uses environmental, weather, and infrastructure variables to predict the order of magnitude of a wet-weather sanitary sewer overflow in California with reasonable accuracy. Predicting overflow volume is useful to city officials because it lets them anticipate clean-up needs and environmental impacts.
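To make "order of magnitude" concrete, here is a minimal sketch of one way such a prediction could be scored. It is written in Python for illustration (the project itself was built in R), and it assumes that a prediction counts as correct when it falls in the same power-of-ten bin as the observed volume. That criterion, the function names, and the example volumes are my assumptions, not details from the project.

```python
import math


def order_of_magnitude(volume):
    """Return the power-of-ten bin of a positive volume (e.g. 350 -> 2)."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return math.floor(math.log10(volume))


def magnitude_accuracy(observed, predicted):
    """Fraction of spills whose predicted volume shares the observed bin."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        order_of_magnitude(o) == order_of_magnitude(p)
        for o, p in zip(observed, predicted)
    )
    return hits / len(observed)


# Made-up volumes, purely for illustration (not project data).
observed = [350, 12000, 75, 4000]
predicted = [500, 9000, 60, 20000]
score = magnitude_accuracy(observed, predicted)
```

A bin-based score like this is strict near bin edges (9,000 versus 12,000 counts as a miss), so an alternative would be to allow an absolute error of up to one in log10 space.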
How I built it
I curated several large datasets from the Cal EPA SSO Interactive Report, the Cal EPA Sewer Agency Questionnaire, and NOAA daily weather summaries. I combined and aligned these datasets to create a single dataset of spills with many predictor variables (55,000 spills × 31 variables). After separating my dataset into training and testing sets, I developed a multivariate regression model. Since SSOs are highly complex in nature, I wanted to consider nonlinear interactions between variables and analyze how they could improve the predictability of spill volumes. I therefore added many cross products and ratios of variables to this model. I also applied trigonometric functions and raised certain variables to higher powers. I used ANOVA testing to determine the significance of variables. While the multivariate regression model yielded predictions with a larger margin of error than I had hoped, it was still useful for determining which variables and nonlinear interactions were significant. I then moved on to developing a random forest model (which was successful!) and added the most significant variables from the multivariate regression model.
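The nonlinear terms described above can be sketched as follows. This is an illustrative Python example rather than the project's R code, and the column names (`rainfall_in`, `pipe_age_yr`, `day_of_year`) are hypothetical stand-ins, since the actual 31 variables are not listed here.

```python
import math


def add_nonlinear_features(row):
    """Return a copy of `row` with illustrative derived features added.

    Column names are hypothetical, not the project's actual variables.
    """
    out = dict(row)
    rain = row["rainfall_in"]
    age = row["pipe_age_yr"]
    day = row["day_of_year"]

    out["rain_x_age"] = rain * age                     # cross product
    out["rain_per_age"] = rain / age if age else None  # ratio, guarded against zero
    out["rain_sq"] = rain ** 2                         # higher power

    # Trigonometric encoding of the day of year, so that late December
    # and early January end up close together.
    angle = 2 * math.pi * day / 365.25
    out["season_sin"] = math.sin(angle)
    out["season_cos"] = math.cos(angle)
    return out


# Made-up spill record, purely for illustration.
spill = {"rainfall_in": 2.0, "pipe_age_yr": 40.0, "day_of_year": 15}
features = add_nonlinear_features(spill)
```

Derived columns like these can be given to either a regression model or a random forest. Which specific terms turned out to be significant in the project is not stated here.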
Challenges I ran into
One challenge I ran into involved missing values in my datasets. 48,000 of the 55,000 spills I was analyzing had at least one missing value. I used k-Nearest Neighbors (kNN) imputation to overcome this problem. Related challenges included obtaining relevant datasets and general data wrangling issues.
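For readers unfamiliar with kNN imputation, here is a minimal sketch of the idea in plain Python. It is not the implementation used in the project, which was done in R. The value of k, the distance measure, and the toy rows are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns that both
    rows have observed. The input is not modified.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # Only rows that actually observed column j can donate a value.
                if m == i or other[j] is None:
                    continue
                shared = [
                    (a, b) for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy data: the last row is missing its second value.
rows = [[1.0, 10.0], [1.1, 12.0], [5.0, 50.0], [1.05, None]]
filled = knn_impute(rows, k=2)
```

In practice the variables would usually be put on a common scale before computing distances, so that a variable with large numeric values does not dominate the neighbor search.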
Accomplishments that I'm proud of
The creation of my complete dataset required significant time and effort. I’m proud of this dataset and recognize that it likely contains the answers to many more questions than I have investigated so far. I’m also proud of my model’s success in predicting the order of magnitude of SSOs.
What's next for Predicting Sanitary Sewer Overflows via Random Forests
I plan to develop an interactive tool for government officials and decision makers so they can estimate SSO volumes in advance of a spill event. I also intend to forecast how the frequency and volume of SSOs may change with climate change and specifically the predicted increase of “whiplash” events (extreme rain to extreme drought).
References
[1] Green Nylen, Nell, Luke Sherman, Michael Kiparsky, and Holly Doremus. "Citizen Enforcement and Sanitary Sewer Overflows in California." (2016).
[2] Sauer, Elizabeth P., Jessica L. VandeWalle, Melinda J. Bootsma, and Sandra L. McLellan. "Detection of the human specific Bacteroides genetic marker provides evidence of widespread sewage contamination of stormwater in the urban environment." Water research 45, no. 14 (2011): 4081-4091.
[3] Golden, Jonathan B. "An introduction to sanitary sewer overflows." Seminar Publication: National Conference on Sanitary Sewer Overflows (SSOs), pp. 1-8. (1995).
Built With
- naniar
- r
- randomforest