Inspiration

Coming to Bitcamp this year, we had one goal in mind: to attempt something we genuinely thought wasn't feasible and do it anyway, building a machine learning model from scratch. It also couldn't be just any machine learning model. It had to have impact, help people, and deliver something unique. We began bouncing ideas off each other, from food insecurity to the high cost of menstrual products to clinical patient absenteeism and payment defaults. However, each idea either lacked enough data or had already been done. Then we came across a dataset from the Water Point Data Exchange (https://data.waterpointdata.org/dataset/Water-Point-Data-Exchange-Plus-WPdx-/eqje-vguj/about_data) with over 450,000 records to analyze. With it, we knew we could build a model that helps communities keep their water flowing and helps governments and the people who manage pipes plan their maintenance.

What it does

Our model takes in columns such as the year the pipe was built, the pipe's water type, the water source, and the geographic location to predict when the pipe will become non-functional and affect the community that depends on it. Using that prediction, the model then estimates when maintenance should be carried out on the pipe to prevent that failure.
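As a rough illustration of the last step, here is a minimal sketch of turning a predicted time to failure into a maintenance date. The function name `maintenance_date` and the `safety_fraction` margin are our own assumptions for the example, not the exact rule used in the project.

```python
from datetime import date, timedelta


def maintenance_date(reference: date, years_to_failure: float,
                     safety_fraction: float = 0.8) -> date:
    """Return a suggested maintenance date ahead of the predicted failure.

    safety_fraction is a hypothetical margin: 0.8 means "service the pipe
    once 80% of the predicted remaining life has elapsed".
    """
    if years_to_failure < 0:
        raise ValueError("years_to_failure must be non-negative")
    if not 0 < safety_fraction <= 1:
        raise ValueError("safety_fraction must be in (0, 1]")
    # Convert years to days using the average year length.
    days = years_to_failure * 365.25 * safety_fraction
    return reference + timedelta(days=round(days))


# Example: a pipe predicted to fail 5 years after 1 January 2024.
suggested = maintenance_date(date(2024, 1, 1), 5.0)
print(suggested)
```

A smaller `safety_fraction` schedules maintenance earlier, trading extra visits for a lower chance of an outage.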

How we built it

We used pandas for data cleaning and NumPy with scikit-learn for creating the model. To build the full project, we created a Django website that combines the Python backend and the trained model with an HTML frontend where the user enters the data. We used a gradient boosting regression model with staleness as a weight and install_decade, latitude, longitude, water tech, water source, pressure_score, and crucialness_score as features to predict the time to failure.
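The setup described above can be sketched roughly as follows. This is an illustration on synthetic data, not our actual training script: the values are made up, the categorical columns are integer-coded for brevity (one-hot encoding may be a better fit), and the weight formula assumes that a larger staleness value means an older, less trustworthy record. If the staleness column runs the other way, the weight should be flipped.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the features named in the text.
install_decade = rng.choice([1980, 1990, 2000, 2010], size=n)
latitude = rng.uniform(-10, 10, size=n)
longitude = rng.uniform(20, 40, size=n)
water_tech = rng.integers(0, 3, size=n)      # integer-coded category
water_source = rng.integers(0, 4, size=n)    # integer-coded category
pressure_score = rng.uniform(0, 1, size=n)
crucialness_score = rng.uniform(0, 1, size=n)

X = np.column_stack([install_decade, latitude, longitude, water_tech,
                     water_source, pressure_score, crucialness_score])

# Synthetic target: years until failure (made-up relationship plus noise).
y = 30 - 0.3 * (2020 - install_decade) - 5 * pressure_score + rng.normal(0, 1, n)

# Assumption: higher staleness = older record, so it gets a smaller weight.
staleness = rng.uniform(0, 100, size=n)
sample_weight = 1.0 / (1.0 + staleness)

model = GradientBoostingRegressor(random_state=0)
model.fit(X, y, sample_weight=sample_weight)

predictions = model.predict(X[:5])
print(predictions)
```

Passing the weights through `sample_weight` keeps every record in the training set while letting the more trusted ones count more toward the loss.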

Challenges we ran into

We learned a great deal about different machine learning models and the full pipeline required to turn them into a real product. One of us had no prior experience with machine learning, while the other two had some familiarity but not an in-depth understanding. As we progressed, we realized that we needed to work with models and techniques we weren’t used to, which meant learning new approaches and unfamiliar code from scratch.

Additionally, although each of us had some experience with either full-stack development or building models in isolation, none of us had worked through the entire pipeline before. We weren’t sure how to extract and use the trained model, how to connect it to the frontend without an obvious API, or even which type of model to use. The models we were familiar with, such as random forests or k-nearest neighbors, weren’t sufficient for our use case, so we went through extensive research and multiple failed attempts before finding a workable solution.
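One common way to get past the "extract and use the trained model" hurdle is to serialize the fitted model to a file and load it once on the web side. The sketch below shows that round trip with Python's `pickle` module and a small stand-in model. The feature names `feature_a` and `feature_b` and the helper `predict_from_form` are hypothetical, and this is not necessarily how our Django code is organized.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Train a tiny stand-in model on synthetic data (two made-up features).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = 10 * X[:, 0] + rng.normal(0, 0.1, size=50)
model = GradientBoostingRegressor(n_estimators=20, random_state=0).fit(X, y)

# Training side: write the fitted model to disk.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Web side: load the model once (for example, at server startup).
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)


def predict_from_form(form_data: dict) -> float:
    """Convert string form fields to a feature row and return one prediction."""
    row = [[float(form_data["feature_a"]), float(form_data["feature_b"])]]
    return float(loaded_model.predict(row)[0])


# Form values arrive as strings, as they would from an HTML form.
prediction = predict_from_form({"feature_a": "0.4", "feature_b": "0.7"})
print(prediction)
```

A view function in a web framework can call a helper like this directly, so no separate API is needed. Pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.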

Despite these challenges, we pushed through, learned rapidly, and within the 36-hour Bitcamp timeframe, successfully built a working model and a functional product.

Accomplishments that we're proud of

We are proud of building a machine learning model from scratch, learning how to use Django to integrate the model across the full stack, and understanding the entire pipeline from developing the model to creating a product. Finally, we are proud of the good cause our product serves, and we hope to keep developing the model to better understand and predict when water infrastructure needs to be replaced.

What we learned

We learned more about how data cleaning leads to a more refined and accurate machine learning model. Removing extreme outliers and invalid data significantly improved the accuracy of our model and is a necessity for a reliable product. We also learned the basics of creating a gradient boosting regression model, as well as how to use Django to build a full-stack application.
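The kind of cleaning described above might look like the following sketch in pandas. The column names, the valid ranges, and the 1.5 × IQR outlier rule are assumptions chosen for the example. The thresholds we actually used are not reproduced here.

```python
import pandas as pd

# Small made-up sample with a missing value, impossible values, and an outlier.
df = pd.DataFrame({
    "install_year": [1995, 2005, 2010, 1890, 2150, None, 2001, 1998, 2003, 2008],
    "latitude": [1.2, -3.4, 95.0, 0.5, 2.2, 1.1, -1.0, 0.3, 0.9, -2.0],
    "years_to_failure": [12.0, 8.0, 10.0, 9.0, 11.0, 7.0, 400.0, 9.5, 10.5, 11.5],
})

# Step 1: drop invalid rows (missing values, impossible coordinates or years).
valid = (
    df["install_year"].between(1900, 2025)   # NaN compares as False
    & df["latitude"].between(-90, 90)
)
cleaned = df[valid].dropna()

# Step 2: drop extreme outliers in the target using the 1.5 * IQR rule.
q1 = cleaned["years_to_failure"].quantile(0.25)
q3 = cleaned["years_to_failure"].quantile(0.75)
iqr = q3 - q1
in_range = cleaned["years_to_failure"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = cleaned[in_range].reset_index(drop=True)

print(cleaned)
```

Filtering invalid rows before computing the quartiles matters, because impossible values would otherwise distort the outlier bounds.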

What's next for HydroSentry

We hope to continue developing and refining the model until it is useful enough for NGOs or major corporations to use it to improve existing infrastructure, so that communities are less affected by water pipe failures.
