According to the World Health Organization, water scarcity affects 40% of the world’s population, and as many as 700 million people are at risk of being displaced by drought by 2030. We think drought prediction software could help people and governments in these regions plan ahead instead of being caught off guard, which in turn could improve quality of life there.
What it does
This code uses a multi-output regression model to predict the duration and severity of droughts using meteorological data from the US Drought Monitor. The data was collected and organized by Christoph Minixhofer on Kaggle.
How we built it
We chose Python for the project because it has many good packages for data analysis and machine learning. Pandas, SciPy, and NumPy were useful for working with the data, and scikit-learn provided a large library of machine learning tools. We spent almost the entire hackathon learning machine learning, and we still have a long way to go.
Challenges we ran into
This project was challenging, especially since we knew nothing about machine learning going in. Only after a lot of trial and error did we gain a basic understanding of the concepts. The input data was also in a different format than we expected. The practice problems we worked through beforehand used simple 2D matrices, but the data from the US Drought Monitor came as a 3D array.
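A common way to handle 3D input is to flatten each sample into a single row before passing it to scikit-learn. Its estimators expect a 2D `(n_samples, n_features)` matrix. The sketch below assumes the array is laid out as (samples, time steps, features). That layout, the hypothetical `flatten_windows` helper, and the random data are illustrative assumptions and were not taken from the project.

```python
import numpy as np


def flatten_windows(windows: np.ndarray) -> np.ndarray:
    """Flatten a 3D array of shape (samples, time_steps, features) into the
    2D (samples, time_steps * features) matrix that scikit-learn expects.

    Assumption (hypothetical layout): axis 0 indexes samples, axis 1 time
    steps, axis 2 weather features.
    """
    if windows.ndim != 3:
        raise ValueError(f"expected a 3D array, got shape {windows.shape}")
    n_samples = windows.shape[0]
    # reshape reads values in C (row-major) order, so each row holds
    # time step 0's features, then time step 1's features, and so on.
    return windows.reshape(n_samples, -1)


# Hypothetical data: 4 samples, 7 daily readings, 3 weather variables each.
rng = np.random.default_rng(0)
X_3d = rng.normal(size=(4, 7, 3))
X_2d = flatten_windows(X_3d)
print(X_2d.shape)  # (4, 21)
```

Flattening treats every (time step, feature) pair as its own column. A linear model then learns a separate weight for each pair, such as day-3 precipitation and day-4 precipitation.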
Accomplishments that we're proud of
After hours of research and trial and error, we determined that multi-output linear regression was the machine learning model that worked best with our data. Getting the data ready for the model took a lot of effort, but we got it working.
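The following is a minimal sketch of multi-output linear regression with scikit-learn. It uses synthetic data instead of the drought dataset. The feature count and the two target columns, which stand in for duration and severity, are assumptions made for illustration. `LinearRegression` accepts a 2D target and fits one coefficient vector per output. `MultiOutputRegressor` fits an independent copy of any single-output estimator per target. For ordinary least squares, the two approaches produce the same coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in for real features (e.g. flattened weather windows).
rng = np.random.default_rng(42)
n_samples, n_features = 500, 21
X = rng.normal(size=(n_samples, n_features))

# Two hypothetical targets per sample: column 0 = duration, column 1 = severity.
true_coef = rng.normal(size=(n_features, 2))
y = X @ true_coef + rng.normal(scale=0.1, size=(n_samples, 2))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# LinearRegression handles a 2D y natively: one coefficient vector per output.
native = LinearRegression().fit(X_train, y_train)

# MultiOutputRegressor fits one independent estimator per target column;
# for plain least squares this gives the same coefficients as above.
wrapped = MultiOutputRegressor(LinearRegression()).fit(X_train, y_train)

pred = native.predict(X_test)  # one row per test sample, one column per target
print(pred.shape)
print("R^2 averaged over both targets:", native.score(X_test, y_test))
```

The wrapper matters mainly for estimators that do not support multiple outputs natively. For plain linear regression, passing a 2D `y` directly is simpler.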
What we learned
We learned that it is better to assess a project’s complexity before committing to it. We did not expect this one to be so complex. In future hackathons, we will spend more time choosing a project so that we can finish it in the limited time we have.
What's next for ML Drought Prediction
The next step for this project is to try other regression models, such as Random Forest, once we have the computational resources. Our dataset also contains soil data, which we left out of the model for lack of time. Including it would be a good next step. We would also like to use satellite images to further improve the model’s predictive power.
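In scikit-learn, swapping in a Random Forest is mostly a one-line change because `RandomForestRegressor` also accepts a 2D target. The sketch below compares it with linear regression on the same train/test split using per-target mean absolute error. The data is synthetic, with deliberately non-linear targets, so the results say nothing about how either model performs on the drought data. `n_estimators` and `n_jobs` are the main settings for trading accuracy against compute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 5 features, two non-linear targets.
rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(600, 5))
y = np.column_stack([
    np.sin(X[:, 0]) + X[:, 1] ** 2,  # hypothetical "duration" target
    np.abs(X[:, 2]) * X[:, 3],       # hypothetical "severity" target
]) + rng.normal(scale=0.05, size=(600, 2))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    # More trees cost more compute; n_jobs=-1 uses all available CPU cores.
    "random_forest": RandomForestRegressor(
        n_estimators=100, n_jobs=-1, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # both models accept the 2D y directly
    # multioutput="raw_values" returns one MAE per target column.
    scores[name] = mean_absolute_error(
        y_test, model.predict(X_test), multioutput="raw_values"
    )
    print(name, scores[name])
```

Reporting one error per target keeps duration and severity separate, so a model that improves one output at the expense of the other is easy to spot.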