With global temperatures rising and wildfire outbreaks becoming increasingly frequent, effective fire management is becoming crucial. Fires become much harder to contain as they spread, so early containment efforts can be extremely valuable in limiting damage and loss of life. By building this classifier, we aim to give federal government agencies a tool for deciding how, where, and when to commit extra resources to fires that could become catastrophic.
What it does
The model looks at initial conditions surrounding a fire (location, date discovered, cause, etc.) in order to classify it as either a threat or not. The model is a random forest classifier with 17 estimators.
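As a rough illustration of that setup, here is a minimal sketch of a 17-estimator random forest in scikit-learn. The feature columns, the synthetic data, and the rule used to label a fire as a threat are all made up for the example; they are assumptions, not the project's actual dataset or feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "initial conditions" (hypothetical columns, not the real data).
rng = np.random.default_rng(0)
n_fires = 500
latitude = rng.uniform(25, 49, n_fires)
longitude = rng.uniform(-124, -67, n_fires)
discovery_day = rng.integers(1, 366, n_fires)   # day of year, 1..365
cause_code = rng.integers(1, 14, n_fires)       # hypothetical encoded cause

features = np.column_stack([latitude, longitude, discovery_day, cause_code])

# Made-up labeling rule so the example has something to learn (1 = threat, 0 = not).
is_threat = ((discovery_day >= 150) & (discovery_day <= 270) & (latitude > 35)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    features, is_threat, test_size=0.25, random_state=0, stratify=is_threat
)

# Random forest with 17 trees, matching the estimator count described above.
model = RandomForestClassifier(n_estimators=17, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("held-out accuracy:", model.score(X_test, y_test))
```

With real data, the categorical inputs (such as cause) would likely need a more careful encoding than the integer codes assumed here.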
How I built it
We used R and Python (run in a Jupyter notebook) for all of the EDA and plot generation. Scikit-learn's tools were extremely helpful for building and tuning different kinds of models.
Challenges I ran into
It was pretty difficult to formulate the overall goal as a solvable data science problem within the Datathon time constraints. We eventually decided to simplify it into a binary classification problem, since that makes the most sense for decision makers. The sheer volume of data also made some of the models time-intensive to run, and tuning the model to reduce the number of false positives proved extremely difficult.
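One common way to trade false positives against missed threats is to raise the probability cutoff at which a fire is flagged. The sketch below shows the idea in plain Python on a handful of made-up labels and scores; it is an illustration of the trade-off, not necessarily the approach we took. In scikit-learn, scores like these could come from the classifier's `predict_proba` method.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP/FP/TN/FN when flagging every score >= threshold as a threat."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            counts["tp"] += 1
        elif flagged and label == 0:
            counts["fp"] += 1
        elif not flagged and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Made-up labels (1 = threat) and predicted threat probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.35, 0.55, 0.62, 0.58, 0.80, 0.45, 0.91, 0.20, 0.40]

default_cutoff = confusion_counts(y_true, scores, threshold=0.5)
stricter_cutoff = confusion_counts(y_true, scores, threshold=0.7)

print("cutoff 0.5:", default_cutoff)
print("cutoff 0.7:", stricter_cutoff)
```

On this toy data, the stricter cutoff removes the false positives but misses one more real threat, which is the tension that made the tuning hard.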
Accomplishments that I'm proud of
We were able to build a model that achieves a prediction accuracy of about 80%. We also believe we came up with a convincing narrative surrounding this problem and have the relevant data to support our conclusions.
What I learned
The difficulty of doing data science in the real world. You need to make a lot of assumptions and informed decisions when building the model. Blindly doing random things and hoping accuracy goes up will get you nowhere. Data science is an interesting blend of critical thinking and raw statistics.
What's next for Random Forest Fires
Adding in different data sources that give more relevant attributes for fire prediction (specifically climate data). Tuning the random forest more or trying out different kinds of models (K-means, neural networks).