Wildfire Spread Prediction Model

Title: Summarizes the main idea of your project.
- Wildfire Spread Prediction Model
Who: Names and logins of all your group members.
- Henry Earnest - hearnest
- James Hu - jhu74
- Rishi Patel - rpate114
Introduction: What problem are you trying to solve and why?
- If you are implementing an existing paper, describe the paper’s objectives and why you chose this paper.
  - We are implementing “Next Day Wildfire Spread: A Machine Learning Data Set to Predict Wildfire Spreading from Remote-Sensing Data.” Predicting wildfire spread is critical for land management and disaster preparedness. The paper introduces the Next Day Wildfire Spread dataset and also provides a convolutional autoencoder model trained on it. We chose this paper because the dataset offers insight into wildfires, which cause hundreds of thousands of deaths per year and whose spread has historically been difficult to predict. The dataset is also relatively recent (March 2022).
- If you are doing something new, detail how you arrived at this topic and what motivated you.
  - N/A
- What kind of problem is this? Classification? Regression? Structured prediction? Reinforcement Learning? Unsupervised Learning? Etc.
  - This is a supervised structured prediction problem. We train on labeled examples, and for each sample the model predicts the next day's fire mask: a grid of interdependent per-pixel labels rather than a single class or value.
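To make the framing concrete: each sample maps a stack of input feature maps to a next-day fire mask, and the model outputs one fire probability per pixel, so the loss is summed over the whole grid. Below is a minimal NumPy sketch of a per-pixel binary cross-entropy loss. It assumes that missing labels are encoded as -1 and excluded from the loss; that encoding and the function name `masked_pixel_bce` are our assumptions, not something taken from the paper.

```python
import numpy as np

def masked_pixel_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over pixels that have a known label.

    probs:  (H, W) predicted fire probabilities in (0, 1)
    labels: (H, W) with 1 = fire, 0 = no fire, -1 = missing (assumed encoding)
    """
    valid = labels >= 0                      # ignore pixels with missing labels
    p = np.clip(probs[valid], eps, 1 - eps)  # avoid log(0)
    y = labels[valid].astype(float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy 2x2 example: one fire pixel, two non-fire pixels, one missing label.
probs = np.array([[0.9, 0.2], [0.1, 0.5]])
labels = np.array([[1, 0], [0, -1]])
loss = masked_pixel_bce(probs, labels)
print(loss)
```

Averaging only over valid pixels keeps unlabeled regions from pushing predictions toward either class.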
Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?
- Please read and briefly summarize (no more than one paragraph) at least one paper/article/blog relevant to your topic beyond the paper you are re-implementing/novel idea you are researching.
  - Medium article: This article took a classification approach to predicting natural disasters such as wildfires. This differs from our approach, which applies a convolutional autoencoder to image data. The authors reported about 90% accuracy, which surprised us, although we are not sure how meaningful 90% accuracy is for this kind of task: when the positive class is rare, a model can reach high accuracy while missing most positive cases.
- In this section, also include URLs to any public implementations you find of the paper you’re trying to implement. Please keep this as a “living list”–if you stumble across a new implementation later down the line, add it to this list.
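To illustrate why a headline accuracy number is hard to interpret for fire data: if only a small fraction of pixels are burning, a trivial model that predicts "no fire" everywhere already scores high accuracy while finding no fire at all. The sketch below uses a synthetic mask. The 2% fire fraction is an arbitrary assumption for illustration, not a statistic from any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = (rng.random((64, 64)) < 0.02).astype(int)  # synthetic mask, ~2% fire pixels
preds = np.zeros_like(labels)                        # trivial "never fire" model

accuracy = np.mean(preds == labels)
true_pos = np.sum((preds == 1) & (labels == 1))
recall = true_pos / max(np.sum(labels == 1), 1)      # fraction of fire pixels found

print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

Accuracy comes out near 0.98 even though recall is zero, which is why precision- and recall-based metrics are more informative here.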
Data: What data are you using (if any)?
- If you’re using a standard dataset (e.g. MNIST), you can just mention that briefly. Otherwise, say something more about where your data come from (especially if there’s anything interesting about how you will gather it).
  - The historical wildfire data are from the MOD14A1 V6 data set [1].
  - Topography data are from the Shuttle Radar Topography Mission (SRTM) [2].
  - Weather data are from the Gridded Surface Meteorological data set (GRIDMET) [3].
  - Drought data are from the GRIDMET Drought data set [4].
  - Vegetation data are from the NASA VIIRS Vegetation Indices (VNP13A1) data set [5].
  - Population density data are from the Gridded Population of World Version 4 (GPWv4) data set [6].
- How big is it? Will you need to do significant preprocessing?
  - About 18,000 image samples, totaling roughly 4 GB. We will preprocess the data by clipping extreme values. We will use the fire mask at day t as an input feature and the fire mask at day t+1 as the label. Finally, we will need to augment the data.
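The preprocessing steps above can be sketched in NumPy as follows. Several details are our assumptions rather than specifics from the paper. Features arrive as an (N, H, W, C) array. Each channel is clipped to its 0.1st and 99.9th percentiles and then standardized using statistics from the training split only. Augmentation uses a random crop plus flips and 90-degree rotations, applied identically to inputs and labels. The function names (`clip_and_normalize`, `make_pair`, `augment`) are hypothetical, and the data below is synthetic.

```python
import numpy as np

def clip_and_normalize(train, other, low_pct=0.1, high_pct=99.9):
    """Clip each channel to training-set percentiles, then standardize.

    train, other: (N, H, W, C) float arrays. Statistics come from `train` only,
    so validation/test data get the same transform without leakage.
    """
    lo = np.percentile(train, low_pct, axis=(0, 1, 2))   # per-channel lower bound
    hi = np.percentile(train, high_pct, axis=(0, 1, 2))  # per-channel upper bound
    clipped = np.clip(train, lo, hi)
    mean = clipped.mean(axis=(0, 1, 2))
    std = clipped.std(axis=(0, 1, 2)) + 1e-6            # avoid division by zero

    def norm(x):
        return (np.clip(x, lo, hi) - mean) / std

    return norm(train), norm(other)

def make_pair(features_t, fire_mask_t, fire_mask_t1):
    """Input = day-t features plus the day-t fire mask; label = day-(t+1) fire mask."""
    x = np.concatenate([features_t, fire_mask_t[..., None]], axis=-1)
    return x, fire_mask_t1

def augment(x, y, crop, rng):
    """Random crop, flips, and 90-degree rotation applied identically to x and y."""
    H, W = y.shape
    i = rng.integers(0, H - crop + 1)
    j = rng.integers(0, W - crop + 1)
    x, y = x[i:i + crop, j:j + crop], y[i:i + crop, j:j + crop]
    if rng.random() < 0.5:
        x, y = x[:, ::-1], y[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        x, y = x[::-1], y[::-1]         # vertical flip
    k = int(rng.integers(0, 4))
    return np.rot90(x, k, axes=(0, 1)), np.rot90(y, k, axes=(0, 1))

# Synthetic stand-in data; the channel count is arbitrary.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 64, 64, 11))
prev = (rng.random((4, 64, 64)) < 0.05).astype(float)
nxt = (rng.random((4, 64, 64)) < 0.05).astype(float)

# Normalize continuous features first, then append the binary day-t mask.
train_f, test_f = clip_and_normalize(feats[:3], feats[3:])
x, y = make_pair(train_f[0], prev[0], nxt[0])
xa, ya = augment(x, y, crop=32, rng=rng)
```

Normalizing before appending the fire mask keeps the mask channel binary. Sharing the crop offsets and flip decisions between `x` and `y` keeps every label pixel aligned with its inputs.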
Methodology: What is the architecture of your model?
- How are you training the model?
  - The model is a convolutional autoencoder: an encoder, a flatten layer, and a decoder with dense layers and upsampling. Unlike a plain convolutional autoencoder, our encoder is built from ResBlocks with max pooling, and our decoder is built from Conv2D ResBlocks.
- If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here.
  - We have little experience with residual blocks (ResBlocks) or with evaluating an autoencoder's output, so these will probably be the hardest parts to implement.
- If you are doing something new, justify your design. Also note some backup ideas you may have to experiment with if you run into issues.
  - N/A
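The encoder and decoder building blocks described above can be sketched as a NumPy forward pass with no training. This is only an illustration, not the paper's implementation. The kernel sizes, channel counts, 1x1-convolution skip path, nearest-neighbour upsampling, and helper names are all assumptions, and the flatten/dense bottleneck is omitted for brevity. A real model would build these layers in a deep learning framework rather than with naive loops.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Naive stride-1 convolution with zero 'same' padding.

    x: (H, W, C_in), w: (k, k, C_in, C_out) with odd k, b: (C_out,).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            # Contract the k x k x C_in window against all output channels at once.
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k], w, axes=3) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2(x):
    """2x2 max pooling with stride 2 (drops a trailing odd row/column)."""
    H, W, C = x.shape
    x = x[: H - H % 2, : W - W % 2]
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def res_block(x, p):
    """conv -> ReLU -> conv on the main path, 1x1 conv on the skip path, then add."""
    h = relu(conv2d_same(x, p["w1"], p["b1"]))
    h = conv2d_same(h, p["w2"], p["b2"])
    skip = conv2d_same(x, p["ws"], p["bs"])  # 1x1 conv so channel counts match
    return relu(h + skip)

def init_block(c_in, c_out, k, rng):
    """Random He-style weights for one hypothetical ResBlock."""
    def he(fan_in, shape):
        return rng.normal(0.0, np.sqrt(2.0 / fan_in), shape)
    return {
        "w1": he(k * k * c_in, (k, k, c_in, c_out)), "b1": np.zeros(c_out),
        "w2": he(k * k * c_out, (k, k, c_out, c_out)), "b2": np.zeros(c_out),
        "ws": he(c_in, (1, 1, c_in, c_out)), "bs": np.zeros(c_out),
    }

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16, 12))          # one sample: 16x16 grid, 12 channels
enc = init_block(12, 8, 3, rng)
dec = init_block(8, 8, 3, rng)
head = rng.normal(0.0, 0.1, (1, 1, 8, 1))

z = max_pool2(res_block(x, enc))           # encoder: ResBlock + max pool -> (8, 8, 8)
h = res_block(upsample2(z), dec)           # decoder: upsample + ResBlock -> (16, 16, 8)
logits = conv2d_same(h, head, np.zeros(1))[..., 0]
probs = 1.0 / (1.0 + np.exp(-logits))      # per-pixel fire probability, (16, 16)
```

The skip path lets each block learn a correction on top of its input, which is the main reason ResBlocks tend to train more easily than plain stacked convolutions.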
Metrics: What constitutes “success?”
- What experiments do you plan to run?
  - N/A
- For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate?
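  - Since we are predicting a per-pixel fire mask in which most pixels are not burning, plain accuracy would be dominated by the majority class: a model that predicts "no fire" everywhere would still score highly. Instead, following the paper, we will use the area under the precision-recall curve (AUC-PR).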
- If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model.
  - They used AUC-PR to evaluate their deep learning model and compared it against two other machine learning models (random forest and logistic regression).
- If you are doing something new, explain how you will assess your model’s performance.
  - N/A
- What are your base, target, and stretch goals?
  - Base: get an autoencoder training end to end and compute AUC-PR, precision, and recall (our evaluation metrics).
  - Target: implement the same (or a closely similar) architecture as the paper, with metric values approaching those reported in the paper.
  - Stretch: modify the architecture to achieve a higher AUC-PR than the paper reports.
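The metrics named above (precision, recall, and AUC-PR) can be computed at the pixel level as in the following NumPy sketch. It rests on a few assumptions. Labels use 1 for fire, 0 for no fire, and -1 for missing data, which we exclude. AUC-PR is computed as a step-wise sum over recall (average precision), which treats tied scores one at a time. Library implementations, such as scikit-learn's `average_precision_score` or Keras's `AUC` metric with `curve='PR'`, may handle ties or interpolation differently. The 0.5 threshold and the function name are hypothetical.

```python
import numpy as np

def precision_recall_auc(probs, labels, threshold=0.5):
    """Pixel-level precision, recall, and AUC-PR, ignoring pixels labelled -1."""
    valid = labels >= 0
    p = probs[valid].ravel()
    y = labels[valid].ravel().astype(int)

    # Precision and recall at a single decision threshold.
    pred = p >= threshold
    tp = np.sum(pred & (y == 1))
    precision = tp / max(np.sum(pred), 1)
    recall = tp / max(np.sum(y == 1), 1)

    # AUC-PR: rank pixels by score, then sum precision times the recall gained
    # at each step (step-wise average precision).
    order = np.argsort(-p, kind="stable")
    y_sorted = y[order]
    tps = np.cumsum(y_sorted)
    precisions = tps / np.arange(1, len(y_sorted) + 1)
    recalls = tps / max(y_sorted.sum(), 1)
    recall_steps = np.diff(np.concatenate([[0.0], recalls]))
    auc_pr = float(np.sum(precisions * recall_steps))
    return precision, recall, auc_pr

# Toy example: two fire pixels ranked above one non-fire pixel; one label missing.
probs = np.array([[0.9, 0.8], [0.3, 0.1]])
labels = np.array([[1, 1], [0, -1]])
prec, rec, auc = precision_recall_auc(probs, labels)
print(prec, rec, auc)
```

Because AUC-PR sweeps over all thresholds, it does not depend on the choice of 0.5. Precision and recall at a fixed threshold remain useful for describing how the model would behave in deployment.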
Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.)
- What broader societal issues are relevant to your chosen problem space?
  - Climate change - wildfires release greenhouse gases and aerosols into the atmosphere and pollute the surrounding air, affecting hundreds of thousands of lives per year.
  - Worker efficiency and safety - preventing and fighting wildfires demands a great deal of effort, so tools that make this work easier and more effective are valuable. Working near active fires is also dangerous to workers' health, so unnecessary exposure should be minimized.
  - Trust in machine learning algorithms - for a high-stakes problem like wildfires, a model that leads workers astray and wastes their time in the wrong areas could damage public trust in technology.
- Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?
  - Everyone is a stakeholder because wildfires affect the entire environment. If decisions are based on an incorrect prediction, a town might not be evacuated even though the fire is actually moving toward it (a false negative). A less severe consequence is that incorrect predictions could make fighting the fire more difficult, for example by sending crews to the wrong areas. More broadly, wildfires that are not contained contribute to climate change, which harms all life on the planet.
Division of labor: Briefly outline who will be responsible for which part(s) of the project.
- Each task will be completed collaboratively, or divided equally at a finer granularity when necessary. This applies to preprocessing, architecture setup, the AUC-PR implementation, testing/debugging, and documentation.