Final Writeup
See here: https://docs.google.com/document/d/1nQqXou6KzhHXBdgSCZcMwDTUs8zJ8K6Ks9FML6nLhtc/edit?usp=sharing
Oral Presentation
https://drive.google.com/file/d/18JoKQAucaapoXbfT0Hqg3Sdml_esARNU/view?usp=sharing
Producing Greenland Ice Sheet River Networks from Satellite Imagery
Who
Sarah Esenther (sesenthe), Nicholas Petrocelli (npetroce), Ellie Madsen (emadsen)
First Check-in:
Introduction:
For our project, we are attempting to solve a new problem (Option 2). Complex stream networks have been observed in detail on the Southwest Greenland Ice Sheet. These channels control drainage efficiency and timing, which has important effects on the rates of local surface melt, ice sliding velocity, and sea level rise. However, because the channels are only 1-20 m wide and vary greatly over space and time, they require repeated high resolution satellite imagery to map, and thus have not been extensively studied until recently. Multiple semi-automated methods have been proposed to detect the streams from satellite imagery, but these methods have not used Deep Learning and heavily rely on manual labor. To advance this research, we will classify the streams from a data set of high-resolution satellite images using a Convolutional Neural Network. This is a Regression problem.
Related Work:
A similar study was conducted by Xin Lu et al. in the 2020 paper “Small Arctic rivers mapped from Sentinel-2 satellite imagery and ArcticDEM” using digital elevation models (DEMs) and Sentinel satellite data over a Siberian and an Alaskan river network. The researchers used Gabor filtering on the Sentinel data to identify areas likely to be river reaches, then refined these selections by overlaying a drainage model calculated from Arctic DEM data. The study found that incorporating the Arctic DEM simulated drainage data improved the connectivity of the generated river network. Our project will seek to accomplish the same objective - mapping of small Arctic rivers - relying solely on satellite imagery by using deep learning techniques in place of physical DEM modelling.
Same objective, but using physical Arctic digital elevation model to refine satellite river network mapping: https://www.sciencedirect.com/science/article/pii/S0022169420301499?via%3Dihub
Data:
The dataset we are using for this project consists of a set of .tif image files, each matched with one or more arcGIS shapefile representations of the stream network shown in the image. These images were captured by the Worldview 3 satellite and were obtained from the UCLA GIS database. Currently we only have access to two such images that we plan to use as training and testing data, respectively; however, more images could be retrieved as part of further work on the project, as Sarah has access to the source database. In order to facilitate batching and efficient training, the training and testing images will be sub-divided into some much larger number of smaller resolution images which will act as a larger dataset. The source images appear in both high resolution (~1-2GB) and low-resolution (700-900 MB) form, and the shapefiles are less than 100MB in size. We do not plan on performing much preprocessing on the input images, as they are already consistently padded and we believe we have sufficient cloud computing capabilities to process them without downscaling (see “methods”). Encoding the output to be usable in a model will require significant work, however; while python packages exist for reading and writing shapefiles such as pyshp (https://github.com/GeospatialPython/pyshp), the challenge will come from bridging the gap between Tensorflow and the shapefile format.
Methodology:
As this is an image processing task, we plan on utilizing a CNN model implemented in Tensorflow/Keras to generate a representation of the stream network geometry. We are not basing this on any existing papers or implementations so the precise architecture (# of convolution layers, # of dense layers, exact input/output dimensionality) will be determined during the course of the project and is one of our primary areas of investigation. The model will be trained in the cloud using GCP cloud credits, which will make processing the large image files feasible. We currently plan to use one large-resolution image as training data, and one other image as testing data; if we are able to produce a functioning model for this small dataset, then we plan on obtaining additional images to expand the train/test sets.
As a backup option, we could utilize one of the pre-trained CNN models available through Keras such as VGG16 and fine tune it on the image data. This would save significant time in implementation if we end up spending a long time working through data preprocessing issues.
Metrics: What constitutes “success?”
-What experiments do you plan to run? We plan to use our CNN architecture to generate a vector river network corresponding to the reserved test river network. The accuracy of this predicted network will be assessed by comparing the distance of the predicted and manually digitized vector river networks. -For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate? The notion of accuracy does apply to the project as a quantitative difference between the training river network and the generated river network can be calculated. -If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model. -If you are doing something new, explain how you will assess your model’s performance. As we have two manually-digitized stream networks available, we can train our data on one of the networks and use the other as test data to assess the accuracy of the model. Accuracy between the images may be calculated by rasterizing the training vector network and the generated vector network and calculating image difference between the two. Total connected river network length may also be incorporated to encourage the network to generate connected networks rather than disjointed water bodies. -What are your base, target, and stretch goals? Our base goal is to produce a river mapping algorithm that correctly identifies any river reaches, but may not connect the reaches or identify all river reaches in the image. The target goal is to generate a river network that is mostly connected (the majority of the length is connected rather than in short, disjointed reaches) and overall overlaps with the training data in more places than it does not. The stretch goal would be to generate a fully connected river network that overlaps almost entirely with the training data and that could be applied to other black and white satellite images taken over northwest Greenland.
Ethics:
What broader societal issues are relevant to your chosen problem space? The societal issue most relevant to the chosen problem space is climate change: if the algorithm is successful, it will enable faster mapping of Greenland Ice Sheet river networks to improve regional climate model inputs and refine ice sheet melt and sea level rise predictions.
Why is Deep Learning a good approach to this problem?
-Deep learning is well-suited for this highly tedious task because the time-consuming labor of detecting the streams can be done by a computer rather than humans. This would save hundreds of hours of human labor. Since the subject of this problem is non-human, it will not directly impact people or use data with biases towards people (although unlikely, the results of this project could indirectly affect people local to the Greenland ice sheets if it were to bring about some policy change). As this is a technical problem, it warrants a technical solution such as Deep Learning. This first step could be followed by social or political methodologies; we are not attempting to replace such methodologies with our work.
-What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain? We are using Worldview satellite imagery, which was collected remotely by a commercial satellite imagery company. The training data was created by manually tracing over the river networks as they appear in the images. As the study location is so far north, many commercial satellites do not collect data at these latitudes, focusing instead on more populated, more profitable regions of the planet. Data at these locations at high commercial satellite resolutions is therefore more time-sparse than over other regions, potentially introducing time biases into the data and limiting the time resolution available for potential river network evolution studies.
Division of labor:
At first, we will work collaboratively to design: -Accuracy/Loss metrics -Data shape/preprocessing steps -High-level network architecture After that, we will divide implementation work between the group members. Currently we plan to divide the work as follows: -Sarah: Data Preprocessing/Slicing -Ellie: Cloud Infrastructure Setup/Tensorflow data formatting -Nick: Network Architecture Implementation
Built With
- google-cloud
- python
- tensorflow
Log in or sign up for Devpost to join the conversation.