This project focused on:
Challenge 1: Match Boston’s Ground Truth Sidewalk Data
Use Google Maps to try to replicate our sidewalk condition findings from 2014, to see whether this approach is a viable option in the future.
To replicate the city's 2014 findings with Google Maps, we must be able to separate the sidewalk from everything else in a city scene. With Google Street View imagery, this separation problem closely resembles the semantic image segmentation problem that self-driving cars face. With this realization, we decided to use a pre-trained DeepLabV3 model. It suits the task because it was trained on the Cityscapes dataset, which translates very well to the urban environment of Boston.
DeepLabV3 lets us take an image from Google Maps and remove everything but the sidewalk. For example:
We first provide an image from Google Maps:
Then the DeepLabV3 model segments the image:
Finally, we apply one last image pre-processing step and crop out everything but the sidewalk:
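The masking and cropping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the segmentation model returns a per-pixel class map using Cityscapes train IDs (where sidewalk is 1), and the function name `isolate_sidewalk` is hypothetical.

```python
import numpy as np

# Assumption: the segmentation map uses Cityscapes train IDs, where 1 = sidewalk.
SIDEWALK_ID = 1


def isolate_sidewalk(image, seg_map, sidewalk_id=SIDEWALK_ID):
    """Black out non-sidewalk pixels and crop to the sidewalk's bounding box.

    image:   H x W x 3 uint8 array (the street-level photo)
    seg_map: H x W integer array of per-pixel class IDs
    Returns the cropped array, or None if no sidewalk pixels were found.
    """
    mask = seg_map == sidewalk_id
    if not mask.any():
        return None

    # Keep sidewalk pixels, set everything else to black.
    masked = np.where(mask[..., None], image, 0).astype(image.dtype)

    # Crop to the tightest box that contains every sidewalk pixel.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return masked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Returning `None` when no sidewalk is detected lets a caller skip images where the sidewalk is occluded or out of frame.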
Now we can feed these isolated sidewalks into a classifier to determine if the sidewalk needs repairs!
Sidewalk Damage Classification/Scoring
We never expected to find an existing dataset of damage scores for semantically segmented sidewalks, so we had to build our own. Thanks to Boston's StreetCaster project, though, we weren't working blind. By extracting the latitude and longitude from the StreetCaster dataset, we knew where to look for damaged sidewalks in Google Maps and roughly how damaged they should be. With this approach, we created a small dataset (tiny by ML standards) of labeled, semantically segmented sidewalks.
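As a rough sketch of the coordinate-extraction step, the snippet below reads latitude, longitude, and a damage score from a CSV export. The column names (`latitude`, `longitude`, `damage_score`) and the function name are assumptions made for illustration. The real StreetCaster export may use different headers.

```python
import csv
import io


def load_survey_points(csv_file, lat_col="latitude", lon_col="longitude",
                       score_col="damage_score"):
    """Return a list of (lat, lon, score) tuples from an open CSV file.

    The column names are hypothetical defaults; pass the real headers
    from the dataset being used.
    """
    points = []
    for row in csv.DictReader(csv_file):
        try:
            points.append((float(row[lat_col]),
                           float(row[lon_col]),
                           float(row[score_col])))
        except (KeyError, ValueError, TypeError):
            # Skip rows with missing or malformed values.
            continue
    return points


# Usage with a small in-memory sample (values are made up).
sample = io.StringIO(
    "latitude,longitude,damage_score\n"
    "42.36,-71.06,35.5\n"
    "42.35,-71.07,80\n"
)
for lat, lon, score in load_survey_points(sample):
    print(f"look up {lat},{lon} (expected score {score})")
```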
The architecture and transfer learning
Data is at the center of any meaningful machine learning model, but once the data is collected we need an architecture that makes good use of it. We chose ResNet18. Because we were limited in both time and data, we opted for a pre-trained ResNet18 and transferred its already-learned features to the task of sidewalk damage scoring. This process, known as transfer learning, lets us, in a sense, teach a model that already knows how to see to make sense of what it detects in our dataset's images.
After training for 100 epochs (iterations over the dataset) we saw the model successfully learn. However, it is questionable whether the model was actually picking up on relevant features, since the dataset it was trained on is, as previously mentioned, tiny for an ML task. We do believe our results suggest that this is a worthwhile approach given additional time and resources.
Challenges we faced/where this could be improved
The lack of a sizeable dataset is the greatest pressure point of this project. We originally wanted to pull Google Maps images programmatically from their API, but after looking into this option we found that the size and quality of the images returned would not work well with either the segmentation model or the scoring model, so we decided our best option was to create the dataset manually. We do believe that with more time and resources it is very possible to automate the dataset creation, at least partially (labeling may be tricky).
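For anyone revisiting the automated route, a request URL could be built along these lines. This sketch assumes the Street View Static API is the API in question, which the write-up does not state. The `size` value is only a placeholder, and `street_view_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: images would come from the Street View Static API.
STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def street_view_url(lat, lon, api_key, size="640x640", heading=None, pitch=None):
    """Build a Street View Static API request URL for one location."""
    params = {"size": size, "location": f"{lat},{lon}", "key": api_key}
    # heading and pitch are optional camera controls; omit them when unset.
    if heading is not None:
        params["heading"] = heading
    if pitch is not None:
        params["pitch"] = pitch
    return f"{STREET_VIEW_ENDPOINT}?{urlencode(params)}"


# Usage: "YOUR_API_KEY" is a placeholder, not a real credential.
print(street_view_url(42.36, -71.06, "YOUR_API_KEY", heading=90))
```

Pairing a helper like this with the StreetCaster coordinates would cover image collection. Labeling would likely still need manual review.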
Model Design in Summary: