Our ultimate goal is to engage the public and government officials in keeping San Diego beautiful. This prototype shows the potential of using data science for public benefit.
What it does
We designed and published an interactive website that visualizes which zip codes in San Diego are the least and most littered. The site is built on our own analysis of Google Street View images, using machine learning algorithms to detect litter.
How we built it
We pulled latitude/longitude coordinates of San Diego addresses from a SANDAG data repository. We then used Google's Street View API to collect images for a subset (~50,000) of these coordinates. We also extracted a sample of 40 littered images for training purposes. Then we applied machine learning techniques to detect features in our dataset. Specifically, we looked for trashcans and graffiti using shape recognition models built with the OpenCV library. We also used the Python Imaging Library (PIL) to detect blue tarps with a color recognition model based on RGB counts. Using these features, which we found to be indirect measures of litter, we classified each image as littered or not. We used Tableau to integrate, visualize and publish our findings. We also included Google Forms to collect user feedback for better training in the future.
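To illustrate the color recognition step, here is a minimal sketch of an RGB-count check for blue tarps. It is not our exact model: the function names, the thresholds (`min_blue`, `margin`, `min_fraction`) and the synthetic pixel lists are all assumptions made for this example. The pixels are plain `(r, g, b)` tuples, which could be read from an image with an imaging library such as PIL.

```python
def blue_fraction(pixels, min_blue=120, margin=40):
    """Return the share of pixels that look strongly blue.

    A pixel counts as blue when its blue channel is bright enough and
    exceeds both red and green by at least `margin`.
    All thresholds here are illustrative assumptions.
    """
    pixels = list(pixels)
    if not pixels:
        return 0.0
    blue = sum(
        1
        for r, g, b in pixels
        if b >= min_blue and b - r >= margin and b - g >= margin
    )
    return blue / len(pixels)


def looks_like_tarp(pixels, min_fraction=0.05):
    """Flag an image when enough of its pixels are strongly blue."""
    return blue_fraction(pixels) >= min_fraction


# Tiny synthetic "images": 10 saturated blue pixels among 90 gray ones,
# and an all-gray scene for comparison.
tarp_scene = [(30, 60, 200)] * 10 + [(120, 120, 120)] * 90
gray_scene = [(120, 120, 120)] * 100

print(blue_fraction(tarp_scene), looks_like_tarp(tarp_scene))
print(blue_fraction(gray_scene), looks_like_tarp(gray_scene))
```

A simple count like this cannot tell a tarp from other saturated blue objects, which is one reason we treated it only as an indirect signal.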
Challenges we ran into
First and foremost, we found very little litter in San Diego street view images. This, coupled with extreme variability in our images, made it difficult to accurately classify images based on the presence or absence of litter. As such, we had to look for indirect measures.
Our models also had low accuracy, particularly the shape recognition models. We believe this was primarily due to time constraints: we had to train on extremely low-resolution images. We also didn't have any labels (pictures pre-labeled as littered or not littered) to train on or validate our findings against. We had to create a small number of positive images ourselves, and could not easily evaluate accuracy on our test set.
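With even a small hand-labeled set, accuracy could be checked with precision and recall. The sketch below shows one way to do that; the function name and the label lists are hypothetical and are not results from our project.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean 'littered' labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up labels for five images (True means "littered").
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because littered images were rare in our data, precision and recall say more than plain accuracy, which would look high even for a model that never predicts litter.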
Lastly, we faced issues publishing our dashboards on our website, which we believe was due to embedded Google API keys.
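One way to keep API keys out of published artifacts is to read them from the environment when requests are built. The sketch below assembles a Street View Static API image URL that way. The environment variable name `GOOGLE_MAPS_API_KEY`, the helper name, the placeholder key and the sample coordinates are assumptions for illustration, and we have not verified that this would resolve the dashboard publishing issue.

```python
import os
from urllib.parse import urlencode

# Base endpoint of the Street View Static API.
STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def street_view_url(lat, lng, api_key, size="640x640", heading=0):
    """Build an image request URL for one latitude/longitude pair."""
    params = {
        "size": size,
        "location": f"{lat},{lng}",
        "heading": heading,
        "key": api_key,
    }
    return f"{STREET_VIEW_ENDPOINT}?{urlencode(params)}"


# Read the key from the environment instead of hard-coding it.
# The variable name and the fallback placeholder are hypothetical.
api_key = os.environ.get("GOOGLE_MAPS_API_KEY", "YOUR_KEY_HERE")
url = street_view_url(32.7157, -117.1611, api_key)

# Mask the key before printing or logging the URL.
print(url.replace(api_key, "<key>"))
```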
Accomplishments that we're proud of
We are proud of our working Tableau dashboards, which display ~50,000 Google Street View images of clean and littered areas in San Diego, numerically evaluated for litter by zip code. We especially enjoyed creating the interactive interface that collects user input, which can be used to better train our learning algorithms and classify images more accurately. We are also proud of our application of machine learning techniques, which we believe could accurately detect littered areas given better training.
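The per-zip-code evaluation can be thought of as a simple aggregation of image-level classifications. Here is a minimal sketch, assuming each record is a hypothetical `(zip_code, is_littered)` pair; the records are made up for illustration and are not our findings.

```python
from collections import defaultdict


def litter_rate_by_zip(records):
    """Return the fraction of images classified as littered per zip code."""
    totals = defaultdict(int)
    littered = defaultdict(int)
    for zip_code, is_littered in records:
        totals[zip_code] += 1
        if is_littered:
            littered[zip_code] += 1
    return {zip_code: littered[zip_code] / totals[zip_code] for zip_code in totals}


# Made-up classifications for two zip codes.
records = [
    ("92101", True),
    ("92101", False),
    ("92101", True),
    ("92103", False),
    ("92103", False),
]

rates = litter_rate_by_zip(records)
# Rank zip codes from most to least littered.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for zip_code, rate in ranked:
    print(zip_code, f"{rate:.2f}")
```

A table like `ranked` is the kind of summary a dashboard tool can then map and color by zip code.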
What we learned
In preparing our analysis, we applied machine learning algorithms we learned in our Data Science classes as well as techniques that were new to us, including OpenCV item detection and RGB color analysis. We also practiced Google API integration, data cleaning, and image processing in Python.
What's next for Waste Bots Unleashed
Future directions include expanding and improving feature selection, using crowdsourced performance metrics for model parameter tuning, and implementing a neural network model.