Motivation

     Most of the cost of recycling comes from trash being put into the wrong category at the point of disposal. Many people won't pause at a trash can for even five seconds when they are in the inertia of continuing their walk, so they toss their trash into any one of the recycling bins without thinking. The computer-vision-aided intelligent trash can aims to classify incoming trash into the correct category and trigger a physical mechanism to drop it into the sub-can for that category.

What it does

     "Toss It!" follows the work flow of "detection -> classification -> label display" in fully automated cycles. Once started, it detects the motion of an incoming item and then activates the machine learning algorithm to classify the item on the spot! After the classification, it send the result to a Unity program which will display an animation of trash can with the item's correct recycling category. 

     In a real-world deployment, instead of displaying the label on screen, the system would trigger a physical mechanism inside the trash bin to drop the classified item into the correct sub-bin.

How I built it

    TensorFlow, Google's open-source machine learning framework, is the driving force behind this system. After fine-tuning the pre-trained Inception v3 model on more than 500 images of trash collected during the hackathon, I achieved an average classification accuracy of 92.6% on the validation set.
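
     For readers curious what that retraining step looks like, below is a minimal sketch of the same idea written with the Keras API; the actual project followed TensorFlow's image-retraining tutorial, so the paths, hyperparameters, and layer setup here are illustrative assumptions rather than my exact code.

```
# Transfer-learning sketch (illustrative only; the real project used
# TensorFlow's retrain tutorial). Paths and hyperparameters are placeholders.
import tensorflow as tf

NUM_CLASSES = 5  # glass, metal, paper, plastic, mixed

# Inception v3 pre-trained on ImageNet, without its original 1000-class head.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False  # keep the pre-trained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # Inception expects [-1, 1]
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # new final layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# ~500 hackathon photos, organized into one folder per category.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "trash_photos/train", image_size=(299, 299), label_mode="categorical")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "trash_photos/val", image_size=(299, 299), label_mode="categorical")

model.fit(train_ds, validation_data=val_ds, epochs=10)
model.save("trash_classifier.keras")
```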

     The functional components of the system are iSentry (a security-camera app for the Mac), TensorFlow, and a Unity program. iSentry monitors the built-in web camera, detects any motion in the camera's field of view, and captures images of the incoming object. TensorFlow analyzes the captured images on the fly. The Unity program displays a short animation once the classification result is available.
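
     The on-the-fly analysis amounts to loading the retrained model and running each captured frame through it. A minimal sketch, assuming the model was saved as in the previous snippet and that the label order matches the training folders (alphabetical by default):

```
# Classification sketch for one captured frame (illustrative; file names and
# the saved-model path are assumptions).
import numpy as np
import tensorflow as tf

# Alphabetical order, matching how image_dataset_from_directory assigns labels.
LABELS = ["glass", "metal", "mixed", "paper", "plastic"]
model = tf.keras.models.load_model("trash_classifier.keras")

def classify(image_path):
    """Return the predicted recycling category for one captured image."""
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]  # shape (1, 299, 299, 3)
    probs = model.predict(x)[0]
    return LABELS[int(np.argmax(probs))]

if __name__ == "__main__":
    print(classify("isentry_captures/latest.jpg"))
```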

     While all three pieces of technology work well on their own, an automation mechanism is needed to tie them together so the pipeline runs seamlessly end to end.

     I used watchdog, a Python file-system event library, as a back-end observer to detect when iSentry writes a new image. Once an image-creation event is detected, watchdog invokes TensorFlow to classify the object. I modified TensorFlow's output to be an empty file whose name is the predicted class label. Once this file is generated, the Unity program takes over: it continuously monitors the output directory and immediately plays the animation for that label as soon as the file appears.
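
     The glue is essentially a watchdog observer with an on_created handler. Here is a minimal sketch of that loop; the directory names and the classify() helper (from the snippet above) are placeholders rather than my exact code.

```
# Automation-glue sketch (illustrative; directory and module names are
# assumptions). Watches iSentry's output folder and signals Unity via an
# empty file named after the predicted label.
import os
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

from classify_trash import classify  # hypothetical module holding classify()

CAPTURE_DIR = "isentry_captures"    # where iSentry drops new frames
RESULT_DIR = "classification_out"   # directory watched by the Unity program

class NewImageHandler(FileSystemEventHandler):
    def on_created(self, event):
        if event.is_directory or not event.src_path.lower().endswith((".jpg", ".png")):
            return
        label = classify(event.src_path)  # run the TensorFlow classifier
        # Signal Unity by creating an empty file named after the label.
        open(os.path.join(RESULT_DIR, label), "w").close()

observer = Observer()
observer.schedule(NewImageHandler(), CAPTURE_DIR, recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)  # keep the main thread alive; the observer runs in the background
except KeyboardInterrupt:
    observer.stop()
observer.join()
```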

     After the animation, the Unity program deletes the old images and calls an AppleScript that presses iSentry's buttons, so iSentry is re-armed and ready to capture images again!

     Thus, every step is automated: once iSentry, watchdog, and the Unity program are started, the system runs indefinitely with no human intervention needed. It just works.

Challenges I ran into

     There were a couple of challenges, the first being that I did not know how capable TensorFlow is or how to use it. Luckily, the pre-trained model already does a very good job. That does not mean I could just sit back and enjoy, though: to make TensorFlow classify objects into my five predefined categories (glass, metal, paper, plastic, mixed) instead of its original 1,000+ classes, I had to retrain the model on my own dataset.

     This step is referred to as transfer learning. The TensorFlow website provides a decent tutorial on how to do it: essentially, you remove the last layer of the pre-trained neural network and retrain a new final layer on your own images and labels. My new MacBook has an Intel Core M CPU, which is not suited for computation-intensive tasks, so retraining on the 500+ images took more than 3 hours. All the while I had no idea whether it was going to work.

     In addition, I did not know how to detect motion and capture images. I considered using the iPhone camera, but data transfer would then be an issue because the iPhone cannot run TensorFlow. To avoid transferring data, I decided to use the web camera on my MacBook and set off in search of tools for detecting motion. Luckily, I found iSentry.

     I also had no experience automating system programs with Python. So I did my research and found watchdog, a file-system event listener. It gave me the ability to monitor file changes and trigger a callback function when an event is detected.

     The last challenge was learning how to use AppleScript. The difficulty lies in its design, which mimics human conversation. Because it does not appear to have a rigid set of rules at first glance, I had no clue exactly how to word my commands. A truly human sentence will not work, of course, but the human-approachable syntax disguises any useful clue about what the real syntax rules are. Apple does not seem to provide much guidance on this subject, presumably in the belief that it is easy to self-teach. After hours of fumbling through online forums and tutorials, I did manage to get it to work. The built-in Accessibility Inspector is essential for figuring out which UI component to address in the AppleScript, but even with its help, constructing the right hierarchy is still hard.
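
     For a sense of what that UI scripting looks like, here is a hypothetical example of the kind of command involved, wrapped in Python only so it matches the other sketches; in the real project the Unity program invokes the AppleScript, and the actual button title and window hierarchy came from the Accessibility Inspector, so the names below are made up.

```
# Hypothetical illustration of re-arming iSentry through UI scripting.
# The process name, button title, and window index are assumptions.
import subprocess

APPLESCRIPT = '''
tell application "System Events"
    tell process "iSentry"
        click button "Start" of window 1
    end tell
end tell
'''

subprocess.run(["osascript", "-e", APPLESCRIPT], check=True)
```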

Built With

iSentry, TensorFlow, Unity, Python (watchdog), AppleScript
