Inspiration:
One of the biggest impediments to sustainability is trash. As the amount of trash grows with rising human consumption, the environment suffers: pollution, deforestation, and danger to wildlife are just some of the disastrous side effects. Hence there is an urgent need to collect trash, especially plastic, and recycle it safely.
However, handling trash material can be very hazardous for humans. It is also expensive to have humans clear litter from the streets, both because of rising wages and because it ties up human effort on a task a robot could handle. That time is much better spent contributing to the economy through creativity and work that cannot yet be achieved with AI.
What it does:
The robot needs to visually distinguish whether a material is trash or not, and then estimate the material's exact dimensions to determine the best grasping position.
Trash comes in highly varying shapes, sizes, and colors, so detecting the presence of a rubbish object is not a trivial task.
Since cleaning up is a highly repetitive task, there is huge scope for automation; it can also be grueling in terms of physical effort, which makes it well suited to a robot.
The robot scans each object in its environment, estimates its confidence that the detected object is a piece of garbage, and depicts that inference visually using different colored LED lights.
How we built it:
Instance segmentation associates each pixel of an image with an instance label, which both separates objects from the background and marks the exact boundary of each object instance in the image. Faster R-CNN (Region-based Convolutional Neural Network) is a commonly used architecture for predicting objects' bounding boxes and labels. Mask R-CNN extends Faster R-CNN with an additional branch that predicts the object mask. Here, we use Mask R-CNN to perform instance segmentation on the image from the camera mounted on the robot, identifying the pixels that contain trash.
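As a concrete illustration, here is a minimal inference sketch using the off-the-shelf Mask R-CNN from torchvision (this assumes a recent torchvision release with the weights API; "frame.jpg" is a placeholder for a frame grabbed from the robot's camera):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load Mask R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO.
# "DEFAULT" resolves to the standard released checkpoint.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Placeholder file name standing in for a camera frame.
image = to_tensor(Image.open("frame.jpg").convert("RGB"))

with torch.no_grad():
    pred = model([image])[0]  # one prediction dict per input image

# Each detection comes with a bounding box, a class label, a confidence
# score, and a soft per-pixel mask over the whole image.
for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score.item() > 0.5:
        print(f"label={label.item()} score={score.item():.2f} box={box.tolist()}")
```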
Zero-shot learning is gaining popularity due to its generalizability. The model is pre-trained on large state-of-the-art datasets and then makes predictions on unseen data without any fine-tuning, which saves computational time and resources. It is also difficult to gather large amounts of labeled data, especially for tasks like trash detection and segmentation where heavy annotation is involved. Research such as OpenAI's CLIP has shown that zero-shot models can rival task-specific trained models. We observed that the pre-trained Mask R-CNN model with a ResNet-50 backbone detected bounding boxes and produced heatmaps remarkably well on TACO dataset images in a zero-shot setting. We attribute this to the model's ability to generalize to out-of-domain data using prior generic knowledge.
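To sketch how this zero-shot reuse might look in code, one can filter the COCO-trained model's detections down to litter-prone classes. The trash-like category set and the score threshold below are our illustrative assumptions, not a canonical mapping:

```python
from torchvision.models.detection import MaskRCNN_ResNet50_FPN_Weights

# The COCO-trained model has never seen TACO images, but several of its
# categories ("bottle", "cup", ...) correspond to common litter.
weights = MaskRCNN_ResNet50_FPN_Weights.DEFAULT
COCO_NAMES = weights.meta["categories"]

# Hypothetical set of litter-prone COCO classes; extend as needed.
TRASH_LIKE = {"bottle", "cup", "wine glass", "fork", "spoon", "bowl"}

def trash_confidence(pred, threshold=0.3):
    """Return the highest confidence among detections of trash-like classes."""
    best = 0.0
    for label, score in zip(pred["labels"], pred["scores"]):
        name = COCO_NAMES[label.item()]
        if name in TRASH_LIKE and score.item() >= threshold:
            best = max(best, score.item())
    return best
```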
Challenges we ran into:
Since trash comes in many different shapes and sizes, the best training results would require a huge dataset covering a wide variety of objects. Essentially, our model is limited by the data available for training.
We wanted to get real-time video input of the surroundings using the ESP32-CAM WiFi module, but we ran into errors because the voltage supplied by the laptop's USB port was insufficient. A possible way to address this is to connect the board to an external DC power source (rather than the laptop USB port), but we were unable to get access to one during the hackathon.
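For reference, once the board is properly powered, the MJPEG stream served by the stock ESP32-CAM web-server firmware can be consumed with OpenCV roughly like this (the IP address and port are placeholders for whatever the board reports on its serial console):

```python
import cv2  # OpenCV

# Placeholder URL: the ESP32-CAM example firmware serves an MJPEG
# stream over HTTP; the actual IP is printed on the serial console.
STREAM_URL = "http://192.168.1.50:81/stream"

capture = cv2.VideoCapture(STREAM_URL)
while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break  # dropped stream, e.g. from an under-powered board
    cv2.imshow("esp32-cam", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
capture.release()
cv2.destroyAllWindows()
```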
Accomplishments we’re proud of:
Developing a full-stack solution: training a deep computer-vision model and using it to perform inference on a real-time feed. Further, we control the switching of one of three LEDs according to the predicted probability that the object is trash: red indicates high probability, yellow medium, and green a clean, no-trash scene.
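A minimal sketch of the LED side, assuming the prediction is forwarded to the Arduino as a single character over serial (the thresholds, port name, and one-byte protocol are our illustrative choices, not the exact values used):

```python
import serial  # pyserial

def led_code(confidence):
    """Map trash confidence to the LED the Arduino should light."""
    if confidence > 0.7:
        return b"R"  # red: high probability of trash
    if confidence > 0.3:
        return b"Y"  # yellow: medium probability
    return b"G"      # green: clean, no-trash scene

# Placeholder port name; on Windows this would be e.g. "COM3".
with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as port:
    port.write(led_code(0.85))  # lights the red LED
```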
The model even works well in low-light scenarios, as shown in our demo video.
What we learned:
We gained hands-on experience using a computer-vision-based approach to solve a real-world problem and integrating it with an Arduino. By going through research papers in this domain, we learned about the current state-of-the-art approaches. We also learned how to plan effectively and manage time efficiently in order to complete a project within a short amount of time.
Further Work / Future Directions:
We have interfaced our model's output prediction to a microcontroller. The next step will be to integrate it with a robotic gripper arm mounted on a navigating robot base. Visually, we have solved the problem of identifying trash; however, it remains a challenge for the robotic arm to pick it up. For this, we will use reinforcement learning techniques from the QT-Opt paper to enable the robot to generalize to new objects and pick them up effectively, while staying robust to external disturbances. We could also consider training a CNN model to predict the grasp position at which the object is well balanced and does not slip from the gripper.
This work has the potential to clean up secluded and risky areas such as forests, where trash can lead to disastrous long-term outcomes such as danger to wildlife, deforestation, and flooding. Deploying a robot there to handle collection would be a step closer to sustainable development.