Lootbox

Poster

Title: Package Counting for Shipping Logistics

Who:

Jorge Isaac Chang Ortega: jchang88
Eric Wang: ewang34
Ruotao Zhang: rzhang63

Introduction:

With the growth of e-commerce and its user base, large companies with shipping services, such as Amazon and FedEx, need systems that track the flow of packages for pipeline analysis and optimization. One important step is counting the packages that pass through a particular point in the pipeline, such as a delivery loading window or warehouse storage. Manual counting is tedious and error-prone, so we want to use deep learning to build an efficient and accurate computer vision solution for automated package counting. The idea originated with Jorge, who worked on a similar problem at a company for a particular client. The client provided footage of a truck being manually loaded with packages and requested 90% accuracy on the number of packages loaded. Jorge originally approached the problem with a more traditional computer vision solution; however, he believes that deep learning methods can achieve better performance. We found this an interesting problem to tackle, especially given the unique data in the provided footage, so we all agreed to work on this project. We will explore solutions by researching and extending Faster R-CNN for bounding box prediction and the SORT algorithm for package counting.

Related Work:

Are you aware of any, or is there any prior work that you drew on to do your project?
There is a project from bharath5673 on GitHub that detects and tracks vehicles on a video stream and counts those going through a defined line. It uses the OpenCV-dnn implementation of YOLOv3 with pretrained weights on the MS COCO dataset for object detection, and SORT, a simple online and real-time tracking algorithm, for tracking vehicles across different video frames.
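The counting-line idea described above can be illustrated with a minimal sketch. It assumes a tracker (such as SORT) has already produced, for each track ID, a list of centroid positions over time. The function name `count_line_crossings`, the horizontal line at `line_y`, and the sample coordinates are hypothetical and are not taken from the referenced project.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) centroid of a tracked box, in pixels


def count_line_crossings(tracks: Dict[int, List[Point]], line_y: float) -> int:
    """Count tracks whose centroid moves from above line_y to on or below it.

    Each track is counted at most once, even if it jitters around the line.
    """
    counted = set()
    for track_id, centroids in tracks.items():
        # Compare each consecutive pair of positions along the track.
        for (_, y_prev), (_, y_curr) in zip(centroids, centroids[1:]):
            if y_prev < line_y <= y_curr:
                counted.add(track_id)
                break  # do not count the same track twice
    return len(counted)


# Made-up example tracks (image y grows downward).
tracks = {
    1: [(100, 50), (102, 90), (105, 130)],              # crosses the line
    2: [(200, 40), (201, 60), (203, 80)],               # never reaches it
    3: [(300, 95), (300, 105), (300, 98), (300, 110)],  # jitters, counted once
}
print(count_line_crossings(tracks, line_y=100.0))  # prints 2
```

Counting each track ID only once is a deliberate choice here: it keeps small back-and-forth movements near the line from inflating the count, at the cost of missing an object that genuinely crosses twice.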
In our task, we are also considering Faster R-CNN for object detection. We will use the PyTorch implementation of Faster R-CNN pre-trained on COCO train2017 and fine-tune all parameters of the model on our own dataset.
List of URLs:
link
link

Data:

The data is a series of videos of packages being passed from one place to another. They come from a package delivery company in Chile.
There are approximately 20 hours of video showing more than 3000 packages passing by. Preprocessing should be fast; the footage can probably be used without any preprocessing. However, some video frames will need to be labeled.

Methodology:

Since our main task is object detection, we will use Faster R-CNN as our network architecture. It combines a region proposal network with a second network that predicts the object class and the corresponding bounding box coordinates for each proposed region.

Metrics:

What experiments do you plan to run?
Since we need to label some video frames ourselves, an important experiment is to measure how detector performance improves with the number of labeled training examples. We will also test difficult moments in the video: when two or more packages are stacked on top of each other, when packages are returned, and when people step under the camera.

If you are doing something new, explain how you will assess your model’s performance.
The classification and object detection component can be assessed with accuracy, precision, and recall. However, the most important metric for the program's success is the number of packages the system counts end-to-end, since this is the final purpose.
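As an illustration of how detection precision and recall might be computed on a labeled frame, here is a minimal sketch. It assumes boxes are given as (x1, y1, x2, y2), that predictions are already sorted by confidence, and that a prediction counts as correct when its intersection over union (IoU) with an unmatched ground-truth box is at least 0.5. The threshold, the greedy matching, and the sample boxes are assumptions for illustration, not choices fixed by this proposal.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Greedily match each prediction to the best unmatched ground-truth box."""
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)  # each ground-truth box is matched once
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Made-up boxes: two real packages, three predictions (one is spurious).
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
precision, recall = precision_recall(preds, truths)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up frame, two of the three predictions match a package, so precision is 2/3 and recall is 1.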

What are your base, target, and stretch goals?
Base: count 80% of the packages in the test set
Target: count 85% of the packages in the test set
Stretch: count 90% or more of the packages in the test set
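The goals above can be checked with a short sketch. The proposal does not define exactly how "count X% of the packages" is computed, so this assumes one possible definition: counting accuracy is 1 minus the absolute count error divided by the true count, which penalizes overcounting and undercounting equally. The function names and the example counts are hypothetical.

```python
def counting_accuracy(predicted: int, actual: int) -> float:
    """Assumed metric: 1 - |predicted - actual| / actual, floored at 0."""
    if actual <= 0:
        raise ValueError("actual package count must be positive")
    return max(0.0, 1.0 - abs(predicted - actual) / actual)


def goal_reached(accuracy: float) -> str:
    """Map a counting accuracy to the base/target/stretch goals listed above."""
    if accuracy >= 0.90:
        return "stretch"
    if accuracy >= 0.85:
        return "target"
    if accuracy >= 0.80:
        return "base"
    return "below base"


# Made-up counts for illustration only, not a measured result.
accuracy = counting_accuracy(predicted=2640, actual=3000)
print(goal_reached(accuracy))  # prints target
```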

Ethics:

Why is Deep Learning a good approach to this problem?
One of the most difficult aspects of our problem is that the packages can have arbitrary shapes, colors, and textures. The packages are also handled by hand in our footage, so we will have to deal with transformations, rotations, and occlusions. Deep learning methods should therefore let us detect a wider variety of packages more accurately than traditional computer vision techniques.

What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?
Our dataset is composed of footage of packages being loaded into a truck by hand. These packages may carry shipping labels that disclose the recipients' addresses, which can be a violation of privacy if the labels are clearly legible. In our particular dataset, the camera is fairly far from the packages, so the labels are hard to read. However, this is something to keep in mind for future data collection if the cameras are placed closer to the packages.