Case Identification: An Unsolved Problem
Amazon distribution centers have robotic systems capable of physically manipulating and retrieving cases. Surprisingly, however, the sensing and informatics systems do not yet provide accurate information about where exactly a case lies within a single "bin," which may contain up to a couple dozen cases. This information gap is enough to render the robotic systems useless: a wrong identification is very costly in time and in the extra overhead of putting the case back and finding it again.
You might be thinking, “Tracking some cardboard boxes? That doesn’t seem too hard to me!” Let’s explore what makes case-tracking a non-trivial problem.
- Lack of a unique identifier
- Self-similarity: unlike most objects around us, boxes look very much alike, which makes it hard to track them using "semantic" information, i.e., recovering meaning from their visual appearance
- Temporal correspondence
- Not being able to see through boxes
Inspiration
On the technology side, we are inspired by some of the modern innovations in the computer vision space.
What it does
We built a system that provides the position of any case in a bin, including occluded ones (i.e., blocked by other cases), in real time, using just a single camera attached to a warehouse operator's forklift. (Meaning you don't have to install a gazillion cameras to use it in a real warehouse!)
The input to our system is a video capturing the entire history of a bin, starting completely empty and gradually becoming fully filled. Our core assumption is that the position of any and all cases can be ascertained from this video alone. We believe this is the case, because the operator
How we built it
Hardware
Due to limited manpower, we were unable to dedicate extensive time to hardware prototyping. As a result, we opted for a simplified design focusing only on major components.
Software
The case locator is built on serial communication between an Arduino and Python. Because the webcam introduces image distortion, calibration between the servo controls and the webcam view is essential; this calibration uses interpolation to correct and grid-align the input camera view. Once calibrated, given a case's 3D coordinates from the 3D optical flow, the first red laser points directly at that coordinate, while the second laser conveys depth information about the case's position. The combination of the two lasers clearly indicates the case's location:
- One dot: the case is on the surface and ready for pick-up.
- Two dots, one blinking: the case is immediately behind the first layer.
- Two solid dots: the case is likely positioned further back.
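The indicator logic above can be sketched in a few lines. This is a simplified illustration, not our actual firmware protocol: the message framing, port name, and function names here are hypothetical.

```python
# Minimal sketch of the laser-indicator logic. The CSV message format and
# the depth-layer encoding are hypothetical; the real protocol depends on
# the Arduino firmware.

def indicator_pattern(depth_layer: int) -> str:
    """Map a case's depth layer (0 = front surface) to a laser pattern."""
    if depth_layer == 0:
        return "one_dot"            # case is on the surface, ready for pick-up
    if depth_layer == 1:
        return "two_dots_blinking"  # immediately behind the first layer
    return "two_dots_solid"         # likely positioned further back

def servo_command(pan_deg: float, tilt_deg: float, pattern: str) -> bytes:
    """Encode a command line for the Arduino (hypothetical CSV framing)."""
    return f"{pan_deg:.1f},{tilt_deg:.1f},{pattern}\n".encode()

# Sending over serial would look like this (requires pyserial and hardware):
# import serial
# with serial.Serial("/dev/ttyUSB0", 115200) as port:
#     port.write(servo_command(42.0, 10.5, indicator_pattern(1)))
```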
We started out by using generic computer vision algorithms like YOLO, SAM, and Depth-Anything Model to get a foundational understanding of the scene. This allowed us to identify boxes, track them, and even approximate 3D location without a depth camera.
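Per-frame detections from models like YOLO still have to be associated across frames to track a box over time. A minimal sketch of greedy IoU-based matching (our actual association logic may differ; the threshold is illustrative):

```python
# Greedy IoU-based association of box detections across frames.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_tracks(tracks, detections, threshold=0.3):
    """Greedily match existing track boxes to new detections by IoU.

    `tracks` maps track id -> last known box; returns (matches, unmatched
    detection indices). Unmatched detections may spawn new tracks.
    """
    matches, unmatched = {}, set(range(len(detections)))
    pairs = sorted(
        ((iou(t_box, detections[d]), t_id, d)
         for t_id, t_box in tracks.items() for d in range(len(detections))),
        reverse=True,
    )
    for score, t_id, d in pairs:
        if score < threshold:
            break
        if t_id not in matches and d in unmatched:
            matches[t_id] = d
            unmatched.discard(d)
    return matches, unmatched
```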
We used a variant of optical flow called Optical Expansion, developed by Gengshan Yang, to extract the depth-wise movement of boxes.
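The intuition behind optical expansion is simple: under a pinhole camera model, an object's apparent size scales inversely with its depth, so the frame-to-frame scale change of a box directly gives its motion in depth. A toy illustration of this relationship (not the paper's actual estimator):

```python
# Under a pinhole camera, apparent width w is proportional to 1/Z, so the
# scale ratio s = w_new / w_old gives the motion-in-depth tau = Z_new / Z_old
# as tau = 1 / s. Simplified illustration only.

def motion_in_depth(scale_ratio: float) -> float:
    """tau = Z_new / Z_old from the apparent scale change s = w_new / w_old."""
    return 1.0 / scale_ratio

def new_depth(old_depth_m: float, width_old_px: float, width_new_px: float) -> float:
    """Estimate a box's new depth from the change in its apparent width."""
    s = width_new_px / width_old_px
    return old_depth_m * motion_in_depth(s)

# A box 100 px wide at 2 m that shrinks to 80 px has receded to 2.5 m.
```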
Then, we synced the box-movement information into the MuJoCo physics simulation engine, which helped us 1) prevent accidental re-identification (the algorithm thinking a new box has appeared when, in actuality, it was the same box briefly hidden behind another), and 2) model effects such as boxes falling under gravity and collisions between boxes.
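Independent of MuJoCo itself, the re-identification guard can be sketched as a gate: before minting a new track ID for an unmatched detection, check whether a recently occluded box was predicted near that spot and revive its ID instead. The names and distance threshold below are illustrative, not our exact implementation:

```python
# Sketch of the re-identification guard. `occluded` maps track id -> the
# last predicted box of a box that vanished behind another (e.g. from the
# physics simulation's forward prediction).

import math

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def revive_or_new(detection, occluded, next_id, max_dist=15.0):
    """Return (track_id, remaining_occluded, next_id) for an unmatched detection."""
    cx, cy = center(detection)
    best_id, best_d = None, max_dist
    for t_id, box in occluded.items():
        ox, oy = center(box)
        d = math.hypot(cx - ox, cy - oy)
        if d < best_d:
            best_id, best_d = t_id, d
    if best_id is not None:
        remaining = {k: v for k, v in occluded.items() if k != best_id}
        return best_id, remaining, next_id       # same box re-emerged
    return next_id, occluded, next_id + 1        # genuinely new box
```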
We didn't completely finish it, but we believe we are on track to developing a genuinely useful algorithm that works in real life.
Challenges we ran into
Fifteen hours after the kickoff, two team members withdrew from the project due to heavy school workloads and personal reasons. As a result, we (Myeongjun and Yunho) had to move forward under a tighter schedule with fewer resources.
Accomplishments that we're proud of
- Developing our own 3D optical flow algorithm
- A human-interactive case-locating system using lasers
- Our first-ever hackathon: we are proud that we tried our best under a non-optimal situation
What we learned
- How the physical-goods supply chain operates! The pipeline of organization, sortation, and the different levels of delivery across manufacturers, distribution centers, and fulfillment centers.
- How to think about robotic systems that work effectively with human operators, practicing thinking from the operators' perspective (taking into account urgency, priorities, etc.)
- What 3D optical flow and serial communication are, and their practical use cases.
What's next for 3D Optical Flow Package Tracker + Package Indicator
We will:
- Finish and flesh out the system
- Run small-scale tests with it for fun
- Document and upload the project and write a great, interactive blog post about it!
GitHub
https://github.com/yunho-c/RoboTech-Hackathon-2025
*Some pictures are missing due to technical issues.