Inspiration
I wanted to build a system that could track and count objects in real time, even under occlusions, lighting changes, and motion.
What it does
OS2D+ takes a query image and a live video feed, detects the target object, tracks it across frames, and outputs real-time counts with persistent IDs.
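To make "counts with persistent IDs" concrete, here is a minimal sketch of the counting step only. It assumes the tracker emits a list of integer track IDs per frame; the function name `count_unique_ids` and the input format are hypothetical, not taken from the OS2D+ code.

```python
from typing import Iterable, List, Set


def count_unique_ids(frames: Iterable[List[int]]) -> List[int]:
    """Return the running number of distinct track IDs after each frame.

    Assumption: each frame is a list of the track IDs visible in it.
    Because IDs persist, an object that disappears and returns with the
    same ID is not counted twice.
    """
    seen: Set[int] = set()
    counts: List[int] = []
    for track_ids in frames:
        seen.update(track_ids)
        counts.append(len(seen))
    return counts


# Object 1 is hidden in the third frame and reappears in the fourth.
frames = [[1, 2], [1, 2], [2], [1, 2, 3]]
print(count_unique_ids(frames))  # [2, 2, 2, 3]
```

The count only stays correct if the tracker really does return the same ID after an occlusion, which is why identity persistence matters for counting.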
How we built it
I used a shared CNN backbone for feature extraction, feature correlation for matching, a detection head for bounding boxes, re-identification embeddings for occlusions, and OC-SORT for smooth tracking.
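The feature-correlation step can be illustrated roughly as follows. This is a simplified sketch, not the OS2D+ implementation: it assumes the query image has already been pooled into a single feature vector and the frame into a small grid of feature vectors, and it uses plain Python lists in place of tensors to stay dependency-free. The names `correlation_map` and `best_location` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def correlation_map(query: Vector, frame: List[List[Vector]]) -> List[List[float]]:
    """Score every grid cell of the frame features against the query vector."""
    return [[cosine(query, cell) for cell in row] for row in frame]


def best_location(scores: List[List[float]]) -> Tuple[int, int]:
    """Return the (row, col) of the highest-scoring cell."""
    cells = ((r, c) for r, row in enumerate(scores) for c in range(len(row)))
    return max(cells, key=lambda rc: scores[rc[0]][rc[1]])


# Toy 2x2 grid of 2-dimensional features (made-up values for illustration).
query = [1.0, 0.0]
frame = [
    [[0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]
scores = correlation_map(query, frame)
print(best_location(scores))  # (1, 0)
```

In a full pipeline the score map would feed the detection head, which turns high-response regions into bounding boxes.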
Challenges we ran into
Real-time performance, maintaining IDs through occlusion, tuning detection thresholds, and integrating tracking were the main challenges.
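As an illustration of how ID persistence and threshold tuning interact, the sketch below matches a new detection's embedding against embeddings stored for tracks lost to occlusion. It is an assumed, simplified scheme rather than OC-SORT's own association logic; the function `reassign_id`, the dictionary layout, and the 0.7 threshold are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embeddings; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def reassign_id(
    embedding: Sequence[float],
    lost_tracks: Dict[int, Sequence[float]],
    threshold: float = 0.7,  # assumed value; would need tuning on real data
) -> Optional[int]:
    """Return the ID of the most similar lost track, or None if no lost
    track reaches the similarity threshold (caller then starts a new ID)."""
    best_id: Optional[int] = None
    best_score = threshold
    for track_id, stored in lost_tracks.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_id, best_score = track_id, score
    return best_id


# Made-up embeddings for two tracks that were lost behind an occluder.
lost = {7: [1.0, 0.0], 9: [0.0, 1.0]}
print(reassign_id([0.9, 0.1], lost))   # 7
print(reassign_id([-1.0, 0.0], lost))  # None
```

A threshold set too low merges different objects under one ID and undercounts; set too high, it splits one object into several IDs and overcounts.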
Accomplishments that we're proud of
I built a robust end-to-end pipeline that maintains object identity and counts accurately in real time.
What we learned
I gained hands-on experience in query-based detection, re-identification, and optimizing object tracking for real-time use.
What's next for OS2D+
I plan to add multi-object queries, transformer backbones, improved embeddings, and edge-device deployment.