Inspiration
We've all experienced the frustration of misplacing our keys or wallet under a pile of papers on a cluttered desk. But in professional environments—like a research lab, a mechanic's garage, or a medical supply room—losing small tools or sensors costs serious time and money. We were inspired to build Spatial-Search to solve this problem by giving any physical workspace a searchable, digital memory.
What it does
Our project turns a live video feed of a workspace into a temporal database. Using an overhead camera, our detection model scans the desk every 10 seconds and logs the coordinates of specific items (like keys and wallets).
If a user loses an item, they simply search for it in our Web UI. The dashboard provides a Timeline Scrubber to see exactly where the item was moved throughout the day. Most importantly, it features a Last Known Location tool: if an item is completely hidden (e.g., covered by a notebook), the system remembers where it was right before it disappeared from view and draws a bounding box over its hiding spot.
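Under the hood, that lookup reduces to a single query over the sightings log. Here's a minimal sketch, assuming a hypothetical `sightings` table with `label`, `x`, `y`, and `timestamp` columns (not our exact schema):

```python
import sqlite3

def last_known_location(db_path: str, label: str):
    """Return the most recent logged position for an item, or None if never seen."""
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT x, y, timestamp FROM sightings "
            "WHERE label = ? ORDER BY timestamp DESC LIMIT 1",
            (label,),
        ).fetchone()
    return row  # (x, y, timestamp): the spot to draw the bounding box over
```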
How we built it & The Math Behind It
Our project is built on three main pillars:
- Computer Vision: We used Ultralytics YOLO to train a custom object detection model tailored specifically for top-down environments.
- Backend Pipeline: A Python backend powered by FastAPI and OpenCV processes the video feed (see the sketch after this list).
- Frontend UI: A responsive dashboard built with HTML, JS, and Tailwind CSS.
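Here's a minimal sketch of how those pieces fit together in the backend loop. The Ultralytics and OpenCV calls are real; the weights path and the `log_sighting` helper are placeholders, not our exact code:

```python
import time
import cv2
from ultralytics import YOLO

def log_sighting(label: str, cx: float, cy: float) -> None:
    # Placeholder for the real SQLite insert (see the Last Known Location sketch)
    print(f"{label} @ ({cx:.0f}, {cy:.0f})")

model = YOLO("custom_desk.pt")  # placeholder path to our fine-tuned weights
cap = cv2.VideoCapture(0)       # overhead camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame)[0]    # run detection on the current frame
    for box in result.boxes:
        label = result.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # centroid of the bounding box
        log_sighting(label, cx, cy)            # persist the sighting
    time.sleep(10)              # one scan every 10 seconds

cap.release()
```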
To prevent the database from overflowing with duplicate rows while an object sits still, we implemented a movement threshold using Euclidean distance. The system calculates the distance d between the centroid of the newly detected bounding box (x₂, y₂) and the last known centroid (x₁, y₁):
d = √((x₂ − x₁)² + (y₂ − y₁)²)
If d exceeds our spatial threshold, the backend logs it as a definitive physical movement rather than just camera noise.
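In code, the check comes down to a couple of lines. The pixel threshold below is illustrative; the real value depends on camera height and resolution:

```python
import math

MOVE_THRESHOLD_PX = 25  # illustrative; tuned to the camera setup

def has_moved(last: tuple, current: tuple) -> bool:
    """True if the centroid moved farther than the threshold (real motion, not jitter)."""
    (x1, y1), (x2, y2) = last, current
    return math.hypot(x2 - x1, y2 - y1) > MOVE_THRESHOLD_PX
```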
Challenges we ran into
Our biggest challenge was False Positives in Small Object Detection. During initial testing on our heavily cluttered desk, the standard YOLO model hallucinated, confidently labeling a spiky Naruto bobblehead as "keys."
To solve this, we became our own data engineers. We took 50 custom top-down photos of the desk under varied lighting conditions. Crucially, we left the bobblehead in the frame as a "decoy" but deliberately did not annotate it. After retraining on this custom dataset, the model learned to ignore the visual noise, eliminating the false positives and bringing tracking accuracy on our test desk to near 100%.
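The retraining step itself is short with Ultralytics. A sketch, assuming a hypothetical `desk_dataset.yaml` describing the 50 annotated photos (the epoch count and image size are typical values, not our exact run):

```python
from ultralytics import YOLO

# Fine-tune pretrained weights on the custom top-down dataset.
# The unannotated bobblehead in the images acts as a hard negative,
# teaching the model that "spiky object" does not mean "keys".
model = YOLO("yolov8n.pt")
model.train(data="desk_dataset.yaml", epochs=100, imgsz=640)
```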
Accomplishments that we're proud of
We are incredibly proud of our Total Occlusion Handling. Computer vision inherently fails when an object is blocked from view. By bridging the CV model with a temporal SQLite database, we engineered a system that remembers object states rather than just reacting to live pixels.
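A minimal sketch of that idea, assuming a hypothetical `items` status table alongside the sightings log: when a tracked label stops appearing in the current frame's detections, we flag it as hidden but never delete its last logged row, so the UI can still draw a box over the hiding spot:

```python
import sqlite3

def update_visibility(conn: sqlite3.Connection, detected: set, tracked: set) -> None:
    """Flag occluded items without touching their last logged position."""
    for label in tracked:
        status = "visible" if label in detected else "hidden"
        # The sightings log is append-only; only the status flag changes here.
        conn.execute("UPDATE items SET status = ? WHERE label = ?", (status, label))
    conn.commit()
```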
What we learned
We learned a massive amount about deploying AI in physical spaces. Above all: a model is only as smart as its training data, and incorporating "negative space" and decoys into a dataset is just as important as the target objects themselves.
What's next for Spatial-Search
While our proof-of-concept tracks personal items on a desk, the architecture is domain-agnostic: swapping the detection weights is all it takes to adapt it. Next, we want to train models to track pipettes in biotech labs, sterile instruments in operating rooms, or specialized tools in automotive workshops. We also plan to introduce multi-camera support to triangulate objects in 3D space.
Built With
- fastapi
- html
- javascript
- opencv
- python
- sqlalchemy
- sqlite
- tailwindcss
- ultralytics