Inspiration
HiveSight started with a simple question: how do teams share awareness when visibility is low and information is fragmented? We were inspired by high-stakes situations like firefighters in smoke-filled buildings and police officers coordinating through bodycams—scenarios where communication can be chaotic and each person only sees a small slice of the environment. We wondered what would happen if those individual viewpoints could be combined into one shared, live map.
Instead of describing what you see over the radio, what if your team could just see it?
What it does
HiveSight is a real-time multi-camera spatial awareness system that turns several iPhones into a collaborative tracking network.
One iPhone acts as a spatial anchor, while the others act as mobile “bodycams.” Each phone:
- Determines its relative position to the anchor using Nearby Interaction (UWB)
- Measures its camera orientation using ARKit
- Streams its live camera feed via MJPEG to a central HTTPS server
On a central computer, we run YOLO with BoT-SORT-ReID to detect and consistently track people across frames—even through partial occlusion or temporary exits from view.
We then combine each phone’s detections to estimate the 2D positions of target people. The frontend is a map built with HTML, CSS, and JavaScript that shows team member locations and directions, detected people (including last-seen positions), and shortest paths from each team member to a target. The idea is similar to a minimap in a video game—but with real-world sensor and vision data!
How we built it
The project had four main components, developed in parallel:
Mobile app (Swift)
We built a custom iOS app using SwiftUI that uses Nearby Interaction for relative positioning, ARKit for orientation data, and AVFoundation to stream camera feeds via MJPEG. This gave us our own lightweight streaming pipeline that updates on every video frame.
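To illustrate the receiving side of an MJPEG stream, here is a minimal Python sketch of how a server could cut complete JPEG frames out of incoming bytes. It is not our actual server code: the function name `extract_jpeg_frames` is hypothetical, and the approach simply scans for the JPEG start-of-image (`FF D8`) and end-of-image (`FF D9`) markers, which is a common shortcut rather than a full multipart parser.

```python
from typing import List, Tuple

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpeg_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split complete JPEG frames out of a byte buffer.

    Returns (frames, remainder). The remainder holds any trailing partial
    frame and should be prepended to the next chunk read from the stream.
    """
    frames: List[bytes] = []
    while True:
        start = buffer.find(SOI)
        if start == -1:
            # No frame start yet. Keep the last byte in case a marker
            # is split across two chunks.
            return frames, buffer[-1:]
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Frame has started but not finished; wait for more data.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

Each returned frame could then be decoded with OpenCV, for example `cv2.imdecode(np.frombuffer(frame, np.uint8), cv2.IMREAD_COLOR)`. Note that marker scanning can mis-split JPEGs that embed a thumbnail, so a production server would more likely parse the multipart boundaries and `Content-Length` headers.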
Detection and tracking (Python)
On the backend, we used PyTorch with Ultralytics YOLO for person detection and integrated BoT-SORT with ReID for consistent identity tracking. OpenCV and NumPy handled frame decoding and data preprocessing.
Spatial fusion logic
We used bounding box detections to estimate 2D positions of detected people relative to each camera. We also implemented simple persistence logic to maintain “last known” positions when someone left the frame.
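The geometry behind this step can be sketched in Python as follows. This is an illustrative version under stated assumptions, not our exact code: it assumes a pinhole camera with a known horizontal field of view, a level phone, a fully visible upright person of roughly 1.7 m, and a heading measured counter-clockwise from the anchor's +x axis. All function and class names here are hypothetical.

```python
import math
from typing import Dict, Tuple

ASSUMED_PERSON_HEIGHT_M = 1.7  # assumption for illustration only


def bbox_to_camera_xz(bbox, image_width, hfov_deg,
                      person_height_m=ASSUMED_PERSON_HEIGHT_M):
    """Estimate (lateral, depth) in metres from a person bounding box.

    bbox is (x1, y1, x2, y2) in pixels. Depth comes from the apparent
    height of the box; lateral offset is positive to the camera's right.
    """
    x1, y1, x2, y2 = bbox
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    depth = focal_px * person_height_m / (y2 - y1)
    lateral = depth * ((x1 + x2) / 2 - image_width / 2) / focal_px
    return lateral, depth


def camera_to_anchor(lateral, depth, phone_xy, heading_rad):
    """Rotate and translate a camera-relative point into the anchor frame."""
    fwd_x, fwd_y = math.cos(heading_rad), math.sin(heading_rad)
    right_x, right_y = fwd_y, -fwd_x  # forward rotated 90 degrees clockwise
    return (phone_xy[0] + depth * fwd_x + lateral * right_x,
            phone_xy[1] + depth * fwd_y + lateral * right_y)


class LastKnownPositions:
    """Remember the most recent position of each track ID."""

    def __init__(self) -> None:
        self._seen: Dict[int, Tuple[Tuple[float, float], float]] = {}

    def update(self, track_id, xy, timestamp):
        self._seen[track_id] = (xy, timestamp)

    def snapshot(self, now, stale_after_s=1.0):
        """Return {track_id: (xy, is_stale)} for every ID seen so far."""
        return {tid: (xy, now - t > stale_after_s)
                for tid, (xy, t) in self._seen.items()}
```

A map could draw entries flagged `is_stale` in a faded style to mark them as last-known rather than live. The height-based depth estimate degrades quickly when a person is partly occluded or cropped by the frame, which is one reason multi-view fusion is attractive.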
Frontend visualization (Web stack)
We built a 2D map using HTML, CSS, and JavaScript intended to display the final scene. Phones are rendered as dots with direction, and detected individuals are plotted in the space. We also added shortest-path visualization to demonstrate potential tactical use cases.
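As an illustration of the shortest-path idea, here is a small Python sketch using breadth-first search on an occupancy grid. The grid representation, the `#` obstacle marker, and the function name `shortest_path` are assumptions made for this example; our frontend is written in JavaScript, and this sketch is not a port of it.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid.

    grid is a list of equal-length strings where '#' marks a blocked cell.
    start and goal are (row, col) tuples. Returns the list of cells from
    start to goal inclusive, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in parents):
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Breadth-first search returns a path with the fewest grid steps when every move costs the same. With no obstacles on the map, the shortest path is simply the straight line between the team member and the target.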
Challenges we ran into
Pivoting from laptops to phones
Our original plan used laptops as cameras, but we quickly realized they couldn’t provide accurate indoor positioning or orientation data. Switching to iPhones gave us UWB positioning and ARKit orientation, but it meant rebuilding our streaming pipeline in Swift.
Maintaining consistent IDs
Early on, our YOLO tracking pipeline frequently assigned a new ID when someone left and re-entered the frame. We spent a significant amount of time tuning BoT-SORT-ReID parameters and experimenting with different configurations to improve stability.
System integration
Each component initially worked independently, and getting everything to operate together in real time was a significant challenge. Much of our hackathon time went into testing, refactoring, and reconnecting pieces, and we ultimately didn’t have the opportunity to fully combine each phone’s data into the live map.
Accomplishments that we're proud of
- Learning how to code custom apps with Swift in a weekend
- Successfully creating a pipeline to perform real-time computer vision across multiple devices
- Achieving persistent person tracking using computer vision
What we learned
This project taught us how challenging multi-view spatial reasoning can be in practice, and how finicky tracking systems across devices can be! Other things we learned include:
- How to connect mobile hardware sensing with ML pipelines
- How to debug cross-platform issues quickly under hackathon constraints
- How important clear delegation is in team development
Overall, it was a great exercise in full-stack development, rapid prototyping, and debugging.
What's next for HiveSight: Real-Time Multi-Camera Spatial Intelligence
If we continue developing HiveSight, we would focus on:
- Fully integrating camera data into a live global map
- Utilizing multiple anchor phones or a different tracking system for better positional data
- Incorporating a data gathering tool for unknown indoor environments
- Reducing latency for camera streams
- Experimenting with AR overlays for smart glasses
HiveSight is still an early prototype, but we believe its real-world applications would be very impactful!