Inspiration
Augmented reality "augments" the real world by adding visual and auditory information to it. To augment the world in a useful way, an application first has to understand the real-world environment.
What it does
Our app feeds HoloLens 2 sensor data to a YOLO object detection network running on a server. The detection results are then projected back into the real world.
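To make the server-side step a bit more concrete, here is a minimal Python sketch of how raw detections might be cleaned up before they are sent back. We do not state which YOLO version or output layout is used, so everything here is an assumption for illustration: the tuple layout `(label, score, cx, cy, w, h)` with normalized center-format boxes, the thresholds, and the names `Detection`, `to_corners`, `iou` and `filter_detections` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float
    x1: float
    y1: float
    x2: float
    y2: float


def to_corners(cx, cy, w, h, img_w, img_h):
    """Convert a normalized center-format box to pixel corner coordinates."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(raw, img_w, img_h, min_score=0.5, max_iou=0.45):
    """Drop low-confidence boxes, then apply greedy per-class NMS.

    raw is an iterable of (label, score, cx, cy, w, h) tuples
    (assumed layout, not taken from any specific YOLO release).
    """
    candidates = [
        Detection(label, score, *to_corners(cx, cy, w, h, img_w, img_h))
        for label, score, cx, cy, w, h in raw
        if score >= min_score
    ]
    candidates.sort(key=lambda d: d.score, reverse=True)
    kept = []
    for det in candidates:
        # Keep a box unless a stronger box of the same class overlaps it heavily.
        if all(det.label != k.label or iou(det, k) <= max_iou for k in kept):
            kept.append(det)
    return kept
```

Many YOLO toolkits already perform this filtering internally, in which case only the conversion to pixel corners would be needed on our side.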
How we built it
We grab the HoloLens 2 camera frames using Microsoft's API and send them to an external server, where they are processed; the results are then sent back to the HoloLens 2. On the HoloLens 2, we use StereoKit to map the real world, calculate the projection from the camera image into real-world space, and visualize the object detection results in the user's field of view.
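The frame round trip can be sketched as follows. This is not our actual wire protocol, only one simple way to do it: each message is prefixed with a 4-byte length so that the receiver knows where a frame or a result ends on a TCP stream. The JSON result format and the helper names (`send_message`, `recv_exact`, `recv_message`, `serve_one`) are assumptions for illustration, and Python is used for both ends purely to keep the example short.

```python
import json
import socket
import struct

# 4-byte big-endian unsigned length prefix (assumed framing, not the app's real protocol).
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def serve_one(sock: socket.socket, detect) -> None:
    """Server side: read one encoded frame, run detection, reply with JSON."""
    frame = recv_message(sock)
    detections = detect(frame)  # detect is a placeholder for the real network
    send_message(sock, json.dumps(detections).encode("utf-8"))
```

On the device, sending and receiving would typically run on a background thread so that rendering is never blocked while waiting for the server.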
Challenges we ran into
Projecting from 2D to 3D was challenging, especially working out how best to draw the bounding boxes of the detection results around the detected objects at their real-world locations. Understanding the neural network's output was also challenging, as was the multi-threaded networking between the HoloLens 2 and the server. Access to the high-resolution spatial mapping (used, e.g., for hand tracking) is limited: it is only available by jumping through a few hoops in Research Mode rather than through the regularly available Microsoft API methods.
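For readers unfamiliar with the 2D-to-3D step, the core idea can be sketched with a pinhole camera model. This is a simplified illustration and not our StereoKit code: the intrinsics `(fx, fy, cx, cy)`, the 4x4 row-major camera-to-world matrix, the +Z-forward camera convention and the function names are all assumptions, and the depth value is assumed to come from elsewhere, for example from intersecting the pixel's ray with the mapped environment.

```python
import math


def pixel_to_camera_ray(u, v, fx, fy, cx, cy):
    """Unit direction (camera space) of the ray through pixel (u, v).

    Assumes a pinhole camera looking down +Z with image Y pointing down;
    engines with other conventions need an axis flip.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def transform_point(m, p):
    """Apply a 4x4 row-major rigid transform (nested lists) to a 3D point."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def pixel_to_world(u, v, depth, intrinsics, cam_to_world):
    """Lift pixel (u, v) at the given depth along camera Z into world space."""
    fx, fy, cx, cy = intrinsics
    p_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    return transform_point(cam_to_world, p_cam)


def box_center_to_world(box, depth, intrinsics, cam_to_world):
    """World-space anchor for a detection box given as (x1, y1, x2, y2) pixels."""
    x1, y1, x2, y2 = box
    return pixel_to_world((x1 + x2) / 2, (y1 + y2) / 2, depth, intrinsics, cam_to_world)
```

The camera pose must be the one captured together with the frame, not the current head pose, because the user may have moved while the server was processing the image.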
Accomplishments that we're proud of
Our application reliably understands multiple objects in the real world and where they are located relative to the user.
What we learned
The HoloLens 2 is too weak to run compute-intensive applications such as a neural network entirely offline on the device; such processing has to be offloaded to an external machine. The lack of 5G or LTE currently limits our application to locations where Wi-Fi is available.
What's next for HoloYolo
We plan to build on the information provided by HoloYolo to augment the user's view with context-aware intelligent information.