Inspiration
We were inspired by the paradigm shift that spatial computing on the Apple Vision Pro represents. As AI and ML enthusiasts, we wanted to explore how the Vision Pro could work with more readily available consumer technologies, such as iPhones and 360-degree cameras. These devices already produce vast amounts of footage and media, yet their integration with the Vision Pro was lacking.
Through our own use of the Vision Pro, we found that there was no simple way to view 3D models, particularly models of everyday objects that could easily be captured with a phone. The 360-degree immersive experience also felt underwhelming: most existing applications relied on embedding YouTube or other conventional video platforms, which made custom workflows, such as running object detection and other ML models on 360-degree media, nearly impossible.
What it does
Vision360 lets users scan objects with their iPhones, process the scans with photogrammetry on a Mac, and then view and share the resulting models in an immersive 3D space on Apple Vision Pro. Vision360 also live-streams 360-degree video with real-time, AI-powered object detection, built on the first-ever custom protocol pipeline for delivering immersive 360-degree video to the Apple Vision Pro.
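For context, the Mac-side photogrammetry step can be driven by RealityKit's PhotogrammetrySession API (macOS 12+). Below is a minimal sketch of that step, not our exact implementation; the input and output paths are placeholders.

```swift
import Foundation
import RealityKit

// Placeholder paths: a folder of iPhone capture images in, a USDZ model out.
let inputFolder = URL(fileURLWithPath: "/path/to/captures", isDirectory: true)
let outputURL = URL(fileURLWithPath: "/path/to/model.usdz")

// Create a session over the image folder and request a medium-detail model.
let session = try PhotogrammetrySession(input: inputFolder)
try session.process(requests: [.modelFile(url: outputURL, detail: .medium)])

// Drain the session's async output stream until processing finishes.
// (Run as a command-line tool with top-level concurrency enabled.)
for try await output in session.outputs {
    switch output {
    case .requestProgress(_, let fraction):
        print("Progress: \(Int(fraction * 100))%")
    case .requestComplete(_, .modelFile(let url)):
        print("Model written to \(url.path)")
    case .requestError(_, let error):
        print("Request failed: \(error)")
    default:
        break
    }
}
```

A USDZ produced this way can be opened directly on visionOS, which is what lets the scan flow end-to-end from iPhone capture to Vision Pro viewing.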
How we built it
We developed native applications for iOS, macOS, and visionOS using Apple's first-party toolkits. The pipeline was designed for seamless hand-off between devices, with an emphasis on image quality and low-latency transmission. AI-powered object detection was implemented with YOLO models, and the computationally expensive inference was offloaded to HPC clusters and the cloud for real-time analysis.
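As a reference point, this is one way to run per-frame YOLO inference through Apple's Vision framework once the network has been converted to Core ML; the model filename here is a placeholder, and our heavier variants ran off-device rather than on the headset.

```swift
import CoreML
import CoreVideo
import Foundation
import ImageIO
import Vision

// Load a YOLO network compiled to Core ML (filename is a placeholder).
let coreMLModel = try MLModel(contentsOf: URL(fileURLWithPath: "YOLODetector.mlmodelc"))
let vnModel = try VNCoreMLModel(for: coreMLModel)

// Models exported with a non-maximum-suppression layer surface results
// as recognized-object observations with labels and bounding boxes.
let request = VNCoreMLRequest(model: vnModel) { request, _ in
    guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
    for object in results {
        let label = object.labels.first?.identifier ?? "unknown"
        print("\(label) @ \(object.boundingBox) confidence=\(object.confidence)")
    }
}
request.imageCropAndScaleOption = .scaleFill

// Run detection on one decoded video frame.
func detect(in frame: CVPixelBuffer) throws {
    let handler = VNImageRequestHandler(cvPixelBuffer: frame, orientation: .up)
    try handler.perform([request])
}
```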
Challenges we ran into
Like any cutting-edge project, we encountered several challenges along the way:
Limited Development Support: Because visionOS was such a new platform, documentation and community support were scarce, which made many technical decisions difficult.
Network Performance Uncertainty: Concerns over WiFi throughput and its impact on latency pushed us toward unconventional technologies for our streaming protocol (see the sketch after this list).
Apple's Privacy-Focused Development Approach: Apple's strict privacy policies required us to find creative workarounds for media and file importing.
Accessing Raw Camera Footage: Working with external cameras, such as Insta360, posed challenges in retrieving raw footage for processing and analysis.
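To make the protocol concern concrete, here is a minimal sketch of the sender side of a UDP frame-streaming scheme built on Apple's Network framework. The host, port, and chunk-header layout are illustrative assumptions for the sketch, not our actual wire format.

```swift
import Foundation
import Network

// Open a UDP connection to the headset (example LAN address and port).
let connection = NWConnection(
    host: NWEndpoint.Host("192.168.1.42"),
    port: NWEndpoint.Port(rawValue: 9000)!,
    using: .udp
)
connection.start(queue: .global())

/// Sends one encoded video frame, split into datagram-sized chunks so a
/// single lost packet costs only part of a frame instead of all of it.
func send(frame: Data, frameIndex: UInt32) {
    let chunkSize = 1200 // stay under a typical WiFi MTU
    let chunks = stride(from: 0, to: frame.count, by: chunkSize).map {
        frame.subdata(in: $0 ..< min($0 + chunkSize, frame.count))
    }
    for (i, chunk) in chunks.enumerated() {
        // 8-byte header: frame index, chunk index, total chunk count.
        var packet = Data()
        withUnsafeBytes(of: frameIndex.bigEndian) { packet.append(contentsOf: $0) }
        withUnsafeBytes(of: UInt16(i).bigEndian) { packet.append(contentsOf: $0) }
        withUnsafeBytes(of: UInt16(chunks.count).bigEndian) { packet.append(contentsOf: $0) }
        packet.append(chunk)
        connection.send(content: packet, completion: .contentProcessed { error in
            if let error { print("send failed: \(error)") }
        })
    }
}
```

A receiver for this scheme would reassemble chunks by frame index and drop incomplete frames rather than wait for retransmission, trading reliability for latency.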
Accomplishments that we're proud of
Successfully developing a cross-platform pipeline that enables seamless 3D scanning and streaming.
Implementing real-time AI object detection in immersive 360-degree video.
Overcoming visionOS development hurdles and optimizing performance for low-latency streaming.
Creating the first-ever custom protocol pipeline for live-streaming 360-degree immersive video from a custom source to the Apple Vision Pro.
What we learned
Development for Apple's Ecosystem: We learned how to develop native applications for various Apple OS platforms, including iOS, macOS, and visionOS.
Building for a 3D Paradigm: Developing applications in a 3D environment required us to rethink our approach to UI, UX, and data representation.
Accelerating Workflows with AI: We explored how AI tools and Apple's toolkits can optimize development and processing pipelines.
Protocol Development: Understanding how different technologies impact protocol development was crucial, especially since we prioritized quality and latency while also running a YOLO model for real-time object detection.
Offloading AI Processing: We learned how to offload computationally intensive AI processes to HPC clusters and the cloud, allowing for real-time AI enhancements without compromising performance.
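As an illustration of the offload pattern, this sketch posts a JPEG frame to a remote inference endpoint and decodes the detections it returns; the URL and response shape are hypothetical stand-ins, not our actual service contract.

```swift
import Foundation

// Hypothetical response shape returned by the remote detector.
struct Detection: Decodable {
    let label: String
    let confidence: Double
    let box: [Double] // [x, y, width, height], normalized
}

// Upload one encoded frame and await its detections.
func detectRemotely(jpegFrame: Data) async throws -> [Detection] {
    var request = URLRequest(url: URL(string: "https://inference.example.com/detect")!)
    request.httpMethod = "POST"
    request.setValue("image/jpeg", forHTTPHeaderField: "Content-Type")
    request.httpBody = jpegFrame

    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode([Detection].self, from: data)
}
```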
What's next for Vision360
More AI-powered features like automated 3D model enhancement and advanced scene understanding.
Support for additional platforms to make our technology accessible to more devices.
Improved network optimizations to further reduce latency for real-time immersive experiences.
Integration with AR/VR applications beyond Vision Pro, allowing for broader adoption and new creative possibilities.