Inspiration

This project is actually an extension of a project Ethan worked on in high school. His original project tried to convert 360-degree photos into walkable VR experiences. For this hackathon, we wanted to convert a single video into an entire walkable VR scene by using AI to create a 3D model of the video. We were inspired by the idea of being able to “walk inside of movies,” experiencing entertainment as a fully immersive experience rather than on a 2D screen. This technology has implications not only for entertainment but also for fields such as education, urban construction and planning, and healthcare.

What it does

For this project, we set out to develop software that converts photos and videos into walkable VR experiences. At the time of submission, our photo-to-VR software was successfully rendering scenes ranging from "orbital" views of objects to immersive 360-degree scenes. We are still working on a version that renders a video in VR, and we are exploring whether our 360-degree photo-to-VR pipeline can be adapted to work with Google Street View.

How we built it

Given the short time frame for the project, we decided to adapt existing open-source research to our problem rather than designing our own end-to-end solution. Our first challenge was finding a suitable open-source project, so we reviewed dozens of papers on generating 3D models from images and videos. We concluded that our architecture needed two steps: a diffusion model with Gaussian Splatting to generate views of the scene from different perspectives, and a neural radiance field (NeRF) to unify those perspectives into a single 3D model. The research project "Guess the Unseen" incorporated both of these elements and was the best fit for our needs. We implemented this approach on a video, but the initial results were less accurate than we had hoped. We then transitioned to NVIDIA's Neural Radiance Model and achieved significantly better results on custom videos we captured. Before feeding the dataset into the model, we applied several image enhancement and correction steps; refining the input data this way led to more accurate, higher-quality 3D models.
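
To illustrate the preprocessing stage, here is a minimal sketch of the kind of frame enhancement we mean, assuming OpenCV; the specific filters, parameters, and file paths are illustrative stand-ins rather than our exact pipeline:

```python
import os

import cv2
import numpy as np

def enhance_frame(frame: np.ndarray) -> np.ndarray:
    """Illustrative enhancement pass: lighting correction plus mild sharpening."""
    # Equalize lighting with CLAHE on the luminance channel only, so colors
    # are preserved while detail in shadows and highlights is recovered.
    lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # Unsharp mask: blend in a negatively weighted blurred copy to emphasize
    # edges, which helps feature matching during 3D reconstruction.
    blurred = cv2.GaussianBlur(enhanced, (0, 0), sigmaX=3)
    return cv2.addWeighted(enhanced, 1.5, blurred, -0.5, 0)

# Extract and enhance frames from a captured video before training.
os.makedirs("frames", exist_ok=True)
video = cv2.VideoCapture("capture.mp4")  # hypothetical input path
index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    cv2.imwrite(f"frames/{index:05d}.png", enhance_frame(frame))
    index += 1
video.release()
```

Cleaner, more evenly lit frames give the reconstruction model more reliable correspondences between views, which is what drove the quality improvement we saw.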

Challenges we ran into

Our biggest challenge was implementing brand-new technology in a field (AI for 3D modeling) in which none of us had prior experience. We managed this, in part, by adapting open-source research projects to our use case, but that brought its own set of challenges: in particular, we had significant difficulty managing the dependencies of some of these projects.
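
As a sketch of how we tamed this, the commands below show the kind of isolated, pinned environment we ended up creating for each research repo; the environment name and version numbers are illustrative assumptions, not our exact setup:

```bash
# Hypothetical example: give each research repo its own pinned conda env
conda create -n guess-the-unseen python=3.10 -y
conda activate guess-the-unseen

# Install a CUDA-matched PyTorch build first, so the repo's own
# requirements file cannot pull in an incompatible version later.
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```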

Accomplishments that we're proud of

We are proudest of adapting a research paper that was released just three days before the hackathon and successfully presenting a working result. We hope to take this further going forward.

What we learned

We learned the importance of preprocessing and enhancing image data to improve the accuracy of the resulting 3D models. We also discovered the challenges and rewards of working with cutting-edge AI technologies and of integrating various research methodologies into a working solution.

What's next for AI-Powered Video to VR

Moving forward, we aim to refine our pipeline to achieve even more accurate and realistic VR reconstructions. Our next steps include:

  1. Building our own end-to-end pipeline for recording a video of a subject and converting that video to a 4D model. A key feature of this model would be generalizability: rather than modeling only humans or static objects, we would aim to model a wide range of complex scenes.

  2. Extending such a model to build a consumer-facing app. Doing so would require handling scenes with very limited video input and dealing with scene cuts. We would also have to solve complex problems such as moving scenes (e.g., car chases), in which the user would have to travel with the camera to experience the scene.
