One of our team members is currently enrolled in a computational cognitive science class, where he's been learning about the neat idea of inverse graphics. Inverse graphics is both a theory of how vision works in the mind and a computational technique for solving CV problems. We were intrigued by the idea that the algorithms behind inverse graphics are powerful enough to solve a diverse array of CV tasks, so we were eager to work on Metropolis, which builds on ideas from the literature.
What it does
Metropolis accepts a single 2D image as input and outputs a 3D model, ready to be explored seamlessly in a virtual world thanks to the Gear VR headset. Metropolis can accept both rendered images and hand-drawn sketches as input, which means you can draw a room with some basic furniture (simple blocks) and be able to walk around that room _in seconds_. Metropolis lets you bring your sketches and imagination to life, in the full vigor of the 3rd dimension.
How we built it
The majority of Metropolis was built with Python and C#: Python for the backend algorithm and web server, and C# for the VR / Unity integration. The first step in using inverse graphics to solve a CV problem is to design a renderer that can render your scenes of interest. There are plenty of off-the-shelf renderers we could have used, but for maximum flexibility and learning, we built our own 3D renderer from scratch.
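To give a flavor of what "renderer from scratch" means at its core, here's a minimal sketch of the central operation, pinhole perspective projection of 3D points onto a 2D image plane. This is an illustrative toy, not our actual renderer; the `project` function and the cube example are made up for this write-up.

```python
def project(point, focal=1.0):
    """Pinhole projection: camera at the origin, looking down +z.

    A 3D point (x, y, z) maps to the 2D screen point
    (f*x/z, f*y/z), so farther objects appear smaller.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z, focal * y / z)

# Project the 8 corners of a unit cube sitting in front of the camera.
cube = [(x, y, z) for x in (-0.5, 0.5)
                  for y in (-0.5, 0.5)
                  for z in (2.5, 3.5)]
screen = [project(v) for v in cube]
```

A full renderer layers rasterization, shading, and hidden-surface removal on top of this projection step, but the projection is what ties scene parameters to pixels, which is exactly the mapping inverse graphics has to invert.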
Once we had the renderer, the next step was to hook in the fundamental algorithms that power inverse graphics. These algorithms tend to be in the MCMC (Markov chain Monte Carlo) family, like Metropolis-Hastings (this project's namesake) and Hamiltonian Monte Carlo. As with our renderer, instead of using any of a number of pre-built probabilistic programming libraries, we opted to implement these algorithms from scratch.
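For readers unfamiliar with the project's namesake, here is a compact sketch of Metropolis-Hastings with a symmetric Gaussian random-walk proposal. It's a generic 1D illustration under stated assumptions, not our scene-fitting code; in the real pipeline the "target density" scores how well a candidate 3D scene explains the input image.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Draw samples from an unnormalized density given by log_target."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        # Symmetric random-walk proposal.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(new) / p(old)).
        if math.log(rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples

# Example target: a standard normal (log density up to a constant).
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
```

Because the proposal is symmetric, the Hastings correction cancels and the acceptance ratio reduces to a simple density ratio, which is why the log-space comparison above suffices.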
Next, we needed a way to view our 3D models in the Gear VR, so we used Unity to create an Android application that could do just that. We put the Metropolis backend on a Flask server in the cloud and pointed our Android app at that API. This allowed us to take pictures of sketches with our phone and immediately render the corresponding 3D VR rooms.
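The backend API can be sketched roughly as follows. This is an illustrative mock-up: the `/infer` route name and the `run_inference` stub are hypothetical, standing in for our actual endpoint and the MCMC fitting pipeline.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_inference(image_bytes):
    # Stub: the real backend fits 3D scene parameters to the image
    # and returns a model the Unity app can load.
    return {"objects": [], "bytes_received": len(image_bytes)}

@app.route("/infer", methods=["POST"])
def infer():
    # The phone app POSTs a photo of a sketch as multipart form data.
    image = request.files.get("image")
    if image is None:
        return jsonify(error="no image uploaded"), 400
    scene = run_inference(image.read())
    return jsonify(scene=scene)
```

The Unity/Android side then just uploads the photo and parses the returned scene description, keeping all the heavy inference on the server.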
Challenges we ran into
The biggest issue we ran into was the slow convergence of Metropolis-Hastings. We experimented with alternatives like Hamiltonian Monte Carlo, but in the end settled on adding a pre-optimization step: we use particle swarm optimization to find a good starting point for the MH/HMC algorithm. (Like most things in this project, we implemented PSO ourselves.)
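A bare-bones version of that pre-optimization step might look like the following. This is a generic PSO sketch with standard inertia/cognitive/social weights, not our exact implementation; the sphere-function example is just for demonstration, whereas in Metropolis the objective would be the mismatch between a rendered scene and the input image.

```python
import random

def pso(objective, dim, n_particles=30, iters=100, seed=0):
    """Minimize `objective` over R^dim with basic particle swarm optimization."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social weights
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Example: minimize the sphere function; the optimum is the origin.
best, best_val = pso(lambda p: sum(x * x for x in p), dim=3)
```

The returned `gbest` then seeds the MCMC chain, so the sampler starts near a high-probability scene instead of burning iterations wandering in from a random initialization.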
Another issue was slow DNS propagation: we registered https://metropolis.black/ for this project, and it took some time for the new record to propagate.
Accomplishments that we're proud of
We're very proud of how hard we worked to do much of this project from scratch. We're also especially proud of solving a challenging computer vision problem without resorting to less-interesting, catch-all techniques like neural networks!
What we learned
Something we learned was how important it is to iterate. We set out to complete a dozen small milestones this weekend, each leading up in some way to the final product, and we did it. We also learned a lot of fun algorithmic stuff as you can imagine.
What's next for Metropolis
Next, we'd like to expand Metropolis to render cool-looking textures and more classes of objects. We did some work inferring the 3D structure of real-life objects during the hackathon, but we didn't have the time to polish this fully so we don't plan on showing it off just yet.