An example of the kind of semantic segmentation within images that is possible with our technology
We wanted to enable rich multi-sensory, three-dimensional experiences within the affordances of existing two-dimensional touchscreen technology. We are big fans of virtual reality, but we felt the lack of complete immersion in today's (really good) headsets. In particular, we wanted to emulate tactile sensations, which we felt have been largely neglected by modern VR headsets.
What it does
What if you could transform any vanilla image, video, or movie into a fully interactive, touchable version of itself? With this product, the recordings of your trip to Italy would become an entirely more immersive experience. Imagine looking at a video of the Colosseum and being able to feel the rough concrete of its walls with the tips of your fingers, or the soft, smooth surface of its outer marble columns. This application would change the way travel is experienced, enabling travelers to remember much more about the beloved places they visited. Not only that, it would let people who have never been to famous landmarks have a more intimate experience with those beautiful places, allowing them not only to observe them but also to touch and feel them.
How we built it
The main technology that drives this forward is deep learning and artificial intelligence, specifically convolutional neural networks. Advanced deep neural nets now perform really well at a task called semantic segmentation: given an image, the network identifies the individual objects and actors present in it and labels each pixel with the class it belongs to. This gives us the exact region occupied by each object, animal, or feature in the image, from which a bounding box can also be derived.
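To make the segmentation output concrete, here is a minimal sketch with a toy per-pixel class map standing in for a real network's prediction (the class IDs and labels are illustrative, not from any particular model):

```python
import numpy as np

# Hypothetical output of a segmentation network: one class ID per pixel.
# 0 = background, 1 = "person", 2 = "building" (labels are illustrative).
seg_map = np.array([
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [1, 1, 0, 2],
    [1, 1, 0, 0],
])

# A per-class binary mask gives the exact region an object occupies,
# which is finer-grained than a bounding box.
person_mask = (seg_map == 1)

# A bounding box can still be derived from the mask when one is needed:
rows, cols = np.where(person_mask)
bbox = (rows.min(), cols.min(), rows.max(), cols.max())
print(bbox)  # (row_min, col_min, row_max, col_max)
```

In the real pipeline the class map would come from a trained segmentation net rather than a hand-written array, but the downstream handling of masks and boxes is the same.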
This technology is perfect for a company like Tanvas. Being able to name and locate objects and agents in a video lets us assign different tactile sensations to different objects. The algorithm could easily learn to recognize a cow in a pasture, for example; we could then associate a particular haptic feeling with the cow, enabling people to feel how fluffy it is. This method could be applied to any element in a video, and it only gets better with use, since more data equals more knowledge when it comes to deep nets. Given the absurdly large number of videos on the internet, it wouldn't be a stretch to build models that crawl through those videos, label and segment the objects present in them, and learn to associate objects with different haptic sensations. Those associations could be learned programmatically, through user feedback, or even through natural language processing in the future: understanding which objects have word representations closer to "sandy" or "rocky" could give you an edge in assigning tactile feelings to objects.
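The label-to-sensation association could start as a simple lookup table. The sketch below shows the idea; every name and parameter value here is a made-up illustration, not a Tanvas API:

```python
import numpy as np

# Hypothetical lookup from segmentation class labels to haptic texture
# parameters. Values are illustrative and could later be learned from
# user feedback or word embeddings instead of being hand-set.
HAPTIC_PROFILES = {
    "cow":      {"mean": 0.8, "var": 0.05},  # strong, steady: fluffy fur
    "marble":   {"mean": 0.4, "var": 0.02},  # smooth, subtle
    "concrete": {"mean": 0.5, "var": 0.30},  # rough: high-variance texture
}

def haptic_patch(label, shape, seed=0):
    """Sample a haptic-intensity patch for a region with the given label."""
    profile = HAPTIC_PROFILES.get(label, {"mean": 0.0, "var": 0.0})
    rng = np.random.default_rng(seed)
    patch = rng.normal(profile["mean"], np.sqrt(profile["var"]), size=shape)
    return np.clip(patch, 0.0, 1.0)  # keep intensities in a displayable range

cow_patch = haptic_patch("cow", (4, 4))
```

Swapping the hand-set table for learned parameters would not change the rest of the pipeline: each labeled region still maps to a (mean, variance) texture.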
During HackGT, we attempted to take those object segmentation regions and match them to haptic bitmaps of different magnitudes. This is a simple way of attributing different touch sensations to different regions and concepts. We then rendered a video clip of around 600 frames, each matched with haptic bitmaps representing the objects in it.
The haptic bitmaps were generated from Gaussian noise distributions. We chose to represent animals and humans with a higher-mean, lower-variance Gaussian, to give them a stronger sense of touch. Secondary objects that could add to the experience were associated with Gaussians of lower means.
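A per-frame haptic bitmap built this way can be sketched as follows. The region masks are hypothetical stand-ins for the segmentation output, and the means/variances echo the choices above:

```python
import numpy as np

rng = np.random.default_rng(42)
H, W = 6, 8
bitmap = np.zeros((H, W))  # one haptic intensity per pixel of the frame

# Hypothetical region masks for one frame; in the real pipeline these
# would come from the segmentation step.
person = np.zeros((H, W), dtype=bool); person[1:4, 1:3] = True
castle = np.zeros((H, W), dtype=bool); castle[2:5, 5:8] = True

# Humans/animals: higher-mean, lower-variance Gaussian -> stronger, steadier touch.
bitmap[person] = rng.normal(0.8, 0.05, person.sum())
# Secondary objects: lower mean -> subtler sensation.
bitmap[castle] = rng.normal(0.3, 0.15, castle.sum())

bitmap = np.clip(bitmap, 0.0, 1.0)  # unlabeled background stays at zero
```

One such bitmap per frame, matched to the ~600 frames of the clip, yields the rendered touchable video.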
Challenges we ran into
The main challenge was the time it takes to train deep models. However, as our demo shows, it is possible to create a simple demonstration of this concept using simple edge detectors combined with some semi-automated coloring. It wouldn't be a stretch to build a semantic segmentation deep net that does this automatically, given more time. In the final image we show some of the bitmaps for one video frame: notice how one of the haptic blobs surrounds the castle in the background, while the noisier blob surrounds the photographer.
Accomplishments that we're proud of
In the end, this was a great hackathon and we definitely learned a lot. This is a project we're definitely interested in pursuing in the future, since we're dealing with a really interesting, game-changing technology that can be deeply augmented and enhanced through the use of data and artificial intelligence.
What we learned
Deep learning, machine learning, Caffe, TensorFlow, and the system administration needed to make everything work :P
What's next for Haptic Movies
We would love to engineer full-body haptic suits at a future date. We talk more about our app at https://docs.google.com/document/d/1Mw57LMZ_Cj2i2jHpVR5LiJYqjxJ8Nfr9UKv1TSgaqjk/edit#