How many times have you been watching a video on YouTube/Netflix/Hulu/TV/etc. and couldn't recall the name of a celebrity you see? Or how many times were you really interested in what the characters of your favorite TV show are talking about in today's episode, but too lazy to look it up? So were we! And that's the reason we decided to build this hack.
What it does
While watching a video anywhere (Android for the POC), you can tap the "Deep Watch" button to extract all the information about that frame, including the names of any famous people — actors, statesmen, etc. — displayed along with a very short description of them and a link to learn more. We also interpret the conversation going on around that point of the video, and enlighten you on everything you might want to look up. But it doesn't end here: in the future we will tell you about interesting objects too, including any landmarks in the video!
How I built it
We built a native Android application using cognitive APIs from Microsoft (the power of CV & NLP applied to video consumption). As soon as the user taps the button, we capture the frame and the speech in its vicinity, analyze them extensively (face detection, face recognition, speech-to-text, text analytics), and surface all the important attributes of the scene.
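As a minimal sketch of the frame-analysis step, here is roughly what a call to the Microsoft Face API's `detect` endpoint looks like. The actual app is a native Android client; this is Python for brevity, and the region, subscription key, and attribute list are illustrative placeholders:

```python
def build_face_detect_request(region, subscription_key, image_bytes):
    """Assemble the URL, headers, and query params for a Face API v1.0
    'detect' call. The captured video frame is sent as raw binary data;
    matching detected faces to known people is a separate 'identify' call.
    """
    url = "https://{}.api.cognitive.microsoft.com/face/v1.0/detect".format(region)
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/octet-stream",  # raw frame bytes, not JSON
    }
    params = {
        "returnFaceId": "true",  # face IDs are needed for later identification
    }
    return url, headers, params, image_bytes

# The request would then be sent with something like:
#   import requests
#   url, headers, params, body = build_face_detect_request("westus", KEY, frame_jpeg)
#   faces = requests.post(url, headers=headers, params=params, data=body).json()
```

Speech follows the same pattern: the captured audio clip goes to the speech-to-text endpoint, and the transcript is then fed to the text analytics API for key phrases and entities.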
Challenges I ran into
Our biggest challenge was extracting the image frame and audio clip, but our team managed to overcome it very well. Another roadblock was the same one every analysis hack faces: finding training data or well-trained models. Thanks to Microsoft Cognitive Services for saving us hours of time.
Accomplishments that I'm proud of
After spending 36 hours on the Georgia Tech campus, drinking a lot of caffeine, and craving even one hour of sleep, we built a product viable enough to be demoed. And more importantly, we brought to life what we thought could be done.
What I learned
Well, a few things. It's never too late to change course. Keep improvising rather than sticking to the original plan indefinitely. Grab food and swag whenever you can. HaHa #eatOn #codeOn
What's next for DeepWatch
Foremost, the underlying models need to be trained on large amounts of data to perform up to expectations; domain-specific data in particular would be a great advantage. Also, for the MVP we implemented just an "on-demand" version of DeepWatch. It could definitely run in real time, throughout the video, if we reduce the number of REST API calls.
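One simple way to cut down REST calls for a real-time version is to rate-limit how often frames are sent for analysis, reusing the last result in between. This is a hypothetical sketch, not part of the current app; the two-second interval is an arbitrary example:

```python
import time

class FrameSampler:
    """Allow at most one analysis call per `interval_s` seconds.

    Frames arriving sooner than the interval are skipped, so the
    per-frame cloud API cost stays bounded no matter the frame rate.
    """

    def __init__(self, interval_s=2.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock  # injectable clock, handy for testing
        self._last_sent = None

    def should_analyze(self):
        """Return True if enough time has passed to send another frame."""
        now = self.clock()
        if self._last_sent is None or now - self._last_sent >= self.interval_s:
            self._last_sent = now
            return True
        return False
```

Batching the face, speech, and text-analytics requests per sampled frame would reduce round trips further.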