We wanted Double Vision to offer a continuous learning experience: watching a video while simultaneously reading up on topics of interest.

What it does

Double Vision uses audio and image recognition along with natural language processing to generate smart context (links to related Wikipedia articles) while you watch your video.

How I built it

The pipeline downloads the video, extracts frames using OpenCV, and assigns tags to the video using TensorFlow. It also downloads the audio, transcribes it to text using the Wit.ai API, and extracts meaningful tags from that text using the Alchemy API. Finally, it selects the most important tags (based on probability of accuracy) and displays related Wikipedia articles along with accompanying Bing thumbnails.
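The tag selection step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `select_top_tags` and `wikipedia_url`, the `limit` and `min_confidence` parameters, and the sample tags are all hypothetical, and it assumes each recognizer returns `(tag, confidence)` pairs and that a tag maps directly onto an English Wikipedia article title.

```python
from urllib.parse import quote


def select_top_tags(image_tags, audio_tags, limit=5, min_confidence=0.5):
    """Merge (tag, confidence) pairs from both sources and keep the best ones.

    Hypothetical helper: the thresholds are assumptions, not project values.
    """
    best = {}
    for tag, confidence in list(image_tags) + list(audio_tags):
        key = tag.strip().lower()
        # Keep the highest confidence seen for each tag across both sources.
        if confidence > best.get(key, 0.0):
            best[key] = confidence
    # Rank by confidence, drop weak tags, and cap the number of results.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(tag, conf) for tag, conf in ranked if conf >= min_confidence][:limit]


def wikipedia_url(tag):
    """Build an English Wikipedia URL, assuming the tag matches an article title."""
    title = tag.strip().replace(" ", "_")
    return "https://en.wikipedia.org/wiki/" + quote(title)


if __name__ == "__main__":
    # Made-up sample data standing in for the image and audio recognizers.
    image_tags = [("golden retriever", 0.91), ("tennis ball", 0.42)]
    audio_tags = [("Golden Retriever", 0.77), ("dog training", 0.68)]
    for tag, conf in select_top_tags(image_tags, audio_tags):
        print(f"{conf:.2f}  {tag}  {wikipedia_url(tag)}")
```

Normalizing tags to lowercase before merging lets a topic detected in both the frames and the transcript count once, with its strongest score, rather than appearing twice in the results.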

Challenges I ran into

Working with many different APIs.

Accomplishments that I'm proud of

Being able to leverage both audio and image content to generate meaningful results.

What I learned

To be patient and persistent

What's next for DOUBLEVISION

Optimize the running time, perhaps turn DOUBLEVISION into a Chrome extension, build a community for annotating videos, and develop a more interactive UI.

