This project started as a way to use visual question answering to improve the online experience of blind people. We focused on Facebook because, as this article shows, visually impaired people share as much content there as anyone else. We hoped to combine machine learning APIs with data from Facebook to give vivid descriptions of images and meaningful answers to questions asked about them.
What it does
Our app pulls photos from the user's Facebook profile, then provides automatic caption generation and visual question answering for each photo. A generated caption might be "There is a group of people standing in front of a waterfall posing for a picture." Typical questions a user might ask include "Who is in this picture?", "What are the people doing?", and "Is Jack wearing glasses?".
How we built it
We used the Facebook Graph API to pull photos from the user's profile and retrieve the list of tagged people. We then used Microsoft's Computer Vision API and Face Detection API to analyze the images and generate descriptions and answers. To make visual question answering more effective on the kinds of images typically found on Facebook, we built on current techniques based on convolutional neural networks and long short-term memory (LSTM) networks, and we fed in the richer image analysis data from the Microsoft APIs to produce more detailed and relevant responses.
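The two API calls described above can be sketched roughly as below. This is a minimal illustration, not our exact code: the API versions, region in the Cognitive Services endpoint, and field names are assumptions, and the placeholder token/key values are hypothetical. The helpers only build the requests; the actual HTTP calls (shown in comments) would be made with a library such as `requests`.

```python
# Illustrative sketch of the two REST calls; endpoint versions and
# the Cognitive Services region below are assumptions.
GRAPH_API = "https://graph.facebook.com/v2.8"
VISION_API = "https://westus.api.cognitive.microsoft.com/vision/v1.0"

def photos_request(user_token):
    """Build the Graph API request for the user's photos and tagged people."""
    url = GRAPH_API + "/me/photos"
    params = {
        # image URLs plus the names of people tagged in each photo
        "fields": "images,tags{name}",
        "access_token": user_token,
    }
    return url, params

def analyze_request(image_url, vision_key):
    """Build the Computer Vision analyze request for a caption and faces."""
    url = VISION_API + "/analyze"
    params = {"visualFeatures": "Description,Faces"}
    headers = {
        "Ocp-Apim-Subscription-Key": vision_key,
        "Content-Type": "application/json",
    }
    body = {"url": image_url}
    return url, params, headers, body

# Actual calls would look something like:
#   import requests
#   url, params = photos_request(USER_TOKEN)
#   photos = requests.get(url, params=params).json()["data"]
#   url, params, headers, body = analyze_request(photo_url, VISION_KEY)
#   caption = requests.post(url, params=params, headers=headers,
#                           json=body).json()["description"]["captions"][0]["text"]
```

The caption from the `description` feature becomes the automatic image description, and the face and tag data feed the question-answering step.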