Microsoft Computer Vision vs. Reddit Conversation
The Vision API in its current state is impressive, but it only yields so much data. It can look at a single image and guess what is in it, yet it makes no connections between images. We built that missing link ourselves, then compared the machine's view of image similarity with how a user community interpreted the same images.
How it Works
For our image source, we took top images from popular subreddits and fed them into Microsoft's Computer Vision API. The API returned a JSON response listing guesses about what was in each image, each guess paired with a confidence value. Using this information across many images, we built a graph correlating images to one another: for any pair of images, we took the matching word descriptors (from the Vision API) and their respective confidence values, and averaged them into a similarity value from 0 to 1.
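The averaging step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each image is represented as a dict of Vision API tags mapped to confidence values, and that "averaging it all out" means averaging the paired confidences of the descriptors both images share.

```python
def image_similarity(tags_a, tags_b):
    """Similarity (0..1) between two images, each given as
    {descriptor: confidence} from the Vision API's tag output."""
    shared = set(tags_a) & set(tags_b)
    if not shared:
        return 0.0  # no matching descriptors -> similarity of 0
    # Average the two confidence values for each matching descriptor,
    # then average those pair scores into a single 0..1 value.
    return sum((tags_a[t] + tags_b[t]) / 2 for t in shared) / len(shared)
```

Because confidences are themselves in [0, 1], the result stays in [0, 1], and two images with no descriptors in common score exactly 0 - matching the behavior described below.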
Next, since these images came directly from Reddit, we were able to use Reddit's API to scrape the comments from each image's thread. We analyzed the word frequencies in each thread, compared them against the word frequencies of every other thread, and assigned each pair of threads its own similarity value from 0 to 1.
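The write-up does not specify the exact frequency comparison, but one common way to turn two word-frequency profiles into a 0-to-1 score is cosine similarity over their count vectors. A hedged sketch, assuming each thread's comments have already been tokenized into a list of words:

```python
import math
from collections import Counter

def comment_similarity(words_a, words_b):
    """Cosine similarity (0..1) between the word-frequency
    vectors of two comment threads."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    # Dot product over the words the two threads share.
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm = (math.sqrt(sum(c * c for c in freq_a.values()))
            * math.sqrt(sum(c * c for c in freq_b.values())))
    return dot / norm if norm else 0.0
```

Threads discussing the same things score near 1 even if one thread is much longer, since cosine similarity compares the shape of the frequency distribution rather than raw counts.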
Most of the time, Microsoft's API data returned a similarity value of 0, since the Vision API failed to spot the objects that images had in common; Reddit discussion, by contrast, produced much higher similarity values between items.
Looking at the similarity values, we see that both Vision API analysis and comment-thread analysis can reveal predictable connections between images - and that the discussion can sometimes show similarities the Vision API cannot see. We have not applied machine learning to this yet, but it is apparent that additional data can be extracted from what humans "interpret", rather than only what the machine sees on paper.