Deep learning is pretty cool: the fact that we can take images and identify the objects in them could be incredibly useful to the visually impaired.

How I built it


There are several core microservices behind see.


Language is the microservice for taking in sentences, tokenizing them and offering several different services.

  • Similarity - This is the similarity of two words based on their shared synonyms.
  • Nouns - This returns all of the nouns in a given sentence.
  • Tag - This tags words with their correct word classes.
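As a rough illustration of the Similarity service, here is a minimal Python sketch that scores two words by the overlap of their synonym sets. The synonym table is made up for the example, and the Jaccard overlap is an assumed scoring rule; the actual service may look up synonyms and combine them differently.

```python
# Hypothetical, hand-written synonym table used only for this sketch.
# A real service would look synonyms up in a lexical database instead.
SYNONYMS = {
    "couch": {"sofa", "settee", "lounge"},
    "sofa": {"couch", "settee", "divan"},
    "chair": {"seat", "stool"},
}


def similarity(word_a, word_b):
    """Return the Jaccard overlap of the two words' synonym sets.

    Each word counts as a synonym of itself, so identical words score 1.0
    and words with nothing in common score 0.0.
    """
    set_a = SYNONYMS.get(word_a, set()) | {word_a}
    set_b = SYNONYMS.get(word_b, set()) | {word_b}
    return len(set_a & set_b) / len(set_a | set_b)


print(similarity("couch", "sofa"))   # shares couch, sofa and settee
print(similarity("couch", "chair"))  # no shared synonyms
```

A score like this lets a spoken word such as "sofa" match an image tag such as "couch" even when the strings differ.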

Language is hosted using Amazon AWS EC2.


Tagging is a microservice that is always connected to the client. The client streams base64-encoded images over Socket.IO, and the service tags them in real time using the Clarifai API.
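To make the base64 image stream concrete, here is a small Python sketch of how a single frame could be packed into a text message and unpacked again. The JSON wrapper and the "image" field name are assumptions for illustration; the real Socket.IO events and payload layout are not described here.

```python
import base64
import json


def encode_frame(image_bytes: bytes) -> str:
    """Wrap raw image bytes in a JSON message with a base64 text payload."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    # "image" is a hypothetical field name chosen for this sketch.
    return json.dumps({"image": payload})


def decode_frame(message: str) -> bytes:
    """Recover the raw image bytes from a message built by encode_frame."""
    payload = json.loads(message)["image"]
    return base64.b64decode(payload)


frame = b"\xff\xd8\xff\xe0 placeholder image bytes"
message = encode_frame(frame)
restored = decode_frame(message)
```

Base64 keeps the binary image safe inside a text message, at the cost of making the payload roughly a third larger.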

Tagging is also hosted using Amazon AWS EC2. It statically serves its images, which are named using generated UUIDs, over HTTPS using Caddy's TLS support.
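The UUID naming can be sketched in a few lines of Python. The function name, the directory argument, and the ".jpg" default are hypothetical; the sketch only shows the idea of giving every stored image a random, collision-resistant file name before a static file server exposes the directory.

```python
import uuid
from pathlib import Path


def save_image(image_bytes: bytes, directory: Path, extension: str = "jpg") -> Path:
    """Write the image under a freshly generated UUID name and return its path."""
    name = f"{uuid.uuid4()}.{extension}"  # random version 4 UUID
    path = directory / name
    path.write_bytes(image_bytes)
    return path
```

Random names avoid collisions between clients and make image URLs hard to guess.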

Challenges I ran into

I originally spent a lot of time using the Microsoft Cognitive Services Computer Vision API. However, I found a flaw in the API regarding its Image URL parameter.

Accomplishments that I'm proud of

The basic functionality is all there: you can ask if something is in the room, or get a description of the top 5 tags relating to the room. In the future, I would love to explore the natural language processing aspect further, potentially generating full sentences describing the room and being able to infer the meaning of more advanced inputs.
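The two queries can be sketched in Python as follows. The (name, confidence) pair format, the 0.8 confidence threshold, and the function names are assumptions made for this example, not the project's actual data model.

```python
def top_tags(tags, n=5):
    """Return the names of the n highest-confidence tags, best first."""
    ranked = sorted(tags, key=lambda tag: tag[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def is_in_room(noun, tags, threshold=0.8):
    """Return True if the noun matches a tag at or above the threshold."""
    wanted = noun.lower()
    return any(name == wanted and score >= threshold for name, score in tags)


def describe(tags):
    """Build a short spoken-style description from the top five tags."""
    return "I can see: " + ", ".join(top_tags(tags))


# Hypothetical tagging result for one image: (tag name, confidence) pairs.
room_tags = [
    ("table", 0.99), ("chair", 0.97), ("window", 0.60), ("lamp", 0.95),
    ("floor", 0.90), ("plant", 0.85), ("cat", 0.30),
]
```

Here `is_in_room("chair", room_tags)` answers the "is something in the room" question, and `describe(room_tags)` produces the top-5 summary.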
