See the world in words

SeeSay is a PennApps Fall 2015 project that describes the world out loud for people with severe vision loss. It is a web app that views the world through the user's camera and describes what it sees aloud in full sentences. The project was inspired by recent research breakthroughs in automatic image captioning with neural networks, as reported in a New York Times article.


According to the WHO, 39 million people worldwide are blind and 246 million have low vision. About 90% of the world's visually impaired people live in low-income settings. As cheap smartphones spread throughout the developing world, SeeSay can let anyone with a web browser see the world through words.

How it works

SeeSay uses deep learning for image recognition and semantic analysis, built on Caffe and NeuralTalk. Speak.js handles the text-to-speech, while jQuery and Bootstrap are used for the rest of the front-end. HTML5 features are used to capture images from the camera, and the front-end sends the captured images to the backend for analysis. The backend is hosted on an AWS GPU instance to give the deep learning models sufficient compute power. Node.js serves as the web server, and the python npm module lets it communicate with the image analysis portion of the backend, which is written in Python. In addition, some form of computer is used for both the client and the server.
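
To make the Node.js-to-Python hand-off more concrete, here is a minimal sketch of what the Python side of such a bridge could look like. It is not SeeSay's actual code: the line-delimited JSON protocol, the `image` field, and the names `caption_image`, `handle_request`, and `serve` are all assumptions made for illustration, and the captioning function is a stand-in for the real Caffe and NeuralTalk model.

```python
import base64
import json


def caption_image(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the captioning model.

    A real version would run the image through the neural network and
    return a sentence; this one only reports the payload size.
    """
    return f"an image of {len(image_bytes)} bytes"


def handle_request(line: str) -> str:
    """Turn one JSON request line into one JSON response line.

    Assumed request shape: {"image": "<base64-encoded image data>"}.
    """
    try:
        request = json.loads(line)
        image_bytes = base64.b64decode(request["image"], validate=True)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or invalid base64 all end up here.
        return json.dumps({"ok": False, "error": str(exc)})
    return json.dumps({"ok": True, "caption": caption_image(image_bytes)})


def serve(stdin, stdout) -> None:
    """Answer requests one line at a time until the input stream closes."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        stdout.write(handle_request(line) + "\n")
        stdout.flush()  # flush so the parent process sees each reply promptly
```

Under these assumptions, the web server would start this script as a child process, run `serve(sys.stdin, sys.stdout)`, write one request per line, and pass each returned caption to the browser for text-to-speech. Keeping the model in a long-lived process like this would avoid reloading it for every image.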
