We were inspired to build this product because, although there are many text-to-speech assistive technologies for blind and vision-impaired folks, we didn't think they enabled a holistic web experience. We thought, "It's not enough to be read the text of a news article; we would also want to know what was in the images." Our project was also motivated by our interest in integrating machine learning and artificial intelligence into our work.
What it does
Our Chrome extension is an image interpreter that uses the Microsoft Computer Vision API and the Bing Text To Speech API to read the elements of an image aloud to users. Our target users are blind and vision-impaired folks, so that they can experience every element of web content.
How we built it
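The core of the pipeline described above is taking the JSON that a Computer Vision "describe"-style endpoint returns and picking out a caption to hand to the text-to-speech step. As a rough sketch (the `bestCaption` helper and the sample response are our own illustration, not PicTalk's actual code; the `description.captions` shape follows the Computer Vision API's documented response):

```javascript
// Sketch: pick the most confident caption from a Computer Vision
// "describe"-style JSON response, so it can be passed on to a
// text-to-speech call. Illustrative only.
function bestCaption(visionResponse) {
  const captions =
    (visionResponse.description && visionResponse.description.captions) || [];
  if (captions.length === 0) {
    // Fall back to a spoken message when the API returns no captions.
    return "No description available for this image.";
  }
  // Each caption carries a confidence score; read out the highest one.
  return captions.reduce((best, c) =>
    c.confidence > best.confidence ? c : best
  ).text;
}

// Example response in the API's documented shape:
const sample = {
  description: {
    captions: [
      { text: "a dog on grass", confidence: 0.92 },
      { text: "an animal outdoors", confidence: 0.61 },
    ],
  },
};
```

The chosen caption string is then what gets sent to the speech API rather than the raw JSON.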
Challenges we ran into
Our biggest challenge was integrating the two APIs, which in some cases had little to no documentation. Because of this, we had to hard-code our own XML string generator; it was initially written using Node.js, but the other API was not compatible with Node.
Accomplishments that we're proud of
Our team was composed of three veteran hackers and two first-timers, so we are proud that we were all able to contribute to the project and learn a new skill along the way. We are also proud that we challenged ourselves to think beyond the average user and build a product that helps more people engage more holistically with the web and everything it has to offer.
What's next for PicTalk
Right now PicTalk can't detect images on every web page. Our next step is a seamless integration with Facebook, so users can engage with photos on one of the most popular social media sites in the world.