We were inspired to build this product because, although there are many text-to-speech assistive technologies for blind and vision-impaired folks, we didn't think they enabled a holistic web experience. We thought, "It's not enough to be read the text of a news article; we would also want to know what was in the images." Our project was also motivated by our interest in integrating machine learning and artificial intelligence into our work.
What it does
Our Chrome extension is an image interpreter that uses the Microsoft Computer Vision API and the Bing Text To Speech API to read the elements of an image aloud to users. Our target users are blind and vision-impaired folks, so that they can experience every element of web content.
How we built it
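The core of the pipeline described above is taking the JSON that a Computer Vision "describe"-style endpoint returns and picking out a caption to hand to the text-to-speech step. As a rough sketch (the `bestCaption` helper and the sample response are our own illustration, not PicTalk's actual code; the `description.captions` shape follows the Computer Vision API's documented response):

```javascript
// Sketch: pick the most confident caption from a Computer Vision
// "describe"-style JSON response, so it can be passed on to a
// text-to-speech call. Illustrative only.
function bestCaption(visionResponse) {
  const captions =
    (visionResponse.description && visionResponse.description.captions) || [];
  if (captions.length === 0) {
    // Fall back to a spoken message when the API returns no captions.
    return "No description available for this image.";
  }
  // Each caption carries a confidence score; read out the highest one.
  return captions.reduce((best, c) =>
    c.confidence > best.confidence ? c : best
  ).text;
}

// Example response in the API's documented shape:
const sample = {
  description: {
    captions: [
      { text: "a dog on grass", confidence: 0.92 },
      { text: "an animal outdoors", confidence: 0.61 },
    ],
  },
};
```

The chosen caption string is then what gets sent to the speech API rather than the raw JSON.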
Challenges we ran into
Our biggest challenge was integrating the two APIs, which in some cases had little to no documentation. Because of this, we had to hard-code our own XML string generator; it was initially written using Node.js, but the other API was not compatible with Node.
Accomplishments that we're proud of
Our team was composed of three veteran hackers and two first-timers, so we are proud that we were all able to contribute to the project and learn a new skill along the way. We are also proud that we challenged ourselves to think beyond the average user and build a product that helps more people engage more holistically with the web and everything it has to offer.
What's next for PicTalk
Right now PicTalk can't detect images on every web page. Our next step is a seamless integration with Facebook, so users can engage with photos on one of the most popular social media sites in the world.