Inspiration
For individuals who are unable to read due to a lack of education or a visual impairment, audio readers can be expensive and difficult to obtain. We wanted to contribute to an underdeveloped field of technology aimed at assisting those who struggle with reading and language. We also wanted to give users control over the pace of their own reading, rather than having them follow an audiobook that plays at a set pace. The ultimate goal of the project is for users to be able to read any printed text they wish through simple, intuitive hand gestures.
What it does
The project can be split into 3 major components:
- The Raspberry Pi captures the page. The Pi is wired to a breadboard circuit, and with the press of a button it takes a picture of the text and uploads it to an online server.
- Tesseract.js performs the OCR. Once the image has been uploaded, Tesseract.js reads it and converts it to a text string, along with the position of each word on the page (see the OCR sketch after this list).
- The Leap Motion tracks the user's finger and reads what the user is pointing at. As the finger moves across the page, we use its location to determine which word the user is pointing at, and that word is spoken aloud using a text-to-speech library (see the pointing sketch after this list).
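To make the OCR step concrete, here is a minimal sketch of how an image can be turned into words with pixel-space bounding boxes. It assumes a tesseract.js v2-style API, where `recognize()` resolves with per-word boxes; the file name is a placeholder, not our actual code.

```javascript
// Minimal OCR sketch (Node.js). Assumes tesseract.js v2, where
// recognize() resolves with per-word bounding boxes.
const Tesseract = require('tesseract.js');

async function recognizePage(imagePath) {
  const { data } = await Tesseract.recognize(imagePath, 'eng');

  // Each recognized word carries a pixel-space bounding box, which is
  // what later lets us match a fingertip position to a word.
  return data.words.map(w => ({
    text: w.text,
    box: w.bbox, // { x0, y0, x1, y1 } in image pixels
  }));
}

// 'page.jpg' stands in for the image uploaded by the Pi.
recognizePage('page.jpg').then(words => console.log(words));
```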
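The pointing side can be sketched just as briefly: leapjs delivers a fingertip position every frame, and ResponsiveVoice speaks whichever word the finger lands on. The `wordAt()` helper here is hypothetical (one possible version is sketched under "Challenges we ran into" below); the leapjs and ResponsiveVoice calls are the libraries' real browser APIs.

```javascript
// Pointing-loop sketch (browser). Assumes leap.js and responsivevoice.js
// are loaded on the page, and wordAt() maps a fingertip to an OCR word.
let lastSpoken = null;

Leap.loop(frame => {
  if (frame.pointables.length === 0) return;

  // The stabilized tip position is reported in millimeters,
  // relative to the Leap Motion controller.
  const [x, , z] = frame.pointables[0].stabilizedTipPosition;

  const word = wordAt(x, z); // hypothetical lookup into the OCR word boxes
  if (word && word !== lastSpoken) {
    responsiveVoice.speak(word); // text-to-speech
    lastSpoken = word;           // avoid repeating the same word
  }
});
```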
How we built it
Using PVC pipes and a wooden base, we built a skeletal structure to mount the Raspberry Pi and camera. The Leap Motion holder and guide were 3D printed here at MasseyHacks. Using the picamera library and RPi.GPIO, we connected the Raspberry Pi to the breadboard circuit and camera so that a button press captures and uploads an image (sketched below). The Tesseract.js OCR step and all of the Leap Motion components were written in JavaScript, as was the website that displays the text, which is styled with HTML and CSS.
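Our Pi code was written in Python with picamera and RPi.GPIO; purely to keep these sketches in one language, here is the same button-to-upload flow approximated in Node.js. It assumes the `onoff` and `form-data` npm packages and the `raspistill` CLI, and the GPIO pin and server URL are placeholders.

```javascript
// Button-triggered capture-and-upload sketch (Node.js on a Raspberry Pi).
// Our actual code used Python (picamera + RPi.GPIO); this is an
// equivalent flow, not a transcript of it.
const { Gpio } = require('onoff');
const { execFileSync } = require('child_process');
const FormData = require('form-data');
const fs = require('fs');

// Placeholder pin: button on the breadboard wired to GPIO 17.
const button = new Gpio(17, 'in', 'rising', { debounceTimeout: 50 });

button.watch(err => {
  if (err) throw err;

  // Capture at a reduced resolution so the upload stays fast
  // (see "Challenges we ran into" below).
  execFileSync('raspistill', ['-o', 'page.jpg', '-w', '1024', '-h', '768']);

  // Upload the image to the server that runs the OCR step.
  const form = new FormData();
  form.append('image', fs.createReadStream('page.jpg'));
  form.submit('http://example.com/upload', (err, res) => {
    if (err) console.error(err);
    else console.log('uploaded, status', res.statusCode);
  });
});
```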
Challenges we ran into
The first challenge we faced was the file size of the images produced by the Raspberry Pi camera. Because of the camera's high resolution, uploading a picture took far too long. We solved this by reducing the resolution of the image, although that made it harder for Tesseract.js to interpret the images. The second challenge was connecting the word locations on the page, which Tesseract.js reports in image pixels, with the fingertip position, which the Leap Motion reports in millimeters. Because the two systems use different coordinate systems and units, we had to calibrate a conversion between them (sketched below).
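The conversion boils down to a per-axis scale and offset from Leap Motion millimeters to image pixels. Here is a minimal sketch of the idea, including one possible shape for the `wordAt()` helper assumed earlier; the calibration constants are illustrative, not our measured values.

```javascript
// Tesseract.js reports word boxes in image pixels; the Leap Motion
// reports fingertip positions in millimeters relative to the controller.
// A simple linear calibration bridges the two.
const calib = {
  scaleX: 4.2,   // pixels per millimeter across the page (illustrative)
  scaleY: 4.2,   // pixels per millimeter down the page (illustrative)
  offsetX: 512,  // pixel coordinates of the Leap origin on the page
  offsetY: 900,
};

let words = []; // OCR word boxes, filled in when the server responds

// Map a fingertip (x: left/right mm, z: toward/away mm) to image pixels.
function leapToPixels(xMm, zMm) {
  return {
    x: calib.offsetX + xMm * calib.scaleX,
    y: calib.offsetY + zMm * calib.scaleY,
  };
}

// Find the OCR word whose bounding box contains the fingertip.
function wordAt(xMm, zMm) {
  const p = leapToPixels(xMm, zMm);
  const hit = words.find(w =>
    p.x >= w.box.x0 && p.x <= w.box.x1 &&
    p.y >= w.box.y0 && p.y <= w.box.y1);
  return hit ? hit.text : null;
}
```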
Accomplishments that we're proud of
Overall, this was a very ambitious hack that took a lot of coordination within the team. Our biggest accomplishment was bringing all the components of our hack together and having it run smoothly.
What we learned
MasseyHacks was a great learning experience for the entire team and gave us hands-on exposure to making hardware and software work together seamlessly.
What's next for Point-to-Speech
As of right now, Point-to-Speech is a working prototype, but in the future we hope to develop it into a fully working product that is more accurate, efficient, and compact.
Built With
- css
- html
- javascript
- jquery
- leap-motion
- node.js
- php
- python
- raspberry-pi
- requests
- responsivevoice.js
- tesseract.js