While the world continues to grow more digital, we cannot allow those who depend on traditional resources to fall behind. Although I rely on textbooks and notebooks less and less, I realize that many people still need paper documents to do their jobs or learn new subjects. Some of them, however, face barriers that make that information hard to access. What if you have a visual impairment, or are unfamiliar with the language a document is written in? To make paper documents more understandable, we can expand the senses we use to process that information. What if, rather than reading every document you come across, you could instantly hear it in English or in another language of your choice?

What it does

PicSpeak is a mobile application that lets people quickly take a picture of a document and hear it "speak." You simply snap the image and wait while the application gives voice to the words in front of you. With the ability to translate into French, German, Korean, and more, you can even photograph an English page and hear it come alive in a foreign language!

How I built it

The mobile application uses React Native and several open-source React Native packages to take advantage of the camera and file-upload capabilities of an iPhone or Android device. The backend file conversions take place in the cloud on a Linode Ubuntu server, where Node.js and Express.js together provide a REST API that lets the user post images and text to specific endpoints. Finally, Firebase delivers near-instantaneous feedback to the user.
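Purely as an illustration of the endpoint design described above, here is a dependency-free TypeScript sketch of REST-style routing in the spirit of Express's `app.post(path, handler)`. The endpoint names (`/image`, `/text`), the payload fields, the status codes, the language list, and the `recognizeText` placeholder are all hypothetical assumptions, not PicSpeak's actual code; a real backend would call an OCR engine and translation and speech services where the placeholder sits.

```typescript
// Hypothetical sketch: maps "METHOD path" keys to handlers, loosely
// mirroring Express's app.post(path, handler). Not PicSpeak's real code.

type Body = { [field: string]: string };
interface Reply {
  status: number;
  body: Body;
}
type Handler = (body: Body) => Reply;

// Assumed ISO 639-1 codes for the languages named in the writeup.
const SUPPORTED_LANGS = ["en", "fr", "de", "ko"];

// Placeholder for the OCR step; a real server would call an OCR engine here.
function recognizeText(imageBase64: string): string {
  return imageBase64.length > 0 ? "recognized text" : "";
}

const routes: { [key: string]: Handler } = {};

// Register a handler for POST requests to `path`.
function post(path: string, handler: Handler): void {
  routes["POST " + path] = handler;
}

// POST /image (hypothetical): base64 photo in, extracted text out.
post("/image", (body) => {
  if (!body.image) {
    return { status: 400, body: { error: "missing image" } };
  }
  return { status: 200, body: { text: recognizeText(body.image) } };
});

// POST /text (hypothetical): text plus a target language for translation and speech.
post("/text", (body) => {
  const lang = body.lang || "en"; // default to English when no language is given
  if (!body.text) {
    return { status: 400, body: { error: "missing text" } };
  }
  if (SUPPORTED_LANGS.indexOf(lang) === -1) {
    return { status: 422, body: { error: "unsupported language: " + lang } };
  }
  return { status: 200, body: { text: body.text, lang: lang } };
});

// Look up the handler for a request; unknown routes get a 404 reply.
function dispatch(method: string, path: string, body: Body): Reply {
  const key = method + " " + path;
  const handler = Object.prototype.hasOwnProperty.call(routes, key) ? routes[key] : undefined;
  return handler ? handler(body) : { status: 404, body: { error: "not found" } };
}
```

For example, `dispatch("POST", "/text", { text: "Bonjour", lang: "fr" })` returns a 200 reply, while an unknown language code yields 422. In a real Express server, each `post(...)` call would become `app.post(...)` and Express would perform the route matching that `dispatch` imitates here.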

Challenges I ran into

Figuring out how to make all these components work together while keeping the user on a single, simple interface proved my greatest challenge. At several points I could have added another page to simplify the process, but I wanted the application to feel intuitive and simple for anyone who uses it.

Accomplishments that I'm proud of

I am particularly proud that I managed to convert images to text while keeping file transfers between my frontend and backend fast. The application handles highly detailed documents, so it can help people who have trouble reading fine print.

What I learned

I learned that I love a newer style of development called the ‘middle-end,’ in which you devote equal time to the backend and the frontend of your system. I also found that the frontend and backend parts of a task such as file conversion do not always have to happen in complete isolation from each other.

What's next for PicSpeak

Adding more languages would let PicSpeak reach more people and communities. PicSpeak could also grow into an application that speaks images in real time: what if you only had to hover over text as you read a document to hear its “voice” simultaneously? Ultimately, as image-to-text conversion becomes more accurate with further development, the application's potential to make a real difference will only grow.
