Read To Me

Inspiration

This app was inspired by one of our teammates, Grant Celley. He learns best when things are explained to him aloud rather than only read from a page. He suggested an app that could read to him, and do so in fun voices like those of the fictional characters he loves.

What it does

It is a very simple app that takes images of handwritten notes, letters, bills, or book pages and converts them into captivating, lifelike audio that holds the user's attention.

How we built it

Our application is built mostly in Python. We used Streamlit for a simple frontend interface and hosted the application on Google Cloud Platform, using Google Compute Engine and Docker. We used the Google Cloud Vision API to convert each image to text, then passed that text to the Uberduck API to generate lifelike voices of your favorite characters.
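
The image-to-text-to-voice flow described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: `extract_text` assumes the `google-cloud-vision` client library is installed and credentials are configured, and `read_to_me` together with its `ocr` and `tts` parameters are hypothetical names. The text-to-speech step is passed in as a plain callable because the exact Uberduck request format is not shown here.

```python
from typing import Callable


def extract_text(image_bytes: bytes) -> str:
    """OCR one image with the Google Cloud Vision API.

    Assumes google-cloud-vision is installed and credentials are set up.
    The import is done lazily so the rest of the sketch runs without it.
    """
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.document_text_detection(
        image=vision.Image(content=image_bytes)
    )
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def read_to_me(
    image_bytes: bytes,
    voice: str,
    ocr: Callable[[bytes], str],
    tts: Callable[[str, str], bytes],
) -> bytes:
    """Hypothetical glue function: image bytes in, audio bytes out.

    `ocr` turns an image into text (for example, extract_text above).
    `tts` turns (text, voice) into audio bytes; a wrapper around the
    Uberduck API would go here, but its details are an assumption.
    """
    # Collapse line breaks and repeated spaces left over from OCR.
    text = " ".join(ocr(image_bytes).split())
    if not text:
        raise ValueError("no text found in image")
    return tts(text, voice)
```

Keeping the OCR and voice steps as separate callables makes it easy to swap in another OCR model or voice service, and to try the flow with stand-in functions before wiring up real API keys.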

Challenges we ran into

We could not find an open-source, high-level, end-to-end OCR solution
Few good pretrained OCR models are available online
We tried multiple TrOCR and CRAFT models, and the results were not satisfactory
Handling multiple API requests
Processing uploaded images and PDF files
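
One way the multiple-request problem could be approached for multi-page uploads is to process pages concurrently while keeping their order. The sketch below uses only the Python standard library; `process_pages` and `handle_page` are hypothetical names, and `handle_page` stands in for whatever per-page work (OCR, voice generation) the app performs. It is an illustration of the idea, not a description of what we shipped.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def process_pages(
    pages: Iterable[bytes],
    handle_page: Callable[[bytes], bytes],
    max_workers: int = 4,
) -> List[bytes]:
    """Run handle_page on every page using a small thread pool.

    Executor.map yields results in the same order as the input, so the
    audio for page 1 always comes before the audio for page 2. If any
    call raises, the exception surfaces when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_page, pages))
```

Threads suit this case because each page spends most of its time waiting on network calls; `max_workers` should stay small enough to respect any rate limits on the APIs being called.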

Accomplishments that we're proud of

We got a working product after only half a day of planning, and the results were quite impressive.

What we learned

We learned a lot about open-source models that can be leveraged to build real-world applications. Hosting and deploying web apps was also new to the team, so each of us learned how to do that. While working with open-source models, we also learned ways to train and fine-tune them.

What's next for Read To Me

Add sentiment analysis so users can understand the tone and sentiment of the text, and possibly use it to make the voices sound even more lifelike
Build a phone app so a user can simply snap a picture
Integrate the app with voice-to-talking-image technology that animates characters as they speak, so they can act as instructors or narrators for teaching, conferences, introductory videos, or anything else that requires a video