Inspiration
Our inspiration for this app came from one of our teammates, Grant Celley. He learns best when things are explained to him aloud rather than only read off a page. So he suggested an app that could read to him, and do it in fun voices like those of the fictional characters he loves so much.
What it does
It's a very simple app that takes your images: handwritten notes, letters, bills, or pages of books. It converts those images into captivating, lifelike audio that holds the user's attention.
How we built it
Our application is built mostly in Python, with Streamlit for a simple front-end interface. We hosted it on the Google Cloud Platform, using services like Google Cloud Engine and Docker. The Google Cloud Vision API converts each image to text, which we then pass to the Uberduck API to get lifelike voices of your favorite characters.
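The image-to-text-to-voice flow described above can be sketched roughly as follows. This is a minimal illustration, not our exact code: `read_to_me`, `clean_text`, and the `tts` callable are hypothetical names, and the Uberduck request itself is left as a pluggable function because its request format is not described here. The OCR helper assumes the `google-cloud-vision` client library and credentials configured in the environment.

```python
from typing import Callable


def ocr_with_cloud_vision(image_bytes: bytes) -> str:
    """Extract text from an image with the Google Cloud Vision client library."""
    # Imported lazily so the rest of this sketch runs without the library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()  # picks up credentials from the environment
    response = client.document_text_detection(image=vision.Image(content=image_bytes))
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def clean_text(raw: str) -> str:
    """Collapse OCR line breaks and repeated whitespace into single spaces."""
    return " ".join(raw.split())


def read_to_me(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    tts: Callable[[str, str], bytes],
    voice: str,
) -> bytes:
    """Run the pipeline: image -> text -> audio in the chosen voice.

    `tts` is a hypothetical stand-in for the text-to-speech call
    (for example, a wrapper around the Uberduck API).
    """
    text = clean_text(ocr(image_bytes))
    if not text:
        raise ValueError("no text found in image")
    return tts(text, voice)
```

In a Streamlit front end, the bytes from `st.file_uploader(...)` could be passed in as `image_bytes`, and the returned audio handed to `st.audio`. Keeping `ocr` and `tts` as parameters makes it easy to swap OCR models or voices without touching the rest of the flow.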
Challenges we ran into
- We could not find an open-source, high-level, end-to-end OCR tool
- Few good OCR models are readily available online
- We tried multiple TrOCR and CRAFT-OCR models, and the results weren't satisfactory
- Handling multiple API requests
- Issues processing uploaded images and PDF files
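One common way to handle several API requests at once is to run them concurrently while capping how many are in flight. The sketch below shows that pattern with Python's `asyncio`. It is an assumption about how this could be done, not a description of our implementation: `process_pages`, `handle_page`, and `max_concurrent` are hypothetical names, and `handle_page` stands in for whatever coroutine makes the actual OCR or voice request (for example, one built on `aiohttp`).

```python
import asyncio
from typing import Awaitable, Callable, List


async def process_pages(
    pages: List[bytes],
    handle_page: Callable[[bytes], Awaitable[str]],
    max_concurrent: int = 3,
) -> List[str]:
    """Process every page concurrently, with at most `max_concurrent` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(page: bytes) -> str:
        # The semaphore blocks here once the concurrency limit is reached.
        async with semaphore:
            return await handle_page(page)

    # gather() returns results in the same order as the input pages.
    return await asyncio.gather(*(guarded(page) for page in pages))
```

Because the results come back in input order, a multi-page upload can be stitched back together in reading order after all requests finish.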
Accomplishments that we're proud of
We got a working product after only half a day of planning, and the results were quite impressive.
What we learned
We learned a lot about open-source models that can be leveraged to build real-world applications. Hosting and deploying a web app was also new to the team, so each of us learned how to do that. While working with open-source models, we also learned ways to train and fine-tune them.
What's next for Read To Me
- We want to add sentiment analysis so that users can understand the tone and sentiment of the text, and possibly use the same analysis to make the voices even more lifelike.
- Make a phone app, so that a user can simply snap a picture.
- Integrate the app with a voice-to-talking-image technology that brings speaking characters to life, so they can act as instructors or narrators for teaching, conferences, introductory videos, or anything else that requires a video.
Built With
- aiohttp==3.8.3
- docker
- google-cloud
- html
- pil
- python
- requests
- streamlit
- uberduck
