Result of image recognition and speech
[Table - 25]
Have you ever traveled to a country where you didn't know the local language? There's an object right in front of you, but you don't know what it's called locally. For example, beer is called cerveza in Spanish. How would a Japanese traveler know that? What if you had an assistant that could translate the things you see and speak the translation aloud for you? Other relevant use cases include describing a scene to blind people and people with low vision. With real-time scene detection, the app could continuously describe what is in front of the user so that they stay aware of their surroundings.
What it does
The web app uses the Microsoft Computer Vision API to analyse an image and recognize the objects in it, then passes the resulting text to a translation API to translate it into the language the user selected. The image is first opened with PIL and reduced in file size, which makes the whole process faster. Once the text is translated, PIL is used to add borders to the image and overlay the original and translated text on it. The text is also passed to a Text to Speech engine to produce spoken audio in the selected language. The audio and the overlaid image are then returned to the user, who can save them for later use as a flashcard.
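The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `shrink`, `annotate` and `make_flashcard`, the 1024-pixel size limit, and the border sizes are all assumptions. The vision, translation and speech steps are passed in as plain callables (`describe`, `translate`, `speak`) rather than real Microsoft API calls, since the exact requests used are not shown here.

```python
import io
from typing import Callable, Tuple

from PIL import Image, ImageDraw, ImageFont


def shrink(image: Image.Image, max_side: int = 1024) -> Image.Image:
    """Return an RGB copy no larger than max_side on either edge."""
    copy = image.convert("RGB")
    copy.thumbnail((max_side, max_side))  # in place, keeps aspect ratio, never upscales
    return copy


def annotate(image: Image.Image, caption: str,
             border: int = 20, band: int = 40) -> Image.Image:
    """Put the image on a white canvas with a border and a caption band below it."""
    canvas = Image.new(
        "RGB",
        (image.width + 2 * border, image.height + 2 * border + band),
        "white",
    )
    canvas.paste(image, (border, border))
    draw = ImageDraw.Draw(canvas)
    # The default font is only a placeholder; non-Latin scripts would
    # need a TrueType font loaded with ImageFont.truetype().
    draw.text((border, border + image.height + 5), caption,
              fill="black", font=ImageFont.load_default())
    return canvas


def make_flashcard(
    image_bytes: bytes,
    target_lang: str,
    describe: Callable[[bytes], str],        # hypothetical: image bytes -> label
    translate: Callable[[str, str], str],    # hypothetical: (text, lang) -> text
    speak: Callable[[str, str], bytes],      # hypothetical: (text, lang) -> audio
) -> Tuple[Image.Image, bytes]:
    """Shrink, recognize, translate, overlay, and synthesize speech."""
    small = shrink(Image.open(io.BytesIO(image_bytes)))

    # Re-encode the smaller image so less data is sent to the vision step.
    buf = io.BytesIO()
    small.save(buf, format="JPEG", quality=85)

    original = describe(buf.getvalue())
    translated = translate(original, target_lang)
    card = annotate(small, f"{original} / {translated}")
    audio = speak(translated, target_lang)
    return card, audio
```

Keeping the three remote steps as injected callables means the image handling can be tried out locally with stand-in functions before any API keys are configured.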
How I built it
I used the Microsoft Computer Vision and Text to Speech APIs and hosted the app on an Azure VM running Ubuntu. The web app was built with Python and Flask, and requests were served by an Apache server.
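As a rough sketch of what the Flask side of such a setup might look like: the route name `/objectify`, the form field names `image` and `lang`, and the placeholder JSON response are assumptions made for illustration, not the app's real interface. The one fixed convention is that Apache's mod_wsgi looks for a module-level callable named `application` by default.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/objectify", methods=["POST"])  # hypothetical route name
def objectify():
    """Accept an uploaded image and a target language code."""
    upload = request.files.get("image")      # hypothetical field name
    lang = request.form.get("lang", "es")    # hypothetical field name and default
    if upload is None:
        return jsonify(error="missing 'image' file field"), 400

    data = upload.read()
    # A real handler would run recognition, translation and speech here;
    # this placeholder only echoes what it received.
    return jsonify(bytes_received=len(data), lang=lang)


# mod_wsgi imports this module and serves the callable named `application`.
application = app
```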
Challenges I ran into
Finding good Python documentation for the Microsoft APIs was a little difficult, although using the APIs themselves wasn't difficult at all. Other than that, configuring WSGI was the main challenge (thanks to conflicts between Python 2 and Python 3). Hosting on Azure was a breeze, though.
Accomplishments that I'm proud of
The applications of this tech can be huge. I love building products that are useful to people and solve a "real" problem.
What I learned
I learned about the different responses that the Microsoft APIs can return. They perform fairly well and are on par with the Google Cloud API. Azure was simple to use; it didn't take much time to spin up an instance and connect to the VM. I also learned not to have too many Red Bulls.
What's next for Objectify
I'll be building something for people with vision issues: an app that a user can keep in their pocket while it narrates what is in front of them. I also plan to build a better GUI for the results.