Inspiration

When I was younger I learnt how to read before I knew every word in the English language, so my parents would label certain items around the house. If I saw a cupboard, for example, and didn't know what it was called, I could read the label and infer that it was called a "cupboard".

What it does

AR Translate is intended to work in a similar fashion, by drawing bounding boxes around objects in an image and labelling them not just in English, but in a language of the user's choice. This makes it a potential tool for learning new languages: if you know how to read the alphabet of a language, but don't know what an item in front of you is called, you can use AR Translate to identify it and find its name in that language!

How I built it

JavaScript on a webpage displays a live video feed from a camera on the user's device and sends snapshots from it, along with the user's desired labelling language, to a Flask server written in Python. The Flask server parses the image and makes a request to the Google Cloud Vision API to identify the objects in the image (their names and where they are). It then makes requests to the Google Cloud Translation API to translate each object's name. The Python Imaging Library is then used to draw bounding boxes around the objects in the image, along with their respective labels. Finally, the Flask server returns the annotated image to the user, and it is displayed in the webpage below the live feed.
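As a rough illustration of the bounding box step, here is a minimal sketch of turning an object's position into pixel coordinates that a drawing library can use. It assumes the vision response gives each object's outline as vertices normalized to the range 0 to 1, which is how Cloud Vision's object localization reports them; the write-up doesn't say which Vision feature we actually called, and the function name `to_pixel_box` is made up for this example.

```python
from typing import Iterable, Tuple


def to_pixel_box(
    normalized_vertices: Iterable[Tuple[float, float]],
    width: int,
    height: int,
) -> Tuple[int, int, int, int]:
    """Convert normalized (0..1) polygon vertices into a pixel box.

    Returns (left, top, right, bottom) for an image of the given size.
    Assumption: vertices are (x, y) pairs normalized to the image size.
    """
    vertices = list(normalized_vertices)
    # Clamp to [0, 1] so a slightly out-of-range vertex stays inside the image.
    xs = [min(max(x, 0.0), 1.0) * width for x, _ in vertices]
    ys = [min(max(y, 0.0), 1.0) * height for _, y in vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Hypothetical object covering the lower middle of a 640x480 snapshot.
box = to_pixel_box([(0.25, 0.5), (0.75, 0.5), (0.75, 1.0), (0.25, 1.0)], 640, 480)
print(box)
```

A box in this (left, top, right, bottom) form is the kind of value that can be handed to a rectangle-drawing call in the imaging library, with the translated label drawn just above it.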

Challenges I ran into

We wanted to use Google Cloud Functions to avoid having to run a back end, at least for a proof of concept, but I couldn't figure out a way to do it in time. Sophia had attended the Flask tutorial, so we decided to write a server in Python using Flask instead.

Accomplishments that I'm proud of

None of us had ever used any of the Google cloud APIs, Flask, or GitHub before.

What I learned

Some of the team members learnt how to properly use GitHub, and some new uses of JavaScript and HTML for real-time video processing. Some of us also learnt how to build basic Flask servers in Python, and how to use some of the Google Cloud APIs. This was the first time any of us had worked in a team on a project. We also learnt that Josh doesn't need any sleep (I really do, I want to go to bed pls thank).

What's next for AR Translate

Perhaps a speech function, so that AR Translate can say the word aloud in the chosen language and you can hear how to pronounce it correctly. It could also put the word in a sentence to show how it is used in context.

There is a bit of an issue with translation, discovered when someone translated an image of their watch to French: it came out as "Regarder", the French verb "to watch", not the noun for "a watch". A quick fix was to prepend "A " to the object names the Cloud Vision API returned, so that the translator treats them as nouns. I feel that a better solution to this problem could definitely be found given more time.
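The quick fix is small enough to sketch in a few lines. This is only an illustration of the idea described above, not our exact code; the function name `as_noun_phrase` is made up for the example, and it always uses "A " exactly as described, without trying to pick "An" for labels that start with a vowel.

```python
def as_noun_phrase(label: str) -> str:
    """Prepend "A " to an object label so a translator reads it as a noun.

    Empty or whitespace-only labels are returned as an empty string.
    """
    label = label.strip()
    if not label:
        return ""
    return f"A {label}"


# Labels as they might come back from the vision step (hypothetical values).
for name in ["Watch", "Cupboard"]:
    print(as_noun_phrase(name))
```

One side effect to keep in mind is that the translated label will usually carry an article as well, so it may need to be stripped again before display if only the bare noun is wanted.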

There are also some issues with browser support, but it is fully functional in Chrome on all the platforms we have tested (iOS, Android, Windows, and OS X).
