Learning new language vocabulary through an interactive AR experience.


  • Hector Castillo - MIT '20 | Integration, Infrastructure
  • Evan Hostetler - MIT '22 | Hardware Specialist, Branding
  • Anthony Nardomarino - MIT '22 | Object Classification Specialist
  • Tony Terrasa - MIT '21 | AR Specialist, Integration
  • Grady Thomas - MIT '23 | Translation API Specialist


We saw a huge opportunity to transform the way people learn languages by using computer vision to reimagine the most effective language learning strategy: immersion. At VocabViz, we are driven not only to help the world learn new languages with practical tools, but also to bring the world a little closer together.

What It Does

VocabViz helps you learn vocabulary in other languages: it uses the camera on your device to detect what an object is and translates its name in real time.

How We Built It

VocabViz runs on four main technologies. These include:

We start with a video stream. It can come from any input, but for simplicity we chose our laptop's built-in webcam. A section of the screen is then selected and sent to the IBM API for recognition.
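Selecting a section of the frame can be sketched as follows. This is only an illustration: the helper names `center_roi` and `crop`, the centered box, and the default `fraction` are assumptions, not taken from the VocabViz code.

```python
def center_roi(frame_width, frame_height, fraction=0.5):
    """Return (x, y, w, h) of a box centered in the frame.

    `fraction` is the share of each dimension the box covers
    (a hypothetical default, chosen only for illustration).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    w = int(frame_width * fraction)
    h = int(frame_height * fraction)
    x = (frame_width - w) // 2
    y = (frame_height - h) // 2
    return x, y, w, h


def crop(frame, roi):
    """Crop a frame given as a list of pixel rows.

    With an OpenCV frame (a NumPy array), the equivalent slice is
    frame[y:y + h, x:x + w].
    """
    x, y, w, h = roi
    return [row[x:x + w] for row in frame[y:y + h]]
```

Only the cropped region would then need to be encoded and sent for classification, which keeps each request small.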

Challenges Encountered

With such a short time span, one of the hardest challenges was simply installing the proper dependencies.

Another challenge was deciding which of the outputs from IBM’s API to display. We ended up using a system that works like a weighted average over class number and percentage match.
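One way such a selection rule could look is sketched below. The exact weighting we used is not described here, so the rule in this sketch is an assumption: each candidate's confidence is multiplied by a weight that shrinks with its position in the list, and the best result wins. The function name `pick_label` and the `"class"`/`"score"` keys are also assumptions for illustration.

```python
def pick_label(classes):
    """Pick one label from a list of {"class": name, "score": confidence} dicts.

    Assumed rule (not necessarily the one VocabViz uses): weight each
    confidence by 1 / (position + 1), then return the highest-valued name.
    Returns None for an empty list.
    """
    best_name, best_value = None, -1.0
    for index, item in enumerate(classes):
        weight = 1.0 / (index + 1)      # earlier candidates count more
        value = item["score"] * weight  # combine position and confidence
        if value > best_value:
            best_name, best_value = item["class"], value
    return best_name
```

A rule like this favors early candidates unless a later one has a much higher confidence, which is one plausible reading of combining class number with percentage match.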

One of the hardest things about working with several parts coded by different people is ensuring that the inputs and outputs of the different modules are compatible. We ran into an issue where the output of the IBM API was a string, but the most convenient way to read that information was to parse it as JSON into a dictionary.
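The string-to-dictionary step can be sketched with Python's standard `json` module. The nesting used below (images, then classifiers, then classes) is an assumption about the shape of the response, and `classes_from_response` is a hypothetical helper name.

```python
import json


def classes_from_response(raw):
    """Parse raw JSON text and return the class dicts for the first image.

    Assumes a layout of the form
    {"images": [{"classifiers": [{"classes": [...]}]}]}.
    """
    data = json.loads(raw)  # str -> dict
    return data["images"][0]["classifiers"][0]["classes"]
```

Agreeing early that every module passes dictionaries rather than raw strings means this parsing happens in exactly one place.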


We are very proud of the integration between our four main technologies, which gave us a functional visual translator. We are also proud that we accomplished so much within a day with limited coding experience.

What We Learned

We learned how to use object tracking in OpenCV.

Furthermore, since this was one of the first major collaborative projects for some members of the team, several of us learned to use Git along the way. We also learned the importance of documenting and agreeing on the input and output formats of the different modules as early as possible. We spotted an incompatibility, and because of this we were able to fix it and ensure that the data passed as effectively as possible through the workflow.

What's Next

This application could be quite powerful as a mobile app, giving people the ability to learn new vocabulary on the go.

We also foresee several improvements to the user interface. For example, plugging into a dictionary API would let us not just translate the word but also offer the option to scroll over for more information, including definitions, example sentences, synonyms, and antonyms. Features could be added to the GUI to make changing languages easier and to display more than two languages on the screen.

Furthermore, one of the strengths of IBM’s API is that it returns several possible identifications of the object. A useful future feature would be to cycle through these identifications as a way to learn different ways of describing, in another language, the object you are looking at.
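A rough sketch of how the cycling could work, using `itertools.cycle` from the standard library. The helper name `label_cycler` is hypothetical, and how it would hook into the GUI is left open.

```python
from itertools import cycle


def label_cycler(labels):
    """Return a function that gives the next label on each call, wrapping around."""
    if not labels:
        raise ValueError("need at least one label")
    iterator = cycle(labels)
    return lambda: next(iterator)
```

Each call, for instance on a key press, would return the next candidate identification, which could then be translated and displayed.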


The following are required to run VocabViz:

  • Python 2.7
  • Pillow==6.1.0
  • OpenCV-Contrib==3.4.4
  • Numpy==1.16.5
  • ibm-watson==3.4.0
  • google-cloud-translate==1.6.0

These can be installed with pip:

  pip install opencv-contrib-python==
  pip install pillow
  pip install numpy
  pip install ibm_watson
  pip install --upgrade google-cloud-translate

To use Google Translate, you need a private key for Google Cloud Services. Download your key and make sure the following environment variable is set:
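Google Cloud client libraries look for service account credentials in `GOOGLE_APPLICATION_CREDENTIALS`, which should hold the path to the downloaded JSON key file. Below is a minimal sketch of a startup check, assuming that is the variable meant here; the helper name `credentials_path` is hypothetical.

```python
import os

# Variable read by Google Cloud client libraries (assumed to be the one meant above).
CREDENTIALS_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def credentials_path(environ=None):
    """Return the configured key file path, or raise if the variable is unset."""
    environ = os.environ if environ is None else environ
    path = environ.get(CREDENTIALS_VAR)
    if not path:
        raise RuntimeError("Set %s to the path of your key file" % CREDENTIALS_VAR)
    return path
```

Calling `credentials_path()` before creating the translation client gives a clear error message instead of an authentication failure later on.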
