Personal Inspiration

For a lot of us, staying in touch with relatives in another country means running into a language barrier: conversations in disjointed English and Hindi, for instance, end up being more difficult than engaging and meaningful. All of our group members have had specific experiences with this, and through research we realized the problem matters not only for us, but for people around the world.


As touched on in our video presentation, we aim to reduce the severe disparities that exist in video calling software. One avenue is live translation, which can bring people together by letting them comfortably speak in the language they want and hear others in the language they want. A second avenue is live translation of signed phrases, which lets people who are deaf or hard of hearing (many of whom do not acquire spoken language) and people with speaking disabilities have their messages read aloud quickly and seamlessly.


Linglide is a web app that serves as a Zoom plug-in to generate real-time audio or video translation between two languages. Communication is powerful, and we hope that Linglide can help empower those for whom quality online video-based communication isn’t a given. Our solution bridges the language barrier between two Zoom users by using natural language processing to translate one speaker’s words into the other’s preferred language, all in real time. Linglide also makes Zoom accessible to American Sign Language users who want to communicate with non-signers, using image recognition (machine learning) to translate hand signs into spoken words.

Note: Due to time constraints, we could not demo every feature in our video. We have provided images in the carousel above to help visualize some of our descriptions :)

We 1) harnessed Google's Media Translation service for live speech-to-speech translation, and 2) created our own dataset of images depicting American Sign Language gestures for popular phrases such as "Hey, how are you?" and "I love you", then built a convolutional neural network on the Ximilar platform to classify them. The React/Node+Flask web app channels the video/audio stream into the appropriate service (Google or Ximilar), and then channels the spoken output into Zoom.
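To make the routing step concrete, here is a minimal sketch of how a stream chunk might be dispatched to the right service. It is only an illustration under assumptions, not our actual code: the `Chunk` type, the `route` function, and the two handler functions are hypothetical names, and the handlers are local placeholders that do not call Google Media Translation or Ximilar.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Chunk:
    """One piece of the incoming stream (hypothetical type for this sketch)."""
    kind: str       # "audio" for speech, "video" for sign language frames
    payload: bytes  # raw bytes captured from the call


def translate_speech(chunk: Chunk) -> str:
    # Placeholder: a real handler would send the audio to a speech translation service.
    return f"[translated speech from {len(chunk.payload)} audio bytes]"


def classify_sign(chunk: Chunk) -> str:
    # Placeholder: a real handler would send the frame to an image classifier.
    return f"[phrase recognized from {len(chunk.payload)} video bytes]"


# Map each stream kind to the handler responsible for it.
ROUTES: Dict[str, Callable[[Chunk], str]] = {
    "audio": translate_speech,
    "video": classify_sign,
}


def route(chunk: Chunk) -> str:
    """Send a chunk to the matching handler and return the text to be spoken."""
    try:
        handler = ROUTES[chunk.kind]
    except KeyError:
        raise ValueError(f"unsupported stream kind: {chunk.kind!r}")
    return handler(chunk)
```

Keeping the dispatch in a single table means a third input type could be added by registering one more handler, without touching the code that forwards spoken output into the call.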

Tech Stack

  • React for the UI
  • Flask for the sign language translation script
  • Node for the speech-to-speech translation script
  • Ximilar for sign/gesture image recognition
  • Google Cloud > Media Translation for streaming speech-to-speech translation
  • Google Cloud > App Engine for deployment


As we couldn't get it into the video, we've provided an image showing our sign language detection abilities within the app.
