How many times have we been challenged by language barriers in a foreign country? "Oh, this thing over here looks nice... But how do I tell them that I want it... in Chinese?" This can make learning a new language extremely daunting, especially when faced with fluent local speakers who can recognize right away when you don't know how to say something. On top of that, picking up new words for a short trip, and then finding a way to review and remember them, is too much of a hassle.
What it does
Our team decided to tackle this issue with the combined power of object recognition, cloud translation, and natural language processing. The app consists of a mobile application and a web dashboard. The user selects their desired language and uploads an image of an object; the SnapLang iOS app recognizes the object and returns its translated name to the user. This information is then stored on a web dashboard, where the user can review their previous translations as flashcards. The user can also track their progress through a word cloud and view statistics on the languages they have learned and the subject category of each word, obtained through natural language processing. To accommodate users who may not be able to read the translated words, SnapLang provides GCP-powered text-to-speech to guide users towards better pronunciation.
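The dashboard statistics described above boil down to counting stored translations along a few dimensions. The following is a minimal Python sketch of that aggregation, not the project's actual code: the `Translation` record, its field names, the `summarize` helper, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """Hypothetical record for one translated word (field names are assumed)."""
    original: str
    translated: str
    language: str   # target language code, e.g. "zh"
    category: str   # subject category assigned by the NLP step


def summarize(history):
    """Count words, languages, and categories for dashboard charts."""
    return {
        # Word-cloud weight: how often each source word was looked up.
        "word_cloud": Counter(t.original.lower() for t in history),
        "languages": Counter(t.language for t in history),
        "categories": Counter(t.category for t in history),
    }


# Made-up sample history, purely for illustration.
history = [
    Translation("apple", "苹果", "zh", "Food"),
    Translation("Apple", "pomme", "fr", "Food"),
    Translation("chair", "椅子", "zh", "Home"),
]
stats = summarize(history)
```

Each `Counter` maps directly onto one chart: word frequencies can set the font sizes in a word cloud, while the language and category counts can feed bar or pie charts.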
How we built it
The frontend UI of SnapLang is built with Swift for the iOS application and React.js with Semantic UI as the UI kit for the web dashboard. The iOS application takes in image input, recognizes the object, translates it to the desired language, and sends it to the backend for further processing. The web application reads processed data from the backend and displays flashcards, logs, and analytics for the user to manage their learning process. The backend consists of two parts. The first part is a Django REST API to receive the original and translated words, process them using GCP natural language processing, and store the resulting categorized data in an SQL database. The second part is a Node.js REST API that processes words with GCP text-to-speech and serves encoded audio files to the web dashboard to be played back to the client. Ultimately, the two APIs serve to bridge the data sent by the iOS application to a user-friendly display on the web dashboard.
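As a rough illustration of the hand-off from the iOS app to the first API, the sketch below parses and validates a JSON body containing an original word, its translation, and the target language. It uses plain Python rather than Django so that it stays self-contained; the field names, the `WordRecord` type, the `parse_word_payload` helper, and the `"uncategorized"` default are assumptions, not the project's real schema.

```python
import json
from dataclasses import dataclass

# Assumed field names; the real API schema is not given in this writeup.
REQUIRED_FIELDS = ("original", "translated", "language")


@dataclass
class WordRecord:
    original: str
    translated: str
    language: str
    category: str = "uncategorized"  # placeholder until the NLP step assigns one


def parse_word_payload(body: str) -> WordRecord:
    """Parse a JSON request body and reject it if a required field is missing."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return WordRecord(
        original=str(data["original"]).strip(),
        translated=str(data["translated"]).strip(),
        language=str(data["language"]).strip(),
    )


record = parse_word_payload(
    '{"original": "chair", "translated": "椅子", "language": "zh"}'
)
```

Validating at the API boundary keeps malformed requests out of the database, and the `category` default marks rows that have not yet been through categorization.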
Challenges we ran into
Since three of our four team members had never used GCP before, learning to integrate four Google Cloud services into our product was an insightful experience: dealing with different authorizations, the same clients, and different ways to handle requests. Alongside this, working across three different technologies (Python, Node, and Swift) taught us not only technical skills, but also the elegance of creating effective interfaces between multiple applications.
Accomplishments that we're proud of
Building on the above, we're deeply proud of our success in integrating the multiple platforms of our app with GCP. Our team made full use of several Google Cloud machine learning services and devised clever workarounds for the challenges we faced, such as transporting encoded MP3 files from a Node server to a React client. Ultimately, we are proud to have created an application that unites so many pieces of software elegantly into a full, unique user experience.
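This writeup does not say which workaround was used for the MP3 transport. One common approach, shown here only as an assumption, is to base64-encode the audio bytes so they can travel inside an ordinary JSON response. The sketch is in Python for illustration (the real services are Node and React), and the function names and payload keys are hypothetical.

```python
import base64
import json


def encode_audio_response(word: str, audio: bytes) -> str:
    """Wrap raw audio bytes in a JSON body by base64-encoding them."""
    return json.dumps({
        "word": word,
        "encoding": "mp3",
        "audio_base64": base64.b64encode(audio).decode("ascii"),
    })


def decode_audio_response(body: str) -> bytes:
    """Recover the original audio bytes on the receiving side."""
    payload = json.loads(body)
    return base64.b64decode(payload["audio_base64"])


# Stand-in bytes, not a real MP3 file.
fake_mp3 = b"ID3\x03\x00\x00\x00\x00\x00\x0a" + bytes(range(16))
body = encode_audio_response("椅子", fake_mp3)
round_tripped = decode_audio_response(body)
```

In a browser, a base64 string like this can be played by placing it in a `data:audio/mpeg;base64,...` URL, at the cost of a payload roughly one third larger than the raw bytes.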
What we learned
We learned to adapt to multiple platforms when integrating software and APIs into our project. Additionally, our team established an efficient workflow that will benefit us far into the future. We were able to work concurrently on different tasks towards a common goal using different technologies, and to join the results together relatively seamlessly.
What's next for SnapLang
We would like to implement the following features:
- Suggested curriculum based on previously learned words and categories
- Speech recognition with GCP's Speech-to-Text API, so users can learn words in reverse (from foreign word to English)
- Clustering algorithm for suggesting certain words for specific locations
- Camera view for users to upload images within the app
- Catering to travelers, scholars, and universities as a service