Inspiration
VisualLearner is inspired by the idea of making language learning easy, fun, and always available. By combining Apple's ARKit with Google's Vision and Translate APIs, the iOS application immerses users in a new language and lets them keep picking up new vocabulary.
What it does
When you open VisualLearner, your camera opens. Swipe up to select a language that you know (your base language) and a language that you want to learn. Close the window to return to the camera, then point it at any object and tap the screen. The application quickly takes a picture and analyzes it for objects. When it detects an object, it places a pin on the object showing the word in your base language alongside its translation in the language you are learning.
You can identify as many objects as you want, and your pins will stay in the exact locations where you made them even if you come back to the spot later! Additionally, you can change either your base or learning language (or both!) and all the pins will translate into the new respective languages!
How I built it
We began this project by messing around with Apple's ARKit. Early tests showed that we could place emojis (and eventually text objects) at different places around the room, and that those markers could be placed to give the impression of being "placed at" or "labeling" different objects. After several hours of testing, we realized that we could programmatically edit the text labels placed via ARKit. This sparked the fire for VisualLearner.
Next, we began to tinker with Google's Vision API. We figured out how to take a picture quietly, without disrupting the user experience, and send it to the Vision API for analysis. The API returns a set of labels for what it thinks the picture contains; we took the one it was most confident about and placed it as the label for that object.
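To make the "most confident label" step concrete, here is a minimal sketch in Swift. It is not our actual project code: the `DetectedLabel` type, the `bestLabel` function, and the `minimumScore` cutoff are all hypothetical names and assumptions made for illustration. It only assumes that each label comes back with a description and a confidence score.

```swift
// Hypothetical shape for one label returned by an image-labeling service.
struct DetectedLabel {
    let description: String
    let score: Double   // confidence, assumed to be in the range 0...1
}

// Returns the highest-scoring label, or nil when no label clears the cutoff.
// The 0.5 default cutoff is an assumption, not a value from the project.
func bestLabel(from labels: [DetectedLabel], minimumScore: Double = 0.5) -> DetectedLabel? {
    return labels
        .filter { $0.score >= minimumScore }
        .max { $0.score < $1.score }
}

let sampleLabels = [
    DetectedLabel(description: "furniture", score: 0.81),
    DetectedLabel(description: "chair", score: 0.93),
    DetectedLabel(description: "wood", score: 0.47),
]

if let best = bestLabel(from: sampleLabels) {
    print(best.description)   // prints "chair"
}
```

Returning an optional lets the caller skip placing a pin when nothing in the picture is recognized with enough confidence.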
Once we were able to label objects in augmented reality, we turned to Google's Translate API. After working through the complications of multithreaded calls, we were able to translate each label into both the base and learning languages and display the two in the same tag!
Finally, we began to work towards our larger ambitions for VisualLearner. Our successes include being able to change languages on the go and having all of the tags update in real time to the newly chosen languages!
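One way tags could follow a language change is to keep the originally detected word on each pin and rebuild the displayed text from it. The sketch below assumes that design; `Pin`, `refresh`, `lookup`, and the small translation table are hypothetical stand-ins, and the real app would call the Translate API instead of a local table.

```swift
// Hypothetical model of one placed tag. `sourceLabel` is the word as first
// detected; the displayed text is always rebuilt from it.
struct Pin {
    let sourceLabel: String
    var displayText: String = ""
}

// Hypothetical translator signature: (word, language code) -> translation.
typealias Translate = (String, String) -> String

// Rebuilds every pin's text as "<base word> / <learning word>".
func refresh(_ pins: inout [Pin], base: String, learning: String, translate: Translate) {
    for index in pins.indices {
        let word = pins[index].sourceLabel
        pins[index].displayText = "\(translate(word, base)) / \(translate(word, learning))"
    }
}

// Tiny local table standing in for a translation service.
let translationTable: [String: [String: String]] = [
    "chair": ["en": "chair", "es": "silla", "fr": "chaise"],
]

// Falls back to the original word when no translation is known.
func lookup(_ word: String, _ language: String) -> String {
    return translationTable[word]?[language] ?? word
}

var pins = [Pin(sourceLabel: "chair")]
refresh(&pins, base: "en", learning: "es", translate: lookup)
print(pins[0].displayText)   // prints "chair / silla"

refresh(&pins, base: "es", learning: "fr", translate: lookup)
print(pins[0].displayText)   // prints "silla / chaise"
```

Because the source word never changes, switching either language only requires one pass over the pins rather than re-detecting any objects.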
Challenges I ran into
For a seemingly straightforward project, VisualLearner required us to clear a number of hurdles. Because ARKit is a relatively new technology, there weren't many tutorials showing how to accomplish some of our more challenging tasks, such as editing labels in real time after they've been created.
Additionally, we ran into numerous problems with Google's Translate and Vision APIs running asynchronously. It took us a little while to figure out what was going wrong: the logic seemed sound, but all of the functions were returning null values because they returned before the API responses arrived. Once we restructured the code to address this (usually with either nested functions or delays to account for the API), we were back on our feet and progress resumed smoothly.
Finally, some of the smaller details within ARKit and Xcode proved difficult. For example, any buttons that we attempted to place on the screen would appear sometimes and disappear at random (we ended up reverting to swipe gestures). Additionally, pulling up modal windows during operation would confuse the iPhone's sense of location, causing labels to shift to random new locations when the window closed.
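The null-value problem is easier to see in a small sketch. The code below is an illustration only, not our project code: `FakeTranslator` is a hypothetical stand-in that stores callbacks and runs them later, which imitates how a network response arrives after the calling code has already moved on.

```swift
// Hypothetical stand-in for a network translation client. It stores each
// callback and runs it later, imitating a response that arrives after the
// calling function has already continued.
final class FakeTranslator {
    private var pending: [() -> Void] = []
    private let dictionary = ["chair": "silla", "table": "mesa"]

    func translate(_ word: String, completion: @escaping (String?) -> Void) {
        let result = dictionary[word]
        pending.append { completion(result) }
    }

    // Simulates the responses arriving.
    func deliverResponses() {
        let callbacks = pending
        pending.removeAll()
        callbacks.forEach { $0() }
    }
}

func runDemo() -> (tooEarly: String?, tagText: String) {
    let translator = FakeTranslator()

    // Broken pattern: read the variable before the callback has run.
    var translated: String? = nil
    translator.translate("chair") { translated = $0 }
    let tooEarly = translated   // still nil at this point

    // Working pattern: do the dependent work inside the callback.
    var tagText = ""
    translator.translate("chair") { result in
        tagText = "chair / \(result ?? "?")"
    }

    translator.deliverResponses()
    return (tooEarly, tagText)
}

let demo = runDemo()
print(demo.tooEarly == nil, demo.tagText)   // prints "true chair / silla"
```

Moving the dependent work into the callback is the "nested functions" approach; a fixed delay can appear to work but breaks whenever a response takes longer than expected.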
Accomplishments that I'm proud of
Coming into Wildhacks, neither of us was a proficient iOS developer. In fact, one of us had never worked with Swift at all. Since then, our understanding of how to build iOS applications has advanced significantly, and we've learned a lot about some of the publicly available artificial intelligence APIs. In under 36 hours, we built an application that can not only recognize objects via AI but also superimpose labels on top of them and recall those labels when necessary.
What I learned
We learned a lot about ARKit and Google's Cloud platform. They provide countless opportunities that we definitely plan to take advantage of in future projects. Regarding Swift and programming in general, we gained a lot of experience with asynchronous tasks. While there were many cases where running API calls asynchronously saved time and made our lives easier, there were quite a few times where we had to rewrite chunks of code because our logic didn't account for the time API calls would take to return.
What's next for VisualLearner
We have a lot of ideas for VisualLearner going forward. While the application is pretty bare-bones right now, we feel that there is plenty of room for growth. We love the idea of using our existing technology to create homemade flashcards or to interface with popular applications to enhance the language-learning process and encourage immersion. In the long term, we want to turn VisualLearner into a go-to language-learning application for users of all levels, whether they are experienced users who want to identify a specific object or beginners who want to immerse themselves in a world of brand-new vocabulary.
Built With
- arkit
- excitement
- google-cloud
- google-translate
- google-vision
- ios
- passion
- swift