Inspiration
During the keynote speech by Amitabh Varshney, we were inspired by the idea of using augmented and virtual reality to improve communication between people. We thought a big step toward a more connected world would be letting speakers of any language communicate easily with each other by converting their speech to translated text in augmented reality. Translating speech to text would also help deaf people communicate by reading a text transcription of an ongoing conversation.
What it does
Our program takes audio input from the Microsoft HoloLens and converts any detected speech into text. That text is then run through Google Translate to convert it into the preferred language, and the translated text is displayed in the HoloLens view. An options button lets the user choose which language their speech is translated into.
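The flow above can be sketched end to end. This is a minimal Python sketch with stubbed-out recognition, translation, and display steps; all function names and canned values here are hypothetical stand-ins (the actual project runs inside Unity and uses Microsoft's speech library and Google Translate):

```python
# Rough sketch of the app's flow: speech -> text -> translation -> caption.
# Every name here is a hypothetical stand-in, not the project's real code.

def recognize_speech(audio: bytes) -> str:
    """Stand-in for the HoloLens speech-to-text step."""
    return "hello, how are you?"  # canned result for illustration

def translate(text: str, target_lang: str) -> str:
    """Stand-in for the Google Translate call."""
    canned = {("hello, how are you?", "es"): "hola, ¿cómo estás?"}
    return canned.get((text, target_lang), text)  # fall back to the original

def display_caption(text: str) -> str:
    """Stand-in for rendering the caption in the HoloLens view."""
    return f"[caption] {text}"

def pipeline(audio: bytes, target_lang: str) -> str:
    """Wire the three stages together, as the app does each utterance."""
    return display_caption(translate(recognize_speech(audio), target_lang))

print(pipeline(b"...", "es"))  # -> [caption] hola, ¿cómo estás?
```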
How we built it
The speech-to-text converter was built using a Microsoft library. We accessed Google Translate through the Google Cloud API service. The rest of the project was done in Unity.
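For context on the translation step, Google's Translate v2 REST endpoint takes the text, a target language code, and an API key as query parameters. A minimal Python sketch of building such a request (the API key is a placeholder, and the actual project issued its calls from Unity rather than Python):

```python
from urllib.parse import urlencode

# Google Translate v2 REST API endpoint.
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"
API_KEY = "YOUR_API_KEY"  # placeholder; a real key comes from the Google Cloud console

def build_translate_url(text: str, target: str, source: str = "en") -> str:
    """Build the GET request URL for translating `text` into `target`."""
    params = {"q": text, "target": target, "source": source, "key": API_KEY}
    return f"{ENDPOINT}?{urlencode(params)}"

url = build_translate_url("hello", "es")
# Fetching this URL (e.g. with urllib.request) returns JSON shaped like:
# {"data": {"translations": [{"translatedText": "hola"}]}}
```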
Challenges we ran into
Finding a computer that could run our programs and connect to the HoloLens cost us several hours of frustration, but eventually a working machine was found. We also tried using an ImageTarget to make a text bubble appear when a specified image was detected (ideally on a person's forehead, so the bubble would look like it was coming from them as they speak). Unfortunately, the ImageTarget display was so shaky and uncooperative that the text bubble was very difficult to read, so we fell back to a HUD display for the text bubble.
Accomplishments that we're proud of
We successfully made a functional translator. Anyone who can read and wants to communicate seamlessly with people they would otherwise not understand can do so, as long as they have enough money for a HoloLens.
What we learned
We learned how to make use of an API in an AR application. We also learned how to build a HUD display in Unity that shows dynamic text. Even though we did not end up using the ImageTarget feature, we did learn how to use it and could apply it to future projects. Probably the most important lesson was that things take time: some tasks took longer than others, and some took far longer than they should have, but those are exactly the kinds of obstacles we had to adapt to during development.
What's next for Augmented Reality Translator - A.R.T
In the future, we hope to apply a form of object tracking that can identify people, so that text bubbles appear over their heads when they speak. To differentiate between speakers, we will need voice localization that can pinpoint where each voice is coming from and attach the text bubble to the right person. Lastly, letting users type text and have it spoken aloud by the application would enable even more communication (a blind person could talk to a deaf person!). The potential scale of an application like this is vast. We could soon live in a world with no language barriers: a place where everyone is interconnected in real and virtual worlds, and where differences, at least in language, no longer exist.