Although numerous video and audio call applications exist, there is no convenient way for users to communicate across a language barrier. One party must always rely on a third-party translation method, which makes communication much slower and less efficient. Even meetings at the level of international organizations such as the EU are conducted with several in-person interpreters. Real-time translation software would be revolutionary at every level: everyday conversational calls, corporate meetings, and even multinational organizations.
What it does
GlobeTalk is a web-based application that provides real-time speech translation over video calls for multiple users. With the increasing globalization of business and technology, the need for a tool that can bridge language barriers has become more pressing than ever. GlobeTalk aims to fill this gap by providing a platform where users can communicate with each other in real time, regardless of their native language. Unfortunately, we only had time to build the separate components of this project, not to integrate them.
How we built it
We split the program into three main components: translation and voice recognition, web-based voice call software, and the front-end design/UI for that call software. We chose this approach because it would let us work in parallel over the three days of the competition, with each of us focusing on what our skills were specialized in. Our plan was to develop the components independently, test them, and then come together near the end to integrate them into a fully fleshed-out application.
Challenges we ran into
Our initial plan was to build the back end in Python, so we started by trying to implement video chat there. One of our goals was also to implement voice cloning, so we found an open-source voice-cloning project and tried to train it, but realized the implementation would be much too slow, and we wanted the chat to run as close to real time as possible. We also found that the video call back end would not be feasible in Python alone, so we researched other technologies that would support a video call platform and eventually found a GitHub repo built on Node.js and other web technologies like Socket.io and WebRTC. We stripped that repository down to what we needed and created the simple video call software.

We also wanted to add another layer to the translation, and we thought ASL would be a neat, data-driven addition that would add complexity to our software. So we pulled another GitHub repo that we believed could implement this and tried training the AI. Still, after hours of training, the model was mediocre at best and would need much more extensive training before we could use it.

Alongside this back-end development, we were also working on the front-end code, using React and CSS to start developing our website. Unfortunately, we encountered many more roadblocks on both ends while working with technologies that were unfamiliar to us. WebRTC became very frustrating to use, and we were not able to incorporate all our components into the web page in time.
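Much of our WebRTC frustration came down to signaling: before two browsers can stream media directly, a server (Socket.io in the repo we adapted) has to relay session offers, answers, and ICE candidates between peers in the same room. The sketch below shows that room-based relay logic in a framework-agnostic way; the class and message names are hypothetical stand-ins, not our actual server code.

```python
class SignalingRooms:
    """In-memory room registry that relays signaling messages between peers,
    mimicking what a Socket.io/WebRTC signaling server does."""

    def __init__(self):
        self.rooms = {}  # room_id -> {peer_id: send_callback}

    def join(self, room_id, peer_id, send):
        peers = self.rooms.setdefault(room_id, {})
        # Notify existing peers so they can start sending SDP offers.
        for other_send in peers.values():
            other_send({"type": "peer-joined", "peer": peer_id})
        peers[peer_id] = send

    def relay(self, room_id, sender_id, target_id, payload):
        # Forward an SDP offer/answer or ICE candidate to one target peer.
        target = self.rooms.get(room_id, {}).get(target_id)
        if target:
            target({"type": "signal", "from": sender_id, "payload": payload})

    def leave(self, room_id, peer_id):
        peers = self.rooms.get(room_id, {})
        peers.pop(peer_id, None)
        for other_send in peers.values():
            other_send({"type": "peer-left", "peer": peer_id})

# Usage: two peers join a call, and the server relays an offer between them.
inbox_a, inbox_b = [], []
rooms = SignalingRooms()
rooms.join("call-1", "alice", inbox_a.append)
rooms.join("call-1", "bob", inbox_b.append)   # alice is told bob joined
rooms.relay("call-1", "alice", "bob", {"sdp": "offer..."})
print(inbox_b[0]["payload"])
```

In a real deployment, `send` would be a WebSocket emit rather than a list append, but the routing logic is the same.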
Accomplishments that we are proud of
We are proud of our Python code, which can take audio input as a file or directly from the microphone, process it, translate it, and output it in the selected language. On top of that, we are proud of producing simple, working web-based video call software, considering the numerous obstacles we had to overcome that we did not anticipate at all.
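The Python component described above is a three-stage pipeline: speech recognition, text translation, then speech output. A minimal sketch of that structure is below; the stage functions here are toy stubs standing in for the real recognition and translation libraries, and all names are hypothetical.

```python
# Sketch of the speech-translation pipeline structure. Each stage is a
# pluggable callable, so real libraries can replace the stubs below.

def translate_pipeline(audio, recognize, translate, synthesize, target_lang):
    """Run audio through recognition -> translation -> speech synthesis."""
    text = recognize(audio)                    # speech -> source-language text
    translated = translate(text, target_lang)  # text -> target-language text
    return synthesize(translated)              # text -> target-language audio

# Toy stages standing in for real speech and translation libraries:
def fake_recognize(audio):
    return audio  # pretend the "audio" is already a transcript

def fake_translate(text, target_lang):
    phrasebook = {("hello", "es"): "hola", ("goodbye", "es"): "adiós"}
    return phrasebook.get((text, target_lang), text)

def fake_synthesize(text):
    return f"<audio:{text}>"  # placeholder for generated speech

print(translate_pipeline("hello", fake_recognize, fake_translate,
                         fake_synthesize, "es"))
# -> <audio:hola>
```

Keeping each stage behind a simple function boundary is also what makes it easy to test the translation component on its own before integration.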
What we learned
We learned that implementing a project can be much more difficult than it seems, especially when the technologies involved are unfamiliar. Next time we will weigh implementation difficulty more heavily during planning.
What's next for GlobeTalk
We encountered many challenges implementing our initial plan, so our final result is less complete than we had hoped. This, however, leaves plenty of room for later improvements.
Firstly, we plan to integrate the components of our software into one cohesive webpage. This will take much more debugging on both the front end and the back end.
Secondly, we plan to add ASL to the language selection and implement subtitles, which would open the project to an even broader audience. We would do this through more extensive training and refining of the ASL-alphabet recognition model mentioned above.
Lastly, we plan to add voice cloning to our system. This would require finding a more efficient method of implementation so that the delay from input to output remains reasonable for a conversation. Applying voice cloning would be a major step in making this application user-friendly.