EasyCC

Inspiration

As the COVID-19 pandemic continues to impact our lives, students are forced to stay at home and deal with the difficulties that come with online learning.

Personally, we have struggled with connection issues, professors who speak unclearly, and noisy environments. We can only imagine what lectures are like for students who face a language barrier, have a hearing impairment, or do not have access to a quiet and comfortable learning environment.

For our hack, we wanted to tackle this problem and create a tool that improves learning experiences and makes classes more accessible for struggling students. We hope to make a positive social impact by helping people communicate during a time filled with challenges and uncertainty.

What it does

EasyCC is a Chrome extension that provides real-time closed captioning for any audio playing on your computer. EasyCC works with any platform, including Zoom, Collaborate Ultra, Discord, and Google Meet, and can even transcribe YouTube videos!

How we built it

We first prototyped the UI in Figma and developed the front end for the Chrome extension using HTML and CSS. Using Node.js, we then integrated tools that allowed us to capture audio from the desktop and convert speech into text using Google Cloud’s Speech-to-Text engine. Using Socket.io, we relayed the transcripts to our front end to be displayed in real time for the user.
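As a rough illustration of the last step, the sketch below shows one way a front end could turn a stream of transcript messages into caption text. Streaming speech recognizers typically send interim guesses followed by a final result, so the buffer replaces the in-progress line until a final one arrives. The function name `createCaptionBuffer`, the `{ transcript, isFinal }` message shape, and the two-line limit are assumptions made for this example, not EasyCC's actual code.

```javascript
// Hypothetical caption buffer: keeps a few finalized lines plus one
// in-progress (interim) line, and returns the text to display.
function createCaptionBuffer(maxLines = 2) {
  const finalLines = [];
  let interim = '';

  // Join finalized lines and the interim line, skipping empty entries.
  function render() {
    return [...finalLines, interim].filter(Boolean).join('\n');
  }

  // Apply one transcript message and return the updated caption text.
  function push({ transcript, isFinal }) {
    const text = transcript.trim();
    if (isFinal) {
      if (text) finalLines.push(text);
      // Drop the oldest lines so the overlay stays small.
      while (finalLines.length > maxLines) finalLines.shift();
      interim = '';
    } else {
      // Interim results overwrite each other until a final result arrives.
      interim = text;
    }
    return render();
  }

  return { push, render };
}
```

In a setup like the one described above, `push` would be called from whatever handler receives each transcript message, and its return value would be written into the caption overlay.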

Challenges we ran into

Most of the issues we ran into were related to the backend and its integration. Setting up our software architecture was especially challenging because we needed to continuously pass large amounts of data from the backend to the frontend, which required a good understanding of how the web works and how the components interact with one another. Since calls to the Google Speech-to-Text API must be made in the backend, we had to integrate it with the frontend so that transcribed messages were displayed in approximately real time. The main hurdle was the lag caused by constant calls between the frontend and the backend, which led us to integrate Socket.io into our codebase, a feat in and of itself. Initially, the audio stream would not record, which we discovered was a permission issue outside of our code, so we had to resolve it before the Google Speech-to-Text API would work. The API documentation was often hard to understand due to a lack of explanations and examples, so we relied on some trial and error and adapted the code to meet the needs of our application.

Accomplishments that we're proud of

We are proud to have created an application that improves the experience of online lectures using off-the-shelf technology. We wanted to keep the application straightforward so that we could get it running quickly. Despite having little familiarity with web development and Chrome extensions, we managed to create a frontend and a backend and, more importantly, link the two together into a functional application. In the process, we gained exposure to relevant web technologies and picked up research skills, which are critical to software development. We also collaborated effectively to polish our ideas, offer different approaches to solving complex problems, and complement each other’s skills. Finally, we learned how to seek help from mentors effectively: identifying issues beyond the scope of our knowledge and research, and using the pointers they provided to devise an effective solution.

What we learned

Since we all had little experience with web development, this was our first time using the relevant technologies in an integrated manner. In particular, connecting the frontend and the backend was a major challenge that we are proud to have completed, and it helped us better understand the architecture of web applications.

We learned a lot about the services we used, namely Chrome Extensions, the Google Speech-to-Text API, and Socket.io. This was our first time using these resources, and we are very happy with how we used them in our application.

Since our program is constantly communicating between the frontend and the backend, we decided to use Socket.io to facilitate these interactions, as it is designed for real-time, event-based communication. This vastly improved performance when displaying the transcribed message on the overlay compared to constantly making HTTP calls. Diagnosing errors was a constant part of development, especially when incorporating unfamiliar APIs into our codebase. In particular, although the Google Speech-to-Text API seemed imposing at first glance, we were able to read through the documentation, understand what the code was doing, and identify the errors preventing the service from running correctly. This was a great experience for us, and we were exposed to several great services during this hackathon.
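To make the comparison between pushing transcripts over a socket and repeatedly making HTTP calls more concrete, here is a simplified back-of-the-envelope model. It assumes the front end polls on a fixed schedule (at 0, one interval, two intervals, and so on) and that a pushed message is displayed as soon as it is produced. The function names and the sample numbers are invented for illustration, the model ignores network latency and per-request overhead, and it is not a measurement of EasyCC.

```javascript
// How long a transcript produced at `arrivalMs` waits before the next
// poll picks it up, assuming polls fire at 0, interval, 2 * interval, ...
function pollingDelay(arrivalMs, pollIntervalMs) {
  const nextPollMs = Math.ceil(arrivalMs / pollIntervalMs) * pollIntervalMs;
  return nextPollMs - arrivalMs;
}

// Compare request counts and added delay for polling versus push delivery
// over a session lasting `durationMs`.
function comparePollingToPush(arrivalsMs, pollIntervalMs, durationMs) {
  const delays = arrivalsMs.map((t) => pollingDelay(t, pollIntervalMs));
  const totalDelayMs = delays.reduce((sum, d) => sum + d, 0);
  return {
    // Polling sends a request every interval, whether or not text is ready.
    pollingRequests: Math.floor(durationMs / pollIntervalMs) + 1,
    // Push sends exactly one message per transcript.
    pushMessages: arrivalsMs.length,
    averagePollingDelayMs: arrivalsMs.length ? totalDelayMs / arrivalsMs.length : 0,
  };
}

// Four transcripts over two seconds, polled every 500 ms.
console.log(comparePollingToPush([100, 400, 1000, 1300], 500, 2000));
// → { pollingRequests: 5, pushMessages: 4, averagePollingDelayMs: 175 }
```

Shortening the polling interval reduces the added delay but increases the number of requests, which is the trade-off that a persistent connection avoids.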

What's next for EasyCC

EasyCC has a lot of potential to become a viable captioning service. We hope to add features that will improve our extension and make it even more accessible and useful. For one, we would like to add a translation API, which would let users around the world follow and communicate in different languages. We could also publish EasyCC to the Chrome Web Store so that our service is readily available to anybody.

Built With

Submitted to

nwHacks 2021
- Winner Honorable Mention

Created by

I worked with Shuhaib on capturing the audio and designing the front end with HTML and CSS.

Yale Wang
I worked on configuring the live transcription using Google Cloud Speech-to-Text, as well as piping the data to the front end to be displayed.

Benji Li
I worked on obtaining the credentials required for using the Google Speech-to-Text API and integrating the frontend and the backend of our Chrome extension.

Andrew Wang
I worked on capturing desktop audio and developed the Chrome extension, as well as the frontend and UI.

Shuhaib Mehri