Inspiration
There are thousands of languages in the world, and nearly all of the most popular ones can be translated live with software such as Google Translate. For true live translation, some software captures speech through a microphone so the user never has to type the message. This works great for spoken languages. But what about the roughly 70 million people who cannot speak or cannot hear and use sign language to communicate? Sign language differs from other languages in that it is not spoken, and if it can't be spoken, it can't be translated by any current translation software. So how do we translate sign language? With a camera, of course, and a bit of AI magic! Cameras give computers vision, allowing them to see the world. We therefore decided to build a sign language translator for ASL (American Sign Language). Refer to this link for the ASL alphabet: https://www.startasl.com/american-sign-language-alphabet/
What it does
LessonTutor is a learning platform that helps individuals collaborate with seasoned professionals and build the soft skills needed to advance in their careers. Unlike many learning platforms, however, we specifically address the challenges faced by individuals who cannot hear or speak.
How we built it
Frontend UI
The frontend UI is built with Bootstrap, HTML, CSS, Sass, and JavaScript. When a user first logs in, they enter a guided walkthrough of the certification process. Our interface supports multiple user roles (Learner, Administrator, Course Instructor), and each role gets a walkthrough experience tailored to its permissions and needs. Data is persisted through the backend REST API server in the SQLite database.
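The role-to-walkthrough mapping described above could be sketched server-side roughly as follows. All role and step names here are illustrative placeholders, not the actual LessonTutor values:

```python
# Hypothetical sketch of mapping each user role to its guided walkthrough.
# The roles come from the writeup; the step names are invented for illustration.
WALKTHROUGHS = {
    "learner": ["create_profile", "browse_courses", "enroll", "start_certification"],
    "instructor": ["create_profile", "publish_course", "schedule_classes"],
    "administrator": ["review_users", "manage_courses", "issue_certificates"],
}

def walkthrough_for(role: str) -> list:
    """Return the ordered walkthrough steps for a role (empty list if unknown)."""
    return WALKTHROUGHS.get(role.lower(), [])
```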
Backend Server
Our backend server is built with Django and Python. We wrote the server-side code in Python with type annotations so we could run static type checking and catch errors early. The server stores user data and schema, course evaluations, feedback, and certificates in a SQLite database, and it exposes several REST APIs consumed by our UI. Decoupling the server from the UI is beneficial: we can scale up the server without impacting the UI, and the UI can be swapped out as needed. We also leverage the SMTP protocol to send users email notifications about class schedules.
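The schedule-notification emails mentioned above could be built with Python's standard `email` and `smtplib` modules along these lines. The addresses, SMTP host, and credentials below are placeholders, not our actual configuration:

```python
import smtplib
from email.message import EmailMessage

def build_schedule_email(to_addr: str, course: str, starts_at: str) -> EmailMessage:
    """Build a class-schedule notification message for one user."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"  # placeholder sender address
    msg["To"] = to_addr
    msg["Subject"] = "Reminder: {} starts at {}".format(course, starts_at)
    msg.set_content("Your class '{}' is scheduled for {}.".format(course, starts_at))
    return msg

# Sending is a separate step over SMTP (commented out so the sketch has no side effects):
# with smtplib.SMTP("smtp.example.com", 587) as s:
#     s.starttls()
#     s.login("user", "password")
#     s.send_message(build_schedule_email("student@example.com", "ASL 101", "10:00 UTC"))
```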
Agora Integration
We used Agora's Web SDK to integrate real-time audio and video delivery into the application, initially authenticating with the appId and appCertificate from the Agora Console. However, for stronger security, Agora recommends upgrading projects to token-based authentication. We therefore deployed a token generator on the app server that issues a unique token per user, expiring after 3600 seconds.
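The expiry mechanic of such a token generator can be illustrated with a minimal HMAC-based sketch. Note this is not Agora's actual token format (Agora ships official token-builder libraries for that); it only shows how a server-side secret and a 3600-second TTL fit together:

```python
import base64
import hashlib
import hmac
import json
import time

APP_CERTIFICATE = b"app-certificate-placeholder"  # placeholder secret, kept server-side
TOKEN_TTL = 3600  # seconds, matching the expiry described above

def generate_token(app_id, channel, uid, now=None):
    """Sign (app_id, channel, uid, expiry) with HMAC-SHA256 and base64-encode it."""
    now = int(time.time()) if now is None else now
    payload = json.dumps({"app_id": app_id, "channel": channel,
                          "uid": uid, "expires": now + TOKEN_TTL}).encode()
    sig = hmac.new(APP_CERTIFICATE, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig + payload).decode()

def is_expired(token, now):
    """Check the embedded expiry (skipping the 32-byte SHA-256 signature prefix)."""
    payload = base64.urlsafe_b64decode(token.encode())[32:]
    return now >= json.loads(payload)["expires"]
```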
PubNub Integration (Agora Marketplace Extension)
We also leveraged PubNub (https://www.agora.io/en/partners/pubnub/), which powers remote interaction and collaboration through chat and notifications. With it, users can chat with fellow students as well as lecturers, enabling faster communication.
Sign Language Translator (CNN)
This component takes the live video feed from Agora and splits it into individual frames. A CNN model was trained on the ASL alphabet dataset (https://www.kaggle.com/datasets/grassknoted/asl-alphabet), which contains roughly 90,000 images; the trained model then runs predictions on the live frames from Agora. We support detection of letters (A-Z), numbers (0-9), and special tokens (space, delete, nothing), which help with better sentence structuring.
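The final decoding step, mapping the model's per-frame output back to a class label, can be sketched as follows. The class count and index ordering here are assumptions for illustration, not the exact layout of our trained model:

```python
# Hypothetical label table: 26 letters + 10 digits + 3 special tokens = 39 classes.
# The actual ordering depends on how the training labels were indexed.
LABELS = ([chr(c) for c in range(ord("A"), ord("Z") + 1)]
          + [str(d) for d in range(10)]
          + ["space", "delete", "nothing"])

def decode_prediction(probabilities):
    """Map the CNN's softmax output for one video frame to its class label."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LABELS[best]
```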
Challenges we ran into
- We initially struggled to obtain an accurate Convolutional Neural Network (CNN) model. Adding max-pooling layers, which make the sharp features of the video frames stand out clearly, improved accuracy.
- We had some initial trouble finding Agora Marketplace extensions that fit our scenario. Since our application targets users who cannot hear or speak, a built-in chat feature made absolute sense, which is why we chose to proceed with PubNub.
Accomplishments that we're proud of
We are proud to have picked up new frameworks such as Agora and PubNub in a very short time, and to have implemented the app across many different technologies. We chose not the easiest software and tools, but the right ones for the requirements. For example, even though it was more work, we used Python with static type checking throughout our stack, leading to less error-prone code.
What we learned
This project helped sharpen our skills as developers and exposed us to new technologies such as Agora, PubNub, and CNN models. We explored several research papers to train an accurate Convolutional Neural Network. More importantly, we learned about the challenges posed by disabilities and took a stab at providing a solution from our end. This hackathon has inspired us to contribute more actively to this cause and make LessonTutor more accessible.
What's next for LessonTutor
Currently, the sign language translation works letter by letter. We want to extend the solution to translate whole words directly, which would save time and improve communication.