Non-verbal children may have vocal or hearing impairments that hinder their ability to communicate verbally, and the online mode of learning makes it even harder for them to connect with teachers in the classroom.
Augmentative and Alternative Communication (AAC) refers to the tools and strategies used to help people who have little or no speech to communicate. AAC apps assist individuals in forming phrases and sentences using symbols and predictive keyboards, and can be used by adults and children for varied reasons, from a temporary disability to a permanent one.
What it does
Sampark offers augmentative and alternative communication to people with speech disabilities. Other solutions exist for in-person use (Awaz, CoughDrop) that augment communication with pictures and symbols, converting them to voice and/or text, but with the current situation of repeated lockdowns, in-person communication alone is not enough. Sampark therefore also offers virtual communication with video calling and sign language detection, so the user can connect and communicate with other people in two modes, virtual and in-person.
- Pictorial text to speech.
- Predicts sentences word by word.
- Video call feature with additional picture and text communication.
- Sign language to text support while video conferencing.
- Mouse control using the face, for children who can't use sign language or a mouse.
- Snap photos to speech (object detection), crowd-sourcing.
- Keyboard accessibility.
- Cross-platform web app.
How we built it
Brief Elaboration of Services
1. The Web App:
The web app is written in ReactJS and has three major screens: the landing page, the picture board and the video call component. The landing page is quite simple and just contains links to the other two parts. The picture board, which supports in-person communication, contains many pictures and symbols sourced from Mulberry, ARA, SAAC and Global Symbols. When you click on a picture, its underlying text is added to a top bar, whose content is what will be converted to speech once the 'speak' button is clicked. The picture board is divided into categories, each of which opens to show more relevant pictures and symbols. There is also an option to add your own pictures to the board, though it is only a prototype and the back-end service to support the operation does not exist as of now.
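The click-to-speak flow above can be sketched as a small state model (function and variable names here are illustrative, not the app's actual code): each picture click appends its label to the top bar, and 'speak' hands the joined sentence to the browser's Web Speech API.

```javascript
// Minimal sketch of the picture-board top bar: picture clicks append
// their labels, and speak() converts the assembled sentence to speech.
function createPhraseBar() {
  const words = [];
  return {
    // called when a picture or symbol is clicked
    addSymbol(label) { words.push(label); },
    clear() { words.length = 0; },
    text() { return words.join(' '); },
    speak() {
      const sentence = words.join(' ');
      // speechSynthesis only exists in the browser; guard for Node
      if (typeof speechSynthesis !== 'undefined') {
        speechSynthesis.speak(new SpeechSynthesisUtterance(sentence));
      }
      return sentence;
    },
  };
}
```

In the app the equivalent state lives in React components; this standalone version just shows the append-then-speak behaviour.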
2. TURN Server
We have created a video chat using WebRTC, with WebSockets for signalling and a TURN server to relay media when a direct peer-to-peer connection cannot be established.
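The ICE configuration a peer would use looks roughly like this (the STUN URL is Google's public server; the TURN URL and credentials below are placeholders, not our real servers):

```javascript
// Build the ICE server list for a WebRTC peer connection: a STUN
// server for NAT discovery plus a TURN server as a media relay
// fallback when peers cannot connect directly.
function buildIceConfig(turnUrl, username, credential) {
  return {
    iceServers: [
      { urls: 'stun:stun.l.google.com:19302' },
      { urls: turnUrl, username, credential },
    ],
  };
}

// In the browser this config is passed straight to RTCPeerConnection:
// const pc = new RTCPeerConnection(
//   buildIceConfig('turn:turn.example.com:3478', 'user', 'secret'));
```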
3. Typing Augmentation API (Autocomplete)
This works like the Google keyboard's autocomplete and has three parts:
- It recommends the word you're typing: via a Trie containing 100k+ words.
- It recommends the next word: via two models, both based on LSTM, trained on two different datasets.
- It fixes spelling errors: via the same Trie. We plan to anonymously store the text users type and later use that data to train our models, like a feedback loop.
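The Trie that backs the first and third parts can be sketched as follows (simplified; the real service loads 100k+ words, and this toy spell fixer only tries single-character substitutions):

```javascript
// A Trie supporting prefix completion and a naive edit-distance-1
// spelling fix, as used by the autocomplete API.
class Trie {
  constructor() { this.root = {}; }
  insert(word) {
    let node = this.root;
    for (const ch of word) node = node[ch] ?? (node[ch] = {});
    node.end = true;
  }
  has(word) {
    let node = this.root;
    for (const ch of word) { node = node[ch]; if (!node) return false; }
    return !!node.end;
  }
  // all dictionary words starting with `prefix`
  complete(prefix) {
    let node = this.root;
    for (const ch of prefix) { node = node[ch]; if (!node) return []; }
    const out = [];
    (function walk(n, acc) {
      if (n.end) out.push(prefix + acc);
      for (const [ch, child] of Object.entries(n)) {
        if (ch !== 'end') walk(child, acc + ch);
      }
    })(node, '');
    return out;
  }
  // naive spell fix: substitute each position with a-z, keep the
  // first candidate found in the dictionary
  fix(word) {
    if (this.has(word)) return word;
    for (let i = 0; i < word.length; i++) {
      for (const ch of 'abcdefghijklmnopqrstuvwxyz') {
        const cand = word.slice(0, i) + ch + word.slice(i + 1);
        if (this.has(cand)) return cand;
      }
    }
    return word;
  }
}
```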
4. Fingerpose Classifier (Sign detection and its conversion to text)
The finger pose classifier uses the hand landmarks detected by TensorFlow.js' handpose model. As of now it can detect hand gestures like "Victory", "Thumbs Up", "I love you", "Thumb Down" and "Hello" in the webcam feed.
5. Image storage and retrieval API
This service intends to let users create private and public boards for pictures that are not currently available in the application. The back-end service is not implemented yet, but the front-end prototype is ready and present in the app. The back-end service, when implemented, can store the pictures in an S3 bucket and save the reference, under the name provided by the user, in persistent storage. To create boards or categories, there can be an additional field for that. Further, serving both public and private boards will require authorization support.
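Since the back end does not exist yet, here is a hypothetical sketch of the planned API shape, with an in-memory Map standing in for S3 and the persistent store. All names here are illustrative; the one design point it captures is that private boards are only returned to their owner.

```javascript
// In-memory stand-in for the planned board service: boards map
// user-supplied picture names to references (S3 object keys later),
// and private boards enforce owner-only access.
function createBoardStore() {
  const boards = new Map(); // name -> { owner, isPublic, pictures }
  return {
    createBoard(name, owner, isPublic) {
      boards.set(name, { owner, isPublic, pictures: new Map() });
    },
    addPicture(boardName, pictureName, ref) {
      boards.get(boardName).pictures.set(pictureName, ref);
    },
    getBoard(boardName, requester) {
      const b = boards.get(boardName);
      if (!b) return null;
      // authorization: private boards are visible to the owner only
      if (!b.isPublic && b.owner !== requester) return null;
      return [...b.pictures.keys()];
    },
  };
}
```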
6. Additional services
- Mouse control with face and eyes: For people with motor disabilities, we thought this would be a nice optional feature to have, optional because it is resource-heavy to run. Though this feature was not eventually integrated into the application, we made a working prototype in a Jupyter notebook. It lets the user control the mouse pointer with their face and click by blinking.
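The core of that prototype boils down to two small mappings (names and thresholds here are illustrative, not the notebook's exact values): the nose position in the normalized camera frame is mapped to screen coordinates, and a blink is registered when the eye aspect ratio (eye height over eye width) drops below a threshold.

```javascript
// Map a normalized nose position (0..1 in the camera frame) to screen
// pixels; mirror horizontally so moving the head left moves the
// pointer left in the mirrored webcam view.
function faceToPointer(noseX, noseY, screenW, screenH) {
  return {
    x: Math.round((1 - noseX) * screenW),
    y: Math.round(noseY * screenH),
  };
}

// A blink closes the eye, shrinking its height relative to its width.
const BLINK_THRESHOLD = 0.2;
function isBlink(eyeHeight, eyeWidth) {
  return eyeHeight / eyeWidth < BLINK_THRESHOLD;
}
```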
- Keyboard accessibility:
We added ARIA labels to the different components in the React code to support keyboard accessibility when the operating system's accessibility mode is turned on. There is a lot of scope here: we can add more labels, support keyboard shortcuts, and collect anonymous usage data to see which keyboard shortcuts users rely on most, for a better understanding of user behaviour.
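A shortcut layer like the one mentioned above could be dispatched from a small key map (the bindings below are hypothetical, not shortcuts the app currently has):

```javascript
// Map Ctrl+<key> combinations to app actions; plain typing is left
// untouched so the predictive keyboard still works normally.
const shortcuts = {
  s: 'speak',            // speak the current phrase
  c: 'clear',            // clear the top bar
  v: 'start-video-call',
};

function handleKey(key, ctrlKey) {
  if (!ctrlKey) return null;
  return shortcuts[key.toLowerCase()] ?? null;
}

// In the browser this would hang off the keydown event, e.g.:
// document.addEventListener('keydown',
//   e => dispatch(handleKey(e.key, e.ctrlKey)));
```

Logging which actions `handleKey` returns (anonymously) would give exactly the shortcut-usage data described above.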
Challenges we ran into
Initially, we faced a lot of problems collecting and then preparing the datasets for the models used in the autocompletion API. Shortly after, our laptops struggled with the load of fingerpose, which detects hand movements through heavy computation. But we overcame these problems and were soon able to reach a working solution. We also wanted to deploy the application, but due to time constraints, we could not do that.
Accomplishments that we're proud of
We are proud of the fact that this app has huge practical applications and can be used to help countless people.
What we learned
A lot of the technologies we used were new to us, hence we learnt a lot while applying them. We also learnt how to explore end-user problems and how to put ourselves in their shoes.
What's next for Sampark
We think that the project has a lot of potential and can solve many problems faced by the user.
- Integration with other apps: The intention is to develop Sampark into a single communication device for communicating with any entity, be it humans or computer applications; for example, Sampark can simplify input for other apps like YouTube, Google, etc. We aim to provide a generic input interface that can be used anywhere the user needs to provide input.
- Alexa support: Alexa support that takes text as input and responds to the sentences typed by the user, aiding the child's development.
- Multilingual Support: Support multiple languages, with emphasis on Indian regional languages.
- Port to a cross-platform mobile app: A mobile app will provide a more portable solution, but to achieve that, we need to improve and innovate on design and user experience for mobile devices.
- Sharable public boards: Crowdsourcing and sharing the images and symbols used in the app will very quickly increase the number of ways users can express themselves.
- Improve video calls: There is a lot of scope for improvement in the video calls: the overall UI, the mute and video toggle buttons, and the positioning of the quick chat section. We also intend to make all the pictures from in-person communication available in the video call section, shown based on user search rather than all at once. We further intend to support calls with more than 2 people, which brings issues like what happens when multiple users start speaking at once.
- Authentication and authorization: Personalised behaviour to some extent, for example private picture boards and phrase recommendations tailor-made for the user, all made possible through user data.