Inspiration

Volunteering at a school for people with hearing disabilities opened my eyes to how hard the communication barrier can be, which is why I wanted to use AI to help tackle the problem.

What it does

The project is a website that provides tutorial lessons teaching the user the basics of ASL. Each lesson is composed of four parts:

  1. Show the user a video of at most 3 seconds that demonstrates how to perform a certain sign in ASL.
  2. Let the user record a video of themselves performing the sign.
  3. A program in the front end checks whether the sign was performed correctly.
  4. If the sign was performed correctly, the website moves on to the next word (sign). Otherwise, the website generates a video showing the user's face performing the sign correctly, and the lesson restarts from part 2.
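The four parts above form a small loop, which can be sketched as a state machine. This is only an illustration of the flow described here, not the project's actual front-end code; the names `Step`, `LessonState`, and `advance` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Step(Enum):
    DEMONSTRATE = auto()  # part 1: show the short tutorial clip
    RECORD = auto()       # part 2: the user records an attempt
    CHECK = auto()        # part 3: the attempt is checked
    CORRECTION = auto()   # part 4 (on failure): show the generated correction video


@dataclass(frozen=True)
class LessonState:
    sign_index: int = 0               # which word (sign) the user is on
    step: Step = Step.DEMONSTRATE


def advance(state: LessonState, sign_correct: Optional[bool] = None) -> LessonState:
    """Return the next lesson state; `sign_correct` is only read in the CHECK step."""
    if state.step is Step.DEMONSTRATE:
        return LessonState(state.sign_index, Step.RECORD)
    if state.step is Step.RECORD:
        return LessonState(state.sign_index, Step.CHECK)
    if state.step is Step.CHECK:
        if sign_correct is None:
            raise ValueError("the CHECK step needs a result for the attempt")
        if sign_correct:
            # Correct sign: move on to the next word and demonstrate it.
            return LessonState(state.sign_index + 1, Step.DEMONSTRATE)
        # Incorrect sign: stay on the same word and show the correction video.
        return LessonState(state.sign_index, Step.CORRECTION)
    # After the correction video, restart from part 2 (recording).
    return LessonState(state.sign_index, Step.RECORD)
```

Keeping the flow in one pure function like this makes the retry path (part 4 back to part 2) easy to test separately from the video and webcam code.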

How we built it

  1. React for the frontend
  2. FastAPI for the backend
  3. A Kaggle notebook that removes unnecessary frames and extracts the necessary ones
  4. A Kaggle notebook that generates the user's landmarks for each sign
  5. Data preprocessed with the intent of building a neural network
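The notebooks themselves are not shown here, so the following is only a rough sketch of what frame filtering and landmark preprocessing could look like. It assumes each frame has already been reduced to a non-empty list of 2D landmark points; the function names, the `min_motion` threshold, and the normalisation scheme are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Frame = Sequence[Point]


def drop_static_frames(frames: Sequence[Frame], min_motion: float = 0.02) -> List[Frame]:
    """Keep a frame only if its landmarks moved enough since the last kept frame.

    Assumes every frame is non-empty and has the same number of landmarks.
    """
    kept: List[Frame] = []
    for frame in frames:
        if not kept:
            kept.append(frame)  # always keep the first frame
            continue
        prev = kept[-1]
        # Mean distance travelled by the landmarks since the last kept frame.
        motion = sum(math.dist(p, q) for p, q in zip(prev, frame)) / len(frame)
        if motion >= min_motion:
            kept.append(frame)
    return kept


def normalise_landmarks(frame: Frame) -> List[Point]:
    """Centre landmarks on their centroid and scale the farthest one to distance 1."""
    cx = sum(x for x, _ in frame) / len(frame)
    cy = sum(y for _, y in frame) / len(frame)
    centred = [(x - cx, y - cy) for x, y in frame]
    # Fall back to 1.0 when all points coincide, to avoid dividing by zero.
    scale = max(math.hypot(x, y) for x, y in centred) or 1.0
    return [(x / scale, y / scale) for x, y in centred]
```

Normalising for position and scale in this way is one common choice before feeding landmarks to a neural network, since it makes the input less sensitive to where the user stands relative to the camera.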

Challenges we ran into

  • Finding a good API for face swapping that is both efficient and versatile.
  • Creating an accordion whose header text also contains a link to another page of the project (which we gave up on).
  • GitHub stopped working in the last hour.
  • We could not get the webcam to work in deployment.

Accomplishments that we're proud of

  • Using FILM to interpolate a group of frames into a video.
  • Managing to develop a website in TypeScript without having used the language before.

What we learned

Simon:

  • How to use FastAPI
  • How to use FILM to add new frames between two frames

Hannah:

  • How to use TypeScript to develop a webpage
  • How to add styles, links, and toggles, and how to enable the webcam to record video

Kevin:

  • How to pre-process data
  • How to analyze videos using OpenCV and MediaPipe

What's next

  • Train a large model on a full ASL dataset (ASL Citizen or ASL YouTube)
  • Try to 'predict' the user's next position in order to better nudge them during the learning process
  • Instead of face swapping, use Stable Diffusion with ControlNet so that the result is more seamless and 'believable'
