What inspired us

Sign language is one of the oldest and most natural forms of communication, which led us to build a real-time system that uses neural networks to recognize American Sign Language (ASL) fingerspelling. Our goal is to assist people who need help with visual and oral communication. We use a Convolutional Neural Network (CNN) to recognize hand gestures in images captured by a camera. The training and testing datasets for the CNN are derived from the position and orientation of the hand in each image. Each hand image is first filtered and then classified to predict the gesture class it belongs to. The filtered images are also the ones used to train the CNN.

American Sign Language is one of the most popular and oldest sign languages in the world. Since sign language is the primary mode of communication for many deaf and hard-of-hearing people, we wanted to build a technology that serves them better. It should be especially useful in video conferences and college lectures. A text-to-speech converter is also built into the UI to make the output clearer.

What we do

Our project focuses on building a model that recognizes fingerspelling hand gestures and combines the recognized gestures into complete words. The gestures we aim to train on are shown in the image below.
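As a rough illustration of how individual gestures could be combined into a word, the sketch below collapses a stream of per-frame letter predictions into text. The function name `assemble_word`, the `min_run` parameter, and the convention that `None` means "no hand in frame" are assumptions made for this example, not details of our implementation.

```python
from typing import Iterable, List, Optional


def assemble_word(frame_labels: Iterable[Optional[str]], min_run: int = 3) -> str:
    """Collapse per-frame letter predictions into a word.

    A letter is accepted once it has been predicted for `min_run`
    consecutive frames. A `None` entry (assumed to mean "no hand
    detected") resets the run, so lowering the hand between signs
    allows double letters such as the "LL" in "HELLO".
    """
    word: List[str] = []
    current: Optional[str] = None  # letter of the run in progress
    run = 0                        # length of that run in frames

    for label in frame_labels:
        if label is None:
            current, run = None, 0
            continue
        if label == current:
            run += 1
        else:
            current, run = label, 1
        # Append exactly once, at the moment the run becomes long enough.
        if run == min_run:
            word.append(label)

    return "".join(word)


if __name__ == "__main__":
    frames = ["H"] * 4 + ["E"] * 3 + ["L"] * 3 + [None] + ["L"] * 3 + ["O"] * 5
    print(assemble_word(frames))
```

Requiring a short run of identical predictions filters out single-frame misclassifications, at the cost of a small delay before each letter appears.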

How we did it

In this hand detection approach, the first step is to find the hand in an image captured by a webcam. For this we use the MediaPipe library, which provides hand detection and landmark tracking. Once the hand is located in the image, we define the Region of Interest (ROI) around it. Processing the ROI with the OpenCV library improved the accuracy of our model: we convert the ROI to grayscale, smooth it with a Gaussian blur (a filter available in OpenCV), and then convert the grayscale image to binary using threshold and adaptive threshold methods.

Challenges we ran into

This approach has several limitations. For good results, the hand must be positioned against a clean, well-lit background. In practice these ideal conditions are often not met: backgrounds can be cluttered, and lighting can be less than optimal.

To address these challenges, we explored several alternative methods and eventually arrived at a different solution. We first detect the hand within a frame using the MediaPipe framework and extract its landmarks. We then draw and connect these landmarks on a plain white canvas. Because the classifier sees only the drawn skeleton and not the original pixels, this approach works across diverse backgrounds and lighting conditions, which makes the method more robust in real-world applications.

Accomplishments that we're proud of

Training on the data set and reaching an accuracy that is consistently over 93%.

What's next for American Sign Language Detection

Evolving the data set and increasing the sample size so that it can include phrases.

Built With

  • cv
  • math
  • mediapipe
  • np
  • opencv
  • pyth
  • tkinter