Inspiration
Most AI sign language systems focus on narrow datasets, leaving speakers of many languages underrepresented. To address this, I built my own dataset of 11,025 samples across 49 classes.
- This dataset is not publicly available, as it was curated specifically for this project.
- Each sample was generated by capturing a hand gesture and processing it through MediaPipe.
- MediaPipe extracted keypoints (landmarks) and connected them into a skeleton-like representation of the hand (points + lines).
Instead of raw hand photos, the dataset contains these skeletonized images, which makes the model:
- Less biased by skin tone, background, or lighting.
- More robust and generalizable.
- Privacy-preserving.
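To illustrate the skeletonization idea, here is a minimal sketch of rendering hand keypoints into a skeleton-like image. It is not the project's actual pipeline: it uses only NumPy, a small subset of MediaPipe's 21-point hand connections, and hypothetical hand-coded landmarks (real ones would come from MediaPipe's hand detector).

```python
import numpy as np

# A few of MediaPipe's hand-landmark connections (index pairs); the full
# set lives in mediapipe.solutions.hands.HAND_CONNECTIONS.
CONNECTIONS = [(0, 1), (1, 2), (2, 3), (3, 4),   # thumb
               (0, 5), (5, 6), (6, 7), (7, 8)]   # index finger

def rasterize_skeleton(landmarks, size=128):
    """Draw normalized (x, y) keypoints and their connecting lines onto a
    blank grayscale canvas, mimicking a skeletonized dataset sample."""
    img = np.zeros((size, size), dtype=np.uint8)
    pts = (np.asarray(landmarks) * (size - 1)).astype(int)
    for a, b in CONNECTIONS:
        # Interpolate along the segment instead of using a drawing library.
        for t in np.linspace(0.0, 1.0, size):
            x, y = np.round(pts[a] * (1 - t) + pts[b] * t).astype(int)
            img[y, x] = 255
    for x, y in pts:
        img[y, x] = 255
    return img

# Hypothetical normalized landmarks for the points referenced above.
demo = [(0.5, 0.9), (0.45, 0.8), (0.4, 0.7), (0.35, 0.6), (0.3, 0.55),
        (0.5, 0.6), (0.5, 0.45), (0.5, 0.35), (0.5, 0.25)]
skeleton = rasterize_skeleton(demo)
```

Because the canvas starts black and only the skeleton pixels are lit, the resulting images carry no skin tone, background, or lighting information.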
What it does
A real-time Hindi Sign Language Recognition System that translates hand gestures into Hindi characters, represented as English transliterations or phonetic sounds for pronunciation, with a model accuracy of 92%. This project is an attempt to prove that AI is meant for everyone, bringing people closer by bridging communication barriers through gestures and language.
This project was created for the Code With Kiro Hackathon (Best Wildcard / Freestyle Category). It highlights how AI can promote inclusivity, accessibility, and diversity, ensuring that technology benefits everyone, regardless of language or ability.
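To show what the transliteration output looks like, here is a minimal sketch of mapping a predicted class index to a Devanagari character and its English phonetic form. The table below is a hypothetical excerpt; the project's actual 49-class mapping is not published.

```python
# Hypothetical excerpt of a class-index -> (Devanagari, transliteration)
# table; the real mapping used by the project is not public.
CLASS_MAP = {
    0: ("अ", "a"),
    1: ("आ", "aa"),
    2: ("क", "ka"),
    3: ("ख", "kha"),
}

def decode_prediction(class_index):
    """Return the Hindi character with its phonetic transliteration."""
    char, phonetic = CLASS_MAP[class_index]
    return f"{char} ({phonetic})"

print(decode_prediction(2))  # → क (ka)
```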
How we built it
The recognition model is a Convolutional Neural Network (CNN) built specifically for skeletonized gesture classification.
CNN Structure:
- Input Layer: Skeletonized hand images from MediaPipe keypoints.
- Conv2D Layer 1: 32 filters, 3×3 kernel, ReLU activation.
- MaxPooling Layer 1: 2×2 pool size.
- Conv2D Layer 2: 64 filters, 3×3 kernel, ReLU activation.
- MaxPooling Layer 2: 2×2 pool size.
- Flatten Layer.
- Dense Layer 1: 128 neurons, ReLU activation.
- Dropout Layer: 0.5 (to reduce overfitting).
- Dense Layer 2 (Output): 49 neurons, Softmax activation (for 49 classes).
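The layer stack above can be written in Keras roughly as follows. This is a sketch: the 128×128 grayscale input size and the Adam optimizer are assumptions, since the submission does not state them.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 1), num_classes=49):
    """CNN matching the layer list above; input resolution is assumed."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),                       # reduce overfitting
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```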
Performance:
- Achieved 92% accuracy on the validation set.
- Balanced recognition across all 49 classes.
- Optimized for real-time inference using TensorFlow and GPU acceleration.
This accuracy demonstrates the strength of using skeletonized MediaPipe data combined with CNNs.
Challenges we ran into
The main challenges were Python library and dependency issues.
Accomplishments that we're proud of
- 92% model accuracy
- The system predicts gestures in real time and works accurately, displaying a real-time confidence score with each prediction.
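The real-time confidence score can be read straight from the softmax output. Here is a minimal sketch with a toy 49-class probability vector; the 0.8 rejection threshold is an assumed value, not the project's.

```python
import numpy as np

def interpret(probs, threshold=0.8):
    """Pick the top class and its softmax confidence; reject
    low-confidence frames so the on-screen label doesn't flicker."""
    probs = np.asarray(probs)
    idx = int(np.argmax(probs))
    conf = float(probs[idx])
    return (idx, conf) if conf >= threshold else (None, conf)

# Toy 49-class softmax output concentrated on class 7.
probs = np.full(49, 0.1 / 48)
probs[7] = 0.9
print(interpret(probs))  # confident: accepted as class 7
```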
What we learned
Kiro IDE, CNN models, TensorFlow, Keras, and NumPy.
What's next for Hindi Sign Language Recognition System Real Time
Planned enhancements include:
- Sentence Builder: Combine recognized letters into words and full sentences.
- Multi-Language Mode: Output sentences in both Hindi & English.
- Pronunciation Extension: Expand phonetic mapping for better non-native accessibility.
- Text-to-Speech (TTS): Convert recognized sentences into natural speech.
- Custom Gestures: Let users define and map their own gestures.