SilentVoice_BD
Inspiration
Communication is a right, not a luxury. Yet for the millions of people who rely on Bangla Sign Language (BdSL) as their primary means of communication, that right often goes unfulfilled in daily interactions. According to UNICEF, around 13.9 million people use BdSL as their main way of communicating, but most of society does not understand it. The resulting barrier isolates an entire community from full participation in education, healthcare, employment, and social life.
In a world that's rapidly advancing in AI and technology, we realized that this communication gap represents one of the most pressing accessibility challenges in Bangladesh. We were inspired by the potential to leverage modern computer vision and machine learning to break down these barriers and create a more inclusive society where every voice can be heard.
What it does
SilentVoice_BD is a real-time Bangla Sign Language recognition and translation system that bridges the communication gap between the deaf/hard-of-hearing community and the hearing community. The system:
- Real-time Translation: Converts BdSL gestures into Bangla text and speech instantly
- Multi-input Support: Works with both uploaded video files and live webcam feeds
- Interactive Learning: Provides a practice mode with scoring for users learning sign language
- Continuous Improvement: Incorporates user feedback to improve translation accuracy
- Accessibility Features: Offers downloadable transcripts and live group call subtitles
The platform serves different user types:
- Anonymous users get limited video-to-text/speech translation and basic sign library access
- Registered users enjoy unlimited translations, live webcam support, practice modes, interactive correction feedback, and downloadable transcripts
- Admins can manage user accounts, configure access levels, manage AI models, and monitor system health
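The three user tiers above can be thought of as a small permission table. The sketch below is only an illustration of that idea: the `Tier` enum, the feature names, and the `can_use` helper are all hypothetical, and the real access control lives in the Spring Boot backend rather than in Python. It also assumes each tier has its own feature set, since the description does not say whether admins inherit registered-user features.

```python
from enum import Enum


class Tier(Enum):
    ANONYMOUS = "anonymous"
    REGISTERED = "registered"
    ADMIN = "admin"


# Hypothetical feature names, derived from the tier descriptions above.
PERMISSIONS = {
    Tier.ANONYMOUS: {"video_translation_limited", "sign_library_basic"},
    Tier.REGISTERED: {
        "video_translation",
        "live_webcam",
        "practice_mode",
        "correction_feedback",
        "transcript_download",
    },
    Tier.ADMIN: {
        "manage_users",
        "configure_access",
        "manage_models",
        "monitor_health",
    },
}


def can_use(tier: Tier, feature: str) -> bool:
    """Return True if the given tier is allowed to use the feature."""
    return feature in PERMISSIONS[tier]
```

For example, `can_use(Tier.ANONYMOUS, "live_webcam")` is false under this table, which matches live webcam support being listed only for registered users.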
How we built it
Our architecture combines a trained recognition model, a computer vision pipeline, a Java backend, and a React frontend:
Dataset & Training
- Utilized the BDSLW60 dataset for training our sign language recognition model
- Implemented data preprocessing and augmentation techniques to improve model robustness
- Created additional synthetic training data to expand vocabulary coverage
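To make the augmentation step concrete, here is a minimal sketch of one common approach for landmark data: apply a single random scale per clip and add small Gaussian jitter to every keypoint. This is an assumption about what augmentation could look like, not a description of the pipeline we actually ran; the function name, the `(x, y)` frame layout, and the default parameters are hypothetical.

```python
import random

Frame = list[tuple[float, float]]  # one (x, y) pair per landmark


def augment_sequence(
    frames: list[Frame],
    rng: random.Random,
    jitter_std: float = 0.01,
    scale_range: tuple[float, float] = (0.9, 1.1),
) -> list[Frame]:
    """Return a randomly scaled and jittered copy of a landmark sequence."""
    # One scale factor per clip keeps the motion consistent across frames.
    scale = rng.uniform(*scale_range)
    augmented = []
    for frame in frames:
        augmented.append(
            [
                (
                    x * scale + rng.gauss(0.0, jitter_std),
                    y * scale + rng.gauss(0.0, jitter_std),
                )
                for x, y in frame
            ]
        )
    return augmented
```

Passing a seeded `random.Random` makes the augmentation reproducible, which helps when comparing training runs.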
Machine Learning Architecture
- Built a Bidirectional Long Short-Term Memory (BiLSTM) neural network for sequential gesture recognition
- Implemented advanced sequence modeling to capture temporal dependencies in sign language gestures
- Used transfer learning techniques to optimize performance with available data
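The core idea of a BiLSTM is that one LSTM reads the gesture sequence forward, another reads it backward, and their per-frame outputs are concatenated. The pure-Python sketch below illustrates only that mechanism with untrained random weights. It is not our trained model: the class and function names and the layer sizes are hypothetical, and a real implementation would use a deep learning framework.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """A single LSTM cell with random, untrained weights."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.w = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for g in "ifgo"
        }
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, g: str, z: list[float]) -> list[float]:
        return [
            sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(self.w[g], self.b[g])
        ]

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in self._gate("i", z)]
        f = [sigmoid(v) for v in self._gate("f", z)]
        g = [math.tanh(v) for v in self._gate("g", z)]
        o = [sigmoid(v) for v in self._gate("o", z)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


def run_lstm(cell: LSTMCell, seq: list[list[float]]) -> list[list[float]]:
    """Run the cell over a sequence and return the hidden state at each step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in seq:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def bilstm(seq, fwd: LSTMCell, bwd: LSTMCell) -> list[list[float]]:
    """Concatenate forward and backward hidden states for every time step."""
    forward = run_lstm(fwd, seq)
    # Reverse the input, then reverse the outputs so index t lines up again.
    backward = run_lstm(bwd, seq[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

With a hidden size of 3 per direction, each frame ends up with a 6-dimensional representation that depends on both earlier and later frames in the clip.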
Computer Vision Pipeline
- Implemented MediaPipe for real-time hand and body pose detection
- Developed custom preprocessing algorithms to normalize gesture data
- Created a robust feature extraction system that captures the nuances of BdSL
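One simple way to normalize gesture data is to make landmarks independent of where the hand sits in the frame and how close it is to the camera. The sketch below centers the points on a reference landmark and rescales them by the largest distance from it. It is an illustrative assumption rather than our exact preprocessing; `normalize_landmarks` is a hypothetical helper, and index 0 is used as the reference because MediaPipe Hands reports the wrist as landmark 0.

```python
import math


def normalize_landmarks(
    points: list[tuple[float, float]], ref_index: int = 0
) -> list[tuple[float, float]]:
    """Translate landmarks so the reference point is the origin, then scale to unit size."""
    ref_x, ref_y = points[ref_index]
    centered = [(x - ref_x, y - ref_y) for x, y in points]
    # Largest distance from the reference point defines the scale.
    scale = max(math.hypot(x, y) for x, y in centered)
    if scale == 0.0:
        return centered  # degenerate frame: every point coincides with the reference
    return [(x / scale, y / scale) for x, y in centered]
```

After this step, the same handshape produces roughly the same coordinates whether the signer is near or far from the webcam.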
Backend Development
- Used the Spring Boot (Java) framework to build a scalable REST API
- Designed comprehensive user management and authentication systems
- Implemented secure endpoints for video processing and translation services
- Built feedback collection and model improvement pipelines
Frontend Development
- Built a responsive, interactive user interface with React.js
- Integrated the webcam through WebRTC
- Used Material-UI components for an accessibility-first design
- Displayed translations in real time alongside a confidence score
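A confidence score for a translation can be derived by turning the model's raw class scores into probabilities with a softmax and reporting the highest one. The sketch below shows that calculation under stated assumptions: the function names, the placeholder labels, and the 0.6 threshold are hypothetical, and we do not claim this is how the deployed system computes its score.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(
    logits: list[float], labels: list[str], min_confidence: float = 0.6
):
    """Return (label, confidence); label is None when confidence is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    label = labels[best] if confidence >= min_confidence else None
    return label, confidence
```

Returning `None` for low-confidence frames lets the interface show nothing rather than a likely wrong word.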
Challenges we ran into
Data Limitations
Working with the BDSLW60 dataset presented unique challenges:
- Limited vocabulary: The dataset contains only 60 words, requiring creative approaches to expand functionality
- Data quality variations needed extensive preprocessing and cleaning
- Regional signing variations within the dataset required normalization techniques
Model Architecture Complexity
Implementing BiLSTM for sign language recognition involved:
- Sequence alignment challenges for variable-length gesture videos
- Temporal feature extraction to capture the dynamic nature of sign language
- Overfitting prevention with limited training data
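A common way to handle variable-length gesture videos is to resample every clip to a fixed number of frames before it reaches the sequence model. The sketch below picks evenly spaced frames, repeating frames when the clip is shorter than the target. It is one possible approach shown for illustration; `resample_sequence` is a hypothetical helper and not necessarily the alignment method we settled on.

```python
def resample_sequence(frames: list, target_len: int) -> list:
    """Return exactly target_len frames taken at evenly spaced positions."""
    if not frames:
        raise ValueError("cannot resample an empty sequence")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    if target_len == 1:
        return [frames[0]]
    last = len(frames) - 1
    # Map each output position onto the nearest input frame index.
    return [frames[round(i * last / (target_len - 1))] for i in range(target_len)]
```

The first and last frames are always kept, so the start and end of the sign are preserved regardless of clip length.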
Real-time Performance Optimization
- Balancing model accuracy with inference speed for live translation
- Memory management for continuous video processing
- Latency optimization to ensure smooth user experience
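For continuous video, memory stays bounded if only the most recent frames are kept and the model runs once every few frames instead of on every frame. The sketch below shows that pattern with a fixed-size buffer. The class name, window size, and stride are hypothetical values chosen for illustration, not our tuned settings.

```python
from collections import deque


class SlidingWindow:
    """Keep the latest frames and emit a full window every `stride` frames."""

    def __init__(self, window_size: int = 30, stride: int = 10):
        self.window_size = window_size
        self.stride = stride
        # deque(maxlen=...) drops the oldest frame automatically.
        self._frames = deque(maxlen=window_size)
        self._since_last = 0

    def push(self, frame):
        """Add a frame; return a list of frames when a window is due, else None."""
        self._frames.append(frame)
        self._since_last += 1
        if len(self._frames) == self.window_size and self._since_last >= self.stride:
            self._since_last = 0
            return list(self._frames)
        return None
```

A larger stride lowers the inference load at the cost of less frequent updates, which is the accuracy versus latency trade-off described above.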
Development Infrastructure
- Spring Boot backend integration with machine learning models required custom solutions
- Cross-platform compatibility for webcam access and video processing
- API design for handling both file uploads and real-time video streams
Accomplishments that we're proud of
Technical Achievements
- Successfully implemented a BiLSTM architecture for BdSL recognition using the BDSLW60 dataset
- Created a functional Spring Boot backend with comprehensive API endpoints
- Achieved real-time video processing capabilities for live translation
- Developed a complete user management system with different access levels
System Features
- Multi-user support with anonymous and registered user tiers
- Interactive feedback system that allows users to correct translation errors
- Practice mode with scoring for sign language learning
- Comprehensive admin dashboard for system monitoring and model management
Innovation
- Pioneered BiLSTM application for Bangla Sign Language recognition
- Created an adaptive learning system that improves from user corrections
- Developed a scalable architecture ready for deployment and expansion
What we learned
Technical Insights
- BiLSTM networks are highly effective for capturing bidirectional temporal dependencies in sign language
- Spring Boot provides excellent framework capabilities for building ML-integrated applications
- Real-time video processing requires careful optimization of both model architecture and system resources
Dataset Management
- Working with BDSLW60 taught us the importance of data quality over quantity
- Data augmentation techniques are crucial when working with limited vocabulary datasets
- Preprocessing pipelines significantly impact model performance in sign language recognition
Community Engagement
- Integrating user feedback is essential in assistive technology development
- Iterative design based on actual user needs leads to better accessibility solutions
- Scalable user management is essential for growing accessibility platforms
What's next for SilentVoice_BD
Immediate Deployment Goals
- Deploy the Spring Boot backend to cloud infrastructure (AWS/Google Cloud)
- Production testing with real users from the deaf community
- Performance optimization for handling concurrent users
- Mobile app development for Android and iOS platforms
Model Enhancement
- Expand vocabulary beyond the BDSLW60 dataset to include 500+ common signs
- Improve BiLSTM architecture with attention mechanisms for better accuracy
- Implement ensemble methods combining multiple model approaches
- Add contextual understanding for more natural translations
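As a rough picture of what an attention mechanism would add on top of the BiLSTM, the sketch below scores each time step against a query vector, turns the scores into weights with a softmax, and returns the weighted average of the hidden states. This is planned work, so everything here is an assumption: `attention_pool` is a hypothetical name, and in a trained model the query would be learned rather than passed in by hand.

```python
import math


def attention_pool(hidden_states: list[list[float]], query: list[float]):
    """Return (context, weights) using dot-product attention over time steps."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    # Weighted sum of hidden states, one output value per feature dimension.
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights
```

Frames whose hidden state aligns with the query get higher weight, so the summary vector can focus on the most informative part of a sign instead of treating every frame equally.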
Feature Development
- Bidirectional translation: Text/speech to BdSL using avatar generation
- Group video call integration with real-time subtitles
- Offline mode for areas with limited internet connectivity
- Advanced analytics for tracking learning progress and system usage
Community & Research
- Partner with deaf education institutions for wider adoption
- Open-source the BDSLW60 processing pipeline for the research community
- Expand to other regional sign languages in South Asia
- Publish research findings on BiLSTM applications in sign language recognition
SilentVoice_BD represents our commitment to leveraging technology for social good. By combining the BDSLW60 dataset with BiLSTM architecture and Spring Boot infrastructure, we're building a foundation for truly inclusive communication in Bangladesh.
Built With
- BDSLW60 dataset
- Bidirectional LSTM (BiLSTM)
- JWT (JSON Web Tokens)
- Material-UI components
- MediaPipe pose detection
- Model training
- OAuth 2.0
- PostgreSQL database
- Python/TensorFlow
- React.js
- Redis caching
- RESTful API architecture
- Secure user authentication and authorization
- Session management
- Spring Boot (Java)
- Web frontend
- WebRTC webcam integration
- WebSocket real-time communication