Sign Language Dataset Hub
A curated, verified catalog of 73+ sign language datasets covering 26 sign languages. It is intended as the most comprehensive open collection for sign language recognition (SLR) research, gesture recognition, deaf community technology, and assistive AI development.
Mission
To democratize access to sign language technology by providing:
- Verified dataset links: every URL checked, sample counts taken from original sources
- Working tools for loading, visualizing, and processing data
- Demo datasets for learning and prototyping
- Proper attribution to all data creators
The aim is to help developers, researchers, and the deaf community build better assistive technology.
Stats
| Metric | Count |
|---|---|
| Verified Datasets | 73 |
| Sign Languages | 26 |
| Modalities | Video, Image, Sensor, Pose, RGB-D, Skeleton, Text |
| Source Verification | 100% (all URLs checked) |
Datasets by Language
| Language | Code | Datasets | Notable |
|---|---|---|---|
| American Sign Language | ASL | 11 | MS-ASL, WLASL, How2Sign, OpenASL, ASLLVD |
| Arabic Sign Language | ArSL | 2 | ArSL2018, KArSL |
| Australian Sign Language | Auslan | 1 | Auslan Signbank |
| Bangla Sign Language | BdSL | 4 | BdSL47, Ban-Sign-Sent-9K |
| Brazilian Sign Language | Libras | 2 | Libras-UFPR, PHOENIX-Libras |
| British Sign Language | BSL | 3 | BOBSL, BSL Corpus, BSL SignBank |
| Chinese Sign Language | CSL | 2 | DEVISIGN, USTC-CSL |
| Dutch Sign Language | NGT | 1 | CNGT Corpus |
| French Sign Language | LSF | 2 | Dicta-Sign LSF, LSF-Dict |
| German Sign Language | DGS | 3 | RWTH-PHOENIX-2014, PHOENIX-2014T, DGS Corpus |
| Greek Sign Language | GSL | 1 | GSL-50 |
| Indian Sign Language | ISL | 3 | INCLUDE, ISL-CSLTR, ISL-Alphabet |
| Irish Sign Language | ISL | 1 | ISL Corpus |
| Italian Sign Language | LIS | 1 | ATIS |
| Japanese Sign Language | JSL | 1 | J-ASL |
| Korean Sign Language | KSL | 1 | KETI |
| Malaysian Sign Language | BIM | 1 | MSL Dataset |
| Mexican Sign Language | LSM | 1 | LSM Sign Language |
| Russian Sign Language | RSL | 2 | RuSLAN, RSL-Signs |
| Swedish Sign Language | SSL | 1 | SSL Corpus |
| Thai Sign Language | TSL | 1 | TSL-51 |
| Turkish Sign Language | TİD | 1 | AUTSL |
| Multilingual | - | 5 | SIGN-Hub, Dicta-Sign, SpreadTheSign, OpenSLR, SLP Toolkit |
| Linguistic DBs | - | 6 | ASL-LEX, BSL SignBank, Auslan Signbank, etc. |
Datasets by Modality
| Modality | Count |
|---|---|
| Video | 35+ |
| Image | 10+ |
| Video + RGB-D + Skeleton | 3 |
| Sensor (IMU/Flex) | 1 |
| Linguistic / Dictionary | 6+ |
| Multilingual Corpus | 5+ |
Verification Policy
All dataset source URLs in this repo have been verified. This means:
- Every URL resolves (HTTP 200 or an auth-gated page)
- Sample counts are from the original publication, not estimated
- Every dataset has a citation or credit to its creator
- No placeholder or fabricated links exist in the active catalog
- Datasets we couldn't verify are excluded (not listed with fake info)
Found a broken link? Please open an issue.
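The "resolves" rule above can be sketched in a few lines of standard-library Python. This is only an illustration, not the script used to build the catalog: the function names are hypothetical, and treating HTTP 401 and 403 as "auth-gated" is an assumption about what that term covers.

```python
import urllib.error
import urllib.request

# Assumption: 401/403 responses count as "auth-gated"; the catalog's real rule may differ.
AUTH_GATED = {401, 403}


def classify_status(status: int) -> str:
    """Map an HTTP status code to 'ok', 'auth-gated', or 'broken'."""
    if status == 200:
        return "ok"
    if status in AUTH_GATED:
        return "auth-gated"
    return "broken"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and classify the outcome; network errors count as broken."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; classify by its code.
        return classify_status(err.code)
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return "broken"
```

Some servers reject HEAD requests, so a link reported as broken by this sketch is worth rechecking by hand before opening an issue.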
Browse Datasets
See DATASETS.md for the complete verified catalog with:
- 67 datasets across 22 sign languages
- Source URLs, sample counts, licenses, and citations for every entry
- Organized alphabetically by language
Literature & Benchmarks
- REFERENCES.md: key papers on sign language recognition, translation, datasets, and pose estimation
- docs/BENCHMARKS.md: published accuracy numbers from real papers (WER, BLEU, accuracy)
Quick Start
Clone & Setup
git clone https://github.com/rudra496/SignLanguage-Dataset-Hub.git
cd SignLanguage-Dataset-Hub
pip install -r requirements.txt
Use the Demo Data (Bangla Sign Language Sensor Data)
from scripts.data_loader import BdSLSensorGloveDataset
# Load demo sensor data (4,824 samples)
dataset = BdSLSensorGloveDataset(split='train')
print(f"Loaded {len(dataset)} samples, 36 gesture classes")
sample = dataset[0]
print(f"Gesture: {sample['gesture_name']}")
print(f"Sensors shape: {sample['sensors'].shape}")
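A quick sanity check after loading is to count samples per gesture. The sketch below assumes only what the snippet above shows: the dataset supports `len()` and integer indexing, and each sample is a dict with a `gesture_name` key. The helper name `count_by_gesture` and the placeholder samples are invented for illustration.

```python
from collections import Counter


def count_by_gesture(dataset) -> Counter:
    """Count samples per gesture name for any indexable dataset of dict samples."""
    return Counter(dataset[i]["gesture_name"] for i in range(len(dataset)))


# Placeholder samples standing in for the real dataset; the gesture names are invented.
demo = [
    {"gesture_name": "gesture_a"},
    {"gesture_name": "gesture_b"},
    {"gesture_name": "gesture_a"},
]

counts = count_by_gesture(demo)
for name, total in counts.most_common():
    print(f"{name}: {total}")
```

Passing the loaded `BdSLSensorGloveDataset` instance instead of `demo` should work the same way, provided the loader behaves as the snippet above suggests.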
Visualize Sensor Data
python tools/visualize.py --data data/bdsl/BdSL-Sensor-Glove/
Browse Programmatically
import pandas as pd
df = pd.read_csv('datasets_catalog.csv')
# Filter by language
asl = df[df['language_code'] == 'ASL']
print(asl[['name', 'samples', 'source_url']])
# Filter by modality
video = df[df['modality'].str.contains('Video')]
print(f"Video datasets: {len(video)}")
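If pandas is not available, the same filters can be approximated with the standard `csv` module. In this sketch only the column names (`name`, `language_code`, `modality`, `samples`, `source_url`) come from the example above; the rows are placeholders, not real catalog entries, and `filter_rows` is a hypothetical helper.

```python
import csv
import io

# Placeholder rows; only the column names are taken from the pandas example.
SAMPLE_CSV = """name,language_code,modality,samples,source_url
example-one,ASL,Video,100,https://example.org/one
example-two,ASL,Image,200,https://example.org/two
example-three,BdSL,Video + RGB-D,300,https://example.org/three
"""


def filter_rows(rows, language_code=None, modality=None):
    """Keep rows matching a language code exactly and/or containing a modality substring."""
    selected = []
    for row in rows:
        if language_code is not None and row["language_code"] != language_code:
            continue
        if modality is not None and modality not in row["modality"]:
            continue
        selected.append(row)
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
asl_rows = filter_rows(rows, language_code="ASL")
video_rows = filter_rows(rows, modality="Video")

print([row["name"] for row in asl_rows])
print(f"Video datasets: {len(video_rows)}")
```

To run it against the real catalog, replace `io.StringIO(SAMPLE_CSV)` with `open('datasets_catalog.csv', newline='')`.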
Download External Datasets
# From Kaggle (requires Kaggle API key)
pip install kaggle
kaggle datasets download -d datamunge/sign-language-mnist
kaggle datasets download -d grassknoted/asl-alphabet
kaggle datasets download -d ahmedkhan123/arabic-sign-language
# From Hugging Face
pip install datasets
python -c "from datasets import load_dataset; ds = load_dataset('banglagov/Ban-Sign-Sent-9K-V1')"
# From Zenodo
wget https://zenodo.org/record/7067906/files/BdSL47.zip
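Large archives sometimes arrive truncated, so it can help to check a downloaded zip before unpacking it. The following is a minimal sketch using the standard `zipfile` module; `archive_is_intact` is a hypothetical helper, not part of this repository's scripts.

```python
import zipfile


def archive_is_intact(path) -> bool:
    """Return True if `path` is a zip archive whose members all pass their CRC check."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None if all are fine.
        return archive.testzip() is None
```

For example, `archive_is_intact("BdSL47.zip")` could be called after the `wget` command above finishes.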
Included Tools
| Tool | Description | Location |
|---|---|---|
| Data Loader | PyTorch dataset classes for sensor data | scripts/data_loader.py |
| Download Script | Multi-source dataset downloader | scripts/download_datasets.py |
| Visualizer | Sensor data visualization | tools/visualize.py |
| Data Generator | Demo data creation utilities | tools/generate_realistic_data.py |
Repository Structure
SignLanguage-Dataset-Hub/
├── DATASETS.md                       # Complete verified dataset catalog (67 datasets)
├── datasets_catalog.csv              # Machine-readable catalog
├── STATISTICS.md                     # Detailed statistics & breakdowns
├── README.md                         # This file
├── CHANGELOG.md                      # Version history
├── data/
│   └── bdsl/
│       └── BdSL-Sensor-Glove/        # Demo sensor dataset (4,824 samples)
├── docs/
│   ├── BENCHMARKS.md                 # Published accuracy numbers & comparisons
│   ├── LICENSE_ATTRIBUTION.md        # Per-dataset license & citation info
│   ├── TUTORIALS.md                  # 9 tutorials (beginner to advanced)
│   ├── QUICKSTART.md                 # Quick start guide
│   └── CONTRIBUTING.md               # How to contribute
├── scripts/
│   ├── data_loader.py                # PyTorch data loaders
│   └── download_datasets.py          # Multi-source downloader
├── tools/
│   ├── visualize.py                  # Sensor data visualization
│   └── generate_realistic_data.py    # Data generation
├── .github/                          # Issue templates & PR template
├── CITATION.cff                      # Citation metadata
├── LICENSE                           # CC BY 4.0
└── requirements.txt                  # Python dependencies
Tutorials
We include 9 tutorials from beginner to advanced:
| # | Tutorial | Level |
|---|---|---|
| 1 | Introduction to Sign Language Recognition | Beginner |
| 2 | Loading and Exploring Datasets | Beginner |
| 3 | Visualizing Sign Language Data | Beginner |
| 4 | Building Your First Classifier | Intermediate |
| 5 | Hand Pose Estimation with MediaPipe | Intermediate |
| 6 | Data Augmentation Techniques | Intermediate |
| 7 | Real-time Recognition System | Advanced |
| 8 | Continuous Sign Language Recognition | Advanced |
| 9 | Multilingual Sign Recognition | Advanced |
See docs/TUTORIALS.md and docs/QUICKSTART.md.
Citation
If you use this repository, please cite:
@misc{signlanguage_dataset_hub,
title = {Sign Language Dataset Hub: A Verified Catalog of Sign Language Datasets},
author = {Sarker, Rudra and Contributors},
year = {2026},
url = {https://github.com/rudra496/SignLanguage-Dataset-Hub}
}
Please also cite the original dataset creators when using their data. See docs/LICENSE_ATTRIBUTION.md for per-dataset citation information.
Contributing
We welcome contributions! See docs/CONTRIBUTING.md.
Why This Repo?
| Feature | This Repo | Typical SLR Papers/GitHub Lists | Kaggle Collections |
|---|---|---|---|
| Datasets | 73+ curated | 5–20 mentioned inline | 10–30, unverified |
| Sign Languages | 26 | 1–5 | 3–10 |
| URL Verification | All checked | Often broken links | Mixed |
| Sample Counts | From original sources | Inconsistent | User-reported |
| License Info | Per dataset | Rarely included | Rarely included |
| Modality Tags | All datasets | Partial | Tags vary |
| Tools & Scripts | Included | Not included | Not included |
| Demo Datasets | Included | Not included | Not included |
| Open Source | CC BY 4.0 | Varies | Varies |
| Actively Maintained | Yes | Usually one-time | Community |
Rules:
- Every dataset must have a verifiable source URL
- Sample counts must come from the original source
- Include license and citation information
- No placeholder or fabricated data, ever
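Contributors may find it useful to check a proposed entry against these rules before opening a pull request. The sketch below is one possible pre-submission check, not a tool shipped with the repository. The field names `name`, `samples`, and `source_url` appear in the catalog example earlier, while `license` and `citation` are assumed column names, and the sample entries are placeholders.

```python
from urllib.parse import urlparse

# Assumption: `license` and `citation` are guessed field names; the real catalog may differ.
REQUIRED_FIELDS = ("name", "source_url", "samples", "license", "citation")


def missing_fields(entry: dict) -> list:
    """Return the required fields that are absent or blank in a catalog entry."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def has_http_source(entry: dict) -> bool:
    """Check that source_url looks like an http(s) URL with a host name."""
    parsed = urlparse(str(entry.get("source_url", "")))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_entry(entry: dict) -> list:
    """Collect human-readable problems for one entry; an empty list means it passes."""
    missing = missing_fields(entry)
    problems = [f"missing field: {name}" for name in missing]
    if "source_url" not in missing and not has_http_source(entry):
        problems.append("source_url is not an http(s) URL")
    return problems


# Placeholder entries for illustration only.
good = {"name": "example", "source_url": "https://example.org/data",
        "samples": "100", "license": "CC BY 4.0", "citation": "Example et al."}
bad = {"name": "example", "source_url": "ftp://example.org/data",
       "samples": "100", "license": " ", "citation": "Example et al."}

print(validate_entry(good))
print(validate_entry(bad))
```

This only checks that the fields are present and well-formed. Confirming that the URL resolves and that the sample count matches the original source still has to be done by hand.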
Citation
If you use this dataset catalog in your research, please cite:
@misc{signlanguage_dataset_hub,
title = {Sign Language Dataset Hub: A Curated Catalog of 73+ Verified Datasets for 26 Sign Languages},
author = {Sarker, Rudra},
year = {2025},
url = {https://github.com/rudra496/SignLanguage-Dataset-Hub},
note = {Version 1.0}
}
License
This repository is licensed under CC BY 4.0.
Individual datasets have their own licenses; see docs/LICENSE_ATTRIBUTION.md for details. Some datasets are research-use only and may require institutional agreements.
Acknowledgments
This hub would not be possible without the researchers and organizations who created and shared these datasets:
- Microsoft Research (MS-ASL)
- Oxford VGG (BOBSL)
- RWTH Aachen University (PHOENIX)
- Boston University (ASLLVD)
- IISc Bangalore (INCLUDE)
- CASIA / USTC (Chinese SL datasets)
- Ankara University (AUTSL)
- University of Hamburg (DGS Corpus)
- UCL DCAL (BSL Corpus, BSL SignBank)
- Stockholm University (SSL Corpus)
- Radboud University (CNGT)
- European Sign Language Centre (SpreadTheSign)
- Macquarie University (Auslan Signbank)
- EU Framework Programme (SIGN-Hub, Dicta-Sign)
- Hugging Face community datasets
- Kaggle community datasets
- And many more individual researchers and Deaf community members
Contact
- Maintainer: Rudra Sarker
- Institution: Shahjalal University of Science and Technology
Connect
Built with ❤️ by Rudra Sarker
CC BY 4.0 License · Free & Open Source Forever
Built With
- jupyter-notebook
- python
