Inspiration

The lack of robust machine translation models for Kashmiri motivated us to develop IndicMT-Kashmiri. While many major Indian languages have decent translation support, Kashmiri remains underrepresented. Our goal is to bridge this gap and enhance accessibility for native speakers, researchers, and developers.

What it does

IndicMT-Kashmiri translates English text into Kashmiri using the AI4Bharat IndicTrans2-en-indic-1B model. The model leverages deep learning to produce high-quality translations, making it easier to communicate, learn, and preserve the Kashmiri language.

How we built it

We integrated the AI4Bharat IndicTrans2-en-indic-1B model into our pipeline. Preprocessing steps included tokenization and normalization of input text. The model was fine-tuned and tested using a dataset of English-Kashmiri text pairs.
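The preprocessing steps mentioned above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the function names `normalize`, `tokenize`, and `prepare_batch` are hypothetical, and the real system may instead rely on the tokenizer and preprocessing utilities that accompany IndicTrans2.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize raw English input before it reaches the translation model."""
    # NFKC folds compatibility characters (full-width letters,
    # non-breaking spaces) into their canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop non-printing control characters but keep whitespace.
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch.isspace()
    )
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def prepare_batch(sentences: list[str]) -> list[str]:
    """Normalize and tokenize each sentence, skipping empty inputs."""
    prepared = []
    for sentence in sentences:
        cleaned = normalize(sentence)
        if cleaned:
            prepared.append(" ".join(tokenize(cleaned)))
    return prepared
```

Calling `prepare_batch` on a list of raw sentences yields cleaned, space-separated token strings that could then be passed to the model's own tokenizer. Keeping normalization separate from tokenization makes it easier to experiment with each step independently.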

Challenges we ran into

Data Scarcity: Finding high-quality parallel English-Kashmiri datasets was a major hurdle.
Model Performance: Ensuring accurate and context-aware translations required experimentation with preprocessing techniques.
Deployment Issues: Optimizing the model for real-time performance while keeping resource usage minimal was difficult.

Accomplishments that we're proud of

Successfully integrating a state-of-the-art machine translation model for Kashmiri.
Overcoming data limitations with custom preprocessing techniques.
Developing an accessible tool that can help students, researchers, and language enthusiasts.

What we learned

The complexities of low-resource language translation and model optimization.
The importance of dataset curation in improving translation quality.
How to deploy machine translation models efficiently in a lightweight application.

What's next for IndicMT-Kashmiri

Dataset Expansion: Collecting more parallel English-Kashmiri text to enhance accuracy.
Model Fine-tuning: Exploring domain-specific adaptations for improved performance.
Multimodal Integration: Extending the system to support speech-to-text translation.
Wider Deployment: Hosting the service on a scalable cloud infrastructure for broader accessibility.
