Roman Urdu Hate Speech Detection

A machine learning web application that detects hate speech in Roman Urdu text. The system analyzes input text and classifies it as either a normal or hateful tweet.

📋 Table of Contents

Overview
Features
Demo
Technology Stack
Project Structure
Installation
Usage
Model Training
Contributing
License

🔍 Overview

This project implements a machine learning-based hate speech detection system for Roman Urdu, that is, Urdu written in the Latin script. It uses natural language processing techniques to classify text as either normal or hate speech.

The web application provides a user-friendly interface where users can input text and receive immediate analysis results.

✨ Features

Text Analysis: Input any Roman Urdu text for analysis
Real-time Classification: Instantly classifies text as normal or hateful
Responsive Design: Modern UI that works across devices
Custom Preprocessing: Specialized tokenization and stopword removal for Roman Urdu
Machine Learning Backend: Trained on a dataset of Roman Urdu tweets
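
As a rough illustration of the custom preprocessing feature, the sketch below shows one way regex-based tokenization and stopword removal could be done for Roman Urdu. The names `tokenize`, `preprocess`, `TOKEN_PATTERN`, and the contents of `STOPWORDS` are hypothetical and chosen for this example; the project's actual rules and stopword list may differ.

```python
import re

# Hypothetical stopword list for illustration only; the project's real list is not shown here.
STOPWORDS = {"ka", "ki", "ke", "hai", "ko", "se", "aur", "ye", "main"}

# Match runs of lowercase ASCII letters; digits, punctuation, and emoji are dropped.
TOKEN_PATTERN = re.compile(r"[a-z]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text, strip URLs and @mentions, and split into word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return TOKEN_PATTERN.findall(text)


def preprocess(text: str) -> list[str]:
    """Tokenize the text and drop any token found in STOPWORDS."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


print(preprocess("@user Ye mausam bohat acha hai! https://t.co/xyz"))
# → ['mausam', 'bohat', 'acha']
```

Roman Urdu has no standard spelling, so a production stopword list would likely need several spelling variants of each word.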

🎬 Demo

Interface preview: screenshot of the application analyzing sample text.

🛠️ Technology Stack

Frontend: HTML, CSS, JavaScript
Backend: Flask (Python)
Machine Learning: scikit-learn
NLP: Custom regex-based tokenization
Data Storage: Pickle (model serialization)

📁 Project Structure

roman_urdu_hate_speech/
│
├── app/                      # Application package
│   ├── __init__.py           # Initialize Flask app
│   ├── routes.py             # Application routes
│   └── utils.py              # Utility functions
│
├── models/                   # Trained models
│   ├── model.pkl             # Serialized classifier model
│   └── vectorizer.pkl        # Serialized text vectorizer
│
├── static/                   # Static files
│   ├── css/                  # CSS stylesheets
│   │   └── styles.css        # Main stylesheet
│   └── img/                  # Images
│
├── templates/                # HTML templates
│   └── index.html            # Main template
│
├── notebooks/                # Jupyter notebooks
│   └── Hate_Speech_Detection.ipynb  # Model training notebook
│
├── data/                     # Data files
│   ├── Dataset.csv           # Original dataset
│   └── Preprocessed_Dataset.csv  # Processed dataset
│
├── .gitignore                # Git ignore file
├── config.py                 # Configuration settings
├── requirements.txt          # Project dependencies
├── run.py                    # Application entry point
└── README.md                 # Project documentation
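
To show how the two files under models/ might be used together at prediction time, here is a minimal sketch. The helper names `load_artifacts` and `classify` are hypothetical, and the mapping of label 1 to "Hateful Tweet" is an assumption; the actual code in app/utils.py may be organized differently.

```python
import pickle
from pathlib import Path


def load_artifacts(models_dir):
    """Load the pickled classifier and vectorizer from a models directory.

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    models_dir = Path(models_dir)
    with open(models_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(models_dir / "vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify(text, model, vectorizer):
    """Vectorize one string and map the prediction to a display label.

    Assumes label 1 means hateful and 0 means normal (an assumption for this sketch).
    """
    features = vectorizer.transform([text])
    prediction = model.predict(features)[0]
    return "Hateful Tweet" if prediction == 1 else "Normal Tweet"
```

Loading both artifacts once at application startup, rather than on every request, avoids repeated disk reads.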

🚀 Installation

Clone the repository:

git clone https://github.com/hassanrrraza/hatefull-speech-detection.git
cd hatefull-speech-detection

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python run.py
```
Open your browser and navigate to:
```
http://127.0.0.1:5000/
```

💻 Usage

Enter Roman Urdu text in the input field
Click the "Analyze Text" button
View the classification result (Normal Tweet or Hateful Tweet)

🧠 Model Training

The model was trained on a dataset of Roman Urdu tweets, which were manually labeled as either normal or hateful. The training process involved:

Text preprocessing (tokenization, stopword removal)
Feature extraction using TF-IDF vectorization
Training a machine learning classifier
Model evaluation and hyperparameter tuning

For more details, see the Jupyter notebook in the notebooks directory.
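
The training steps listed above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's actual code: the README does not name the classifier, so `LogisticRegression` is a stand-in, and the `texts` and `labels` values are made-up placeholders rather than rows from data/Dataset.csv.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up placeholder examples; label 1 = hateful, 0 = normal (assumed encoding).
texts = [
    "aaj mausam bohat acha hai",
    "mujhe cricket pasand hai",
    "kal school ki chutti hai",
    "yeh nafrat bhara jumla hai",
    "nafrat wali baat number do",
    "ek aur nafrat bhara tweet",
]
labels = [0, 0, 0, 1, 1, 1]

# TF-IDF features over letter-only tokens, followed by a linear classifier.
# LogisticRegression is an assumption; any scikit-learn classifier could be swapped in.
pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True, token_pattern=r"[a-zA-Z]+"),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(texts, labels)

# Predict labels for unseen text; the result is an array of 0s and 1s.
predictions = pipeline.predict(["aaj cricket ka match hai", "nafrat bhara tweet"])
print(predictions)
```

Bundling the vectorizer and classifier in one pipeline keeps training and inference preprocessing identical. The project stores them as two separate pickle files, which works equally well as long as both are saved from the same training run.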

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is released under the MIT License. You are free to use, modify, and distribute the code in your own work, provided that you retain the copyright notice and license text, which credit the original author.

Created with ❤️ by Hassan Raza

Built With

css
html
jupyter-notebook
python

Updates

Hassan Raza started this project — May 06, 2025 03:06 PM EDT
