Inspiration
The growing prevalence of hate speech on social media platforms like TikTok inspired us to develop a solution that promotes a more inclusive online environment. We aimed to use advanced machine learning techniques to accurately detect and classify hate speech, thereby assisting content moderators in maintaining community standards.
What it does
Our project employs a BERT-based model to identify and categorize hate speech in the HateXplain dataset. We completed three tasks: text label classification, target-group classification, and token (rationale) tagging. We then assessed and analyzed the outcomes with metrics covering performance (accuracy), bias, and explainability.
How we built it
Data Preprocessing
Tokenization Posts are tokenized using the BERT tokenizer (bert-base-uncased).
Majority Voting for Labels and Target Groups Each post in the dataset is annotated with one of three labels ("hatespeech", "normal", "offensive") and with a target group such as race, religion, or gender. For every post, the annotations are aggregated by majority voting to determine the dominant label and target group.
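The majority-voting step can be sketched with Python's `collections.Counter`; the annotation values below are illustrative, not taken from the dataset:

```python
from collections import Counter

def majority_vote(annotations):
    """Return the most common annotation; ties resolve to the first-seen value."""
    return Counter(annotations).most_common(1)[0][0]

# One post, three annotators (illustrative values)
labels = ["hatespeech", "offensive", "hatespeech"]
targets = ["Race", "Race", "Religion"]

print(majority_vote(labels))   # "hatespeech"
print(majority_vote(targets))  # "Race"
```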
Rationales Processing Softmax scores are computed from the annotator-provided rationales to capture the relative importance of each token in the post.
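One plausible reading of this step, sketched in plain Python (the binary rationale masks below are made up for illustration): average the annotators' per-token masks, then softmax the averages into a distribution over tokens.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three annotators' binary rationale masks for a 4-token post (illustrative)
masks = [[1, 0, 1, 0],
         [1, 0, 0, 0],
         [1, 0, 1, 0]]

# Average the masks per token, then turn the averages into a distribution
avg = [sum(col) / len(masks) for col in zip(*masks)]
attn = softmax(avg)
print(attn)  # the first token, marked by all annotators, gets the highest weight
```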
Data Structuring Each post is structured with the following components:
- post_id: unique ID for each post
- input_ids: input tokens of the post
- target_group: predominant target group derived from majority voting
- label: predominant label derived from majority voting
- rationales: softmax scores indicating token importance within the post
Data Saving The preprocessed data is then split according to the file post_id_divisions.json and saved into separate files (train_data.pkl, val_data.pkl, and test_data.pkl).
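A minimal sketch of the split-and-save step; the `divisions` dictionary is hard-coded here for illustration, whereas the real pipeline would load it with `json.load` from post_id_divisions.json:

```python
import pickle

# Illustrative stand-in for the contents of post_id_divisions.json:
# a mapping from split name to the post ids belonging to that split.
divisions = {"train": ["p1", "p2"], "val": ["p3"], "test": ["p4"]}

# Preprocessed posts keyed by post_id (fields abbreviated for the sketch)
data = {pid: {"post_id": pid} for pid in ["p1", "p2", "p3", "p4"]}

for split, ids in divisions.items():
    subset = [data[pid] for pid in ids if pid in data]
    with open(f"{split}_data.pkl", "wb") as f:
        pickle.dump(subset, f)
```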
Model Training
Model Selection Because BERT (Bidirectional Encoder Representations from Transformers) has consistently demonstrated strong performance in text classification and other textual tasks, we started from the BERT base model, specifically bert-base-uncased.
Model Architecture The core of the model is a custom implementation, BertForMultiTaskClassificationAndTagging, that builds on the foundational BertModel from the transformers library. It is designed for multi-task learning and addresses two classification tasks:

Task 1: Hate Speech Classification Texts are distinguished into three classes: "normal," "offensive," and "hatespeech." A classifier head (classifier1) predicts the correct label from the input text.

Task 2: Target Group Classification Texts are categorized by the community they target, such as "Miscellaneous," "Race," "Religion," "Gender," and "Sexual Orientation." A second classifier head (classifier2), analogous to Task 1, determines which target group the text belongs to. We specify that each text can belong to only one community.
The model also includes a tagging mechanism that uses attention to tag rationales: the attention scores from the final layer of the BERT model (last_layer_attention) are collected and averaged.
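The architecture described above might look roughly like the sketch below. This is not the project's actual code: the head sizes, pooling choice, and attention averaging are assumptions, and a small randomly initialized BertModel is used so the snippet runs without downloading pretrained weights (the real project loads bert-base-uncased).

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class BertForMultiTaskClassificationAndTagging(nn.Module):
    """Shared BERT encoder with two classification heads; last-layer
    attention is averaged to score tokens as rationales (a sketch)."""
    def __init__(self, bert, num_labels=3, num_targets=5):
        super().__init__()
        self.bert = bert
        hidden = bert.config.hidden_size
        self.classifier1 = nn.Linear(hidden, num_labels)   # normal/offensive/hatespeech
        self.classifier2 = nn.Linear(hidden, num_targets)  # target community

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(input_ids, attention_mask=attention_mask,
                        output_attentions=True)
        pooled = out.last_hidden_state[:, 0]                 # [CLS] representation
        logits1 = self.classifier1(pooled)
        logits2 = self.classifier2(pooled)
        # Average the last layer's attention over heads and query positions
        last_attn = out.attentions[-1]                       # (batch, heads, seq, seq)
        token_scores = last_attn.mean(dim=1).mean(dim=1)     # (batch, seq)
        return logits1, logits2, token_scores

# Tiny random config so the sketch runs offline
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertForMultiTaskClassificationAndTagging(BertModel(config))
ids = torch.randint(0, 100, (1, 8))
l1, l2, scores = model(ids)  # logits for 3 labels, 5 target groups, per-token scores
```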
Model Training Both task 1 (hate speech classification) and task 2 (target group classification) use nn.CrossEntropyLoss() as the loss function. For the rationale tagging task, we built a custom loss, MaskedBCELoss, which applies binary cross-entropy with masking so that only the relevant positions in the input sequence contribute to the loss.
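The idea behind a masked binary cross-entropy loss can be sketched as follows; this is an assumed implementation of the masking behavior described above, not the project's exact class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedBCELoss(nn.Module):
    """Binary cross-entropy over per-token rationale scores, where the
    mask zeroes out padding positions before averaging (a sketch)."""
    def forward(self, pred, target, mask):
        # pred/target/mask: (batch, seq_len); mask is 1.0 for real tokens
        loss = F.binary_cross_entropy(pred, target, reduction="none")
        loss = loss * mask                       # ignore masked-out positions
        return loss.sum() / mask.sum().clamp(min=1)

criterion = MaskedBCELoss()
pred = torch.tensor([[0.9, 0.1, 0.5, 0.5]])
target = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
mask = torch.tensor([[1.0, 1.0, 1.0, 0.0]])      # last position is padding
loss = criterion(pred, target, mask)             # averages over 3 unmasked tokens
```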
The model was trained for five epochs using the Adam optimizer with weight decay (torch.optim.Adam()), which provides effective parameter updates while the weight-decay regularization helps reduce overfitting.
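A minimal training-step sketch under these choices; the stand-in linear model, learning rate, and weight-decay value are illustrative assumptions, not the project's hyperparameters:

```python
import torch
import torch.nn as nn

# Stand-in single-head model so the optimization step runs on its own
model = nn.Linear(16, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=0.01)
ce = nn.CrossEntropyLoss()

x = torch.randn(4, 16)
y = torch.tensor([0, 1, 2, 0])

for epoch in range(5):            # the project trained for five epochs
    optimizer.zero_grad()
    loss = ce(model(x), y)        # in the real model, the task losses are combined
    loss.backward()
    optimizer.step()
```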
Challenges we ran into
At first, we tried traditional machine learning models. Their performance was about the same as the weakest model reported in the original study by Mathew et al. (Mathew et al., 2021), suggesting that conventional machine learning methods yield no appreciable advantage on this problem, so we turned to deep learning models. As for explainability, our model scores lower on explainability metrics than the top models, such as BiRNN-HateXplain [Attn] with an IOU F1 of 0.222 and a Token F1 of 0.506. This implies room for improvement in how well our model's explanations align with the ground truth.
Accomplishments that we're proud of
In terms of task 1 performance, our model's accuracy, Macro F1 score, and AUROC are slightly lower than those of the best-performing model (BERT-HateXplain, with 0.698, 0.687, and 0.851, respectively) in the study by Mathew et al. (Mathew et al., 2021).
On the other hand, our model does rather well on task 2: its accuracy, Macro F1 score, and AUROC are quite competitive, which demonstrates the potential of our approach for multi-class classification challenges.
Furthermore, on the bias measures our model surpasses the best results reported in the work of Mathew et al. (BERT-HateXplain, with 0.807, 0.745, and 0.763, respectively) (Mathew et al., 2021); all of our values are competitive with the BERT-HateXplain model.
What we learned
There is no shortcut to success: keep experimenting and enjoy the process.
What's next for Classification of hate speech on social media
Future work will focus on enhancing explainability and optimizing the model to improve accuracy.
Built With
- bert
- hatexplain
- python