Inspiration
This project was an introduction to using LLMs for content moderation. In particular, I wanted to explore how to define relevance, which depends on how broad or narrow the context of the classification problem is.
What it does
The system classifies reviews from a Google reviews dataset according to whether each review violates policies on relevance, spam, or toxicity. It does not take the location into account or perform multi-modal classification on attached images, however.
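Each policy check can be framed as an independent binary classifier over the review text. A minimal sketch with scikit-learn, shown here for the spam policy only (the reviews and labels below are invented for illustration, not the project's actual data or pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled reviews: 1 = violates the spam policy.
reviews = [
    "Great food and friendly staff, will come again",
    "Visit www.cheap-pills.example for discounts!!!",
    "The pasta was cold but the service was quick",
    "BUY FOLLOWERS NOW cheap promo link in bio",
]
labels = [0, 1, 0, 1]

# One binary classifier per policy; relevance and toxicity
# would be trained the same way on their own labels.
spam_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
spam_clf.fit(reviews, labels)

predictions = spam_clf.predict(reviews)
```

A fine-tuned transformer would replace the TF-IDF + logistic regression pipeline in the actual system, but the per-policy binary framing stays the same.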
How we built it
- Google Colab with Kaggle and HuggingFace
- Libraries such as pandas, PyTorch, and scikit-learn
Challenges we ran into
- Fine-tuning is slow to train and requires some performance engineering.
- Lack of labelled data for location-specific or irrelevant reviews.
Accomplishments that we're proud of
- Reasonably accurate results, with room for further fine-tuning.
What's next for content-moderation-cookbook
- Using larger datasets for location-based inference, as well as larger samples on which to run the tests.
- Fine-tuning the model on more positive expressions that might still be spam. At the moment, a large proportion of comments classified as "spam" are negative reviews, which might skew the overall rating of locations that genuinely deserve the negative feedback. From a UX perspective, a simple workaround might be to encourage users to leave more descriptive reviews, or to drop short reviews and rely on the rating alone.
- Accounting for informal variants of English, including contractions and colloquialisms, as well as emoji and other Unicode.
- Multi-label classification instead of binary: the model is suited to overlapping classes (spam + toxicity + irrelevant), enabling classification at higher granularity.
- Different ways of modelling the problem, such as multiple-choice classification or prompt-based classification, to generate labels for unlabelled data and test against a separate corpus of labelled data.
- Hyperparameter tuning: the training process takes a significant amount of time, so only one run was performed, optimised for speed. Tuning the parameters might produce different results and possibly greater accuracy on the test data.
- At the moment, validation and training are both performed on the labelled data, whereas performance on unlabelled data can only be verified imprecisely (aside from manual review). To address this, human-in-the-loop methods could be employed: annotate any false positives or false negatives in the tested data, then rerun the same train-test loop.
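The multi-label direction above (a single review may be spam, toxic, and irrelevant at once) could be sketched with scikit-learn's `OneVsRestClassifier` over a binary indicator matrix, one column per policy. The reviews and labels here are invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical reviews with one indicator column per policy:
# columns = [spam, toxicity, irrelevant]; rows may have several 1s.
reviews = [
    "Lovely staff, quick service",
    "CLICK HERE for free coupons!!!",
    "This place is garbage and so are you, promo link below",
    "I prefer talking about my holiday in another city",
]
y = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],  # both spam and toxic
    [0, 0, 1],
])

# One-vs-rest fits an independent binary classifier per column,
# so predicted labels can overlap, unlike single-label multi-class.
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression()),
)
clf.fit(reviews, y)
pred = clf.predict(reviews)  # shape (4, 3), one column per policy
```

The one-vs-rest wrapper is the simplest way to get overlapping labels; a shared text encoder with three sigmoid heads would be the equivalent design in a fine-tuned transformer.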
Built With
- colab
- keras
- python