Toxic Trap AI

Example of testing a non-toxic comment
Example of testing a toxic comment with multiple labels

Inspiration

The internet was meant to bring people together, but every day we see online spaces becoming more polarized. Rising hate speech and toxic behavior are making digital communities unsafe for many. We believe that people need to be kind, and we wanted to build a tool that doesn't just block words but understands the intent behind them. This inspired us to create Toxic Trap AI, a system designed to keep communities safe without interrupting the flow of conversation.

What it does

Toxic Trap AI is a high-performance moderation engine that identifies harmful content in real time. It provides a dual-layered analysis:

Binary Classification: A strict safety check that flags content as either Safe (0) or Toxic (1) for immediate moderation.

Deep Insights: If a comment is flagged, the system breaks it down into specific categories like threats, insults, or identity-based hate, allowing admins to understand the severity of the violation.
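As a rough illustration of how the two layers could fit together, the sketch below turns per-category scores into a binary flag plus a category breakdown. The threat, insult, and identity-hate categories come from the description above; the dictionary keys, the 0.5 threshold, and the rule that any flagged category makes the comment Toxic are assumptions made for this example, not the project's confirmed logic.

```python
from dataclasses import dataclass, field

# Assumed cut-off; the write-up does not state the real threshold.
THRESHOLD = 0.5


@dataclass
class ModerationResult:
    label: int                                      # 0 = Safe, 1 = Toxic
    categories: dict = field(default_factory=dict)  # flagged category -> score


def analyze(scores: dict, threshold: float = THRESHOLD) -> ModerationResult:
    """Combine per-category probabilities into a binary flag plus details."""
    flagged = {name: s for name, s in scores.items() if s >= threshold}
    # Assumption: a comment counts as Toxic when at least one category is flagged.
    return ModerationResult(label=1 if flagged else 0, categories=flagged)


if __name__ == "__main__":
    # Hypothetical scores, as a multi-label model might produce them.
    example = {"threat": 0.91, "insult": 0.64, "identity_hate": 0.08}
    print(analyze(example))
```

An admin view could then show the `label` for immediate action and the `categories` dictionary for judging severity.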

How we built it

We built the core engine using XLM-RoBERTa, a powerful multilingual transformer model.

Backend: We used Python and PyTorch to fine-tune the model on over 150,000 labeled comments, specifically optimizing it for the nuances of multilingual code-switching (English + Indian languages).

Frontend: The dashboard was developed using Gradio, focusing on a clean, enterprise-SaaS aesthetic that provides real-time "Neural Scans."

Data Pipeline: We implemented a custom preprocessing layer to handle noisy text and ensure the model remains accurate even with slang and informal language.
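To give a feel for what a preprocessing layer for noisy text might do, here is a minimal sketch using only the Python standard library. The specific steps (Unicode normalization, URL and mention placeholders, squeezing stretched characters) and the `<url>` / `<user>` tokens are assumptions chosen for illustration; the project's actual rules are not described in this write-up.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # the same character three or more times
SPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Normalize a raw comment while keeping its casing and script intact."""
    # Fold full-width and other compatibility characters to canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace links and user mentions with placeholder tokens.
    text = URL_RE.sub(" <url> ", text)
    text = MENTION_RE.sub(" <user> ", text)
    # Squeeze stretched characters: "sooooo" -> "soo", "!!!!" -> "!!".
    text = REPEAT_RE.sub(r"\1\1", text)
    # Collapse any run of whitespace into a single space.
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_comment("@rahul  youuuu are sooooo dumb!!!! see http://x.co/abc"))
    # prints: <user> youu are soo dumb!! see <url>
```

Casing is left untouched here on the assumption that a cased tokenizer is used downstream, as is the case for XLM-RoBERTa.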

Challenges we ran into

One of the biggest hurdles was the sheer size of the multilingual models: managing 800 MB+ of weights while keeping inference under one second was tough. We also had to balance the "Binary" requirement of the competition with our desire to provide "Multi-label" depth. We solved this by designing a dual-engine pipeline that runs both checks simultaneously without adding noticeable lag.
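One way to run two checks side by side is to submit them to a small thread pool and wait for both results, as sketched below. The `binary_check` and `multilabel_check` functions are hypothetical keyword-based stand-ins for the real model calls, and the use of threads is an assumption; the write-up does not say how the two engines are actually scheduled.

```python
from concurrent.futures import ThreadPoolExecutor


def binary_check(text: str) -> int:
    """Hypothetical stand-in for the binary engine: 1 = Toxic, 0 = Safe."""
    return int("idiot" in text.lower())


def multilabel_check(text: str) -> dict:
    """Hypothetical stand-in for the multi-label engine."""
    return {"insult": 1.0 if "idiot" in text.lower() else 0.0}


def moderate(text: str) -> dict:
    """Run both checks concurrently and merge their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        binary = pool.submit(binary_check, text)
        detail = pool.submit(multilabel_check, text)
        # result() blocks until the corresponding check has finished.
        return {"toxic": binary.result(), "categories": detail.result()}


if __name__ == "__main__":
    print(moderate("You are an idiot"))
    print(moderate("Have a nice day"))
```

Whether threads give a real speed-up depends on the model runtime; a single forward pass feeding two output heads would be another way to get both answers at once.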

Accomplishments that we're proud of

We are incredibly proud of achieving a mean ROC-AUC of 98%+ during our testing phase. More importantly, we built a UI that feels like a real-world product, capable of being integrated into actual community platforms rather than remaining a classroom experiment.
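For readers unfamiliar with the metric, the sketch below computes ROC-AUC for one category as the fraction of (toxic, non-toxic) pairs that the model ranks correctly, then averages across categories. Averaging per category is an assumption about what "mean" refers to here, and the labels and scores in the example are made-up toy values, not the project's results.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    # A correctly ordered pair scores 1, a tie scores 0.5, otherwise 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_roc_auc(labels, scores):
    """Average the per-category ROC-AUC over all categories."""
    aucs = [roc_auc(labels[c], scores[c]) for c in labels]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy data for two categories (illustrative only).
    labels = {"insult": [0, 0, 1, 1], "threat": [0, 1, 0, 1]}
    scores = {"insult": [0.1, 0.4, 0.35, 0.8], "threat": [0.2, 0.9, 0.3, 0.7]}
    print(mean_roc_auc(labels, scores))  # prints 0.875
```

The pairwise loop is quadratic in the number of comments, so it is meant for understanding the metric rather than for scoring a 150k-comment test set.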

What we learned

This project was a deep dive into Transfer Learning. We learned how to take a massive pre-trained model and "teach" it the specific toxic nuances of our local digital culture. We also improved our skills in UI/UX design, realizing that a tool is only as good as its usability for the end admin.

What's next for Toxic Trap AI

Our vision for Toxic Trap AI doesn't end here. We plan to:

API Integration: Develop native SDKs for Discord, Telegram, and Slack bots.

Contextual Threading: Analyze entire comment threads to detect sarcasm and persistent harassment.

Visual Moderation: Use OCR to extend our safety shield to images and memes.

Built With

colab
hugging-face-transformers
Model: XLM-RoBERTa (fine-tuned multilingual transformer)
Frontend: Gradio (custom CSS/HTML for SaaS UI)
Data: NumPy, pandas (150k+ dataset handling)
Infrastructure: Google Colab (T4 GPU)
python
pytorch
vs-code

Submitted to

NeuroLogic '26: Global NLP Datathon

Created by

I also coordinated with team members to manage tasks, track progress, and ensure smooth collaboration, contributing to accurate results and timely completion of the project.

priyansh koshti
Naisha Narula
mallika bharsakle
Kashish Bhargava

Updates

Kashish Bhargava started this project — Apr 25, 2026 12:21 PM EDT
