Datasets

We used two publicly available datasets:

  1. Formspring Labeled for Cyberbullying
  2. MySpace Group Data Labeled for Cyberbullying

Both can be downloaded from http://chatcoder.com/DataDownload

What it does

A user signs up and then sends an SMS through the Twilio API. When the server receives the text, it is classified and forwarded to the intended recipient.
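The receive, classify, forward flow can be sketched as below. This is only an illustration: `handle_incoming_sms`, `ClassifiedMessage`, and the `classify` and `forward` callbacks are hypothetical names, and the callbacks stand in for the real classifier and the real SMS-sending call, neither of which is shown in the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClassifiedMessage:
    sender: str
    recipient: str
    body: str
    is_harassment: bool


def handle_incoming_sms(
    sender: str,
    recipient: str,
    body: str,
    classify: Callable[[str], bool],
    forward: Callable[[str, str], None],
) -> ClassifiedMessage:
    """Classify an incoming text, forward it, and return a record of the result."""
    flagged = classify(body)      # hypothetical classifier: True means harassment
    forward(recipient, body)      # hypothetical stand-in for the SMS-sending call
    return ClassifiedMessage(sender, recipient, body, flagged)


# Toy usage with fake callbacks (assumptions, not the project's real components).
outbox = []
record = handle_incoming_sms(
    sender="alice",
    recipient="bob",
    body="see you at lunch",
    classify=lambda text: "stupid" in text.lower(),
    forward=lambda to, text: outbox.append((to, text)),
)
```

Passing the classifier and the sender in as callbacks keeps the flow testable without a network connection; the returned record is the kind of data a visualisation could consume.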

The hack includes a D3 graph that visualises the user messages and updates the colours (red/green) to show whether a person has committed harassment.

How it works

We use two models: an SVM and Labelled Latent Dirichlet Allocation (LLDA).

For the SVM we are using a Bag-of-Words model.
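A minimal sketch of the bag-of-words plus linear SVM idea, using only the standard library. The toy messages, the function names, and the training routine (stochastic subgradient descent on the hinge loss, with no intercept term) are all assumptions for illustration; the source does not say which library or kernel was used. In practice a library implementation such as scikit-learn's CountVectorizer with LinearSVC would be a more typical choice.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Map each distinct token to a column index."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse bag-of-words vector {column: count}; unseen tokens are dropped."""
    counts = Counter(tokenize(text))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


def train_linear_svm(vectors, labels, n_features, lr=0.1, lam=0.01, epochs=100, seed=0):
    """Subgradient descent on the L2-regularised hinge loss. Labels are +1 / -1."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            margin = y * sum(w[j] * v for j, v in x.items())
            w = [wj * (1.0 - lr * lam) for wj in w]   # L2 shrinkage
            if margin < 1.0:                          # inside the margin: push away
                for j, v in x.items():
                    w[j] += lr * y * v
    return w


def predict(text, vocab, w):
    """Return +1 (harassment) if the linear score is positive, else -1."""
    score = sum(w[j] * v for j, v in vectorize(text, vocab).items())
    return 1 if score > 0 else -1


# Toy training data (invented for this example): +1 = harassment, -1 = not.
train_texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "i hate you so much",
    "see you at lunch tomorrow",
    "thanks for the great game",
    "happy birthday have a great day",
]
train_labels = [1, 1, 1, -1, -1, -1]

vocab = build_vocabulary(train_texts)
vectors = [vectorize(t, vocab) for t in train_texts]
weights = train_linear_svm(vectors, train_labels, len(vocab))
```

Bag-of-words ignores word order, so each message is reduced to its token counts before the SVM sees it.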

For the LLDA, we use Google's list of banned words as labels. When a new message arrives, we compute its topic distribution and classify the message as harassment based on the sum of the topic distributions.
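One possible reading of this decision rule is sketched below: add up the probability mass that falls on banned-word topics and compare it with a threshold. The threshold of 0.5, the presence of a "background" topic, and the label names are assumptions; the source does not spell out the exact rule, and the LLDA inference step that produces the distribution is not shown.

```python
def harassment_score(topic_distribution, banned_labels):
    """Sum the probability assigned to topics whose label is a banned word."""
    return sum(p for label, p in topic_distribution.items() if label in banned_labels)


def is_harassment(topic_distribution, banned_labels, threshold=0.5):
    """Flag the message when the banned-topic mass reaches the (assumed) threshold."""
    return harassment_score(topic_distribution, banned_labels) >= threshold


# Placeholder labels standing in for entries of the banned-word list.
banned = {"banned_word_a", "banned_word_b"}

# Hypothetical topic distributions, as an LLDA model might infer for two messages.
abusive = {"banned_word_a": 0.45, "banned_word_b": 0.25, "background": 0.30}
benign = {"banned_word_a": 0.05, "banned_word_b": 0.05, "background": 0.90}

abusive_flag = is_harassment(abusive, banned)
benign_flag = is_harassment(benign, banned)
```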

Challenges we ran into

Improving the accuracy of the model. After repeated testing with stratified 10-fold cross-validation, we found that ensemble learning gave the best results.
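Stratified k-fold cross-validation splits the data so that every fold keeps roughly the same class ratio as the full dataset. The sketch below shows the idea with the standard library only; the function names and the toy label list are assumptions, and a library routine such as scikit-learn's StratifiedKFold would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Return k lists of indices, each with roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so it is spread evenly over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for fold in stratified_k_folds(labels, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(fold)


# Toy labels: 30 harassment (1) and 70 non-harassment (0) messages.
labels = [1] * 30 + [0] * 70
splits = list(train_test_splits(labels, k=10))
```

Each candidate model is trained on the nine training folds and scored on the held-out fold, and the ten scores are then averaged.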

Accomplishments that we're proud of

F1 Score: 0.663871351995
Accuracy: 0.729411764706
Precision: 0.655128205128
Recall: 0.677898550725
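
For reference, the four metrics above can be computed from true and predicted labels as in the sketch below. The function name and the toy labels are illustrative assumptions, not the project's evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = harassment, 0 = not harassment.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

The harmonic mean of the reported precision and recall is about 0.666, slightly above the reported F1. That would be consistent with the figures being averages over cross-validation folds, though the source does not say how they were aggregated.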