PHP: PHP Hackathon Hackers Preprocessor

Inspiration

Almost all moderation tools act only after a post has been published. For large groups such as Hackathon Hackers, this becomes a problem when banned users are let back into the group or new users are allowed to post. The usual remedy is to hold posts for admin approval, but that creates more lag.

If there were a way to predict how controversial a post might be based on its content, we could create automated moderators that offer advice on making posts better received in a given group.

What it does

By analyzing the posting history of any given Facebook group (the text of each post plus its comments, likes, and reactions), we can train a bot using Bayes probabilities to score each post on a scale of 0 to 1 in categories like controversial, success rate, negative feedback, etc.

How we built it

We used the Facebook Graph API to fetch posts from Hackathon Hackers. We then trained a Naive Bayes bot (a model that treats word probabilities as independent) on a set of data marked popular / not popular by a simple 'likes' algorithm. Finally, we ran it on a set of test data to measure how accurately it told good posts from bad.
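The training step described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the post dictionaries with `message` and `likes` keys, the `LIKE_THRESHOLD` cutoff, and every function name here are assumptions made for the example, and the Graph API fetch is left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical cutoff: posts with at least this many likes count as popular.
LIKE_THRESHOLD = 10


def tokenize(text):
    """Lowercase the text and split it into single-word tokens (1-grams)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def label_by_likes(post, threshold=LIKE_THRESHOLD):
    """Stand-in for the simple 'likes' rule used to mark training data."""
    return "popular" if post["likes"] >= threshold else "not_popular"


def train(posts):
    """Count words per label; posts are dicts with 'message' and 'likes'."""
    word_counts = {"popular": Counter(), "not_popular": Counter()}
    doc_counts = Counter()
    for post in posts:
        label = label_by_likes(post)
        doc_counts[label] += 1
        word_counts[label].update(tokenize(post["message"]))
    vocab = set(word_counts["popular"]) | set(word_counts["not_popular"])
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def popularity_score(model, text):
    """Return P(popular | text) on a 0 to 1 scale under the Naive Bayes model."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    log_probs = {}
    for label, counts in model["word_counts"].items():
        # Smoothed class prior, so an empty class never produces log(0).
        log_p = math.log((model["doc_counts"][label] + 1) / (total_docs + 2))
        total_words = sum(counts.values())
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # words never seen in training carry no evidence
            # Laplace-smoothed likelihood; words are treated as independent.
            log_p += math.log((counts[word] + 1) / (total_words + vocab_size))
        log_probs[label] = log_p
    # Normalize in log space to avoid underflow on long posts.
    top = max(log_probs.values())
    weights = {label: math.exp(lp - top) for label, lp in log_probs.items()}
    return weights["popular"] / sum(weights.values())
```

Calling `popularity_score(train(posts), "some draft post")` gives a value near 1 when the draft resembles well-liked posts and near 0 when it resembles ignored ones.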

Challenges we ran into

Turning Python into a decent front end

Accomplishments that we're proud of

Given a varied dataset, we were scoring 80-100% accuracy at determining the likelihood of a popular post.
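For reference, an accuracy figure of this kind is usually the fraction of held-out posts whose predicted label matches the label from the 'likes' rule. The sketch below shows one way to compute it; the helper names `split_dataset` and `accuracy`, the 20% test fraction, and the fixed seed are illustrative assumptions rather than the settings we actually used.

```python
import random


def split_dataset(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Keeping the test posts out of training matters here: scoring the bot on posts it has already seen would inflate the number.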

What we learned

Naive Bayes modelling and machine learning

What's next for PHP

- Add more categorical options and nicer GUI features
- Grab data with more than one thread to get around Facebook's time limit, then train on thousands of documents
- Use full Bayes (words depending on each other instead of independent modelling) for comment analysis, and/or larger n-grams (currently 1-word grams) to analyse phrases in a way that makes more sense linguistically
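As a rough idea of what the larger n-grams would involve, the sketch below turns a post into 1-word and 2-word features that a Naive Bayes counter could consume in place of single words. The function names and the `max_n` parameter are hypothetical; nothing like this exists in the project yet.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with spaces."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=2):
    """Collect all n-grams from 1 up to max_n words as model features."""
    tokens = tokenize(text)
    result = []
    for n in range(1, max_n + 1):
        result.extend(ngrams(tokens, n))
    return result
```

With bigrams included, a phrase like "not good" becomes its own feature instead of being scored as "not" and "good" separately, which is the gain in linguistic sense mentioned above.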