Inspiration

The internet is filled with user-generated content, and it has become increasingly difficult to manage and moderate all of the text that people post on a platform. Large companies like Facebook, Instagram, and Reddit can draw on their massive scale and abundant resources for moderation. Small and medium-sized businesses, unfortunately, have a much harder time monitoring the user-generated content posted on their websites. Every company wants engagement from its customers or audience, but it does not want bad or offensive content to damage its image or spoil the experience for other visitors. At the same time, hiring dedicated moderators or building an in-house system is often more than these smaller businesses can manage. Content moderation is a nuanced and complex problem, and it’s unreasonable for every company to implement its own solution. What is needed is a robust, plug-and-play solution that adapts to the needs of each specific application.

What it does

That is where Quarantine comes in.

Quarantine acts as an intermediary between an app’s client and server, scanning the bodies of incoming requests and “quarantining” those that are flagged. Flagging is performed automatically, using both pretrained content moderation models (from Azure and Moderation API) and an in-house machine learning model that adapts to the particular content of each application. Once a piece of content is flagged, it appears in a web dashboard, where a moderator can either allow or block it. The moderator’s labels are continuously used to fine-tune the in-house model. Together, the in-house model and the pretrained models form a robust meta-model.
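The write-up does not say how the pretrained scores and the in-house score are combined into the meta-model. One simple possibility, sketched below in Python, is a weighted average of per-model harm probabilities compared against a threshold. The model names, weights, and threshold are hypothetical, and the actual calls to Azure and Moderation API are not shown; each model is simply assumed to return a score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class MetaModel:
    """Hypothetical meta-model: a weighted average of per-model harm scores."""

    weights: Mapping[str, float]  # e.g. {"azure": 1.0, "moderation_api": 1.0, "in_house": 2.0}
    threshold: float = 0.5        # flag when the combined score reaches this value

    def combined_score(self, scores: Mapping[str, float]) -> float:
        # Only models that returned a score (and have a weight) contribute,
        # so one unavailable external service does not block the decision.
        used = {name: s for name, s in scores.items() if name in self.weights}
        if not used:
            raise ValueError("no known model returned a score")
        total_weight = sum(self.weights[name] for name in used)
        return sum(self.weights[name] * s for name, s in used.items()) / total_weight

    def is_flagged(self, scores: Mapping[str, float]) -> bool:
        """True if the content should be quarantined for human review."""
        return self.combined_score(scores) >= self.threshold
```

With weights 1, 1, and 2 and scores 0.2, 0.4, and 0.9, the combined score is (0.2 + 0.4 + 1.8) / 4 = 0.6, so the item is flagged. Giving the in-house model a larger weight is one way to let the application-specific model dominate as it improves; this weighting is an assumption, not something the write-up specifies.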

How we built it

Initially, we built an aggregate program that takes in a string and runs it through the Azure moderation service and Moderation API. After combining their results, we cross-check them against our own machine learning model so that potentially harmful posts missed by the external services still get caught. The results are then stored in our database. We built a clean, easy-to-use dashboard for moderators using React and Material UI. It pulls the flagged items from the database and displays them. Once the moderator makes a decision, it is sent back to the database and the case is resolved. We wrapped this entire pipeline in a REST API: customers pass their input through it and then review the flagged items on our website.
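The lifecycle of a flagged item (stored, listed on the dashboard, then resolved by a moderator) can be sketched as a small Python queue. An in-memory dict stands in for the database, and the names `ModerationQueue`, `quarantine`, `pending`, and `resolve` are hypothetical; the write-up does not describe the actual schema or storage.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Status(Enum):
    PENDING = "pending"
    ALLOWED = "allowed"
    BLOCKED = "blocked"


@dataclass
class Case:
    case_id: int
    text: str
    status: Status = Status.PENDING


class ModerationQueue:
    """Hypothetical in-memory stand-in for the database behind the dashboard."""

    def __init__(self) -> None:
        self._cases: Dict[int, Case] = {}
        self._ids = itertools.count(1)  # sequential case ids

    def quarantine(self, text: str) -> int:
        """Store a flagged item so it shows up on the dashboard."""
        case = Case(next(self._ids), text)
        self._cases[case.case_id] = case
        return case.case_id

    def pending(self) -> List[Case]:
        """Unresolved cases, oldest first (dicts keep insertion order)."""
        return [c for c in self._cases.values() if c.status is Status.PENDING]

    def resolve(self, case_id: int, allow: bool) -> Case:
        """Record the moderator's decision; a case can only be resolved once."""
        case = self._cases[case_id]
        if case.status is not Status.PENDING:
            raise ValueError(f"case {case_id} is already {case.status.value}")
        case.status = Status.ALLOWED if allow else Status.BLOCKED
        return case
```

In a real deployment each resolved case would also become a labeled example for the in-house model.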

Users of our service don’t have to change their code; they simply append our URL to their own API endpoints. Requests that aren’t flagged are forwarded along instantly.
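A minimal Python sketch of this proxy behavior follows. The write-up does not give Quarantine's URL scheme, so `QUARANTINE_BASE` and the convention of placing the customer's endpoint after it are assumptions. Network I/O is replaced by caller-supplied `forward` and `quarantine` callbacks so the control flow stays visible.

```python
from typing import Callable

# Hypothetical base URL; the real service's URL scheme is not documented here.
QUARANTINE_BASE = "https://quarantine.example.com/proxy/"


def proxied_url(endpoint: str) -> str:
    """Client side: route an existing endpoint through the proxy."""
    return QUARANTINE_BASE + endpoint


def target_endpoint(url: str) -> str:
    """Proxy side: recover the customer's original endpoint from the request URL."""
    if not url.startswith(QUARANTINE_BASE):
        raise ValueError("not a Quarantine URL")
    return url[len(QUARANTINE_BASE):]


def handle(
    url: str,
    body: str,
    is_flagged: Callable[[str], bool],
    forward: Callable[[str, str], None],
    quarantine: Callable[[str, str], None],
) -> str:
    """Forward clean requests immediately; hold flagged ones for review."""
    endpoint = target_endpoint(url)
    if is_flagged(body):
        quarantine(endpoint, body)
        return "quarantined"
    forward(endpoint, body)
    return "forwarded"
```

Because clean requests take the `forward` branch without waiting on a moderator, only flagged content is delayed.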

Challenges we ran into

Developing the in-house machine learning model and getting it to run in the cloud proved challenging, since the model’s parameters and size are in constant flux.

Accomplishments that we're proud of

We were able to make a super easy-to-use service. A company can add Quarantine with less than one line of code, since it only has to point its existing API endpoints at our URL.

We're also proud of our adaptive content model, which constantly updates based on the latest content blocked by moderators.
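The write-up does not describe the in-house model's architecture. As a hypothetical illustration of a model that updates from each moderator decision, here is online logistic regression over bag-of-words features in Python: blocked items are label 1, allowed items are label 0, and every decision triggers one stochastic gradient step on the log loss. This is a swapped-in technique for illustration; the real model may instead fine-tune a pretrained text model.

```python
import math
import re
from collections import defaultdict
from typing import Dict, Set


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineHarmModel:
    """Hypothetical in-house model updated one moderator label at a time."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.lr = learning_rate
        self.weights: Dict[str, float] = defaultdict(float)
        self.bias = 0.0

    @staticmethod
    def features(text: str) -> Set[str]:
        # Binary bag-of-words: lowercase alphanumeric tokens.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def predict(self, text: str) -> float:
        """Estimated probability that a moderator would block this text."""
        z = self.bias + sum(self.weights.get(w, 0.0) for w in self.features(text))
        return _sigmoid(z)

    def update(self, text: str, blocked: bool) -> None:
        """One SGD step on the log loss for a single labeled example."""
        error = self.predict(text) - (1.0 if blocked else 0.0)
        self.bias -= self.lr * error
        for w in self.features(text):
            self.weights[w] -= self.lr * error
```

An untrained model predicts 0.5 for everything; after a few moderator decisions, words seen in blocked posts push scores up and words seen in allowed posts push them down.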

What we learned

We learned how to successfully integrate an API with a machine learning model, a database, and a front end. We had learned each of these skills individually before, but we had to figure out how to combine them all.

What's next for Quarantine

We plan to take Quarantine even further by adding customization to how items are flagged and handled. Spam is often routed through particular locations, so we could analyze the regions that harmful user-generated content comes from. We are also keen on monitoring the stream of activity of individual users, as well as tracking requests in relation to each other to detect mass spamming. Furthermore, we are interested in adding the surrounding context of each piece of content, since it may help moderators make their decisions. We're also hoping to use the data we accumulate from content moderators to help monitor content across apps, sharing labeled data behind the scenes. This would make Quarantine more valuable to every company as it monitors more content.
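Tracking requests in relation to each other to catch mass spamming could start from a sliding-window counter. The Python sketch below flags a message when the same normalized text has been submitted more than `max_repeats` times within `window_seconds`, across all users. The class name and thresholds are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class MassSpamDetector:
    """Hypothetical sliding-window detector for repeated submissions."""

    def __init__(self, window_seconds: float = 60.0, max_repeats: int = 5) -> None:
        self.window = window_seconds
        self.max_repeats = max_repeats
        # Normalized text -> timestamps of its recent submissions.
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    @staticmethod
    def normalize(text: str) -> str:
        # Collapse case and whitespace so trivial variations count as repeats.
        return " ".join(text.lower().split())

    def observe(self, text: str, timestamp: float) -> bool:
        """Record one submission; return True if it looks like mass spamming."""
        times = self._seen[self.normalize(text)]
        times.append(timestamp)
        # Drop submissions that have fallen out of the window.
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        return len(times) > self.max_repeats
```

The same structure could be keyed by user ID instead of text to monitor an individual user's stream of activity.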
