Everyone has, at some point, wished they could unsee something they came across by accident, be it violence, suggestive or adult content, or, in the extreme case, bright flashing effects that can trigger photosensitive epilepsy. By applying machine learning and computer vision, we thought we could make seeing controversial content a conscious choice instead of something you can be inadvertently exposed to.
What it does
MeerKat is the world's first tool for detecting and filtering violent, suggestive and adult content in real time on web videos, as well as content that could be triggering to users with photosensitive epilepsy. It selectively applies normal blurring to scenes with inappropriate content and temporal blurring (essentially an extreme version of motion blur) to scenes with rapid brightness and hue transitions. In this way, a large amount of visual content can be made safe for children and people with epilepsy automatically, in real time.
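The writeup does not say how the temporal blurring is implemented. As a minimal sketch of the idea, assuming each frame is reduced to a flat list of per-pixel luminance values in the range 0 to 1, a running average over frames smears sudden changes across time. The `Frame` type, the function name `temporal_blur` and the `alpha` parameter are all hypothetical.

```python
from typing import Iterable, Iterator, List

# Hypothetical frame representation: one luminance value per pixel, 0.0 to 1.0.
Frame = List[float]


def temporal_blur(frames: Iterable[Frame], alpha: float = 0.1) -> Iterator[Frame]:
    """Blend each frame with a running average of the frames before it.

    A small alpha means each new frame contributes little, so a sudden
    brightness change is spread out over many output frames.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    average = None
    for frame in frames:
        if average is None:
            # The first frame passes through unchanged.
            average = list(frame)
        else:
            average = [(1.0 - alpha) * a + alpha * p for a, p in zip(average, frame)]
        yield list(average)
```

With a one-pixel strobe that alternates between 0 and 1 and `alpha=0.5`, the output settles into much smaller swings than the input, which is the effect an "extreme motion blur" is after.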
How we built it
MeerKat proxies the videos its users view, as they are watching, and applies a variety of filtering strategies based on a combination of AI (for inappropriate and explicit content) and traditional computer vision algorithms (for rapid brightness transitions). The AI detection algorithm is implemented on a multi-cloud architecture, using an ensemble of the AWS Rekognition and Google Cloud Vision APIs to lower both the false-positive and false-negative rates. The filtered video stream is then sent back to the user, where it plays seamlessly in their existing web context.
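The traditional computer vision side can be illustrated with a rough sketch. This is not the project's actual algorithm, which the writeup does not describe. It assumes frames arrive as flat lists of per-pixel luminance in the range 0 to 1, and it flags a frame when the trailing one-second window contains more than a set number of large brightness jumps. The function names and the `jump` and `max_jumps_per_second` values are illustrative assumptions, and hue transitions are not covered.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def mean_luminance(frame: Sequence[float]) -> float:
    """Average brightness of a frame given as per-pixel luminance in [0, 1]."""
    return sum(frame) / len(frame)


def flag_rapid_transitions(
    frames: Iterable[Sequence[float]],
    fps: int,
    jump: float = 0.2,              # assumed threshold for a "large" change
    max_jumps_per_second: int = 3,  # assumed tolerance per one-second window
) -> Iterator[bool]:
    """Yield True for each frame whose trailing one-second window holds
    more than max_jumps_per_second large brightness jumps."""
    recent = deque(maxlen=fps)  # one jump flag per frame in the last second
    previous = None
    for frame in frames:
        level = mean_luminance(frame)
        is_jump = previous is not None and abs(level - previous) >= jump
        recent.append(is_jump)
        previous = level
        yield sum(recent) > max_jumps_per_second
```

Frames flagged this way would be the ones routed to temporal blurring, while unflagged frames pass through untouched.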
Challenges we ran into
Initially, we tried to use Google Cloud Vision only. While it was extremely effective at detecting inappropriate and explicit content, it had a high false-negative rate for violent content, which is a challenging problem when the model is presented with only a single still frame from a video. When we tried AWS Rekognition, we found that it was much more adept at identifying violent content and that it provided more granular confidence outputs, which helped when tuning the balance between the false-positive and false-negative rates of the algorithms.
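One way to combine the two services along these lines is sketched below. It is an assumption rather than the rule MeerKat actually used: each category is decided by the detector reported above as stronger for it. The sketch assumes the explicit-content detector returns a coarse likelihood label and the violence detector returns a confidence from 0 to 100. The label-to-score mapping, the `Thresholds` class and the default values are hypothetical, and no cloud API is called.

```python
from dataclasses import dataclass

# Assumed mapping from a coarse likelihood label to a score in [0, 1].
LIKELIHOOD_SCORE = {
    "VERY_UNLIKELY": 0.0,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.5,
    "LIKELY": 0.75,
    "VERY_LIKELY": 1.0,
}


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical tuning knobs; lowering either one trades fewer
    false negatives for more false positives."""
    explicit_score: float = 0.75       # coarse label must reach LIKELY
    violence_confidence: float = 80.0  # granular confidence, 0 to 100


def should_blur(
    explicit_likelihood: str,
    violence_confidence: float,
    thresholds: Thresholds = Thresholds(),
) -> bool:
    """Blur the frame if either category crosses its own threshold."""
    # Unknown labels are treated as score 0.0, i.e. not explicit.
    explicit_hit = (
        LIKELIHOOD_SCORE.get(explicit_likelihood, 0.0) >= thresholds.explicit_score
    )
    violence_hit = violence_confidence >= thresholds.violence_confidence
    return explicit_hit or violence_hit
```

Because the violence score is continuous, its threshold can be moved in small steps, which is where more granular confidence outputs make tuning easier than a five-level label does.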
Accomplishments that we're proud of
We were happy to be able to develop a real-time web video filtering system that generalizes well to unseen and untested videos, without having to fine-tune parameters for individual videos by hand. We did this in the short span of 24 hours, integrating 2 different cloud provider APIs and complex video encoding and decoding pipelines, with everything running faster than real time on affordable, commodity hardware.
What we learned
- Pretrained vision models from cloud service providers can be extremely effective for a relatively low integration effort.
- Cloud compute can help significantly when developing performance-sensitive code, as it offers a stable platform for benchmarking compared to laptops, which can have significant thermal, and thus performance, fluctuations.
- Integrating with or modifying existing software for our specific purpose is often significantly faster, in terms of both developer time and performance. If someone has already written it, there's no reason not to use it! This was especially evident given the limited time and "mental effort" available at a hackathon.
What's next for MeerKat
In the short time we had at HackMIT, we focused mainly on the visual aspect of the video. With more time, we would like to attempt a more generalized real-time video "cleaner" that also covers the words spoken in the video and written on the page, for a safer, more inclusive online environment for all.