Inspiration

With the explosive growth of TikTok, millions of users, including children, share videos and create stories daily. However, the current moderation system struggles to detect and redact sensitive information in real time, often missing crucial details and exposing users to privacy breaches and inappropriate content. Most of the time, videos containing personally identifiable information (PII), such as addresses, credit card details, and IDs, are taken down only after users have reported them; by then, the information has already been leaked.

This is where Iris steps in, offering an advanced solution that scans TikTok videos for sensitive data and offensive language before they are published.

What it does

  1. Real-time Scanning: Iris automatically analyzes your TikTok videos for any sensitive information or offensive language before they are published. This feature ensures that your content is reviewed thoroughly, allowing you to address any potential issues before your video goes live, thereby protecting your privacy and maintaining the integrity of your content.

  2. Content Violation Report: Iris provides clear content violation reports that highlight any detected sensitive or offensive content in your videos. These detailed reports help you understand exactly what needs to be addressed, making it easier to ensure your videos comply with community guidelines and protect user privacy.

  3. Content Censorship: Iris also facilitates quick and efficient censorship of inappropriate words and sensitive information. It does this by beeping out words in the audio and blurring text in the video, returning a sanitized version of your video that is ready for safe publishing on TikTok. This ensures that your content remains appropriate and secure, providing peace of mind that your videos are suitable for sharing with your audience.
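The blurring step above can be illustrated with a minimal pixelation sketch. This is not Iris's actual implementation (which operates on decoded video frames); it assumes a frame represented as a 2D list of grayscale values and a bounding box for the detected text:

```python
# Illustrative sketch only: pixelate a rectangular region of a frame,
# represented here as a 2D list of grayscale pixel values. A real
# pipeline would operate on decoded video frames (e.g., arrays from a
# video library) rather than plain lists.

def pixelate_region(frame, x0, y0, x1, y1, block=2):
    """Replace each `block` x `block` tile inside the box with its average value."""
    out = [row[:] for row in frame]  # leave the original frame untouched
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            tile = [
                frame[y][x]
                for y in range(by, min(by + block, y1))
                for x in range(bx, min(bx + block, x1))
            ]
            avg = sum(tile) // len(tile)
            for y in range(by, min(by + block, y1)):
                for x in range(bx, min(bx + block, x1)):
                    out[y][x] = avg
    return out
```

Averaging each small tile destroys the fine detail that makes text readable while keeping the frame's overall appearance intact.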

With Iris, we anticipate a significant reduction in privacy breaches and inappropriate content on TikTok. Our solution not only protects user privacy but also ensures compliance with TikTok’s guidelines, fostering a safer and more enjoyable platform for everyone.

How we built it

The process starts with users uploading videos to Iris through the Streamlit interface. These videos are securely stored in Streamlit’s cloud storage. Once uploaded, Iris leverages OpenAI Whisper for accurate speech-to-text conversion and timestamp generation. EasyOCR detects and extracts text within video frames.
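Running OCR on every frame of a high-resolution video is expensive. A common mitigation, sketched below under assumed parameters (the source does not state Iris's exact sampling strategy), is to run EasyOCR only on frames sampled at a fixed interval, keeping each frame's timestamp for later blurring:

```python
# Hypothetical frame-sampling helper: rather than OCR-ing all frames,
# pick one frame every `every_seconds` and record its timestamp so any
# detected text can later be blurred at the right point in the video.

def sample_frames(total_frames, fps, every_seconds=1.0):
    """Return (frame_index, timestamp_seconds) pairs to feed into OCR."""
    step = max(1, int(round(fps * every_seconds)))
    return [(i, i / fps) for i in range(0, total_frames, step)]
```

For a 10-second clip at 60 fps, this reduces the OCR workload from 600 frames to 10, at the cost of possibly missing text that appears only briefly.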

Next, we use Google Gemini to identify inappropriate content within the transcribed text and extracted frame text, flagging offensive language and PII.
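Large language models do not always return cleanly structured output, so a defensive parser helps. The sketch below is a hypothetical helper (the response shape and violation fields are assumptions, not Gemini's documented format) that extracts a JSON list of violations even when the model wraps it in markdown fences or extra prose:

```python
import json

def parse_violations(raw):
    """Extract a JSON list of violations from a model response that may be
    wrapped in markdown code fences or surrounded by extra prose.
    Returns an empty list when no valid JSON list can be found."""
    start = raw.find("[")
    end = raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    return data if isinstance(data, list) else []
```

Falling back to an empty list keeps the pipeline running when a response is malformed, rather than crashing mid-job.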

Iris then generates a content violation report, providing a compliance rating and detailing the detected sensitive and offensive content. This report helps users understand the privacy and compliance status of their content.
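The source does not specify how the compliance rating is computed; one plausible scheme is a weighted deduction per violation type. The weights and field names below are purely illustrative:

```python
# Hypothetical compliance-rating scheme: start at 100 and deduct a
# per-type weight for each detected violation. The weights here are
# illustrative assumptions, not Iris's actual values.

def compliance_rating(violations, weights=None):
    """Return a 0-100 score; lower means more (or more serious) violations."""
    weights = weights or {"PII": 25, "offensive": 10}
    score = 100
    for v in violations:
        score -= weights.get(v["type"], 5)  # unknown types get a small default
    return max(0, score)
```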

From here, Iris censors the flagged content: it beeps out inappropriate words in the audio at their detected timestamps and blurs the text in the video frames where sensitive or offensive words were found. The processed video, with beeped audio and blurred text, is then returned to the user, ready for safe publishing.
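The audio-beeping step can be sketched as follows, assuming Whisper-style word-level timestamps (the `word`/`start`/`end` keys are assumptions about the transcript shape) and mono audio as a list of float samples:

```python
import math

def beep_intervals(words, flagged):
    """Collect (start, end) time spans for flagged words from
    word-level transcript entries like {"word": ..., "start": ..., "end": ...}."""
    flag = {w.lower() for w in flagged}
    return [(w["start"], w["end"]) for w in words
            if w["word"].strip().lower() in flag]

def apply_beeps(samples, sample_rate, intervals, freq=1000.0):
    """Overwrite flagged spans of a mono float-sample list with a sine beep."""
    out = samples[:]
    for start, end in intervals:
        lo = int(start * sample_rate)
        hi = min(int(end * sample_rate), len(out))
        for i in range(lo, hi):
            out[i] = math.sin(2 * math.pi * freq * i / sample_rate)
    return out
```

Overwriting the samples (rather than mixing the beep on top) guarantees the original speech is unrecoverable from the censored span.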

Use Cases

  1. Corporations, brands, and influencers can prevent accidental leakage of confidential information in promotional or internal videos, adhering to corporate data protection standards and policies.
  2. TikTok content creators can protect the privacy of bystanders in public videos, avoiding potential legal issues.
  3. Journalists and reporters can safeguard sensitive information when covering news stories that involve private individuals or confidential locations, maintaining the anonymity of sources and interviewees who wish to remain unidentified.
  4. Parents and guardians can ensure that videos created by children are free from offensive language and sensitive information.

Challenges we ran into

Accurate text extraction from video is very challenging because it forces a trade-off between quality and speed; we had to optimize the process while still guaranteeing an acceptable level of quality. Additionally, getting consistent, easily parsable responses from the Gemini API was difficult and required careful prompt design and output pruning. Blurring was also a challenge: we had to feed detection results from our API back into the pipeline, creating a loop workflow that demanded a great deal of accuracy.
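The loop workflow described above, blurring and then re-checking with the detector until no sensitive regions remain, can be sketched generically. The `detect` and `blur` callables below are stand-ins for illustration, not our actual API calls:

```python
# Illustrative detect-blur-recheck loop. `detect` is assumed to return a
# list of bounding boxes for sensitive regions; `blur` returns a new
# frame with one box obscured. Both are hypothetical stand-ins.

def censor_until_clean(frame, detect, blur, max_rounds=3):
    """Blur detected regions and re-check until the detector reports none,
    or a round limit is hit. Returns (frame, is_clean)."""
    for _ in range(max_rounds):
        boxes = detect(frame)
        if not boxes:
            return frame, True
        for box in boxes:
            frame = blur(frame, box)
    return frame, not detect(frame)
```

Capping the rounds keeps the loop from running forever if the detector keeps flagging an already-blurred region.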

Please note that the latency for censoring inappropriate content in a 10-second video (4K resolution at 60 frames per second) is approximately 3-5 minutes. We appreciate your patience during this process.

Accomplishments that we're proud of

We are proud to have successfully implemented the general workflow, which was a very exciting milestone for us! One of the highlights was achieving highly accurate blurred-out portions of our videos, ensuring sensitive information was effectively concealed. Additionally, the accuracy of our transcribed audio exceeded our expectations, which was crucial for our project’s success. Creating a coherent architecture that integrated all components smoothly was both rewarding and an invaluable learning experience.

Despite living in different time zones and balancing our internships simultaneously, we managed to assemble our team and start working on this project just one week before the deadline. We only had nighttime to collaborate, but we made it work by hopping on calls whenever one of us needed assistance. This commitment and teamwork allowed us to overcome the challenges and deliver a functional and effective solution.

What we learned

During this process, we learned that achieving lower latency requires more compute power: dedicated GPUs are essential for processing videos accurately and in a timely manner. We also ran into a significant limitation of relying on models we do not control: a closed-source, hosted model like Google Gemini cannot be fine-tuned or customized, and even an off-the-shelf open-source tool like EasyOCR is hard to improve without substantial retraining. To work around this, we focused on enhancing our pre-processing and post-processing techniques, allowing us to maintain high accuracy and efficiency despite these limitations.

What's next for Iris

Integration Plans

Currently, for our demo, we are using Streamlit for the front end, but the long-term plan is to integrate Iris directly into TikTok’s system as an additional layer after users upload their videos and before publishing. We are excited about the opportunity to collaborate with TikTok to make Iris an integral part of its ecosystem, enhancing the safety and quality of content shared by millions of users worldwide.

Cloud Architecture Integration

We aim to fully integrate our workflow with a cloud architecture, leveraging online GPU compute and reliable database infrastructure to enhance performance and scalability.

User Feedback Loop and Autonomy

We want to give users more control over their videos: choosing which portions to blur, adjusting the opacity of the blur mask, and opting into face and scenery blurring. This feedback loop is crucial for users to feel secure and confident in how their data is managed. We also plan to integrate advanced face detection to automatically blur the faces of bystanders, and to keep reducing latency and improving precision so users can decide exactly how their content is censored.

Built With

Streamlit, OpenAI Whisper, EasyOCR, Google Gemini
