Inspiration

With the recent boom in AI development, deepfakes have become increasingly convincing. Social media is an ideal medium for deepfakes to spread, where they can seed misinformation and promote scams. Our goal was to create a system that image/video-based social media platforms like Instagram, TikTok, and Reddit could use to warn users about potential deepfake content.

What it does

Our system takes a video as input and analyzes its frames to find instances of that video elsewhere on the internet. It then outputs several signals that help determine whether a deepfake warning is warranted: URLs of websites where the video has appeared, publication dates scraped from those sites, previous deepfake identifications (i.e. whether a site already mentions the word "deepfake"), and similarity scores between the content of the video under examination and its earlier occurrences. A warning should be sent to the user if the content similarity score between the video and a visually matching video is low (indicating the video has been tampered with), or if the video has previously been identified as a deepfake by another website.

How we built it

Our project was split into several main steps:

a) Finding web instances of videos similar to the video under investigation. We used Google Cloud's Cloud Vision API to detect web entities whose content matches the video being examined (including fully and partially matching images).
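As a rough sketch, the per-frame lookup boils down to one web-detection call; the function and helper names here are our own, and running the first function requires the google-cloud-vision package plus Cloud credentials:

```python
def find_web_matches(frame_bytes):
    """Return URLs of pages where an image matching this frame appears.

    frame_bytes is the raw content of one extracted video frame (e.g. a JPEG).
    Requires the google-cloud-vision package and Google Cloud credentials.
    """
    # Imported inside the function so the pure helper below works without it.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=frame_bytes)
    response = client.web_detection(image=image)
    return collect_page_urls(response.web_detection.pages_with_matching_images)


def collect_page_urls(pages):
    """Deduplicate page URLs from a web-detection result, preserving order."""
    urls = []
    for page in pages:
        if page.url and page.url not in urls:
            urls.append(page.url)
    return urls
```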

b) Scraping date information from potential website matches. We utilized the htmldate Python library to extract original and updated publication dates from website matches.

c) Determining whether a website has already identified the video as a deepfake. We again used Google Cloud's Cloud Vision API results to check whether the flags "deepfake" or "fake" appeared in the URLs of matching websites. If they did, we immediately flagged the video as a possible deepfake.
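This check reduces to a substring test over each matched URL; a minimal sketch (we keep both flags as listed above, even though "fake" alone would also match "deepfake" as a substring):

```python
DEEPFAKE_FLAGS = ("deepfake", "fake")


def url_flags_deepfake(url):
    """True if a matched page's URL already labels the content as fake."""
    lowered = url.lower()
    return any(flag in lowered for flag in DEEPFAKE_FLAGS)
```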

d) Calculating similarity scores between the contents of the examined video and similar videos. If no deepfake flags were raised by other websites (step c), we used Google Cloud's Speech-to-Text API to obtain transcripts of the original video and of the similar videos found in step a). We then compared pairs of transcripts using a cosine similarity algorithm written in Python to measure how similar the contents of two texts are (common, low-meaning words like "the", "and", and "or" are ignored when calculating similarity).
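The transcript comparison can be sketched as cosine similarity over word-count vectors; the tiny stopword list below is a placeholder for whatever list the full implementation uses:

```python
import math
from collections import Counter

# Placeholder stopword list; the real one would be much longer.
STOPWORDS = {"the", "and", "or", "a", "an", "of", "to", "in", "is", "it"}


def _word_counts(text):
    """Count words in a transcript, skipping low-meaning stopwords."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def transcript_similarity(text_a, text_b):
    """Cosine similarity (0.0 to 1.0) between two transcripts' word vectors."""
    vec_a, vec_b = _word_counts(text_a), _word_counts(text_b)
    dot = sum(vec_a[w] * vec_b[w] for w in set(vec_a) & set(vec_b))
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A low score between two videos whose frames match visually is the tampering signal described above.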

Challenges we ran into

Neither of us had much experience with Google Cloud, which ended up being a major tool in our project. It took us a while to work through the authentication and billing procedures, but once we got it running it proved an extremely useful framework.

We also found it difficult to find a deepfake online that hadn't already been identified as one (which we needed to test our transcript similarity algorithm), so our solution was to create our own amusing deepfakes and test the algorithm on those.

Accomplishments that we're proud of

We're proud that our project addresses an important problem for online communities. While most current deepfake detection relies on AI, malicious generators can simply keep improving to counter detection mechanisms. Our project takes an innovative approach that sidesteps this arms race by instead tracking and analyzing the online history of a video (something the creators of a deepfake have no control over).

What we learned

While working on this project, we gained experience with a wide variety of tools we had never been exposed to before. From Google Cloud to fascinating text analysis algorithms, we got to work with existing frameworks as well as write our own code. We also learned the importance of breaking a big project down into smaller, manageable parts. Once we had organized our workflow into reachable goals, we found that we could delegate tasks to each other and make rapid progress.

What's next for Deepfake ID

Since our project is (ideally) meant to be integrated with an existing social media app, it's currently a little back-end heavy. We hope to expand this project and get social media platforms on board with using our deepfake detection method to alert their users when a potential deepfake video begins to spread. Because our method has distinct advantages and disadvantages compared with existing AI-based deepfake detection, the two approaches could be combined into an even more powerful detection mechanism.

Reach us on Discord: spica19
