Inspiration

The inspiration for this project came from how quickly AI-generated faces have become indistinguishable from real ones on social media. On platforms like Instagram Reels, short videos spread rapidly, and viewers naturally assume the faces they see belong to real people. However, modern GANs and diffusion models can now generate highly realistic human faces, making it easy to create fake identities, impersonate others, or spread misinformation.

We realized that human judgment alone is no longer enough. This motivated us to build a tool that can automatically analyze faces and help answer a critical question: is this face real, or was it generated by AI?

What it does

Our project is an AI-powered face detector that analyzes faces appearing in Instagram Reels and predicts whether they are real or AI-generated. The system extracts frames from videos, isolates the facial region, and outputs a probability score indicating whether the face comes from a real camera image or from an AI model such as a GAN or diffusion-based generator.

By focusing specifically on faces rather than full images or videos, the detector targets the most common and impactful use case of synthetic media: fake or artificial human identities.

How we built it

We built an end-to-end machine learning pipeline designed for social media content. First, frames are extracted from Instagram Reels at regular intervals. We then perform face detection and crop the facial region to remove background information. The cropped faces are resized, normalized, and augmented to simulate real-world conditions such as compression and lighting variation.
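The crop/resize/normalize step described above can be sketched roughly as follows. This is an illustrative sketch, not our exact code: the face box is assumed to come from a separate detector, the resize here is simple nearest-neighbor (a real pipeline would use bilinear interpolation via OpenCV or torchvision), and the ImageNet mean/std values are the standard ones for a pretrained backbone.

```python
import numpy as np

# Standard ImageNet normalization statistics (assumption: backbone is
# pretrained on ImageNet, so inputs are normalized the same way).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_face(frame, box, size=224, margin=0.2):
    """Crop a face region with margin, resize, and normalize.

    frame: HxWx3 uint8 image; box: (x0, y0, x1, y1) in pixels.
    Returns a 3 x size x size float32 array in CHW order.
    """
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    # Expand the box so the crop keeps some boundary context.
    mx = int((x1 - x0) * margin)
    my = int((y1 - y0) * margin)
    x0, y0 = max(0, x0 - mx), max(0, y0 - my)
    x1, y1 = min(w, x1 + mx), min(h, y1 + my)
    crop = frame[y0:y1, x0:x1]

    # Nearest-neighbor resize using index arrays (illustration only).
    ys = (np.arange(size) * crop.shape[0] / size).astype(int)
    xs = (np.arange(size) * crop.shape[1] / size).astype(int)
    resized = crop[ys][:, xs]

    # Scale to [0, 1], normalize per channel, move channels first.
    out = (resized.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
    return out.transpose(2, 0, 1)
```

Augmentations such as simulated compression and lighting changes would be applied before this normalization step.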

For the model, we used EfficientNet-V2 as our backbone architecture due to its strong performance in texture and fine-detail analysis. The model outputs class probabilities using a softmax function:

\[ \text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \]
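The formula above can be computed directly; a numerically stable version subtracts the maximum logit first, which cancels in the ratio but avoids overflow in the exponential:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D array of logits."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())  # shift by max(z); result is unchanged
    return e / e.sum()
```

For our two-class case (K = 2), the output is simply the pair of probabilities for "real" and "AI-generated".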

Training was done using cross-entropy loss:

\[ \mathcal{L} = -\sum_{i=1}^{K} y_i \log(\hat{y}_i) \]
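With one-hot labels, the sum above reduces to the negative log of the probability assigned to the correct class. A minimal version:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for one-hot y_true and predicted probabilities y_pred.

    eps clips predictions away from zero to guard against log(0).
    """
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0)
    return -np.sum(np.asarray(y_true) * np.log(y_pred))
```

For example, predicting 0.9 for the true class gives a loss of -log(0.9) ≈ 0.105.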

The system was implemented in Python using PyTorch and fine-tuned on a dataset containing both real faces and AI-generated faces from multiple sources.

Challenges we ran into

One major challenge was dealing with high-quality AI-generated faces. Many faces appear visually flawless, making simple visual cues unreliable. This forced us to rely on frequency-domain and texture-based artifacts rather than obvious visual errors.
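One common way to expose such frequency-domain artifacts, shown here as an illustrative sketch rather than our exact method, is a radially averaged log-magnitude spectrum: generator upsampling often leaves periodic patterns that show up as anomalies in this profile.

```python
import numpy as np

def radial_spectrum(gray, n_bins=64):
    """Radially averaged log-magnitude Fourier spectrum of a 2D image.

    Returns a 1D profile of length n_bins, from low (center) to high
    frequencies; generator artifacts often appear in the high-frequency tail.
    """
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    mag = np.log1p(np.abs(f))
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)          # distance from DC component
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)           # mean magnitude per ring
```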

Another challenge was dataset balance and generalization. If the model sees too many examples from a single generator, it can overfit and fail on new AI models. We had to ensure diversity in the training data so the model learned fundamental properties of synthetic images.
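One simple way to enforce this balance, sketched here as a hypothetical illustration, is to weight each training example inversely to the frequency of its source, so no single generator family dominates the batches:

```python
from collections import Counter

def balance_weights(sources):
    """Per-example sampling weights that equalize generator sources.

    sources: list of source labels, e.g. ["stylegan", "diffusion", "real", ...]
    Each source's total weight becomes n / k, so all sources are sampled
    equally often in expectation.
    """
    counts = Counter(sources)
    n, k = len(sources), len(counts)
    return [n / (k * counts[s]) for s in sources]
```

The resulting weights can be fed directly into PyTorch's `torch.utils.data.WeightedRandomSampler`.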

We also faced issues with face detection and cropping, where imperfect crops sometimes removed important boundary information or introduced artifacts that affected predictions.

Accomplishments that we're proud of

We are proud of building a complete, working AI detection pipeline from data preprocessing to model inference. The model is able to detect subtle artifacts that are not obvious to human viewers, even in high-quality AI-generated faces.

We also successfully designed the project to be modular and extensible, with a working scaffold for future deployment as a browser extension. Most importantly, we tackled a real and growing problem at the intersection of AI, media, and trust.

What we learned

Through this project, we learned that AI-generated images still leave behind detectable traces, even as generation quality improves. We discovered that texture, noise, and frequency patterns are often more informative than semantic features like facial expression.

We also learned that building AI detectors is often harder than building AI generators, since detectors must generalize to unseen and future models. Clean data pipelines, careful preprocessing, and balanced datasets proved just as important as model choice.

What's next for DeepSight

Next, we plan to extend the system from single-frame analysis to full video-based detection, using temporal inconsistencies across frames. We also want to add explainability tools, such as heatmaps, to show users which regions influenced the model’s decision.

Our long-term goal is to deploy the detector as a real-time Instagram browser extension, helping users verify authenticity directly as they consume content and restoring trust in digital media.
