🧠 Project Story: Deepfake Duel – Truth vs. Trickery

Team: DataCrafter

🎯 Inspiration

In a world where synthetic media is growing at an unprecedented pace, the line between real and fake is increasingly blurred. Inspired by the societal implications of deepfakes—ranging from misinformation to digital identity theft—we wanted to build a solution that doesn't just detect fake images, but also understands what they represent. Our goal was to combine robust classification with real/fake detection into a single, intelligent model.

💡 What it does

Our solution, KANVisionLSTM_FFT, is a dual-headed deep learning model that:

  • Detects whether an image is real or fake using spatial + frequency cues.
  • Classifies the content into one of three categories: human_faces, animals, or vehicles.

It uses an innovative combination of:

  • RGB + FFT image features
  • CNN backbone (Xception)
  • Bi-directional LSTM with attention
  • Dual output heads for multitask prediction

🛠️ How we built it

  • Dataset: We used the ArtiFact_240K dataset, structured into real/fake categories across three object classes.
  • Architecture:
    • 1x1 convolution to project RGB+FFT inputs
    • Xception backbone to extract features
    • Bi-directional LSTM + attention to model dependencies between spatial feature tokens
    • Real/Fake and Class prediction heads
  • Frameworks: PyTorch, timm, torchvision
  • Training: 3 epochs on a CUDA GPU with the Adam optimizer and a composite loss (binary cross-entropy for the real/fake head + cross-entropy for the class head)
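The composite loss can be sketched as a weighted sum of the two task losses. The `alpha` weighting knob and the dummy batch below are illustrative assumptions; only the BCE + cross-entropy combination comes from the write-up.

```python
import torch
import torch.nn as nn

# BCE-with-logits on the real/fake head plus cross-entropy on the 3-way
# class head; `alpha` (assumed) trades off the two tasks.
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

def composite_loss(rf_logits, cls_logits, rf_target, cls_target, alpha=1.0):
    return bce(rf_logits, rf_target) + alpha * ce(cls_logits, cls_target)

# Dummy batch of 4 samples standing in for model outputs.
rf_logits = torch.randn(4, 1, requires_grad=True)
cls_logits = torch.randn(4, 3, requires_grad=True)
rf_target = torch.randint(0, 2, (4, 1)).float()  # 0 = real, 1 = fake
cls_target = torch.randint(0, 3, (4,))           # human_faces / animals / vehicles

loss = composite_loss(rf_logits, cls_logits, rf_target, cls_target)
loss.backward()  # in training, an Adam optimizer.step() would follow
```

Summing the two terms lets a single backward pass update the shared backbone for both tasks at once, which is what makes the multitask setup work.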

🧗 Challenges we ran into

  • Balancing real/fake detection and multi-class classification in a single model was tricky.
  • Integrating FFT required managing tensor dimensions and GPU memory constraints.
  • Working with large pretrained backbones (like Xception) demanded significant computational resources.
  • Designing an attention mechanism that enhances both tasks equally was a key tuning challenge.

🏆 Accomplishments that we're proud of

  • Successfully fused spatial and frequency-domain features for better fake detection.
  • Achieved strong accuracy on both the real/fake detection and class prediction tasks.
  • Developed a visually interpretable model flow, ideal for future explainability integrations.
  • Clean, modular, and reproducible code with test-ready output in CSV format.

📚 What we learned

  • The power of combining frequency-domain data with spatial CNN features.
  • Attention and LSTM architectures still offer value, especially when applied post-CNN.
  • Pretrained backbones (via timm) accelerate development and offer exceptional feature extraction.

🔮 What's next for Deepfake Duel: Truth vs. Trickery

  • Expand to video frame analysis for real-time fake detection.
  • Integrate explainable AI tools like Grad-CAM for transparency.
  • Extend dataset coverage to include other object classes and real-world generative attacks.
  • Wrap the model into a web app or browser extension to flag AI-generated images in the wild.
