Inspiration
We are currently living in a "digital arms race." Generative AI models have evolved so rapidly that they have effectively broken the boundary of truth. Recent studies show that the human eye and ear fail to identify deepfakes roughly 50% of the time, which is essentially a coin flip.
We were inspired to build TrueLens because relying on human intuition is no longer a viable security strategy. We wanted to build a unified forensic engine that acts like an "X-ray" for digital media, looking beyond the surface pixels and sound waves to find the mathematical fingerprints that AI generators cannot hide.
What it does
TrueLens is an omni-modal AI detection ecosystem that analyzes four distinct types of media:
Images: Detects diffusion artifacts and upsampling noise.
Video: Analyzes temporal inconsistencies and physics violations between frames.
Audio: Identifies robotic phase discontinuities in voice clones.
Text: Scans for statistical patterns like low perplexity and lack of "burstiness."
It provides users with a simple dashboard where they can upload files and receive an immediate "Real vs. Fake" probability score.
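To make the two text signals concrete, here is a minimal sketch, not the TrueLens implementation. It assumes "burstiness" means variation in sentence length (one common reading; the writeup does not define it) and that per-token log-probabilities come from some language model that is not shown. The function names `perplexity` and `burstiness` are hypothetical.

```python
import math
import re
import statistics

def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean(log p)).

    The log-probabilities are assumed to come from a language model,
    which is outside the scope of this sketch.
    """
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def burstiness(text):
    """Coefficient of variation of sentence lengths, measured in words.

    This definition is an assumption made for illustration.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0

uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. The quick brown fox jumped over the lazy dog again."
print(burstiness(uniform), burstiness(varied))
```

Under this reading, text with very even sentence lengths scores near zero, while text that mixes short and long sentences scores higher.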
How we built it
We adopted a Multi-Modal Ensemble Architecture, where specialized neural networks handle different data types, all feeding into a central FastAPI backend.
For Images: We built a dual-stream network. One stream uses a fine-tuned EfficientNet-B0 for visual semantics, while the second stream applies a Discrete Cosine Transform (DCT) to analyze the frequency domain. This allows us to spot the "checkerboard" patterns typical of upsampling.
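The 2D DCT is calculated as follows: $$F(u,v) = \alpha(u)\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$
Here f(x,y) is the pixel intensity at position (x,y) of an N×N block, and with orthonormal scaling α(0) = √(1/N) and α(u) = √(2/N) for u > 0.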
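As a rough illustration of the formula, the sketch below builds the DCT basis directly in NumPy and applies it to a synthetic checkerboard. It assumes the usual orthonormal scaling, α(0) = √(1/N) and α(u) = √(2/N) for u > 0. The helper names `dct_matrix` and `dct2` are hypothetical, and the 8×8 checkerboard is a toy input rather than real generator output.

```python
import numpy as np

def dct_matrix(n):
    """Rows are the 1D DCT-II basis vectors: C[u, x] = alpha(u) * cos((2x+1)u*pi / (2n))."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    basis = np.cos((2 * x + 1) * u * np.pi / (2 * n))
    alpha = np.full((n, 1), np.sqrt(2.0 / n))
    alpha[0, 0] = np.sqrt(1.0 / n)
    return alpha * basis

def dct2(block):
    """2D DCT of a square block: F = C f C^T, which expands to the double sum."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

# Toy input: a pixel-level checkerboard of +1 / -1 values.
xx, yy = np.indices((8, 8))
board = (-1.0) ** (xx + yy)

F = dct2(board)
peak = np.unravel_index(np.argmax(np.abs(F)), F.shape)
print(peak)  # index (u, v) of the strongest coefficient
```

For this toy checkerboard the strongest coefficient sits at the highest frequency in both directions, which is the kind of concentration a frequency-domain stream can pick up.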
For Video: We utilized 3D Convolutional Neural Networks (3D CNNs) to track optical flow over time (t), flagging objects that morph or flicker in ways that violate physics.
For Audio: We converted waveforms into Mel-Spectrograms and trained a classifier to spot the subtle metallic artifacts left by vocoders.
The Stack: The core engine is built in Python/PyTorch, served via FastAPI, with a responsive frontend built in Streamlit.
Example of the frequency-domain preprocessing (Python):
    import numpy as np
    from scipy.fft import dctn

    def get_dct_img(img):
        # Orthonormal 2D DCT-II over the two spatial axes (rows, columns)
        img = np.asarray(img, dtype=np.float64)
        return dctn(img, type=2, norm='ortho', axes=(0, 1))
Challenges we ran into
The "Compression" Problem: Social media platforms aggressively compress files (JPEG/MP3). This acts as a low-pass filter, often destroying the high-frequency artifacts we hunt for. We solved this by training our models with aggressive Data Augmentation (Gaussian blur, quality degradation) to make them robust against low-quality inputs.
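The augmentation idea can be sketched as follows. This is an illustrative NumPy-only version, not the project's training pipeline: it uses a separable Gaussian blur plus a downsample-and-upsample step as a stand-in for quality degradation. It does not perform real JPEG or MP3 recompression. The function names and the parameter ranges are assumptions.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2D grayscale image."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(np.float64), radius, mode="reflect")
    # Filter along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def degrade_resolution(img, factor):
    """Drop pixels, then repeat them: a crude stand-in for lossy compression."""
    h, w = img.shape
    small = img[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[:h, :w].astype(np.float64)

def random_degrade(img, rng):
    """Randomly apply blur and/or resolution loss (probabilities are assumptions)."""
    out = img.astype(np.float64)
    if rng.random() < 0.5:
        out = gaussian_blur(out, sigma=rng.uniform(0.5, 2.0))
    if rng.random() < 0.5:
        out = degrade_resolution(out, factor=int(rng.choice([2, 4])))
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))
augmented = random_degrade(image, rng)
```

Applying a transform like `random_degrade` to each training sample exposes the model to inputs whose high-frequency content has already been partly removed.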
Temporal Compute Costs: Analyzing video frame-by-frame was too slow. We overcame this by implementing a Keyframe Extraction Algorithm that only analyzes significant changes in the scene, reducing processing time by 80%.
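The writeup does not describe the keyframe algorithm itself, so the following is only one plausible sketch: keep a frame when its mean absolute pixel difference from the last kept frame exceeds a threshold. The function name `select_keyframes`, the 8-bit grayscale input, and the default threshold of 0.1 are all assumptions.

```python
import numpy as np

def select_keyframes(frames, threshold=0.1):
    """Return indices of frames that differ noticeably from the last kept frame.

    frames: sequence of 2D uint8 arrays (grayscale, values 0..255).
    threshold: normalized mean absolute difference needed to keep a frame.
    """
    if len(frames) == 0:
        return []
    keyframes = [0]  # always keep the first frame
    reference = frames[0].astype(np.float64)
    for i in range(1, len(frames)):
        frame = frames[i].astype(np.float64)
        diff = np.mean(np.abs(frame - reference)) / 255.0
        if diff > threshold:
            keyframes.append(i)
            reference = frame  # compare later frames against this one
    return keyframes

# Toy clip: five black frames followed by five white frames.
black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
clip = [black] * 5 + [white] * 5
print(select_keyframes(clip))
```

Only the selected indices would then be passed to the heavier per-frame detectors, which is where the compute savings come from.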
Accomplishments that we're proud of
94% Accuracy on Midjourney v6: Achieving high reliability against the latest, most advanced image generators.
Unified Pipeline: Successfully integrating video, audio, and image processing into a single, stable application without memory overflows.
Real-Time Visualization: We are particularly proud of the frontend visualization that allows users to see the "Heatmap" of where the AI manipulation occurred.
What's next for TrueLens - AI Content Audit
Browser Extension: We plan to build a Chrome extension that automatically flags AI images on social media feeds in real time.
Blockchain Integration: Developing a "Proof of Human" certification where verified real content is hashed onto a blockchain.
Enterprise API: Scaling our backend to handle high-throughput requests for media platforms and news agencies.