Team Members
- Francis Gonzalez (mgonza49)
- Ashton Glover (agglover)
- Alex Huang (ahuang73)
- Hannah Zhang (hzhan196)
Final Writeup:
Introduction
We chose to implement an existing paper: “Detection of AI-Created Images Using Pixel-Wise Feature Extraction and Convolutional Neural Networks.” This paper outlines a model for detecting AI-generated images using convolutional neural networks together with feature-extraction techniques such as PRNU (photo response non-uniformity) and ELA (error level analysis). Specifically, the paper is interested in differentiating between real images and photorealistic AI-generated images. We chose this paper because of the relevance of emerging generative AI technologies, and we were interested to see just how “realistic” these AI-generated images are, and whether deep learning techniques can be applied to differentiate between real and generated images. This is a classification problem in which we classify images as either real or AI-generated.
Related Work
AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network
This paper is referenced by the authors of our chosen paper, and it lays out a slightly more complex method for detecting AI-generated images: a dual-stream network consisting of a residual stream and a content stream. The residual stream uses the Spatial Rich Model to extract texture information from the images, while the content stream captures low-frequency forged traces that the residual stream might miss. A cross multi-head attention mechanism then exchanges information between the two streams. This paper tackles the same task as the paper we have chosen, just with a different methodology.
Data
We will be using a dataset consisting of two groups: AI-generated images and real camera images. The AI-generated images come from platforms like DALL-E, Stable Diffusion, and OpenArt. We reached out to the authors of the paper, and we are hoping to use the same dataset they used when implementing the paper, so that we do not have to curate one ourselves.
Methodology
PRNU Extraction (denoising): The idea behind this is that, due to manufacturing imperfections, each camera sensor has a unique noise pattern (photo response non-uniformity). In theory, since AI-generated images don’t originate from a physical sensor, they should lack such a consistent, identifiable noise pattern. We can estimate the PRNU pattern of an image by denoising the image and subtracting the denoised version from the original.
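The residual step above can be sketched in a few lines of NumPy. The 3x3 mean filter here is only a stand-in denoiser (the exact denoising filter used by the paper is an assumption, not something we are reproducing):

```python
import numpy as np

def mean_filter3(img):
    """3x3 mean filter used as a stand-in denoiser.
    (The paper's actual denoising filter may differ.)"""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def prnu_residual(img):
    """Noise residual: the original image minus its denoised version."""
    img = img.astype(np.float64)
    return img - mean_filter3(img)
```

A perfectly smooth image yields a residual of zero; real camera images leave behind a sensor-specific noise fingerprint in this residual.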
ELA (Error Level Analysis): An ELA pattern is used to detect irregular distributions of compression noise, which is normally a sign of image editing. The idea behind this is that unaltered, genuinely captured photos compress into JPEGs with gradual, spatially consistent losses of detail that reflect the natural degradation of the data. Because AI image generators are trained largely on JPEG-compressed photographs, AI-generated images instead tend to display abrupt shifts in ELA values that resemble unnatural manipulation.
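A minimal ELA map can be computed by re-saving the image as a JPEG and taking the per-pixel difference against the original. The sketch below uses Pillow; the quality setting of 90 is an illustrative assumption, not necessarily the paper's value:

```python
import io

import numpy as np
from PIL import Image

def ela_map(img, quality=90):
    """Re-save `img` as JPEG at a fixed quality and return the
    per-pixel absolute difference (the ELA map).
    quality=90 is an assumed setting for illustration."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf).convert("RGB")
    orig = np.asarray(img.convert("RGB"), dtype=np.int16)
    diff = np.abs(orig - np.asarray(resaved, dtype=np.int16))
    return diff.astype(np.uint8)
```

Regions that recompress with little loss appear dark in the map; abrupt bright patches are the irregularities ELA is meant to surface.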
CNNs: This stage is done largely in the same way that we have done in class thus far. This specific implementation uses Stochastic Gradient Descent with Momentum (SGDM), which aims to minimize the mean squared error between the obtained output and the desired one. For this specific problem, the network is implemented through five separate stages, including pooling, normalization, dense, and softmax layers. The obtained results are then compiled into a confusion matrix.
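The SGDM update at the heart of this training loop is simple to state. Below is a minimal sketch of a single parameter update (the learning rate and momentum values are illustrative, not the paper's settings):

```python
import numpy as np

def sgdm_step(w, grad, velocity, lr=0.05, momentum=0.9):
    """One SGD-with-momentum update:
    v <- momentum * v - lr * grad;  w <- w + v.
    lr and momentum are illustrative defaults."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy usage: minimize (w - 3)^2 by repeatedly stepping.
w, v = 0.0, 0.0
for _ in range(200):
    grad = 2.0 * (w - 3.0)  # gradient of (w - 3)^2
    w, v = sgdm_step(w, grad, v)
```

The momentum term accumulates past gradients, smoothing updates across noisy mini-batches compared to plain SGD.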
Metrics
The authors applied a high-pass filter to the PRNU and ELA outputs in order to see more clearly whether there are discernible patterns distinguishing the AI-generated images from the real ones. These filtered results were then passed into a CNN to classify the data, with success gauged by the accuracy achieved after training for 100 epochs.
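For the binary real-vs-generated case, the confusion matrix and accuracy metric can be computed as follows (a generic sketch, not the authors' code; labels are assumed to be 0 = real, 1 = AI-generated):

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix: rows are true labels, columns are
    predicted labels (0 = real, 1 = AI-generated, by assumption)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Fraction of correct predictions: the diagonal over the total."""
    return np.trace(cm) / cm.sum()
```

Reading off the diagonal gives correct classifications; the off-diagonal cells separate real images misflagged as AI from AI images that slipped through.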
- Base Goal: implementing the CNN
- Target Goal: implementing both PRNU and ELA and the CNN
- Stretch Goal: possibly implementing local binary patterns (LBPs) alongside everything else
Ethics
Recent years have seen growing debate over the ethics of AI image generation, especially the risks related to Deepfakes. Many of the concerns regarding Deepfakes are (rightfully) related to potential pornographic use, as the (relatively) new technology allows realistic fake images to be generated from web sources. This technology poses further risks as well, such as forging evidence for legal or defamatory purposes. In each of these cases, there is the risk of AI image generation causing significant harm to other people, and the release of applications such as Sora has made this technology much more accessible to the general public. Many lives may be (and already have been) ruined by this technology. Our project, in identifying AI-generated images, works to somewhat mitigate this risk by identifying fake images and videos. While this does not solve the issue entirely (the AI-generated images and videos will be on the internet regardless of whether or not they are identified), it is at least a starting point for the future regulation of such software.
Division of Labor
- Ashton and Hannah: CNN model + PRNU technique
- Alex and Francis: ELA technique + preprocessing + results