Texture Synthesis with CNN

Project Poster

Introduction

Texture synthesis allows for the creation of realistic and visually appealing images, which makes it an essential component of computer graphics, video games, and animation. The challenge lies in producing realistic, high-quality textures while keeping the synthesized images novel and diverse. For this project, we implemented the paper "Texture Synthesis Through Convolutional Neural Networks and Spectrum Constraints" by Liu et al., which treats texture synthesis as an optimization problem. This approach offers a significant improvement over existing CNN-only approaches: by using spectrum constraints, the model is able to capture both fine-scale details and coherent large-scale structures, even in complex texture images, while keeping the computational cost low. As a result, the model expands the capabilities of texture synthesis, producing higher-quality, more diverse, and more realistic textures that can be used in computer graphics and other visual applications.

Methodology

Model Architecture

The architecture uses two VGG19-like networks in conjunction. The first is initialized with pre-trained VGG19 weights and takes in the texture exemplar to produce the ideal "feature maps". The second takes in a white noise image, which is optimized until its features are similar to those of the texture exemplar. The feature maps generated by both networks are tracked. To optimize the white noise image, the model minimizes the gap between these two sets of feature maps, measured as a weighted sum of Gram matrix losses computed from each convolution block, using the L-BFGS algorithm. The texture is synthesized through this process of closing the gap between the "ideal" feature maps from the first network and the "predicted" feature maps from the second. In addition to the Gram loss, a spectrum loss is computed using Fourier transforms to further constrain the optimization and improve the synthesis.
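The optimization described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not our actual implementation: `make_feature_net` is a hypothetical tiny network with random weights standing in for pre-trained VGG19, a single network serves both the exemplar pass and the noise pass, and `spectrum_loss` simply matches Fourier magnitudes, which may differ from the exact spectrum constraint in the paper. All function names and weight values are illustrative.

```python
import torch
import torch.nn as nn


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel correlations of a (1, C, H, W) feature map."""
    _, c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.t() / (h * w)


def spectrum_loss(image: torch.Tensor, exemplar: torch.Tensor) -> torch.Tensor:
    """Assumed spectrum term: mean squared gap between Fourier magnitudes."""
    mag_image = torch.fft.fft2(image, norm="ortho").abs()
    mag_exemplar = torch.fft.fft2(exemplar, norm="ortho").abs()
    return ((mag_image - mag_exemplar) ** 2).mean()


def make_feature_net() -> nn.Sequential:
    """Hypothetical stand-in for pre-trained VGG19: two small frozen conv blocks."""
    net = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AvgPool2d(2),
        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    )
    for p in net.parameters():
        p.requires_grad_(False)  # only the image is optimized, never the weights
    return net


def block_features(net: nn.Sequential, x: torch.Tensor) -> list:
    """Collect the activation after every ReLU (one per conv block)."""
    feats = []
    for layer in net:
        x = layer(x)
        if isinstance(layer, nn.ReLU):
            feats.append(x)
    return feats


def synthesize(exemplar, steps=5, layer_weights=(1.0, 1.0), spectrum_weight=1.0):
    """Optimize a noise image so its Gram matrices and spectrum match the exemplar."""
    net = make_feature_net()
    with torch.no_grad():
        targets = [gram_matrix(f) for f in block_features(net, exemplar)]

    image = torch.rand_like(exemplar).requires_grad_(True)  # white noise start
    optimizer = torch.optim.LBFGS([image], line_search_fn="strong_wolfe")
    history = []

    def closure():
        optimizer.zero_grad()
        feats = block_features(net, image)
        loss = sum(
            w * ((gram_matrix(f) - t) ** 2).mean()
            for w, f, t in zip(layer_weights, feats, targets)
        )
        loss = loss + spectrum_weight * spectrum_loss(image, exemplar)
        loss.backward()
        return loss

    for _ in range(steps):
        # step() returns the loss evaluated at the start of this step
        history.append(float(optimizer.step(closure)))
    return image.detach(), history
```

Calling `synthesize(exemplar)` with a `(1, 3, H, W)` tensor returns the optimized image and the loss recorded at the start of each L-BFGS step. For real textures, the stand-in network would be replaced by the convolution blocks of a pre-trained VGG19, and the layer and spectrum weights would need tuning.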

We built the model in PyTorch and tested it on texture images from the Describable Textures Dataset (DTD), introduced in the “Describing Textures in the Wild” paper.

Results

We noted that the texture synthesizer works best on textures that are densely filled with irregular patterns (e.g., bubbles, cracked surfaces, blotchy images). For textures that are more natural (e.g., sparse leaves, folds in clothing) or that have fixed repeating patterns (e.g., a chessboard), the model's loss plateaus at a higher value. Among the textures we tested, the loss begins to decrease very slowly at around 250-300 epochs, so the loss value at these epochs is a good indicator of whether the final synthesis will be good. We found that a loss value of roughly 1500 or below gave good texture synthesis results. We tracked the model's loss at the end of each epoch. In the results below, the first image is the texture exemplar, and the second image is the synthesized texture.
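The plateau behavior described above could be checked automatically with a small helper like the one below. This is only a sketch: the function name `has_plateaued`, the `window` of 50 epochs, and the 1% relative tolerance are illustrative assumptions, not values we used in our experiments.

```python
def has_plateaued(losses, window=50, rel_tol=0.01):
    """Return True if the loss improved by less than rel_tol (relative)
    over the last `window` epochs. `losses` holds one value per epoch."""
    if len(losses) <= window:
        return False  # not enough history to judge
    old, new = losses[-window - 1], losses[-1]
    if old == 0:
        return True  # already at zero loss, nothing left to improve
    return (old - new) / abs(old) < rel_tol
```

Once the helper reports a plateau, the current loss value can be compared against the range that gave good results in our runs to decide whether to continue training.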

Epoch 500, Loss 1514.53

Epoch 300, Loss 203.703

Epoch 300, Loss 97.055

Epoch 500, Loss 4819.45

We also conducted a human survey to evaluate our model's performance. First, participants were asked to rate, on a scale of 1 (worst) to 10 (best), how well each of a set of 5 synthesized images captured the details of its exemplar image. The average rating was 8.05. Second, participants were shown a set of 5 images and asked to judge whether each one was real or synthesized. Participants identified the image as synthesized 33.9% of the time and as real 34.8% of the time, and were unsure in the remaining 31.3% of cases. Because the responses are split almost evenly, the survey suggests that our model synthesizes images with a level of detail that is often indistinguishable from real exemplars.