CamoTransform


Title: You Can't See Me: style transfer for optimal camouflage

Who: Katie O’Leary (koleary1), Mike Tapia (mtapia1), Harrison Cho (hcho57), Andrew Wang (awang270)

Introduction

The goal of this project is to apply GAN-based style transfer between natural landscapes and camouflage patterns. Images of both are widely available, but selecting a suitable pattern in an optimized and automated manner presents an interesting challenge. More specifically, what if interested stakeholders such as hunters or military personnel could photograph a landscape of interest and be given a printable pattern that best suits their purposes? Such a tool would make camouflage design accessible to interested individuals and has the potential to reduce labor for organized operations. The project is an unsupervised learning task: we will train GANs using style-transfer methodology, transferring the style of landscape imagery from diverse settings onto web-scraped camouflage patterns. Ultimately, given a landscape, the model will select the best-fitting camouflage pattern, transfer the style of the biome to that pattern, and supply a matching color palette. While neither style transfer nor GANs are unique to this project, the novelty lies in the application. The objective is to make it possible to create optimized low-visibility patterns for a landscape that is itself a moving target.

Related Work

Neural style transfer (NST) renders images that combine the distinct stylistic features of a reference image with the content of an input image. As research interest in the field has grown, a natural dichotomy has arisen between image-optimization-based and model-optimization-based methods. Generative adversarial networks (GANs) have become an increasingly popular way to train NST models and reconstruct images. For example, a paper by Yang et al. details the task of combining text and a source image to match specific shapes and textures for NST. Texturization enhances the variation of images so that NST-generated images appear realistic.

Below are several GitHub repositories that outline the process of style transfer: https://github.com/lengstrom/fast-style-transfer https://github.com/titu1994/Neural-Style-Transfer

Data

Scraping camouflage patterns from a database that indexes international camo patterns
For preprocessing, we will have to remove watermarks from the images, and then standardize color channels, image sizes, etc.
Landscape data from another dataset


-There are over a thousand camouflage images and several thousand landscape images that will correspond to these labels.
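The standardization step described above (color channels and image sizes) could look roughly like the sketch below. It is a minimal illustration using only the Python standard library, with images held as nested lists of pixels. The helper names, the nearest-neighbour resampler, the 256-pixel target size, and the [-1, 1] value range are all assumptions made for illustration, not choices fixed by this proposal. Watermark removal is not covered.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit RGB triple
Image = List[List[Pixel]]      # rows of pixels


def to_rgb(pixel) -> Pixel:
    """Coerce a grayscale int or an RGBA tuple to a 3-channel RGB tuple."""
    if isinstance(pixel, int):
        return (pixel, pixel, pixel)
    return tuple(pixel[:3])


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize with nearest-neighbour sampling (stand-in for a real resampler)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img: Image) -> List[List[Tuple[float, ...]]]:
    """Scale 8-bit channel values from [0, 255] to [-1.0, 1.0]."""
    return [[tuple(v / 127.5 - 1.0 for v in px) for px in row] for row in img]


def preprocess(img, size: int = 256):
    """Hypothetical pipeline: force RGB, resize to size x size, normalize."""
    rgb = [[to_rgb(px) for px in row] for row in img]
    return normalize(resize_nearest(rgb, size, size))
```

In practice an image library would replace the list-based helpers. The point is only that every camouflage and landscape image ends up with the same shape, channel count, and value range before training.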

Methodology

The preliminary architecture we will adopt is CycleGAN. The main goal of CycleGAN is to learn a bidirectional mapping between two domains: in our case, the ‘styles’ of real landscape pictures and of camouflage patterns. The architecture consists of two generators, one to generate camouflage images from landscape images (G) and one to do the reverse (F), and two discriminators, which judge the outputs of the generators against real camouflage images (Dg) and real landscape images (Df) respectively.

Model training involves passing real landscape images to the generator G, which attempts to generate appropriate corresponding camouflage. The second generator F then attempts to reconstruct the original landscape from the generated camouflage. The first discriminator Dg takes in real camouflage images and attempts to differentiate them from the generated ones, while the second discriminator Df attempts to discriminate between real and generated landscape images in the ‘reversed data flow’, where F uses real camouflage images to attempt to generate landscapes. The generators are trained to minimize the combination of the reconstruction (cycle-consistency) loss of the generated images and the adversarial loss against their respective discriminators, while each discriminator is trained to tell real images from generated ones.
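The training objective described above can be sketched in a few lines. This is a toy illustration, not the project's implementation: images are flattened lists of floats, the generators and discriminators are arbitrary Python callables, and the least-squares adversarial loss, the L1 cycle loss, and the weight lam=10 follow the CycleGAN paper's published setup rather than anything decided in this proposal. The function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vec = List[float]  # a flattened image, for illustration only


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def l1(a: Vec, b: Vec) -> float:
    """Mean absolute difference between two images."""
    return mean([abs(x - y) for x, y in zip(a, b)])


def generator_loss(G: Callable, F: Callable, D_g: Callable, D_f: Callable,
                   landscapes: List[Vec], camos: List[Vec],
                   lam: float = 10.0) -> float:
    """Adversarial plus cycle-consistency loss shared by G and F."""
    fake_camos = [G(x) for x in landscapes]   # landscape -> camouflage
    fake_lands = [F(y) for y in camos]        # camouflage -> landscape
    # Least-squares adversarial terms: each generator wants D(fake) near 1.
    adv = (mean([(D_g(c) - 1.0) ** 2 for c in fake_camos])
           + mean([(D_f(l) - 1.0) ** 2 for l in fake_lands]))
    # Cycle terms: F(G(x)) should recover x, and G(F(y)) should recover y.
    cyc = (mean([l1(F(G(x)), x) for x in landscapes])
           + mean([l1(G(F(y)), y) for y in camos]))
    return adv + lam * cyc


def discriminator_loss(D: Callable, real: List[Vec], fake: List[Vec]) -> float:
    """Least-squares loss: push D(real) toward 1 and D(fake) toward 0."""
    return 0.5 * (mean([(D(r) - 1.0) ** 2 for r in real])
                  + mean([D(f) ** 2 for f in fake]))
```

Setting lam to 0 removes the reconstruction term entirely, which is one way to probe how much the cycle constraint matters for camouflage generation.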

We are implementing an existing paper, but only as a baseline to test how well it works for generating camouflage images. We are less worried about mode collapse, since ‘variation’ in the target domain of camouflage images is not as important as in other style transfer tasks. However, we are aware that CycleGAN may not be the optimal architecture for camouflage creation specifically: even the CycleGAN paper [4] notes that ‘tasks that require geometric changes’, such as from dogs to cats, were not very successful. For our task in particular, the reconstruction loss does not seem to be a very helpful metric, as we do not necessarily want to be able to restore the original landscape from the generated camouflage images to begin with. We therefore plan to make architecture adjustments, such as potentially removing the second generator and discriminator, and to look to other papers and more successful GANs such as the recent StyleGAN2.

Metrics

GANs, especially when used for style transfer, unfortunately have few quantitative measures of success. The CycleGAN paper itself uses Amazon Mechanical Turk to have real participants evaluate the results of the model, as well as semantic segmentation models to determine whether the generator produces images that those models recognize as what they are intended to be (e.g. does the semantic segmentation model see a zebra in the horse → zebra generation task?). Unfortunately, neither of these is a practical option for us. One potential quantitative evaluation is to measure the ‘effectiveness’ of the style transfer against the ‘coherence’ of the generated image. Coherence is the less desirable quality of the two: our task is less a specific ‘style transfer’ to camouflage images than the generation of camouflage images that are appropriate to an original landscape.

Because of this, effectiveness is a measurement we can draw on to determine camouflage quality. We also want to quantify how ‘good’ the camouflage would be on the original landscape: there are existing methods to quantify the quality of camouflage, based on metrics such as how well computer vision techniques like Canny edge detection find the actual border of the imposed camouflage. The evaluation of our camouflage quality is not yet set, but will likely be the following: from the original landscape images, determine ‘test sections’ to replace with the generated camouflage, and calculate disruption metrics such as VisRat and DisRat to determine generation quality. Style transfer effectiveness metrics can determine how well the generated camouflage matches the style of the real camouflage images used as input.
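As a rough illustration of the ‘test section’ idea, the sketch below pastes a patch into a grayscale landscape and measures how sharply intensity jumps across the patch outline. It is a deliberately simplified proxy: it uses plain finite differences at the border rather than Canny edge detection, and it does not compute VisRat or DisRat. The function names and the list-of-lists image representation are assumptions for illustration only.

```python
from typing import List

Gray = List[List[float]]  # grayscale image: rows of intensities in [0, 1]


def paste_patch(landscape: Gray, patch: Gray, top: int, left: int) -> Gray:
    """Return a copy of the landscape with `patch` pasted at (top, left)."""
    out = [row[:] for row in landscape]
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out


def border_contrast(img: Gray, top: int, left: int, h: int, w: int) -> float:
    """Mean absolute intensity jump across the outline of an h x w patch.

    Each pixel on the patch border is compared with its neighbour just
    outside the patch; lower values mean the outline is harder to detect.
    """
    jumps = []
    for c in range(left, left + w):
        if top > 0:                       # top edge
            jumps.append(abs(img[top][c] - img[top - 1][c]))
        if top + h < len(img):            # bottom edge
            jumps.append(abs(img[top + h - 1][c] - img[top + h][c]))
    for r in range(top, top + h):
        if left > 0:                      # left edge
            jumps.append(abs(img[r][left] - img[r][left - 1]))
        if left + w < len(img[0]):        # right edge
            jumps.append(abs(img[r][left + w - 1] - img[r][left + w]))
    return sum(jumps) / len(jumps) if jumps else 0.0
```

A patch that matches its surroundings scores near zero, while a patch with a clearly visible rectangular outline scores high. A real evaluation would substitute a proper edge detector and established disruption metrics.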

Our goals are more qualitative in nature. Our base goal is to generate images that maintain the correct color palette, without any particular aspirations regarding coherence, or rather incoherence. Our target goal is to additionally score well on Canny edge detection metrics, or otherwise form ‘ungeometric’ outlines under similar edge detection methods. Our stretch goal is to be able to replace any section of the original landscape with the camouflage and ‘make it difficult for the average person to immediately determine where that patch is’. This may require creating a separate computer vision model that tries to identify images that have camouflaged sections.
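One simple way to check the base goal of maintaining the correct color palette is to compare coarse color histograms of the landscape and the generated camouflage. The sketch below uses histogram intersection over quantized RGB values. The function names, the choice of 4 bins per channel, and the use of histogram intersection itself are assumptions for illustration, not metrics specified in this proposal.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB triple


def palette_histogram(pixels: List[Pixel], bins: int = 4) -> Dict[Pixel, float]:
    """Normalized histogram over RGB values quantized to `bins` per channel."""
    counts = Counter(
        (r * bins // 256, g * bins // 256, b * bins // 256)
        for r, g, b in pixels
    )
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


def palette_overlap(a: List[Pixel], b: List[Pixel], bins: int = 4) -> float:
    """Histogram intersection: 1.0 for identical palettes, 0.0 for disjoint."""
    hist_a = palette_histogram(a, bins)
    hist_b = palette_histogram(b, bins)
    return sum(min(share, hist_b.get(color, 0.0)) for color, share in hist_a.items())
```

Both inputs are flat lists of pixels, so spatial layout is ignored on purpose: the score reflects only whether the same colors appear in similar proportions.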

Ethics

Using deep learning in this context is an attempt to automate an otherwise laborious problem that involves both artistic and quantitative effort. Without machine learning, the process involves a concentrated survey of target landscapes, followed by using their most common color palettes and typical patterns to develop adequate camouflage dress for personnel. Here, we hope to reduce the burden on individuals by providing an automated way of concealing their location. At its best, this type of model could allow individuals to freely upload imagery of a location and receive quality, printable camouflage designs. Thus, this problem is appropriate for machine learning because it reduces human effort and has a clear metric of success without being overly entangled in subjective decision making, and because it provides an open marketplace in which individuals can access optimized decision-making processes.

In practice, certain applications have the potential to affect human lives substantially. A poor camouflage pattern could put hunters or soldiers at risk when they are in perilous situations. Therefore, when designing the model, we should ensure that the training data is balanced across patterns so that no single biome or country preference is overrepresented. Further, we should cross-validate performance so that our end result can be accompanied by some disclaimer (if not improvement). Because the main consumers of this product would be military or hunting oriented, ensuring a high success rate is important for these stakeholders and for tangentially associated actors (such as individuals’ families, associated organizations that follow U.S. military decision making, etc.).

Division of labor

{Please see Project Media for the division of labor table}