Species Generation CGAN

Project poster

Who: Zach Miller, Caleb Ellenberg, and Rohan Kher

Introduction: We are interested in generating phenotypically accurate images of species of various animal types. Our main model will be a conditional GAN (CGAN) designed to take as input a species name and output a realistic image of that species. We plan to train our CGAN with a custom loss function based on a separately trained species classification model. If the classifier model can correctly identify the CGAN generated images, loss will be low, and vice versa. We plan to train on a number of different animal types in the same model to test performance on that more difficult task. Our goal is for the final model to be able to produce images that a human can recognize as matching the desired species.

Related work: In March of this year, Google released an open-source model called SpeciesNet, which is “designed to identify animal species by analyzing photos from camera traps.” The intended use case of this model is for wildlife biologists to monitor biodiversity and assist conservation methods. It was trained on “over 65 million publicly available images and images from organizations like the Smithsonian Conservation Biology Institute, the Wildlife Conservation Society, the North Carolina Museum of Natural Sciences, and the Zoological Society of London.” This model, however, focuses only on classification through architectures such as You Only Look Once (YOLO). Sources: Article, Google Press Release, Github, Paper.

Data: We plan to use a number of data sets with images of species and identification labels. We have found suitable datasets for trees, birds, and insects. We anticipate using a gigabyte or two of data, including about 10,000 individual images. We will add more types of animals as time and computing capacity permits. In order to train our model on multiple datasets with different kinds of animals at once, we will likely need to merge these datasets into one, requiring some pre-processing and relabeling to work properly. Also, some of the datasets don’t have mappings from numerical identifications to actual species (the bird dataset for example), so we’ll need to manually identify those species.

Methodology: We will start by training the classifier model for the custom loss, which we plan to do by fine-tuning the EfficientNetB0 model (or similar) to recognize the species types we are interested in. In initial testing, we tried training a custom classifier from scratch, but performance was quite poor. Since the classifier is critical to the correct learning of the rest of our model, we think it will be better to fine tune it to get better performance.

Our CGAN will be trained from scratch. We plan to experiment with various model configurations to see what performs best, but the general set up of 2D convolutions with our custom loss functions will remain. Training the CGAN will definitely be the most difficult part of the project, as we will have to handle limited computing power and potential training collapse. However, in initial testing we have gotten some positive results, so we think it should be possible to get at least decent performance. Below is an image from one of our test runs on the bird dataset, and you can see that the images are starting to resemble birds:

Metrics: We plan to experiment with a number of model set ups with both training on one type of animal at a type and training on a combined dataset. We will also experiment with larger, more computationally intensive models vs. simpler, faster models.

We will measure the performance of our CGAN model against the classifier. Our base goal is to get 50% of generated images correctly classified. We’re unsure of exactly how difficult this will be, but if possible we would like to get 75% accuracy. Our most ambitious goal would be 90% accuracy, but we think it is highly unlikely that will be possible.

Ethics: Like Google’s model, our model, if done at a larger scale, would have a potential to impact spaces around conservation and biodiversity. The generator could be used to create currently non-existent data for rare species, for instance in specific poses, lighting conditions, environments, etc. It could also improve camera trap systems, simulating animals to test the system. These are just two of many examples for how our model can impact conservation and biodiversity strategies. However, a concern is that the fake images that the generator creates could be used by a bad actor for misleading purposes, since the goal is that the generated images would look indistinguishable from real images. If our model, like Google's, could affect conservation or biodiversity strategies, then major stakeholders of this model include the animals and plants themselves. The model failing could lead to misguided strategies and biodiversity loss.

Division of Labor Pre-processing data/combining datasets: Zach Fine-tuning the classification model: Caleb and Rohan Training the CGAN: Caleb, Zach, and Rohan

Built With

python
tensorflow

Updates

Rohan Kher started this project — Nov 23, 2025 06:28 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.