Neural networks are powerful computer vision tools that can map one image to another. However, their effectiveness has exposed a new problem: models are sensitive to heterogeneity in image intensity. One example is computer vision systems that fail on images of people of color, women, or people with non-traditional hair colors.
The cause of these failures lies in the dataset, since a model learns only what it is given. That is, a lack of diversity in machine learning datasets leads to neural networks whose decisions reflect only that data. If a neural network is trained on a dataset in which all minorities are labeled as criminals, for example, it will fit to this data and learn that mapping. This is obviously undesirable and has serious ethical consequences.
StarGAN is a generative adversarial network, built on convolutional layers, that translates images from one set of categories to another. Its ability to change a person's face, hair, age, and skin color can be a first step toward improving diversity in existing datasets. In tandem, we hope that organizations that collect data will strive to gather data that better represents the diversity we have on this planet.
What it does
To demonstrate our first steps toward improving diversity in human face datasets, we apply StarGAN to a live webcam feed and encourage users to toggle attributes of their visual appearance, including hair color, gender, and age. We hope to use the wisdom of the crowd to identify where and how our model fails to apply style transfer accurately. In this way, we can converge on a solution in which StarGAN transforms a dataset limited in diversity into one that supports good performance in a variety of scenarios.
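As a rough illustration of what toggling an attribute could mean for the model input, the sketch below builds a target label vector from a source one. The attribute names and their order are an assumption (they mirror the CelebA attributes commonly used with the reference StarGAN code), and `toggle` is a hypothetical helper, not a function from this project.

```python
# Assumed attribute order; the project's actual list may differ.
ATTRS = ["Black_Hair", "Blond_Hair", "Brown_Hair", "Male", "Young"]
HAIR = {"Black_Hair", "Blond_Hair", "Brown_Hair"}


def toggle(label, attr):
    """Return a copy of `label` (a list of 0/1 flags) with `attr` toggled."""
    if attr not in ATTRS:
        raise ValueError(f"unknown attribute: {attr}")
    new = list(label)
    i = ATTRS.index(attr)
    if attr in HAIR:
        # Hair colors are treated as mutually exclusive:
        # select the requested color and clear the other two.
        for j, name in enumerate(ATTRS):
            if name in HAIR:
                new[j] = 1 if j == i else 0
    else:
        # Binary attributes (gender, age) are simply flipped.
        new[i] = 1 - new[i]
    return new


# Source: black hair, male, young. The user asks for blond hair and "older".
source = [1, 0, 0, 1, 1]
target = toggle(toggle(source, "Blond_Hair"), "Young")
print(target)  # [0, 1, 0, 1, 0]
```

The resulting vector is the kind of target-domain label that a StarGAN generator receives alongside the input image.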
How we built it
- Backend: Python server running StarGAN on a GPU. It locates the face with a Haar cascade, performs rough edge segmentation with a blurred Canny operator, and applies the StarGAN transformation using the PyTorch framework.
Challenges I ran into
- Converting academic code to functions usable in real-time
- Maintaining compatibility across multiple operating systems
- Dealing with browser limitations on webcam streaming
- Using traditional computer vision edge-detection filters as preprocessing
- Dealing with noise artifacts in webcam capture
Accomplishments that I'm proud of
- Combining segmentation and adversarial transformation
What I learned
- How to use PyTorch!
- That OpenCV has great built-in filters like the Canny operator
- Webcams are noisy
What's next for Style Transfer Fashion Model
- Improved preprocessing so that the StarGAN output is crisper
- Continued training to enhance performance
- Research into efficacy of diversifying datasets
- Allowing users to submit screenshots and written feedback to identify areas for improvement in this method
Choi, Yunjey, et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
Source of initial StarGAN implementation: https://github.com/yunjey/stargan/