Coming from distinct areas of expertise (iOS vs. machine learning), we wanted to build a fun app that would push our technical and collaborative abilities. We started by toying around with the idea of image style translation (e.g. stylizing a photo as a Monet painting) and came across an online demo of converting line drawings to handbags and shoes. However, we saw much more potential for this idea as an iOS app with retro Etch-A-Sketch controls.
What it does
- Draw anything by rotating the Etch-A-Sketch knobs. (Or cheat and use your finger directly on the drawing surface.)
- Hit "Submit" when you are ready to fill in your image.
- A neural network running on Microsoft Cognitive Services determines whether your drawing is more likely to be of a cat or of a dog.
- A set of two trained-from-scratch neural networks running on a Microsoft Azure VM attempts to convert your image into an image of either a cat or a dog, based on the outcome of the previous step.
How we built it
The iOS app was written in Swift and relied heavily on Apple's CoreGraphics framework and touch recognition to manipulate what the user sees on screen, as well as what gets passed behind the scenes to the API.
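The knob interaction reduces to tracking the angle swept by a touch around each knob's center between consecutive touch events, then mapping that angle to cursor movement. A language-agnostic sketch of the idea (in Python, with hypothetical function names and a made-up pixels-per-radian scale; the app itself does this in Swift):

```python
import math

def knob_delta(center, prev_touch, curr_touch):
    """Angle (radians) swept between two touch points around a knob's center."""
    a0 = math.atan2(prev_touch[1] - center[1], prev_touch[0] - center[0])
    a1 = math.atan2(curr_touch[1] - center[1], curr_touch[0] - center[0])
    delta = a1 - a0
    # Wrap into (-pi, pi] so crossing the +/-pi boundary doesn't jump a full turn.
    if delta > math.pi:
        delta -= 2 * math.pi
    elif delta <= -math.pi:
        delta += 2 * math.pi
    return delta

def move_cursor(x, y, delta_left, delta_right, px_per_radian=20.0):
    """Etch A Sketch mapping: left knob moves the pen horizontally, right knob vertically."""
    return (x + delta_left * px_per_radian, y - delta_right * px_per_radian)
```

The wrap-around step matters: without it, a touch crossing the negative x-axis registers as a nearly full rotation in the wrong direction.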
The brains of the image conversion rely on a lot of machine learning. The background work first involved curating labeled datasets of cat and dog images. We fed the images through a Canny edge detector to create "sketches" of each image.
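In practice this preprocessing used a real Canny implementation (e.g. OpenCV's cv2.Canny). As an illustration of the underlying idea, here is a cruder gradient-magnitude edge map using only NumPy: Sobel filters plus a threshold, skipping Canny's smoothing, non-maximum suppression, and hysteresis steps.

```python
import numpy as np

def sobel_edges(gray, threshold=0.25):
    """Crude edge map: Sobel gradient magnitude, thresholded to {0, 255}.

    `gray` is a 2-D float array in [0, 1]. Returns a uint8 array two pixels
    smaller in each dimension (no padding at the borders).
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical-gradient kernel
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Correlate by summing shifted, kernel-weighted views of the image.
    for i in range(3):
        for j in range(3):
            patch = gray[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    mag = np.hypot(gx, gy)
    return (mag > threshold * mag.max()).astype(np.uint8) * 255
```

On a simple image with a vertical black-to-white step, the nonzero pixels line up along the step, which is exactly the "sketch" effect we wanted from the dataset.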
The first machine learning model is a neural network binary classifier trained to determine whether a sketch is a cat or a dog. This was easy to do thanks to the Microsoft Cognitive Services Custom Vision service. We uploaded roughly 500 images each of cats and dogs that we had fed through the Canny edge detector.
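Once the classifier is trained, querying it is a single HTTP POST of raw image bytes. A standard-library sketch of that call, with placeholder endpoint, project ID, and key (the v3.0 URL shape and response fields below should be double-checked against the current Custom Vision docs):

```python
import json
import urllib.request

# Placeholder values -- the real endpoint, project ID, and prediction key
# come from the Custom Vision portal.
ENDPOINT = "https://southcentralus.api.cognitive.microsoft.com"
PROJECT_ID = "00000000-0000-0000-0000-000000000000"
PREDICTION_KEY = "your-prediction-key"

def build_classify_request(image_bytes):
    """Build the POST request that sends raw sketch bytes for classification."""
    url = f"{ENDPOINT}/customvision/v3.0/Prediction/{PROJECT_ID}/classify/image"
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={
            "Prediction-Key": PREDICTION_KEY,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )

def top_tag(response_body):
    """Pick the highest-probability tag ('cat' or 'dog') from the JSON response."""
    predictions = json.loads(response_body)["predictions"]
    return max(predictions, key=lambda p: p["probability"])["tagName"]
```

The winning tag then decides which of the two GANs the sketch gets routed to.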
The second machine learning model was much more involved. First, we modeled our pipeline on a TensorFlow port of the pix2pix image-to-image translation deep neural network. As we could not find any pre-trained models for converting image "edges" into actual images, we had to run the training process ourselves. We trained separate GANs for cats and dogs, running training for ~20 epochs (~1 hour) each. Thankfully, a lot of the infrastructure for setting up the data preprocessing and training parameters was already available in the pix2pix-tensorflow GitHub repo.
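For reference, the objective we trained against is the pix2pix one (Isola et al.): a conditional-GAN loss paired with an L1 reconstruction term, where here x is the edge sketch and y the target photo.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\big[\lVert y - G(x) \rVert_1\big]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
```

The L1 term (weighted by λ, which the paper sets to 100) is what keeps the generated cat or dog roughly aligned with the user's sketch rather than letting the GAN hallucinate freely.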
Lastly, we had to link the three parts (the iOS app, the cat vs. dog classifier, and our edges-to-image generative model) via URL requests and responses. We learned how to do this effectively in both Swift and Python.
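The glue between the pieces is plain HTTP. A minimal standard-library sketch of the round trip, with a hypothetical handler and field names (the real server on the Azure VM also invoked the classifier and the GAN before replying):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SketchHandler(BaseHTTPRequestHandler):
    """Stand-in for the Azure VM endpoint: accept uploaded sketch bytes, reply with JSON."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        # A real handler would classify the sketch and run the matching GAN here.
        reply = json.dumps({"bytes_received": len(body), "label": "cat"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep request logging quiet
        pass

def post_sketch(url, image_bytes):
    """Client side -- what the iOS app does in Swift: POST raw bytes, parse the JSON reply."""
    req = urllib.request.Request(
        url, data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Keeping the contract this simple (raw bytes in, small JSON out) made it easy to exercise each half independently from curl before wiring up the Swift side.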
Challenges we ran into
We both tackled new skills. Olivia focused on learning CoreGraphics and dealing with manipulating images on iOS. In particular, the Etch-A-Sketch knobs proved to be challenging.
Chris focused on the machine learning aspects. It was his first time using Microsoft Cognitive Services, Docker, and Azure VMs, his first experience with a GAN, and his first time curating a dataset.
Accomplishments that we're proud of
Learning so many new skills.
What's next for Etch A Cat