For my project, I wanted to see if I could use state-of-the-art machine learning and crowdsourcing methods to make something not only innovative and exciting, but also fun and whimsical.
Although I'm not artistically talented, I like doodling, whether it's when I'm supposed to be paying attention in class or in games like "Draw Something". I got the idea for this project after learning about Google's Quick, Draw! project, which crowdsourced millions of doodles (including some of mine!) to train a neural network and then released the drawings for the public to use. I really liked the cartoony vibe of these drawings and thought it would be fun to use them to compose your own pictures.
What it does
The overarching idea of the app is to create drawings out of crowd-sourced doodles, and there are two main ways it does this.
First, you can add to the image using voice commands. The picture is split into an (invisible) 8x6 grid, and you specify where you want something placed using regular Cartesian coordinates. For example, you can say "Tree at 4 dot 3", which draws someone's doodle of a tree on square (4, 3). If you're planning on placing a lot of the same item at once, you can instead say a command like "Frying pan across 7", which fills row 7 with frying pans, or "Monkey down 0", which fills the first column with monkeys. If you don't like a doodle, you can click refresh to try a different user-submitted doodle, and if you want to restart the whole drawing, you can say "Erase" to get a blank canvas (full of possibilities!).
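The command grammar above can be sketched as a small parser. This is a hypothetical illustration, not the app's actual code: the function name, regex patterns, and return shape are all assumptions.

```python
import re

def parse_command(transcript, grid_cols=8, grid_rows=6):
    """Parse a spoken-command transcript into (action, object, coords).

    Supported forms (mirroring the commands described above):
      "<object> at <x> dot <y>"  -> place one doodle at cell (x, y)
      "<object> across <row>"    -> fill an entire row
      "<object> down <col>"      -> fill an entire column
      "erase"                    -> clear the canvas
    """
    text = transcript.lower().strip()
    if "erase" in text:
        return ("erase", None, None)
    m = re.match(r"(.+?)\s+at\s+(\d+)\s+dot\s+(\d+)", text)
    if m:
        return ("place", m.group(1), (int(m.group(2)), int(m.group(3))))
    m = re.match(r"(.+?)\s+across\s+(\d+)", text)
    if m:
        row = int(m.group(2))
        return ("place", m.group(1), [(x, row) for x in range(grid_cols)])
    m = re.match(r"(.+?)\s+down\s+(\d+)", text)
    if m:
        col = int(m.group(2))
        return ("place", m.group(1), [(col, y) for y in range(grid_rows)])
    return (None, None, None)
```

For example, `parse_command("Frying pan across 7")` yields a fill action covering all eight cells of row 7.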
The second way to create a drawing is to paste in a link to an actual photograph and let the program doodle-fy it! It attempts to detect prominent objects in the photo and their locations, converts those objects into doodles, and places them on a blank canvas in the same positions relative to each other as in the original.
How I built it
This program was built in Python, with the GUI in PyQt and image creation handled by Pillow.
I use Google QuickDraw's open dataset to access some of the millions of doodles submitted by people around the world, each corresponding to a category like "car", "tree", "foot", or "sword"; these are the building blocks for Doodle My World's images.
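QuickDraw doodles are distributed as lists of pen strokes, so drawing one onto a canvas boils down to replaying those strokes with Pillow. The sketch below assumes a stroke format of lists of (x, y) points (roughly what the `quickdraw` package exposes) and uses a hand-made doodle rather than a real downloaded one:

```python
from PIL import Image, ImageDraw

def render_strokes(strokes, size=(256, 256), line_width=3):
    """Render QuickDraw-style stroke data onto a blank Pillow canvas.

    Each stroke is a list of (x, y) points; the format here is an
    assumption modeled on the open dataset's stroke representation.
    """
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    for stroke in strokes:
        if len(stroke) > 1:
            # Connect the stroke's points with line segments.
            draw.line(stroke, fill="black", width=line_width)
    return img

# A hand-made stand-in "doodle": two strokes forming a simple cross.
doodle = [[(20, 128), (236, 128)], [(128, 20), (128, 236)]]
img = render_strokes(doodle)
```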
For the voice commands, I used Microsoft Azure's speech services to convert the commands into text. The program then searches the transcript for certain keywords, such as the object to be doodled, its coordinates, and whether or not to fill a row. The object's name is fed to Google QuickDraw, and if it matches a doodle, the doodle is added to the canvas at the specified coordinates (the grid coordinates are converted to pixel coordinates).
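The grid-to-pixel conversion at the end of that step is simple arithmetic. A minimal sketch, assuming a hypothetical 800x600 canvas (the real canvas size isn't stated above):

```python
def grid_to_pixels(col, row, canvas_size=(800, 600), grid=(8, 6)):
    """Convert a cell on the invisible 8x6 grid into the pixel
    coordinates of that cell's top-left corner.

    The canvas size is an assumption for illustration; the grid shape
    matches the 8x6 grid described above.
    """
    cell_w = canvas_size[0] // grid[0]   # width of one grid cell
    cell_h = canvas_size[1] // grid[1]   # height of one grid cell
    return (col * cell_w, row * cell_h)
```

With these defaults, "Tree at 4 dot 3" would anchor the tree doodle at pixel (400, 300).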
For the photo conversion, I used Azure's computer vision service to analyze photos for objects and their locations. It detects prominent objects in a photo and returns their pixel coordinates, so I fed the list of objects into QuickDraw, and whenever a doodle was available, I placed it into the image file at the same relative coordinates as in the original.
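Preserving relative positions means normalizing each detected object's location by the photo's dimensions, then scaling back up by the canvas dimensions. A sketch, assuming bounding boxes arrive as (x, y, width, height) tuples (the exact shape of Azure's response is not shown above):

```python
def map_to_canvas(bbox, photo_size, canvas_size=(800, 600)):
    """Map an object's bounding box from the source photo onto the
    doodle canvas, preserving its relative position.

    `bbox` is (x, y, w, h) in photo pixels; the tuple layout and the
    canvas size are assumptions for illustration. Returns the canvas
    coordinates of the box's center.
    """
    x, y, w, h = bbox
    cx = (x + w / 2) / photo_size[0]   # relative horizontal center, 0..1
    cy = (y + h / 2) / photo_size[1]   # relative vertical center, 0..1
    return (int(cx * canvas_size[0]), int(cy * canvas_size[1]))
```

An object centered in the photo lands in the center of the canvas regardless of how the two differ in size or aspect ratio.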
Challenges I ran into
One of the main challenges I ran into was getting the APIs I was using to work well together. Google's QuickDraw based its dataset on things that are easy to draw, so it would only identify specific, simple inputs, like "car", and wouldn't recognize Azure's more specific and complex outputs, like "stationwagon". One way I got around this was that if QuickDraw didn't recognize an object, I'd feed in what Azure calls its "parent object" (so for a stationwagon, its parent would be car), which was often more general and simpler.
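That fallback logic can be sketched as a simple lookup with a parent-object escape hatch. The function name and the tiny stand-in category set are hypothetical; in the real app the check would run against QuickDraw's full category list:

```python
def find_doodle_name(name, parent=None, known=None):
    """Return a QuickDraw category for `name`, falling back to the
    parent object Azure reports when the specific label isn't a
    category.

    `known` stands in for QuickDraw's category list (assumption).
    """
    known = known or {"car", "tree", "foot", "sword"}
    if name in known:
        return name
    if parent and parent in known:
        return parent
    return None   # no doodle available for this object
```

So "stationwagon" isn't drawable on its own, but its parent "car" is.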
Another issue was that QuickDraw simply didn't recognize many common inputs, a specific example being "Person". In this case my workaround was that whenever Azure detected a person, I'd feed "smiley face" to QuickDraw and then draw a shirt underneath to simulate a person, which honestly worked out pretty well.
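Substitutions like that can live in a small lookup table mapping an unrecognized label to one or more doodles with placement offsets. Everything here is illustrative: the table name, the offset convention (grid cells below the anchor), and the "t-shirt" category choice are assumptions:

```python
# Hypothetical substitution table: labels QuickDraw lacks, mapped to
# doodle categories that approximate them. Offsets are (cols, rows)
# relative to the detected object's anchor cell.
SUBSTITUTIONS = {
    "person": [("smiley face", (0, 0)),   # face at the anchor cell
               ("t-shirt", (0, 1))],      # shirt one cell below
}

def doodles_for(label):
    """Expand a detected label into one or more (category, offset)
    doodles, defaulting to the label itself with no offset."""
    return SUBSTITUTIONS.get(label.lower(), [(label.lower(), (0, 0))])
```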
Accomplishments that I'm proud of
This was my first time attempting a solo hackathon project, so I'm really happy that I managed to finish and reach most of my initial goals. It was also my first time creating a front-end GUI and doing image manipulation in Python, so I learned a lot too!
What's next for Doodle My World
There are a lot of voice commands that I wanted to add but treated as stretch goals. One idea was to specify finer-grained ranges rather than just single rows or columns, such as "Star from rows 3 to 5" or "Mountain in half of column 2". I also wanted to add a "Frame" command, so you could create a border out of some object, like stars for instance.
Because of the disconnect between Azure's specificity and QuickDraw's generality, I'd originally planned to add a thesaurus feature to get related words. For instance, if QuickDraw doesn't recognize "azalea", I could run it through a thesaurus app and feed the related words to my program until it eventually recognizes one, such as "flower".