Team 7 | SketchID

Inspiration

I was working on a separate machine learning project outside of the hackathon: taking existing models for optical character recognition (OCR) and attempting to modify them to accept symbol cipher values (see dcode.fr/symbols-ciphers). This challenge was a very similar concept: train a model on the more general Google Quick, Draw! dataset, then apply transfer learning to its layers so that it can identify drawings from a small dataset.

What it does

It creates a convolutional neural network and trains it on padded strokes, each stored as lists of x-y values. It parses multiple ndjson files as input and interleaves their records to make a mixed dataset.
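The parsing and padding step could look roughly like the sketch below. It is not the project's actual code: the "word" and "drawing" fields follow the published Quick, Draw! ndjson layout, while the helper names, the (0, 0) pad value, and the max_points limit are assumptions chosen for illustration.

```python
import json

def drawing_to_points(drawing):
    """Flatten a list of strokes into one list of (x, y) points.

    Each stroke is assumed to hold a list of x values followed by a
    list of y values, as in the Quick, Draw! ndjson files.
    """
    points = []
    for stroke in drawing:
        xs, ys = stroke[0], stroke[1]
        points.extend(zip(xs, ys))
    return points

def pad_points(points, max_points, pad_value=(0, 0)):
    """Clip or pad a point list so it has exactly max_points entries."""
    clipped = list(points[:max_points])
    return clipped + [pad_value] * (max_points - len(clipped))

def parse_ndjson_line(line, max_points=8):
    """Turn one ndjson line into a (label, padded points) pair."""
    record = json.loads(line)
    points = drawing_to_points(record["drawing"])
    return record["word"], pad_points(points, max_points)

# A made-up record with two strokes, used only to show the shapes involved.
sample_line = '{"word": "cat", "drawing": [[[0, 10], [5, 15]], [[20, 30, 40], [25, 35, 45]]]}'
label, padded = parse_ndjson_line(sample_line, max_points=8)
print(label, padded)
```

Padding every drawing to the same number of points is what lets the drawings be stacked into one fixed-shape array for the network.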

How we built it

The program is built entirely in Python, specifically with TensorFlow 2.x. It creates a neural network consisting of convolution and dropout layers, followed by dense layers that produce the final output. The code can run on a CUDA GPU, so it is very efficient and trains in under 30 minutes.

Challenges we ran into

We ran into two main challenges. The first was training time: a more complex neural network would have taken significantly longer to train, and we would not have finished in time. The model was initially planned to be an RNN, but the multiple bidirectional layers and higher node count would have multiplied the time needed by almost 3 orders of magnitude. The second was that the data came in separate files. Training on one file after another would have made the model inaccurate, with similar values clumping together. We remedied this by writing a custom function that iterates through a list of generators.
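A minimal sketch of such a function is shown below, assuming a simple round-robin order (one item from each generator in turn). The name interleave and the behavior of dropping exhausted generators are assumptions; the original function may have mixed the files differently.

```python
def interleave(generators):
    """Yield one item from each generator in turn until all are exhausted."""
    active = [iter(g) for g in generators]
    while active:
        still_active = []
        for it in active:
            try:
                item = next(it)
            except StopIteration:
                # This source has run out; drop it and keep the others going.
                continue
            yield item
            still_active.append(it)
        active = still_active

# Three toy "files" of different lengths stand in for per-file generators.
mixed = list(interleave([iter("aaa"), iter("b"), iter("cc")]))
print(mixed)
```

Because each source is only advanced one item at a time, the files never have to be loaded into memory all at once, and consecutive training examples come from different files.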