hawk.ai

Inspiration

Millions of clinical records around the world are still written with pen and paper each year. As healthcare grows more complex, many patients receive suboptimal care because these records are never digitized and so cannot be linked to the broader picture of a patient's health. We aim to solve this problem using the latest machine learning models.

What it does

We built a neural network that transforms a picture of handwritten numbers into digitized text.

How we built it

Our model is a custom CRNN-like model (but without CTC loss and other idiosyncratic implementations) that we built from scratch in PyTorch. It uses a ResNet50 as a backbone and feature extractor, which feeds a 512-dimensional latent vector to a bidirectional LSTM.
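To make the shape of this pipeline concrete, here is a minimal sketch of a CNN feature extractor feeding a bidirectional LSTM. It is not our actual code: the class name `DigitSequenceModel`, the hidden size, the 10-class output, and the choice to treat each feature-map column as one timestep are all assumptions for illustration, and a tiny convolutional stack stands in for ResNet50 to keep the example short.

```python
import torch
import torch.nn as nn


class DigitSequenceModel(nn.Module):
    """Hypothetical CNN + BiLSTM digit reader; names and sizes are assumptions."""

    def __init__(self, num_classes=10, feat_dim=512, hidden=256):
        super().__init__()
        # Stand-in backbone. The project used ResNet50; this small conv stack
        # only mimics its role of producing 512-channel feature maps.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Collapse the height to 1 so each remaining column is one timestep
        # (an assumption about how the sequence is formed).
        self.pool = nn.AdaptiveAvgPool2d((1, None))
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden.
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, images):
        feats = self.backbone(images)        # (B, feat_dim, H', W')
        feats = self.pool(feats).squeeze(2)  # (B, feat_dim, W')
        seq = feats.permute(0, 2, 1)         # (B, W', feat_dim)
        out, _ = self.lstm(seq)              # (B, W', 2 * hidden)
        return self.classifier(out)          # (B, W', num_classes)
```

Calling `DigitSequenceModel()(torch.randn(2, 3, 32, 128))` returns one score vector per timestep for each image in the batch. A stop-prediction output is left out here to keep the sketch small.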

The model uses a per-timestep cross-entropy loss alongside an auxiliary loss to improve stoppage prediction. We trained it for 42 epochs using SGD with Nesterov momentum and cyclic learning rates. The data was augmented with standard affine transformations, and we converted the full MNIST dataset to .jpg format, similar to that of the OCR challenge, in order to pretrain both the ResNet50 head and the LSTM.
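The training setup described above could look roughly like the sketch below. The function `sequence_loss`, the binary cross-entropy form of the auxiliary stop loss, the `aux_weight` value, and every hyperparameter are assumptions made for illustration; only the use of per-timestep cross-entropy, SGD with Nesterov momentum, and a cyclic learning rate comes from our description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sequence_loss(digit_logits, stop_logits, digit_targets, stop_targets,
                  aux_weight=0.5):
    """Per-timestep cross-entropy plus an assumed auxiliary stop loss.

    digit_logits:  (B, T, C) class scores at each timestep
    stop_logits:   (B, T) score that the number has ended at this timestep
    digit_targets: (B, T) integer class labels
    stop_targets:  (B, T) float flags, 1.0 where the number has ended
    """
    B, T, C = digit_logits.shape
    # Flatten batch and time so every timestep counts as one classification.
    ce = F.cross_entropy(digit_logits.reshape(B * T, C),
                         digit_targets.reshape(B * T))
    # Assumed form of the auxiliary loss: binary cross-entropy on stop flags.
    aux = F.binary_cross_entropy_with_logits(stop_logits, stop_targets)
    return ce + aux_weight * aux


# Optimizer and schedule; all values below are placeholders.
model = nn.Linear(8, 4)  # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.01, step_size_up=100, mode="triangular"
)
```

In a training loop, `scheduler.step()` would be called after each `optimizer.step()`, so the learning rate climbs from `base_lr` to `max_lr` over `step_size_up` batches and then descends again.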

Challenges we ran into

We tried using computer vision methods to separate the digits of each number and simplify the problem, but we weren't able to get good results that way. Instead, we decided to go for an end-to-end model without any kind of preprocessing.

Building our own model from scratch wasn't easy and took a lot of debugging and hacking. At first, we inadvertently processed the training data and validation data differently, which hurt the model's performance. We didn't catch this mistake until later, and we were left scratching our heads over why the model was training well but validating poorly.
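One way to guard against this kind of mismatch is to route both splits through a single preprocessing function, with augmentation as the only training-specific step. The sketch below is a toy illustration in plain Python, not our pipeline: the function names, the normalization constants, and the horizontal shift used as a stand-in for affine augmentation are all assumptions.

```python
import random


def normalize(image, mean=0.5, std=0.5):
    """Scale 0-255 pixels to roughly [-1, 1]. Shared by every split."""
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


def random_shift(image, rng, max_shift=1):
    """Toy augmentation: translate the image horizontally, padding with 0."""
    dx = rng.randint(-max_shift, max_shift)
    width = len(image[0])
    shifted = []
    for row in image:
        if dx > 0:
            shifted.append([0] * dx + row[: width - dx])
        elif dx < 0:
            shifted.append(row[-dx:] + [0] * (-dx))
        else:
            shifted.append(list(row))
    return shifted


def preprocess(image, train, rng=None):
    """Single entry point for both splits; only augmentation is train-only."""
    if train:
        image = random_shift(image, rng or random.Random())
    # The deterministic steps run identically for training and validation.
    return normalize(image)
```

Because normalization lives in one place, a change to it cannot silently apply to one split and not the other.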