OCR2Text

Inspiration

Helping the UN operations in the underdeveloped nations.

What it does

It converts OCR images to an ordered sequence of digits and puncturations.

How we built it

A residual neural net is firstly used to scan patches of images to extract local features of the digits, punctuations and other symbols. Afterwards, a seq2seq translation/transduction model is employed to squentially evaluate and convert the local information gathered by the resnet, into a sequence of digits translation.

Challenges we ran into

The restricted quality and quantity of training data posed a serious limitation the model capacity, observed from overfitting during training.
Without explicit image segmentation at the preprocessing stage, we implicitly segment, or in other words, sequentially parse the images information, using a seq2seq model widely applied in NLP and NMT.

Accomplishments that we're proud of

We can reach 26% percent accuracy on the test set validated by the server, after pretraining the resnet part on mnist dataset for 100 epochs followed by training the complete pipeline for 4000 epochs.

What we learned

Identifying a mature image segmentation and semantic recognition model or algorithm would make things a lot easier... But building a novel neural architecture, train and evaluate it on your own certainly has a lot of fun nonetheless.

What's next for OCR2Text

Improve the pipeline accuracy! Incorporating some pretrained model looks like the only way to go.

Built With

tensorflow-matplotlib-opencv

Updates

Zichao Yan started this project — Jun 09, 2019 10:46 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.