Inspiration

Thanks to AI4good.ca for putting the needs of NGOs in front of us.

What it does

Reads tiny images of handwritten text written by medical field workers (doctors) in Congo

  1. Preprocess these images: convert them to grayscale and deblur them using opencv-python
  2. A TensorFlow model (see images below) maps images to text
  3. Post-process the predicted text (sanitize) so that impossible cases are not pushed to the end user
     3.1 For example, 2 consecutive commas or dots with a number are removed
     3.2 Non-leading dashes are removed
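
The sanitizing step can be sketched roughly as follows. This is a minimal illustration, not our actual code: the helper name `sanitize` is hypothetical, and it assumes that "removed" means a run of consecutive commas or dots is collapsed to its first character, and that a dash is only legitimate as the very first character.

```python
import re


def sanitize(prediction: str) -> str:
    """Clean up a raw OCR prediction (hypothetical helper, for illustration only)."""
    text = prediction.strip()
    # Assumption: a run of two or more commas/dots collapses to its first character.
    text = re.sub(r"([.,])[.,]+", r"\1", text)
    # Keep a dash only when it is the first character; drop every later dash.
    head, tail = text[:1], text[1:]
    return head + tail.replace("-", "")


if __name__ == "__main__":
    for raw in ["12..5", "1,,200", "-3-4", "7-"]:
        print(repr(raw), "->", repr(sanitize(raw)))
```

Keeping the rules as plain string operations makes it easy to add or drop a rule once real predictions show which impossible cases actually occur.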

How we built it

Using TensorFlow. Inspiration: https://github.com/watsonyanghx/CNN_LSTM_CTC_Tensorflow#how-to-run, which is a CNN-LSTM-CTC implementation in TensorFlow.

Process description

  1. Have a CNN with a small stride scan the whole image from left to right
  2. Pass the channels from the CNN to an LSTM that further encodes the meaning of each consecutive window
  3. Pass the LSTM state to the final CTC layers, which handle deduping repeated character predictions from consecutive windows... and do it in a differentiable way ;-)
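
To make the deduping idea in step 3 concrete, here is a small sketch of the collapse rule that CTC applies to per-window predictions at inference time: merge consecutive repeats, then drop the blank label. It is only an illustration. The function name, the alphabet, and the choice of index 0 for the blank are assumptions, and the differentiable part (the CTC loss used in training, which sums over all alignments that collapse to the target text) is not shown.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_collapse(frame_labels, alphabet, blank=BLANK):
    """Turn one predicted label per window into a string.

    frame_labels: best label index for each consecutive window.
    alphabet: maps a label index to its character (hypothetical layout).
    """
    # Step 1: merge consecutive duplicates coming from overlapping windows.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks; a blank between two equal labels keeps both.
    return "".join(alphabet[label] for label in merged if label != blank)


if __name__ == "__main__":
    alphabet = "_0123456789"  # "_" is a placeholder for the blank at index 0
    frames = [2, 2, 0, 2, 6, 6, 0]  # windows voting "1 1 _ 1 5 5 _"
    print(ctc_greedy_collapse(frames, alphabet))
```

The blank label is what lets the model output a genuine double character such as "11": without a blank between them, the two windows would be merged into one.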

Challenges we ran into

--> Implementation was not easy: it was important to choose our starting point with care

--> The images were very small, which prevented us from boxing each character in a way that would let us recognize the content of the box

--> The images were really faint; with a little bit of contrast magic, this challenge was overcome.
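
One simple form of contrast enhancement for faint images is percentile-based contrast stretching, sketched below with NumPy. This is an assumed stand-in, not necessarily the exact operation we applied: the function name and the percentile defaults are hypothetical, and the input is expected to be a grayscale image stored as a 2-D array.

```python
import numpy as np


def stretch_contrast(gray, low_pct=2.0, high_pct=98.0):
    """Linearly rescale a grayscale image so faint strokes span the 0-255 range."""
    gray = np.asarray(gray, dtype=np.float32)
    # Use percentiles instead of min/max so a few outlier pixels do not dominate.
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: there is no contrast to stretch, return it unchanged.
        return gray.astype(np.uint8)
    scaled = (gray - lo) / (hi - lo)
    return (np.clip(scaled, 0.0, 1.0) * 255.0).round().astype(np.uint8)


if __name__ == "__main__":
    faint = np.array([[100, 110], [120, 130]], dtype=np.uint8)
    print(stretch_contrast(faint, low_pct=0.0, high_pct=100.0))
```

Pixel values that sat in a narrow band (here 100 to 130) are spread over the whole 8-bit range, which makes the pen strokes easier to separate from the background.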

Accomplishments that we're proud of

  • Building on each other's recommendations
  • Learning about CNN-LSTM-CTC and implementing it
  • Our perseverance; it yielded a great leaderboard score of 84.67%

What we learned

  • Preprocessing of images with OpenCV
  • How well the CNN-LSTM-CTC architecture fits OCR
  • Got to know new teammates
  • Mask R-CNN expects larger images, so it's not a good solution for small images
  • Importance of pivoting and choosing simple solutions

What's next for wymbah -- Helping read doctors' handwriting
