Inspiration
History is important. Through history, we are able to learn lessons from our ancestors and understand how the world came to be what it is today.
One important event in Japanese history was the Meiji Restoration, when Japanese leaders standardized the Hiragana writing system we all know today. Before this, people wrote books using a cursive script called Kuzushiji. Today, most native Japanese speakers cannot read Kuzushiji. This means that over a thousand years' worth of books (~3 million unregistered books and a billion historical documents) are inaccessible to the general public.
Our solution is a web application serving a Kuzushiji Optical Character Recognition (OCR) system.
What it does
The web application lets users upload images or take pictures, then detects the location of each character and classifies it.
How we built it
We used data from http://codh.rois.ac.jp/char-shape/book/, which contains high-quality pictures of manuscripts written in Kuzushiji, with a bounding box and a classification label for each character.
The model was built from scratch and consists of two components: a U-Net that detects the center of each character, and an image classifier that predicts the label of each detected character.
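To illustrate how the two components could fit together, here is a minimal sketch of the step between them: turning the detector's per-pixel center heatmap into character coordinates and crop boxes for the classifier. This is not our actual code. The names `find_centers` and `crop_box`, the 0.5 threshold, and the 3x3 local-maximum rule are all assumptions made for illustration.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # assumed detector output: one score per pixel


def find_centers(heatmap: Heatmap, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels that beat the threshold and all 8 neighbours.

    Hypothetical post-processing; a strict comparison is used, so a flat
    plateau of equal scores yields no center.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    centers = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                centers.append((r, c))
    return centers


def crop_box(center: Tuple[int, int], size: int, height: int, width: int) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a square crop clipped to the image."""
    half = size // 2
    r, c = center
    return max(0, r - half), max(0, c - half), min(height, r + half), min(width, c + half)
```

Each crop box would then be cut out of the original image and passed to the classifier, which predicts one label per character.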
Challenges we ran into
Deep learning OCR for Kuzushiji is a very new field and an open research problem, and there are currently no open-source implementations available. Only one published paper demonstrates it, and its code and models have not been open-sourced yet.
Our goal was not to solve this open research problem in a single hackathon weekend (people work on it for months), but we still wanted to build and train something plausible that could demonstrate the potential of the application. This was our biggest challenge, as we had to build a Kuzushiji OCR system from scratch without reference code.
Accomplishments that we're proud of
Given those challenges, the fact that we were able to build a functioning OCR system with our limited time and resources was a huge accomplishment.
Our most important accomplishment, however, was the interface. We built a web application that lets anyone easily parse Kuzushiji documents using a PC or mobile device. We designed it so that the underlying deep learning model can easily be swapped out when a new state-of-the-art Kuzushiji OCR system is released. We expect this soon from Clanuwat et al.
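One way to keep the model swappable is to have the web layer depend only on a small interface. The sketch below shows the idea under stated assumptions: `Detection`, `OcrModel`, `register_model`, and `run_ocr` are hypothetical names chosen for illustration, not the names used in our code base.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Detection:
    """One recognized character: center position plus predicted label."""
    x: int
    y: int
    label: str


class OcrModel(Protocol):
    """Anything with this method can back the web application (assumed interface)."""

    def predict(self, image: bytes) -> List[Detection]:
        ...


class DummyModel:
    """Stand-in model that returns a fixed detection, useful for wiring up the UI."""

    def predict(self, image: bytes) -> List[Detection]:
        return [Detection(x=0, y=0, label="あ")]


_REGISTRY: Dict[str, OcrModel] = {}


def register_model(name: str, model: OcrModel) -> None:
    """Make a model available to the application under a name."""
    _REGISTRY[name] = model


def run_ocr(name: str, image: bytes) -> List[Detection]:
    """Run the named model; the caller never touches model internals."""
    return _REGISTRY[name].predict(image)
```

With this shape, integrating a newly released model would mean writing one adapter class with a `predict` method and registering it, without touching the front end.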
What we learned
We learned how to use the U-Net architecture to predict the location of Kuzushiji characters in a document. We also learned how to build an image classifier to classify common Kuzushiji characters.
What's next for Kuzushiji Lite
We plan to improve the user interface and make the integration process for new models as smooth as possible. Eventually, we'd like to host it on a public website so that the general public can actually use the interface.
Built With
- javascript
- python
- pytorch
- tensorflow