How come there is no modern OCR for Braille in Spanish?
Over 35 million people in the world are blind. The last decades have brought many improvements that make the world more accessible for blind and visually impaired people. However, there is still work to do: very few solutions allow sighted people to understand braille documents. Especially in the school environment, where braille is used for exams and projects, such a system would greatly facilitate the integration of blind children.
Braille is a tactile system, so the transcription (or "illumination") of a braille text written by a student currently requires the physical presence of the document. The transcription is done manually and can take up to 15 days. Those are 15 days that we intend to turn into less than 15 seconds.
In cooperation with ONCE, the Spanish National Organization of the Blind, we're developing an OCR (Optical Character Recognition) system for Braille, specifically targeted at educational use.
What it does
BCR (Braille Character Recognition) offers a web interface that allows you to take a photo or scan a braille document and transcribe it to text within seconds. The result of the transcription is shown as an extra layer superimposed on the original image of the braille text. By mimicking the traditional illumination method we allow the teacher to correlate each word in place with the symbols written by the student. We also included a magnifier to zoom in on any section of the document and inspect details.
How we built it
The system is composed of the following components:
- An Angular SPA with a straightforward interface, similar to a web translator
- A .NET API in charge of user authentication, image upload, and orchestration of the detection and transcription of braille symbols
- A multilayer CNN (Convolutional Neural Network) with segmentation and classification models that detect the braille symbols in the document
- A Python API that detects braille symbols using the CNN model
- A Python API that transcribes the text from braille to Spanish using a custom algorithm in which the Braille interpretation rules are encoded
The models were trained on a dataset of 20,000 auto-generated braille documents. After the initial training, the model was refined with a set of real braille images provided by ONCE.
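The orchestration step can be illustrated with a minimal sketch (the function names and types here are hypothetical; in BCR, detection and transcription are separate HTTP services behind the .NET API):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TranscriptionResult:
    cells: List[str]   # detected braille cells, in reading order
    text: str          # transcribed Spanish text

def transcribe_document(image: bytes,
                        detect: Callable[[bytes], List[str]],
                        decode: Callable[[List[str]], str]) -> TranscriptionResult:
    # detect: the CNN service, returning braille cells in reading order
    # decode: the rule-based braille-to-Spanish transcriber
    cells = detect(image)
    return TranscriptionResult(cells=cells, text=decode(cells))
```

Keeping the two steps behind separate interfaces is what lets each service be developed and scaled independently, as described below.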
Although, by its very nature, this solution is used by sighted people, we consider that our application meets at least accessibility level AA, and desirably level AAA.
BCR is built on Azure Kubernetes Service (AKS). The modular approach of Kubernetes allowed us to develop the different parts of the system independently, using the technology that best fits each purpose. It further provides high availability and rapid on-demand scaling of the different parts of the system, and allows zero-downtime deployments.
Challenges we ran into
During the development we faced the following challenges:
- Document orientation, light conditions, contrast, and input method (photo/scan)
Braille is a tactile system, meant to be read with the fingers. In order to be identified by touch, braille characters are formed by embossing dots from the back side of the paper.
One of the main challenges in implementing an OCR process for braille is precisely obtaining an image in which the "dots" that make up the braille characters are clearly visible.
Additionally, different input methods create different types of shadows: there is a huge difference between the shadows of the braille dots produced by scanning a document (evenly distributed light) and those produced by taking a photo. A bad capture can make it impossible to read the document. Inadequate lighting (direct or overhead) casts distorted or absent shadows. Noise or stains in the image can cause grain or dirty pixels to be confused with the shadows of the perforations. A distorted perspective of the sheet produces non-uniform horizontal or vertical alignment of the grid, resulting in inhomogeneous spacing that leads to errors in reading and in the subsequent automatic translation.
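One common way to cope with uneven lighting (an illustrative technique, not necessarily the exact preprocessing used in BCR) is to estimate the illumination field with a coarse local mean and subtract it, so dot shadows stand out regardless of capture conditions. A minimal NumPy sketch:

```python
import numpy as np

def shadow_mask(gray: np.ndarray, k: int = 31) -> np.ndarray:
    """Flatten uneven illumination with a k x k local mean (k odd),
    then keep the darkest residuals as candidate dot shadows."""
    pad = k // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    # Integral image: ii[i, j] = sum of padded[:i, :j], for cheap box means.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = gray.shape
    box = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
           - ii[k:k + h, :w] + ii[:h, :w]) / (k * k)
    flat = gray - box  # residual after removing the illumination field
    # Pixels much darker than their surroundings are candidate shadows.
    return (flat < flat.mean() - flat.std()).astype(np.uint8)
```

Because the threshold is relative to the flattened image, the same mask works whether the document was scanned or photographed under a lighting gradient.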
- Availability of Training material
The only publicly available dataset uses double-sided braille (impression on both sides of the paper).
For computer vision, this scenario is far more complex: the shadows from the "hills" of the braille letters on the front side mix with the shadows from the "valleys" pressed through from the other side of the paper. There is also little material available in Spanish, which is why we decided to generate our own dataset from different types of dots extracted from real documents.
- Evaluation of trained models
Classification models are commonly evaluated by counting the number of correctly identified objects. In our scenario, not only is the number of correctly identified characters important; the correct order of letters within a word and of words within the text is also crucial to make the text understandable.
We solved this problem using the Levenshtein distance. It allows measuring the difference between two sequences. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.
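The distance can be computed with the classic dynamic-programming recurrence (a generic implementation, not BCR's own code):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]
```

For instance, a model output of "braile" against a ground truth of "braille" scores a distance of 1 (one missing letter), which penalizes ordering and omission errors that a plain per-character accuracy count would miss.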
- Braille transcription
The braille cell consists of 6 dots because that is what a finger can "read" in one touch. In total there are 64 possible combinations. The alphabet has 26 characters in lowercase and another 26 in uppercase. If we also consider numbers, punctuation, and special symbols, the possible combinations are rapidly exhausted. That is why braille uses a contextual system: there are prefix symbols indicating that the next character is uppercase or a number. Additionally, some characters with different representations in Spanish share a single braille symbol, as for example ¿ and ?.
To understand a single symbol we need to evaluate its context. Also, Spanish Braille is not the same as English Braille: the alphabet and numbers are the same, but punctuation and special symbols differ.
An extra challenge is that the text might contain mistakes that should not be auto-corrected to guarantee a fair evaluation of the student's work.
Accomplishments that we're proud of
- Better integration of blind/low-vision students in the school environment
- Helping parents accompany their children's education
- Provide educational resources for ONCE
What we learned
- A lot about AI / CNN and computer vision!
- A lot about Braille
- And a lot about the challenges that blind/limited vision people face every day.
What's next for BCR (Braille Character Recognition)?
BCR is in an early phase; there are still many items on our list:
- Improve the web interface (design, accessibility, line-by-line output, improved zoom)
- Improve the detection of braille points
- The possibility to correct errors in transcription
- Support for transcription of documents in other languages
- Add the possibility to download the braille image with illuminated text
Try it out:
You can test BCR at https://bcr.northeurope.cloudapp.azure.com/ using the following credentials:
- Username: buildfor2030
- Password: test