Introduction

Spelling correction tools are fundamental for students and professionals alike, and machine learning has ushered in a new era of models that correct and optimize spelling. However, these methods are tuned for the average writer and largely leave those with processing disorders by the wayside. This project focuses on dyslexia, but the problem is far more widespread.

These newer spelling correction models are trained on programmatically generated data, in which correct text is scrambled to simulate spelling errors. Like most tuning, this scrambling is designed for the neurotypical user. As we seek to increase accessibility for neurodivergent users, it fails to capture the errors characteristic of processing disorders such as dyslexia, so models trained on it cannot correct those errors comprehensively.

This project functions primarily as a proof of concept. The UI and current model demonstrate the promise of this approach, and there are many possible applications going forward. The end goal is spell checking inside either an IDE or Google Docs.

Scope

To prevent scope creep and ensure a properly functioning product, this project targets three of the seven major types of dyslexia: phonological dyslexia, surface dyslexia, and primary dyslexia. These three were selected because their effects are easily observed in typed writing. The benefit of this selection is twofold: first, it eases the process of creating a dataset, and second, individuals with these types of dyslexia would benefit most from this tool. Additionally, phonological dyslexia is widely believed to be by far the most common type, so providing powerful tools aimed at phonological dyslexics would give the largest possible reach.

Data

Deeper dive into the problem

Using "Creating a spell checker with tensorflow" (citation 1) as a reference point, we can understand the problem more comprehensively. The first issue is the frequency of errors. Current tools are tuned to anticipate errors approximately 5% of the time. While this is reasonable for a typical user, it is drastically too low for atypical users: the training set I derived had an error rate a bit above 18%.
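To make the error-rate comparison concrete, here is a minimal sketch of how a word-level error rate could be measured. It assumes the intended text and the typed text are already aligned word for word; the function name and the example sentence are hypothetical and are not taken from the training set.

```python
def word_error_rate(intended, typed):
    """Fraction of word positions where the typed word differs from the intended one."""
    if len(intended) != len(typed):
        raise ValueError("word lists must be aligned one-to-one")
    if not intended:
        return 0.0
    # Compare case-insensitively so capitalization is not counted as a spelling error.
    errors = sum(1 for want, got in zip(intended, typed) if want.lower() != got.lower())
    return errors / len(intended)


# Hypothetical example sentence, for illustration only.
intended = "the weather is beautiful today".split()
typed = "the wether is butiful today".split()
print(f"{word_error_rate(intended, typed):.0%}")
```

A rate measured this way could then be compared against the roughly 5% that typical tools anticipate.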

The second issue is the nature of the errors. In that article, the generated errors were fairly simple, i.e. swapping letters, adding letters, and omitting letters. The errors typical of dyslexic writers, by contrast, arise from deficits in the ability to link phonemes to graphemes when encoding. The result is phonetic spelling and other atypical errors that are difficult for traditional tools or models such as Grammarly to catch.
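The simple error types described above (swapping, adding, and omitting letters) can be sketched roughly as follows. This is an illustration under my own assumptions, not the code from the cited article: the function names, the alphabet, and the way the error rate is applied per word are all hypothetical.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def add_simple_noise(word, rng):
    """Apply one random edit: swap adjacent letters, insert a letter, or drop a letter."""
    if len(word) < 2:
        return word
    op = rng.choice(["swap", "insert", "delete"])
    i = rng.randrange(len(word) - 1)
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "insert":
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    return word[:i] + word[i + 1:]  # delete the letter at position i


def scramble(words, error_rate, rng):
    """Corrupt each word independently with probability error_rate (assumed per-word)."""
    return [add_simple_noise(w, rng) if rng.random() < error_rate else w for w in words]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(scramble("spelling correction tools".split(), 0.05, rng))
```

Noise of this kind never produces a phonetic misspelling on purpose, which is why a model trained only on it has little chance to learn the errors described here.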

Data derivation

For this hackathon I aim to build a proof of concept showing that a model can pick up on the particular errors characteristic of a dyslexic writer, which could then be expanded upon in future research.

To create this proof of concept, I sat down with my sister, a high-school-age student whose dyslexia is severe enough that typical tools such as Grammarly are insufficient. I dictated my Twitter timeline and several Reddit news posts to her in order to create a 410-word training set. I had her write in Notebook, a tool with no spelling or grammar checking, to ensure that her writing would be unaffected by any other tools.
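One way a dictated source text and the typed result could be turned into training pairs is sketched below. This is an assumption about a possible workflow, not a description of how the training set was actually assembled; the function name and the example sentence are hypothetical. It uses Python's standard difflib to align the two word sequences.

```python
from difflib import SequenceMatcher


def extract_error_pairs(dictated, typed):
    """Return (typed_word, intended_word) pairs where the typed word differs."""
    matcher = SequenceMatcher(a=dictated, b=typed, autojunk=False)
    pairs = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        # Only trust one-to-one replacements; skipped or merged words are ambiguous.
        if tag == "replace" and (a2 - a1) == (b2 - b1):
            pairs.extend(zip(typed[b1:b2], dictated[a1:a2]))
    return pairs


# Hypothetical example sentence, for illustration only.
dictated = "the weather is beautiful today".split()
typed = "the wether is butiful today".split()
for wrong, right in extract_error_pairs(dictated, typed):
    print(wrong, "->", right)
```

Words that were skipped or split while typing would not line up one-to-one, so this sketch leaves those spans out rather than guessing at the pairing.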

Methodology

To achieve my goals, I researched several methodologies for training a model, and ended up modifying the code at https://github.com/vuptran/deep-spell-checkr to support custom training and to create the best possible model from my training set. The primary deliverable is the model as an h5 file; a stretch goal is to deploy this model as an extension or other tool a student could use.
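As a rough illustration of what custom training data needs to look like before it reaches a sequence-to-sequence model, the sketch below encodes words as padded sequences of character ids. It assumes a character-level model; the helper names, the padding scheme, and the example words are hypothetical and do not describe the repository's actual preprocessing.

```python
def build_vocab(words):
    """Map every character seen in the data to an integer id; 0 is reserved for padding."""
    chars = sorted({c for w in words for c in w})
    return {c: i + 1 for i, c in enumerate(chars)}


def encode(word, vocab, max_len):
    """Convert a word to a fixed-length list of character ids, padded with zeros."""
    ids = [vocab[c] for c in word[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical (typed, intended) pair, for illustration only.
pairs = [("wether", "weather")]
vocab = build_vocab([w for pair in pairs for w in pair])
max_len = max(len(w) for pair in pairs for w in pair)
for typed_word, intended_word in pairs:
    print(encode(typed_word, vocab, max_len), encode(intended_word, vocab, max_len))
```

The misspelled word would serve as the model input and the intended word as the target, so both sides share one vocabulary and one maximum length.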

Looking Forward

In the future, I hope to develop my model into a comprehensive tool for dyslexic students. I have already started leveraging my connections with dyslexia specialists, and a deal is in place to acquire data from students in grades 8-12. This partnership will provide the data needed to flesh out the model, and is an extremely exciting development for the project.

  1. Currie, D. (2017, May 18). Creating a spell checker with tensorflow. Medium. Retrieved December 31, 2021, from https://towardsdatascience.com/creating-a-spell-checker-with-tensorflow-d35b23939f60
  2. Flôr, A. (2020, November 2). Spelling correction using tensorflow 2.X. Medium. Retrieved December 31, 2021, from https://arthurflor23.medium.com/spelling-correction-using-tensorflow-2-x-a063f428c106
  3. Vuptran. (2021, December 31). Sequence-to-Sequence Learning for Spelling Correction. GitHub. Retrieved December 31, 2021, from https://github.com/vuptran/deep-spell-checkr
  4. AH, N. H. (2021, December 10). Learn about the different types of dyslexia & how to identify them. neurohealth arlington heights. Retrieved January 4, 2022, from https://neurohealthah.com/blog/types-of-dyslexia/

Updates

Update: the project is coming along nicely! While waiting on data to come in, I attempted to create a Chrome extension that used my model. However, due to changes in how Google Docs processes user input, I was unable to create a successful implementation and deemed it out of scope. I was able to get the data set back from the Academy I am working with, but due to COVID-related delays I am still entering the handwritten data into the training set. I am super excited to continue working on this project.