Our project was inspired by the reality that while ASL and braille work as forms of communication for people with certain disabilities, people who are paralyzed or have difficulty with fine motor control may not be able to use them. Many people who are paralyzed still have control of their eyes, which is why we built an eye movement communication web app that uses machine learning to serve people with limited mobility.

What it does

The model locates the user’s eye in a webcam feed and outputs the eye position: left, right, up, down, or center. Our demos include using the eyes to move an object on screen, which can be applied to moving a mouse or playing a video game. An options menu demo shows the user choosing “Yes” by looking left or “No” by looking right. A text typing demo maps each sequence of 3 eye positions to a character, so that users can type without pressing keys.
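As a rough illustration of the text typing demo, here is a minimal Python sketch of mapping sequences of 3 eye positions to characters. The alphabet, the order in which sequences are assigned, and the names `build_codebook` and `decode` are all hypothetical; the actual mapping in our app may differ.

```python
from itertools import product

# Eye positions the classifier can output (from the description above).
POSITIONS = ("left", "right", "up", "down", "center")

# Hypothetical alphabet: letters, space, digits, and basic punctuation.
ALPHABET = "abcdefghijklmnopqrstuvwxyz 0123456789.,?!"


def build_codebook(positions=POSITIONS, alphabet=ALPHABET, length=3):
    """Assign each character a unique sequence of `length` eye positions."""
    sequences = product(positions, repeat=length)  # 5**3 = 125 sequences
    # zip stops at the shorter input, so unused sequences stay unassigned.
    return {seq: ch for seq, ch in zip(sequences, alphabet)}


def decode(stream, codebook, length=3):
    """Turn a flat list of eye positions into text, `length` at a time."""
    chars = []
    for i in range(0, len(stream) - length + 1, length):
        seq = tuple(stream[i:i + length])
        chars.append(codebook.get(seq, "#"))  # unassigned sequence -> "#"
    return "".join(chars)


codebook = build_codebook()
encode = {ch: seq for seq, ch in codebook.items()}

# Simulate the classifier output for a user spelling "hi".
message = [pos for ch in "hi" for pos in encode[ch]]
print(decode(message, codebook))
```

With 5 positions and sequences of length 3 there are 125 possible codes, which leaves room for more symbols or for reserving some sequences as commands such as backspace.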

How I built it

We trained a CNN for 2 epochs using PyTorch and its torchvision library on normalized images from the Eye Gaze dataset on Kaggle. We relabeled the images with 5 classes (left, right, up, down, and center) based on the dataset’s eye gaze vector. The feed-forward CNN architecture was adapted from the Stanford University paper ‘Convolutional Neural Networks for Eye Tracking Algorithm’ by Griffin and Ramirez. We changed the input size and the output classes, and because their output layer had 56 classes while ours has only 5, we added an extra Linear layer so that the features narrow more gradually and less information is lost.
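The relabeling step can be sketched roughly as follows. This is an illustration under assumptions, not our exact code: it assumes the gaze vector has already been reduced to a horizontal component `x` and a vertical component `y` roughly in [-1, 1], that positive `x` means right and positive `y` means up, and that 0.2 is a reasonable cutoff. None of these conventions are taken from the dataset's documentation.

```python
def gaze_to_class(x, y, threshold=0.2):
    """Map horizontal (x) and vertical (y) gaze components to one of 5 labels.

    Assumptions (hypothetical, not from the dataset docs): positive x means
    the eye looks right, positive y means up, components lie in about [-1, 1].
    """
    # Small deflections in both directions count as looking straight ahead.
    if abs(x) < threshold and abs(y) < threshold:
        return "center"
    # Otherwise the larger component decides the direction.
    if abs(x) >= abs(y):
        return "right" if x > 0 else "left"
    return "up" if y > 0 else "down"


for vec in [(0.05, -0.1), (0.6, 0.1), (-0.5, 0.3), (0.1, 0.7), (0.0, -0.4)]:
    print(vec, gaze_to_class(*vec))
```

Raising `threshold` sends more of the "slightly left" or "slightly right" samples to the center class, which is one way to trade class balance against label clarity.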

To locate the eye in the webcam feed, we used the facemesh library, which locates key points on a user’s face. From there, the Node server POSTs the eye image data to a Flask server that serves our model. The Flask server normalizes the image and responds with the classification.
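The two server-side steps, normalizing the image and turning the model output into a response, might look something like the sketch below. It is written in plain Python so it runs without PyTorch or Flask. The input format (a flat list of 0-255 intensities), the `mean` and `std` values, the class order, and the function names are assumptions made for illustration only.

```python
import math

CLASSES = ("left", "right", "up", "down", "center")  # assumed output order


def normalize(pixels, mean=0.5, std=0.5):
    """Scale 0-255 pixel values to [0, 1], then standardize them."""
    return [((p / 255.0) - mean) / std for p in pixels]


def to_response(logits, classes=CLASSES):
    """Convert raw model scores into a JSON-serializable prediction."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax probabilities
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": classes[best], "confidence": probs[best]}


print(normalize([0, 255]))
print(to_response([0.1, 2.0, -1.0, 0.3, 0.0])["label"])
```

Returning a confidence alongside the label would let the web app ignore low-confidence predictions instead of acting on every frame.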

Challenges I ran into

All of us were relatively new to machine learning, so figuring out how to load data and use PyTorch involved a substantial learning curve. The accuracy of our model was 60%, possibly because many gaze vectors in the Eye Gaze dataset point only slightly left or right rather than clearly to one side. The model also had trouble classifying up, likely because the dataset has very few up images. During training, the loss didn’t stabilize as expected, possibly because the learning rate was too high. Connecting our JavaScript web app to our Python model was another challenge, including navigating CORS permissions when sending requests between servers.

Accomplishments that I'm proud of

We were literally holding our breath the first time we trained the NN and had a great moment when it finished training without snagging on any errors! Even though the final model was only 60% accurate on the test images, it was a big improvement over the essentially random guessing of our first model, and it was reasonably accurate for controlling an object on screen. In general, we are really proud of our project idea and are committed to developing the app further after this hackathon so that it can become an impactful product.

What I learned

We learned a ton about the PyTorch library, different types of neural networks and their applications, and many other machine learning concepts. We experimented with different inputs and CNN architectures, such as normalized vs. unnormalized data, 2 or 3 linear layers, more or fewer classes, and training on one or both eyes, which gave us a better grasp of hyperparameter optimization.

What's next

We want to focus on improving the accuracy of our model by changing the CNN architecture, using an adaptive learning rate to prevent overshooting the minimum loss, filtering our data to only include clearly right or left gazes, and adding more up images to the data. The increased accuracy would make typing text with eye movements more viable, at which point we could implement a text-to-speech feature for real-time conversation. We will improve our web app UI to make it easier to switch between demos and to let users set custom mappings between eye positions and keys. We also want our NN to recognize more eye and face movements, such as blinking (which could stand in for clicking a mouse) and raising eyebrows. Ideally, we want to integrate the app’s functionality into native keybindings and mouse input so that users can operate any software with their eyes.
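One common form of adaptive learning rate is to cut the rate whenever the loss stops improving. The sketch below shows the idea in plain Python. The class name `PlateauScheduler`, its default values, and the use of a per-epoch validation loss are assumptions for illustration. In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` provides this behavior, and it would only be useful if we trained for more than the 2 epochs we used here.

```python
class PlateauScheduler:
    """Cut the learning rate when the monitored loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor      # multiplier applied on each reduction
        self.patience = patience  # non-improving epochs tolerated
        self.min_lr = min_lr      # never reduce below this value
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = PlateauScheduler(lr=0.01)
# Made-up loss values, only to exercise the scheduler.
for loss in [1.0, 0.8, 0.85, 0.9, 0.82, 0.81]:
    lr = scheduler.step(loss)
print(lr)
```

In this run the loss fails to beat its best value of 0.8 for three epochs in a row, so the rate is halved once.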
