Title: Eye-See-You: Eye Gaze Prediction Using Deep Learning

Ji Won Chung (jchung97), Anita de Mello Koch (ademello), Arthur Chen (kchen157), Skye Thompson (rthomp12)

Final Submission

Checkin

Introduction

With the rise of portable, accessible technology, the use of augmented reality has increased. Virtual assistants and generated images that react to the user are now more feasible than ever before. However, we still cannot simulate natural eye gaze, so virtual assistants often fall into the uncanny valley. As a first step toward simulating natural eye gaze, we aim to predict the rest of an eye-gaze trajectory given its starting segment.

Related Works

Several works apply LSTM and CNN networks to eye-gaze data. Sodoké et al. focus on learning to filter out the noise often found in eye-gaze predictions using a deep convolutional LSTM. Koochaki et al. use CNNs and LSTMs to predict the intent of a user operating an eye-based computer interface. Both works provide a starting point for our project. We also wish to incorporate natural blinking, which requires extracting blink intervals from the original dataset. Existing code for this is available on GitHub, for example https://github.com/pathak-ashutosh/Eye-blink-detection.

Data

We will use the WebGazer dataset (https://webgazer.cs.brown.edu/data/), which was collected from 51 participants in an eye-tracking study. It includes user input data (mouse and cursor logs), screen recordings, webcam video of each participant's face, and eye-gaze locations recorded by a Tobii Pro X3-120 eye tracker. We also want to augment this data with each participant's blink information, which requires extracting blinks from the webcam footage and adding them to the eye-gaze locations.
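The augmentation step described above can be sketched as attaching a blink flag to each gaze sample whose timestamp falls inside a detected blink interval. This is a minimal sketch under assumptions: the tuple layouts `(timestamp_ms, x_px, y_px)` and `(start_ms, end_ms)` and the function name `add_blink_flags` are hypothetical placeholders, not the WebGazer file schema, and blink intervals are assumed not to overlap.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical record layouts (placeholders, not the WebGazer schema).
GazeSample = Tuple[float, float, float]   # (timestamp_ms, x_px, y_px)
BlinkInterval = Tuple[float, float]       # (start_ms, end_ms), inclusive


def add_blink_flags(gaze: List[GazeSample],
                    blinks: List[BlinkInterval]) -> List[Tuple[float, float, float, int]]:
    """Append a 0/1 blink flag to every gaze sample.

    A sample is flagged when its timestamp lies inside a blink interval.
    Assumes the blink intervals do not overlap each other.
    """
    blinks = sorted(blinks)
    starts = [start for start, _ in blinks]
    out = []
    for t, x, y in gaze:
        # Index of the last interval that starts at or before t (-1 if none).
        i = bisect_right(starts, t) - 1
        in_blink = i >= 0 and t <= blinks[i][1]
        out.append((t, x, y, int(in_blink)))
    return out


gaze = [(0, 100, 200), (8, 102, 198), (16, 105, 195), (24, 110, 190)]
print([row[3] for row in add_blink_flags(gaze, [(5, 17)])])
```

Binary search keeps the lookup at O(log B) per sample, which matters when every participant contributes long recordings at the tracker's sampling rate.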

Methodology

We will try an architecture that combines a CNN and an LSTM. Eye gaze depends on the preceding trajectory, so the model needs memory, which the LSTM provides. Previous works also show that these architectures are well suited to this type of problem.
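To show how data could flow through a CNN + LSTM, here is a forward pass written in plain NumPy. It is a sketch only: the layer sizes, kernel width, five-step prediction horizon, and all function names are hypothetical, the weights are random and untrained, and it is not one of the team's three architectures. A 1-D convolution extracts local motion features from the past `(x, y)` gaze points, the LSTM summarizes them over time, and a linear head outputs the next few gaze points.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d(x, w, b):
    """Valid 1-D convolution over time followed by ReLU.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,) -> (T - K + 1, C_out)."""
    k = w.shape[0]
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out + b, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm(x, W, U, b, hidden):
    """Run an LSTM over x: (T, D); returns the final hidden state (hidden,)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)            # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)             # cell state carries memory across steps
        h = o * np.tanh(c)
    return h


def init_params(c_in=2, c_conv=8, kernel=3, hidden=16, horizon=5):
    """Random, untrained weights with hypothetical sizes."""
    return {
        "conv_w": rng.normal(0, 0.1, (kernel, c_in, c_conv)),
        "conv_b": np.zeros(c_conv),
        "W": rng.normal(0, 0.1, (c_conv, 4 * hidden)),
        "U": rng.normal(0, 0.1, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "head_w": rng.normal(0, 0.1, (hidden, 2 * horizon)),
        "head_b": np.zeros(2 * horizon),
        "hidden": hidden,
        "horizon": horizon,
    }


def predict(params, past):
    """past: (T, 2) gaze points -> (horizon, 2) predicted future gaze points."""
    feats = conv1d(past, params["conv_w"], params["conv_b"])
    h = lstm(feats, params["W"], params["U"], params["b"], params["hidden"])
    out = h @ params["head_w"] + params["head_b"]
    return out.reshape(params["horizon"], 2)


params = init_params()
print(predict(params, rng.normal(size=(30, 2))).shape)
```

In practice a framework such as PyTorch or Keras would supply these layers and the training loop; the NumPy version only makes the shapes and the recurrence explicit.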

Metrics

We will predict the future trajectory from the previous trajectory. We can therefore train on the WebGazer eye-gaze data by splitting each trajectory into an input segment and a target segment. Our success metric is prediction accuracy, that is, how closely the predicted trajectory matches the held-out target segment.
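Because the targets are continuous screen coordinates, "accuracy" is naturally measured as a regression error such as MSE, which the Reflections below report. The split and the metric could look like this minimal sketch. The window lengths, the stride, and the function names `make_windows` and `mse` are hypothetical, and this MSE averages over both coordinates. Averaging squared Euclidean distance per point instead would give exactly twice the value.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def make_windows(traj: Sequence[Point], n_in: int, n_out: int,
                 stride: int = 1) -> List[Tuple[List[Point], List[Point]]]:
    """Split one gaze trajectory into (input, target) pairs.

    Each pair uses n_in consecutive points as input and the following
    n_out points as the prediction target.
    """
    pairs = []
    for start in range(0, len(traj) - n_in - n_out + 1, stride):
        past = list(traj[start:start + n_in])
        future = list(traj[start + n_in:start + n_in + n_out])
        pairs.append((past, future))
    return pairs


def mse(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared error averaged over every x and y coordinate."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal length")
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / (2 * len(pred))


traj = [(float(i), float(2 * i)) for i in range(10)]
print(len(make_windows(traj, n_in=4, n_out=2)))
print(mse([(0, 0), (1, 1)], [(0, 0), (3, 1)]))
```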

Ethics

Our data contains webcam footage of the participants, so it must be handled carefully. With only 51 participants, the participant demographics are also likely to be unbalanced. Furthermore, we intend to use this data to mimic natural eye gaze, but what people look at and how they use their eyes can be culturally specific, so we should be careful not to become too biased toward any one group or culture. This problem is well suited to deep learning because large amounts of data can be collected easily, and existing methods that do not use learning cannot simulate natural eye gaze. This suggests the problem is hard to capture with a simple hand-designed model and could benefit from a learned one.

Division of labor

Ji Won Chung - LSTM; Anita de Mello Koch - CNN; Arthur Chen - blink extraction; Skye Thompson - data augmentation.

Sodoké, K., Nkambou, R., Dufresne, A. and Tanoubi, I., 2020. Toward a deep convolutional LSTM for eye gaze spatiotemporal data sequence classification. In EDM.

Koochaki, F. and Najafizadeh, L., 2019, July. Eye gaze-based early intent prediction utilizing CNN-LSTM. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 1310-1313). IEEE.

Updates

Reflections

*Challenges*

One of the hardest aspects of this project was gathering the data. The code to extract the WebGazer dataset has been unwieldy because of software compatibility issues. Extracting eye information for blinks has also been difficult, because we had to work out which coordinates in MediaPipe's face mesh correspond to the eye points we need.
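Once the right face-mesh points are identified, a blink signal is often computed with the eye aspect ratio (EAR) of Soukupová and Čech (2016): for six eye points $p_1 \dots p_6$, $\mathrm{EAR} = (\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert) / (2 \lVert p_1 - p_4 \rVert)$, which drops toward zero as the lids close. The sketch below assumes landmarks are already available as a per-frame list of `(x, y)` points. The `RIGHT_EYE`/`LEFT_EYE` indices are an assumption taken from commonly shared community examples and should be checked against the official face-mesh landmark map. MediaPipe reports x and y normalized by image width and height separately, so both should be scaled to pixels first to keep the ratio undistorted on non-square frames.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Six landmark ids per eye, ordered p1..p6: one eye corner, two upper-lid
# points, the other corner, then the two lower-lid points (p2 pairs with p6,
# p3 with p5). ASSUMPTION: commonly used community indices; verify them.
RIGHT_EYE = (33, 160, 158, 133, 153, 144)
LEFT_EYE = (362, 385, 387, 263, 373, 380)


def eye_aspect_ratio(pts: Sequence[Point]) -> float:
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)."""
    p1, p2, p3, p4, p5, p6 = pts
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def ear_from_landmarks(landmarks: Sequence[Point], indices=LEFT_EYE) -> float:
    """landmarks: one frame of face-mesh (x, y) pixel coordinates, indexed by id."""
    return eye_aspect_ratio([landmarks[i] for i in indices])


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(round(eye_aspect_ratio(open_eye), 3))
```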

*Insights*

Our current MSE is about 2500-3000. We also have a pipeline to visualize and generalize eye data from MediaPipe's face mesh. The results are better than we expected given the low-resolution data and the semi-structured setting.
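To read the MSE in the same units as the gaze coordinates, take its square root. Assuming (this is not stated above) that the MSE is computed on screen coordinates in pixels, the current range corresponds to a root-mean-square error of roughly 50-55 pixels:

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \sqrt{2500} = 50, \qquad \sqrt{3000} \approx 54.8
```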

*Plan*

We have started training models. We have three architectures that combine LSTM and CNN layers in different ways. Our current goal is to reach a low MSE threshold. Next, we plan to integrate the eye aspect ratio, tune our models, and compare the MSEs of the three architectures to understand which works better. Our focus is more on interpretability than accuracy. We may change how blinks are integrated if they don't improve accuracy, but we also see blinks as part of eventually building a generative model.
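One common way to turn a per-frame eye aspect ratio signal into blink events is to flag runs of consecutive frames where the EAR stays below a threshold. This is a minimal sketch under assumptions: `threshold=0.2` and `min_frames=2` are placeholder values, not values tuned on WebGazer, and `blink_intervals` is a hypothetical helper. It returns `(start, end)` timestamps that could then be aligned with the gaze samples.

```python
from typing import List, Sequence, Tuple


def blink_intervals(ear: Sequence[float], timestamps: Sequence[float],
                    threshold: float = 0.2,
                    min_frames: int = 2) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of runs where the EAR stays below
    `threshold` for at least `min_frames` consecutive frames.

    threshold and min_frames are placeholders, not tuned values.
    """
    if len(ear) != len(timestamps):
        raise ValueError("ear and timestamps must have the same length")
    intervals = []
    run_start = None
    for i, value in enumerate(ear):
        if value < threshold:
            if run_start is None:
                run_start = i                  # a low-EAR run begins
        elif run_start is not None:
            if i - run_start >= min_frames:    # run was long enough to count
                intervals.append((timestamps[run_start], timestamps[i - 1]))
            run_start = None
    # Close a run that lasts until the final frame.
    if run_start is not None and len(ear) - run_start >= min_frames:
        intervals.append((timestamps[run_start], timestamps[-1]))
    return intervals


ear = [0.3, 0.1, 0.1, 0.3, 0.15, 0.3, 0.1, 0.1, 0.1]
ts = [i * 33 for i in range(len(ear))]
print(blink_intervals(ear, ts))
```

The `min_frames` requirement suppresses single-frame dips caused by landmark jitter, which is likely with low-resolution webcam footage.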

*GitHub* https://github.com/AnitadeMelloKoch/eye-see-you
