Inspiration

Prior to the COVID-19 pandemic, the vast majority of consultations with medical professionals were conducted in person; the pandemic, however, pushed many patients into virtual consultations, a trend that continues to this day. Unfortunately, the impersonality of telemedicine visits has hurt overall patient satisfaction, a finding corroborated by a study conducted in 2021.

Function

The program captures screenshots of the patient's facial expressions during video calls, then uses the DeepFace AI model to categorize the emotion the patient is expressing. This data is displayed in an overlay that gives the healthcare provider real-time feedback on the emotions the patient is likely feeling, so the provider can adapt their behavior to better establish trust. In short, the program seeks to build a greater emotional understanding between doctor and patient, thereby increasing trust and patient satisfaction.
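The capture-analyze-display cycle described above can be sketched as a simple loop. This is a minimal illustration, not the project's actual code: the `capture`, `analyze`, and `display` callables are hypothetical placeholders standing in for the screen scraper, the DeepFace model, and the PyQt5 overlay respectively, injected so the loop stays independent of any one library.

```python
import time

def feedback_loop(capture, analyze, display, interval=1.0, max_frames=None):
    """Repeatedly capture a frame, estimate the emotions in it, and
    push the result to the overlay.

    capture    -- returns one frame (e.g. a screenshot of the call window)
    analyze    -- maps a frame to emotion scores (e.g. a DeepFace call)
    display    -- renders the scores (e.g. updates the PyQt5 overlay)
    interval   -- seconds to wait between frames
    max_frames -- stop after this many frames (None = run forever)
    """
    frames = 0
    while max_frames is None or frames < max_frames:
        frame = capture()
        scores = analyze(frame)
        display(scores)
        frames += 1
        if max_frames is None or frames < max_frames:
            time.sleep(interval)
```

Injecting the three stages also makes the loop easy to test with stubs before wiring in the real scraper and model.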

Development

At the heart of the program is the DeepFace AI model, which is trained to categorize facial expressions into seven emotions and estimate the percentage of each emotion that the subject is expressing. The model accepts an image as input, which is captured using a screen scraper: the healthcare provider and patient speak over third-party video-calling software such as Skype, and the scraper feeds screenshots of the patient's face into the model. The results are displayed in an overlay built with PyQt5.
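To make the model's output concrete, here is a minimal sketch of turning DeepFace-style emotion percentages into the single label the overlay would highlight. The `scores` dictionary mimics the shape of DeepFace's emotion output; the commented `DeepFace.analyze` call assumes the `deepface` package is installed and is shown only for context.

```python
# The seven categories DeepFace's emotion model reports.
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")

def dominant_emotion(scores):
    """Return the emotion label with the highest percentage.

    `scores` maps each emotion name to a percentage, mirroring the
    per-emotion output of DeepFace, e.g. (with deepface installed):
        result = DeepFace.analyze(img, actions=["emotion"])
        scores = result[0]["emotion"]
    """
    return max(scores, key=scores.get)
```

The percentages themselves can also be shown alongside the dominant label, so the provider sees how confident the estimate is rather than a single hard classification.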

Challenges

One of the main challenges we encountered was obtaining the source image to feed to the model. The most intuitive solution would be to capture frames directly from the user's webcam; however, this would require the app to have exclusive access to the webcam and would be extremely resource intensive. As a compromise, third-party video-calling software carries the actual video feed, and a screen scraper obtains screenshots of the patient's face. This is not as seamless as accessing the webcam directly, but it spares the app from hosting the call itself and lets it be adapted to other video-conferencing systems.
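One practical detail of the screenshot approach is deciding which region of the screen to grab. A minimal sketch, under the assumption that the patient's video tile sits roughly centered in the call window with a fixed margin (real tile positions vary between Skype, Zoom, and other clients), is:

```python
def patient_region(win_left, win_top, win_width, win_height, margin=0.1):
    """Approximate the patient's video tile inside the call window by
    insetting a fractional margin on each side.

    Returns a (left, top, right, bottom) box in screen coordinates,
    the bbox format accepted by screenshot libraries such as
    PIL.ImageGrab.grab(bbox=...).
    """
    dx = int(win_width * margin)
    dy = int(win_height * margin)
    return (win_left + dx, win_top + dy,
            win_left + win_width - dx, win_top + win_height - dy)
```

Computing the box separately from the grab keeps the layout assumption in one place, so supporting another video-conferencing client only means swapping in a different region function.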

Accomplishments

Our greatest accomplishment was the successful implementation of a fully functioning prototype. Although it is unpolished, it performs all of the tasks we set out to accomplish. This was made possible by our successful integration of a machine learning model and the creation of an overlay in Python.

Key Takeaways

We learned several important concepts throughout the development of this project. For instance, we learned how to implement a facial recognition AI model, create an overlay using PyQt5, and scrape images from the screen.

Future Development

The next logical step for this program would be to implement video transfer directly, removing the dependence on third-party applications. This would improve the accuracy and efficiency of the program while requiring less effort from the user. The model could also be trained further to improve accuracy, and the overlay could be polished to provide a more complete experience.

Built With

- DeepFace
- PyQt5
- Python
