What starts as weakness in the limbs soon progresses to muscle degeneration, which can rob a person of the ability to walk or use their arms. As of 2015, this was the harsh reality for over 200,000 ALS patients. Similar conditions affect the day-to-day functioning of many physically disabled people around the world.
We wanted to lessen their difficulties, and our first idea was to make their web experience as seamless as possible. Because the degeneration often begins below the neck, we decided to use computer vision to let users navigate websites completely hands-free, using only the facial movements captured by their webcams.
What it does
Our project is a Python program that lets users interact with websites using only facial movements such as eye blinks, eyebrow raises, and mouth openings. With these movements, they can operate buttons, dropdowns, slideshows, text fields, and other web elements.
Users can also type text using set patterns of facial gestures. These patterns are mapped to Morse code, and the decoded output appears as text.
Additionally, the program uses facial recognition to identify returning users and loads the pre-saved ratios of their facial features, so the software can be shared by multiple users.
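The write-up doesn't name the recognition library, so the lookup step below is a minimal sketch under an assumption: each face is reduced to a fixed-length encoding (the face_recognition package, for example, produces 128-dimensional encodings and uses a default match tolerance of 0.6), and a returning user is whoever's stored encoding lies closest within that tolerance. `UserProfile` and `find_user` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class UserProfile:
    """Hypothetical per-user record: a face encoding plus saved gesture thresholds."""
    name: str
    encoding: list[float]          # e.g. a 128-d face embedding
    thresholds: dict[str, float]   # e.g. {"blink": 0.21, "eyebrow": 0.3, "mouth": 0.5}


def find_user(encoding, profiles, tolerance=0.6):
    """Return the closest stored profile within `tolerance`, or None for a new user."""
    best, best_dist = None, tolerance
    for profile in profiles:
        dist = math.dist(encoding, profile.encoding)  # Euclidean distance
        if dist <= best_dist:
            best, best_dist = profile, dist
    return best
```

A `None` result would trigger the calibration flow for a new user; otherwise the returned profile's `thresholds` drive gesture detection.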
Finally, users can send emergency emails and WeChat messages. If the webcam detects no face for 10 seconds, an emergency message is broadcast to all of the user's preferred contacts. This is especially useful if a person with a disability has an accident.
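The 10-second rule amounts to a small timer that is reset whenever a face is seen. A minimal sketch, assuming the main loop calls it once per processed webcam frame; `NoFaceWatchdog` is a hypothetical name, and it fires only once per absence so contacts are not spammed every frame.

```python
import time


class NoFaceWatchdog:
    """Fires once when no face has been seen for `timeout` seconds (hypothetical helper)."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so the logic can be tested
        self.last_seen = clock()
        self.alerted = False

    def update(self, face_detected):
        """Call once per frame; returns True exactly once per over-long absence."""
        now = self.clock()
        if face_detected:
            self.last_seen = now
            self.alerted = False      # re-arm once the user is visible again
            return False
        if not self.alerted and now - self.last_seen >= self.timeout:
            self.alerted = True
            return True
        return False
```

In the frame loop this would read `if watchdog.update(face_found): send_alerts()`, where `send_alerts` stands in for the email and WeChat senders.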
How we built it
We use an existing library to recognize the user's face, which lets us verify who they are and load the pre-saved thresholds unique to them. For blink, eyebrow-raise, and mouth-open detection, we use a pre-trained model that takes a video frame and returns the (x, y) coordinates of 68 detected facial landmarks. We then perform computations on these points to derive the respective thresholds for each gesture.
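The write-up doesn't say which ratios are computed. A 68-point output matches the iBUG layout used by dlib's `shape_predictor_68_face_landmarks` (with dlib, `[(p.x, p.y) for p in shape.parts()]` yields the point list). Assuming that index layout, the sketch below uses the widely used eye aspect ratio (EAR) for blinks; the mouth ratio and eyebrow ratio are hypothetical analogues, not necessarily what the project used.

```python
import math

# 0-based index ranges in the 68-point iBUG/dlib layout.
RIGHT_EYE = range(36, 42)
LEFT_EYE = range(42, 48)


def eye_aspect_ratio(pts, idx):
    """EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|); drops sharply when the eye closes."""
    p1, p2, p3, p4, p5, p6 = (pts[i] for i in idx)
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * math.dist(p1, p4))


def mouth_aspect_ratio(pts):
    """Inner-lip opening (points 60-67) relative to mouth width; rises when the mouth opens."""
    vertical = (math.dist(pts[61], pts[67]) + math.dist(pts[62], pts[66])
                + math.dist(pts[63], pts[65]))
    return vertical / (3.0 * math.dist(pts[60], pts[64]))


def brow_raise_ratio(pts):
    """Brow-to-eye gap (mid-brow 19/24 to upper eyelid 37/44), normalised by eye span."""
    gap = math.dist(pts[19], pts[37]) + math.dist(pts[24], pts[44])
    return gap / (2.0 * math.dist(pts[36], pts[45]))


def detect_gestures(pts, thresholds):
    """Compare this frame's ratios against the user's saved thresholds."""
    ear = (eye_aspect_ratio(pts, RIGHT_EYE) + eye_aspect_ratio(pts, LEFT_EYE)) / 2.0
    return {
        "blink": ear < thresholds["blink"],
        "eyebrow": brow_raise_ratio(pts) > thresholds["eyebrow"],
        "mouth": mouth_aspect_ratio(pts) > thresholds["mouth"],
    }
```

Dividing by a horizontal distance keeps each ratio roughly independent of how far the user sits from the webcam, which is what makes saved per-user thresholds reusable across sessions.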
Morse Code Decipher
Typing works by registering a blink as a dot and an eyebrow raise as a dash. Once the user is satisfied with the pattern typed so far, they open their mouth to pass it to our decipher function, which translates it into a character. Repeating this sequence lets the user type a complete message in Morse code.
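The gesture-to-text flow can be sketched as a small state machine: blinks and eyebrow raises build up a pattern, and a mouth opening submits it for decoding. This is a minimal illustration, not the project's code; `MorseTyper` and `decipher` are hypothetical names, unknown patterns are simply discarded, and because the write-up doesn't describe how spaces are entered, word gaps are not handled here.

```python
# International Morse code for letters and digits.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
    "-----": "0", ".----": "1", "..---": "2", "...--": "3", "....-": "4",
    ".....": "5", "-....": "6", "--...": "7", "---..": "8", "----.": "9",
}


def decipher(pattern):
    """Translate one dot/dash pattern; unknown patterns yield an empty string."""
    return MORSE.get(pattern, "")


class MorseTyper:
    """blink -> '.', eyebrow raise -> '-', mouth open -> submit the pattern."""

    def __init__(self):
        self.pattern = ""   # pattern currently being built (shown live to the user)
        self.text = ""      # decoded message so far

    def on_gesture(self, gesture):
        if gesture == "blink":
            self.pattern += "."
        elif gesture == "eyebrow":
            self.pattern += "-"
        elif gesture == "mouth":
            self.text += decipher(self.pattern)
            self.pattern = ""
        else:
            raise ValueError(f"unknown gesture: {gesture!r}")
        return self.text
```

Three blinks, a mouth opening, three eyebrow raises, another mouth opening, three more blinks, and a final mouth opening produce the text "SOS".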
WeChat and Email
The WeChat message is sent using itchat, an open-source Python library for WeChat personal accounts. An emergency message can be sent once the user has logged in by scanning the QR code. The email is sent using Python's built-in smtplib module, an SMTP client for sending mail from one account to another.
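The email side needs only the standard library. The sketch below builds the alert with `email.message.EmailMessage` and sends it over an implicit-TLS connection with `smtplib.SMTP_SSL`; the function names, message wording, and the choice of `SMTP_SSL` (rather than `SMTP` plus `starttls()`) are assumptions. On the WeChat side, itchat's `itchat.auto_login()` displays the login QR code and `itchat.send(text, toUserName=...)` delivers a message, though the project's exact usage isn't shown.

```python
import smtplib
from email.message import EmailMessage


def build_emergency_email(sender, recipients, user_name):
    """Compose the alert (hypothetical wording)."""
    msg = EmailMessage()
    msg["Subject"] = f"Emergency alert for {user_name}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"{user_name}'s face has not been detected by their webcam for "
        "10 seconds. Please check on them."
    )
    return msg


def send_email(msg, host, port, username, password):
    """Send over implicit TLS; the context manager closes the connection with QUIT."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes the alert text testable without a mail server; for a typical provider the call would look like `send_email(msg, "smtp.example.com", 465, user, app_password)`.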
The demo website was built with Materialize, a CSS framework based on Google's Material Design. Its interactive elements carry classes such as bc-1, which help the web control script identify the list of interactive elements and how to interact with each one. The website was hosted on GitHub Pages.
We used Selenium for web automation: it selects the next "bc-" element and scrolls to it, and a separate function determines the element's type and how to interact with it.
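A minimal sketch of the navigation loop's building blocks, assuming the numeric suffix in "bc-N" gives the visiting order (the write-up implies but doesn't state this). The driver is passed in, so the module needs no Selenium import: `"css selector"` is the string value of Selenium's `By.CSS_SELECTOR`, and `find_elements`, `get_attribute`, and `execute_script` are standard WebDriver methods. The `action_for` dispatch rule and its return labels are hypothetical.

```python
import re

# Matches a "bc-N" class token, e.g. "btn bc-3 waves-effect" -> 3.
BC_CLASS = re.compile(r"\bbc-(\d+)\b")


def bc_index(class_attr):
    """Return N from an element's 'bc-N' class, or None if it has none."""
    match = BC_CLASS.search(class_attr or "")
    return int(match.group(1)) if match else None


def ordered_bc_elements(driver):
    """Collect every element carrying a bc-N class, sorted by N."""
    elements = driver.find_elements("css selector", "[class*='bc-']")
    indexed = []
    for el in elements:
        idx = bc_index(el.get_attribute("class"))
        if idx is not None:  # drop substring matches such as "abc-x"
            indexed.append((idx, el))
    indexed.sort(key=lambda pair: pair[0])
    return [el for _, el in indexed]


def focus(driver, element):
    """Scroll the element to the middle of the viewport."""
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)


def action_for(tag_name, input_type=None):
    """Hypothetical dispatch rule: choose an interaction from the element type."""
    tag = tag_name.lower()
    if tag == "select":
        return "choose_option"
    if tag == "textarea":
        return "type_text"
    if tag == "input" and (input_type or "text") in ("text", "email", "search", "password"):
        return "type_text"
    return "click"
```

A gesture handler would then keep a cursor into `ordered_bc_elements(driver)`, call `focus` when moving to the next element, and pass `el.tag_name` and `el.get_attribute("type")` to `action_for` to decide between clicking, typing, or choosing an option.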
Challenges we ran into
- Taking Morse code input and showing live changes in the input field
- Recognising new faces in different lighting conditions
- Setting sensitivity thresholds for different facial gestures
- Merging the web automation and facial recognition software into a logical program flow
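For the threshold challenge above, one common approach (not necessarily the one the team settled on) is a short per-user calibration: record a gesture ratio (such as an eye aspect ratio) at rest and while performing the gesture, place the threshold midway between the two means, and require the gesture to persist for a few frames so that one blink registers as exactly one event. `calibrate_threshold`, `Debouncer`, and the default of 3 frames are assumptions for illustration.

```python
import math
from statistics import mean


def calibrate_threshold(neutral_samples, gesture_samples):
    """Midpoint between the mean ratio at rest and the mean ratio during the gesture.

    The caller decides the direction: a blink pushes the eye ratio below the
    threshold, while mouth opens and eyebrow raises push their ratios above it.
    """
    neutral, active = mean(neutral_samples), mean(gesture_samples)
    if math.isclose(neutral, active):
        raise ValueError("calibration samples do not separate the gesture")
    return (neutral + active) / 2.0


class Debouncer:
    """Emit one event when a gesture has held for `min_frames` consecutive frames."""

    def __init__(self, min_frames=3):
        self.min_frames = min_frames
        self.count = 0

    def update(self, active):
        self.count = self.count + 1 if active else 0
        # True only on the frame the streak reaches min_frames, so a long
        # blink still produces a single dot.
        return self.count == self.min_frames
```

Saving the calibrated thresholds in the user's profile is what allows a recognized face to skip this step on later sessions; raising `min_frames` trades responsiveness for fewer accidental dots and dashes.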
Accomplishments that we're proud of
- We were able to design functions to identify, iterate through, highlight and interact with specific elements using Selenium.
- We implemented facial gesture recognition using only the desktop's webcam, without any external hardware such as a Leap Motion controller.
What we learned
- How to find elements and interact with them in a website through web automation using Selenium.
- How to implement a seamless and feasible workflow between numerous code modules.
- How to do facial recognition using OpenCV and optimise parameters.
- How to integrate a pre-trained model into our own pipeline.
What's next for BlinkCeption
- Adding NLTK-based autocomplete to the Morse code input.
- Improving accuracy of gesture detection.
- Finding a way to adapt the scripts to any website, or providing simple steps for integrating the system into existing websites.
- Adding browser control: switching tabs, opening settings, opening history, etc.
- Switching between pages on the same website.