We were inspired by the Charles Schwab rep at TAMUhack when he said that the company was looking for futuristic password-less login methods for their users. He mentioned a James Bond-style retina scan as a possibility. While the MLH hardware lab does not carry retina scanners (unfortunately), we realized that Microsoft Azure has a wealth of Cognitive Services APIs that could help us. We wanted to create a solution that uses a combination of live facial recognition and speech verification to authenticate users - without the need to memorize anything!
What it does
Our solution takes advantage of sensors that are available on most devices today - a camera and a microphone. A user can use their smartphone or laptop to log in securely: the device captures an image of the user from the camera, asks the user to blink a few times (to ensure the image is of a real person and not a photograph), and records the user speaking a preset phrase out loud. The face image and voice clip are sent to the Azure platform for verification. If authorized, the user logs in successfully and seamlessly, without having to remember a passphrase or login image.
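The overall flow can be sketched as a short pipeline. This is an illustrative outline only - the function and parameter names are placeholders we chose for this sketch, not the actual scripts - with each stage passed in as a callable so the control flow is clear:

```python
# Hedged sketch of the password-less login flow described above.
# Each stage is injected as a callable; the real versions would wrap
# the camera, OpenCV blink detection, and the Azure API calls.
def attempt_login(capture_frame, detect_blinks, verify_face,
                  record_phrase, verify_speaker):
    image = capture_frame()        # snapshot from the camera
    if not detect_blinks():        # liveness check: a photo can't blink
        return False
    if not verify_face(image):     # face verification in the cloud
        return False
    clip = record_phrase()         # user reads the preset phrase aloud
    return verify_speaker(clip)    # speaker verification in the cloud
```

Ordering the checks this way means the cheap local liveness test runs before any cloud round-trips, and the microphone is only used once the face has already been verified.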
How we built it
Azure provides convenient APIs for both facial and speech verification. We used Python scripts to capture pictures and audio clips on a laptop and submit them to Azure's servers for training. This completes enrollment for an authorized user.
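As a rough sketch of the face side of enrollment, here is what a script against the Azure Face REST API could look like. The endpoint paths follow the v1.0 Face API conventions, but the endpoint URL, key, and group ID below are placeholders, and the helper names are our own - treat this as an illustration under those assumptions, not the exact code we ran:

```python
# Hedged sketch of face enrollment against the Azure Face REST API (v1.0).
# FACE_ENDPOINT, FACE_KEY, and GROUP_ID are placeholders; a person group
# is assumed to already exist under GROUP_ID.
import requests

FACE_ENDPOINT = "https://<region>.api.cognitive.microsoft.com"  # placeholder
FACE_KEY = "<subscription-key>"                                  # placeholder
GROUP_ID = "authorized-users"

def auth_headers(key, content_type="application/json"):
    """Headers every Cognitive Services call needs."""
    return {"Ocp-Apim-Subscription-Key": key, "Content-Type": content_type}

def enroll_face(person_name, image_path):
    base = f"{FACE_ENDPOINT}/face/v1.0/persongroups/{GROUP_ID}"
    # 1. Create a person inside the existing person group.
    r = requests.post(f"{base}/persons",
                      headers=auth_headers(FACE_KEY),
                      json={"name": person_name})
    person_id = r.json()["personId"]
    # 2. Attach the captured image to that person as a persisted face.
    with open(image_path, "rb") as f:
        requests.post(f"{base}/persons/{person_id}/persistedFaces",
                      headers=auth_headers(FACE_KEY, "application/octet-stream"),
                      data=f.read())
    # 3. Retrain the group so verification can use the new face.
    requests.post(f"{base}/train", headers=auth_headers(FACE_KEY))
    return person_id

if __name__ == "__main__":
    enroll_face("alice", "alice.jpg")  # hypothetical user and image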
Once a login request is received, a separate script asks the prospective user to face the camera and blink a few times. We use OpenCV to detect the blinks and, at the same time, send an image of the person to Azure. Once the image is verified, another Python script records a .WAV file of the user saying a preset phrase displayed on the screen. This is sent to Azure's Speaker Verification API, which confirms whether the user is authorized.
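The core of the blink check is the eye aspect ratio (EAR), a standard trick for landmark-based blink detection: the ratio of the eye's vertical landmark distances to its horizontal one collapses when the eyelid closes. A minimal sketch, assuming six dlib-style landmarks per eye and illustrative thresholds (not the exact values we tuned):

```python
# Minimal sketch of landmark-based blink detection via eye aspect ratio.
# Assumes six (x, y) landmarks per eye, ordered p1..p6 around the eye
# (p1/p4 horizontal corners, p2-p6 and p3-p5 vertical pairs).
import math

def eye_aspect_ratio(eye):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); drops sharply on a blink."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    p1, p2, p3, p4, p5, p6 = eye
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

def count_blinks(ear_series, closed_thresh=0.2, min_frames=2):
    """Count blinks in a per-frame EAR series: a blink is a run of at
    least `min_frames` consecutive frames below the closed-eye threshold."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < closed_thresh:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # series may end mid-blink
        blinks += 1
    return blinks
```

In the live script, OpenCV supplies the per-frame landmarks, and requiring a few complete blinks before accepting the face image is what rejects a printed photograph held up to the camera.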
Challenges we ran into
While the face verification component of Azure's Cognitive Services works reliably, we found that the Speaker Verification API only works on .WAV files recorded in a noise-free environment (hard to find at a hackathon!). Also, the API only allows a few preset phrases for enrollment; it would have been nicer if it allowed randomly generated phrases. (In all fairness, though, they do state clearly that the Speech API is still under development.)
Accomplishments that we're proud of
We didn't really have any concrete ideas coming in. Being able to come up with a viable solution to the given challenge and implement it (quite reliably) within 24 hours is awesome!
We are also proud that we got our solution running on a Qualcomm DragonBoard 410c for IoT applications.
What's next for tamuhack2019
If we had more time, we would have liked to build a working front-end website with a UI. There are more features in Azure's APIs that we would like to explore. The Face API can potentially detect the user's emotions from their face (to recognize if they are under external pressure and being forced to log in, and trigger a silent alarm to the authorities or the company).
Further, since most of the image processing happens on Azure's servers, we can use lightweight hardware like the Qualcomm DragonBoard 410c to build login kiosks for a variety of use cases! (Smart doors, anyone?)