We decided to partake in one of American Express's case study challenges, and both found the multi-step authentication challenge intriguing. Having never worked with facial or vocal recognition or any type of related software, we knew it would probably take the entire weekend to get a functional product, but figured it would be an excellent learning experience.
What it does
The facial recognition software starts by taking 10 pictures of the user in a second. It uses those pictures as a base and generates a variation matrix, composed of 10 vectors of differences of each picture from the average. This data is what can be stored for each customer, amounting to roughly 250KB of floating-point data (but can be increased or reduced depending on requirements). We then take a picture of the user attempting to authenticate. The variation matrix undergoes singular value decomposition to create a "plane" of face data, and then the picture from the authenticatee is projected onto the plane with a certain "success" ratio defining how well it projects. For high projection ratios, the user is authenticated.
The vocal software simply takes an input blurb from the authenticatee and uses fast fourier transforms to get a sound spectrum of the various frequencies, as well as generates the decibel power level of the voice. These two values can be used to differentiate the actual account owner from a fraudulent user attempting to gain access.
How I built it
The code is all written in Python 2.7 with various libraries, primarily OpenCV and Numpy for the facial recognition, and Wave, Numpy, and Pylab.
Challenges I ran into
Originally we wished to upload the matrix data into the cloud and have all SVD computations and comparisons done on Wolfram's cloud computing platform, but the raw amount of data we had was not easily transferred. We also ran into numerous issues getting OpenCV to recognize our faces and ignore face-like objects (it had a strange attachment to my left shoulder as a face...)
Accomplishments that I'm proud of
Even though it's difficult to get clean data from both services, the data yield is incredibly accurate when it does turn out correctly. With better hardware and a better trained facial recognition software, authentication via "selfie" could be achieved.
What I learned
I learned how to manipulate graphics on the fly, as well as audio. I learned how to extract data from the two sources, as well as how to process them to accomplish various tasks.
What's next for VeriFace
I still have much to learn about graphics processing and facial recognition, as well as about the various quirks of OpenCV. Before attempting to further VeriFace, I intend to learn more about the code base behind OpenCV as well as the theory behind image processing.