Inspiration
A central goal of machine learning is to reveal patterns hidden from the human eye. The idea that a person's handwriting can be identified by examining specific features, such as how straight the edges are, the slant of the letters, or the pressure of the pen, has long existed. In this century, however, handwriting analysis has largely been rejected as a pseudoscience because it is simply too subjective to provide concrete evidence. The unreliability comes from the subjectivity of the human analyst, but the patterns themselves are still acknowledged to be there. This led to my hypothesis that, given a few samples of someone's signature, a machine learning model could later predict whether a new signature is genuinely theirs.
What it does
The program has a graphical interface that lets users easily provide new input data, train models, and test models. The GUI first welcomes the user to the application, asks for their name (used to keep track of their data), and asks for their consent to the use of their information. It then switches to a four-button menu with the options 'collect training data,' 'test fraud detection,' 'retrain model,' and quit.
The 'collect training data' option opens a pop-up drawing tool in which the user draws their signature with a pen on the touch screen; the tool also lets them quit or start over. Once the user is satisfied with their signature, they select 'save and continue.' This repeats until the user has entered 10 signatures, which are all saved locally and linked to the name entered in the welcome menu.
The 'test fraud detection' option loads the most recently saved model for the user named in the welcome menu. It opens a signature input screen, collects one signature, and runs it through the model to predict whether the signature is real or fraudulent. It also reports how confident it is in that prediction.
The 'retrain model' option is the machine learning part of the code. It trains a convolutional neural network on the signatures of the current user and of every other user in the system, each paired with a binary label indicating whether that signature belongs to the current user.
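The labeling scheme used by 'retrain model' can be sketched as follows. This is a minimal sketch, not the project's code: it assumes each user's signatures are stored as equal-sized 2-D grayscale NumPy arrays in a dictionary keyed by user name, and the helper name build_dataset is hypothetical.

```python
import numpy as np

def build_dataset(signatures_by_user, current_user):
    """Label every stored signature 1 if it belongs to current_user, else 0.

    Hypothetical helper: assumes signatures_by_user maps each user name to a
    list of equal-sized 2-D grayscale arrays.
    """
    images, labels = [], []
    for name, signatures in signatures_by_user.items():
        label = 1.0 if name == current_user else 0.0
        for sig in signatures:
            images.append(np.asarray(sig, dtype=np.float32))
            labels.append(label)
    # Stack into (N, H, W), then add a trailing channel axis for Conv2D: (N, H, W, 1)
    X = np.stack(images)[..., np.newaxis]
    y = np.array(labels, dtype=np.float32)
    return X, y

# Usage with placeholder images: 10 signatures each from two users
store = {
    "alice": [np.zeros((64, 128)) for _ in range(10)],
    "bob": [np.ones((64, 128)) for _ in range(10)],
}
X, y = build_dataset(store, current_user="alice")
print(X.shape, int(y.sum()))  # (20, 64, 128, 1) 10
```

With this setup, the model for one user learns to separate that user's signatures from everyone else's in the system; it only sees deliberate forgeries of that signature if they are added as negative examples.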
Results
Overall, the system was substantially more successful than I expected; I did not think it would be possible to detect forgeries from just ten samples of handwriting. That said, the results of this experiment would be easy to misrepresent. When the dataset is this small, the metrics computed automatically on held-out test samples carry little information. With only 10 signatures per user, I could afford to set aside just 2 for validation. While this showed that the system was working in some capacity, it would not be fair to report 100% accuracy based on just 2 trial cases. Instead, verification was conducted by training models and then asking peers to try to forge various signatures. This showed that the model's reliability depended strongly on whose signature it was. In one case, the user entered their signature 5 times and the algorithm recognized it as valid every time. Afterwards, 3 people, who were given a copy of the signature, made a total of 5 attempts to forge it, and none of them fooled the algorithm. In another case, the model could not verify even the genuine user's own signature, let alone anyone else's. Examining this user's data more closely revealed that the algorithm still rated the real signatures as orders of magnitude more likely to be genuine than the forgeries, but their scores were not close enough to 1 to pass the fixed threshold. My theory is that different signatures have features of differing prominence: a signature with distinctive features gives the network more to learn from, so a one-size-fits-all threshold is not ideal. If a separate success threshold were recorded for each person's signature, a more appropriate and successful model could be devised.
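One way to act on the per-user threshold idea is to calibrate a threshold for each user from the model's own scores. The sketch below is hypothetical: the function names and the geometric-mean rule are assumptions, not something the project implemented. It places the threshold halfway, on a log scale, between the lowest score given to the user's held-out genuine signatures and the highest score given to other users' signatures. The scores in the usage lines are illustrative numbers, not measured results.

```python
import numpy as np

EPS = 1e-12  # floor for scores so a zero score cannot collapse the threshold to 0

def calibrate_threshold(genuine_scores, impostor_scores):
    """Hypothetical per-user threshold: geometric mean of the lowest genuine
    score and the highest impostor score (a midpoint on a log scale)."""
    lowest_genuine = max(float(np.min(genuine_scores)), EPS)
    highest_impostor = max(float(np.max(impostor_scores)), EPS)
    return float(np.sqrt(lowest_genuine * highest_impostor))

def is_genuine(score, threshold):
    """Accept the signature if the model's score reaches the user's own threshold."""
    return score >= threshold

# Illustrative scores for a user whose genuine signatures score far below 0.5
genuine = [0.02, 0.05]    # model outputs on held-out genuine signatures
impostor = [1e-5, 2e-4]   # model outputs on other users' signatures
t = calibrate_threshold(genuine, impostor)
print(round(t, 6))                                 # 0.002
print(is_genuine(0.03, t), is_genuine(0.03, 0.5))  # True False
```

With only 2 held-out genuine signatures per user, such a threshold is itself estimated from very little data, so it would still need checking against real forgery attempts.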
How I built it
The neural network was built in Python 3.7 using Keras with the TensorFlow backend. After the data was collected, split into training and testing sets, and normalized, a sequential Keras model was created, and the training data was fed into this convolutional neural network. The network was compiled with the 'binary_crossentropy' loss. The final layer of the model was a fully connected (dense) layer with a single output, whose value represents whether the signature belongs to the user. Additionally, in order to
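A model of the kind described above might look like the following sketch, written against tf.keras in TensorFlow 2.x. The input size (64 by 128 grayscale), the number and width of the convolutional layers, the sigmoid activation on the output, and the Adam optimizer are all assumptions; only the sequential structure, the binary cross-entropy loss, and the single-unit dense output come from the description.

```python
import numpy as np
from tensorflow import keras

IMG_H, IMG_W = 64, 128  # hypothetical canvas size after downsampling

def build_model():
    """Sequential CNN ending in a single sigmoid unit (layer sizes are assumptions)."""
    model = keras.Sequential([
        keras.Input(shape=(IMG_H, IMG_W, 1)),          # one grayscale channel
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),   # score in [0, 1] that the signature is the user's
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
# Scores for a batch of two blank (all-zero) images
scores = model.predict(np.zeros((2, IMG_H, IMG_W, 1), dtype="float32"), verbose=0)
print(scores.shape)  # (2, 1)
```

Pairing a single sigmoid unit with binary cross-entropy is the standard setup for a yes/no classifier: the output reads as a score between 0 and 1, and a decision threshold turns it into accept or reject.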
Challenges I ran into
One of the most important personal challenges was making the program realistic for widespread use. At the start of the experiment I decided to limit the input to 10 signatures per person, since I believed that requiring a user to sign more than 10 times would be impractical. This meant the training data was absolutely minuscule. To get more out of the 10 input samples, I added extra training samples created by applying random noise or skewing their orientation. I also wanted to include pen pressure as a feature, which is difficult to capture with touch-screen input. My solution was to design the drawing tool to place a series of separate dots at intervals rather than draw a connected line. When the pen moves faster, the dots land farther apart, so the dot spacing reflects differences in natural pen speed and pressure.
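The augmentation step can be sketched with NumPy alone. The project's exact transforms are not described, so this sketch rests on assumptions: a random horizontal shear stands in for the orientation skewing, Gaussian noise is added on top, and images are 2-D arrays normalized to [0, 1]. The names shear_rows and augment are hypothetical.

```python
import numpy as np

def shear_rows(img, factor):
    """Skew an (H, W) image horizontally: row r shifts by about factor * (r - H/2) pixels."""
    h, w = img.shape
    out = np.zeros_like(img)
    for r in range(h):
        shift = int(round(factor * (r - h / 2)))
        shift = max(-w, min(w, shift))  # keep the slices below valid
        # Copy the shifted row, dropping pixels that fall off the canvas
        if shift >= 0:
            out[r, shift:] = img[r, :w - shift]
        else:
            out[r, :w + shift] = img[r, -shift:]
    return out

def augment(img, n_copies, rng, noise_std=0.05, max_shear=0.2):
    """Return n_copies randomly sheared, noisy versions of img, clipped to [0, 1]."""
    copies = []
    for _ in range(n_copies):
        factor = rng.uniform(-max_shear, max_shear)
        skewed = shear_rows(img, factor)
        noisy = skewed + rng.normal(0.0, noise_std, size=img.shape)
        copies.append(np.clip(noisy, 0.0, 1.0))
    return np.stack(copies)

rng = np.random.default_rng(0)
signature = np.zeros((64, 128))
signature[20:44, 30:100] = 1.0  # placeholder "ink" block
extra = augment(signature, n_copies=5, rng=rng)
print(extra.shape)  # (5, 64, 128)
```

Each real signature could then contribute several perturbed copies to the training set, all carrying the same label as the original.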
Accomplishments that I'm proud of
For some users' datasets, very successful models were created. This supports my hypothesis that unique features of a person's handwriting style can allow a model to recognize that person's handwriting. I am also proud of recognizing the need for careful validation and of my conclusion that 10 data points is not a large enough sample to create a reliable model for every person's signature. I would recommend that this project be pursued further for handwriting recognition, but not for large-scale signature validation.
Built With
- keras
- pysimplegui
- tensorflow-gpu
- tkinter