Introduction
The objective of our project is to be able to classify fingerprints of a known set of individuals. We do this by embedding the fingerprints into a latent space, and training our model to cluster together fingerprint impressions from the same individual and separate impressions from different individuals.
Methodology
We use a residual CNN architecture from the paper we follow. The architecture maps fingerprint images into a latent space, and we train the network with triplet loss to cluster impressions correctly. Triplet loss works by selecting triplets, each composed of an anchor example, a positive example (which has the same label as the anchor), and a negative example (which has a different label from the anchor). The loss rewards the network for embedding the anchor and positive close together in latent space, and for embedding the anchor and negative far apart. For classification, we embed a given fingerprint into the latent space and use KNN to determine its label, where the neighbors for KNN are fingerprints that we embed and store during training.
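The triplet loss described above can be sketched in a few lines of NumPy (a simplified illustration with a hypothetical margin `alpha`, not the paper's exact code):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss on embedding vectors (NumPy sketch).

    Pulls the anchor toward the positive and pushes it away from the
    negative until the distance gap exceeds the margin `alpha`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance anchor<->positive
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance anchor<->negative
    return np.maximum(d_pos - d_neg + alpha, 0.0)      # zero once the gap is at least alpha

# Toy example: positive close to the anchor, negative far away -> loss is 0.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([3.0, 0.0])
print(triplet_loss(a, p, n))  # max(0.01 - 9.0 + 0.2, 0) = 0.0
```

In practice the loss is averaged over a batch of triplets, and only triplets that violate the margin contribute gradient.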
Our dataset had 50 unique identities with 5 fingerprint impressions each. We split the data on the impressions themselves: our training tensor held 3 impressions for each of the 50 identities, and our testing tensor held the remaining 2 impressions for each identity. An important detail to observe is that our model was therefore never tested on identities it had not seen during training.
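Concretely, the split is along the impression axis rather than the identity axis (image dimensions below are hypothetical stand-ins for illustration):

```python
import numpy as np

# Hypothetical stand-in for the dataset: 50 identities x 5 impressions,
# each impression here a (64, 64) grayscale image (sizes assumed).
fingerprints = np.random.rand(50, 5, 64, 64)

# Split along the impression axis, not the identity axis: every identity
# appears in both sets, with 3 impressions for training and 2 for testing.
train = fingerprints[:, :3]
test = fingerprints[:, 3:]

print(train.shape, test.shape)  # (50, 3, 64, 64) (50, 2, 64, 64)
```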
Results
After training for 1000 epochs, we arrived at 82% accuracy on the training data and 90% accuracy on the testing data. We defined accuracy to be the total number of correct classifications divided by the total number of classifications.
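The KNN classification and the accuracy metric above can be sketched as follows (the function name `knn_classify` and the toy two-identity gallery are ours for illustration):

```python
import numpy as np

def knn_classify(query, gallery, labels, k=3):
    """Nearest-neighbor classification in latent space (NumPy sketch;
    `gallery` holds the embeddings stored during training)."""
    dists = np.linalg.norm(gallery - query, axis=1)
    nearest = labels[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]  # majority vote among the k neighbors

# Toy gallery: two identities clustered around (0, 0) and (5, 5).
gallery = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])
preds = np.array([knn_classify(q, gallery, labels, k=1) for q in gallery])

# Accuracy: correct classifications divided by total classifications.
accuracy = np.mean(preds == labels)
print(accuracy)  # 1.0 on this toy data
```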
While these numbers were high in and of themselves, we wanted to verify that our latent space looked sound: that is, we expected the network to embed fingerprints of the same identity close together in latent space. To visualize this, we reduced our 300-dimensional fingerprint embeddings to 10 dimensions using PCA, and then plotted the result with t-SNE. In the graph, points with the same color belong to the same identity, circles are training points, and squares are testing points.
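The first reduction step can be sketched with an SVD-based PCA in NumPy (a simplified illustration; the 10-dimensional output would then be passed to a t-SNE implementation such as `sklearn.manifold.TSNE` for the 2-D plot):

```python
import numpy as np

def pca_reduce(embeddings, n_components=10):
    """Reduce embeddings with PCA via SVD (NumPy sketch).

    Centers the data and projects it onto the top principal axes,
    i.e. the leading right singular vectors of the centered matrix.
    """
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical embeddings: 250 fingerprints (50 ids x 5 impressions) in 300-D.
emb = np.random.rand(250, 300)
reduced = pca_reduce(emb, n_components=10)
print(reduced.shape)  # (250, 10)
```

Reducing to ~10 dimensions with PCA first is a common way to denoise and speed up t-SNE before the final 2-D embedding.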
We do have some outliers, such as Identity 0's square near (-6, 0) and Identity 2's circle near (-2, -6). We leave it as future work to understand why these embeddings aren't ideal.
Challenges
The first major challenge we encountered was the exploding gradient problem. When we initially tried training our model, the weights became extremely large and eventually caused the model to output NaNs. To fix this, we lowered the learning rate and added an extra batch normalization layer; with the help of a few additional bug fixes, our weights no longer exploded. However, we ran into the same issue again later, when we changed the way we split our training and testing data. Unfortunately, despite changing our learning rate and latent size, as well as adding more weight normalization throughout our model, we weren't able to resolve it. Hence, we were unable to evaluate our model on entirely new individuals that it hadn't trained on.
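A standard guard against exploding gradients that we could also have tried is global-norm gradient clipping; the sketch below mirrors the behavior of TensorFlow's `tf.clip_by_global_norm` in plain NumPy (an illustration of the technique, not code from our model):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Global-norm gradient clipping (NumPy sketch).

    If the combined norm of all gradients exceeds `max_norm`, every
    gradient is rescaled by the same factor so the combined norm
    equals `max_norm`; otherwise gradients pass through unchanged.
    """
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads, global_norm

big = [np.full(4, 100.0)]  # an "exploding" gradient
clipped, norm = clip_by_global_norm(big, max_norm=1.0)
print(norm)  # 200.0 before clipping; the clipped gradient has norm 1.0
```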
Another challenge that we faced was that the latent embeddings were somewhat sparse, as seen in the t-SNE visualization. In an ideal world, cluster centers would be as small as possible, so that we wouldn't have the chance of misclassification. We attempted to decrease the size of the clusters by decreasing the size of our latent space from 300 down to 200, but didn't see significant improvements.
Reflection
The project ultimately turned out to work quite well, especially because our accuracy reached what we were hoping for (~90%). Our model worked as expected, apart from the exploding-gradient issues that the original paper didn't mention. Our approach was fairly straightforward once we found a paper that we wanted to implement. We deviated slightly from our initial project plan, so our base, target, and stretch goals no longer apply, but we are happy with the progress we made.
If we had more time, we might have invested in running a hyperparameter search to see how to make our model perform even better. We had several hyperparameters whose values we rarely explicitly sought out to tune, including the latent space size, the batch size, alpha (for triplet loss), and the number of positive anchors we chose per batch. In the future, we might consider doing this earlier on in the model creation process, so that we can understand whether it is our hyperparameters that were failing us, or whether the model itself was deficient.
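Such a search over the hyperparameters mentioned above could start with a simple grid (the specific candidate values below are hypothetical; `train_and_evaluate` is a stand-in for our training loop, not real code):

```python
from itertools import product

# Hypothetical search space for the hyperparameters we never tuned.
latent_sizes = [100, 200, 300]
batch_sizes = [32, 64]
alphas = [0.1, 0.2, 0.5]  # triplet-loss margin

grid = list(product(latent_sizes, batch_sizes, alphas))
print(len(grid))  # 18 configurations to try

# Skeleton of the search loop (train_and_evaluate is a placeholder):
# for latent, batch, alpha in grid:
#     acc = train_and_evaluate(latent_size=latent, batch_size=batch, alpha=alpha)
```

Even an exhaustive grid this small would have told us whether the sparse clusters were a hyperparameter problem or a model problem.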
A big takeaway from this project was not to try to implement a paper without fully understanding how it works. Initially, we were going to implement a paper similar to the one we went with, but much vaguer. Had we implemented that one, we would have had to do our own "research" to fill in the blanks it left. Instead, choosing a paper that was more concise and accessible let us reach a higher accuracy more quickly. Even so, we found that wherever details were left out, a good amount of experimentation was needed to understand what the authors intended. What helped here was reading the paper's related work, which contained useful context that the authors must have assumed we had.
Built With
- tensorflow