
Intro

Credit card fraud detection is one of the most complex problems to tackle from a traditional programming perspective. Looking at individual factors alone, it is often difficult to tell whether a transaction is fraudulent or simply comes from a legitimate user on vacation. Accurate detection matters in both directions: missed fraud can mean potentially thousands of dollars stolen, while false alarms can lock real users out of their financial resources when they need them most. Every member of our group has had an international transaction blocked, and two of the three of us have either experienced credit card discrepancies ourselves or know someone who has. In short, this problem has affected us personally. We hope that by using AI models to identify patterns inherent in fraudulent transactions in open-source financial data, we can determine both an effective model architecture and an accurate framework for detecting fraud. Because this requires labeled data and sorts transactions into fraudulent and non-fraudulent categories, it is a supervised classification problem.

Related Work

"Deep Learning Methods for Credit Card Fraud Detection" (Nguyen et al.) explores multiple models for detecting fraudulent credit card transactions. The paper gives architecture details for both a 1D CNN and a 2D CNN. Both architectures contain two convolutional layers and use sigmoid activation; the 1D model additionally uses dropout and max pooling. The two models achieved similar accuracy when tested on datasets with varying levels of class imbalance.
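To make the 1D CNN idea concrete, here is a minimal NumPy forward-pass sketch that treats a transaction's feature vector as a one-channel 1D sequence, with two sigmoid-activated convolutional layers, dropout, and max pooling. The kernel size, channel counts, layer order, dropout rate, and the final dense sigmoid output are all our assumptions for illustration; they are not the paper's reported configuration, and training code is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d(x, w, b):
    """Valid 1D convolution. x: (length, in_ch), w: (kernel, in_ch, out_ch)."""
    k = w.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    return np.einsum("lkc,kco->lo", windows, w) + b

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the length axis (drops any remainder)."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, x.shape[1]).max(axis=1)

def dropout(x, rate, rng, training):
    """Inverted dropout: active only during training, identity at inference."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def init_params(n_features, rng):
    # Hypothetical sizes: kernel 3, then 8 and 16 channels.
    l1 = n_features - 2          # length after conv1 (valid, kernel 3)
    l2 = (l1 // 2) - 2           # length after pooling and conv2
    return {
        "w1": rng.normal(0, 0.1, (3, 1, 8)), "b1": np.zeros(8),
        "w2": rng.normal(0, 0.1, (3, 8, 16)), "b2": np.zeros(16),
        "wd": rng.normal(0, 0.1, (l2 * 16,)), "bd": 0.0,
    }

def forward(features, p, rng, training=False, drop_rate=0.25):
    x = np.asarray(features, dtype=float)[:, None]  # features as a 1-channel sequence
    x = sigmoid(conv1d(x, p["w1"], p["b1"]))
    x = dropout(x, drop_rate, rng, training)
    x = max_pool1d(x, 2)
    x = sigmoid(conv1d(x, p["w2"], p["b2"]))
    return sigmoid(x.reshape(-1) @ p["wd"] + p["bd"])  # predicted fraud probability
```

With 30 input features, the sequence length shrinks 30 → 28 (conv1) → 14 (pool) → 12 (conv2), so the dense layer sees 12 × 16 = 192 values.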

Data

Kaggle will be our main source of data, and we intend to run our models on multiple datasets. One challenge is the class imbalance in many of these datasets: fraudulent examples are scarce compared to the valid transactions they contain. Some datasets address this by including synthetically generated fraudulent samples. We plan to explore whether a model trained this way classifies samples more accurately than one trained without the artificial fraudulent points.
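One widely used way to generate synthetic minority samples is SMOTE (Synthetic Minority Over-sampling Technique), which interpolates between a fraud example and one of its nearest fraud neighbors. We do not know which method any particular Kaggle dataset used; the NumPy sketch below is only an illustration of a SMOTE-style approach, and the function name and `k` default are our own assumptions.

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority (fraud) rows by interpolating between
    a randomly chosen minority row and one of its k nearest minority neighbors.
    Requires at least two minority rows."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    k = min(k, n - 1)  # cannot have more neighbors than other minority rows
    # Pairwise squared Euclidean distances among minority rows.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a row is never its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                      # rows to start from
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]   # one neighbor each
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X[base] + gap * (X[chosen] - X[base])
```

Every synthetic point lies on the line segment between two real fraud examples, which is also why such points may not capture genuinely new fraud patterns; comparing models trained with and without them is exactly the experiment described above.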

Another issue is that, for privacy reasons, the transaction features used for classification are often encoded. This does not prevent us from building classification models, but it makes it difficult to later analyze which features are most significant.

Methodology

We plan to experiment with a variety of model architectures to see which performs best, including a vanilla RNN, an LSTM/GRU, and a transformer. To keep the models comparable, so that performance differences can be attributed to architecture rather than size, we will aim to give all models the same number of FLOPs, at least initially. We may also explore how high a performance ceiling each architecture can reach without a FLOP limit, though this would be very costly in time and resources.
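As a rough sketch of how a FLOP budget could be matched across recurrent architectures, the code below counts only the input (d_in × h) and recurrent (h × h) matrix products per time step, at 2 FLOPs per multiply-add, with one such pair for a vanilla RNN, three for a GRU, and four for an LSTM. Biases and elementwise gate operations are ignored, and the transformer is left out because its attention cost also depends on sequence length. The budget and input size in the usage line are hypothetical.

```python
import math

# Number of (input + recurrent) weight-matrix pairs per time step.
GATES = {"rnn": 1, "gru": 3, "lstm": 4}

def flops_per_step(arch, d_in, h):
    """Approximate FLOPs per time step for one recurrent layer."""
    return 2 * GATES[arch] * (d_in * h + h * h)

def hidden_size_for_budget(arch, d_in, budget):
    """Largest hidden size h whose per-step FLOPs stay within budget."""
    g = 2 * GATES[arch]
    # Positive root of g * (h^2 + d_in * h) = budget.
    h = int((-d_in + math.sqrt(d_in ** 2 + 4 * budget / g)) / 2)
    # Nudge h to correct for floating-point rounding at the boundary.
    while flops_per_step(arch, d_in, h + 1) <= budget:
        h += 1
    while h > 0 and flops_per_step(arch, d_in, h) > budget:
        h -= 1
    return h

for arch in GATES:
    print(arch, hidden_size_for_budget(arch, d_in=30, budget=1_000_000))
```

Under equal FLOPs, the gated models end up with smaller hidden states than the vanilla RNN, which is the trade-off a fixed budget is meant to expose.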

Because it consists of a sequence of transactions over time, credit card data can be treated as a time series. Moreover, whether a specific purchase is fraudulent depends largely on the context of the cardholder's other purchases. For these reasons, we expect the more complex models to perform better, since mechanisms such as the LSTM cell state and attention let them incorporate relevant information about previous transactions.
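Feeding a sequence model requires turning a flat transaction table into per-card histories. A minimal sketch, assuming each transaction is a dict with hypothetical `card_id`, `time`, `amount`, and `is_fraud` fields (real Kaggle datasets may lack a card identifier or name these differently):

```python
from collections import defaultdict

def build_card_windows(transactions, window=5):
    """Group transactions by card, sort each card's history by time, and pair
    every transaction with up to `window - 1` transactions that preceded it."""
    by_card = defaultdict(list)
    for txn in transactions:
        by_card[txn["card_id"]].append(txn)
    samples = []
    for txns in by_card.values():
        txns.sort(key=lambda t: t["time"])
        for i, txn in enumerate(txns):
            # Only past and current transactions, so no future information leaks in.
            history = txns[max(0, i - window + 1): i + 1]
            features = [[t["amount"]] for t in history]
            samples.append((features, txn["is_fraud"]))
    return samples
```

Early transactions on a card get shorter histories, so the sequences would still need padding or masking before batching.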

Metrics

To evaluate our models thoroughly, we intend to compare their results against other models tested on the same datasets. We also intend to analyze the false positive and false negative rates among misclassifications to pinpoint bias in our models.
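The rates mentioned above come straight from the confusion matrix. The sketch below treats label 1 as fraud (positive) and 0 as legitimate; the function name is our own. It also shows why accuracy alone can mislead on imbalanced data.

```python
def error_rates(y_true, y_pred):
    """Confusion-matrix counts plus false positive and false negative rates,
    with 1 = fraud (positive) and 0 = legitimate (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Rates are undefined (NaN) when the class they condition on is absent.
    fpr = fp / (fp + tn) if fp + tn else float("nan")  # legit transactions wrongly flagged
    fnr = fn / (fn + tp) if fn + tp else float("nan")  # fraud that slipped through
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "fpr": fpr, "fnr": fnr, "accuracy": accuracy}

# 2 fraud cases in 1,000 transactions; a model that never flags anything:
stats = error_rates([1] * 2 + [0] * 998, [0] * 1000)
```

Here `stats["accuracy"]` is 0.998 even though `stats["fnr"]` is 1.0: every fraudulent transaction is missed.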

Following the evaluation criteria in Nguyen et al., we also plan to adjust the class imbalance within our datasets and observe how this affects overall accuracy. Our goal is to discover which architectures work most reliably across our datasets and to identify the issues that arise along the way.
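One simple way to vary the imbalance is to keep every fraud example and randomly downsample legitimate ones until fraud reaches a chosen fraction. This is a minimal sketch of that idea with a hypothetical function name; it is not necessarily how Nguyen et al. constructed their imbalance levels.

```python
import random

def resample_to_fraud_ratio(rows, labels, fraud_fraction, seed=0):
    """Keep every fraud row (label 1) and randomly drop legitimate rows
    (label 0) until fraud makes up roughly `fraud_fraction` of the result."""
    rng = random.Random(seed)
    fraud = [r for r, y in zip(rows, labels) if y == 1]
    legit = [r for r, y in zip(rows, labels) if y == 0]
    # Solve n_fraud / (n_fraud + n_legit) = fraud_fraction for n_legit.
    n_legit = round(len(fraud) * (1 - fraud_fraction) / fraud_fraction)
    n_legit = min(n_legit, len(legit))  # downsampling only; cannot exceed what exists
    kept = rng.sample(legit, n_legit)
    out = [(r, 1) for r in fraud] + [(r, 0) for r in kept]
    rng.shuffle(out)
    return out
```

Downsampling discards legitimate data, so in practice it would only be applied to the training split, with evaluation done on data at its natural imbalance.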

Ethics

The major stakeholders in this problem are credit card companies and cardholders. Both want to know when a fraudulent transaction occurs and to handle recourse with minimal effort. If our system works well, it will facilitate this by automating fraud detection, which in turn makes recourse easier once both parties agree that fraud has occurred. If our system works poorly, it could produce one or both of two kinds of errors: false positives, where honest transactions are incorrectly labeled as fraudulent and cardholders must verify their authenticity; and false negatives, where fraudulent transactions go undetected at the expense of both the company and the holder. The latter is likely the more serious problem, since undetected fraud must be investigated manually, and the entire point of our system is to minimize that need.

It is also possible that our data could lead to biased outcomes against certain groups. For example, a certain type of purchase might be fraudulent at a higher rate than others, and our model might learn to reflect this pattern. If certain groups are more inclined to make this type of purchase, our model might falsely label that group's purchases as fraudulent at a disproportionately high rate. This would be hard to detect with our datasets, since personal information relating to different groups is largely redacted, but it could be addressed with an outcome-oriented approach if our system were deployed in the real world.
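An outcome-oriented check could compare false positive rates across groups: among transactions that were actually legitimate, how often each group's purchases were flagged. This sketch assumes group labels would be available in a deployment setting, which our redacted datasets do not provide; the function name and inputs are hypothetical.

```python
from collections import defaultdict

def false_positive_rate_by_group(groups, y_true, y_pred):
    """Among truly legitimate transactions (y_true == 0), the share flagged
    as fraud (y_pred == 1), computed separately for each group label."""
    flagged = defaultdict(int)
    legit = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        if t == 0:
            legit[g] += 1
            flagged[g] += int(p == 1)
    return {g: flagged[g] / legit[g] for g in legit}
```

A large gap between groups' rates would be a signal of the disproportionate false labeling described above, even if overall accuracy looks good.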

Division of Labor

We will all work together on most parts of the project, since decisions such as preprocessing steps and model complexity will affect all of our models. As we progress, different people may end up taking ownership of different models.
