What it does
We trained several predictive models and measured how accurately each one classifies a transaction as fraud or non-fraud. Along the way, we also analyze some important aspects of the dataset.
Goals of this project:
1) Understand the data.
2) Create a sub-dataframe with a 50/50 ratio of "Fraud" and "Non-Fraud" transactions.
3) Determine which classifier has the highest accuracy.
4) Build a neural network and compare its accuracy to that of our best classifier.
5) Understand how an imbalanced dataset affects the results.
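Goal 2 above (the balanced sub-dataframe) can be sketched as random undersampling. This is a minimal, standard-library illustration and not the project's actual code: the `Class` label key (1 = fraud, 0 = non-fraud), the function name, and the toy rows are all assumptions made for the example, and it assumes fraud is the minority class.

```python
import random

def undersample_balanced(rows, label_key="Class", seed=42):
    """Return a shuffled 50/50 subset: every fraud row plus an equal
    number of randomly chosen non-fraud rows.

    Assumes fraud (label 1) is the minority class.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    fraud = [r for r in rows if r[label_key] == 1]
    non_fraud = [r for r in rows if r[label_key] == 0]
    # Draw as many non-fraud rows as there are fraud rows, without replacement.
    sampled_non_fraud = rng.sample(non_fraud, len(fraud))
    balanced = fraud + sampled_non_fraud
    rng.shuffle(balanced)  # avoid all fraud rows sitting at the front
    return balanced

# Hypothetical toy data: 5 fraud rows among 100 transactions.
rows = [{"Amount": float(i), "Class": 1 if i < 5 else 0} for i in range(100)]
balanced = undersample_balanced(rows)
fraud_count = sum(r["Class"] for r in balanced)
```

The tradeoff is that most non-fraud rows are thrown away, so the model sees far less information about normal transactions.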
Challenges we ran into
Outlier removal tradeoff: We have to be careful about where we set the threshold for removing outliers. We determine the threshold by multiplying the interquartile range (IQR) by a number (e.g., 1.5). The higher this multiplier is (e.g., 3), the fewer outliers are detected; the lower it is, the more outliers are detected.
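The effect of the multiplier can be seen in a small sketch. This is an illustration under assumptions rather than the project's code: it uses the common convention of cutoffs at Q1 - k*IQR and Q3 + k*IQR, the function names are made up for the example, and the sample values are hypothetical.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR-based cutoffs."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]

# Hypothetical feature values with two extreme points (20 and 40).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]

kept_strict = remove_outliers(sample, k=1.5)  # drops both 20 and 40
kept_loose = remove_outliers(sample, k=3.0)   # drops only 40
```

With the smaller multiplier both extreme values are removed; with the larger one, 20 survives. A multiplier that is too low risks discarding legitimate information, which is the tradeoff described above.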
Accomplishments that we're proud of
The results from the algorithms were promising, and the models ran without any issues.
What we learned
Applying SMOTE to our imbalanced dataset helped correct the imbalance in our labels (far more non-fraud than fraud transactions). Nevertheless, the neural network trained on the oversampled dataset sometimes predicts fewer fraud transactions correctly than our model trained on the undersampled dataset. Keep in mind, however, that outlier removal was applied only to the random undersampled dataset and not to the oversampled one. Also, the model trained on the undersampled data fails to classify a large number of non-fraud transactions correctly and instead flags them as fraud. Imagine customers making regular purchases having their cards blocked because our model classified their transactions as fraudulent: this would be a huge disadvantage for the financial institution, and customer complaints and dissatisfaction would increase.
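To make the cost of false positives concrete, here is a small sketch that computes accuracy, precision, and recall from raw predictions. The numbers are invented purely for illustration and are not results from this project; the function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = fraud, 0 = non-fraud)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # legitimate transaction wrongly flagged as fraud
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # fraud that slipped through
    return tp, fp, tn, fn

def summarize(y_true, y_pred):
    """Return accuracy, precision, and recall for the fraud class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Invented scenario: 1,000 transactions, 10 of them fraud. The model catches
# every fraud case but also flags 90 legitimate transactions.
y_true = [1] * 10 + [0] * 990
y_pred = [1] * 10 + [1] * 90 + [0] * 900

accuracy, precision, recall = summarize(y_true, y_pred)
```

In this made-up scenario accuracy is 91% and recall is 100%, yet precision is only 10%: nine out of every ten blocked cards belong to legitimate customers. This is why accuracy alone can be misleading on imbalanced data.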
What's next for Credit card Fraud Detection
The next step of this analysis is to apply outlier removal to our oversampled dataset and see whether accuracy on the test set improves.