Machine Learning Fraud Detection

Inspiration

With financial fraud increasingly common in online transactions, we set out to design a solution that uses Machine Learning to detect suspicious patterns and prevent fraud. The project was driven by the need to protect users and companies from the financial losses that fraudulent activity causes.

What it does

The Machine Learning Fraud Detection system predicts whether a transaction is fraudulent or legitimate. It is trained on past transaction data generated by PaySim, a simulator that draws on aggregated data from the actual financial logs of a mobile money service (https://www.kaggle.com/datasets/sriharshaeedala/financial-fraud-detection-dataset). It uses a Random Forest classifier and tailored features to improve fraud detection accuracy.
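As a rough illustration of what tailored features could look like, the sketch below derives a few balance-based columns with pandas. The column names (oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest) are assumed to follow the PaySim schema, and the function name, the derived feature names, and the sample rows are hypothetical; the project's actual features may differ.

```python
import pandas as pd


def add_balance_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with illustrative balance-change features added."""
    out = df.copy()
    # Share of the sender's starting balance that left the account.
    # Adding 1 to the denominator avoids division by zero for empty accounts.
    out["orig_change_ratio"] = (
        out["oldbalanceOrg"] - out["newbalanceOrig"]
    ) / (out["oldbalanceOrg"] + 1.0)
    # Growth of the receiver's balance relative to its starting balance.
    out["dest_change_ratio"] = (
        out["newbalanceDest"] - out["oldbalanceDest"]
    ) / (out["oldbalanceDest"] + 1.0)
    # Flag transactions that empty a previously funded sender account.
    out["drains_account"] = (
        (out["oldbalanceOrg"] > 0) & (out["newbalanceOrig"] == 0)
    ).astype(int)
    return out


# Two made-up transactions, used only to exercise the function.
sample = pd.DataFrame(
    {
        "amount": [100.0, 5000.0],
        "oldbalanceOrg": [999.0, 5000.0],
        "newbalanceOrig": [899.0, 0.0],
        "oldbalanceDest": [0.0, 0.0],
        "newbalanceDest": [100.0, 5000.0],
    }
)
features = add_balance_features(sample)
print(features[["orig_change_ratio", "dest_change_ratio", "drains_account"]])
```

Ratios like these express a transfer relative to the size of the account, which a tree-based model cannot easily reconstruct from the raw balances alone.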

How we built it

The project was built in Python with packages such as 'scikit-learn', 'pandas', and 'imbalanced-learn'. To improve the model's performance, we preprocessed the data with SMOTE to address class imbalance, created new features (such as balance change ratios), and trained the model using cross-validation.

Challenges we ran into

Our main challenge was class imbalance: non-fraudulent transactions vastly outnumbered fraudulent ones, which biased the model toward the majority class. Balancing the dataset and using feature engineering to improve recall on fraudulent transactions was one of the hardest parts of the project.
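To see why recall matters more than accuracy under this kind of imbalance, consider a small worked example in plain Python. The counts are hypothetical (10 fraudulent transactions out of 1,000) and are not taken from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives (fraud) that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical data: 10 fraudulent (1) and 990 legitimate (0) transactions.
y_true = [1] * 10 + [0] * 990
# A degenerate "model" that labels every transaction as legitimate.
always_legit = [0] * 1000

baseline_accuracy = accuracy(y_true, always_legit)
baseline_recall = recall(y_true, always_legit)
print(baseline_accuracy, baseline_recall)  # prints 0.99 0.0
```

A model that never flags fraud scores 99% accuracy here while catching none of the fraudulent transactions, which is why the focus is on recall for the fraud class.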

Accomplishments that we're proud of

We overcame the class imbalance problem described above: by balancing the dataset and tuning the model's hyperparameters, we improved recall for fraud detection, which was one of the most difficult tasks in the project. We are also proud of the work that went into the MySQL database, the Streamlit UI, and the Flask API.
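Tuning hyperparameters with recall as the target metric might look something like the sketch below. The parameter grid, the synthetic data, and the fold count are all illustrative assumptions; the values the project actually searched over are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data standing in for the transaction features.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0
)

# A deliberately small, hypothetical grid to keep the search fast.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [4, None],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="recall",  # rank candidates by recall on the positive class
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Optimizing for recall alone can increase false positives, so in practice it is worth checking precision for the selected configuration as well.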

What we learned

This project gave us a solid grasp of feature engineering, class imbalance handling, and model tuning. We also learned about machine learning's vital role in protecting financial systems from fraud.

What's next for Machine Learning Fraud Detection

Moving forward, we plan to explore other machine learning approaches, such as Gradient Boosting for classification and Autoencoders for anomaly detection. We will also focus on making the model more robust and deploying it in a live environment for real-time transaction monitoring.