Oct 20, 2022

Identifying Fraudulent Credit Card Transactions

Creating and training a machine learning classifier to accurately identify fraudulent credit card transactions, with supplemental analysis of fraudulent charges

Scenario

A new credit card company has just entered the market in the western United States. The company is promoting itself as one of the safest credit cards to use. They have hired you as their data scientist in charge of identifying instances of fraud. The executive who hired you has provided you with data on credit card transactions, including whether or not each transaction was fraudulent. The executive wants to know how accurately you can predict fraud using this data. She has stressed that the model should err on the side of caution: it is not a big problem to flag a legitimate transaction as fraudulent just to be safe. In your report, you will need to describe how well your model functions and how it adheres to these criteria.

Files:
Python Notebook
Loan Data CSV
Trained Model

Objectives

Inspect dataset for most common aspects of fraudulent transactions
Prepare and clean data for use in a classifier
Train a machine learning model to accurately predict whether a charge is fraudulent

Process

Data was acquired from an open source dataset (available here)
Data was split into subsets, grouped by attributes such as purchase category and state, to determine which attributes were most common among fraudulent transactions
Data was analyzed and visualizations were created to explore aspects of the data
Data was cleaned and prepared for use in a machine learning model
Model was trained to detect whether a transaction was genuine or fraudulent
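The cleaning and preparation step above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code. The column names (trans_date_trans_time, amt, category, is_fraud), the sample rows, and both helper functions are assumptions made for the example.

```python
from datetime import datetime


def build_category_codes(rows, column):
    """Map each distinct string in `column` to a stable integer code."""
    values = sorted({row[column] for row in rows})
    return {value: code for code, value in enumerate(values)}


def prepare_row(row, category_codes):
    """Convert one raw record (all strings) into numeric features and a label."""
    timestamp = datetime.strptime(row["trans_date_trans_time"], "%Y-%m-%d %H:%M:%S")
    features = [
        float(row["amt"]),                # purchase amount as a number
        category_codes[row["category"]],  # purchase category as an integer code
        timestamp.hour,                   # hour of day the charge was made
    ]
    label = int(row["is_fraud"])          # 1 = fraudulent, 0 = legitimate
    return features, label


# Made-up rows for illustration only; column names are assumed.
raw_rows = [
    {"trans_date_trans_time": "2020-06-21 12:14:25", "amt": "2.86",
     "category": "personal_care", "is_fraud": "0"},
    {"trans_date_trans_time": "2020-06-21 23:02:10", "amt": "843.91",
     "category": "shopping_net", "is_fraud": "1"},
    {"trans_date_trans_time": "2020-06-22 01:47:03", "amt": "281.06",
     "category": "grocery_pos", "is_fraud": "1"},
]

codes = build_category_codes(raw_rows, "category")
prepared = [prepare_row(row, codes) for row in raw_rows]
```

Integer codes are the simplest encoding, but they imply an ordering between categories; depending on the classifier, one-hot encoding may be a better fit.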

Analysis

Supplemental Analysis

Fraudulent Charges by Type

bar graph showing the amount of fraud charges per purchase type

Analyzing the subset of fraudulent charges showed that the category most responsible for fraud is 'Grocery Point of Sale' (grocery_pos), closely followed by online shopping (shopping_net).
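A count like the one behind this chart could be produced along these lines. This is a sketch under assumptions: the sample rows are invented, the field names category and is_fraud are assumed, and fraud_counts_by is a hypothetical helper rather than the notebook's code.

```python
from collections import Counter


def fraud_counts_by(rows, column):
    """Count fraudulent rows for each distinct value of `column`."""
    return Counter(row[column] for row in rows if row["is_fraud"] == 1)


# Made-up transactions for illustration only.
transactions = [
    {"category": "grocery_pos", "is_fraud": 1},
    {"category": "grocery_pos", "is_fraud": 1},
    {"category": "shopping_net", "is_fraud": 1},
    {"category": "grocery_pos", "is_fraud": 0},
    {"category": "shopping_net", "is_fraud": 0},
]

by_category = fraud_counts_by(transactions, "category")
ranked = by_category.most_common()  # categories ordered by fraud count, highest first
```

The same helper works for any grouping column, so it could be reused for the per-state breakdown.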

Fraud by State

bar chart showing fraud charges by state

Analysis of the dataset showed that the state with the most fraudulent charges was California. In my opinion, this isn't particularly useful for detecting whether a charge is fraudulent: the data comes mostly from the western United States, and California is the most populous state in that region.
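One way to check whether a state's high count simply reflects its share of all transactions is to compare fraud rates rather than raw counts. The sketch below uses invented numbers and assumed field names (state, is_fraud); it does not reproduce the dataset's actual figures.

```python
from collections import defaultdict


def fraud_rate_by_state(rows):
    """Return the fraction of transactions flagged as fraud for each state."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in rows:
        totals[row["state"]] += 1
        frauds[row["state"]] += row["is_fraud"]
    return {state: frauds[state] / totals[state] for state in totals}


# Made-up sample: CA has more fraud in absolute terms (2 vs 1),
# but a lower rate because it also has far more transactions.
sample = (
    [{"state": "CA", "is_fraud": 1}] * 2
    + [{"state": "CA", "is_fraud": 0}] * 6
    + [{"state": "OR", "is_fraud": 1}] * 1
    + [{"state": "OR", "is_fraud": 0}] * 1
)

rates = fraud_rate_by_state(sample)
```

In this invented sample the state with the highest count has the lower rate, which is the pattern to look for before treating state as a meaningful signal.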

Classifier

Accuracy of Classifier

Training the classifier involved converting all data in the dataset to usable data types and testing multiple methods of classification. After tuning the training parameters, the model correctly identified 98% of legitimate transactions and 73% of fraudulent transactions (see below for full confusion matrix), which I consider an acceptable level of accuracy.
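The two percentages are per-class rates that can be read directly off a binary confusion matrix. The sketch below shows the arithmetic; the four counts are invented so that the rates match the figures quoted above, and they are not the model's actual confusion matrix. The function name per_class_rates is hypothetical.

```python
def per_class_rates(tn, fp, fn, tp):
    """Derive per-class rates from the four cells of a binary confusion matrix."""
    return {
        # share of legitimate transactions correctly passed
        "specificity": tn / (tn + fp),
        # share of fraudulent transactions correctly flagged
        "recall": tp / (tp + fn),
        # share of legitimate transactions wrongly flagged as fraud
        "false_positive_rate": fp / (tn + fp),
        # share of all transactions classified correctly
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
    }


# Illustrative counts only: 1000 legitimate and 100 fraudulent transactions.
rates = per_class_rates(tn=980, fp=20, fn=27, tp=73)
```

Reporting the two classes separately matters because fraud is typically rare: a single overall accuracy figure is dominated by the legitimate class and can hide how many fraudulent charges are missed.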

Conclusions

Because it correctly identifies 73% of fraudulent charges, this model would be useful as part of a system that blocks fraudulent charges as they are made. It also produces few false positives, flagging only 2% of legitimate transactions as fraudulent. Paired with a secondary system, it could be a beneficial first step in protecting customers from credit card theft or fraud.