Worldwide, billions of euros are lost every year to credit card fraud. Fraud has increasingly diversified across digital channels, including mobile and online payments, creating new challenges as novel fraud patterns emerge. Hence, finding effective methods of mitigating fraud remains challenging. Existing solutions include simple if-then rules and classical machine learning algorithms. Credit card fraud is by definition an example-dependent, cost-sensitive classification problem, in which the costs due to misclassification vary between examples and not only between classes; i.e., misclassifying a fraudulent transaction may have a financial impact ranging from a few euros to thousands. In this paper, we propose an extension to the cost-sensitive decision tree algorithm: we create an ensemble of such trees and combine them using a stacking approach with a cost-sensitive logistic regression. We compare our method with standard machine learning algorithms and state-of-the-art cost-sensitive classification methods on a real credit card fraud dataset provided by a large European card processing company. The results show that our method achieves savings of up to 73.3%, more than 2 percentage points higher than a single cost-sensitive decision tree.
Fraud Detection by Stacking Cost-Sensitive Decision Trees
1. Fraud Detection by Stacking Cost-Sensitive Decision Trees
Alejandro Correa Bahnsen, PhD
Chief Data Scientist & Head of Research
acorrea@easysol.net
2. Who am I?
Chief Data Scientist at Easy Solutions
Industrial Engineer
PhD in Machine Learning from the University of Luxembourg
Scikit-Learn contributor
Organizer of Data Science Bogota Meetups
3. About Easy Solutions®
A leading global provider of electronic fraud prevention for financial institutions and enterprise customers
• 430+ customers in 30 countries
• 115 million users protected
• 30 billion online connections monitored
Industry recognition
5. Risk-Based Authentication · Phishing URL Classification · Phishing Brand ID · Fraud Detection · HTML Injection · Biometrics
[Figure: risk-based authentication scores varying by time of day, e.g. risk = 10 at 19h vs. risk = 95 at 9h]
6. Research / Data Science Spectrum
• Basic Research: "Maybe someday, someone can use this"
• Applied Research: "I might be able to use this"
• Working Prototype: "I can use this (sometimes)"
• Quality Code: "Software engineers can use this"
• Tool or Service: "People can use this"
The spectrum runs from innovation to practicality.
8. Credit Card Fraud Detection
Estimate the probability of a transaction being fraudulent based on analyzing customer patterns and recent fraudulent behavior.
Issues when constructing a fraud detection system:
• Skewness of the data
• Cost-sensitivity
• Short response time required of the system
• Dimensionality of the search space
• Feature preprocessing
• Model selection
10. Data
• Large European card processing company
• 2012 & 2013 card-present transactions
• 20MM transactions
• 40,000 frauds
• 0.467% fraud rate
• ~2MM EUR lost to fraud on the test dataset
[Figure: Jan–Dec timeline showing the split of the transactions into train and test periods]
11. Features
Raw features

Attribute name     Description
Transaction ID     Transaction identification number
Time               Date and time of the transaction
Account number     Identification number of the customer
Card number        Identification of the credit card
Transaction type   e.g., Internet, ATM, POS, ...
Entry mode         e.g., chip and PIN, magnetic stripe, ...
Amount             Amount of the transaction in euros
Merchant code      Identification of the merchant type
Merchant group     Merchant group identification
Country            Country of the transaction
Country 2          Country of residence
Type of card       e.g., Visa debit, Mastercard, American Express, ...
Gender             Gender of the card holder
Age                Card holder age
Bank               Issuer bank of the card
12. Financial Evaluation
Credit card fraud detection is a cost-sensitive problem, as the cost of a false positive differs from the cost of a false negative.
• False positives: when a transaction is predicted as fraudulent but is in fact legitimate, the financial institution incurs an administrative cost.
• False negatives: when a fraud goes undetected, the amount of that transaction is lost.
Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, as the transaction amounts vary quite significantly.
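A minimal sketch of this example-dependent evaluation, assuming the common fraud cost matrix: a flagged transaction (true or false positive) incurs a fixed administrative cost, a missed fraud loses the transaction amount, and a true negative costs nothing. The function names, the admin cost value, and the all-legitimate baseline used for the savings score are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def transaction_cost(y_true, y_pred, amounts, admin_cost=2.5):
    """Total example-dependent cost: flagging a transaction (TP or FP)
    incurs the admin cost; a missed fraud (FN) loses its amount."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    amounts = np.asarray(amounts, dtype=float)
    per_example = np.where(y_pred == 1, admin_cost,            # flagged
                           np.where(y_true == 1, amounts, 0))  # missed fraud
    return per_example.sum()

def savings(y_true, y_pred, amounts, admin_cost=2.5):
    """Fraction of cost saved vs. the baseline of never flagging anything
    (assumes the dataset contains at least one fraud)."""
    base = transaction_cost(y_true, np.zeros_like(np.asarray(y_true)),
                            amounts, admin_cost)
    return 1 - transaction_cost(y_true, y_pred, amounts, admin_cost) / base
```

Because the cost of a false negative is the transaction amount itself, two models with identical confusion matrices can have very different savings, which is the point of this slide.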
18. Bayes Minimum Risk
A decision model based on quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions.
Risk of classification:
R(c_i = 0 | x_i) = C_TN_i (1 − p_i) + C_FN_i · p_i
R(c_i = 1 | x_i) = C_FP_i (1 − p_i) + C_TP_i · p_i
Using these risks, the prediction is made based on the following condition:
c_i = 0 if R(c_i = 0 | x_i) ≤ R(c_i = 1 | x_i), and c_i = 1 otherwise
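The risk formulas above translate directly into a thresholding rule on the estimated fraud probability. A minimal sketch, assuming per-example costs are given as arrays or scalars (the function name and default zero costs for correct classifications are mine):

```python
import numpy as np

def bayes_minimum_risk(p, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Predict fraud (1) only when the risk of predicting legitimate
    exceeds the risk of predicting fraud."""
    p = np.asarray(p, dtype=float)
    risk_0 = c_tn * (1 - p) + c_fn * p   # R(c_i = 0 | x_i)
    risk_1 = c_fp * (1 - p) + c_tp * p   # R(c_i = 1 | x_i)
    return (risk_0 > risk_1).astype(int)
```

Because C_FN_i is the transaction amount, a large transaction can be flagged even at a low fraud probability, while a small one with the same probability is let through.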
19. Cost-Sensitive Logistic Regression
• Logistic regression model
• Cost function
• Cost analysis
20. Cost-Sensitive Logistic Regression (cont.)
• Actual costs
• Cost-sensitive function
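The bullets above name the cost-sensitive objective without showing it. One common formulation of an example-dependent cost-sensitive logistic regression replaces the log-loss with the expected per-example cost of the sigmoid output; a sketch under that assumption (function names and signatures are mine, not the slide's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cs_logistic_cost(theta, X, y, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Example-dependent cost objective: each example contributes its
    own expected misclassification cost instead of the log-loss."""
    h = sigmoid(X @ theta)  # predicted fraud probability per example
    cost = (y * (h * c_tp + (1 - h) * c_fn)            # actual frauds
            + (1 - y) * (h * c_fp + (1 - h) * c_tn))   # legitimate ones
    return cost.mean()
```

Minimizing this objective (e.g. with any gradient-based optimizer) pushes the model toward the decisions that are cheap for this cost matrix, rather than toward calibrated probabilities.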
21. Cost-Sensitive Decision Trees
Proposed cost-based impurity measure
A splitting rule (x^j, l^j_m) partitions the set S into:
S_l = { x_i ∈ S | x_i^j ≤ l^j_m }    S_r = { x_i ∈ S | x_i^j > l^j_m }
• The impurity of each leaf is calculated using:
I_c(S) = min( Cost(f_0(S)), Cost(f_1(S)) )
f(S) = 0 if Cost(f_0(S)) ≤ Cost(f_1(S)), and 1 otherwise
• Afterwards, the gain of applying a given rule to the set S is:
Gain_c(x^j, l^j_m) = I_c(π_1) − ( I_c(π_1^l) + I_c(π_1^r) )
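In words: a node's impurity is the cost of the cheaper of the two constant predictions on it, and a split's gain is the parent impurity minus the sum of the child impurities. A minimal sketch under the same fraud cost matrix as before (admin cost per flagged transaction, amount lost per missed fraud; function names and the admin cost value are my assumptions):

```python
import numpy as np

def leaf_cost(y, amounts, label, admin_cost=2.5):
    """Cost of predicting a constant `label` for every example in a node:
    predicting 1 pays the admin fee for each example, predicting 0 loses
    the amount of every fraud in the node."""
    if label == 1:
        return admin_cost * len(y)
    return amounts[y == 1].sum()

def cost_impurity(y, amounts, admin_cost=2.5):
    """I_c(S): cost of the cheaper constant prediction on the node."""
    return min(leaf_cost(y, amounts, 0, admin_cost),
               leaf_cost(y, amounts, 1, admin_cost))

def split_gain(y, amounts, left_mask, admin_cost=2.5):
    """Gain_c of a binary split, given a boolean mask for the left child."""
    parent = cost_impurity(y, amounts, admin_cost)
    left = cost_impurity(y[left_mask], amounts[left_mask], admin_cost)
    right = cost_impurity(y[~left_mask], amounts[~left_mask], admin_cost)
    return parent - (left + right)
```

Unlike Gini or entropy, this impurity depends on the transaction amounts, so a split that isolates a few large frauds can beat one that purifies many small transactions.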
22. Cost-Sensitive Decision Trees (cont.)
Decision tree construction
• The rule that maximizes the gain is selected:
(best_x, best_l) = argmax_{j,m} Gain(x^j, l^j_m)
• The process is repeated recursively on each child node until a stopping criterion is met.
23. Cost-Sensitive Decision Trees (cont.)
Proposed cost-sensitive pruning criterion
• Calculate the savings of the tree and of each pruned tree:
PC_c = ( Cost(f(S, Tree)) − Cost(f(S, EB(Tree, branch))) ) / ( |Tree| − |EB(Tree, branch)| )
where EB(Tree, branch) is the tree with the given branch eliminated.
• After calculating the pruning criterion for all possible branches, the pruning with the maximum improvement is selected and the tree is pruned.
• The process is then repeated until there is no further improvement.
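The criterion above is simply the cost improvement per node removed when a branch is collapsed. A one-function sketch, assuming the tree and pruned-tree costs and sizes are already computed (the function name is mine):

```python
def pruning_criterion(tree_cost, pruned_cost, tree_size, pruned_size):
    """PC_c: cost improvement per node removed when the branch is
    eliminated (EB). Larger values mean cheaper and smaller trees,
    so the branch with the maximum PC_c is pruned first."""
    return (tree_cost - pruned_cost) / (tree_size - pruned_size)
```

Normalizing by the number of removed nodes favors pruning large branches that contribute little, which keeps the tree compact without sacrificing savings.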
24. Ensemble of Cost-Sensitive Decision Trees
A typical ensemble is made by combining T different base classifiers. Each base classifier is trained by applying algorithm M to a random subset S_j of the training data:
M_j ← M(S_j)  ∀ j ∈ {1, …, T}
26. Ensemble of Cost-Sensitive Decision Trees (cont.)
After the base classifiers are constructed, they are typically combined using one of the following methods:
• Majority voting:
H(S) = f_mv(S, M) = argmax_{c ∈ {0,1}} Σ_{j=1}^{T} 1_c( M_j(S) )
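A minimal sketch of the majority-voting combiner for binary base classifiers, assuming their predictions are stacked into a (T, n_samples) array of 0/1 labels (the function name is mine). The paper's proposed combiner instead stacks the base predictions and feeds them to a cost-sensitive logistic regression; voting is the baseline it improves on.

```python
import numpy as np

def majority_vote(predictions):
    """H(S): the label a strict majority of the T base classifiers
    agree on. `predictions` has shape (T, n_samples), entries in {0, 1};
    with an odd T there are no ties."""
    predictions = np.asarray(predictions)
    votes = predictions.sum(axis=0)               # number of fraud votes
    return (votes > len(predictions) / 2).astype(int)
```

Replacing this fixed rule with a learned cost-sensitive combiner lets the ensemble weight base trees by how much money their votes actually save.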
31. Conclusions
• A new framework for stacking example-dependent cost-sensitive decision trees
• Models should be evaluated taking into account the real financial costs of the application
• Algorithms should be developed to incorporate those financial costs
32. Any questions or comments, please let me know.
Alejandro Correa Bahnsen, PhD
Chief Data Scientist & Head of Research
acorrea@easysol.net
Thank you!