Liubomyr Bregman "Financial Crime Detection using Advanced Analytics"
1. Fin Crime detection using autoencoders
Liubomyr Bregman
Richard Bobek
L'viv, Ukraine
3 Nov 2018
AI & Big Data Day
2. Global Payment Fraud – lessons learnt and investigation highlights
Lessons learnt:
1. Skilled operators
2. Less opportunity for “insider” or “opportunistic” attack
3. Need for ‘out-of-band’ systems for notifications
4. Dedicated scenarios within FIs
5. Leverage the convergence between cyber, fraud and ML
6. Leverage advanced analytics against an evolving threat landscape
*BAE Systems
https://en.wikipedia.org/wiki/Bangladesh_Bank_robbery
PwC 2
If Hollywood releases another iteration of the 'Ocean's 11' franchise, they should base it on the recent attack against the Central Bank of Bangladesh (BB)*.
Bangladesh cyber heist
The attackers attempted to steal $951m in 35 separate fraudulent transactions. 30 orders (worth $850m) were stopped by the US Fed, but 5 orders (worth $101m) went through. A further $20m was blocked by a recipient bank in Sri Lanka.
Vietnam SWIFT fraud attempt
Further analysis of the Bangladesh cyber heist led to the conclusion that the same attackers appear to have struck before, using similar tools, targeting a bank in Vietnam just a couple of months prior to the Bangladesh attack.
3. There are many “creative” new strategies in fin crimes
Some of the known financial crime strategies:
• Cheque fraud
• Credit card fraud
• Mortgage fraud
• Medical fraud
• Corporate fraud
• Securities fraud (including insider trading)
• Bank fraud
• Insurance fraud
• Market manipulation
• Payment (point of sale) fraud
• Health care fraud
• Theft
• Scams or confidence tricks
• Tax evasion
• Bribery
• Embezzlement
• Identity theft
• Money laundering
• Forgery and counterfeiting
4. There are many fraud detection and AML software products on the market
By market segment (type), financial fraud detection software can be split into:
• Anti-Money Laundering Detection Software
• Identity Theft Detection Software
• Credit/Debit Card Fraud Detection Software
• Wire Transfer Fraud Detection Software
• Others
5. Traditionally, fin crime is approached through reporting plus expert knowledge and assessment
The steps are usually:
1. A report of alerts is generated by a rule decision engine*. This report shows the transactions/clients detected by (usually) orthogonal rules.
2. Experts assess the alerts and decide on the appropriate action, for example an investigation of the client's activities.
* Historically these can be large financial abuse management systems, transaction monitoring systems, in-house scripts, etc.
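The two steps above can be sketched as a toy rule decision engine: each rule flags transactions independently ("orthogonal" rules), and flagged items become alerts for expert review. The rule names and thresholds below are hypothetical.

```python
# Toy rule decision engine: rule names and thresholds are illustrative only.
RULES = {
    "large_amount": lambda t: t["amount"] > 10_000,
    "high_risk_country": lambda t: t["country"] in {"XX", "YY"},
    "rapid_movement": lambda t: t["n_txn_24h"] > 20,
}

def generate_alerts(transactions):
    """Return one alert per transaction that trips at least one rule."""
    alerts = []
    for txn in transactions:
        hits = [name for name, rule in RULES.items() if rule(txn)]
        if hits:
            alerts.append({"txn_id": txn["id"], "rules": hits})
    return alerts

txns = [
    {"id": 1, "amount": 50_000, "country": "US", "n_txn_24h": 2},
    {"id": 2, "amount": 100, "country": "XX", "n_txn_24h": 30},
    {"id": 3, "amount": 40, "country": "US", "n_txn_24h": 1},
]
print(generate_alerts(txns))  # transactions 1 and 2 become alerts
```

The expert assessment in step 2 then works off the `rules` list attached to each alert.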
6. Most financial institutions struggle with similar problems in detecting financial crime
• Huge streams of data
• Scenarios are far from perfect
• Fraud schemes keep evolving
• Investigations are costly
• The number of scenarios is limited and costly to maintain
• More data science is better
7. Machine learning approaches aim to increase the automation and recall of the process
Rule-based / expert-based:
1. Optimal rules
a) Rule-based model creation
b) Threshold optimization
c) Rule optimization
d) Alert prioritization
Supervised (investigation needed):
2. Deep learning approaches
a) Pattern discovery
Unsupervised:
1. Segmentation
2. Anomaly detection
3. Semi-supervised approaches
8. The key problem is the unbalanced dataset (and some terminology)
0.1% true positives and 99.9% false positives.
• Only 13 scenarios
• Around 90 features
• ~12 segments
• ~700 thresholds
• Funnel: 600M transactions → 2M alerts → 2K SARs
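A quick sanity check of the funnel above (all counts taken from the slide):

```python
# Alert funnel from the slide: 600M transactions -> 2M alerts -> 2K SARs
# (suspicious activity reports).
transactions = 600_000_000
alerts = 2_000_000
sars = 2_000

alert_rate = alerts / transactions      # share of transactions that alert
sar_rate = sars / alerts                # share of alerts confirmed as SARs
print(f"alert rate: {alert_rate:.4%}")  # ~0.33% of transactions
print(f"SAR rate:   {sar_rate:.2%}")    # 0.10%: matches the 0.1% true positives
```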
9. More precise numbers from past projects in different banks
Benchmarking for alert volumes (alerting and escalation):
         Customers     L1 (Alerts)  Alert Rate (L1/Cust.)  L2 (Cases)  L1-to-L2 Rate  L3 (SAR-Rec)  L2-to-L3 Rate  SAR Rate (SARs/Cust.)
Peer 1   55,000,000    320,000      0.58%                  32,000      10%            6,400         20%            0.012%
Peer 2   4,500,000     60,500       1.34%                  6,340       10%            3,340         53%            0.074%
Peer 3   9,900,000     148,000      1.49%                  40,000      27%            670           2%             0.007%
Peer 4   40,000,000    50,000       0.13%                  12,000      24%            375           3%             0.001%
Peer Average:                       0.89%                              17.88%                       19.37%
Benchmarking for AML TM investigations:
         Number FTE   Annual spend   Maturity (0 (low) to 3 (high))
Peer 1   12,000       $2bn           2+
Peer 2   5,000        $800m          2
Peer 3   10,000       $1.2bn         3
Peer 4   210          $50m           0-1
Peer 5   150          $125m          1-2
Peer 6   2,500        $300m          1-2
11. How does it work: Normality
Normality is a measure of concentration, separated from anomaly by a sensitivity threshold.
[Figure: density of observations, with normal regions, an anomaly, and the sensitivity threshold]
12. How does it work: Abnormality of anomalies
How far is an anomaly from normality? How far is it from other abnormalities?
[Figure: normal regions with anomalies at different distances]
13. How does it work: Similarity of anomalies
Some anomalies are similar and form a separate anomaly cluster.
Investigating one anomaly and finding fraud makes the other anomalies in the cluster more likely to be fraud.
[Figure: normal regions and a cluster of similar anomalies]
14. How does it work: Stability of normal and anomalous patterns
When the definition of normality remains ”stable” over time, the analytical set is considered “operational”.
[Figure: normal regions and an anomaly cluster persisting over time]
15. There are many ways to detect anomalies
Non-parametric:
• Density-based techniques (k-nearest neighbor, local outlier factor, and many more variations of this concept)
• Fuzzy-logic-based outlier detection
• Cluster-analysis-based outlier detection
Parametric:
• Subspace- and correlation-based outlier detection for high-dimensional data
• Bayesian networks
• Deviations from association rules and frequent item sets
And more:
• Ensemble techniques, using feature bagging, score normalization and different sources of diversity
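A minimal sketch of the first family: a density-based score where a point's anomaly score is its mean distance to its k nearest neighbors. This is a simplified relative of the k-NN and local-outlier-factor methods listed above, not a production detector.

```python
import numpy as np

def knn_outlier_scores(X, k=3):
    """Anomaly score per point: mean distance to its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]       # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    dist.sort(axis=1)                          # column 0 is the zero self-distance
    return dist[:, 1:k + 1].mean(axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(100, 2))   # dense "normal" cloud
outlier = np.array([[8.0, 8.0]])               # one far-away point
X = np.vstack([normal, outlier])

scores = knn_outlier_scores(X, k=5)
print("outlier has the highest score:", scores.argmax() == len(X) - 1)
```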
19. Autoencoders are a powerful method for anomaly detection in financial crimes
What is it? An autoencoder maps an input x (the observation) through an internal representation H (the hidden layer of a neural network) to an output R (the reconstruction):
H = f(x), R = g(H) = g(f(x))
Approach to training: the target output is the observed input, i.e. g(f(x)) = x.
Measure of quality: a loss function L(x, g(f(x))), e.g. RMSE.
Traditionally used for anomaly detection and dimensionality reduction.
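The mapping above can be sketched in plain NumPy as a tiny linear autoencoder trained by gradient descent on the MSE loss. The layer sizes, data, and learning rate are illustrative, not from the talk.

```python
import numpy as np

# Linear encoder f(x) = x @ W1 and decoder g(h) = h @ W2, trained so that
# the reconstruction R = g(f(x)) approaches the input x.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + X[:, 1]               # data lies on a 2-D manifold in 3-D

W1 = rng.normal(scale=0.3, size=(3, 2))   # encoder weights, H = f(x)
W2 = rng.normal(scale=0.3, size=(2, 3))   # decoder weights, R = g(H)
lr = 0.05

def loss(X, W1, W2):
    """MSE between input x and reconstruction g(f(x))."""
    return np.mean((X @ W1 @ W2 - X) ** 2)

initial = loss(X, W1, W2)
for _ in range(1500):                     # plain gradient descent on the loss
    H = X @ W1                            # encode
    R = H @ W2                            # decode
    G = 2 * (R - X) / X.size              # dL/dR
    gW1, gW2 = X.T @ (G @ W2.T), H.T @ G
    W1 -= lr * gW1
    W2 -= lr * gW2
final = loss(X, W1, W2)
print(f"reconstruction loss: {initial:.3f} -> {final:.3f}")
```

Because the data sits on a 2-D manifold, the 2-node hidden layer H is enough to reconstruct it, which is exactly why a well-trained autoencoder reconstructs "normal" inputs better than abnormal ones.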
20. Is it the same principle as compression?
No: compression is usually lossless and generic, while autoencoding is lossy and trained on specific cases.
[Figure: an original observation (a Labrador with a brown collar) is encoded into a compact bit pattern and decoded back into a reconstruction that is still recognizably a dog]
21. Which strategy should we apply?
The transactions consist of goods, bads, and unknown bads. Two training strategies:
1. Train on goods only, and predict anomality by the loss (the difference between input and reconstruction).
2. Train on all transactions, assuming the neural network will not learn the bads due to their low number of observations.
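Strategy 1 can be sketched with the optimal linear autoencoder (PCA via SVD) standing in for a neural one: fit on goods only, then score any transaction by its reconstruction error. The data and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
goods = rng.normal(size=(500, 3))
goods[:, 2] = goods[:, 0] - goods[:, 1]   # goods lie on a 2-D plane in 3-D

# Encoder/decoder = projection onto the top-2 principal directions of goods.
mu = goods.mean(axis=0)
_, _, Vt = np.linalg.svd(goods - mu, full_matrices=False)
V = Vt[:2].T                              # 3 -> 2 encoder; V.T decodes back

def reconstruction_error(x):
    h = (x - mu) @ V                      # encode
    r = h @ V.T + mu                      # decode
    return np.sqrt(np.mean((x - r) ** 2, axis=-1))

good_txn = np.array([0.5, -0.5, 1.0])     # respects the learned structure
bad_txn = np.array([0.5, -0.5, 5.0])      # violates it
print(reconstruction_error(bad_txn) > reconstruction_error(good_txn))  # True
```

A deep autoencoder follows the same recipe, just with a nonlinear f and g.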
22. Why do we need a model of g(f(x)) = x?
We do not; we need the internal representation H of x.
With a deep H (multiple layers), an autoencoder can approximate any mapping from X to R arbitrarily well (Hinton & Salakhutdinov, 2006).
[Figure: a single neuron combines inputs a1 and a2 with weights W1 and W2 plus a bias, then applies the sigmoid: output = 1 / (1 + e^-(a1*W1 + a2*W2 + bias))]
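A direct transcription of the neuron formula, with illustrative inputs:

```python
import math

def neuron(a1, a2, w1, w2, bias):
    """Weighted sum of inputs plus bias, squashed by the logistic sigmoid."""
    z = a1 * w1 + a2 * w2 + bias
    return 1.0 / (1.0 + math.exp(-z))

print(neuron(0.0, 0.0, 1.0, 1.0, 0.0))  # 0.5: zero activation sits at the midpoint
```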
23. How can I understand what's happening inside?
Why would you want to? Ok, then ... we simulate: sweep one input over its observed range while holding the others fixed,
x1 ∈ {min(x1) : max(x1)}, x2 = x2, x3 = x3,
and observe how the output changes.
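A sketch of that simulation, using a hypothetical stand-in model that has "learned" the structure x3 = x1 + x2 (a real run would query the trained autoencoder instead):

```python
import numpy as np

def reconstruction_error(x):
    """Hypothetical model: expects x3 = x1 + x2."""
    expected_x3 = x[0] + x[1]
    return abs(x[2] - expected_x3)

baseline = np.array([1.0, 2.0, 3.0])          # consistent: 3 = 1 + 2
x1_grid = np.linspace(-5.0, 5.0, 21)          # sweep x1 over [min, max]
errors = [reconstruction_error(np.array([x1, baseline[1], baseline[2]]))
          for x1 in x1_grid]                  # x2, x3 held fixed

best_x1 = x1_grid[int(np.argmin(errors))]
print(f"error is minimal at x1 = {best_x1}")  # 1.0, where x1 + x2 = x3
```

Plotting `errors` against `x1_grid` shows which input values the model considers "normal" given the others.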
24. So what do I get using this?
1. It learns only the probable inputs: the autoencoder is able to learn the structure of the manifold. Combined, these properties force H to capture information about the structure of the data-generating distribution.
2. You can play with the loss function, applying expert knowledge, e.g. a per-feature weighted loss:
L = (1/k) Σ_{n=1..k} W_n (x_n − x̂_n)², with e.g. W_n ∈ {0.5; 3; ...}
As a result we get a powerful anomaly detector.
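A sketch of such an expert-weighted loss: each feature's squared reconstruction error is scaled by an expert-chosen weight W_n before averaging, so mistakes on important features cost more. The weights below are illustrative.

```python
import numpy as np

def weighted_loss(x, x_hat, w):
    """L = (1/k) * sum_n W_n * (x_n - x_hat_n)^2"""
    x, x_hat, w = map(np.asarray, (x, x_hat, w))
    return float(np.mean(w * (x - x_hat) ** 2))

x = np.array([1.0, 2.0, 3.0])        # observed input
x_hat = np.array([1.0, 2.0, 2.0])    # reconstruction, off by 1 on feature 3

# Down-weighting vs up-weighting the feature that was mis-reconstructed:
print(weighted_loss(x, x_hat, [1.0, 1.0, 0.5]))  # ~0.167
print(weighted_loss(x, x_hat, [1.0, 1.0, 3.0]))  # 1.0
```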
25. How do we say in the end what is an anomaly?
[Figure: network with inputs X1, X2, X3, a hidden layer, and outputs X1, X2, X3]
Two options:
1. Comparison of input and output: score goods and bads by RMSE and separate them with an anomality threshold.
2. Classification: use the final layer of the encoder as input for a classifier.
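A sketch of option 1 with simulated RMSE scores standing in for real per-transaction autoencoder errors, flagging the top 1% as anomalies:

```python
import numpy as np

rng = np.random.default_rng(7)
rmse = rng.exponential(scale=0.1, size=10_000)  # typical reconstruction errors
rmse[:5] += 3.0                                 # five grossly mis-reconstructed txns

threshold = np.quantile(rmse, 0.99)             # "sensitivity top 1%" cut-off
flagged = rmse > threshold
print(f"flagged {flagged.sum()} of {len(rmse)} transactions")
print("all injected anomalies flagged:", bool(flagged[:5].all()))
```

In practice the threshold is tuned against investigation capacity rather than fixed at 1%.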
26. Finally, we train and validate a classification algorithm to predict anomalies in advance
1. Anomaly labeling with autoencoders: split observations into normality and anomaly by RMSE.
2. Time series: track the measured attribute X over time.
3. Translating the problem to classification: cut the time series into sliding windows; windows preceding an anomaly are positive examples, the rest are counter-examples.
4. Boosted decision tree to predict failure and define predictive rules.
5. Validating the results: ROC curve (TP vs FP).
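Step 3 can be sketched as follows; the window and horizon sizes are illustrative, and the resulting (X, y) pairs would feed the boosted tree of step 4:

```python
import numpy as np

def make_windows(series, labels, window=5, horizon=1):
    """Return (X, y): each row of X is a window of the series, and y says
    whether an anomaly occurs within `horizon` steps after the window."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        future = labels[start + window:start + window + horizon]
        y.append(int(any(future)))
    return np.array(X), np.array(y)

series = np.array([1, 1, 1, 1, 1, 1, 1, 9, 1, 1], dtype=float)
labels = series > 5                  # anomaly flags from the autoencoder step
X, y = make_windows(series, labels, window=5, horizon=1)
print(X.shape, y.tolist())           # one positive example right before the spike
```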
27. Case study, Asian bank: a deep neural network was built for anomaly detection
Architecture: 12 - 32 - 8 - 8 - 32 - 12 nodes.
[Figure: neural network illustration; accuracy and loss of the resulting solution]
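A back-of-the-envelope size check of that symmetric architecture (widening first layer, 8-node bottleneck), assuming fully connected layers with biases:

```python
# Parameter count of a fully connected 12-32-8-8-32-12 autoencoder
# (weights + biases per layer); the architecture is from the slide,
# the fully-connected assumption is ours.
layers = [12, 32, 8, 8, 32, 12]
params = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
print(params)  # 1436 trainable parameters
```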
28. Case study, Asian bank: comparison of input and output of the neural network
[Figure: transactions ranked by anomaly score at a sensitivity of the top 1%; some anomalies correspond to actual SARs, others are anomalies but not fraud]
The final ROC curve results in 80% AUC, and prioritization of alerts is possible.