Liubomyr Bregman "Financial Crime Detection using Advanced Analytics"
1. Fin Crime detection using autoencoders
Liubomyr Bregman
Richard Bobek
L'viv, Ukraine
3 Nov 2018
AI & Big Data Day
2. Global Payment Fraud – lessons learnt and investigation highlights
Lessons learnt:
1. Skilled operators
2. Less opportunity for “insider” or “opportunistic” attack
3. Need for ‘out-of-band’ systems for notifications
4. Dedicated scenarios within FIs
5. Leverage the convergence between cyber, fraud and ML
6. Leverage advanced analytics against an evolving threat landscape
*BAE Systems
https://en.wikipedia.org/wiki/Bangladesh_Bank_robbery
PwC 2
If Hollywood releases another iteration of the 'Ocean's 11' franchise, they should base it on the recent attack against the Central Bank of Bangladesh (BB)*.
Bangladesh cyber heist
The attackers attempted to steal $951m in 35 separate fraudulent transactions. 30 orders (worth $850m) were stopped by the US Fed, but 5 orders (worth $101m) went through. A further $20m was blocked by a recipient bank in Sri Lanka.
Vietnam SWIFT fraud attempt
Further analysis of the Bangladesh cyber heist led to the conclusion that the same attackers appear to have struck before, using similar tools, targeting a bank in Vietnam just a couple of months prior to the Bangladesh attack.
3. There are many “creative” new strategies in fin crimes
Some of the known financial crime strategies:
• Cheque fraud
• Credit card fraud
• Mortgage fraud
• Medical fraud
• Corporate fraud
• Securities fraud (including insider trading)
• Bank fraud
• Insurance fraud
• Market manipulation
• Payment (point of sale) fraud
• Health care fraud
• Theft
• Scams or confidence tricks
• Tax evasion
• Bribery
• Embezzlement
• Identity theft
• Money laundering
• Forgery and counterfeiting
4. There are many fraud detection and AML software products on the market
By market segment (type), financial fraud detection software can be split into:
• Anti-Money Laundering Detection Software
• Identity Theft Detection Software
• Credit/Debit Card Fraud Detection Software
• Wire Transfer Fraud Detection Software
• Others
5. Traditionally, fin crime is approached through reporting plus expert knowledge and assessment
The steps are usually:
1. A report of alerts is generated by a rule decision engine*. This report shows the transactions/clients detected by (usually) orthogonal rules.
2. Experts assess the alerts and decide on the appropriate action, for example an investigation of the client's activities.
* Historically these can be large financial abuse management systems, transaction monitoring systems, in-house scripts, etc.
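The two steps above can be sketched as a toy rule decision engine: each rule flags transactions independently ("orthogonal" rules), and flagged items become alerts for expert review. The rule names and thresholds below are hypothetical.

```python
# Toy rule decision engine: rule names and thresholds are illustrative only.
RULES = {
    "large_amount": lambda t: t["amount"] > 10_000,
    "high_risk_country": lambda t: t["country"] in {"XX", "YY"},
    "rapid_movement": lambda t: t["n_txn_24h"] > 20,
}

def generate_alerts(transactions):
    """Return one alert per transaction that trips at least one rule."""
    alerts = []
    for txn in transactions:
        hits = [name for name, rule in RULES.items() if rule(txn)]
        if hits:
            alerts.append({"txn_id": txn["id"], "rules": hits})
    return alerts

txns = [
    {"id": 1, "amount": 50_000, "country": "US", "n_txn_24h": 2},
    {"id": 2, "amount": 100, "country": "XX", "n_txn_24h": 30},
    {"id": 3, "amount": 40, "country": "US", "n_txn_24h": 1},
]
print(generate_alerts(txns))  # transactions 1 and 2 become alerts
```

The expert assessment in step 2 then works off the `rules` list attached to each alert.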
6. Most financial institutions struggle with similar problems in detecting financial crime
• Huge streams of data
• Scenarios are far from perfect
• Fraud schemes keep evolving
• Investigations are costly
• The number of scenarios is limited and costly to maintain
• More data science is better
7. Machine learning approaches aim to increase the automation and recall of the process
Rule-based / expert-based:
1. Optimal rules
a) Rule-based model creation
b) Threshold optimization
c) Rule optimization
d) Alert prioritization
Supervised (investigation needed):
2. Deep learning approaches
a) Pattern discovery
Unsupervised:
1. Segmentation
2. Anomaly detection
3. Semi-supervised approaches
8. The key problem is the unbalanced dataset (and some terminology)
0.1% true positives and 99.9% false positives.
• Only 13 scenarios
• Around 90 features
• ~12 segments
• ~700 thresholds
• Funnel: 600M transactions → 2M alerts → 2K SARs
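A quick sanity check of the funnel above (all counts taken from the slide):

```python
# Alert funnel from the slide: 600M transactions -> 2M alerts -> 2K SARs
# (suspicious activity reports).
transactions = 600_000_000
alerts = 2_000_000
sars = 2_000

alert_rate = alerts / transactions      # share of transactions that alert
sar_rate = sars / alerts                # share of alerts confirmed as SARs
print(f"alert rate: {alert_rate:.4%}")  # ~0.33% of transactions
print(f"SAR rate:   {sar_rate:.2%}")    # 0.10%: matches the 0.1% true positives
```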
9. More precise numbers from past projects in different banks
Benchmarking for alert volumes (alerting and escalation):
         Customers     L1 (Alerts)  Alert Rate (L1/Cust.)  L2 (Cases)  L1-to-L2 Rate  L3 (SAR-Rec)  L2-to-L3 Rate  SAR Rate (SARs/Cust.)
Peer 1   55,000,000    320,000      0.58%                  32,000      10%            6,400         20%            0.012%
Peer 2   4,500,000     60,500       1.34%                  6,340       10%            3,340         53%            0.074%
Peer 3   9,900,000     148,000      1.49%                  40,000      27%            670           2%             0.007%
Peer 4   40,000,000    50,000       0.13%                  12,000      24%            375           3%             0.001%
Peer Average:                       0.89%                              17.88%                       19.37%
Benchmarking for AML TM investigations:
         Number FTE   Annual spend   Maturity (0 (low) to 3 (high))
Peer 1   12,000       $2bn           2+
Peer 2   5,000        $800m          2
Peer 3   10,000       $1.2bn         3
Peer 4   210          $50m           0-1
Peer 5   150          $125m          1-2
Peer 6   2,500        $300m          1-2
11. How does it work: Normality
Normality is a measure of concentration, separated from anomaly by a sensitivity threshold.
[Figure: density of observations, with normal regions, an anomaly, and the sensitivity threshold]
12. How does it work: Abnormality of anomalies
How far is an anomaly from normality? How far is it from other abnormalities?
[Figure: normal regions with anomalies at different distances]
13. How does it work: Similarity of anomalies
Some anomalies are similar and form a separate anomaly cluster.
Investigating one anomaly and finding fraud makes the other anomalies in the cluster more likely to be fraud.
[Figure: normal regions and a cluster of similar anomalies]
14. How does it work: Stability of normal and anomalous patterns
When the definition of normality remains ”stable” over time, the analytical set is considered “operational”.
[Figure: normal regions and an anomaly cluster persisting over time]
15. There are many ways to detect anomalies
Non-parametric:
• Density-based techniques (k-nearest neighbor, local outlier factor, and many more variations of this concept)
• Fuzzy-logic-based outlier detection
• Cluster-analysis-based outlier detection
Parametric:
• Subspace- and correlation-based outlier detection for high-dimensional data
• Bayesian networks
• Deviations from association rules and frequent item sets
And more:
• Ensemble techniques, using feature bagging, score normalization and different sources of diversity
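A minimal sketch of the first family: a density-based score where a point's anomaly score is its mean distance to its k nearest neighbors. This is a simplified relative of the k-NN and local-outlier-factor methods listed above, not a production detector.

```python
import numpy as np

def knn_outlier_scores(X, k=3):
    """Anomaly score per point: mean distance to its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]       # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    dist.sort(axis=1)                          # column 0 is the zero self-distance
    return dist[:, 1:k + 1].mean(axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(100, 2))   # dense "normal" cloud
outlier = np.array([[8.0, 8.0]])               # one far-away point
X = np.vstack([normal, outlier])

scores = knn_outlier_scores(X, k=5)
print("outlier has the highest score:", scores.argmax() == len(X) - 1)
```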
19. Autoencoders are a powerful method for anomaly detection in financial crimes
What is it? An autoencoder maps an input x (the observation) through an internal representation H (the hidden layer of a neural network) to an output R (the reconstruction):
H = f(x), R = g(H) = g(f(x))
Approach to training: the target output is the observed input, i.e. g(f(x)) = x.
Measure of quality: a loss function L(x, g(f(x))), e.g. RMSE.
Traditionally used for anomaly detection and dimensionality reduction.
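The mapping above can be sketched in plain NumPy as a tiny linear autoencoder trained by gradient descent on the MSE loss. The layer sizes, data, and learning rate are illustrative, not from the talk.

```python
import numpy as np

# Linear encoder f(x) = x @ W1 and decoder g(h) = h @ W2, trained so that
# the reconstruction R = g(f(x)) approaches the input x.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + X[:, 1]               # data lies on a 2-D manifold in 3-D

W1 = rng.normal(scale=0.3, size=(3, 2))   # encoder weights, H = f(x)
W2 = rng.normal(scale=0.3, size=(2, 3))   # decoder weights, R = g(H)
lr = 0.05

def loss(X, W1, W2):
    """MSE between input x and reconstruction g(f(x))."""
    return np.mean((X @ W1 @ W2 - X) ** 2)

initial = loss(X, W1, W2)
for _ in range(1500):                     # plain gradient descent on the loss
    H = X @ W1                            # encode
    R = H @ W2                            # decode
    G = 2 * (R - X) / X.size              # dL/dR
    gW1, gW2 = X.T @ (G @ W2.T), H.T @ G
    W1 -= lr * gW1
    W2 -= lr * gW2
final = loss(X, W1, W2)
print(f"reconstruction loss: {initial:.3f} -> {final:.3f}")
```

Because the data sits on a 2-D manifold, the 2-node hidden layer H is enough to reconstruct it, which is exactly why a well-trained autoencoder reconstructs "normal" inputs better than abnormal ones.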
20. Is it the same principle as compression?
No: compression is usually lossless and generic, while autoencoding is lossy and trained on specific cases.
[Figure: an original observation (a Labrador with a brown collar) is encoded into a compact bit pattern and decoded back into a reconstruction that is still recognizably a dog]
21. Which strategy should we apply?
The transactions consist of goods, bads, and unknown bads. Two training strategies:
1. Train on goods only, and predict anomality by the loss (the difference between input and reconstruction).
2. Train on all transactions, assuming the neural network will not learn the bads due to their low number of observations.
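Strategy 1 can be sketched with the optimal linear autoencoder (PCA via SVD) standing in for a neural one: fit on goods only, then score any transaction by its reconstruction error. The data and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
goods = rng.normal(size=(500, 3))
goods[:, 2] = goods[:, 0] - goods[:, 1]   # goods lie on a 2-D plane in 3-D

# Encoder/decoder = projection onto the top-2 principal directions of goods.
mu = goods.mean(axis=0)
_, _, Vt = np.linalg.svd(goods - mu, full_matrices=False)
V = Vt[:2].T                              # 3 -> 2 encoder; V.T decodes back

def reconstruction_error(x):
    h = (x - mu) @ V                      # encode
    r = h @ V.T + mu                      # decode
    return np.sqrt(np.mean((x - r) ** 2, axis=-1))

good_txn = np.array([0.5, -0.5, 1.0])     # respects the learned structure
bad_txn = np.array([0.5, -0.5, 5.0])      # violates it
print(reconstruction_error(bad_txn) > reconstruction_error(good_txn))  # True
```

A deep autoencoder follows the same recipe, just with a nonlinear f and g.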
22. Why do we need a model of g(f(x)) = x?
We do not; we need the internal representation H of x.
With a deep H (multiple layers), an autoencoder can approximate any mapping from X to R arbitrarily well (Hinton & Salakhutdinov, 2006).
[Figure: a single neuron combines inputs a1 and a2 with weights W1 and W2 plus a bias, then applies the sigmoid: output = 1 / (1 + e^-(a1*W1 + a2*W2 + bias))]
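A direct transcription of the neuron formula, with illustrative inputs:

```python
import math

def neuron(a1, a2, w1, w2, bias):
    """Weighted sum of inputs plus bias, squashed by the logistic sigmoid."""
    z = a1 * w1 + a2 * w2 + bias
    return 1.0 / (1.0 + math.exp(-z))

print(neuron(0.0, 0.0, 1.0, 1.0, 0.0))  # 0.5: zero activation sits at the midpoint
```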
23. How can I understand what's happening inside?
Why would you want to? Ok, then ... we simulate: sweep one input over its observed range while holding the others fixed,
x1 ∈ {min(x1) : max(x1)}, x2 = x2, x3 = x3,
and observe how the output changes.
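A sketch of that simulation, using a hypothetical stand-in model that has "learned" the structure x3 = x1 + x2 (a real run would query the trained autoencoder instead):

```python
import numpy as np

def reconstruction_error(x):
    """Hypothetical model: expects x3 = x1 + x2."""
    expected_x3 = x[0] + x[1]
    return abs(x[2] - expected_x3)

baseline = np.array([1.0, 2.0, 3.0])          # consistent: 3 = 1 + 2
x1_grid = np.linspace(-5.0, 5.0, 21)          # sweep x1 over [min, max]
errors = [reconstruction_error(np.array([x1, baseline[1], baseline[2]]))
          for x1 in x1_grid]                  # x2, x3 held fixed

best_x1 = x1_grid[int(np.argmin(errors))]
print(f"error is minimal at x1 = {best_x1}")  # 1.0, where x1 + x2 = x3
```

Plotting `errors` against `x1_grid` shows which input values the model considers "normal" given the others.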
24. So what do I get using this?
1. It learns only the probable inputs: the autoencoder is able to learn the structure of the manifold. Combined, these properties force H to capture information about the structure of the data-generating distribution.
2. You can play with the loss function, applying expert knowledge, e.g. a per-feature weighted loss:
L = (1/k) Σ_{n=1..k} W_n (x_n − x̂_n)², with e.g. W_n ∈ {0.5; 3; ...}
As a result we get a powerful anomaly detector.
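A sketch of such an expert-weighted loss: each feature's squared reconstruction error is scaled by an expert-chosen weight W_n before averaging, so mistakes on important features cost more. The weights below are illustrative.

```python
import numpy as np

def weighted_loss(x, x_hat, w):
    """L = (1/k) * sum_n W_n * (x_n - x_hat_n)^2"""
    x, x_hat, w = map(np.asarray, (x, x_hat, w))
    return float(np.mean(w * (x - x_hat) ** 2))

x = np.array([1.0, 2.0, 3.0])        # observed input
x_hat = np.array([1.0, 2.0, 2.0])    # reconstruction, off by 1 on feature 3

# Down-weighting vs up-weighting the feature that was mis-reconstructed:
print(weighted_loss(x, x_hat, [1.0, 1.0, 0.5]))  # ~0.167
print(weighted_loss(x, x_hat, [1.0, 1.0, 3.0]))  # 1.0
```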
25. How do we say in the end what is an anomaly?
[Figure: network with inputs X1, X2, X3, a hidden layer, and outputs X1, X2, X3]
Two options:
1. Comparison of input and output: score goods and bads by RMSE and separate them with an anomality threshold.
2. Classification: use the final layer of the encoder as input for a classifier.
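A sketch of option 1 with simulated RMSE scores standing in for real per-transaction autoencoder errors, flagging the top 1% as anomalies:

```python
import numpy as np

rng = np.random.default_rng(7)
rmse = rng.exponential(scale=0.1, size=10_000)  # typical reconstruction errors
rmse[:5] += 3.0                                 # five grossly mis-reconstructed txns

threshold = np.quantile(rmse, 0.99)             # "sensitivity top 1%" cut-off
flagged = rmse > threshold
print(f"flagged {flagged.sum()} of {len(rmse)} transactions")
print("all injected anomalies flagged:", bool(flagged[:5].all()))
```

In practice the threshold is tuned against investigation capacity rather than fixed at 1%.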
26. Finally, we train and validate a classification algorithm to predict anomalies in advance
1. Anomaly labeling with autoencoders: split observations into normality and anomaly by RMSE.
2. Time series: track the measured attribute X over time.
3. Translating the problem to classification: cut the time series into sliding windows; windows preceding an anomaly are positive examples, the rest are counter-examples.
4. Boosted decision tree to predict failure and define predictive rules.
5. Validating the results: ROC curve (TP vs FP).
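Step 3 can be sketched as follows; the window and horizon sizes are illustrative, and the resulting (X, y) pairs would feed the boosted tree of step 4:

```python
import numpy as np

def make_windows(series, labels, window=5, horizon=1):
    """Return (X, y): each row of X is a window of the series, and y says
    whether an anomaly occurs within `horizon` steps after the window."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        future = labels[start + window:start + window + horizon]
        y.append(int(any(future)))
    return np.array(X), np.array(y)

series = np.array([1, 1, 1, 1, 1, 1, 1, 9, 1, 1], dtype=float)
labels = series > 5                  # anomaly flags from the autoencoder step
X, y = make_windows(series, labels, window=5, horizon=1)
print(X.shape, y.tolist())           # one positive example right before the spike
```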
27. Case study, Asian bank: a deep neural network was built for anomaly detection
Architecture: 12 - 32 - 8 - 8 - 32 - 12 nodes.
[Figure: neural network illustration; accuracy and loss of the resulting solution]
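A back-of-the-envelope size check of that symmetric architecture (widening first layer, 8-node bottleneck), assuming fully connected layers with biases:

```python
# Parameter count of a fully connected 12-32-8-8-32-12 autoencoder
# (weights + biases per layer); the architecture is from the slide,
# the fully-connected assumption is ours.
layers = [12, 32, 8, 8, 32, 12]
params = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
print(params)  # 1436 trainable parameters
```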
28. Case study, Asian bank: comparison of input and output of the neural network
[Figure: transactions ranked by anomaly score at a sensitivity of the top 1%; some anomalies correspond to actual SARs, others are anomalies but not fraud]
The final ROC curve results in 80% AUC, and prioritization of alerts is possible.