Fraud deep learning_v2

Deep Learning & Fraud Detection in Fintech Lending
DEEP LEARNING IN FINANCE SUMMIT, SINGAPORE- APRIL’17
RATNAKAR PANDEY, HEAD OF INDIA ANALYTICS & DATA SCIENCE, KABBAGE
HTTPS://WWW.LINKEDIN.COM/IN/RATNAKARPANDEY/
1 Disclaimer: The views expressed here are solely those of the presenter in his private capacity.

Outline
Demo of Multi Level Perceptron (MLP)
Classification Case Approach and Performance
Suggested Deep Learning Application Areas
Anomaly Detection Social Network Analysis (SNA)
Need for Deep Learning
Existing Methods Why Deep Learning?
Frauds in Fintech Lending
Drivers Modus Operandi
Introduction
About Fintech About Kabbage

About Fintech Fintech is an integral part of our life now
$24.7 B
Invested in 2016 in
global fintech companies
1076
Deals in 2016 in
global fintech companies
Sources: KPMG, The Pulse of Fintech Q4 2016 | Capgemini World Fintech Report 2017 | PwC Global Fintech Report 2017 | www.forbes.com
50.2%
Of global customers have
done business with fintech
20%
Expected ROI on
fintech projects
20+
Global fintech
Unicorns
10K+
Global fintech
companies
Types of
Fintech
Alternative Lending- Kabbage, Lendingclub, Prosper, Zopa
Payment / Billing Tech - Stripe, Paytm, Adyen, Ant Financial,
Square
Personal Finance / Asset Management Creditkarma, Bankrate,
NerdWallet
Robo Advisory- Wealthfront, Betterment, NerdWallet
Blockchain- Abra, 21, coinbase, Ethereum

About Kabbage Kabbage is blazing a trail in big data & fintech
Kabbage is more than a lender for small businesses; our data and technology
platform is now being used as a fully branded product by other lenders, and our
products are expanding. We’ve received numerous awards & recognition,
including-
• Featured in Glassdoor’s 2017 Best Places to Work
• 36th fastest growing company in the US on the INC 500 List
• Fast Company’s "Top 10 most innovative companies in finance”
• Forbes' "America’s Top 100 Most Promising Companies"

Fraud Drivers Superfast decision making and faceless channels
Decisioning within few minutes
Application on web and Mobile
May have higher exposure to thin
file and new to credit
More prone to invisible window
applications
Unconventional and evolving data
sources
Note: Even with these challenges the fraud rate in the industry is typically less than 20 bps for more data savvy lenders

M. O. Varied and evolving modus operandi
• Stolen identity
• Synthetic identity
• May replicate best
customer (prime
and super prime)
• Falsified info
• No willingness to
pay
• Acquire multiple loans
in a short window (
invisible window)
• May provide all info
correctly
• More likely to be on
higher side in the risk
spectrum
• No or low willingness to
pay
• Mimic good payment
behavior for significant
time
• Bust out when gains are
highest

Current Situation Heuristics and regression driven approaches
Intuitive
Heuristics
Statistical
• Manual Reviews
• Experts Driven
• Gut feeling
• Thumb rules
• Driven by past experience
• Quick decision making
• Control/ confidence limits
• Outlier detection/ deviation from norm
• Decision tree, regression, time series

Why go Deep? Explosion of features and data sources
10,000 +
Features
Unstructured
Transactional
Social
Device
&
IP
Third Parties
Bureau
• Uncover hard to detect patterns (using
traditional techniques) when the
incidence rate is low
• Find latent features (super variables)
without significant manual feature
engineering
• Real time fraud detection and self
learning models using streaming data
(KAFKA, MapR)
• Ensure consistent customer
experience and regulatory compliance
• Higher operational efficiency

Find Anomalies Replicator neural network / Autoencoder
• Traditional techniques based on density or distance works better with linearly separable
data
• Stacked Autoencoders (SAE) and Deep Belief Networks ( DBN) make no assumptions
about the distribution of data and work better on non linearly separable data
• Unsupervised learning algorithms for feature learning, feature reduction and outlier
detection
• Input vectors are used as output vectors and reconstruction error computed
• The data points with higher reconstruction error ( MSE) are more likely to be outliers
• Helps in detecting different modus operandi of fraudsters
• Output from the network is generally used as an input for the Multi Layer Perceptron (
MLP) to improve classification accuracy .
• Generally training an MLP with the features selected from Deep Autoencoders will be
more efficient and faster process

Find Networks Social network analysis / Clique and Links
Detect
Fraudulent
Cases
Find
Commonalities
Form Network
• Use variety of attributes (on-us/ off-us) to build linkage between known bad customers
and other customers with unknown status
• Larger the size of network, easier the detection and vice versa
• Overlap networks using enumerative approaches and find commonalities
• Use graph transduction to detect potential fraudulent cases by doing peer group
(archetype) analysis to separate routine behavior from suspicious behavior - “birds of
same feather flock together”

MLP Demo Case details
• Anonymized credit card transactions data from European customers
• 30 features ( 28 anonymized, duration elapsed, amount of transactions)
• Label- fraud or normal transaction
• 17bps incidence rate for fraudulent transactions
• 284,807 total transaction in data
Sources: http://mlg.ulb.ac.be | https://www.kaggle.com/dalpozz/creditcardfraud

MLP Demo Tools and techniques used
Python
2.7.13
Keras
2.0.2
Tensorflow
1.0.1

MLP Demo Traditional Modeling Techniques Process
Manual Feature
Engineering
After variable
treatments
drop variables
with little or no
explaining
power- WOE,
IV, Distribution
Look at WOE to
create bins etc.
WOEDensity Dist.

MLP Demo Network training
Little or No Manual Feature Engineering
• No over or under sampling
• No variables dropped
• Only standardization of features done
• 75% training/ 25% validation
• No manual binning
Fitted Network
• Multi Layer Perceptron with three hidden layers.
o Activation function = Sigmoid
o # of neurons = 512 in the input layer
o Each consequent layer has half the neurons
o Cost function = logloss
o Optimizer = adam
o Epochs= 5
o Dropout rate = 30%

MLP Demo Performance summary
Metric Value
Accuracy Score 99.9%
Logloss 0.003
Precision Score 77%
Recall Score 75%
Area Under the
Curve (AUC)
87.4%
FScore 76.5%

MLP Demo Fine-tuning model- Hyperparameters optimization
• Epochs = [5,10,15,20,25…]
• Batch Size = [5,10,20,30,40…]
• Optimizer= [‘SGD’, ’Adam’, ’RMSprop’…]
• Learning Rate = [0.01,0.05,0.1,0.2…]
• Momentum = [0.2,0.4,0.6,…]
• Weights Initiation= [‘Uniform’, ‘Normal’, …]
• Activation Function= [‘relu’,’sigmoid’, ‘tanh’, ‘softmax’,…]
• Drop-out rate= [0.0,0.2,0.4,0.5,…]
• Neurons= [5,10,20,30,40…]
Python scikit-learn gridsearch function, design of experiment( screening design,
fractional designs) needs to be combined with intutition and expertise to come
out with the best network!

Fraud deep learning_v2

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Fraud deep learning_v2

Ähnlich wie Fraud deep learning_v2 (20)

Mehr von Ratnakar Pandey

Mehr von Ratnakar Pandey (7)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Fraud deep learning_v2