2. Outline
Why detect fraud – Is there a problem?
Why an intelligent system?
How we built one
3. Show me some numbers
What was the value of all electronic
transactions globally for year 2012?
$17 trillion (with a T)
This includes all credit, debit and pre-paid
cards used in both online and offline (card
present) scenarios for purchases and cash
withdrawals
4. More Numbers
How much of $17T was lost due to FRAUD?
$8 billion in 2012, > $10 billion by 2015
Fraud rate of roughly 0.05% ($8B / $17T) – Not too bad, right?
Wrong !!
5. Getting specific
Reminder - 0.05% ratio is for all transactions
including face to face transactions
The fraud rate is a much scarier 3.5% for
online transactions, a.k.a. CNP (card not present)
Global e-Commerce is expected to exceed $1T in
2013 –> $3.5B will be lost due to fraud
Add to this, the erosion due to loss of future
business from impacted customers
Big Customer Impact ! Big deal for us !!
6. The Big Fight...
Fraud to transaction ratio has been constant over
the past 10 years
This ratio should not lull us into a false sense of
security – bigger numbers are at stake and
increasing as volumes grow
The crooks LOVE e-Commerce (think 3.5%)
How do we then figure out whether a transaction is
genuine or fraudulent?
Intelligently of course! - ENTER FRANK !!
9. Rule based system
Rules on various signals
Number of transactions from a card in the last day
Transaction amount
and many more
Thresholds are hand crafted
Fraud Score = sum of individual scores
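A minimal sketch of such a rule-based scorer, to make the "sum of individual scores" idea concrete. The signal names, thresholds and per-rule scores below are hypothetical placeholders; the slides do not give the actual values.

```python
# Each rule: (signal name, hand-crafted threshold, score added when exceeded).
# All names and numbers here are illustrative assumptions, not the real rules.
RULES = [
    ("txn_count_last_day", 5, 40),
    ("amount", 1000.0, 30),
]


def fraud_score(txn):
    """Return the sum of the scores of every rule the transaction trips."""
    return sum(
        score
        for signal, threshold, score in RULES
        if txn.get(signal, 0) > threshold
    )


if __name__ == "__main__":
    txn = {"txn_count_last_day": 8, "amount": 50.0}
    print(fraud_score(txn))  # only the transaction-count rule fires
```

Every new signal needs its own hand-tuned threshold and score, which is where this approach starts to strain as the number of signals grows.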
10. Need for Smarter system
Too much data for manual analysis
Businesses are evolving
Fraudsters are evolving
Extending to very high dimensions pushes
beyond the limits of a rule-based system
11. Designing Frank
Labeled data missing
Observation
Very few fraud records
When you see one, you can identify one
Social behavior
19. Clustering for detecting fraud
Cluster the data using density based clustering
For a new point, find the distance to all the existing
clusters
If at least min-pts points of a cluster lie within epsilon
distance of it, the new point belongs to that cluster
If it doesn't belong to any cluster -> fraud
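The assignment step above can be sketched as follows. This assumes the clusters have already been produced by some density-based algorithm (for example DBSCAN) and are available as a mapping from cluster label to member points; the function names, the Euclidean metric and the `eps` / `min_pts` values are illustrative assumptions, not details taken from the slides.

```python
import math


def assign_cluster(point, clusters, eps, min_pts):
    """Return the label of the first cluster that has at least `min_pts`
    members within distance `eps` of `point`, or None if there is none."""
    for label, members in clusters.items():
        neighbours = sum(1 for m in members if math.dist(point, m) <= eps)
        if neighbours >= min_pts:
            return label
    return None


def looks_fraudulent(point, clusters, eps, min_pts):
    """A point that fits no existing cluster is flagged as potential fraud."""
    return assign_cluster(point, clusters, eps, min_pts) is None


if __name__ == "__main__":
    clusters = {
        0: [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)],
        1: [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)],
    }
    print(assign_cluster((0.05, 0.05), clusters, eps=0.5, min_pts=3))
    print(looks_fraudulent((10.0, 10.0), clusters, eps=0.5, min_pts=3))
```

Scanning every member of every cluster is linear in the data size; a real system would likely use a spatial index, but that is outside what the slides describe.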
20. Computing fraud probability
We find the nearest cluster
Convert the distance to probability
using chi-square distribution
Probability of fraud between 0 and 1
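One way the distance-to-probability step could work is sketched below. The slides do not say how the distance is scaled or how many degrees of freedom are used, so this sketch makes its own assumptions: each cluster is summarised by a per-feature mean and variance, the distance is the variance-scaled squared distance to that mean, and the degrees of freedom equal the number of features. Under a Gaussian assumption that squared distance is chi-square distributed, so its CDF gives a value between 0 and 1 that grows as the point moves away from the cluster. All function names are hypothetical.

```python
import math


def chi2_cdf(x, k):
    """Chi-square CDF with k degrees of freedom, computed from the series
    expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def fraud_probability(point, clusters):
    """`clusters` is a list of (means, variances) tuples, one per cluster.
    Uses the smallest scaled squared distance, i.e. the nearest cluster."""
    nearest = min(
        sum((p - m) ** 2 / v for p, m, v in zip(point, means, variances))
        for means, variances in clusters
    )
    return chi2_cdf(nearest, len(point))


if __name__ == "__main__":
    clusters = [((0.0, 0.0), (1.0, 1.0)), ((5.0, 5.0), (1.0, 1.0))]
    print(fraud_probability((0.2, -0.1), clusters))   # close to a cluster
    print(fraud_probability((20.0, 20.0), clusters))  # far from both
```

In practice `scipy.stats.chi2.cdf` would replace the hand-written series; it is spelled out here only to keep the sketch dependency-free.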
Frank Abagnale Jr. committed his first fraud before the age of 16, then moved on to bank fraud and to posing as an airline pilot, a doctor and so on. He used 7-8 identities, committed fraud in 16 countries and got away from police custody twice, once from an airplane. All this before the age of 21. He was jailed for 5 years, helped the FBI, and then started his own company to help with fraud prevention.