2. Overview
Protecting customers on an open platform
Big data + little loops enable automation via analytics
Decisions as defenses
Putting your data to work
6. The Better Mousetrap
Automates defensive action cross-platform:
- Fast: in real time, in time to minimize loss
- Accurate: reasonable false positives, as good as a human specialist
- Cheap: reduces more loss than cost created, cheaper than manual intervention
BIG DATA & LITTLE LOOPS
10. APPLIED RISK ANALYTICS
Use of technology, data, research & statistics to solve problems associated with losses or costs due to security vulnerabilities or gaps in a system -- resulting in the deployment of optimized detection, prevention, or response capabilities.
14. Such as...
Metrics -> Analytics
- $ Loss Txns -> Purchase trends of high-loss users
- # Compromised Accts -> IP sources of bad login attempts
- % of Spam Messages Delivered -> Spam subject lines generating most clicks
- Minutes of downtime -> Most process-intensive applications
- # Customer Contacts Generated -> Highest-contact exception flows
17. Applied where?
Where risks manifest in observable behavior
Where system owners make decisions
Where controls can be optimized by better recognizing identity, intent, or change
19. BIG DATA & LITTLE LOOPS
(Cartoon dialogue:)
"Why are you picking on me?"
"Boo-yah! Still getting away with it."
"<Sigh> Nobody understands me."
20. Such as...
Populations
- Users, Transactions, Messages, Packets, API calls, Files
Actions
- Allow, Block, Challenge, Review, Retry, Quarantine, Add privileges, Upgrade privileges, Make offer
Costs
- Fraud, Data leakage, Customer churn, Customer contacts, Downstream liability
21. Applying Decisions
Risk management is decision management.
ACTOR ATTEMPTS ACTION -> SUBMIT -> WHAT IS THE REQUEST? -> SHOULD WE HONOR? -> HOW TO HONOR THE REQUEST -> RESULT: ACTION OCCURS
22. For example:
ACTOR ATTEMPTS PAYMENT -> SUBMIT
p(actor attempting payment is accountholder) = f(variable A + Variable B + ...)
Decision: Authorize / Review / Refer / Request Authentication / Decline
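A minimal sketch of this slide's flow: score p(actor is accountholder) from a few risk signals, then map the score onto the slide's answer set. The signal names, penalty weights, and thresholds here are invented for illustration; a real system would learn them from data.

```python
# Hypothetical score: p(actor attempting payment is accountholder),
# built as f(variable A + variable B + ...). All values are toy numbers.

def p_accountholder(new_ip: bool, new_device: bool, geo_mismatch: bool) -> float:
    """Start near-certain and subtract a penalty per risk signal."""
    score = 0.99
    score -= 0.30 if new_ip else 0.0
    score -= 0.25 if new_device else 0.0
    score -= 0.20 if geo_mismatch else 0.0
    return max(score, 0.0)

def decide(p: float) -> str:
    """Map the probability onto the slide's decision set."""
    if p >= 0.90:
        return "Authorize"
    if p >= 0.70:
        return "Review"
    if p >= 0.50:
        return "Request Authentication"
    if p >= 0.30:
        return "Refer"
    return "Decline"

print(decide(p_accountholder(new_ip=False, new_device=False, geo_mismatch=False)))  # Authorize
print(decide(p_accountholder(new_ip=True, new_device=True, geo_mismatch=True)))     # Decline
```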
23. Flavors of Risk Models
Anomaly: "I deviate significantly from a normal (good) pattern" -- fa(x), fb(x), fc(x)
Signature: "I summarize a known bad pattern" -- fq(x), fr(x), fs(x)
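The two flavors, commonly called anomaly and signature models, can be sketched side by side. The per-user baseline and the known-bad pattern list below are made up for illustration.

```python
# Anomaly flavor: "I deviate significantly from a normal (good) pattern."
def anomaly_score(amount: float, user_mean: float, user_std: float) -> float:
    """How many standard deviations this amount sits from the user's norm."""
    if user_std == 0:
        return 0.0
    return abs(amount - user_mean) / user_std

# Signature flavor: "I summarize a known bad pattern."
KNOWN_BAD = {("prepaid_phone", "new_ship_addr"), ("gift_card", "ip_country_mismatch")}

def matches_known_bad(category: str, signal: str) -> bool:
    """Does this (category, signal) pair match a summarized bad pattern?"""
    return (category, signal) in KNOWN_BAD

print(anomaly_score(900.0, user_mean=50.0, user_std=25.0))   # 34.0 sigma: far off-pattern
print(matches_known_bad("prepaid_phone", "new_ship_addr"))   # True
```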
26. Study history...
Signals: User IP country <> billing country; buying prepaid mobile phones; new shipping address added in cart
However: Buyer = phone reseller with a static machine ID
Questions to ask:
- How much $$ is at risk?
- What is "normal" for this customer?
- What "bad" profiles does this match?
27. SHALL WE PLAY A GAME?
(SINCE WE CAN'T PLAY "CLUE" FOR EVERY LOGIN, TRANSACTION, NEW USER, MESSAGE, FRIEND REQUEST, ATTACHMENT, PACKET, WINK, POKE, OR CLICK... WE BUILD RISK MODELS)
28. Model Development Process
Target -> Yes/No questions work best
Find Data, Variable Creation -> Best part
Data Prep -> Worst part
Model Training -> Pick an algorithm
Assessment -> Catch vs FP rate
Deployment -> Decisioning vs Detection
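The "Assessment -> Catch vs FP rate" step can be sketched as a threshold sweep: given model scores and true labels, report what share of the "bads" you catch against the false-positive rate on the "goods". The scores and labels below are toy data, not from the deck.

```python
def catch_and_fp(scores, labels, threshold):
    """labels: 1 = bad, 0 = good. Flag everything scoring >= threshold."""
    flagged = [s >= threshold for s in scores]
    bads = sum(labels)
    goods = len(labels) - bads
    caught = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    return caught / bads, false_pos / goods

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0,   0,   0]

for t in (0.5, 0.75):
    catch, fp = catch_and_fp(scores, labels, t)
    print(f"threshold={t}: catch={catch:.0%}, FP rate={fp:.0%}")
```

Raising the threshold trades catch for fewer false positives; the "reasonable false positives" requirement from slide 6 is exactly this trade-off.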
29. (Variable creation for the slide-26 scenario)
- User IP country <> billing country: GEOLOCATE IP -> CONVERT GEO TO COUNTRY CODE -> FLAG ON MISMATCH
- Buying prepaid mobile phones: CART CATEGORY -> MERCH RISK LEVEL
- Add new shipping address in cart: DATE ADDED, ADDRESS TYPE, STRING MATCHING
- Buyer = phone reseller, static machine ID: DEVICE ID, DEVICE HISTORY
- How much $$ is at risk? TXN-$-AMT
- What is "normal" for this customer? / What "bad" profiles does this match? CUSTOMER PROFILE: CHURN RISK, CLV, TXNS, LOGINS, STOLEN CC
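The first two mappings on this slide can be sketched as variable-creation helpers. The IP-prefix lookup and merchandise risk table are stand-ins; a real system would call a geolocation service and maintain the risk list from loss data.

```python
IP_TO_COUNTRY = {"41.58.": "NG", "8.8.": "US"}        # toy prefix lookup
MERCH_RISK = {"prepaid_phone": 0.9, "book": 0.1}      # toy risk levels

def geo_mismatch(ip: str, billing_country: str) -> int:
    """FLAG ON MISMATCH: 1 if the IP geolocates outside the billing country."""
    for prefix, country in IP_TO_COUNTRY.items():
        if ip.startswith(prefix):
            return int(country != billing_country)
    return 0  # unknown IP: don't flag

def cart_risk(categories: list) -> float:
    """MERCH RISK LEVEL: riskiest category present in the cart."""
    return max((MERCH_RISK.get(c, 0.5) for c in categories), default=0.0)

print(geo_mismatch("41.58.12.3", "US"))      # 1: IP country <> billing country
print(cart_risk(["book", "prepaid_phone"]))  # 0.9
```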
30. Model Training
Some algorithms:
- Regression: Determines the best equation to describe the relationship between the dependent variable and the independent variables
  Linear regression: Best equation is a line
  Logistic regression: Best equation is a curve (exponential properties)
- Bayesian: Used to estimate regression models; useful when working w/ small data sets
- Neural nets: Can approximate any type of non-linear function; often highly predictive, but doesn't explain the relationship between the dependent and independent variables
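To ground the "best equation is a curve" bullet, here is a minimal logistic regression trained by gradient descent on one toy feature. Real models would use a statistics library and many variables; the data below is invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, steps=2000):
    """Fit p(bad) = sigmoid(w*x + b) by stochastic gradient descent on log-loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x   # gradient of log-loss w.r.t. w
            b -= lr * (p - y)       # gradient of log-loss w.r.t. b
    return w, b

# x = count of risk signals on the transaction; y = 1 if it turned out bad
xs = [0, 0, 1, 1, 2, 3, 3, 4]
ys = [0, 0, 0, 1, 1, 1, 1, 1]
w, b = train(xs, ys)
print(sigmoid(w * 0 + b) < 0.5)   # low-signal txn scores as good
print(sigmoid(w * 4 + b) > 0.5)   # high-signal txn scores as bad
```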
32. (Annotated regression output)
- Dependent variable vs. independent variables
- P-value of significance: should be < the significance level (.05); throw the variable out if > .05
- Variance in the dependent variable explained by the independent variables (R-squared)
- Factor by which the odds of the dependent variable go up when an independent variable is incremented
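Two of these callouts can be made concrete. In logistic regression, incrementing a variable by 1 multiplies the odds of the dependent variable by exp(coefficient); the coefficient and p-values below are hypothetical numbers for illustration.

```python
import math

# "Factor odds of dependent go up when independent var incremented"
coef_new_ip = 1.2                    # hypothetical fitted coefficient for "new login IP"
odds_factor = math.exp(coef_new_ip)
print(round(odds_factor, 2))         # ~3.32: a new IP roughly triples the odds of "bad"

# "P-value of significance, throw out if > .05"
SIGNIFICANCE = 0.05
p_values = {"new_ip": 0.003, "cart_total": 0.21}   # hypothetical
kept = [v for v, p in p_values.items() if p < SIGNIFICANCE]
print(kept)   # ['new_ip']; cart_total is thrown out
```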
34. GAIN
More gain/lift = more efficient predictions
Catch as much as possible (as much of the "bads")
Minimize the overall population affected
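Gain can be sketched directly: rank by model score, then measure what share of the "bads" falls in the top-scored slice of the population. Toy scores and labels below.

```python
def gain_at(scores, labels, top_frac):
    """Fraction of all bads caught when acting on the top-scored top_frac."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    k = int(len(ranked) * top_frac)
    caught = sum(y for _, y in ranked[:k])
    return caught / sum(labels)

scores = [0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
labels = [1,    1,   0,   1,   0,   0,   0,   0,   0,    0]

print(gain_at(scores, labels, 0.2))  # top 20% of txns catches 2/3 of the bads
print(gain_at(scores, labels, 0.4))  # top 40% catches all 3
```

A random model would catch bads in proportion to the slice size (20% of txns -> 20% of bads); the gap between the two curves is the lift.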
36. And now an example
Everyone loves a good 419 scam
37. 419 example: the 411
Trigger
- A contact receives a 419 from the victim's (free) business email account and contacts the victim out of band (OOB)
Backtrack
- Password was changed (user had to go through the reset process)
- Contacts, inbox, outbox deleted
- Nigerian IP login
Elaboration
- "Reply-to": changed an "i" to an "l" (same ISP)
- Only takes Western Union
38. 419 example: with love, from Abuja
What is the question?
- p(ATO)
- p(Spam:scam)
- p(Fake acct creation)
What are our available answer/action sets?
What else can we do to detect/mitigate?
39. 419 example: Reducing 911s
Variables
- "New" session variables: new login IP, new login IP country, new cookie/machine ID
- "Change" account variables: change password, change secondary email, change name, change public profile
- "New" activity variables: send to all contacts, # of accounts in "cc" or "bcc", edit/delete contacts en masse
- Association variables: new recipients, new "reply-to" fields, "similar" accounts created/associated (fuzzy = more difficult)
User empowerment
- Stronger password reset options (SMS)
- Transparency: other current sessions, past session history (IPs, logins)
- Auto-logout all other sessions upon password reset
- Reporting: details of elaboration as well as cut-and-paste messages
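The "new"/"change" variables above can be sketched as a simple account-takeover (ATO) session score. The weights and thresholds are invented for illustration; in practice they would come out of model training, not hand-tuning.

```python
# Hypothetical weights on the slide's session/account/activity variables.
ATO_WEIGHTS = {
    "new_login_ip_country": 2.0,
    "new_machine_id": 1.5,
    "password_changed": 2.5,
    "secondary_email_changed": 2.0,
    "sent_to_all_contacts": 3.0,
    "contacts_deleted_en_masse": 3.0,
}

def ato_score(session_flags: set) -> float:
    """Sum the weights of every risk flag present on this session."""
    return sum(ATO_WEIGHTS.get(flag, 0.0) for flag in session_flags)

def respond(score: float) -> str:
    """Map the score onto mitigations from the slide."""
    if score >= 7.0:
        return "lock account + force password reset over SMS"
    if score >= 4.0:
        return "challenge (re-authenticate) + log out other sessions"
    return "allow"

hijack = {"new_login_ip_country", "password_changed", "sent_to_all_contacts"}
print(respond(ato_score(hijack)))   # score 7.5 -> lock account + force password reset over SMS
```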
40. Recap
Protecting customers requires understanding not just technology but also behavior. This requires:
- Activity data
- Clear definitions of "good" vs "bad" results
- Constant feedback
- Analysis
Designing data-driven defenses
- Decisions that can be automated w/ data
- Where/what data sets to use
- Business drivers to keep in mind
An example
p(bad) = f(variable A + Variable B + ...)
41. "Prediction is very difficult, especially about the future." -- Niels Bohr

Allison Miller
@selenakyle