1. Fighting Cyber Fraud with Hadoop
Niel Dunnage
Senior Solutions Architect

2. Summary
©2014 Cloudera, Inc. All rights reserved.
• Big Data is an increasingly powerful enterprise asset, and this talk explores the relationship between big data and cyber security. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services. The rapid adoption of these technologies has, of course, created tremendous social benefits. An unwanted side effect is the potentially rich pickings available to those with malicious intentions. Increasingly, the sophisticated cyber attacker is able to exploit the rich array of public data to build detailed profiles of their adversaries.

3. Agenda
• Data: the new oil
• Defend your data
• The security value of Big Data
Source: Grant Thornton LLP 2014 Corporate General Counsel Survey, conducted by American Lawyer Media

4. Cyber Security: Data is a valuable commodity
• DDOS
• Data Exfiltration
• Confidential customer records
• Transaction data
• Reputation attack
• False flag
• Fake data
• Insider Threat
False flag: "Operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them" (http://en.wikipedia.org/wiki/False_flag)
@security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.
The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.Qoday.com/), claiming that it was timed to coincide with the NFL draft.
Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.
The Lizard Squad claim responsibility for taking down the PlayStation network.

5. Typical Security Layers
Type | Example
Access | Physical (lock and key), Virtual (firewalls, VLANs)
Authentication | Logins: verify users are who they say they are
Authorization | Permissions: verify what a user can do
Encryption at rest | Data protection for files on disk
Encryption in transport | Data protection on the wire
Auditing | Keep track of who accessed what
Policy / Procedure | Protect against human error and social engineering

6. Cloudera's Approach to Security
Compliance-Ready, Comprehensive, Transparent
• Standards-based Authentication
• Centralized, Granular Authorization
• Native Data Protection
• End-to-End Data Audit and Lineage
• Meet compliance requirements
• HIPAA, PCI-DSS, …
• Encryption and key management
• Security at the core
• Minimal performance impact
• Compatible with new components
• Insight with compliance

7. Enterprise Data Hub Use Cases
©2013 Cloudera, Inc. All Rights Reserved.
Operational Efficiency: perform existing workloads faster, cheaper, better
Innovation and Advantage: ask bigger questions in the pursuit of discovering something incredible
• ETL Acceleration
• EDW Optimization
• Active Archive
• OSINT Analysis
• Fraud Detection
• Deep Exploratory BI
• Historical Compliance
• Log Processing
• Performance Management
• Risk Management

8. Offence: Fraud Detection Use Cases
• Distributed parallel execution with chained joins
• Historical processing at scale
• Machine learning, malware/anomaly detection, spam filters, etc.
• Combined real-time and batch predictors
Fully automated at scale

9. Big Data Economics
Ask bigger questions
• Predictably process large data sets
• Linear scaling
• Robust and economic crypto security
• Creative fail-fast innovation
• Powers productivity insights
• Increasing infrastructure ROI
• Increasing business ROI
• Defeating fraudulent activity
• Evaluating risk
Ingest, Discover, Predict, Innovate

10. Data Ingest
• NRT Ingest
  • Flume: optimized to flow real-time event data into the Hadoop cluster
  • Spark Streaming for near-real-time micro-batch aggregations
  • Twitter streaming
  • Kafka
  • Log
  • API
• Bulk Load
  • Sqoop for structured data
  • Fuse file system access
  • API
  • Web / Hue
• Data Enrichment
  • Flume interceptors
  • Kite Morphlines module
  • Configuration-based interceptors that can enrich data, for example extracting facets, entity extraction, applying regulatory tags
[Diagram labels: Client, Agent, collect, enrich, buffer, store]

11. Near Real Time Access to threats
• View the geographic distribution of Slowloris DDOS taken from Apache web server logs
• Help isolate unpatched servers
• Identify source of attacks

    LogUtils.createStream(...)
      .filter(_.getText.contains("408 Error"))
      .countByWindow(Seconds(10))

    stream.join(historicCounts).filter {
      case (word, (curCount, oldCount)) => curCount > oldCount
    }

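The idea behind the snippet above (count HTTP 408 timeouts per time window, then keep only the windows that exceed a historic count) can be sketched in plain Scala without a Spark cluster. This is a minimal illustration, not the deck's implementation: `LogEvent`, `windowCounts` and `spikes` are hypothetical names, and the single `historicCount` baseline stands in for the joined `historicCounts` stream.

```scala
// Plain-Scala sketch of the windowed 408 count; no Spark required.
// LogEvent, windowCounts and spikes are hypothetical names for illustration.
object SlowlorisSketch {
  final case class LogEvent(epochSeconds: Long, status: Int)

  // Count HTTP 408 (request timeout) responses per fixed-size window.
  def windowCounts(events: Seq[LogEvent], windowSeconds: Long): Map[Long, Int] =
    events
      .filter(_.status == 408)
      .groupBy(e => e.epochSeconds / windowSeconds)
      .map { case (window, hits) => window -> hits.size }

  // Keep only the windows whose count exceeds the historic baseline.
  def spikes(current: Map[Long, Int], historicCount: Int): Map[Long, Int] =
    current.filter { case (_, count) => count > historicCount }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      LogEvent(0, 200), LogEvent(1, 408), LogEvent(3, 408), LogEvent(9, 408),
      LogEvent(12, 408), LogEvent(15, 200)
    )
    val counts = windowCounts(events, windowSeconds = 10)
    println(spikes(counts, historicCount = 2)) // windows above the baseline
  }
}
```

In a streaming job the same filter and comparison would run per micro-batch; the fixed windows here are a simplification of sliding windows.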
12. Machine Learning
Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop
• Collaborative filtering and recommendation
• Classification and regression
• Clustering (K-Means, Gaussian)

13. VaRs and Monte Carlo Simulations
"Under reasonable circumstances, how much can you expect to lose?"
• "Monte Carlo simulation involves posing thousands or millions of random market scenarios and observing how they tend to affect a portfolio of financial instruments"
• VaR is based on time period, portfolio and confidence level
• This technique is easily parallelizable and as such is a great fit for Hadoop, and Spark in particular
• Until recently this required complex MPI C++ code
• Can be implemented in Hadoop and is feasible across hierarchies of financial instruments (P&L accounts)
• Backtest to validate the VaR
• Curation of market factors is important (large indices, e.g. FTSE, FX rates, oil price, etc.)
• Can shape portfolio investments for instruments that trial as loss making

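As a rough illustration of the Monte Carlo approach described above, the following plain-Scala sketch simulates one-period profit and loss under random scenarios and reads the VaR off the loss distribution. Everything numeric here is an assumption for illustration: a single normally distributed market factor, the mean and volatility, and the portfolio value. A real model would use curated market factors and instrument pricing.

```scala
import scala.util.Random

// Monte Carlo VaR sketch with one normally distributed market factor.
// Mean, volatility and portfolio value are illustrative assumptions.
object VarSketch {
  // Simulate one-period portfolio P&L under `trials` random scenarios.
  def simulatePnl(value: Double, mean: Double, vol: Double,
                  trials: Int, rng: Random): Vector[Double] =
    Vector.fill(trials)(value * (mean + vol * rng.nextGaussian()))

  // VaR at the given confidence: the loss exceeded in roughly
  // (1 - confidence) of the simulated scenarios.
  def valueAtRisk(pnl: Vector[Double], confidence: Double): Double = {
    val sorted = pnl.sorted // ascending, so the worst outcomes come first
    val index = math.floor((1.0 - confidence) * sorted.size).toInt
    -sorted(math.min(index, sorted.size - 1)) // report the loss as a positive number
  }

  def main(args: Array[String]): Unit = {
    val pnl = simulatePnl(value = 1e6, mean = 0.0005, vol = 0.02,
                          trials = 100000, rng = new Random(42))
    println(s"95% one-period VaR: ${valueAtRisk(pnl, 0.95)}")
  }
}
```

Because each trial is independent, the simulation step is the part that could be split across Spark partitions, with only the P&L values gathered for the quantile.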
14. Applying Big Data Techniques to Cyber Threat Monitoring with Hadoop
• Historical event data processing at scale
• Hadoop as a service shared with financial governance applications
• Simulate the statistical likelihood of the BIA scenario
• Evaluate the sentiment of commentary of supporting IT
• Attach the anomaly detector to a stream processor, scoring data in real time and alerting accordingly
• Anomaly detection of network traffic by learning what is normal
• Siloed applications have previously made it hard to put a tangible value on financial risk
• Risk calculations tend towards the subjective, i.e. low (FIS APT), high (insider threat)

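One simple way to read "learning what is normal" is a statistical baseline: summarize historic traffic by its mean and standard deviation, then score new observations by how far they fall from it. The sketch below assumes exactly that; the z-score method, the threshold of 3 and all names are illustrative choices, not something the deck specifies.

```scala
// Anomaly scoring sketch: learn "normal" as the mean and standard deviation
// of historic traffic, then flag observations far from it.
// The z-score approach and the default threshold are assumptions.
object AnomalySketch {
  final case class Baseline(mean: Double, stdDev: Double)

  def learn(history: Seq[Double]): Baseline = {
    require(history.nonEmpty, "history must not be empty")
    val mean = history.sum / history.size
    val variance = history.map(x => math.pow(x - mean, 2)).sum / history.size
    Baseline(mean, math.sqrt(variance))
  }

  // Number of standard deviations between the observation and normal.
  def score(b: Baseline, observed: Double): Double =
    if (b.stdDev > 0.0) math.abs(observed - b.mean) / b.stdDev
    else if (observed == b.mean) 0.0
    else Double.PositiveInfinity

  def isAnomaly(b: Baseline, observed: Double, threshold: Double = 3.0): Boolean =
    score(b, observed) > threshold

  def main(args: Array[String]): Unit = {
    val baseline = learn(Seq(100.0, 110.0, 90.0, 105.0, 95.0)) // e.g. requests per minute
    Seq(102.0, 400.0).foreach { v =>
      println(s"$v -> anomaly=${isAnomaly(baseline, v)}")
    }
  }
}
```

Attached to a stream processor, `isAnomaly` would be applied to each incoming measurement, with the baseline refreshed periodically from historic data.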
15. Internal Threat Dashboard
Ranked List of High Risk Personnel:
Name | Risk Score
Kim Burgess | 94
Guy Hughes | 93
Jeff Maclaen | 87
Ed Snowden | 86
Mary Smith | 82
Customers with Risk Scores that Recently Changed:
Name | Old Score | New Score
John Smith | 34 | 94
Rob Jones | 26 | 93
Jim Fisher | 17 | 87
Henry Johnson | 45 | 86
Sue Leefield | 12 | 82
Overall Risk Assessment:
Risk Per Category: Online Banking Access, Public Records, Financial transaction rate, Online Activity, Social Media Activity, Regular purchases, Foreign Travel
Open Cases:
Name | Risk Score | Customers
Dodgy Ecomm.biz | 94 | John Smith, Rob Jones
Brenword Shopping Centre | 93 | Jim Fisher, Henry Johnson

17. Our Design Strategy: The Enterprise Data Hub
• One pool of data
• One metadata model
• One security framework
• One set of system resources
A fully integrated Hadoop ecosystem:
Storage: HDFS, HBase/Accumulo (TEXT, RCFILE, PARQUET, AVRO, ETC.; RECORDS)
Integration: REST (WebHDFS), File (Fuse), Flume, Sqoop
Resource Management: YARN
Metadata: Navigator
Batch Processing: Spark, MapReduce, Hive & Pig
Stream Processing: Spark Streaming
Engines:
Interactive SQL: Cloudera Impala
Interactive Search: Cloudera Search
Machine Learning: Spark MLlib, Mahout, Oryx
Math & Statistics: SAS, R
Security: Navigator, Sentry

    graph.vertices.filter { case (id, _) => id == 13669222 }.collect

    Select CPU_Met from application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_ID where application_type IS Non_Critical

18. Defense: Security Features
• Hadoop Security: Kerberos, with simplified deployment through Cloudera Manager
• Sentry: provides unified authorization with a single policy for Hive, Impala and Search
• HDFS extended ACLs and HBase cell-level access control
• Navigator Encrypt and Key Trustee deliver compliant data security
  • Via Gazzang acquisition
• Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage

19. Kerberos Security
Perimeter Security
• Guarding access to the cluster itself
• Technical concepts:
  • Authentication
  • Network isolation
Kerberos
• Kerberos: a computer network authentication protocol that works on the basis of tickets to allow nodes to prove their identity to each other in a secure manner, using encryption extensively
• Messages are exchanged between:
  • Client
  • Server
  • Kerberos Key Distribution Center (KDC)
  • Note that the KDC is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC
• Passwords are not sent across the network; instead, passwords are used to compute encryption keys
• Authentication status is cached (no need to send credentials with each request)
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!)

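Two of the points above, deriving a key from a password rather than sending it, and rejecting messages whose timestamps drift too far, can be sketched with the standard Java crypto and time APIs. This is a simplified illustration and not the Kerberos protocol itself: the salt layout, the 4096 iterations and the 5-minute tolerance are assumptions modelled on common defaults, and real deployments leave all of this to the KDC and the Kerberos client libraries.

```scala
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec
import java.time.{Duration, Instant}

// Sketch of two Kerberos ideas: password-derived keys and clock-skew checks.
// Simplified for illustration; not a Kerberos implementation.
object KerberosIdeasSketch {
  // Derive a 256-bit key from a password; the password itself is never transmitted.
  def deriveKey(password: String, salt: String, iterations: Int = 4096): Array[Byte] = {
    val spec = new PBEKeySpec(password.toCharArray, salt.getBytes("UTF-8"), iterations, 256)
    SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1").generateSecret(spec).getEncoded
  }

  // Accept a message only if its timestamp is close enough to the local clock.
  def withinSkew(sent: Instant, now: Instant,
                 maxSkew: Duration = Duration.ofMinutes(5)): Boolean =
    Duration.between(sent, now).abs.compareTo(maxSkew) <= 0

  def main(args: Array[String]): Unit = {
    // Hypothetical salt built from a realm and a principal name.
    val key = deriveKey("correct horse", "EXAMPLE.COMalice")
    println(s"derived ${key.length * 8}-bit key")
    val now = Instant.now()
    println(withinSkew(now.minusSeconds(60), now))  // one minute of drift
    println(withinSkew(now.minusSeconds(600), now)) // ten minutes of drift
  }
}
```

The skew check shows why unsynchronized clocks break authentication: a correct password still yields a rejected request when the timestamp falls outside the tolerance.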
20. Apache Sentry
Access Security: Sentry
• Sentry provides unified authorization across multiple access paths
  • A single authorization policy will be enforced for Impala, Hive and Search
• Role-based access at Server, Database, Table or View granularity
• Multi-tenant: separate policies for each database / schema
• Access
  • Defining what users and applications can do with data
• Technical concepts:
  • Permissions
  • Authorization

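To make role-based access at database and table granularity concrete, here is a small plain-Scala model of the idea. It is a sketch only and does not use the Sentry API or its policy syntax: `Privilege`, `Role`, `allowed` and the group-to-role mapping are hypothetical names chosen for illustration.

```scala
// Role-based authorization sketch at database/table scope.
// This models the idea only; it is not the Sentry API or policy syntax.
object RbacSketch {
  sealed trait Action
  case object Select extends Action
  case object Insert extends Action

  // None means "any" at that level, so a database-wide grant covers its tables.
  final case class Privilege(database: Option[String], table: Option[String],
                             actions: Set[Action])
  final case class Role(name: String, privileges: Set[Privilege])

  // A request is allowed if any role held by any of the user's groups grants it.
  def allowed(userGroups: Set[String], groupRoles: Map[String, Set[Role]],
              database: String, table: String, action: Action): Boolean =
    userGroups.flatMap(g => groupRoles.getOrElse(g, Set.empty[Role])).exists { role =>
      role.privileges.exists { p =>
        p.database.forall(_ == database) &&
        p.table.forall(_ == table) &&
        p.actions.contains(action)
      }
    }

  def main(args: Array[String]): Unit = {
    val analyst = Role("analyst", Set(Privilege(Some("fraud"), None, Set(Select))))
    val groupRoles = Map("fraud-team" -> Set(analyst))
    println(allowed(Set("fraud-team"), groupRoles, "fraud", "transactions", Select))
    println(allowed(Set("fraud-team"), groupRoles, "fraud", "transactions", Insert))
    println(allowed(Set("fraud-team"), groupRoles, "hr", "salaries", Select))
  }
}
```

Keeping one policy model and evaluating it for every access path is what lets the same decision apply whether a query arrives through Hive, Impala or Search.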
21. Cloudera Navigator
Visibility: Cloudera Navigator
• Auditing and Access Management
  • View, grant and revoke permissions across the Hadoop stack
  • Identify access to a data asset around the time of a security breach
  • Generate an alert when a restricted data asset is accessed
• Lineage
  • Given a data set, trace back to the original source
  • Understand the downstream impact of purging/modifying a data set
• Metadata Tagging and Discovery
  • Search through metadata to find data sets of interest
  • Given a data set, view schema, metadata and policies
• Lifecycle Management
  • Automate periodic ingestion of data
  • Compress/encrypt a data set at rest
  • Purge a data set/replicate a data set to a remote site
• Visibility
  • Reporting on where data came from and how it's being used
• Technical concepts:
  • Auditing
  • Lineage

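The two lineage questions above (where did a data set originally come from, and what is affected downstream if it changes) are graph traversals. The sketch below assumes lineage is recorded as a map from each data set to its direct parents; that representation and all names are illustrative and are not how Navigator stores or exposes lineage.

```scala
// Lineage sketch: each data set lists the data sets it was derived from.
// The map-based representation is an assumption made for illustration.
object LineageSketch {
  type Lineage = Map[String, Set[String]] // data set -> direct parents

  // Every data set that `dataset` depends on, directly or transitively.
  def upstream(lineage: Lineage, dataset: String): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val parents =
          frontier.flatMap(d => lineage.getOrElse(d, Set.empty[String])) -- seen
        loop(parents, seen ++ parents)
      }
    loop(Set(dataset), Set.empty[String])
  }

  // Trace back to the original sources: ancestors with no recorded parents.
  def originalSources(lineage: Lineage, dataset: String): Set[String] =
    upstream(lineage, dataset)
      .filter(d => lineage.getOrElse(d, Set.empty[String]).isEmpty)

  // Downstream impact: data sets that depend on `dataset`.
  def downstream(lineage: Lineage, dataset: String): Set[String] =
    lineage.keySet.filter(d => upstream(lineage, d).contains(dataset)).toSet

  def main(args: Array[String]): Unit = {
    val lineage: Lineage = Map(
      "daily_report" -> Set("cleaned_logs"),
      "cleaned_logs" -> Set("raw_logs", "reference_ips")
    )
    println(originalSources(lineage, "daily_report"))
    println(downstream(lineage, "raw_logs"))
  }
}
```

The `downstream` result is the set to review before purging or modifying a data set.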
23. Encryption at rest: Navigator Encrypt and Key Trustee
©Gazzang gazzang.com/products/cloudencrypt-for-aws
[Diagram labels: Linux Server / VM with Encrypt client (Linux file, directory; AES-256 encryption; process-based ACLs; GPG); Linux Server / VM with Key Trustee Server]
• Encrypt any file or directory
• AES-256 encryption
• Unique access controls
  • Process based, NOT users / groups
• 100% transparent
• Separation of duties
• Key management
  • AES encryption keys stored on a separate Key Trustee server
  • Key manager breach: data is safe
  • Data server breach: data is safe

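The separation described above, with ciphertext on the data server and the AES-256 key held elsewhere, can be illustrated with the standard Java crypto API. This is a sketch of the general idea, not of Navigator Encrypt: the GCM mode, the 12-byte IV and the `EncryptedBlob` type are assumptions made for the example, and the deck does not state which cipher mode the product uses.

```scala
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec
import java.security.SecureRandom

// AES-256 encryption-at-rest sketch. The mode (GCM) and the blob layout are
// illustrative assumptions, not the product's actual format.
object EncryptAtRestSketch {
  // What the data server stores: no key material, only IV and ciphertext.
  final case class EncryptedBlob(iv: Array[Byte], ciphertext: Array[Byte])

  // What the separate key server would hold.
  def newKey(): SecretKey = {
    val gen = KeyGenerator.getInstance("AES")
    gen.init(256)
    gen.generateKey()
  }

  def encrypt(key: SecretKey, plain: Array[Byte]): EncryptedBlob = {
    val iv = new Array[Byte](12)
    new SecureRandom().nextBytes(iv) // fresh IV for every encryption
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))
    EncryptedBlob(iv, cipher.doFinal(plain))
  }

  def decrypt(key: SecretKey, blob: EncryptedBlob): Array[Byte] = {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, blob.iv))
    cipher.doFinal(blob.ciphertext)
  }

  def main(args: Array[String]): Unit = {
    val key = newKey()
    val blob = encrypt(key, "card=4111".getBytes("UTF-8"))
    println(new String(decrypt(key, blob), "UTF-8"))
  }
}
```

An attacker who obtains only the `EncryptedBlob` cannot decrypt it, and one who obtains only the key has nothing to decrypt, which is the "breach, data is safe" property on both sides.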