October 2014 Webinar: Cybersecurity Threat Detection
1. Securely explore your data
CYBERSECURITY
THREAT DETECTION
Deriving Insights with Sqrrl and Spark GraphX
Adam Fuchs, CTO
October 2014
3. WHAT WE’LL DISCUSS
© 2014 Sqrrl Data, Inc. | All Rights Reserved
• Security Analytics using (Big) Cybersecurity Data
• You’ve been breached – what’s at stake?
• Dealing with the new security dilemma
• The ‘Linked Data’ Approach
• Case study: internal network breach
• Overview of scenario
• Data modeling with Sqrrl
• Detecting anomalies with Sqrrl and GraphX
• Visual, contextual research and analysis
4. THE NUMBERS DON’T LIE
• 229 (Source: Mandiant)
• 87% (Source: Verizon)
• 90% (Source: Verizon)
• $12.7M (Source: Ponemon)
5. TARGETED ATTACKS HAVE CHANGED THE GAME
Source: Battery Ventures
6. WHAT DOES THIS MEAN FOR US?
• You’ve been breached. Deal with it.
• Empower the investigator
• Research and respond: better, faster, smarter
• It’s all about speed to understanding
Dissolution of the secure perimeter
7. THE SECURITY DATA DILEMMA
• Detecting attacks requires more (i.e., BIG) data
• But your tools can’t handle the big data wave
• So attackers are spilling in
8. BIG DATA TRANSFORMED
[Diagram: Linked Contextual Knowledge]
• Data sources: Perimeter Data, Network Data, Endpoint Data, Security Data, Application Data (labels shown: VPN, FW, Proxy, NetFlow, HR, USB, Email)
• Linked entities: Users, Websites, Internal Servers, Client Devices, Assets
• Analysis: Search, Exploration, Reports, Anomalies
• Machine Learning
9. ARCHITECTURAL OVERVIEW
[Diagram: layered architecture]
• Physical: Commodity Hardware
• Data Storage: HDFS + Accumulo
• Data Model: Raw Events, Entity/Relationship Model
• Processing: Query Engine, Bulk/Graph Processing
• Interface: Visualization / API
• Also shown: ML + Anomaly Detection
• Security: Audit, Cryptography, Labeling+Policy
11. BREACH DETECTION SCENARIO
[Diagram: attack timeline]
• Phases: RECON / DELIVERY, EXPLOIT / INSTALL, C2 / ACTION
• Time scale: mins, hours, days, weeks, months
• BREACH: Compromised Laptop
• NETFLOW: NETWORK SCAN
• WINDOWS EVENT LOGS: PASS THE HASH
• STOLEN CREDENTIALS, WINDOWS EVENT LOGS: Unusually excessive logins
• DB DUMP, MSSQL EVENT LOG: Unscheduled backup
• NETFLOW: EXFIL
12. CASE STUDY MODEL
Data Sources
• DNS records
• Netflow
• Host logs
• Database logs
• External Alerts
Linked Meta Model
• Entities: Users, Hosts
• Relationships: login, flow
13. CASE STUDY EXAMPLE MAPPING
Netflow Records
startTime     | endTime       | sourceIP      | destIP        | sourcePort | destPort | protocol | tcpFlags | bytesIn | bytesOut
10/22/14 8:58 | 10/22/14 8:58 | 10.0.2.15     | 192.168.0.123 | 37051      | 139      | TCP      | ...RS.   | 100     | 3355
10/22/14 8:45 | 10/22/14 8:45 | 10.0.2.15     | 192.168.0.6   | 0          | 3328     | ICMP     | ......   | 40      | 100
10/22/14 8:59 | 10/22/14 8:59 | 192.168.0.119 | 10.0.2.15     | 139        | 60071    | TCP      | .A..S.   | 46      | 351
Resulting graph edges
• 10.0.2.15 to 192.168.0.123: Class=Flow, totalBytes = 3455
• 10.0.2.15 to 192.168.0.6: Class=Flow, totalBytes = 140
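The mapping above appears to sum bytesIn and bytesOut for each source/destination pair (100 + 3355 = 3455, 40 + 100 = 140). A minimal Python sketch of that aggregation follows. The tuple layout and the function name `build_flow_edges` are assumptions made for illustration, not Sqrrl's ingest API, and only the two rows that the slide maps to edges are included.

```python
from collections import defaultdict

# Netflow rows from the slide, reduced to the fields the mapping needs.
# The (source_ip, dest_ip, bytes_in, bytes_out) layout is an assumption.
records = [
    ("10.0.2.15", "192.168.0.123", 100, 3355),
    ("10.0.2.15", "192.168.0.6", 40, 100),
]


def build_flow_edges(rows):
    """Sum bytesIn + bytesOut per (source, destination) pair."""
    totals = defaultdict(int)
    for src, dst, bytes_in, bytes_out in rows:
        totals[(src, dst)] += bytes_in + bytes_out
    # One Flow edge per pair, carrying the aggregated byte count.
    return {
        pair: {"Class": "Flow", "totalBytes": total}
        for pair, total in totals.items()
    }


flow_edges = build_flow_edges(records)
for (src, dst), attrs in flow_edges.items():
    print(src, "->", dst, attrs)
```

Keying on the address pair means repeated flows between the same two hosts collapse into a single edge whose weight grows over time.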
14. CASE STUDY EXAMPLE DATA
• Jane: Class=User: id=Jane, loginAttempts=82
• 192.168.10.94: Class=Host: id=192.168.10.94, hostname=kali, bytesTransfered={2014-09-30/01:00: 64472381}
• 74.129.94.19: Class=Host: id=74.129.94.19, bytesTransfered={2014-09-30/01:00: 64472381}
• 192.168.10.120: Class=Host: id=192.168.10.120, hostname=msserv, bytesTransfered={2014-09-30/04:00: 42318}
• Edge labels shown: login, flow
15. INVESTIGATION PROCESS
1. Set the Stage
• Define the security-centric entity/relationship model
• Extract and maintain the model
2. Enable Search and Discovery
• Visually navigate assets and actors in the network
• Drill down to the raw data seeding the model
3. Automate Analysis
• Use behavioral analytics to build expectations of ‘normal’
• Flag entities as potentially ‘abnormal’ and sniff them out
17. APACHE SPARK 101
We use Spark because:
1. Meets core processing requirements
• Pre-canned algorithms
• Native support for graph processing
• Simple programmability
2. Good performance
• Low latency for many small jobs
• Scalability for big jobs
3. Natural fit
• Ties with the Hadoop ecosystem simplify integration
18. ROUND-TRIPPING WITH SPARK
[Diagram: data round trip between the Sqrrl Graph Store and Spark]
• Input Data: DNS, Netflow, Windows Logs, DB logs, Alert data
• Ingest/Extract: into the Sqrrl Graph Store
• Algorithmic Enrichment: SqrrlGraphInputFormat, SqrrlGraph.update(uuid, values)
• Serve/Analyze: Sqrrl UI
19. STRUCTURAL FEATURES
Triangle Counting:
• Given node A, find edges AB, AC, BC
• For nodes B, C in A’s neighborhood, is P(BC) > E/N^2?
Node Degree:
• Given node A, how many nodes are within 1 or 2 edges?
PageRank:
• Iteratively transfer weight proportionally to neighbors
• Converges on entity importance
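The three structural features above can be sketched in plain Python on a toy graph. This is an illustration only: the deck computes these with GraphX, while the graph, node names, function names, damping factor, and iteration count below are all assumptions.

```python
from itertools import combinations

# Hypothetical undirected toy graph; node names are illustrative only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of nodes within one edge of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def nodes_within_two(adj, node):
    """Number of distinct nodes within one or two edges of `node`."""
    reach = set(adj[node])
    for nbr in adj[node]:
        reach |= adj[nbr]
    reach.discard(node)
    return len(reach)


def triangle_count(adj):
    """For node a, count neighbor pairs (b, c) that are themselves linked."""
    return {
        a: sum(1 for b, c in combinations(sorted(nbrs), 2) if c in adj[b])
        for a, nbrs in adj.items()
    }


def pagerank(adj, damping=0.85, iterations=50):
    """Iteratively pass each node's weight to its neighbors in equal shares."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        rank = {
            node: (1.0 - damping) / n
            + damping * sum(rank[nbr] / len(adj[nbr]) for nbr in adj[node])
            for node in adj
        }
    return rank


adj = adjacency(edges)
print(degree(adj))
print(triangle_count(adj))
print(pagerank(adj))
```

Because every node in this toy graph has at least one neighbor, the ranks keep summing to 1 without special handling for dangling nodes. A production graph would need that case covered.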
20. SPARK OUTLIER DETECTION
• Use GraphX to load Sqrrl graph model
• Entities: Users, Hosts
• Relationships: Flows, Logins (both user and host)
• Loads an RDD with Sqrrl graph in Spark
• For every node, generate features:
• GraphX built-in methods:
• Degree, Triangle Count, PageRank
• Implemented in Spark by Sqrrl:
• edgeWeightTotal, totalNeighborDegree
Detail on data flow and algorithms
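The slide names edgeWeightTotal and totalNeighborDegree without defining them. The sketch below assumes that edgeWeightTotal is the sum of the weights on a node's incident edges and that totalNeighborDegree is the sum of its neighbors' degrees. Those definitions, the function name `node_features`, and the sample edges are hypothetical, and plain Python stands in here for the Spark implementation.

```python
from collections import defaultdict

# Hypothetical weighted edges (node, node, weight), e.g. total bytes per flow.
weighted_edges = [
    ("h1", "h2", 10.0),
    ("h1", "h3", 5.0),
    ("h2", "h3", 1.0),
    ("h3", "h4", 100.0),
]


def node_features(edges):
    """Compute per-node features under the assumed definitions."""
    neighbors = defaultdict(set)
    edge_weight_total = defaultdict(float)
    for u, v, weight in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
        # Assumed: each edge's weight counts toward both endpoints.
        edge_weight_total[u] += weight
        edge_weight_total[v] += weight
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    return {
        node: {
            "degree": degree[node],
            "edgeWeightTotal": edge_weight_total[node],
            # Assumed: sum of the degrees of the node's direct neighbors.
            "totalNeighborDegree": sum(degree[m] for m in neighbors[node]),
        }
        for node in neighbors
    }


features = node_features(weighted_edges)
for node, values in sorted(features.items()):
    print(node, values)
```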
21. SPARK OUTLIER DETECTION
• Transform statistics to feature matrix, run PCA
• Creates ranked list of high-variance dimensions, most likely indicative of an entity’s “outlierness”
• PCA run with Spark MLlib
• Top feature pairs:
• totalNeighborDegree vs. edgeWeightTotal
• Degree vs. edgeWeightTotal
• Create “distance” measure using pairs to flag anomalies
Detail on data flow and algorithms
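The deck does not say how the “distance” measure is defined. As one possible reading, the sketch below standardizes each feature in a pair to z-scores and uses the Euclidean distance from the mean as the outlier score. It skips the PCA step and assumes a feature pair has already been chosen. The sample values, the threshold of 2.0, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-host features; h6 moves far more data than its peers.
features = {
    "h1": {"degree": 2, "edgeWeightTotal": 10.0},
    "h2": {"degree": 3, "edgeWeightTotal": 12.0},
    "h3": {"degree": 2, "edgeWeightTotal": 11.0},
    "h4": {"degree": 3, "edgeWeightTotal": 9.0},
    "h5": {"degree": 2, "edgeWeightTotal": 10.0},
    "h6": {"degree": 3, "edgeWeightTotal": 1000.0},
}


def zscores(values):
    """Standardize values to zero mean and unit (population) deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def outlier_distances(table, x_key, y_key):
    """Euclidean distance from the mean in z-score space for one feature pair."""
    names = list(table)
    zx = zscores([table[n][x_key] for n in names])
    zy = zscores([table[n][y_key] for n in names])
    return {n: math.hypot(a, b) for n, a, b in zip(names, zx, zy)}


def flag_anomalies(distances, threshold=2.0):
    """Return entities beyond the threshold, most distant first."""
    flagged = [n for n, d in distances.items() if d > threshold]
    return sorted(flagged, key=lambda n: -distances[n])


distances = outlier_distances(features, "degree", "edgeWeightTotal")
print(flag_anomalies(distances))
```

Standardizing first keeps a large-valued feature such as byte counts from drowning out a small-valued one such as degree. With so few entities a single extreme value also inflates the deviation it is measured against, so the threshold would need tuning on real data.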
25. THANKS!
Adam Fuchs, CTO Sqrrl Data, Inc.
http://www.sqrrl.com