Weitere ähnliche Inhalte Ähnlich wie Analyzing 1.2 Million Network Packets per Second in Real-time (20) Mehr von DataWorks Summit (20) Kürzlich hochgeladen (20) Analyzing 1.2 Million Network Packets per Second in Real-time1. OpenSOC
The Open Security Operations
Center
for
Analyzing 1.2 Million Network Packets per Second
in Real TimeJames Sirota,
Big Data Architect
Cisco Security Solutions Practice
jsirota@cisco.com
Sheetal Dolas
Principal Architect
Hortonworks
sheetal@hortonworks.com
June 3, 2014
2. Cisco Confidential 2© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Problem Statement & Business Case for OpenSOC
Solution Architecture and Design
Best Practices and Lessons Learned
Q & A
Over Next Few Minutes
4. Cisco Confidential 4© 2013-2014 Cisco and/or its affiliates. All rights reserved.
fatalism:
It's no longer if or when you get hacked,
but the assumption is that you've
already been hacked,
with a focus on minimizing the
damage.”
Source: Dark Reading / Security’s New
Reality: Assume The Worst
5. Cisco Confidential 5© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Breaches Happen in Hours…
But Go Undetected for Months or Even Years
Source: 2013 Data Breach Investigations
Report
Seconds Minutes Hours Days Weeks Months Years
Initial Attack to Initial
Compromise
10% 75% 12% 2% 0% 1% 1%
Initial Compromise
to Data Exfiltration
8% 38% 14% 25% 8% 8% 0%
Initial Compromise
to Discovery
0% 0% 2% 13% 29% 54% 2%
Discovery to
Containment/
Restoration 0% 1% 9% 32% 38% 17% 4%
Timespan of events by percent of breaches
In 60% of
breaches, data
is stolen in hours
54% of breaches
are not discovered for
months
6. Cisco Confidential 6© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Cisco Global Cloud Index
Source: 2014 Cisco Global Cloud Index
7. Cisco Confidential 7© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Introducing OpenSOC
Intersection of Big Data and Security Analytics
Multi Petabyte Storage
Interactive Query
Real-Time Search
Scalable Stream
Processing
Unstructured Data
Data Access Control
Scalable Compute
OpenSOC
Real-Time Alerts
Anomaly Detection
Data
Correlation
Rules and Reports
Predictive Modeling
UI and Applications
Big Data
Platform
Hadoop
8. Cisco Confidential 8© 2013-2014 Cisco and/or its affiliates. All rights reserved.
OpenSOC Journey
Sept 2013
First Prototype
Dec 2013
Hortonworks
joins the
project
March 2014
Platform
development
finished
Sept 2014
General
Availability
May 2014
CR Work off
April 2014
First beta test
at customer
site
9. Cisco Confidential 9© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Solution Architecture &
Design
10. Cisco Confidential 10© 2013-2014 Cisco and/or its affiliates. All rights reserved.
OpenSOC Conceptual Architecture
Raw Network Stream
Network Metadata
Stream
Netflow
Syslog
Raw Application Logs
Other Streaming
Telemetry
HiveHBase
Raw Packet
Store
Long-Term
Store
Elastic Search
Real-Time
Index
Network Packet
Mining and
PCAP
Reconstruction
Log Mining and
Analytics
Big Data
Exploration,
Predictive
Modeling
Applications + Analyst Tools
Parse+Format
Enrich
Alert
Threat Intelligence
Feeds
Enrichment Data
11. Cisco Confidential 11© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Raw Network Packet Capture, Store, Traffic Reconstruction
Telemetry Ingest, Enrichment and Real-Time Rules-Based
Alerts
Real-Time Telemetry Search and Cross-Telemetry Matching
Automated Reports, Anomaly Detection and Anomaly
Alerts
Key Functional Capabilities
12. Cisco Confidential 12© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Fully-Backed by Cisco and Used Internally for Multiple
Customers
Free, Open Source and Apache Licensed
Built on Highly-Scalable and Proven Platforms (Hadoop,
Kafka, Storm)
Extensible and Pluggable Design
Flexible Deployment Model (On-Premise or Cloud)
Centralize your processes, people and data
The OpenSOC Advantage
13. Cisco Confidential 13© 2013-2014 Cisco and/or its affiliates. All rights reserved.
OpenSOC Deployment at Cisco
Hardware footprint (40u)
14 Data Nodes (UCS C240 M3)
3 Cluster Control Nodes (UCS C220
M3)
2 ESX Hypervisor Hosts (UCS C220
M3)
1 PCAP Processor (UCS C220 M3 +
Napatech NIC)
2 SourceFire Threat alert processors
1 Anue Network Traffic splitter
1 Router
1 48 Port 10GE Switch
Software Stack
HDP 2.1
Kafka 0.8
Elastic Search 1.1
MySQL 5.5
14. Cisco Confidential 14© 2013-2014 Cisco and/or its affiliates. All rights reserved.
OpenSOC - Stitching Things Together
AccessMessaging SystemData CollectionSource Systems StorageReal Time Processing
StormKafka
B Topic
N Topic
Elastic
Search
Index
Web
Services
Search
PCAP
Reconstruction
HBase
PCAP Table
Analytic
Tools
R / Python
Power Pivot
Tableau
Hive
Raw Data
ORC
Passive
Tap
PCAP Topic
DPI Topic
A Topic
Telemetry
Sources
Syslog
HTTP
File System
Other
Flume
Agent A
Agent B
Agent N
B Topology
N Topology
A Topology
PCAP
Traffic
Replicato
r
PCAP
Topology
DPI Topology
15. Cisco Confidential 15© 2013-2014 Cisco and/or its affiliates. All rights reserved.
OpenSOC - Stitching Things Together
AccessMessaging SystemData CollectionSource Systems StorageReal Time Processing
StormKafka
B Topic
N Topic
Elastic
Search
Index
Web
Services
Search
PCAP
Reconstruction
HBase
PCAP Table
Analytic
Tools
R / Python
Power Pivot
Tableau
Hive
Raw Data
ORC
Passive
Tap
PCAP Topic
DPI Topic
A Topic
Telemetry
Sources
Syslog
HTTP
File System
Other
Flume
Agent A
Agent B
Agent N
B Topology
N Topology
A Topology
PCAP
Traffic
Replicato
r
Deeper
Look
PCAP
Topology
DPI Topology
16. Cisco Confidential 16© 2013-2014 Cisco and/or its affiliates. All rights reserved.
PCAP Topology
StorageReal Time Processing
Storm
Elastic Search
Index
HBase
PCAP Table
Hive
Raw Data
ORC
Kafka
Spout
Parse
r
Bolt
HDFS
Bolt
HBas
e
Bolt
ES
Bolt
17. Cisco Confidential 17© 2013-2014 Cisco and/or its affiliates. All rights reserved.
DPI Topology & Telemetry Enrichment
StorageReal Time Processing
Storm
Elastic Search
Index
HBase
PCAP Table
Hive
Raw Data
ORC
Kafka
Spout
Parse
r Bolt
GEO
Enric
h
Whoi
s
Enric
h
CIF
Enric
h
HDF
S
Bolt
ES
Bolt
18. Cisco Confidential 18© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Enrichments
Parse
r
Bolt
GEO
Enrich
RAW
Message
{
“msg_key1”: “msg value1”,
“src_ip”: “10.20.30.40”,
“dest_ip”: “20.30.40.50”,
“domain”: “mydomain.com”
}
Who
Is
Enrich
"geo":[ {"region":"CA",
"postalCode":"95134",
"areaCode":"408",
"metroCode":"807",
"longitude":-121.946,
"latitude":37.425,
"locId":4522,
"city":"San Jose",
"country":"US"
}]
CIF
Enrich
"whois":[ {
"OrgId":"CISCOS",
"Parent":"NET-144-0-0-0-0",
"OrgAbuseName":"Cisco Systems Inc",
"RegDate":"1991-01-171991-01-17",
"OrgName":"Cisco Systems",
"Address":"170 West Tasman Drive",
"NetType":"Direct Assignment"
} ],
“cif”:”Yes”
Enriched
Message
Cache
MySQL
Geo Lite
Data
Cache
HBase
Who Is Data
Cache
HBase
CIF Data
19. Cisco Confidential 19© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Applications: Telemetry Matching and DPI
Step1: Search
Step2: Match
Step3: Analyze
Step4: Build PCAP
20. Cisco Confidential 20© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Integration with Analytics Tools
Dashboards Reports
21. Cisco Confidential 21© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Best Practices
and
Lessons Learned
22. Cisco Confidential 22© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Journey Towards Highly
Scalable Application
25. Cisco Confidential 25© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Some code optimizations and increased
parallelism
26. Cisco Confidential 26© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Is Disk I/O heavy
Kafka 0.8+ supports replication and JBOD
Better performance compared to RAID
Parallelism is largely driven by number of disks and partitions per topic
Key configuration parameters:
num.io.threads - Keep it at least equal to number of disks provided to
Kafka
num.network.threads - adjust it based on number of concurrent
producers, consumers and replication factor
Kafka Tuning
28. Cisco Confidential 28© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Bottleneck Isolation, Resource Profiling,
Load Balancing
31. Cisco Confidential 31© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Row Key design is critical (gets or scans or both?)
Keys with IP Addresses
Standard IP addresses have only two variations of the first character : 1 & 2
Minimum key length will be 7 characters and max 15 with a typical average of 12
Subnet range scans become difficult – range of 90 to 220 excludes 112
IP converted to hex (10.20.30.40 => 0a141e28)
gives 16 variations of first key character
consistently 8 character key
Easy to search for subnet ranges
Row Key Design
32. Cisco Confidential 32© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Experiments with Row Key
33. Cisco Confidential 33© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Know your data
Auto split under high workload can result into hotspots and split storms
Understand your data and presplit the regions
Identify how many regions a RS can have to perform optimally. Use the
formula below
(RS memory)*(total memstore fraction)/((memstore size)*(# column families))
Region Splits
35. Cisco Confidential 35© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Enable Micro Batching (client side buffer)
Smart shuffle/grouping in storm
Understand your data and situationally exploit various WAL options
Watch for many minor compactions
For heavy ‘write’ workload Increase hbase.hstore.blockingStoreFiles (we
used 200)
Know Your Application
38. Cisco Confidential 38© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Parallelism is controlled by number of partitions per topic
Set Kafka spout parallelism equal to number of partitions in
topic
Other key parameters that drive performance
fetchSizeBytes
bufferSizeBytes
Kafka Spout
39. Cisco Confidential 39© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Mysteriously Missing Data
40. Cisco Confidential 40© 2013-2014 Cisco and/or its affiliates. All rights reserved.
A bug in Kafka spout that used to miss out some partitions
and loose data
It is now fixed and available from Hortonworks repository (
http://repo.hortonworks.com/content/repositories/releases/org/apache/
storm/storm-Kafka )
Mysteriously Missing Data Root Cause
42. Cisco Confidential 42© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Every small thing counts at scale
Even simple string operations can slowdown throughput when
executed on millions of Tuples
Storm
43. Cisco Confidential 43© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Error handling is critical
Poorly handled errors can lead to topology failure and eventually loss
of data (or data duplication)
Storm
44. Cisco Confidential 44© 2013-2014 Cisco and/or its affiliates. All rights reserved.
Tune & Scale individual spout and bolts before performance
testing/tuning entire topology
Write your own simple data generator spouts and no-op bolts
Making as many things configurable as possible helps a lot
Storm
45. Cisco Confidential 47© 2013-2014 Cisco and/or its affiliates. All rights reserved.
When it comes to Hadoop…partner up
Separate the hype from the opportunity
Start small then scale up
Design Iteratively
It doesn’t work unless you have proven it at scale
Keep an eye on ROI
Lessons Learned
46. Cisco Confidential 48© 2013-2014 Cisco and/or its affiliates. All rights reserved.
How can you contribute?
Technology Partner Program – contribute developers to
join the Cisco and Hortonworks team
Looking for Community Partners
Cisco + Hortonworks + Community Support for OpenSOC
Hinweis der Redaktion In Storm bolts shuffle group based on regions so that each HBase bolt gets data mostly for one or two regions and minimizes RS trips
In case of DoS attack situations where actual packet are very small 20-60 bytes and individual packets are not very critical for analysis, skip WAL In Storm bolts shuffle group based on regions so that each HBase bolt gets data mostly for one or two regions and minimizes RS trips
In case of DoS attack situations where actual packet are very small 20-60 bytes and individual packets are not very critical for analysis, skip WAL Frequent minor compactions reduce the overall throughput of system. For ‘write’ heavy workload reduce frequency of minor compactions by increasing hbase.hstore.blockingStoreFiles (we used 200)
Frequent minor compactions reduce the overall throughput of system. For ‘write’ heavy workload reduce frequency of minor compactions by increasing hbase.hstore.blockingStoreFiles (we used 200)
Frequent minor compactions reduce the overall throughput of system. For ‘write’ heavy workload reduce frequency of minor compactions by increasing hbase.hstore.blockingStoreFiles (we used 200)