Slides from a talk given to the Seattle Chapter of the Cloud Security Alliance. Looks briefly at cloud architectures, sources of log data, behavioral signatures in that data, and issues and observations around using Big Data products for security.
1. BIG DATA APPROACHES TO CLOUD SECURITY
Paul Morse – President, WebMall Ventures
Cloud Security Alliance, Seattle Chapter 3/28/2013
2. “BIG DATA IS NOT JUST ABOUT LOTS OF DATA, IT IS ABOUT
HAVING THE ABILITY TO EXTRACT MEANING; TO SORT
THROUGH THE MASSES OF DATA ELEMENTS TO DISCOVER THE
HIDDEN PATTERN, THE UNEXPECTED CORRELATION,”
Art Coviello, executive chairman of RSA
ON THE SURFACE, BIG DATA SEEMS TO BE ALL ABOUT BUSINESS
INTELLIGENCE AND ANALYTICS, BUT IT ALSO AFFECTS THE NITTY-
GRITTY OF POWER AND COOLING, NETWORKING, STORAGE
AND DATA CENTER EXPANSION.
3. AGENDA
• Observations
• Cloud Architectures/Components
• Machine-Generated Data
• Sources of Data
• Time Sequencing of Events
• Searching for Behavior
• Recent Hack Examples
4. OBSERVATIONS
• Big Data solutions are changing the game for security practitioners and execs
• Provide the ability to look at discovery, detection and remediation across large portions
of the organization in entirely new ways
• Correlation between seemingly unrelated events in near real time is now relatively easy
• Growing range of solution types – simple to highly complex
• Roll your own to pre-packaged solutions
• On-prem, Public Cloud-based and Hybrid
• Simple Log search to Predictive Analysis with complex dashboards and reporting
• Some solutions have extremely short “time to value” propositions
• “Big Data Washing” like “Cloud Washing” is showing up
• Prices vary – Free to mondo
• It is NOT the holy grail for security but has many advantages over traditional SIEM
products – real time, large amounts of data, broad event correlation, etc.
5. SET THE STAGE
• Many perspectives on Cloud Computing
• Main focus for this talk is as a Public Cloud Provider
• You are the “owner” of the facility – all of it.
• Infrastructure-centric discussion
• How do Big Data solutions improve Security?
9. SCADA DATA SOURCES
• Backup Generators
• Backup Batteries
• Power Distribution Systems
• Door Sensors
• Temp Sensors
• Card Key
• RFID
• Wireless Devices
• PCs
• Tablets
• Printers
• Phones?
• Storage
• Servers
• Routers/Switches
• Lighting controls
• Water System
This is your attack surface
I want all the data in one searchable repository and available in near real time
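One way to picture the "one searchable repository" goal is to normalize events from different device types into a common record before indexing them. The sketch below is only an illustration: the `Event` fields, the two sample log formats, and the `normalize_*`, `build_repository`, and `search` helpers are all assumptions made for this example, not part of the talk or of any product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common record shape (hypothetical) for any machine-generated event."""
    timestamp: datetime
    source: str      # device class, e.g. "door" or "router"
    device_id: str
    action: str


def _parse_ts(text):
    # Assumed timestamp format: ISO-like UTC with a trailing "Z".
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def normalize_door(line):
    # Assumed CSV-like format: "2013-03-28T10:15:00Z,door-3,open"
    ts, device_id, action = line.strip().split(",")
    return Event(_parse_ts(ts), "door", device_id, action)


def normalize_router(line):
    # Assumed key=value format: "ts=2013-03-28T10:14:00Z dev=rtr-1 msg=login_fail"
    fields = dict(pair.split("=", 1) for pair in line.split())
    return Event(_parse_ts(fields["ts"]), "router", fields["dev"], fields["msg"])


def build_repository(door_lines, router_lines):
    """Merge both sources into one list ordered by time."""
    events = [normalize_door(line) for line in door_lines]
    events += [normalize_router(line) for line in router_lines]
    return sorted(events, key=lambda e: e.timestamp)


def search(events, **criteria):
    """Return events whose fields equal every given keyword argument."""
    return [e for e in events
            if all(getattr(e, key) == value for key, value in criteria.items())]
```

Once every source maps to the same record, a single query such as `search(repo, source="door")` works across all of them. A real deployment would hand the normalized records to an indexing product rather than keep them in a list.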
10. SECURE? THINK AGAIN.
• Internet Mapping Project
• “harmless” Port ping and bot install
• 660 million IPs with 71 billion ports tested
• 460 Million Devices Responded
• Resulted in 420 thousand bots
• Stupid uid/pwd combos
• Admin/admin, Admin/no pwd, root/root, root/no pwd
• What’s on your network?
http://internetcensus2012.bitbucket.org/paper.html
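The "What's on your network?" question can be turned into a simple audit of your own device inventory against the four default combinations listed above. This is a minimal sketch under assumptions: the inventory is a hypothetical list of dicts with `host`, `user`, and `password` keys (for example, exported from a configuration database), and nothing here touches the network.

```python
# The four default combinations listed on the slide; None stands for "no password".
DEFAULT_CREDENTIALS = {
    ("admin", "admin"),
    ("admin", None),
    ("root", "root"),
    ("root", None),
}


def find_default_credentials(inventory):
    """Return the hosts whose recorded credentials match a known default pair.

    `inventory` is a hypothetical list of dicts with "host", "user" and
    "password" keys. User names are compared case-insensitively.
    """
    flagged = []
    for device in inventory:
        user = device["user"].lower()
        password = device.get("password") or None  # treat "" as no password
        if (user, password) in DEFAULT_CREDENTIALS:
            flagged.append(device["host"])
    return flagged
```

Feeding the flagged hosts into the same repository as the log data makes it possible to search for login activity against exactly those devices.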
11. CAUSE FOR PAUSE
“We hope other researchers will find the data we
have collected useful and that this publication will
help raise some awareness that, while everybody is
talking about high class exploits and cyberwar, four
simple stupid default telnet passwords can give you
access to hundreds of thousands of consumer as well
as tens of thousands of industrial devices all over the
world.”
12. MACHINE DATA
• Isn’t it really all machine data?
• Machine-generated data (MGD) is the generic term for information which was
automatically created from a computer process, application, or other machine
without the intervention of a human.
• Network Device Log files
• Event logs
• Application logs
• RFID logs
• Storage logs
• HVAC Logs
• Sensor data
• Etc.
14. TIME SEQUENCE OF EVENTS
[Figure: timeline chart. Rows, top to bottom: Outbound Traffic, Terminate Sess, Delete logs, Installer runs, Upload Small File, Command, Fail, Pass, Login Attempt, Server, TOR, LB, Front end, IP Address/Packet. Horizontal axis: T0 to T19.]
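The idea behind the timeline is that an attack appears as an ordered sequence of events from one source. A minimal sketch of searching for such a behavioral signature follows. The event tuples, the action names, and the particular `SIGNATURE` (loosely based on the row labels of the chart) are assumptions chosen for illustration, not a rule set from the talk.

```python
from collections import defaultdict

# Hypothetical signature: these actions, in this order, from a single source.
SIGNATURE = ["login_fail", "login_pass", "upload_small_file",
             "installer_runs", "delete_logs", "terminate_session"]


def matches_signature(actions, signature=SIGNATURE):
    """True if `signature` occurs in `actions` as an ordered subsequence.

    Other actions may be interleaved between the signature steps.
    """
    remaining = iter(actions)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in signature)


def suspicious_sources(events, signature=SIGNATURE):
    """Return source IPs whose time-ordered actions contain the signature.

    `events` is an iterable of (time_step, source_ip, action) tuples.
    """
    by_source = defaultdict(list)
    for time_step, source_ip, action in sorted(events):
        by_source[source_ip].append(action)
    return sorted(ip for ip, actions in by_source.items()
                  if matches_signature(actions, signature))
```

A subsequence match is used instead of an exact match because real traffic interleaves unrelated events (extra commands, repeated failed logins) between the steps that matter.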
15. TIME SEQUENCE OF EVENTS
[Figure: timeline chart. Rows, top to bottom: Terminate Sess, Delete logs, Update, Upload Small File, Command, Fail, Pass, Login Attempt, Device, TOR, LB, Front end, IP Address/Packet. Horizontal axis: T0 to T18.]
16. TIME SEQUENCE OF EVENTS
[Figure: two timeline charts. Upper chart rows, top to bottom: Terminate Sess, Delete logs, Update, Upload Small File, Command, Fail, Pass, Login Attempt, Device, IP Address/Packet; horizontal axis: T0 to T18. Lower chart rows, top to bottom: Door 5, Door 4, Door 3, Door 2, Door 1; horizontal axis: T-30 to T45.]
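Putting door events on the same timeline as device events suggests correlating physical access with logical activity. The sketch below pairs each device event with door events that occurred close to it in time. The tuple layout, the `correlate` helper, and the 30-minute window (a guess at the units of the T-30 to T45 axis) are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def correlate(door_events, device_events, window=timedelta(minutes=30)):
    """Pair each device event with the door events within `window` of it.

    Both inputs are lists of (timestamp, label) tuples. Returns a list of
    (device_event, nearby_door_events) tuples in device-event input order.
    """
    pairs = []
    for dev_time, dev_label in device_events:
        nearby = [(t, label) for t, label in door_events
                  if abs(t - dev_time) <= window]
        pairs.append(((dev_time, dev_label), nearby))
    return pairs
```

A device login with no door activity nearby, or door activity at an unusual hour just before a login, is the kind of seemingly unrelated pairing that becomes visible once both sources share one repository.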
17. SOME AREAS TO CONSIDER
• Ingesting various data formats
• Many vendors claim it is easy, when it may not be
• Transforms and connectors may be required (affect performance)
• Device companies create add-ons, connectors, dashboards, transforms, queries, etc.
• Speed of indexing determines “real time” abilities
• Do you need to index ALL machine data?
• Vendor-specific Query languages
• No standard, some commonality
• Learning curve for seriously complex queries and for operationalizing the environment
• Dashboards and Visualizations Vary
• Large number of simultaneous queries is required
• Workflow is critical – what happens when you find something?
• Implementation architecture – lots of hardware? Bandwidth? Security? Users?
• Data Governance – You found what?
19. HACK EXAMPLES
• DOJ in January
• Defacement
• What specific behavior happened and what did they do?
• Log in Remotely
• Completely replace index.*
• Solution – monitor index.* and set up a parsing stream that searches for a known code in
the HTML. Call a workflow if the file changes or the code doesn’t match.
• DDoS
• Overwhelm Website
• Solution – compare the rate of increase in requests to a previous “norm”. If the disparity
is great enough, call a workflow to check IP addresses of source(s). Depending
on results, do nothing or script a filter or block.
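Both solutions above can be sketched in a few lines. This is a minimal illustration under assumptions: the function names, the SHA-256 comparison, the marker string, and the 5x threshold are invented for the example, and the DDoS check is simplified to compare the current request rate with the mean of earlier samples instead of tracking its rate of increase. The "workflow" is left to the caller.

```python
import hashlib


def page_tampered(html, expected_code, known_sha256):
    """True if the page content changed or the expected code is missing.

    `html` is the raw bytes of the monitored index file, `expected_code`
    is a marker string embedded in the legitimate page, and `known_sha256`
    is the hex digest recorded when the page was last known to be good.
    """
    changed = hashlib.sha256(html).hexdigest() != known_sha256
    code_missing = expected_code.encode("utf-8") not in html
    return changed or code_missing


def request_rate_spike(current_rate, baseline_rates, threshold=5.0):
    """True if `current_rate` exceeds `threshold` times the baseline mean.

    `baseline_rates` holds earlier requests-per-second samples (the "norm").
    With no baseline, or a zero baseline, nothing is flagged.
    """
    if not baseline_rates:
        return False
    norm = sum(baseline_rates) / len(baseline_rates)
    return norm > 0 and current_rate > threshold * norm
```

In use, a scheduler or streaming query would call these on each new sample and trigger the workflow when either returns `True`, for example checking source IP addresses before deciding whether to filter or block.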
20. VENDORS AND GETTING STARTED
Vendors
• Hadoop with Flume
• HP ArcSight
• Loggly
• LogRhythm
• Sumo Logic
• LogScape
• LogStash
• Sawmill
• Splunk
• Splunk Storm
Getting Started
• Easiest – Cloud Based: Sumo Logic, Splunk Storm
• Download and Install: Loggly, LogRhythm, LogScape, LogStash, Sawmill, Splunk, Hadoop/Flume/Pig