1. ML for Security Monitoring
Santisook Limpeeticharoenchot
Managing Director
2. Agenda
• Why ML for Security Monitoring?
• Overview of Machine Learning
• Applying Theory to Practice
• ML for Security Examples
• Data Science Process for Security
• Q&A
3. Fraud
Bad Actors Ransomware
IP Theft
Application Performance
Identity Theft
Key Performance Indicators
Network Intrusion
Malware
Exfiltration
Cyber-attacks
Zero-day
Compromised Credentials
SCADA Security
Hardware Deterioration
Known & Unknown Threat
4. The Current IT Situation
• Fluid Infrastructure
• Distributed Applications
• Continuous Deployment
7. Current State Of Security Monitoring: #monitoringsucks
Measure Everything
➢ Collect thousands of metrics and logs, most of them unused
➢ Analytics methods are too simple and uncorrelated; they don't help solve outages
Threshold = alert overload
➢ Too many false positives
➢ Hundreds of alerts a day, most of them ignored
IT & security operations have become a big data challenge
“The [traditional] tools present us with the raw data, and lots of it, but sufficient insight into the
actual meaning buried in all that data is still remarkably scarce”
- Turn Big Data Inward With IT Analytics, Forrester Research
15. Terms and definitions
Artificial Intelligence
Machine learning
Deep learning
Algorithms
Supervised
Unsupervised
source:www.ibm.com
16. Traditional Computers vs. Artificial Intelligence
Traditional Programs
• Pre-programmed: produce the same results every time
• Deterministic: fixed true-or-false outcomes
• One-dimensional: built for one or a limited purpose
Artificial Intelligence
• Machine learning: changes its code to improve results
• Stochastic: based on probability
• Multi-dimensional: potential for more general purposes
source:www.ibm.com
17. Traditional Programs vs. Machine Learning
Traditional Programs: Data + Static code → Real world result
Machine Learning: Data + Algorithm → Real world result, with a Hypothesis/Feedback loop back into the algorithm
source:www.ibm.com
18. Enter Machine Learning!
What: “Field of study that gives computers the ability to learn
without being explicitly programmed” – Arthur Samuel, 1959
How: Generalizing (learning) from examples (data)
30. Anomaly Detection
Unusual vs. peers, rare events, and deviations in counts or values
EXAMPLES
"responsetime by host"
"count by error_type"
"rare by EventID"
"rare by process"
"sum(bytes) over client_ip"
source:prelert.com
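The "rare by …" examples above come down to counting how often each value occurs and flagging the scarce ones. A minimal Python sketch of that idea (the process names and the `max_count` cutoff are illustrative, not part of the original slides):

```python
from collections import Counter

def rare_values(events, max_count=1):
    """'rare by process' style detection: values that appear at most
    `max_count` times across the data set are flagged as rare."""
    counts = Counter(events)
    return [v for v, c in counts.items() if c <= max_count]

# Hypothetical process launches: two common binaries and one rare one
processes = ["svchost.exe"] * 50 + ["chrome.exe"] * 30 + ["mimikatz.exe"]
print(rare_values(processes))  # ['mimikatz.exe']
```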
31. Evolution of Malware Detection
Signature-based: potential malware matched against known "bad"
Behavior-based: potential malware matched against bad behavior
Heuristics/sandboxing: potential malware run against testing indicators
Statistical inference: potential malware scored with probabilities
source:www.ibm.com
32. Real world applications for Machine Learning
• Fraud: credit card fraud, spam, DLP
• Automated recognition: face, handwriting
• Capacity planning: product stocking, server provisioning
• Anomaly detection for security and IT operations
• Product recommendations
• Customer segmentation
• Medical diagnoses
…
33. Customer Use Case: Detect Network Outliers
Reduced downtime + increased service availability = better customer satisfaction
ML Use Case
Monitor noise rise for 20,000+ cell towers to increase service and device availability and reduce MTTR
Technical overview
• A customized solution deployed in production based on outlier detection
• Leverages the previous month's data and voting algorithms
“The ability to model complex systems and alert on deviations is where IT and security
operations are headed … Splunk Machine Learning has given us a head start...”
source:www.splunk.com
34. Reliable Website Updates
Proactive website monitoring leads to reduced downtime
"Splunk ML helps us rapidly improve end-user experience by ranking issue severity, which helps us determine root causes faster, thus reducing MTTR and improving SLA."
ML Use Case
• Very frequent code and config updates (1,000+ daily) can cause site issues
• Find errors in server pools, then prioritize actions and predict root cause
Technical overview
• Custom outlier detection built using the ML Toolkit Outlier assistant
• Built by a Splunk Architect with no Data Science background
source:www.splunk.com
37. Normal distributions are really useful
• I can make powerful predictions because of the statistical properties of the data; most naturally occurring processes are normally distributed
• I can easily compare different metrics since they have similar statistical properties
• Examples: population heights, IQ distributions, widget sizes and weights in manufacturing
• There is a HUGE body of statistical work on parametric techniques for normally distributed data
source:conf2016,splunk
42. Example: Three-Sigma Rule
Three-sigma rule:
– ~68% of the values lie within 1 standard deviation of the mean
– ~95% lie within 2 standard deviations
– 99.73% lie within 3 standard deviations; anything else is considered an outlier
source:conf2016,splunk
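The rule above translates directly into a few lines of Python. This is a sketch, and the `response_times` values are made up; note that on a short series a single extreme value inflates the sample standard deviation, so the 2-sigma band (the ~95% rule) is used in the example:

```python
import statistics

def sigma_outliers(values, k=3):
    """Flag values more than k standard deviations from the mean.
    Per the three-sigma rule, ~99.73% of normally distributed values
    fall within 3 standard deviations, so anything outside is an outlier."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * stdev]

# Hypothetical response times in ms with one obvious spike; the spike
# itself inflates the sample stdev, so k=2 is used to catch it here.
response_times = [100, 102, 98, 101, 99, 103, 97, 100, 500]
print(sigma_outliers(response_times, k=2))  # [500]
```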
43. Probabilistic Modeling and Analysis
(figure: a Gaussian ML model fit to observed values; points in the low-likelihood tails are outliers)
source:prelert.com
46. Security Applications of ML
• Fraud detection systems:
– Is what they just did consistent with past behavior?
• Network anomaly detection:
– More like bad statistical analysis
• Predicting likelihood of attack actors:
– Create different predictive models and chain them to gain more confidence in each step
Source:mlsecproject.org
47. Kinds of Network Security Monitoring
• Alert-based:
– "Traditional" log management
– SIEM
– Using "Threat Intelligence" (i.e., blacklists) for about a year or so
– Lack of context
– Low effectiveness
– You get the results handed over to you
• Exploration-based:
– Network forensics tools
– High effectiveness
– Lots of people necessary
– Lots of HIGHLY trained people
• Big Data Security Analytics (BDSA):
– Run exploration-based monitoring on Hadoop
– More like Big Data Security Monitoring (BDSM)
Source:mlsecproject.org
48. Correlation Rules: A Primer
• Rules in a SIEM solution invariably are:
– "Something" has happened "x" times;
– "Something" has happened and another "something2" has happened, with some relationship (time, same fields, etc.) between them.
• Configuring a SIEM = iterating on combinations until:
– The customer or management is satisfied;
– The consulting money runs out
• Behavioral rules (anomaly detection) help a bit with the "x"s, but are still very laborious and time-consuming.
Source:mlsecproject.org
49. Why is this so challenging using traditional methods?
(diagram: Historical Data [DB, Hadoop/S3/NoSQL; T - a few days] and Real-time Data [T + a few days] feed Statistical Models [Splunk Anomaly Detection or Machine Learning], surfaced through the SIEM to the Security, Network, and Business Operations Centers)
• DATA IS STILL IN MOTION, still in a BUSINESS PROCESS
• Enrich real-time MACHINE DATA with structured HISTORICAL DATA
• Make decisions IN REAL TIME using ALL THE DATA
• Combine LEADING and LAGGING INDICATORS (KPIs)
source:conf2016,splunk
50. Anomaly Detection & Machine Learning
What is AD?
Types of security anomalies: spikes in activity, rare events, first-observed, outliers, state change, simple existence
What do these have in common? They are time-based.
The basic comparison parameter is self-comparison over time. Advanced parameters include peer-based comparison.
What is ML?
Supervised ML
– Classification/Regression
Unsupervised ML
– Clustering
Semi-Supervised
– Rule-based AD
For AD and security, ML can establish a baseline of normal (negative) values
source:conf2016,splunk
51. Unsupervised Learning
– You have unlabeled data and want to group the data by feature(s)
– The algorithm makes its own structure out of the data
– You do not know what outliers look like
– Good for the data exploration phase of security anomaly detection
– Examples used in security applications include:
Clustering: k-means, k-medians, Expectation Maximization
Association: less relevant, because with highly structured searches we are less concerned with associations between fields for security anomaly detection
source:conf2016,splunk
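As a concrete illustration of clustering for the exploration phase, here is a minimal 1-D k-means in plain Python. It is a sketch, not a production implementation, and the session byte counts are invented; for real work the Splunk "kmeans" command or scikit-learn would be used instead:

```python
import random

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Minimal 1-D k-means: assign each value to its nearest centroid,
    then recompute centroids as cluster means, for a fixed number of passes."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical bytes-transferred per session: a "normal" group
# and a heavy-transfer group the algorithm should separate out
sessions = [120, 130, 125, 118, 122, 5000, 5200, 4900]
centroids, clusters = kmeans_1d(sessions, k=2)
```

With this data the two centroids settle near 123 and 5033, splitting the normal sessions from the heavy-transfer ones without any labels.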
52. Supervised Learning
Supervised Machine Learning
– You have labeled data and the algorithm predicts the output
– Classification vs. Regression
– Example ML algorithms include: Linear and Logistic Regression, Random Forest, Support Vector Machines, and DBSCAN (strictly speaking a clustering, i.e. unsupervised, method)
Semi-Supervised Machine Learning
– You have "some" labeled data, but not all
– Most security ML applications fall in this category
– Label Propagation
– Rule-based anomaly detection
For SECURITY-PURPOSED applications of ML, a combination of unsupervised, supervised, and semi-supervised learning algorithms is a best practice
In realistic applications, security-purposed AD requires highly structured data and human training of the algorithm
source:conf2016,splunk
53. ML 101 for Security Monitoring
• Machine Learning (ML) is a process for generalizing from examples
– Examples = example or "training" data
– Generalizing = building "statistical models" to capture correlations
– Process = ML is never done; you must keep validating & refitting models
• Simple ML workflow:
– Explore data
– FIT models based on data
– APPLY models in production
– Keep validating models
source:conf2016,splunk
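The FIT/APPLY loop above can be sketched with a toy baseline model. This is an assumption-laden illustration (a mean/stdev baseline stands in for the "statistical model", and the login counts are invented), not any particular product's implementation:

```python
import statistics

class BaselineModel:
    """Toy model for the FIT/APPLY workflow: FIT learns a mean/stdev
    baseline from training data, APPLY flags new points that deviate.
    Validation means periodically re-running fit() on fresh data."""

    def fit(self, training_data):
        self.mean = statistics.mean(training_data)
        self.stdev = statistics.stdev(training_data)
        return self

    def apply(self, point, k=3):
        # True when the new observation is more than k stdevs from baseline
        return abs(point - self.mean) > k * self.stdev

# FIT on last week's hypothetical daily login counts, APPLY to new days
model = BaselineModel().fit([40, 42, 38, 41, 39, 43, 40])
print(model.apply(41))   # normal day
print(model.apply(400))  # anomalous spike
```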
54. The ML Process
Problem: <Stuff in the world> causes big time & money expense
Solution: Build predictive model to forecast <possible incidents>, act pre-emptively & learn
1. Get all data relevant to the problem
2. Explore data & build KPIs
3. Fit, apply & validate models on past / real-time data
4. Predict and act. Identify notable events, create alerts
5. Surface incidents to X Ops, who INVESTIGATES & ACTS
Operationalize
source:conf2016,splunk
55. Security: Find Insider Threats
Problem: Security breaches cause big time & money expense
Solution: Build predictive model to forecast threat scenarios, act pre-emptively & learn
1. Get security data (data transfers, authentication, incidents)
2. Explore data & build KPIs
3. Fit, apply & validate models on past / real-time data
4. Predict and act. Identify anomalous behaviors, create alerts
5. Surface incidents to Security Ops, who INVESTIGATES & ACTS
Operationalize
source:conf2016,splunk
56. Machine Learning in IT Operations
Adaptive Thresholding:
• Learn baselines & dynamic thresholds
• Alert & act on deviations
• Manage for 1000s of KPIs & entities
• Stdev/Avg, Quartile/Median, Range
Anomaly Detection:
• Employ machine learning to baseline normal
operations and alert on anomalous conditions
• Identify abnormal trends and patterns in KPI data
source:conf2016,splunk
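Adaptive thresholding of the Stdev/Avg variety can be sketched as a rolling-window baseline. This is a hypothetical illustration, not Splunk ITSI's actual algorithm; the window size, multiplier, and CPU readings are all invented:

```python
from collections import deque
import statistics

def adaptive_alerts(readings, window=5, k=2.0):
    """For each KPI reading, learn a baseline from the previous `window`
    readings (avg + k * stdev, one of the policies the slide lists) and
    alert when the new reading exceeds that dynamic threshold."""
    history = deque(maxlen=window)
    alerts = []
    for t, value in enumerate(readings):
        if len(history) == window:
            avg = statistics.mean(history)
            sd = statistics.stdev(history)
            if value > avg + k * sd:
                alerts.append((t, value))
        history.append(value)
    return alerts

# Hypothetical CPU readings with a spike at index 7
cpu = [30, 32, 31, 29, 33, 31, 30, 95, 32, 31]
print(adaptive_alerts(cpu))  # [(7, 95)]
```

Because the spike itself enters the window, the threshold temporarily widens afterward, which is why such baselines are often paired with outlier-resistant statistics like Quartile/Median.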
57. Finds the Deviation perfectly
• No extraneous false alarms
• Automatic periodicity
source:prelert.com
58. Find Important IDS/IPS Events
Challenge:
How do you find the signs of advanced threats amid thousands of daily high-severity alerts?
▪ Difficulty of creating effective rules results in a high false positive rate
▪ Advanced Evasion Techniques (AETs) are well known to attackers
source:prelert.com
59. Find Important IDS/IPS Events
Solution:
Let machine learning filter out normal 'noise' and identify unusual counts, signatures, protocols and destinations by source
• Anomaly Detective generates a dozen or so alerts per week
• Accuracy & alert detail enable faster determination of threat level
source:prelert.com
60. Rare Items as Anomalies
Use Case: Learn typical processes on each host
Find rare processes that “start up and communicate”
source:prelert.com
61. Finds the RARE anomaly perfectly
• Finds an FTP process running for 3 hours on a system that doesn't normally run it
source:prelert.com
62. Population / Peer Outliers
Use Case: Find users behaving much differently than the others
source:prelert.com
63. Find the Unusual USER Perfectly
• Host sending 20,000 requests/hr
• Attempt to hack an IIS webserver
source:prelert.com
64. Low and Slow – Automated Logins
A user failing logins all day = "dc(date_hour) over user"
source:prelert.com
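The "dc(date_hour) over user" idea is just counting the distinct hours of the day in which each user failed a login: a human mistypes a password in one or two hours, while a low-and-slow script touches most of them. A sketch in plain Python (user names and events are invented):

```python
from collections import defaultdict

def distinct_failure_hours(failed_logins):
    """Count distinct hours-of-day with failed logins per user,
    mirroring the 'dc(date_hour) over user' detection above."""
    hours = defaultdict(set)
    for user, hour in failed_logins:
        hours[user].add(hour)
    return {user: len(h) for user, h in hours.items()}

# A human fails twice in the same hour; a hypothetical automated
# account fails at least once in every hour of the day.
events = [("alice", 9), ("alice", 9)] + [("svc_backup", h) for h in range(24)]
counts = distinct_failure_hours(events)
print(counts)  # {'alice': 1, 'svc_backup': 24}
```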
65. Machine Learning in Event Correlation
• Reduce event clutter, false positives and extensive rules maintenance
• Events are auto-grouped together (suppressed, de-duped)
• Easily provide feedback on auto-grouping of events & alerts
source:conf2016,splunk
67. (Security) Data Scientist
Data Science Venn Diagram by Drew Conway
• "Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician."
– Josh Wills, Cloudera
68. Data Science Cycle For Security
Determine Use-Case → Data Mining & Exploration → Data Validating & Cleaning → Computational Scaling/Storage → Machine Learning & Anomaly Detection Model → Model Testing → Alerts & Visualization → Refinement (and back around the cycle)
source:conf2016,splunk
69. Example: Email Use-Case
Use-Case Deep Dive
Your company has been hit with a large number of phishing emails that were not detected by traditional signature-based tools.
Several employees have clicked on the phishing link and entered their credentials.
The adversary has taken over several accounts and sent thousands of additional emails, internal and external.
source:conf2016,splunk
70. Where Are We In The Platform?
Use-Case Deep Dive
(diagram: Log Sources feed the SIEM Platform, which handles Exploration/Mining and Cleaning/Validation; an API and Short Term Storage pass data to 3rd Party Computations for Machine Learning and Anomaly Detection; results flow back for Model Testing & Validation and Alerts & Visualizations)
source:conf2016,splunk
71. 3rd Party ML Calculations: all are open source products
source:conf2016,splunk
72. Data Mining & Exploration
What looks interesting in this sourcetype?
What could be used to detect an anomaly?
What is important to note about the events?
Send an email to yourself, then to a co-worker, then to several people, etc., as a validation test; trace the actions through Splunk
ML & AD for Security Best Practice:
Validate data by viewing your
own actions on the network
sourcetype="MSExchange:2010:MessageTracking"
source:conf2016,splunk
74. ML & AD Model
What features do we choose? Supervised?
Unsupervised? Classification? What statistical model do
we choose?
Start by clustering all data
• Splunk “cluster” command for text and “kmeans” for numerical fields
| stats count by {field being measured}
ML & AD for Security Best Practice:
From an incident response perspective,
highly structured and single feature
data is required to minimize time
considering false positives
source:conf2016,splunk
76. Training Data And The ML Process
Collect a set of training data (univariate/single feature/single field)
• In our case, it is 60-120 days' worth of daily email totals
• Next, split the data by time into 3 groups: training set, cross-validation set,
test set
Determine if your dataset is Gaussian (Normal Distribution)
ML & AD for Security Best Practices:
-Split historical data 60-20-20 into training, cross-validation, and test sets
source:conf2016,splunk
77. Algorithm Selection
For normal distributions, Inter-Quartile Range (IQR) is a good place to start
We can test back in Splunk for specific cluster users
Other options available include:
–Scikit-learn.org has the python modules
–MATLAB, GNU Octave, and R all have extensive ML and AD packages
–Python has easy Gaussian test algorithms (used in this example)
• scipy.stats.mstats.normaltest
• scipy.stats.shapiro
Scikit-Learn has in-depth explanations of each algorithm and command
descriptions such as “fit(x)” and “predict(x)”, etc.
source:conf2016,splunk
78. Model Testing: 1
sourcetype="MSExchange:2010:MessageTracking" sender="xxxx@xxxx.com" recipient_count!=NONE | dedup message_id sortby _time | table _time directionality sender recipient message_subject message_id recipient_count total_bytes | timechart sum(recipient_count) as daily_total span=1d | eventstats median(daily_total) as median, p25(daily_total) as p25, p75(daily_total) as p75, mean(daily_total) as mean | eval iqr = p75 - p25 | eval xplier = 2 | eval low_lim = median - (iqr * xplier) | eval high_lim = median + (iqr * xplier) | eval anomaly = if(daily_total < low_lim OR daily_total > high_lim, daily_total, 0) | table _time daily_total anomaly
(chart: flagged days, labeled as false positives and one true positive)
source:conf2016,splunk
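For readers without Splunk, the same median-plus-IQR model can be mirrored in plain Python. This is a sketch: `statistics.quantiles` stands in for the p25/p75 eventstats, and the daily totals are invented:

```python
import statistics

def iqr_anomalies(daily_totals, xplier=2):
    """Mirror of the SPL model above: flag a day's total as anomalous
    when it falls outside median +/- xplier * IQR (p75 - p25)."""
    p25, median, p75 = statistics.quantiles(daily_totals, n=4)
    iqr = p75 - p25
    low_lim = median - iqr * xplier
    high_lim = median + iqr * xplier
    return [(i, v) for i, v in enumerate(daily_totals)
            if v < low_lim or v > high_lim]

# Hypothetical daily recipient totals with one compromised-account spike
totals = [40, 45, 42, 38, 41, 44, 300, 43, 39, 46]
print(iqr_anomalies(totals))  # [(6, 300)]
```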
79. Model Testing : 2
sourcetype="MSExchange:2010:MessageTracking" sender="toby.ryan@emerson.com" recipient_count!=NONE | dedup message_id sortby _time | table _time directionality sender recipient message_subject message_id recipient_count total_bytes | timechart sum(recipient_count) as daily_total span=1d | eventstats median(daily_total) as median, p10(daily_total) as p10, p90(daily_total) as p90, mean(daily_total) as mean | eval iqr = p90 - p10 | eval xplier = 2 | eval low_lim = median - (iqr * xplier) | eval high_lim = median + (iqr * xplier) | eval anomaly = if(daily_total < low_lim OR daily_total > high_lim, daily_total, 0) | table _time daily_total anomaly
source:conf2016,splunk
80. Validating Models
• How can we validate models?
Precision = (# of correct positive values) / (# of all positive results)
Recall = (# of correct positive values) / (# that should have been positive)
F1 Score = 2 x (precision x recall) / (precision + recall)
The F1 Score is the harmonic mean of precision and recall; F1 is best at a value of 1 and worst at a value of 0.
First model: F1 = 0.4
Second model: F1 = 1.0
Beware of missing false negatives by tuning too much too quickly; tuning is an iterative process over time
source:conf2016,splunk
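The three metrics compute directly from the confusion counts. A small helper (the example counts are hypothetical, chosen here to reproduce an F1 of 0.4 like the first model; the slides do not give the underlying counts):

```python
def validate(true_positives, false_positives, false_negatives):
    """Precision, recall, and F1 as defined above."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 1 true positive, 2 false positives, 1 missed anomaly
p, r, f1 = validate(1, 2, 1)
print(round(f1, 2))  # 0.4
```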
81. Alerts & Visualizations
• The output of the off-Splunk calculations can be picked
up by the Splunk UF or written to a flat file
• Allows the user to capitalize on the Splunk interface
• Advantages/Disadvantages of Indexing and
Sourcetyping:
• Treat like any other data source for calculations
• Technically "re-indexing" data; however, anomaly data sets will be small
source:conf2016,splunk
82. Refinement
• Treat different clusters with different models
• Continually validate data and results
• Understand why false positives come up
• Lengthen the training data window if possible
• If a cluster is not Gaussian, try other models, or try to fit the data to a normal distribution
• Compare against simple rule-based models such as "3 x mean = anomaly"
source:conf2016,splunk
83. Domain Expert on Insider Email Analytics
Consider not only a large number of recipients outside a user's normal behavior, but also the number of new recipients.
What is the average number of new recipients an employee emails each day? One? Five? Establish a set of training data and record the unique recipients over 60 days.
Create an anomaly detection rule that fires when the number of new recipients exceeds the baseline variance.
Add this to the "# of recipients per day" data for a higher-fidelity alert.
source:conf2016,splunk
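The new-recipient check above can be sketched in a few lines. The addresses and the threshold are invented; in practice the threshold would come from the variance observed over the 60-day baseline:

```python
def count_new_recipients(baseline_recipients, todays_recipients):
    """How many of today's recipients has this sender never emailed
    during the training window (e.g. the 60-day baseline)?"""
    known = set(baseline_recipients)
    return len({r for r in todays_recipients if r not in known})

# Hypothetical sender history vs. today's traffic
baseline = ["bob@corp.com", "carol@corp.com", "dan@corp.com"]
today = ["bob@corp.com", "eve@other.com", "mallory@other.com"]

new = count_new_recipients(baseline, today)
threshold = 5  # assumed: derived from the baseline's variance
print(new, new > threshold)  # 2 False
```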
84. Key Takeaways
• Machine Learning is an evolution in the tools available to us
• ML is not one thing; it's many different types of things that can be applied to different types of problems
• ML applications and techniques vary, so as with any other tool, it helps to use the right tool for the right problem space
• SIEMs are enhancing their capabilities to support ML algorithms and make our lives easier
85. Machine Learning in Splunk ITSI
Adaptive Thresholding:
• Learn baselines & dynamic thresholds
• Alert & act on deviations
• Manage for 1000s of KPIs & entities
• Stdev/Avg, Quartile/Median, Range
Anomaly Detection:
• Find “hiccups” in expected patterns
• Catches deviations beyond thresholds
• Uses advanced proprietary algorithm
86. User Behavior Analytics (UBA) in Splunk
• Understand normal & anomalous behaviors for ALL users
• UBA detects Advanced Cyberattacks and Malicious Insider Threats
• Lots of ML under the hood:
– Behavior Baselining & Modeling
– Anomaly Detection (30+ models)
– Advanced Threat Detection
• E.g., Data Exfil Threat:
– “Saw this strange login & data transfer for user mpittman at 3am in China…”
– Surface threat to SOC Analysts
87. Splunk Machine Learning Toolkit
Assistants: Guide model building, testing & deployment for common objectives
Showcases: Interactive examples for typical IT, security, business, IoT use cases
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300+ open source algorithms available for use; build custom analytics for any use case