Presented by: Michael Toecker, Digital Bond
Abstract: Control Systems are responsible for the safe and reliable governing of physical processes, and are designed to report conditions that could affect reliable operations to operators for action. These conditions may vary in their severity, from minor inconveniences to those that can bring the process to a full halt. While engineers have predicted certain events and consequences, others are “unknown unknowns”, and may only be detected due to variances from normal function.
Cyber security conditions are similar in nature: they vary in severity, and cyber security professionals can classify and alert on some, but not all, of them. In this presentation, Michael Toecker will discuss cyber security conditions that are known and that could be integrated into the operational display.
Treating cyber security events as analogous to control system events has many benefits and drawbacks, and Toecker will expand on criteria for determining what is appropriate for an operator display, and what is not. The purpose of this presentation is to demonstrate that cyber security can have a place in operational decisions, so long as conditions are carefully analyzed and response actions developed beforehand.
3. Monitoring and Response of Cyber
Security Events Originating from the
Control System Parallels the Monitoring
and Response of Process Events
The Premise
4.
• ICS Operations was similar to Security Operations
  ◦ ICS had alarms, SecOps had alarms
  ◦ ICS had events, SecOps had events
  ◦ ICS had historical points, SecOps had voluminous logs
  ◦ ICS had 24/7 Operators, SecOps had Analysts (some 24/7, others not)
  ◦ ICS had a responsibility for monitoring safe and effective production, SecOps had a responsibility for ensuring secure and trusted operations
I Spent a Year working as a Security Guy in an Operations Environment
ICS Ops vs. SecOps
5.
Task | SecOps | ICSOps
Visualizing Data using Graphs, Charts, etc. | X | X
Providing Status Indicators when parameters go out of normal | X | X
Directing Field Personnel to Take Specific Actions based on Events or Alarms | X | X
Reviewing Logs, Records, and Other Data to Improve Efficiency and Locate Problem Areas | X | X
Investigating for Compliance and Effect on Process, and finding ways to Prevent, Detect and Respond | X | X
What I Often Saw in ICS Operations was Paralleled in What I was Doing
Parallels
6.
• …was the data.
  ◦ My data was security logs, their data was from process points.
  ◦ But we were both identifying conditions that could impact our production or compliance, and taking some action to correct them
I was an Engineer with Specialized Knowledge of Specific Equipment
What was Different…
7.
• There is an emphasis on procedure and process when faced with issues
• Troubleshooting where advanced knowledge is required is conducted by those with the knowledge
• Operators follow known actions that will return a system to a stable state, usually developed by process engineers.
Operators Monitor & Respond, but Do Not Always Possess Specific Knowledge
The Role of Operators
9.
• Lack of Understanding and Confusion about Computer Security
• Owner Attitude is that Security has Nothing to Do with Operations
• Leads to Reduction in Situational Awareness
• Operators Don’t Know What Actions To Take
The Problems
10.
• Proper Notification Reduces Response Time to Security Incidents
• Regulatory Requirements can be Met With Existing Personnel
• Alerts and Events directly to 24/7 Personnel look Awesome as Compensating Controls
The Benefits
11.
Cyber Security Events & Incidents
Detectable w/ Security Monitoring
Security Events Operators Could Respond To
Not a Substitute for a Focused Security Monitoring Program
The Limitations
12.
Monitor, and Analyze
Identify Security Conditions
Identify Operational Events
Develop Procedures for Action
Implement Condition and Procedure
Security Monitoring Program Should Feed into Conditions for Operator Alerts
The Role of Security
13.
Monitor Data Points
Identify Process Conditions
Identify Operational Events
Develop Procedures for Action
Implement Condition and Procedure
This looks a lot like Process Intelligence Process, the only difference is the Analysis and Knowledge
….wait a minute.
14.
Identify Specific, Clear Cyber Security Events
Determine Events Appropriate for Operator Attention
Create Operations Procedures for Actions
Develop a Detection and Presentation Strategy
The Process
16.
…Clear
• No Ambiguity
• Straightforward Yes/No Decision Point
…Derivable
• Sourced Directly from Control Systems Security Data, not from Intuition or Analysis
…Actionable
• Specific Actions can be taken on receipt of the Event
• Not Dependent on Other Events, or on Further Analysis
An Operational Cyber Security Event Should Be..
Identify and Define
17.
• Questions to Ask
  ◦ What do my regulations tell me to be concerned with?
  ◦ What do various standards bodies tell me to be concerned with?
  ◦ Do I have specific policy statements that suggest alerting, or 24/7 response?
  ◦ What Lessons Learned Do I have related to Cyber Security?
Identify Cyber Security Conditions to Alert On
Identifying Events
This is my polite way of saying “If You Got Hacked, How Did it Happen?”
18.
List of Security Conditions
Regulations Require Monitoring and Action
Standards suggest an Approach
Security Policy may Specify Conditions
Lessons Learned from Security Incidents
Determine Conditions
19.
• CIP-007 R4 – Malicious Software Prevention
  ◦ Paraphrase: ~..shall use anti-virus software to detect malware on all Cyber Assets within the ESP~
  ◦ Conclusion: I should alert on anti-virus detections
• CIP-007 R5 – Monitoring Electronic Access
  ◦ Paraphrase: ~monitoring processes shall detect and alert for attempts at or actual unauthorized access~
  ◦ Conclusion: Attempts at unauthorized access include incorrect passwords, so alert on those.
Regulations, such as NERC CIP, may provide clues as to what events should be monitored
Regulations
Well, I did say clues…
Source: NERC CIP Standards, V3
20.
• Section 3.2.2 – Signs of an Incident
  ◦ ~Too many indicators exist to exhaustively list them~
  ◦ ~Common ones include multiple failed login attempts, deviations from normal network traffic, filenames with unusual characters..~
Standards can help as well, but still are clues, not firm guidance
Standards
Source: NIST SP 800-61
21.
• What I’ve seen in the past:
  ◦ ~Additions and Modifications of Users shall be conducted through the change control process~
  ◦ ~New Software on Control Systems requires approval by the Senior Manager~
Conditions may exist in your corporation’s IT Security Policies
Policy Remarks
22.
• Good Security Comes with Experience,
  ◦ Most Experience Comes from Failures in Security
• ….but it doesn’t have to be YOUR Failures in Security
  ◦ Talk, Listen, Learn
Why Information Sharing is Important.
Lessons Learned
23.
There are tons of events available, but not all are relevant or appropriate for Operations
Complex, Irrelevant
24.
Top Down
• Start from general security conditions
• Trim to Specific Events within those categories
Bottom Up
• Start with Every Potential Event that Could Be Generated
• Trim to Specific Events from the Potentials
There are Two Main Methods for Identifying Events
Methods to Identify
25.
• Specific Classes of Computer Security Events
  ◦ Virus Detection, Failed Logins, Disallowed Ports, etc.
  ◦ Good Source of Some Classes – NIST SP 800-53
• Useful for PC based systems, which often have a huge amount of capacity for security
Top Down is Good For Systems with Many Potential Events
Top Down Approach
27.
• Enumerate the Security Capabilities of the Device. Examples:
  ◦ Provides Specific Syslog Evidence
  ◦ Sets a Point when a Login Threshold has been reached
• Useful for Devices, where Capability is often limited
Bottom Up is Good for Systems with Limited Capability for Security
Bottom Up Approach
28.
Review of Manuals and Datasheets can identify detectable Cyber Security Events
Bottom Up Example
Source: S&C IntelliRuptor Instruction Sheet 766-560
29.
• Top Down
  ◦ Allows you to set criteria, and then delve into the system to find triggers to meet them
  ◦ Avoids the complexity of getting into the weeds of system events
  ◦ May miss important conditions due to avoiding those same weeds
• Bottom Up
  ◦ Complex, but most Detailed
  ◦ Requires analysis of many events that will likely never make it in front of an operator
There are advantages and disadvantages of each Approach
Compare and Contrast
30.
• Windows Based Computers are the obvious systems to use Top Down
  ◦ Event Heavy, Highly Complex
  ◦ Events were designed from an incident response perspective, not from an alert perspective
Use Top Down when a system is highly capable of reporting security events to narrow your range
When to Use an Approach
31.
• Systems like PLCs, Controllers, some Network Devices have limited capability to report security status
  ◦ Won’t be able to simply define events, you’ll have to work with what’s there
Use Bottom Up when working with devices that report on few security conditions
When to Use an Approach
32.
Condition | Source
Anti-Virus Detection | NERC CIP-007 R4
User Modified or Added | NERC CIP-007 R5
Security Logs Deleted | NERC CIP-007 R6
Security Logs Full | NERC CIP-007 R6
Excessive Incorrect Login | NERC CIP-007 R6
Use of Removable Media | Good Practice
New Software Installed | IT Policy
Logging Options Changed | IT Policy
The End Result of this Analysis is a List of Conditions to Alert On
List of Conditions
Note: This list is far from comprehensive
34.
• Is the Condition a Clear Cyber Security Event?
• Is the Condition Derivable directly from Logs, Alerts, and other evidence?
• Is the Condition Actionable by Operators?
Not Every Condition is Appropriate for Operator Notification
Appropriate for Operators
35.
Condition | Source
Anti-Virus Detection | NERC CIP-007 R4
User Modified or Added | NERC CIP-007 R5
Security Logs Deleted | NERC CIP-007 R6
Security Logs Full | NERC CIP-007 R6
Excessive Incorrect Login | NERC CIP-007 R6
Use of Removable Media | Good Practice
New Software Installed | IT Policy
Logging Options Changed | IT Policy
Unclear Conditions are Removed from the List
Is it Clear?
Note: This list is far from comprehensive
36.
Condition | Source
Anti-Virus Detection | NERC CIP-007 R4
Security Logs Deleted | NERC CIP-007 R6
Security Logs Full | NERC CIP-007 R6
Excessive Incorrect Login | NERC CIP-007 R6
Use of Removable Media | Lesson Learned
Remove Conditions that Cannot be Derived from Evidence, or that Require Analysis
Is it Derivable?
Note: This list is far from comprehensive
37.
Condition | Detection Method | Reliability
Anti-Virus Detection | Windows Event Log | Very Reliable, Test Indicates an event generated on each detection in SYSTEM log
Security Log Deleted | Windows Event Log | Very Reliable, an explicit event is created on clearing
Excessive Incorrect Login | Windows Event Log | Reliable, so long as the account lockout settings in SECPOL.msc are set correctly
Use of Removable Media | May require 3rd party program | Not Always Possible without 3rd Party Program
How Reliable are the Detection Methods? Do they have potential false positives?
Reliable and Unreliable Conditions
Note: This list is far from comprehensive
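A minimal sketch of deriving these conditions from Windows Event Log records. The slides do not name event IDs; 1102 (audit log cleared) and 4740 (account locked out) are the documented Security log IDs on Windows Vista / Server 2008 and later, chosen here as an assumption. The anti-virus provider name and ID are hypothetical, since they depend on the product in use, and the record layout is an illustrative dictionary rather than a real log API.

```python
from typing import Optional

# Security log event IDs (Windows Vista / Server 2008 and later).
# Assumption: the slides name no IDs; these are illustrative choices.
SECURITY_LOG_RULES = {
    1102: "Security Log Deleted",       # "The audit log was cleared"
    4740: "Excessive Incorrect Login",  # "A user account was locked out"
}

# Hypothetical anti-virus provider and event ID; real values vary by product.
AV_PROVIDER = "ExampleAV"
AV_DETECTION_ID = 9001


def classify(record: dict) -> Optional[str]:
    """Return the operator-facing condition for one event record, or None."""
    if record["log"] == "Security":
        return SECURITY_LOG_RULES.get(record["event_id"])
    if (record["log"] == "System"
            and record["provider"] == AV_PROVIDER
            and record["event_id"] == AV_DETECTION_ID):
        return "Anti-Virus Detection"
    return None


# Hypothetical records, shaped as plain dictionaries for illustration.
sample = [
    {"log": "Security", "provider": "Microsoft-Windows-Eventlog", "event_id": 1102},
    {"log": "Security", "provider": "Microsoft-Windows-Security-Auditing", "event_id": 4740},
    {"log": "Security", "provider": "Microsoft-Windows-Security-Auditing", "event_id": 4624},
    {"log": "System", "provider": "ExampleAV", "event_id": 9001},
]
conditions = [classify(r) for r in sample]
```

Keying on the lockout event rather than counting individual failed logins mirrors the table: the condition is only as reliable as the account lockout settings behind it.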
38.
Remove Conditions that an Operator cannot Realistically take Action On
Is it Actionable?
Condition | Source
Anti-Virus Detection | NERC CIP-007 R4
Security Logs Deleted | NERC CIP-007 R6
Security Logs Full | NERC CIP-007 R6
Excessive Incorrect Login | NERC CIP-007 R6
39.
Why were some of the conditions removed?
An Aside…
Condition | Reason for Removal
User Modified or Added | Not Clear, as there are legitimate reasons for adding or modifying a User, and these reasons aren’t apparent without analysis.
Security Log Full | Not Actionable, as operators should not be doing maintenance and admin functions.
Removable Media | Not Derivable on most systems as is. May require a 3rd Party program to do a decent job of this.
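The three screening questions can be sketched as a simple filter. This is a minimal illustration, not the presenter's tooling: the `Condition` class and `first_failed_test` helper are hypothetical names, and only the failing flag for each removed condition comes from the slides. The remaining flags are assumed True for the sake of the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    name: str
    clear: bool
    derivable: bool
    actionable: bool


# Flags set to False follow the "Reason for Removal" table; all others are assumed.
CANDIDATES = [
    Condition("Anti-Virus Detection", True, True, True),
    Condition("Security Logs Deleted", True, True, True),
    Condition("Excessive Incorrect Login", True, True, True),
    Condition("User Modified or Added", False, True, True),
    Condition("Security Logs Full", True, True, False),
    Condition("Use of Removable Media", True, False, True),
]


def first_failed_test(c: Condition) -> Optional[str]:
    """Return the first screening question the condition fails, or None."""
    if not c.clear:
        return "Not Clear"
    if not c.derivable:
        return "Not Derivable"
    if not c.actionable:
        return "Not Actionable"
    return None


kept = [c.name for c in CANDIDATES if first_failed_test(c) is None]
removed = {c.name: first_failed_test(c)
           for c in CANDIDATES if first_failed_test(c) is not None}
```

Recording the reason alongside each removed condition keeps the decision reviewable if new tooling later makes a condition derivable.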
40.
• Example: Removable Media Detection
  ◦ Wasn’t able to do this in Native Windows in a Clear and Derivable manner
  ◦ Use of Third Party tools can change this, making it possible to monitor and alert
A Previously Rejected Condition can become valid with New Information or Technology
When Conditions Change
41.
• USB Based Infection Lesson Learned
  ◦ New USB Shows up in Registry Change
  ◦ Auto-Run Shows up in Registry Change
  ◦ Addition of Programs to the “Run” and “RunOnce” keys in the Registry
  ◦ Copying of Files into “System”, “System32”
• Is this Clear? Definable? Actionable?
Some of the More Advanced Conditions That We Can Define
Let’s Get Crazy…
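One of these conditions, additions to the “Run” and “RunOnce” keys, can be sketched as a comparison of two registry snapshots. This is a sketch under stated assumptions: the snapshots are plain dictionaries gathered by some other means (no registry API is called here), the sample entries are invented, and `new_autorun_entries` is a hypothetical helper name.

```python
# Standard per-machine autorun key paths on Windows.
RUN_KEYS = (
    r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run",
    r"HKLM\Software\Microsoft\Windows\CurrentVersion\RunOnce",
)


def new_autorun_entries(before: dict, after: dict) -> list:
    """Return (key, value name, command) for entries added since the baseline."""
    added = []
    for key in RUN_KEYS:
        baseline = before.get(key, {})
        for name, command in after.get(key, {}).items():
            if name not in baseline:
                added.append((key, name, command))
    return added


# Hypothetical snapshots: {key path: {value name: command}}.
baseline_snapshot = {
    RUN_KEYS[0]: {"hmi_client": r"C:\HMI\client.exe"},
    RUN_KEYS[1]: {},
}
current_snapshot = {
    RUN_KEYS[0]: {"hmi_client": r"C:\HMI\client.exe",
                  "updater": r"E:\updater.exe"},
    RUN_KEYS[1]: {},
}
added = new_autorun_entries(baseline_snapshot, current_snapshot)
```

Whether such a change passes the Clear and Actionable tests still depends on the site: a legitimate software install produces the same evidence.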
42.
List of Conditions has been generated, what next?
What Comes Next?
Condition | Detection Method | Reliability
Anti-Virus Detection | Windows Event Log | Very Reliable, Test Indicates an event generated on each detection in SYSTEM log
Event Log Was Cleared | Windows Event Log | Very Reliable, an explicit event is created on clearing
Excessive Incorrect Login | Windows Event Log | Reliable, so long as the account lockout settings in SECPOL.msc are set correctly
44.
• Notifying Operators of Cyber Security Events is useless if the Operator has no action to take
• This guidance typically takes the form of Operational Procedures
• Each Event must have an appropriate action to be taken
This is Now a Procedure Exercise
Operator Actions
45.
• “Notify Lead I&C Engineer by Phone”
• “Isolate Infected System From Network by Disconnecting Ethernet”
• “Call Out via Radio to check if invalid login is from authorized user”
Be Succinct and Specific
Guidelines for Actions
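A minimal sketch of tying each alerted condition to one succinct action, with a check that no condition reaches the operator without a procedure. The action wording is taken from the examples above, but the pairing of each action with a condition is an assumption made for illustration, and the function names are hypothetical.

```python
# Assumption: which action belongs to which condition is not stated in the slides.
PROCEDURES = {
    "Anti-Virus Detection":
        "Isolate Infected System From Network by Disconnecting Ethernet",
    "Excessive Incorrect Login":
        "Call Out via Radio to check if invalid login is from authorized user",
    "Security Log Deleted":
        "Notify Lead I&C Engineer by Phone",
}

ALERTED_CONDITIONS = [
    "Anti-Virus Detection",
    "Security Log Deleted",
    "Excessive Incorrect Login",
]


def missing_procedures(conditions: list, procedures: dict) -> list:
    """Return the conditions that would alert with no operator action defined."""
    return [c for c in conditions if not procedures.get(c)]


def operator_action(condition: str) -> str:
    """Look up the single action shown to the operator for a condition."""
    return PROCEDURES[condition]


gaps = missing_procedures(ALERTED_CONDITIONS, PROCEDURES)
```

Running the gap check before a condition is added to the display enforces the rule that every event must have an action.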
46.
• No IT Administrative Functions
• No Maintenance Functions
• Limit the Analysis Necessary
• …and don’t give them someone else’s work
Keep the Guidance within Operator’s Authorized Abilities
Guidelines for Actions
49.
• Case in Point – Conficker (MS08-067)
  ◦ Highly Aggressive Worm which impacts network communication
  ◦ Makes use of a very reliable exploit in the Server service
  ◦ Attempts to brute force accounts
  ◦ Spreads over USB and removable media as well
Some Cyber Security Events may Cause Production Impacts
Worst Case Scenario
50.
• A Highly Aggressive worm like Conficker can have production consequences.
  ◦ Continuing to operate while this is going on is risky.
  ◦ Who makes the decision to halt production? Operator? Shift Supervisor? Plant Manager?
• Make sure the information gets to those who make the decision.
What guidance would prepare an operator for these Alarms?
Worst Case Scenario
52.
• Most Cited:
  ◦ The Alarm Management Handbook
  ◦ The High-Performance HMI Handbook
• Written by Bill Hollifield and Paul Gruhn
  ◦ Of Course, Nothing Specific on Security
There is already a lot of guidance on development of Operator Displays
Operator Displays
53.
• Help Operators Perceive the Important Security Data
• Give Operators Data-in-Context
• Help Them Comprehend the Situation in Terms of the Process
• Help Predict Future Status by Providing Trending
Guidelines for Cyber Security Displays
Operator Displays
- Tough right now… at least without giving access to a SIEM
54.
Cyber Security Master Display
Anti-Virus Status Display
Users Status Display
Removable Media Status Display
Event Log Status Display
Concept Operator Display
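One way to read this concept is that the master display summarizes the four status displays beneath it. The sketch below assumes that hierarchy (the slide is a diagram and does not spell it out) and uses a hypothetical two-level status, with the master showing the worst status of any sub-display.

```python
from enum import IntEnum


class Status(IntEnum):
    NORMAL = 0
    ALARM = 1


# Sub-display names from the concept display; the roll-up rule is an assumption.
SUB_DISPLAYS = [
    "Anti-Virus Status",
    "Users Status",
    "Removable Media Status",
    "Event Log Status",
]


def master_status(sub_status: dict) -> Status:
    """Master display shows the worst status reported by any sub-display."""
    return max(sub_status.values(), default=Status.NORMAL)


def displays_in_alarm(sub_status: dict) -> list:
    """Names of sub-displays the operator should open next."""
    return [name for name, status in sub_status.items() if status != Status.NORMAL]


# Hypothetical readings for illustration.
readings = {name: Status.NORMAL for name in SUB_DISPLAYS}
readings["Anti-Virus Status"] = Status.ALARM
overall = master_status(readings)
to_open = displays_in_alarm(readings)
```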
56.
• Many HMIs can accept SNMP Traps
  ◦ Often used for alerting when hosts stop communicating
  ◦ Security tools can feed this, in certain conditions
• Security Logs don’t Translate Well into traditional displays
  ◦ How do you ‘trend’ when you have thousands of event ids?
Summary: Limited, and Nowhere Near Ideal
Integration with the HMI
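One possible response to the trending question, offered as a sketch rather than anything the presentation proposes: collapse the event IDs into the handful of conditions already selected for operators, then count occurrences per hour so each condition becomes a single trendable value. The mapping and sample events are illustrative; 4625, 4740 and 1102 are documented Windows Security log IDs, and `hourly_trend` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

# Illustrative mapping from Windows Security log event IDs to operator conditions.
EVENT_ID_TO_CONDITION = {
    4625: "Failed Login",      # "An account failed to log on"
    4740: "Account Lockout",   # "A user account was locked out"
    1102: "Log Cleared",       # "The audit log was cleared"
}


def hourly_trend(events: list) -> Counter:
    """Count mapped events per (hour, condition); unmapped IDs are ignored."""
    counts = Counter()
    for timestamp, event_id in events:
        condition = EVENT_ID_TO_CONDITION.get(event_id)
        if condition is None:
            continue
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour, condition)] += 1
    return counts


# Hypothetical (timestamp, event ID) pairs.
events = [
    (datetime(2013, 6, 1, 8, 5), 4625),
    (datetime(2013, 6, 1, 8, 40), 4625),
    (datetime(2013, 6, 1, 9, 10), 4625),
    (datetime(2013, 6, 1, 9, 15), 5156),  # not in the mapping, so ignored
]
trend = hourly_trend(events)
```

Each (hour, condition) count could then be written to a single HMI point, which is a form traditional trend displays already handle.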
58. More Research at S4
• Digital Bond’s S4 Conference in Miami Beach, January 2014
• Got an Idea?
  ◦ Submit a presentation!
• Details on DigitalBond.com