Security in a Site Reliability Engineering (SRE) context, approached pragmatically, just makes sense. In this talk, we will look at four key areas where the SRE and Security tribes can join forces and influence the business as a whole. This is a lab/discussion session.
A Pragmatic Union: Security and SRE
1. #RSAC
SESSION ID: LAB2-T14
DevSecOps
James Wickett
Verica
@wickett
Courtney Nash
Internet Incident Librarian
Verica
@courtneynash
2. #RSAC
What we say here, stays here.
(sort of)
8. #RSAC
Modern Traffic
https://medium.com/adobetech/adobe-contributes-to-netflixs-vizceral-open-source-code-dec7aaf5d43e
9. #RSAC
Failure is an inevitable by-product of a complex system's normal functioning
12. #RSAC
SRE and Security: A Lot of Common Ground
Safety Margin
Availability
Chaos Engineering
13. #RSAC
Definitions
Common ground is a critical component of joint activity: the things we do together every day at work, regardless of our roles.
Joint activity is a mutual agreement to coordinate on both content (the Who and What) and process (the How and When) toward a shared goal.
14. #RSAC
Effective Coordination
Interpredictability is knowing how other agents in the system will behave
Directability is the ability to influence others to reframe, or to reframe yourself
Common ground builds group intuition in real time
15. #RSAC
Common Ground
"Two people's common ground is, in effect, the sum of their mutual, common, or joint knowledge, beliefs, and suppositions."
- Herbert H. Clark
Common ground is not just agreeing on what you will do and when. It is a form of self-awareness.
Teams must understand how common ground gets staked out and used in different situations.
Most important of all, common ground is not an endpoint or a destination: it is an ongoing process.
16. #RSAC
Possible areas of common ground for SRE and Security
Do we share a basic understanding of the system(s) we are dealing with?
Do we share, and agree on, the status of what has transpired in that system?
What knowledge has changed within and across teams since we started working together?
17. #RSAC
How the Breakouts will work
5 minutes long
The notetaker for each group is decided alphabetically, using the character of each Zoom display name that matches the breakout number (1st character for Breakout 1, 2nd for Breakout 2, and so on)
Give space for others to share
Report out in chat at the end of the breakout
Only report out shareable learnings/discussion!
We will be collating all the results in a shared doc
19. #RSAC
Breakout 1
What problems are you experiencing related to availability, security, and chaos engineering?
Bonus: how is it going between SRE and Security?
5 minutes, go!
Notetaker is decided alphabetically by the first character of their display name
22. #RSAC
Drifting into failure is a gradual, incremental decline into disaster driven by environmental pressure, unruly technology and social processes that normalize growing risk. No organization is exempt from drifting into failure.
23. #RSAC
Safety Margin expresses how much stronger a system is than it needs to be for an intended load.
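The slide defines safety margin in words. One common engineering convention, assumed here rather than taken from the talk, expresses it as a factor of safety (capacity divided by intended load) and a margin (that factor minus one). A minimal sketch with hypothetical throughput numbers:

```python
def safety_factor(capacity: float, intended_load: float) -> float:
    """Ratio of what the system can handle to what it is expected to handle."""
    if intended_load <= 0:
        raise ValueError("intended_load must be positive")
    return capacity / intended_load


def safety_margin(capacity: float, intended_load: float) -> float:
    """Fraction by which capacity exceeds the intended load (0.0 means no margin)."""
    return safety_factor(capacity, intended_load) - 1.0


# Hypothetical service: tested to 1,200 requests/s, expected peak of 800 requests/s.
factor = safety_factor(1200, 800)
margin = safety_margin(1200, 800)
print(f"safety factor {factor:.2f}, margin {margin:.0%}")
# safety factor 1.50, margin 50%
```

A margin near zero means any surprise in load, or a loss of capacity, eats straight into availability.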
24. #RSAC
“Resilience is not a property that a system has, resilience is something that a system does.”
- John Allspaw, "Amplifying Sources of Resilience", QCon London 2019
25. #RSAC
Resilience is…
A rebound from trauma and a return to equilibrium
The opposite of brittleness: able to extend when surprises challenge the system's boundaries
Architected with the ability to adapt to future conditions
26. #RSAC
Resilience
Resilience represents the ability not only to recover from threats and stresses but also to perform as needed under a variety of conditions and to respond appropriately to both disturbances and opportunities.
27. #RSAC
Failures are a systems problem because there is not enough safety margin.
- Adrian Cockcroft
30. #RSAC
Safety Automation
https://www.wired.com/story/opinion-the-plane-paradox-more-automation-should-mean-more-training/
Contrary to popular myth, pilot error is not the cause of most accidents. This belief is a manifestation of hindsight bias and the false belief in linear causality. It’s more accurate to say that pilots sometimes find themselves in scenarios that overwhelm them. More automation may very well mean more overwhelming scenarios. This may be one reason why the rate of fatal large commercial airplane crashes per million flights in 2020 was up over 2019.
31. #RSAC
Breakout 2: Safety and Resilience
Where are Safety and Security at odds in your organization?
What areas of common ground can you find between them?
5 minutes, go!
Notetaker is decided alphabetically by the second character of their display name
34. #RSAC
Security Chaos Engineering
The identification of security control failures through proactive experimentation to build confidence in the system’s ability to defend against malicious conditions in production.
35. #RSAC
4 Components of Security Chaos Engineering
Define the expected behavior of a security defense
Hypothesize that when security turbulence is introduced, it will be prevented, remediated, or detected
Introduce a variable that creates security turbulence
Try to disprove the hypothesis by looking for a difference between expected and actual behavior
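The four components above can be sketched as a tiny experiment harness. Everything here is hypothetical: `SecurityGroup` is a toy in-memory stand-in for a firewall rule set, `detect_drift` stands in for whatever detective control you actually run, and this sketch only exercises the "detected" branch of the hypothesis, not prevention or remediation.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    """Hypothetical in-memory stand-in for a firewall rule set: ports open to the internet."""
    open_ports: set[int] = field(default_factory=set)


def detect_drift(group: SecurityGroup, baseline: set[int]) -> set[int]:
    """Hypothetical detective control: report open ports outside the approved baseline."""
    return group.open_ports - baseline


def run_experiment(group: SecurityGroup, baseline: set[int], turbulent_port: int) -> dict:
    # 1. Expected behavior: any port outside the baseline is reported as drift.
    # 2. Hypothesis: opening an unapproved port will be detected.
    was_open = turbulent_port in group.open_ports
    # 3. Introduce a variable: open the port (the security turbulence).
    group.open_ports.add(turbulent_port)
    try:
        detected = turbulent_port in detect_drift(group, baseline)
    finally:
        # Always roll back the injected change, whatever the outcome.
        if not was_open:
            group.open_ports.discard(turbulent_port)
    # 4. Try to disprove the hypothesis: compare actual detection with the expectation.
    return {"port": turbulent_port, "hypothesis_held": detected}


sg = SecurityGroup(open_ports={443})
print(run_experiment(sg, baseline={443}, turbulent_port=22))
# {'port': 22, 'hypothesis_held': True}
print(sg.open_ports)  # {443}
```

A real experiment would inject the change into an actual environment, within an agreed blast radius, and observe the real control; the shape (hypothesis first, rollback always) is the part that carries over.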
36. #RSAC
SCE experiments don’t…
validate a config; they exercise it
check authentication privileges; they attempt to thwart them
trust network settings; they send real traffic
check application policy; they interact with the application
build a model from infrastructure templates; they build understanding from experimentation
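To illustrate "they send real traffic" instead of trusting network settings, here is a minimal sketch that attempts an actual TCP connection to a port that policy says should be blocked. The host and port you pass in are placeholders, and you should only probe systems you are authorized to test.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send real traffic: attempt a TCP handshake rather than reading a config file."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


def check_port_blocked(host: str, port: int) -> str:
    # The expectation (from policy) is that this port is NOT reachable.
    if port_is_reachable(host, port):
        return f"FAIL: {host}:{port} accepted a connection"
    return f"PASS: {host}:{port} is blocked"
```

A connection that succeeds is evidence the control failed, regardless of what the firewall configuration claims.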
40. #RSAC
by writing and running runtime checks to ensure that the application is always deployed correctly, configured correctly, and running safely.
Chapter 13: Operations
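A minimal sketch of such runtime checks, assuming the running service exposes its effective configuration as a dictionary. The keys `version`, `debug`, and `tls_min_version` and the expected values are hypothetical examples, not from the source. To make the checks run "always", you would schedule them (for example on each deploy and periodically afterwards) rather than run them once.

```python
def run_runtime_checks(cfg: dict, expected_version: str) -> dict:
    """Evaluate each runtime check against the live config; True means the check passed."""
    return {
        # Deployed correctly: the running build is the one we meant to ship.
        "expected_version": cfg.get("version") == expected_version,
        # Configured correctly: debug mode must be explicitly off in production.
        "debug_disabled": cfg.get("debug") is False,
        # Running safely: only modern TLS versions are accepted.
        "tls_min_version_ok": cfg.get("tls_min_version") in {"1.2", "1.3"},
    }


def failing_checks(results: dict) -> list:
    """Names of the checks that did not pass, in the order they were run."""
    return [name for name, passed in results.items() if not passed]


# Hypothetical effective configuration reported by a running service.
live_config = {"version": "2.4.1", "debug": True, "tls_min_version": "1.2"}
print(failing_checks(run_runtime_checks(live_config, expected_version="2.4.1")))
# ['debug_disabled']
```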
41. #RSAC
Breakout 3: Chaos Engineering
What is the attitude towards Chaos Engineering (or SCE) at your organization?
What areas of common ground can you find for Security Chaos Engineering in your organization?
Bonus: what SCE experiments could you get up and running in your organization?
Notetaker is decided alphabetically by the third character of their display name. 5 minutes, go!
44. #RSAC
SLOs, SLAs, SLIs
Service Level Objective (SLO) - target reliability (or security) for a given service
Service Level Agreement (SLA) - contractual obligation to customers
Service Level Indicator (SLI) - the measurement of a service outcome that you think matters to users
https://sre.google/workbook/implementing-slos/
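The SRE workbook linked above suggests expressing an SLI as the ratio of good events to valid events and comparing it against the SLO target. A minimal sketch; the function names and the login-service numbers are hypothetical:

```python
def sli(good_events: int, valid_events: int) -> float:
    """SLI as the proportion of valid events that were good."""
    if valid_events == 0:
        return 1.0  # no traffic in the window; treating this as "met" is a policy choice
    return good_events / valid_events


def meets_slo(good_events: int, valid_events: int, slo: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli(good_events, valid_events) >= slo


# Hypothetical month for a login service: 1,000,000 valid requests, 999,200 good ones.
print(sli(999_200, 1_000_000))               # 0.9992
print(meets_slo(999_200, 1_000_000, 0.999))  # True
```

For the "(or security)" reading, the security input lives in the definition of a good event; for example, you might decide that a legitimate login wrongly blocked by fraud tooling counts as bad. That definition is an illustrative assumption.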
45. #RSAC
Error Budgets
Error Budget = 100% - SLO
The error budget is the slack in the system
It allows a balance between feature velocity, reliability (and security)
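Error Budget = 100% - SLO can be turned into concrete numbers. A minimal sketch, assuming a 30-day window and a request-based SLI; the 99.9% target and the request counts are hypothetical:

```python
def error_budget(slo: float) -> float:
    """Error budget = 100% - SLO, as a fraction (0.999 -> 0.001)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("SLO must be between 0 and 1 (exclusive)")
    return 1.0 - slo


def budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime the budget allows over the window."""
    return error_budget(slo) * window_days * 24 * 60


def budget_remaining(slo: float, good: int, valid: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    if valid == 0:
        return 1.0  # nothing happened, so nothing was spent
    allowed_bad = error_budget(slo) * valid
    actual_bad = valid - good
    return 1.0 - actual_bad / allowed_bad


print(f"{budget_minutes(0.999):.1f} minutes per 30 days")  # 43.2 minutes per 30 days
print(f"{budget_remaining(0.999, 999_200, 1_000_000):.0%} of budget left")  # 20% of budget left
```

The unspent 20% is the slack that can go to risky changes, whether feature launches or security work, before reliability takes priority.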
46. #RSAC
Breakout 4: Availability
A FinTech global payment company runs an auth service that regularly deals with account takeover (ATO) attacks. Each month, the team assesses whether the tooling is too sensitive.
How would you set up an SLO for the service?
What would go into your error budget determination?
Notetaker is decided alphabetically by the fourth character of their display name. 5 minutes, go!
48. #RSAC
Apply What We Have Learned Today
Next week you should:
– Identify one area to build Common Ground
In the first three months following this presentation you should:
– Find areas to experiment within security or SRE, to collect data for future collaboration
– Use availability incidents and outages as a way to collaborate between tribes
Within six months you should:
– Have had conversations between groups around Safety, Availability, and Chaos
– Have built automation and experimentation under Security Chaos Engineering