4. Possible Scenarios
Ref Description of example scenario
1 Permanent loss of 2+ staff members (death/dismissal/leaving) within 1 month.
2 Contagious illness causes shortage of staff, e.g. new flu strain causes ½ of staff to be absent >1 week.
3 Training and holiday commitments cause shortage of staff, e.g. 2 staff down >3 days.
4 Fire guts office; all equipment (laptops/computers/screens/phones/printers) lost.
5 Break-in leads to the theft of laptop and desktop computers from OxCERT office.
6 Loss of service: plumbing, heating, telephony, internet access (VoIP).
7 Evacuation due to gas leak: unexpected loss of access to offices (>4h). Offices undamaged.
8 Unexpected short-term loss of mains power to a data centre (<2h). No damage to equipment.
9 DDoS on JANET causes loss of internet connectivity for a prolonged period (>4h).
10 Loss of fibre connectivity between DCs.
11 Incident causing irrecoverable loss of equipment at data centre, e.g. fire.
12 Loss of mains power to OxCERT offices in Wellington Square (<2h).
13 [Within Uni] Loss of VM in hosting.
14 [With vendor] Disruption to AV signature distribution; mail and desktop AV cannot be updated.
15 Component failure on the server acting as Xen (VM) host causes crash and failure to restart.
16 Cryptolocker-style compromise on NAS leads to data becoming irretrievable due to encryption.
17 Rootkit infection of bastion host requires it to be isolated for investigation and rebuild.
18 Police seizure of server for criminal investigation.
5. Possible Scenarios
Ref Description of example scenario (grouped by resource impacted)
Resource impacted: Lack of people
1 Permanent loss of 2+ staff members (death/dismissal/leaving) within 1 month.
2 Contagious illness causes shortage of staff, e.g. new flu strain causes ½ of staff to be absent for >1 week.
3 Training and holiday commitments cause shortage of staff, e.g. 2 staff down >3 days.
Resource impacted: Lack of access
4 Fire guts office; all equipment (laptops/computers/screens/phones/printers) lost.
5 Break-in leads to the theft of laptop and desktop computers from OxCERT office.
6 Loss of service: plumbing, heating, telephony, internet access (VoIP).
7 Evacuation due to gas leak: unexpected loss of access to offices (>4h). Offices undamaged.
Resource impacted: Lack of infrastructure
8 Unexpected short-term loss of mains power to a data centre (<2h). No damage to equipment.
9 DDoS on JANET causes loss of internet connectivity for a prolonged period (>4h).
10 Loss of fibre connectivity between DCs.
11 Incident causing irrecoverable loss of equipment at data centre, e.g. fire.
12 Loss of mains power to OxCERT offices in Wellington Square (<2h).
Resource impacted: 3rd party service
13 [Within Uni] Loss of VM in hosting.
14 [With vendor] Disruption to AV signature distribution; mail and desktop AV cannot be updated.
Resource impacted: Miscellaneous
15 Component failure on the server acting as Xen (VM) host causes crash and failure to restart.
16 Cryptolocker-style compromise on NAS leads to data becoming irretrievable due to encryption.
17 Rootkit infection of bastion host requires it to be isolated for investigation and rebuild.
18 Police seizure of server for criminal investigation.
6. Our outlook: Guarded optimism
Hope for the best, plan for the worst
7. Artefacts & Audiences
[Diagram: four artefacts and their audiences. Artefacts: Business Impact Assessment (BIA), Business Continuity Plan (BCP), Disaster Recovery Procedures, Backup arrangements. Labels: Keeping running; Restarting from scratch; Parameters; Potential Scenarios; Operations; Exercises. Audiences: Engineering, Management.]
8. Principles (& dog food)
❖ Eating your own dog food (Credibility)
Get our own house in order before we start laying down the law to others.
❖ Being open (& setting users' expectations)
Be transparent about the service levels we set & be held to account by our users when we fall short.
❖ Building a predictable response
Do the engineering, planning and testing to have confidence we can achieve the targets.
9. CERT Requirement
OxCERT must continue to operate even where there is significant damage to, or sustained hostile activity against, ourselves or the network infrastructure of the University we defend.
Be resilient
10. Cyber Resilience - is this new?
Traditional information security
Assumes a stable environment, evolutionary change
Aim: Deal effectively with known risks / threats
❖ Best practice
❖ Lessons learned
❖ Risk averse
Cyber Resilience (Culture)
Assumes turbulent environment / disruptive technologies, step changes which are unknown / unpredictable
Aim: Anticipate & adapt
❖ Agility - Ability to change
❖ Anticipating / Forward looking
❖ Innovation / creativity to meet threats
11. Cyber Resilience - is this new?
Traditional information security: Getting better
Assumes a stable environment, evolutionary change
Aim: Deal effectively with known risks / threats
❖ Best practice
❖ Lessons learned
❖ Risk averse
Cyber Resilience (Culture): Getting different
Assumes turbulent environment / disruptive technologies, step changes which are unknown / unpredictable
Aim: Anticipate & adapt
❖ Agility - Ability to change
❖ Anticipating / Forward looking
❖ Innovation / creativity to meet threats
12. Business Organisation Impact Assessment
It's not about how or why or the likelihood of a failure, just focus on ‘if’.
13. Artefacts & Audiences
[Diagram repeated from slide 7: Business Impact Assessment (BIA), Business Continuity Plan (BCP), Disaster Recovery Procedures and Backup arrangements, with labels Keeping running; Restarting from scratch; Parameters; Potential Scenarios; Operations; Exercises, and audiences Engineering and Management.]
14. What did we need to think about?
Geographic locations OxCERT operates from
The services we offer and the relative priorities for recovering them
Dependencies
❖ Stakeholders who depend on OxCERT
❖ External systems, services, vendors OxCERT depends on
Single points of failure in our infrastructure
Key person risks in the team
15. The shape of a disaster
[Figure: service level (100% at BAU) plotted against time. Labelled points and intervals: Last good backup, RPO, Disaster strikes, Response, Downtime, Recovery Time Objective, Minimum Acceptable Service Level, Recovery achieved, Maximum Acceptable Outage, Recovery failed, Full service restored.]
16. The shape of a disaster
[Figure: service level (100%) plotted against time, highlighting the Recovery Time Objective. Labelled points and intervals: Disaster strikes, Response, Downtime, Minimum Acceptable Service Level, Recovery achieved.]
17. The shape of a disaster
[Figure: service level (100%) plotted against time, highlighting the Maximum Acceptable Outage. Labelled points and intervals: Disaster strikes, Response, Minimum Acceptable Service Level, Recovery succeeded, Recovery failed, Full service restored.]
18. OxCERT BIA: On one page….
Service name                             Relative priority   Recovery time objective (RTO)   Maximum Acceptable Outage (MAO)
Security Incident Response               1                   3 days                          1 week
Network monitoring                       2                   1 week                          2 weeks
Advising and alerting (vulnerabilities)  3                   2 weeks                         2 months
A Business Impact Assessment on a page
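As an illustration only, the one-page table above could be held as data and used to check how an outage compares with the agreed targets. This is a minimal sketch, not OxCERT's tooling: the names `ServiceTarget`, `outage_status` and `recovery_order` are hypothetical, and "2 months" is approximated as 60 days.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceTarget:
    name: str
    priority: int      # 1 = recover first
    rto: timedelta     # Recovery Time Objective
    mao: timedelta     # Maximum Acceptable Outage


# Figures copied from the BIA table; "2 months" is approximated as 60 days (assumption).
BIA = [
    ServiceTarget("Security Incident Response", 1, timedelta(days=3), timedelta(weeks=1)),
    ServiceTarget("Network monitoring", 2, timedelta(weeks=1), timedelta(weeks=2)),
    ServiceTarget("Advising and alerting (vulnerabilities)", 3, timedelta(weeks=2), timedelta(days=60)),
]


def outage_status(target, outage):
    """Classify an outage duration against a service's RTO and MAO (hypothetical labels)."""
    if outage <= target.rto:
        return "within RTO"
    if outage <= target.mao:
        return "RTO missed, within MAO"
    return "MAO exceeded"


def recovery_order(targets):
    """Return service names in the order they should be recovered."""
    return [t.name for t in sorted(targets, key=lambda t: t.priority)]


if __name__ == "__main__":
    for service in BIA:
        print(service.name, "->", outage_status(service, timedelta(days=5)))
```

Keeping the targets in one structure means exercises and real incidents can be scored against the same signed-off numbers.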
19. How service impact grows over time…
e.g. Security incident response service
[Chart: impact level (Catastrophic, High, Acceptable, Marginal) plotted against outage duration (2h, 4h, 8h, 24h, 48h, 1 week, 2 weeks, 1 month), with the MAO marked on the Catastrophic row.]
20. BIA Reflections
Conducted in Q3/Q4 2016
❖ Planned 9.5 days' effort, an underestimate
❖ Biggest issue: capturing what we did in a structured way.
Keep it simple:
Focus on identifying a few high-level services (then divide these down into internal activities)
Quick wins! Analysis helped us identify:
• Single points of failure - firewall, Office VPN server
• Key person risks - sysadmin skills
Buy-in - Targets were:
• Reviewed by team & management
• Signed off by CISO
22. Artefacts & Audiences
[Diagram repeated from slide 7: Business Impact Assessment (BIA), Business Continuity Plan (BCP), Disaster Recovery Procedures and Backup arrangements, with labels Keeping running; Restarting from scratch; Parameters; Potential Scenarios; Operations; Exercises, and audiences Engineering and Management.]
23. Recognize
1. Disaster occurs
2. Perform an initial damage assessment
3. Activate the plan? (No: stop. Yes: continue)
Phase                         Objective
1 DISASTER OCCURRENCE         Safety of staff and visitors
2 INITIAL DAMAGE ASSESSMENT   Develop an initial overview of the situation
3 ACTIVATING THE PLAN         Decide whether to activate the plan based on the initial damage assessment of locations and systems
24. React
4. Form Recovery Team & designate Coordinator (reached when the answer at step 3 is Yes)
(5). Relocate Recovery Team to alternate site & establish operations?
Phase                             Objective
4 FORM RECOVERY TEAM              Form the recovery team, designate a recovery coordinator
5 (RELOCATE TO ALTERNATE SITE)    Establish a working environment from which to conduct the recovery and resume services.
25. Recover
6. Open an incident log & communicate to key staff & teams
7. Incident coordination. Execute specific recovery procedures
8. Stand down the Recovery Team & transition back to normal operations
Phase                               Objective
6 OPEN AN INCIDENT LOG              Maintain a record of key milestones and decisions taken during the recovery process
  EXTERNAL COMMUNICATION ACTIONS    Inform key staff and teams that recovery is underway
7 INCIDENT COORDINATION             Limit damage, prioritise performing recovery procedures, estimate recovery time.
8 STANDING DOWN                     Establish business as usual, inform key staff and teams
26. Recognize, React, Recover
Recognize
1. Disaster occurs
2. Perform an initial damage assessment
3. Activate the plan? (No: stop. Yes: continue)
React
4. Form Recovery Team & designate Coordinator
(5). Relocate Recovery Team to alternate site & establish operations?
Recover
6. Open an incident log & communicate to key staff & teams
7. Incident coordination. Execute specific recovery procedures
8. Stand down the Recovery Team & transition back to normal operations
A Business Continuity Plan on a page
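The flow on this slide can be sketched as a short walk through the eight phases, with the decision at phase 3 and the optional relocation at phase 5. This is only an illustration under stated assumptions: `PHASES`, `plan_steps` and `describe` are hypothetical names, and the two booleans stand in for judgements the recovery coordinator would make.

```python
# Hypothetical encoding of the eight phases; stage and step wording follow the slide.
PHASES = {
    1: ("Recognize", "Disaster occurs"),
    2: ("Recognize", "Perform an initial damage assessment"),
    3: ("Recognize", "Activate the plan?"),
    4: ("React", "Form Recovery Team and designate Coordinator"),
    5: ("React", "Relocate Recovery Team to alternate site and establish operations"),
    6: ("Recover", "Open an incident log and communicate to key staff and teams"),
    7: ("Recover", "Incident coordination: execute specific recovery procedures"),
    8: ("Recover", "Stand down the Recovery Team and transition back to normal operations"),
}


def plan_steps(activate, relocate):
    """Return the ordered phase numbers walked for one incident."""
    steps = [1, 2, 3]
    if not activate:
        return steps              # decision at phase 3 was "No": stop here
    steps.append(4)
    if relocate:                  # phase 5 is optional, shown in brackets on the slide
        steps.append(5)
    steps.extend([6, 7, 8])
    return steps


def describe(steps):
    """Render phase numbers as readable lines tagged with their stage."""
    return [f"{n}. [{PHASES[n][0]}] {PHASES[n][1]}" for n in steps]


if __name__ == "__main__":
    for line in describe(plan_steps(activate=True, relocate=False)):
        print(line)
```

Writing the flow down this way makes it easy to turn each path into an exercise script, one per combination of decisions.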