Disaster recovery, emergency response, and business continuity plans are usually developed when no disaster exists. We think we’ve covered all contingencies. We think we’ve trained all the appropriate players. We’ve tested. We’ve re-tested. We think we’re ready to face whatever event is looming out there with our name on it! The real world has a nasty habit of triggering disasters at the least opportune time, often featuring a twist that throws plans into disarray.
This presentation focuses on three real-world plans, each with a fatal flaw. We will discuss elements that should be in a plan beyond the normal guidance from the Disaster Recovery Institute (DRI), and a set of actions that should be included in planning and preparation.
Harry Regan - Disaster Recovery and Business Continuity - "It's never so bad that it can't get worse"
1. It’s Never So Bad
That It Can’t Get Worse
A REVIEW OF DISASTER RECOVERY AND
BUSINESS CONTINUITY PLANNING IN
PRACTICE
HARRY REGAN
VP, SECURITY CONSULTING SERVICES
SECURICON, LLC
HTTP://WWW.SECURICON.COM
2. Agenda
• Who We Are
• The Magic of Mixing Technology and Humans
• Things DRI Tells You
• 3 Tales from the Field
o Clouds of 9/11
o What if they threw a disaster and nobody came?
o Financial Services and Y2K
• Scar Tissue and Recommendations
• When You’ve Got Lemons…
• Conclusions and Q&A
3. Who are we?
• Securicon provides security services primarily in the US and
Canada
• Our clients are generally from regulated industries
(Financial Services, Utilities, Manufacturing, Higher
Education, Healthcare), and Federal and local government.
• Broad base of experience in how human and
social factors shape the implementation and
impact of security
• Enterprise-level experience in developing COOP and BCP
plans.
• Always fascinated and amused by the BTAFFD* syndrome
* buttered toast always falls face down
4. The Magic of Mixing
Technology with Humans
• Technology makes the world work
• People make the world weird
• Business Continuity happens at the intersection of
people and technology– with one or more
emergencies thrown into the mix.
• Plans may be detailed and logical, but human
behavior is not as predictable as we’d like.
• Emergency scenarios can get complex–
be flexible– very, very flexible.
5. The Magic of Mixing
Technology with Humans
• The development of policies and
procedures is based on the
assumption that people are inclined
to obey the rules
• That is generally correct; however,
people’s performance is a variable,
not a constant
• Introduce an emergency into the
mix and all bets are off
6. Best Intentions…
• We’re going to examine three case
studies from three different
industries.
• All three companies involved had
a Business Continuity Plan
• All three had a major failure when
the disaster arrived
• We’ll also look at a fourth case
where a company used a disaster
as the trigger for a business
decision
7. Things DRI Tells You…
Key Objectives…
• Safety is the #1 priority in an emergency:
Protect people first, then assets and
resources
• Keep the business operating to the
extent possible
• Maintain basic communications
(e-mail, phone)
• Don’t let them see you sweat! (Web
site up, services available and shipping
with minimal disruptions)
• Maintain billing, accounting, and keep
revenue flowing
8. More Things DRI Tells You…
• Your DR/BCP plan should have strategies for…
• Emergency Response and Operations Contingencies
• Actionable and detailed Business Continuity Plans at
a situational and granular level
• Training and Awareness – for everyone, but
especially for key staff involved in the plan– they
have to pull it off!
• Maintaining and Testing DR and Business Continuity
Plans and Operability – and really do it!
• Public Relations and Crisis Communications–
reassure customers, vendors, suppliers
• Coordination with Public Authorities
9. 3 Tales from the Field
• All It Takes Is People
o Shelter-in-Place approach
o Great plan, now where’s the staff?
• The Other 9/11 Issue
o The traditional DR contract approach
o Hurricane Gabrielle hits Florida
• Financial Services and Y2K
o Comprehensive “situational plan”
o Y2K Plan used successfully (sort of)
10. The Other 9/11 Issue
• September 9, 2001 – Tropical Storm
Gabrielle forms off the west coast of
Florida in the Gulf of Mexico.
• September 11, 2001 – Hurricane
Gabrielle threatens western Florida
coast.
• A manufacturing company in central
Florida, already experiencing flooding
in their facility and data center from
heavy rain, declared a disaster and
went to exercise their contract with
their DR provider
• Scheduled DR site –
Sterling Forest, NY
• The request
“could not be accommodated”
11. The Other 9/11 Issue
• They had arranged for specific equipment
to be available at the DR site
o They assumed they could just “swap over” to
the DR site
o They further assumed they could just show up
with the tapes
• When You Fail to Plan…
o They were a small company and had a very
basic but untested DR/BCP plan
o They had a DR contract with a big, reputable
name firm
o They kept backup tapes on site and planned to
FedEx them to the DR site when needed
12. The Other 9/11 Issue
• Lessons learned
o With an untested plan,
it was really iffy that
they could successfully
exercise the DR plan at
all
o With a 3rd party DR
contract, you may be
able to get your money
back if you “can’t be
accommodated”!
o Yes, their data center
flooded…
13. All It Takes Is People
• Picture rolling New England
hills, nestling a quaint little
mill town. In this town is a
manufacturing company
that makes specialty
products for the medical
industry
• “Shelter in Place” is a strategy some companies adopt – and that’s
the approach this company chose, with backups and redundant
equipment maintained on site.
• They maintained food, various beverages and water expecting
outages to be no more than “a couple of days”
14. All It Takes Is People
• The data center featured a
natural gas generator tied to the
city gas lines, so as long as they
had fuel, they had power
• The network featured diverse
carriers with failover
• They engineered their systems
to be all remotely administered
and operated so there was little
need for staff to be onsite – but
some functions had to be
manually attended.
• They had robust, tested remote
access processes.
15. All It Takes Is People
• But…
o Their DR/BCP documents took a
very exacting “Bob will do X,
Frank will do Y” approach.
o Sooner or later, they said,
they’d cross train folks.
o The disaster came before
“later” did.
16. All It Takes Is People
• The systems were up! No one
was available to do anything with
them, but they were up!
• Discovered that many processes
they had not considered actually
needed someone on site for
operations support
• Also discovered that the phone
system and the PACS were
never moved to backup power
• In May of 2006, the area experienced severe flooding. All
telecommunications were out, roads impassable, residents
evacuated from the area.
17. All It Takes Is People
• Lessons learned
o It was a good plan! It was a
tested plan!
It just didn’t go quite far enough.
Cross-training participants is
important (but wouldn’t have
worked in this instance)
o Was their plan successful for
this event?
They were inaccessible for
several days, back in operation
within a week – so it met the
“couple of days” outage
scenario
All automated processes ran
There was no one in control for
two or three days
18. Financial Services and Y2K
• Large, globally recognized
financial services firm with
heavy transactional network
traffic.
• Primary data center in
southern New England, about
an hour north of NYC
• Backup data center 200 miles
south.
• Standing hotel
accommodations for
operations teams near
both data centers
• Situational BCP built with
input from each business
unit. Tested, tested,
tested.
• Identification of positions
that needed to be on-site
(the rest would work
from home)
19. Financial Services and Y2K
• NYC staff in 1 Liberty
Plaza, Times Square,
and near Wall Street
• If staff had to be
displaced, they
would go to one of
several locations or
be issued laptops to
work from home
• Monthly live test of failover
from primary to backup.
Well understood system and
network for financial
services. Business systems
were lower priority.
• Y2K – Nothing
Happened
20. Financial Services and Y2K
But then there was 9/11…
This was the DR/BCP plan in
place when the World Trade
Center attack happened
1 Liberty Plaza was across the
street from the WTC
21. Financial Services and Y2K
• On 9/11 the first plane hit before the stock market
opened – so the decision was made not to open the
market until the extent of the disaster was known
• As events unfolded, the company activated its disaster plan
o Liberty Plaza and Wall Street staff evacuated to
Times Square (until the South Tower collapsed)
o Network transferred to the backup site without
incident
• Long-term displacement of workforce
22. Financial Services and Y2K
• On one level, the DR/BCP was successful.
o Almost seamless transition to backup systems
(turned out not to be necessary)
o Market systems staff was on-site, in place and ready
for normal operations when the disaster occurred
o Corporate systems staff generally was in transit or
about to leave home, but in DC – another 9/11 target
site
o Market systems were ready for scheduled market
open at 10AM, but decision was made to keep the
market closed.
o There were staff injuries, but no reported fatalities
23. Financial Services and Y2K
• Problems with the BCP
o No plan for losing Manhattan
o Evacuation plan assumed navigable streets,
availability of public transportation
o Severe and lasting workforce displacement
o IT not ready for influx of teleworkers– not enough
VPN licenses. But that’s OK, not enough laptops
either.
• Sometimes you get lucky
o The AT&T NYC switch center and most cellular service
were destroyed in the WTC collapse
o The company used MCI for telephone and network
service
24. Scar Tissue and Recommendations
Recurring drills are important. Annual drills are
simply not frequent enough. Test it, darn it!
Still doing weekly/monthly backups with
incrementals? You should rethink your backup
strategy.
Practice bare-metal restores. Even with great
planning and preparation, odds are good you’ll
have to do one or more and they take time.
Transactional systems love to have journal
problems. Understand how to identify problems
early and quickly and how to resolve them.
If you’re using a 3rd party backup site, expect
equipment problems. Plan for it.
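The backup and restore advice above can be turned into a recurring drill. Below is a minimal sketch of such a check (shell, assuming GNU coreutils; the file names and the 24-hour recovery point objective are illustrative, not prescriptive). It catches the two failures that bite hardest once a disaster is declared: a backup that is too stale to meet your RPO, and an archive that no longer matches the checksum recorded when it was written.

```shell
#!/bin/sh
# Minimal backup-drill sketch. Assumes GNU coreutils (stat -c, sha256sum).
# Paths and thresholds are hypothetical; adapt to your environment.

# verify_backup ARCHIVE MAX_AGE_HOURS
# Fails if the archive is older than the recovery point objective, or if
# its contents no longer match the checksum recorded at backup time.
verify_backup() {
    archive="$1"
    max_age_hours="$2"
    # Freshness: compare the file's mtime against the RPO.
    age=$(( $(date +%s) - $(stat -c %Y "$archive") ))
    if [ "$age" -gt $(( max_age_hours * 3600 )) ]; then
        echo "STALE: $archive exceeds ${max_age_hours}h RPO"
        return 1
    fi
    # Integrity: re-verify against the checksum written when the backup ran.
    if ! sha256sum --status -c "${archive}.sha256"; then
        echo "CORRUPT: $archive fails checksum"
        return 1
    fi
    echo "OK: $archive"
}

# Demo against a throwaway file standing in for a real backup archive.
demo=$(mktemp)
printf 'payload\n' > "$demo"
sha256sum "$demo" > "$demo.sha256"   # recorded at "backup time"
verify_backup "$demo" 24
```

Wiring a check like this into the backup job itself (and alerting on failure) is what turns "test it, darn it" from a slogan into a habit.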
25. Scar Tissue and Recommendations
Understand what disasters are facing your disaster
recovery sites!
Understand the logistics of getting the right people
to the right place in different kinds of disasters!
See if you can arrange to have your restoration
media transmitted to the DR site.
(Throwing the backup media in the van with the DR Away Team
may make the disaster even worse)
Maintain the equipment for the DR site! It won’t
help you if the DR hardware can’t run the current
mission critical applications!
26. Scar Tissue and Recommendations
• Cross train DR/BCP teams on ALL roles. DRI
recommends backup roles and backups to
backups. But you won’t know for sure who reports
for duty until the disaster.
27. What is this “Granular” stuff?
• It’s rare that a disaster/emergency will unfold on
your terms. The key to survival is flexibility
o Be ready for a “half disaster”
o Also be ready for multiple, simultaneous disasters
o Finally, be ready for key staff unavailability
• Situational planning is important
o Have plans built for the most likely disaster scenarios
o To the extent possible, compartmentalize
o Also have an OCISD Strategy
OCISD = “Oh crud! It’s something different!”
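One way to keep situational plans compartmentalized is to treat plan selection itself as a tiny dispatch step: each likely scenario maps to its own runbook, and anything unrecognized falls through to the generic OCISD playbook. A sketch (shell; the scenario names and runbook paths are invented for illustration):

```shell
#!/bin/sh
# Hypothetical dispatcher: map a declared scenario to its runbook,
# falling back to a generic OCISD playbook for anything unplanned.
select_runbook() {
    case "$1" in
        flood)      echo "runbooks/flood.md" ;;
        power-loss) echo "runbooks/power-loss.md" ;;
        site-loss)  echo "runbooks/site-loss.md" ;;
        # "Oh crud! It's something different!"
        *)          echo "runbooks/ocisd-generic.md" ;;
    esac
}

select_runbook flood            # -> runbooks/flood.md
select_runbook zombie-uprising  # -> runbooks/ocisd-generic.md
```

The design point is the default branch: the plan explicitly acknowledges that an unanticipated scenario routes somewhere useful instead of nowhere.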
28. When You’ve Got Lemons…
• In the Summer and Fall of
2001, I had a client in the
cosmetics industry
expanding their New
Jersey research facility…
• They planned to move
researchers from their
Nice, France facility to
the new US facility
29. When You’ve Got Lemons…
• After 9/11, they ended up halting the plans for the
expanded R&D center, converted it to offices and
moved their executive staff from Manhattan to the
new offices.
• A good example of capitalizing on a disaster
scenario to change your potential risk profile.
( But I’ve always wondered if the R&D team from the
French Riviera was the real force behind 9/11… )
30. Conclusions and Q&A
If you take nothing else away from this presentation, remember:
#1 Test. Refine. Repeat.
#2 Be very flexible. It probably won’t happen like you think it will.
#3 When it does happen, you’ll find out which pieces you
didn’t test enough.