Site Recovery Manager (SRM) allows customers to simplify disaster recovery and site migration. The document discusses three customer experiences using SRM:
1) TÜV Rheinland, Japan used SRM to protect servers between two data centers in eastern and western Japan following the 2011 earthquake and tsunami. SRM automated their failover and enabled testing.
2) Mainfreight, New Zealand implemented SRM to reduce DR test times from 15 to 4 hours and minimize downtime costs of $10k per hour following the Christchurch earthquakes.
3) Independence Blue Cross used SRM to migrate 300+ VMs between data centers with minimal outages, saving admin time and allowing focus on other
Disaster Recovery Customer Experiences with SRM
1. BCO3276
Disaster Recovery and Site Migration
with Site Recovery Manager: Customer
Experiences from Around the World
Gil Haberman, Product Marketing Manager, Business Continuity and Disaster Recovery, VMware, Inc.
Alan Baird, VMware, Inc.
Christopher Wells, TÜV Rheinland Japan Ltd.
Paul Schlosser, VMware, Inc.
Robert Busillo, Independence Blue Cross
2. Disclaimer
This session may contain product features that are
currently under development.
This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
3. Agenda
SRM and vSphere For Simple and Reliable DR
TÜV Rheinland, Japan
Mainfreight, New Zealand
Independence Blue Cross, USA
5. Disasters Happen. Do You Need Protection?
43% of companies experiencing disasters never
re-open, and 29% close within two years.
(McGladrey and Pullen)
93% of businesses that lost their data center for
10 days went bankrupt within one year.
(National Archives & Records Administration)
40% of all companies that experience a major
disaster will go out of business if they cannot
gain access to their data within 24 hours.
(Gartner)
Top executives say 10 hours to recovery;
IT managers say up to 30 hours.
(Harris Interactive)
6. vCenter Site Recovery Manager Ensures Simple, Reliable DR
Site Recovery Manager complements vSphere to provide the simplest
and most reliable disaster protection and site migration for all applications
Provide cost-efficient replication of applications to failover site
• Built-in vSphere Replication
• Broad support for storage-based replication
Simplify management of recovery and migration plans
• Replace manual runbooks with centralized recovery plans
• From weeks to minutes to set up new plan
Automate failover and migration processes for reliable recovery
• Enable frequent non-disruptive testing
• Ensure fast, automated failover
• Automate failback processes
[Diagram: Site A (Primary) and Site B (Recovery), each with VMware vCenter Server, Site Recovery Manager, VMware vSphere and servers]
7. SRM Momentum
Introduced in Q2 2008
125,000+ units sold
5,000+ customers
50% annual growth in 2010
“If your organization is already taking advantage of virtualization,
then adding Site Recovery Manager to handle disaster recovery
is a no-brainer.”
― Jerry Wilkin
Senior Systems Administrator, Dayton Superior Corp
8. What’s New In Site Recovery Manager 5.0?
vSphere Replication
• Bundled with SRM at no additional cost
• Provides simple, cost-efficient replication between vSphere clusters
• Expand DR coverage to Tier 2 apps and smaller sites
Automated failback
• Bi-directional recovery plans
• Automates failback to original site
Planned migration
• New workflow that can be applied to any recovery plan
• Ensures no data-loss, application-consistent migrations of virtual machines
• Streamline planned migrations (for disaster avoidance, planned maintenance, …)
Others
• More granular control over VM startup order
• Protection-side APIs
• IPv6 support
9. Beyond DR: Disaster Avoidance And Planned Migrations
3 typical use-cases for SRM
Disaster Failover
Recover from unexpected site failure
• Full or partial site failure
The most critical but least frequent use-case
• Unexpected site failures do not happen often
• When they do, fast recovery is critical to the business
Disaster Avoidance
Anticipate potential datacenter outages
• For example: in case of an anticipated hurricane, floods, forced evacuation, etc.
Initiate preventive failover for smooth migration
• Leverage SRM ‘planned migration’ to ensure no data-loss
• ‘Automated failback’ enables easy return to original site
Planned Migration
Most frequent SRM use case
• Planned datacenter maintenance
• Global load balancing
Streamline routine migrations across sites
• Test to minimize risk
• Execute partial failovers
• Leverage SRM ‘planned migration’ to ensure no data-loss
• ‘Automated failback’ enables bi-directional migrations
11. Background
TÜV Rheinland was started in Germany in 1872 to perform safety
testing of steam pressure vessels.
Today TÜV Rheinland is active in 61 countries and 39 different
business fields.
Technical certification of a wide range of technology products and
services.
Examples: PV cells, X-ray machines, photocopiers, computer
monitors, computer mice/keyboards.
Also perform Business Continuity Management, Data Protection
Management, Information Security and ITIL services.
12. Justification
Propensity for seismicity in Japan.
Already had infrastructure at more than one location.
Services hosted for external customers required specific SLA.
Simplify difficult process of disaster recovery.
13. Status Quo
Before the earthquake, companies were using physical servers at
their DR site, or had no DR site at all!
Companies in Japan are now conscious of a need for DR and BCP
solutions.
Many Japanese VMware customers are only familiar with the
vSphere base product, not complementary solutions.
VMware is now more actively marketing the SRM products as a
result of the recent earthquake.
14. History
Prior to SRM, DR process was manual.
Already had implemented SAN replication, so running SRM was
next logical step.
DR testing was non-existent due to manual overhead involved with
testing.
Leveraged VMware snapshots to reduce RTO during failback.
15. Implementation
Met with VMware and a local reseller for guidance.
Set up a POC and learned the product, especially with help of
official documentation and books by 3rd party authors.
Performed tests of the recovery plan.
Leveraged IP address mapping CSV.
3-4 months later, put system into production.
16. Use Cases
General use of VMware products helps conserve power (useful
during power shortages).
Shift workloads from areas under power consumption
constraints/reductions to unaffected areas.
Typical DR protection between Eastern and Western Japan offices.
Temporary fail-over to remote site for planned power outage
situations (once per year).
17. Disaster & Aftermath
On March 11th, at 2:46PM JST our disaster recovery plan went into
motion.
Immediately following the initial shock, systems were functional.
Performed testing of the SRM recovery plans as extra precaution.
Rolling power outages were implemented by TEPCO, necessitating
failover process.
Systems not covered by SRM (physical machines) had RTO of >24
hours.
18. Lessons & Suggestions
Planning for the initial disaster is not enough; you must also plan
for energy and other supply shortages.
Ensure there is a chain of command to kick-off recovery and ensure
more than 1 person can initiate it.
Make sure newly created VMs are configured in the Recovery Plan.
Be sure to back-up the SRM configuration (local files) and DB
backend prior to upgrade.
Perform frequent disaster tests.
Provide more user-friendly way to map IP addresses.
Alert administrators about unprotected or misconfigured VMs.
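The advice about newly created and unprotected VMs can be approximated with a simple inventory comparison. The following is a minimal sketch, not an SRM feature or API: the function name and the VM names are hypothetical, and it assumes both lists have already been exported from the environment by some other means.

```python
def find_unprotected(inventory, protected):
    """Return VM names that are in the inventory but in no recovery plan."""
    return sorted(set(inventory) - set(protected))


# Hypothetical example data; a real check would pull both lists from
# the platform's own inventory and recovery-plan exports.
inventory = ["web01", "db01", "mail01", "app02"]
protected = ["web01", "db01"]

unprotected = find_unprotected(inventory, protected)
if unprotected:
    print("Unprotected VMs:", ", ".join(unprotected))
```

Running a check like this on a schedule is one way to catch VMs created after the recovery plan was last reviewed.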
22. New Zealand - We are here!
We are here!
22 Confidential
23. Challenges we face
Natural Disasters
• Earthquakes (3 major and 250 minor in the last 12 months)
• Tsunami
• Volcanic – 2 active
Remote
• 3 hour flight to Australia
Stability of Power
• 1998 Auckland power crisis
• Reliance on hydro electricity
WAN Considerations
• Cost and bandwidth limitations
24. What was learnt from Christchurch
Christchurch was considered low risk for earthquakes
Servers and desktops
• Unable to return to the office 6 months later
• Servers were protected but desktops were lost
Reliance on backup media
• Slow and potentially unreliable
The Human factor
• Other priorities
• Civil unrest
The value of virtualisation
• DR with SRM becomes viable
25. SRM - Customer Experience From Around the Globe
David Hall
Mainfreight Group
IT Infrastructure Manager
26. Who are we
“A company with a 100 year vision”
Mainfreight is a global supply chain
logistics provider
Commenced business in 1978
Today has a market capitalisation of $993
million
Sales revenues in excess of $1.75 billion
4,600+ team members
Unique culture & philosophy
We have a quality focus and aim to delight
our customers.
27. Where We Are
“Ready, Fire, Aim!”
28. Our Challenges
“Do more with less”
Hybrid model consisting of mostly physical
Cost of DR & BCP
Previous DR process worked but was complex
& time consuming
Recent Christchurch earthquakes reiterated to our business the
reality of disaster occurring & the importance of DR & BCP
Costs of ~$10,000 every hour the systems are down
30. About our environment
“Top performing organisations are those that have
harnessed the true potential of today's cutting edge
technologies”
Hardware / Software
HP servers & storage
Cisco network
Microsoft, Citrix, vSphere/SRM 4.x
Active – Active data centres
South Auckland: Production
Central Auckland: Recovery
Applications protected with SRM
Maintrak - Web-based consignment tracking system
MIMs - Inventory management system
Cargowise – International freight forwarding system
On Account – Accounting system
On Sale – CRM system
31. SRM Highlights
“DR is only as good as the last time it was tested”
Reduced DR test times from ~15 hours to 4 hours
Reduced number of team members needed for DR from 4 to 2
Minimised downtime costs – estimated at $10k per hour
Achieved 99.999% availability
SRM has been proven and used in ‘anger’ - SAN failure
Installation well planned and implemented
Project completed on time and on budget
Minimal external consultancy required
Provided a platform to deliver DR for future business applications
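To put the availability and downtime figures above side by side, here is a rough arithmetic sketch. It assumes a flat rate of $10,000 per hour (the slide's estimate) and a 365-day year; the function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_pct):
    """Hours of downtime per year permitted by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def downtime_cost(hours, cost_per_hour=10_000):
    """Cost of an outage at a flat hourly rate (assumed ~$10k per hour)."""
    return hours * cost_per_hour


five_nines = allowed_downtime_hours(99.999)
print(f"{five_nines * 60:.1f} minutes of downtime per year")
print(f"${downtime_cost(five_nines):,.0f} per year at $10k per hour")
```

At 99.999% availability the yearly downtime budget is a little over five minutes, which at the stated hourly cost is under $1,000 per year.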
32. Thank you
“VMware has provided us with
a flexible, reliable IT platform to
support the business and deliver
IT services in more responsive and
cost-effective ways.”
– Kevin Drinkwater,
Global Chief Information Officer
34. Company Background
VMware History
IBC started in 2004 to convert physical servers to VM's in a company-wide
effort to consolidate hardware and drive down maintenance costs & datacenter
space/utilities.
Servers Virtualized
We currently manage about 800 VM's residing on 60 plus ESX Hosts
running ESX 4.1 & ESXi.
Since 2005 we have converted over 300 physical servers to VM's.
Storage
EMC DMX 4 (Production and DR ) & NetApp (Test, Dev and QA)
Uses for VMware
We run Windows 2003, 2008, Red Hat v5 (64 and 32 bit O/S's).
We have many Tier 1 applications running in our VM environment: SQL,
SharePoint, Citrix, Hyperion/Informatics and our Claims processing servers.
35. Business Needs
What was needed
We were moving our data center in the Summer of 2009 from Philadelphia
to Hershey , PA and needed to migrate 300+ Production VM's to our new
location.
SRM Review
VMware came onsite to present the SRM product for a future IBC project
(DR insourcing). After the product presentation we saw the potential in
using this product for our Datacenter move. Working with VMware
professional services proved very beneficial for IBC.
Did it solve the problem?
Yes, SRM made our D.C. move streamlined and less stressful. It also
supported our plans for DR insourcing & a Redundant Production environment.
36. Business Needs
Why VMware solution
When we saw the SRM product and how it could help us move 300+
production VM's from our Center City Philadelphia D.C to our new
Hershey, PA D.C it was clear to us that this product would save us many
man hours that we needed elsewhere on our D.C move weekend.
SRM Characteristics
The SRM advantages that IBC leveraged were the pre-move testing,
streamlining and automation of the overall D.C move script, in which we
could plan out the recovery sequence from Tier 1 Prod VM’s to Tier 3 Test
VM’s. The overall reliability of this product saved our company many Admin
man hours, pre and post migration.
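The tiered recovery sequence described above (Tier 1 Prod first, Tier 3 Test last) can be sketched as a simple sort. This is only an illustration of the ordering idea, not the SRM recovery plan format; the `VM` class, the function name and the VM names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    tier: int  # 1 = most critical (Tier 1 Prod), 3 = least (Tier 3 Test)


def recovery_sequence(vms):
    """Order VMs for power-on: lowest tier number first, then by name."""
    return sorted(vms, key=lambda vm: (vm.tier, vm.name))


# Hypothetical VMs for illustration.
vms = [VM("test-app", 3), VM("claims-db", 1), VM("intranet", 2)]
for vm in recovery_sequence(vms):
    print(f"Tier {vm.tier}: {vm.name}")
```

Sorting on the name as a secondary key keeps the order deterministic when several VMs share a tier.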
37. Business Needs
Time outages avoided
We saved hours of Production server outage time by using SRM instead
of a manual migration, and countless Admin man hours were saved,
allowing our staff to be utilized in other areas of the move weekend.
What was needed
SRM plugin for Virtual Center
EMC – SRDF
VMware Professional Services – The professional services contact was
very knowledgeable in the SRM product and how to integrate this with our
EMC storage.
SRM script and planning – Setting up your server priority migration
planning.
38. Data Center Migration
How much time till DC cutover
Professional Services came out a few months prior to the
DC move and were onsite for 2 days to prepare the plan
and gather information about the environment.
What was the setup and integration process
We worked with VMware to set up our migration script
and verify that the EMC storage was replicating correctly.
39. Data Center Migration
Services needed
Replication of data – Our initial sync was about 50
LUNs and about 30TB of data.
We then set up daily replication of about 1TB a day.
Set up our server priority script (which servers to power
down last and which servers to power up first).
VMware came onsite 1 more day for verification that all
was well before the final move date.
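The replication volumes above translate into link requirements with some simple arithmetic. The sketch below assumes decimal terabytes (the slides do not say TB versus TiB) and a hypothetical 1000 Mbit/s link for the initial sync; neither the link speed nor the function names come from the presentation.

```python
TB = 10**12  # decimal terabyte; an assumption, not stated on the slide


def required_mbps(bytes_per_day, window_hours=24.0):
    """Average rate in Mbit/s needed to ship bytes_per_day within the window."""
    bits = bytes_per_day * 8
    seconds = window_hours * 3600
    return bits / seconds / 1e6


def days_to_sync(total_bytes, link_mbps):
    """Days needed to copy total_bytes at a sustained link rate in Mbit/s."""
    seconds = total_bytes * 8 / (link_mbps * 1e6)
    return seconds / 86400


print(f"1 TB/day needs about {required_mbps(1 * TB):.1f} Mbit/s sustained")
print(f"30 TB at 1000 Mbit/s takes about {days_to_sync(30 * TB, 1000):.1f} days")
```

These are averages with no allowance for protocol overhead or compression, so real replication windows would differ.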
40. Data Center Migration
What happened on Labor Day move weekend?
VMware was on site Friday night when we kicked off SRM.
There was about 1TB of changes left to be synced. We then
disconnected our EMC storage at the old datacenter and failed
over to the new datacenter storage.
We had fewer than 10 VM's that needed some attention to get
back online.
I would highly recommend the VMware Professional Services.
They were on site a total of 4 days and walked us through the
whole datacenter migration.
41. Today
How is SRM running today?
We currently insource our Disaster Recovery Drill at our
D.R./Redundant Production datacenter in Reading, PA utilizing
SRM and VMware to get us through the DR drill with replication
and failover. We currently run these tests 3-4 times a year.