Disaster Recovery Customer Experiences with SRM

BCO3276
Disaster Recovery and Site Migration
with Site Recovery Manager: Customer
Experiences from Around the World
Gil Haberman, Product Marketing Manager, Business Continuity and Disaster Recovery, VMware, Inc.
Alan Baird, VMware, Inc.
Christopher Wells, TUV Rheinland Japan Ltd.
Paul Schlosser, VMware, Inc.
Robert Busillo, Independence Blue Cross

Disclaimer

• This session may contain product features that are currently under development.
• This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.


Agenda
SRM and vSphere For Simple and Reliable DR
TÜV Rheinland, Japan
Mainfreight, New Zealand
Independence Blue Cross, USA


SRM and vSphere For Simple
and Reliable DR


Disasters Happen. Do You Need Protection?

43% of companies experiencing disasters never
re-open, and 29% close within two years.
(McGladrey and Pullen)

93% of businesses that lost their data center for
10 days went bankrupt within one year.
(National Archives & Records Administration)

40% of all companies that experience a major
disaster will go out of business if they cannot
gain access to their data within 24 hours.
(Gartner)

Top executives say 10 hours to recovery;
IT managers say up to 30 hours.
(Harris Interactive)


vCenter Site Recovery Manager Ensures Simple, Reliable DR

Site Recovery Manager complements vSphere to provide the simplest
and most reliable disaster protection and site migration for all applications

Provide cost-efficient replication of applications to failover site
• Built-in vSphere Replication
• Broad support for storage-based replication

Simplify management of recovery and migration plans
• Replace manual runbooks with centralized recovery plans
• From weeks to minutes to set up new plan

Automate failover and migration processes for reliable recovery
• Enable frequent non-disruptive testing
• Ensure fast, automated failover
• Automate failback processes

[Diagram: Site A (Primary) and Site B (Recovery), each with VMware vCenter Server, Site Recovery Manager, VMware vSphere and Servers]


SRM Momentum

Introduced in Q2 2008
125,000+ units sold
5,000+ customers
50% annual growth in 2010

“If your organization is already taking advantage of virtualization,
then adding Site Recovery Manager to handle disaster recovery
is a no-brainer.”
― Jerry Wilkin
Senior Systems Administrator, Dayton Superior Corp


What’s New In Site Recovery Manager 5.0?

vSphere Replication
• Bundled with SRM at no additional cost
• Provides simple, cost-efficient replication between vSphere clusters
Expand DR coverage to Tier 2 apps and smaller sites

Automated failback
• Bi-directional recovery plans
• Automates failback to original site

Planned migration
• New workflow that can be applied to any recovery plan
• Ensures no data-loss, application-consistent migrations of virtual machines
Streamline planned migrations (for disaster avoidance, planned maintenance, …)

Others
• More granular control over VM startup order
• Protection-side APIs
• IPv6 support


Beyond DR: Disaster Avoidance And Planned Migrations

3 typical use-cases for SRM

Disaster Failover
Recover from unexpected site failure
• Full or partial site failure
The most critical but least frequent use-case
• Unexpected site failures do not happen often
• When they do, fast recovery is critical to the business

Disaster Avoidance
Anticipate potential datacenter outages
• For example: in case of a forecast hurricane, floods, forced evacuation, etc.
Initiate preventive failover for smooth migration
• Leverage SRM ‘planned migration’ to ensure no data-loss
• ‘Automated failback’ enables easy return to original site

Planned Migration
Most frequent SRM use case
• Planned datacenter maintenance
• Global load balancing
Streamline routine migrations across sites
• Test to minimize risk
• Execute partial failovers
• Leverage SRM ‘planned migration’ to ensure no data-loss
• ‘Automated failback’ enables bi-directional migrations


Background

TÜV Rheinland was started in Germany in 1872 to perform safety
testing of steam pressure vessels.
Today TÜV Rheinland is active in 61 countries and 39 different
business fields.
Technical certification of a wide range of technology products and
services.
Examples: PV cells, X-ray machines, photocopiers, computer
monitors, computer mice/keyboards.
Also perform Business Continuity Management, Data Protection
Management, Information Security and ITIL services.


Justification

Propensity for seismicity in Japan.
Already had infrastructure at more than one location.
Services hosted for external customers required specific SLA.
Simplify difficult process of disaster recovery.


Status Quo

Before the earthquake, companies were using physical servers at
their DR site, or had no DR site at all!
Companies in Japan are now conscious of a need for DR and BCP
solutions.
Many Japanese VMware customers are only familiar with the
vSphere base product, not complementary solutions.
VMware is now more actively marketing the SRM products as a
result of the recent earthquake.


History

Prior to SRM, DR process was manual.
Already had implemented SAN replication, so running SRM was
next logical step.
DR testing was non-existent due to manual overhead involved with
testing.
Leveraged VMware snapshots to reduce RTO during failback.


Implementation

Met with VMware and a local reseller for guidance.
Set up a POC and learned the product, especially with help of
official documentation and books by 3rd party authors.
Performed tests of the recovery plan.
Leveraged IP address mapping CSV.
3-4 months later, put system into production.
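
The slides do not show the mapping file itself, so the following is only a minimal sketch of the kind of sanity check an IP address mapping CSV invites before a failover. The column names (vm_name, protected_ip, recovery_ip), the VM names and the addresses are hypothetical and are not the format SRM itself generates.

```python
import csv
import io
import ipaddress

# Hypothetical, simplified mapping file. The column names are assumptions
# for illustration and are NOT the format SRM itself generates.
SAMPLE_CSV = """vm_name,protected_ip,recovery_ip
web01,10.1.0.11,10.2.0.11
db01,10.1.0.21,10.2.0.21
app01,10.1.0.31,10.2.0.21
"""


def check_ip_mappings(csv_text):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    seen = {}  # recovery address -> first VM that claimed it
    for row in csv.DictReader(io.StringIO(csv_text)):
        vm = row["vm_name"]
        try:
            ipaddress.ip_address(row["protected_ip"])
            recovery = ipaddress.ip_address(row["recovery_ip"])
        except ValueError as exc:
            problems.append(f"{vm}: invalid address ({exc})")
            continue
        if recovery in seen:
            problems.append(
                f"{vm}: recovery IP {recovery} already used by {seen[recovery]}"
            )
        else:
            seen[recovery] = vm
    return problems


if __name__ == "__main__":
    for problem in check_ip_mappings(SAMPLE_CSV):
        print(problem)
```

In this sample the third row reuses a recovery address on purpose, so the check reports one collision. Running such a check before each recovery plan test would catch mapping mistakes early.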


Use Cases

General use of VMware products helps conserve power (useful
during power shortages).
Shift workloads from areas under power consumption
constraints/reductions to unaffected areas.
Typical DR protection between Eastern and Western Japan offices.
Temporary fail-over to remote site for planned power outage
situations (once per year).


Disaster & Aftermath

On March 11th, at 2:46PM JST our disaster recovery plan went into
motion.
Immediately following the initial shock, systems were functional.
Performed testing of the SRM recovery plans as extra precaution.
Rolling power outages were implemented by TEPCO, necessitating
a failover.
Systems not covered by SRM (physical machines) had RTO of >24
hours.


Lessons & Suggestions

Planning for the initial disaster is not enough; you must also plan
for energy and other supply shortages.
Ensure there is a chain of command to kick off recovery and ensure
more than 1 person can initiate it.
Make sure newly created VMs are configured in the Recovery Plan.
Be sure to back up the SRM configuration (local files) and DB
backend prior to upgrade.
Perform frequent disaster tests.
Provide a more user-friendly way to map IP addresses.
Alert administrators about unprotected or misconfigured VMs.
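
The lesson about newly created VMs can be approximated with a simple comparison of two lists. This is a minimal sketch, not an SRM feature: the VM names are hypothetical, and it assumes the inventory and the list of protected VMs have already been exported by whatever tooling is available.

```python
def find_unprotected(inventory, protected):
    """Return VM names present in the inventory but absent from any recovery plan."""
    return sorted(set(inventory) - set(protected))


# Hypothetical data; in practice these lists would be exported from
# vCenter and from the SRM recovery plans.
inventory = ["web01", "db01", "app01", "report01"]
protected = ["web01", "db01", "app01"]

if __name__ == "__main__":
    for name in find_unprotected(inventory, protected):
        print(f"WARNING: {name} is not covered by a recovery plan")
```

Scheduling a comparison like this after every provisioning change is one way to keep the recovery plan from drifting behind the inventory.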


Thanks!

For more information:
• www.tuv.com

Follow me:
• Blog: http://www.vsamurai.com
• Twitter: @wygtya
• LinkedIn: http://jp.linkedin.com/in/wygtya
• Facebook: http://www.facebook.com/wygtya


New Zealand - We are here!



Challenges we face

Natural Disasters
• Earthquakes (3 major and 250 minor in the last 12 months)
• Tsunami
• Volcanic – 2 active

Remote
• 3 hour flight to Australia

Stability of Power
• 1998 Auckland power crisis
• Reliance on hydro electricity

WAN Considerations
• Cost and bandwidth limitations

What was learnt from Christchurch

Christchurch was considered low risk for earthquakes

Servers and desktops
• Unable to return to the office 6 months later
• Servers were protected but desktops were lost

Reliance on backup media
• Slow and potentially unreliable

The Human factor
• Other priorities
• Civil unrest

The value of virtualisation
• DR with SRM becomes viable

SRM - Customer Experience From Around the Globe

David Hall
Mainfreight Group
IT Infrastructure Manager


Who are we
“A company with a 100 year vision”

Mainfreight is a global supply chain
logistics provider
Commenced business in 1978
Today has a market capitalisation of $993
million
Sales revenues in excess of $1.75 billion
4,600+ team members
Unique culture & philosophy
We have a quality focus and aim to delight
our customers.


Where We Are
“Ready, Fire, Aim!”


Our Challenges
“Do more with less”

• Hybrid model consisting of mostly physical servers
• Cost of DR & BCP
• Previous DR process worked but was complex & time consuming
• Recent Christchurch earthquakes reiterated to our business the reality of disaster occurring & the importance of DR & BCP
• Costs of ~$10,000 every hour the systems are down


When Disaster Strikes - Christchurch


About our environment

“Top performing organisations are those that have
harnessed the true potential of today's cutting edge
technologies”

Hardware / Software
• HP servers & storage
• Cisco network
• Microsoft, Citrix, vSphere/SRM 4.x
• Active – Active data centres

Applications protected with SRM
• Maintrak - Web-based consignment tracking system
• MIMs - Inventory management system
• Cargowise – International freight forwarding system
• On Account – Accounting system
• On Sale – CRM system

[Diagram: data centres in South Auckland and Central Auckland, labelled Production and Recovery]

SRM Highlights

“DR is only as good as the last time it was tested”

Reduced DR test times from ~15 hours to 4 hours
Reduced the DR team from 4 people to 2
Minimised downtime costs – estimated at $10k per hour
Achieved 99.999% availability
SRM has been proven and used in ‘anger’ during a SAN failure
Installation well planned and implemented
Project completed on time and on budget
Minimal external consultancy required
Provided a platform to deliver DR for future business applications
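
The figures on this slide lend themselves to a quick back-of-the-envelope check. The sketch below uses only numbers quoted in the slides ($10k per hour, 99.999% availability, DR tests going from 15 hours with 4 people to 4 hours with 2). The person-hours calculation assumes the whole team is engaged for the full test, which the slides do not state.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year

COST_PER_HOUR = 10_000  # from the slides: ~$10k per hour of downtime
AVAILABILITY = 0.99999  # from the slides: 99.999%


def allowed_downtime_minutes(availability):
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability)


def downtime_cost(minutes, cost_per_hour=COST_PER_HOUR):
    """Cost of an outage of the given length at a flat hourly rate."""
    return minutes / 60 * cost_per_hour


def person_hours(team_size, test_hours):
    """Assumes the whole team is engaged for the full test (an assumption)."""
    return team_size * test_hours


if __name__ == "__main__":
    budget = allowed_downtime_minutes(AVAILABILITY)
    print(f"Downtime budget: {budget:.2f} minutes per year")
    print(f"Cost of using the whole budget: ${downtime_cost(budget):,.0f}")
    print(f"Person-hours per DR test: {person_hours(4, 15)} before, "
          f"{person_hours(2, 4)} after")
```

A 99.999% target leaves roughly 5.26 minutes of downtime per year, which is why fast, rehearsed failover matters more than the hourly cost figure alone suggests.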


Thank you

“VMware has provided us with
a flexible, reliable IT platform to
support the business and deliver
IT services in more responsive and
cost-effective ways.”

– Kevin Drinkwater,
Global Chief Information Officer


Company Background
VMware History
IBC started in 2004 to convert physical servers to VMs in a company-wide
effort to consolidate hardware and drive down maintenance costs and datacenter
space and utility usage.

Servers Virtualized
We currently manage about 800 VMs residing on 60+ ESX hosts
running ESX 4.1 and ESXi.
Since 2005 we have converted over 300 physical servers to VMs.

Storage
EMC DMX 4 (Production and DR) and NetApp (Test, Dev and QA)

Uses for VMware
We run Windows 2003, 2008 and Red Hat v5 (64- and 32-bit operating systems).
We have many Tier 1 applications running in our VM environment: SQL,
SharePoint, Citrix, Hyperion/Informatics and our claims processing servers.


Business Needs
What was needed
We were moving our data center in the summer of 2009 from Philadelphia
to Hershey, PA and needed to migrate 300+ production VMs to our new
location.

SRM Review
VMware came onsite to present the SRM product for a future IBC project
(DR insourcing). After the product presentation we saw the potential in
using this product for our datacenter move. Working with VMware
Professional Services proved very beneficial for IBC.

Did it solve the problem?
Yes. SRM made our data center move less stressful and more streamlined. It also
met our needs for DR insourcing and a redundant production environment.


Business Needs
Why VMware solution
When we saw the SRM product and how it could help us move 300+
production VMs from our Center City Philadelphia data center to our new
Hershey, PA data center, it was clear to us that this product would save us many
man-hours that we needed elsewhere on our move weekend.

SRM Characteristics
The SRM advantages that IBC leveraged were pre-move testing and the
streamlining and automation of the overall data center move script, in which we
could plan out the recovery sequence from Tier 1 Prod VMs to Tier 3 Test
VMs. The overall reliability of this product saved our company many admin
man-hours, pre- and post-migration.
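
The recovery sequence described here, Tier 1 production first and Tier 3 test last, can be sketched as a simple sort. This is an illustration only and not SRM's own API or the script IBC used; the VM names and the VM type are hypothetical.

```python
from typing import NamedTuple


class VM(NamedTuple):
    name: str
    tier: int  # 1 = most critical (production), 3 = least critical (test)


def power_on_order(vms):
    """Most critical tier first; ties keep their input order (sorted is stable)."""
    return sorted(vms, key=lambda vm: vm.tier)


def power_off_order(vms):
    """Reverse of the power-on sequence, so Tier 1 goes down last."""
    return list(reversed(power_on_order(vms)))


# Hypothetical inventory for illustration.
vms = [
    VM("test-build", 3),
    VM("claims-db", 1),
    VM("intranet", 2),
    VM("claims-app", 1),
]

if __name__ == "__main__":
    print("Power on: ", [vm.name for vm in power_on_order(vms)])
    print("Power off:", [vm.name for vm in power_off_order(vms)])
```

Deriving the shutdown order from the startup order keeps the two sequences consistent, so the most critical systems are the last to go down and the first to come back.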


Business Needs
Time outages avoided
We saved hours of production server outage time by using SRM instead
of a manual migration, and saved countless admin man-hours,
allowing our staff to be utilized in other areas of the move weekend.

What was needed
• SRM plugin for Virtual Center
• EMC – SRDF
• VMware Professional Services – the professional services contact was
very knowledgeable about the SRM product and how to integrate it with our
EMC storage.
• SRM script and planning – setting up the server migration priority plan.


Data Center Migration
How much time till DC cutover?
Professional Services came out a few months prior to the
DC move and were onsite for 2 days to prepare the plan
and gather information about the environment.

What was the setup and integration process?
We worked with VMware to set up our migration script
and verify that the EMC storage was replicating correctly.


Services needed

• Replication of data – our initial sync was about 50 LUNs and about 30 TB of data.

• We then set up daily replication of about 1 TB a day.

• Set up our server priority script (which servers to power down last and which servers to power up first).

• VMware came onsite 1 more day to verify that all was well before the final move date.
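
The replication volumes above translate into link requirements with simple arithmetic. The sketch below uses the 30 TB and 1 TB per day figures from the slide; the 1 Gbit/s link speed is an assumption for illustration only, and the calculation ignores protocol overhead, compression and competing traffic.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_TB = 1e12 * 8  # decimal terabytes


def required_mbps(terabytes_per_day):
    """Sustained rate in Mbit/s needed to replicate a daily change volume."""
    return terabytes_per_day * BITS_PER_TB / SECONDS_PER_DAY / 1e6


def transfer_days(terabytes, link_mbps):
    """Days needed to move a volume over a link running at full speed."""
    return terabytes * BITS_PER_TB / (link_mbps * 1e6) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"1 TB/day needs about {required_mbps(1):.1f} Mbit/s sustained")
    # Assumed link speed; the slides do not state the actual bandwidth.
    print(f"30 TB over 1 Gbit/s takes about {transfer_days(30, 1000):.1f} days")
```

Estimates like these help decide how far ahead of the cutover the initial sync has to start, and whether the daily delta fits the available WAN link.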


What happened on Labor Day move weekend?

VMware was on site Friday night when we kicked off SRM.
There was about 1 TB of changes left to be synced. We then
disconnected our EMC storage at the old datacenter and failed
over to the new datacenter storage.

Fewer than 10 VMs needed some attention to get
back online.

I would highly recommend VMware Professional Services.
They were on site a total of 4 days and walked us through the
whole datacenter migration.


Today
How is SRM running today?

We currently insource our disaster recovery drill at our
DR/redundant production datacenter in Reading, PA, using
SRM and VMware for replication and failover during the drill.
We currently run these tests 3-4 times a year.


Where Can I Learn More?

At VMworld
• Visit us at the booth
• Multiple great sessions on SRM
  • BCO 1269 – SRM 5 technical – Tue 4:30 PM; Wed 1 PM
  • BCO 1562 – SRM 5 technical – Tue 12 PM; Wed 10 AM
  • BCO 2527 – SRM 5 technical – Tue 3 PM
  • BCO 3334 – Cloud DR – Mon 10 AM; Wed 4 PM
  • BCO 3336 – Cloud DR – SP perspective – Mon 11:30 AM; Tue 12 PM

VMware.com
• Product Page – www.vmware.com/products/srm
• Overview, datasheet, webinars, docs, community links
• Free 60-day Evaluation – all you need to get started!
• Solutions from VMware – www.vmware.com/solutions/continuity

