High Availability and Disaster Recovery
in IBM PureApplication System
Scott Moonen <smoonen@us.ibm.com>
Agenda
• Principles and definitions
• HA and DR tools in PureApplication System
• Composing tools to meet your requirements
• Caveats
• Resources
Principles and definitions
Principles and definitions: HA and DR
• Business continuity
Ability to recover business operations within specified parameters in case of specified disasters
• Continuous availability
Operation of a system where unplanned outages prevent the operation for at most 5 minutes
per year (“five nines” or 99.999% availability)
• High availability
Operation of a system where unplanned outages prevent the operation for at most a few
seconds or minutes while failover occurs. Often used as an umbrella term to include continuous
availability.
• Disaster recovery
Operation of a system with a plan and process for reconstructing or recovering operations in a
separate location in case of disaster.
Principles and definitions: Active, Passive, etc.
• Active–Active
A system where continuous or high availability is achieved by having active operation in multiple
locations
• Active–Standby (or “warm standby”)
A system where high availability is achieved by having active operation in one location with
another location or locations able to become active within seconds or minutes, without a
“failover” of responsibility
• Active–Passive (or “cold standby”)
A system where high availability or disaster recovery is achieved by having active operation in
one location with another location or locations able to become active within minutes or hours
after a “failover” of responsibility
Principles and definitions: RTO and RPO
• RTO: recovery time objective
How long it takes for an HA or DR procedure to bring a system back into operation
• RPO: recovery point objective
How much data (measured in elapsed time) might be lost in the event of a disaster
[Diagram: RPO spectrum running from zero through seconds, minutes, and hours to days; mirrored file systems sit at the zero end, replicated file systems toward seconds, and backup and restore toward hours and days]
Principles and definitions: Scenarios
• Metropolitan distance: multiple data centers within 100–300km
–High availability is achievable using Active–Active or Active–Standby solutions that involve active
mirroring of data between sites.
–Disaster recovery with zero RPO is achievable using Active–Passive solutions that involve replication of
data between sites.
• Regional to global distance: multiple data centers beyond 200–300km
Disaster recovery with nonzero RPO is achievable using Active–Passive solutions that involve
replication of data between sites.
Principles and definitions: Personas
• Application architect
Responsible for planning the application design in such a way that high availability or disaster
recovery is achievable (e.g., separating application from data)
• Infrastructure administrator
Responsible for configuring and managing infrastructure in such a way as to achieve the ability
to implement high availability or disaster recovery (e.g., configuring and managing disk
mirroring or replication)
• Application administrator
Responsible for deploying and managing the components of an application in such a way as to
achieve high availability or disaster recovery (e.g., deploying the application in duplicate
between two sites and orchestrating the failover of the application and its disks together with
the infrastructure administrator)
Principles: Automation and repeatability
• Automate all aspects of your application’s deployment and configuration
–Using PureApplication patterns, pattern components, script packages, customized images
–Using external application lifecycle tooling such as IBM UrbanCode Deploy
• Why? This achieves rapid and confident repeatability of your application deployment, allowing:
–Quality and control: lower risk and chance of error
–Agility and simplicity
• Quickly recover application if you need to redeploy it
• Quickly deploy your application at separate sites for HA or DR purposes
• Quickly deploy new versions of the application for test or upgrade purposes
• Create a continuous integration lifecycle for faster and more frequent application deployment and testing
–Portability: deploy to other cloud environments (e.g., PureApplication Service)
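To make the automation point above concrete, here is a minimal sketch of driving a deployment from a script rather than the console. The endpoint path and payload fields are illustrative assumptions, not the documented PureApplication REST API; take the real interface (REST, CLI, or UrbanCode Deploy) from the product documentation.

```python
# Illustrative sketch only: scripting a pattern deployment instead of clicking
# through the console. The URL path and payload fields below are hypothetical
# stand-ins for whatever interface (REST API, CLI, or UrbanCode Deploy) you use.
import requests

def deploy_pattern(system_url, credentials, pattern_name, environment_profile):
    payload = {
        "pattern": pattern_name,                    # hypothetical field name
        "environmentProfile": environment_profile,  # hypothetical field name
    }
    response = requests.post(
        f"{system_url}/resources/deployments",      # hypothetical endpoint
        json=payload,
        auth=credentials,
        verify=True,
    )
    response.raise_for_status()
    return response.json()                          # e.g., deployment instance details

# Usage (placeholder values):
# deploy_pattern("https://pureapp.example.com", ("admin", "secret"),
#                "MyApp production pattern", "Production profile")
```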
Principles: Separation of application and data
• Ensure that all persistent data (transaction logs, database, etc.) is stored on disks separate from
the application or database software itself (see the sketch below)
• Why? This multiplies your recovery options because it decouples your strategy for application
and data recovery, which often must be addressed in different ways:
–Application recovery may involve backup & restore, re–deployment, or multiple deployment
Often the application cannot be replicated due to infrastructure entanglement
–Data recovery may involve backup & restore, replication, or mirroring
• This also allows additional flexibility for development and test cycles, for example:
–Deploy new versions of the application or database server and connect to original data
–Deploy test instances of the application using copies of the production data
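As a minimal check of this separation inside a deployed VM, the sketch below verifies that a data directory does not live on the same block device as the root filesystem. The data path is a hypothetical placeholder for your database or transaction-log location.

```python
# Minimal check (illustrative): confirm persistent data directories live on a
# different block device than the OS/application disk. The example path is a
# placeholder, not a PureApplication convention.
import os

def on_separate_device(data_path, reference_path="/"):
    """True if data_path resides on a different block device than reference_path."""
    return os.stat(data_path).st_dev != os.stat(reference_path).st_dev

if __name__ == "__main__":
    data_dir = "/data/db"   # hypothetical mount point for an attached block volume
    if os.path.exists(data_dir):
        print(f"{data_dir} on its own volume:", on_separate_device(data_dir))
    else:
        print(f"{data_dir} does not exist on this host")
```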
Principles: Transaction consistency
If your application stores data in multiple locations (e.g., transaction logs on file server and transactions in
database), then you must ensure that either:
• The “lower” statements of record are replicated with total consistency together with the “higher”
statements of record, or else
• The “lower” statements of record are at all times replicated in advance of the “higher” statements of
record.
This ensures that you do not replicate inconsistent data (e.g., transaction log indicates a transaction is
committed but the transaction is not present in the database). So, for example:
• Your database and fileserver disks are replicated together with strict consistency, or instead
• Your database is replicated synchronously (zero RPO) but your fileserver asynchronously (nonzero RPO).
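A toy model of the second option, treating the database as the "lower" statement of record and the transaction log as the "higher" one, consistent with the example above. It is not product code; it only shows that the replica stays consistent as long as the database stream has applied at least as many committed transactions as the replicated log.

```python
# Toy model (not product code) of the replication-ordering rule above: every
# commit visible in the replicated transaction log must also be present in the
# replicated database.

def replica_is_consistent(db_txns_applied, log_txns_applied):
    """Consistent if the database stream is at least as far along as the log stream."""
    return db_txns_applied >= log_txns_applied

# Primary committed transactions 1..100; each stream replicates only a prefix.
print(replica_is_consistent(db_txns_applied=100, log_txns_applied=96))  # True: safe
print(replica_is_consistent(db_txns_applied=94, log_txns_applied=96))   # False: log claims commits the DB lacks
```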
HA and DR tools in PureApplication System
Tools: Compute node availability
• PureApplication System offers two options for planning for failure of compute nodes:
–Cloud group HA, if enabled, will reserve 1/n CPU and memory overhead on each compute node in a
cloud group containing n compute nodes. If one compute node fails, all VMs will be recovered into this
reserved space on the remaining nodes.
–System HA allows you to designate one or more compute nodes as spares for all cloud groups that are
enabled for system HA. This allows you both to (1) allocate more than one spare, and also (2) share a
spare between multiple cloud groups.
• If neither cloud group HA nor system HA is enabled and a compute node fails, the system will
attempt to recover as many VMs as possible on the remaining nodes in the cloud group, in
priority order.
• VMs being recovered will experience an outage equivalent to being rebooted.
• Recommendation: always enable cloud group HA or system HA
–This ensures your workload capacity is restored quickly after a compute node failure
–This also ensures that workload does not need to be stopped for planned compute node maintenance
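A back-of-the-envelope sketch of the cloud group HA reservation rule described above; the node sizes are hypothetical examples, not a specific PureApplication configuration.

```python
# Sketch of the cloud group HA sizing rule: with n compute nodes, 1/n of each
# node's CPU and memory is held back, so the total reserve equals one node.
# Node sizes below are hypothetical examples.

def usable_capacity(nodes, cores_per_node, memory_gb_per_node):
    reserve = 1.0 / nodes
    return (cores_per_node * (1 - reserve), memory_gb_per_node * (1 - reserve))

cores, mem_gb = usable_capacity(nodes=4, cores_per_node=32, memory_gb_per_node=512)
print(f"Usable per node: {cores} cores, {mem_gb} GB")   # 24.0 cores, 384.0 GB
print(f"Total reserved:  {4 * (32 - cores)} cores")     # 32.0 cores = one node's capacity
```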
Tools: Block storage
Block storage volumes in PureApplication System:
• May be up to 8TB in size
• Are allocated and managed independently of VM storage, can be attached and detached
• Are not included in VM snapshots
• Can be cloned (copied)
• Can be exported to and imported from external SCP servers
• Can be grouped for time–consistent cloning or export of multiple volumes
Tools: Shared block storage
• Block storage volumes may be shared (simultaneously attached) by virtual machines
–On the same system
Note: this is supported on Intel, and on Power beginning with V2.2.
–Between systems. Notes:
• This is supported only for external block storage that resides outside of the system (see later slide).
• This is supported on Intel. Support on Power is forthcoming.
• This allows for creation of highly available clusters (GPFS, GFS, DB2 pureScale, Windows cluster)
–A clustering protocol is necessary for sharing of the disk
–The IBM GPFS pattern (see later slide) supports GPFS clusters on a single rack using shared block
storage, but does not support cross–system clusters using shared external block storage
• Restrictions
–Storage volumes must be specifically created as “shared” volumes
–Special placement techniques are required in the pattern to ensure anti–collocation of VMs
–IBM GPFS pattern supports clustering (see below)
Tools: Block storage replication
Two PureApplication Systems can be connected for replication of block storage
• Connectivity options
–Fibre Channel connectivity supported beginning in V2.0
–TCP/IP connectivity supported beginning in V2.2
• Volumes are selected for replication individually
–Replicate in either direction
–Replicate synchronously up to 3ms latency (~300km), asynchronously up to 80ms latency (~8000km).
RPO for asynchronous replication is up to 1 second.
• All volumes are replicated together with strict consistency
• Target volume must not be attached while replication is taking place
• Replication may be terminated (unplanned failover) or reversed in place (planned failover).
Reverse in place requires volume to be unattached on both sides.
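The latency thresholds on this slide can be captured in a small planning helper. The figures are the ones quoted above (3ms synchronous, 80ms asynchronous, RPO up to about 1 second); this is a sizing aid only, not product behavior.

```python
# Planning helper using the thresholds quoted above: synchronous replication up
# to ~3 ms latency (~300 km), asynchronous up to ~80 ms (~8000 km) with an RPO
# of up to about 1 second. Not product code.

def replication_mode(latency_ms):
    if latency_ms <= 3:
        return "synchronous (zero RPO)"
    if latency_ms <= 80:
        return "asynchronous (RPO up to ~1 second)"
    return "out of range: consider backup/restore or a closer recovery site"

for latency in (1, 10, 120):
    print(f"{latency} ms -> {replication_mode(latency)}")
```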
Tools: External block storage
• PureApplication System can connect to external SVC, V7000, V9000 devices:
–Allows block and “shared” block volumes to be accessed by VMs on PureApplication System.
Base VM disks cannot reside on external storage.
–Depending on extent size, allows for volumes larger than 8TB
–Requires both TCP/IP and Fibre Channel connectivity to the external device
• All volume management is performed outside of system
–Volumes are allocated and deleted by admin on external device
–Alternate storage providers, RAID configurations, or combinations of HDD and SSD may be used
–Volumes may be mirrored externally (e.g., SVC–managed mirroring across multiple devices)
–Volumes may be replicated externally (e.g., SVC to SVC replication between data centers)
• Advanced scenarios, sharing access to the same SVC cluster or V7000, or replicated ones:
–Two systems sharing access to cluster or to replicated volumes
–PureApplication System and PureApplication Software sharing access to cluster or replicated volumes
Tools: IBM GPFS (General Parallel File System) / Spectrum Scale
• GPFS is:
–A shared filesystem (like NFS)
–Optionally: a clustered filesystem (unlike NFS) providing HA and high performance.
Note: clustering supported on Power Systems beginning with V2.2.
–Optionally: mirrored between cloud groups or systems
• A tiebreaker (on third rack or external system) is required for quorum
• Mirroring is not recommended above 1–3ms (~100–300km) latency
–Optionally: (using block storage or external storage replication) replicated between systems
[Diagram: GPFS topologies — shared, clustered, mirrored (with a tiebreaker), and replicated — each showing GPFS servers, their Data volumes, and clients]
Tools: Multi–system deployment
• Connect systems in a “deployment subdomain” for cross–system pattern deployment
–Virtual machines for individual vsys.next or vapp deployments may be distributed across systems
–Allows for easier deployment and management of highly available applications using a single pattern
–Systems may be located in same or different data centers
• Notes and restrictions
–Up to four systems may be connected (limit is two systems prior to V2.2)
–Inter–system network latencies must be less than 3ms (~300km)
–An external 1GB iSCSI tiebreaker target must be configured for quorum purposes
–Special network configuration is required for inter–system management communications
Composing tools to meet your requirements
Scenario: Test application, middleware, or schema update
Copy block storage from production application for use in testing
[Diagram: a production database with its Data volume and App; the Data volume is copied and attached to a second database used by a Test App]
Scenario: Update application or middleware
When both the current and new application and middleware can share the same database
without conflict (e.g., no changes to database schema), you can run the newer version of the
application or middleware side by side for testing, and then eventually direct clients to the new
version and retire the old version.
[Diagram: one database and Data volume serving both the existing App and the new App V2 side by side]
Scenario: Backward incompatible updates to database or schema
In some cases, a new version of an application, database server, or database schema may be
unable to coexist with the existing application. In this case, you can use the “copy” strategy on a
previous slide to test the upgrade of your application. When you are ready to promote the new
version to production, you can detach the block storage from the existing deployment and attach
it to the upgraded deployment.
[Diagram: the Data volume is detached from the existing database and App and attached to the upgraded DB V2 and App V2]
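The cutover above is essentially an ordering problem. The sketch below shows that ordering with hypothetical helper callables (stop_deployment, detach_volume, attach_volume, start_deployment) standing in for whatever console, CLI, or REST steps you actually use.

```python
# Orchestration sketch of the detach/attach cutover described above. The four
# helpers are hypothetical callables standing in for the console, CLI, or REST
# operations you actually use; only the ordering of steps is the point.

def promote_upgraded_deployment(volume, old_deployment, new_deployment,
                                stop_deployment, detach_volume,
                                attach_volume, start_deployment):
    stop_deployment(old_deployment)        # quiesce writers before touching disks
    detach_volume(volume, old_deployment)  # release the block storage volume
    attach_volume(volume, new_deployment)  # hand the data to the upgraded stack
    start_deployment(new_deployment)       # bring DB V2 / App V2 online

def _log(action):
    return lambda *args: print(action, *args)

if __name__ == "__main__":
    promote_upgraded_deployment(
        "data-volume", "app-v1-deployment", "app-v2-deployment",
        stop_deployment=_log("stop"), detach_volume=_log("detach"),
        attach_volume=_log("attach"), start_deployment=_log("start"),
    )
```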
Scenario: HA planning for compute node failure
Principles:
• Deploy multiple instances of each service so that each service continues if one instance is lost
• Enable cloud group or system HA so that failed instances can be recovered quickly
[Diagram: a load balancer in front of two App instances; DB primary and DB secondary, each with its own Data volume, linked by HADR; a GPFS cluster with its Data volume serving both Apps]
Scenario: recovery planning for VM failure or corruption
Three scenarios:
• Backup and restore of the VM itself is feasible if it can be recovered in place
• If the VM cannot be recovered:
–If the VM is part of a horizontally scalable cluster, you can scale
in to remove the failed VM and scale out to create a new VM
–If the VM is not horizontally scalable, you must plan to re–deploy it:
• You can deploy the entire pattern again and recover the data to it
• You may be able to deploy a new pattern that recreates only the failed VM,
and use manual or scripted configuration to reconnect it to your existing
deployment
Scenario: recovery planning for database corruption
You may use your database’s own capabilities for backup and restore, import and export.
Alternatively, you may use block storage copies (and optionally export and import) to back up your
database. Attach the backup copy (importing it beforehand if necessary) to restore.
[Diagram: the database's Data volume is copied, and optionally exported and imported; to restore, the backup copy is attached in place of the corrupted volume (detach/attach)]
Scenario: HA planning for system or site failure
• As with planning for compute node failure, deploy multiple instances: now across systems.
• You may deploy separately on each system, or use multi–system deployment across systems.
• Distance at which HA is possible is limited.
• GPFS clustering is optional. It can provide additional throughput and also additional availability
on a single system.
[Diagram: System A and System B each run an App and a GPFS cluster; the GPFS Data is mirrored between the systems with an external tiebreaker; DB primary (System A) and DB secondary (System B) are linked by HADR; a load balancer directs clients to both Apps]
Scenario: Two–tier HA planning for system or site failure
• Compared to the previous slide, if you desire HA both within a site and also between sites, you
must duplicate your application, database and filesystem both within and between sites.
• Native database replication between sites must be synchronous, or may be asynchronous if you
have no need of GPFS (see slide 11).
[Diagram: Site A runs an App, a GPFS cluster, DB primary, and a local DB secondary linked by HADR; Site B (which may be standby) runs another App, a mirrored GPFS cluster with a tiebreaker, and a further DB secondary linked by HADR; a load balancer or DNS directs clients]
Scenario: DR planning for rack or site failure
• You should expect nonzero RPO if the sites are too far apart to allow synchronous replication
• Applications must be quiesced at the recovery site because replicated disks are inaccessible
• The database is here replicated using disk replication for transaction consistency. You can use
native database replication (as on slide 28) only if it is synchronous, or asynchronously only if
you have no need of GPFS (see slide 11).
[Diagram: System A runs the active App, a GPFS cluster, and a DB primary/secondary HADR pair; disk replication copies the GPFS and database volumes to System B, where a quiesced App, GPFS cluster, and DB primary/secondary pair stand ready; a load balancer or DNS directs clients]
Scenario: horizontal scaling and bursting
• Use of the base scaling policy allows you to horizontally scale, manually or in some cases
automatically, new instances of a virtual machine with clustered software components.
• When using multi–system deployment, horizontally scaled virtual machines will be distributed
as much as possible across systems referenced in your environment profile
• An alternate approach, especially in heterogeneous environments like PureApplication System
and PureApplication Service, is to deploy new pattern instances for scaling or bursting, and
federate them together.
Caveats
Caveats: Networking considerations
• Some middleware is sensitive to IP addresses and hostnames (e.g., WAS) and for DR purposes
you may need to plan to duplicate either IP addresses or hostnames in your backup data center
• Both HA architectures and zero–RPO DR architectures are sensitive to latency. If latency is too
high you can experience poor write throughput or even mirroring or replication failure. For
these cases you should ideally plan for less than 1ms (~100km) of latency between sites.
• You must also plan for adequate network throughput between sites when mirroring or
replicating.
• HA architectures require the use of a tiebreaker to govern quorum–leader determination in case
of a network split. In a multi–site HA design, you should plan to locate the quorum at a third
location, with equally low latency.
Caveats: Middleware–specific considerations
• Combining both mirroring and replication (Active–Active–Passive–Passive)
–The IBM GPFS pattern does not support combining both mirroring and replication
–This combination is possible for other middleware (e.g., DB2 as on slide 29), but you must manually
determine and designate which instance is Primary or Secondary at the time of recovery
• Read carefully your middleware’s recommendations for configuring HA. For example:
–IBM WebSphere recommends against cross–site cells
–The IBM DB2 HADR pattern preconfigures a reservationless IP–based tiebreaker, which is not
recommended
–IBM DB2 HADR provides a variety of synchronization modes with different RPO characteristics
• Ensure your middleware tolerates attaching existing storage if you replicate or copy volumes
–The IBM DB2 HADR pattern requires an empty disk when first deploying. You can attach a new disk or
replicate into this disk only after deployment.
–The IBM GPFS pattern does not support attaching existing GPFS disks
Caveats: Virtual machine backup and restore
The power and flexibility of PureApplication patterns means that your PureApplication VMs are
tightly integrated both within a single deployment, and with the system on which they are
deployed.
Because of this tight integration, you cannot use backup and restore techniques to recover your
PureApplication VMs unless you are recovering to the exact same virtual machine that was
previously backed up.
Your cloud strategy for recovering corrupted deployments should build on the efficiency and
repeatability of patterns so that you are able to re–deploy in the event of extreme failure
scenarios such as accidental virtual machine deletion or total system failure.
Caveats: Practice, practice, practice
Because of the complexity of HA and DR implementation, and especially because of some of the
caveats we have noted and which you may encounter in your unique situation, it is vital for you to
practice all aspects of your HA or DR implementation and lifecycle before you roll it out into
production.
This includes testing network bandwidth and latency to their expected limits. It also includes
simulating failures and verifying and perfecting your procedures for recovery and also for failback.
Resources
Resources
• Implementing High Availability and Disaster Recovery in IBM PureApplication Systems V2
http://www.redbooks.ibm.com/abstracts/sg248246.html
• “Implement multisystem management and deployment with IBM PureApplication System”
http://www.ibm.com/developerworks/websphere/techjournal/1506_vanrun/1506_vanrun-trs.html
• “Demystifying virtual machine placement in IBM PureApplication System”
http://www.ibm.com/developerworks/websphere/library/techarticles/1605_moonen-trs/1605_moonen.html
Resources, continued
• “High availability (again) versus continuous availability”
http://www.ibm.com/developerworks/websphere/techjournal/1004_webcon/1004_webcon.html
• “Can I run a WebSphere Application Server cell over multiple data centers?”
http://www.ibm.com/developerworks/websphere/techjournal/0606_col_alcott/0606_col_alcott.html#sec1d
• “Increase DB2 availability”
http://www.ibm.com/developerworks/data/library/techarticle/dm-1406db2avail/index.html
• “HADR sync mode”
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20sync%20mode