Copying or distribution without written permission of CMG Brasil is prohibited.
GDPS/Active-Active and Load Balancing via Server/Application State Protocol (SASP)
Dr. Steve Guendert
Brocade Communications
sguender@brocade.com
@BRCD_DrSteve
Trademarks, notices, and disclaimers
• Advanced Peer-to-Peer
Networking®
• AIX®
• alphaWorks®
• AnyNet®
• AS/400®
• BladeCenter®
• Candle®
• CICS®
• DataPower®
• DB2 Connect
• DB2®
• DRDA®
• e-business on demand®
• e-business (logo)
• e business(logo)®
• ESCON®
• FICON®
• GDDM®
• GDPS®
• Geographically Dispersed
Parallel Sysplex
• HiperSockets
• HPR Channel Connectivity
• HyperSwap
• i5/OS (logo)
• i5/OS®
• IBM eServer
• IBM (logo)®
• IBM®
• IBM zEnterprise™ System
• IMS
• InfiniBand ®
• IP PrintWay
• IPDS
• iSeries
• LANDP®
• Language Environment®
• MQSeries®
• MVS
• NetView®
• OMEGAMON®
• Open Power
• OpenPower
• Operating System/2®
• Operating System/400®
• OS/2®
• OS/390®
• OS/400®
• Parallel Sysplex®
• POWER®
• POWER7®
• PowerVM
• PR/SM
• pSeries®
• RACF®
• Rational Suite®
• Rational®
• Redbooks
• Redbooks (logo)
• Sysplex Timer®
• System i5
• System p5
• System x®
• System z®
• System z9®
• System z10
• Tivoli (logo)®
• Tivoli®
• VTAM®
• WebSphere®
• xSeries®
• z9®
• z10 BC
• z10 EC
• zEnterprise
• zSeries®
• z/Architecture
• z/OS®
• z/VM®
• z/VSE
The terms listed above are trademarks or registered trademarks of International Business Machines Corporation in the United States or other countries or both. The following are trademarks or registered trademarks of other companies:
• Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
• Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license there from.
• Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
• Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
• InfiniBand is a trademark and service mark of the InfiniBand Trade Association.
• Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
• UNIX is a registered trademark of The Open Group in the United States and other countries.
• Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
• ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
• IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
Notes:
• Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any
user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload
processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
• IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
• All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have
achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
• This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to
change without notice. Consult your local IBM business contact for information on the product or services available in your area.
• All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
• Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the
performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
• Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Refer to www.ibm.com/legal/us for further legal information.
* All other products may be trademarks or registered trademarks of their respective companies.
Abstract
The GDPS/Active-Active sites concept is a fundamental paradigm shift in disaster recovery: from a failover model to a continuous availability model. GDPS/Active-Active consists of two sites, separated by virtually unlimited distances, running the same applications and having the same data to provide cross-site workload balancing and continuous availability. One of the key components of a GDPS/Active-Active solution is the external load-balancing IP routers that balance workloads through the Server/Application State Protocol (SASP). This session will discuss the GDPS/Active-Active workload balancing function, with a focus on SASP, the router functionality, and how it works in conjunction with IBM Multi-site Workload Lifeline for z/OS.
Agenda
• Business Continuity vs. IT Resiliency
• Introduction to GDPS Active-Active
• Requirements: hardware and software
• Server Application State Protocol (SASP) Overview
– Motivation and high level overview of the protocol
• Overview of IBM solutions based on SASP
– z/OS Sysplex Clusters (z/OS Load Balancing Advisor)
– Active/Active - Next generation of IBM’s disaster recovery technology (Multi-site
Workload Lifeline product)
• Conclusion and questions.
BUSINESS CONTINUITY AND IT RESILIENCY
Definitions
Why Worry?
• Lightning strikes
• Hurricanes / cyclones
• Cut cables and power
• Data theft and security breaches
• Overloaded lines and infrastructure
• Earthquakes
• Tornadoes and other storms
• Tsunamis
• Terrorism
IT resilience
• The ability to rapidly adapt and respond to any internal or external disruption, demand, or threat, and to continue business operations without significant impact
– Continuous/near-continuous application availability (CA)
– Covers planned and unplanned outages
• Broader in scope than disaster recovery (DR)
– DR concentrates solely on recovering from unplanned events
• Bottom line: business continuance is no longer simply IT DR
RTO
• Recovery Time Objective (RTO)
– A metric for how long it takes to recover the application and resume operations after a planned or unplanned outage
– How long your business can afford to wait for IT services to be resumed
– How much pain can you take? Days, hours, or minutes?
RPO
• Recovery Point Objective (RPO)
– A metric for how much data is lost
– The actual recovery point to which all data is current and consistent
– How much data your company is willing to recreate following an outage
• What is the acceptable time difference between the data in your production system and the data at the recovery site? For example, if asynchronous replication lags production by three seconds, a disaster could lose up to three seconds of committed updates (RPO of roughly 3 seconds).
Tiers of Disaster Recovery
Chart: recovery tiers plotted by time to recover (from 15 minutes up to 72 hours) against business value ("not critical" through "somewhat critical" to "mission critical"), progressing from point-in-time backup through a dedicated remote hot site to an active secondary site. Tiers based on SHARE Group, 1992. *PTAM = Pickup Truck Access Method.
• Tier 1: PTAM*
• Tier 2: PTAM, hot site
• Tier 3: electronic vaulting
• Tier 4: batch/online database shadowing and journaling, repetitive point-in-time (PiT) copies, fuzzy-copy disk mirroring
• Tier 5: software two-site, two-phase commit (transaction integrity); or repetitive PiT copies with small data loss
• Tier 6: near-zero or zero data loss remote disk mirroring, helping with data integrity and data consistency
– GDPS/PPRC HyperSwap Manager: RTO depends on customer automation; RPO 0
– GDPS/XRC and GDPS/Global Mirror: RTO < 2 hr; RPO < 1 min
• Tier 7: near-zero or zero data loss; highly automated takeover on a complex-wide or business-wide basis, using remote disk mirroring
– GDPS/PPRC: RTO < 1 hr; RPO 0
• Tier 8: Active/Active sites; application/workload-level HA, automatic monitoring, automatic workload routing/recovery, using async replication between sites
– RTO < 1 min; RPO < 3 sec
Failover models can only achieve so much in improving RTO
RTO and RPO
• Cost tradeoffs: balancing what you need against what you can afford
Chart: cost of the business continuity solution versus cost of an outage.
Source: Guendert, S., "Revisiting Business Continuity and Disaster Recovery Planning and Performance for 21st Century Regional Disasters: The Case for GDPS," Journal of Computer Resource Management, Summer 2007.
INTRODUCTION TO GDPS ACTIVE-ACTIVE

Availability and the IBM Mainframe
1964: single system. 1982-1990: multiple partitions/images. 1995: Parallel Sysplex.
Availability and the IBM Mainframe
1998: GDPS/PPRC
• Parallel Sysplex
• Synchronous mirroring
• HyperSwap for primary disk failures
• Near-continuous availability
• Limitation: performance at distance
What are GDPS/PPRC customers doing today?
• GDPS/PPRC, based upon a multi-site Parallel Sysplex and synchronous disk replication, is a metro-area Continuous Availability (CA) and Disaster Recovery (DR) solution
• GDPS/PPRC supports two configurations:
– Active/standby or single-site workload
– Active/active or multi-site workload
• Some customers have deployed GDPS/PPRC active/active configurations
– All critical data must be PPRCed and HyperSwap-enabled
– All critical CF structures must be duplexed
– Applications must be Parallel Sysplex-enabled
– Signal latency will impact OLTP throughput and batch duration, limiting site separation to no more than a few tens of km of fiber
• Issue: the GDPS/PPRC active/active configuration does not provide enough site separation for some enterprises
What are GDPS/XRC & GDPS/GM customers doing today?
• GDPS/XRC and GDPS/GM, based upon asynchronous disk replication, are unlimited-distance DR solutions
• The current GDPS async replication products require the failed site's workload to be restarted in the recovery site; this typically takes 30-60 minutes
– Power-fail consistency
– Transaction consistency
• There are no identified extensions to the existing GDPS async replication products that would allow the RTO to be substantially reduced
• Issue: GDPS/XRC and GDPS/GM will not achieve the RTO of seconds being requested by some enterprises
What are GDPS customers doing today?
• Single data center (applications remain active): continuous availability of data within a data center; continuous access to data in the event of a storage subsystem outage. GDPS/HyperSwap Manager: RPO=0 & RTO=0.
• Two data centers (systems remain active): continuous availability / disaster recovery within a metropolitan region; multi-site workloads can withstand site and/or storage failures. GDPS/PPRC: RPO=0 & RTO<1 hr.
• Two data centers: rapid systems disaster recovery with "seconds" of data loss; disaster recovery for out-of-region interruptions; disaster recovery at extended distance. GDPS/GM & GDPS/XRC: RPO secs & RTO <1 hr.
• Three data centers (A, B, C): high availability for site disasters; disaster recovery for regional disasters; continuous availability regionally plus disaster recovery at extended distance. GDPS/MGM & GDPS/MzGM.
Drivers for improvements in HA and DR
• Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System [Docket No. R-1128] (April 7, 2003)
– Focus on mission-critical workloads, their recovery, and resumption of normal processing
• Cost of an outage
– Financial
– Reputation
• Global business model
– 24x7 processing
– Planned-outage avoidance

Cost of Downtime by Industry (loss per hour):
Financial: $8,213,470
Telecommunications: $4,611,604
Information Technology: $3,316,058
Insurance: $2,582,382
Pharmaceuticals: $2,058,710
Energy: $1,468,798
Transportation: $1,463,128
Banking: $1,145,129
Chemicals: $1,071,404
Consumer Products: $989,795
Source: Robert Frances Group, 2006, "Picking up the value of PKI: Leveraging z/OS for Improving Manageability, Reliability, and Total Cost of Ownership of PKI and Digital Certificates."
Customer requirements for HA/DR/BC in 2013
• Shift focus from a failover model to a near-continuous availability model (RTO near zero)
• Access data from any site (unlimited distance between sites)
• No application changes
• Multi-sysplex, multi-platform solution
– "Recover my business rather than my platform technology"
• Ensure successful recovery via automated processes (similar to GDPS technology today)
– Can be handled by less-skilled operators
• Provide workload distribution between sites (route around failed sites; dynamically select sites based on their ability to handle additional workload)
• Provide application-level granularity
– Some workloads may require immediate access from every site; other, less critical workloads may only need to update other sites every 24 hours
– Current solutions employ an all-or-nothing approach (complete disk mirroring, requiring extra network capacity)
IBM GDPS active/active
• Long-distance disaster recovery with only seconds of impact
• Continuous availability
• A fundamental paradigm shift from a failover model to a near-continuous availability model
• Allows unlimited-distance replication with only seconds of user impact if there is a site disaster
• Uses software-based replication and techniques for copying the data between sites
• Provides control over which workloads are being protected
• GDPS automation provides an end-to-end automated solution
– Helps manage the availability of the workload
– Coordination point/controller for activities, including being a focal point for operating and monitoring the solution and for readiness to recover
Active/Active concepts
Example sites: New York and Zurich.
Diagram: transactions flow to a workload distributor (load balancing with SASP, z/OS Comm Server), which routes each one to one of many replicas depending upon workload weight and latency constraints; data at the geographically dispersed sites is kept in sync via replication. This extends workload balancing to sysplexes across multiple sites.
Two or more sites, separated by unlimited distances, running the same applications and having the same data to provide:
– Cross-site workload balancing
– Continuous availability
– Disaster recovery
Active/Active concepts
Example sites: New York and Zurich.
Diagram: the same configuration with the Tivoli Enterprise Portal added. Monitoring spans the sites and becomes an essential element of the solution for site health checks, performance tuning, etc.
Two or more sites, separated by unlimited distances, running the same applications and having the same data to provide:
– Cross-site workload balancing
– Continuous availability
– Disaster recovery
GDPS ACTIVE-ACTIVE REQUIREMENTS
Hardware and software supporting SASP
Conceptual view of GDPS Active-Active
Diagram: transactions arrive at a workload distribution layer, which routes the workload to the active sysplex. One sysplex runs the active production workload and the other the standby production workload, with software replication between them; control information is passed between the systems, the GDPS controllers, and the workload distributor.
GDPS Active-Active high-level architecture
Diagram: GDPS/Active-Active automation sits on top of NetView and SA z/OS. DB2 and IMS data is copied by software replication, coordinated by Lifeline, with MQ and TCP/IP as the transports, plus workload monitoring; everything runs on z/OS on System z hardware.
GDPS/Active-Active Software Components
• Integration of a number of software products:
– z/OS 1.11 or higher
– IBM Multi-site Workload Lifeline v1.1
– IBM Tivoli NetView for z/OS v6.1
– IBM Tivoli Monitoring v6.2.2 FP3
– IBM InfoSphere Replication Server for z/OS v10.1
– IBM InfoSphere IMS Replication for z/OS v10.1
– System Automation for z/OS v3.3
– GDPS/Active-Active v1.1
– Optionally, the OMEGAMON suite of monitoring tools to provide additional insight
Replication
• IBM InfoSphere Replication Server for z/OS v10.1
– Runs on production images where required to capture (active) and apply (standby) data updates for DB2 data. Relies on MQ as the data transport mechanism (QREP).
• IBM InfoSphere IMS Replication for z/OS v10.1
– Runs on production images where required to capture (active) and apply (standby) data updates for IMS data. Relies on TCP/IP as the data transport mechanism.
• System Automation for z/OS v3.3
– Runs on all images. Provides a number of critical functions:
• Remote communications capability to enable GDPS to manage sysplexes from outside the sysplex
• System Automation infrastructure for workload and server management
• Not hardware-based mirroring
GDPS/Active-Active Hardware components
• Two production sysplex environments (also referred to as sites) in different locations
– One active, one standby, for each defined workload
– Software-based replication between the two sysplexes/sites
• IMS and DB2 data is supported
• Two controller systems
– Primary/backup
– Typically one in each of the production locations, but there is no requirement that they are co-located in this way
• Workload balancing/routing switches
– Must be Server/Application State Protocol (SASP) compliant; SASP is described in RFC 4678
SASP
Basics of how it works and how workloads get balanced
Agenda
• Server Application State Protocol (SASP) Overview
– Motivation and high-level overview of the protocol
• Overview of IBM solutions based on SASP
– z/OS Sysplex Clusters (z/OS Load Balancing Advisor)
– Active/Active: next generation of IBM's disaster recovery technology (Multi-site Workload Lifeline product)
The ability to distribute work across equal servers,
• factoring in the availability of server resources,
• factoring in the business importance of the work,
• estimating the likelihood of meeting objectives,
• avoiding over-utilized servers where possible,
• factoring in downstream server dependencies.
Traffic weight recommendations flow to the load balancers.
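To make the use of these weights concrete, here is a minimal sketch (illustrative only, not from the presentation; all names are hypothetical) of how a load balancer might turn per-server weight recommendations into a distribution of new work:

import random
from collections import Counter

# Hypothetical weight recommendations received from a workload manager,
# e.g. via SASP: a higher weight means the server should receive more work.
weights = {"serverA": 60, "serverB": 30, "serverC": 10}

def pick_server(weights):
    """Pick a target server with probability proportional to its weight.
    A quiesced or unhealthy member would carry weight 0 and never be picked."""
    members = [m for m, w in weights.items() if w > 0]
    return random.choices(members, weights=[weights[m] for m in members])[0]

# Distribute 1000 new connections and show the resulting split,
# roughly 60/30/10 across the three servers.
print(Counter(pick_server(weights) for _ in range(1000)))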
Server/Application State Protocol (SASP) objectives
• Provide a mechanism for workload managers to give distribution recommendations to load balancers
• Must be lightweight, with minimal:
– implementation complexity
– processing overhead
– additional user configuration
• Must be extensible
• SASP does not handle the transport or actual distribution of work; it only gives recommendations
• Open protocol, documented in RFC 4678: http://www.faqs.org/rfcs/rfc4678.html
SASP high-level architecture
Diagram: requests from request origins flow through a load balancer into a cluster of members, each with its own individual workload manager; a Group Workload Manager (GWM) speaks SASP to the load balancer on behalf of the cluster.
• The load balancer uses SASP to register the members that it is interested in load balancing
• The Group Workload Manager (GWM) provides a single point of contact for the entire cluster
Overview of SASP protocol flows
1. The load balancer opens a TCP connection to the GWM (DVIPAx, port 3860)
2. SetLBStateRequest/Response: LB UUID, weight/server connector parameters
3. WeightRegisterRequest/Response: Group1 (protocol, port, IP@1, IP@2, ...), Group2 (...)
4. GetWeightRequest/Response: Group1, Group2, ...
5. SendWeight: Group1 weights, Group2 weights
6. WeightDeRegisterRequest/Response: Group1, Group2, ...
7. Close the TCP connection
Steps 1-3 are one-time actions, performed once or when the LB configuration changes; steps 4 and 5 are performed on a periodic interval basis or when the state of systems and/or applications changes.
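The same sequence can be sketched in Python. This models only the order of operations, with illustrative class and method names; it is not the binary wire encoding defined in RFC 4678:

from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    protocol: str                 # "TCP" or "UDP"
    port: int
    member_ips: list = field(default_factory=list)

class SaspSession:
    """Models the order of an LB's operations against a GWM (prints only)."""
    def connect(self, gwm_host):                    # step 1
        print(f"TCP connect to {gwm_host}:3860")
    def set_lb_state(self, lb_uuid, push=True):     # step 2
        print(f"SetLBStateRequest(uuid={lb_uuid}, push={push})")
    def register(self, *groups):                    # step 3
        for g in groups:
            print(f"WeightRegisterRequest({g.name}: {g.protocol}/{g.port} {g.member_ips})")
    def get_weights(self, *group_names):            # step 4 (pull model)
        print(f"GetWeightRequest{group_names}")
    def deregister(self, *group_names):             # step 6
        print(f"WeightDeRegisterRequest{group_names}")

session = SaspSession()
session.connect("gwm.example.com")                  # hypothetical host
session.set_lb_state("lb-1", push=True)
session.register(Group("Group1", "TCP", 8000, ["10.0.0.1", "10.0.0.2"]))
# In the push model the GWM now sends SendWeight messages on each interval
# (step 5); in the pull model the LB calls get_weights() each polling interval.
session.deregister("Group1")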
Registering interest in target applications using SASP
• The load balancer registers groups of clustered servers it is interested in load balancing
– Each group designates an application cluster to be load balanced
• Each group consists of a list of members (i.e., target servers)
– System-level cluster: a list of target systems identified by IP address (in lieu of individual application servers)
• Recommendations returned in this scenario are also at a system level
• No specific target application information is returned in this case
– Application-level cluster: a list of applications comprising the load-balancing group
• Allows specific recommendations to be provided for each target socket application (vs. providing the same recommendation for all application instances running on the same host)
• Members are identified by protocol (TCP/UDP), the IP address of the target system they reside on, and the port the application is using
• SASP allows target servers in a load-balancing group to use different ports (and even different protocols, TCP/UDP)
– Probably not applicable for real application workloads
• Supports IPv4 and IPv6, both for identifying members (target servers) and for the actual communications to the GWM (i.e., SASP connections)
Frequency of SASP communications
• SASP supports both a "push" and a "pull" model for updating the load balancer with workload recommendations
– The load balancer tells the GWM which model it wants to use
• "Pull" model
– The GWM "suggests" a polling interval to the load balancer
• The z/OS Load Balancing Advisor uses the configurable update_interval value for this purpose
– The load balancer has the option to ignore this value
– The load balancer requests updates each polling interval
• "Push" model
– The GWM sends updated information to the load balancer on an interval basis
• The z/OS Load Balancing Advisor uses the configurable update_interval value for this purpose
– The GWM may send data more frequently than the interval period
– The Active/Active Multi-site Workload Lifeline product requires "push" to be enabled
• The load balancer determines whether it wants information about all members it registered or only changed information about its registered members
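A brief sketch of the difference between the two models, reusing the illustrative session object from the previous example (the interval value and handler name are assumptions):

import time

UPDATE_INTERVAL = 60  # seconds; stands in for the Advisor's update_interval

def pull_loop(session):
    # Pull model: the LB asks the GWM for fresh weights every polling interval.
    while True:
        session.get_weights("Group1")
        time.sleep(UPDATE_INTERVAL)

def on_send_weight(group_name, weights):
    # Push model: the GWM sends SendWeight messages on its own schedule,
    # possibly more often than the interval when server state changes;
    # the LB simply installs whatever arrives.
    print(f"installing new weights for {group_name}: {weights}")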
Basic protocol components
Five basic components are used throughout the protocol:
• SASP Header: Version, Message Length, Message ID
• Member Data: Protocol, Port, IP Address, Label Length, Label
• Weight Entry: Member Data, Contact Flag, Quiesce Flag, Registration Flag, Weight
• Member State Instance: Member Data, Opaque State, Quiesce Flag
• Group Data: LB UID Length, LB UID, Group Name Length, Group Name

Group protocol components
Three group components are used throughout the protocol:
• Group of Member Data: Group Data, Member Data Count, Array of Member Data components
• Group of Weight Data: Group Data, Weight Entry Count, Array of Weight Entry components
• Group of Member State Data: Group Data, Resource State Instance Count, Array of Resource State Instances
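These components map naturally onto record types. A minimal sketch as Python dataclasses follows (field names follow the slide; the explicit length fields of the wire format are implicit in Python strings, and the comments are interpretive):

from dataclasses import dataclass

@dataclass
class SaspHeader:
    version: int
    message_length: int
    message_id: int

@dataclass
class MemberData:
    protocol: str       # "TCP" or "UDP"
    port: int
    ip_address: str     # IPv4 or IPv6
    label: str

@dataclass
class WeightEntry:
    member: MemberData
    contact_flag: bool        # whether the GWM can contact this member
    quiesce_flag: bool        # member is being drained; send no new work
    registration_flag: bool   # member was explicitly registered by the LB
    weight: int

@dataclass
class MemberStateInstance:
    member: MemberData
    opaque_state: bytes
    quiesce_flag: bool

@dataclass
class GroupData:
    lb_uid: str
    group_name: str

# Each group component pairs a GroupData with an array of matching entries,
# e.g. a Group of Weight Data is (GroupData, count, [WeightEntry, ...]).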
IBM solutions supporting SASP: z/OS Load Balancing Advisor
Diagram: an external load balancer receives work requests from request origins and distributes them across a z/OS sysplex. The z/OS Load Balancing Advisor acts as the GWM and talks SASP to the load balancer; a z/OS LB Agent on each system reports on its local resources to the Advisor over a private communication protocol.
• z/OS WLM workload balancing
• Support for clustered z/OS servers in a sysplex environment
• Based on sysplex-wide WLM policy
How does the z/OS LBA calculate weights?
• The weights are composed of several components:
– Available CPU capacity for each target system
– Displaceable capacity for each target application and system
• For systems with high CPU utilization, what is the relative importance of the workload being load balanced compared to other workloads on that system?
– If a lot of lower-importance work is running on a system, it can be displaced by this higher-priority workload
– Is the target application server meeting user-specified WLM (Workload Manager) goals for the workload?
• Health of the application: what if CPU capacity exists, but the application infrastructure is experiencing other constraints or abnormal conditions?
– When such conditions are detected, the weight for the "unhealthy" target application instance is reduced appropriately
– Health conditions monitored:
• Monitoring of the TCP backlog queue: is the application falling behind in accepting new connections?
• Application-reported health
– APIs are available on the platform that allow applications to report abnormal conditions that result in sub-optimal workload processing
– Memory constraints, thread constraints, resources unavailable
– The next-tier server or resource manager is not available (e.g., a database server)
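As a back-of-the-envelope illustration only (the actual WLM calculation is more sophisticated and not spelled out in the presentation; the formula and names below are assumptions), a weight could combine spare capacity, displaceable capacity, and a health factor like this:

def lba_style_weight(free_cpu_pct, displaceable_pct, healthy_fraction):
    """Illustrative combination of the inputs the slide lists.
    free_cpu_pct     -- unused CPU capacity on the target system (0-100)
    displaceable_pct -- CPU held by lower-importance work that WLM could
                        displace in favor of this workload (0-100)
    healthy_fraction -- 1.0 for a healthy server, reduced when the TCP
                        backlog grows or the application reports problems
    """
    capacity = min(100, free_cpu_pct + displaceable_pct)
    return round(capacity * healthy_fraction)

# A busy system whose CPU is mostly consumed by low-importance batch work
# can still receive a high weight, because that work is displaceable:
print(lba_style_weight(10, 70, 1.0))   # 80
# The same system with a constrained application is marked down:
print(lba_style_weight(10, 70, 0.25))  # 20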
z/OS LBA solution overview
Diagram: a SASP-capable load balancer (e.g., Cisco) sits in front of a z/OS sysplex. Work requests (TCP, UDP) arrive at the load balancer on an application-specific dynamic VIPA (DVIPAx) and are distributed to target applications at IPx, IPy, and IPz. SASP flows (TCP) run between the load balancer and the LB Advisor; LB Agent-to-LB Advisor flows (TCP) run inside the sysplex; each system runs TCP/IP, WLM, an LB Agent, and the target applications.
• IPx, IPy, IPz: target system IP addresses; static VIPAs are recommended to provide redundancy in case of network adapter failures
• DVIPAx: application-specific DVIPA associated with the LB Advisor; allows automatic movement in failure scenarios
Active/Active sites use case: Multi-site Workload Lifeline
Diagram: SASP-enabled first-tier load balancers (current vendors: Cisco, F5 Networks, Citrix) receive weights over SASP from the workload manager (Multi-site Workload Lifeline) running on the primary controller alongside the GDPS software; a secondary controller runs the same stack as backup. The first-tier LBs choose a site; second-tier LBs within each site (Site 1 / Sysplex 1 with sys_a and sys_b, Site 2 / Sysplex 2 with sys_c and sys_d) distribute work to the server applications in the application/database tier. Data replication runs between the two sysplexes.
IBM zEnterprise System overview
Diagram: a zEnterprise node combines a z196 (z/OS and z/VM with Linux on System z guests, running on PR/SM over the z CPUs, memory, and I/O, with a Support Element) and a zBX containing x86 blades (Linux on System x and Windows under an x hypervisor), POWER blades (AIX under pHypervisor), and optimizers such as DataPower, managed through the BladeCenter Advanced Management Module (AMM). The System z Hardware Management Console with Unified Resource Manager spans the whole node.
Connecting the pieces with zManager (a.k.a. Unified Resource Manager)!
Future
• Software-Defined Networking (SDN) for SASP
CONCLUSION AND QUESTIONS

Weitere ähnliche Inhalte

Was ist angesagt?

Memory Matters in 2011
Memory Matters in 2011Memory Matters in 2011
Memory Matters in 2011
Martin Packer
 
Relative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph TempleRelative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph Temple
Joao Galdino Mello de Souza
 
zEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware OverviewzEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware Overview
Felipe Lanzillotta
 

Was ist angesagt? (15)

Memory Matters in 2011
Memory Matters in 2011Memory Matters in 2011
Memory Matters in 2011
 
The IBM z13 - January 14, 2015 - IBM Latin America Hardware Announcement LG15...
The IBM z13 - January 14, 2015 - IBM Latin America Hardware Announcement LG15...The IBM z13 - January 14, 2015 - IBM Latin America Hardware Announcement LG15...
The IBM z13 - January 14, 2015 - IBM Latin America Hardware Announcement LG15...
 
Munich 2016 - Z011599 Martin Packer - More Fun With DDF
Munich 2016 - Z011599 Martin Packer - More Fun With DDFMunich 2016 - Z011599 Martin Packer - More Fun With DDF
Munich 2016 - Z011599 Martin Packer - More Fun With DDF
 
7 opportunities to reduce wlc costs - por Danilo
7 opportunities to reduce wlc costs - por Danilo7 opportunities to reduce wlc costs - por Danilo
7 opportunities to reduce wlc costs - por Danilo
 
Informix IWA data life cycle mgmt & Performance on Intel.
Informix IWA data life cycle mgmt & Performance on Intel.Informix IWA data life cycle mgmt & Performance on Intel.
Informix IWA data life cycle mgmt & Performance on Intel.
 
System z virtualization and security
System z  virtualization and securitySystem z  virtualization and security
System z virtualization and security
 
System Z Mainframe Security For An Enterprise
System Z Mainframe Security For An EnterpriseSystem Z Mainframe Security For An Enterprise
System Z Mainframe Security For An Enterprise
 
Flexible DevOps Deployment of Enterprise Test Environments in the Cloud
Flexible DevOps Deployment of Enterprise Test Environments in the CloudFlexible DevOps Deployment of Enterprise Test Environments in the Cloud
Flexible DevOps Deployment of Enterprise Test Environments in the Cloud
 
Relative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph TempleRelative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph Temple
 
zEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware OverviewzEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware Overview
 
Server pac 101
Server pac 101Server pac 101
Server pac 101
 
Pack Upgrade Mp2 to INFOR EAM Business Edition, Datastream Arabia INFOR EAM C...
Pack Upgrade Mp2 to INFOR EAM Business Edition, Datastream Arabia INFOR EAM C...Pack Upgrade Mp2 to INFOR EAM Business Edition, Datastream Arabia INFOR EAM C...
Pack Upgrade Mp2 to INFOR EAM Business Edition, Datastream Arabia INFOR EAM C...
 
Phone&Mobile Solutions Datastream Arabia Dubai, INFOR EAM Channel Partner
Phone&Mobile Solutions Datastream Arabia Dubai, INFOR EAM Channel PartnerPhone&Mobile Solutions Datastream Arabia Dubai, INFOR EAM Channel Partner
Phone&Mobile Solutions Datastream Arabia Dubai, INFOR EAM Channel Partner
 
z/OS small enhancements, episode 2018A
z/OS small enhancements, episode 2018Az/OS small enhancements, episode 2018A
z/OS small enhancements, episode 2018A
 
Pack Business Edition INFOR EAM, Datastream Arabia Channel Partner
Pack Business Edition INFOR EAM, Datastream Arabia Channel PartnerPack Business Edition INFOR EAM, Datastream Arabia Channel Partner
Pack Business Edition INFOR EAM, Datastream Arabia Channel Partner
 

Ähnlich wie 14 guendert pres

Ähnlich wie 14 guendert pres (20)

z/OS V2R2 Communications Server Overview
z/OS V2R2 Communications Server Overviewz/OS V2R2 Communications Server Overview
z/OS V2R2 Communications Server Overview
 
2016 02-16-announce-overview-zsp04505 usen
2016 02-16-announce-overview-zsp04505 usen2016 02-16-announce-overview-zsp04505 usen
2016 02-16-announce-overview-zsp04505 usen
 
Maximize o valor do z/OS
Maximize o valor do z/OSMaximize o valor do z/OS
Maximize o valor do z/OS
 
Academic Discussion Group Workshop 2018 November 10 st 2018 Nimbix CAPI SNAP...
Academic Discussion  Group Workshop 2018 November 10 st 2018 Nimbix CAPI SNAP...Academic Discussion  Group Workshop 2018 November 10 st 2018 Nimbix CAPI SNAP...
Academic Discussion Group Workshop 2018 November 10 st 2018 Nimbix CAPI SNAP...
 
Building Efficient Edge Nodes for Content Delivery Networks
Building Efficient Edge Nodes for Content Delivery NetworksBuilding Efficient Edge Nodes for Content Delivery Networks
Building Efficient Edge Nodes for Content Delivery Networks
 
z/OS V2R3 Communications Server Content Preview
z/OS V2R3 Communications Server Content Previewz/OS V2R3 Communications Server Content Preview
z/OS V2R3 Communications Server Content Preview
 
Benchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herdBenchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herd
 
IBM Wave for z/VM
IBM Wave for z/VMIBM Wave for z/VM
IBM Wave for z/VM
 
z/OS Small Enhancements - Episode 2015A
z/OS Small Enhancements - Episode 2015Az/OS Small Enhancements - Episode 2015A
z/OS Small Enhancements - Episode 2015A
 
Whyifor Was
Whyifor WasWhyifor Was
Whyifor Was
 
Controlling performance in the cloud: taking charge of your hosting environment
Controlling performance in the cloud: taking charge of your hosting environmentControlling performance in the cloud: taking charge of your hosting environment
Controlling performance in the cloud: taking charge of your hosting environment
 
Nrb Mainframe Day z Data and AI - Leif Pedersen
Nrb Mainframe Day z Data and AI - Leif PedersenNrb Mainframe Day z Data and AI - Leif Pedersen
Nrb Mainframe Day z Data and AI - Leif Pedersen
 
IBM Informix on cloud webcast August 2017
IBM Informix on cloud webcast August 2017IBM Informix on cloud webcast August 2017
IBM Informix on cloud webcast August 2017
 
Ims01 ims trends and directions - IMS UG May 2014 Sydney & Melbourne
Ims01   ims trends and directions - IMS UG May 2014 Sydney & MelbourneIms01   ims trends and directions - IMS UG May 2014 Sydney & Melbourne
Ims01 ims trends and directions - IMS UG May 2014 Sydney & Melbourne
 
z/OS Through V2R1Communications Server Performance Functions Update
z/OS Through V2R1Communications Server Performance Functions Updatez/OS Through V2R1Communications Server Performance Functions Update
z/OS Through V2R1Communications Server Performance Functions Update
 
z/OS Small Enhancements - Episode 2015B
z/OS Small Enhancements - Episode 2015Bz/OS Small Enhancements - Episode 2015B
z/OS Small Enhancements - Episode 2015B
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
Unisanta - Visão Geral de hardware Servidor IBM System z
Unisanta - Visão Geral de hardware Servidor IBM System zUnisanta - Visão Geral de hardware Servidor IBM System z
Unisanta - Visão Geral de hardware Servidor IBM System z
 
Linux on Z13 and Simulatenus Multithreading - Sebastien Llaurency
Linux on Z13 and Simulatenus Multithreading - Sebastien LlaurencyLinux on Z13 and Simulatenus Multithreading - Sebastien Llaurency
Linux on Z13 and Simulatenus Multithreading - Sebastien Llaurency
 
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
 

Mehr von Rodrigo Campos

Desempenho e Escalabilidade de Banco de Dados em ambiente x86
Desempenho e Escalabilidade de Banco de Dados em ambiente x86Desempenho e Escalabilidade de Banco de Dados em ambiente x86
Desempenho e Escalabilidade de Banco de Dados em ambiente x86
Rodrigo Campos
 
Cloud Computing Oportunidades e Desafios
Cloud Computing Oportunidades e DesafiosCloud Computing Oportunidades e Desafios
Cloud Computing Oportunidades e Desafios
Rodrigo Campos
 
CMG 2012 - Tuning where it matters - Gerry Tuddenham
CMG 2012 - Tuning where it matters - Gerry TuddenhamCMG 2012 - Tuning where it matters - Gerry Tuddenham
CMG 2012 - Tuning where it matters - Gerry Tuddenham
Rodrigo Campos
 
Racionalização e Otimização de Energia em Computação na Nuvem
Racionalização e Otimização de Energia em Computação na NuvemRacionalização e Otimização de Energia em Computação na Nuvem
Racionalização e Otimização de Energia em Computação na Nuvem
Rodrigo Campos
 

Mehr von Rodrigo Campos (20)

Velocity Conference NYC 2014 - Real World DevOps
Velocity Conference NYC 2014 - Real World DevOpsVelocity Conference NYC 2014 - Real World DevOps
Velocity Conference NYC 2014 - Real World DevOps
 
DevOps no mundo real - QCON 2014
DevOps no mundo real - QCON 2014DevOps no mundo real - QCON 2014
DevOps no mundo real - QCON 2014
 
7Masters Webops in the Cloud
7Masters Webops in the Cloud7Masters Webops in the Cloud
7Masters Webops in the Cloud
 
Large and Giant Pages
Large and Giant PagesLarge and Giant Pages
Large and Giant Pages
 
Otimização holistica de ambiente computacional
Otimização holistica de ambiente computacionalOtimização holistica de ambiente computacional
Otimização holistica de ambiente computacional
 
Desempenho e Escalabilidade de Banco de Dados em ambiente x86
Desempenho e Escalabilidade de Banco de Dados em ambiente x86Desempenho e Escalabilidade de Banco de Dados em ambiente x86
Desempenho e Escalabilidade de Banco de Dados em ambiente x86
 
13 coelho final-pres
13 coelho final-pres13 coelho final-pres
13 coelho final-pres
 
Mistério ou tecnologia? Paralelismo!
Mistério ou tecnologia? Paralelismo!Mistério ou tecnologia? Paralelismo!
Mistério ou tecnologia? Paralelismo!
 
z/VM Performance Analysis
z/VM Performance Analysisz/VM Performance Analysis
z/VM Performance Analysis
 
Sistemas de proteção de perímetro
Sistemas de proteção de perímetroSistemas de proteção de perímetro
Sistemas de proteção de perímetro
 
Devops at Walmart GeC Brazil
Devops at Walmart GeC BrazilDevops at Walmart GeC Brazil
Devops at Walmart GeC Brazil
 
Disk IO Benchmarking in shared multi-tenant environments
Disk IO Benchmarking in shared multi-tenant environmentsDisk IO Benchmarking in shared multi-tenant environments
Disk IO Benchmarking in shared multi-tenant environments
 
Cloud Computing Oportunidades e Desafios
Cloud Computing Oportunidades e DesafiosCloud Computing Oportunidades e Desafios
Cloud Computing Oportunidades e Desafios
 
The good, the bad and the big... data
The good, the bad and the big... dataThe good, the bad and the big... data
The good, the bad and the big... data
 
CMG 2012 - Tuning where it matters - Gerry Tuddenham
CMG 2012 - Tuning where it matters - Gerry TuddenhamCMG 2012 - Tuning where it matters - Gerry Tuddenham
CMG 2012 - Tuning where it matters - Gerry Tuddenham
 
A Consumerização da TI e o Efeito BYOT
A Consumerização da TI e o Efeito BYOTA Consumerização da TI e o Efeito BYOT
A Consumerização da TI e o Efeito BYOT
 
CMG Brasil 2012 - Uso de Lines nos z196
CMG Brasil 2012 - Uso de Lines nos z196CMG Brasil 2012 - Uso de Lines nos z196
CMG Brasil 2012 - Uso de Lines nos z196
 
Racionalização e Otimização de Energia em Computação na Nuvem
Racionalização e Otimização de Energia em Computação na NuvemRacionalização e Otimização de Energia em Computação na Nuvem
Racionalização e Otimização de Energia em Computação na Nuvem
 
SDN - Openflow + OpenVSwitch + Quantum
SDN - Openflow + OpenVSwitch + QuantumSDN - Openflow + OpenVSwitch + Quantum
SDN - Openflow + OpenVSwitch + Quantum
 
AWS RDS Benchmark - CMG Brasil 2012
AWS RDS Benchmark - CMG Brasil 2012AWS RDS Benchmark - CMG Brasil 2012
AWS RDS Benchmark - CMG Brasil 2012
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

14 guendert pres

  • 1. Proibida cópia ou divulgação sem permissão escrita do CMG Brasil. GDPS/Active-Active and Load Balancing via Server/Application State Protocol (SASP) Dr. Steve Guendert Brocade Communications sguender@brocade.com @BRCD_DrSteve
  • 2. Trademarks, notices, and disclaimers • Advanced Peer-to-Peer Networking® • AIX® • alphaWorks® • AnyNet® • AS/400® • BladeCenter® • Candle® • CICS® • DataPower® • DB2 Connect • DB2® • DRDA® • e-business on demand® • e-business (logo) • e business(logo)® • ESCON® • FICON® • GDDM® • GDPS® • Geographically Dispersed Parallel Sysplex • HiperSockets • HPR Channel Connectivity • HyperSwap • i5/OS (logo) • i5/OS® • IBM eServer • IBM (logo)® • IBM® • IBM zEnterprise™ System • IMS • InfiniBand ® • IP PrintWay • IPDS • iSeries • LANDP® • Language Environment® • MQSeries® • MVS • NetView® • OMEGAMON® • Open Power • OpenPower • Operating System/2® • Operating System/400® • OS/2® • OS/390® • OS/400® • Parallel Sysplex® • POWER® • POWER7® • PowerVM • PR/SM • pSeries® • RACF® • Rational Suite® • Rational® • Redbooks • Redbooks (logo) • Sysplex Timer® • System i5 • System p5 • System x® • System z® • System z9® • System z10 • Tivoli (logo)® • Tivoli® • VTAM® • WebSphere® • xSeries® • z9® • z10 BC • z10 EC • zEnterprise • zSeries® • z/Architecture • z/OS® • z/VM® • z/VSE The following terms are trademarks or registered trademarks of International Business Machines Corporation in the United States or other countries or both: • Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. • Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license there from. • Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. • Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. • InfiniBand is a trademark and service mark of the InfiniBand Trade Association. • Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. • UNIX is a registered trademark of The Open Group in the United States and other countries. • Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. • ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. • IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce. Notes: • Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. • IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. 
• All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. • This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. • All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. • Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. • Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography. Refer to www.ibm.com/legal/us for further legal information. The following terms are trademarks or registered trademarks of International Business Machines Corporation in the United States or other countries or both: * All other products may be trademarks or registered trademarks of their respective companies. Trademarks, notices, and disclaimers
  • 3. Abstract The GDPS/Active-Active sites concept is a fundamental paradigm shift in disaster recovery from a failover model to a continuous avail- ability model. GDPS/Active-Active consists of two sites, separated by virtually unlimited distances, running the same applications and having the same data to provide cross site workload balancing and continuous availability. One of the key components to a GDPS/Active-Active solution are external load balancing IP routers that balance workloads through the Server/Application State Protocol (SASP). This session will discuss the GDPS/Active-Active workload balancing function, with a focus on SASP, the router functionality, and how it functions in conjunction with the IBM Multi-site Workload Lifeline for z/OS. 3
  • 4. Agenda • Business Continuity vs. IT Resiliency • Introduction to GDPS Active-Active • Requirements-hardware and software • Server Application State Protocol (SASP) Overview – Motivation and high level overview of the protocol • Overview of IBM solutions based on SASP – z/OS Sysplex Clusters (z/OS Load Balancing Advisor) – Active/Active - Next generation of IBM’s disaster recovery technology (Multi-site Workload Lifeline product) • Conclusion and questions. 4
  • 6. Lightning Strikes Hurricanes / Cyclones Cut cables and powerData theft and security breaches Overloaded lines and infrastructure Earthquake Tornadoes and other storms Tsunami Terrorism Why Worry? 6
  • 7. IT resilience • The ability to rapidly adapt and respond to any internal or external disruption, demand, or threat and continue business operations without significant impact. – Continuous/near continuous application availability (CA) – Planned and unplanned outages • Broader in scope than disaster recovery (DR) – DR concentrates solely on recovering from unplanned events • **Bottom Line** – Business continuance is no longer simply IT DR 7
  • 8. RTO • Recovery Time Objective (RTO) – A metric for how long it takes to recover the application and resume operations after a planned or unplanned outage – How long your business can afford to wait for IT services to be resumed. – How much pain can you take? – Days, hours, or minutes? 8
  • 9. RPO • Recovery Point Objective (RPO) – A metric for how much data is lost – The actual recovery point to which all data is current and consistent. – How much data your company is willing to recreate following an outage. • What is the acceptable time difference between the data in your production system and the data at the recovery site? 9
  • 10. 10 IBM Confidential May 12th 2011 Tiers of Disaster Recovery MissionMission CriticalCritical Not Critical Somewhat Critical Time to Recover (hrs) 15 Min. 1-4 4 -6 8-12 12-16 24 Value Tiers based on Share Group 1992 *PTAM = Pickup Truck Access Method 6-8 72 Tier 1 - PTAM* Tier 2 - PTAM, Hot Site Point-in-Time Backup Active Secondary Site Tier 3 - Electronic Vaulting Tier 4 - Batch/Online database shadowing & journaling, repetitive PiT copies, fuzzy copy disk mirroring Tier 5 - software two site, two phase commit (transaction integrity); or repetitive PiT copies w/ small data loss Tier 6 - Near zero or zero Data Loss remote disk mirroring helping with data integrity and data consistency Tier 7 - Near zero or zero Data Loss: Highly automated takeover on a complex-wide or business-wide basis, using remote disk mirroring Dedicated Remote Hot Site GDPS/PPRC HyperSwap Manager RTO depends on customer automation; RPO 0 GDPS/XRC GDPS/Global Mirror RTO < 2 hr; RPO < 1min GDPS/PPRC RTO < 1 hr; RPO 0Tier 8 – Active/Active Sites Application/workload level HA. Automatic monitoring. Automatic workload routing/recovery. Uses async replication between sites. RTO < 1 min; RPO < 3 sec 1 Min. Failover models can only achieve so much in improving RTO
  • 11. RTO and RPO • Cost tradeoffs- balancing need vs. afford 11 Cost of business continuity solution versus cost of outage Guendert: Revisiting Business Continuity and Disaster Recovery Planning and Performance For 21st Century Regional Disasters: The case for GDPS. Journal of Computer Resource Management. Summer 2007
  • 13. 13 IBM Confidential May 12th 2011 Availability and the IBM Mainframe 1964 1982-1990 1995 Single system Mutliple partitions/images Parallel Sysplex
  • 14. 14 IBM Confidential May 12th 2011 Availability and the IBM Mainframe 1998 GDPS/PPRC Parallel Sysplex Synchronous Mirroring HyperSwap for primary disk failures Near-continuous Availability Limitation: Performance at distance
  • 15. 15 IBM Confidential May 12th 2011 What are GDPS/PPRC customers doing today? • GDPS/PPRC, based upon a multi-site Parallel Sysplex and synchronous disk replication, is a metro area Continuous Availability (CA) and Disaster Recovery solution (DR) • GDPS/PPRC supports two configurations: – Active/standby or single site workload – Active/active or multi-site workload • Some customers have deployed GDPS/PPRC active/active configurations – All critical data must be PPRCed and HyperSwap enabled – All critical CF structures must be duplexed – Applications must be parallel sysplex enabled – Signal latency will impact OLTP thru-put and batch duration resulting in the sites being separated by no more than a couple tens of KM (fiber) • Issue: the GDPS/PPRC active/active configuration does not provide enough site separation for some enterprises
  • 16. 16 IBM Confidential May 12th 2011 What are GDPS/XRC & GDPS/GM customers doing today? • GDPS/XRC and GDPS/GM, based upon asynchronous disk replication, are unlimited distance DR solutions • The current GDPS async replication products require the failed site’s workload to be restarted in the recovery site and this typically will take 30-60 min – Power fail consistency – Transaction consistency • There are no identified extensions to the existing GDPS async replication products that will allow the RTO to be substantially reduced. • Issue: GDPS/XRC and GDPS/GM will not achieve an RTO of seconds being requested by some enterprises
  • 17. 17 IBM Confidential May 12th 2011 What are GDPS customers doing today ? Two Data Centers Rapid Systems Disaster Recovery with “seconds” of Data Loss Disaster recovery for out of region interruptions Multi-site workloads can withstand site and/or storage failures Two Data Centers Systems remain active Continuous Availability / Disaster Recovery within a Metropolitan Region GDPS/PPRC RPO=0 & RTO<1 hr Continuous Availability Regionally and Disaster Recovery Extended Distance Continuous Availability of Data within a Data Center Continuous access to data in the event of a storage subsystem outage Single Data Center Applications remain active GDPS/HyperSwap Mgr RPO=0 & RTO=0 Disaster Recovery at Extended Distance GDPS/GM & GDPS/XRC RPO secs & RTO <1 hr Three Data Centers High availability for site disasters Disaster recovery for regional disasters GDPS/MGM & GDPS/MzGM A B C
  • 18. 18 IBM Confidential May 12th 2011 Drivers for improvements in HA and DR • Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System [Docket No. R-1128] (April 7, 2003) – Focus on mission critical workloads, their recovery and resumption of normal processing • Cost of an outage – Financial – Reputation • Global Business Model – 24x7 processing – Planned outage avoidance Cost of Downtime by Industry Industry Sector Loss per Hour Financial $8,213,470 Telecommunications $4,611,604 Information Technology $3,316,058 Insurance $2,582,382 Pharmaceuticals $2,058,710 Energy $1,468,798 Transportation $1,463,128 Banking $1,145,129 Chemicals $1,071,404 Consumer Products $989,795 Source: Robert Frances Group 2006, “Picking up the value of PKI: Leveraging z/OS for Improving Manageability, Reliability, and Total Cost of Ownership of PKI and Digital Certificates.”
  • 19. 19 IBM Confidential May 12th 2011 Customer requirements for HA/DR/BC in 2013 • Shift focus from a failover model to a nearly-continuous availability model (RTO near zero) • Access data from any site (unlimited distance between sites) • No application changes • Multi-sysplex, multi-platform solution – “Recover my business rather than my platform technology” • Ensure successful recovery via automated processes (similar to GDPS technology today) – Can be handled by less-skilled operators • Provide workload distribution between sites (route around failed sites, dynamically select sites based on ability of site to handle additional workload) • Provide application-level granularity – Some workloads may require immediate access from every site; other workloads may only need to update other sites every 24 hours (less critical data) – Current solutions employ an all-or-nothing approach (complete disk mirroring, requiring extra network capacity)
  • 20. IBM GDPS active/active • Long-distance disaster recovery with only seconds of impact • Continuous availability • Fundamental paradigm shift from a failover model to a near-continuous availability model • Allows for unlimited-distance replication with only seconds of user impact if there is a site disaster • Uses software-based replication techniques for copying the data between sites • Provides control over which workloads are being protected • GDPS automation provides an end-to-end automated solution – Helps manage the availability of the workload – Coordination point/controller for activities, including being a focal point for operating and monitoring the solution and readiness for recovery 20
  • 21. 21 IBM Confidential May 12th 2011 Active/Active concepts New York Zurich Replication Data at geographically dispersed sites are kept in sync via replication Workloads are managed by a client and routed to one of many replicas, depending upon workload weight and latency constraints … extends workload balancing to SYSPLEXs across multiple sites! Workload Distributor Load Balancing with SASP (z/OS Comm Server) Transactions Two or more sites, separated by unlimited distances, running the same applications & having the same data to provide: – Cross-site Workload Balancing – Continuous Availability – Disaster Recovery
  • 22. 22 IBM Confidential May 12th 2011 Active/Active concepts New York Zurich Replication Tivoli Enterprise Portal Monitoring spans the sites and now becomes an essential element of the solution for site health checks, performance tuning, etc. Workload Distributor Load Balancing with SASP (z/OS Comm Server) Transactions Two or more sites, separated by unlimited distances, running the same applications & having the same data to provide: – Cross-site Workload Balancing – Continuous Availability – Disaster Recovery
  • 23. GDPS ACTIVE-ACTIVE REQUIREMENTS Hardware and software supporting SASP 23
  • 24. 24 IBM Confidential May 12th 2011 Conceptual view of GDPS Active-Active – Transactions arrive at a workload distribution point and are routed to the active sysplex, where the active production workload runs; the standby production workload is kept current via S/W replication. The Controllers exchange control information with the systems and the workload distributor.
  • 25. 25 IBM Confidential May 12th 2011 GDPS Active-Active high-level architecture – GDPS/Active-Active sits above NetView, SA z/OS, Lifeline, workload monitoring, and DB2 and IMS replication (transported over MQ and TCP/IP respectively), all running on z/OS on System z hardware.
  • 26. 26 IBM Confidential May 12th 2011 GDPS/Active-Active Software Components • Integration of a number of software products – z/OS 1.11 or higher – IBM Multi-site Workload Lifeline v1.1 – IBM Tivoli NetView for z/OS v6.1 – IBM Tivoli Monitoring v6.2.2 FP3 – IBM InfoSphere Replication Server for z/OS v10.1 – IBM InfoSphere IMS Replication for z/OS v10.1 – System Automation for z/OS v3.3 – GDPS/Active-Active v1.1 – Optionally the OMEGAMON suite of monitoring tools to provide additional insight
  • 27. Replication • IBM InfoSphere Replication Server for z/OS v10.1 – Runs on production images where required to capture (active) and apply (standby) data updates for DB2 data. Relies on MQ as the data transport mechanism (QREP). • IBM InfoSphere IMS Replication for z/OS v10.1 – Runs on production images where required to capture (active) and apply (standby) data updates for IMS data. Relies on TCP/IP as the data transport mechanism. • System Automation for z/OS v3.3 – Runs on all images. Provides a number of critical functions: • Remote communications capability to enable GDPS to manage sysplexes from outside the sysplex • System Automation infrastructure for workload and server management • Note: this is software-based replication, not hardware-based mirroring 27
  • 28. 28 IBM Confidential May 12th 2011 GDPS/Active-Active hardware components • Two production sysplex environments (also referred to as sites) in different locations – One active, one standby – for each defined workload – Software-based replication between the two sysplexes/sites • IMS and DB2 data is supported • Two controller systems – Primary/backup – Typically one in each of the production locations, but there is no requirement that they are co-located in this way • Workload balancing/routing switches – Must be Server/Application State Protocol (SASP) compliant • RFC4678 describes SASP
  • 29. SASP Basics of how it works and how workloads get balanced 29
  • 30. Agenda • Server Application State Protocol (SASP) Overview – Motivation and high level overview of the protocol • Overview of IBM solutions based on SASP – z/OS Sysplex Clusters (z/OS Load Balancing Advisor) – Active/Active - Next generation of IBM’s disaster recovery technology (Multi-site Workload Lifeline product)
  • 31. The ability to distribute work across equal servers... factoring in the availability of server resources, factoring in the business importance of the work, estimating the likelihood of meeting objectives, avoiding over-utilized servers where possible, factoring in down-stream server dependencies. Traffic weight recommendations to load balancers
  • 32. Server/Application State Protocol (SASP) Objectives • Provide a mechanism for workload managers to give distribution recommendations to load balancers. • Must be lightweight – minimal: – implementation complexity – processing overhead – additional user configuration • Must be extensible • SASP will not handle the transport or actual distribution of work, only give recommendations • Open protocol documented in RFC4678: – http://www.faqs.org/rfcs/rfc4678.html
  • 33. SASP - High Level Architecture – A Load Balancer distributes requests from the request origins across the members of a cluster, each member having its own individual workload manager. • Load Balancer uses SASP to register members that it is interested in load balancing • Group Workload Manager (GWM) provides a single point of contact for the entire cluster
  • 34. Overview of SASP protocol flows – One-time actions (performed once, or when the LB configuration changes): 1. LB opens a TCP connection to the GWM (TCP Connect(DVIPAx, port 3860)) 2. SetLBStateRequest/Response (LB UUID, WeightServerConnector) 3. WeightRegisterRequest/Response (Group1: prot,port,IP@1,IP@2,..; Group2: prot,port,IP@1,IP@2,..) – Actions performed on a periodic interval basis, or when the state of systems and/or applications changes: 4. GetWeightRequest/Response (Group1, Group2, ..) 5. SendWeight (Group1 weights, Group2 weights) – Termination: 6. WeightDeRegisterRequest/Response (Group1, Group2, ...) 7. Close TCP connection
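To make the flow above concrete, here is a minimal, illustrative Python sketch of the load balancer's side of the exchange. It is not a wire-accurate RFC4678 implementation: real SASP messages are binary structures, so the encode() helper, the dictionary message shapes, and run_lb() are hypothetical placeholders; only the port number (3860) and the sequence of steps come from the slide.

```python
# Hypothetical sketch of the SASP flow from the load balancer's side.
# Real RFC4678 messages are binary; the encoding here is a placeholder
# so that the sequence of steps 1-7 above stays visible.
import socket

SASP_PORT = 3860  # well-known SASP port shown in the flow above

def encode(message: dict) -> bytes:
    """Placeholder for the real binary SASP encoding (hypothetical)."""
    return (repr(message) + "\n").encode()

def run_lb(gwm_host: str) -> None:
    # Step 1: one-time TCP connect to the GWM
    with socket.create_connection((gwm_host, SASP_PORT)) as conn:
        # Step 2: identify this LB and state how weights should flow
        conn.sendall(encode({"msg": "SetLBStateRequest",
                             "lb_uuid": "LB-0001",
                             "push": True}))
        # Step 3: register the groups to be load balanced
        conn.sendall(encode({"msg": "WeightRegisterRequest",
                             "groups": {"Group1": ("TCP", 8000,
                                                   ["10.1.1.1", "10.1.1.2"])}}))
        # Steps 4-5: in push mode the GWM sends SendWeight messages on an
        # interval; in pull mode we would send GetWeightRequest instead.
        for _ in range(10):                 # bounded loop for the sketch
            data = conn.recv(4096)          # SendWeight with fresh weights
            if not data:
                break
            print("weights update:", data)  # a real LB updates its tables
        # Steps 6-7: deregister, then close the connection
        conn.sendall(encode({"msg": "WeightDeRegisterRequest",
                             "groups": ["Group1"]}))
```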
  • 35. Registering interest in target applications using SASP • Load Balancer registers Groups of clustered servers it is interested in load balancing – Each group designates an application cluster to be load balanced • Each group consists of a list of members (i.e. target servers) – System-level cluster: A list of target Systems identified by IP address (in lieu of individual application servers) • Recommendations returned in this scenario are also at a System-level • No specific target application information returned in this case – Application-level cluster: A list of applications comprising the “load balancing” group • Allows specific recommendations to be provided for each target socket application – vs providing the same recommendation for all application instances running on the same host • Identified by protocol (TCP/UDP), IP address of the target system they reside on, and the port the application is using. • SASP allows for target servers in a load balancing group to use different ports (and even different protocols TCP/UDP) – Probably not applicable for real application workloads – Supports IPv4 and IPv6 • Both for identifying members (target servers) and for the actual communications to the GWM (i.e. SASP connections)
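As a rough illustration of the two registration styles just described, the sketch below contrasts a system-level group with an application-level group. The Member type and its field names are mine, not RFC4678's; this is purely to show what identifies a member in each case.

```python
# Hypothetical structures contrasting system-level and application-level
# registration; field names are illustrative, not RFC4678 field names.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Member:
    ip: str                         # IPv4 or IPv6 address of the target
    protocol: Optional[str] = None  # "TCP"/"UDP" (application-level only)
    port: Optional[int] = None      # target port (application-level only)

# System-level cluster: members are whole systems, so the GWM returns one
# recommendation per system rather than per application instance.
system_group = [Member("10.1.1.1"), Member("10.1.1.2")]

# Application-level cluster: members are individual socket applications,
# so each instance gets its own recommendation. SASP even permits mixed
# ports (and protocols) within a group, though that is rarely useful.
app_group = [Member("10.1.1.1", "TCP", 8000),
             Member("10.1.1.2", "TCP", 8001)]
```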
  • 36. Frequency of SASP communications • SASP supports both a "push" and a "pull" model for updating the load balancer with workload recommendations – Load balancer tells GWM which model it wants to use • "Pull" model – GWM "suggests" a polling interval to the load balancer • z/OS Load Balancing Advisor uses the configurable update_interval value for this purpose – Load balancer has the option to ignore this value • Load balancer requests updates each polling interval • "Push" model – GWM sends updated information to the load balancer on an interval basis • z/OS Load Balancing Advisor uses the configurable update_interval value for this purpose – GWM may send data more frequently than the interval period – Active/Active Multi-site Workload Lifeline product requires "push" to be enabled • Load balancer determines whether it wants information about all members it registered or only changed information about its registered members
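The "pull" side of this can be sketched as below. The helper names and JSON stand-in encoding are hypothetical; the interval parameter corresponds to the GWM-suggested polling interval, which for the z/OS Load Balancing Advisor comes from the configurable update_interval value and which the load balancer is free to ignore.

```python
# Hypothetical sketch of the "pull" model: the LB polls the GWM each
# interval. In the "push" model the equivalent loop lives on the GWM
# side, which sends SendWeight messages at (or faster than) the interval.
import json
import socket
import time

def send(conn: socket.socket, message: dict) -> None:
    conn.sendall(json.dumps(message).encode())  # stand-in for SASP encoding

def pull_loop(conn: socket.socket, groups: list,
              interval: float = 60.0, changed_only: bool = True) -> None:
    while True:
        # The LB chooses whether it wants all members or only changed ones
        send(conn, {"msg": "GetWeightRequest",
                    "groups": groups,
                    "changed_only": changed_only})
        reply = conn.recv(4096)   # GetWeightResponse with current weights
        print("weights:", reply)  # a real LB updates its routing tables
        time.sleep(interval)      # GWM-suggested interval; LB may ignore it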
  • 37. Basic protocol components – 5 basic components are used throughout the protocol: • SASP Header: Version, Message Length, Message ID • Member Data: Protocol, Port, IP Address, Label Length, Label • Weight Entry: Member Data, Contact Flag, Quiesce Flag, Registration Flag, Weight • Member State Instance: Member Data, Opaque State, Quiesce Flag • Group Data: LB UID Length, LB UID, Group Name Length, Group Name
  • 38. Group protocol components – 3 group components are used throughout the protocol: • Group of Member Data: Group Data, Member Data Count, Array of Member Data Components • Group of Weight Data: Group Data, Weight Entry Count, Array of Weight Entry Components • Group of Member State Data: Group Data, Resource State Instance Count, Array of Resource State Instances
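Purely to show how these components nest, here they are rendered as Python dataclasses. The field names follow slides 37 and 38; the length and count fields that appear on the wire are implicit in this representation, and the real RFC4678 encoding is binary rather than objects like these.

```python
# The basic and group components from slides 37-38 as nested dataclasses.
# Wire-level lengths and counts are implicit in this representation.
from dataclasses import dataclass
from typing import List

@dataclass
class SASPHeader:
    version: int
    message_length: int
    message_id: int

@dataclass
class MemberData:
    protocol: str        # TCP or UDP
    port: int
    ip_address: str
    label: str           # label length is implicit

@dataclass
class WeightEntry:
    member: MemberData
    contact_flag: bool
    quiesce_flag: bool
    registration_flag: bool
    weight: int

@dataclass
class MemberStateInstance:
    member: MemberData
    opaque_state: bytes
    quiesce_flag: bool

@dataclass
class GroupData:
    lb_uid: str          # LB UID length is implicit
    group_name: str      # group name length is implicit

@dataclass
class GroupOfWeightData:               # one of the three group components;
    group: GroupData                   # Group of Member Data and Group of
    weight_entries: List[WeightEntry]  # Member State Data nest the same way
                                       # around their own arrays
```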
  • 40. z/OS Load Balancing Advisor – The z/OS Load Balancing Advisor acts as the GWM for a z/OS sysplex; a z/OS LB Agent on each target system reports on its resources to the Advisor over a private communication protocol, while the load balancer, which distributes the work requests from the request origins, talks SASP to the Advisor. • z/OS WLM workload balancing • Support for clustered z/OS servers in a sysplex environment • Based on sysplex-wide WLM policy
  • 41. How does the z/OS LBA calculate weights? • The weights are composed of several components: – Available CPU capacity for each target system – Displaceable capacity for each target application and system • For systems with high CPU utilization, what is the relative importance of the workload being load balanced compared to other workloads on that system? – If a lot of lower-importance work is running on that system, it can be displaced by this higher-priority workload – Is the target application server meeting user-specified WLM (Workload Manager) goals for the workload? • Health of the application – What if CPU capacity exists but the application infrastructure is experiencing other constraints and abnormal conditions? – When these conditions are detected, the weights for the "unhealthy" target application instance are reduced appropriately – Health conditions monitored: • Monitoring of the TCP backlog queue: is the application falling behind in accepting new connections? • Application-reported health – APIs are available on the platform that allow applications to report any abnormal conditions that result in suboptimal workload processing – memory constraints, thread constraints, resources unavailable – The next-tier server or resource manager is not available (e.g. database server)
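A very rough sketch of how such a weight might be composed is shown below. The actual calculation is internal to WLM and the z/OS LBA and is not published at this level of detail; the inputs, scaling factors, and function name here are illustrative assumptions only.

```python
# Illustrative (not the actual z/OS LBA) weight calculation combining the
# capacity and health factors described above.
def compute_weight(available_cpu: float,     # 0.0-1.0 spare CPU capacity
                   displaceable_cpu: float,  # 0.0-1.0 lower-importance work
                   meeting_wlm_goals: bool,  # WLM goals met for workload?
                   backlog_healthy: bool,    # TCP backlog being drained?
                   reported_health: float    # 0.0-1.0 via health APIs
                   ) -> int:
    # Capacity: spare CPU plus work this higher-priority workload could displace
    capacity = min(1.0, available_cpu + displaceable_cpu)
    weight = capacity * 64.0       # scale to an integer weight range
    if not meeting_wlm_goals:
        weight *= 0.5              # missing goals reduces the recommendation
    if not backlog_healthy:
        weight *= 0.25             # "unhealthy" instances are reduced,
    weight *= reported_health      # not removed outright
    return max(1, int(weight))     # floor of 1 lets the member recover

# Example: a healthy, lightly loaded target
# compute_weight(0.4, 0.3, True, True, 1.0) -> 44
```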
  • 42. z/OS LBA Solution Overview – Each system in the z/OS sysplex runs TCP/IP, the target applications, an LB Agent, and WLM; one system also hosts the LB Advisor. The load balancer (e.g. Cisco) sends SASP flows (TCP) to the Advisor at DVIPAx and distributes load-balanced traffic (TCP/UDP) to the targets at IPx, IPy, IPz; the LB Agents communicate with the LB Advisor over TCP. – IPx, IPy, IPz: target system IP addresses • Static VIPAs recommended • Provide redundancy in case of network adapter failures – DVIPAx: application-specific DVIPA (associated with LB Advisor) • Allows for automatic movement in failure scenarios
  • 43. Active/Active Sites use case – Multi-Site Workload Lifeline – Site 1 hosts Sysplex 1 (sys_a, sys_b) and Site 2 hosts Sysplex 2 (sys_c, sys_d), each an application/database tier running the server applications, with data replication between the sites. A 1st-tier LB routes work to the active site, and 2nd-tier LBs distribute it across the server applications within each sysplex. A primary controller and a secondary controller each run the GDPS software and the workload manager (Multi-site Workload Lifeline), driving the load balancers via SASP. – SASP-enabled load balancers: current vendors include Cisco, F5 Networks, and Citrix
  • 44. IBM zEnterprise System Overview – A zEnterprise node combines a z196 (z/OS images and z/VM with Linux on System z guests under PR/SM, with the Z CPU, memory and I/O and the Support Element) with a zBX housing Power blades (AIX under the pHypervisor), x86 blades (Linux on System x and Windows under the xHypervisor), and optimizers such as DataPower, managed through the Blade Center Advanced Management Module (AMM). Connecting the pieces with zManager (aka Unified Resource Manager) via the System z Hardware Management Console!
  • 45. Future • Software Defined Networking (SDN) for SASP

Editor's notes

  1. Business continuity is no longer simply IT disaster recovery. Business continuity has evolved into a management process that relies on each component in the business chain to sustain operation at all times. Effective business continuity depends on the ability to accomplish five things. First, the risk of business interruption must be reduced. Second, when an interruption does occur, a business must be able to stay in business. Third, businesses that want to stay in business must be able to respond to customers. Fourth, as described earlier, businesses need to maintain the confidence of the public. Finally, businesses must comply with requirements such as audits, insurance, health/safety, and regulatory/legislative requirements. In some nations, government legislation and regulations lay down very specific rules for how organizations must handle their business processes and data. Some examples are the Basel II rules for the European banking sector and the United States’ Sarbanes-Oxley Act. These both stipulate that banks must have a resilient back-office infrastructure by this year (2007). Another example is the Health Insurance Portability and Accountability Act (HIPAA) in the United States. This legislation determines how the U.S. health care industry must account for and handle patient-related data. This ever-increasing need for “365x24x7xforever” availability really means that many businesses are now looking for a greater level of availability covering a wider range of events and scenarios beyond the ability to recover from a disaster. This broader requirement is called IT resilience. As stated earlier, IBM has developed a definition for IT resilience: “the ability to rapidly adapt and respond to any internal or external opportunity, threat, disruption, or demand and continue business operations without significant impact.”
  2. Two familiar terms need to be at the forefront of the discussion: RTO and RPO. Recovery Time Objective (RTO). RTO traditionally refers to the question “How long can you afford to be without your systems?” In other words, how long can your business afford to wait for IT services to be resumed following a disaster? How much time is available to recover the applications and have all critical operations up and running again?
  3. Recovery Point Objective (RPO). RPO is how much data your company is willing to have to recreate following a disaster. How much data can be lost? What is the acceptable time difference between the data in your production system and the data at the recovery site, i.e., what is the actual point-in-time recovery point at which all data is current? If you have an RPO of less than 24 hours, expect to have to do some form of onsite real-time mirroring. If your DR plan is dependent upon daily full-volume dumps, you probably have an RPO of 24 hours or more. Some other related terms that have evolved in the past 6 years include the degraded operations objective (DOO), which answers the question “what will be the impact on operations with fewer data centers?” Network Recovery Objective (NRO) refers to how long it takes to switch over the network. Recovery distance objective (RDO) refers to how far away the copies of data need to be located. The remainder of this paper will focus on RTO and RPO.
  4. Rather than talking about HA, let's talk a little bit about disaster recovery, because the two should not necessarily be mixed together. Back in the 1964 timeframe (I decided not to do a timeline here because this chart really depicts most of the continuum) we had the physical trucked access method: in other words, we wrote data to tapes and sent them offsite if we were really clever, and then we either recalled them from those sites to our facilities or we used a DR service such as that from IBM in order to restore our service. Then over a period of time things improved, and we have now reached the point where we have capabilities such as GDPS/XRC and GDPS Global Mirror, where we extend DR to effectively unlimited distance and still achieve a very aggressive recovery point objective and a reasonably aggressive recovery time objective.
  5. There are several factors involved in determining your RTO and RPO requirements. Organizations need to consider the cost of some data loss while still maintaining cross-subsystem/cross-volume data consistency. Maintaining data consistency enables the ability to perform a database restart, which typically has a duration of seconds to minutes. This cost needs to be weighed against the cost of no data loss, which will either a) impact production on all operational errors in addition to disaster recovery failure situations, or b) yield a database recovery disaster (typically hours to days in duration) as cross-subsystem/cross-volume data consistency is not maintained during the failing period. The real solution that will be chosen is based on a particular cost curve slope: if I spend a little more, how much faster is disaster recovery? If I spend a little less, how much slower is disaster recovery? In other words, the cost of your business continuity solution is realized by balancing the equation of how quickly you need to recover your organization’s data versus how much it will cost the company in terms of lost revenue due to being unable to continue business operations. The shorter the time period decided on to recover the data to continue business operations, the higher the costs. It should be obvious that the longer a company is down and unable to process transactions, the more expensive the outage is going to be for the company, and if the outage is long enough, survival of the company is doubtful. Figure 2 below takes us back to some basic economics cost curves. Much like deciding on the price of widgets, and the optimal quantity to produce, the optimal solution is the intersection point of the 2 cost curves.
  6. IBM announced and started delivering the mainframe back in 1964. IBM started off with a single system image, and driven by business requirements developed a very reliable, robust operating system and hardware configuration which was pretty much bulletproof. Over a period of time IBM further enhanced that by introducing multiple partitions, and then in the 1990 timeframe came along with LPAR technology, which provides virtual rather than physical partitions, to enable sharing the hardware more effectively. That continued into 1995 with the introduction of what we know today as Parallel Sysplex. The key thing here is that Parallel Sysplex is a cornerstone of the technology for delivering HA for applications on the mainframe. We should not overlook that, and certainly what we're talking about today doesn't replace any of the techniques that are available for providing HA on a local basis within a Parallel Sysplex.
  7. In the 1998 timeframe IBM announced GDPS/PPRC. GDPS/PPRC is still with us today, but the functionality available in GDPS/PPRC today is somewhat different to that which was first available in 1998. IBM now has a full suite of capabilities within the mainframe environment, at the top of which are GDPS/PPRC HyperSwap-enabled continuous or near-continuous availability configurations. There is one limitation, which we'll talk about a little bit: at longer distances, workloads are typically unable to sustain the synchronous replication and/or coupling facility duplexing that is required in order to achieve these continuous or near-continuous configurations.
  8. GDPS/PPRC, based upon a multi-site Parallel Sysplex and synchronous disk replication, is a metro-area Continuous Availability (CA) and Disaster Recovery (DR) solution. GDPS/PPRC supports two configurations: active/standby or single-site workload, and active/active or multi-site workload. Some customers have deployed GDPS/PPRC active/active configurations: all critical data must be PPRCed and HyperSwap enabled, all critical CF structures must be duplexed, applications must be Parallel Sysplex enabled, and signal latency will impact OLTP throughput and batch duration, resulting in the sites being separated by no more than a couple of tens of km (fiber). Issue: the GDPS/PPRC active/active configuration does not provide enough site separation for some enterprises. Here's a brief synopsis of what customers are doing today. There are two configurations for GDPS/PPRC, and this is where some of the confusion starts to come in. We have an active/standby configuration, or single-site workload; this is not to be confused with the active/standby configuration for GDPS active/active, which we will go on to talk about. And we also have what's called the active/active configuration for GDPS/PPRC, which is also known as a multi-site workload. This is where we have a cross-site Parallel Sysplex with data sharing between the two sites. Some clients have indeed implemented these, but the site separation that can be achieved using GDPS/PPRC technology is not enough for some clients, who are demanding ever greater capabilities in their HA and DR environments.
  9. GDPS/XRC and GDPS/GM, based upon asynchronous disk replication, are unlimited-distance DR solutions. The current GDPS async replication products require the failed site's workload to be restarted in the recovery site, and this typically will take 30-60 min (power-fail consistency, transaction consistency). There are no identified extensions to the existing GDPS async replication products that will allow the RTO to be substantially reduced. Issue: GDPS/XRC and GDPS/GM will not achieve the RTO of seconds being requested by some enterprises. GDPS/XRC and GDPS/GM, based upon asynchronous disk mirroring, provide failover and recovery capability in a remote site, typically in the order of 30-60 minutes recovery time, with a recovery point of a few seconds to a few minutes depending on bandwidth and a number of other considerations. However, this 30-60 minutes is not considered good enough by some enterprises, and they're looking at a recovery time objective measured in a small number of seconds in order to achieve what they want for their business.
  10. Continuous Availability of Data within a Data Center: Single data center, applications remain active, continuous access to data in the event of a storage subsystem outage. GDPS/HyperSwap Mgr, RPO=0 & RTO=0. Continuous Availability / Disaster Recovery within a Metropolitan Region: Two data centers, systems remain active, multi-site workloads can withstand site and/or storage failures. GDPS/PPRC, RPO=0 & RTO<1 hr. Disaster Recovery at Extended Distance: Two data centers, rapid systems disaster recovery with “seconds” of data loss, disaster recovery for out-of-region interruptions. GDPS/GM & GDPS/XRC, RPO secs & RTO <1 hr. Continuous Availability Regionally and Disaster Recovery Extended Distance: Three data centers, high availability for site disasters, disaster recovery for regional disasters. GDPS/MGM & GDPS/MzGM
  11. Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System [Docket No. R-1128] (April 7, 2003): focus on mission-critical workloads, their recovery and resumption of normal processing. Cost of an outage: financial, reputation. Global business model: 24x7 processing, planned outage avoidance. This slide gives a summary of some of the drivers for HA and DR within our clients. The much-discussed Interagency Paper on Sound Practices is aimed primarily at the financial sector, but it is not only the financial sector that is affected; health care and other critical businesses are also having to look at how they can provide much higher availability and better recoverability of their environments. Clearly cost does enter into it, and it is not only the financial cost but also the reputational cost: their clients are always concerned about what happens when they have an outage. Also, the pressure of global business models is ever increasing, and therefore the opportunity to take planned outages is dramatically reducing around the globe; anything that can help to continue the business while taking planned outages in part of the environment is highly sought after.
  12. Want to shift focus from a failover model to a nearly-continuous availability model (RTO near zero). Access data from any site (unlimited distance between sites). No application changes. Multi-sysplex, multi-platform solution: “Recover my business rather than my platform technology.” Ensure successful recovery via automated processes (similar to GDPS technology today); can be handled by less-skilled operators. Provide workload distribution between sites (route around failed sites, dynamically select sites based on ability of site to handle additional workload). Provide application-level granularity: some workloads may require immediate access from every site, other workloads may only need to update other sites every 24 hours (less critical data); current solutions employ an all-or-nothing approach (complete disk mirroring, requiring extra network capacity). A number of customer requirements have been brought forward, and there has been a customer council working on this Active-Active sites concept; these requirements have been validated through that design council. There is a desire to move from a failover model to a near-continuous availability model where the recovery time is dramatically reduced from what is achievable today. In addition, not only do clients demand a shorter recovery time objective, they are also looking at unlimited distance between sites; when the Interagency Paper first came out after 9/11 it contained some fairly unachievable statements, and some of those requirements are now being translated into technology or solutions that address them. Most of the rest you can read for yourself. One item that isn't on your set of charts, which was highlighted to me, is that not only are clients looking to automate and have workload distribution between the two sites, but a number of clients have actually developed technology of their own to address some of these requirements, and they are looking, where possible, to replace that home-grown technology in order to reduce the overall cost of ownership, and looking to IBM to provide the answer.
  13. Two or more sites, separated by unlimited distances, running the same applications & having the same data to provide: cross-site workload balancing, continuous availability, disaster recovery. Workloads are managed by a client and routed to one of many replicas, depending upon workload weight and latency constraints... extends workload balancing to sysplexes across multiple sites! Data at geographically dispersed sites is kept in sync via replication. Here we have a view of the Active/Active concepts. Sites separated by any distance, running the same applications and having access to the same data: these are very important points, that they are running the same applications and have access to the same data, though not the same copy of the data. And we have cross-site workload balancing delivering the potential for continuous availability. There is replication between the two sites, and this is asynchronous replication, hence the ability to do this over unlimited distance; it is software-based replication that we're talking about. And there is a workload distribution mechanism, with load balancing via the Server/Application State Protocol (SASP), through the use of z/OS Comm Server and a new component being announced at the same time as the GDPS solution, to provide the workload balancing between the two sites; the desire is to effectively give a view of almost extending the sysplex across sites.
  14. Two or more sites, separated by unlimited distances, running the same applications & having the same data to provide: cross-site workload balancing, continuous availability, disaster recovery. Monitoring spans the sites and now becomes an essential element of the solution for site health checks, performance tuning, etc. Taking a look forward to slide 14: clearly it is key to have a good eye on what's going on in this environment, and through the use of IBM Tivoli Monitoring and the power of NetView on the mainframe we're able to provide detailed analysis and monitoring, looking at the state of the replication environment and the latency between the sites, and to use some of these metrics to feed the workload distribution mechanism.
  15. I would describe this as a conceptual view of GDPS Active-Active. Starting at the top, we have a workload comprised of transactions that arrive through the network at some workload distribution point (we'll go into this in more detail on the following charts). The workload distribution has algorithms which say, "at this moment in time I am going to route those transactions to the active sysplex"; there can be two tiers of routing, because beyond the primary tier there can be a secondary tier of routing within the sysplex itself, which distributes those transactions across the data sharing Parallel Sysplex. So we have the active production workload running in our active production sysplex for this workload (we'll come back to that in a little while). And we have software replication in place, sending the updates from the active site to the standby site, where they are applied to a running copy of the database in near real-time. The recovery point is going to be somewhat dependent on the bandwidth and the rate of transactions and so on; however, it is typically a small number of seconds, similar to what we would see with GDPS/XRC and Global Mirror. At the same time as the workload distribution mechanism is sending the transactions to what is currently the active site for this workload, we have control functions sitting in what we'll term GDPS controllers (there is more than just GDPS running on them, as we'll see shortly), which are receiving and sending control information to the systems in the environment. They will receive health information from the active and standby sites, and they will effectively send commands to the workload distribution mechanisms if there is a need to change the current routing decisions being made.
  16. This slide illustrates the very high-level architecture of the software components and the underlying hardware. There are functions being exploited in the hardware, in the operating system, and within the monitoring environment in order to alert on situations and enable decisions to be taken based on the current state of resources. We also have the database subsystems and their replication technologies, and we have the new Lifeline component, which provides key information about the state of the applications within the workloads, as we'll see in a few charts. And across the top we have GDPS/Active-Active.
  17. First there is a prerequisite of z/OS 1.11 or higher on the production and controller systems. There's a new IBM product coming out at the same time as GDPS called IBM Multi-site Workload Lifeline, and the Workload Lifeline has a pivotal role in this solution: it is the Lifeline that issues the commands (potentially as told to by GDPS) to the workload routing mechanism. There is a new version of NetView also announced at the same time, which is required. IBM Tivoli Monitoring v6.2.2 with FP3 is also a required product as part of the solution. Then, depending on your data, you need either IBM InfoSphere Replication Server for z/OS, which is for DB2 (although DB2 doesn't appear in its name), or IBM InfoSphere IMS Replication for z/OS, at those levels, for the software replication pieces. InfoSphere Replication Server for z/OS, which is for DB2 data, has a prerequisite of MQ as well, and has previously also been known as QREP. System Automation v3.3 is already available; some functional PTFs are required, but no new version. And then there is GDPS/Active-Active, which is the control software to orchestrate and manage the entire environment. Optionally, you can also choose to use the OMEGAMON suite of monitoring tools, which will feed into the IBM Tivoli Monitoring solution and give richer information to some of the situations that can be displayed, automated, and used through the monitoring and alerting capabilities.
  18. {DESCRIPTION} IBM InfoSphere Replication Server for z/OS v10.1: runs on production images where required to capture (active) and apply (standby) data updates for DB2 data; relies on MQ as the data transport mechanism (QREP). IBM InfoSphere IMS Replication for z/OS v10.1: runs on production images where required to capture (active) and apply (standby) data updates for IMS data; relies on TCP/IP as the data transport mechanism. System Automation for z/OS v3.3: runs on all images and provides a number of critical functions: BCPii; remote communications capability to enable GDPS to manage sysplexes from outside the sysplex; System Automation infrastructure for workload and server management. {TRANSCRIPT} Slide 22 covers a little bit of information on the functions of the replication servers. Clearly the replication servers run only on the production images; there is no need to have them running on the GDPS controllers. They will capture updates on the active site and apply those updates to the running instance in the standby site. As I said previously, the DB2 replication server is also known as (or has previously been known as) QREP and relies on MQ as the data transport mechanism. The IMS replication product does not use MQ; effectively it works in a similar manner but uses TCP/IP as the data transport. As with other GDPS offerings, we are using System Automation for the BCPii control interface to the hardware, to enable us to carry out hardware-based functions; it also provides (and this is one of the new pieces) the ability to have remote communications from a GDPS image that is external to the sysplex. In GDPS/PPRC, when we're controlling systems and performing actions, we're often using sysplex communications between images, whereas here we're now using an outboard mechanism from the sysplex and using System Automation functionality as the bridge between the systems. It's also worth pointing out that System Automation is being used for the automation infrastructure for workload and server management; however, and you'll see this on another couple of charts, it is possible, if a client does not have IBM System Automation, to use another automation product for their system automation functions. But as with GDPS/PPRC and other flavors of GDPS, System Automation is still required on each image in the environment.
  19. Two production sysplex environments (also referred to as sites) in different locations: one active, one standby, for each defined workload, with software-based replication between the two sysplexes/sites (IMS and DB2 data is supported). Two controller systems, primary/backup: typically one in each of the production locations, but there is no requirement that they are co-located in this way. Workload balancing/routing switches: must be Server/Application State Protocol (SASP) compliant; RFC4678 describes SASP. What comprises the GDPS Active-Active environment? We have two production sysplexes, also known as sites, which are typically in different locations. Clearly, for test environments they could be co-located in the same site, but for production environments it is expected that they will be geographically separated, potentially by significant distance. So you have a production sysplex in one site and a production sysplex in the other; one is active and one is standby for each workload. If you have multiple workloads, you could make your environment very complex by having some active in one site and some active in the other. That may make sense for your client, or it may be more sensible, at least in the first instance, to have all your active workloads in one site and your standby site ready to receive any of those workloads should the need arise. There is software replication between the two sites. We also have, as part of the environment, two controller systems, one primary and one backup, and there are a number of control functions running on these systems which we'll talk about in the coming slides. Typically there will be one in each of the production locations, but there is no requirement that they are co-located with production; they are standalone systems running in their own sysplex. In addition to the production sysplexes and the controllers, as we saw on the previous diagram, we have the workload balancing functions, which must be Server/Application State Protocol compliant; there is an RFC (the number is on this chart) which describes this capability.
  20. Emphasizing the output from the planning poker meeting. It’s more than just the sizing.