IBM Cloud Integration Platform High Availability - Integration Tech Conference
1. Integration Technical Conference 2019
E03: High Availability for Cloud Integration Platform
Rob Nicholson
rob_nicholson@uk.ibm.com
2. Integration Technical Conference 2019 2
Cloud Integration Platform High Availability
Understanding High Availability for the Cloud Integration Platform involves first understanding the overall architecture and the high availability considerations for the container platform. Based on this we then explore HA for each Integration service.
Agenda:
• Overview of Cloud Integration Platform Architecture
• Kubernetes and IBM Cloud Private High Availability Concerns
• Deployment considerations
• MQ High Availability
• Event Streams High Availability
• API Connect High Availability
• App Connect High Availability
• Aspera High Availability
3. Integration Technical Conference 2019 3
Cloud Integration Platform Architecture
[Architecture diagram: the Cloud Integration Platform Management UI (ICP Services UI components and Integration Services UI components) sits above the ICP Services (IAM, Logging, Monitoring, Metering) and the Integration Services (MQ, Event Streams, App Connect, DataPower, API Connect, Aspera), all running on Kubernetes Master/Proxy, Management, and Worker nodes.]
• Cloud Integration Platform (CIP) deploys and manages instances of Integration Services running on a Kubernetes infrastructure.
• Instances of Integration Services are deployed individually as needed to satisfy each use case.
• Services can be deployed in highly available topologies across multiple Kubernetes worker nodes, or non-HA on a single worker.
• Deployment and management is via the UI or CLI, allowing integration with CI/CD pipelines.
• CIP leverages the IBM Cloud Private (ICP) services, which run on dedicated Master/Proxy and Management nodes in HA or non-HA configurations.
• The Management UI unifies the management UIs of the Integration Services and the ICP services.
4. Integration Technical Conference 2019 4
Container Platform Availability
• Cloud Integration Platform is based on ICP Foundation.
• ICP Foundation is functionally identical to ICP Cloud Native edition; the only difference is licensing.
• All HA, sizing and deployment information provided for ICP Cloud Native edition applies to Cloud Integration Platform.
• This presentation provides a summary. For more detail, consult the ICP Cloud Native edition documentation, articles and education. See the links page at the end of this presentation.
5. Integration Technical Conference 2019 5
Node Types
• A Kubernetes Node is a VM or bare-metal machine which is part of a Kubernetes cluster.
• Master Nodes run the services that control the cluster, including the etcd database that stores the current state of the cluster.
• Proxy Nodes transmit external requests to the services created inside your cluster.
• Management Nodes host management services such as monitoring, metering, and logging.
• Worker Nodes run Integration Services. If Cloud Integration Platform is running in a cluster with other workloads, then these also run on the worker nodes.
• Master, Proxy and Management node types can be combined together, but in production clusters it is recommended to keep them separate so that excessive workload does not impact stability.
• The minimum theoretical configuration is therefore one Master/Proxy/Management node and one Worker node. This would not be highly available and is unlikely to have enough CPU to be usable for more than limited demos.
Note: There are other, more obscure node types, such as dedicated etcd nodes and Vulnerability Advisor nodes, but these are beyond the scope of this presentation.
[Diagram: Cloud Integration Platform node types (Master, Proxy, Management, Worker); theoretical minimum cluster: one combined Master/Proxy/Management node plus one Worker node.]
6. Integration Technical Conference 2019 6
Container Platform Availability
• To make the solution fully highly available, each component must be deployed in an HA topology.
• Master Nodes contain software that uses a quorum paradigm for high availability, so these must be deployed as an odd number of nodes. Typically either 3 or 5 masters are used in an HA cluster, depending on the size of the cluster and the type of load.
• Proxy Nodes do not require a quorum, so 2 or more Proxy nodes are needed for HA.
• Management Nodes do not require a quorum, so 2 or more Management nodes are needed for HA.
• Worker Nodes run Integration Services. Depending on the integration services required, 2 or more worker nodes may be needed for HA. (More detail in subsequent slides.)
• As previously noted, Master/Proxy/Management nodes can be combined, so a minimal configuration could be 3 Master/Proxy/Management nodes.
• A topology that is often used is: 3 Master/Proxies + 2 Management + 3 Workers.
[Diagrams: Cloud Integration Platform HA cluster (3 Masters, 2 Proxies, 2 Management nodes, 2 Workers); minimal HA cluster (3 combined Master/Proxy/Mngmt nodes, 2 Workers).]
7. Taken from the ICP Think 2019 session.
Think 2019 / 5964A / Feb 15, 2019 / © 2019 IBM Corporation
Machine Role | Number | vCPU (>= 2.4 GHz) | Memory | Disk Space | Comment
Master | 3 | 16 | 32GB | 500GB | 3 for HA
Management | 2 | 16 | 32GB | 500GB | 2 for HA
Proxy | 2 | 4 | 16GB | 400GB | 2 for HA
Vulnerability Advisor | 1 | 8 | 32GB | 500GB | Optional (non-HA)
Worker Nodes | 2-50 | 8 | 32GB | 400GB |
A typical production environment
http://ibm.biz/icpcapacityplan
8. Integration Technical Conference 2019 8
Deployment Considerations
• How many independent clusters do you need? Enterprises deploy multiple clusters for any/all of the following reasons:
  ◦ Geographic availability – the ability to survive a regional outage
  ◦ To separate organizations
  ◦ To separate development, test and production
  ◦ To guard against errors by individual operators
• Are the Kubernetes nodes deployed into separate failure domains?
  ◦ Separate availability zones in a public cloud
  ◦ Separate physical servers/racks in a data centre
• What are the common points of failure?
• See https://kubernetes.io/docs/reference/kubernetes-api/labels-annotations-taints/#failure-domainbetakubernetesiozone
9. Example – HA Cluster in AWS
• Leverage availability zones
  ◦ Master/mgmt/VA across availability zones
  ◦ User applications across availability zones
• AWS ALB/NLB
  ◦ Load balancer for the management control plane
  ◦ Load balancer for user applications
  ◦ Security group to control network access
• EBS as persistent storage
May 28, 2019 ICP Solutioning Guide 101 | IBM Confidential | IBM Cloud Solutioning Centers
Article: http://ibm.biz/icponaws
Documentation:
https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.2/supported_environments/aws/overview.html
links to a quickstart: https://aws.amazon.com/quickstart/architecture/ibm-cloud-private/
10. Integration Technical Conference 2019 10
Public Cloud Deployment
• ICP 3.1.1 is supported on IBM Cloud and Amazon.
• Current CIP is based on ICP 3.1.1.
• ICP 3.1.2 adds support for Azure.
• On public clouds it is recommended to use the cloud storage solution:
  ◦ IBM Block Storage
  ◦ Amazon EBS
• Distribute the cluster across 3 availability zones with less than 30ms latency.
• Multiple AZs in a single region, not multiple regions.
11. Integration Technical Conference 2019 11
High Availability for Integration Services
Integration Services run on the worker nodes.
The Cloud Integration SolutionPak is composed from the Cloud Paks made from the component products.
IBM Cloud Paks are developed by the product development teams to embody best practice for deploying the product onto Kubernetes as a secure, scalable, highly available deployment.
Thus, at a high level, all that has to be ensured to provide a highly available deployment is:
• There are sufficient nodes for the solution*
• The nodes are deployed into separate failure domains so that a single failure will not take out multiple nodes**
Kubernetes takes care of:
• Running appropriate numbers of instances of the SolutionPak, spread across the available workers.
• Mixing together the workloads on the available worker nodes, taking account of the resources they need.
• Restarting workloads that fail, and recovering from failed nodes by scheduling workloads onto alternative nodes.
* Some of the component products require 2 worker nodes for active/standby style HA, and others require 3 or more workers for a quorum-style deployment.
** In larger clusters the nodes should be spread between 3 or more availability zones.
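Spreading nodes between failure domains can be pictured as simple round-robin placement; a toy sketch (node and zone names are hypothetical, and real placement is driven by the Kubernetes failure-domain labels mentioned earlier):

```python
from collections import defaultdict

def spread_across_zones(nodes, zones):
    """Round-robin nodes across availability zones, so losing one
    zone takes out at most ceil(len(nodes)/len(zones)) nodes."""
    placement = defaultdict(list)
    for i, node in enumerate(nodes):
        placement[zones[i % len(zones)]].append(node)
    return dict(placement)

workers = [f"worker-{i}" for i in range(1, 7)]
print(spread_across_zones(workers, ["zone-a", "zone-b", "zone-c"]))
# 6 workers over 3 zones -> 2 workers per zone
```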
[Diagram: Cloud Integration Platform HA cluster with 3 Masters, 2 Proxies, 2 Management nodes and 3 Workers.]
12. Integration Technical Conference 2019 12
Myth Busting
[Diagram: Cloud Integration Platform HA cluster with 3 Masters, 2 Proxies, 2 Management nodes and 3 Workers.]
• MYTH 1: I need separate worker nodes for each Integration Service.
• MYTH 2: I cannot run multiple SolutionPaks on the same nodes as "cloud native" workloads.
• REALITY: Kubernetes will schedule pods across the available nodes based on its scheduling policies and the declared resource requirements. This will almost certainly mix workloads together on nodes unless you explicitly prevent it.
• It's possible to use taints and annotations to control which nodes a workload is scheduled to. (Advanced topic, not covered here.)
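The "reality" above can be illustrated with a toy first-fit scheduler (pod names, CPU figures and the first-fit policy are all hypothetical; the real Kubernetes scheduler is far more sophisticated, but the mixing effect is the same):

```python
def schedule(pods, nodes):
    """Toy first-fit placement by CPU request: pods from different
    services end up mixed on the same nodes unless constrained."""
    placement = {name: [] for name in nodes}
    for pod, cpu in pods:
        for name in nodes:
            used = sum(c for _, c in placement[name])
            if used + cpu <= nodes[name]:
                placement[name].append((pod, cpu))
                break
        else:
            raise RuntimeError(f"no capacity for {pod}")
    return placement

# Three 8-vCPU workers; pods from MQ, Event Streams, ACE and APIC.
nodes = {"worker-1": 8, "worker-2": 8, "worker-3": 8}
pods = [("mq-0", 4), ("kafka-0", 4), ("kafka-1", 4),
        ("ace-0", 2), ("kafka-2", 4), ("apic-gw-0", 2)]
result = schedule(pods, nodes)
# worker-1 ends up hosting both an MQ pod and a Kafka pod.
```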
13. Integration Technical Conference 2019 13
Summary – High Availability for Integration
Product | Approach(es) to HA | Minimum number of worker nodes in the cluster
MQ | Data availability: failover; Service availability: active/active | 2
Event Streams | Quorum | 3
APIC | Quorum | 3
App Connect | Stateless: active/active; Stateful: failover | 2
Aspera | Quorum | 3
DataPower | Quorum | 3
At a high level, all you need to do is deploy the cluster in an HA configuration with at least 3 workers. Deploy the Integration services from the Helm charts and they will be highly available by default.
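The table reduces to a simple rule: the cluster needs at least the largest per-service minimum. A small sketch using the numbers above:

```python
# Minimum worker counts per service, from the summary table.
HA_MIN_WORKERS = {
    "MQ": 2,             # failover / active-active
    "Event Streams": 3,  # quorum
    "APIC": 3,           # quorum
    "App Connect": 2,    # active-active / failover
    "Aspera": 3,         # quorum
    "DataPower": 3,      # quorum
}

def min_workers(services):
    """Smallest worker count that satisfies every deployed service."""
    return max(HA_MIN_WORKERS[s] for s in services)

print(min_workers(["MQ", "App Connect"]))   # 2
print(min_workers(HA_MIN_WORKERS))          # 3 -- the quorum services dominate
```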
14. Integration Technical Conference 2019 14
Cloud Integration Platform High Availability
15. Integration Technical Conference 2019 15
IBM MQ High Availability Concepts
Availability of the "MQ Service":
• The "MQ Service" is available when applications can connect to an MQ queue and send and receive messages.
• The MQ service can be made highly available by running multiple queue managers in active/active configurations.
Availability of individual messages:
• An individual message that has been sent to an MQ queue is available when a client can receive that particular message.
• The data can be made highly available by minimizing the time when the queue manager holding the message is not available.
16. IBM MQ High Availability Options on Distributed Servers
[Diagram comparing three options:]
• Single resilient queue manager: one MQ instance backed by HA storage. Platform load balancing; platform HA.
• Multi-instance queue manager: an active MQ instance plus a standby, sharing HA storage. Client load balancing; IBM MQ product HA.
• Replicated data queue manager: an active MQ instance plus two standbys, each with local storage kept in sync by replication. Client or server load balancing; IBM MQ product HA.
17. IBM MQ High Availability in Cloud Pak / CIP
[Same three options, annotated for Kubernetes:]
• Single resilient queue manager (platform load balancing; platform HA): the current approach.
• Multi-instance queue manager (client load balancing; IBM MQ product HA): investigating use of multi-instance within Kubernetes.
• Replicated data queue manager (client or server load balancing; IBM MQ product HA): technology not supported in Kubernetes.
18. Container Orchestration
Single Resilient Queue Manager on Kubernetes
• Container restarted by Kubernetes*
• Data persisted to external storage
• Restarted container connects to the existing storage
• IP Gateway routes traffic to the active instance
• Implemented as a Kubernetes StatefulSet of 1
• Implications: both the service and the messages stored in the single queue manager are unavailable during failover
[Diagram: Application → IP Gateway (from container platform) → MQ, backed by HA network storage.]
* Total Kubernetes node failure considerations are described in https://developer.ibm.com/messaging/2018/05/02/availability-scalability-ibm-mq-containers/
19. Improving the MQ Service Availability
Hybrid Cloud / March 2018 / © 2018 IBM Corporation
Single resilient container vs. multiple resilient containers:
• Multiple resilient containers provide access to store messages, even in failover situations. Connection routing is provided to distribute the traffic across the available instances.
• Applications retrieving messages attach to the individual queue manager. Connection routing is provided to route traffic to the container location.
[Diagram: with a single resilient container, all MQ PUTs and GETs go through one connection-routing layer to one queue manager; with multiple resilient containers, MQ PUTs are distributed across the queue managers, while each MQ GET is routed to the specific queue manager holding the messages.]
20. Horizontal Scaling
• Scale the MQ instances based on workload requirements.
[Diagram: applications performing MQ PUTs are spread via connection routing across three MQ containers in the messaging layer; applications performing MQ GETs connect to the individual MQ instances over the network.]
21. Connection Routing
Static Routing (endpoint list):
• Client embeds endpoints
• Performance impact when primary unavailable
• Brittle configuration
• No load balancing
Client Channel Definition Table (CCDT):
• Client references endpoints
• Enhanced workload management strategies
• Central configuration management
Load Balancer (IP Gateway):
• Client references endpoints
• Enhanced workload management strategies
• Central configuration management
• Not recommended for JMS
[Diagram: in each case the application reaches two MQ queue managers, via the embedded endpoint list, the CCDT, or an IP gateway respectively.]
22. Recommended Routing
[Diagram: applications use CCDTs together with the platform IP Gateway. MQ PUT traffic is routed through the IP Gateway across the MQ containers in the messaging layer, while applications issuing MQ GETs use the CCDT to reach the specific queue manager holding their messages.]
23. Integration Technical Conference 2019 23
Cloud Integration Platform High Availability
24. Integration Technical Conference 2019 24
Event Streams High Availability
• Event Streams deploys Apache Kafka in an HA topology by default.
• The chart deploys brokers as a StatefulSet with 3 members by default.
• Kubernetes will schedule the pods to different nodes.
• Kafka's architecture is inherently HA. The client connection protocol handles discovery and failure.
• The default for topic creation is 3 replicas with minimum in-sync replicas = 2.
• For multi-AZ deployments, the number of brokers must match the number of AZs today.
[Diagram: clients use intelligent discovery and reconnection logic to connect to the Kafka brokers.]
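The interplay of 3 replicas and minimum in-sync replicas = 2 can be sketched with a toy model (this is an illustration of the guarantee, not the actual Kafka replication protocol):

```python
class TopicPartition:
    """Toy model of Kafka's min.insync.replicas guarantee for
    producers using acks=all."""
    def __init__(self, replicas=3, min_isr=2):
        self.min_isr = min_isr
        self.in_sync = replicas  # all brokers healthy initially

    def broker_down(self):
        self.in_sync -= 1

    def produce(self, record):
        # A write is accepted only while the in-sync replica set
        # is at least min.insync.replicas in size.
        if self.in_sync < self.min_isr:
            raise RuntimeError("NOT_ENOUGH_REPLICAS")
        return "ack"

p = TopicPartition()
p.broker_down()                  # one of three brokers fails
assert p.produce("m1") == "ack"  # still writable with 2 in sync
p.broker_down()                  # second failure
# p.produce("m2") would now raise NOT_ENOUGH_REPLICAS
```

This is why the default settings survive the loss of any single broker (or node) with no loss of write availability.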
25. Integration Technical Conference 2019 25
Cloud Integration Platform High Availability
26. Integration Technical Conference 2019 26
API Connect High Availability
[Diagram: ICP cluster with Worker Nodes 1-3, each running Gateway, Analytics, Portal and Manager components.]
• API Connect requires 3 worker nodes for HA.
• The chart deploys as HA by default:
  ◦ global setting: mode = standard (dev mode is non-HA)
  ◦ Cassandra cluster size = 3
  ◦ Gateway replica count = 3
• Kubernetes distributes the resulting resources across 3 (or more) nodes of the cluster for high availability.
The API Connect whitepaper discusses multi-cluster, multi-data-centre and DR topics: http://ibm.biz/APIC2018paper
27. Integration Technical Conference 2019 27
Cloud Integration Platform High Availability
28. Integration Technical Conference 2019 28
App Connect High Availability
• The ACE control plane is HA (ReplicaSet of 3) by default.
• Integration servers deployed without local MQ queue managers (QMs) are stateless:
  ◦ Deployed HA (ReplicaSet of 3) by default.
  ◦ The BAR file is retrieved from the control plane at startup.
• Integration servers deployed with a local QM are deployed like MQ:
  ◦ Single resilient queue manager (StatefulSet of 1).
[Diagram: ICP cluster with Worker Nodes 1-3 running the ACE control plane (CP x3), ACE stateless servers (ACE Server x3), and an ACE stateful server (ACE + MQ).]
29. Integration Technical Conference 2019 29
Cloud Integration Platform High Availability
30. Integration Technical Conference 2019 30
Aspera HSTS
• Aspera HSTS has an architecture that can be deployed highly available.
• Aspera HSTS in the CIP GA code cannot be deployed as highly available directly from the Helm chart.
• The deployment can be modified to make it highly available.
31. Integration Technical Conference 2019 31
Aspera HSTS – Routing HA
Aspera HSTS serves two types of traffic:
• HTTPS API traffic
  ◦ This traffic is routed via ICP's Ingress Controller, with high availability.
32. Integration Technical Conference 2019 32
Aspera HSTS – Routing HA
• FASP transfer traffic
  ◦ This traffic is routed by custom proxies built by Aspera.
  ◦ These can be scaled horizontally as a ReplicaSet and are redundantly available.
  ◦ The proxies co-ordinate with the Load Balancer to find available transfer pods.
  ◦ The Load Balancer is stateless and can be run as a ReplicaSet.
34. Integration Technical Conference 2019 34
Aspera HSTS – Application HA
• Aspera Transfer Pods are each dedicated to a single transfer session.
• The Swarm Service manages the Transfer Pods, auto-scaling as needed and ensuring free pods are available at all times.
• If the Swarm Service dies, no existing traffic will be impacted, but no new transfer pods will be scheduled until it is restarted.
• The Aspera Swarm Service is currently deployed as a single instance.
• The Swarm Service will be adding leader election as part of L2 Certification.
35. Integration Technical Conference 2019 35
Aspera HSTS – Control Plane
• The Stats Service consumes transfer events from Kafka and publishes pod availability for consumption by the Aspera Load Balancer.
• The Stats Service is currently a single instance, but will be updated to use leader election as part of L2 Certification.
• If the Stats Service fails, load balancing will be based on stale information until the Stats Service restarts.
• Kafka is provided by IBM Event Streams, which is an HA service.
As a rule of thumb, the memory requirement is 4x the vCPU requirement.
For HA, master nodes require a quorum, so there should be an odd number of them, while management nodes and proxy nodes do not require a quorum.
The workload (application and middleware) sizing determines the total capacity requirement, and the number of worker nodes is derived from that.
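These notes imply a simple sizing sketch (the node size, headroom factor and helper name are illustrative assumptions; use the capacity-planning link above for real sizing):

```python
import math

def size_workers(total_vcpu, node_vcpu=8, headroom=0.2, min_ha_nodes=3):
    """Derive a worker count from the workload vCPU requirement,
    keeping spare headroom so a node failure can be absorbed.
    Per-node memory follows the 4x-vCPU rule of thumb."""
    needed = total_vcpu * (1 + headroom)
    workers = max(min_ha_nodes, math.ceil(needed / node_vcpu))
    return {"workers": workers,
            "vcpu_per_node": node_vcpu,
            "memory_gb_per_node": 4 * node_vcpu}

print(size_workers(40))  # 40 vCPU of workload on 8-vCPU workers
```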
When thinking about the options for IBM MQ high availability on a cloud platform, initially you may consider the following:
Single resilient queue manager: this is where you have a single instance of a queue manager, and the cloud environment monitors it and replaces the VM or container as necessary. From a container point of view this requires persistent storage, which is likely to be HA storage to overcome a host machine failure. Container failover times are typically fast, and comparable to a multi-instance queue manager, while those for other technologies vary. Client connections will be routed to the active instance by an external load balancing mechanism, which removes the need for the client to provide this load balancing. Some examples of these technologies include: Docker, Kubernetes, VMware HA / SRM and AWS Auto Scaling.
Multi-instance queue manager: this is the "traditional" out-of-the-box high availability provided by IBM MQ. Multi-instance queue managers are instances of the same queue manager configured on different servers. One instance of the queue manager is defined as the active instance and another instance is defined as the standby instance. If the active instance fails, the multi-instance queue manager starts automatically on the standby server. Shared storage is used for the queue manager data, so either instance is able to access the data. Normally this shared storage will be HA storage to overcome failure situations. A file lock is used on the storage to determine which of the two possible instances should be active. This HA option does not provide any IP or hostname load balancing, and therefore it is the responsibility of the MQ client to understand the two possible locations where the MQ instance could be running.
Replicated data queue manager: this is a relatively recent addition to the MQ Advanced product (introduced in 9.0.4) and is based on technology used in the MQ Appliance. In the simplest of situations it will include three standard Red Hat Linux servers, one being active and handling requests, with the other two replicating the data and waiting to become active, in a similar logical manner to the standby instance within a multi-instance queue manager. When comparing to the other options there are three aspects to highlight:
No shared disk: each individual server includes a complete replica of the queue manager data, removing the need for a shared disk, which simplifies the configuration and potentially improves the performance of the overall solution.
Floating IP: with multi-instance queue managers, two servers with separate IP addresses could be hosting and running the active instance of the queue manager. This means that the client needs to be explicitly aware of the two IP addresses. With the new replicated data queue manager a floating IP address is used, and associated with the server that is running the active queue manager. This means that clients only need to be aware of the floating IP address, simplifying the logic on the client.
In a container environment, a degree of high availability is provided automatically by docker and container orchestration. A container can be automatically restarted in the case of a failure, or if the container is detected to be unhealthy. In this case the container is terminated and destroyed, and a new container created and started, that will logically take over the function of the previous container. The container orchestration platform will provide an IP Gateway to allow the routing of requests to the location of the active container. Any data that the container needs to persist will be stored on shared storage, so any running container can attach (the platform assures that only one container will be running at any one time).
It is important to separately consider the availability to PUT a message from the availability to GET (retrieve) a message. Many use cases want to assure that an application is able to offload work (PUT a message), and therefore need a higher level of availability for PUT than for GET. We call this higher level of availability for offloading work continuous availability. Here we create at least two instances of MQ, each with queues that can receive an application's message. Routing of inbound connections to store messages is then provided across the available MQ instances; we will discuss the options for this routing later. For applications retrieving messages, the connections are made directly to the MQ instances instead of using the routing capability across MQ instances, due to the need to connect to the MQ instance hosting the individual queue.
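The PUT/GET asymmetry described above can be sketched as a toy router (a conceptual model with invented names, not MQ client code):

```python
class MessagingLayer:
    """Toy model: a PUT can go to any available queue manager
    (continuous availability), but a GET must target the queue
    manager that actually holds the message."""
    def __init__(self, qmgrs):
        self.queues = {qm: [] for qm in qmgrs}
        self.available = set(qmgrs)

    def put(self, msg):
        # Connection routing: any available instance may store it.
        qm = sorted(self.available)[0]
        self.queues[qm].append(msg)
        return qm

    def get(self, qm):
        # Must connect to the specific instance hosting the queue.
        if qm not in self.available:
            raise ConnectionError(f"{qm} unavailable")
        return self.queues[qm].pop(0)

layer = MessagingLayer(["QM1", "QM2"])
layer.put("order-1")            # stored on QM1
layer.available.discard("QM1")  # QM1 fails
layer.put("order-2")            # PUT still succeeds, via QM2
# but "order-1" cannot be retrieved until QM1 is back: that is the
# difference between service availability and message availability.
```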
As you may have noticed on the previous slide, we have also completed horizontal scaling of the queue managers; we can continue this scaling with 3, 4, … instances of queue managers to support the level of demand required, using the previous pattern.
In the previous charts we included the term "Connection Routing"; this provides the ability for the application to have its MQ traffic directed to one of multiple possible network destinations. There are three typical options:
Static Routing - The simplest way of removing the restriction of a connection to a single address is to use a comma-separated list of host names and their ports. This list can be set in connection factories or in the MQSERVER environment variable. This approach can be used simply to find a single queue manager that might be running at one of multiple possible network locations, or distribute traffic across multiple queue managers that offer the same queue. There are several considerations with this technique including:
The further down the list the client must go, the longer the connection takes, so the list should be ordered with the most likely available locations first.
The first available queue manager in the list receives all the connections, not making it a good match for workload balancing. It is possible to provide lists in different orders to different applications to have some level of balancing of connections, but there are better mechanisms.
The configuration is brittle, in that changes need to occur to the individual application configuration if and when servers are moved.
Client Channel Definition Table – The CCDT is a file that determines the connection information used by a client to connect to a queue manager. A CCDT file can contain multiple entries for a single logical connection, allowing it to distribute traffic across a number of queue managers. The CCDT can be configured so that channels in a group are either sequentially tried (for availability), or randomly tried based on specified weightings (for workload balancing). This provides greater flexibility than defining the connection information statically. The CCDT file can also be retrieved from a central location using HTTP or FTP, meaning updates can occur once and be picked up by all applications. This central management provides a key advance over the static routing mechanism.
Load Balancer – The load balancer could be provided by the container orchestration engine, or be a standard network load balancer such as an F5. Either way, it can provide TCP load balancing in front of the IBM MQ queue managers. This approach removes the application from making the connection selection; instead it forwards to the load balancer IP and port. Load balancers often have the capability for more feature-rich workload management strategies, which may make them more appealing than CCDT. However, there are certain capabilities, such as JMS, that are not recommended with an external load balancer, so these need to be considered when selecting an effective strategy.
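The behavioural difference between an ordered static list and CCDT-style weighted selection can be sketched as follows (hostnames and helper functions are hypothetical; real CCDT behaviour is configured on the channel definitions):

```python
import random

def connect_static(endpoints, is_up):
    """Ordered list: the first available endpoint receives every
    connection, so there is no load balancing, and failover means
    walking down the list (slower the further you go)."""
    for ep in endpoints:
        if is_up(ep):
            return ep
    raise ConnectionError("no endpoint available")

def connect_weighted(endpoints, weights, is_up, rng=random):
    """CCDT-style weighted random choice across live endpoints,
    spreading connections for workload balancing."""
    live = [(e, w) for e, w in zip(endpoints, weights) if is_up(e)]
    if not live:
        raise ConnectionError("no endpoint available")
    eps, ws = zip(*live)
    return rng.choices(eps, weights=ws, k=1)[0]

eps = ["qm1.example:1414", "qm2.example:1414"]
print(connect_static(eps, lambda e: True))            # always qm1
print(connect_weighted(eps, [1, 1], lambda e: True))  # either, ~50/50
```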