Always On (HA + DR) Availability Groups-SQL SERVER By Sunil Kumar Anna

  1. Always On (HA + DR) Availability Groups, SQL Server, by Sunil Kumar Anna
  2. Database mirroring
     Note: database mirroring supports only one mirror database, and that mirror stays in recovery (RESTORING) mode, so it cannot be read. Always On availability groups remove both limitations.
  3. High availability: Clustering
     Clustering is performed at the instance level. It is not designed to protect data but rather to protect the availability of the server hardware. Unlike AlwaysOn Availability Groups, SQL Server clustering uses shared storage, which exposes it to SAN failures. As a result, clustering is often combined with other disaster recovery technologies such as log shipping. Unlike most other options, when failover occurs all SQL Server logins, jobs, etc. fail over with the instance.
     SQL Server offers many disaster recovery and high availability options. Before discussing them, it is worth clarifying the difference between disaster recovery and high availability.
     Disaster recovery deals with the failure of multiple servers, often separated by distance. Implementing a disaster recovery plan is often a manual process that results in a period of downtime. How long that downtime lasts generally depends on the personnel involved and on whether the failover process is well documented and practiced. Costs also play their part, but a well-planned and well-executed failover does not need to incur significant expenditure. The goals of a disaster recovery plan are to:
     • retain data
     • minimize downtime
     • minimize data loss
     High availability differs from disaster recovery in that it is usually an automated process involving fewer servers, and failover often happens within the same data center. Its goals are:
     • retain service
     • 100% uptime
     • zero data loss
     Disaster recovery: Log shipping
     Log shipping is performed at the database level and is available in both the Standard and Enterprise editions. It is a tried and trusted disaster recovery implementation that has been available in SQL Server for many years. It copies logged operations from a primary server to a secondary server, often at an offsite location, using three main operations: back up the transaction log, copy the transaction log backup from source to destination, and restore the transaction log backup at the destination. The amount of potential data loss depends on the configuration options chosen (for example, how often the log is backed up and copied).
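The three log shipping steps, and why potential data loss depends on configuration, can be sketched as a toy Python model. This is an illustrative simulation, not how SQL Server implements log shipping: `LogShippingSim` and its methods are hypothetical names, and log records are plain strings. Anything still on the primary (written but not yet backed up, or backed up but not yet copied) is lost if the primary fails, so shorter backup and copy intervals mean less potential data loss.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogShippingSim:
    """Toy model of log shipping (hypothetical; not SQL Server's implementation)."""
    primary_log: List[str] = field(default_factory=list)            # records written on the primary
    backed_up_upto: int = 0                                          # first record not yet backed up
    primary_share: List[List[str]] = field(default_factory=list)    # log backups still on the primary
    secondary_share: List[List[str]] = field(default_factory=list)  # log backups copied to the secondary
    restored: List[str] = field(default_factory=list)               # records applied on the secondary

    def write(self, record: str) -> None:
        self.primary_log.append(record)

    def backup_job(self) -> None:
        # 1. Back up the transaction log written since the previous log backup.
        new_records = self.primary_log[self.backed_up_upto:]
        if new_records:
            self.primary_share.append(list(new_records))
            self.backed_up_upto = len(self.primary_log)

    def copy_job(self) -> None:
        # 2. Copy the log backup files from source to destination, oldest first.
        while self.primary_share:
            self.secondary_share.append(self.primary_share.pop(0))

    def restore_job(self) -> None:
        # 3. Restore the copied log backups in order (WITH NORECOVERY in practice).
        while self.secondary_share:
            self.restored.extend(self.secondary_share.pop(0))

    def lost_if_primary_fails(self) -> List[str]:
        # Everything not yet copied off the primary server is lost with it.
        shipped = len(self.restored) + sum(len(b) for b in self.secondary_share)
        return self.primary_log[shipped:]


sim = LogShippingSim()
for i in range(3):
    sim.write(f"txn{i}")
sim.backup_job()
sim.copy_job()
sim.restore_job()
sim.write("txn3")
sim.write("txn4")
sim.backup_job()  # backed up, but the copy job has not run yet
print(sim.restored)                 # ['txn0', 'txn1', 'txn2']
print(sim.lost_if_primary_fails())  # ['txn3', 'txn4']
```

Running the copy job more often shrinks the list returned by `lost_if_primary_fails`, which is the trade-off the configuration options control.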
  4. Always On (HA + DR)
     Unlike database mirroring, you can have multiple secondary ("mirror") databases:
     • SQL Server 2012: up to 4 secondary replicas, with up to 2 synchronous secondary replicas
     • SQL Server 2014: up to 8 secondary replicas, with up to 2 synchronous secondary replicas
     • SQL Server 2016: up to 8 secondary replicas, with up to 3 synchronous secondary replicas
     Note: Always On availability groups are not available in the Standard edition up to SQL Server 2014.
     Synchronous replicas are typically used for high availability with automatic failover and are usually connected over low-latency networks. Asynchronous replicas are used for disaster recovery; they support manual failover and are usually located on high-latency networks, in separate geographical locations or in the cloud.
     The SQL Server 2016 Enterprise edition supports up to eight secondary replicas, while the Standard edition supports Basic Availability Groups with two replicas (one primary and one secondary).
     You can combine AlwaysOn FCIs and AlwaysOn AGs for server-level and database-level protection.
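A hypothetical helper can check a planned topology against the per-version limits listed above. The limits table simply restates this slide for the Enterprise edition; `AG_LIMITS` and `validate_topology` are illustrative names, and the `SYNCHRONOUS_COMMIT` / `ASYNCHRONOUS_COMMIT` strings mirror the T-SQL `AVAILABILITY_MODE` values. Treat it as a planning sketch, not a replacement for the product documentation.

```python
from typing import Dict, List

# Limits restated from the slide (Enterprise edition).
AG_LIMITS: Dict[int, Dict[str, int]] = {
    2012: {"max_secondaries": 4, "max_sync_secondaries": 2},
    2014: {"max_secondaries": 8, "max_sync_secondaries": 2},
    2016: {"max_secondaries": 8, "max_sync_secondaries": 3},
}


def validate_topology(version: int, secondary_modes: List[str]) -> List[str]:
    """Return the problems with a planned set of secondary replicas.

    secondary_modes has one entry per secondary replica:
    "SYNCHRONOUS_COMMIT" or "ASYNCHRONOUS_COMMIT".
    """
    limits = AG_LIMITS[version]
    problems = []
    if len(secondary_modes) > limits["max_secondaries"]:
        problems.append(
            f"too many secondaries: {len(secondary_modes)} > {limits['max_secondaries']}")
    sync_count = secondary_modes.count("SYNCHRONOUS_COMMIT")
    if sync_count > limits["max_sync_secondaries"]:
        problems.append(
            f"too many synchronous secondaries: {sync_count} > {limits['max_sync_secondaries']}")
    return problems


plan = ["SYNCHRONOUS_COMMIT"] * 3 + ["ASYNCHRONOUS_COMMIT"] * 2
print(validate_topology(2014, plan))  # ['too many synchronous secondaries: 3 > 2']
print(validate_topology(2016, plan))  # []
```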
  5. Preparing Windows Nodes (Configure AlwaysOn)
     • Install two standalone machines with Windows Server 2012 and apply the latest service packs and updates.
     • Use similar configurations on both nodes, including:
       - number of vCPUs
       - memory
       - number of vNICs (two or more recommended, with one vNIC dedicated to WSFC traffic)
       - number of vDisks, with matching capacities, SCSI controller properties, and SCSI IDs
     • All cluster nodes must be in the same Active Directory Domain Services (AD DS) domain.
     • Create the failover cluster from the Windows Server 2012 nodes that participate in the setup:
       - install the Failover Clustering and .NET Framework features via Server Manager on both nodes
       - create a cluster between the two nodes via Failover Cluster Manager
       - ensure the required Windows Firewall ports are open
     • Configure quorum using the Dynamic Quorum option, or create a file share witness on a witness server and ensure that the following objects have read and write permissions on the share: the cluster name, Node1, Node2, and Administrators.
     • Run the cluster validation tests before installing SQL Server.
     Conditions:
     • The cluster creator must have the following accounts and permissions:
       - a domain account in the domain where the cluster will exist
       - local administrator permissions on each cluster node
       - Create Computer objects and Read All Properties permissions in AD DS
     • Consider deploying the file server as a VM. Ensure that both the file share and the file server VM hosting it are highly available.
     Ref link: https://www.mssqltips.com/sqlservertip/2519/sql-server-alwayson-availability-groups--part-1-configuration/
  6. Configuring AlwaysOn Availability Groups
     • Install SQL Server 2014 Enterprise Edition on both nodes (in our case, two named instances on each node).
     • Configure the SQL Server services to run under a domain admin account.
     • Enable the AlwaysOn High Availability option for every SQL Server instance on every node.
     • Open SQL Server 2014 Management Studio and create a database with the Full recovery model.
     • Take a full backup of the selected databases.
     • Expand Availability Groups and click "New Availability Group Wizard".
     • Specify an availability group name.
     • Add your SQL Server replicas and choose the availability mode (in our case, asynchronous).
     • Create a listener. An availability group listener is a virtual network name (VNN) to which clients can connect in order to access a database in a primary or secondary replica of an AlwaysOn availability group. A listener is assigned a unique DNS name and one or more IP addresses. While listeners enable failover redirection and read-only routing, client connections are not required to use them: a client can also reference the SQL Server instance directly instead of connecting to the listener.
     • Create a shared folder for the initial synchronization.
     Conditions:
     • All SQL Server instances must use the same collation.
     • All databases that participate in the AG must use the full recovery model.
     • Ensure that file paths and drive letters are consistent across all instances.
     • A database must not belong to any existing availability group.
     • A database must not be configured for database mirroring.
     • SQL Server Agent jobs, logins, linked servers, and other objects stored outside the availability databases do not fail over with the availability group.
     • Use the same service accounts on both SQL Server instances.
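The database conditions above can be expressed as a small pre-flight check. This is a sketch under assumptions: `DatabaseInfo`, `ag_join_problems` and `collations_match` are hypothetical names, and in practice you would fill the fields from catalog views such as `sys.databases` (for example its `recovery_model_desc` column) and the backup history in `msdb.dbo.backupset`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatabaseInfo:
    """Hypothetical snapshot of the properties the checklist cares about."""
    name: str
    recovery_model: str                # "FULL", "SIMPLE" or "BULK_LOGGED"
    has_full_backup: bool
    availability_group: Optional[str]  # existing AG the database belongs to, if any
    is_mirrored: bool


def ag_join_problems(db: DatabaseInfo) -> List[str]:
    """Reasons why db cannot be added to an availability group (empty if none)."""
    problems = []
    if db.recovery_model != "FULL":
        problems.append("recovery model must be FULL")
    if not db.has_full_backup:
        problems.append("take a full backup first")
    if db.availability_group is not None:
        problems.append(f"already in availability group {db.availability_group}")
    if db.is_mirrored:
        problems.append("remove database mirroring first")
    return problems


def collations_match(instance_collations: List[str]) -> bool:
    # All participating SQL Server instances must use the same collation.
    return len(set(instance_collations)) <= 1


print(ag_join_problems(DatabaseInfo("Finance", "SIMPLE", False, None, False)))
# ['recovery model must be FULL', 'take a full backup first']
```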
  7. Service account: make sure you use a domain account with Domain Admins permissions for the following SQL Server services:
     • SQL Server Agent
     • SQL Server Database Engine
     • SQL Server Browser
     Notes:
     • SQL Server 2014 supports up to 8 secondary replicas with up to 2 synchronous secondary replicas; SQL Server 2016 raises the synchronous limit to 3.
     • AlwaysOn availability groups are not available in the Standard edition up to SQL Server 2014; from SQL Server 2016, the Standard edition supports Basic Availability Groups.
     • Some reporting tasks can be offloaded to the secondary replicas.
     • Backup operations can be offloaded to the secondary replicas, which minimizes load on the primary database.
     Ref links:
     https://www.mssqltips.com/sqlservertip/2518/sql-server-alwayson-availability-groups--part-2-availability-groups-setup/
     http://www.careexchange.in/installingconfiguring-sql-2014-always-on-cluster-on-windows-2012-r2-recommended-way/
     https://msdn.microsoft.com/en-ca/library/ff878487.aspx
  8. Better Log Transport Performance
     The increased use of solid-state disks (SSDs) has given users high-speed hardware with very fast throughput. However, this can overwhelm a system that is trying to write transactions to a secondary server. Because of this, Microsoft revamped the AlwaysOn data synchronization process in SQL Server 2016, streamlining the pipeline for better throughput and less CPU stress. Bottlenecks are most likely to occur during the log capture and redo steps. Previously, log capture and redo each used a single thread to process the log; now these steps use multiple threads and run in parallel, which greatly improves performance.
     Transaction Occurs -> Log Flush -> Log Capture -> Send -> Log Received -> Log Cached -> Log Hardened -> Acknowledgement Sent -> Redo
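Why redo can safely run on several threads can be illustrated with a simplified model: if log records are partitioned by page, records for the same page are still applied in log sequence number (LSN) order, so the result matches a single-threaded redo. This is not SQL Server's actual parallel redo implementation; the `(lsn, page_id, value)` record shape and both functions are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical log record: (lsn, page_id, value) meaning "set page_id to value".
LogRecord = Tuple[int, int, str]


def serial_redo(log: List[LogRecord]) -> Dict[int, str]:
    pages: Dict[int, str] = {}
    for _lsn, page_id, value in sorted(log):
        pages[page_id] = value
    return pages


def parallel_redo(log: List[LogRecord], workers: int = 4) -> Dict[int, str]:
    # Partition by page so all records for one page land in the same partition,
    # still in LSN order because the log is sorted before partitioning.
    partitions: Dict[int, List[LogRecord]] = defaultdict(list)
    for record in sorted(log):
        partitions[record[1] % workers].append(record)

    def apply(records: List[LogRecord]) -> Dict[int, str]:
        result: Dict[int, str] = {}
        for _lsn, page_id, value in records:
            result[page_id] = value
        return result

    pages: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(apply, partitions.values()):
            pages.update(partial)  # partitions touch disjoint pages
    return pages


log = [(1, 10, "a"), (2, 11, "b"), (3, 10, "c"), (4, 12, "d"), (5, 11, "e")]
print(parallel_redo(log))  # {10: 'c', 11: 'e', 12: 'd'}
```

Partitioning by page is the design choice that makes parallelism safe here: two threads never write the same page, so no cross-thread ordering is needed.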
  9. An important AlwaysOn enhancement in SQL Server 2014 is the increased maximum number of secondary replicas. SQL Server 2012 supported a maximum of 4 secondary replicas; with SQL Server 2014, AlwaysOn Availability Groups support up to 8. The additional secondary replicas can be used to distribute read workloads and provide enhanced recoverability.
  10. An AlwaysOn availability group configured for DR: the primary replica is in Data Center 1 and its secondary replica is in Data Center 2. We use asynchronous-commit mode because of the distance and network speed.
  11. Configure SQL Server 2012 AlwaysOn: Prerequisites
      System requirements:
      • Windows Server 2008 SP2, Windows Server 2008 R2 SP1, or Windows Server 2012.
      • x86 or x64 only.
      • Each computer must be a node in a Windows Server Failover Clustering (WSFC) cluster.
      • Each node must have a drive with the same letter as the other nodes.
  12. SQL Server requirements (some basics you need to know when connecting)
      • For Kerberos to work, all SQL Server instances need to run as the same domain account, and SPNs must be registered manually.
      • Enterprise Edition is required.
      • All instances must use the same collation.
      • Availability group listeners use only TCP/IP, so all connecting clients must use TCP/IP.
      • All cluster nodes must be in the same Active Directory Domain Services (AD DS) domain.
      • Each availability replica in an availability group must reside on a different node of the same Windows Server Failover Clustering (WSFC) cluster.
      • The cluster creator must have the following accounts and permissions:
        - a domain account in the domain where the cluster will exist
        - local administrator permissions on each cluster node
        - Create Computer objects and Read All Properties permissions in AD DS
  13. Cluster requirements
      VERY IMPORTANT: if the WSFC cluster that the availability group runs on loses quorum, the AG shuts down and the databases within it become unavailable. This is similar to losing both the mirror and the witness in SQL Server database mirroring.
      Permissions: the key thing to remember about the witness file share is that you must give the cluster computer account read/write permissions on the share at both the share level and the NTFS level.
  14. Quorum vs. Majority Node Set
      A quorum is the minimum number of members of a deliberative assembly needed to conduct the business of that group. In short, quorum is the minimum number of votes required for a majority.
      Nodes participating in a Windows cluster are connected through a private network and communicate through User Datagram Protocol (UDP) port 3343. The quorum configuration of a failover cluster determines how many node failures the cluster can sustain while remaining online. If an additional failure happens beyond this threshold, the cluster stops running.
      Quorum is designed to handle the split-brain scenario. Suppose we have a four-node cluster with one SQL Server instance running on each node, and the network is partitioned:
      • Node1 and Node2 lose communication with Node3 and Node4.
      • Node1 and Node2 can still communicate with each other, and Node3 and Node4 can still communicate with each other.
      When nodes are unable to communicate with each other, each node assumes that the resource groups owned by the other nodes have to be brought online. In this scenario, Node1 and Node2 try to bring online the SQL Server instances (resources) owned by Node3 and Node4. When the same resource is brought online on multiple nodes at the same time, data corruption can occur. This scenario is called split brain.
      Once the servers are joined to the Windows failover cluster, quorum is set. In this setup, the vote for the primary nodes is set to 1 and for the disaster recovery nodes to 0, with a file share having a vote of 1. (The file share keeps an odd number of votes and maintains node majority if a disaster hits one of the nodes.)
  15. Voting
      The cluster requires more than half of the total votes to achieve quorum; this avoids a tie in the number of votes. In an 8-node cluster, 5 voters must be online and able to communicate with each other to have quorum. Because of this logic, it is recommended to always have an odd number of total voters in the cluster; the quorum setting defines who the voters are. This does not necessarily mean that an odd number of nodes is needed to form the cluster, since both a witness disk (quorum disk) and a file share can contribute a vote, depending on the quorum settings.
      In the same way, Node3 and Node4 will try to bring online the SQL Server instances (resources) owned by Node1 and Node2, which leads to disk corruption and many other issues. The Windows cluster quorum setting is designed to prevent this kind of scenario: through quorum, the cluster forces the cluster service to stop in one of the subsets of nodes, ensuring that there is only one true owner for each resource group.
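The majority rule can be written down directly. In this sketch (function names are hypothetical), `votes_needed` returns the smallest strict majority, and `partition_has_quorum` checks whether one side of a network partition holds it. The example reuses the vote assignment from the previous slide: Node1 and Node2 vote, the DR nodes Node3 and Node4 have a vote of 0, and the file share witness votes, assuming the file share is reachable from the Node1/Node2 side.

```python
from typing import Dict, Iterable


def votes_needed(total_votes: int) -> int:
    """Quorum requires strictly more than half of all configured votes."""
    return total_votes // 2 + 1


def partition_has_quorum(votes: Dict[str, int], partition: Iterable[str]) -> bool:
    """True if the voters on one side of a network partition hold a majority."""
    total = sum(votes.values())
    held = sum(votes[name] for name in partition)
    return held >= votes_needed(total)


print(votes_needed(8))  # 5: an 8-vote cluster needs 5 voters online

# Vote assignment from the previous slide (FSW assumed reachable from Node1/Node2).
votes = {"Node1": 1, "Node2": 1, "Node3": 0, "Node4": 0, "FSW": 1}
print(partition_has_quorum(votes, ["Node1", "Node2", "FSW"]))  # True: keeps running
print(partition_has_quorum(votes, ["Node3", "Node4"]))         # False: cluster service stops

# Four equal votes and no witness: a 2/2 split leaves neither side with quorum.
even = {"Node1": 1, "Node2": 1, "Node3": 1, "Node4": 1}
print(partition_has_quorum(even, ["Node1", "Node2"]))  # False
```

The last case is why an odd total vote count is recommended: with an even split nobody reaches a majority, so split brain is avoided only at the cost of a full outage.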
  16. Quorum Settings: Node Majority
      The Node Majority option is recommended for clusters with an odd number of nodes. This configuration can handle the loss of half the cluster nodes, rounded down. For example, a five-node cluster can handle the failure of two nodes. Suppose three of the nodes (N1, N2, N3) can communicate with each other but the other two (N4 and N5) cannot. The group of three nodes has quorum (a majority), so the cluster remains active and the cluster service is stopped on the other two nodes (N4 and N5). The resource groups (SQL Server instances) hosted on those two nodes go offline and come online on one of the three nodes, based on the possible owner settings.
  17. Node and Disk Majority
      This option is recommended for clusters with an even number of nodes. In this configuration every node gets one vote and the witness disk (quorum disk) gets one vote, which makes the total number of votes odd.
      The witness disk is a small (approx. 1 GB) clustered disk. It is highly available, can fail over between nodes, and is considered part of the cluster core resource group. In a four-node cluster, if there is a partition between two subsets of nodes, one of the subsets will hold the witness disk; that subset has quorum and the cluster remains online. This means the cluster can lose any two voters, whether they are two nodes or one node and the witness disk.
  18. Node and File Share Majority
      This configuration is similar to Node and Disk Majority, but the witness disk is replaced with a file share, also known as a File Share Witness (FSW) resource. This quorum configuration is usually used in multi-site clusters (nodes in different physical locations) or where there is no common storage.
      The File Share Witness resource is a file share on any server in the same Active Directory domain that all cluster nodes can access. One node in the cluster places a lock on the file share, which makes that node the owner of the file share. When this node goes offline or loses connectivity, another node grabs the lock and owns the file share.
      On a standalone server the file share is not highly available; however, the file share can be placed on a clustered file share in an independent cluster, making the FSW clustered and able to fail over between nodes. Do not place this file share on a node of the same cluster, because losing that node would mean losing two votes.
      Unlike a witness disk, an FSW does not store the cluster configuration data; it contains only information about which version of the cluster configuration database is most recent.
      https://clusteringformeremortals.com/2009/09/15/step-by-step-configuring-a-2-node-multi-site-cluster-on-windows-server-2008-r2-–-part-1/
  19. No Majority (Disk Only)
      This configuration was available in Windows Server 2003 and has been kept for compatibility reasons; it is highly recommended not to use it. In this configuration only the witness disk has a vote, and there are no other voters in the cluster. That means that even if all nodes are online and able to communicate, the entire cluster goes offline when the witness disk fails or is corrupted. The witness disk is therefore a single point of failure.
      Ref: http://networksandservers.blogspot.in/2011/09/failover-clustering-iii.html
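The four classic quorum models can be compared with a rough Python model. It is a simplification under stated assumptions: all surviving nodes can talk to each other and to the witness, and dynamic quorum (next slide) is ignored. The model names passed as strings are labels invented for this sketch.

```python
from typing import Set


def cluster_online(model: str, nodes: Set[str], online_nodes: Set[str],
                   witness_online: bool = False) -> bool:
    """Rough model of the classic quorum configurations (no dynamic quorum).

    model is one of "node_majority", "node_and_disk", "node_and_file_share"
    or "disk_only". Assumes surviving nodes can reach each other and the witness.
    """
    if model == "disk_only":
        # Only the witness disk votes, so it is a single point of failure.
        return witness_online and bool(online_nodes & nodes)
    total = len(nodes)
    held = len(online_nodes & nodes)
    if model in ("node_and_disk", "node_and_file_share"):
        total += 1                          # the witness adds one vote
        held += 1 if witness_online else 0
    elif model != "node_majority":
        raise ValueError(f"unknown quorum model: {model}")
    return held >= total // 2 + 1


five = {"N1", "N2", "N3", "N4", "N5"}
print(cluster_online("node_majority", five, {"N1", "N2", "N3"}))  # True: lost two nodes
print(cluster_online("node_majority", five, {"N1", "N2"}))        # False: lost three

four = {"N1", "N2", "N3", "N4"}
print(cluster_online("node_and_disk", four, {"N1", "N2", "N3"}, witness_online=False))  # True
print(cluster_online("node_and_disk", four, {"N1", "N2"}, witness_online=False))        # False

print(cluster_online("disk_only", four, four, witness_online=False))  # False: disk failed
```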
  20. Dynamic Quorum
      You will notice that, in this case, nothing needed to be changed in the cluster configuration to accommodate this scenario; the quorum model remains exactly as it was before.
      Quorum configuration changes for an AlwaysOn cluster setup:
      In Windows Server 2012 and Windows Server 2012 R2, the quorum majority is determined by the set of nodes that are active members of the cluster at a given time. This behavior is called Dynamic Quorum (DQ), and it is enabled for all clusters by default. Unless an administrator disables it manually, DQ is therefore enabled on every cluster deployed on Windows Server 2012 or Windows Server 2012 R2, including clusters that host availability groups. With DQ, the cluster dynamically manages the assignment of votes to nodes based on the state of each node, and dynamically recalculates the quorum requirements. It is pretty simple, actually:
      • When a node shuts down, crashes, or loses connectivity with the cluster, it loses its quorum vote.
      • When a node rejoins the cluster, it regains its quorum vote.
      If the cluster maintains quorum after a shutdown or failure, and whenever a node rejoins the cluster, the number of votes required to maintain quorum is recalculated based on the change in vote count.
      DQ's ability to manage votes dynamically is different from an administrator's ability to remove a vote from a node manually. If an administrator removes a node's vote by setting its NodeWeight property to 0, DQ does not dynamically give the vote back.
  21. Dynamic Witness
      The idea behind DQ is that, by adjusting the assignment of quorum votes and dynamically increasing or decreasing the number of votes required to keep running, the cluster can sustain sequential node shutdowns (or failures) all the way down to a single node (the "last man standing"). As long as quorum is maintained after a shutdown or failure, DQ recalculates the quorum requirements and the cluster reduces the number of votes needed to maintain quorum. That is a fundamental requirement for DQ: quorum must be maintained after a shutdown or failure. If quorum is lost, DQ does nothing for you, and it does not allow a cluster to sustain a simultaneous failure of the majority of voting members.
      In Windows Server 2012 R2, a cluster configured to use DQ (which is every cluster by default) also uses a feature called Dynamic Witness (DW). With DW, the witness vote is dynamically adjusted based on the number of current votes. Like DQ, the logic is simple:
      • If there is an odd number of node votes, the witness does not have a vote.
      • If there is an even number of node votes, the witness has a vote.
      The witness vote is also adjusted based on the state of the witness resource: if the File Share Witness resource is offline or failed, the cluster sets the witness vote to 0. This is an important change from previous versions of WSFC. With DW, the cluster decides whether to use the witness vote based on the number of votes currently available in the cluster.
      https://blogs.technet.microsoft.com/scottschnoll/2014/02/25/windows-server-2012-r2-and-database-availability-groups/
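Dynamic Quorum and Dynamic Witness together can be simulated in a few lines. This is a deliberately simplified model, not the WSFC algorithm: the `DynamicQuorumModel` class is hypothetical, it assumes the witness stays online and reachable from the surviving nodes, and it ignores details such as how a two-node cluster without a witness breaks ties. It shows the two rules from this slide: a failed node loses its vote only if quorum survived the failure, and the witness votes only when the number of node votes is even.

```python
from typing import List


class DynamicQuorumModel:
    """Simplified Dynamic Quorum + Dynamic Witness model (hypothetical)."""

    def __init__(self, nodes: List[str], witness_online: bool = True):
        self.voting_nodes = set(nodes)      # nodes that currently hold a vote
        self.witness_online = witness_online
        self.online = True

    def witness_vote(self) -> int:
        # DW: the witness votes only if it is healthy and node votes are even.
        return 1 if self.witness_online and len(self.voting_nodes) % 2 == 0 else 0

    def total_votes(self) -> int:
        return len(self.voting_nodes) + self.witness_vote()

    def fail(self, failed: List[str]) -> bool:
        """Fail the given nodes at the same moment; return whether the cluster stays up."""
        if not self.online:
            return False
        total = self.total_votes()
        survivors = self.voting_nodes - set(failed)
        held = len(survivors) + self.witness_vote()
        if held < total // 2 + 1:
            self.online = False             # quorum lost: DQ cannot help
            return False
        self.voting_nodes = survivors       # DQ: failed nodes lose their votes
        return True


cluster = DynamicQuorumModel(["Node1", "Node2", "Node3", "Node4"])
for node in ["Node4", "Node3", "Node2"]:
    print(node, "fails ->", "online" if cluster.fail([node]) else "offline",
          "| votes now:", cluster.total_votes())

burst = DynamicQuorumModel(["Node1", "Node2", "Node3", "Node4"])
print(burst.fail(["Node2", "Node3", "Node4"]))  # False: simultaneous majority loss
```

Three sequential failures leave Node1 as the last man standing, while losing three of the four nodes at the same moment stops the cluster.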
  22. Comparing a cluster with an AlwaysOn setup
      If we compare the configuration we started with and the configuration we ended with, there are some considerations:
      • In the starting configuration, SQL Server ran on one node only. Now it runs on two nodes, which should ideally have the same memory and disk configuration.
      • There is increased network traffic and CPU consumption on the node running the secondary replica, compared with the configuration where no SQL Server was running there.
      • Failover of SQL Server is faster, since the SQL Server instance is already up and running and its cache is already filled with data pages (at least the pages that experienced changes).
      Roughly, the responsibilities are split between SQL Server and WSFC as follows:
      • Transaction log data replication and the network handling of the transfer are done by SQL Server.
      • The necessary cluster services and resources are created by SQL Server when an availability group or availability group listener is created.
      • The logic that detects whether a replica is responding is part of SQL Server, interfacing with the WSFC framework.
      • Administering and defining availability modes and failover modes, and configuring readable secondary replicas, are SQL Server's responsibilities.
      • Providing quorum is part of the logic and configuration that WSFC provides.
      • Reacting to a primary replica that no longer responds is part of WSFC.
      • Reacting to a loss of quorum is WSFC's responsibility.
  23. AlwaysOn configuration setup
      • Up to 10 availability groups per server is the recommended maximum, but it is not enforced.
      • Up to 100 availability databases per server is the recommended maximum, but it is not enforced.
  24. Choose "Add Replica". When you select Add Replica, a SQL Server login screen pops up; the chosen server is then added as a secondary. In a cluster there is no automatic failover!
      When nodes are added, we also need to configure the replicas according to our needs [picture above]:
      • Initial Role: how the roles are assigned once this wizard completes. The roles can be changed later.
      • Synchronous Commit: selecting this makes the synchronization type "Synchronous Commit".
      Synchronous-commit replicas support two failover settings: automatic or manual. The "automatic" setting supports both automatic and manual failover. To prevent data loss, automatic failover and planned failover require that the failover target be a synchronous-commit secondary replica with a healthy synchronization state (that is, every secondary database on the failover target is synchronized with its corresponding primary database).
  25. Whenever a secondary replica does not meet both of these conditions, it supports only forced failover. Note that forced failover is also supported on replicas whose role is in the RESOLVING state.
      • Asynchronous-commit replicas support only the manual failover mode. Moreover, because they are never synchronized, they support only forced failover.
      Note: in AlwaysOn 2012 and 2014, a maximum of two replicas could be designated for automatic failover; AlwaysOn 2016 allows three. Automatic failover requires synchronous data replication and the automatic failover mode on both the primary and the secondaries.
  26. Failover: selecting this ensures that if the primary node fails, the other node takes over the primary role automatically.
      Automatic failover
      This form of failover occurs without administrator intervention, and no data loss occurs. Automatic failover is supported only if the current primary and at least one secondary replica are configured with the failover mode set to AUTOMATIC, and at least one of the secondary replicas set to AUTOMATIC is also synchronized. Automatic failover can occur only if the primary and the replica are in synchronous-commit mode.
      Planned manual failover
      This form of failover is triggered by an administrator, and no data loss occurs. You perform this type of failover when maintenance on a host instance requires the instance or the host server to be taken offline or restarted. Planned manual failover can occur only if at least one of the secondary replicas is in the SYNCHRONIZED state, and only if the primary and replica instances are in synchronous-commit mode.
      Forced manual failover
      This form of failover involves the possibility of data loss. Use forced manual failover when no secondary replica is in the SYNCHRONIZED state or when the primary replica is unavailable. It is the only type supported if asynchronous-commit mode is used on the primary, or if the only available replica uses asynchronous-commit mode.
      To perform a manual failover by using SQL Server Management Studio:
      • Connect to the server instance that hosts the secondary replica of the availability group that you will make the primary replica.
      • Right-click the availability group and click Failover. This starts the Fail Over Availability Group Wizard.
      • On the Select New Primary Replica page, shown in Figure 8-17, select the instance on which to perform failover and then click Next.
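The failover rules from the last three slides can be summarized as a function. The `Replica` class and `failover_options` are hypothetical; the string values mirror the T-SQL settings (`AVAILABILITY_MODE`, `FAILOVER_MODE`) and the synchronization state names. The sketch checks a single target replica and treats its synchronization state as one value, whereas SQL Server requires every database on the target to be synchronized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    availability_mode: str              # "SYNCHRONOUS_COMMIT" or "ASYNCHRONOUS_COMMIT"
    failover_mode: str                  # "AUTOMATIC" or "MANUAL"
    sync_state: str = "SYNCHRONIZED"    # or "SYNCHRONIZING", "NOT SYNCHRONIZING"


def failover_options(primary: Replica, target: Replica) -> List[str]:
    """Failover types available when making `target` the new primary."""
    both_sync = primary.availability_mode == target.availability_mode == "SYNCHRONOUS_COMMIT"
    synchronized = target.sync_state == "SYNCHRONIZED"
    options = []
    if both_sync and synchronized and primary.failover_mode == target.failover_mode == "AUTOMATIC":
        options.append("automatic")
    if both_sync and synchronized:
        options.append("planned manual")
    options.append("forced (possible data loss)")  # always possible as a last resort
    return options


primary = Replica("SQL1", "SYNCHRONOUS_COMMIT", "AUTOMATIC")
ha = Replica("SQL2", "SYNCHRONOUS_COMMIT", "AUTOMATIC")
dr = Replica("SQL3", "ASYNCHRONOUS_COMMIT", "MANUAL", "SYNCHRONIZING")
print(failover_options(primary, ha))  # ['automatic', 'planned manual', 'forced (possible data loss)']
print(failover_options(primary, dr))  # ['forced (possible data loss)']
```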
  27. Below is a list of backup limitations on secondary replicas:
      • BACKUP DATABASE supports only copy-only full backups of databases, files, or filegroups on secondary replicas.
      • Differential backups are not supported on secondary replicas.
      • BACKUP LOG supports only regular log backups on secondary replicas; the copy-only option is not supported.
      • Secondary replicas must be in the SYNCHRONIZED or SYNCHRONIZING state and be able to communicate with the primary.
      Readable Secondary: I want to be able to perform read operations on the instance in the secondary role, therefore "Yes" is specified.
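The backup limitations on secondary replicas can be encoded as a validation helper. `secondary_backup_error` is a hypothetical name; it returns `None` when a backup would be allowed on a secondary replica and otherwise a short reason, following the four rules listed above.

```python
from typing import Optional


def secondary_backup_error(backup_type: str, copy_only: bool,
                           sync_state: str, connected: bool = True) -> Optional[str]:
    """None if the backup is allowed on a secondary replica, else the reason.

    backup_type is "FULL", "DIFFERENTIAL" or "LOG"; sync_state is the
    synchronization state of the secondary database.
    """
    if backup_type not in ("FULL", "DIFFERENTIAL", "LOG"):
        raise ValueError(f"unknown backup type: {backup_type}")
    if sync_state not in ("SYNCHRONIZED", "SYNCHRONIZING") or not connected:
        return "secondary must be SYNCHRONIZED or SYNCHRONIZING and connected to the primary"
    if backup_type == "DIFFERENTIAL":
        return "differential backups are not supported on secondary replicas"
    if backup_type == "FULL" and not copy_only:
        return "full backups on a secondary replica must use COPY_ONLY"
    if backup_type == "LOG" and copy_only:
        return "COPY_ONLY log backups are not supported on secondary replicas"
    return None


print(secondary_backup_error("FULL", copy_only=True, sync_state="SYNCHRONIZED"))   # None
print(secondary_backup_error("FULL", copy_only=False, sync_state="SYNCHRONIZED"))
```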
  29. Readable secondary:
      • No. This is the default value. The database does not allow read-only workloads when running in the secondary role. Any attempt to run SELECT statements in this database returns an error, similar to running queries against a database mirror.
      • Read-intent only. This setting allows read-only workloads in the secondary role only if the application connection string contains the parameter ApplicationIntent=ReadOnly.
      • Yes. This setting allows read-only workloads regardless of whether the application connection string contains ApplicationIntent=ReadOnly.
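The interaction between the readable-secondary setting and the client's application intent can be modeled as below. The setting values are those of the T-SQL `SECONDARY_ROLE (ALLOW_CONNECTIONS = ...)` option (`NO`, `READ_ONLY` for read-intent only, `ALL` for Yes); `can_read_on_secondary` itself is a hypothetical helper.

```python
def can_read_on_secondary(allow_connections: str, application_intent: str = "ReadWrite") -> bool:
    """Whether a client can read the database on a replica in the secondary role.

    allow_connections: "NO", "READ_ONLY" (read-intent only) or "ALL" (Yes).
    application_intent: the ApplicationIntent value, "ReadWrite" or "ReadOnly".
    """
    if allow_connections == "NO":
        return False
    if allow_connections == "READ_ONLY":
        return application_intent == "ReadOnly"
    if allow_connections == "ALL":
        return True  # reads work; write statements still fail on a secondary
    raise ValueError(f"unknown ALLOW_CONNECTIONS value: {allow_connections}")


for setting in ("NO", "READ_ONLY", "ALL"):
    print(setting, can_read_on_secondary(setting), can_read_on_secondary(setting, "ReadOnly"))
```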
  30. AG Listener Connection Strings
      After you have created your AG listener, you must make sure your clients can connect. Your application connection works the same way it always has, except that instead of pointing to a specific server in your connection string, you point to the AG listener.
      AG listeners can only be reached over TCP, and your local DNS resolves them to the list of IP addresses and TCP ports mapped to the VNN. Your client tries each IP address in turn until it either gets a connection or reaches the connection timeout. An important connection string parameter to consider is MultiSubnetFailover: if it is set to True, the client attempts the connections in parallel, enabling faster connectivity and, if necessary, faster client failover:
      Server=tcp:MyAgListener,1433;Database=Db1;IntegratedSecurity=SSPI;MultiSubnetFailover=True
      When a failover occurs, client connections are reset, and ownership of the AG listener moves to the SQL Server instance that takes over the primary replica role. The VNN endpoint is then bound to the new IP addresses and TCP ports of the new primary replica instance.
      Application Intent
      One of the biggest reasons to implement availability groups is the ability to use your backup or disaster recovery environments to offload work from your production environment. These servers can now be used for backups, analysis, ad-hoc queries and reporting, or any other operation for which a read-only copy of the database is sufficient:
      Server=tcp:MyAgListener,1433;Database=Db1;IntegratedSecurity=SSPI;MultiSubnetFailover=True;ApplicationIntent=ReadOnly;
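Why `MultiSubnetFailover=True` helps can be shown with a simplified timing model. It assumes the listener resolves to one IP address per subnet, that an address in an unreachable subnet never answers, and that a sequential client waits a fixed per-address timeout before moving on; real client drivers have more elaborate timeout logic. The IP addresses and latencies are made up for illustration.

```python
from typing import List, Optional, Tuple

# (ip_address, seconds until it answers); None = never answers,
# e.g. the listener IP in the subnet that is currently offline.
Endpoint = Tuple[str, Optional[float]]


def sequential_connect(endpoints: List[Endpoint],
                       per_ip_timeout: float) -> Tuple[Optional[str], float]:
    """Default behaviour (simplified): try each IP in turn."""
    elapsed = 0.0
    for address, latency in endpoints:
        if latency is not None and latency <= per_ip_timeout:
            return address, elapsed + latency
        elapsed += per_ip_timeout  # wait out the timeout, then try the next IP
    return None, elapsed


def parallel_connect(endpoints: List[Endpoint],
                     timeout: float) -> Tuple[Optional[str], float]:
    """MultiSubnetFailover=True (simplified): try all IPs at once, keep the fastest."""
    answers = [(latency, address) for address, latency in endpoints
               if latency is not None and latency <= timeout]
    if not answers:
        return None, timeout
    latency, address = min(answers)
    return address, latency


# Hypothetical two-subnet listener after a failover to the second subnet.
listener_ips = [("10.0.1.10", None), ("10.0.2.10", 0.05)]
print(sequential_connect(listener_ips, per_ip_timeout=15.0))  # reaches 10.0.2.10 after ~15 s
print(parallel_connect(listener_ips, timeout=15.0))           # ('10.0.2.10', 0.05)
```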
  31. Listener connectivity
      SQLCMD -S SQLListener -K ReadOnly -d Finance -Q "select @@servername" -W
      • -S <ListenerName>
      • -K <application intent, here ReadOnly>
      • -d <DatabaseName>
      • -Q <Query>
      • -W <remove trailing spaces>
  32. Connection strings:
      Provider=SQLNCLI11.1;Integrated Security=SSPI;Persist Security Info=False;User ID="";Initial Catalog="";Data Source=AGListner;Initial File Name="";Server SPN="";ApplicationIntent=READONLY
      Data Source="TestAGListen,98765";Initial Catalog=AdventureWorks;Integrated Security=True;
  33. To provide read-only access to your secondary replicas, use the ApplicationIntent connection string parameter. An optional read-only routing list of SQL Server endpoints can be configured on each replica. This list is used to redirect client connection requests that use ApplicationIntent=ReadOnly to the first available secondary replica that has been configured with an appropriate application intent filter.
      Ref: https://sqlperformance.com/2013/11/system-configuration/ag-connectivity
      All the options can be set as needed, but if you have multiple instances (an AlwaysOn FCI plus a local standalone instance) you may need to change the endpoint port. The default is 5022; I changed it to 5023 to make sure there is no conflict on my server.
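Read-only routing can be sketched as a small decision function. This is a simplified model with hypothetical names: `routing_list` stands for the replica's `READ_ONLY_ROUTING_LIST`, and `readable` says whether each replica is online and accepts read-intent connections. If no listed replica is available the sketch returns `None`; a common practice is to add the primary at the end of the routing list so that read-intent connections can still succeed.

```python
from typing import Dict, List, Optional


def route_connection(application_intent: str, primary: str,
                     routing_list: List[str], readable: Dict[str, bool]) -> Optional[str]:
    """Replica a listener connection ends up on (simplified, hypothetical model)."""
    if application_intent != "ReadOnly":
        return primary              # read-write connections always go to the primary
    for replica in routing_list:    # the routing list is walked in order
        if readable.get(replica, False):
            return replica          # first available readable replica wins
    return None                     # nothing available in this sketch


readable = {"SQL1": True, "SQL2": False, "SQL3": True}  # SQL2 is down
print(route_connection("ReadOnly", "SQL1", ["SQL2", "SQL3"], readable))   # SQL3
print(route_connection("ReadWrite", "SQL1", ["SQL2", "SQL3"], readable))  # SQL1
```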
  34. One big advantage of SQL Server 2012 AlwaysOn is that a secondary replica can be readable and can also run SQL Server database backup jobs. With AlwaysOn active secondary replicas, you can use secondary hardware to perform backups and other resource-intensive read-only queries, so idle hardware is no longer a factor when you choose a SQL Server high availability solution.
      Database backups can be performed on an active secondary replica: full database, file, and filegroup backups with the COPY_ONLY option, and regular transaction log backups. You can configure an availability group to specify where backups should be performed by setting the AUTOMATED_BACKUP_PREFERENCE option of the CREATE AVAILABILITY GROUP or ALTER AVAILABILITY GROUP T-SQL statements. Your backup jobs can then be scripted to honor this setting (for example, by checking sys.fn_hadr_backup_is_preferred_replica) so that backups run on the preferred replica.
  35. The valid values for AUTOMATED_BACKUP_PREFERENCE are:
      • Prefer Secondary (SECONDARY): backups should occur on a secondary replica, except when the primary replica is the only replica online, in which case the backup should occur on the primary. This is the default option.
      • Secondary only (SECONDARY_ONLY): backups should never be performed on the primary replica. If the primary replica is the only replica online, the backup should not occur.
      • Primary (PRIMARY): backups should always occur on the primary replica. This option is useful if you need backup features, such as differential backups, that are not supported when the backup runs on a secondary replica.
      • Any Replica (NONE): backup jobs ignore the role of the availability replicas when choosing the replica to perform backups. Note that backup jobs might still evaluate other factors, such as the backup priority of each replica in combination with its operational and connected state.
  36. 36. You’ll notice the Backup Priority column in the grid at the bottom. This is where you set a relative weight for each replica to perform the backups. Finally, there is the Exclude Replica column, where you can check which replicas to exclude from the backup preferences. Alternatively, you can configure this option when running the New Availability Group wizard to set up your availability group: it is on the Backup Preferences tab of the Specify Replicas page. See the image below:
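The preference rules above, combined with the Backup Priority and Exclude Replica columns, can be modeled as a small decision function. This is a sketch under stated assumptions, not SQL Server's algorithm. In a real deployment, each replica's backup job asks SQL Server locally through sys.fn_hadr_backup_is_preferred_replica; here one function decides centrally. The class, function, and replica names are hypothetical. Breaking priority ties by name and treating excluded replicas as never eligible are also assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupReplica:
    name: str
    is_primary: bool
    online: bool
    priority: int = 50      # Backup Priority column; higher value = preferred
    excluded: bool = False  # Exclude Replica column


def preferred_backup_replica(preference: str, replicas: list[BackupReplica]) -> Optional[str]:
    """Pick the replica that should run the backup for the given
    AUTOMATED_BACKUP_PREFERENCE value, or None if no replica should."""
    # Offline and excluded replicas are never candidates in this sketch.
    eligible = [r for r in replicas if r.online and not r.excluded]
    primaries = [r for r in eligible if r.is_primary]
    secondaries = [r for r in eligible if not r.is_primary]

    def best(pool: list[BackupReplica]) -> Optional[str]:
        # Highest priority wins; ties are broken by name (an assumption).
        if not pool:
            return None
        return min(pool, key=lambda r: (-r.priority, r.name)).name

    if preference == "PRIMARY":
        return best(primaries)
    if preference == "SECONDARY_ONLY":
        return best(secondaries)
    if preference == "SECONDARY":  # Prefer Secondary (the default)
        return best(secondaries) or best(primaries)
    if preference == "NONE":  # Any Replica
        return best(eligible)
    raise ValueError(f"unknown backup preference: {preference}")


replicas = [
    BackupReplica("SQLNODE1", is_primary=True, online=True, priority=50),
    BackupReplica("SQLNODE2", is_primary=False, online=True, priority=30),
    BackupReplica("SQLNODE3", is_primary=False, online=False, priority=70),
]
print(preferred_backup_replica("SECONDARY", replicas))  # SQLNODE2 (SQLNODE3 is offline)
print(preferred_backup_replica("NONE", replicas))       # SQLNODE1 (priority 50 beats 30)
```

SECONDARY and SECONDARY_ONLY differ only in the fallback. When no secondary is eligible, SECONDARY falls back to the primary, while SECONDARY_ONLY returns None and skips the backup.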
  38. 38. In the Listener section, select Create an availability group listener, type the listener name and port, and add an IP address. As you can see from the screenshot above, a listener is not mandatory. Why? Because database users can still connect to the primary (read/write) replica directly by specifying the name of the server that currently hosts it. But if the primary replica moves to another node, users must change the connection address on their end manually. With a listener, this redirection happens automatically.
  39. 39. Initial data synchronization options:
• Full: the wizard takes a full database backup and a transaction log backup, and restores both WITH NORECOVERY on the secondary replicas. This is the preferred option for very small databases, but it doesn't work well with larger databases.
• Join only: assumes that the database has already been restored WITH NORECOVERY on the other replicas. You can prepare the database on the other replicas, for example through log shipping or database mirroring, and then join it to your AG.
• Skip initial data synchronization: you manually take a full backup and a log backup of each primary database, and restore them WITH NORECOVERY on each secondary yourself.
  43. 43. Synchronization State: indicates whether the availability database is currently synchronized with the primary replica. This value is shown by default. The possible synchronization states are:
• Not synchronizing. For a primary database, indicates that the database is not ready to synchronize its transaction log with the corresponding secondary databases. For a secondary database, indicates that the database has not started log synchronization because of a connection issue, is being suspended, or is going through transition states during startup or a role switch.
• Synchronizing. For a primary database, indicates that the database is ready to accept a scan request from a secondary database. For a secondary database, indicates that there is active data movement going on for that database.
• Synchronized. For a primary database, indicates that at least one secondary database is synchronized. For a secondary database, indicates that the database is synchronized with the corresponding primary database.
  44. 44. According to MSDN, this issue can be caused by the following:
• The availability replica might be disconnected.
• The data movement might be suspended.
• The database might not be accessible.
• There might be a temporary delay due to network latency or the load on the primary or secondary replica.
Error log: The target database, ‘YourDatabase’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online. (Microsoft SQL Server, Error: 976)
Solution: resume data movement on the database manually:
ALTER DATABASE [YourDatabase] SET HADR RESUME;
  45. 45. Not Synchronizing / Recovery Pending
While upgrading the storage in a SQL Server 2014 SP1 (12.0.4422.0) instance, we ran into an issue where two of the databases would not start on the secondary after restarting SQL Server. The server had been offline for a few hours while we installed new (larger) SSDs and copied the data files over to the new volume. When we restarted SQL Server, all but two of the databases started synchronizing again. The other two were displayed in SSMS as Not Synchronizing / Recovery Pending. The fix, run on the secondary replica:
-- 1. Remove the database from the availability group:
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR OFF;
-- 2. Apply t-logs to catch up. This can be done manually in SSMS or via:
RESTORE LOG [StackExchange.Bicycles.Meta] FROM DISK = '\\ny-back01\backups\StackExchange.Bicycles.Meta_LOG_20160217_033201.trn' WITH NORECOVERY;
-- 3. Re-join the database to the availability group and resume data movement:
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR AVAILABILITY GROUP = [SENetwork_AG];
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR RESUME;
  46. 46. Initializing. Indicates the phase of undo when the transaction log required for a secondary database to catch up to the undo LSN is being shipped and hardened on a secondary replica.
Note: When a database is in the INITIALIZING state, forcing failover to the secondary replica will always leave that database in a state in which it cannot be started.
Reverting. Indicates the phase in the undo process when a secondary database is actively getting pages from the primary database.
Note: When a database is in the REVERTING state, forcing failover to the secondary replica can leave that database in a state in which it cannot be started.
  47. 47. Heartbeat: the heartbeat is the health check mechanism of the cluster. It is a single UDP packet sent between the nodes of the cluster over the private network to confirm that the nodes are still online. By default, the cluster service waits five seconds (one heartbeat is sent every second) before considering a cluster node to be unreachable.
LooksAlive check: this check performs a basic verification that the SQL Server service is running on the hosting node. It runs at a set interval, with a default of 5 seconds. If the check fails, the cluster service performs another, much more thorough check, called the IsAlive check, to verify the failure. The LooksAlive check is also known as the Basic resource health check in Windows Server 2008.
IsAlive check: this process checks and verifies the cached result of the internal IsAlive process in the SQL Server resource DLL. The internal IsAlive process runs every 60 seconds and verifies whether SQL Server is online, using SELECT @@SERVERNAME. If the query fails, it runs additional retry logic to avoid false failures. If the retry logic also fails, the internal IsAlive process shuts down the SQL Server service and a failover event is triggered. The IsAlive check is also known as the Thorough resource health check in Windows Server 2008.
(These LooksAlive/IsAlive checks describe failover cluster instances up to SQL Server 2008 R2. Starting with SQL Server 2012, the SQL Server resource DLL uses sp_server_diagnostics for instance health detection.)
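The heartbeat rule above (one heartbeat per second, with the node treated as unreachable after five seconds without one) is a missed-heartbeat threshold. The following is a minimal sketch with a hypothetical function name. Real clusters expose these values as cluster properties such as SameSubnetDelay and SameSubnetThreshold, which this sketch does not read; it only applies the default numbers quoted on the slide.

```python
def node_unreachable(last_heartbeat_s: float, now_s: float,
                     interval_s: float = 1.0, threshold: int = 5) -> bool:
    """Return True once `threshold` whole heartbeat intervals have elapsed
    since the last heartbeat was received (defaults: 1 s interval, 5 missed)."""
    if now_s < last_heartbeat_s:
        raise ValueError("now_s must not be earlier than last_heartbeat_s")
    missed = int((now_s - last_heartbeat_s) // interval_s)
    return missed >= threshold


# Last heartbeat at t = 100 s: still reachable at 104.9 s, unreachable from 105 s on.
print(node_unreachable(100.0, 104.9))  # False
print(node_unreachable(100.0, 105.0))  # True
```

A longer interval or a higher threshold tolerates more network jitter, but it delays detection of a real node failure by the same amount.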
  48. 48. AlwaysOn Availability Group with Replication Setup
What is supported?
• SQL Server replication supports the automatic failover of the publisher, the automatic failover of transactional subscribers, and the manual failover of merge subscribers. The failover of a distributor on an availability database is not supported.
• In an AlwaysOn availability group, a secondary database cannot be a publisher. Republishing is not supported when replication is combined with AlwaysOn Availability Groups.
• Distributor and subscriber: you can choose a completely separate server to be the distributor. In that case, do not host the distributor on any of the publishers, because failover of a distributor is not supported.
  49. 49. • A publication database can be part of an AG. The publisher instances must share a common distributor. The types of replication that are supported within an AG are transactional, merge, and snapshot.
• A database in an AG secondary replica cannot be a publisher. Republishing is not supported.
• Peer-to-peer (P2P), bidirectional, and reciprocal transactional publications, as well as Oracle Publishing, are not supported.
• A database that is enabled for Change Data Capture (CDC) can be part of an AG.
• A database enabled for Change Tracking (CT) can be part of an AG.
To support replication with AGs, four new stored procedures have been provided:
• sp_redirect_publisher: specifies a redirected publisher for an existing publisher/database pair. If the publisher database belongs to an AG, the redirected publisher is the Availability Group Listener (AGL) name associated with the AG.
• sp_get_redirected_publisher: replication agents use this stored procedure to query the distributor and determine whether the original publisher has been redirected. A redirection implies that the AG hosting your publisher has failed over.
• sp_validate_redirected_publisher: verifies that the current host for the publishing database is capable of supporting replication. It must be run from a distribution database.
• sp_validate_replica_hosts_as_publishers: an extension of sp_validate_redirected_publisher that validates an entire AlwaysOn replication topology. Like the procedure above, it must be run from a distribution database.
For AGs to support replication, three replication agents were also modified. The Log Reader, Snapshot, and Merge Agents now use the sp_get_redirected_publisher stored procedure to determine where the publisher is located.
  50. 50. Automatic failover of the publisher
The following steps build the environment described above:
• Configure a remote distributor.
• The distributor should not be hosted on any current replica of the availability group that contains the publishing database.
• You can use a dedicated server (one that is not part of the AG) as the distributor, or you can host the distributor on the subscriber (provided the subscriber is not part of an AG).
  51. 51. Configure the primary replica as publisher
Configure distribution on possible publishers (secondary replicas)
Now we configure distribution on the possible publishers; in our case, that is Server 62.
• Connect to Server 62, right-click the ‘Replication’ folder, click ‘Configure Distribution’, and click ‘Next’.
• In the ‘Distributor’ dialog box, select ‘Use the following server as…’. Click the Add button, add Server 63, and click ‘Next’.
  52. 52. Redirect the original publisher to the AG listener name
We have already created an AG listener named AGListener. At the distributor (connect to SRV4), in the distribution database, run the stored procedure sp_redirect_publisher to associate the original publisher and the published database with the listener name of the availability group:
USE distribution;
GO
EXEC sys.sp_redirect_publisher
    @original_publisher = N'SRV1',
    @publisher_db = N'MyNorthWind',
    @redirected_publisher = N'AGListener';
Configure the secondary replica hosts as replication publishers
Because a secondary replica may transition to the primary role, it must be configured in advance to take over as publisher after a failover. Here we create a linked server to Server 63 so that the possible publishers can connect to the subscriber. Run the command below on each possible publisher (in our case, Server 62):
EXEC sys.sp_addlinkedserver @server = N'Server 63';
Run the replication validation stored procedure to verify the configuration:
USE distribution;
GO
DECLARE @redirected_publisher sysname;
EXEC sys.sp_validate_replica_hosts_as_publishers
    @original_publisher = N'SRV1',
    @publisher_db = N'MyNorthWind',
    @redirected_publisher = @redirected_publisher OUTPUT;
Create a subscription
  53. 53. References:
https://blogs.msdn.microsoft.com/alwaysonpro/2014/01/30/setting-up-replication-on-a-database-that-is-part-of-an-alwayson-availability-group/
https://www.simple-talk.com/sql/database-administration/expanding-alwayson-availability-groups-with-replication-publishers/
http://www.faceofit.com/tag/sql-server-always-on/
http://ronak.extreme-advice.com/20140619-configure-replication-database-part-alwayson-group/
http://www.techbrothersit.com/2015/07/how-to-setup-replication-with-alwayson.html?m=1
The automatic failover of transactional subscribers in AlwaysOn Availability Group
USE [master]
GO
EXEC master.dbo.sp_addlinkedserver
    @server = N'MyLinkedServer',
    @srvproduct = N'SQL',
    @provider = N'SQLNCLI11',
    @datasrc = N'MyListener',
    @provstr = N'Integrated Security=SSPI;Initial Catalog=Production;Data Source=MyListener;ApplicationIntent=ReadOnly',
    @catalog = N'Production';
GO
How would you test whether the connection is working? That’s simple:
select * from openquery(MyLinkedServer, 'select @@servername');
  54. 54. • Before creating the subscription, add the subscriber database to the appropriate Always On availability group.
• Add the subscriber's availability group listener as a linked server to all nodes of the availability group. This step ensures that all potential failover partners are aware of the listener and can connect to it.
• Using the script below, create the subscription with the name of the subscriber's availability group listener. After a failover, the listener name always remains valid, whereas the actual server name of the subscriber depends on which node became the new primary.
-- Commands to execute at the publisher, in the publisher database:
USE [<publisher database name>];
EXEC sp_addsubscription
    @publication = N'<publication name>',
    @subscriber = N'<availability group listener name>',
    @destination_db = N'<subscriber database name>',
    @subscription_type = N'Push',
    @sync_type = N'automatic',
    @article = N'all',
    @update_mode = N'read only',
    @subscriber_type = 0;
GO
EXEC sp_addpushsubscription_agent
    @publication = N'<publication name>',
    @subscriber = N'<availability group listener name>',
    @subscriber_db = N'<subscriber database name>',
    @job_login = null,
    @job_password = null,
    @subscriber_security_mode = 1;
GO
  55. 55. If creating a pull subscription:
• In Management Studio, on the primary subscriber node, open the SQL Server Agent tree.
• Identify the Pull Distribution Agent job and edit the job.
• On the Run Agent job step, check the -Publisher and -Distributor parameters. Make sure that these parameters contain the correct direct server and instance names of the publisher and distributor.
• Change the -Subscriber parameter to the subscriber's availability group listener name.
Note: The subscription must be created by using a Transact-SQL script; it cannot be created using Management Studio.
Observations: I have noticed that the subscriptions are created on every node where the subscription database participates in the AlwaysOn availability group. The Distribution Agent sends the replicated data to the subscriber database on the current primary replica of the AG. AlwaysOn then takes care of distributing the data from the primary replica to the other secondary replicas.
  56. 56. AG Worker Threads
A worker thread is what SQL Server uses to perform some form of processing. Worker threads are created automatically by SQL Server as required, and a number of factors determine how many are available on your system. SQL Server has an instance-level configuration option that limits the number of worker threads available to SQL Server processes.
The ‘max worker threads’ configuration option is instance-level and is exposed in sys.configurations. It can be set or changed using the sp_configure system stored procedure. This option lets SQL Server create a pool of worker threads shared by a large number of SQL Server processes, which may improve performance. By default, the value is 0, which lets SQL Server automatically configure the number of worker threads at startup. The table shows the number of worker threads that are automatically configured, based on the number of CPUs present on the server.
Worker Thread Exhaustion
You will see error messages related to worker thread exhaustion outside AGs too. When you see them in an AG environment, however, they are likely to be a symptom of adding too many databases to your AGs.
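The auto-configured values in that table follow a simple rule for a 64-bit instance with at most 64 logical CPUs: 512 worker threads for up to 4 CPUs, plus 16 for each CPU beyond 4. The sketch below encodes only that case. 32-bit instances and machines with more than 64 logical CPUs use different formulas, so the function rejects them. The function name is hypothetical.

```python
def default_max_worker_threads(logical_cpus: int) -> int:
    """Auto-configured 'max worker threads' (setting = 0) for a 64-bit instance
    with at most 64 logical CPUs: 512, plus 16 per logical CPU beyond 4."""
    if not 1 <= logical_cpus <= 64:
        raise ValueError("only 1..64 logical CPUs on 64-bit are modeled here")
    return 512 + max(0, logical_cpus - 4) * 16


# Prints: 4 512, 8 576, 16 704, 32 960 (one pair per line)
for cpus in (4, 8, 16, 32):
    print(cpus, default_max_worker_threads(cpus))
```

These defaults matter for AGs because the HADR worker pool is capped at this value minus 40, as the following slides explain.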
  57. 57. Your system will become unresponsive. If you try to open a new connection to your instance of SQL Server through SSMS, you will most likely be met with an error message such as “Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding”. Alternatively, if you investigate the SQL Server error log, you may see a message like “New queries assigned to process on Node x have not been picked up by a worker thread in the last x seconds.” If you experience such an error, you may have some difficulty connecting to the affected instance. You could restart your instance of SQL Server: this clears all connections and so should allow you to connect. Alternatively, if you are a sysadmin, you can still gain access by using the Dedicated Administrator Connection (DAC). Once you have a connection, you can look for the ID of the troublesome session. Most likely, it will be your process that is trying to add the three hundred databases.
1- First scenario (no availability groups)
The first scenario is an environment with no availability groups. The global number of worker threads is as follows:
select scheduler_id, current_tasks_count, current_workers_count, active_workers_count, work_queue_count
from sys.dm_os_schedulers
where status = 'VISIBLE ONLINE';
go
  58. 58. This view is not perfect for our purpose, because it includes all the worker threads of the SQL Server instance (HADR worker threads are included in this number). But we will use it as a starting point: there is no activity on my lab, so we can assume that the active_workers_count value will be relatively close to the number of HADR worker threads.
2- Second scenario (availability group with 100 idle databases)
The second scenario consists of adding 100 databases to my newly created availability group, with no activity on them. Let’s have a look at the global number of worker threads:
The number of worker threads has increased, but this is not a big deal here because the availability databases are not very active. At this point, I want to introduce another way to get the number of HADR worker threads, using extended events and the hadr_thread_pool_worker_start event. The required script is on the next slide.
  59. 59. The extended events session:
create event session HadrThreadPoolWorkerStart on server
add event sqlserver.hadr_thread_pool_worker_start
add target package0.event_file
(
    set filename = N'E:\SQLSERVER\SQL14\backup\HadrThreadPoolWorkerStart.xel'
)
with
(
    max_memory = 4096 KB,
    event_retention_mode = allow_single_event_loss,
    max_dispatch_latency = 30 seconds,
    max_event_size = 0 KB,
    memory_partition_mode = none,
    track_causality = off,
    startup_state = on
);
go
-- startup_state = on only starts the session at the next instance startup; start it now:
alter event session HadrThreadPoolWorkerStart on server state = start;
go
The data extraction script:
declare @top_count int;
set @top_count = 100;
;with xe_cte as
(
    select object_name, cast(event_data as xml) as event_data
    from sys.fn_xe_file_target_read_file('E:\SQLSERVER\SQL14\backup\HadrThreadPoolWorkerStart*.xel', null, null, null)
)
select top (@top_count)
    DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_data.value('(/event/@timestamp)[1]', 'datetime2')) as [timestamp],
    event_data.value('(/event/data/value)[3]', 'int') as active_workers,
    event_data.value('(/event/data/value)[2]', 'int') as idle_workers,
    event_data.value('(/event/data/value)[1]', 'int') as worker_limit,
    event_data.value('(/event/data/value)[4]', 'varchar(5)') as worker_start_success
from xe_cte
order by [timestamp] desc;
  60. 60. 3- Third scenario: worker thread exhaustion
In the previous scenarios, we saw that increasing the number of databases can have an impact on an availability group. In my view, this situation is probably the worst case, but you have to take it into account in your design, depending on your context. In this scenario, I deliberately increase the number of databases to 500 so that the limit of allowed active worker threads is reached more quickly.
  61. 61. AG Worker Thread Requirements
There are a number of factors to consider during the capacity-planning phase for your AG environment. One such factor is the amount of resources each replica will require to provide appropriate performance for the system. To work this out, you need to take into account not only the normal activity of your environment but also the number of databases that will participate in the data transfers from your primary replica to your identified secondary replicas. There are, of course, many other factors that need to be taken into consideration, and some of them are covered by Jeremiah Peschka’s article “AlwaysOn Availability Groups: The Average of its Parts”. Here, however, we are only concerned with working out the scale of the resources in terms of worker threads. AGs have the following worker thread requirements (as quoted from “Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (SQL Server): Thread Usage by Availability Groups”):
  62. 62. • On an idle instance of SQL Server, AlwaysOn Availability Groups use 0 threads.
• The maximum number of threads used by availability groups is the configured setting for the maximum number of threads (‘max worker threads’) minus 40.
• The availability replicas hosted on a given server instance share a single thread pool (HADR Worker Pool).
• Threads are shared on an on-demand basis:
  • Typically there are 3-10 shared threads, but this can increase depending on the primary replica workload.
  • If a given thread is idle for a while, it is released back into the general SQL Server thread pool. Normally, an inactive thread is released after ~15 seconds of inactivity. However, depending on the last activity, an idle thread might be retained longer.
• In addition, availability groups use unshared threads as follows:
  • Each primary replica uses 1 log capture thread for each primary database. In addition, it uses 1 log send thread for each secondary database. Log send threads are released after ~15 seconds of inactivity.
  • Each secondary replica uses 1 redo thread for each secondary database. Redo threads are released after ~15 seconds of inactivity.
  • A backup on a secondary replica holds a thread on the primary replica for the duration of the backup operation.
There are other worker thread requirements beyond these. The minimum number of worker threads required just to support the configured AGs can be calculated from:
• The number of AGs you have configured in your instance of SQL Server
• The number of availability databases in each of the AGs
• The number of availability replicas (2-5 replicas, that is, a maximum of 4 secondary replicas, with SQL Server 2012)
  63. 63. The algorithm we are going to use, as outlined in Bob Dorr’s article “HADRON Learning Series: Worker Pool Usage for HADRON enabled Databases”, is:
Minimum Pool Size = (DC x (LCT + (LST x SRC))) + MHT
where:
• Database Count (DC)
• Secondary Replica Count (SRC)
• Log Capture Thread (LCT): for each database participating in an AG, one LCT is used to capture the transactions occurring on the database.
• Log Send Thread (LST): one LST is required for each secondary replica of each database in the AG.
• Message Handler Thread (MHT): at least one MHT is required to handle the communication between replicas.
In a worst-case scenario for the environment mentioned above, all one hundred availability databases are actively being used, with one secondary replica:
Minimum Pool Size = (100 x (1 + (1 x 1))) + 1
Minimum Pool Size = (100 x (1 + (1))) + 1
Minimum Pool Size = (100 x 2) + 1
Minimum Pool Size = 201
Primary: On the primary, the active log scanner is the long pole. When a secondary is ready to receive log blocks, a message is sent to the primary to start the log scanning. This message is handled by a worker in the HadrThreadPool. Starting up and tearing down a log scan operation can be expensive. The request therefore retains the worker thread, waiting on new log record flushes, until it has been idle for at least 20 seconds (usually 60 seconds), and only then returns the worker to the pool for reuse. All other messages acquire a worker, perform the operation, and return the worker to the pool.
Secondary: The expensive path on the secondary is the redo work. Similar to how the primary waits for idle log scan activity, the secondary waits for idle redo activity for at least 20 seconds before returning the worker to the pool.
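The formula translates directly into code, which makes it easy to re-run for other database and replica counts. This is a minimal sketch with a hypothetical function name. The parameters follow the abbreviations above, and LCT, LST, and MHT default to 1 as in the worked example.

```python
def hadr_min_pool_size(dc: int, src: int, lct: int = 1, lst: int = 1, mht: int = 1) -> int:
    """Minimum Pool Size = (DC x (LCT + (LST x SRC))) + MHT."""
    return dc * (lct + lst * src) + mht


print(hadr_min_pool_size(dc=100, src=1))  # 201, the worked example above
print(hadr_min_pool_size(dc=100, src=2))  # 301: each extra secondary adds one LST per database
```

The database count multiplies everything except the message handler. That is why adding many databases to an AG, rather than adding replicas, is the usual route to worker thread exhaustion.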
  64. 64. Messages/Task Types
There is a wide set of messages exchanged between the primary and the secondary, as depicted in the following diagram. Task types:
TransportRouting, DbMsg, Conversation, BuildMsgAndSend, TransportNotification, Timer, EndpointChange, ArMgrDbOp, TransportVersioned, ArMgrDbSerializedAccess, SyncProgress, DbRestart, DbRedo, EvalReadonlyRoutingInfo, LogPoolTrunc, NewLogReady
How do I see the pool workers?
select * from sys.dm_exec_requests
where command like '%HADR%' or command like '%DB%' or command like '%BRKR%';
  65. 65. XEvents
There are many new XEvents associated with HADRON. The XeSqlPkg::hadr_worker_pool_task event lets you watch which HADRON tasks are executing and completing on your system, so you can establish a baseline for concurrent task execution levels.
Backup and File Stream Impacts
A backup on a secondary requires a worker from the pool on the primary to maintain the proper locking and backup sequence capabilities. This can be a long-term thread, so the scheduling of backups can affect the worker pool.
File stream data is not part of the physical LDF file, so the actual file stream data needs to be streamed to the secondary. On the primary, the log block is cracked (all file stream records are found and the proper requests are sent to the secondary), and the file stream data is sent in parallel with the log blocks. The more file stream activity the database has, the more likely it is that additional threads are needed to handle the parallel file stream shipping on the primary and the secondary (receive and save).
Max Usage
The formula uses a 2x factor per database. For a database under heavy activity, with frequent backups and file stream activity, a 5x factor would be the maximum-use calculation at full utilization. Again, database activity is the key to worker use and reuse.
• File stream worker: a per-database worker that is held long term
• Backup: a per-database worker that is held long term (for the duration of the backup)
Cap
The HadrThreadPool is capped at the sp_configure ‘max worker threads’ value minus 40. To increase the size of the HadrThreadPool, increase the max worker threads setting. Note: increasing the max worker threads setting can reduce the buffer pool size.
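Combining the Max Usage factors with the cap gives a quick capacity check. This sketch rests on two simplifications of the guidance above. It assumes one secondary replica, so the 2x factor corresponds to one LCT plus one LST per database. It also treats the 5x factor as a flat per-database multiplier. The function names are hypothetical.

```python
def estimated_hadr_workers(db_count: int, factor_per_db: int = 2, mht: int = 1) -> int:
    """Rough HADR worker demand: factor_per_db workers per database plus message handlers."""
    return db_count * factor_per_db + mht


def hadr_pool_cap(max_worker_threads: int) -> int:
    """The HadrThreadPool is capped at 'max worker threads' minus 40."""
    return max_worker_threads - 40


def fits_hadr_pool(db_count: int, max_worker_threads: int, factor_per_db: int = 2) -> bool:
    return estimated_hadr_workers(db_count, factor_per_db) <= hadr_pool_cap(max_worker_threads)


# 100 databases on an instance with max worker threads = 512 (cap 472):
print(estimated_hadr_workers(100), fits_hadr_pool(100, 512))        # 201 True
print(estimated_hadr_workers(100, 5), fits_hadr_pool(100, 512, 5))  # 501 False
```

The same 100 databases fit comfortably at the 2x factor but exceed the cap at 5x. This is the situation in which raising max worker threads, with its buffer pool trade-off, becomes a consideration.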
  66. 66. Idle Workers in HadrThreadPool
A worker in the HadrThreadPool that has been idle for more than 2 seconds can be returned to the SQL system pool.
Availability Group Automatic Seeding
Automatic Seeding, introduced in SQL Server 2016, is a new way to add databases into an Availability Group (AG). Before it, adding a database into an AG involved a database backup and restore operation, quite similar to configuring database mirroring. As part of that process, the database backup needs to reside on a (shared) folder that all SQL replicas can access for the restore operation. Automatic Seeding instead reads the database files directly and streams the bytes to the secondary through the database mirroring endpoints, without requiring an explicit backup and restore of the database. This also means that the I/O overhead of writing a backup to a physical file and restoring it is avoided. During Automatic Seeding, the Dynamic Management View (DMV) sys.dm_exec_requests exposes some information, such as the percent_complete of the streaming. These are background processes, which means they are scheduled internally by SQL Server. Transaction log truncation is blocked during Automatic Seeding. This is an important consideration if workloads are allowed on the database before seeding completes.
  67. 67. Automatic Seeding is a replica-level setting and applies to all the databases in the AG. The documented trace flag 9567 can be turned on for the primary SQL Server instance during automatic seeding to enable compression of the data stream. This trace flag can significantly reduce the transfer time, but it also increases CPU utilization on the server. Two DMVs show information on Automatic Seeding activity: sys.dm_hadr_automatic_seeding and sys.dm_hadr_physical_seeding_stats. Ref: https://www.mssqltips.com/sqlservertip/4537/sql-server-2016-availability-group-automatic-seeding/
  68. 68. SQL Server 2016 Edition Usage
  69. 69. Benefits of AlwaysOn
• Multi-database failover
• Multiple secondaries (a secondary is conceptually similar to a mirror in database mirroring)
• Maximum of 4 secondaries (in SQL Server 2012)
• Synchronous and asynchronous data movement
• Support for 2 synchronous secondaries for additional data protection
• Built-in compression and encryption of transport
• Automatic, manual, and forced failover
• Flexible failover policy
• Automatic page repair
• Active secondary
  • Readable secondary
  • Secondary backup
• Automatic application redirection using a virtual name
• Configuration wizard for simplified deployment
• AlwaysOn Dashboard
• System Center integration through a new Management Pack (by the way, Database Mirroring and Replication monitoring for SQL Server 2008 R2 will also be available)
• Automation using PowerShell
• Rich diagnostic infrastructure: DMVs, Perfmon counters, XEvents, etc.
  70. 70. Terms and Definitions
  71. 71. Limitations in SQL Server 2016 Standard Edition
• Limited to two nodes only (a primary and a secondary).
• You can configure synchronous or asynchronous commit mode. This is a difference from database mirroring, where you could only use synchronous commit mode in SQL Server Standard Edition.
• Like mirroring, you can’t read from the secondary, nor take backups of it.
• But like database mirroring, you can take snapshots of the secondary for a static reporting copy.
• Each database is in its own availability group.
Ref links:
https://www.brentozar.com/archive/2015/06/how-to-set-up-standard-edition-alwayson-availability-groups-in-sql-server-2016/
https://blogs.technet.microsoft.com/msftpietervanhove/2016/05/10/how-to-set-up-basic-availability-groups-in-sql-server-2016/
SQL Server limitations in resource utilization (SQL Server 2016 + SP1):
Feature | Enterprise | Standard | Web | Express | Developer
Maximum number of cores | Unlimited | 24 cores | 16 cores | 4 cores | Unlimited
Memory: maximum buffer pool size per instance | Operating system max | 128 GB | 64 GB | 1410 MB | Operating system max
Memory: maximum columnstore cache | Operating system max | 32 GB | 16 GB | 352 MB | Operating system max
Memory: maximum in-memory data | Operating system max | 32 GB | 16 GB | 352 MB | Operating system max
Maximum database size | 524 PB | 524 PB | 524 PB | 10 GB | 524 PB
  72. 72. SQL Server 2016: Availability Group Enhancements
• Optional setting to fail over based on database failure: in 2012 and 2014, failover is determined almost entirely at the instance level. If a database goes offline, suspect, or corrupt, the AG keeps humming along. In SQL Server 2016, certain database health metrics can initiate failover for the entire group.
• Distributed Transaction Coordinator support: in SQL Server 2012 and 2014, MSDTC is not supported for AG databases, and SQL Server 2016 adds support. This also requires an operating system update, and you may need the most recent version of Windows Server for full support across all scenarios.
• Group Managed Service Accounts are fully supported: these "worked" in SQL Server 2012/2014, but were not fully supported and had some issues.
• Load balancing for readable secondaries: you can use a round-robin mechanism to route read-only requests through the listener and take balanced advantage of all secondaries. Previously, requests always went to the "first" available secondary.
• Additional automatic failover targets: up to three replicas in total (the primary plus two secondaries) can be configured for automatic failover; this matches the number of synchronous-commit replicas allowed.
• Improved log transport performance: this entire pipeline was overhauled and refactored for lower CPU usage and higher throughput.
• Basic Availability Groups: confirmed as of CTP 3.2 as an official option for Standard Edition customers in SQL Server 2016. For feature details and limitations, see Overview of AlwaysOn Basic Availability Groups.
• Domainless Availability Groups: as Microsoft has described, you can host AGs across domains with no trust, and with no domain at all. (Note that this change also requires Windows Server 2016.)
  73. New Feature in 2016: Always Encrypted
  74. WAIT TYPES IN SQL SERVER ALWAYSON (HADRON WAITS)
  - HADR_AG_MUTEX: Occurs when an AlwaysOn DDL statement or Windows Server Failover Clustering command is waiting for exclusive read/write access to the configuration of an availability group. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_AR_CRITICAL_SECTION_ENTRY: Occurs when an AlwaysOn DDL statement or Windows Server Failover Clustering command is waiting for exclusive read/write access to the runtime state of the local replica of the associated availability group. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_AR_MANAGER_MUTEX: Occurs when an availability replica shutdown is waiting for startup to complete, or an availability replica startup is waiting for shutdown to complete. Internal use only. Note: availability replica shutdown is initiated either by SQL Server shutdown or by SQL Server handling the loss of quorum by the Windows Server Failover Clustering node. Availability replica startup is initiated either by SQL Server startup or by SQL Server recovering from the loss of quorum by the Windows Server Failover Clustering node. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_ARCONTROLLER_NOTIFICATIONS_SUBSCRIBER_LIST: The publisher for an availability replica event (such as a state change or configuration change) is waiting for exclusive read/write access to the list of event subscribers. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  75.
  - HADR_BACKUP_BULK_LOCK: The AlwaysOn primary database received a backup request from a secondary database and is waiting for the background thread to finish processing the request on acquiring or releasing the BulkOp lock. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_BACKUP_QUEUE: The backup background thread of the AlwaysOn primary database is waiting for a new work request from the secondary database. Typically, this occurs when the primary database is holding the BulkOp lock and is waiting for the secondary database to indicate that the primary database can release the lock. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_CLUSAPI_CALL: A SQL Server thread is waiting to switch from non-preemptive mode (scheduled by SQL Server) to preemptive mode (scheduled by the operating system) in order to invoke Windows Server Failover Clustering APIs. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_COMPRESSED_CACHE_SYNC: Waiting for access to the cache of compressed log blocks that is used to avoid redundant compression of the log blocks sent to multiple secondary databases. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DATABASE_FLOW_CONTROL: Waiting for messages to be sent to the partner when the maximum number of queued messages has been reached. Indicates that the log scans are running faster than the network sends. This is an issue only if network sends are slower than expected. Applies to: SQL Server 2012 through SQL Server 2014.
  76.
  - HADR_DATABASE_VERSIONING_STATE: Occurs on the versioning state change of an AlwaysOn secondary database. This wait is for internal data structures and is usually very short, with no direct effect on data access. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DATABASE_WAIT_FOR_RESTART: Waiting for the database to restart under AlwaysOn Availability Groups control. Under normal conditions, this is not a customer issue because waits are expected here. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING: A query on object(s) in a readable secondary database of an AlwaysOn availability group is blocked on row versioning while waiting for commit or rollback of all transactions that were in flight when the secondary replica was enabled for read workloads. This wait type guarantees that row versions are available before a query runs under snapshot isolation. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DB_COMMAND: Waiting for responses to conversational messages (which require an explicit response from the other side, using the AlwaysOn conversational message infrastructure). A number of different message types use this wait type. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DB_OP_COMPLETION_SYNC: Waiting for responses to conversational messages (which require an explicit response from the other side, using the AlwaysOn conversational message infrastructure). A number of different message types use this wait type. Applies to: SQL Server 2012 through SQL Server 2014.
  77.
  - HADR_DB_OP_START_SYNC: An AlwaysOn DDL statement or a Windows Server Failover Clustering command is waiting for serialized access to an availability database and its runtime state. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DBR_SUBSCRIBER: The publisher for an availability replica event (such as a state change or configuration change) is waiting for exclusive read/write access to the runtime state of an event subscriber that corresponds to an availability database. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DBR_SUBSCRIBER_FILTER_LIST: The publisher for an availability replica event (such as a state change or configuration change) is waiting for exclusive read/write access to the list of event subscribers that correspond to availability databases. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_DBSTATECHANGE_SYNC: Concurrency control wait for updating the internal state of the database replica. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_FILESTREAM_BLOCK_FLUSH: The FILESTREAM AlwaysOn transport manager is waiting until processing of a log block is finished. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_FILESTREAM_FILE_CLOSE: The FILESTREAM AlwaysOn transport manager is waiting until the next FILESTREAM file gets processed and its handle gets closed. Applies to: SQL Server 2012 through SQL Server 2014.
  78.
  - HADR_FILESTREAM_FILE_REQUEST: An AlwaysOn secondary replica is waiting for the primary replica to send all requested FILESTREAM files during UNDO. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_FILESTREAM_IOMGR: The FILESTREAM AlwaysOn transport manager is waiting for the R/W lock that protects the FILESTREAM AlwaysOn I/O manager during startup or shutdown. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_FILESTREAM_IOMGR_IOCOMPLETION: The FILESTREAM AlwaysOn I/O manager is waiting for I/O completion. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_FILESTREAM_MANAGER: The FILESTREAM AlwaysOn transport manager is waiting for the R/W lock that protects the FILESTREAM AlwaysOn transport manager during startup or shutdown. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_GROUP_COMMIT: Transaction commit processing is waiting to allow a group commit so that multiple commit log records can be put into a single log block. This wait is an expected condition that optimizes the log I/O, capture, and send operations. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_LOGCAPTURE_SYNC: Concurrency control around the log capture or apply object when creating or destroying scans. This is an expected wait when partners change state or connection status. Applies to: SQL Server 2012 through SQL Server 2014.
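HADR_GROUP_COMMIT pays off because packing several commit records into one log block means fewer blocks to write, capture and send to the secondaries. The Python sketch below models only the packing step, under two assumptions: a 60 KB maximum log block size, and commit records that arrive in order. The timing side of group commit, where the engine briefly waits for more commits, is left out. `group_commits` is a hypothetical helper and not an engine function.

```python
# Assumption for illustration: 60 KB as the maximum log block size.
MAX_LOG_BLOCK_BYTES = 60 * 1024


def group_commits(commit_record_sizes, max_block=MAX_LOG_BLOCK_BYTES):
    """Pack consecutive commit records (sizes in bytes) into as few log blocks as possible."""
    blocks, current, used = [], [], 0
    for size in commit_record_sizes:
        if size > max_block:
            raise ValueError("commit record larger than a log block")
        if used + size > max_block:
            # Current block is full: close it and start a new one.
            blocks.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        blocks.append(current)
    return blocks


print(group_commits([20_000, 20_000, 20_000, 10_000]))
# [[20000, 20000, 20000], [10000]]: two block writes instead of four
```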
  79.
  - HADR_LOGCAPTURE_WAIT: Waiting for log records to become available. This can occur either when waiting for new log records to be generated by connections, or for I/O completion when reading log that is not in the cache. This is an expected wait if the log scan is caught up to the end of the log or is reading from disk. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_LOGPROGRESS_SYNC: Concurrency control wait when updating the log progress status of database replicas. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_NOTIFICATION_DEQUEUE: A background task that processes Windows Server Failover Clustering notifications is waiting for the next notification. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_NOTIFICATION_WORKER_EXCLUSIVE_ACCESS: The AlwaysOn availability replica manager is waiting for serialized access to the runtime state of a background task that processes Windows Server Failover Clustering notifications. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_NOTIFICATION_WORKER_STARTUP_SYNC: A background task is waiting for the completion of the startup of a background task that processes Windows Server Failover Clustering notifications. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_NOTIFICATION_WORKER_TERMINATION_SYNC: A background task is waiting for the termination of a background task that processes Windows Server Failover Clustering notifications. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  80.
  - HADR_PARTNER_SYNC: Concurrency control wait on the partner list. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_READ_ALL_NETWORKS: Waiting to get read or write access to the list of WSFC networks. Internal use only. Note: the engine keeps a list of WSFC networks that is used in dynamic management views (such as sys.dm_hadr_cluster_networks) or to validate AlwaysOn Transact-SQL statements that reference WSFC network information. This list is updated upon engine startup, WSFC-related notifications, and internal AlwaysOn restart (for example, losing and regaining WSFC quorum). Tasks are usually blocked while an update to that list is in progress. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_RECOVERY_WAIT_FOR_CONNECTION: Waiting for the secondary database to connect to the primary database before running recovery. This is an expected wait, which can lengthen if the connection to the primary is slow to establish. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_RECOVERY_WAIT_FOR_UNDO: Database recovery is waiting for the secondary database to finish the reverting and initializing phase to bring it back to the common log point with the primary database. This is an expected wait after failovers. Undo progress can be tracked through the Windows System Monitor (perfmon.exe) and dynamic management views. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_REPLICAINFO_SYNC: Waiting for concurrency control to update the current replica state. Applies to: SQL Server 2012 through SQL Server 2014.
  81.
  - HADR_SYNC_COMMIT: Waiting for transaction commit processing for the synchronized secondary databases to harden the log. This wait is also reflected by the Transaction Delay performance counter. This wait type is expected for synchronized availability groups and indicates the time to send, write, and acknowledge log to the secondary databases. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_SYNCHRONIZING_THROTTLE: Waiting for transaction commit processing to allow a synchronizing secondary database to catch up to the primary end of log in order to transition to the synchronized state. This is an expected wait when a secondary database is catching up. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_TDS_LISTENER_SYNC: Either the internal AlwaysOn system or the WSFC cluster requests that listeners be started or stopped. The processing of this request is always asynchronous, and there is a mechanism to remove redundant requests. At times this process is also suspended because of configuration changes. All waits related to this listener synchronization mechanism use this wait type. Internal use only. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_TDS_LISTENER_SYNC_PROCESSING: Used at the end of an AlwaysOn Transact-SQL statement that requires starting and/or stopping an availability group listener. Because the start/stop operation is done asynchronously, the user thread blocks on this wait type until the state of the listener is known. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_TIMER_TASK: Waiting to get the lock on the timer task object; also used for the actual waits between times that work is being performed. For example, for a task that runs every 10 seconds, AlwaysOn Availability Groups waits about 10 seconds after one execution to reschedule the task, and that wait is included here. Applies to: SQL Server 2012 through SQL Server 2014.
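Several of the waits above are described as expected or normal. HADR_SYNC_COMMIT is different: it reflects the time to send, write and acknowledge log on synchronous secondaries. One rough way to triage them is to drop the benign waits and rank the rest by average wait per task, i.e. `wait_time_ms / waiting_tasks_count`, using the columns of `sys.dm_os_wait_stats`. The Python sketch below works on rows you have already exported. Which waits count as "expected" is an assumption drawn from the descriptions on these slides, and the sample numbers are invented for illustration only.

```python
# Waits the descriptions above call expected or idle; this selection is an assumption.
EXPECTED_HADR_WAITS = {
    "HADR_WORK_QUEUE",
    "HADR_LOGCAPTURE_WAIT",
    "HADR_TIMER_TASK",
    "HADR_NOTIFICATION_DEQUEUE",
    "HADR_GROUP_COMMIT",
}


def hadr_wait_report(rows):
    """rows: (wait_type, waiting_tasks_count, wait_time_ms) tuples.

    Returns {wait_type: average ms per wait}, highest first, for HADR waits
    that are not in EXPECTED_HADR_WAITS.
    """
    report = {}
    for wait_type, tasks, wait_ms in rows:
        if not wait_type.startswith("HADR_") or wait_type in EXPECTED_HADR_WAITS:
            continue
        report[wait_type] = round(wait_ms / tasks, 2) if tasks else 0.0
    return dict(sorted(report.items(), key=lambda item: item[1], reverse=True))


# Hypothetical sample rows, not real measurements.
sample = [
    ("HADR_SYNC_COMMIT", 1000, 4500),
    ("HADR_WORK_QUEUE", 50, 900000),
    ("HADR_TRANSPORT_FLOW_CONTROL", 10, 120),
    ("PAGEIOLATCH_SH", 5, 50),
]
print(hadr_wait_report(sample))
# {'HADR_TRANSPORT_FLOW_CONTROL': 12.0, 'HADR_SYNC_COMMIT': 4.5}
```

`sys.dm_os_wait_stats` accumulates values since the last restart (or since the statistics were last cleared). Comparing two snapshots taken a few minutes apart therefore gives a more current picture than a single read.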
  82.
  - HADR_TRANSPORT_DBRLIST: Waiting for access to the transport layer's database replica list. Used for the spinlock that grants access to it. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_TRANSPORT_FLOW_CONTROL: Waiting when the number of outstanding unacknowledged AlwaysOn messages exceeds the outbound flow control threshold. This is on an availability replica-to-replica basis (not on a database-to-database basis). Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_TRANSPORT_SESSION: AlwaysOn Availability Groups is waiting while changing or accessing the underlying transport state. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_WORK_POOL: Concurrency control wait on the AlwaysOn Availability Groups background work task object. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_WORK_QUEUE: An AlwaysOn Availability Groups background worker thread is waiting for new work to be assigned. This is an expected wait when there are ready workers waiting for new work, which is the normal state. Applies to: SQL Server 2012 through SQL Server 2014.
  - HADR_XRF_STACK_ACCESS: Accessing (looking up, adding, and deleting) the extended recovery fork stack for an AlwaysOn availability database. Applies to: SQL Server 2012 through SQL Server 2014.
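HADR_TRANSPORT_FLOW_CONTROL is described as a per-replica limit on unacknowledged messages. The following is a minimal Python model of that idea. `ReplicaFlowControl` is a hypothetical class, and the threshold of 3 is an invented value: the engine's real threshold is internal and does not appear on these slides.

```python
class ReplicaFlowControl:
    """Caps unacknowledged messages to one secondary replica (simplified model)."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical value; the engine's limit is internal
        self.outstanding = 0

    def try_send(self):
        # At the threshold the sender has to wait; this corresponds to the
        # HADR_TRANSPORT_FLOW_CONTROL state described above.
        if self.outstanding >= self.threshold:
            return False
        self.outstanding += 1
        return True

    def ack(self, count=1):
        # An acknowledgement from the secondary frees capacity again.
        self.outstanding = max(0, self.outstanding - count)


# One controller per replica: a slow SEC1 does not throttle SEC2.
links = {"SEC1": ReplicaFlowControl(3), "SEC2": ReplicaFlowControl(3)}
print([links["SEC1"].try_send() for _ in range(4)])  # [True, True, True, False]
print(links["SEC2"].try_send())                      # True
links["SEC1"].ack()
print(links["SEC1"].try_send())                      # True
```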
  83. AlwaysOn Working Scenarios
  Licensing
  SQL Server 2012 was released with a new license model. Because SQL Server 2012 AlwaysOn can have multiple secondaries, you need to take licensing into account when you implement more than one secondary. The license model requires you to license the active (primary) SQL Server in your AlwaysOn cluster. You are allowed one passive (secondary) server that you do not need to license. If you have more than one secondary server, you need to license each additional server, whether it is active or passive.
  Ref link: https://www.derekseaman.com/2014/09/sql-2014-always-ag-pt-1-introduction.html
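The counting rule on this slide can be written as a small function. This is a simplified Python model of the rule exactly as stated here: the primary is licensed, one passive secondary is free, and every other secondary is licensed. Real licensing terms carry further conditions, so `servers_to_license` is only a hypothetical illustration.

```python
def servers_to_license(secondaries):
    """secondaries: one bool per secondary, True if it is active (e.g. serves reads)."""
    licensed = 1  # the active (primary) server is always licensed
    free_passive_used = False
    for active in secondaries:
        if not active and not free_passive_used:
            free_passive_used = True  # one passive secondary needs no license
            continue
        licensed += 1  # any other secondary, active or passive, is licensed
    return licensed


print(servers_to_license([False]))         # 1: primary plus one passive secondary
print(servers_to_license([False, False]))  # 2: the second passive secondary is licensed
print(servers_to_license([True, False]))   # 2: the active secondary is licensed
```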
  84. Question & Answers
  SQL Server Licensing Questions
  Q: If we use virtual machines and do the clustering/failover at that level (not in SQL Server), is there any reason that SQL Server Standard Edition won't work? Someone once told us in a SQL class that Enterprise Edition was necessary for this.
  Answer from Brent: Don't you just love those "someone once told us" things? You'll want to get them to tell you why. Standard Edition works fine in virtual machines. It may not be cost-effective once you start stacking multiple virtual machines on the same host, though, because you have to pay for Standard Edition for every guest.
  Q: Hi, with mirroring being deprecated and AlwaysOn AG only available in Enterprise Edition, what are our HA options going to be with Standard Edition in the future? Any idea whether AlwaysOn synchronous will make it into Standard?
  Answer from Jeremiah: You have a few HA choices with SQL Server 2012 Standard Edition and beyond. Even though mirroring is deprecated, you could feasibly keep using mirroring in the hope that something new will come out. Obviously, this isn't a viable long-term option. The other HA option is clustering: SQL Server Standard Edition supports two-node clusters, so you can always use it for HA.
  85. How to Manage AlwaysOn Availability Groups
  Q: Have you experienced, or do you know of, a "split brain scenario" in AlwaysOn Availability Groups, where transactions become inconsistent when the secondary node takes over the primary role? And how can it be avoided?
  Answer from Brent: Ooo, there are several questions in here. First, there's the concept of split-brain clusters: two different database servers both believe they're the master. Windows Server Failover Clustering (WSFC) has a lot of plumbing built in to avoid that scenario. When you design a cluster, you set up quorum voting so that the nodes work together to elect a leader. In theory, you can't run into a split-brain scenario automatically, but you can most definitely run into it manually if you go behind the scenes and change cluster settings. The simple answer here: education. Learn how the quorum process works, learn the right quorum settings for the number of servers you have, and prepare for disaster ahead of time. Know how you'll need to react when a server (or an entire data center) goes down. Plan and script those tasks, and then you can better avoid split-brain scenarios.
  Q: Can you recommend any custom policies for monitoring AlwaysOn? Or do the system policies provide thorough coverage? Thank you!
  Answer from Brent: I was a pretty hard-core early adopter of AlwaysOn Availability Groups because I had some clients who needed it right away. In that situation, you have to go to production with the monitoring you have, not the monitoring you want. The built-in tools just weren't anywhere near enough, so most of my early adopters ended up rolling their own. StackOverflow is about to share some really fun stuff there, so I'd keep an eye on Blog.ServerFault.com. You should also evaluate SQL Sentry 7.5's new AlwaysOn monitoring; it's the only production monitoring I'm aware of, although I know all the other developers are coming
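Quorum voting rules out an automatic split brain for an arithmetic reason. A partition may keep running only if it holds a strict majority of the votes, and two disjoint partitions cannot both hold more than half. The Python sketch below shows that node-majority rule. The function names are hypothetical, and real WSFC quorum also involves witnesses and dynamic vote adjustment.

```python
def has_quorum(votes_online, total_votes):
    """Majority rule: strictly more than half of all votes must be present."""
    return 2 * votes_online > total_votes


def partitions_with_quorum(partition_votes):
    """Given the votes on each side of a network split, return the sides that keep quorum."""
    total = sum(partition_votes)
    return [i for i, votes in enumerate(partition_votes) if has_quorum(votes, total)]


print(partitions_with_quorum([3, 2]))  # [0]: only the 3-vote side stays up
print(partitions_with_quorum([2, 2]))  # []: an even split stops both sides
print(partitions_with_quorum([2, 3]))  # [1]: a witness vote on side 1 breaks the tie
```

The even-split case shows why the right quorum settings depend on the number of servers you have: without an odd total of votes, a clean split can leave no side running.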
  86.
  Q: Is it wise to have some primary availability groups on one node of the cluster and other primary groups on another node? Or is it better to have all primary groups on server 1 and all secondaries on server 2?
  Answer from Brent: If you split the primaries across two different nodes, you can do some load balancing.
  Q: Would you consider AlwaysOn AG read-only replicas as a method to offload or load balance reporting? It looks like the Read Intent option acts like a load balancer for reads against those databases, right?
  Answer from Brent: Offload yes, load balance no. The read intent options give you the ability to push read-only queries to a different replica, but there's no load balancing: your clients just hit the first server in the list. If you need true load balancing, you'll want to put all of the read-only replicas behind a real load-balancing appliance.
  Sharding and Mirroring Questions
  Q: I have peer-to-peer replication with 3 nodes (all bidirectional). It's very beneficial but a big pain to maintain. Is that how the industry feels?
  Answer from Jeremiah: SQL Server peer-to-peer replication solves a very specific need: the ability to have multiple active SQL Servers where writes can occur and where you can have near real-time updates to the other servers. While peer-to-peer replication meets that need, it has a relatively heavy price tag in terms of DBA expertise, support, and licensing costs. Even experienced teams want multiple DBAs on staff to handle on-call rotations and, let's face it, while peer-to-peer replication hasn't been deprecated, it's a difficult feature to work with.
  87.
  Q: I've implemented database sharding on Oracle in several environments. Is there an equivalent technology in SQL Server?
  Answer from Jeremiah: Sharding is just a buzzword for horizontal partitioning. In a sharded database, either the application or a load-balancing router/reverse proxy is aware of the sharding scheme and sends reads and writes to the appropriate server. This can be accomplished with SQL Server, Oracle, MySQL, or even Access. There are no Microsoft technologies that do this for you, and I'd be wary of anyone attempting to sell something that Just Works®: database sharding is time-consuming, requires deep domain knowledge, and adds database overhead.
  Q: We're currently using SQL Server 2008 mirroring and planning a move to 2012. What are your thoughts about skipping 2012 and going straight to the AlwaysOn technologies in 2014?
  Answer from Jes: There were no major changes to Database Mirroring in SQL Server 2012, and I don't foresee any coming in 2014. Eventually (we don't have a specific version yet) Mirroring will be removed. Read our AlwaysOn Availability Groups Checklist to get an idea of the work involved in setting these up (it's much more complicated than Mirroring) before you decide to jump in.
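The following is a minimal Python sketch of application-side shard routing, which makes Jeremiah's description concrete: the application hashes a shard key and picks the server that owns it. The server names and `shard_for` are hypothetical. A stable hash is used deliberately, because Python's built-in `hash()` for strings is randomized per process, so different application instances would disagree on where a key lives.

```python
import hashlib

# Hypothetical shard servers; each holds a horizontal slice of the data.
SHARDS = ["sql-shard-0", "sql-shard-1", "sql-shard-2"]


def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Map a shard key (e.g. a customer ID) to the server that owns it."""
    # SHA-256 gives the same answer in every process and on every app server.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


for customer in ["C1001", "C1002", "C1003"]:
    print(customer, "->", shard_for(customer))
```

Simple modulo placement moves most keys to a different server whenever a shard is added. That rebalancing problem is one reason sharding demands the deep domain knowledge and design effort mentioned in the answer.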
  88. IT'S QUESTION(S) TIME
