Welcome, everyone, to today's webcast. My name is David Ring and I work as part of the strategic solutions engineering group, working on Microsoft midrange applications. I am joined by Michael Morris, who also worked on this solution.
Our presentation details the solution:
EMC MULTISITE DISASTER RECOVERY FOR MICROSOFT SQL SERVER 2012 enabled by
EMC VNX5700
EMC FAST Cache
SQL Server 2012 AlwaysOn Availability Groups
During this presentation we will cover the following topics:
EMC Proven Solutions
SQL Server 2012 overview
Solution overview
Architecture design of the solution
Test results
Summary
Followed by a
Q&A Session
Proven solutions are based on real-world requirements driven by customer demand and feedback. EMC designs and tests proven solutions that are based on emerging technologies, demonstrates the best way to combine these technologies, and designs usable and cost-effective architectural solutions.
By applying strict feasibility guidelines and reviews, EMC can define use cases that answer the challenges that customers are facing. Our job is to champion the customer and test solutions you would like.
As part of our solution we create a solutions pack, which consists of:
White papers
Articles posted on ECN
AND
Demos, which are published to our EMC Proven Solutions YouTube channel
This slide details all of SQL Server 2012's new features.
The feature we are showcasing in this presentation is SQL Server 2012 AlwaysOn Availability Groups. Microsoft has provided critical enhancements to high availability with the introduction of new AlwaysOn features, particularly AlwaysOn Availability Groups, which provide the next evolution for SQL Server transactional replication.
SQL Server offers administrators several options to configure high availability for both servers and databases. These high availability configurations have until now included:
Database mirroring
AND
Log shipping
SQL Server 2012 introduces two high availability configurations as part of SQL Server AlwaysOn, which provides availability at either the application database or instance level:
AlwaysOn Failover Clustering, for instance-level protection
AND
AlwaysOn Availability Groups, for database-level protection
As stated: SQL Server High Availability and Disaster Recovery can be implemented at SQL Server database level or SQL Server instance level.
A database-level High Availability and Disaster Recovery feature provides more flexibility in managing which databases should, or should not, be moved to the secondary server. AlwaysOn Availability Group is an example of a database-level solution.
SQL Server 2012 AlwaysOn Failover Cluster is an example of a SQL Server instance level solution.
Before SQL Server 2012, having too many HA features in SQL Server could be confusing to customers. You may wonder which solution is better for your application and what are the pros and cons for each HA solution.
With AlwaysOn, Microsoft has evolved its HA features, simplifying the choice for customers. For database-level protection, Microsoft recommends the use of Availability Groups over traditional log shipping and database mirroring.
With the AlwaysOn Failover Cluster, a single SQL Server instance is installed across multiple Windows Server Failover Cluster (WSFC) nodes. WSFC functionality provides high availability at the instance level by presenting a failover cluster instance (FCI) to the network as a single computer accessible through the cluster's virtual name. This configuration is an enhancement to the SQL Server FCI functionality available in previous versions of SQL Server.
It is very much like today's FCI but more resilient in terms of varying networks.
Our current testing involves using AlwaysOn FCI with RecoverPoint and Cluster Enabler.
Significant improvements have been delivered to the multisite failover clustering technology making it a viable option for HADR for many use cases and specifically multi-subnet failover clustering implementation.
Two major enhancements which support multi-subnet clustering are:
1. Cluster Setup support: Setup can intelligently detect a multi-subnet environment and automatically set the IP address resource dependency to OR, as shown in the slide.
2. SQL Server Engine support: To bring the SQL Server resource online, the SQL Server Engine startup logic skips binding to any IP address that is not in an online state.
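To make the two enhancements concrete, here is a minimal Python sketch of the logic as described above. It is only an illustration, not how the cluster service or the SQL Server Engine is implemented; the class name, function names, and IP addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IpResource:
    """Hypothetical stand-in for a cluster IP address resource."""
    address: str
    online: bool


def network_name_can_start(ips: List[IpResource], dependency: str = "OR") -> bool:
    """With an OR dependency one online IP is enough; with AND all must be online."""
    states = [ip.online for ip in ips]
    return any(states) if dependency == "OR" else all(states)


def addresses_to_bind(ips: List[IpResource]) -> List[str]:
    """Mimic the startup behaviour: skip any IP address that is not online."""
    return [ip.address for ip in ips if ip.online]


# One IP per subnet; only the IP in the local subnet is online (made-up addresses).
ips = [IpResource("10.0.1.50", True), IpResource("10.0.2.50", False)]
print(network_name_can_start(ips, "OR"), addresses_to_bind(ips))
```

In a multi-subnet cluster only the IP address for the current site can be online at any time, which is why an AND dependency would prevent the resource from starting.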
Moving on to AlwaysOn Availability Groups. SQL Server 2012 Availability Groups are similar in concept to an Exchange DAG type implementation.
AlwaysOn Availability Groups support a failover environment for a specific set of user databases, known as availability databases. These databases fail over together.
Like AlwaysOn Failover Clustering, AlwaysOn Availability Groups require the SQL Server instances to be configured on nodes of the same cluster, but with the instances remaining and being presented to the network as separate computers.
Availability groups support a set of primary databases and one to four sets of corresponding secondary databases. An availability group fails over at the level of an availability replica and, optionally, secondary databases can be made available for read-only access and some backup operations.
Availability groups consist of a set of two or more failover partners referred to as availability replicas. Each availability replica is hosted on a separate instance of SQL Server.
Each availability replica hosts a copy of the availability databases in the availability group. Each availability replica is assigned an initial role as either the primary role or the secondary role.
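The relationships just described can be sketched as a small Python data model. This is an illustrative sketch only, not a SQL Server API; the class names, the instance names SQLPROD and SQLDR, and the group name AG1 are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List

MAX_SECONDARY_REPLICAS = 4  # "one to four sets of corresponding secondary databases"


@dataclass
class Replica:
    instance: str  # each replica is hosted on a separate SQL Server instance
    role: str      # "PRIMARY" or "SECONDARY"


@dataclass
class AvailabilityGroup:
    name: str
    databases: List[str]  # availability databases; they fail over together
    replicas: List[Replica]

    def validate(self) -> None:
        instances = [r.instance for r in self.replicas]
        if len(set(instances)) != len(instances):
            raise ValueError("each replica needs its own SQL Server instance")
        primaries = [r for r in self.replicas if r.role == "PRIMARY"]
        secondaries = [r for r in self.replicas if r.role == "SECONDARY"]
        if len(primaries) != 1:
            raise ValueError("exactly one primary replica is expected")
        if not 1 <= len(secondaries) <= MAX_SECONDARY_REPLICAS:
            raise ValueError("one to four secondary replicas are expected")

    def fail_over(self, target_instance: str) -> None:
        """Swap roles at the replica level; every database moves with it."""
        target = next((r for r in self.replicas
                       if r.instance == target_instance and r.role == "SECONDARY"), None)
        if target is None:
            raise ValueError("failover target must be an existing secondary replica")
        for replica in self.replicas:
            if replica.role == "PRIMARY":
                replica.role = "SECONDARY"
        target.role = "PRIMARY"


ag = AvailabilityGroup("AG1", ["OLTP_1"],
                       [Replica("SQLPROD", "PRIMARY"), Replica("SQLDR", "SECONDARY")])
ag.validate()
```

Note that the failover acts on the replica, not on an individual database, which matches the point that the availability databases fail over as a unit.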
The purpose of this solution was to showcase the ability of the EMC VNX storage array to easily support heavy SQL Server OLTP workloads, and
to characterize a geographically dispersed SQL Server 2012 environment protected by AlwaysOn technology, and highlight multi-subnet support at both synchronous and asynchronous distances.
EMC VNX5700 storage array offers a simple, efficient, and powerful platform for enterprise-class SQL Server 2012 infrastructures.
The testing of this solution validated the ability of the VNX5700 storage array to support SQL Server 2012 instances running OLTP-like workloads that generated over 50,000 IOPS.
This slide shows the overall physical architecture of the environment.
We had 2 SQL Server physical instances:
One production SQL Server instance which is the primary replica
And one read-only secondary SQL Server instance which is the secondary replica
We had four mission-critical, active OLTP databases, totaling 1.8 TB of data, that are replicated to the secondary site using SQL Server 2012 AlwaysOn Availability Groups.
The solution storage was provisioned from the VNX5700 with 641 GB of FAST Cache, of which 60% was hot, and featured the AlwaysOn Availability Group replica on the secondary site.
EMC FAST Cache technology automatically placed the most frequently accessed data on high-performing Flash drives.
The solution was based on a multi-subnet environment and tested at synchronous and asynchronous distances of 80 km, 800 km, and 4,000 km.
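As a rough back-of-the-envelope illustration of why distance matters for synchronous replication, the sketch below estimates the minimum round-trip propagation delay for the three tested distances. It assumes light travels through fiber at roughly 200,000 km per second (about two thirds of its speed in a vacuum) and ignores switching, routing, and protocol overhead, so real link latency will be higher. The constant and function names are hypothetical and these are not measured values from the solution.

```python
# Assumption: ~200,000 km/s in optical fiber, i.e. about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time: out and back at fiber speed."""
    return 2 * distance_km / FIBER_KM_PER_MS


for km in (80, 800, 4000):
    print(f"{km} km: at least {round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions the propagation delay alone grows from under a millisecond at 80 km to tens of milliseconds at 4,000 km, and in synchronous-commit mode that delay would be added to every committed transaction.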
Tests involved:
Comparing AlwaysOn Availability Groups in the following availability modes:
Synchronous-commit mode with Automatic failover
Synchronous-commit mode with Manual failover
Asynchronous-commit mode with Forced failover
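The three tested combinations can be written down as a small lookup, sketched here in Python. The enum and function names are hypothetical. The data-loss flag reflects the general behaviour that a forced failover may lose transactions that had not yet reached the secondary, whereas automatic and planned manual failover require a synchronized secondary.

```python
from enum import Enum


class CommitMode(Enum):
    SYNCHRONOUS = "synchronous-commit"
    ASYNCHRONOUS = "asynchronous-commit"


class Failover(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"
    FORCED = "forced"


# The combinations compared in this solution's tests.
TESTED_CONFIGURATIONS = [
    (CommitMode.SYNCHRONOUS, Failover.AUTOMATIC),
    (CommitMode.SYNCHRONOUS, Failover.MANUAL),
    (CommitMode.ASYNCHRONOUS, Failover.FORCED),
]


def may_lose_data(failover: Failover) -> bool:
    """Only a forced failover can leave unreplicated transactions behind."""
    return failover is Failover.FORCED


for mode, failover in TESTED_CONFIGURATIONS:
    print(mode.value, failover.value, "possible data loss:", may_lose_data(failover))
```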
This slide shows the SQL Server layout.
As you can see, the four databases totaled 180,000 users.
We had:
1 x 50 GB, 5,000-user database
1 x 250 GB, 25,000-user database
1 x 500 GB, 50,000-user database and
1 x 1 TB, 100,000-user database
The OLTP-like workload generated 50,000 IOPS, with a read/write ratio of approximately 9:1.
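The totals quoted for this layout can be checked with a few lines of Python. This is just a worked calculation using the figures above; it treats 1 TB as 1,000 GB so that the sum matches the quoted 1.8 TB, and the variable names are arbitrary.

```python
# (size in GB, users) for each of the four OLTP databases
databases = [(50, 5_000), (250, 25_000), (500, 50_000), (1_000, 100_000)]

total_gb = sum(size for size, _ in databases)
total_users = sum(users for _, users in databases)

# Split the quoted 50,000 IOPS by the approximate 9:1 read/write ratio.
total_iops = 50_000
read_iops = total_iops * 9 // 10
write_iops = total_iops - read_iops

print(total_gb, total_users, read_iops, write_iops)
```

So the workload works out to roughly 45,000 reads and 5,000 writes per second across about 1.8 TB of data.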
In this slide you can see our Production array storage configuration.
In this solution transaction logs and Tempdb files are segregated to dedicated spindles, hosting traditional RAID 1/0 RAID groups.
The best practice for SQL Server log files is to use RAID 1/0. Therefore, the configuration of a RAID group with 8 x 2.5-inch 10k SAS drives was best suited for the log file location.
Also, it is best practice to isolate the log from data at the physical disk level.
Performance may also benefit if Tempdb is placed on a RAID 1/0 configuration. Because Tempdb has a high write rate, RAID 1/0 is the best configuration to use.
The virtually provisioned pool was created with RAID 5 protection. This was created as a homogeneous pool with 40 SAS drives. With 40 drives for a RAID 5 pool, Virtual Provisioning creates eight five-drive (4+1) RAID groups.
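The pool arithmetic is easy to verify. The short Python calculation below uses only the drive counts quoted above; the variable names are arbitrary, and it deliberately says nothing about usable capacity in GB because drive sizes are not stated here.

```python
pool_drives = 40
drives_per_group = 5          # RAID 5 (4+1): four data drives plus one parity drive
data_drives_per_group = 4

raid_groups = pool_drives // drives_per_group
data_drive_equivalents = raid_groups * data_drives_per_group
parity_drive_equivalents = pool_drives - data_drive_equivalents
usable_fraction = data_drive_equivalents / pool_drives

print(raid_groups, data_drive_equivalents, parity_drive_equivalents, usable_fraction)
```

In other words, a 40-drive pool built from 4+1 groups gives the capacity of 32 drives to data, with the equivalent of 8 drives consumed by parity.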
Consideration was also given to the impact of FAST Cache in significantly reducing the volume of mechanical spinning disks required by VNX storage arrays to service the target workload.
The DR storage had the same configuration as production, minus FAST Cache.
This slide shows the storage design for SQL Server at the production site. The DR design was a copy of this layout.
Best practices for SQL Server were followed in laying out our storage. Reasons for having multiple files per database are:
Very active databases perform better with multiple datafiles
And also
Spreading the datafiles across disks in the pool helps to avoid contention.
As can be seen from the database file design, data files should be of equal size for each OLTP database. This is because SQL Server uses a proportional fill algorithm that favors allocations in files with more free space.
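To see why equally sized files matter, here is a deliberately simplified Python model of proportional fill. It is a sketch for illustration, not the engine's actual allocation code: it simply splits a batch of new allocations across files in proportion to each file's free space. The function name and the sample sizes are hypothetical.

```python
from typing import List


def proportional_fill(free_mb: List[float], request_mb: float) -> List[float]:
    """Split request_mb across files in proportion to their free space."""
    total_free = sum(free_mb)
    if request_mb > total_free:
        raise ValueError("not enough free space in the filegroup")
    return [request_mb * free / total_free for free in free_mb]


# Four equally sized files: writes are spread evenly.
print(proportional_fill([100, 100, 100, 100], 40))

# One file with far more free space: it absorbs most of the writes.
print(proportional_fill([300, 100], 40))
```

With equal free space every file, and the disks behind it, takes the same share of the I/O. With unequal files the largest one becomes a hot spot.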
We will now go through our results from the solution.
Throughput was measured using the Microsoft Performance Monitor counter: Average Disk Transfers per second (IOPS).
The primary replica is represented by the yellow line. And:
The secondary replica is represented by the blue line.
During baseline testing with 40 SAS disks in the storage pool, transactional I/O throughput on the primary replica produced approximately 11,500 IOPS and the secondary replica produced 1,400 IOPS.
After 30 minutes with FAST Cache enabled on the storage pool, we saw an immediate effect on performance. I/O throughput increased to over 19,000 IOPS on the primary replica and to 2,300 on the secondary.
After just two hours of FAST Cache running, we saw throughput increase to over 50,000 IOPS on the primary replica, while at the same time providing amazingly low database latency of no more than 3 ms for reads and 2 ms for writes.
During this period of FAST Cache being in a steady state, we changed the mode from synchronous to asynchronous and increased the distance from 80 km to 4,000 km. Latency was maintained at under 3 ms and IOPS increased slightly as we removed the impact of maintaining a synchronous state.
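The size of the improvement on the primary replica can be worked out from the figures quoted above. This small Python calculation uses the approximate baseline and the "over" values as stated, so the ratios are lower bounds rather than exact measurements; the variable names are arbitrary.

```python
baseline_iops = 11_500        # primary replica, 40 SAS disks, no FAST Cache
after_30_min_iops = 19_000    # "over 19,000" after 30 minutes of FAST Cache
after_2_hours_iops = 50_000   # "over 50,000" after two hours of FAST Cache

gain_30_min = after_30_min_iops / baseline_iops
gain_2_hours = after_2_hours_iops / baseline_iops

print(f"30 minutes: x{gain_30_min:.2f}, two hours: x{gain_2_hours:.2f}")
```

That is roughly a 1.65x gain within 30 minutes and more than a fourfold gain once the cache had warmed up.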
As an example of the read/writes being replicated between the primary and secondary replicas, perfmon counters were analyzed for a point in time during the FAST Cache steady state for synchronous-commit mode at 80 km for the 1 TB OLTP_1 database, as shown in this slide.
As you can see the Primary replica is on the left and the secondary on the right.
It can be seen that only 3.7 percent of the primary replica read activity occurs on the secondary replica, compared to 89.51 percent of the write activity. During this period, transactions per second (TPS) for both primary and secondary replicas was 119. This highlights that, with no read access on the secondary replica, the major activity on the secondary is the writes being replicated.
As shown in this slide, there was negligible impact on SQL Server CPU utilization when synchronous-commit mode was used up to 80 km. A small rise of 4 percent in CPU utilization occurred when using asynchronous-commit mode up to 4,000 km. In all synchronization states, minimal CPU utilization occurs on the secondary replica as no additional activity is occurring on the secondary replica databases.
Here is a graphical representation of our Perfmon data for disk transactions per second (TPS) for both primary and secondary replicas.
These results were taken during the same test as the IOPS slide.
The slide shows the transactional performance boost received from the introduction of EMC FAST Cache to our environment.
The ability to service transactions per second increased from 4,900 to over 25,000 TPS on the production databases.
Using EMC VNX Unisphere Analyzer, our performance analysis tool, we could see how the storage pool was initially I/O bound, having reached the limit of its ability to service I/O requests.
Initial disk utilization on the storage pool hosting the primary replica on the production site was too high at 90 percent, which is represented by the yellow line.
We saw minimal impact on the secondary during the initial baseline.
Improvements were seen after EMC FAST Cache was enabled. FAST Cache was able to reduce pressure on the SAS pool because frequently accessed data from the pool was placed in cache. After a two-hour warm-up period, disk utilization on production dropped from 90 percent to just 57 percent.
As FAST Cache boosts storage performance for SQL Server 2012, allowing the primary replica to service increased I/O levels, pressure on the storage pool hosting the data files for the secondary replica also increases as the writes on the primary are replicated to the secondary. This highlights the importance of correctly sizing the secondary replica's storage.
The VNX5700 storage processor (SP) utilization was measured by analyzing the Unisphere NAR files.
In this graph we see SP utilization represented by red and green for production and blue and yellow for DR.
The results for the production storage array show how SPA and SPB storage processor utilization increases as the array works to automatically boost performance through EMC FAST Cache technology, as the SPs are analyzing and promoting the frequently accessed data.
SP utilization for the DR storage array increases slightly as disk utilization rises due to the increase in write data being replicated from primary to secondary databases.
Creation of the availability groups can be done through scripting or in SQL Server Management Studio by either completing details for a New Availability Group or following the New Availability Group Wizard. The wizard also has an option to generate the script during the steps.
To test creation times we generated a script file to create the availability groups.
This slide shows a simplified process flow for creation of an availability group. During testing it was found that, when adding multiple databases, it was best to back up, restore, and add secondary replicas to the availability groups one at a time before looping back to add additional databases.
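The one-database-at-a-time ordering can be sketched as a small Python generator of T-SQL steps. This is an illustrative sketch, not the script used in the solution: the availability group name AG1, the database name OLTP_2, the share path, and the function name are all assumptions, and a real script would also need error handling and options suited to the environment.

```python
from typing import List, Tuple


def seeding_steps(ag_name: str, databases: List[str], share: str) -> List[Tuple[str, str]]:
    """Return (server, T-SQL) pairs, finishing each database before the next."""
    steps: List[Tuple[str, str]] = []
    for db in databases:
        full = f"{share}\\{db}.bak"   # full backup on the shared storage space
        log = f"{share}\\{db}.trn"    # log backup on the shared storage space
        steps += [
            ("primary", f"BACKUP DATABASE [{db}] TO DISK = N'{full}';"),
            ("primary", f"BACKUP LOG [{db}] TO DISK = N'{log}';"),
            ("secondary", f"RESTORE DATABASE [{db}] FROM DISK = N'{full}' WITH NORECOVERY;"),
            ("secondary", f"RESTORE LOG [{db}] FROM DISK = N'{log}' WITH NORECOVERY;"),
            ("primary", f"ALTER AVAILABILITY GROUP [{ag_name}] ADD DATABASE [{db}];"),
            ("secondary", f"ALTER DATABASE [{db}] SET HADR AVAILABILITY GROUP = [{ag_name}];"),
        ]
    return steps


# Hypothetical names: only OLTP_1 appears in this presentation.
steps = seeding_steps("AG1", ["OLTP_1", "OLTP_2"], r"\\fileserver\seed")
for server, statement in steps:
    print(f"{server:9s} {statement}")
```

The point of the loop is the ordering: every step for one database completes, including the join on the secondary, before the backup of the next database starts.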
An important consideration during Availability group creation is provisioning a shared storage space.
A shared storage space is required when performing a Full Initial Data Synchronization as part of availability group creation. In order to minimize duration of the seeding and reseeding process, users should consider the storage used for these database backups and restorations.
The backup process has a high bandwidth requirement from storage, as it is a sequential write/read workload. RAID 1/0 would best suit the seeding/reseeding of databases for AlwaysOn Availability Group creation.
The following table outlines database creation times at:
80 km synchronous-commit mode
800 km asynchronous-commit mode
4,000 km asynchronous-commit mode
Note: These timings are for databases already populated and running a full workload of approximately 50,000 IOPS.
The timings demonstrate that, as expected, as distance increases the time taken for creation of the availability groups increases.
SQL Server 2012 AlwaysOn Availability Groups provides the flexibility to protect specific databases, individually or collectively, in either synchronous or asynchronous availability modes. These configurations allow SQL Server 2012 to replicate data to a secondary replica over distance.
The solution clearly shows the ability of EMC FAST Cache to significantly boost performance of the VNX series storage array. Testing showed how enabling FAST Cache on a heavily utilized storage pool not only alleviated pressure, but allowed the same storage pool to service over four times the I/O, from approximately 11,000 IOPS to over 50,000 IOPS, while returning incredibly low SQL Server datafile average latency times of 3 ms and less for reads and writes. The ability of FAST Cache to automatically react to the changes in OLTP workload I/O patterns is an invaluable tool for administrators.
Results clearly show the benefits of using EMC Technology with SQL Server 2012.
Thank you all for attending. I will hand over to Michael for the Q&A session.