This document discusses business continuity and disaster recovery capabilities for SQL Azure Database. It describes the different service tiers (Basic, Standard, Premium) and their features like point-in-time restore, geo-restore, standard geo-replication, and active geo-replication. It also covers demonstrations of changing service tiers, point-in-time restores, geo-restores, creating replicated secondaries, and performing database failovers. Pricing and capabilities differ based on the service tier.
2. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other
countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond
to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the
date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION
Sourabh K. Agarwal
Sr. Premier Field Engineer
Blog: www.SQLUninterrupted.com
Twitter: @SQLSourabh
Facebook: https://www.facebook.com/groups/SQLBangalore/
3. SQL Azure Database Service Tiers
Basic
• 5 DTUs
• Max 2 GB database
Standard
• S0, S1, S2, S3
• 10-100 DTUs
• 250 GB max DB size
Premium
• P1, P2 and P3
• 125-1000** DTUs
• 500 GB max DB size
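6. Local Redundancy
Single logical database
Reads are completed at the primary
Writes are replicated to the secondaries
• Transparent automatic failover
• Uptime SLA of 99.95%
[Diagram: one primary (P) and two secondaries (S). A write goes to the primary, is replicated to both secondaries, and is acknowledged; a read returns its value from the primary.]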
7. High availability under the hood
• Critical capabilities:
✓ Create new replica
✓ Synchronize data
✓ Stay consistent
✓ Detect failures
✓ Fail over
[Diagram: a Primary Manager Node with the Partition Manager and the Global Partition Map, on top of the Fabric. Data Nodes 101 to 105 each host a mix of primary (P) and secondary (S) replicas. Failure sequence shown: node down; which replica lost?; promote to primary; reconfigure.]
8. Point In Time Restore
• Automatic Backup
- Full backups once a week, differential backups once a day, log backups every 5 min
- Backups automatically uploaded to geo-redundant Azure Storage on a daily basis
• Self-service restore
- REST API, PowerShell or Portal
- **Creates a new database in the same logical server**
• Tiered Retention Policy
- Basic: 7 days, Standard: 14 days, Premium: 35 days
[Diagram: on logical server (LS) XYZ, database DB is restored as a new database DB1 from local backups; backups are also copied to RA-GRS Azure Storage.]
10. Geo-restore
• Self-service restore API
• Restores last daily backup
• ERT < 12h, RPO < 1h
• No extra cost, no capacity guarantee
• Database URL will change after restore
[Diagram: regions US East and US West, logical servers LS ABC and LS XYZ. Automatic copies of the daily backups of DB are kept in RA-GRS storage, and the database can be restored to any server when needed.]
12. Standard Geo-replication
• ERT < 30s, RPO < 5s
• REST and PowerShell API to opt in and fail over
• Automatic data replication and synchronization
• DMV+REST to monitor and guide failover decisions
• Single secondary in the DR paired region with matching performance level
• No choice of region (pre-defined DR paired regions)
[Diagram: DB on LS ABC and DB on LS XYZ, across East US and West US; self-service activation of the secondary during an incident.]
13. Active Geo-replication
• ERT < 30s, RPO < 5s
• REST and PowerShell API to opt in and fail over
• DMV+REST to monitor and guide failover decisions
• Automatic data replication and synchronization
• User-controlled placement of up to 4 secondaries
• Creates secondary database with matching performance level
• Choice of server/region is with the customer
[Diagram: DB1 on LS ABC, LS XYZ and LS OPQ, across South Central US, West US and East US; self-service failover to the secondary database.]
16. BCDR Scenarios in Service tiers
Scenario | Basic | Standard | Premium
Local failures | ✓ | ✓ | ✓
Azure DB upgrades and maintenance | ✓ | ✓ | ✓
Accidental data corruption | ✓ | ✓ | ✓
Regional disaster | ✓ | ✓ | ✓
DR Drill | ✓ | ✓ | ✓
Online application upgrade | - | ✓ | ✓
Online application relocation | - | - | ✓
Load balancing | - | - | ✓
17. BCDR Capabilities
Capability | Basic | Standard | Premium
Uptime SLA | 99.99% (all tiers)
Performance Benchmarks | 16,600 txns/hour | up to 5,100 txns/min | up to 735 txns/sec
Point In Time Restore | Up to millisecond within last 7 days | Up to millisecond within last 14 days | Up to millisecond within last 35 days
Geo Restore | ERT < 12h, RPO < 1h | ERT < 12h, RPO < 1h | ERT < 12h, RPO < 1h
Standard Geo Replication (one non-readable secondary) | Not available | ERT < 30s, RPO < 5s | ERT < 30s, RPO < 5s
Active Geo Replication (up to 4 readable secondaries, in different regions) | Not available | Not available | ERT < 30s, RPO < 5s
18. BCDR Pricing
Capability | Basic | Standard | Premium
Point In Time Restore | No additional cost (**) (all tiers)
Geo Restore | No additional cost (**) (all tiers)
Standard Geo Replication (one non-readable secondary) | Not available | 75% of primary (same performance level) | 75% of primary (same performance level)
Active Geo Replication (up to 4 readable secondaries, in different regions) | Not available | Not available | Same as primary, per secondary
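The pricing multipliers on this slide can be turned into a small worked calculation. The sketch below is only illustrative: the function name and the primary price passed in are hypothetical, and only the 75% and "same as primary" factors and the limit of 4 secondaries come from the slides.

```python
from decimal import Decimal

# Factors from the pricing slide: a Standard geo-replication secondary is
# billed at 75% of the primary; each Active geo-replication secondary is
# billed at the same rate as the primary.
STANDARD_GEO_FACTOR = Decimal("0.75")
ACTIVE_GEO_FACTOR = Decimal("1.00")


def monthly_cost(primary_price, standard_secondary=False, active_secondaries=0):
    """Total cost of a primary plus its secondaries (hypothetical helper).

    primary_price is whatever the primary costs per month; it is an input,
    not a real Azure price.
    """
    if not 0 <= active_secondaries <= 4:
        raise ValueError("Active geo-replication allows up to 4 secondaries")
    price = Decimal(str(primary_price))
    total = price
    if standard_secondary:
        total += price * STANDARD_GEO_FACTOR
    total += price * ACTIVE_GEO_FACTOR * active_secondaries
    return total
```

For a primary that costs 100 units per month, one Standard secondary brings the total to 175 units, while two Active secondaries bring it to 300.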
Performance objectives for each of the tiers:
Basic: transactions per hour
Standard: transactions per minute
Premium: transactions per second
The Azure documentation lists the P3 tier at 1000 DTUs, but when configuring the tier in the Portal, the Premium levels show up only as 100, 200 and 800 DTUs.
Steps in the Demo
Create a sample database in the Basic tier using the Portal.
Change the service tier to Standard S2 using the Portal.
Change once more to P1 using PowerShell.
Use PowerShell to show the change in progress (optional).
Highlight the pricing point: if you create a Basic database and then immediately upgrade it to Standard S1, you will be charged at the Standard S1 rate for the first hour. If you delete a database and then create another with the same name, your bill will reflect a charge for two separate databases within that hour.
http://azure.microsoft.com/en-us/documentation/articles/sql-database-business-continuity/
Human Error
Site Disasters
DR Drill
Updates/Upgrades
http://blogs.msdn.com/b/jackgr/archive/2011/10/22/high-availability-on-the-azure-platform.aspx
Local data redundancy and operational recovery are standard features for Azure SQL Database. Each database possesses one primary and two local replica databases that reside in the same datacenter, providing high availability within that datacenter.
Every database has three replicas: one primary and two secondaries. All reads and writes go to the primary, and all writes are replicated synchronously to the secondaries. Every transaction commit also requires a quorum: the primary and at least one of the secondaries must confirm that the log records are written before the transaction is considered committed. Most production data centers have hundreds of SQL Server instances, so it is unlikely that any two databases with primary replicas on the same machine will have secondary replicas that also share a machine.
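The quorum rule described above can be written as a one-line predicate. This is a minimal sketch of the rule as stated in these notes, not of the service's internals; the function name and the boolean inputs are hypothetical.

```python
def is_committed(primary_ack, secondary_acks):
    """Return True when the commit quorum from the notes is met.

    primary_ack: True if the primary has written the log records.
    secondary_acks: one boolean per secondary replica (two in the
    three-replica layout described above).
    """
    # Quorum = the primary plus at least one secondary.
    return bool(primary_ack) and any(secondary_acks)
```

With two secondaries, a commit succeeds while one secondary is lagging or down, which is what lets the database ride through the loss of a single replica.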
What is Point in Time Restore?
The Azure SQL Database service protects all databases with an automated backup system. These backups are retained for 7 days for Basic, 14 days for Standard and 35 days for Premium. Point-in-time restore is a self-service capability, allowing customers to restore a Basic, Standard or Premium database from these backups to any point within the retention period. Point-in-time restore always creates a new database.
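As a rough illustration of the retention periods, the sketch below computes the oldest point a database could be restored to. The retention values come from these notes; the function names are hypothetical, and the sketch ignores the fact that a database cannot be restored to a time before it was created.

```python
from datetime import datetime, timedelta

# Retention periods per service tier, as stated in the notes.
RETENTION_DAYS = {"Basic": 7, "Standard": 14, "Premium": 35}


def earliest_restore_point(tier, now):
    """Oldest restorable time for a database in the given tier."""
    return now - timedelta(days=RETENTION_DAYS[tier])


def can_restore_to(tier, target, now):
    """True if target falls inside the tier's retention window."""
    return earliest_restore_point(tier, now) <= target <= now
```

For example, a point in time 10 days in the past is outside the Basic window but inside the Standard and Premium windows.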
The database backups are taken automatically with no need to opt-in and no additional charges. You only incur additional cost if you use the restore capability. The new database created by restore is charged at normal database rates.
Together, the automated backup system and point-in-time restore provide a zero-cost, zero-admin way to protect databases from accidental corruption or deletion, whatever the cause.
Understanding Automatic Backups
All Basic, Standard, and Premium databases are protected by automatic backups. Full backups are taken every week, differential backups every day, and log backups every 5 minutes. The first full backup is scheduled immediately after a database is created. Normally this completes within 30 minutes but it can take longer. If a database is "born big", for example if it is created as the result of a database copy or a restore from a large database, then the first full backup may take longer to complete. After the first full backup all further backups are scheduled automatically and managed silently in the background. Exact timing of full and differential backups is determined by the system to balance overall load.
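To make the full/differential/log schedule concrete, the sketch below picks the backups needed to reach a given point in time. It follows the usual SQL Server restore sequence (latest full, then latest differential, then log backups up to the target) and assumes the service does something equivalent; the function name and the tuple layout are hypothetical.

```python
from datetime import datetime


def restore_chain(backups, target):
    """Pick the backups needed to restore to `target`.

    backups: list of (kind, finished_at) tuples, kind being "full",
    "diff" or "log". Returns the chain in the order it would be applied.
    """
    fulls = [t for kind, t in backups if kind == "full" and t <= target]
    if not fulls:
        raise ValueError("no full backup precedes the target time")
    base = max(fulls)

    # Latest differential taken after the chosen full backup, if any.
    diffs = [t for kind, t in backups if kind == "diff" and base < t <= target]
    start = max(diffs) if diffs else base

    chain = [("full", base)]
    if diffs:
        chain.append(("diff", start))

    # Log backups after the starting point, up to and including the first
    # one that covers the target time.
    for t in sorted(t for kind, t in backups if kind == "log" and t > start):
        chain.append(("log", t))
        if t >= target:
            break
    return chain
```

With weekly fulls, daily differentials and 5-minute log backups, a restore never needs more than one full, one differential and at most a day's worth of log backups.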
Backup files are stored locally in the same data center as your databases with local redundancy. When you restore a database, the required backup files are retrieved and applied. The latest weekly and daily backups are also copied to the paired region in the same geo-political area for disaster recovery purposes.
Use the portal and PowerShell to show the last restorable time and the current time.
Use Portal to perform a Point in Time restore. Deleted Databases can only be restored on their original servers.
Use PowerShell to restore to a Point in time.
The time taken to restore the database depends on many factors, including the size of the database, the time point selected, and the amount of activity that needs to be replayed to reconstruct the state at the selected point. For a very large and/or active database, the restore may take several hours.
Restoring a database always creates a new database on the same server as the original database (unless doing a Geo-Restore), so the restored database must be given a new name. The database is restored using the service tier that was applicable at the restore point with its default performance level. You need to ensure you have sufficient DTU quota on the server bearing in mind that the restore creates a new database and that the service tier and performance level of the restored database may be different to the current state of the live database. Once complete, the restored database is a normal fully accessible online database charged at normal rates based on its service tier and performance level.
If you are restoring the database for recovery purposes you can treat the restored database as a replacement for the original database, or use it to retrieve data from and then update the original database.
If the restored database is intended as a replacement for the original database you should verify the performance level and/or service tier are appropriate and scale the database if necessary. You can rename the original database and then give the restored database the original name using the ALTER DATABASE command in T-SQL.
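The rename swap described above comes down to two ALTER DATABASE ... MODIFY NAME statements. The sketch below only builds the T-SQL text; the helper names and the "_old" suffix are hypothetical, and running the statements (from a connection to the master database) is left to whatever client you use.

```python
def quote_name(name):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def swap_statements(original, restored, retired_suffix="_old"):
    """Build the two renames that put the restored database in place.

    The original database is renamed out of the way first, then the
    restored database takes over the original name.
    """
    retired = original + retired_suffix
    return [
        f"ALTER DATABASE {quote_name(original)} MODIFY NAME = {quote_name(retired)};",
        f"ALTER DATABASE {quote_name(restored)} MODIFY NAME = {quote_name(original)};",
    ]
```

The order matters: the second rename fails if the original name is still taken. Keeping the retired database until the replacement has been verified leaves a way back.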
Perform Geo Restore using Portal.
Sign in to Azure Management Portal using your Microsoft account and select SQL Databases.
On the SQL Databases page, select SERVERS.
Select the server that contains the database you want to restore.
On the server page, select BACKUPS.
On the BACKUPS page, select the database you want to restore.
At the bottom of the page, click Restore.
Specify a new database name in the Database Name field.
Specify the target server name. The target server is the server you want to contain the database after restore.
Click Submit to submit the restore request.
DR Paired Regions - http://msdn.microsoft.com/en-us/library/azure/dn758204.aspx
The failover cannot be done manually. The Failover is controlled internally by the Azure system and is initiated only when the Primary Data center is down.
The only thing which can be tested is stopping the replication from the primary or from secondary (and the only option is to Stop immediately, resulting in data loss).
Due to the high latency of wide area networks, continuous copy uses an asynchronous replication mechanism. This makes some data loss unavoidable if a failure occurs. However, some applications may require no data loss. To protect these critical updates, an application developer can call the sp_wait_for_database_copy_sync system procedure immediately after committing the transaction. Calling sp_wait_for_database_copy_sync blocks the calling thread until the last committed transaction has been replicated to the online secondary database. The procedure will wait until all queued transactions have been acknowledged by the online secondary database. sp_wait_for_database_copy_sync is scoped to a specific continuous copy link. Any user with the connection rights to the primary database can call this procedure.
Unlike the local HA replication model, geo-replication from the primary to the secondary is asynchronous. Transactions applied to the primary are copied to and applied to the secondary, but the primary is not blocked while waiting for this to occur. Changes are buffered, making the replication system resilient to temporary connection problems or high latency when replicating to a distant location.
Replication relationships are manually managed, allowing you to terminate a relationship at any point. If you terminate from the primary then you can choose whether to terminate immediately and lose any pending transactions or to terminate after applying all pending transactions.
In the case of a datacenter outage affecting the primary, failover is still a manual task, giving you full control over whether and when it is done. The relationship is terminated from a secondary database, because the primary database will be unavailable. Terminating from the secondary is always immediate and will lose any transactions that had not been replicated at the point the primary became unavailable. How much data you lose, if any, depends on how active the primary was when it failed and on how many transactions were buffered across the connection. A decision to terminate the replication relationship should balance your concern about possible data loss against your desire to get applications back up again.
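The difference between the two termination modes can be seen in a toy model of the asynchronous link. This is a simulation under simplifying assumptions (an in-memory queue standing in for the replication buffer), not the service's implementation; the class and method names are hypothetical.

```python
from collections import deque


class ContinuousCopy:
    """Toy model of an asynchronous primary-to-secondary copy link."""

    def __init__(self):
        self.primary = []       # transactions committed on the primary
        self.secondary = []     # transactions applied on the secondary
        self.pending = deque()  # committed but not yet replicated

    def commit(self, txn):
        # The primary commits without waiting for the secondary.
        self.primary.append(txn)
        self.pending.append(txn)

    def replicate(self, count=1):
        # Ship up to `count` buffered transactions to the secondary.
        for _ in range(min(count, len(self.pending))):
            self.secondary.append(self.pending.popleft())

    def terminate(self, forced):
        """End the relationship and return the transactions that were lost.

        forced=False drains the buffer first (planned termination from the
        primary); forced=True drops whatever is still pending.
        """
        if not forced:
            self.replicate(len(self.pending))
        lost = list(self.pending)
        self.pending.clear()
        return lost
```

A forced termination with two transactions still in the buffer returns those two as lost; a planned termination returns an empty list because the buffer is drained first.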
Use the management Portal to create a secondary for the StandardS2 database (running in the Standard tier). Show that the secondary region is chosen automatically. If the user already has a server in that region, it can be used; otherwise a new server needs to be created.
Also show that for Active Geo-Replication, the region can be chosen.
Use PowerShell to configure Active Geo-Replication for the PrimiumP2 database. Create two secondary databases, one in the West Europe region and another in the Brazil South region.
Azure Documentation: http://azure.microsoft.com/en-us/documentation/articles/sql-database-business-continuity/
Benchmark Studies: https://msdn.microsoft.com/en-us/library/azure/dn741327.aspx
Estimated Recovery Time (ERT): The estimated duration for the database to be fully available after a restore or failover request.
Recovery time objective (RTO): the maximum acceptable time before the application fully recovers after the disruptive event. RTO measures the maximum loss of availability during the failures.
Recovery point objective (RPO): the maximum amount of recent updates (as a time interval) the application can lose by the moment it fully recovers after the disruptive event. RPO measures the maximum loss of data during the failures.
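These definitions can be combined with the ERT and RPO figures from the capabilities slide to shortlist options for a given requirement. The sketch below is a hypothetical helper: the bounds and tier availability are the ones quoted in this deck, and it treats the ERT as the recovery time to compare against the application's RTO.

```python
# (name, tiers where available, ERT bound in seconds, RPO bound in seconds),
# using the figures quoted in this deck.
OPTIONS = [
    ("Geo-restore", {"Basic", "Standard", "Premium"}, 12 * 3600, 3600),
    ("Standard geo-replication", {"Standard", "Premium"}, 30, 5),
    ("Active geo-replication", {"Premium"}, 30, 5),
]


def candidates(tier, rto_seconds, rpo_seconds):
    """List the options whose quoted bounds fit within the requirement."""
    return [
        name
        for name, tiers, ert, rpo in OPTIONS
        if tier in tiers and ert <= rto_seconds and rpo <= rpo_seconds
    ]
```

A Standard database with a one-minute RTO and a ten-second RPO is left with Standard geo-replication only, while a Basic database with the same requirement has no option that fits.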
http://azure.microsoft.com/en-us/pricing/details/sql-database/
Backup storage is the storage associated with your automated database backups that are used for Point-In-Time-Restore and Geo-Restore. Microsoft Azure SQL Database provides up to 200% of your maximum provisioned database storage of backup storage at no additional cost. For example, if you have a Standard DB instance with a provisioned DB size of 250 GB, you will be provided with 500 GB of backup storage at no additional charge. If your database exceeds the provided backup storage, you can choose to reduce the retention period by contacting Azure Support or pay for the extra backup storage billed at standard Read-Access Geographically Redundant Storage (RA-GRS) rate. For more information on RA-GRS billing, see Storage Pricing Details.
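The 200% allowance can be checked with simple arithmetic. The sketch below is a minimal illustration; the function name is hypothetical and the sizes are whatever you measure for your own database.

```python
def backup_storage(provisioned_gb, used_backup_gb):
    """Return (included_gb, billable_gb) for a database's backup storage.

    included_gb is 200% of the maximum provisioned database size;
    billable_gb is whatever exceeds that allowance.
    """
    included = 2 * provisioned_gb
    billable = max(0, used_backup_gb - included)
    return included, billable
```

For the 250 GB Standard example above, 500 GB of backup storage is included, so 620 GB of backups would leave 120 GB billed at the RA-GRS rate.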
Simulate a scenario to show how to perform a failover to an Active Geo-Replication secondary.
Use PowerShell
Bring down the primary database
Stop the continuous copy
Make the secondary database the primary.
Connect to the secondary database.