The document outlines a presentation by Paul Bertucci on high availability (HA) options for SQL Server databases. The agenda covers what high availability is, how to assess HA requirements, the Microsoft SQL Server HA options (such as clustering, database mirroring and log shipping), and how each option delivers high availability. Example slides cover availability assessment, configuration examples, and a decision-tree approach for choosing an option.
2. Paul Bertucci
• Founder Database Architechs – www.dbarchitechs.com
– Specializing in HA, Database Design, Data Architecture, Data Replication, and P&T for SQL
Server, Sybase, DB2 and Oracle
– Over 28 years of experience in the database industry
• Co-Author of SQL Server 2000 Unleashed! (SAMS)
• Co-Author of SQL Server 2005 Unleashed! (SAMS)
• Co-Author of SQL Server 2008 Unleashed! (SAMS), Summer 2009
• Co-Author of ADO.NET in 24 hours (SAMS)
• Author MS SQL Server High Availability (SAMS)
• Author Sybase Performance & Tuning
• Author Sybase Physical DB Design
• Veritas SQL Server Performance Series
• Former Chief Data Architect Symantec Corporation
• Current Chief Architect Autodesk Corporation
pbertucci@dbarchitechs.com
Copyright 2009 – Database Architechs
www.dbarchitechs.com
3. Agenda
What is High Availability?
How do you assess your HA Requirements?
What are the MS SQL Server related options for HA?
How each option delivers HA…
Performance and Tuning is critical too – SQL Shot!
Q&A
4. Test
1. What is the quickest way to test if your SQL Server Clustering
configuration is failing over properly?
2. What is the SQL Server feature in SQL Server 2005/2008
that replaces Log Shipping?
5. What is Availability?
[Diagram: application availability divides into uptime and downtime; downtime is either planned or unplanned, and unplanned downtime is either recoverable or a disaster. Failure causes: human, hardware, software.]
6. The cost of Unavailability
Airline Reservation Systems - $67K to $112K per hour
ATM Service Fees - $12K to $17K per hour
Brokerage (Retail) - $5.6M to $7.3M per hour
What is your cost of downtime?
7. Across all layers of your systems
Network
Application
Middleware
Database
Operating System
Hardware: Network Components, Servers, Disk Systems, Memory
8. Availability across planned operation
[Chart: availability (%) for each period (Feb 14-28, Mar 1 to Apr 15, Apr 16-20) plotted against the availability goals on a 90% to 100% scale]
Period | Starting Date | Date of Failure | Days | Hours/Day | Minutes/Hour | MBU (minutes) | TU (minutes) | Avail %
Period 1 | 2/14/2008 | 2/28/2008 | 15 | 24 | 60 | 21,600 | 38 | 99.82407
Period 2 | 3/1/2008 | 4/15/2008 | 46 | 24 | 60 | 66,240 | 68 | 99.89734
Period 3 | 4/16/2008 | 4/20/2008 | 5 | 24 | 60 | 7,200 | 442 | 93.86111
Overall | 2/14/2008 | 4/20/2008 | 66 | 24 | 60 | 95,040 | 548 | 99.4234
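The Avail % column can be reproduced from the other columns. A minimal Python sketch, assuming MBU is the planned operating time in minutes (days x 24 x 60) and TU is the unavailable time in minutes, which is consistent with every row of the table:

```python
# Reproduce the availability-per-period table.
# Assumption: MBU = planned minutes of operation, TU = minutes unavailable,
# so Avail % = (MBU - TU) / MBU * 100.


def planned_minutes(days: float, hours_per_day: float = 24, minutes_per_hour: float = 60) -> float:
    """Planned operating time (MBU) for a period, in minutes."""
    return days * hours_per_day * minutes_per_hour


def availability_pct(mbu: float, tu: float) -> float:
    """Percentage of the planned time during which the application was available."""
    return (mbu - tu) / mbu * 100


# (period, days in period, TU minutes) taken from the table
PERIODS = [("Period 1", 15, 38), ("Period 2", 46, 68), ("Period 3", 5, 442)]

total_mbu = total_tu = 0.0
for name, days, tu in PERIODS:
    mbu = planned_minutes(days)
    total_mbu += mbu
    total_tu += tu
    print(f"{name}: MBU={mbu:,.0f} TU={tu} Avail={availability_pct(mbu, tu):.5f}%")

# The overall figure weights each period by its planned minutes.
print(f"Overall: MBU={total_mbu:,.0f} TU={total_tu:,.0f} "
      f"Avail={availability_pct(total_mbu, total_tu):.4f}%")
```

Note that the overall row is not the average of the three period percentages: it is computed over all 95,040 planned minutes, so the 442-minute outage in the short third period weighs only as much as its minutes.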
9. Availability Continuum
Level | Characteristic | Availability Range
Extreme Availability | Near zero downtime! | 99.5% - 100%
High Availability | Minimal downtime | 95% - 99.4%
Standard Availability | With some downtime tolerance | 83% - 94%
Acceptable Availability | Non-critical applications | 70% - 82%
Marginal Availability | Non-production applications | up to 69%
Availability Range describes the percentage of time relative to the “planned” hours of operation.
8,760 hours/year | 168 hours/week | 24 hours/day
525,600 minutes/year | 10,080 minutes/week | 1,440 minutes/day
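To relate the percentages to actual minutes, here is a short Python sketch that places an availability target on the continuum and converts it into a downtime budget over the planned hours. How values that fall in the small gaps between the published ranges (for example 99.45%) are treated is an assumption: each level here runs from its lower bound up to the next level's lower bound.

```python
# Place an availability target on the continuum and convert it into a downtime budget.
# Assumption: each level starts at the lower bound in the table and runs up to the
# next level's lower bound (the table itself leaves small gaps, e.g. 99.4%-99.5%).

MINUTES_PER_YEAR = 8_760 * 60  # 525,600
MINUTES_PER_WEEK = 168 * 60    # 10,080

LEVELS = [
    (99.5, "Extreme Availability"),
    (95.0, "High Availability"),
    (83.0, "Standard Availability"),
    (70.0, "Acceptable Availability"),
    (0.0, "Marginal Availability"),
]


def classify(pct: float) -> str:
    """Return the continuum level for an availability percentage."""
    if not 0.0 <= pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    # LEVELS is ordered from highest to lowest bound, so the first match wins.
    return next(level for lower_bound, level in LEVELS if pct >= lower_bound)


def allowed_downtime_minutes(pct: float, planned_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime permitted by an availability target over the planned minutes."""
    return planned_minutes * (100.0 - pct) / 100.0


for pct in (99.999, 99.5, 95.0, 83.0):
    print(f"{pct}% -> {classify(pct)}: {allowed_downtime_minutes(pct):,.1f} min/year")
```

With the default of 525,600 planned minutes per year, five 9's (99.999%) leaves a budget of about 5.26 minutes of downtime per year.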
10. Applications and Availability
[Chart: example applications placed on the availability continuum (Extreme, High, Standard, Acceptable, Marginal), with axes running toward zero planned downtime and zero unplanned downtime. Examples run from 911, ATM, eCommerce and email at the higher levels to inventory and accounting management, HR, marketing and mailers at the lower levels.]
Five 9’s (99.999%) ~ 5.26 minutes/year downtime
11. What do you need?
It’s as simple as 1, 2, 3 +
Step One – Launch of a brief “Phase 0” HA Assessment
Step Two – Complete an HA Primary Variables gauge
Step Three – Match your need to the optimal HA solution
Step + (optional) – Determine the ROI of the HA solution
12. Assessing HA with Primary Variables
- Uptime Requirement: 0% to 100%
- Time to Recover: Long to Short
- Tolerance of Recovery Time: High to Low
- Data Resiliency: Low to High
- Application Resiliency: Low to High
- Degree of Distributed Access/Synchronization: Low to High
- Scheduled Maintenance Frequency: Often to Never
- Performance/Scalability: Low to High
- Cost of Downtime ($$ lost/hr): Low to High
- Cost of the High Availability Solution ($$): Low to High
13. Development Methodology
“With High Availability built in”
0. Assessment (scope)
- Project Planning
- Project Sizing
- Deliverables Identified (SOW)
- Schedules/milestones
- High-Level Requirements (scope)
- Estimate HA Primary Variables (gauges)
1. Requirements
- Detail Requirements (process/data/technology)
- Early Prototyping (optional)
- Detailed HA Primary Variables
- Detailed Service Level Agreements/Rqmts
- Detailed Disaster Recovery requirements
2. Design
- Detail Design (data/process/technology)
- Choose and design the matching HA solution for the application
3. Code & Test
- Code Development/Unit Testing
- Fully integrate the HA solution with the application
4. System Test & Acceptance
- Full system Test/User Acceptance
- Full HA Test/Validation/Acceptance
5. Implementation
- Production Build/Implementation
- Production HA build/monitoring begins
14. Spiral/Rapid Methodology
Iterative approach (Inception, Elaboration/Prototype, Construction, Transition)
0. Initial assessment
- Project Planning
- Project Sizing
- Deliverables Identified (SOW)
- Schedules/milestones
- High-Level Functions (scope)
- Estimate HA Primary Variables (gauges)
1. High-level Rqmts/Prototyping
- High-level requirements (process/data/technology)
- High-level HA Primary Variables
2. Early Code & Test
- Early code and testing of apps/DI (process/data/technology)
- Prototyping of HA options
3. Requirements
- Detail Requirements (process/data/technology)
- Detail HA Primary Variables
- Detailed SLA/Rqmts
- Detailed Disaster Recovery requirements
4. Design
- Detail Designs (process/data/technology)
- Choose and design the matching HA solution for the application (verified via prototypes)
5. Code & Test
- Code Development/Unit Testing
- Fully integrate the HA solution with the application
6. System Test & Acceptance
- Full system Test/User Acceptance
- Full HA Test/Validation/Acceptance
7. Implementation
- Production Build/Implementation
- Production HA build/monitoring begins
15. Valid High Availability Options
[Matrix: the valid HA options (Disk Methods, Other HW, Cluster Services, Data Replication, SQL Clustering, DB Mirroring, Log Shipping) listed against one another]
16. MSCS Cluster Services
[Diagram: two-node MSCS cluster. Node A and Node B each run Windows 2003 Enterprise Edition with local binaries on C:; the cluster group resources are the shared SCSI disk D: and the quorum disk Q:.]
17. SQL Clustering
[Diagram: SQL connections go to the virtual SQL Server VSQLDBARCH\VSQLSRV1, hosted on cluster nodes COLTST1 and COLTST2 (each Windows 2003 Enterprise Edition with a physical SQL Server 2008 installation and local binaries on C:). Cluster group resources: shared SCSI disk E: (Master DB, TempDB, Appl 1 DB), MS DTC, SQL Agent, and quorum disk Q:.]
18. Data Replication
Can be used as a Warm Standby and/or for Reporting needs
[Diagram: Central Publisher/Remote Distributor replication model. The "Primary" publication server (SQL Server 2008, AdventureWorks) publishes through a separate distribution server (distribution database) to two "Replicate" subscription servers (SQL Server 2008, AdventureWorks).]
19. Database Mirroring
[Diagram: clients connect over the network to the principal server (SQL Server 2008, AdventureWorks DB), whose transaction log is sent to the mirror server (SQL Server 2008, AdventureWorks DB); a witness server (SQL Server 2008, MSDB DB) monitors both.]
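The witness matters because automatic failover is a quorum decision among the three servers. The Python sketch below is a deliberately simplified model of high-safety mirroring with a witness; reducing the engine's behavior to "who can see whom" is an assumption made for illustration, not SQL Server's actual state machine.

```python
# Simplified model of automatic failover in high-safety database mirroring with a witness.
# Assumption: the real engine behavior is reduced to connectivity plus synchronization;
# this shows why the witness is needed, it does not reproduce SQL Server internals.

from dataclasses import dataclass


@dataclass
class MirrorView:
    sees_principal: bool
    sees_witness: bool
    synchronized: bool  # mirror has hardened all log records the principal committed


def mirror_should_take_over(view: MirrorView) -> bool:
    # Fail over only when the principal is lost, the witness still agrees
    # (two of three servers form a quorum), and no committed work would be lost.
    return (not view.sees_principal) and view.sees_witness and view.synchronized


def principal_keeps_serving(sees_mirror: bool, sees_witness: bool) -> bool:
    # A principal that can see neither partner has lost quorum and stops serving
    # the database, so an isolated old principal and a new principal never both take writes.
    return sees_mirror or sees_witness


# Principal crashes while mirror and witness still see each other: failover.
print(mirror_should_take_over(MirrorView(sees_principal=False, sees_witness=True, synchronized=True)))   # True
# Network split isolates the mirror: no quorum, so no failover.
print(mirror_should_take_over(MirrorView(sees_principal=False, sees_witness=False, synchronized=True)))  # False
```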
20. Database Mirroring with DB Snapshots
[Diagram: principal and mirror servers (SQL Server 2008, AdventureWorks DB with transaction logs) plus a witness server (SQL Server 2008, MSDB DB); a database snapshot of the mirror database serves reporting users over the network.]
21. Log Shipping
[Diagram: the "Source" primary server (SQL Server 2008) backs up the CallOne DB transaction log to a backup folder (e.g. CallOne_tlog_200405141120.TRN); the "Destination" secondary server (SQL Server 2008) copies each log backup from the log share and restores it to its CallOne DB. A "Monitor" server (SQL Server 2008, MSDB DB) tracks the last log shipped and the delay between logs loaded.]
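The monitor's job is essentially to watch how far the secondary lags behind the primary. This Python sketch estimates that lag from the timestamp embedded in the name of the last restored log backup. The file-name pattern is taken from the slide's example; the 45-minute alert threshold and the function names are hypothetical, and this is not the actual SQL Server monitor job.

```python
# Sketch of a log shipping latency check: how far behind is the secondary?
# Assumptions: log backup files follow the <db>_tlog_YYYYMMDDHHMM.TRN pattern from
# the slide, and the 45-minute alert threshold is a hypothetical example value.

import re
from datetime import datetime, timedelta

TRN_NAME = re.compile(r"^(?P<db>\w+?)_tlog_(?P<ts>\d{12})\.TRN$", re.IGNORECASE)


def backup_time(filename: str) -> datetime:
    """Timestamp embedded in a transaction log backup file name."""
    match = TRN_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected log backup name: {filename}")
    return datetime.strptime(match.group("ts"), "%Y%m%d%H%M")


def restore_latency(last_restored: str, now: datetime) -> timedelta:
    """Age of the newest log backup restored on the secondary."""
    return now - backup_time(last_restored)


def out_of_sync(last_restored: str, now: datetime,
                threshold: timedelta = timedelta(minutes=45)) -> bool:
    return restore_latency(last_restored, now) > threshold


last = "CallOne_tlog_200405141120.TRN"
now = datetime(2004, 5, 14, 12, 30)
print(backup_time(last))           # 2004-05-14 11:20:00
print(restore_latency(last, now))  # 1:10:00
print(out_of_sync(last, now))      # True
```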
22. RAID Disk I/O Summary
RAID Level | Fault Tolerance | Logical Reads | Physical I/Os per Read | Logical Writes | Physical I/Os per Write
RAID 0 | None | 1 | 1 | 1 | 1
RAID 1 or 10 | Best (optimal for OLTP) | 1 | 1 | 1 | 2 writes
RAID 5 | Moderate (optimal for mostly READ ONLY systems) | 1 | 1 | 1 | 2 reads + 2 writes (that’s 4 per write!)
NOTE:
Several RAID vendors now report nearly equivalent RAID 5 and RAID 10 performance, thanks to cache/buffer advancements on their RAID controllers.
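The per-write penalty is what separates the levels for write-heavy workloads. A small Python sketch that applies the RAID I/O summary's per-operation costs to a read/write mix; the workload numbers are hypothetical and controller cache, which the note says can narrow the gap, is ignored.

```python
# Physical disk I/Os generated by a logical workload on each RAID level,
# using the per-operation costs from the RAID I/O summary (controller cache ignored).

PHYSICAL_IOS_PER_READ = {"RAID 0": 1, "RAID 1/10": 1, "RAID 5": 1}
# RAID 1/10 writes both mirror copies; RAID 5 reads old data and old parity,
# then writes new data and new parity.
PHYSICAL_IOS_PER_WRITE = {"RAID 0": 1, "RAID 1/10": 2, "RAID 5": 4}


def physical_ios(level: str, logical_reads: int, logical_writes: int) -> int:
    return (logical_reads * PHYSICAL_IOS_PER_READ[level]
            + logical_writes * PHYSICAL_IOS_PER_WRITE[level])


# Hypothetical OLTP-style mix: 1,000 reads and 500 writes per second.
for level in PHYSICAL_IOS_PER_WRITE:
    print(f"{level}: {physical_ios(level, 1000, 500):,} physical I/Os per second")
```

For a read-only (DSS) mix the write column drops out and all three levels cost the same per read, which is why RAID 5 is rated optimal for mostly read-only systems.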
23. Fault Tolerance and SQL DB Files
Item | Description | Fault Tolerance
Quorum Drive | The quorum drive used with MSCS should be isolated to a drive by itself (very often mirrored as well for maximum availability). | RAID 1 or RAID 10
SQL Server Database files (OLTP) | For OLTP (online transaction processing) systems, the database data/index files should be placed on a RAID 10 disk system. | RAID 10
SQL Server Database files (DSS) | For DSS (Decision Support Systems) systems that are primarily READ ONLY, the database data/index files should be placed on a RAID 5 disk system. | RAID 5
Temp DB | Highly volatile disk I/O (when not able to do all its work in cache). | RAID 10
SQL Server Transaction Log files | The SQL transaction log files should be on their own mirrored volume for both performance and database protection (for DSS systems, this could also be RAID 5). | RAID 10 or RAID 1
24. Example DB data files configuration
[Diagram: example placement of database files on drives E: (RAID 5), F: (RAID 10), G: and H: (RAID 1 or RAID 10), and Q: (quorum). Files shown: Master DB, DSS DB (read only), TempDB, OLTP X DB and OLTP Y DB, each with its log.]
25. Decision Tree approach
Condition/Question
Case A -> Action V
Case B -> Action W
Case C -> Action X
Case D -> Action Y
. . .
Case n -> Action Z
Actions are HA options: Disk Methods, Other HW, Cluster Services, Data Replication, SQL Clustering, Database Mirroring, Database Snapshots, Log Shipping, Distributed Transactions
26. Decision-Tree Path Traversal
[Diagram: decision-tree path traversal. Starting from question 1, each answer (a-e) leads to the next question, giving paths such as 1a2, 1b2, 1c2, 1d2, 1e2 and 1a2a3 through 1a2e3; the leaves are HA Not Needed, HW/Disk Redundancy, Cluster Services, SQL Clustering, Database Mirroring, Database Snapshots, Log Shipping, Data Replication and Distributed Transactions.]
27. Decision-Tree: ASP Questions 1-3
1. What % of availability must your application have?
a. A% <= 70%: Marginal Availability
b. 70% < A% <= 83%: Acceptable Availability
c. 83% < A% <= 95%: Standard Availability
d. 95% < A% <= 99.5%: High Availability
e. A% > 99.5%: Extreme Availability
2. How much tolerance of downtime do end-users have?
a. Very High: Not Critical
b. High: Low Criticality
c. Medium: Standard Criticality
d. Low: High Criticality
e. Very Low: Extremely Critical
3. What is the per-hour cost of downtime for this application?
a. $C <= $3K: Very Low Cost
b. $3K < $C <= $7K: Low Cost
c. $7K < $C <= $12K: Moderate Cost
d. $12K < $C <= $20K: High Cost
e. $C > $20K: Very High Cost
28. Decision-Tree: ASP Questions 4-6
4. How long does it take to get the application back online?
a. Very Long: Marginal Recoverability
b. Long: Acceptable Recoverability
c. Average: Standard Recoverability
d. Short: Fast Recoverability
e. Very Short: Extreme Recoverability
5. How much of the application is distributed?
a. None: Non-Distributed
b. A Little: Low Distribution
c. Medium: Moderately Distributed
d. A Lot: High Distribution
e. All: Extremely Distributed
6. How much data inconsistency can be tolerated?
a. Very Little: Very High Consistency
b. A Little: High Consistency
c. Medium: Moderate Consistency
d. A Lot: Low Consistency
e. Very Much: Minimal Consistency
29. Decision-Tree: ASP Questions 7-9
7. How often is scheduled maintenance required?
a. Very Often: Very High Downtime
b. Often: High Downtime
c. Average: Reasonable Downtime
d. Not Often: Low Downtime
e. Rarely: Minimal Downtime
8. How important is high performance and scalability?
a. Not Very: Very Low Performance
b. Somewhat: Low Performance
c. Moderately: Reasonable Performance
d. Very Much: High Performance
e. Extremely: Extreme Performance
9. How important is the application connection to the end-user?
a. Not Very: Connection Not Needed
b. Somewhat: Connection Re-established
c. Moderately: Connection Retry Process
d. Very Much: Establish New Connection Easily
e. Extremely: Fail-over
30. Decision-Tree: ASP Question 10
10. What is the estimated cost of the HA Solution (budget)?
a. C$ < $10K: Very Low Cost
b. $10K <= C$ < $100K: Low Cost
c. $100K <= C$ < $250K: Moderate Cost
d. $250K <= C$ < $500K: High Cost
e. C$ >= $500K: Extreme Cost
1. 1e: Extreme Availability goal
2. 1e+2d: Low tolerance of downtime
3. 1e+2d+3d: $15K/hr cost of downtime (High Cost)
4. 1e+2d+3d+4c: Average recovery time
5. 1e+2d+3d+4c+5a: No distributed components or synchronization
6. 1e+2d+3d+4c+5a+6b: A little data inconsistency can be tolerated
7. 1e+2d+3d+4c+5a+6b+7c: Average amount of scheduled downtime
8. 1e+2d+3d+4c+5a+6b+7c+8d: Performance is very important
9. 1e+2d+3d+4c+5a+6b+7c+8d+9b: Connection can be re-established
10. 1e+2d+3d+4c+5a+6b+7c+8d+9b+10c: Moderate HA cost/good budget
Best fitting HA solution (together): Disk Methods, Other HW, Cluster Services, SQL Clustering
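The question-by-question traversal can be expressed as a small lookup. This Python sketch turns the numeric ranges of questions 1, 3 and 10 into answer letters and joins all ten answers into a path code. Passing the other answers in directly as letters, and the helper names, are assumptions. Applying the Question 3 ranges literally, a $15K/hr downtime cost lands in the $12K to $20K "High Cost" bucket, so the sketch encodes it as 3d.

```python
# Encode answers to the ten ASP questions as a decision-tree path such as "1e+2d+...".
# Questions 1, 3 and 10 have numeric ranges on the slides; the others are answered
# directly with a letter a-e. Function names and inputs are hypothetical.

from bisect import bisect_left, bisect_right

LETTERS = "abcde"


def q1_availability(pct: float) -> str:
    # Upper bounds inclusive: a <= 70 < b <= 83 < c <= 95 < d <= 99.5 < e
    return LETTERS[bisect_left([70, 83, 95, 99.5], pct)]


def q3_downtime_cost(per_hour: float) -> str:
    # Upper bounds inclusive: a <= $3K < b <= $7K < c <= $12K < d <= $20K < e
    return LETTERS[bisect_left([3_000, 7_000, 12_000, 20_000], per_hour)]


def q10_budget(cost: float) -> str:
    # Lower bounds inclusive: a < $10K <= b < $100K <= c < $250K <= d < $500K <= e
    return LETTERS[bisect_right([10_000, 100_000, 250_000, 500_000], cost)]


def build_path(answers: dict[int, str]) -> str:
    """Join answers in question order, e.g. {1: 'e', 2: 'd'} -> '1e+2d'."""
    return "+".join(f"{q}{answers[q]}" for q in sorted(answers))


answers = {
    1: q1_availability(99.9),
    2: "d", 3: q3_downtime_cost(15_000), 4: "c", 5: "a",
    6: "b", 7: "c", 8: "d", 9: "b",
    10: q10_budget(150_000),
}
print(build_path(answers))  # 1e+2d+3d+4c+5a+6b+7c+8d+9b+10c
```

Using `bisect_left` for the "<=" upper bounds and `bisect_right` for the ">=" lower bounds keeps each boundary value in the same bucket the slides put it in.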
31. Basic “one-two” Punch approach
1. Build the proper foundation first: Hardware/Network Redundancy, Disk Backups, DB Backups, Vendor SLAs, Training, QA & Standards, Software Upgrades
2. Then build in the appropriate HA solution that your application requires: Disk Methods, Other HW, Cluster Services, Data Replication, SQL Clustering, Database Mirroring, Database Snapshots, Log Shipping, Distributed Transactions
32. ASP – Scenario #1 with SQL Clustering
[Diagram: Active/Passive configuration. JRUN/WebServices/IIS connects over the network to the virtual SQL Server ASQLASPSERV1 on a two-node MSCS cluster (ASPProd1 and ASPProd2, each Windows 2003 Enterprise Edition with a physical SQL Server 2005 installation and local binaries on C:). Cluster group resources: shared SCSI disks E:, F: and G: (Master DB, TempDB, HOE DB), MS DTC, SQL Agent, and quorum disk Q:.]
33. Log Shipping
[Diagram: the "Source" primary server (SQL Server 2000) backs up the CallOne DB transaction log to a backup folder (e.g. CallOne_tlog_200405141120.TRN); the "Destination" secondary server (SQL Server 2000) copies each log backup from the log share and restores it to its CallOne DB. A "Monitor" server (SQL Server 2000, MSDB DB) tracks the last log shipped and the delay between logs loaded.]
34. [Diagram: live REPL solution using the Central Publisher/Remote Distributor replication model. The headquarters (Santa Clara) publication server (SQL Server 2000, MktgDB) publishes through a separate distribution server (SQL Server 2000) to subscription servers (SQL Server 2000, MktgDB) in North America (reporting and "warm/hot" spare), Europe (reporting) and the Far East (reporting).]
35. Central Publisher (default option)
[Diagram: a SQL Server 2000 publication server (Northwind) that also hosts the distribution database replicates to subscription servers running SQL Server 2000 (two), Oracle and SQL Server 7.0, each with Northwind.]
36. Central Publisher, Remote Distributor
[Diagram: a SQL Server 2000 publication server (Northwind) replicates through a separate SQL Server 2000 distribution server (distribution database) to subscription servers running SQL Server 2000 (two), Oracle and SQL Server 7.0, each with Northwind.]
37. Distributing Data
Data Access | Latency | Autonomy | Sites (locations) | Frequency | Network | Machines | Owner | Other | Replication
Read Only Reporting | short | high | many | high | fast/stable | 1 server/site | 1 OLTP site | Each site only needs regional data | Central Publisher, Transactional repl, filter by region; Database Mirroring
Read Only Reporting | long | high | many | low | fast/stable | 1 server/site | 1 OLTP site | Each site only needs regional data | Central Publisher, Snapshot repl, filter by region; Database Mirroring
Read Mostly, A few updates | short | high | < 10 | medium | fast/stable | 1 server/site | 1 OLTP site | Regional updates on one table | Central Publisher, Transactional repl, Updating Subs
Read Mostly, A few updates | medium | high | < 10 | medium | slow/unreliable | 1 server/site | All update | Regional update, all tables | Central Publisher, Merge repl
Read equal, Equal updates | short | high | < 10 | medium | fast/stable | 1 server/site | All update | Regional update, all tables | Peer-to-Peer Transactional repl
Inserts (new orders) | short | high | many | high | fast/stable | 1 server/site | 1 report site | Each site only needs regional data | Central Subscriber, Transactional repl
Hot/Warm Spare | very short | high | < 2 | high | fast/stable | 1 server/site | 1 OLTP site | Fail-over | Central Publisher, Remote Distributor, Transactional repl; Database Mirroring
38. Foundation, Foundation, Foundation
Piecing it together
Foundation: Hardware/Network Redundancy, Disk Backups, DB Backups, Vendor SLAs, Training, QA & Standards, Software Upgrades
System Stack: Network, Application, Middleware, Database, Operating System, Hardware (Network Components, Servers, Disk Systems, Memory)
39. ROI Calculation
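The slide's own ROI calculation did not survive extraction. As a stand-in, here is a generic Python sketch that ties together two quantities used throughout the deck, the availability percentage and the per-hour cost of downtime. The formula (ROI = downtime cost avoided minus solution cost, divided by solution cost) is a common definition assumed here, not necessarily the author's method, and all inputs are hypothetical.

```python
# Generic ROI sketch for an HA investment over one year.
# Assumption: the slide's ROI formula is not recoverable, so this uses the common
#   ROI % = (downtime cost avoided - cost of the HA solution) / cost of the HA solution * 100
# with downtime cost avoided = downtime hours eliminated * cost of downtime per hour.

HOURS_PER_YEAR = 8_760


def downtime_hours(availability_pct: float, planned_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime implied by an availability percentage."""
    return planned_hours * (100.0 - availability_pct) / 100.0


def ha_roi_pct(current_pct: float, target_pct: float, cost_per_hour: float,
               solution_cost: float, planned_hours: float = HOURS_PER_YEAR) -> float:
    hours_saved = downtime_hours(current_pct, planned_hours) - downtime_hours(target_pct, planned_hours)
    downtime_cost_avoided = hours_saved * cost_per_hour
    return (downtime_cost_avoided - solution_cost) / solution_cost * 100.0


# Hypothetical inputs: 99.0% -> 99.9% availability, $15K/hr downtime, $150K solution.
print(f"{ha_roi_pct(99.0, 99.9, 15_000, 150_000):.1f}%")  # 688.4%
```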
40. Database Mirroring: Transparent Client Redirect
[Diagram: clients reach the principal server (SQL Server 2008, Applx DB) over the network; its transaction log is sent to the mirror server (SQL Server 2008, Applx DB), and a witness server (SQL Server 2008, MSDB DB) monitors both. Also labeled: "Copy-on-Write" technology.]
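Transparent client redirect means the client knows both partners and reconnects to the other one after a failover. In ADO.NET and SQL Native Client this is configured with the Failover Partner connection-string keyword. The Python sketch below only models the client-side logic: the server names are hypothetical and connection attempts are simulated so it runs stand-alone.

```python
# Sketch of transparent client redirect after a mirroring failover.
# Assumptions: server names are hypothetical, and connection attempts are simulated
# by a connect() callable so the logic runs stand-alone.

from typing import Callable


class ConnectionFailed(Exception):
    pass


def connect_with_redirect(principal: str, failover_partner: str,
                          connect: Callable[[str], str]) -> str:
    """Try the principal first; if it is unreachable, try its failover partner."""
    try:
        return connect(principal)
    except ConnectionFailed:
        # After a failover the former mirror is serving the database as principal.
        return connect(failover_partner)


def simulated_connect(servers_up: set[str]) -> Callable[[str], str]:
    def connect(server: str) -> str:
        if server not in servers_up:
            raise ConnectionFailed(server)
        return f"connected to {server}"
    return connect


print(connect_with_redirect("SQLA", "SQLB", simulated_connect({"SQLA", "SQLB"})))  # connected to SQLA
print(connect_with_redirect("SQLA", "SQLB", simulated_connect({"SQLB"})))          # connected to SQLB
```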
44. Database Mirroring with DB Snapshot
[Diagram: principal and mirror servers (SQL Server 2005, Applx DB with transaction logs) plus a witness server (SQL Server 2005, MSDB DB); a database snapshot of the mirror database serves reporting users over the network.]
45. Snapshot
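[Diagram: SQL Server 2008 database snapshot. Users query the snapshot (SELECT …..data……. FROM AdventureWorks SNAPSHOT 04); pages unchanged since the snapshot are read from the source AdventureWorks DB data pages, while changed pages are read from the snapshot's sparse file, tracked by a system catalog of changed pages.]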
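A database snapshot works by copy-on-write: a page is copied into the snapshot's sparse file only the first time it changes in the source database. A minimal Python model of that behavior follows; the classes and page contents are illustrative assumptions, not SQL Server's on-disk structures, and pages added after the snapshot was taken are left out.

```python
# Minimal copy-on-write model of a database snapshot (simplified: pages are strings
# and only pages that existed when the snapshot was taken are modeled).


class SourceDatabase:
    def __init__(self, pages: dict[int, str]):
        self.pages = dict(pages)
        self.snapshots: list["Snapshot"] = []

    def write_page(self, page_id: int, data: str) -> None:
        original = self.pages[page_id]  # KeyError for new pages: out of scope here
        for snap in self.snapshots:
            snap.preserve(page_id, original)  # copy-on-write happens before the change
        self.pages[page_id] = data


class Snapshot:
    def __init__(self, source: SourceDatabase):
        self.source = source
        self.sparse: dict[int, str] = {}  # the "sparse file": only pages changed since creation
        source.snapshots.append(self)

    def preserve(self, page_id: int, original: str) -> None:
        # Keep only the first pre-change image; later writes must not overwrite it.
        self.sparse.setdefault(page_id, original)

    def read_page(self, page_id: int) -> str:
        # Changed pages come from the sparse file, unchanged ones from the source database.
        if page_id in self.sparse:
            return self.sparse[page_id]
        return self.source.pages[page_id]


db = SourceDatabase({1: "order 1: qty 5", 2: "order 2: qty 3"})
snap = Snapshot(db)
db.write_page(1, "order 1: qty 7")
db.write_page(1, "order 1: qty 9")
print(snap.read_page(1), "|", snap.read_page(2), "|", len(snap.sparse))
```

Only one page ends up in the sparse file, which is why a snapshot starts out nearly empty and grows only as the source database changes.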
46. PH Topology With Snapshots
[Diagram: two SQL Server 2008 instances, SQL2008xyz and SQL2008zzz (each with endpoint "endpoint4mirroring", role PARTNER), act as principal and mirror for AdventureWorks DB, with an active/passive clustered server behind the OLTP application. A database snapshot on the mirror serves critical report users, and replication feeds a second PH topology principal server (SQL Server 2008) for less critical reporting users.]
47. The Combo Pack
[Diagram: the publisher is a mirrored pair (principal and mirror servers, SQL Server 2008) that replicates through a distributor (SQL Server 2008) to subscribers (SQL Server 2008), one of which is itself a mirrored principal/mirror pair; a witness server (SQL Server 2008) monitors the mirroring.]
48. Restart
[Diagram: database restart stages over time. Transactions are first rolled forward, then rolled back. SQL Server 2000 makes the database available only when the whole restart is complete; SQL Server 2005/2008 (Enterprise Edition fast recovery) makes it available once the roll-forward stage finishes, while transactions are still being rolled back.]
DB Availability Improvement!
52. Fail-Over via Move Group
ANSWER to question #1
53. Distributed Transactions
[Diagram: reads try the "Primary Location" (SQL Server 2000, Northwind 1) first and, if it is not available, try the "Secondary Location" (SQL Server 2000, Northwind 2). Updates to both locations go through MS DTC and must succeed together, or both be rolled back (two-phase commit).]
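The "succeed together or both roll back" rule is the two-phase commit protocol that MS DTC coordinates. Here is a minimal Python model of a coordinator with two simulated participants named after the slide's databases; the participant behavior is an assumption, and real resource managers also log their votes durably so the outcome survives a crash between the two phases.

```python
# Minimal two-phase commit coordinator, modeling how an update to Northwind 1 and
# Northwind 2 either commits on both or rolls back on both.
# Participants are simulated; real resource managers also log their votes durably.


class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1 vote: "yes" means the participant guarantees it can commit later.
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (prepare): all() stops collecting votes at the first "no".
    if all(p.prepare() for p in participants):
        # Phase 2 (commit): every participant voted "yes".
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): roll back everyone, including participants that had prepared.
    for p in participants:
        p.rollback()
    return False


nw1, nw2 = Participant("Northwind 1"), Participant("Northwind 2", can_commit=False)
print(two_phase_commit([nw1, nw2]), nw1.state, nw2.state)  # False rolled back rolled back
```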