KoprowskiT_SQLSat152_Bulgaria_HighAvailabilityOfSQLintheContextOfSLA

HIGH AVAILABILITY OF SQL SERVER
IN THE CONTEXT OF SLA
Tobiasz Janusz Koprowski

SELECT {BIO}
 Polish SQL Server User Group Leader
 Microsoft Certified Trainer
 MCP, MCSA, MLSS, MLSBS, MCTS, MCITP, MCT
 SQL Server MVP since 2010
 Friends of RedGate PLUS
 PASS SQL Azure Virtual Chapter Co-Founder
 Blogger, Influencer, Technical Writer
 The last 7 years (practically living) in a data center in Wrocław
 About 12 years in IT/banking in total
 GITCA Technical Lead & Vice-Chair EMEA Board
 Speaker at SQL Server Community Launch, Time for SharePoint, CodeCamps, SharePoint Community Launch, CISSP Day, InfoTRAMS, SQLSaturday, SQLBits, CareerCon
 Author of several articles on TechNet (PL) and the WSS.pl portal
 Deep Dives co-author: "High availability of SQL Server in the context of Service Level Agreements" (Chapter 18)
 Works as a Microsoft Subject Matter Expert and in the MS Terminology community (Windows 7 and 8, Visual Studio 2010 and 2011)

Agenda
 Back to school:
 What is High Availability?
 What is a Service Level Agreement?
 Using HA in SQL Server 2008
 HA solutions in SQL Server 2008, which means:
Enterprise, Enterprise
 Why SLA and DBA
 Dependency between SLA and HA
 Case Studies
 Q&A

What is High Availability?
 High Availability (HA) means ensuring the continued operation of equipment and systems, usually in an enterprise production environment.
 It is designed to prevent data loss resulting from:
 software bugs
 manufacturing defects
 hardware failure
 natural disasters
 human error
 other unforeseen events

Two kinds of monsters:
PSO > USO > SLA
 PSO (Planned System Outages): planned system unavailability
 The minimum planned unavailability needed to carry out upgrades, install patches, and replace or extend hardware
 Agreed with and accepted by the client, and does not count against the HA and SLA provisions, unlike...
 ...USO (Unplanned System Outages): unplanned system unavailability
 A failure that partially or totally prevents work in the environment, in a way that is tangible and measurable for the customer
 Results in high repair costs, as well as penalty payments for failing to meet the SLA
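The PSO/USO distinction decides what counts as downtime. Below is a minimal Python sketch, assuming a common convention that the slide does not spell out: agreed planned outages are subtracted from the measured period, and only unplanned outages lower availability. `Outage` and `sla_availability` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outage:
    minutes: float
    planned: bool  # True = PSO (agreed with the client), False = USO


def sla_availability(period_minutes: float, outages: Iterable[Outage]) -> float:
    """Availability in percent where only unplanned outages (USO) count.

    Assumption: planned outages (PSO) are removed from the agreed service
    time, so they neither count as downtime nor lower the percentage.
    """
    outages = list(outages)
    planned = sum(o.minutes for o in outages if o.planned)
    unplanned = sum(o.minutes for o in outages if not o.planned)
    agreed_time = period_minutes - planned
    if agreed_time <= 0:
        raise ValueError("planned outages cover the whole period")
    return 100.0 * (agreed_time - unplanned) / agreed_time


# A 30-day month with 200 min of agreed patching and 43 min of unplanned outage.
month = 30 * 24 * 60
print(sla_availability(month, [Outage(200, True), Outage(43, False)]))  # 99.9
```

Under this convention a long, agreed maintenance window costs nothing against the SLA, while the same number of unplanned minutes does.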

Performance metrics (HA)

 What does availability of 99.99% really mean?
 99.99% availability means 0.01% UNAVAILABILITY over the agreed period (e.g., a year), which ...
 How much is that in terms of unavailability of the server / environment / database?

Availability = MTBF / (MTBF + MTTR)
 MTBF -> Mean Time Between Failures
 MTTR -> Mean Time To Repair
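The availability formula, read as MTBF / (MTBF + MTTR), can also be solved for MTTR to see how fast repairs must be for a given target. A minimal Python sketch; the function names and the example figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr(mtbf_hours: float, target: float) -> float:
    """Longest mean repair time that still reaches `target` (0 < target < 1).

    Obtained by solving target = MTBF / (MTBF + MTTR) for MTTR.
    """
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1")
    return mtbf_hours * (1 - target) / target


# Hypothetical server that fails on average every 999 h and takes 1 h to repair.
print(availability(999, 1))  # 0.999
```

For example, a system that fails about once a year (MTBF of roughly 8,760 h) would need an average repair time under about 53 minutes to reach four nines.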

Unavailability in minutes, hours, days, weeks...

Availability % | Downtime per year | Downtime per month* | Downtime per week
90% | 36.5 days | 72 hours | 16.8 hours
95% | 18.25 days | 36 hours | 8.4 hours
98% | 7.30 days | 14.4 hours | 3.36 hours
99% | 3.65 days | 7.20 hours | 1.68 hours
99.5% | 1.83 days | 3.60 hours | 50.4 min
99.8% | 17.52 hours | 86.4 min | 20.16 min
99.9% ("three nines") | 8.76 hours | 43.2 min | 10.1 min
99.95% | 4.38 hours | 21.6 min | 5.04 min
99.99% ("four nines") | 52.6 min | 4.32 min | 1.01 min
99.999% ("five nines") | 5.26 min | 25.9 s | 6.05 s
99.9999% ("six nines") | 31.5 s | 2.59 s | 0.605 s
* assuming a 30-day month
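Every value in the table is the unavailability fraction multiplied by the length of the period. A small Python sketch that reproduces the rows, assuming 365-day years and 30-day months (the month length the table's figures correspond to); `allowed_downtime_minutes` is a hypothetical helper name.

```python
# Period lengths in hours; the month is assumed to be 30 days, as in the table.
PERIOD_HOURS = {
    "year": 365 * 24,
    "month": 30 * 24,
    "week": 7 * 24,
}


def allowed_downtime_minutes(availability_pct: float, period: str) -> float:
    """Maximum downtime (in minutes) allowed in `period` at the given availability."""
    unavailability = 1 - availability_pct / 100
    return PERIOD_HOURS[period] * 60 * unavailability


for pct in (99.9, 99.99, 99.999):
    print(pct, [round(allowed_downtime_minutes(pct, p), 2) for p in ("year", "month", "week")])
```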

What is an SLA?
 SLA: Service Level Agreement.
 Its origins date back to 1980 and the agreements between operators and end customers.
 A mutually negotiated contract for the provision of services (not only IT services, but these in particular)
 It should be concluded formally, although an informal agreement is legally permissible
 It defines the level and scope of the services provided by means of measurable indicators (availability, usability, performance)
 The contract should specify the minimum and maximum range for each service it covers
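An SLA that specifies a minimum and/or maximum for each measurable indicator can be checked mechanically. A minimal Python sketch; the `Indicator` type, the indicator names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Indicator:
    name: str
    minimum: Optional[float] = None  # lowest acceptable value; None = no lower bound
    maximum: Optional[float] = None  # highest acceptable value; None = no upper bound

    def is_met(self, measured: float) -> bool:
        if self.minimum is not None and measured < self.minimum:
            return False
        if self.maximum is not None and measured > self.maximum:
            return False
        return True


def breaches(indicators: List[Indicator], measured: Dict[str, float]) -> List[str]:
    """Names of indicators whose measured value falls outside the agreed range."""
    return [i.name for i in indicators if not i.is_met(measured[i.name])]


# Hypothetical SLA: at least 99.9% availability, at most 200 ms average response time.
sla = [Indicator("availability_pct", minimum=99.9),
       Indicator("avg_response_ms", maximum=200)]
print(breaches(sla, {"availability_pct": 99.95, "avg_response_ms": 250}))  # ['avg_response_ms']
```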

Metrics of SLA
There is no SLA measurement WITHOUT specific indicators!
EXAMPLE: CALL CENTER / SERVICE DESK

 ABA (Abandonment Rate): Percentage of calls abandoned while waiting for a response.
 ASA (Average Speed to Answer): Average time (usually in seconds) it takes the help desk to answer a call.
 TSF (Time Service Factor): Percentage of calls answered within a given time frame, such as 80% in 20 seconds.
 FCR (First Call Resolution): Percentage of calls where the problem was solved without transferring the caller to another expert.
 TAT (Turn Around Time): The time it takes to complete a given task.
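The first four service desk indicators can be computed from a log of calls. A minimal Python sketch, assuming a hypothetical `Call` record. TSF and FCR are divided by answered calls here, though some service desks divide by all offered calls instead. TAT is task-based and is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Call:
    wait_s: float                 # seconds waited before being answered or hanging up
    answered: bool                # False = abandoned while waiting
    resolved_first: bool = False  # solved without transfer to another expert


def service_desk_metrics(calls: List[Call],
                         tsf_threshold_s: float = 20.0) -> Dict[str, Optional[float]]:
    """ABA, ASA, TSF and FCR for a list of calls (percentages and seconds)."""
    if not calls:
        raise ValueError("no calls to measure")
    answered = [c for c in calls if c.answered]
    aba = 100.0 * (len(calls) - len(answered)) / len(calls)
    if not answered:
        return {"ABA": aba, "ASA": None, "TSF": None, "FCR": None}
    asa = sum(c.wait_s for c in answered) / len(answered)
    # Share of answered calls picked up within the threshold (e.g. 20 seconds).
    tsf = 100.0 * sum(c.wait_s <= tsf_threshold_s for c in answered) / len(answered)
    fcr = 100.0 * sum(c.resolved_first for c in answered) / len(answered)
    return {"ABA": aba, "ASA": asa, "TSF": tsf, "FCR": fcr}


calls = [Call(10, True, True), Call(30, True), Call(5, True, True), Call(60, False)]
print(service_desk_metrics(calls))
```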

High Availability in SQL Server 2008
Microsoft SQL Server 2008 offers:
• Database Mirroring
• Database Snapshots
• Windows Clustering
• SQL Server Replication
• Hot-add memory and CPU
• Online Index Operations
• Table and Index Partitioning
• Failover Clustering
• Peer-To-Peer Replication

Solutions for HA for SQL Server

AREA | DATABASE MIRRORING | FAILOVER CLUSTERING | TRANSACTIONAL REPLICATION | LOG SHIPPING
Data Loss | no data loss | no data loss | some data loss possible | some data loss possible
Automatic Failover | YES (in HA mode) | YES | no | no
Transparent To Client | YES, auto-redirect | YES, connect to same IP | no, NLB helps | no, NLB helps
Downtime | < 3 seconds | 20 seconds or more + time to recovery | seconds | seconds plus time to recovery
Standby Read Access | YES, with db snapshots | no | YES | -
Data Granularity | DB only | all systems and DBs | table or view | DB only
Masking of HDD Failure | YES | NO, shared disk | YES | YES
Special Hardware | NO, duplicate recommended | Cluster HCL | NO, duplicate recommended | NO, duplicate recommended
Complexity | Some | More | More | More
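A comparison table like this is often used to shortlist candidates from hard requirements. A minimal Python sketch that encodes three of its rows as booleans and filters on them; `SOLUTIONS` and `candidates` are hypothetical names, and the encoding simplifies the table's qualifiers.

```python
# Three rows of the comparison table, encoded as booleans.
SOLUTIONS = {
    # Mirroring: no data loss and automatic failover assume HA mode
    # (synchronous mirroring with a witness).
    "Database Mirroring":        {"no_data_loss": True,  "auto_failover": True,  "masks_disk_failure": True},
    "Failover Clustering":       {"no_data_loss": True,  "auto_failover": True,  "masks_disk_failure": False},
    "Transactional Replication": {"no_data_loss": False, "auto_failover": False, "masks_disk_failure": True},
    "Log Shipping":              {"no_data_loss": False, "auto_failover": False, "masks_disk_failure": True},
}


def candidates(**required: bool) -> list:
    """Names of solutions whose table entries match every required property."""
    return [name for name, props in SOLUTIONS.items()
            if all(props[key] == value for key, value in required.items())]


print(candidates(no_data_loss=True, auto_failover=True))
# ['Database Mirroring', 'Failover Clustering']
print(candidates(no_data_loss=True, masks_disk_failure=True))
# ['Database Mirroring']
```

Such a filter only narrows the field. Downtime, client transparency, and complexity still have to be weighed against the SLA by hand.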

Why High Availability?

 Businesses need to work around the clock to meet customer demands
 When systems are not running, businesses are losing revenue, opportunities, customers and reputation
 High availability reduces the impact of required maintenance on day-to-day operations and helps recover quickly from disasters
 Businesses need flexibility to easily build high availability solutions that meet business and technology needs

[Diagram: Prevent Unplanned Downtime / Reduce Planned Downtime; online operations, multiple instance clustering, live migration, automatic page repair with database mirroring, hot-add CPU and RAM, database snapshots, peer-to-peer replication]
Prevent Unplanned Downtime

Multiple-Instance Database Clustering
• More than one passive node is available to host instances from multiple failovers on active nodes
• Having multiple failover nodes provides greater availability
• Multiple instances can share the same failover node, which reduces hardware costs
• Simplified setup reduces administrative costs
[Diagram: Applications & Business Logic; Active, Active, Active, Failover and Offline nodes]
Because of the critical nature of the G4S application,
CASON sets up the servers in a failover cluster to
ensure high availability.
—CASON Case Study

Enhanced Database Mirroring

High Performance Mirroring
• Increase performance through asynchronous mirroring
Automatic Page Repair
• Automatically detects page corruption and retrieves data from the mirror
• Reduces downtime and management costs
• Minimizes application changes to correctly handle I/O errors
Reporting from Mirror
• Increase utilization of mirror server
• Reduce need for reporting servers
[Diagram: Applications & Business Logic; Principal, Mirror]

“This is a really powerful enhancement because prior
to this… you would have to run DBCC CHECKDB...
and that would likely mean taking downtime… With
SQL Server 2008 Database Mirroring you can avoid
the effort and downtime.”

Help Recover From User Errors

Database Snapshots
• Provide a read-only static view of the database at a point in time
• Revert to a point in time before user error
• When reverting, data loss is limited to changes made after the snapshot
• Run reports from a snapshot created on the mirror server to better utilize resources
[Diagram: Applications & Business Logic; Snapshot, Source]

“Database snapshots allow you to create read-only
databases for reporting and can also be useful in your
data recovery efforts in the event of a disaster.”
—Tim Chapman, SQL Server Database Administrator

Maintain Databases Without Downtime

Online Operations
• Allow routine maintenance without corresponding downtime
‒ Online index operations
‒ Online page and file restoration
‒ Online configuration of peer-to-peer nodes
• Users and applications can access data while the table, key, or index is being updated
[Diagram: Applications & Business Logic; Table, Index]

We recommend performing online index operations for
business environments that operate 24 hours a day,
seven days a week, in which the need for concurrent
user activity during index operations is vital.
— SQL Server Books Online

Minimize Planned Downtime and Increase Efficiency

Live Migration
• Move running instances of VMs between host servers
• Virtual machines can be moved for maintenance or to balance workload on host servers
• Perform maintenance on physical machines without any downtime
• Requires Windows Server 2008 R2 Hyper-V
[Diagram: Applications & Business Logic; virtual machines on host servers]

“This server already runs on our cluster solution with
high availability, but after we have tested live migration
on the new hardware, we’ll move it over to ensure
optimal performance and reliability”
—Rodrigo Immaginario, IT Manager, Universidade Vila Velha

Minimize Planned Downtime

Hot-Add CPU and RAM
• Dynamically add memory and processors to servers without incurring downtime
• Requires hardware support, either physical or virtual
[Diagram: Applications & Business Logic; server with added processors and memory]

Hot-add CPU is the ability to dynamically add CPUs to
a running system. Adding CPUs can occur physically
by adding new hardware, logically by online hardware
partitioning, or virtually through a virtualization layer.
—SQL Server Books Online

Access Data Seamlessly Across Servers

Peer-to-Peer Replication
• Increases reliability by replicating data to multiple servers
• Provides higher availability in case of failure or to allow maintenance at any of the participating nodes
• Offers improved performance for each node with geo-scale architecture
• Add and remove servers easily without taking replication offline, by using the new topology wizard
[Diagram: Applications & Business Logic; replicating peer nodes]

“[Microsoft] SQL Server 2008 replication proved to be
very predictable and reliable in our testing. This helps
us to create flexible and scalable replication solutions.
Reliability must be at the foundation of all that we do.”
— Sergey Elchinsky, Leading System Engineer, Baltika Breweries

Database Mirroring
 Mirroring maintains a mirror image of the data
 Limited to two copies of a database (principal and mirror)
 A witness server is desirable (it is required for automatic failover)
 Requirements:
 principal, mirror: SQL Server Enterprise (Standard supports synchronous mode only)
 witness: can be SQL Server Express
 Availability for the database:
 a copy of the database on a different physical and/or virtual server
 Availability for the system:
 a copy of the entire environment on a different physical and/or virtual server

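Database Mirroring Refresher: Synchronous Mode

KEY POINT: the mirror database is an EXACT copy of the principal
1 Commit
2 Write to local log, and 2 Transmit to mirror (in parallel)
3 Committed in log (principal)
4 Write to remote log (mirror)
5 Committed in log (mirror)
6 Acknowledge (mirror to principal)
7 Acknowledge (principal to client)
The mirror is constantly redoing the log against its copy of the database.
[Diagram: principal and mirror, each with a DB and a Log]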
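The order of these steps has a direct cost: in synchronous mode the client acknowledgement (step 7) waits for the mirror's log write. A deliberately simplified Python model, assuming latency consists only of log writes plus the network round trip (no CPU, queuing, or redo cost); the function names and millisecond figures are illustrative.

```python
def sync_commit_latency_ms(local_log_ms: float, network_one_way_ms: float,
                           remote_log_ms: float) -> float:
    """Simplified commit latency in synchronous (high-safety) mirroring.

    Step 2 writes the local log and transmits to the mirror in parallel;
    the client is acknowledged only after both the local write (step 3)
    and the mirror's write plus acknowledgement (steps 4-6) are done.
    """
    mirror_path = network_one_way_ms + remote_log_ms + network_one_way_ms
    return max(local_log_ms, mirror_path)


def async_commit_latency_ms(local_log_ms: float) -> float:
    """In asynchronous (high-performance) mode the principal does not wait for the mirror."""
    return local_log_ms


print(sync_commit_latency_ms(2, 1, 3))  # 5
print(async_commit_latency_ms(2))       # 2
```

With these made-up numbers the commit takes 5 ms instead of 2 ms. The mirror path dominates as network latency grows, which is why synchronous mirroring over long distances slows commits.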

Hot-add memory and CPU
 SQL Server 2005 added the ability to use memory added "on the fly"
 SQL Server 2008 extends these dynamic capabilities by allowing CPUs to be hot-added
 "Hot-add" is the ability to add RAM/CPUs to a computer while it is running and then, after reconfiguring SQL Server, use the new hardware ONLINE
 The hardware must support hot-add (of course!)
 Supported only in Enterprise Edition running on a 64-bit version of Windows Server 2008 Datacenter / Enterprise
 SQL Server does not automatically start using the new processor / memory
 You need to run RECONFIGURE
 Queries that are already running will not use the newly added memory / processors.

Hot-Add CPU: Affinity Masks
 Affinity masks control which CPUs are used by SQL Server, and for what purpose
 Any affinity masks will need to be updated after hot-adding new CPUs
 If the affinity mask is non-zero, you will need to update it so that SQL Server knows it can use the new CPUs
 On systems with more than 32 CPUs, you will need to set the affinity64 mask to pick up the new CPUs
 If you want to use the new CPUs for I/O only, you must add the relevant bits to the affinity I/O (or affinity64 I/O) mask
 A quick refresher on affinity masks:
 All zeroes means that Windows decides which CPUs are used
 Non-zero: a single bit per CPU; if the bit is 1, SQL Server will use that CPU
 The same bit cannot be set in both the affinity mask and the affinity I/O mask
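The mask rules above are plain bit arithmetic. A minimal Python sketch of them; the function names are hypothetical, and the code only computes values. Applying them would still go through sp_configure ('affinity mask', 'affinity64 mask', 'affinity I/O mask', 'affinity64 I/O mask') followed by RECONFIGURE, which is not shown.

```python
def cpus_to_mask(cpus) -> int:
    """Build a bitmap with bit n set for each CPU number n (0-63)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 64:
            raise ValueError(f"CPU {cpu} out of range 0-63")
        mask |= 1 << cpu
    return mask


def split_mask(mask: int):
    """Split a 64-bit bitmap into (affinity mask: CPUs 0-31, affinity64 mask: CPUs 32-63)."""
    return mask & 0xFFFFFFFF, mask >> 32


def after_hot_add(mask: int, new_cpus) -> int:
    """Mask to configure after hot-adding `new_cpus`.

    Zero means Windows decides which CPUs are used, so it stays zero;
    a non-zero mask must have the new CPUs' bits added explicitly.
    """
    if mask == 0:
        return 0
    return mask | cpus_to_mask(new_cpus)


def check_no_overlap(cpu_mask: int, io_mask: int) -> None:
    """A CPU's bit must not be set in both the affinity and the affinity I/O mask."""
    overlap = cpu_mask & io_mask
    if overlap:
        raise ValueError(f"bits set in both masks: {overlap:#x}")


print(hex(after_hot_add(0x0F, [4, 5])))  # 0x3f
```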

Fast Manual Failover
 In high-safety mode without a witness (synchronous mirroring), failover is always manual
 In SQL Server 2005, during such a failover the mirror database is closed and restarted to force recovery of uncommitted transactions from the transaction log
 This can greatly increase the failover time
 Consider a database with hundreds of files, all of which have to be opened during database startup
 SQL Server 2008 removes this restart step, which speeds up manual failover

Peer-to-Peer Topology (?)

 SQL Server 2005 introduced peer-to-peer (or "two-way") transactional replication
 A great way to scale out the resources needed for the workload
 Partially as a way to keep a "redundant copy"
 One major drawback: changing a peer-to-peer topology required stopping ALL activity on all servers in the topology
 In SQL Server 2008:
 these restrictions have been removed (in most cases)
 the peer-to-peer setup wizard in SSMS has been upgraded
 partition switching can be replicated

Topology Wizard
 The wizard is now graphical, with drag-and-drop functionality for making topology connections

SQL Server 2012 & AlwaysOn | marketing

 Help reduce planned and unplanned downtime with the new integrated high availability and disaster recovery solution, SQL Server AlwaysOn.
 Simplify deployment and management of HA requirements using integrated configuration and monitoring tools.
 Improve IT cost efficiency and performance using Active Secondary.
 Reduce planned downtime with Windows Server Core.

SQL Server 2012 & AlwaysOn | technical

AlwaysOn Failover Cluster Instances
As part of the SQL Server AlwaysOn offering, AlwaysOn Failover Cluster Instances leverages Windows Server Failover
Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level—a
failover cluster instance (FCI). An FCI is a single instance of SQL Server that is installed across Windows Server
Failover Clustering (WSFC) nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an
instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another
if the current node becomes unavailable.

AlwaysOn Availability Groups
AlwaysOn Availability Groups is an enterprise-level high-availability and disaster recovery solution introduced in SQL
Server 2012 to enable you to maximize availability for one or more user databases. AlwaysOn Availability Groups
requires that the SQL Server instances reside on Windows Server Failover Clustering (WSFC) nodes.

Database mirroring
Avoid using this feature in new development work, and plan to modify applications that currently use this feature. We
recommend that you use AlwaysOn Availability Groups instead. Database mirroring is a solution to increase
database availability by supporting almost instantaneous failover. Database mirroring can be used to maintain a
single standby database, or mirror database, for a corresponding production database that is referred to as the
principal database. For more information, see Database Mirroring (SQL Server).

Log shipping
Like AlwaysOn Availability Groups and database mirroring, log shipping operates at the database level. You can use
log shipping to maintain one or more warm standby databases (referred to as secondary databases) for a single
production database that is referred to as the primary database. For more information about log shipping, see
About Log Shipping (SQL Server).

SLA: what does it have to do with the DBA?
 Production hours:
 Hours in which the partition / table / database must be available
 May differ between parts of a database, for example depending on the application
 Percentage of service availability:
 The percentage of time within a given time range when the service / partition / table / database is available
 Hours reserved for downtime:
 Downtime hours announced in advance (maintenance windows) make it easier for users to plan their work
 Customer support methods:
 The response time of the HelpDesk
 The DBA's response time to an event

SLA: what does it have to do with the DBA?
 Number of users on the system
 Number of transactions processed per unit of time
 Acceptable performance levels for the various operations
 Minimum time required for replication between the different servers
 Maximum time to recover data after:
 Accidental deletion of data
 Damage to the database
 A SQL Server crash
 An OS/server crash
 Time needed to bring the data back online (e.g., read/write access to the sales table) so that sales can continue
 Maximum storage space
 Maximum number of tables / databases
 Number of users in specific roles

Why is the SLA so important?
 In fact, it's more than just a signed agreement between the client and your boss.
 It is also a contract that YOU need to meet
 If an agreement for zero downtime and zero data loss (an abstraction?) has been signed, you need to make sure you can fulfil it even in case of corruption or data deliberately changed/deleted by an authorized user.
 If you cannot meet the SLA, the business is exposed to downtime and data loss
 The end result: submitting your CV to a recruitment agency...

Do you think you can meet your Service Level Agreement?

 You need to know the SLA's conditions / requirements to tell whether you meet them
 How can you meet an SLA if you do not know it exists?
 How can you review the contract if nobody invited you to the meeting where the Service Level Agreement was drafted?
 The end result: submitting your CV to a recruitment agency...

Do you think you can meet your SLA?

 The recovery plan looks great on paper, but have you ever tested it?
 Consider this situation:
 We allow 15 minutes of unavailability for a 100 GB database.
 We are able to swap in a copy of the user database within 15 minutes
 What will you do if the database is damaged?
 What will you do if a disk fails?
 What will you do if the motherboard burns out?
 What will you do if the FC cable is cut?
 How long will it take to restore from a backup?
 How long will it take to bring the backup tapes from a second site 25 kilometers away, through the city center, at 14:00?

Do you still meet the 15-minute downtime SLA?
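These questions come down to adding up every phase of recovery and comparing the total with the 15-minute budget. A rough Python sketch; the 200 MB/s restore throughput and the 45-minute tape trip are made-up figures, not measurements. It also ignores the recovery (redo/undo) phase and any log backup restores. `restore_minutes` and `meets_rto` are hypothetical names.

```python
def restore_minutes(db_size_gb: float, restore_mb_per_s: float) -> float:
    """Time to read back a full backup at an assumed sustained restore throughput."""
    return db_size_gb * 1024 / restore_mb_per_s / 60


def meets_rto(db_size_gb: float, restore_mb_per_s: float, rto_minutes: float,
              fetch_media_minutes: float = 0.0, diagnosis_minutes: float = 0.0):
    """Return (meets_budget, total_minutes) for one hypothetical recovery path."""
    total = (diagnosis_minutes + fetch_media_minutes
             + restore_minutes(db_size_gb, restore_mb_per_s))
    return total <= rto_minutes, total


# 100 GB database, 15-minute budget, assumed 200 MB/s restore throughput.
print(meets_rto(100, 200, 15))                          # backup already on site
print(meets_rto(100, 200, 15, fetch_media_minutes=45))  # tapes from the second site
```

Even when the restore itself fits in the budget, transporting the media alone can break the SLA several times over.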

Summary
 Database mirroring
 Log Shipping
 Hot-add CPU
 Transactional Replication
 Failover clustering enhancements
 Peer-to-peer replication enhancements

 Clouds (Google, Azure, Amazon...)

Summary

 You need to know that an SLA exists
 You must take part in drafting the Service Level Agreement (requirements / features / technology)
 You need to have contingency plans, and they must be TESTED
 You must know your responsibilities
 You must be able to meet the technical requirements of the SLA

Resources
 Database mirroring
 http://www.sqlskills.com/blogs/paul/2007/10/11/SQLServer2008PerformanceBoostForDatabaseMirroring.aspx
 http://www.sqlskills.com/blogs/paul/2007/10/01/SQLServer2008NewPerformanceCountersForDatabaseMirroring.aspx
 http://www.sqlskills.com/blogs/paul/2007/09/27/SQLServer2008AutomaticPageRepairWithDatabaseMirroring.aspx
 Backup compression
 http://www.sqlskills.com/blogs/paul/2008/01/09/SQLServer2008BackupCompressionCPUCost.aspx
 http://www.sqlskills.com/blogs/paul/2007/09/20/SQLServer2008BackupCompression.aspx
 Hot-add CPU
 http://www.sqlskills.com/blogs/paul/2008/01/10/SQLServer2008HotAddCPUAndAffinityMasks.aspx
 DBCC CHECKDB
 http://www.sqlskills.com/blogs/paul/CategoryView,category,CHECKDB%2BFrom%2BEvery%2BAngle.aspx
 Failover clustering
 http://www.microsoft.com/windowsserver2008/failover-clusters.aspx
 Peer-to-peer replication
 http://www.sqlskills.com/blogs/paul/2007/12/07/SQLServer2008ConfiguringPeertoPeerReplication.aspx

AFTER SESSION {next contact}
 MAIL: KoprowskiT@windowslive.com
 MSG: KoprowskiT@windowslive.com
 SKYPE: tjkoprowski
 TWITTER @KoprowskiT

 SlideShare (post-sessions): http://www.slideshare.net/Anorak

BLOGS:
 ITPRO Anorak’s Vision: http://itblogs.pl/notbeautifulanymore/ [PL/EN]
 Volume Licensing Specialites: http://koprowskit.eu/licensing/ [PL/EN]
 My MVP Blog: http://koprowskit.eu/geek/ [PL/EN/ES]

PLEASE RATE MY SESSION

THANK YOU
