SQL Server 2012 HA and DR, SQL Saturday Boston
1. SQL Server 2012
High Availability and
Disaster Recovery
SQL Saturday #203
6 April 2013
Boston, MA
Joey D’Antoni
2. About Me
@jdanton
Joedantoni.wordpress.com
jdanton1@yahoo.com
Resources from Today’s Presentation
http://bit.ly/SQLSatRVAjd
3. Today’s Presentation
Disaster Recovery: It’s All About Risk Management
Understanding High Availability
Ok, I get that, now how do I protect my databases?
Understanding Availability Groups
5. Disaster Recovery Terms
Recovery Time Objective (RTO): How long can your systems be down before your business is impacted?
Recovery Point Objective (RPO): How much data can your business lose before being impacted?
These will vary highly by your industry and your business model, but they apply to every application
6. Risk Management
"How fast do you want to go? How much do you want to spend?" (attribution unknown)
7. Risk Management
In a nutshell, preparing a DR policy is just like buying insurance
Based on your firm’s tolerance for risk, business model, and geography
Extremely high levels of availability and protection are available, at a very high cost
Very reasonable levels of protection and availability can be had at a low cost
If you use a cloud provider, you still need to think about this!
8. DR Solutions in SQL Server
AlwaysOn Availability Groups
Database Mirroring
Log Shipping
Multi-site Replication
Multi-site Clustering
Virtualization Multi-site failover
10. High Availability
High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period. (Wikipedia)
• System design allows for minimal downtime in the event of hardware and operating system failures
11. High Availability in SQL Server
SQL Server Failover Cluster Instances
VMware vMotion/Hyper-V Live Migration
Both of these technologies have a single point of failure in shared storage
12. Review: SQL Server HA and DR Options
AlwaysOn Availability Groups
Database Mirroring
Failover Cluster Instances
Log Shipping
Replication
13. Log Shipping
• Transaction Log Backups take place on primary
• External Process ships logs to secondary server(s)
• Data can be read on secondary (except during t-log apply)
[Diagram: Primary Server DB (P), Log Backup, Secondary Server DB (S), and an optional second Secondary Server DB (S)]
14. Log Shipping Pros/Cons
Pros
• Standard Edition
• Supports Multiple Targets
• Can Read Secondary Copies
Cons
• Dependent on Backup on Primary
• Manual Failover Process
• Reasonably High Complexity
15. Replication
• This is a really high level view of replication
• There are numerous topologies and options involved in replication
• This is the nuts and bolts of it
Image Credit: MS Books Online
16. Replication Pros/Cons
Pros
• Can Replicate to Multiple Servers
• Replicate subset of data
• Standard Edition (transactional)
Cons
• Manual Failover
• Unknown RPO
• Can be fragile
• Re-sync process can be ugly
• Also requires connection change for failover
18. Failover Cluster Instances Requirements
Shared Storage (SAN or SMB Share*)
Windows Cluster (Windows Server 2012 Standard Edition)
SQL Server Standard Edition (Two Node Limit)
Cluster Network
Quorum Disk
19. SQL Server Failover Cluster Instances Pros/Cons
Pros
• Connections are transparent
• Failover is automatic
• Allows for whole instance protection
• Multiple servers can be involved
Cons
• Setup is complex
• Hardware can sit idle in some configs
• Single storage doesn’t allow for data protection*
*More on this later
20. Database Mirroring
• Database transactions are compressed and shipped to secondary server (2008+)
• The optional witness server facilitates automatic failover
• Transfer may be sync or async*
[Diagram: Primary Server (Mirror DB), Secondary Server (Mirror DB), and Witness Server]
*Enterprise Edition Only
21. Database Mirroring Pros/Cons
Pros
• Automatic Failover (w/witness)
• Configuration is fast and easy
• Failover happens quickly
• Corrupted pages get fixed on secondary
Cons
• Is per database: multiple DB failovers need scripting
• Async only available in EE
• Marked as deprecated in SQL 2012
• Secondaries are inaccessible (except for snapshots)
22. AlwaysOn Availability Groups
[Diagram: a Windows Cluster spanning Node A in Washington (Instance 1, AG primary) and Node B in Chicago (Instance 2, AG secondary), reached through a Listener Name (AD VCO)]
23. AlwaysOn Availability Groups
Requires SQL Server Enterprise Edition
Windows Cluster
All servers in same Windows Domain
Databases fail over as a group
No Shared Storage Needed
Async and Sync Modes
Automatic and Manual Failover
Supports up to 4 replica copies
Replicas can be read
Backups on secondary copies
24. AlwaysOn Availability Groups Pros/Cons
Pros
• Readable secondaries allow for load distribution
• No shared storage can reduce hardware costs
• Multiple databases failing together is great for complex apps
• Connection string handled gracefully by listener
• Administration all through SSMS
• Config is easy
Cons
• Large topologies lead to $$$ license costs
• Enterprise Edition only
• New feature, so some growing pains
• Changes in application code needed
25. SQL 2012 What’s New (Clustering)
Can cluster using SMB shares, which becomes a more viable option with SMB 3.0 in Windows Server 2012
Failover process is changed: isAlive and LooksAlive go away, replaced with sp_server_diagnostics
Multi-subnet clustering is now available; this is designed for stretch clustering using SAN replication
26. SQL Server 2012 DR New Features
Availability Groups
Mirroring is marked as deprecated
Not sure the long term impact of this for standard edition and DR
No real changes to replication or log shipping
28. Windows Server 2012 Cluster Aware Updating
Great concept: allows for clusters to be automatically rebooted
Works perfectly with SQL Server Failover Cluster Instances
Doesn’t work with AlwaysOn Availability Groups, at the moment
32. Summary
Understand your business need before designing a HA and DR strategy
DR is just like buying insurance: you don’t need it until you do.
Lots of good options for HA and DR in SQL Server for many price points
Always have a plan!
34. Contact Info
@jdanton – Twitter
jdanton1@yahoo.com – Email
Joedantoni.wordpress.com – Blog
Resources from today:
http://bit.ly/SQLSat_Tampa
Editor’s Notes
Hello and welcome to SQL Saturday in Tampa. I’m Joey D’Antoni, and today we will be talking about High Availability and Disaster Recovery in SQL Server. Some of this is specific to 2012, and some of it will apply to older versions of the software.
A little bit about myself. I’m @jdanton on Twitter. How many of you are on Twitter? It’s a really great resource for the SQL community: we have a lot of interaction and discussion there, and there is a great hashtag called SQLHelp, where you can get questions answered by experts. My blog is at joedantoni.wordpress.com. I have posts on a lot of the topics we will talk about today, and instructions on setting up an AlwaysOn environment there. You can reach me by email at jdanton1@yahoo.com. I have a blog post with my slides and additional resources from today’s presentation up at this bit.ly URL. Lastly, stop me at any time if you have questions for me; I’ll do my best to answer, or direct you to an answer.
To start, we’re going to talk a good bit about disaster recovery. How many of you know if your organization has a disaster recovery plan? It should. Even if it’s as simple as saying we move back to a paper-based system if our computers break, that’s a plan that can be followed when everyone is freaking out. Or your company may be an e-commerce site that immediately starts losing money the second something goes down. Then you need a different strategy. Next we’re going to talk about high availability: what it means, and several different ways to implement it within your infrastructure. How many of you have worked with clustering? After covering the high-level material, we will discuss all of the different options for data protection in SQL Server, along with their pros and cons, costs, and complexity. And don’t worry, I will cover options in both Standard and Enterprise Edition. Lastly, I will detail what you need to know about one of SQL 2012’s cornerstone features, AlwaysOn Availability Groups, and we will do a live demo and build a new AG.
So, when talking about disaster recovery, we have to talk about disasters. This first disaster happened just recently in Springfield, MA. A gas worker was responding to a gas leak and accidentally damaged a pipe. He followed his procedure, though, and quickly evacuated all of the buildings in the area. These actions were all according to plan, and as a result there were no fatalities. Unfortunately, a gentleman’s club was destroyed in the process, and the resulting cloud of glitter could be seen for several days. The next picture is Hurricane Sandy. Growing up in New Orleans and starting my professional career in North Carolina, I’ve been through a lot of hurricanes and written disaster recovery plans to cover these situations. When I moved to the Northeast, I thought it became less of a consideration. However, twice in the last two years, we’ve had major storms hit the eastern seaboard. Some firms had really good DR plans and continued operating as normal. Others, however, had the fuel tanks for their generators in the basement and had to organize bucket brigades to run fuel to the generators. The third picture is one I use in my SAN presentations to describe RAID 0: a car hitting a tree. It is here more to show the human aspect of DR, and to remind ourselves that a very important part of the process is to have human backups and well-written documentation. The last picture is of another classic disaster scenario, a building fire. In this case employees of this firm, Initech, had been stealing money via a computer system, but the firm wasn’t able to track them down, because it didn’t have a DR plan.
So before you get started on a disaster recovery plan, there are a couple of things you need to know. Depending on the size of your company and the nature of your business, this can get pretty complicated. Even in a medium-sized business you will probably want to split systems based on criticality. How do we determine the criticality? RTO and RPO. Recovery Time Objective is how long your systems can be down before your company starts losing money. For a customer-facing e-commerce site, this is basically instantly, so you are going to want to dedicate a lot of DR resources to that system. However, a back-office HR reporting system could be down for several days before there is any impact, so maybe that doesn’t get clustering or mirroring. Recovery Point Objective aligns with this pretty closely: it’s how much data you can lose before the business is impacted. Similarly, you wouldn’t want to start losing orders and invoices, so those systems need a high level of protection. In most of my experience doing this work, I’ve grouped systems into tiers, usually 3 or 4, based on application needs. This is a really good first DR exercise to do, even if you aren’t planning on implementing any HA or DR in your environment. It just gives you an idea of which systems are most critical to recover.
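The tiering exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `System` class, the tier cut-offs in `TIER_LIMITS`, and the rule of tiering on the stricter of RTO and RPO are hypothetical choices, not something taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    rto_minutes: float  # how long the system can be down before impact
    rpo_minutes: float  # how much data, measured in time, can be lost


# Hypothetical cut-offs in minutes (tier 1 is the most critical).
# These numbers are assumptions for illustration; pick your own.
TIER_LIMITS = [(1, 15), (2, 240), (3, 1440)]


def assign_tier(system):
    """Tier a system by the stricter of its two objectives."""
    strictest = min(system.rto_minutes, system.rpo_minutes)
    for tier, limit in TIER_LIMITS:
        if strictest <= limit:
            return tier
    return 4  # everything else: restore from offsite backups


def group_by_tier(systems):
    """Return a dict mapping tier number to a list of system names."""
    tiers = {}
    for system in systems:
        tiers.setdefault(assign_tier(system), []).append(system.name)
    return tiers


if __name__ == "__main__":
    inventory = [
        System("web-orders", rto_minutes=5, rpo_minutes=1),
        System("hr-reporting", rto_minutes=3 * 1440, rpo_minutes=1440),
    ]
    print(group_by_tier(inventory))
```

Tiering on the stricter objective is a deliberately conservative choice: a system that tolerates a long outage but almost no data loss still lands in a high tier.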
One more thing about myself: I really like auto racing, Formula 1 specifically. Talk about a highly available environment. Anyway, I’ve always seen this quote in terms of racing: How fast do you want to go? How much do you want to spend? The first car here is the Red Bull RB8, which won both the drivers’ and constructors’ championships in Formula 1 this year. It is custom developed for each race, can corner with a force of 4x gravity, and the team has a budget of about $400M/yr just to build two of these cars and race them. The second car is the Tata. It’s an Indian car that costs less than $5000. Its top speed is under 70 mph. I use these illustrations to demonstrate something: both of these vehicles can get you from point A to point B, just in a different fashion. Some businesses will need extremely available systems with multi-site clusters and tertiary systems, while other companies will feel comfortable with shipping their backups offsite.
Just like buying insurance: a DR plan is really nothing more than an insurance policy. You may never need it, but when you do you will be really thankful. Most of my experience is in the health care industry, and those firms tend to have pretty low tolerance for data loss. Financial services firms also have a low tolerance for data loss and downtime. It tends to cost money in a hurry. Another consideration is the actual location of your business and what sort of natural disasters can impact you. It’s no accident that Google, Facebook and Apple have all built data centers in western North Carolina and Oregon. Those places tend to be out of the way of most disasters. I used to think the mid-Atlantic was pretty safe, but… I can guarantee you five 9s of availability. But it’s going to cost a lot of money: redundant SANs, Enterprise SQL Server licenses, secondary data centers. These things all cost money. A lot of money. However, if you don’t work for Goldman Sachs, fear not: there are some decent options for data protection even with Standard Edition. It may not be as automatic, and you may lose a bit of data, but you can still protect yourself, even if it’s only from hardware failure. Since I’m mentioning the word cloud, everybody drink. Ok. That’s better. Seriously though, if you are implementing a solution on Amazon or Azure, think about DR. Amazon in Reston has had several outages, and customers who were only running in that data center had outages. The ones who spanned Amazon DCs stayed up.
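To put "five 9s" in perspective, here is a small Python sketch that converts an availability percentage into the downtime it permits per year. The function name and the 365-day year are assumptions made for this illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} minutes/year")
```

Five 9s leaves a little over five minutes of downtime per year, while three 9s leaves nearly nine hours, which is one way to see why each extra 9 costs so much more.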
So what are the DR options within SQL Server? Starting at the top we have AlwaysOn Availability Groups, which are only available in the Enterprise Edition of 2012. Database mirroring is a very similar option, and is available in Standard Edition (synchronous only). It started in SP1 of SQL Server 2005. Log shipping has been around forever, and is available in all editions, as is replication. Multi-site clustering is not a cheap option. We’ll talk a little bit more about it later, but there is a great deal of expense involved in setting up a multi-site failover cluster. Lastly, both Hyper-V and VMware offer the ability to migrate a guest operating system from one location to another. These also tend to be pretty expensive, and they aren’t really in the hands of the DBA, so we won’t talk too much more about them here.
A little bit about high availability. These are the two main things I think of High Availability solutions protecting against: hardware failure, in this case the complete meltdown of a server, or operating system failure, i.e. the blue screen of death. Or some combination of the two. I’ve had memory fail and lead to blue screens of death. Fortunately it was in a cluster, so my downtime was minimal.
This is how Wikipedia defines High Availability. In my mind high availability is generally local and doesn’t necessarily provide DR. One of my favorite horror stories from an old job relates to this a bit. We had pretty highly available systems (clusters, VMware clusters) that were all running on a single storage array. I was on call, and got paged about a database being down. I logged into the server and only saw the C: drive; everything else was SAN attached. I called my boss and asked if she knew what was going on. She said, oh sorry, I was going to call: HP came in to decommission a SAN and they took the wrong one. So about a week later we finally got everything back. Total mess. Even with HA, there tends to be a single point of failure at the storage layer. It’s pretty common in most cluster solutions, which is why they provide HA and not DR. Don’t confuse the two!
So what are the major HA technologies in SQL? The most common one is Failover Cluster Instances. One nice thing to know about Failover Clusters is that traditionally they have been dependent on the Enterprise version of Windows. Starting with 2012, Standard Edition has failover clustering built in. Also starting with SQL 2012, SMB file shares are supported as cluster disks, so you might not even need a SAN. The other options I have listed here are VMware vMotion and Hyper-V Live Migration. Both of these solutions are completely transparent to SQL Server (you don’t have to do anything), but neither offers protection against OS failures. They work really well for hardware failures, though. As I mentioned on the previous slide, both of these options do have a single point of failure with storage.
Just to summarize our Native Options that we are going to explore in detail here.