5. high availability & disaster recovery
critical factor in any SharePoint deployment
however…
6. high availability
protecting against component failures
• server hardware
• operating system
• service applications
• application pools
• custom development
• …
7. number of nines
• often an important part of a service level agreement (SLA)
• usually counts only unplanned downtime
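To make the "number of nines" concrete, the sketch below converts an availability target into a yearly downtime budget. It assumes a 365-day year and that the whole year counts towards the SLA; the function name `allowed_downtime_minutes` is made up for this illustration.

```python
# Assumption: a non-leap year and an SLA measured over the full year.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget (in minutes) for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Two, three, four and five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes / 60:.2f} hours/year ({minutes:.1f} minutes)")
```

Under these assumptions, three nines (99.9%) leave roughly 8.76 hours of downtime per year. If the SLA covers only unplanned downtime, planned maintenance comes on top of that budget.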
11. it’s all about the business
• involve all stakeholders when planning
• don’t neglect the business impact
• analyze data & systems
• consider non-technical elements
business continuity planning
12. key concepts of bcp
• Risk assessment
• Business Impact Analysis
• Business Continuity Plan
• Disaster Recovery Plan
14. key parameters
Recovery Time Objective (RTO)
When will my system be available again?
Recovery Point Objective (RPO)
How much data can I afford to lose?
Recovery Level Objective (RLO)
To what level am I able to restore?
15. outage at 08:00
timeline: last backup at 20:00 (previous day) → outage at 08:00 → full recovery at 12:00
RPO = 12h, RTO = 4h
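The arithmetic behind this timeline can be sketched as follows: the achieved RPO is the gap between the last backup and the outage, the achieved RTO is the gap between the outage and full recovery. The calendar dates and the helper name `recovery_metrics` are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup: datetime, outage: datetime,
                     recovered: datetime) -> tuple[timedelta, timedelta]:
    """Return (achieved RPO, achieved RTO) for a single incident."""
    if not (last_backup <= outage <= recovered):
        raise ValueError("expected last_backup <= outage <= recovered")
    rpo = outage - last_backup   # data written in this window is lost
    rto = recovered - outage     # the system is unavailable in this window
    return rpo, rto


if __name__ == "__main__":
    # Arbitrary dates; only the times of day come from the slide.
    rpo, rto = recovery_metrics(
        last_backup=datetime(2024, 1, 1, 20, 0),
        outage=datetime(2024, 1, 2, 8, 0),
        recovered=datetime(2024, 1, 2, 12, 0),
    )
    print(f"RPO: {rpo}, RTO: {rto}")
```

Comparing these achieved values with the agreed objectives shows whether an incident stayed within the targets.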
16. reality check
• What are acceptable RTO & RPO times?
timeline: last backup at 07:55 → outage at 08:00 → full recovery at 08:15
RPO = 5m, RTO = 15m
• Are an RTO and RPO of 0 possible at all?
• What about the costs?
17. context is king
pitfalls when designing a SharePoint HA/DR solution
• enterprise infrastructure
• technical skills
• operational readiness
• backup/restore
• documentation
• dependencies on other systems
• 3rd-party tools
• …
18. additional considerations
establish recovery targets
• What should be restored and what not?
• What can be restored and what not?
• Is some data more important than other data?
• How must the restored system behave?
• Balance costs & risks when designing a solution
21. SharePoint options
how can you make SharePoint highly available?
• adding servers for redundancy
• splitting services across servers
• using load balancing techniques
• highly available SQL Server
• virtualization
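Why adding servers helps, and why a single non-redundant tier still caps the result, can be sketched with the standard availability formulas. This is a simplified model that assumes servers fail independently; the per-server figure of 99% and the tier layout are invented for illustration.

```python
from math import prod


def parallel(availability: float, servers: int) -> float:
    """Availability of a tier that stays up as long as one server is up."""
    return 1 - (1 - availability) ** servers


def serial(*tiers: float) -> float:
    """Availability of a farm that needs every tier to be up."""
    return prod(tiers)


if __name__ == "__main__":
    single = 0.99  # assumed availability of one server
    web = parallel(single, 2)   # two load-balanced web servers
    app = parallel(single, 2)   # two application servers
    sql = single                # one SQL Server, no redundancy
    print(f"web tier: {web:.4%}")
    print(f"farm with single SQL Server: {serial(web, app, sql):.4%}")
    print(f"farm with redundant SQL Server: {serial(web, app, parallel(single, 2)):.4%}")
```

With these made-up numbers, the farm with a single SQL Server ends up at roughly 98.98%, below the availability of any one of its servers, which is why the database tier needs its own redundancy.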
23. service applications
how to distribute service applications throughout your farm?
SharePoint takes care of the load balancing for you
24. important considerations
• user profile synchronization service only on 1 server
• search service application can be made fully redundant now
what about disaster recovery?
26. keep it simple
• recycle bin
• unattached content database
• native backup/restore
27. rebuild farm
?
• never simply dismiss this option
• serious drawbacks however
• backup/restore data
• documentation is essential
• script your install
29. warm / hot standby farms
• completely separate farm
• near identical configuration
• same customizations
• separate datastores
• involves some kind of data replication
• replicating service app data has its limits
• manual failover & client redirection
31. stretched farm
a special case…
a lot of dependencies…
some complexity involved…
major design constraints
• network throughput
• network latency
• redundant access infrastructure
• data replication
33. clustering
two flavors
• high availability
• same datacenter
• 2 or more nodes
• shared storage
• automatic failover
• SharePoint is unaware
• high availability or disaster recovery
• multiple datacenters
• 2 or more nodes
• no shared storage
• automatic failover
• SharePoint is unaware
• data replication needed
35. mirroring
essentials
• high availability scenarios
• no shared storage
• SharePoint is aware!
nice to know
• full recovery model
• configured per database
• only one secondary possible
• secondary cannot be accessed
• automatic failover possible
• network constraints
• sync or async
• RBS (SQL filestream) not supported
38. log shipping
essentials
• disaster recovery scenarios
• no shared storage
• backup/restore based
nice to know
• full recovery model
• configured per database
• multiple secondaries possible
• secondary can be read from
• no automatic failover possible
• RPO will generally not be 0
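A rough way to see why the RPO is not 0 with a backup/restore based approach: the secondary only holds log backups that were taken and fully copied before the primary was lost. The sketch below uses a deliberately simplified model (fixed backup interval, fixed copy time, backup duration ignored); the function name and the example values are assumptions.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         copy_duration: timedelta) -> timedelta:
    """Upper bound on lost data if the primary is lost completely.

    Worst case: the primary fails just before the newest log backup
    finishes copying, so the secondary only has the backup before it.
    """
    return backup_interval + copy_duration


if __name__ == "__main__":
    # Illustrative schedule: a log backup every 15 minutes, 2 minutes to copy.
    loss = worst_case_data_loss(timedelta(minutes=15), timedelta(minutes=2))
    print(f"worst-case data loss: {loss}")
```

Shortening the backup interval lowers this bound, but in this model it never reaches zero.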
40. SQL 2012 Availability Group
the newest kid on the block
essentials
• clustering & mirroring evolved
• at the instance level
• no shared storage
• for HA & DR
• simple configuration
nice to know
• automatic failover across single or multiple datacenters
• multiple databases fail over together
• no need for aliases or AddFailoverServiceInstance in SharePoint
• multiple (readable) secondaries possible
• full recovery model
• RBS support
44. single farm / one datacenter
• multiple web servers with load balancing
• multiple application servers
• clustering or mirroring for HA or DR
• consider SQL 2012 availability groups!
45. single farm / two datacenters
• fully redundant network infrastructure
• <1ms latency between datacenters
• load balancing across datacenters
• multiple web servers
• multiple application servers
• mirroring or geo cluster with data replication for HA & DR
• consider SQL 2012 availability groups!
46. two farms / two datacenters
• fully redundant network infrastructure
• log shipping between datacenters for DR
• manual failover
• manual client redirect (network routing, dns)
• sometimes DR farm is read-only
• warm / hot standby
• consider SQL 2012 availability groups!