ITCamp 2011 - Paul Roman - High Availability for Exchange 2010
1. High Availability for Exchange 2010
Paul Roman, MVP
Managing Partner, PRAS Consulting
E-mail: paul.roman@pras.ro
Blog: paulroman.pras.ro
Premium conference on Microsoft’s Dev and ITPro technologies @itcampro / #itcampro
2. IT Camp 2011
• Thanks for coming!
• ITCamp is made possible by our sponsors:
3. Session agenda
• Discuss different HA design dimensions:
– Infrastructure design
– Database Availability Group design
– Client experiences
• Implementation Examples
• Q&A
• Feedback & prizes
4. How should you design your IT infrastructure for Exchange HA
INFRASTRUCTURE DESIGN
5. Infrastructure Design
Active Directory Sites
• Active Directory site assignment controls the association
of CAS to Mailbox and Hub to Mailbox
– CAS/HUB service local mailbox servers, “mostly”
– Could be for multiple DAGs
• DAGs can span subnets without special action
– IP address for each MAPI subnet used by DAG
– Configured on DAG object
• Question: When would an AD site span datacenters?
– Answer: When datacenters have LAN-quality communication
• Follow Active Directory guidance for AD site definition
6. Infrastructure Design
Cross-Datacenter Network Configuration
• For site resilience configurations use DHCP to assign
addresses for replication network
– Enables delivery of the typically required static routes
– If using static IP addresses, use netsh instead of route for
configuring static routes
• In terms of latency requirements, Exchange 2010 was
designed with a target round-trip latency of 250ms or less
– Remember, the higher the latency, the more impact to replication
• Configure a DNS TTL on “service access connection records”
that is consistent with your SLA
– E.g. ~5 minutes for a one hour RTO SLA
– Direct association between this time and recovery
– Remember the records might be in different zones!
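The link between record TTL and recovery time can be sketched as simple arithmetic. This is only an illustration: the function names are hypothetical, and it assumes the worst case where a client cached the old record just before the switchover and waits out the full TTL.

```python
def ttl_share_of_rto(ttl_seconds: int, rto_seconds: int) -> float:
    """Fraction of the RTO that a client may spend holding a stale DNS answer."""
    if ttl_seconds < 0 or rto_seconds <= 0:
        raise ValueError("ttl must be >= 0 and rto must be > 0")
    return ttl_seconds / rto_seconds


def time_left_for_switchover(ttl_seconds: int, rto_seconds: int) -> int:
    """Seconds left for the switchover work once the TTL wait is reserved."""
    return max(rto_seconds - ttl_seconds, 0)


if __name__ == "__main__":
    # The slide's example: a ~5 minute TTL against a one hour RTO.
    print(round(ttl_share_of_rto(300, 3600), 3))
    print(time_left_for_switchover(300, 3600))
```

With the slide's numbers, the TTL reserves roughly a twelfth of the RTO and leaves 55 minutes for the switchover itself.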
7. Infrastructure Design
Namespace Planning (Site Resilience)
• Each datacenter should be considered active
when planning for namespaces
• Each datacenter needs the following
namespaces
– OWA/OA/EWS/EAS namespace
– POP/IMAP namespace
– RPC Client Access namespace
– SMTP namespace
• In addition, one of the datacenters will
maintain the Autodiscover namespace
8. Infrastructure Design
Leverage Split-brain DNS
Best Practice: Use “Split DNS” for
Exchange hostnames used by clients
Goal: minimize number of hostnames
mail.contoso.com for Exchange connectivity
on intranet and Internet
mail.contoso.com has different IP addresses in
intranet/Internet DNS
Important – before moving down this
path, be sure to map out all the host
names (outside of Exchange) that you will
want to create in the internal zone
9. Infrastructure Design
What does the namespace design look like?
Datacenter 1 | Datacenter 2
External DNS: Mail.contoso.com | External DNS: Mail.region.contoso.com
External DNS: Pop.contoso.com | External DNS: Pop.region.contoso.com
External DNS: Imap.contoso.com | External DNS: Imap.region.contoso.com
External DNS: Smtp.contoso.com | External DNS: Smtp.region.contoso.com
External DNS: Autodiscover.contoso.com |
ExternalURL = mail.contoso.com | ExternalURL = mail.region.contoso.com
CAS Array = outlook.contoso.com | CAS Array = outlook.region.contoso.com
OA endpoint = mail.contoso.com | OA endpoint = mail.region.contoso.com
Servers: CAS, HT, MBX, AD | Servers: CAS, HT, MBX, AD
Internal DNS: Mail.contoso.com | Internal DNS: Mail.region.contoso.com
Internal DNS: Pop.contoso.com | Internal DNS: Pop.region.contoso.com
Internal DNS: Imap.contoso.com | Internal DNS: Imap.region.contoso.com
Internal DNS: Smtp.contoso.com | Internal DNS: Smtp.region.contoso.com
Internal DNS: Outlook.contoso.com | Internal DNS: Outlook.region.contoso.com
Internal DNS: Autodiscover.contoso.com |
10. Infrastructure Design
Certificate Planning
Best practice: minimize the number of certificates
1 certificate for all CAS servers + reverse proxy +
Edge/Hub
Use “Subject Alternative Name” (SAN) certificate
which can cover multiple hostnames
If leveraging a certificate per datacenter, then
ensure that the Certificate Principal Name is the
same on all certificates
Outlook Anywhere won’t connect if the Principal Name on
the certificate does not match the value configured in
msstd: (default matches OA RPC End Point)
Set-OutlookProvider EXPR -CertPrincipalName msstd:mail.contoso.com
11. Infrastructure Design
Site Resilience Models
There are two key models you have to take into
account when designing site resilient solutions
Datacenter / Namespace Model
User Distribution Model
As mentioned, when planning for site resilience,
each datacenter needs to be considered active
12. Infrastructure Design
User Distribution Models
The locality of the users will ultimately determine
your site resilience architecture
Are users primarily located in one datacenter?
Are users located in multiple datacenters?
Is there a requirement to maintain user population in a
particular datacenter?
Active/Passive user distribution model
Database copies deployed in the secondary datacenter, but
no active mailboxes are hosted there
Active/Active user distribution model
User population dispersed across both datacenters with
each datacenter being the primary datacenter for its
specific user population
13. Infrastructure Design
Client Access Arrays
1 CAS array per AD site
Multiple DAGs within an AD site can use the same CAS array
FQDN of the CAS array needs to resolve to a load-balanced virtual IP
address in DNS
Should only resolve in internal DNS structure
CAS Array does not provide any load balancing -> you need a load
balancer!
Set the databases in the AD site to utilize the CAS array via the
Set-MailboxDatabase RPCClientAccessServer property
By default, new databases will have the RPCClientAccessServer value
set on creation
If database was created prior to creating CAS array, then it is set to
random CAS FQDN (or local machine if role co-location)
If database is created after creating CAS array, then it is set to the CAS
array FQDN
14. How should you design your DAGs
DATABASE AVAILABILITY GROUP
DESIGN
15. DAG Design
Database Copies
• Each DAG member can host 1 copy of
each mailbox database
• Maximum number of databases within a 16-member DAG,
by number of copies per database:
– 1 copy – 1600 databases
– 2 copies – 800 databases
– 3 copies – 533 databases
• Two types of database copies
– HA database copies
– Lagged database copies
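The database limits above follow from dividing the total number of copy slots in the DAG by the copies per database. A minimal sketch, assuming a limit of 100 database copies per server (the value the slide's 1600 figure implies for 16 members); the function and constant names are illustrative only.

```python
DAG_MAX_MEMBERS = 16      # maximum DAG size, from the slide
COPIES_PER_SERVER = 100   # assumption: per-server copy limit implied by 1600 / 16


def max_databases(copies_per_database: int,
                  members: int = DAG_MAX_MEMBERS,
                  copies_per_server: int = COPIES_PER_SERVER) -> int:
    """Largest number of databases a DAG can hold at a given copy count."""
    # Each member can host only one copy of a database, so the copy count
    # cannot exceed the number of members.
    if not 1 <= copies_per_database <= members:
        raise ValueError("copies per database must be between 1 and the member count")
    return (members * copies_per_server) // copies_per_database


if __name__ == "__main__":
    for copies in (1, 2, 3):
        print(copies, max_databases(copies))
```

For 1, 2 and 3 copies this reproduces the slide's 1600, 800 and 533 databases.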
16. DAG Design
Lagged Database Copies
• Lagged copies are only for point-in-time protection
– Logical corruption and/or mailbox deletion prevention
scenarios
– Provide a maximum of 14 days protection
• When should you deploy a lagged copy?
– Useful only to mitigate a risk
– Not needed if deploying a separate backup solution
(e.g. DPM 2010)
• Lagged copies are not HA database copies
– Lagged copies should never be activated!
• Lagged copies have storage implications
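One storage implication is that a lagged copy has to keep every transaction log generated during the lag window. The sketch below estimates that extra log space. It assumes 1 MB log files (the Exchange 2010 log size) and a hypothetical logs-per-day figure that you would measure in your own environment; the function name is illustrative.

```python
MAX_LAG_DAYS = 14   # maximum protection window, from the slide
LOG_FILE_MB = 1.0   # Exchange 2010 transaction log file size


def lagged_log_storage_gb(logs_per_day: int, lag_days: float) -> float:
    """Approximate log space (GB) a lagged copy retains for its lag window."""
    if logs_per_day < 0:
        raise ValueError("logs_per_day must be non-negative")
    if not 0 <= lag_days <= MAX_LAG_DAYS:
        raise ValueError("lag must be between 0 and 14 days")
    return logs_per_day * lag_days * LOG_FILE_MB / 1024


if __name__ == "__main__":
    # Hypothetical database generating 10,000 logs per day with a 7-day lag.
    print(round(lagged_log_storage_gb(10_000, 7), 1))
```

This covers only the retained logs; the database file itself still needs the same space as any other copy.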
17. DAG Design
Controlling Database Copy Activation
• Various scenarios:
– Don’t want to activate database copies on servers in
standby because…
– Want to preclude activation of copies on server X because
of hardware issue or lagged copies…
– Block activation of database copies on a server during
upgrade
• Two ways to block activation of copies
– Set-MailboxServer <Server> -DatabaseCopyAutoActivationPolicy
<Blocked,IntrasiteOnly,Unrestricted>
– Suspend-MailboxDatabaseCopy <Database\Server> -ActivationOnly
18. DAG Design
Sizing
• Question: How many members should be in a DAG?
– Answer: It depends (maximum would be 16)
• The larger the DAG, the better the resiliency
– Consider the implications of a three copy/ six server DAG vs. two
DAGs with three servers and three copies of each database
– Larger DAGs continue to provide as much service as they can after
more failures
• The larger the DAG, the more efficiently the hardware is used
– Distribute active load across all members
• For server count, consider a multiple of the number of copies
you are deploying
19. DAG Design
Sizing
• Question: How many DAGs should I deploy?
– Answer: It depends
• Obviously you will need to deploy multiple DAGs if
you need more than 16 servers
• You may also need multiple DAGs depending on
your site resilience architecture
– If deploying an Active/Active user distribution
architecture, then you should consider deploying 2+ DAGs
– allows you to control locality and not perform a site
activation in the event of a network failure between
datacenters
20. DAG Design
Active/Active User Distribution Sizing
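[Diagram: a single DAG (DAG1) stretched across both datacenters, with Outlook clients in each]
Primary Datacenter: HT2010, CAS-Pri, MBX-A, MBX-B (DAG1 active copies)
Secondary Datacenter: CAS-Sec, HT2010, MBX-C, MBX-D (DAG1 active copies)
FSW: one file share witness for DAG1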
21. DAG Design
Active/Active User Distribution Sizing
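[Diagram: two DAGs, each stretched across both datacenters, with Outlook clients in each]
Primary Datacenter: HT2010, CAS-Pri
Secondary Datacenter: CAS-Sec, HT2010
DAG1 (own FSW): MBX-A, MBX-B active in the primary datacenter; MBX-C, MBX-D passive in the secondary datacenter
DAG2 (own FSW): MBX-E, MBX-F passive in the primary datacenter; MBX-G, MBX-H active in the secondary datacenter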
22. DAG Design
Two Failure Models
• Design for all database copies activated
– Design for the worst case - server architecture
handles 100 percent of all hosted database copies
becoming active
• Design for targeted failure scenarios
– Design server architecture to handle the active
mailbox load during the worst failure case you plan to
handle
• 1 member failure requires 2 or more HA copies and 2 or
more servers
• 2 member failure requires 3 or more HA copies and 4 or
more servers
– Requires Set-MailboxServer <Server> -MaximumActiveDatabases <Number>
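For the targeted-failure model, a first estimate of the per-server active database ceiling can be worked out as below. This is a rough sketch under a simplifying assumption: active databases spread perfectly evenly over the surviving members, which a real copy layout may not allow. The function names are hypothetical.

```python
import math


def copies_needed(member_failures: int) -> int:
    """Minimum HA copies so a database survives the given number of member failures."""
    return member_failures + 1


def active_databases_per_survivor(total_databases: int,
                                  servers: int,
                                  member_failures: int) -> int:
    """Active databases each surviving server must carry, assuming an even spread."""
    survivors = servers - member_failures
    if survivors <= 0:
        raise ValueError("at least one server must survive")
    return math.ceil(total_databases / survivors)


if __name__ == "__main__":
    # Hypothetical DAG: 12 databases on 4 servers, designed for 2 member failures.
    print(copies_needed(2))
    print(active_databases_per_survivor(12, 4, 2))
```

The second result is a candidate value for MaximumActiveDatabases; the server hardware then has to be sized for that many active databases.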
23. DAG Design
It’s all in the layout
• Consider this scenario
– 8 servers, 40 databases with 2 copies
Server 1 Server 2 Server 3 Server 4 Server 5 Server 6 Server 7 Server 8
DB1 DB6 DB11 DB16 DB21 DB26 DB31 DB36
DB2 DB7 DB12 DB17 DB22 DB27 DB32 DB37
DB3 DB8 DB13 DB18 DB23 DB28 DB33 DB38
DB4 DB9 DB14 DB19 DB24 DB29 DB34 DB39
DB5 DB10 DB15 DB20 DB25 DB30 DB35 DB40
DB36’ DB31’ DB26’ DB21’ DB16’ DB11’ DB6’ DB1’
DB37’ DB32’ DB27’ DB22’ DB17’ DB12’ DB7’ DB2’
DB38’ DB33’ DB28’ DB23’ DB18’ DB13’ DB8’ DB3’
DB39’ DB34’ DB29’ DB24’ DB19’ DB14’ DB9’ DB4’
DB40’ DB35’ DB30’ DB25’ DB20’ DB15’ DB10’ DB5’
24. DAG Design
It’s all in the layout
• If I have a single server failure
– Life is good
Server 1 Server 2 Server 3 Server 4 Server 5 Server 6 Server 7 Server 8
DB1 DB6 DB11 DB16 DB21 DB26 DB31 DB36
DB2 DB7 DB12 DB17 DB22 DB27 DB32 DB37
DB3 DB8 DB13 DB18 DB23 DB28 DB33 DB38
DB4 DB9 DB14 DB19 DB24 DB29 DB34 DB39
DB5 DB10 DB15 DB20 DB25 DB30 DB35 DB40
DB36’ DB31’ DB26’ DB21’ DB16’ DB11’ DB6’ DB1’
DB37’ DB32’ DB27’ DB22’ DB17’ DB12’ DB7’ DB2’
DB38’ DB33’ DB28’ DB23’ DB18’ DB13’ DB8’ DB3’
DB39’ DB34’ DB29’ DB24’ DB19’ DB14’ DB9’ DB4’
DB40’ DB35’ DB30’ DB25’ DB20’ DB15’ DB10’ DB5’
25. DAG Design
It’s all in the layout
• If I have a double server failure
– Life could be good…
Server 1 Server 2 Server 3 Server 4 Server 5 Server 6 Server 7 Server 8
DB1 DB6 DB11 DB16 DB21 DB26 DB31 DB36
DB2 DB7 DB12 DB17 DB22 DB27 DB32 DB37
DB3 DB8 DB13 DB18 DB23 DB28 DB33 DB38
DB4 DB9 DB14 DB19 DB24 DB29 DB34 DB39
DB5 DB10 DB15 DB20 DB25 DB30 DB35 DB40
DB36’ DB31’ DB26’ DB21’ DB16’ DB11’ DB6’ DB1’
DB37’ DB32’ DB27’ DB22’ DB17’ DB12’ DB7’ DB2’
DB38’ DB33’ DB28’ DB23’ DB18’ DB13’ DB8’ DB3’
DB39’ DB34’ DB29’ DB24’ DB19’ DB14’ DB9’ DB4’
DB40’ DB35’ DB30’ DB25’ DB20’ DB15’ DB10’ DB5’
26. DAG Design
It’s all in the layout
• If I have a double server failure
– Life could be bad…
Server 1 Server 2 Server 3 Server 4 Server 5 Server 6 Server 7 Server 8
DB1 DB6 DB11 DB16 DB21 DB26 DB31 DB36
DB2 DB7 DB12 DB17 DB22 DB27 DB32 DB37
DB3 DB8 DB13 DB18 DB23 DB28 DB33 DB38
DB4 DB9 DB14 DB19 DB24 DB29 DB34 DB39
DB5 DB10 DB15 DB20 DB25 DB30 DB35 DB40
DB36’ DB31’ DB26’ DB21’ DB16’ DB11’ DB6’ DB1’
DB37’ DB32’ DB27’ DB22’ DB17’ DB12’ DB7’ DB2’
DB38’ DB33’ DB28’ DB23’ DB18’ DB13’ DB8’ DB3’
DB39’ DB34’ DB29’ DB24’ DB19’ DB14’ DB9’ DB4’
DB40’ DB35’ DB30’ DB25’ DB20’ DB15’ DB10’ DB5’
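Why a double failure is sometimes good and sometimes bad can be checked by enumerating every pair of failed servers. The sketch below rebuilds the table's layout in code (each server's second copies sit on its mirror partner: 1 with 8, 2 with 7, and so on) and reports which pairs leave databases with no surviving copy. All function names are illustrative.

```python
from itertools import combinations

SERVERS = 8
DBS_PER_SERVER = 5  # active databases per server in the table above


def build_layout():
    """Map each server number to the set of database numbers it holds a copy of."""
    def block(server):
        first = (server - 1) * DBS_PER_SERVER + 1
        return set(range(first, first + DBS_PER_SERVER))

    layout = {}
    for server in range(1, SERVERS + 1):
        partner = SERVERS + 1 - server  # server 1 pairs with 8, 2 with 7, ...
        layout[server] = block(server) | block(partner)
    return layout


def databases_lost(layout, failed):
    """Return databases that have no copy on any surviving server."""
    all_dbs = set().union(*layout.values())
    surviving = set()
    for server, dbs in layout.items():
        if server not in failed:
            surviving |= dbs
    return all_dbs - surviving


def double_failure_outages(layout):
    """Return {failed pair: sorted lost databases} for pairs that cause an outage."""
    outages = {}
    for pair in combinations(sorted(layout), 2):
        lost = databases_lost(layout, set(pair))
        if lost:
            outages[pair] = sorted(lost)
    return outages


if __name__ == "__main__":
    for pair, lost in double_failure_outages(build_layout()).items():
        print(pair, lost)
```

With this layout, 4 of the 28 possible server pairs take 10 databases offline each, while the other 24 pairs lose nothing. Spreading each server's second copies over several partners would shrink the blast radius of any one pair.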
27. DAG Design
It’s all in the layout
• Now let’s consider this scenario
– 4 servers, 12 databases with 3 copies
Server 1 Server 2 Server 3 Server 4
DB1 DB2 DB3 DB4 DB5 DB6 DB7 DB8 DB9 DB10 DB11 DB12
DB4’’ DB5’’ DB6’ DB1’ DB3’’ DB7’’ DB2’’ DB3’ DB4’ DB1’’ DB2’ DB5’
DB7’ DB9’’ DB10’ DB8’ DB11’ DB12’’ DB10’’ DB11’’ DB12’ DB6’’ DB8’’ DB9’
– With a single server failure:
Server 1 Server 2 Server 3 Server 4
DB1 DB2 DB3 DB4 DB5 DB6 DB7 DB8 DB9 DB10 DB11 DB12
DB4’’ DB5’’ DB6’ DB1’ DB3’’ DB7’’ DB2’’ DB3’ DB4’ DB1’’ DB2’ DB5’
DB7’ DB9’’ DB10’ DB8’ DB11’ DB12’’ DB10’’ DB11’’ DB12’ DB6’’ DB8’’ DB9’
– With a double server failure:
Server 1 Server 2 Server 3 Server 4
DB1 DB2 DB3 DB4 DB5 DB6 DB7 DB8 DB9 DB10 DB11 DB12
DB4’’ DB5’’ DB6’ DB1’ DB3’’ DB7’’ DB2’’ DB3’ DB4’ DB1’’ DB2’ DB5’
DB7’ DB9’’ DB10’ DB8’ DB11’ DB12’’ DB10’’ DB11’’ DB12’ DB6’’ DB8’’ DB9’
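The three-copy layout can be checked the same way. The sketch below transcribes the table, assuming each row's twelve entries map to the four servers in groups of three, and computes the fewest surviving copies any database is left with after a given number of server failures. Names are illustrative.

```python
from itertools import combinations

# Databases held by each server (active and passive copies), read from the
# table above under the three-entries-per-server assumption.
LAYOUT = {
    1: {1, 2, 3, 4, 5, 6, 7, 9, 10},
    2: {4, 5, 6, 1, 3, 7, 8, 11, 12},
    3: {7, 8, 9, 2, 3, 4, 10, 11, 12},
    4: {10, 11, 12, 1, 2, 5, 6, 8, 9},
}


def surviving_copies(layout, failed):
    """Count the copies of each database that remain on surviving servers."""
    counts = {db: 0 for dbs in layout.values() for db in dbs}
    for server, dbs in layout.items():
        if server not in failed:
            for db in dbs:
                counts[db] += 1
    return counts


def worst_case_copies(layout, failures):
    """Fewest surviving copies of any database over all failure combinations."""
    return min(
        min(surviving_copies(layout, set(combo)).values())
        for combo in combinations(layout, failures)
    )


if __name__ == "__main__":
    for failures in (1, 2):
        print(failures, worst_case_copies(LAYOUT, failures))
```

Every database keeps at least two copies after any single failure and at least one copy after any double failure, so no pair of servers can take a database offline in this layout.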
28. DAG Design
It’s all in the layout – Over Subscription
• If you plan to over subscribe the servers then:
– Don’t plan to be perfect!
– Set soft threshold for number of active databases per
server
• In some circumstances databases will fail to mount because
of limit
– Put processes in place for redistributing databases per
server
• After hardware maintenance
• After software maintenance
• Periodically – because of random failures
– SP1 includes a script to provide automated load
balancing
29. DAG Design
It’s all in the layout – Over Subscription
• If you plan to over subscribe the servers then:
– Educate your operations team on implication of over subscription
– Periodically validate you are not too over subscribed
• Run in your worst case scenario for a period of time
– Have a plan on how you handle being too over subscribed
• Reminders:
– Design storage subsystems to handle all database copy I/O and
capacity
– Design CPU and memory to handle the max active database copies
and the passive copies
– Design memory to handle the max active database copies
– Design network subsystem to handle the throughput required to
sustain the active load, the number of target copies, and CI updates
30. DAG Design
It’s all in the layout
• Consider physical hardware situations
where practical (JBOD in particular)
– If servers in DAG are in multiple racks then
spread copies across racks
– If servers are in different rooms in datacenter
then factor that into distribution
– If servers reside on the same network
switch/router, then a network failure can take
out multiple servers
– In summary, minimize possible single points
of failure
31. DAG Design
Storage Architecture
• Deployment on RAID or JBOD will be based
on several factors
– Cost
– Hardware
– Number of copies
– Types of copies
– Single or multi-datacenter
32. DAG Design
Storage Architecture
Servers | 2 HA Copies (Total) | 3+ HA Copies (Total) | 2+ HA Copies / Datacenter | 1 Lagged Copy | 2+ Lagged Copies / Datacenter
Primary Datacenter Servers | RAID | RAID or JBOD | RAID or JBOD | RAID | RAID or JBOD
Secondary Datacenter Servers | RAID | RAID | RAID or JBOD | RAID | RAID or JBOD
33. DAG Design
Replication Concerns
• Replication is always from source to target
– Remember if you have multiple copies in a remote datacenter, you
will have multiple log streams being shipped across the wire
• Exchange 2010 offers compression for log shipping
– Controllable setting for the DAG
– Default is inter-subnet
– MSIT sees 30% compression, but can vary for each customer based
on message profile
• Also have to factor in content indexing
– While an index exists for every copy, the index for a passive copy is
updated by getting changes from active copy’s index
– This communication is not compressed
• How do I size for replication and content indexing impact?
– Use the Exchange 2010 Mailbox Server Role Requirements
Calculator
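Before reaching for the calculator, the log shipping part of the estimate can be approximated by hand. The sketch below is a back-of-the-envelope illustration only: it assumes 1 MB log files, reads "30% compression" as a 30% reduction in bytes on the wire, uses a hypothetical logs-per-hour figure, and ignores content indexing traffic entirely.

```python
LOG_FILE_MB = 1.0  # Exchange 2010 transaction log file size


def log_shipping_mbps(logs_per_hour: float,
                      remote_copies: int,
                      compression_ratio: float = 0.30) -> float:
    """Estimated sustained cross-datacenter log traffic in megabits per second."""
    if logs_per_hour < 0 or remote_copies < 0:
        raise ValueError("inputs must be non-negative")
    if not 0 <= compression_ratio < 1:
        raise ValueError("compression_ratio must be in [0, 1)")
    # One log stream is shipped per remote copy, so traffic scales with copies.
    mb_per_hour = logs_per_hour * LOG_FILE_MB * remote_copies * (1 - compression_ratio)
    return mb_per_hour * 8 / 3600  # megabytes per hour to megabits per second


if __name__ == "__main__":
    # Hypothetical load: 3,600 logs per hour with two copies in the remote datacenter.
    print(round(log_shipping_mbps(3600, 2), 2))
    print(round(log_shipping_mbps(3600, 2, compression_ratio=0.0), 2))
```

Treat the result as a floor: peak-hour log generation and the uncompressed content index updates both push the real requirement higher.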
34. DAG Design
Replication Networks
• Single-network DAG members are fully supported
– Recommendation: have minimum of two networks on each
member server
• Initial DAG network configuration is based on the
enumeration of cluster networks
– Cluster enumerates networks based on subnet
– One cluster network is created for each subnet / port
– Recommendation: Collapse into single MAPI and Replication
DAG networks
• MAPI network may be replication disabled
– Network will be utilized for replication if no other valid
replication path exists
• There is no preference order to replication networks –
chosen at random by Replication service
35. DAG Design
Small Scale Architectures
• Small scale / branch office architectures that require high
availability
– 2-4 servers typically
– Requires Windows Server Enterprise Edition
• There are many different options:
Option | Hardware | Licensing
2 physical servers (all-in-one)* | Requires hardware load balancer | Fewer licenses
2 physical servers utilizing Hyper-V (role separation via VMs)* | Less hardware | More Exchange licenses
4 physical servers (role separation: 2 MBX, 2 HT/CAS) | More hardware | More Exchange and Windows licenses
36. How should you design your DAGs
CLIENT EXPERIENCES
37. Client Experiences
Typical Outlook Behavior
• All Outlook versions behave consistently in a single
datacenter HA scenario
– Profile points to Client Access Server array
– Profile is unchanged by failovers or loss of CAS
• All Outlook versions should behave consistently in a
datacenter failover scenario
– Primary datacenter Client Access Server DNS name is
bound to IP address of standby datacenter’s Client Access
Server
– Autodiscover continues to hand out primary datacenter
CAS name as Outlook RPC endpoint
– Profile remains unchanged
38. Client Experiences
Cross-Site DB Failover Redirect (Outlook Versions)
[Diagram: Outlook 2003, Outlook 2007 and Outlook 2010 clients; Primary Datacenter (HT2010, CAS-Pri) and Secondary Datacenter (CAS-Sec, HT2010); one DAG with MBX-A, MBX-B, MBX-C, MBX-D]
• Outlook 2007 and Outlook 2010: Autodiscover detects profile change and updates client
• Outlook 2003: updates due to ecWrongServer
• Outlook 2003 can’t update if source CAS is unavailable
Key
Active / Passive database copies
Preferred Database Site = PDC (RPCClientAccessServer = CAS-PRI)
Cross Site Connections = Not Allowed
39. Client Experiences
Other Clients
• Other client behavior varies per
technology and scenario:
Client | In-Site *Over Scenario | Out-of-Site *Over Scenario | Datacenter Switchover
OWA | Reconnect | Manual Redirect | Reconnect
Active Sync | Reconnect | Redirect or proxy | Reconnect
POP/IMAP | Reconnect | Proxy | Reconnect
EWS | Reconnect | Autodiscover | Reconnect
Autodiscover | N/A | Seamless | Reconnect
SMTP / Powershell | N/A | N/A | Reconnect
43. Conclusion
• There are many different design
dimensions that have to be considered
when designing for high availability and
site resilience with Exchange 2010
• The choices you will make will determine
the number of copies and hardware you
deploy
– Design choices should be based on customer
requirements
– Exchange 2010 allows you to take advantage
of new options which can lower costs
44. Q&A
45. Don’t forget!
Get your free Azure pass!
• 30+15 days, no CC req’d
– http://bit.ly/ITCAMP11
– Promo code: ITCAMP11
We want your feedback!
• Win a WP7 smartphone
– Fill in your feedback forms
– Raffle: end of the day