2. Agenda
DR Challenges & VMware Site Recovery Manager
New features in SRM version 5
NetApp Value in VMware SRM environments
System and Software Requirements
SRM Workflows and Array Interaction
Best Practices and Configuration Rules
SRM and SRA Configuration Workflows
Limitations
4. Traditional Disaster Recovery
Involves:
– Complex processes and infrastructure
– Precise training, documentation, and execution
Requires:
– Dedicated, identical hardware
– Significant consumption of time and resources
– 2x to 3x the capacity used for production
– Unacceptable levels of WAN utilization
Results in:
– Inability to test or frequently failed tests
– Recovery times of days or weeks
– Ability to protect only a few important workloads
5. VMware Site Recovery Manager
Advanced workflow automation for DR setup,
testing and failover, and failback
− Allows dual purposing of hardware for production or test/dev
− Protects more of the environment for less cost
− Integrates with NetApp SnapMirror and NetApp FlexClone®
[Diagram: a protected site and a recovery site, each running vSphere™ with VMware® ESX®, vCenter™, and SRM, linked by NetApp SnapMirror®]
6. VMware SRM Failover
Configure protection groups at primary site
Build recovery plans at the DR site
After a disaster, execute the recovery plan at the DR site
SnapMirror® break performed automatically
[Diagram: protection groups at the protected site map to a recovery plan at the recovery site, replicated by NetApp SnapMirror®]
7. VMware SRM DR Testing
SRM DR testing verifies that the DR plan is reliable without interrupting production
Automatically creates a private network and FlexClone® volumes for testing
[Diagram: the recovery plan is tested at the recovery site against FlexClone copies of the NetApp SnapMirror® replicas]
8. VMware Site Recovery Manager
SRM is bidirectional
− Sites can protect each other
[Diagram: two sites, each acting as both protected and recovery site, each with its own protection group and recovery plan, replicated in both directions by NetApp SnapMirror®]
9. Site Recovery Manager Major Features
Protect
Test failover
Failover (unplanned)
New in SRM 5:
Centralized administration
vSphere™ replication (host-based replication)
Failover performance improvements
Test failover with storage synchronization
Planned failover with storage synchronization
Automated failback
10. Automated Failback in SRM 5
Reverses the SnapMirror® replication relationships
Resynchronizes storage replication in opposite direction
Reverses the roles of the two sites (only for the VMs in the affected recovery plan)
Then failback is simply the planned failover workflow
[Diagram: NetApp® FAS controllers at the protected and recovery sites, each with FlexVol® volumes containing LUNs; after reprotect, the SnapMirror relationships run in the reverse direction, from the recovery site back to the protected site]
11. Centralized SRM 5 Administration
SRM 5 administration for both sites can be
performed by connecting to either site’s
vSphere™ client
[Diagram: the SRM administrator's vSphere Client connects to either site's vCenter™/SRM server to administer both the protected and recovery sites]
12. vSphere Replication in SRM 5
SnapMirror vs. vSphere Replication:
– Per-VM granularity of replication: vSphere Replication
– Datastore granularity of replication: SnapMirror
– Support for automated failback: SnapMirror
– Supports ESX hosts of different versions: SnapMirror
– Supports physical mode RDMs: SnapMirror
– Supports Fault Tolerance and Linked Clones: SnapMirror
– Supports powered-off VMs: SnapMirror
– Can be used in the same environment: both
13. SRM 5 Performance Improvements
VM reconfiguration step removed from the prepare storage step
– VMs can begin powering on as soon as each VM is reconfigured
Multiple VMs powered on with one request
– Reduces serialization of VM startup
New method for reconfiguration of VM IP
addresses
– Does not require additional reboots of VMs
15. NetApp FAS/V-Series Storage Replication Adapter
Multiprotocol support for FC, iSCSI, and NFS in one adapter
Fully thin-provisioned FlexClone® DR test environments
Support for MultiStore® vFiler® units as SRM storage arrays
[Diagram: an ESX® cluster accessing a NetApp® FAS array through VMFS and NFS datastores and RDM LUNs]
16. SnapMirror and FAS Deduplication
FAS deduplication on primary storage
Only unique data is replicated to the DR site
[Diagram: new data written at the protected site is deduplicated before NetApp SnapMirror® replicates only the unique data to the recovery site]
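Deduplication is enabled per volume from the Data ONTAP 7-Mode console. A minimal sketch, assuming an existing datastore volume named /vol/vmds1 (the volume name is illustrative):
> sis on /vol/vmds1
  (enables deduplication on the volume so new writes are fingerprinted)
> sis start -s /vol/vmds1
  (-s scans and deduplicates the data already in the volume)
> sis status /vol/vmds1
  (shows deduplication state and progress)
Because volume SnapMirror replicates the volume block for block, the space savings carry over to the DR site and less data crosses the WAN.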
17. SnapMirror Network Compression
SnapMirror® native compression
reduces WAN utilization
[Diagram: SnapMirror® compresses the deduplicated data before sending it across the WAN and decompresses it at the recovery site]
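SnapMirror network compression is turned on per relationship in /etc/snapmirror.conf on the destination system and requires a named connection line. A minimal sketch; the controller names, volume names, and hourly schedule are illustrative:
conn_ab=multi(fas_a,fas_b)
conn_ab:vmds1 fas_b:vmds1_dst compression=enable 0 * * *
The first line defines the connection between the source and destination controllers; the second schedules updates at minute 0 of every hour with compression enabled.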
18. FlexClone: Space-Efficient DR Testing
NetApp FlexClone®
– Allows frequent nondisruptive testing
– Reduces capacity needed for DR testing to
only that written during tests
[Diagram legend: aggregate capacity; storage used by replicated datastores; storage used for FlexClone volume creation (metadata only); storage used for writes during DR testing]
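The SRA performs the cloning automatically during a DR test, but the equivalent manual operation on the recovery-site controller is roughly as follows; the clone name, parent volume, and base Snapshot copy name are illustrative:
> vol clone create vmds1_test -s none -b vmds1_dst <snapmirror_base_snapshot>
  (-b clones the replica volume from the named Snapshot copy; -s none removes the
  space guarantee, so the clone consumes capacity only for writes made during the test)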
19. Virtual Storage Tiering with NetApp Flash Cache
Provides the performance boost needed during critical recovery times
– Faster boot times
– Fewer physical disks required
– No SSDs required
– Less disk I/O performed
– Virtual tiering without configuration overhead
21. VMware Requirements for SRM in vSphere
Installed at both protected and recovery sites:
− A vSphere™ vCenter™ Server
− A vSphere Site Recovery Manager Server
− SRM 4.1 requires vCenter Server 4.1
− SRM 5.0 requires vCenter Server 5.0
− ESX® Servers
− Multiple ESX versions from 3.5UX to 5.0, with a mix of update releases, are supported with both SRM 4 and 5; see the compatibility matrix for the appropriate SRM version at www.vmware.com/support/pubs/srm_pubs.html
22. NetApp Adapter Requirements
The NetApp® Storage Replication Adapter
(SRA) is free software available to VMware®
SRM customers. Obtain the SRA from:
Software download page on now.netapp.com or
VMware SRM download page
www.vmware.com/go/download-srm
NetApp licenses required on protected and
recovery site storage
− SnapMirror®
− iSCSI, FCP, or NFS
− FlexClone®
23. NetApp Adapter Requirements
All NetApp® FAS and V-Series platforms qualified with
VMware® vSphere™ are supported
– See supported NetApp platforms at
www.vmware.com/resources/compatibility:
select Storage/SAN from the What are you looking for box, select NetApp from the Partner Name box, and click the Update button
For SRM storage support per SRM version, see
www.vmware.com/pdf/srm_storage_partners.pdf
24. NetApp Adapter Requirements
NetApp Data ONTAP® version support
− 7.2.4 or greater required
− 7.3.2 or greater required for MultiStore® vFiler®
support
− Includes NetApp Data ONTAP 8 operating
in 7-Mode
Support for NetApp Data ONTAP operating in Cluster-Mode is planned for a future release of the NetApp adapter
25. Data ONTAP 7-Mode and Adapter Version Dependencies
NetApp Adapter Version | Minimum Data ONTAP* Version | Supported SRM Version
1.4 NAS | 7.2.2 | 4.x
1.4.2 SAN | 7.2.4 | 4.x
1.4.3 (unified) | 7.2.4 | 4.x
1.4.3 (using vFiler®) | 7.3.2 | 4.x
2.0** (unified) | 7.2.4 | 5.0
2.0** (using vFiler) | 7.3.2 | 5.0
Current as of September 2011. Please check the latest documentation for up-to-date support.
* 7-Mode only, including version 8. Support for Cluster-Mode is planned for a future version of the NetApp® SRA.
** SRA 2.0 requires SRM 5 and cannot be used with SRM version 4.
26. Replication Software Support
Supported Replication Products
– Volume SnapMirror®
– Qtree SnapMirror
Unsupported Replication Products
– SnapVault®
– Failover between MetroCluster™ nodes is not supported; however, MetroCluster can be the source or destination for SnapMirror with SRM
– Support for NetApp Data ONTAP operating in Cluster-Mode is planned for a future release of the NetApp® adapter
27. Upgrading from SRM 4 to SRM 5
VMware® supports upgrade from SRM 4 to
SRM 5
– It is not an in-place upgrade but a remove-and-import process: uninstall SRM 4, install SRM 5, and use the import utility to import the configuration into SRM 5
In a NetApp® environment the SRM 4 adapter
must be uninstalled before uninstalling SRM 4
– Otherwise the later uninstall of the SRM 4 adapter will fail and require manual removal
29. Test Failover with Storage Update
Test Recovery Workflow
– SRM optionally requests update of replication
– NetApp® SRA performs SnapMirror® update as
requested
– SRM requests a temporary copy of replica images
– NetApp SRA creates FlexClone volumes
– SRA adds LUNs to igroups or creates NFS exports
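On the storage side these SRA actions correspond to ordinary Data ONTAP 7-Mode operations. A rough sketch run against the recovery-site controller, with illustrative volume, LUN, igroup, and subnet names:
> snapmirror update vmds1_dst
  (optional: pull the latest changes before testing)
> vol clone create vmds1_test -s none -b vmds1_dst <snapmirror_base_snapshot>
  (temporary writable copy of the replica)
> lun map /vol/vmds1_test/lun1 esx_igroup
  (SAN datastores: present the cloned LUN to the test hosts)
> exportfs -io rw=192.168.2.0/24,root=192.168.2.0/24 /vol/vmds1_test
  (NFS datastores: export the clone to the ESX VMkernel subnet)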
30. Planned Failover with Storage Update
Planned Failover Workflow
− SRM requests SnapMirror® update of replication
− SRM shuts down VMs at protected site
− SRM requests second update of replication
− SRM requests promotion of replica images
− SRA breaks SnapMirror relationships, making
storage writable
− SRA adds LUNs to igroups or creates NFS exports
− SRM recovers VMs at the recovery site
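In Data ONTAP terms, the storage portion of this workflow reduces to a final SnapMirror update followed by a break. A sketch with illustrative names, run on the recovery-site controller:
> snapmirror update vmds1_dst
  (first update while the VMs are still running)
  ... SRM shuts down the VMs at the protected site ...
> snapmirror update vmds1_dst
  (second update captures the final, quiesced state)
> snapmirror break vmds1_dst
  (promotes the replica to read-write)
> lun map /vol/vmds1_dst/lun1 esx_igroup
  (or an exportfs for NFS datastores)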
31. Reprotect for Automated Failback
Reprotect Workflow (to prepare for failback)
− SRM requests reversal of replication
− SRA performs SnapMirror® resync in reverse
direction (which synchronizes replication)
− SRM reverses roles of protected and recovery sites
for affected protection groups
− SRM administrator may now do planned failover to
fail back to original site
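The storage reversal is a standard SnapMirror resync issued in the opposite direction. A sketch, run on the original protected-site controller (now the replication destination); controller and volume names are illustrative:
> snapmirror resync -S recovery_filer:vmds1_dst vmds1
  (re-establishes the relationship in reverse, transferring only the blocks changed since failover)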
33. SRM Best Practices
Following SRM best practices means following the required practices described below to achieve a successful SRM test failover
– The first few tests usually fail
– Follow the prescribed setup workflows
– Make configuration checking part of setup before attempting a test failover
Clone AD servers for DR testing
– Microsoft best practice is to not replicate AD
servers
35. Required Practices for NetApp Adapters
Source volume must be replicated to only
one destination
− Volume fanout with SnapMirror® is not supported
− Failover to the second or a later destination in a SnapMirror cascade relationship is not supported. For example, in an A → B → C cascade, failover between A and B is supported; failover between A and C is not.
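To check a volume for fanout or cascade relationships before using it with SRM, the speaker notes suggest the snapmirror status and snapmirror destinations commands on the source system; the volume name is illustrative:
> snapmirror destinations volsrc
  (lists every SnapMirror destination for the volume; more than one indicates unsupported fanout)
> snapmirror status volsrc
  (shows the state of each relationship; stale entries from old migrations can be removed with snapmirror release)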
36. Required Practices for NetApp Adapters
MultiStore® vFiler® support requires a ZAPI option to be enabled on the physical controller:
>options vfiler.vol_clone_zapi_allow on
37. Required Practices for NetApp Adapters
LUNs at source must be in igroup of type
“vmware”
Note: RDMs use LUN type of Guest OS, igroup type of “vmware”
Adapter 1.4.x and earlier requires that igroups preexist at the recovery site
– Don't forget to create igroups in destination vFiler® units
Adapter 2.0 for SRM 5 automatically creates
igroups during failover and test failover
Replicated LUNs must not be preadded to
igroups; SRM adds them for test and failover
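For adapter 1.4.x, the recovery-site igroups can be created ahead of time from the console. A sketch; the igroup name and initiator WWPN are illustrative:
> igroup create -f -t vmware esx_dr_cluster 50:0a:09:81:86:57:ac:7c
  (-f creates an FC igroup, -i an iSCSI one; -t vmware sets the OS type the SRA expects)
> igroup show
  (verify the igroup type and initiator list; leave the replicated LUNs unmapped)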
38. Required Practices for NetApp Adapters
Exports must be in /etc/exports file
− Temporary manual exports are not discovered
Exports must specify values in the rw security field
− Exports that are rw to all hosts are not discovered
Discoverable: /vol/vol1 -rw=192.168.2.0/24,root=192.168.2.0/24
Not discoverable: /vol/vol1 -rw,root=192.168.2.0/24
Datastores must have VMs in them to be
discovered
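The speaker notes point out that exportfs -p writes a permanent entry into /etc/exports, which is what discovery reads. A sketch using the discoverable export from above:
> exportfs -p rw=192.168.2.0/24,root=192.168.2.0/24 /vol/vol1
  (-p persists the export in /etc/exports with explicit rw and root hosts, making it discoverable by the SRA)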
39. Supported Replication Layouts
Each NetApp controller or vFiler® unit is a separate array in Site Recovery Manager
A VM must have data on only one array in each site
[Diagram: VM5 replicates from NetApp® FAS Array A at the protected site to Array C at the recovery site, and VM6 from Array B to Array D; Arrays A and B form one FAS HA pair, and Arrays C and D another]
40. Unsupported Replication Layouts
A VM with data on more than one array at either site cannot be protected with SRM
[Diagram: VM5 has data on NetApp® FAS Array C and an RDM on Array D at the recovery site, so it cannot be protected]
41. Using Qtrees with SRM
If using volume SnapMirror (VSM) with multiple qtrees exported as NFS datastores, or with each qtree containing LUNs:
– Single-qtree failover is possible but not recommended; use one recovery plan for all qtrees
– Failback of one qtree in a volume with multiple qtrees is not supported, as this could affect other VMs at the failback target site
Using VSM replication with a volume-level export but mounting a qtree in the volume as the datastore mount point is not supported
42. Using Qtrees with SRM
Recommendation
Use the same level for replication and datastore
– If using VSM, export and mount the volume
or store LUN in the volume
– If using QSM, export and mount the qtree or
store LUN in the qtree
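As illustrative /etc/exports entries, matching the replication level to the export and mount level looks like this (the subnet and names are examples):
VSM, volume-level export and mount:  /vol/vmds1 -rw=192.168.2.0/24,root=192.168.2.0/24
QSM, qtree-level export and mount:   /vol/vmds1/qt1 -rw=192.168.2.0/24,root=192.168.2.0/24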
43. Multiple LUNs In One Volume
With multiple LUNs in one volume, all LUNs in that volume should be failed over in the same recovery plan
– Failback of one LUN in a volume with multiple LUNs is not supported, as this could affect other VMs at the failback target site when the VSM relationship is reversed
44. Mixed iSCSI and FC Environments
Supported: Failover in either direction
between sites where one site is using FC and
the other site is using iSCSI is supported
Not Supported: Failover to ESX® hosts having a mix of iSCSI and FC in the same cluster or recovery group is not supported by VMware® or NetApp®
46. Prerequisites and Recommendations
1. There is VMware® infrastructure at each site
– vCenter™ server and ESX® servers
– VMware licensing
2. Install VMware SRM application at each site
– Typically installed on its own VM
– Can share a database server with vCenter
– Enable HTTP access between SRM servers (port 80)
3. Install SRA on the SRM server at each site
4. Supporting infrastructure at each site
– Active Directory for authentication
– DNS for name resolution
– Create a VM placeholder datastore at each site
47. Configuration Workflows
Perform configuration checking as a part of
the setup workflow
At protected site:
1. Verify LUNs are in igroup of type “vmware”
2. Verify NFS exports have -rw security entries
3. Verify proper SnapMirror® relationships exist
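These checks map to a handful of console commands. A sketch of a pre-test checklist run on the protected-site controller:
> igroup show
  (confirm the igroup OS type is vmware)
> lun show -m
  (confirm LUN-to-igroup mappings)
> exportfs
  (confirm NFS exports carry explicit rw= and root= entries)
> snapmirror status
  (confirm relationships are snapmirrored and idle)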
48. Implementation Workflows
At recovery site:
1. Verify controller (or vFiler®) has igroup with OS
type “vmware” (not needed for version 5)
2. Verify proper SnapMirror® relationships exist
3. Verify storage network connectivity between
NetApp® storage ports and ESX® VMkernel ports
(Ethernet VLANs, FC zoning, and so on)
4. Provision storage for placeholder VMs
5. Create private DR testing network if required
6. Check host VM ownership if not using DRS
(if not using VMware® DRS, VMs are started on the ESX
host that owns the placeholder VM)
49. Implementation Workflows
SRM 5 has clickable workflows in the vSphere™
client interface on the SRM Getting Started tab
Follow the steps in order for a successful SRM setup
50. Using the NFS IP Addresses Field
When adding the NetApp® controller in the Array Manager, enter the controller NFS addresses in the NFS IP Addresses field
See the network layout example on the following slide
51. Using the NFS IP Addresses Field
[Diagram: a FAS controller with storage IPs 192.168.50.50 and 192.168.51.50 on a private storage network (these addresses are entered into the NFS IP Addresses field) and an admin IP of 192.168.10.50 on the admin network, serving NAS shared storage]
52. Volume Filtering in NetApp SRA 2.0
In SRM 5, replicated volumes that are not part
of the VMware® environment may be reported
with an error or warning in the SRM interface
In the example used here, the vmcoe volumes are not a desired part of this SRM environment
53. Volume Filtering in NetApp SRA 2.0
The volume filter fields on the array manager
configuration screen can be used to include or
exclude certain volumes from SRM discovery
Volumes containing the text "vmcoe" are excluded
54. SnapMirror by IP Address with SRA 2.0
If SnapMirror® relationships are created on the destination controller using the source IP address, as shown here:
At the protected site SnapMirror status shows:
Source Destination State Lag Status
f3170a:volsrc f3170c:voldst Source 00:05:04 Idle
At the recovery site SnapMirror status shows the IP address instead of the host name of the source controller:
Source Destination State Lag Status
10.72.192.75:volsrc f3170c:voldst Snapmirrored 00:09:29 Idle
55. SnapMirror by IP Address with SRA 2.0
Then you must configure the
use_ip_for_snapmirror_relation option in the
ontap_config.txt file at each site
And configure the IP address to hostname
mapping in the ip_hostname_mapping.txt file
at each site as shown here:
f3170a = 10.72.192.75
f3170c = 10.72.192.78 (entries are case sensitive)
Configuration files are located by default at
C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\storage\sra\ONTAP
57. Limitations
Automated Storage DRS considerations
– SRM 5 is not yet integrated with vSphere™ 5 automated Storage DRS
– If Storage DRS migrates a VM from a replicated datastore to a nonreplicated datastore, the migrated VM will no longer be protected
58. Limitations
When reversing SnapMirror® relationships, the SRA configures the same replication schedule on the new destination
– However, compression and TCP window size currently cannot be set by the SRA and must be set manually after reversal if a nondefault setting is required
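A sketch of what the manual step might look like in /etc/snapmirror.conf on the new destination after reversal, assuming compression and a roughly 2 MB TCP window are required; the connection, controller, and volume names and the hourly schedule are illustrative:
conn_ba=multi(fas_b,fas_a)
conn_ba:vmds1_dst fas_a:vmds1 compression=enable,wsize=2097152 0 * * *
  (compression=enable turns network compression back on; wsize sets the TCP window size in bytes)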
59. Limitations
After reversing a SnapMirror® relationship, SRA 2.0 does not remove the SnapMirror Snapshot™ copies that were used for replication in the other direction
– After replication reversal the administrator can remove the snapshots (see the process in the notes)
– A solution is planned for a future SRA release
60. Limitations
iSCSI initiators should be disabled on the ESX® recovery hosts if those hosts also use FC and ALUA
– If an FC-connected ESX host has the iSCSI initiator enabled, SRM will include both the FC and iSCSI initiators in the failover connection request
– Data ONTAP® does not support adding a LUN to an iSCSI igroup and an FC ALUA-enabled igroup at the same time
– This configuration is also not supported by VMware® SRM
61. Limitations
Non-quiesced SMVI snapshot recovery feature
– Not available in the SRM 5 adapter
– Supported only with SRM 4 adapter 1.4.3
– Has very limited use cases
– Has specific configuration requirements (see the appendix of TR-3671 and the notes below)
62. Field Resources
SE Technical Presentation on Field Portal
https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=36857
– Describes NetApp capabilities, values, best practices, requirements, and limitations in an SRM environment
– Contains links to matrices, docs, and articles
Customer Presentation on Field Portal
https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=24728
– Sales enablement presentation covering NetApp
SnapMirror integration with SRM
– Contains a subset of SE deck slides
63. Resources
NetApp SRA Administration Guide and Release
Notes in SRA package and on the NOW® site
SRM compatibility matrices for SRM, VC, ESX/ESXi
www.vmware.com/support/pubs/srm_pubs.html
For SRM storage support per SRM version
www.vmware.com/pdf/srm_storage_partners.pdf
VMware SRM download page
www.vmware.com/go/download-srm
Supported NetApp platforms
www.vmware.com/resources/compatibility:
select Storage/SAN from the What are you looking for box, select NetApp from the Partner Name box, and click the Update button
VMware SRM Documentation Site
www.vmware.com/support/pubs/srm_pubs.html
64. Additional Resources
NetApp TR-3671: VMware vSphere Site Recovery
Manager in a NetApp Environment
media.netapp.com/documents/tr-3671.pdf
(SRM 4 only, work in progress for SRM 5 update)
RBAC rights for NetApp SRM version 4 Adapters
https://kb.netapp.com/support/index?page=content&id=1010829
RBAC rights for NetApp SRM version 5 Adapter
https://kb.netapp.com/support/index?page=content&id=1013325
Traditional disaster recovery plans depend on a very complex set of processes and infrastructure: duplicate server infrastructure, identical storage infrastructure, processes for getting data to a recovery site, processes for restarting servers, processes for reinstalling operating systems and/or applications, and so on. Because of this complexity, organizations depend heavily on significant amounts of personnel training, on the accuracy and completeness of the documented recovery process, and on perfect execution of that process when an outage does occur. Testing can be disruptive and expensive, so organizations have a limited ability to make sure that all of their training, documentation, and execution is practiced and successful. Traditional storage technologies require 2 to 3 times the production capacity: a complete copy of storage at the DR site and, in some cases, a third copy of data to perform DR testing. In addition, WAN utilization levels can be unacceptable. This is why tests of recovery plans often fail; basic recovery of critical workloads, if successful at all, often takes days or weeks and a significant amount of IT time and resources. Most firms fail to meet the continuity requirements set by their organizations and find themselves unable to protect more than a few of their production workloads, leaving other workloads (e.g., file/print servers, internal web servers, departmental applications) unprotected or poorly protected.
One of the most valuable features in SRM is its ability to enable nondisruptive DR testing. When a DR test is performed, SRM provisions a private network at the recovery site, connects the VMs to that private network, and powers them on. SRM can automate the shutdown of any VMs at the recovery site, such as dev/test VMs, to free compute and memory resources for the DR test and for real failover. SRM integrates with NetApp FlexClone to automatically provision FlexClone volumes for use in the DR test.
Configuring protection, performing test failover, and performing failover for an unplanned outage are workflows that have been in SRM since version 1. In addition, several new workflows were added in SRM version 5; they are described on the following slides.
The most important new feature in SRM 5 is automated failback. Prior to version 5, array-based replication had to be manually reversed and the SRM environment completely reconfigured to use SRM for failback to the original site. SRM 5 introduces a new workflow called reprotect. After performing a planned failover, you execute the reprotect workflow to prepare for failback. In cases of unplanned failover, the reprotect workflow can also be executed, provided that the original storage was not permanently lost and has been recovered to an online state. When the reprotect workflow is executed, SRM uses the NetApp storage adapter to reverse the SnapMirror relationships and resynchronize the storage in the opposite direction. With SnapMirror, only the delta of new data written at the recovery site since failover must be replicated back to the original site when the original storage survives. After the reprotect workflow is executed, the roles of the two SRM sites are reversed, with the original protected site becoming the recovery site and the original recovery site becoming the protected site. Performing an automated failback is then simply a matter of executing the planned failover workflow, which properly shuts down the VMs, synchronizes storage, and starts the VMs at the recovery site.
To enable smaller environments that might not have array-based replication software to use some of the SRM workflows, VMware has introduced a host-based replication capability in SRM 5 called vSphere Replication. This feature uses a replication appliance (a VM) running at each site that drives the replication process. Replication is managed through the vCenter SRM plug-in as a property of each VM, allowing per-VM granularity of replication. This feature supports the planned/unplanned failover and test failover workflows in SRM, but it does not support the reprotect workflow required for automated failback. While SRM 5 supports ESX hosts from 3.5UX through version 5, vSphere Replication requires that hosts run ESXi version 5. Array-based replication such as NetApp SnapMirror is a more efficient means of replicating multiple VMs in one replication job by replicating entire datastores. Array-based replication and vSphere Replication can be used in the same environment.
SRM 5 introduces improvements to the recovery workflows that significantly reduce the time required to recover VMs. When a VM is recovered at the recovery site, the VM settings (configuration information stored in the .vmx file) must be reconfigured to the proper values for the recovery site, such as the unique identifier (UUID) of the datastores the VM is stored in and the networks the VM is to be connected to. Prior to SRM version 5, this reconfiguration was performed per VM in a serial fashion during the prepare storage step of the SRM recovery plan. This meant that no VMs could begin to boot until all VMs had been reconfigured and the recovery plan moved on to the VM startup step. In SRM 5 the VM reconfiguration process has been moved to an independent step, where multiple VMs can be reconfigured and each VM may be powered on as soon as it has been reconfigured. SRM 5 also makes use of a new vSphere API that allows multiple VMs to be started in one API request. Some environments require that the VM guest OS be reconfigured with different network information, such as IP address, subnet mask, or DNS server. To perform this reconfiguration in SRM 4, a VM customization specification could be created that defined the network configuration settings to be changed. Applying a customization specification involves booting the VM once to set the customization, then rebooting it to apply the customization and start the VM, causing a 2X increase in recovery time for each VM. SRM 5 instead uses the VMware VIX API, which allows network reconfiguration to be performed without an additional reboot of the VM.
The NetApp Disaster Recovery Adapter version 1.4.3 for SRM version 4 and the new NetApp FAS/V-Series Storage Replication Adapter version 2.0, which must be used with SRM version 5, are both multiprotocol adapters supporting the FC, iSCSI, and NFS VMware storage protocols in one adapter. The NetApp adapters create fully thin-provisioned FlexClone environments for SRM test failover, automatically turning off volume and LUN space guarantees in created FlexClone volumes, to avoid requiring 2X capacity to perform a DR test and to perform testing without interrupting replication. The NetApp adapters also support configuration of MultiStore vFiler units as storage arrays in SRM. Support for NetApp Data ONTAP operating in Cluster-Mode is planned for a future release of the NetApp adapter.
Volume SnapMirror can also take advantage of the native WAN compression capabilities to further reduce network utilization in low bandwidth WAN environments.
NetApp FlexClone technology allows replicated data to be instantaneously made writable and presented to the ESX hosts for storage. This enables very quick and space efficient DR testing with VMware SRM. The SRM DR testing component leverages FlexClone functionality to create a copy of the DR data in a matter of seconds, requiring only a small percentage of additional capacity for writes that occur during testing. Because of the low capacity requirements and quick provisioning provided by FlexClone, DR test environments can be created frequently to allow more aggressive and regular DR testing schedules.
During recovery from a DR event, performance of the system is critical. NetApp Flash Cache and FAS deduplication provide significantly faster VM boot times, which can dramatically improve overall recovery times. Deploying NetApp Flash Cache reduces the number of disks that must be purchased, eliminates the need to deploy expensive SSD drives, reduces overall disk I/O, and provides virtual storage tiering that requires no administrative overhead or time to configure. Shared data that is read most often by the ESX hosts is readily available in high-speed cache, requiring no disk overhead and improving overall recovery times. The FAS6240 and FAS6280 now come standard with NetApp Flash Cache.
Upgrading from SRM version 4 to version 5 requires an uninstall of the SRM 4 software, then an install of the SRM 5 software and use of a configuration import utility provided by VMware. In NetApp environments it is important to note that you must uninstall the SRM 4 storage adapter before uninstalling SRM version 4. If SRM version 4 is uninstalled before the NetApp adapter, the adapter uninstall will fail and you will have to remove the adapter manually by removing the software and editing the Windows registry.
There are several ways to configure storage layouts and replication in VMware environments using NetApp storage. Adhering to the best practices described in documents such as TR-3749, NetApp and VMware Storage Best Practices, will allow an environment to be supported by SRM. It is recommended to follow the configuration workflows described in the NetApp FAS/V-Series Storage Replication Adapter admin guide and release notes and to make checking the environment configuration a part of configuring the SRM environment. This helps ensure that the first test failover attempted is a successful one. In Microsoft Windows environments, Microsoft does not recommend replication of Active Directory servers, as this can lead to out-of-sync AD databases and the inability of an AD server to service login attempts; see http://support.microsoft.com/kb/875495 for information about AD issues. Instead of replicating AD servers, you should have AD servers permanently provisioned at your SRM recovery site. To provide name resolution and user authentication services in the DR test network, clone the AD server at the recovery site just prior to running the DR test. Once the cloning is done, and before powering on the VM, be sure to connect the cloned AD server to the DR test network. After the AD VM is powered on in the test network, the five Flexible Single Master Operations (FSMO) roles in the Active Directory forest must be seized per the procedure described in the following Microsoft KB: http://support.microsoft.com/kb/255504. The five roles are schema master, domain naming master, RID master, PDC emulator, and infrastructure master. The cloned AD server will then operate privately within the DR test network and can provide AD services for VMs in test failover mode.
Today Site Recovery Manager provides no mechanism for the SRA to report which destination in a multiple-destination replication scenario is the one intended to be used by Site Recovery Manager for DR failover. For this reason each source volume must be replicated to only one destination. If Site Recovery Manager is not properly discovering a replicated datastore, use the snapmirror status or snapmirror destinations command on the source system to determine whether there are any other SnapMirror relationships for that same volume; there might be relationships left over from a data migration or a lab setup. In a cascaded SnapMirror relationship only the first hop of the SnapMirror transfer, from A to B in an A to B to C scenario, may be used with SRM.
To support MultiStore vFiler units as storage arrays in SRM you must turn on the vfiler.vol_clone_zapi_allow option on the physical controller hosting the vFiler. This option allows FlexClone commands to be sent directly to the vFiler.
For LUNs to be recovered by SRM on a NetApp FAS/V-Series storage array, the LUNs must be in an igroup of type "vmware". Remember that RDM LUNs (LUNs connected to the ESX host and then provisioned to VMs) must also be in an igroup of type "vmware", but the LUN type is whatever the guest OS requires. For Disaster Recovery Adapter version 1.4.3 and earlier you must pre-create igroups and add initiators at the recovery site. Typically there will already be some storage connected to the ESX hosts at the recovery site (such as the SRM datastore for temporary placeholder VMs), so an igroup will already exist there. If using MultiStore vFiler units as storage arrays, do not forget to create igroups and add initiators, as vFiler units might not already have LUNs connected and so may not have any igroups. A new feature introduced in NetApp FAS/V-Series SRA version 2.0 is the automatic creation of igroups. SRA 2.0 always automatically creates igroups for failover tests, and during real recovery, if no igroup exists that exactly matches the initiators contained in the SRM recovery request, the SRA creates a new igroup. Note that SRA 2.0 requires SRM version 5. You should never pre-add replicated LUNs to an igroup at the recovery site; this generates an error during recovery. SRM must be allowed to add the LUNs to the igroup.
In order for Site Recovery Manager and the SRA to properly map NFS mounts to ESX hosts, the exports must be listed in the /etc/exports file on the source array. Exports created manually by running exportfs on the CLI will not be detected by Site Recovery Manager. You can use the exportfs command with the -p (permanent) option to add the export to the /etc/exports file automatically. Exports must also contain values in the rw field of the export security settings. If a share is exported using rw with no values (rw to everyone), Site Recovery Manager and the SRA will be unable to determine that the share is used specifically by ESX hosts in the Site Recovery Manager environment. SRM will not allow you to protect empty datastores. If you would like a datastore to be protected, create a dummy VM in that datastore (no OS is required in the VM); SRM will then detect the datastore and allow you to protect it.
Site Recovery Manager requires a 1-to-1-to-1 relationship between [VM] – [protection group] – [array manager]. In this example these layouts are supported because each VM (VM5 and VM6) has data on only one array at either site.
Site Recovery Manager requires a 1-to-1-to-1 relationship between [VM] – [protection group] – [array manager]. In this layout VM5 would not be supported for recovery by Site Recovery Manager because at the recovery site VM5 has data on both Array C and an RDM on Array D. The layout would also be unsupported if VM5 had data on both arrays at the protected site. The requirement is due to behaviors in both SRM and the NetApp adapter. NetApp controllers are administered via their APIs separately, which means each controller must be added to SRM as its own array manager. SRM makes one array manager call to recover all the storage required by a set of VMs, and it does not support making two array manager calls (one to each controller) to recover the storage for a single VM. A VM configured with storage on more than one array therefore cannot be recovered.
There are many ways to configure storage and replication in NetApp and VMware environments. The recommendation is to use volumes as NFS datastores, or to store LUNs in a volume, and use volume-level SnapMirror. It is possible to configure multiple qtrees in one volume, each as an NFS datastore, or to provision multiple LUNs inside a volume, or multiple qtrees with a LUN in each qtree; however, these configurations have implications in SRM deployments, especially as they pertain to the new failback capabilities in SRM 5. If you are using volume-level SnapMirror, have provisioned multiple qtrees in a volume, and are exporting each qtree as a different NFS datastore, SRM supports this for failover, but with implications. Because you are using volume-level SnapMirror, failing over any of the qtrees requires the SRA to perform a SnapMirror break for the whole volume, including all the qtrees, yet only the VMs and qtrees in the failed-over recovery plan are recovered by SRM. If you then perform a failback with SRM 5 of one subset of the qtrees in that volume, there is a risk of disrupting the non-failed-over qtrees at the target failback site. The same is true of a configuration that uses qtrees with a LUN provisioned in each qtree, and of multiple LUNs provisioned in one volume without qtrees. If you are using volume-level SnapMirror, have provisioned qtrees in the volume, and have configured one NFS export at the volume level, but you mount each qtree in the volume as a separate datastore, this configuration is not supported and will report an error in SRM.
Mixing volume-level mirroring, qtree-level mirroring, storing LUNs in qtrees versus LUNs in volumes, and using volume-level NFS exports versus qtree-level exports can all be problematic for an SRM environment as SRM and the SRA attempt to match resources between the protected and recovery sites. The recommendation is to use the same granularity (volume or qtree) for replication, NFS export, and NFS mount point to avoid issues.
If you are using volume-level SnapMirror and have provisioned multiple LUNs in a single volume, you should configure all the LUNs in that volume into a single recovery plan to support failback. If you configure each LUN into a different recovery plan and fail over any individual LUN, then because you are using volume-level SnapMirror the SRA must perform a SnapMirror break for the whole volume, including all the LUNs; however, only the VMs and LUNs in the failed-over recovery plan are recovered by SRM. If you then perform a failback with SRM 5 of one of the LUNs in that volume, there is a risk of disrupting the non-failed-over LUNs at the target failback site.
SRA 2.0 cannot support mixed ALUA configurations. A mixed ALUA configuration is one where a single ESX host, or multiple ESX hosts in the same ESX cluster, has some initiators configured in ALUA-enabled igroups while the same host or hosts have other initiators configured in ALUA-disabled igroups. An example of an unsupported single-host configuration is one where some initiators are used in an ALUA-enabled igroup for VMFS LUNs and different initiators are in an ALUA-disabled igroup for RDM LUNs to support MSCS in a VM. An example of an unsupported cluster configuration is an ESX cluster containing both ESX 3.5 hosts and ESX 4.x or 5.0 hosts, where the ESX 3.5 initiators must be in an ALUA-disabled igroup and the ESX 4.x or 5.0 hosts should be in an ALUA-enabled igroup, and where SRM resource mappings are done at the cluster level.
In the SRA array manager configuration shown here, note that the IP addresses from the NetApp storage networks are added to the NFS IP Addresses field. Multiple addresses are supported, separated by commas. SRM 4 and 5 support NAS datastore connections on private storage networks and ESX host connections to a single storage controller on multiple addresses. These designs are described in TR-3749, NetApp and VMware vSphere Storage Best Practices: http://media.netapp.com/documents/tr-3749.pdf
In this environment we have a private network for storage, where we’ve configured two subnets to use to connect to storage. Some datastores are mounted over one subnet and some over the other.
The storage discovery process was changed in SRM 5 to report replication direction in the SRM interface. Because of this, any replicated storage devices that are detected but are not part of the SRM environment may show up in the SRM interface with an error or warning. This can be prevented using the new volume filtering capability in SRA 2.0.
To prevent undesired storage devices from showing up in the SRM interface, use the volume include and volume exclude lists on the edit array managers screen. If you enter a string of text in the volume include list, the SRA reports only storage devices (NFS datastores or LUNs) on volumes whose names contain the string. If you enter a string of text in the volume exclude list, the SRA omits any storage devices on volumes whose names contain the string. You can enter multiple strings separated by commas to include or exclude multiple string patterns.
If you are using an IP address, a connection name, or a nondefault host name of the source system in a SnapMirror relationship, some extra configuration must be done to enable SRA 2.0 to support the configuration.
This configuration information is required for SRA 2.0 in order to support the use of IP addresses when the SnapMirror relationships are reversed. If the customer environment requires the use of IP addresses for connecting SnapMirror relationships (for example, there is no name resolution for private replication networks), then the SRA at each site must have an internal way of resolving those IP addresses into hostnames. The use_ip_for_snapmirror_relation option and the ip_hostname_mapping.txt file provide support for this type of environment. The configuration files for the NetApp adapter are stored on each SRM server, by default at C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\storage\sra\ONTAP. Entries in the ip_hostname_mapping.txt file are case sensitive.
The default interval for discovery can be changed with the storage.storagePingInterval advanced setting: right-click a site in the SRM sites tab, select Storage, and change the value of storage.storagePingInterval (in seconds).
When reversing replication relationships, SRM requests that the SRA check the configuration of the existing SnapMirror relationship to determine information such as the update schedule, and the SRA applies that update schedule to the reversed relationship. However, the Data ONTAP API call that sets the relationship options does not set a value for compression or SnapMirror TCP window size (wsize). If a customer requires nondefault values for compression (default off) or window size (default 2MB), these settings should be applied after the relationship has been reversed.
To clean up SnapMirror snapshots after replication reversal:
1. Release snapshots on the current destination: on_current_destination> snapmirror release vol_name current_source_filer:vol_name (this simply removes the locks on the unnecessary snapshots at the current destination)
2. Delete the older SnapMirror snapshots from the current source: on_current_source> snap delete vol_name snapshot_name (there are typically two snapshots to delete; because SnapMirror-named snapshots include the name of the active destination system, you can safely delete the SnapMirror snapshots that contain the name of the current source system)
3. Allow the next scheduled update to propagate the snapshot deletion to the current destination, or perform a SnapMirror update on the current destination: on_current_destination> snapmirror update -S current_source_filer:vol_name current_dest_filer:vol_name
This is the same issue described on the slide titled "Mixed iSCSI and FC Environments." Because SRM includes both iSCSI and FC initiators in the same request, and NetApp Data ONTAP does not allow iSCSI initiators to be in ALUA-enabled igroups, a customer using FC with ALUA enabled must have the iSCSI initiator disabled on the ESX recovery host. SRM includes the iSCSI initiator in the failover request even if no iSCSI targets are configured in the initiator, so the iSCSI initiator must be disabled, not simply unconfigured. Note from that slide repeated here: SRA 2.0 cannot support mixed ALUA configurations. A mixed ALUA configuration is one where a single ESX host, or multiple ESX hosts in the same ESX cluster, has some initiators configured in ALUA-enabled igroups while the same host or hosts have other initiators configured in ALUA-disabled igroups. An example of an unsupported single-host configuration is one where some initiators are used in an ALUA-enabled igroup for VMFS LUNs and different initiators are in an ALUA-disabled igroup for RDM LUNs to support MSCS in a VM. An example of an unsupported cluster configuration is an ESX cluster containing both ESX 3.5 hosts and ESX 4.x or 5.0 hosts, where the ESX 3.5 initiators must be in an ALUA-disabled igroup and the ESX 4.x or 5.0 hosts should be in an ALUA-enabled igroup, and where SRM resource mappings are done at the cluster level.
By default the NetApp adapter recovers FlexVol volumes to the last replication point transferred by NetApp SnapMirror. The 1.4.3 release of the NetApp SRM adapter adds the capability to recover NetApp volume Snapshot copies created by NetApp SnapManager for Virtual Infrastructure (SMVI). This feature is not available in NetApp FAS/V-Series Storage Replication Adapter 2.0 for SRM 5. The functionality is currently limited to snapshots created without the option to create a VMware consistency snapshot during the SMVI backup. Considering that by default the adapter recovers the same type of image as a non-quiesced SMVI-created snapshot, this feature currently has limited use cases. An example use case is an application that requires recovery to the specific point in time captured by the SMVI backup: an application runs in a VM that cannot be recovered unless it is recovered from a specific state, a custom script is used with SMVI to place the application into that state, and the SMVI backup is performed during that state, creating the NetApp Snapshot copy. The NetApp Snapshot recovered by the adapter then contains the VM with the application in the required state. Non-quiesced SMVI snapshot recovery configuration rules; the following limitations apply when using this option with the NetApp adapter: (1) The feature requires SMVI version 2.0 or newer. (2) Only recovery to the most recently created SMVI snapshot is allowed. (3) The option to create a VMware consistency snapshot when the SMVI backup is created must be disabled for the SMVI job that creates the backup used for this purpose (the NetApp adapter determines whether this option was disabled before allowing use of the snapshot). (4) The option is supported only with volume-level asynchronous SnapMirror. (5) There should be only one VMware datastore in each NetApp FlexVol volume being recovered. (6) A SnapRestore license is required on the NetApp system. (7) This is a global option applied to all recovery plans executed while the option is set; to enable it for specific recovery plans, change the option before running the desired plan.