BEST PRACTICE GUIDE - REPLICATION [MARCH, 2011]




BEST PRACTICE GUIDE

REPLICATION WITH ETERNUS CS800


OVERVIEW


Replication is a feature of the ETERNUS CS800 Series de-duplication appliances that uses TCP over an Ethernet (IP) network to efficiently transport a complete copy of user
data residing on one ETERNUS CS800 (“the source”) to another ETERNUS CS800 (“the target”). High efficiency is achieved by transporting only the unique data
blocks plus metadata from source to target.



SCOPE
Intended Audience: End Users, System Engineers, RTS, Resellers
This document provides best practice guidance when configuring replication between ETERNUS CS800 de-duplication appliances. This is not intended to be a
standalone document.



OBJECTIVE
The value of replication is Disaster Recovery (DR).
          ■The target ETERNUS CS800 can fail back a copy of the data to the same or another ETERNUS CS800.
          ■The target ETERNUS CS800 may be used to directly access the user data at the DR site.
          ■The target ETERNUS CS800 may be physically relocated to another server location for access to the user data.



DEFINITION OF TERMS
A variety of replication terminology is used in this document. This document makes every attempt to use the same terminology as introduced in the ETERNUS
CS800 User’s Guide.
           ■Adaptive de-duplication – The mode of de-duplication which allows data de-duplication to run concurrently with the backup being ingested. The
             de-duplication process will adapt to the speed of the ingest.
           ■Backup Window – In normal use, “backup window” refers to the customer-defined period of time during which the customer data is backed up. It usually
             has a clearly identifiable start and stop time. When used together with deferred de-duplication in an ETERNUS CS800 context, the “backup window”
             refers to a “reservation window” during which de-duplication is suspended so that all Disk I/O can be applied to maximize data ingest in order to
             minimize the normal user backup window. In order to minimize confusion about which “backup window” is being discussed, this document will refer to
             this ETERNUS CS800 deferred de-duplication backup window as the “deferred de-duplication window”.
           ■Deferred de-duplication – The mode of de-duplication which begins only after the deferred de-duplication window. Typically, deferred de-duplication
             begins after the backup ingest is complete.
           ■Deferred de-duplication window – A defined window during which no de-duplication will take place. This allows maximum system resources to be
             devoted to data ingest thus allowing a faster backup. The deferred de-duplication window applies only to the share/partition for which it is defined. It is
             possible to define a second share/partition and perform backups that overlap the same time period. The data written to the share without a defined
             deferred de-duplication window will be subjected to adaptive de-duplication.
           ■De-duplication pool – The term used to refer to the collection of unique data stored in a CS800 de-duplication appliance. The size of the de-duplication
             pool is reported as the After Reduction statistic on the ETERNUS CS800 GUI and is a measure of the disk space occupied by all data backed up to
             ETERNUS CS800 after the data has been de-duplicated and compressed.
           ■Failback – The ETERNUS CS800 procedure that uses replication to copy a replicated share or partition from a target ETERNUS CS800 to another
             ETERNUS CS800 system.







           ■File or cartridge replication – File or cartridge replication (FCR) extends continuous and namespace replication from the share/partition level
            down to the file-directory/virtual cartridge level. FCR can be used to synchronize the content of a share or partition that is concurrently
            accessible at both source and target ETERNUS CS800.
           ■Namespace – The term that Fujitsu applies to metadata required to reconstruct de-duplicated data back into its native application format. It is used in
            phrase combinations such as “namespace replication” or “synchronize the namespace.”
           ■Partition – An ETERNUS CS800 storage destination for data transferred by FC or iSCSI where the structure is considered to be a virtual tape library
            (VTL) and the content is written to virtual tape cartridges.
           ■Recover – The ETERNUS CS800 procedure to make replicated data and its namespace accessible on the ETERNUS CS800 to which it has been replicated. If
            a share was replicated, then a share is recovered. If a partition was replicated, then a partition is recovered. It is not possible to convert a share to a
            partition (or vice-versa) during the recovery procedure.
           ■Share – An ETERNUS CS800 storage destination for data transferred by NAS where the content is treated as files and directories.
           ■Source – The term often applied to the ETERNUS CS800 that is sending a copy of de-duplicated data to a second ETERNUS CS800.
           ■Synchronize – When used in this document, this term means that two entities are made and/or confirmed to be identical. For example, namespace
            replication will synchronize the relevant share and/or partition content and metadata between source and target system. When used in the context of
            “virtual tape cartridge”, “file”, or “directory”, “synchronize” operates at the more granular reference of the context (for example, “synchronize cartridges”)
            between source and target. Consult APPENDIX A – Directory/File or Cartridge Replication for more information about File or Cartridge Replication
            (FCR) and synchronizing at the more granular level.
           ■Target – The label often applied to the ETERNUS CS800 that is receiving a copy of de-duplicated data.



REQUIREMENTS FOR REPLICATION
           ■De-duplicated data – The data must be de-duplicated before it can be replicated. The user can create a NAS share or a VTL partition and specify that
            data written to that share/partition be de-duplicated.
           ■Specified data – The user must specify, on the source system, that a particular share/partition is to be replicated.
           ■Sufficient bandwidth – You need to have a circuit of sufficient bandwidth available to link the source to the target. Both ends of the circuit require TCP.
            The user has a variety of circuit options available.
           ■Specified replication target – Consult the ETERNUS CS800 User’s Guide for procedural details.
            a) The user must first tell the target that it should allow replication from the source system. This is done at the target.
            b) Next, the user must tell the source ETERNUS CS800 the name or IP address of the target device. The source system will immediately check if the
            target is reachable and if replication to that target has been authorized at the target.
           ■Schedule – Implement a schedule for routine namespace replication between source and target. This is optimally scheduled to take place after both the
            backup and de-duplication have completed.



WHAT DATA CAN BE REPLICATED?
Although ETERNUS CS800 can be used to store both de-duplicated as well as non-de-duplicated data at the same time on the same appliance, only de-duplicated
data can be replicated. Data to be replicated must be written to a share/partition that is configured for both de-duplication and replication. Shares/partitions must be
configured for de-duplication at the time they are created. De-duplication cannot be added or removed once a share/partition has been configured. Replication can
be enabled/disabled on a per-share or per-partition granularity even after the share/partition is created as long as the share/partition was created with de-duplication
enabled.



HOW DOES ETERNUS CS800 REPLICATION WORK?
ETERNUS CS800 replication has two phases that work together to synchronize copies between the source and target ETERNUS CS800. Both phases, continuous
replication and namespace replication, are required to maintain synchronization. Continuous replication moves unique blocks in a background process while
namespace replication synchronizes the metadata between the source and target.

Continuous Replication

           ■Continuous replication does not have its own enable/disable command. As long as replication is enabled, continuous replication will seek to replicate the
            unique data blocks between source and target, but only for shares/partitions that have de-duplication and replication enabled.
           ■As the de-duplication process discovers new unique data (data that isn’t already in the local de-duplication pool), it puts a reference to that data in a
            queue for continuous replication to process.








           ■Continuous replication, while processing the queue, asks the target ETERNUS CS800 if it already has a copy of the recently-stored unique data. If the
             target responds that it already has a copy of that data, continuous replication moves on to the next entry in the queue. If the target responds that it does
             not have a copy of that unique data, continuous replication is responsible for moving a copy of that unique data to the target.
           ■Continuous replication is extremely efficient because it only sends inquiries to the target if there is new unique data on the source. That is, there is no
             need to inquire about data previously replicated between source and target.
           ■In this way, continuous replication assures that there is a copy of the unique data blocks for a share/partition also on the target. More information (the
             namespace, also known as metadata) is needed in order to reassemble the data into its original format. Metadata is synchronized by namespace
             replication.
           ■Continuous replication is constantly checking to see if there is anything in its queue. If it finds a queue entry, it immediately processes the item.
           ■Continuous replication is suspended whenever namespace replication is active.
                       □ If a backup occurs while continuous replication is suspended, any new unique data tags will be added to the continuous replication queue for
                         later processing.
                       □ The continuous replication queue will once again be processed whenever namespace replication is not running.
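
The queue behavior described in the list above can be illustrated with a minimal Python sketch. It is a toy model, not the appliance's implementation: the `Target` class, the `process_queue` function, and the block tags are all hypothetical names chosen for illustration.

```python
from collections import deque


class Target:
    """Hypothetical stand-in for the target's de-duplication pool."""

    def __init__(self):
        self.pool = {}  # block tag -> block bytes

    def has_block(self, tag):
        return tag in self.pool

    def store_block(self, tag, data):
        self.pool[tag] = data


def process_queue(queue, source_pool, target, namespace_replication_active=False):
    """Drain the continuous replication queue and return the tags actually sent."""
    sent = []
    # Continuous replication is suspended while namespace replication is active;
    # queued entries simply wait for later processing.
    if namespace_replication_active:
        return sent
    while queue:
        tag = queue.popleft()
        if target.has_block(tag):
            continue  # the target already holds this block, so nothing is sent
        target.store_block(tag, source_pool[tag])
        sent.append(tag)
    return sent


source_pool = {"a1": b"block A", "b2": b"block B", "c3": b"block C"}
target = Target()
target.store_block("b2", b"block B")  # assumed to be on the target already

queue = deque(["a1", "b2", "c3"])  # references queued by de-duplication
suspended = process_queue(queue, source_pool, target, namespace_replication_active=True)
sent = process_queue(queue, source_pool, target)
print(suspended, sent)
```

In this model only the blocks the target lacks cross the link, and nothing is lost while the queue is suspended.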



Namespace Replication
           ■Namespace replication is responsible for synchronizing the metadata between source and target. The metadata is required in order to reassemble the
            de-duplicated data back into the format originally written by the backup application. The data cannot be reassembled without the metadata.
           ■Namespace replication must be enabled on a per-share/partition basis using the GUI.
           ■Namespace replication can be scheduled to occur routinely as often as once per day. It can also be initiated on-demand. Click “Replicate Now” for
            namespace replication on demand.
           ■Namespace replication will normally execute immediately when started either by schedule or on demand. If a namespace replication is already active,
            then subsequent requests are queued, the respective share/partition will show a status of queued, and the queue is processed in FIFO order.

Partial Namespace Replication
Partial namespace replication can occur under the following conditions:

           ■Namespace replication is triggered while a NAS file is open in the share to be replicated.
           ■Namespace replication is triggered while a virtual cartridge from the partition to be replicated is loaded in a virtual drive.
           ■Namespace replication is triggered before all data for a share/partition has been de-duplicated.

This means that not all metadata required for reassembling the data into its original application format is available. It also means that not all data blocks are available
because only unique de-duplicated data blocks are replicated to the target. Consequently, only some of the data can be reassembled into the original application
format on the target until a complete namespace replication is achieved. The potential ramification of a partial namespace replication is that some files may not be
available for a restore. A successful (i.e., not a partial) namespace replication typically catches up 24 hours later if a daily namespace replication schedule is implemented.
Manually clicking the Replicate Now button in the GUI will also allow namespace replication to resynchronize.

In order to avoid a “partial” completion status, it is advisable to schedule namespace replication to occur after all data has been de-duplicated.
            ■In order to avoid partial namespace replication when issuing the “replicate now” command manually, click the Check Readiness button first. Check
              Readiness will determine if all data destined for the respective share/partition has been de-duplicated and report back.
            ■See “When should I schedule namespace replication?” later in this document.
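
The three conditions that lead to a partial namespace replication can be expressed as a simple pre-check. The sketch below is only an illustration of that logic; it does not describe how the Check Readiness button works internally, and the `ShareState` fields and the `partial_replication_reasons` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShareState:
    """Hypothetical snapshot of one share/partition on the source."""
    open_files: int = 0            # NAS files currently open in the share
    loaded_cartridges: int = 0     # virtual cartridges loaded in virtual drives
    bytes_awaiting_dedup: int = 0  # ingested data not yet de-duplicated


def partial_replication_reasons(state):
    """Return the conditions that would make a namespace replication partial."""
    reasons = []
    if state.open_files > 0:
        reasons.append("NAS file open")
    if state.loaded_cartridges > 0:
        reasons.append("virtual cartridge loaded")
    if state.bytes_awaiting_dedup > 0:
        reasons.append("de-duplication not complete")
    return reasons


ready = partial_replication_reasons(ShareState())
busy = partial_replication_reasons(ShareState(open_files=1, bytes_awaiting_dedup=4096))
print(ready, busy)
```

An empty list means none of the listed conditions applies, so a complete namespace replication can be expected.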



WHAT CONTROL DO I HAVE OVER REPLICATION?
There are several commands to control replication for the entire ETERNUS CS800 system. Consult the ETERNUS CS800 User’s Guide for details.

Pause/Resume
         ■Click Pause to pause all namespace and continuous replication. The pause will take effect as soon as the current data block finishes replicating. That is,
          on a low bandwidth replication link, it may take some time before you see the effect of the pause command.
         ■Click Resume to allow replication to resume.

Enable/Disable
         ■Click Enable to enable replication for all shares and partitions that have de-duplication configured.
                   □ CAUTION: If there are shares or partitions that you do not want replicated, then you should enable the shares and/or partitions individually
                     rather than using this GUI command. Refer to the ETERNUS CS800 User’s Guide if you require help with this procedure.





                     □ Namespace replication will begin at the next scheduled time for each share and partition. If there is no namespace replication schedule, then it
                       will depend on when the user clicks Replicate Now for the respective share or partition.
                     □ The continuous replication queue will begin building with the next backup that writes into that share or partition. If there is no namespace
                       replication active and the circuit between source and target is up, then continuous replication will begin moving new unique data to the target.
                       Only new unique data ingested during that backup will be replicated to the target.
           ■Click Disable to disable replication from all shares and partitions that have been configured for de-duplication.
                     □ All in-process and queued namespace replications will attempt to complete before the Disable toggles off their namespace replication.

Replication for an individual share/partition can be managed by enabling and/or disabling replication for that share/partition. Call up the configuration for that
share/partition and edit the replication setting. Refer to the ETERNUS CS800 User’s Guide if you require help with this procedure.



HOW CAN I TELL HOW FAST MY REPLICATION IS PROCEEDING AND HOW MUCH BANDWIDTH MY REPLICATION IS USING?
There is no single number that defines the replication rate. The replication rate depends jointly on the de-duplication rate, the replication queue processing rate,
network loading, and network latency. Replication is broken down into two measurements: (1) Replication Processing Rate, and (2) Replication Ethernet Load Rate.

Data must first be de-duplicated before it can be added to the replication queue. With adaptive de-duplication, data is de-duplicated as it is ingested. In a hypothetical
case, if ingest occurs at 100 MB/S and the rate of change in that data is 5%, then the rate at which new data is encountered is 5% of 100 MB/S, or 5 MB/S.
            ■The rate at which new data is encountered is the same as the rate at which it is placed on the replication queue: 5 MB/S.
            ■Consequently, the rate at which this new data is available for replication via the Ethernet port is 5 MB/S.

This means there is a replication rate that is based on the rate of ingest and a replication rate that is based on the amount of Ethernet loading. In the above
hypothetical example, ETERNUS CS800 is processing the ingest at 100 MB/S and determining what already exists at the replication target. The replication processing
rate is 100 MB/S. Unique data blocks are identified and replicated to the target at a replication Ethernet load rate of 5 MB/S (assuming that there are no
bandwidth or latency bottlenecks in the replication link).

Recap:
           ■Ingest (backup) is at 100 MB/S
           ■De-duplication is at 100 MB/S
           ■Replication is at 100 MB/S because we're verifying that some data already exists at the target (because it already exists in the de-duplication pool) and
           we're transferring a copy of only the data that isn't already in the de-duplication pool.
           ■Side effect: Ethernet loading is 5 MB/S
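
The arithmetic in the recap can be written out as a short Python sketch. The function name `ethernet_load_rate` is an illustrative assumption, and the model assumes, as the text does, that there are no bandwidth or latency bottlenecks in the replication link.

```python
def ethernet_load_rate(ingest_mb_per_s, change_rate):
    """New unique data rate = ingest rate x fraction of the data that is new."""
    return ingest_mb_per_s * change_rate


ingest = 100.0      # MB/S, from the hypothetical example
change_rate = 0.05  # 5% of the ingested data is new unique data

processing_rate = ingest  # adaptive de-duplication keeps pace with ingest
load = ethernet_load_rate(ingest, change_rate)
print(f"processing rate: {processing_rate:.0f} MB/S, Ethernet load: {load:.0f} MB/S")
```

The two figures answer different questions: the processing rate says how fast backup data is being covered by replication, while the Ethernet load says how much of the link is actually consumed.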

Replicating the namespace happens very quickly, typically finishing within seconds if namespace replication is scheduled to occur after de-duplication has completed
and continuous replication has moved all the data blocks.




WHY SHOULD I REPLICATE THE NAMESPACE WHEN I FIRST CREATE A NEW SHARE/PARTITION?
Always replicate a share/partition namespace immediately after it is created and has de-duplication and replication enabled, regardless of whether you have
specified the adaptive or the deferred de-duplication policy.

Replicate the namespace, via the on-demand Replicate Now button, for each new share/partition as soon as it is created and before any data is written to it.
           ■The initial namespace replication of the empty share/partition will run very quickly (in a matter of minutes).
           ■This action establishes the namespace structure for the share/partition on the target so that the first namespace replication after a backup will run
             quickly.

Failure to replicate the empty namespace is not fatal. However, the first-ever namespace replication following a backup may run noticeably slower if the empty
namespace was not replicated first. This will be especially noticeable if a large amount of data has been backed up.








WHEN SHOULD I SCHEDULE NAMESPACE REPLICATION?
Namespace replication interacts with the de-duplication pool for its duration. Therefore, namespace replication will perform optimally if it does not overlap with other
processes (such as de-duplication, space reclamation, restores, Read/Verify, tape creation) that also access the de-duplication pool at the same time. If scheduled
properly, namespace replication will complete in a matter of minutes.

Avoid overlap of the following processes with namespace replication:
          ■De-duplication. You should avoid overlap with de-duplication for two reasons:
                      □ So that you do not end up with a partial namespace replication (see Partial Namespace Replication for more information).
                      □ So that you do not inadvertently slow de-duplication. In ETERNUS CS800 systems that are nearly full to capacity, this could have the side
                        effect of slowing the backup.
          ■Space Reclamation. Every ETERNUS CS800 must eventually reclaim the data blocks occupied by expired data. That can be a very I/O intensive
             process that is best completed as quickly as possible. If replication and space reclamation overlap, both can potentially be slowed by more than a factor
             of 2.
          ■Restores, backup application Read/Verify, and tape creation. All of these processes generate additional I/O. If the data being retrieved from
             ETERNUS CS800 is available from non-truncated space, a cache of native format data, then the impact of the operation is minimal. However, if the data
             first has to be retrieved from the de-duplication pool and reconstructed into native application format, then there will be a noticeable impact on
             performance of all of the overlapping processes.

If you have short discrete backup windows, then it should be relatively easy to determine the optimal schedule for namespace replication.



HOW MUCH BANDWIDTH DO I NEED FOR MY REPLICATION TO BE SUCCESSFUL?
ETERNUS CS800 will transfer only unique data (data that the target does not already have) when replicating from source to target. So you need sufficient bandwidth
to:
       ■Replicate the daily load of new unique data from source to target
       ■Replicate the namespace (typically only a few MB)
       ■Accommodate data growth
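
A rough first estimate of the three items above can be sketched in Python. This is not the sizing tool mentioned in this document. The function name, the 5 MB namespace size, and the 25% growth allowance are illustrative assumptions, and the model ignores latency and protocol overhead.

```python
def required_mbps(daily_unique_gb, window_hours, namespace_mb=5.0, growth=0.25):
    """Bandwidth (Mbps) needed to move one day's unique data within the window.

    namespace_mb and growth are illustrative assumptions, not product values.
    """
    total_bits = (daily_unique_gb * 1e9 + namespace_mb * 1e6) * 8
    base_mbps = total_bits / (window_hours * 3600) / 1e6
    return base_mbps * (1 + growth)


# Example: 100 GB of new unique data per day, replicated within 24 hours.
need = required_mbps(daily_unique_gb=100, window_hours=24)
print(f"{need:.1f} Mbps")
```

The result is a lower bound on nominal link speed; the effective bandwidth that replication can actually use should still be measured on the real link.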

For new ETERNUS CS800 installs: the ETERNUS Pre-Sales Systems Engineer (SE) has a sizing tool that can calculate what your expected effective bandwidth
requirement will be.
          ■Be aware that although you might have “plenty” of bandwidth available, end-to-end latency can significantly impact the ability of ETERNUS CS800 to
            utilize that bandwidth.
          ■If a communications link is already present between source and target location, you should perform an FTP transfer of 50-100 MB of totally random data
            between source and target and measure the performance. That will be a measure of how much bandwidth is available for replication.

NOTE: Totally random data is required so that any WAN optimization device (Riverbed, Silver Peak, etc.) in the circuit does not, without the knowledge of the user,
inflate the FTP transfer rate. ETERNUS CS800 will be replicating only unique data. In some instances, the user may elect to enable encryption during replication.
WAN optimization devices typically do not accelerate replication packets.
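
One way to produce the random test file for the FTP measurement is sketched below in Python. The file name, the 50 MB size, and the 80-second transfer time are made-up illustration values; `os.urandom` supplies random bytes that a WAN optimization device should not be able to compress or de-duplicate.

```python
import os
import tempfile


def write_random_file(path, size_mb=64, chunk_mb=1):
    """Write size_mb megabytes of OS-provided random bytes to path."""
    chunk = chunk_mb * 1024 * 1024
    remaining = size_mb * 1024 * 1024
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
    return os.path.getsize(path)


def effective_mbps(size_bytes, seconds):
    """Convert a measured transfer (bytes, seconds) into megabits per second."""
    return size_bytes * 8 / seconds / 1e6


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "replication_test.bin")  # hypothetical file name
    size = write_random_file(path, size_mb=50)

# Suppose the FTP client reported 80 seconds for the transfer (made-up figure).
rate = effective_mbps(size, seconds=80.0)
print(f"{size} bytes, {rate:.2f} Mbps effective")
```

The transfer itself is done with whatever FTP client is available; only the elapsed time it reports is needed for the calculation.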

If ETERNUS CS800 is already installed and replicating:
         ■The SE can perform the same FTP test measurement to determine effective bandwidth.


HOW LONG WILL MY FIRST-EVER (NAMESPACE) REPLICATION TAKE?
There are several questions that have to be asked before an answer can be provided:

     1.    We need to know when this first-ever replication will be activated. For example:
           □ Will replication be activated at the time that ETERNUS CS800 is installed? In this case, the first-ever replication will correspond with the first-ever backup.
           □ Will replication be activated some time (days/weeks/months) after the first-ever backup to ETERNUS CS800? In this case, there will be a backlog of data
             waiting to be replicated.
           □ Was replication configured but the namespace replication schedule was overlooked? If so, anywhere in the range of “none of the data” to “most of the
             data blocks” may already be at the target and the amount of time for namespace replication may be very short (minutes).
     2.    We need an estimate of how much data will be queued for this first-ever replication.
           □ As described in question 1, above, the amount of data to replicate will vary depending on when this first-ever replication is performed. It could have a
             significant range from “all data in the share/partition” to “only the metadata.”





           □ Although ETERNUS CS800 will only replicate unique data from source to target, it will go through a verification process during replication to make sure
             that all necessary data is replicated to the target and can be reconstructed into native application format.
           □ The more data there is to replicate, the longer it will take.
     3.    We need to know what effective bandwidth is available. See “How much bandwidth do I need for my replication to be successful?”
     4.    What other activities will the source and targets be engaged in (for example, backup, space reclamation, read/verify, tape creation, etc.) that could impact
           the speed of replication?
           □ Refer to the section titled “When should I schedule namespace replication?” for a discussion about competing activities.
           □ Impact of competing activities will depend on both scheduling and duration of the first-ever replication. If it is short in duration, the likelihood of impact is
             minimal. However, if duration is long, then overlap with competing activities is inevitable.

A simple example:
     1. Install two ETERNUS CS800 systems and configure replication
     2. Back up 1 TB of Exchange data in 8 hours.
          □ Exchange servers are usually configured for single-instance store (no longer valid for Exchange 2010), so there is only one copy of any e-mail and
            attachment. That means the first Exchange backup typically has less than 5% de-duplication. Space savings from the initial Exchange backup come
            mainly from compression and not de-duplication.
          □ Exchange data is typically 1.6:1 compressible. We will ignore compression in this simple example.
          □ The rate of change in the content from one Exchange backup to the next can range from 1% to over 20%. The typical rate of change is 10%. The lower
            the rate of change, the more de-duplication is achieved among the backups stored in ETERNUS CS800.
     3. The first-ever replication will need to transfer a copy of nearly the entire first backup: 1 TB. The amount of time required for this transfer depends on the
          effective bandwidth available.
          □ Using T1 (1.544 Mbps), that first-ever replication would take about 60 days. In the meantime, subsequent backups will be introducing more unique data
            that will be added to the continuous replication queue for replication after the current queue is completed.
          □ Using OC1 (51.840 Mbps), that would take about 2 days.
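          □ Using OC3 (155.520 Mbps), the transfer time at line rate is about 14 hours. On links fast enough to keep up with the ingest rate, the first-ever replication
            takes about 8 hours, because continuous replication is happening during the backup and namespace replication ensures that the namespace is
            synchronized between the source and target system.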
          □ The more new data that is backed up for the first time and replicated, the longer, proportionately, the first-ever replication will take.
     4. After the first-ever replication, only the new unique data is replicated to the target. If we assume a typical 10% rate of change between these Exchange
          backups, then the routine full backup’s replication would take…
          □ About 6 days with T1. Obviously this disqualifies T1 as a bandwidth to use for this replication link.
          □ About 1.5 days with T2 (6.312 Mbps), also disqualifying T2 since we need to finish routine replication in a 24-hour window or the data will start building an
            irreconcilable backlog.
          □ About 8 hours with T3 (44.736 Mbps) and higher bandwidths. Replication is not occupying the entire bandwidth during this 8-hour period. The reason the
            duration is estimated at 8 hours is because the backup is happening during the same 8 hours and the unique data is being sent to the target as it’s being
            encountered. The effective load would be 3.5 MB/S out of an available bandwidth of 5.6 MB/S.
As the above list of qualifying questions shows, there is no simple answer to this question. Your ETERNUS CS800 Pre-Sales Systems
Engineer (SE) is your best source of information to answer this question.
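
For readers who want to rerun the T1, OC1, T2 and T3 estimates from the simple example with their own numbers, the underlying arithmetic can be sketched in Python. The sketch assumes decimal units (1 TB = 10^12 bytes) and the full nominal line rate, and it ignores latency, protocol overhead and compression; the function name `transfer_days` is illustrative.

```python
def transfer_days(data_bytes, link_mbps):
    """Days needed to move data_bytes at the full nominal line rate."""
    seconds = data_bytes * 8 / (link_mbps * 1e6)
    return seconds / 86400


TB = 1e12  # decimal terabyte (assumption)

first_t1 = transfer_days(1 * TB, 1.544)      # first-ever replication over T1
first_oc1 = transfer_days(1 * TB, 51.840)    # first-ever replication over OC1
routine_t1 = transfer_days(0.1 * TB, 1.544)  # 10% rate of change over T1
routine_t2 = transfer_days(0.1 * TB, 6.312)  # 10% rate of change over T2
routine_t3_hours = transfer_days(0.1 * TB, 44.736) * 24

# Load on the link when 10% of 1 TB is spread across the 8-hour backup.
load_mb_per_s = 0.1 * TB / (8 * 3600) / 1e6
t3_mb_per_s = 44.736 / 8  # T3 capacity expressed in MB/S

print(f"T1 first: {first_t1:.0f} d, OC1 first: {first_oc1:.1f} d")
print(f"T1 routine: {routine_t1:.1f} d, T2 routine: {routine_t2:.1f} d")
print(f"T3 routine at line rate: {routine_t3_hours:.1f} h")
print(f"load {load_mb_per_s:.1f} MB/S of {t3_mb_per_s:.1f} MB/S")
```

On T3 the line-rate transfer time is shorter than the 8-hour backup, which is why the backup window rather than the link determines the duration in that case.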


HOW CAN I ACCELERATE THE FIRST-EVER REPLICATION?
The first replication of a backup event usually takes significantly more time than later routine replication events. That is because at the time of the first event, virtually
everything in the de-duplication pool on the source is typically new and unknown to the target system. The first replication event will be transferring a larger amount of
de-duplicated data than any of the following routine replication events.

There can be an exception to this "first replication is significantly longer than the others" statement. For example, if you are replicating four remote ETERNUS CS800
systems to the same target, it is possible that one of the other source systems may have already deposited data into the target system de-duplication pool that
duplicates what another wants to send. In that instance, the de-duplication pool content does not have to change. Only the namespace replication needs to occur,
and that happens very quickly because the namespace is typically very small. (Typical namespace size is only a few MB.)

A number of initialization options are available that can decrease the amount of time needed for that first replication. The goal in each of these is to seed the de-
duplication pool of the destination ETERNUS CS800 so that a minimum number of bytes need to be transferred to maintain synchronization between the two
systems.

Option 1: Co-locate the source and target and replicate locally
Attach both the source and target systems on a dedicated GigE network and replicate locally at the highest rate supported by the ETERNUS CS800. This allows the
initial replication to proceed at the fastest possible rate.







After replication completes, the target system can be deployed to its intended location and subsequent replications to maintain synchronization between units will
require significantly less time.

Option 2: Co-locate the source and target with the backup server for the first full backup
Depending on the amount of data to back up, this option may be faster than performing co-located replication on a dedicated GigE network. The Fujitsu Pre-Sales
Systems Engineer can provide advice.

You have three operational options:
          ■Sequentially perform a full backup to the source ETERNUS CS800 and then to the target ETERNUS CS800.
          ■Perform an inline full backup to both source and target at the same time.
          ■Clone the data from source to target after the first full backup completes.
            When considering these operational options, keep in mind the following:
          ■The type of backup (VTL or NAS) must be identical for both ETERNUS CS800 systems.
          ■VTL backups, depending on your server and ecosystem, can run significantly faster than NAS backups.

Steps:
     1.    Co-locate and execute one of the operational options mentioned above. This will place the unique data blocks into the de-duplication pool of each
           ETERNUS CS800.
     2.    Perform a namespace replication from source to target.
           □ Wait for namespace replication to complete before proceeding.
           □ Very little, if any, additional unique data will be transferred during this namespace replication.
           □ This establishes a recovery point for the source in the target device.
           □ This recovery point will have a copy of the namespace from the source.
           □ This namespace copy will have pointers to all the blocks in the target’s de-duplication pool.
           □ Effectively, each unique block in the target’s de-duplication pool will have 2 subscribers: the original process that put the unique blocks into the target’s
             de-duplication pool, and the copy of the namespace from the source.
     3.    Delete the clone/inline copy save set references in the backup application catalog by expiring the save sets and releasing the media.
     4.    Delete the share/partition on the target ETERNUS CS800 that received the backup or clone copy in step 1.
           ■Wait for this command to complete before proceeding.
           ■This will remove all pointers from that share/partition to the unique blocks in the de-duplication pool. The unique blocks will not disappear or become
             eligible for space reclamation because the namespace replication that you performed in step 2 (above) is pointing to the same unique blocks. Only
             unique blocks with zero “subscribers” pointing to them are eligible for space reclamation.
           ■Failure to do this will keep the original data around forever and can impact the amount of space available for future replication and data retention.
     5.    Deploy the target ETERNUS CS800 to its intended location.

Once deployed, the target may need a day or two to replicate unique data from new backups that may have taken place while the target was in transit.
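
The “subscriber” bookkeeping described in steps 2 and 4 above behaves like reference counting. The following is a minimal sketch of that idea, not the ETERNUS CS800 implementation; the DedupPool class, its method names, and the block and namespace names are hypothetical.

```python
from collections import Counter


class DedupPool:
    """Toy model of a de-duplication pool with per-block subscriber counts."""

    def __init__(self) -> None:
        self.subscribers = Counter()  # block id -> number of namespaces pointing at it
        self.namespaces = {}          # namespace name -> set of block ids

    def add_namespace(self, name: str, block_ids: set) -> None:
        """Register a namespace (share, partition, or replicated copy)."""
        self.namespaces[name] = set(block_ids)
        self.subscribers.update(block_ids)

    def delete_namespace(self, name: str) -> None:
        """Remove a namespace and drop its pointers to the blocks."""
        self.subscribers.subtract(self.namespaces.pop(name))

    def reclaimable(self) -> set:
        """Blocks with zero subscribers are eligible for space reclamation."""
        return {b for b, n in self.subscribers.items() if n <= 0}


pool = DedupPool()
blocks = {"b1", "b2", "b3"}
pool.add_namespace("seed_partition", blocks)        # step 1: local full backup
pool.add_namespace("replicated_namespace", blocks)  # step 2: namespace replication
pool.delete_namespace("seed_partition")             # step 4: delete the share/partition
print(pool.reclaimable())  # empty set: the replicated namespace still subscribes
```

In this model, skipping step 4 leaves every block with two subscribers, so blocks that the source later expires would still be held on the target by the seed partition.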

Option 3: Use physical tape to initialize the target ETERNUS CS800
Depending on the amount of data to back up, this option may be faster than performing co-located replication on a dedicated GigE network. The ETERNUS CS800
Pre-Sales Systems Engineer can provide advice.

It is essential that the type of backup (VTL or NAS) is preserved during this process. Physical tape is only the transport medium and your process of writing the data
to the tape at the source must be precisely reversed when reading data from the tape and writing it to the target. While there may be other methods and/or utilities
for accomplishing this, this Best Practices option will only focus on using the customer backup application and VTL to accomplish this “seeding”.

           ■You must engage the same backup application in this process at both the source and target in order to preserve the formatting and metadata inserts of
            the original backup application. Failure to do so will result in having that same data replicated later and not being fully recognized. This means it will be
            stored a second time with its new application format.
           ■Failure to follow this procedure correctly may mean that the first remote replication will take a very long time.
           ■Failure to follow this procedure correctly may mean that your data will consume more disk space on the target and may cause the target to run out of
            disk capacity sooner than expected.


Steps:
     1.    Use your backup application to create a clone copy of a recent backup that you did to tape (either virtual or physical tape).
           ■A copy of the most recent backup assures that you have the majority of new unique data.
           ■If you have a fairly recent tape copy, you can use it instead of creating a new tape copy because a recent copy will typically have more than 80% of the
             data that you will be replicating to the target. The older the tape copy, the less useful it will be.

     2.    Transport that tape copy to the location of the target.
     3.    Create a (temporary) VTL partition on the target.
           ■You will clone/copy the tape to this partition in order to initialize the de-duplication pool with a copy of the unique data.
     4.    Using the same backup application at the target as you used at the source:
           a. Import the cartridge to the backup application. This will make the contents of the cartridge accessible to the backup application. The contents will be
              identifiable as one or more backup save sets.





           b. Duplicate/clone the contents of that cartridge to a virtual cartridge in the temporary partition of the target. This will create a copy of the de-duplicated
              data in the target. Later, when you perform your first namespace replication, the data that is already in the target’s de-duplication pool will not need to be
              transferred, thereby significantly speeding up the namespace replication process.
     5.    Perform a namespace replication from source to target.
           ■Wait for namespace replication to complete before proceeding.
           ■Very little, if any, additional unique data will be transferred during this process.

NOTE: The more backups that have occurred at the source since the tape was written and copied to the target, the more new unique data there will be that has to be
transferred to the target via replication.

           ■This establishes a recovery point for the source in the target.
           ■This recovery point will have a copy of the namespace from the source.
           ■This namespace copy will have pointers to all the blocks in the target’s de-duplication pool that are in common with the source.
           ■Effectively, each unique block in the target’s de-duplication pool will have 2 subscribers: the original process that put the unique blocks into the target’s
             de-duplication pool, and the copy of the namespace from the source.

     6.    Delete the clone/inline copy save set references in the backup application catalog by expiring the save sets and releasing the media.
     7.    Additional cleanup step: delete / expire the references created from the imported tape.
     8.    Delete the temporary partition on the target.
           ■Wait for this command to complete before proceeding.
           ■This will remove all pointers from that temporary partition to the unique blocks in the de-duplication pool. The unique blocks will not disappear or become
             eligible for space reclamation because the namespace replication that you performed in step 5 (above) is pointing to the same unique blocks. Only
             unique blocks with zero “subscribers” pointing to them are eligible for space reclamation.
           ■Failure to delete this temporary partition will keep the original data around forever and can impact the amount of space available for future replication and
             data retention.
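
To illustrate why seeding shortens the first remote replication, the sketch below compares block fingerprints on the source with those already present in the target’s pool; only the difference would need to cross the network. It is a simplified model under stated assumptions: fixed-size chunks and SHA-256 fingerprints are used purely for illustration, and the function names are hypothetical. This guide does not describe how the ETERNUS CS800 segments or fingerprints data.

```python
import hashlib


def fingerprints(data: bytes, chunk_size: int = 4) -> set:
    """Split data into fixed-size chunks and return their SHA-256 digests."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def blocks_to_transfer(source: bytes, target_pool: bytes) -> int:
    """Count source chunks whose fingerprint is missing from the target pool."""
    return len(fingerprints(source) - fingerprints(target_pool))


tape_copy = b"AAAABBBBCCCCDDDD"       # backup that was shipped on tape
source_now = b"AAAABBBBCCCCDDDDEEEE"  # source after one more backup

print(blocks_to_transfer(source_now, b""))        # unseeded target: 5
print(blocks_to_transfer(source_now, tape_copy))  # seeded target: 1
```

The single remaining block in the seeded case corresponds to the NOTE above: data backed up at the source after the tape was written still has to be replicated.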


WHEN SHOULD I USE ENCRYPTION WITH REPLICATION?

ETERNUS CS800 offers the ability to encrypt data while in transit. That is, the source ETERNUS CS800 encrypts the blocks when sending them. The target
ETERNUS CS800 decrypts the blocks upon receipt and stores them unencrypted. AES-128 encryption is used.

Customers who have VPNs (virtual private networks) or encrypted circuits typically have no need to encrypt replication data with the ETERNUS CS800.

Customers who use public networks or have ultra-high security requirements for their data may wish to encrypt replication data that is in transit.


APPENDIX A – DIRECTORY/FILE OR CARTRIDGE REPLICATION

What is File or Cartridge Replication?
File or Cartridge Replication (FCR) extends continuous and namespace replication from operating at a share/partition level and zooms in to the file-directory/virtual
cartridge level. FCR can be used to synchronize the content of a share or partition that is concurrently accessible at both source and target.

FCR applies only to ETERNUS CS800 de-duplication appliances running v1.3.1 or later firmware. Consult the ETERNUS CS800 User’s Guide to learn how to
configure FCR.
Using VTL as an example:
           ■One could configure FCR for each virtual cartridge in the virtual library.
           ■When an FCR cartridge is written to, continuous replication will transfer de-duplicated unique data to the replication target.
           ■Once the cartridge is unmounted, FCR will wait for any trailing de-duplication and continuous replication traffic for that cartridge to complete. Then FCR
            will transfer the namespace for that cartridge to the destination ETERNUS CS800.
           ■Once completed, this process allows immediate access to that cartridge and the new data on it to servers accessing the destination ETERNUS CS800.

ETERNUS CS800 has reserved “replication threads” just for FCR. While these replication threads may end up competing for replication bandwidth, they significantly
shorten the namespace replication wait time for data written to a cartridge. This is almost like clicking Replicate Now on the ETERNUS CS800 GUI after every
cartridge eject, but doing so automatically rather than manually.
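
The cartridge-level sequence described in the VTL example above can be sketched as a small event model. This is an illustrative sketch only; the class, method names, barcode, and log messages are hypothetical and do not correspond to an ETERNUS CS800 API.

```python
class FcrCartridge:
    """Toy model of one virtual cartridge configured for FCR."""

    def __init__(self, barcode: str) -> None:
        self.barcode = barcode
        self.pending_blocks = []    # unique blocks not yet on the target
        self.target_blocks = set()  # unique blocks already on the target
        self.namespace_synced = False
        self.log = []

    def write(self, block: str) -> None:
        """Writing queues unique data for continuous replication."""
        self.namespace_synced = False
        if block not in self.target_blocks and block not in self.pending_blocks:
            self.pending_blocks.append(block)

    def unmount(self) -> None:
        """On unmount, drain trailing traffic first, then send the namespace."""
        while self.pending_blocks:  # wait for trailing replication traffic
            block = self.pending_blocks.pop(0)
            self.target_blocks.add(block)
            self.log.append(f"replicated {block}")
        self.namespace_synced = True  # the namespace is always sent last
        self.log.append(f"namespace sent for {self.barcode}")


cart = FcrCartridge("VT0001")
for blk in ("b1", "b2", "b1"):  # the repeated block is not queued twice
    cart.write(blk)
cart.unmount()
print(cart.log)
```

The point of the ordering is that the namespace only references blocks that have already arrived, which is what makes the cartridge usable on the target as soon as FCR completes.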

Advantages:
■FCR means that the target VTL is synchronized down to the cartridge level shortly after a cartridge is unloaded and the data has been de-duplicated.
■This enhances data availability at the target. Data is immediately accessible at the target after FCR completes.

When should I use FCR?

FCR fulfills one or more requirements of various user groups. If you have one or more of those requirements, then you should use FCR.
            ■Some users want the assurance that data has been replicated as early as possible in order to minimize the chance that a backup does not have a DR
              copy. FCR fulfills that requirement.




           ■Some users want to access the DR copy as early as possible after the backup. FCR fulfills that requirement.
           ■Some users want their backup application to create a physical tape at the target (replicated) site. FCR ensures the unloaded virtual cartridges between
            the source and target are identical.
           ■Some users want to perform namespace replication more than once per day. FCR does NOT fulfill this requirement. See “Why should I do namespace
            replication if I’m using FCR for my entire share/partition?” below.

Why should I do namespace replication if I’m using FCR for my entire share/partition?

FCR assures that specific content at both source and target is identical after synchronization.
         ■For NAS shares – this is at the file or directory level and occurs via a CLI or GUI command.
         ■For VTL partitions – this is at the virtual cartridge level and occurs when a virtual cartridge configured for FCR via the GUI is unloaded from a virtual tape
           drive.

FCR synchronization assures that data from the source is accessible on the target as soon as possible. “Accessible” means that the data from the source is
synchronized to an active share or partition. It does not mean that the entire namespace for the share/partition on the source has been replicated to the target.
Share/partition-level namespace replication only occurs when a user clicks on Replicate Now in the GUI or if namespace replication for that share/partition has been
scheduled to occur routinely.

Users must still perform routine namespace replication for the shares/partitions on the source that they want to replicate. Unless routine namespace replication is
performed, any share/partition Recover action would not include the most recent changes synchronized through FCR that have occurred since the most recent
namespace replication.
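
The distinction between FCR synchronization and share/partition-level namespace replication can be sketched with a toy model. The class and method names below are hypothetical and the model is deliberately simplified; it only illustrates that a Recover action returns what the last namespace replication captured.

```python
class ReplicationTarget:
    """Toy model of a target that receives FCR updates and namespace replication."""

    def __init__(self) -> None:
        self.active = {}          # active share/partition, updated by FCR
        self.recovery_point = {}  # snapshot taken by the last namespace replication

    def fcr_sync(self, item: str, version: str) -> None:
        """FCR synchronizes a single file, directory, or cartridge."""
        self.active[item] = version

    def namespace_replication(self, source_state: dict) -> None:
        """Share/partition-level replication stores a complete recovery point."""
        self.recovery_point = dict(source_state)

    def recover(self) -> dict:
        """Recover returns what the last namespace replication captured."""
        return dict(self.recovery_point)


source = {"VT0001": "monday backup"}
target = ReplicationTarget()
target.namespace_replication(source)         # routine namespace replication
target.fcr_sync("VT0001", "monday backup")

source["VT0001"] = "tuesday backup"          # new data written on the source
target.fcr_sync("VT0001", "tuesday backup")  # FCR keeps the active copy current

print(target.active["VT0001"])     # tuesday backup
print(target.recover()["VT0001"])  # monday backup, until namespace replication runs again
```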

How do I recover data that has been synchronized with FCR but for which no namespace replication has yet occurred?

Data that has been synchronized to a target with FCR can be recovered to the source using the steps described in this section, depending on the circumstance. It is
assumed that this would be part of a disaster recovery procedure.
           ■If the user only wanted to replicate a copy from the target to a third ETERNUS CS800, then this is a simple situation of establishing and executing
             namespace replication.
           ■If the user had some disaster at the source location that occurred after a full namespace replication and no other data had been written and
             synchronized via FCR, then the user would perform a normal replication failback as documented in the ETERNUS CS800 User’s Guide.
           ■The procedures suggested below would only be followed if a disaster happened on the source after FCR updates had occurred on the replication target
             and it was vital to retrieve a copy of the most recent library state.
           ■It is not expected that the procedures suggested below will be used routinely, but only as part of a disaster recovery procedure.

Recovering FCR-updated VTL partitions from the target

If you had been replicating a VTL partition using the combination of namespace replication and FCR and now wanted to replace everything in the source partition
with everything in the active VTL partition on the target, do the following:

     1.    Replicate the active partition from the target back to the source. (All data already exist on the source, so this namespace replication will complete quickly.)
     2.    Delete the original partition on the source. (You must do this because the original partition contains duplicate barcodes to the partition you replicated to the
           source in step 1. ETERNUS CS800 will not allow identical barcodes in active partitions.)
     3.    Recover the replicated partition and give it the same (or different) name as before.
     4.    Populate the recovered partition with tape drives, as before.
     5.    Connect the VTL to your backup application.
     6.    Perform an inventory to identify where the cartridges are located.

NOTE: A backup application Import should not be necessary because the backup application catalog should already reflect any new data that had been backed up.
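
Step 2 exists because of the duplicate-barcode restriction. If you keep a list of cartridge barcodes for each partition, a check like the following shows which barcodes would clash. This is a hypothetical helper for illustration; how you obtain the barcode lists is left open, and the barcodes shown are made up.

```python
def duplicate_barcodes(existing: list, incoming: list) -> list:
    """Return barcodes that appear in both partitions, sorted for stable output."""
    return sorted(set(existing) & set(incoming))


# Hypothetical barcode lists for the two partitions on the source.
original_partition = ["VT0001", "VT0002", "VT0003"]              # still active
replicated_partition = ["VT0001", "VT0002", "VT0003", "VT0004"]  # received in step 1

clashes = duplicate_barcodes(original_partition, replicated_partition)
if clashes:
    print("Delete the original partition before Recover; duplicates:", clashes)
```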

Recovering FCR-updated NAS shares from the target

If you were replicating NAS shares and only want to retrieve a subset of the data stored in the share on the target:

     1.    Replicate the active share on the target back to the source. (Call this the “failback share” copy.)
     2.    Recover the failback share and give it a different name.
     3.    Mount the original and the failback share and copy the desired files/directories from the failback share to the original share.
     4.    Unmount and delete the failback share. Failure to do this could result in unique data in the failback share remaining on the system indefinitely, reducing available
           capacity, and influencing the overall de-duplication ratio reported by ETERNUS CS800.
     5.    Alternatively, instead of steps 1 through 4, you can turn off FCR at the source and turn it on at the target. Then manually trigger synchronization back from the target to the source for the specific
           files/directories you require. Don’t forget to reset FCR to its original orientation when you’re done.
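
Step 3 above can be scripted once both shares are mounted. The following is a minimal sketch that assumes both shares are mounted on the same host; the mount points and relative paths are hypothetical placeholders, and the function name is not part of any ETERNUS CS800 tooling.

```python
import shutil
from pathlib import Path


def copy_from_failback(failback_root: Path, original_root: Path, relative_paths) -> list:
    """Copy selected files or directories from the failback share to the original share.

    Returns the list of destination paths that were written.
    """
    copied = []
    for rel in relative_paths:
        src = failback_root / rel
        dst = original_root / rel
        if src.is_dir():
            # dirs_exist_ok requires Python 3.8 or later
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(dst)
    return copied


if __name__ == "__main__":
    # Hypothetical mount points; replace with your own.
    failback = Path("/mnt/failback_share")
    original = Path("/mnt/original_share")
    if failback.is_dir() and original.is_dir():
        copy_from_failback(failback, original, ["projects/2011", "reports/q1.xls"])
```

Existing files with the same name in the original share are overwritten by this sketch, so review the list of paths before running anything like it against production data.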

If you were replicating NAS shares and wanted to retrieve everything from the target to replace everything you have in that share on the source:

     1.    Replicate the active share on the target back to the source.
     2.    Delete the original share on the source.
     3.    Recover the replicated share and give it the same (or different) name as before.




     4.    Mount the recovered share and continue as before.


APPENDIX B – FREQUENTLY ASKED QUESTIONS

If a partition (or share) does not have replication enabled for it, but does have de-duplication enabled, does any of its data get replicated?

           Only data from a share/partition that has both de-duplication and replication enabled is replicated to the target. All replication is done on a per-
           share/partition basis. Shares/partitions that do not have replication enabled will not have their unique content replicated.

Can I replicate only part of a partition? For example, my retention policy is four weeks but I only want to replicate the most-recent two weeks. Can I do
that?

           Replication is all-or-none for a given share/partition. A solution that meets your requirement might be to create a second partition on the source system
           and use the application to clone select data from the first partition to this second partition. After it has been cloned, the original copy in the source partition
           can be expired by the application.

           This has several advantages:
           ■Isolates those data that should be replicated from those that should not be replicated.
           ■Potentially reduces the number of bytes being replicated, thereby reducing replication bandwidth demand.
           ■Reduces the amount of data to be stored on the target. Storing only two weeks of unique data should require no more capacity (TB) than storing up to four weeks of
             unique data.
           ■Allows separate retention and expiration policies for the two types of data. Archive copies can be retained indefinitely where short-term copies could be
             expired after mere days or weeks.

If I expire one or more backups in my backup application, does that mean the data for the expired backups will not be replicated?

           Simply expiring a save set with the application does not mean that it will not be replicated. The system does not know that a save set has been expired by
           the application. It is only when the application overwrites the expired save set that the ETERNUS CS800 system releases the blocks containing the data of
           the expired save set. Released blocks can then be overwritten with new data.




CONTACT
Fujitsu Technology Solutions GmbH
Mies-van-der-Rohe-Straße 8, Munich, 80807, Germany
E-mail: storage-pm@ts.fujitsu.com
Website: http://ts.fujitsu.com

All rights reserved, including intellectual property rights. Technical data subject to modifications and delivery subject to availability. Any liability that the
data and illustrations are complete, actual or correct is excluded. Designations may be trademarks and/or copyrights of the respective manufacturer, the use of
which by third parties for their own purposes may infringe the rights of such owner. For further information see ts.fujitsu.com/terms_of_use.html
Copyright © Fujitsu Technology Solutions GmbH 2011



Page 10 of 10

Weitere ähnliche Inhalte

Mehr von Kingfin Enterprises Limited

Top 10 Tips for Implementing Desktop Virtualisation.
Top 10 Tips for Implementing Desktop Virtualisation. Top 10 Tips for Implementing Desktop Virtualisation.
Top 10 Tips for Implementing Desktop Virtualisation. Kingfin Enterprises Limited
 
The buyers' guide to virtual + physical data protection
The buyers' guide to virtual + physical data protectionThe buyers' guide to virtual + physical data protection
The buyers' guide to virtual + physical data protectionKingfin Enterprises Limited
 
How to Overcome 11 Challenges for Small IT Environments
How to Overcome 11 Challenges for Small IT EnvironmentsHow to Overcome 11 Challenges for Small IT Environments
How to Overcome 11 Challenges for Small IT EnvironmentsKingfin Enterprises Limited
 
Microsoft Windows Server 2012 Early Adopter Guide
Microsoft Windows Server 2012 Early Adopter GuideMicrosoft Windows Server 2012 Early Adopter Guide
Microsoft Windows Server 2012 Early Adopter GuideKingfin Enterprises Limited
 

Mehr von Kingfin Enterprises Limited (20)

Diskashur Desktop Hard Disk Drive Datasheet
Diskashur Desktop Hard Disk Drive DatasheetDiskashur Desktop Hard Disk Drive Datasheet
Diskashur Desktop Hard Disk Drive Datasheet
 
Diskashur Hard Disk Drives Datasheet
Diskashur Hard Disk Drives DatasheetDiskashur Hard Disk Drives Datasheet
Diskashur Hard Disk Drives Datasheet
 
VoIP for Beginners
VoIP for BeginnersVoIP for Beginners
VoIP for Beginners
 
Top 10 Tips for Implementing Desktop Virtualisation.
Top 10 Tips for Implementing Desktop Virtualisation. Top 10 Tips for Implementing Desktop Virtualisation.
Top 10 Tips for Implementing Desktop Virtualisation.
 
A Smarter Path to ERP Selection
A Smarter Path to ERP SelectionA Smarter Path to ERP Selection
A Smarter Path to ERP Selection
 
The buyers' guide to virtual + physical data protection
The buyers' guide to virtual + physical data protectionThe buyers' guide to virtual + physical data protection
The buyers' guide to virtual + physical data protection
 
Fujitsu Scansnap SV600 Product Introduction
Fujitsu Scansnap SV600 Product IntroductionFujitsu Scansnap SV600 Product Introduction
Fujitsu Scansnap SV600 Product Introduction
 
The 7 types of Power Problems
The 7 types of Power ProblemsThe 7 types of Power Problems
The 7 types of Power Problems
 
ASUS VivoBook S400
ASUS VivoBook S400ASUS VivoBook S400
ASUS VivoBook S400
 
How to Overcome 11 Challenges for Small IT Environments
How to Overcome 11 Challenges for Small IT EnvironmentsHow to Overcome 11 Challenges for Small IT Environments
How to Overcome 11 Challenges for Small IT Environments
 
Fujitsu ScanSnap S1300i Brochure
Fujitsu ScanSnap S1300i BrochureFujitsu ScanSnap S1300i Brochure
Fujitsu ScanSnap S1300i Brochure
 
Symantec Partner Certificate
Symantec Partner CertificateSymantec Partner Certificate
Symantec Partner Certificate
 
Dell registered partner certificate
Dell registered partner certificateDell registered partner certificate
Dell registered partner certificate
 
MOBILE DEVELOPMENT IN THE BUSINESS
MOBILE DEVELOPMENT IN THE BUSINESSMOBILE DEVELOPMENT IN THE BUSINESS
MOBILE DEVELOPMENT IN THE BUSINESS
 
Microsoft Windows Server 2012 Early Adopter Guide
Microsoft Windows Server 2012 Early Adopter GuideMicrosoft Windows Server 2012 Early Adopter Guide
Microsoft Windows Server 2012 Early Adopter Guide
 
How Unified Communications Pays For Itself
How Unified Communications Pays For ItselfHow Unified Communications Pays For Itself
How Unified Communications Pays For Itself
 
Fujitsu STYLISTIC Q702 Tablet PC
Fujitsu STYLISTIC Q702 Tablet PCFujitsu STYLISTIC Q702 Tablet PC
Fujitsu STYLISTIC Q702 Tablet PC
 
Fujitsu PRIMERGY RX200 S7
Fujitsu PRIMERGY RX200 S7Fujitsu PRIMERGY RX200 S7
Fujitsu PRIMERGY RX200 S7
 
Fujitsu PRIMERGY RX100 S7
Fujitsu PRIMERGY RX100 S7Fujitsu PRIMERGY RX100 S7
Fujitsu PRIMERGY RX100 S7
 
Dell Partner Letter
Dell Partner LetterDell Partner Letter
Dell Partner Letter
 

Kürzlich hochgeladen

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Kürzlich hochgeladen (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Replication with ETERNUS CS800 - Best Practice Guide

BEST PRACTICE GUIDE - REPLICATION [MARCH, 2011]

BEST PRACTICE GUIDE
REPLICATION WITH ETERNUS CS800

OVERVIEW
Replication is a feature of the ETERNUS CS800 Series de-duplication appliances that uses TCP/IP over an Ethernet network to efficiently transport a complete copy of user data residing on one ETERNUS CS800 (“the source”) to another ETERNUS CS800 (“the target”). High efficiency is achieved by transporting only the unique data blocks plus metadata from source to target.

SCOPE
Intended Audience: End Users, System Engineers, RTS, Resellers
This document provides best practice guidance for configuring replication between ETERNUS CS800 de-duplication appliances. It is not intended to be a standalone document.

OBJECTIVE
The value of replication is Disaster Recovery (DR).
■The target ETERNUS CS800 can fail back a copy of the data to the same or another ETERNUS CS800.
■The target ETERNUS CS800 may be used to directly access the user data at the DR site.
■The target ETERNUS CS800 may be physically relocated to another server location for access to the user data.

DEFINITION OF TERMS
A variety of replication terminology is used in this document. This document makes every attempt to use the same terminology as introduced in the ETERNUS CS800 User’s Guide.
■Adaptive de-duplication – The mode of de-duplication which allows data de-duplication to run concurrently with the backup being ingested. The de-duplication process will adapt to the speed of the ingest.
■Backup Window – In normal use, “backup window” refers to the customer-defined period of time during which the customer data is backed up. It usually has a clearly identifiable start and stop time. When used together with deferred de-duplication in an ETERNUS CS800 context, the “backup window” refers to a “reservation window” during which de-duplication is suspended so that all disk I/O can be applied to maximize data ingest in order to minimize the normal user backup window. To minimize confusion about which “backup window” is being discussed, this document refers to this ETERNUS CS800 deferred de-duplication backup window as the “deferred de-duplication window”.
■Deferred de-duplication – The mode of de-duplication which begins only after the deferred de-duplication window. Typically, deferred de-duplication begins after the backup ingest is complete.
■Deferred de-duplication window – A defined window during which no de-duplication will take place. This allows maximum system resources to be devoted to data ingest, thus allowing a faster backup. The deferred de-duplication window applies only to the share/partition for which it is defined. It is possible to define a second share/partition and perform backups that overlap the same time period. The data written to the share without a defined deferred de-duplication window will be subjected to adaptive de-duplication.
■De-duplication pool – The term used to refer to the collection of unique data stored in a CS800 de-duplication appliance. The size of the de-duplication pool is reported as the After Reduction statistic on the ETERNUS CS800 GUI and is a measure of the disk space occupied by all data backed up to ETERNUS CS800 after the data has been de-duplicated and compressed.
■Failback – The ETERNUS CS800 procedure that uses replication to copy a replicated share or partition from a target ETERNUS CS800 to another ETERNUS CS800 system.
■File or cartridge replication – File or cartridge replication (FCR) extends continuous and namespace replication from operating at a share/partition level and zooms in to the file-directory/virtual cartridge level. FCR can be used to synchronize the content of a share or partition that is concurrently accessible at both source and target ETERNUS CS800.
■Namespace – The term that Fujitsu applies to metadata required to reconstruct de-duplicated data back into its native application format. It is used in phrase combinations such as “namespace replication” or “synchronize the namespace.”
■Partition – An ETERNUS CS800 storage destination for data transferred by FC or iSCSI where the structure is considered to be a virtual tape library (VTL) and the content is written to virtual tape cartridges.
■Recover – The ETERNUS CS800 procedure to make replicated and namespace data accessible on the ETERNUS CS800 to which it had been replicated. If a share was replicated, then a share is recovered. If a partition was replicated, then a partition is recovered. It is not possible to convert a share to a partition (or vice versa) during the recovery procedure.
■Share – An ETERNUS CS800 storage destination for data transferred by NAS where the content is treated as files and directories.
■Source – The term often applied to the ETERNUS CS800 that is sending a copy of de-duplicated data to a second ETERNUS CS800.
■Synchronize – When used in this document, this term means that two entities are made and/or confirmed to be identical. For example, namespace replication will synchronize the relevant share and/or partition content and metadata between source and target system. When used in the context of “virtual tape cartridge”, “file”, or “directory”, “synchronize” operates at the more granular reference of the context (for example, “synchronize cartridges”) between source and target. Consult APPENDIX A – Directory/File or Cartridge Replication for more information about File or Cartridge Replication (FCR) and synchronizing at the more granular level.
■Target – The label often applied to the ETERNUS CS800 that is receiving a copy of de-duplicated data.

REQUIREMENTS FOR REPLICATION
■De-duplicated data – The data must be de-duplicated before it can be replicated. The user can create a NAS share or a VTL partition and specify that data written to that share/partition be de-duplicated.
■Specified data – The user must specify, on the source system, that a particular share/partition is to be replicated.
■Sufficient bandwidth – A circuit of sufficient bandwidth must be available to link the source to the target. Both ends of the circuit require TCP. The user has a variety of circuit options available.
■Specified replication target – Consult the ETERNUS CS800 User’s Guide for procedural details.
    a) The user must first tell the target that it should allow replication from the source system. This is done at the target.
    b) Next, the user must tell the source ETERNUS CS800 the name or IP address of the target device. The source system will immediately check whether the target is reachable and whether replication to that target has been authorized at the target.
■Schedule – Implement a schedule for routine namespace replication between source and target. This is optimally scheduled to take place after both the backup and de-duplication have completed.

WHAT DATA CAN BE REPLICATED?
Although ETERNUS CS800 can store both de-duplicated and non-de-duplicated data at the same time on the same appliance, only de-duplicated data can be replicated. Data to be replicated must be written to a share/partition that is configured for both de-duplication and replication. Shares/partitions must be configured for de-duplication at the time they are created. De-duplication cannot be added or removed once a share/partition has been configured. Replication can be enabled or disabled per share or per partition even after the share/partition is created, as long as the share/partition was created with de-duplication enabled.

HOW DOES ETERNUS CS800 REPLICATION WORK?
ETERNUS CS800 replication has two phases that work together to synchronize copies between the source and target ETERNUS CS800. Both phases, continuous replication and namespace replication, are required to maintain synchronization. Continuous replication moves unique blocks in a background process while namespace replication synchronizes the metadata between the source and target.

Continuous Replication
■Continuous replication does not have its own enable/disable command. As long as replication is enabled, continuous replication will seek to replicate the unique data blocks between source and target, but only for shares/partitions that have de-duplication and replication enabled.
■As the de-duplication process discovers new unique data (data that isn’t already in the local de-duplication pool), it puts a reference to that data in a queue for continuous replication to process.
■Continuous replication, while processing the queue, asks the target ETERNUS CS800 if it already has a copy of the recently stored unique data. If the target responds that it already has a copy of that data, continuous replication moves on to the next entry in the queue. If the target responds that it does not have a copy of that unique data, continuous replication is responsible for moving a copy of that unique data to the target.
■Continuous replication is extremely efficient because it only sends inquiries to the target if there is new unique data on the source. That is, there is no need to inquire about data previously replicated between source and target.
■In this way, continuous replication assures that a copy of the unique data blocks for a share/partition also exists on the target. More information (the namespace, also known as metadata) is needed in order to reassemble the data into its original format. Metadata is synchronized by namespace replication.
■Continuous replication is constantly checking to see if there is anything in its queue. If it finds a queue entry, it immediately processes the item.
■Continuous replication is suspended whenever namespace replication is active.
    ■If a backup occurs while continuous replication is suspended, any new unique data tags will be added to the continuous replication queue for later processing.
    ■The continuous queue will once again be processed whenever namespace replication is not running.

Namespace Replication
■Namespace replication is responsible for synchronizing the metadata between source and target. The metadata is required in order to reassemble the de-duplicated data back into the format originally written by the backup application. The data cannot be reassembled without the metadata.
■Namespace replication must be enabled on a per-share/partition basis using the GUI.
■Namespace replication can be scheduled to occur routinely as often as once per day. It can also be initiated on demand. Click “Replicate Now” for namespace replication on demand.
■Namespace replication will normally execute immediately when started either by schedule or on demand. If a namespace replication is already active, then subsequent requests are queued, the respective share/partition will show a status of queued, and the queue is processed in FIFO order.

Partial Namespace Replication
Partial namespace replication can occur under the following conditions:
■Namespace replication is triggered while a NAS file is open in the share to be replicated.
■Namespace replication is triggered while a virtual cartridge from the partition to be replicated is loaded in a virtual drive.
■Namespace replication is triggered before all data for a share/partition has been de-duplicated.
This means that not all metadata required for reassembling the data into its original application format is available. It also means that not all data blocks are available because only unique de-duplicated data blocks are replicated to the target. Consequently, only some of the data can be reassembled into the original application format on the target until a complete namespace replication is achieved.
The potential ramification of a partial namespace replication is that some files may not be available for a restore. A successful (i.e., not partial) namespace replication typically catches up 24 hours later if a daily namespace replication schedule is implemented. Manually clicking the Replicate Now button in the GUI will also allow namespace replication to resynchronize. In order to avoid a “partial” completion status, it is advisable to schedule namespace replication to occur after all data has been de-duplicated.
■In order to avoid partial namespace replication when issuing the “replicate now” command manually, click the Check Readiness button first. Check Readiness will determine if all data destined for the respective share/partition has been de-duplicated and report back.
■See “When should I schedule namespace replication?” later in this document.

WHAT CONTROL DO I HAVE OVER REPLICATION?
There are several commands to control replication for the entire ETERNUS CS800 system. Consult the ETERNUS CS800 User’s Guide for details.

Pause/Resume
■Click Pause to pause all namespace and continuous replication. The pause will take effect as soon as the current data block finishes replicating. That is, on a low-bandwidth replication link, it may take some time before you see the effect of the pause command.
■Click Resume to allow replication to resume.

Enable/Disable
■Click Enable to enable replication for all shares and partitions that have de-duplication configured.
    ■CAUTION: If there are shares or partitions that you do not want replicated, then you should enable the shares and/or partitions individually rather than using this GUI command. Refer to the ETERNUS CS800 User’s Guide if you require help with this procedure.
    ■Namespace replication will begin at the next scheduled time for each share and partition. If there is no namespace replication schedule, then it will depend on when the user clicks Replicate Now for the respective share or partition.
    ■The continuous replication queue will begin building with the next backup that writes into that share or partition. If there is no namespace replication active and the circuit between source and target is up, then continuous replication will begin moving new unique data to the target. Only new unique data ingested during that backup will be replicated to the target.
■Click Disable to disable replication from all shares and partitions that have been configured for de-duplication.
    ■All in-process and queued namespace replications will attempt to complete before the Disable toggles off their namespace replication.
Replication for an individual share/partition can be managed by enabling and/or disabling replication for that share/partition. Call up the configuration for that share/partition and edit the replication setting. Refer to the ETERNUS CS800 User’s Guide if you require help with this procedure.

HOW CAN I TELL HOW FAST MY REPLICATION IS PROCEEDING AND HOW MUCH BANDWIDTH MY REPLICATION IS USING?
There is no single number that defines the replication rate. Replication speed depends jointly on the de-duplication rate, the replication queue processing rate, network loading, and network latency. Replication is broken down into two measurements: (1) Replication Processing Rate, and (2) Replication Ethernet Load Rate.
Data must first be de-duplicated before it can be added to the replication queue. With adaptive de-duplication, data is de-duplicated as it is ingested. In a hypothetical case, if ingest occurs at 100 MB/s and the rate of change in that data is 5%, then the rate at which new data is encountered is 5% of 100 MB/s, or 5 MB/s.
■The rate at which new data is encountered is the same as the rate at which it is placed on the replication queue: 5 MB/s.
■Consequently, the rate at which this new data is available for replication via the Ethernet port is 5 MB/s.
This means there is a replication rate that is based on the rate of ingest and a replication rate that is based on the amount of Ethernet loading. In the above hypothetical example, ETERNUS CS800 is processing the ingest at 100 MB/s and determining what already exists at the replication target. The replication processing rate is 100 MB/s. Unique data blocks are identified and replicated to the target at a replication Ethernet load rate of 5 MB/s (assuming that there are no bandwidth or latency bottlenecks in the replication link).
Recap:
■Ingest (backup) is at 100 MB/s
■De-duplication is at 100 MB/s
■Replication is at 100 MB/s because we are verifying that some data already exists at the target (because it already exists in the de-duplication pool) and we are transferring a copy of only the data that is not already in the de-duplication pool.
■Side effect: Ethernet loading is 5 MB/s
Replicating the namespace happens very quickly, typically finishing within seconds if namespace replication is scheduled to occur after de-duplication has completed and continuous replication has moved all the data blocks.

WHY SHOULD I REPLICATE THE NAMESPACE WHEN I FIRST CREATE A NEW SHARE/PARTITION?
Always replicate a share/partition namespace immediately after it is created and has de-duplication and replication enabled, regardless of whether you have specified the adaptive or the deferred de-duplication policy. Replicate the namespace, via the on-demand Replicate Now button, for each new share/partition as soon as it is created and before any data is written to it.
■The initial namespace replication of the empty share/partition will run very quickly (in a matter of minutes).
■This action establishes the namespace structure for the share/partition on the target so that the first namespace replication after a backup will run quickly.
Failure to replicate the empty namespace is not fatal. However, the first-ever namespace replication following a backup may run noticeably slower than it would have if the empty namespace had been replicated first. This will be especially noticeable if a huge amount of data has been backed up.
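The replication-rate example above (ingest at 100 MB/s with a 5% rate of change) and the continuous replication queue can be made concrete with a small simulation. This is an illustrative sketch only, not the ETERNUS CS800 implementation: the SHA-256 block tags, the 1 KB block size, and the names `ingest`, `continuous_replication`, and `make_block` are all assumptions made for this example. It models each de-duplication pool as a dictionary, queues references to new unique blocks, and counts how many bytes actually cross the replication link.

```python
import hashlib
from collections import deque

BLOCK_SIZE = 1024  # hypothetical block size in bytes, chosen for illustration


def ingest(blocks, source_pool, queue):
    """De-duplicate incoming blocks and queue references to new unique data."""
    processed = 0
    for data in blocks:
        processed += len(data)
        tag = hashlib.sha256(data).hexdigest()  # assumed way to identify a block
        if tag not in source_pool:              # new unique data on the source
            source_pool[tag] = data
            queue.append(tag)                   # reference for continuous replication
    return processed


def continuous_replication(queue, source_pool, target_pool):
    """Process the queue; send only blocks the target does not already hold."""
    sent = 0
    while queue:
        tag = queue.popleft()
        if tag in target_pool:                  # target already has a copy: skip
            continue
        target_pool[tag] = source_pool[tag]     # transfer the unique block
        sent += len(source_pool[tag])
    return sent


def make_block(i):
    """Build a distinct block of BLOCK_SIZE bytes for the given index."""
    return i.to_bytes(8, "big") * (BLOCK_SIZE // 8)


source_pool, target_pool, queue = {}, {}, deque()

# First backup: 100 blocks, all new to both systems.
first = [make_block(i) for i in range(100)]
ingest(first, source_pool, queue)
continuous_replication(queue, source_pool, target_pool)

# Second backup: 95 unchanged blocks plus 5 new ones (5% rate of change).
second = first[:95] + [make_block(i) for i in range(100, 105)]
processed = ingest(second, source_pool, queue)
sent = continuous_replication(queue, source_pool, target_pool)

print(f"processed {processed} bytes, sent {sent} bytes ({sent / processed:.0%})")
```

In this model the second backup is processed in full, but only the 5 new blocks are sent, which mirrors the distinction between the replication processing rate and the replication Ethernet load rate.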
WHEN SHOULD I SCHEDULE NAMESPACE REPLICATION?
Namespace replication interacts with the de-duplication pool for its duration. Therefore, namespace replication will perform optimally if it does not overlap with other processes (such as de-duplication, space reclamation, restores, Read/Verify, tape creation) that also access the de-duplication pool at the same time. If scheduled properly, namespace replication will complete in a matter of minutes.
Avoid overlap of the following processes with namespace replication:
■De-duplication. You should avoid overlap with de-duplication for two reasons:
    ■So that you do not end up with a partial namespace replication (see Partial Namespace Replication for more information).
    ■So that you do not inadvertently slow de-duplication. In ETERNUS CS800 systems that are nearly full to capacity, this could have the side effect of slowing the backup.
■Space Reclamation. Every ETERNUS CS800 must eventually reclaim the data blocks occupied by expired data. That can be a very I/O-intensive process that is best completed as quickly as possible. If replication and space reclamation overlap, both can potentially be slowed by more than a factor of 2.
■Restores, backup application Read/Verify, and tape creation. All of these processes generate additional I/O. If the data being retrieved from ETERNUS CS800 is available from non-truncated space (a cache of native-format data), then the impact of the operation is minimal. However, if the data first has to be retrieved from the de-duplication pool and reconstructed into native application format, then there will be a noticeable impact on the performance of all of the overlapping processes.
If you have short, discrete backup windows, then it should be relatively easy to determine the optimal schedule for namespace replication.

HOW MUCH BANDWIDTH DO I NEED FOR MY REPLICATION TO BE SUCCESSFUL?
ETERNUS CS800 will transfer only unique data (data that the target does not already have) when replicating from source to target. So you need sufficient bandwidth to:
■Replicate the daily load of new unique data from source to target
■Replicate the namespace (typically only a few MB)
■Leave room for data growth
For new ETERNUS CS800 installs: the ETERNUS Pre-Sales Systems Engineer (SE) has a sizing tool that can calculate what your expected effective bandwidth requirement will be.
■Be aware that although you might have “plenty” of bandwidth available, end-to-end latency can significantly impact the ability of ETERNUS CS800 to utilize that bandwidth.
■If a communications link is already present between source and target location, you should perform an FTP transfer of 50-100 MB of totally random data between source and target and measure the performance. That will be a measure of how much bandwidth is available for replication.
NOTE: Totally random data is required so that any WAN optimization device (Riverbed, Silver Peak, etc.) in the circuit does not, without the knowledge of the user, inflate the FTP transfer rate. ETERNUS CS800 will be replicating only unique data. In some instances, the user may elect to enable encryption during replication. WAN optimization devices typically do not accelerate replication packets.
If ETERNUS CS800 is already installed and replicating:
■The SE can perform the same FTP test measurement to determine effective bandwidth.

HOW LONG WILL MY FIRST-EVER (NAMESPACE) REPLICATION TAKE?
There are several questions that have to be asked before an answer can be provided:
1. We need to know when this first-ever replication will be activated. For example:
    ■Will replication be activated at the time that ETERNUS CS800 is installed? In this case, the first-ever replication will correspond with the first-ever backup.
    ■Will replication be activated some time (days/weeks/months) after the first-ever backup to ETERNUS CS800? In this case, there will be a backlog of data waiting to be replicated.
    ■Was replication configured but the namespace replication schedule overlooked? If so, anywhere in the range of “none of the data” to “most of the data blocks” may already be at the target and the amount of time for namespace replication may be very short (minutes).
2. We need an estimate of how much data will be queued for this first-ever replication.
    ■As described in question 1, above, the amount of data to replicate will vary depending on when this first-ever replication is performed. It could range from “all data in the share/partition” to “only the metadata.”
    ■Although ETERNUS CS800 will only replicate unique data from source to target, it will go through a verification process during replication to make sure that all necessary data is replicated to the target and can be reconstructed into native application format.
    ■The more data there is to replicate, the longer it will take.
3. We need to know what effective bandwidth is available. See “How much bandwidth do I need for my replication to be successful?”
4. What other activities will the source and target systems be engaged in (for example, backup, space reclamation, read/verify, tape creation, etc.) that could impact the speed of replication?
    ■Refer to the section titled “When should I schedule namespace replication?” for a discussion about competing activities.
    ■The impact of competing activities will depend on both the scheduling and the duration of the first-ever replication. If it is short in duration, the likelihood of impact is minimal. However, if the duration is long, then overlap with competing activities is inevitable.

A simple example:
1. Install two ETERNUS CS800 systems and configure replication.
2. Back up 1 TB of Exchange data in 8 hours.
    ■Exchange servers are usually configured for single-instance store (no longer valid for Exchange 2010), so there is only one copy of any e-mail and attachment. That means the first Exchange backup typically has less than 5% de-duplication. Space savings from the initial Exchange backup come mainly from compression and not de-duplication.
    ■Exchange data is typically 1.6:1 compressible. We will ignore compression in this simple example.
    ■The rate of change in the content from one Exchange backup to the next can range from 1% to over 20%. The typical rate of change is 10%. The lower the rate of change, the more de-duplication is achieved among the backups stored in ETERNUS CS800.
3. The first-ever replication will need to transfer a copy of nearly the entire first backup: 1 TB. The amount of time required for this transfer depends on the effective bandwidth available.
    ■Using T1 (1.544 Mbps), that first-ever replication would take about 60 days. In the meantime, subsequent backups will be introducing more unique data that will be added to the continuous replication queue for replication after the current queue is completed.
    ■Using OC1 (51.840 Mbps), that would take about 2 days.
    ■Using OC3 (155.520 Mbps), that would take 8 hours, because continuous replication is happening during the backup and namespace replication assures that the namespace is synchronized between the source and target system.
    ■The more new data that is backed up for the first time and replicated, the proportionately longer the first-ever replication will take.
4. After the first-ever replication, only the new unique data is replicated to the target. If we assume a typical 10% rate of change between these Exchange backups, then the routine full backup’s replication would take:
    ■About 6 days with T1. This disqualifies T1 as a bandwidth to use for this replication link.
    ■About 1.5 days with T2 (6.312 Mbps), also disqualifying T2, since routine replication must finish within a 24-hour window or the data will start building an irreconcilable backlog.
    ■About 8 hours with T3 (44.736 Mbps) and higher bandwidths. Replication is not occupying the entire bandwidth during this 8-hour period. The duration is estimated at 8 hours because the backup is happening during the same 8 hours and the unique data is being sent to the target as it is encountered. The effective load would be 3.5 MB/s out of an available bandwidth of 5.6 MB/s.
As the number of qualifying questions above shows, there is no simple answer to this question.
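The estimates in this example follow from dividing the amount of data by the link rate. The sketch below reproduces that arithmetic under stated assumptions: decimal units (1 TB = 10^12 bytes), the nominal line rate of each circuit, and no allowance for protocol overhead, latency, or compression, so real transfers will be slower. The function and variable names are illustrative and do not come from any ETERNUS CS800 tool.

```python
SECONDS_PER_DAY = 86_400

# Nominal line rates in megabits per second, as quoted in this guide.
LINK_MBPS = {"T1": 1.544, "T2": 6.312, "T3": 44.736, "OC1": 51.840}


def transfer_days(data_bytes, link_mbps):
    """Days needed to move data_bytes over a link at its nominal line rate."""
    seconds = data_bytes * 8 / (link_mbps * 1_000_000)
    return seconds / SECONDS_PER_DAY


def effective_load_mb_per_s(data_bytes, window_hours):
    """Average load in MB/s when data_bytes is spread over the backup window."""
    return data_bytes / (window_hours * 3600) / 1_000_000


first_backup = 10**12               # 1 TB first-ever replication
daily_change = first_backup // 10   # 10% rate of change, about 100 GB

for name, mbps in LINK_MBPS.items():
    print(f"{name}: first-ever {transfer_days(first_backup, mbps):.1f} days, "
          f"routine {transfer_days(daily_change, mbps):.2f} days")

print(f"load over an 8-hour backup: {effective_load_mb_per_s(daily_change, 8):.1f} MB/s")
print(f"T3 capacity: {LINK_MBPS['T3'] / 8:.1f} MB/s")
```

Any link whose routine figure exceeds one day (T1 and T2 here) cannot keep up with a daily backup, which is why the example rules those circuits out.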
Your ETERNUS CS800 Pre-Sales Systems Engineer (SE) is your best source of information to answer this question.

HOW CAN I ACCELERATE THE FIRST-EVER REPLICATION?
The first replication of a backup event usually takes significantly more time than later routine replication events. That is because at the time of the first event, virtually everything in the de-duplication pool on the source is typically new and unknown to the target system. The first replication event will be transferring a larger amount of de-duplicated data than any of the following routine replication events.
There can be an exception to this "first replication is significantly longer than the others" statement. For example, if you are replicating four remote ETERNUS CS800 systems to the same target, it is possible that one of the other source systems has already deposited data into the target system's de-duplication pool that duplicates what another wants to send. In that instance, the de-duplication pool content does not have to change. Only the namespace replication needs to occur, and that happens very quickly because the namespace is typically very small. (Typical namespace size is only a few MB.)
A number of initialization options are available that can decrease the amount of time needed for that first replication. The goal in each of these is to seed the de-duplication pool of the destination ETERNUS CS800 so that a minimum number of bytes needs to be transferred to maintain synchronization between the two systems.

Option 1: Co-locate the source and target and replicate locally
Attach both the source and target systems to a dedicated GigE network and replicate locally at the highest rate supported by the ETERNUS CS800. This allows the initial replication to proceed at the fastest possible rate.
After replication completes, the target system can be deployed to its intended location and subsequent replications to maintain synchronization between units will require significantly less time.

Option 2: Co-locate the source and target with the backup server for the first full backup
Depending on the amount of data to back up, this option may be faster than performing co-located replication on a dedicated GigE network. The Fujitsu Pre-Sales Systems Engineer can provide advice.
You have three operational options:
■Sequentially perform a full backup to the source ETERNUS CS800 and then to the target ETERNUS CS800.
■Perform an inline full backup to both source and target at the same time.
■Clone the data from source to target after the first full backup completes.
When considering these operational options, keep in mind the following:
■The type of backup (VTL or NAS) must be identical for both ETERNUS CS800 systems.
■VTL backups, depending on your server and ecosystem, can run significantly faster than NAS backups.
Steps:
1. Co-locate and execute one of the operational options mentioned above. This will place the unique data blocks into the de-duplication pool of each ETERNUS CS800.
2. Perform a namespace replication from source to target.
    ■Wait for namespace replication to complete before proceeding.
    ■Very little, if any, additional unique data will be transferred during this namespace replication.
    ■This establishes a recovery point for the source in the target device.
    ■This recovery point will have a copy of the namespace from the source.
    ■This namespace copy will have pointers to all the blocks in the target’s de-duplication pool.
    ■Effectively, each unique block in the target’s de-duplication pool will have 2 subscribers: the original process that put the unique blocks into the target’s de-duplication pool, and the copy of the namespace from the source.
3. Delete the clone/inline copy save set references in the backup application catalog by expiring the save sets and releasing the media.
4. Delete the share/partition on the target ETERNUS CS800.
    ■Wait for this command to complete before proceeding.
    ■This will remove all pointers from that share/partition to the unique blocks in the de-duplication pool. The unique blocks will not disappear or become eligible for space reclamation because the namespace replication that you performed in step 2 (above) is pointing to the same unique blocks. Only unique blocks with zero “subscribers” pointing to them are eligible for space reclamation.
    ■Failure to do this will keep the original data around forever and can impact the amount of space available for future replication and data retention.
5. Deploy the target ETERNUS CS800 to its intended location. Once deployed, the target may need a day or two to replicate unique data from new backups that may have taken place while the target was in transit.

Option 3: Use physical tape to initialize the target ETERNUS CS800
Depending on the amount of data to back up, this option may be faster than performing co-located replication on a dedicated GigE network. The ETERNUS CS800 Pre-Sales Systems Engineer can provide advice.
It is essential that the type of backup (VTL or NAS) is preserved during this process. Physical tape is only the transport medium, and your process of writing the data to the tape at the source must be precisely reversed when reading data from the tape and writing it to the target. While there may be other methods and/or utilities for accomplishing this, this Best Practices option focuses only on using the customer backup application and VTL to accomplish this “seeding”.
■You must engage the same backup application in this process at both the source and target in order to preserve the formatting and metadata inserts of the original backup application. Failure to do so will result in that same data not being fully recognized when it is replicated later. This means it will be stored a second time with its new application format.
■Failure to follow this procedure correctly may mean that the first remote replication will take a very long time.
■Failure to follow this procedure correctly may mean that your data will consume more disk space on the target and may cause the target to run out of disk capacity sooner than expected.
Steps:
1. Use your backup application to create a clone copy of a recent backup that you did to tape (either virtual or physical tape).
    ■A copy of the most recent backup assures that you have the majority of new unique data.
    ■If you have a fairly recent tape copy, you can use it instead of creating a new tape copy because a recent copy will typically have more than 80% of the data that you will be replicating to the target. The older the tape copy, the less useful it will be.
2. Transport that tape copy to the location of the target.
3. Create a (temporary) VTL partition on the target.
    ■You will clone/copy the tape to this partition in order to initialize the de-duplication pool with a copy of the unique data.
4. Using the same backup application at the target as you used at the source:
    a. Import the cartridge to the backup application. This will make the contents of the cartridge accessible to the backup application. The contents will be identifiable as one or more backup save sets.
     b. Duplicate/clone the contents of that cartridge to a virtual cartridge in the temporary partition of the target. This will create a copy of the de-duplicated data in the target. Later, when you perform your first namespace replication, the data that is already in the target’s de-duplication pool will not need to be transferred, thereby significantly speeding up the namespace replication process.
5. Perform a namespace replication from source to target.
          ■Wait for namespace replication to complete before proceeding.
          ■Very little, if any, additional unique data will be transferred during this process.
            NOTE: The more backups that have occurred at the source since the tape was written and copied to the target, the more new unique data there will be that has to be transferred to the target via replication.
          ■This establishes a recovery point for the source in the target.
          ■This recovery point will have a copy of the namespace from the source.
          ■This namespace copy will have pointers to all the blocks in the target’s de-duplication pool that are in common with the source.
          ■Effectively, each unique block in the target’s de-duplication pool will have two subscribers: the original process that put the unique blocks into the target’s de-duplication pool, and the copy of the namespace from the source.
6. Delete the clone/inline copy save set references in the backup application catalog by expiring the save sets and releasing the media.
7. Additional cleanup step: delete/expire the references created from the imported tape.
8. Delete the temporary partition on the target.
          ■Wait for this command to complete before proceeding.
          ■This will remove all pointers from that temporary partition to the unique blocks in the de-duplication pool. The unique blocks will not disappear or become eligible for space reclamation, because the namespace replication that you performed in step 5 (above) is pointing to the same unique blocks. Only unique blocks with zero “subscribers” pointing to them are eligible for space reclamation.
          ■Failure to delete this temporary partition will keep the original data around forever and can reduce the amount of space available for future replication and data retention.


WHEN SHOULD I USE ENCRYPTION WITH REPLICATION?
ETERNUS CS800 offers the ability to encrypt data while in transit. That is, the source ETERNUS CS800 encrypts the blocks when sending them. The target ETERNUS CS800 decrypts the blocks upon receipt and stores them unencrypted. AES-128 encryption is used.
Customers who have VPNs (virtual private networks) or encrypted circuits typically have no need to encrypt replication data with the ETERNUS CS800. Customers who use public networks or have ultra-high security requirements for their data may wish to encrypt replication data that is in transit.


APPENDIX A – DIRECTORY/FILE OR CARTRIDGE REPLICATION

What is File or Cartridge Replication?
File or Cartridge Replication (FCR) extends continuous and namespace replication from the share/partition level down to the file-directory/virtual cartridge level. FCR can be used to synchronize the content of a share or partition that is concurrently accessible at both source and target. FCR applies only to ETERNUS CS800 de-duplication appliances running v1.3.1 or later firmware. Consult the ETERNUS CS800 User’s Guide to learn how to configure FCR.
Using VTL as an example:
          ■One could configure FCR for each virtual cartridge in the virtual library.
          ■When an FCR cartridge is written to, continuous replication will transfer de-duplicated unique data to the replication target.
          ■Once the cartridge is unmounted, FCR will wait for any trailing de-duplication and continuous replication traffic for that cartridge to complete. Then FCR will transfer the namespace for that cartridge to the destination ETERNUS CS800.
          ■Once completed, this process allows immediate access to that cartridge and the new data on it to servers accessing the destination ETERNUS CS800.
ETERNUS CS800 has reserved “replication threads” just for FCR. While these replication threads may end up competing for replication bandwidth, they significantly shorten the namespace replication wait time for data written to a cartridge. This is almost like clicking Replicate Now on the ETERNUS CS800 GUI after every cartridge eject, but doing so automatically rather than manually.
Advantages:
          ■FCR means that the target VTL is synchronized down to the cartridge level shortly after a cartridge is unloaded and the data has been de-duplicated.
          ■This enhances data availability at the target. Data is immediately accessible at the target after FCR completes.

When should I use FCR?
FCR fulfills one or more requirements of various user groups. If you have one or more of those requirements, then you should use FCR.
          ■Some users want the assurance that data has been replicated as early as possible in order to minimize the chance that a backup does not have a DR copy. FCR fulfills that requirement.
          ■Some users want to access the DR copy as early as possible after the backup. FCR fulfills that requirement.
          ■Some users want their backup application to create a physical tape at the target (replicated) site. FCR ensures that the unloaded virtual cartridges at the source and target are identical.
          ■Some users want to perform namespace replication more than once per day. FCR does NOT fulfill this requirement. See “Why should I do namespace replication if I’m using FCR for my entire share/partition?” below.

Why should I do namespace replication if I’m using FCR for my entire share/partition?
FCR assures that specific content at both source and target is identical after synchronization.
          ■For NAS shares – this is at the file or directory level and occurs via a CLI or GUI command.
          ■For VTL partitions – this is at the virtual cartridge level and occurs when a virtual cartridge configured for FCR via the GUI is unloaded from a virtual tape drive.
FCR synchronization assures that data from the source is accessible on the target as soon as possible. “Accessible” means that the data from the source is synchronized to an active share or partition. It does not mean that the entire namespace for the share/partition on the source has been replicated to the target. Share/partition-level namespace replication only occurs when a user clicks Replicate Now in the GUI or if namespace replication for that share/partition has been scheduled to occur routinely.
Users must still perform routine namespace replication for the shares/partitions on the source that they want to replicate. Unless routine namespace replication is performed, any share/partition Recover action would not include the most recent changes synchronized through FCR that have occurred since the most recent namespace replication.

How do I recover data that has been synchronized with FCR but for which no namespace replication has yet occurred?
Data that has been synchronized to a target with FCR can be recovered to the source using the steps described in this section, depending on the circumstance. It is assumed that this would be part of a disaster recovery procedure.
          ■If the user only wants to replicate a copy from the target to a third ETERNUS CS800, then this is a simple matter of establishing and executing namespace replication.
          ■If a disaster occurred at the source location after a full namespace replication, and no other data had been written and synchronized via FCR, then the user would perform a normal replication failback as documented in the ETERNUS CS800 User’s Guide.
          ■The procedures suggested below would only be followed if a disaster happened on the source after FCR updates had occurred on the replication target and it was vital to retrieve a copy of the most recent library state.
          ■It is not expected that the procedures suggested below will be used routinely, but only as part of a disaster recovery procedure.

Recovering FCR-updated VTL partitions from the target
If you had been replicating a VTL partition using the combination of namespace replication and FCR and now want to replace everything in the source partition with everything in the active VTL partition on the target, do the following:
1. Replicate the active partition from the target back to the source. (All data already exists on the source, so this namespace replication will complete quickly.)
2. Delete the original partition on the source. (You must do this because the original partition contains the same barcodes as the partition you replicated to the source in step 1. ETERNUS CS800 will not allow identical barcodes in active partitions.)
3. Recover the replicated partition and give it the same (or a different) name as before.
4. Populate the recovered partition with tape drives, as before.
5. Connect the VTL to your backup application.
6. Perform an inventory to identify where the cartridges are located.
NOTE: A backup application Import should not be necessary because the backup application catalog should already reflect any new data that had been backed up.

Recovering FCR-updated NAS shares from the target
If you were replicating NAS shares and only want to retrieve a subset of the data stored in the share on the target:
1. Replicate the active share on the target back to the source. (Call this the “failback share” copy.)
2. Recover the failback share and give it a different name.
3. Mount the original and the failback share and copy the desired files/directories from the failback share to the original share.
4. Unmount and delete the failback share. Failure to do this could result in unique data in the failback share remaining indefinitely, reducing available capacity and influencing the overall de-duplication ratio reported by ETERNUS CS800.
5. Alternatively, you can turn off FCR at the source and turn it on at the target. Then manually trigger synchronization back from the target to the source for the specific files/directories you require. Don’t forget to reset FCR to its original orientation when you’re done.
If you were replicating NAS shares and want to retrieve everything from the target to replace everything you have in that share on the source:
1. Replicate the active share on the target back to the source.
2. Delete the original share on the source.
3. Recover the replicated share and give it the same (or a different) name as before.
4. Mount the recovered share and continue as before.


APPENDIX B – FREQUENTLY ASKED QUESTIONS

If a partition (or share) does not have replication enabled for it, but does have de-duplication enabled, does any of its data get replicated?
Only data from a share/partition that has both de-duplication and replication enabled is replicated to the target. All replication is done on a per-share/partition basis. Shares/partitions that do not have replication enabled will not have their unique content replicated.

Can I replicate only part of a partition? For example, my retention policy is four weeks but I only want to replicate the most-recent two weeks. Can I do that?
Replication is all-or-none for a given share/partition. A solution that meets your requirement might be to create a second partition on the source system and use the application to clone select data from the first partition to this second partition. After it has been cloned, the original copy in the source partition can be expired by the application. This has several advantages:
          ■Isolates the data that should be replicated from the data that should not be replicated.
          ■Potentially reduces the number of bytes being replicated, thereby reducing replication bandwidth demand.
          ■Reduces the amount of data to be stored on the target. Two weeks of unique data should require no more capacity than four weeks of unique data.
          ■Allows separate retention and expiration policies for the two types of data. Archive copies can be retained indefinitely, whereas short-term copies could be expired after mere days or weeks.

If I expire one or more backups in my backup application, does that mean the data for the expired backups will not be replicated?
Simply expiring a save set with the application does not mean that it will not be replicated. The system does not know that a save set has been expired by the application.
It is only when the application overwrites the expired save set that the ETERNUS CS800 system releases the blocks containing the data of the expired save set. Released blocks can then be overwritten with new data.


CONTACT
Fujitsu Technology Solutions GmbH
Mies-van-der-Rohe-Straße 8, Munich, 80807, Germany
E-mail: storage-pm@ts.fujitsu.com
Website: http://ts.fujitsu.com

All rights reserved, including intellectual property rights. Technical data subject to modifications and delivery subject to availability. Any liability that the data and illustrations are complete, actual or correct is excluded. Designations may be trademarks and/or copyrights of the respective manufacturer, the use of which by third parties for their own purposes may infringe the rights of such owner. For further information see ts.fujitsu.com/terms_of_use.html
Copyright © Fujitsu Technology Solutions GmbH 2011
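
ILLUSTRATIVE SKETCH – SUBSCRIBERS AND SPACE RECLAMATION
The “subscriber” behaviour described in steps 5 and 8 of the target initialization procedure can be illustrated with a minimal Python sketch. It is a toy model only: the class DedupPool, its method names, the namespace labels and the block identifiers are all hypothetical, chosen for illustration, and say nothing about how ETERNUS CS800 is actually implemented. The sketch assumes only what the guide states: a unique block stays in the pool while at least one subscriber points to it, and blocks already in the pool do not need to be transferred again.

```python
from collections import defaultdict


class DedupPool:
    """Toy model of a de-duplication pool that tracks subscribers per block."""

    def __init__(self):
        # block id -> set of namespaces (partitions, recovery points) pointing at it
        self.subscribers = defaultdict(set)

    def ingest(self, namespace, blocks):
        """Add pointers for a namespace; return the blocks that were not yet pooled."""
        new_blocks = {b for b in blocks if b not in self.subscribers}
        for b in blocks:
            self.subscribers[b].add(namespace)
        return new_blocks

    def delete_namespace(self, namespace):
        """Remove a namespace's pointers; return blocks left with zero subscribers."""
        reclaimable = set()
        for block, subs in list(self.subscribers.items()):
            subs.discard(namespace)
            if not subs:
                reclaimable.add(block)
                del self.subscribers[block]
        return reclaimable


# Hypothetical data: the tape holds four unique blocks, the source has one newer block.
tape_blocks = {"b1", "b2", "b3", "b4"}
source_blocks = {"b1", "b2", "b3", "b4", "b5"}

target = DedupPool()
seeded = target.ingest("temp_partition", tape_blocks)         # steps 3 and 4
transferred = target.ingest("recovery_point", source_blocks)  # step 5
reclaimable = target.delete_namespace("temp_partition")       # step 8

print("blocks sent over the network in step 5:", sorted(transferred))
print("blocks eligible for reclamation after step 8:", sorted(reclaimable))
```

In this model only the one block that is missing from the tape copy crosses the network in step 5, and deleting the temporary partition in step 8 frees nothing, because the recovery point still subscribes to every seeded block. Deleting the temporary partition before the namespace replication would instead leave all seeded blocks with zero subscribers, which is why the guide places the cleanup after step 5.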