White Paper




EMC VNXe FILE DEDUPLICATION AND
COMPRESSION
Overview




              Abstract
              This white paper describes EMC® VNXe™ File Deduplication and
              Compression, a VNXe system feature that increases the
              efficiency with which network-attached storage (NAS) data is
              stored. This paper outlines the advantages of increased storage
              efficiency. It also explains how File Deduplication and
              Compression works, the principles and technical architecture
              behind it, and how it interacts with other VNXe features and
              functions.

              October 2011
Copyright © 2011 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as
of its publication date. The information is subject to change
without notice.

The information in this publication is provided “as is.” EMC
Corporation makes no representations or warranties of any kind
with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or
fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in
this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.

VMware, VMware vCenter, and VMware View are registered
trademarks or trademarks of VMware, Inc. in the United States
and/or other jurisdictions. All other trademarks used herein are
the property of their respective owners.

Part Number h8966




       EMC VNXe’s Data Deduplication and Compression – Overview    2
Table of Contents
Executive summary
   Audience
Terminology
Detailed Overview
   Data reduction technology choices
   Minimizing client impact
     Targeting Files
     CPU Throttling
     CIFS Compression
   The space reduction process
Management
   Enabling/Disabling Deduplication and Compression
   Deduplication Settings
   Deduplication Status and Statistics
Client input/output to space-reduced files
   Read access
   Write access
Interoperability with other VNXe features
   Network and VNXe NDMP PAX (tar and dump)
   Point-in-time views of the storage
   Replication
   VNXe File-Level Retention
   File Deduplication and Compression in a VMware Environment
Deployment considerations
Conclusion
References




Executive summary
As storage usage increases exponentially, improving storage efficiency is critical for
many customers today. There are many viable solutions to achieve this objective. The
solution you implement is based on what you believe is a suitable or effective
approach to achieve storage efficiency in your production environment. If you are
looking to reduce the cost of storing data at the file-level without affecting end-user
experience, you should consider the EMC® VNXe™ File Deduplication and
Compression feature.

The File Deduplication and Compression feature increases file storage efficiency by
eliminating redundant data from the files stored in Shared Folder or VMware NFS
Datastore storage, thereby saving storage space and cost. For each of these storage
types, the feature gives the Storage Processor (SP) the ability to compress files and
to share a single instance of the data among identical files. The feature is
transparent to access protocols.

This white paper introduces the product design of File Deduplication and Compression,
which offers ease of management and intelligent storage efficiency. The paper
illustrates the concepts of File Deduplication and Compression and shows how to manage
deduplication on VNXe. The feature is designed to be a simple but powerful addition
to an already strong portfolio of VNXe features. Administrators can deploy this feature
in existing VNXe storage environments. With this in mind, the paper discusses the
interoperability between File Deduplication and Compression and other VNXe
features.

Audience
This white paper is intended for use by EMC field personnel and customers who are
familiar with EMC VNXe technology, but are not familiar with VNXe File Deduplication
and Compression.


Terminology
•   Common Internet File System (CIFS) – An access protocol that allows users to access files
    and folders from Windows hosts located on a network. User authentication is maintained
    through Active Directory, and file access is determined by directory access controls.

•   Deduplication – The process used to compress redundant data, allowing space to be saved
    on a storage resource. When multiple files have identical data, the storage resource stores
    only one copy of the data, and shares that data between the multiple files.




•   Network File System (NFS) – An access protocol that allows users to access files and folders
    from Linux/UNIX hosts located on a network.

•   Network Data Management Protocol (NDMP) – A protocol that provides a standard for
    backing up file servers on a network. NDMP allows centralized applications to back up file
    servers running on various platforms and platform versions. NDMP reduces network
    congestion by isolating control path traffic from data path traffic, which permits centrally
    managed and monitored local backup operations.

•   Portable Archive Interchange (PAX) – An IEEE/POSIX standard archive protocol that works
    with standard UNIX tape formats and provides file-level backup and recovery operations.

•   Share – A named, mountable instance of shared-folder storage, accessible through a
    shared folder or VMware NFS datastore. Each share is accessible through the protocol (NFS
    or CIFS) defined for the shared folder where it resides.

•   Shared folder – A VNXe storage resource that provides access to individual file systems for
    sharing files and folders. Shared folders contain either Windows shares (which transfer data
    according to the CIFS protocol) or NFS shares (which transfer data according to the NFS
    protocol). NFS shares are sometimes referred to as NFS exports.

•   Snapshot – A read-only, point-in-time copy of data stored on the storage system. You can
    recover files from snapshots or restore a storage element to a snapshot.

•   Storage processor (SP) – A hardware component that provides the processing resources for
    performing storage operations such as creating, managing, and monitoring storage
    resources.

•   Unisphere™ – A web-based management environment for creating storage resources,
    configuring and scheduling protection for stored data, and managing and monitoring other
    storage operations on VNXe systems.




Detailed Overview
Data reduction technology choices
There are a number of technologies classified under data reduction or deduplication.
Two major data reduction technologies utilized in the VNXe are:

•   File-level deduplication – This provides relatively modest space savings. It does not require
    substantial CPU and memory resources to implement.




•   Compression – This is often regarded as a separate mechanism from deduplication.
    However, compression can also be viewed as variable, bit-level, intra-object deduplication.
    Compression alters the way data is stored, mainly to improve storage efficiency. In fact, it
    offers, by far, the greatest space savings of all the techniques listed for typical NAS data,
    and is relatively modest in terms of its resource footprint. It is relatively CPU-intensive but
    requires very little memory.

File Deduplication and Compression combines file-level deduplication and data compression
technologies to provide a maximum benefit for the resources that are required to provide
storage efficiency, while minimizing the client impact on mission-critical files. This feature
processes file data, not metadata. If multiple files contain the same data but different names,
the files are deduplicated. Deduplicated files can also have different permissions and
timestamps.

For example, if there are 50 unique files that can be deduplicated in Shared Folder storage, 50
unique files will still exist but the data will be compressed, yielding space savings. If there are
70 identical copies of a presentation document in the storage resource, 70 files will still exist
but they will all share the same file data. In this example, the data usage will decrease by a
factor of almost 70. In addition, the one instance of the file data shared by the 70 files will also
be compressed, providing further space savings.
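
The arithmetic in this example can be sketched in a few lines. The 10 MB file size and the 2:1 compression ratio below are illustrative assumptions, not figures from this paper, and the sketch ignores the per-file metadata that each of the 70 files still keeps (which is why the paper says "almost 70").

```python
def space_after_reduction(file_size_mb, copies, compression_ratio):
    """Return (logical_mb, stored_mb) for identical copies of one file.

    Single instancing keeps one copy of the file data; compression then
    shrinks that single copy by compression_ratio (an assumed value,
    for example 2.0 for a 2:1 ratio).
    """
    logical_mb = file_size_mb * copies            # what clients see
    stored_mb = file_size_mb / compression_ratio  # what the store holds
    return logical_mb, stored_mb


# Hypothetical inputs: 70 identical copies of a 10 MB presentation.
logical, stored = space_after_reduction(file_size_mb=10, copies=70,
                                        compression_ratio=2.0)
print(f"logical {logical} MB, stored {stored} MB, "
      f"overall factor {logical / stored:.0f}")
```

With these assumed inputs, single instancing alone accounts for the factor of 70, and compression of the one shared instance multiplies the saving further.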


Minimizing client impact
VNXe performs all deduplication processing as a background asynchronous operation that acts
on file data after it is written into the primary storage. It does not process file data while it is
being written into the storage resource. This avoids adding latency on the client side, where
access to production data is latency-sensitive.

In addition to processing data in the background, File Deduplication and Compression avoids
processing the hot data in storage. The hot data is any file that is in active use by clients. Note
that hot data is defined by how recently clients accessed or modified the files. By not
processing active files, you avoid introducing any performance penalty on the data that clients
and users are using to run their business. Typically only a small proportion of the data in
production storage is in active use. This means that VNXe File Deduplication and Compression
processes the bulk of the data in storage without affecting the production workload.

The policy engine uses a pre-defined default policy to scan the files in a deduplication-enabled
storage resource. This is based on analysis of how typical files age from active use to disuse in
various company settings of different industry and sector types.


Targeting Files
The File Deduplication and Compression feature targets older files and avoids new files that are
considered active. Therefore, this feature scans each storage resource that is marked to be
processed no more than once a day. Administrators can prompt the system to scan a specific
storage resource immediately, if required.




File size is also a criterion for deciding whether a file is a candidate for deduplication. File
Deduplication and Compression checks each file's size against the minimum and maximum
size settings to determine whether it should be deduplicated.

Administrators can define filters based on file extension so that VNXe avoids processing
certain files. Administrators can also define filters based on pathname to exclude directories
from being scanned.


CPU Throttling
The impact of the deduplication processing on VNXe is controlled through an automated
scheduling mechanism; this mechanism is self-regulating based on the CPU load. Each SP
scans and deduplicates only one storage resource at a time. While scanning and deduplicating
files, if VNXe detects that the CPU load of the SP exceeds a system-defined threshold, the
process throttles its activity to a minimal level until it detects that the CPU load has decreased
to less than a low-activity threshold. This means that the deduplication and reduplication
processes effectively consume CPU cycles that would otherwise be idle, and do not affect the
system’s ability to satisfy client activity.
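
The self-regulating behavior described above is a form of hysteresis: two thresholds keep the process from flapping between states. The following is a minimal sketch only, using the 75% and 40% defaults that Table 1 lists for High CPU Throttle and Low CPU Throttle. The function names and the treatment of a load exactly at a threshold are assumptions, not VNXe internals.

```python
HIGH_CPU_THROTTLE = 75  # percent; default listed in Table 1
LOW_CPU_THROTTLE = 40   # percent; default listed in Table 1


def next_throttle_state(throttled, cpu_load):
    """Return True if deduplication should be throttled after this sample.

    Assumption: reaching a threshold exactly counts as crossing it.
    """
    if not throttled and cpu_load >= HIGH_CPU_THROTTLE:
        return True       # SP is busy: throttle down
    if throttled and cpu_load <= LOW_CPU_THROTTLE:
        return False      # SP is quiet again: throttle back up
    return throttled      # between the thresholds: keep the current state


def simulate(samples):
    """Apply the rule to a series of CPU load samples (hypothetical helper)."""
    state = False
    history = []
    for load in samples:
        state = next_throttle_state(state, load)
        history.append(state)
    return history


print(simulate([30, 80, 60, 50, 39, 60]))
```

Note that a load of 60% leaves the process throttled if it was already throttled, but does not throttle a process that is running at full rate. The gap between the two thresholds is what provides the stability.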


CIFS Compression
The CIFS compression attribute feature allows Windows CIFS clients to manually deduplicate
and reduplicate individual files or directories that do not meet the defined policy. Windows
Explorer shows the names of files marked in a different color (the default is blue). The system
attempts to preserve a deduplicated file in the deduplicated state even when it is written to. As
soon as deduplication is disabled, CIFS compression is no longer allowed until the
administrator enables or suspends deduplication.

The VNXe system shows deduplicated files as being compressed to the CIFS client. As shown in
Figure 1, CIFS clients can deduplicate and reduplicate individual files and directories via the
Windows Explorer advanced attribute, Compress contents to save disk space. To do this, right-
click a non-deduplicated file, and select Properties > Advanced > Compress contents to save
disk space. Marking a directory only affects new files that are added to the directory. It does
not affect existing files in the directory.




Figure 1 Manually compressing files




A full list of the VNXe deduplication settings is shown in Table 1.

          Table 1 VNXe File Deduplication and Compression Policy Criteria


Setting                            Definition                                                Default Value

Access Time                        Length of time in days the file has not been accessed     30 days

Modification Time                  Length of time in days the file has not been modified     30 days

Minimum Size                       Files below this size will not be deduplicated            24 KB

Maximum Size                       Files greater than this size will not be deduplicated     8 TB

Detection Method                   Hash used to detect duplicate files                       SHA1

Case sensitive                     CIFS environment compatibility                            Disabled

File Extension Exclude List*       Files with the following extensions will not be           No extensions
                                   deduplicated

Pathname Exclude List*             Files with the following pathname will not be             No pathnames
                                   deduplicated

Minimum Scan Interval              Frequency with which the deduplication policy             7 days
                                   engine will scan a deduplication-enabled storage
                                   resource

Protection Storage Threshold       At what usage capacity percentage of the storage          90%
                                   resource’s Protection Storage will the space
                                   reduction process not proceed

High CPU Throttle                  If the CPU reaches this level, deduplication will         75%
                                   throttle down

Low CPU Throttle                   If deduplication is throttled down and the CPU level      40%
                                   returns to this level, deduplication will throttle back
                                   up

CIFS Compression Attribute         Enables CIFS deduplication                                Enabled

Backup Data Threshold              At what percentage of the logical size of the file        90%
                                   should the space-reduced size be in order for NDMP
                                   to back up the file in its space-reduced format

                 * Settings are configurable in Unisphere.
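
As an illustration of how the age and size criteria in Table 1 combine, the sketch below tests a single file against the default values. It is not the VNXe policy engine. The function name is hypothetical, KB and TB are assumed to be binary units, and a file is assumed to qualify only when it passes every criterion.

```python
from datetime import datetime, timedelta

ACCESS_DAYS = 30            # Access Time default
MODIFICATION_DAYS = 30      # Modification Time default
MIN_SIZE = 24 * 1024        # Minimum Size default, assuming 1 KB = 1024 bytes
MAX_SIZE = 8 * 1024 ** 4    # Maximum Size default, assuming 1 TB = 1024^4 bytes


def meets_age_and_size_policy(size_bytes, last_accessed, last_modified, now):
    """Return True if a file passes the default age and size criteria."""
    if size_bytes < MIN_SIZE or size_bytes > MAX_SIZE:
        return False        # too small or too large to deduplicate
    if now - last_accessed < timedelta(days=ACCESS_DAYS):
        return False        # accessed too recently: still "hot"
    if now - last_modified < timedelta(days=MODIFICATION_DAYS):
        return False        # modified too recently: still "hot"
    return True


now = datetime(2011, 10, 1)
old = now - timedelta(days=45)
recent = now - timedelta(days=2)
print(meets_age_and_size_policy(100 * 1024, old, old, now))     # old, 100 KB
print(meets_age_and_size_policy(100 * 1024, recent, old, now))  # recently read
```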




The space reduction process
File Deduplication and Compression has a policy engine that specifies data for exclusion from
processing on Shared Folder or VMware NFS Datastore storage and decides whether to
deduplicate specific files based on the policy criteria. Figure 2 depicts a typical Shared Folder
storage. It contains files with different attributes (i.e. last modified date, file size, and type).




                     Figure 2 Shared Folder storage contains files with different attributes


When File Deduplication and Compression is enabled, it periodically scans the primary storage
for files that match the policy criteria (shown in Figure 3). The hidden store is dynamically
allocated to hold deduplicated data and the hash table used for single instancing.




                               Figure 3 Scanning a file against the policy criteria




If the file meets the policy criteria, it is copied to the hidden store for compression and it is
hashed. The hash is compared with the other hashes in the hash table to determine whether
the policy engine has already processed another copy of the same data. If the hash is unique,
it is stored in the hash table for later comparisons (shown in Figure 4).




                           Figure 4 File meets policy criteria and gets processed


The space that the file data occupied in the user portion of the primary storage is freed and the
file’s internal metadata is updated to reference the copy of the data in the hidden store (shown
in Figure 5).




                        Figure 5 Stub file points to the file data in the hidden store




If the data associated with the file has been identified before, the space it occupies is freed and
the internal file metadata is updated to reference the instance already in the hidden store
(shown in Figure 6).




                                   Figure 6 A duplicate file is detected


Note that the VNXe storage system detects non-compressible files and stores them in their
original form. However, these files can still benefit from file-level deduplication.
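
The sequence illustrated in Figures 4 through 6 can be approximated with a small in-memory model. This is a sketch under stated assumptions, not the VNXe implementation: the class name is hypothetical, zlib stands in for the unspecified compression algorithm, the SHA-1 hash is assumed to be computed over the uncompressed file data, and Python dictionaries stand in for the hidden store and its hash table.

```python
import hashlib
import zlib


class HiddenStore:
    """Toy model of single instancing plus compression."""

    def __init__(self):
        self.blobs = {}       # SHA-1 digest -> (stored bytes, is_compressed)
        self.references = {}  # file name -> digest (the file's "stub")

    def process(self, name, data):
        digest = hashlib.sha1(data).hexdigest()
        if digest not in self.blobs:
            # First time this data is seen: keep one instance of it.
            compressed = zlib.compress(data)
            if len(compressed) < len(data):
                self.blobs[digest] = (compressed, True)
            else:
                # Non-compressible data is kept in its original form.
                self.blobs[digest] = (data, False)
        # Duplicate or not, the file now references the stored instance.
        self.references[name] = digest

    def read(self, name):
        stored, is_compressed = self.blobs[self.references[name]]
        return zlib.decompress(stored) if is_compressed else stored

    def stored_bytes(self):
        return sum(len(stored) for stored, _ in self.blobs.values())


store = HiddenStore()
report = b"quarterly report " * 1000
store.process("a.ppt", report)
store.process("copy/b.ppt", report)   # identical data: no new instance
print(len(store.blobs), store.stored_bytes() < len(report))
```

Two file names reference one stored instance, and that instance occupies less space than a single original copy. A file whose data does not compress is still single-instanced, which matches the note above.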




Management
You can manage the File Deduplication and Compression feature through the VNXe Unisphere
Manager graphical user interface (GUI) or command line interface (CLI).


Enabling/Disabling Deduplication and Compression
File Deduplication and Compression can be enabled when provisioning Shared Folder or
VMware (NFS Datastore) storage. To enable this feature for Shared Folder storage, select the
Enable Deduplication and Compression checkbox in the Configure Shared Folder Attributes
wizard pane during storage creation (see Figure 7).




          Figure 7 Enable File Deduplication and Compression for Shared Folder storage


To enable this feature for VMware NFS Datastore storage, select the Enable Deduplication and
Compression checkbox in the Specify File System Type wizard pane during storage creation
(see Figure 8).




Figure 8 Enabling feature when provisioning storage


Alternatively, this feature can be enabled after the storage resource has been created by
choosing the storage resource and selecting Details > Deduplication tab > Enable (shown in
Figure 9).




                        Figure 9 Enabling feature on existing storage




Enabling this feature prompts an immediate scan of the storage resource. However, if another
scan is already in progress on the storage server, the new scan is not initiated. Both the Enable
and Enable in the Suspend mode options allow Windows CIFS clients to make use of the
Windows Explorer advanced attribute, "Compress contents to save disk space".

After you enable a storage resource for deduplication, File Deduplication and Compression
scans it periodically and looks for more files to deduplicate. You can view the state of the
deduplication process in the storage resource’s Deduplication tab.

The default File Deduplication and Compression state for a storage resource is “Disabled”.
When a storage resource is in the “Disabled” state, it has no deduplicated files and the policy
engine does not scan for files to deduplicate.

The “Enabled” state indicates that the File Deduplication and Compression processing is
enabled for the storage resource. When a storage resource is in the “Enabled” state, it may
contain deduplicated files, and the policy engine scans the storage resource for more files to
deduplicate on its next scheduled run.

The “Suspended” state means that the File Deduplication and Compression processing is
suspended for the storage resource. When a storage resource is in the “Suspended” state, it
may contain deduplicated files. However, the policy engine does not scan to look for any more
files to deduplicate.

Administrators can switch between deduplication states at any time. If they switch the
deduplication state from “Enabled” or “Suspended” to “Disabled”, it prompts the system to
reduplicate all deduplicated files in the storage resource (shown in Figure 10). The system
checks whether there is sufficient space available in the storage resource to complete this
process (without filling the storage resource completely) before accepting the request to set
deduplication to “Disabled”.




                          Figure 10 Message to reduplicate all deduplicated files

Users can initiate a deduplication scan manually by selecting the “Force Rescan Now” button.




If the space is not sufficient, the system informs the administrator about the amount of
additional space that is required to complete the operation, and recommends the extension of
the storage resource.
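
The space check performed before deduplication is set to “Disabled” amounts to simple arithmetic. The sketch below is one hedged reading of it: the function name, the byte-based inputs, and the `reserve` parameter (standing in for the unspecified margin that keeps the storage resource from filling completely) are all assumptions.

```python
def additional_space_needed(logical_size, stored_size, free_space, reserve=0):
    """Return the extra bytes required to reduplicate all files (0 if none).

    logical_size: total size of the deduplicated files as clients see them
    stored_size:  space those files currently occupy in space-reduced form
    free_space:   unused space in the storage resource
    reserve:      hypothetical margin that must remain free afterwards
    """
    growth = logical_size - stored_size       # space reduplication consumes
    shortfall = growth + reserve - free_space
    return max(0, shortfall)


GB = 1024 ** 3
# Hypothetical resource: 500 GB of files stored in 200 GB, 100 GB free.
print(additional_space_needed(500 * GB, 200 * GB, 100 * GB) // GB)
```

A non-zero result corresponds to the case in which the system reports how much additional space is required and recommends extending the storage resource.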


Deduplication Settings
In Unisphere, policy settings are configured at the storage resource level. The Deduplication tab
is found in the Details page of the storage resource. There are two policy criteria that are
configurable by the administrator:

    •   Excluded File Extensions - If certain file extensions need to be excluded from the
        deduplication process, set up the file extension exclusion list before running
        deduplication for the first time. If you add exclusions after deduplication has been run
        on a storage resource, the new exclusions take effect from that point forward and do not
        affect files that were deduplicated in previous scans. File extensions must include the
        leading dot.

    •   Excluded Paths - If you want to exclude pathnames from the deduplication process, add
        the pathname exclusion list before running deduplication for the first time. If exclusions
        are added after deduplication has been initiated on a storage resource, these exclusions
        take effect from that point forward. Files that were deduplicated in previous scans are
        not affected. The forward slash (/) is a valid directory delimiter within a single
        pathname.
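
The two exclusion lists can be pictured as a simple filter applied before a file is considered for deduplication. This is a sketch only. The function name is hypothetical, and both the case-insensitive extension comparison and the treatment of an excluded pathname as a directory prefix are assumptions rather than documented VNXe behavior.

```python
import posixpath


def is_excluded(path, excluded_extensions, excluded_paths):
    """Return True if a file should be skipped by the policy engine.

    excluded_extensions: entries include the leading dot, e.g. ".mp4"
    excluded_paths:      entries use "/" as the directory delimiter
    """
    extension = posixpath.splitext(path)[1].lower()
    if extension in {e.lower() for e in excluded_extensions}:
        return True
    for prefix in excluded_paths:
        prefix = prefix.rstrip("/")
        # Match the directory itself or anything beneath it.
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


print(is_excluded("/projects/video/demo.MP4", [".mp4"], []))
print(is_excluded("/tmp/scratchpad/a.txt", [], ["/tmp/scratch"]))
```

Appending the delimiter before the prefix test keeps `/tmp/scratch` from also excluding a sibling directory such as `/tmp/scratchpad`.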


Deduplication Status and Statistics
As shown in Figure 11, VNXe displays the results of the deduplication process on the data in
the storage resource with the following statistics:

•   State - Enabled or disabled.
•   Status – Running, In Progress, or Idle.
•   Last Scan - When the last successful scan of the storage resource was completed.
•   Files deduplicated - Number of files processed by the deduplication policy engine to save
    space. It also shows the percentage of deduplicated files.
•   Saved – Amount (before and after), and percentage of space saved by deduplication.




Figure 11 Status of Deduplication Scan



When a storage resource enabled for deduplication is scanned for the first time, the statistics
shown are reported in real time. After the first scan, statistics are reported as static values
based on the last successful scan.


Client input/output to space-reduced files
The File Deduplication and Compression feature does not affect the client input/output (I/O) to
files that have not been deduplicated. The feature does not introduce any additional overhead
for access to files that it has not processed. The default policy is designed to filter out files that
have frequent I/O access and thus avoid adversely affecting the time required to access those
files.

Read access
Read access to deduplicated files is satisfied by decompressing the data in memory and
passing it back to the client. VNXe does not decompress or alter any data on disk in response
to client-read activity. In addition, random reads require decompression of only the requested
portion of the file data, not of the entire file. Reading a deduplicated file can take longer than
reading a file that is not deduplicated, because of the decompression activity. However,
the opposite may also be true. Reading a deduplicated file is sometimes faster than reading a
file that is not deduplicated, because less data needs to be read from the disk, which more
than offsets the increased CPU activity associated with decompressing the data.




Write access
A client request to write to or modify a deduplicated file causes the requested portion of the file
to reduplicate (decompress) in the storage resource. A write to or a modification of a
deduplicated file causes a copy of the requested portion of the file data to be reduplicated for
this particular instance while preserving the deduplicated data for the remaining references to
this file. The following three factors mitigate this effect:

•   Most applications do not modify files. They typically make a local copy, modify it, and when
    finished, write the entire new file back to the file server, discarding the old copy in the
    process. Therefore, reduplication on the file server does not occur; it is just replaced.
•   VNXe avoids processing active files (accessed or modified recently) based on policy
    definitions. Hence, deduplicated files are less likely to be modified and, if they are,
    performance is less likely to be a critical factor.
•   When a client writes to a deduplicated file, the SP writes only the individual blocks
    that have changed. The entire file is not decompressed and reduplicated on disk until the
    number of individual changed blocks plus the number of blocks in the corresponding
    deduplicated file is larger than the logical file size.
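
The condition in the last bullet above can be written as a one-line comparison. The sketch below is illustrative only: the function name and the block counts are hypothetical, and all three quantities are assumed to be measured in blocks of the same size.

```python
def should_reduplicate_fully(changed_blocks, deduplicated_blocks, logical_blocks):
    """Return True once keeping the space-reduced copy no longer saves space.

    changed_blocks:      blocks written individually since deduplication
    deduplicated_blocks: blocks occupied by the space-reduced file data
    logical_blocks:      size of the file as the client sees it, in blocks
    """
    return changed_blocks + deduplicated_blocks > logical_blocks


# Hypothetical file: 100 logical blocks stored as 40 space-reduced blocks.
print(should_reduplicate_fully(10, 40, 100))   # a few changed blocks
print(should_reduplicate_fully(70, 40, 100))   # many changed blocks
```

In the first call the changed blocks and the space-reduced data together still occupy less than the full file, so the file stays deduplicated. In the second call they occupy more, which is the point at which full reduplication occurs.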


Interoperability with other VNXe features
Network and VNXe NDMP PAX (tar and dump)
When backing up over the network through CIFS or NFS, space-reduced files are filtered for their
compressibility. This determines if the space-reduced files are to be backed up in their
compressed format or in their original form. This is done so as not to affect the restore time on
files that provide a minimal amount of savings. Space-reduced files are not reduplicated to their
original size for transfer to the backup application, and they are not reduplicated on disk. The
space-saving storage efficiency benefits realized in the VNXe deduplicated production storage
do not flow through to backups when using network-based backups. When backing up
deduplication-enabled storage with NDMP PAX, the compressed version of the file is backed up
and all instances of the deduplicated file appear in the backup stream. It is worth noting that in
most cases the majority of space savings come from compression, which means that the
majority of the space savings made in the storage resource apply to the backup as well.

In NDMP PAX backups, a threshold is used to check the compressibility of each space-
reduced file before sending the file data compressed. There are cases when it is better to back
up the file uncompressed, especially when there is sparsely-written data associated with the
file. By default, the deduplicated size of the file must be 90 percent or less of the original
size of the file at the time it was deduplicated.

This ensures that compressible files are backed up in their compressed format, and non-
compressible files are backed up in their un-deduplicated format. (The small amount of space
that would be saved on tape does not justify the extra time it would take to restore the files
from backup.)
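
The Backup Data Threshold test reduces to a percentage comparison. The sketch below shows one way to express it with the 90% default. The function name is hypothetical, and a file exactly at the threshold is assumed to qualify for the space-reduced format.

```python
BACKUP_DATA_THRESHOLD = 90  # percent; default listed in Table 1


def backup_in_space_reduced_format(space_reduced_size, logical_size):
    """Return True if NDMP should send the file in its space-reduced form.

    Integer arithmetic avoids rounding surprises at the boundary.
    """
    return space_reduced_size * 100 <= logical_size * BACKUP_DATA_THRESHOLD


# Hypothetical sizes in KB for three 100 KB files.
for reduced in (50, 90, 95):
    print(reduced, backup_in_space_reduced_format(reduced, 100))
```

A file that shrank only to 95% of its original size is sent in its original form, because the small saving on tape would not justify the slower restore.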




When restoring files from a PAX-based NDMP backup, the files are restored in their compressed
format. This feature also provides right-sizing capabilities because an administrator can back
up the deduplicated files from one storage resource and restore them onto a smaller storage
resource. This feature is independent of the backup applications.


Point-in-time views of the storage
The deduplication process releases space in the production storage immediately. However,
blocks may be copied to the protection storage in the process. Deduplicating data associated
with a file involves copying the data within the storage resource to the hidden and reduced
data store, so that file-level deduplication and compression can be applied to it.

Because snapshots use a “copy on first modify” principle, the blocks that get deduplicated are
copied to the protection storage first. This is done to preserve the previous snapshot’s point-
in-time view of the storage resource. These blocks are freed when the corresponding snapshot
gets deleted or refreshed, and are available for reuse by other snapshots. It is difficult to
predict the number of blocks that will be copied to the protection storage during deduplication,
because this depends on how full the primary storage is and how quickly it is changing.
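The "copy on first modify" behavior can be illustrated with a small toy model. The sketch below is an assumption-laden simplification, not the VNXe implementation: the class StorageResource, its methods, and the use of dictionaries to stand in for block storage are all hypothetical. It shows why blocks released by deduplication are first copied to the protection storage while a snapshot exists.

```python
class StorageResource:
    """Toy model of production storage with one active snapshot (hypothetical)."""

    def __init__(self, blocks):
        self.production = dict(blocks)  # block number -> data
        self.protection = {}            # preserved copies for the active snapshot
        self.snapshot_blocks = None     # block numbers that existed at snapshot time

    def take_snapshot(self):
        self.snapshot_blocks = set(self.production)
        self.protection = {}

    def change_block(self, block_no, new_data):
        """Write new_data to a block, or free the block when new_data is None."""
        needs_copy = (
            self.snapshot_blocks is not None
            and block_no in self.snapshot_blocks
            and block_no not in self.protection
        )
        if needs_copy:
            # Copy on first modify: preserve the old contents exactly once.
            self.protection[block_no] = self.production[block_no]
        if new_data is None:
            self.production.pop(block_no, None)
        else:
            self.production[block_no] = new_data

    def snapshot_view(self):
        """Rebuild the point-in-time view from preserved and unchanged blocks."""
        if self.snapshot_blocks is None:
            raise ValueError("no active snapshot")
        return {
            b: self.protection[b] if b in self.protection else self.production[b]
            for b in self.snapshot_blocks
        }

    def delete_snapshot(self):
        # Preserved blocks are released and become available for reuse.
        self.snapshot_blocks = None
        self.protection = {}
```

In this model, freeing a block during deduplication (change_block(n, None)) adds one entry to protection, which mirrors how deduplication activity can consume protection storage even though it releases production space.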

By default, VNXe is configured to abort deduplication operations on a storage resource before they
cause the protection storage to extend. This avoids the protection storage expanding due to
deduplication activity. If the deduplication process is aborted in this way, an alert is generated
that explains what happened. The VNXe administrator can choose to extend the protection
storage or simply let the deduplication process execute again on its next scheduled run.


Replication
Deduplicating the contents of a storage resource before you replicate it can greatly reduce the
amount of data that is sent over the network as part of the initial-baseline copy process. The
amount of data that is transferred over the network depends on the timing of replication updates
and when deduplication runs.

In all but the most extreme circumstances, replication updates are more frequent than
deduplication scans of a storage resource. This means that new and changed data in the
storage resource is usually replicated in its non-deduplicated form first. Furthermore, any
subsequent deduplication of that data prompts additional replication traffic; this is due to the
block changes that were made to the hidden data store of the storage resource.

This is true of any deduplication solution that processes replicated data and updates remote
replicas of the data, because the replication updates occur more frequently than deduplication
scans. The space savings realized by the production storage resource are reflected on the
destination storage resource.




VNXe File-Level Retention
You can enable File Deduplication and Compression on enterprise File-Level Retention (FLR-E)
storage without compromising on the protection offered to the data that the storage resources
contain.


File Deduplication and Compression in a VMware Environment
File Deduplication and Compression can be used in a VMware environment to improve storage
efficiency of the virtual machines (VMs). Active VMs are constantly changing states and are
typically bypassed by VNXe’s deduplication policy engine.

VMs that are selected for storage optimization are compressed using the same compression
algorithm that is used in File Deduplication and Compression. The algorithm compresses the
VMDK file (virtual disk) and leaves the other files that make up the VMs intact. When a
compressed VMDK is read, only the data blocks containing the actual data are decompressed.
Likewise, when data is written to the VMDK, instead of compressing it “on the fly”, data is
written to a set-aside buffer, which then gets compressed as a background task. In most
situations, the amount of data read or written is a small percentage of the total amount of data
stored.
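The read and write behavior described above can be sketched as follows. This is a minimal illustration only: the class CompressedDisk, the 8 KB block size, and the use of zlib are assumptions chosen for the example, and the paper does not state which algorithm or block size VNXe uses. The sketch shows that a read decompresses only the requested block and that a write lands in a set-aside buffer that is compressed later.

```python
import zlib

BLOCK_SIZE = 8192  # hypothetical block size, chosen only for illustration


class CompressedDisk:
    """Toy model of a block-compressed virtual disk (not the VNXe format)."""

    def __init__(self, data: bytes):
        # Each block is compressed independently so it can be read on its own.
        self.blocks = [
            zlib.compress(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)
        ]
        self.write_buffer = {}      # block index -> uncompressed bytes
        self.decompress_count = 0   # how many blocks have been decompressed

    def read_block(self, index: int) -> bytes:
        if index in self.write_buffer:
            # Recently written data is served from the buffer as-is.
            return self.write_buffer[index]
        self.decompress_count += 1
        return zlib.decompress(self.blocks[index])

    def write_block(self, index: int, data: bytes) -> None:
        if not 0 <= index < len(self.blocks):
            raise IndexError("block index out of range")
        # No compression on the write path; the data is only buffered.
        self.write_buffer[index] = data

    def background_compress(self) -> None:
        # Stand-in for the background task that compresses buffered writes.
        for index, data in self.write_buffer.items():
            self.blocks[index] = zlib.compress(data)
        self.write_buffer.clear()
```

Compressing each block independently trades some compression ratio for the ability to service a read without decompressing the entire file, which is the property the paragraph above relies on.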


Deployment considerations
The following need to be considered when deploying File Deduplication and Compression:

•   File Deduplication and Compression does not deduplicate data across or between
    storage resources.
•   With NDMP PAX, backing up deduplication-enabled storage results in a shorter
    backup window. However, the required restore window increases.
•   File Deduplication and Compression does not process or affect alternate data
    streams (also known as named attributes) associated with files and directories
    in the storage resource.
•   File Deduplication and Compression does not process files smaller than 24 KB in
    size. The overhead associated with processing such files negates any space
    savings achieved.
•   There is no limit to the size of a storage resource on which File Deduplication and
    Compression can be enabled. However, the storage resource must have at least 1
    MB of free space before deduplication can be enabled. If there is not enough free
    space, an error message is generated and the server log is updated.
•   You can enable File Deduplication and Compression on any number of storage
    resources.
•   You can enable File Deduplication and Compression on existing Shared-Folder
    and VMware NFS-datastore storage. This feature cannot be applied to iSCSI
    storage hosted on the VNXe system.
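Several of the considerations above are simple numeric or type checks, which can be summarized in a short sketch. The function names, the storage-type labels, and the interpretation of KB and MB as powers of 1,024 bytes are assumptions made for this example. They do not correspond to any VNXe command or API.

```python
MIN_FILE_SIZE = 24 * 1024        # files smaller than 24 KB are not processed
MIN_FREE_SPACE = 1024 * 1024     # at least 1 MB free is needed to enable the feature
SUPPORTED_TYPES = {"shared_folder", "vmware_nfs"}  # hypothetical labels; iSCSI is excluded


def is_dedup_candidate(file_size: int) -> bool:
    """Return True if a file is large enough to be considered for processing."""
    return file_size >= MIN_FILE_SIZE


def can_enable_dedup(storage_type: str, free_bytes: int) -> bool:
    """Return True if the feature could be enabled on this storage resource."""
    return storage_type in SUPPORTED_TYPES and free_bytes >= MIN_FREE_SPACE
```

For example, can_enable_dedup("iscsi", 10**9) is False regardless of free space, because the feature applies only to Shared Folder and VMware NFS datastore storage.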




Conclusion
The File Deduplication and Compression feature adds to VNXe’s impressive storage efficiency
capabilities by intelligently reducing space usage and providing further storage efficiency. You
can optimize the use of the existing storage environment for storage resource data through file-
level deduplication and compression.
A combination of file-level deduplication and compression represents the best technique to
provide maximum benefit for the resources consumed. This feature builds on VNXe’s ease of
use by providing a single-click option in Unisphere to enable deduplication for each storage resource.
File Deduplication and Compression also supports almost all Unisphere features and works in
accordance with maximum-supported VNXe limits.




References

The following white papers can be found on the VNXe page on Powerlink:
•   EMC VNXe Series Storage Systems – A Detailed Review
•   EMC VNXe Data Protection – Overview




From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Kürzlich hochgeladen (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

White Paper: EMC VNXe File Deduplication and Compression

  • 1. White Paper EMC VNXe FILE DEDUPLICATION AND COMPRESSION Overview Abstract This white paper describes EMC® VNXe™ File Deduplication and Compression, a VNXe system feature that increases the efficiency with which network-attached storage (NAS) data is stored. This paper outlines the advantages of increased storage efficiency. It also explains how File Deduplication and Compression works, the principles and technical architecture behind it, and how it interacts with other VNXe features and functions. October 2011
  • 2. Copyright © 2011 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. VMware, VMware vCenter, and VMware View are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners. Part Number h8966 EMC VNXe’s Data Deduplication and Compression – Overview 2
  • 3. Table of Contents
    Executive summary, p. 4
      Audience, p. 4
    Terminology, p. 4
    Detailed Overview, p. 5
      Data reduction technology choices, p. 5
      Minimizing client impact, p. 6
        Targeting Files, p. 6
        CPU Throttling, p. 7
        CIFS Compression, p. 7
      Access Time, p. 9
      Length of time in days the file has not been accessed, p. 9
      The space reduction process, p. 10
    Management, p. 13
      Enabling/Disabling Deduplication and Compression, p. 13
      Deduplication Settings, p. 16
      Deduplication Status and Statistics, p. 16
    Client input/output to space-reduced files, p. 17
      Read access, p. 17
      Write access, p. 18
    Interoperability with other VNXe features, p. 18
      Network and VNXe NDMP PAX (tar and dump), p. 18
      Point-in-time views of the storage, p. 19
      Replication, p. 19
      VNXe File-Level Retention, p. 20
      File Deduplication and Compression in a VMware Environment, p. 20
    Deployment considerations, p. 20
    Conclusion, p. 21
    References, p. 21
  • 4. Executive summary
    As storage usage increases exponentially, improving storage efficiency is critical for many customers. There are many viable ways to achieve this objective, and the one you implement depends on what you consider a suitable and effective approach for your production environment. If you want to reduce the cost of storing data at the file level without affecting the end-user experience, consider the EMC® VNXe™ File Deduplication and Compression feature.
    The File Deduplication and Compression feature increases file storage efficiency by eliminating redundant data from the files stored in Shared Folder or VMware NFS Datastore storage, thereby saving storage space and cost. For each of these storage types, file-level deduplication gives the Storage Processor (SP) the ability to process files in order to compress them, and to share a single instance of the data when files are identical. The feature is transparent to access protocols.
    This white paper introduces the product design of File Deduplication and Compression, which offers ease of management and intelligent storage efficiency. It illustrates the concepts behind the feature and shows how to manage deduplication on VNXe.
    The feature is designed to be a simple but powerful addition to an already strong portfolio of VNXe features, and administrators can deploy it in existing VNXe storage environments. With this in mind, the paper also discusses the interoperability between File Deduplication and Compression and other VNXe features.
    Audience
    This white paper is intended for EMC field personnel and customers who are familiar with EMC VNXe technology but not with VNXe File Deduplication and Compression.
    Terminology
    • Common Internet File System (CIFS) – An access protocol that allows users to access files and folders from Windows hosts located on a network. User authentication is maintained through Active Directory, and file access is determined by directory access controls.
    • Deduplication – The process used to compress redundant data, allowing space to be saved on a storage resource. When multiple files have identical data, the storage resource stores only one copy of the data, and shares that data between the multiple files.
  • 5. Network File System (NFS) – An access protocol that allows users to access files and folders from Linux/UNIX hosts located on a network.
    • Network Data Management Protocol (NDMP) – A protocol that provides a standard for backing up file servers on a network. NDMP allows centralized applications to back up file servers running on various platforms and platform versions. NDMP reduces network congestion by isolating control path traffic from data path traffic, which permits centrally managed and monitored local backup operations.
    • Portable Archive Interchange (PAX) – An IEEE/POSIX standard archive protocol that works with standard UNIX tape formats and provides file-level backup and recovery operations.
    • Share – A named, mountable instance of shared-folder storage, accessible through a shared folder or VMware NFS datastore. Each share is accessible through the protocol (NFS or CIFS) defined for the shared folder where it resides.
    • Shared folder – A VNXe storage resource that provides access to individual file systems for sharing files and folders. Shared folders contain either Windows shares (which transfer data according to the CIFS protocol) or NFS shares (which transfer data according to the NFS protocol). NFS shares are sometimes referred to as NFS exports.
    • Snapshot – A read-only, point-in-time copy of data stored on the storage system. You can recover files from snapshots or restore a storage element to a snapshot.
    • Storage processor (SP) – A hardware component that provides the processing resources for performing storage operations such as creating, managing, and monitoring storage resources.
    • Unisphere™ – A web-based management environment for creating storage resources, configuring and scheduling protection for stored data, and managing and monitoring other storage operations on VNXe systems.
    Detailed Overview
    Data reduction technology choices
    There are a number of technologies classified under data reduction or deduplication.
    Two major data reduction technologies utilized in the VNXe are:
    • File-level deduplication – This provides relatively modest space savings. It does not require substantial CPU and memory resources to implement.
  • 6. Compression – This is often regarded as a separate mechanism from deduplication. However, compression can also be viewed as variable, bit-level, intra-object deduplication. Compression alters the way data is stored, mainly to improve storage efficiency. For typical NAS data it offers by far the greatest space savings of the techniques listed, with a relatively modest resource footprint: it is relatively CPU-intensive but requires very little memory.
    File Deduplication and Compression combines file-level deduplication and data compression to provide the maximum storage-efficiency benefit for the resources consumed, while minimizing the client impact on mission-critical files.
    This feature processes file data, not metadata. If multiple files contain the same data but have different names, the files are deduplicated. Deduplicated files can also have different permissions and timestamps.
    For example, if there are 50 unique files that can be deduplicated in Shared Folder storage, 50 unique files will still exist but the data will be compressed, yielding space savings. If there are 70 identical copies of a presentation document in the storage resource, 70 files will still exist but they will all share the same file data. In this example, the data usage will decrease by a factor of almost 70. In addition, the one instance of the file data shared by the 70 files will also be compressed, providing further space savings.
    Minimizing client impact
    VNXe performs all deduplication processing as a background asynchronous operation that acts on file data after it is written into the primary storage. It does not process data while file data is being written into the storage resource. This avoids adding latency on the client side, because access to production data is sensitive to latency.
    In addition to processing data in the background, File Deduplication and Compression avoids processing the hot data in storage.
    Hot data is any file that is in active use by clients, as defined by how recently clients accessed or modified it. By not processing active files, the feature avoids introducing any performance penalty on the data that clients and users rely on to run their business.
    Typically only a small proportion of the data in production storage is in active use. This means that VNXe File Deduplication and Compression can process the bulk of the data in storage without affecting the production workload.
    The policy engine uses a pre-defined default policy to scan the files in a deduplication-enabled storage resource. The policy is based on analysis of how typical files age from active use to disuse in companies across different industries and sectors.
    Targeting Files
    The File Deduplication and Compression feature targets older files and avoids new files that are considered active. Therefore, this feature scans each storage resource that is marked to be processed no more than once a day. Administrators can prompt the system to scan a specific storage resource immediately, if required.
  • 7. File size is also a criterion for deciding whether a file is a candidate for deduplication. File Deduplication and Compression checks each file against the minimum and maximum size settings to determine whether it should be deduplicated.
    Administrators can define filters so that VNXe avoids processing files based on their file extension. Administrators can also define filters based on pathname to exclude directories from being scanned.
    CPU Throttling
    The impact of deduplication processing on VNXe is controlled through an automated scheduling mechanism that is self-regulating based on the CPU load. Each SP scans and deduplicates only one storage resource at a time. While scanning and deduplicating files, if VNXe detects that the CPU load of the SP exceeds a system-defined threshold, the process throttles its activity to a minimal level until it detects that the CPU load has decreased to less than a low-activity threshold. This means that the deduplication and reduplication processes effectively consume CPU cycles that would otherwise be idle, and do not affect the system’s ability to satisfy client activity.
    CIFS Compression
    The CIFS compression attribute feature allows Windows CIFS clients to manually deduplicate and reduplicate individual files or directories that do not meet the defined policy. Windows Explorer shows the names of marked files in a different color (the default is blue). The system attempts to preserve a deduplicated file in the deduplicated state even when it is written to.
    As soon as deduplication is disabled, CIFS compression is no longer allowed until the administrator enables or suspends deduplication. The VNXe system presents deduplicated files to the CIFS client as compressed.
    As shown in Figure 1, CIFS clients can deduplicate and reduplicate individual files and directories via the Windows Explorer advanced attribute, Compress contents to save disk space.
    To do this, right-click a non-deduplicated file, and select Properties > Advanced > Compress contents to save disk space. Marking a directory affects only new files that are added to the directory. It does not affect existing files in the directory.
  • 8. Figure 1 Manually compressing files
  • 9. A full list of the VNXe deduplication settings is shown in Table 1.
    Table 1 VNXe File Deduplication and Compression Policy Criteria
    Setting | Definition | Default Value
    Access Time | Length of time in days the file has not been accessed | 30 days
    Modification Time | Length of time in days the file has not been modified | 30 days
    Minimum Size | Files below this size will not be deduplicated | 24 KB
    Maximum Size | Files greater than this size will not be deduplicated | 8 TB
    Detection Method | Hash used to detect duplicate files | SHA1
    Case sensitive | CIFS environment compatibility | Disabled
    File Extension Exclude List* | Files with the following extensions will not be deduplicated | No extensions
    Pathname Exclude List* | Files with the following pathname will not be deduplicated | No pathnames
    Minimum Scan Interval | Frequency with which the deduplication policy engine will scan a deduplication-enabled storage resource | 7 days
    Protection Storage Threshold | Usage percentage of the storage resource’s Protection Storage at which the space reduction process will not proceed | 90%
    High CPU Throttle | If the CPU reaches this level, deduplication will throttle down | 75%
    Low CPU Throttle | If deduplication is throttled down and the CPU level returns to this level, deduplication will throttle back up | 40%
    CIFS Compression Attribute | Enables CIFS deduplication | Enabled
    Backup Data Threshold | Percentage of the logical file size that the space-reduced size must not exceed for NDMP to back up the file in its space-reduced format | 90%
    * Settings are configurable in Unisphere.
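The two CPU throttle settings in Table 1 describe a hysteresis loop: processing throttles down at the high mark and resumes only once load falls back to the low mark. The following is a minimal sketch of that behavior, not the VNXe implementation. The function names and the inclusive comparisons at each boundary are assumptions made for illustration; only the 75% and 40% defaults come from the table.

```python
HIGH_CPU_THROTTLE = 75  # percent, "High CPU Throttle" default from Table 1
LOW_CPU_THROTTLE = 40   # percent, "Low CPU Throttle" default from Table 1


def next_throttle_state(throttled: bool, cpu_load: float) -> bool:
    """Return True if deduplication should run throttled after this CPU sample.

    Hypothetical helper: boundary handling (>= and <=) is an assumption.
    """
    if not throttled and cpu_load >= HIGH_CPU_THROTTLE:
        return True   # load reached the high mark: throttle down
    if throttled and cpu_load <= LOW_CPU_THROTTLE:
        return False  # load returned to the low mark: throttle back up
    return throttled  # between the two marks: keep the current state


def simulate(samples):
    """Apply next_throttle_state to a sequence of CPU load samples."""
    throttled = False
    history = []
    for load in samples:
        throttled = next_throttle_state(throttled, load)
        history.append(throttled)
    return history
```

Using two thresholds rather than one keeps the process from flapping between states when the load hovers around a single cut-off value.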
  • 10. The space reduction process
    File Deduplication and Compression has a policy engine that specifies data for exclusion from processing on Shared Folder or VMware NFS Datastore storage and decides whether to deduplicate specific files based on the policy criteria. Figure 2 depicts typical Shared Folder storage. It contains files with different attributes (for example, last modified date, file size, and type).
    Figure 2 Shared Folder storage contains files with different attributes
    When File Deduplication and Compression is enabled, it periodically scans the primary storage for files that match the policy criteria (shown in Figure 3). The hidden store is dynamically allocated to hold deduplicated data and the hash table used for single instancing.
    Figure 3 Scanning a file against the policy criteria
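The policy check that the scan applies to each file can be sketched as a simple predicate. This is an illustrative sketch only: the `Policy` and `FileInfo` types and the `is_candidate` function are hypothetical names, the defaults are taken from Table 1, and the case-insensitive extension match and prefix-style path match are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

DAY = 86400          # seconds
KB = 1024
TB = 1024 ** 4


@dataclass
class Policy:
    access_days: int = 30            # "Access Time" default
    modification_days: int = 30      # "Modification Time" default
    min_size: int = 24 * KB          # "Minimum Size" default
    max_size: int = 8 * TB           # "Maximum Size" default
    excluded_extensions: set = field(default_factory=set)   # e.g. {".iso"}, lowercase
    excluded_paths: list = field(default_factory=list)      # e.g. ["/scratch"]


@dataclass
class FileInfo:
    path: str
    size: int
    atime: float   # last access, seconds since epoch
    mtime: float   # last modification, seconds since epoch


def is_candidate(f: FileInfo, policy: Policy, now: float) -> bool:
    """Return True if the file passes every policy criterion."""
    if now - f.atime < policy.access_days * DAY:
        return False                      # accessed too recently (hot data)
    if now - f.mtime < policy.modification_days * DAY:
        return False                      # modified too recently
    if f.size < policy.min_size or f.size > policy.max_size:
        return False                      # outside the size window
    if PurePosixPath(f.path).suffix.lower() in policy.excluded_extensions:
        return False                      # extension is on the exclude list
    for excluded in policy.excluded_paths:
        prefix = excluded.rstrip("/") + "/"
        if f.path == excluded or f.path.startswith(prefix):
            return False                  # file lives under an excluded path
    return True
```

Ordering the cheap timestamp and size checks first means most active files are rejected before any string matching is done.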
  • 11. If the file meets the policy criteria, it is copied to the hidden store for compression and it is hashed. The hash is compared with the hashes in the hash table to determine whether the policy engine has already processed another copy of the same data. If the hash is unique, it is stored in the hash table for later comparisons (shown in Figure 4).
    Figure 4 File meets policy criteria and gets processed
    The space that the file data occupied in the user portion of the primary storage is freed and the file’s internal metadata is updated to reference the copy of the data in the hidden store (shown in Figure 5).
    Figure 5 Stub file points to the file data in the hidden store
  • 12. If the data associated with the file has been identified before, the space it occupies is freed and the internal file metadata is updated to reference the instance already in the hidden store (shown in Figure 6).
    Figure 6 A duplicate file is detected
    Note that the VNXe storage system detects non-compressible files and stores them in their original form. However, these files can still benefit from file-level deduplication.
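The single-instancing and compression steps described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the VNXe on-disk design: the `HiddenStore` class, its reference counts, and the use of `zlib` are all hypothetical stand-ins. Only the SHA-1 detection method (Table 1) and the rule that non-compressible data is kept in its original form come from the paper.

```python
import hashlib
import zlib


class HiddenStore:
    """Toy model of a hidden store with a hash table for single instancing."""

    def __init__(self):
        self.blobs = {}      # SHA-1 hex digest -> (is_compressed, stored bytes)
        self.refcount = {}   # SHA-1 hex digest -> number of files sharing the data

    def add(self, data: bytes) -> str:
        """Store file data once and return the digest a stub file would reference."""
        digest = hashlib.sha1(data).hexdigest()
        if digest in self.blobs:
            self.refcount[digest] += 1          # duplicate: share the existing instance
            return digest
        packed = zlib.compress(data)
        if len(packed) < len(data):
            self.blobs[digest] = (True, packed)  # compression saved space
        else:
            self.blobs[digest] = (False, data)   # non-compressible: keep original form
        self.refcount[digest] = 1
        return digest

    def read(self, digest: str) -> bytes:
        """Return the original file data for a stub's digest."""
        is_compressed, payload = self.blobs[digest]
        return zlib.decompress(payload) if is_compressed else payload

    def stored_bytes(self) -> int:
        """Total bytes actually held in the store."""
        return sum(len(payload) for _, payload in self.blobs.values())
```

Adding the same presentation 70 times to this model leaves one stored instance with a reference count of 70, which mirrors the 70-copy example given earlier in the paper.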
  • 13. Management
    You can manage the File Deduplication and Compression feature through the VNXe Unisphere Manager graphical user interface (GUI) or command line interface (CLI).
    Enabling/Disabling Deduplication and Compression
    File Deduplication and Compression can be enabled when provisioning Shared Folder or VMware (NFS Datastore) storage. To enable this feature for Shared Folder storage, select the Enable Deduplication and Compression checkbox in the Configure Shared Folder Attributes wizard pane during storage creation (see Figure 7).
    Figure 7 Enable File Deduplication and Compression for Shared Folder storage
    To enable this feature for VMware NFS Datastore storage, select the Enable Deduplication and Compression checkbox in the Specify File System Type wizard pane during storage creation (see Figure 8).
  • 14. Figure 8 Enabling feature when provisioning storage
    Alternatively, this feature can be enabled after the storage resource has been created by choosing the storage resource and selecting Details > Deduplication tab > Enable (shown in Figure 9).
    Figure 9 Enabling feature on existing storage
  • 15. Enabling this feature prompts an immediate scan of the storage resource. However, if another scan is already in progress on the storage server, the new scan is not initiated. Both the Enable and the Enable in the Suspend mode options allow Windows CIFS clients to make use of the Windows Explorer advanced attribute, "Compress contents to save disk space".
    After you enable a storage resource for deduplication, File Deduplication and Compression scans it periodically and looks for more files to deduplicate. You can view the state of the deduplication process in the storage resource’s Deduplication tab.
    The default File Deduplication and Compression state for a storage resource is “Disabled”. When a storage resource is in the “Disabled” state, it has no deduplicated files and the policy engine does not scan for files to deduplicate.
    The “Enabled” state indicates that File Deduplication and Compression processing is enabled for the storage resource. When a storage resource is in the “Enabled” state, it may contain deduplicated files, and the policy engine scans the storage resource for more files to deduplicate on its next scheduled run.
    The “Suspended” state means that File Deduplication and Compression processing is suspended for the storage resource. When a storage resource is in the “Suspended” state, it may contain deduplicated files. However, the policy engine does not scan for any more files to deduplicate.
    Administrators can switch between deduplication states at any time. Switching the deduplication state from “Enabled” or “Suspended” to “Disabled” prompts the system to reduplicate all deduplicated files in the storage resource (shown in Figure 10). Before accepting the request to set deduplication to “Disabled”, the system checks whether there is sufficient space available in the storage resource to complete this process without filling the storage resource completely.
    Figure 10 Message to reduplicate all deduplicated files
    Users can initiate a deduplication scan manually by selecting the “Force Rescan Now” button.
  • 16. If the space is not sufficient, the system informs the administrator about the amount of additional space that is required to complete the operation, and recommends extending the storage resource.
    Deduplication Settings
    In Unisphere, policy settings are configured at the storage resource level. The Deduplication tab is found in the Details page of the storage resource. There are two policy criteria that are configurable by the administrator:
    • Excluded File Extensions - If certain file extensions need to be excluded from the deduplication process, set up the file extension exclusion list before running deduplication for the first time. If you add exclusions after deduplication has been run on a storage resource, the new exclusions take effect from that point forward and do not affect files that were deduplicated in previous scans. File extensions must include the leading dot.
    • Excluded Paths - If you want to exclude pathnames from the deduplication process, add the pathname exclusion list before running deduplication for the first time. If exclusions are added after deduplication has been initiated on a storage resource, these exclusions take effect from that point forward. Files that were deduplicated in previous scans are not affected. The forward slash (/) is a valid directory delimiter within a single pathname.
    Deduplication Status and Statistics
    As shown in Figure 11, VNXe displays the results of the deduplication process on the data in the storage resource with the following statistics:
    • State - Enabled or disabled.
    • Status – Running, In Progress, or Idle.
    • Last Scan - When the last successful scan of the storage resource was completed.
    • Files deduplicated - Number of files processed by the deduplication policy engine to save space. It also shows the percentage of deduplicated files.
    • Saved – Amount (before and after), and percentage of space saved by deduplication.
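The three deduplication states and the space check performed before disabling, as described in the Management section, can be modeled as a small state transition function. This is a hedged sketch: `DedupState`, `InsufficientSpace`, and `change_state` are hypothetical names, and the rule that reduplication must leave at least one byte free is an assumed reading of "without filling the storage resource completely".

```python
from enum import Enum


class DedupState(Enum):
    DISABLED = "Disabled"
    ENABLED = "Enabled"
    SUSPENDED = "Suspended"


class InsufficientSpace(Exception):
    """Raised when reduplication would fill the storage resource."""

    def __init__(self, additional_bytes_needed: int):
        super().__init__(f"extend storage by at least {additional_bytes_needed} bytes")
        self.additional_bytes_needed = additional_bytes_needed


def change_state(current: DedupState, target: DedupState,
                 free_bytes: int, bytes_to_reduplicate: int) -> DedupState:
    """Return the new state, or raise InsufficientSpace when disabling cannot proceed.

    bytes_to_reduplicate is the extra space needed to restore every
    deduplicated file to its original form (an assumed input).
    """
    disabling = target is DedupState.DISABLED and current is not DedupState.DISABLED
    if disabling and bytes_to_reduplicate >= free_bytes:
        # Assumption: at least one byte must remain free afterwards.
        raise InsufficientSpace(bytes_to_reduplicate - free_bytes + 1)
    return target
```

Transitions between "Enabled" and "Suspended" need no space check in this model because neither one reduplicates existing files.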
  • 17. Figure 11 Status of Deduplication Scan
    When a storage resource enabled for deduplication is scanned for the first time, the statistics shown are reported in real time. After the first scan, statistics are reported as static values based on the last successful scan.
    Client input/output to space-reduced files
    The File Deduplication and Compression feature does not affect the client input/output (I/O) to files that have not been deduplicated, and it does not introduce any additional overhead for access to files that it has not processed. The default policy is designed to filter out files that have frequent I/O access and thus avoid adversely affecting the time required to access those files.
    Read access
    Read access to deduplicated files is satisfied by decompressing the data in memory and passing it back to the client. VNXe does not decompress or alter any data on disk in response to client-read activity. In addition, random reads require decompression of only the requested portion of the file data, not of the entire file.
    Reading a deduplicated file can take longer than reading a file that is not deduplicated, because of the decompression activity. However, the opposite may also be true. Reading a deduplicated file is sometimes faster than reading a file that is not deduplicated, because less data needs to be read from the disk, which more than offsets the increased CPU activity associated with decompressing the data.
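One common way to make random reads decompress only the requested portion is to compress the file in independent fixed-size chunks. The paper does not say how VNXe lays out compressed data, so the following is purely an illustrative sketch: the 8 KB chunk size, the function names, and the use of `zlib` are all assumptions.

```python
import zlib

CHUNK_SIZE = 8192  # hypothetical chunk size, not a documented VNXe value


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Compress each fixed-size chunk independently so it can be read on its own."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def read_range(chunks: list, offset: int, length: int,
               chunk_size: int = CHUNK_SIZE):
    """Return (bytes, chunks_decompressed) for a read of `length` bytes at `offset`.

    Only the chunks that overlap the requested range are decompressed,
    and the decompression happens in memory.
    """
    if length <= 0:
        return b"", 0
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    touched = chunks[first:last + 1]
    buffer = b"".join(zlib.decompress(c) for c in touched)
    start = offset - first * chunk_size
    return buffer[start:start + length], len(touched)
```

In this model a 100-byte read from the middle of a large file touches a single chunk, which is why less data may need to come off the disk than for the uncompressed file.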
Write access

A client request to write to or modify a deduplicated file causes the requested portion of the file to be reduplicated (decompressed) in the storage resource. A copy of the requested portion of the file data is reduplicated for this particular instance, while the deduplicated data is preserved for the remaining references to the file. The following three factors mitigate this effect:

• Most applications do not modify files in place. They typically make a local copy, modify it, and when finished, write the entire new file back to the file server, discarding the old copy in the process. Therefore, reduplication on the file server does not occur; the file is simply replaced.
• VNXe avoids processing active files (accessed or modified recently) based on policy definitions. Hence, deduplicated files are less likely to be modified and, if they are, performance is less likely to be a critical factor.
• When a client writes to a deduplicated file, the SP writes only the individual blocks that have changed. The entire file is not decompressed and reduplicated on disk until the number of individual changed blocks plus the number of blocks in the corresponding deduplicated file is larger than the logical file size.

Interoperability with other VNXe features

Network and VNXe NDMP PAX (tar and dump)

When backing up over the network through CIFS or NFS, space-reduced files are filtered for their compressibility. This determines whether the space-reduced files are backed up in their compressed format or in their original form, so that restore time is not affected for files that provide minimal savings. Space-reduced files are not reduplicated to their original size for transfer to the backup application, and they are not reduplicated on disk.
The space-saving storage efficiency benefits realized in the VNXe deduplicated production storage do not flow through to backups when using network-based backups.

When backing up deduplication-enabled storage with NDMP PAX, the compressed version of the file is backed up and all instances of the deduplicated file appear in the backup stream. In most cases the majority of space savings come from compression, which means that the majority of the space savings made in the storage resource apply to the backup as well.

In NDMP PAX backups, a threshold is used to check the compressibility of each space-reduced file before sending the file data compressed. There are cases when it is better to back up the file uncompressed, especially when there is sparsely written data associated with the file. By default, the deduplicated size of the file must be 90 percent or less of the original size of the file at the time it was deduplicated. This ensures that compressible files are backed up in their compressed format, and non-compressible files are backed up in their un-deduplicated format. (The small amount of space that would be saved on tape does not justify the extra time it would take to restore the files from backup.)
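The default compressibility check described above amounts to a single comparison. The sketch below assumes sizes are given in bytes; the function and parameter names are hypothetical and are not part of any VNXe or NDMP interface.

```python
def backup_compressed(deduplicated_size, original_size, threshold_percent=90):
    """Return True if the file should be sent to backup in compressed form.

    The file qualifies when its deduplicated size is threshold_percent
    (90 by default) or less of its original size at deduplication time.
    """
    if original_size <= 0:
        return False
    # Integer arithmetic avoids rounding errors at the boundary.
    return deduplicated_size * 100 <= threshold_percent * original_size
```

For example, a 1,000-byte file that deduplicates to 900 bytes meets the default threshold and is backed up compressed, while one that deduplicates to 901 bytes is backed up in its un-deduplicated format.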
When restoring files from a PAX-based NDMP backup, the files are restored in their compressed format. This also provides right-sizing capabilities, because an administrator can back up the deduplicated files from one storage resource and restore them onto a smaller storage resource. This feature is independent of the backup application.

Point-in-time views of the storage

The deduplication process releases space in the production storage immediately. However, blocks may be copied to the protection storage in the process. Deduplicating data associated with a file involves copying the data within the storage resource to the hidden and reduced data store, so that file-level deduplication and compression can be applied to it. Because snapshots use a "copy on first modify" principle, the blocks that get deduplicated are copied to the protection storage first. This is done to preserve the previous snapshot's point-in-time view of the storage resource. These blocks are freed when the corresponding snapshot is deleted or refreshed, and are then available for reuse by other snapshots.

It is difficult to predict the number of blocks that will be copied to the protection storage during deduplication, because this depends on how full the primary storage is and how quickly it is changing. By default, VNXe is configured to abort deduplication operations on a storage resource before they cause the protection storage to extend. This prevents the protection storage from expanding due to deduplication activity. If the deduplication process is aborted in this way, an alert is generated that explains what happened. The VNXe administrator can choose to extend the protection storage or simply let the deduplication process execute again on its next scheduled run.

Replication

Deduplicating the contents of a storage resource before you replicate it can greatly reduce the amount of data that is sent over the network as part of the initial baseline copy process.
The amount of data transferred over the network depends on the timing of replication updates relative to deduplication runs. In all but the most extreme circumstances, replication updates are more frequent than deduplication scans of a storage resource. This means that new and changed data in the storage resource is usually replicated in its non-deduplicated form first. Furthermore, any subsequent deduplication of that data prompts additional replication traffic, because of the block changes made to the hidden data store of the storage resource. This is true of any deduplication solution that processes replicated data and updates remote replicas of the data, because the replication updates occur more frequently than deduplication scans. The space savings realized by the production storage resource are reflected on the destination storage resource.
VNXe File-Level Retention

You can enable File Deduplication and Compression on enterprise File-Level Retention (FLR-E) storage without compromising the protection offered to the data that the storage resources contain.

File Deduplication and Compression in a VMware Environment

File Deduplication and Compression can be used in a VMware environment to improve the storage efficiency of virtual machines (VMs). Active VMs are constantly changing state and are typically bypassed by VNXe's deduplication policy engine. VMs that are selected for storage optimization are compressed using the same compression algorithm that is used in File Deduplication and Compression. The algorithm compresses the VMDK file (virtual disk) and leaves the other files that make up the VM intact.

When a compressed VMDK is read, only the data blocks containing the requested data are decompressed. Likewise, when data is written to the VMDK, instead of being compressed "on the fly", the data is written to a set-aside buffer, which is then compressed as a background task. In most situations, the amount of data read or written is a small percentage of the total amount of data stored.

Deployment considerations

Consider the following when deploying File Deduplication and Compression:

• File Deduplication and Compression does not deduplicate data across or between storage resources.
• With NDMP PAX, backing up deduplication-enabled storage shortens the backup window. However, the required restore window increases.
• File Deduplication and Compression does not process or affect alternate data streams (also known as named attributes) associated with files and directories in the storage resource.
• File Deduplication and Compression does not process files less than 24 KB in size. The overhead associated with processing such files negates any space savings achieved.
• There is no limit to the size of a storage resource on which File Deduplication and Compression can be enabled.
However, the storage resource must have at least 1 MB of free space before deduplication can be enabled. If there is not enough free space, an error message is generated and the server log is updated.
• You can enable File Deduplication and Compression on any number of storage resources.
• You can enable File Deduplication and Compression on existing Shared-Folder and VMware NFS-datastore storage. This feature cannot be applied to iSCSI storage hosted on the VNXe system.
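Several of the deployment considerations above are simple eligibility checks. The sketch below restates them as code for illustration only: the constant names, the storage-type labels, and the assumption that 1 KB is 1,024 bytes are not taken from VNXe.

```python
MIN_FILE_SIZE = 24 * 1024      # files smaller than 24 KB are not processed
MIN_FREE_SPACE = 1024 * 1024   # 1 MB of free space is required to enable

# Hypothetical labels for the storage types that support the feature.
SUPPORTED_STORAGE_TYPES = {"shared_folder", "vmware_nfs_datastore"}


def can_enable_deduplication(storage_type, free_bytes):
    """Return True if the feature could be enabled on a storage resource."""
    if storage_type not in SUPPORTED_STORAGE_TYPES:
        return False  # for example, iSCSI storage is not supported
    return free_bytes >= MIN_FREE_SPACE


def is_file_size_eligible(size_bytes):
    """Return True if a file is large enough to be processed."""
    return size_bytes >= MIN_FILE_SIZE
```

Note that the sketch places no upper bound on the size of the storage resource, consistent with the considerations listed above.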
Conclusion

The File Deduplication and Compression feature adds to VNXe's storage efficiency capabilities by intelligently reducing space usage. You can optimize the use of the existing storage environment for storage resource data through file-level deduplication and compression. A combination of file-level deduplication and compression represents the best technique to provide maximum benefit for the resources consumed. This feature builds on VNXe's ease of use by providing a single-click option in Unisphere to enable deduplication for each storage resource. File Deduplication and Compression also supports almost all Unisphere features and works in accordance with maximum-supported VNXe limits.

References

The following white papers can be found on the VNXe page on Powerlink:

• EMC VNXe Series Storage Systems – A Detailed Review
• EMC VNXe Data Protection – Overview