Open Source Data Deduplication




                        Nick Webb
nickw@redwireservices.com www.redwireservices.com @RedWireServices
                          (206) 829-8621
                          Last updated 8/10/2011
Introduction
●   What is Deduplication? Different kinds?
●   Why do you want it?
●   How does it work?
●   Advantages / Drawbacks
●   Commercial Implementations
●   Open Source implementations, performance,
    reliability, and stability of each
What is Data Deduplication
Wikipedia:

. . . data deduplication is a specialized data compression
technique for eliminating coarse-grained redundant data,
typically to improve storage utilization. In the deduplication
process, duplicate data is deleted, leaving only one copy of
the data to be stored, along with references to the unique
copy of data. Deduplication is able to reduce the required
storage capacity since only the unique data is stored.

Depending on the type of deduplication, redundant files may
be reduced, or even portions of files or other data that are
similar can also be removed . . .
Why Dedupe?
●   Save disk space and money (fewer disks)
●   Fewer disks = less power, cooling, and space
●   Improve write performance (of duplicate data)
●   Be efficient – don’t re-copy or store previously
    stored data
Where does it Work Well?
●   Secondary Storage
    ●   Backups/Archives
    ●   Online backups with limited
        bandwidth/replication
    ●   Save disk space – additional
        full backups take little space
●   Virtual Machines (Primary &
    Secondary)
●   File Shares
Not a Fit
●   Random data
    ●   Video
    ●   Pictures
    ●   Music
    ●   Encrypted files
         –   many vendors dedupe, then encrypt
Types
●   Source / Target
●   Global
●   Fixed/Sliding Block
●   File Based (SIS)
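The difference between fixed and sliding blocks can be sketched in a few lines of Python. This is an illustration only, not how any product listed in this deck implements it: the function names, the 16-byte window, the CRC32 checksum, and the boundary mask are all assumptions chosen to keep the example short. It inserts one byte at the front of a random buffer and counts how many blocks are still identical afterwards.

```python
import random
import zlib

def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Cut data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def sliding_chunks(data: bytes, window: int = 16, mask: int = 0x3FF) -> list[bytes]:
    """Cut wherever a checksum of the last `window` bytes matches a pattern.

    Boundaries depend only on nearby content, so they move with the data
    when bytes are inserted earlier in the stream.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks

def shared(a: list[bytes], b: list[bytes]) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))

rng = random.Random(0)  # fixed seed so the run is repeatable
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"X" + original  # one byte inserted at the front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(shifted))
sliding_shared = shared(sliding_chunks(original), sliding_chunks(shifted))
print("fixed blocks still shared:  ", fixed_shared)
print("sliding blocks still shared:", sliding_shared)
```

With fixed blocks, the one-byte insert shifts every block boundary, so nothing after it matches. With content-defined boundaries, the chunks realign right after the insert and most of them dedupe again.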
Drawbacks
●   Slow writes, slower reads
●   High CPU/memory utilization (dedicated server
    is a must)
●   Increases data loss risk / corruption
    ●   Collision risk of 1.3x10^-49% chance per PB
    ●   (256 bit hash & 8KB Blocks)
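For a feel of how small the collision risk is, here is a rough birthday-bound estimate in Python. It is a sketch under stated assumptions (a uniformly distributed hash, 1 PB taken as 2^50 bytes, and the approximation p ≈ n(n-1)/2 / 2^bits); the function name is made up for this example, and the result depends on those assumptions, so it is not expected to reproduce the figure above digit for digit.

```python
def collision_probability(stored_bytes: int, block_size: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~ n*(n-1)/2 / 2**hash_bits."""
    n = stored_bytes // block_size   # number of blocks hashed
    pairs = n * (n - 1) // 2         # pairs of blocks that could collide
    return pairs / 2 ** hash_bits    # big-int division, result is a float

PB = 2 ** 50  # assumption: binary petabyte
p = collision_probability(PB, 8 * 1024, 256)
print(f"blocks: {PB // (8 * 1024):,}")
print(f"approx. collision probability: {p:.2e}")
```

Either way the order of magnitude is far below the failure rate of the disks themselves.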
How Does it Work?
Without Dedupe
With Dedupe
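A minimal sketch of the idea in Python, assuming fixed 8KB blocks and SHA-256 as the block hash. The `DedupeStore` class and its method names are hypothetical and exist only for this example; real implementations add persistence, compression, and much more.

```python
import hashlib

class DedupeStore:
    """Toy fixed-block dedupe store: unique blocks plus per-file recipes."""

    def __init__(self, block_size: int = 8 * 1024):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}      # hash -> unique block data
        self.files: dict[str, list[str]] = {}   # file name -> ordered block hashes

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if unseen
            recipe.append(digest)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        """Rebuild a file by following its list of block references."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

store = DedupeStore()
full_backup = b"A" * 8192 + b"B" * 8192 + b"A" * 8192  # two identical "A" blocks
store.write("monday.bak", full_backup)
store.write("tuesday.bak", full_backup)                # second full backup

print("logical bytes: ", 2 * len(full_backup))
print("physical bytes:", store.physical_bytes())
```

The second full backup adds only references, no new blocks, which is why additional fulls take so little space.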
Block Reclamation
 ●   In general, blocks are not
     removed/freed when a file is
     removed
 ●   We must periodically check blocks
     for references, a block with no
     reference can be deleted, freeing
     allocated space
 ●   Process can be expensive,
     scheduled during off-peak
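The reclamation pass described above can be sketched as a simple sweep in Python. This is an assumption-laden toy (in-memory dictionaries, SHA-256 block hashes, hypothetical `write`, `delete`, and `reclaim` helpers), not the algorithm of any specific product.

```python
import hashlib

blocks: dict[str, bytes] = {}     # hash -> block data
files: dict[str, list[str]] = {}  # file name -> block hashes

def write(name: str, chunks: list[bytes]) -> None:
    hashes = []
    for chunk in chunks:
        h = hashlib.sha256(chunk).hexdigest()
        blocks.setdefault(h, chunk)  # keep one copy per unique block
        hashes.append(h)
    files[name] = hashes

def delete(name: str) -> None:
    """Removing a file drops only its recipe; blocks stay allocated."""
    del files[name]

def reclaim() -> int:
    """Sweep: free every block no remaining file references. Returns bytes freed."""
    referenced = {h for recipe in files.values() for h in recipe}
    freed = 0
    for h in list(blocks):
        if h not in referenced:
            freed += len(blocks.pop(h))
    return freed

write("a", [b"shared", b"only-in-a"])
write("b", [b"shared", b"only-in-b"])
delete("a")
blocks_before = len(blocks)
freed = reclaim()
print(blocks_before, "blocks before sweep,", len(blocks), "after;", freed, "bytes freed")
```

The sweep has to visit every file recipe and every block, which is why it is expensive on a large store and usually scheduled off-peak.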
Commercial Implementations
●   Just about every backup vendor
    ●   Symantec, CommVault
    ●   Cloud: Asigra, Barracuda, Dropbox (global), JungleDisk,
        Mozy
●   NAS/SAN/Backup Targets
    ●   NEC HydraStor
    ●   EMC Data Domain / EMC Avamar
    ●   Quantum
    ●   NetApp
Open Source Implementations
●   Fuse Based
    ●   Lessfs
    ●   SDFS (OpenDedupe)
●   Others
    ●   ZFS
    ●   btrfs (? Off-line only)
●   Limited (file based / SIS)
    ●   BackupPC (reliable!)
    ●   Rdiff-backup
How Good is it?
●   Many see 10-20x deduplication, meaning 10-20
    times more logical object storage than physical
●   Especially true in backup or virtual
    environments
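The relationship between an "Nx" dedupe ratio and a percent reduction is simple arithmetic; a small Python sketch makes it explicit. The function names are made up for this example.

```python
def dedupe_ratio(logical_bytes: float, physical_bytes: float) -> float:
    """Logical data stored per unit of physical storage used."""
    return logical_bytes / physical_bytes

def percent_reduction(ratio: float) -> float:
    """Share of physical storage saved for a given dedupe ratio."""
    return (1 - 1 / ratio) * 100

def ratio_from_reduction(percent: float) -> float:
    """Inverse: turn a percent reduction back into an Nx ratio."""
    return 1 / (1 - percent / 100)

for r in (10, 20):
    print(f"{r}x dedupe -> {percent_reduction(r):.1f}% less physical storage")
```

Note how the curve flattens: going from 10x to 20x only moves the saving from 90% to 95%.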
SDFS / OpenDedupe
                 www.opendedup.org
●   Java 7 Based / platform agnostic
●   Uses fuse
●   S3 storage support
●   Snapshots
●   Inline or batch mode deduplication
●   Supposedly fast (290MBps+ on great H/W)
●   Support for global/clustered dedupe
●   Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install & Go

Install Java
# rpm -Uvh SDFS-1.0.7-2.x86_64.rpm
# sudo mkfs.sdfs --volume-name=sdfs_128k \
     --io-max-file-write-buffers=32 \
     --volume-capacity=550GB \
     --io-chunk-size=128 \
     --chunk-store-data-location=/mnt/data
# sudo modprobe fuse
# sudo mount.sdfs -v sdfs_128k -m \
     /mnt/dedupe
SDFS
●   Pro
    ●   Works when configured properly
    ●   Appears to be multithreaded
●   Con
    ●   Slow / resource intensive (CPU/Memory)
    ●   Fragile, easy to mess up options, leading to crashes, little
        user feedback
    ●   Standard POSIX utilities do not show accurate data (e.g. df,
        must use getfattr -d <mount point>, and calculate bytes →
        GB/TB and % free yourself)
    ●   Slow with 4k blocks, which is the block size recommended for VMs
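Since the byte counts have to be converted by hand, a small Python helper saves some squinting. This sketch takes raw byte counts as plain numbers; it does not parse getfattr output, and the attribute names SDFS reports are not assumed here. The function names and the example figures are hypothetical.

```python
def human_size(num_bytes: float) -> str:
    """Format a byte count using binary units (1 KB = 1024 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

def percent_free(capacity_bytes: int, used_bytes: int) -> float:
    """Percentage of the volume capacity that is still unused."""
    return (capacity_bytes - used_bytes) / capacity_bytes * 100

# Hypothetical numbers: a 550GB volume with a quarter of it used.
capacity = 550 * 1024 ** 3
used = capacity // 4
print("capacity:", human_size(capacity))
print("used:    ", human_size(used))
print(f"free:     {percent_free(capacity, used):.1f}%")
```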
LessFS
                          www.lessfs.com

●   Written in C = Less CPU Overhead
●   Have to build yourself (configure && make && make install)
●   Has replication, encryption
●   Uses fuse
LessFS Install
wget http://...lessfs-1.4.2.tar.gz
tar zxvf *.tar.gz
wget http://...db-4.8.30.tar.gz
yum install buildstuff…
. . .
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
LessFS Go
sudo vi /etc/lessfs.cfg
BLOCKDATA_PATH=/mnt/data/dta/blockdata.dta
META_PATH=/mnt/meta/mta
BLKSIZE=4096 # only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off


mklessfs -c /etc/lessfs.cfg
lessfs /etc/lessfs.cfg /mnt/dedupe
LessFS
●   Pro
    ●   Does inline compression by default as well
    ●   Reasonable VM compression with 128k blocks
●   Con
    ●   Fragile
    ●   Stats/FS info hard to see (per file accounting, no totals)
    ●   Kernel >= 2.6.26 required for blocks > 4k (RHEL6 only)
    ●   Running with 4k blocks is not really feasible
LessFS
Other OSS
●   ZFS?
    ●   Tried it, and empirically it was a drag, but I have no
        hard data (got like 3x dedupe with identical full
        backups of VMs)
    ●   At least it’s stable…
Kick the Tires
●   Test data set; ~330GB of data
    ●   22GB of documents, pictures, music
    ●   Virtual Machines
        –   220GB Windows 2003 Server with SQL Data
        –   2003 AD DC ~60GB
        –   2003 Server ~8GB
        –   Two OpenSolaris VMs, 1.5 & 2.7GB
        –   3GB Windows 2000 VM
        –   15GB XP Pro VM
Kick the Tires
●   Test Environment
    ●   AWS High CPU Extra Large Instance
    ●   ~7GB of RAM
    ●   ~Eight Cores ~2.5GHz each
    ●   ext4
Compression Performance
●   First round (all “unique” data)
●   If another copy was put in (like another full), we should expect
    100% reduction for that non-unique data (1x dedupe per run)
      FS                 Home Data   % Home Reduction   VM Data   % VM Reduction   Combined   % Total Reduction   MBps
      SDFS 4k            21GB        4.50%              109GB     64%              128GB      61%                 16
      lessfs 4k (est.)   24GB        -9%                N/A       51%              N/A        50%                 4
      SDFS 128k          21GB        4.50%              255GB     16%              276GB      15%                 40
      lessfs 128k        21GB        4.50%              130GB     57%              183GB      44%                 24
      tar/gz --fast      21GB        4.50%              178GB     41%              199GB      39%                 35
Write Performance
                      (don't trust this)
[Bar chart: write throughput in MBps (axis 0 to 40) for raw, SDFS 4k, lessfs 4k, SDFS 128k, lessfs 128k, tar/gz --fast]
Kick the Tires: Part 2
●   Test data set – two ~204GB full backup
    archives from a popular commercial vendor
●   Test Environment
    ●   VirtualBox VM, 2GB RAM, 2 Cores, 2x7200RPM
        SATA drives (meta & data separated for LessFS)
    ●   Physical CPU: Quad Core Xeon
Write Performance
[Bar chart: write throughput in MBps (axis 0 to 40) for raw, SDFS 128k W, SDFS 128k Re-W, LessFS 128k W, LessFS 128k Re-W]
Load
(SDFS 128k)
Open Source Dedupe
●   Pro
    ●   Free
    ●   Can be stable, if well managed
●   Con
    ●   Not in repos yet
    ●   Efforts behind them seem very limited, 1 dev each
    ●   No/Poor documentation
The Future
●   Eventual Commodity?
●   btrfs
    ●   Dedupe planned (off-line only)
Conclusion/Recommendations
●   Dedupe is great, if it works and it meets your
    performance and storage requirements
●   OSS Dedupe has a way to go
●   SDFS/OpenDedupe is best OSS option right
    now
●   JungleDisk is good and cheap, but not OSS
About Red Wire Services
If you found this presentation helpful, consider
Red Wire Services for your next
Backup, Archive, or IT Disaster Recovery
Planning project.
Learn more at www.RedWireServices.com
About Nick Webb
Nick Webb is the founder of Red Wire Services, in
Seattle, WA. Nick is available to speak on a variety of IT
Disaster Recovery related topics, including:
●   Preserving Your Digital Legacy
●   Getting Started with your Small Business Disaster
    Recovery Plan
●   Archive Storage for SMBs
If interested in having Nick speak to your group, please
call (206) 829-8621 or email info@redwireservices.com

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHanborq Inc.
 
CDW: SAN vs. NAS
CDW: SAN vs. NASCDW: SAN vs. NAS
CDW: SAN vs. NASSpiceworks
 
IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageTony Pearson
 
Network Attached Storage (NAS)
Network Attached Storage (NAS)Network Attached Storage (NAS)
Network Attached Storage (NAS)sandeepgodfather
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Programinside-BigData.com
 
Ibm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ashIbm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ashAshutosh Mate
 
Walk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCWalk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCCeph Community
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSandeep Patil
 
Quantum NDX - NAS Based Data Protection
Quantum NDX - NAS Based Data Protection Quantum NDX - NAS Based Data Protection
Quantum NDX - NAS Based Data Protection Quantum
 
IBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking FlowIBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking FlowSandeep Patil
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfsRami Jebara
 
Continuity Software 4.3 Detailed Gaps
Continuity Software 4.3 Detailed GapsContinuity Software 4.3 Detailed Gaps
Continuity Software 4.3 Detailed GapsGilHecht
 
958 and 959 sales exam prep
958 and 959 sales exam prep958 and 959 sales exam prep
958 and 959 sales exam prepJason Wong
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security Sandeep Patil
 
Maginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical HighlightsMaginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical HighlightsMaginatics
 

Was ist angesagt? (18)

Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
 
CDW: SAN vs. NAS
CDW: SAN vs. NASCDW: SAN vs. NAS
CDW: SAN vs. NAS
 
IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object Storage
 
Network Attached Storage (NAS)
Network Attached Storage (NAS)Network Attached Storage (NAS)
Network Attached Storage (NAS)
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 Meetup
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
 
Ibm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ashIbm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ash
 
Walk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCWalk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoC
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN Caching
 
Quantum NDX - NAS Based Data Protection
Quantum NDX - NAS Based Data Protection Quantum NDX - NAS Based Data Protection
Quantum NDX - NAS Based Data Protection
 
IBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking FlowIBM Spectrum Scale Networking Flow
IBM Spectrum Scale Networking Flow
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfs
 
Continuity Software 4.3 Detailed Gaps
Continuity Software 4.3 Detailed GapsContinuity Software 4.3 Detailed Gaps
Continuity Software 4.3 Detailed Gaps
 
958 and 959 sales exam prep
958 and 959 sales exam prep958 and 959 sales exam prep
958 and 959 sales exam prep
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security
 
Maginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical HighlightsMaginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
Maginatics Cloud Storage Platform - MCSP 3.0 Technical Highlights
 

Andere mochten auch

[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage
[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage
[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud StoragePasquale Puzio
 
Netapp Deduplication concepts
Netapp Deduplication conceptsNetapp Deduplication concepts
Netapp Deduplication conceptsSaroj Sahu
 
Session 03 acquiring data
Session 03 acquiring dataSession 03 acquiring data
Session 03 acquiring dataSara-Jayne Terp
 
Accurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your BookingsAccurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your BookingsTavisca Solutions
 
Creative portfolio tavisca 2014
Creative portfolio tavisca 2014Creative portfolio tavisca 2014
Creative portfolio tavisca 2014Tavisca Solutions
 
Presentation deduplication backup software and system
Presentation   deduplication backup software and systemPresentation   deduplication backup software and system
Presentation deduplication backup software and systemxKinAnx
 
Secure auditing and deduplicating data in cloud
Secure auditing and deduplicating data in cloudSecure auditing and deduplicating data in cloud
Secure auditing and deduplicating data in cloudnexgentech15
 
Accurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your BookingsAccurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your BookingsTavisca Solutions
 
A Hybrid Cloud Approach for Secure Authorized Deduplication
A Hybrid Cloud Approach for Secure Authorized DeduplicationA Hybrid Cloud Approach for Secure Authorized Deduplication
A Hybrid Cloud Approach for Secure Authorized DeduplicationSWAMI06
 
Linking data without common identifiers
Linking data without common identifiersLinking data without common identifiers
Linking data without common identifiersLars Marius Garshol
 
Source of Data in Research
Source of Data in ResearchSource of Data in Research
Source of Data in ResearchManu K M
 
EMC Deduplication Fundamentals
EMC Deduplication FundamentalsEMC Deduplication Fundamentals
EMC Deduplication Fundamentalsemcbaltics
 
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, StubhubDeduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, StubhubLucidworks
 

Andere mochten auch (16)

Deduplication
DeduplicationDeduplication
Deduplication
 
[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage
[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage
[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage
 
Netapp Deduplication concepts
Netapp Deduplication conceptsNetapp Deduplication concepts
Netapp Deduplication concepts
 
Session 03 acquiring data
Session 03 acquiring dataSession 03 acquiring data
Session 03 acquiring data
 
Avamar presales 1.0
Avamar presales 1.0Avamar presales 1.0
Avamar presales 1.0
 
Accurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your BookingsAccurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your Bookings
 
Creative portfolio tavisca 2014
Creative portfolio tavisca 2014Creative portfolio tavisca 2014
Creative portfolio tavisca 2014
 
Presentation deduplication backup software and system
Presentation   deduplication backup software and systemPresentation   deduplication backup software and system
Presentation deduplication backup software and system
 
Hotel map process
Hotel map process   Hotel map process
Hotel map process
 
Secure auditing and deduplicating data in cloud
Secure auditing and deduplicating data in cloudSecure auditing and deduplicating data in cloud
Secure auditing and deduplicating data in cloud
 
Accurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your BookingsAccurate Hotel Mapping to Increase your Bookings
Accurate Hotel Mapping to Increase your Bookings
 
A Hybrid Cloud Approach for Secure Authorized Deduplication
A Hybrid Cloud Approach for Secure Authorized DeduplicationA Hybrid Cloud Approach for Secure Authorized Deduplication
A Hybrid Cloud Approach for Secure Authorized Deduplication
 
Linking data without common identifiers
Linking data without common identifiersLinking data without common identifiers
Linking data without common identifiers
 
Source of Data in Research
Source of Data in ResearchSource of Data in Research
Source of Data in Research
 
EMC Deduplication Fundamentals
EMC Deduplication FundamentalsEMC Deduplication Fundamentals
EMC Deduplication Fundamentals
 
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, StubhubDeduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub
 

Ähnlich wie Open Source Data Deduplication

VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storageAngelo Rajadurai
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...DataStax Academy
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databasesAngelo Rajadurai
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningFromDual GmbH
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems confluent
 
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...JAXLondon2014
 
SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonUri Cohen
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Community
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
Exploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient WorkflowsExploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient Workflowsjasonajohnson
 
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...Nagios
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016Tomas Vondra
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems Baruch Osoveskiy
 

Ähnlich wie Open Source Data Deduplication (20)

VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storage
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databases
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems
 
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
 
SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax London
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Exploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient WorkflowsExploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient Workflows
 
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...
 
Five steps perform_2009 (1)
Five steps perform_2009 (1)Five steps perform_2009 (1)
Five steps perform_2009 (1)
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 

Kürzlich hochgeladen

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Open Source Data Deduplication

  • 1. Open Source Data Deduplication Nick Webb nickw@redwireservices.com www.redwireservices.com @RedWireServices (206) 829-8621 Last updated 8/10/2011
  • 2. Introduction ● What is Deduplication? Different kinds? ● Why do you want it? ● How does it work? ● Advantages / Drawbacks ● Commercial Implementations ● Open Source implementations, performance, reliability, and stability of each
  • 3. What is Data Deduplication Wikipedia: . . . data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored, along with references to the unique copy of data. Deduplication is able to reduce the required storage capacity since only the unique data is stored. Depending on the type of deduplication, redundant files may be reduced, or even portions of files or other data that are similar can also be removed . . .
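The definition above (store one copy, keep references to it) can be sketched in a few lines. This is a minimal illustration assuming fixed-size 8 KB blocks and SHA-256 fingerprints; the names `dedupe_store` and `restore` are hypothetical and do not come from any tool discussed in this deck.

```python
import hashlib


def dedupe_store(data: bytes, block_size: int = 8192):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}   # fingerprint -> block bytes (unique blocks only)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store the block only the first time it is seen
        recipe.append(digest)            # always record a reference
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by following the references in order."""
    return b"".join(store[digest] for digest in recipe)


# Three identical blocks plus one different block: four references, two stored blocks.
data = b"A" * 8192 * 3 + b"B" * 8192
store, recipe = dedupe_store(data)
```

Real implementations also persist the fingerprint index and handle concurrent writers, which this sketch leaves out.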
  • 4. Why Dedupe? ● Save disk space and money (fewer disks) ● Fewer disks = less power, cooling, and space ● Improve write performance (of duplicate data) ● Be efficient – don’t re-copy or store previously stored data
  • 5. Where does it Work Well? ● Secondary Storage ● Backups/Archives ● Online backups with limited bandwidth/replication ● Save disk space – additional full backups take little space ● Virtual Machines (Primary & Secondary) ● File Shares
  • 6. Not a Fit ● Random data ● Video ● Pictures ● Music ● Encrypted files – many vendors dedupe, then encrypt
  • 7. Types ● Source / Target ● Global ● Fixed/Sliding Block ● File Based (SIS)
  • 8. Drawbacks ● Slow writes, slower reads ● High CPU/memory utilization (dedicated server is a must) ● Increases data loss risk / corruption ● Collision risk of 1.3x10^-49% chance per PB ● (256 bit hash & 8KB Blocks)
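For a feel of how small the hash collision risk is, the usual birthday-bound approximation can be computed directly. This is a sketch under stated assumptions (uniformly distributed hashes, 1 PB taken as 2^50 bytes, 8 KB blocks, 256-bit hash); the slide does not say how its own figure was derived, so the result need not match it. The function name is hypothetical.

```python
def collision_probability(total_bytes: int, block_bytes: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= n * (n - 1) / 2 / 2**hash_bits for n blocks."""
    n = total_bytes // block_bytes
    return n * (n - 1) / 2 / 2 ** hash_bits


# Assumed inputs: 1 PB = 2**50 bytes, 8 KB = 2**13 bytes, 256-bit hash.
p = collision_probability(2 ** 50, 2 ** 13, 256)
print(f"approximate collision probability per PB: {p:.2e}")
```

Under these assumptions the probability is vanishingly small compared with ordinary hardware failure rates.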
  • 9. How Does it Work?
  • 12. Block Reclamation ● In general, blocks are not removed/freed when a file is removed ● We must periodically check blocks for references, a block with no reference can be deleted, freeing allocated space ● Process can be expensive, scheduled during off-peak
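The reclamation pass described above can be sketched as a simple mark-and-sweep over the block store. This assumes an in-memory store keyed by fingerprint and a per-file list of fingerprints; both structures and the name `reclaim_unreferenced` are hypothetical, not taken from SDFS or LessFS.

```python
def reclaim_unreferenced(store: dict, recipes: dict) -> int:
    """Delete blocks that no remaining file references; return the bytes freed."""
    # Mark: collect every fingerprint still referenced by some file.
    referenced = {digest for recipe in recipes.values() for digest in recipe}
    # Sweep: drop blocks that nothing points to any more.
    freed = 0
    for digest in list(store):
        if digest not in referenced:
            freed += len(store.pop(digest))
    return freed


store = {"h1": b"x" * 4096, "h2": b"y" * 4096, "h3": b"z" * 4096}
recipes = {"a.img": ["h1", "h2"], "b.img": ["h2", "h3"]}

del recipes["b.img"]  # removing the file alone does not free any blocks
freed = reclaim_unreferenced(store, recipes)
```

The sweep touches every block in the store, which is why the pass is usually scheduled off-peak.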
  • 13. Commercial Implementations ● Just about every backup vendor ● Symantec, CommVault ● Cloud: Asigra, Barracuda, Dropbox (global), JungleDisk, Mozy ● NAS/SAN/Backup Targets ● NEC HydraStor ● DataDomain/EMC Avamar ● Quantum ● NetApp
  • 14. Open Source Implementations ● Fuse Based ● Lessfs ● SDFS (OpenDedupe) ● Others ● ZFS ● btrfs (? Off-line only) ● Limited (file based / SIS) ● BackupPC (reliable!) ● Rdiff-backup
  • 15. How Good is it? ● Many see 10-20x deduplication, meaning 10-20 times more logical object storage than physical ● Especially true in backup or virtual environments
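A dedupe ratio and a percentage reduction are two views of the same measurement. The short sketch below converts between them; the function names are hypothetical.

```python
def dedupe_ratio(logical_bytes: float, physical_bytes: float) -> float:
    """Ratio of logical (pre-dedupe) data to physical (stored) data."""
    return logical_bytes / physical_bytes


def space_reduction(ratio: float) -> float:
    """Fraction of space saved for a given dedupe ratio (10x -> 0.90)."""
    return 1.0 - 1.0 / ratio


for ratio in (10, 20):
    print(f"{ratio}x dedupe = {space_reduction(ratio):.0%} less physical storage")
```

The curve flattens quickly: going from 10x to 20x saves only another five percentage points of space.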
  • 16. SDFS / OpenDedupe www.opendedup.org ● Java 7 Based / platform agnostic ● Uses fuse ● S3 storage support ● Snapshots ● Inline or batch mode deduplication ● Supposedly fast (290MBps+ on great H/W) ● Support for global/clustered dedupe ● Probably most mature OSS Dedupe (IMHO)
  • 17. SDFS
  • 18. SDFS Install & Go
      Install Java
      # rpm -Uvh SDFS-1.0.7-2.x86_64.rpm
      # sudo mkfs.sdfs --volume-name=sdfs_128k --io-max-file-write-buffers=32 --volume-capacity=550GB --io-chunk-size=128 --chunk-store-data-location=/mnt/data
      # sudo modprobe fuse
      # sudo mount.sdfs -v sdfs_128k -m /mnt/dedupe
  • 19. SDFS ● Pro ● Works when configured properly ● Appears to be multithreaded ● Con ● Slow / resource intensive (CPU/Memory) ● Fragile, easy to mess up options, leading to crashes, little user feedback ● Standard POSIX utilities such as df do not show accurate data (you must use getfattr -d <mount point> and calculate bytes → GB/TB and % free yourself) ● Slow with 4k blocks, the size recommended for VMs
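Since the byte counts have to be converted by hand, a small helper can do the arithmetic. This sketch assumes you have already read a capacity and a used-bytes figure from the getfattr output; the attribute names are not shown here, the sample numbers are made up, and the function name is hypothetical. It uses binary units (1 GiB = 1024^3 bytes), which is also an assumption.

```python
GIB = 1024 ** 3


def summarize_usage(capacity_bytes: int, used_bytes: int) -> dict:
    """Convert raw byte counts into GiB figures and a percent-free value."""
    free_bytes = capacity_bytes - used_bytes
    return {
        "capacity_gib": capacity_bytes / GIB,
        "used_gib": used_bytes / GIB,
        "free_gib": free_bytes / GIB,
        "percent_free": 100.0 * free_bytes / capacity_bytes,
    }


# Made-up example: a 550 GiB volume with 110 GiB in use.
usage = summarize_usage(550 * GIB, 110 * GIB)
print(f"{usage['used_gib']:.1f} GiB used, {usage['percent_free']:.1f}% free")
```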
  • 20. LessFS www.lessfs.com ● Written in C = Less CPU Overhead ● Have to build yourself (configure && make && make install) ● Has replication, encryption ● Uses fuse
  • 21. LessFS Install
      wget http://...lessfs-1.4.2.tar.gz
      tar zxvf *.tar.gz
      wget http://...db-4.8.30.tar.gz
      yum install buildstuff…
      . . .
      echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
      echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
  • 22. LessFS Go
      sudo vi /etc/lessfs.cfg
      BLOCKDATA_PATH=/mnt/data/dta/blockdata.dta
      META_PATH=/mnt/meta/mta
      BLKSIZE=4096 # only 4k supported on centos 5
      ENCRYPT_DATA=on
      ENCRYPT_META=off
      mklessfs -c /etc/lessfs.cfg
      lessfs /etc/lessfs.cfg /mnt/dedupe
  • 23. LessFS ● Pro ● Does inline compression by default as well ● Reasonable VM compression with 128k blocks ● Con ● Fragile ● Stats/FS info hard to see (per file accounting, no totals) ● Kernel >= 2.6.26 required for blocks > 4k (RHEL6 only) ● Running with 4k blocks is not really feasible
  • 25. Other OSS ● ZFS? ● Tried it, and empirically it was a drag, but I have no hard data (got like 3x dedupe with identical full backups of VMs) ● At least it’s stable…
  • 26. Kick the Tires ● Test data set; ~330GB of data ● 22GB of documents, pictures, music ● Virtual Machines – 220GB Windows 2003 Server with SQL Data – 2003 AD DC ~60GB – 2003 Server ~8GB – Two OpenSolaris VMs, 1.5 & 2.7GB – 3GB Windows 2000 VM – 15GB XP Pro VM
  • 27. Kick the Tires ● Test Environment ● AWS High CPU Extra Large Instance ● ~7GB of RAM ● ~Eight Cores ~2.5GHz each ● ext4
  • 28. Compression Performance ● First round (all “unique” data) ● If another copy was put in (like another full), we should expect 100% reduction for that non-unique data (1x dedupe per run)
      FS              Home Data   % Home Reduction   VM Data   % VM Reduction   Combined   % Total Reduction   MBps
      SDFS 4k         21GB        4.50%              109GB     64%              128GB      61%                 16
      lessfs 4k       24GB        -9%                N/A       51%              N/A        50%                 4 (est.)
      SDFS 128k       21GB        4.50%              255GB     16%              276GB      15%                 40
      lessfs 128k     21GB        4.50%              130GB     57%              183GB      44%                 24
      tar/gz --fast   21GB        4.50%              178GB     41%              199GB      39%                 35
  • 29.
  • 30. Write Performance (don't trust this) [Bar chart: write throughput in MBps (axis 0 to 40) for raw, SDFS 4k, lessfs 4k, SDFS 128k, lessfs 128k, and tar/gz --fast]
  • 31. Kick the Tires: Part 2 ● Test data set – two ~204GB full backup archives from a popular commercial vendor ● Test Environment ● VirtualBox VM, 2GB RAM, 2 Cores, 2x7200RPM SATA drives (meta & data separated for LessFS) ● Physical CPU: Quad Core Xeon
  • 32. Write Performance [Bar chart: throughput in MBps (axis 0 to 40) for raw, SDFS 128k write, SDFS 128k re-write, LessFS 128k write, and LessFS 128k re-write]
  • 33.
  • 35. Open Source Dedupe ● Pro ● Free ● Can be stable, if well managed ● Con ● Not in repos yet ● Efforts behind them seem very limited, 1 dev each ● No/Poor documentation
  • 36. The Future ● Eventual Commodity? ● btrfs ● Dedupe planned (off-line only)
  • 37. Conclusion/Recommendations ● Dedupe is great, if it works and it meets your performance and storage requirements ● OSS Dedupe has a way to go ● SDFS/OpenDedupe is best OSS option right now ● JungleDisk is good and cheap, but not OSS
  • 38. About Red Wire Services If you found this presentation helpful, consider Red Wire Services for your next Backup, Archive, or IT Disaster Recovery Planning project. Learn more at www.RedWireServices.com
  • 39. About Nick Webb Nick Webb is the founder of Red Wire Services, in Seattle, WA. Nick is available to speak on a variety of IT Disaster Recovery related topics, including: ● Preserving Your Digital Legacy ● Getting Started with your Small Business Disaster Recovery Plan ● Archive Storage for SMBs If interested in having Nick speak to your group, please call (206) 829-8621 or email info@redwireservices.com

Editor's Notes

  1. Different types of deduplication levels: file level; block level; variable block versus fixed block (Quantum/DD: variable blocks)
  2. Pretty much the same as all compression