SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Deduplication Best Practices
With Microsoft Windows Server 2012
  and Veeam Backup & Replication 6.5
Joep Piscaer, VMware vExpert, VCDX #101
   j.piscaer@virtuallifestyle.nl
   @jpiscaer
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
Introduction
 Joep Piscaer
   ● Consulting Architect at OGD ict-diensten
   ● VMware VCDX5 #101, vExpert 2009, 2011, 2012
   ● Know Veeam since 2007 and in love with them ever since
     (best. VMworld. parties. ever.)
Past Projects
 Past implementations of Veeam B&R
   ● Commonly see a VMware virtualization layer with Windows VMs on top
   ● Windows Server 2008 R2 as the base for Veeam implementation
   ● Starting to work with Windows Server 2012


 Notable projects include
   ● Bi-directional DR for 200-250 VMs with 2 infrastructures
   ● 150+ VM backup and replication within a single large datacenter
   ● Application consistent backups of Zarafa Collaboration Platform without
     bringing database down (or any other downtime)
   ● Numerous smaller projects for DR or backup at customer sites
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
What is deduplication?
What is deduplication?
 Deduplication identifies
  identical data blocks in
  source VMs and stores
  each unique block only
  once
Use case – why use deduplication?
Use case – why use deduplication?
 41,9% of CIOs
  affected by
  cost of storage
  required for
  storing backups
Use case – why use deduplication?
 Lack of disk space is a growing
  issue preventing adoption
  of replication.

 22,6% of CIOs reported lack of
  disk space as an issue preventing
  adoption of replication in 2011;
  35,5% report the same in 2013.
Use case – why use deduplication?
 Conclusions of Virtualization Data Protection Report 2013
   ● Decrease cost of storage
   ● Increase usage of existing storage
     Make backups as storage-friendly as possible


 To keep management cost down, don‟t introduce new or
  complex solutions
   ● Use „what we already know‟ and „what we already have‟ to solve these
     challenges
   ● Prevent re-training of administrators
   ● Increase ease of use of solution
Use case – why use deduplication?
 Why deduplication instead of feature x or technology y?
   ●   Veeam captures entire VM including Guest OS, applications and data
   ●   VMs usually share Guest OS type, middleware, etc.
   ●   Hence: there tends to be a lot of identical data across VMs
   ●   Deduplication slashes identical data


 But mix and match solutions if available
   ● Don‟t forget compression
   ● Excluded unneeded data (VM and Guest OS swap files, etc)
Use case – why use deduplication?
 Backup files can be separated into two categories:
   ● Recent backups for fast restore and recoverability
   ● Older backups for archival purposes


 With Veeam, these two types are „connected‟
   ● The backup file chain contains „recent‟ and „old‟ backups
   ● Nearly impossible to separate on the storage layer
Why use Windows Server 2012 deduplication?
 The killer use case for using Data Deduplication in
  Windows Server 2012 is longer-term (>60 day) retention
  or archival of VMs on the same on-site storage platform.
  Other use cases include off-site replication
  and increased performance of forward incremental jobs

 For 30-60 day retention, consider
  reverse incremental backup mode
  which offers similar storage efficiency.
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
Planning for disk-based deduplication
 Determine the amount of data being backed up
 Determine the data retention policy for each dataset
   ● Determine if and when to use an archival system
 Determine the biggest full uncompressed backup file size
   ● Windows Server 2012 Data Deduplication is post-process (and not in-line)
   ● So you need at least this amount of free storage space to complete the job
 Determine the daily change rate
   ● Determine if Data Deduplication can handle that amount of data
   ● MSFT recommends to design for ~100GB/hr deduplication data processing
Planning for disk-based deduplication
 Estimate dedupe space savings on existing dataset
   ● Use „Data Deduplication Savings Evaluation Tool‟ (DDPEval.exe)
   ● Installed in C:WindowsSystem32 on 2012 host with Deduplication role
     But can be copied to any system running 2012, 2008 R2 and 7
Planning for disk-based deduplication
 Estimated sizing:
   ● Applicable to environments with less
     than 1.5T of daily data ingestion daily (10.5T weekly)


 Limits
   ● Large full backups ( > 1.5T) require lengthy dedupe process
   ● Consider running active fulls monthly to split ingestion of large amounts of
     data
Live Demo
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
Tying it together with Veeam
 Veeam Backup & Replication 6.5 include full support for
  Microsoft Windows Server 2012 including ReFS volumes
  and global data deduplication
 But support for Storage Spaces is experimental.

 To restore files from deduplicated volumes, backup server
  must be installed on Windows Server 2012 with Data
  Deduplication feature enabled in the OS settings.
Backup Repository settings
 Deduplicating storage compatibility
   ● Align backup file data blocks
   ● Decompress backup data before storing
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
Backup Job settings
 Backup Mode
   ● Incremental
   ● Use „dedupe-friendly‟ compression
   ● Set storage optimization to „local target‟


 In my experience, enabling
  Veeam dedupe and compression
  significantly speeds up job
  processing and do not interfere
  with Windows deduplication
Backup Job settings
 Why forward incremental?
   ● Less ingestion of new data on deduplicating repository
   ● Reversed incremental requires 3x IOps
     Deduplicating repository is usually I/O bound
Backup Job settings
 Synthetic or periodic active full?
  No difference after dedupe but synthetic prevents „full‟ job to hit
  source. Synthetic fulls requires data re-hydration while last full is
  read from disk and is building the new synthetic full.
  Try synthetic fulls first and monitor performance (time needed to
  build new synthetic full).

 Synthetic full only takes up space before deduplication process
  kicks in; after dedupe, synthetic full is deduped againts previous
  synthetic full
  You want the previous synthetic full saved in deduped state the
  moment you create a new synthetic full, so make sure to keep
  enough restore points on disk.
Backup Job settings
 Why enable inline data deduplication?
   ● Source-based deduplication
      − Takes place at the backup source instead of target
   ● Decreases the amount of data sent over the network
      − Amount of network traffic is a consideration, too
   ● Speeds up job processing significantly
   ● Uses a large block size so doesn‟t interfere with Data Deduplication

   ● If we disable inline data deduplication:
     “The amount of data on disk remained the same, but the amount
     transferred across the wire is 40x more.”
Backup Job settings
 Why enable compression?
  ●   Source-based compression
  ●   Decreases the amount of data sent over the network
  ●   Speeds up job processing significantly
  ●   Dedupe-friendly compression uses a fixed dictionary and doesn‟t interfere
      with Data Deduplication

  ● Dedupe-friendly compression saves about 10-20% on the initial VBK/VIB
    file size. About the same (20-30%) is lost at Data Deduplication stage.
    Significantly faster restores and slightly faster instant recovery.

  ● “Effectively you are trading some hard disk space overall (because of less
    dedupe) for some up front network and disk bandwidth savings.”
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
Global Data Deduplication
 Deduplicate data across backup files produced by different
  backup jobs
   ● Veeam uses per-job deduplication, but space savings isn‟t the only
     consideration when choosing which VMs go into a job.
   ● Microsoft deduplicates on a per-volume basis
     Multiple backup files / jobs stored on a single volume
 One less variable to consider when planning jobs
   ● Makes „more small jobs‟ a viable solution
   ● Focus more on more functional separation of VMs in jobs
      − Separate per (multi-tier) application (vCenter Intenvory VM Folders)
      − Replicate off-site using Cloud Edition
Agenda
   Introduction
   Use case for deduplication
   Configure Data Deduplication in Windows Server 2012
   How to optimize backup repository for deduplication
   How to optimize backup jobs for deduplication
   How to leverage per-pool deduplication
   Demo
Live Demo
Demo – Dedupe: Jobs vs. Volumes
 3 VM‟s in 3 separate backup jobs
   ● 24,2 GB before dedupe, 13,3 GB after dedupe
 Copy a large file (~8GB) to all three VMs
 Run backup jobs (regular incremental run)
   ● 45,9 GB before dedupe, 19,5GB after dedupe
 Run deduplication process on repository
   ● Run dedup with „Start-DedupJob -type Optimization -volume E:‟
   ● Check status running dedupe job with „Get-DedupJob‟
   ● Check space savings after dedupe job with „Get-Dedupstatus‟
 Source VM’s occupy 65 GB on source datastore
   ● That’s a 3,33:1 deduplication rate!
Demo – Instant VM Restore
 Re-hydration of deduplicated backup files takes time and
  performance will suffer.
 Most recent backups typically not deduplicated yet
  because Data Deduplication is post-process, so instant
  recovery using vPower NFS won‟t suffer in performance
 Don‟t set Data Deduplication schedule to dedupe (most)
  recent backup files as these are used for restores in 99%
  of the cases. Set data aging to > 1-2 days.
 Weigh the benefits of disk savings against increased RTO.
Q&A




      Joep Piscaer
      VMware vExpert, VCDX #101

      j.piscaer@virtuallifestyle.nl
      @jpiscaer

Weitere ähnliche Inhalte

Was ist angesagt?

Role of Ecotourism in Sustainable Development
Role of Ecotourism in Sustainable DevelopmentRole of Ecotourism in Sustainable Development
Role of Ecotourism in Sustainable Developmentkabitakrmandal
 
Change Data Capture
Change Data CaptureChange Data Capture
Change Data Capturearnoud_otte
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastDatabricks
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationPhilip Yurchuk
 
Cassandra - Research Paper Overview
Cassandra - Research Paper OverviewCassandra - Research Paper Overview
Cassandra - Research Paper Overviewsameiralk
 
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...inwin stack
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
Containerd + buildkit breakout
Containerd + buildkit breakoutContainerd + buildkit breakout
Containerd + buildkit breakoutDocker, Inc.
 
Advantage & Disadvantage of MySQL
Advantage & Disadvantage of MySQLAdvantage & Disadvantage of MySQL
Advantage & Disadvantage of MySQLKentAnderson43
 
Lakehouse Analytics with Dremio
Lakehouse Analytics with DremioLakehouse Analytics with Dremio
Lakehouse Analytics with DremioDimitarMitov4
 
Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)
Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)
Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)Chris Richardson
 
NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"Sushant Choudhary
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable영원 서
 
Barcelona Strategic Tourism Plan for 2020
Barcelona Strategic Tourism Plan for 2020 Barcelona Strategic Tourism Plan for 2020
Barcelona Strategic Tourism Plan for 2020 Ajuntament de Barcelona
 

Was ist angesagt? (20)

Google Big Table
Google Big TableGoogle Big Table
Google Big Table
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Role of Ecotourism in Sustainable Development
Role of Ecotourism in Sustainable DevelopmentRole of Ecotourism in Sustainable Development
Role of Ecotourism in Sustainable Development
 
Change Data Capture
Change Data CaptureChange Data Capture
Change Data Capture
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
 
Cassandra - Research Paper Overview
Cassandra - Research Paper OverviewCassandra - Research Paper Overview
Cassandra - Research Paper Overview
 
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Containerd + buildkit breakout
Containerd + buildkit breakoutContainerd + buildkit breakout
Containerd + buildkit breakout
 
Construindo um data lake na nuvem aws
Construindo um data lake na nuvem awsConstruindo um data lake na nuvem aws
Construindo um data lake na nuvem aws
 
Advantage & Disadvantage of MySQL
Advantage & Disadvantage of MySQLAdvantage & Disadvantage of MySQL
Advantage & Disadvantage of MySQL
 
Lakehouse Analytics with Dremio
Lakehouse Analytics with DremioLakehouse Analytics with Dremio
Lakehouse Analytics with Dremio
 
Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)
Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)
Deploying Spring Boot applications with Docker (east bay cloud meetup dec 2014)
 
Introdução ao Hive
Introdução ao HiveIntrodução ao Hive
Introdução ao Hive
 
NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable
 
Barcelona Strategic Tourism Plan for 2020
Barcelona Strategic Tourism Plan for 2020 Barcelona Strategic Tourism Plan for 2020
Barcelona Strategic Tourism Plan for 2020
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 

Andere mochten auch

Netapp Deduplication concepts
Netapp Deduplication conceptsNetapp Deduplication concepts
Netapp Deduplication conceptsSaroj Sahu
 
47 restore scenarios from Veeam Backup & Replication v8
47 restore scenarios from Veeam Backup & Replication v847 restore scenarios from Veeam Backup & Replication v8
47 restore scenarios from Veeam Backup & Replication v8Veeam Software
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Agile meetup - user story mapping workshop
Agile meetup - user story mapping workshopAgile meetup - user story mapping workshop
Agile meetup - user story mapping workshopJen-Chieh Ko
 
Veeam Backup and Replication: Overview
Veeam  Backup and Replication: OverviewVeeam  Backup and Replication: Overview
Veeam Backup and Replication: OverviewDudley Smith
 
Veean Backup & Replication
Veean Backup & ReplicationVeean Backup & Replication
Veean Backup & ReplicationArnaud PAIN
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Capgemini
 
Veeam back up and replication presentation
Veeam back up and replication presentation Veeam back up and replication presentation
Veeam back up and replication presentation BlueChipICT
 
Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)Data Science Thailand
 
Veeam Backup & Replication Tips and Tricks
Veeam Backup & Replication Tips and TricksVeeam Backup & Replication Tips and Tricks
Veeam Backup & Replication Tips and TricksVeeam Software
 
Veeam Backup & Replication v8 for VMware — General Overview
Veeam Backup & Replication v8 for VMware — General OverviewVeeam Backup & Replication v8 for VMware — General Overview
Veeam Backup & Replication v8 for VMware — General OverviewVeeam Software
 
VMware vSphere technical presentation
VMware vSphere technical presentationVMware vSphere technical presentation
VMware vSphere technical presentationaleyeldean
 
52 tools for any company to innovate like a Startup /by @nickdemey @boardofinno
52 tools for any company to innovate like a Startup /by @nickdemey @boardofinno52 tools for any company to innovate like a Startup /by @nickdemey @boardofinno
52 tools for any company to innovate like a Startup /by @nickdemey @boardofinnoBoard of Innovation
 
EMC Deduplication Fundamentals
EMC Deduplication FundamentalsEMC Deduplication Fundamentals
EMC Deduplication Fundamentalsemcbaltics
 
Virtualization 101: Everything You Need To Know To Get Started With VMware
Virtualization 101: Everything You Need To Know To Get Started With VMwareVirtualization 101: Everything You Need To Know To Get Started With VMware
Virtualization 101: Everything You Need To Know To Get Started With VMwareDatapath Consulting
 

Andere mochten auch (18)

Netapp Deduplication concepts
Netapp Deduplication conceptsNetapp Deduplication concepts
Netapp Deduplication concepts
 
47 restore scenarios from Veeam Backup & Replication v8
47 restore scenarios from Veeam Backup & Replication v847 restore scenarios from Veeam Backup & Replication v8
47 restore scenarios from Veeam Backup & Replication v8
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 
Veeam backup and_replication_whats_new_in_v7
Veeam backup and_replication_whats_new_in_v7Veeam backup and_replication_whats_new_in_v7
Veeam backup and_replication_whats_new_in_v7
 
Advanced Analytics in Banking, CITI
Advanced Analytics in Banking, CITIAdvanced Analytics in Banking, CITI
Advanced Analytics in Banking, CITI
 
Agile meetup - user story mapping workshop
Agile meetup - user story mapping workshopAgile meetup - user story mapping workshop
Agile meetup - user story mapping workshop
 
Veeam Backup and Replication: Overview
Veeam  Backup and Replication: OverviewVeeam  Backup and Replication: Overview
Veeam Backup and Replication: Overview
 
Veean Backup & Replication
Veean Backup & ReplicationVeean Backup & Replication
Veean Backup & Replication
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry
 
Veeam back up and replication presentation
Veeam back up and replication presentation Veeam back up and replication presentation
Veeam back up and replication presentation
 
Veeam Backup Essentials v9 Overview
Veeam Backup Essentials v9 OverviewVeeam Backup Essentials v9 Overview
Veeam Backup Essentials v9 Overview
 
Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)
 
Veeam Backup & Replication Tips and Tricks
Veeam Backup & Replication Tips and TricksVeeam Backup & Replication Tips and Tricks
Veeam Backup & Replication Tips and Tricks
 
Veeam Backup & Replication v8 for VMware — General Overview
Veeam Backup & Replication v8 for VMware — General OverviewVeeam Backup & Replication v8 for VMware — General Overview
Veeam Backup & Replication v8 for VMware — General Overview
 
VMware vSphere technical presentation
VMware vSphere technical presentationVMware vSphere technical presentation
VMware vSphere technical presentation
 
52 tools for any company to innovate like a Startup /by @nickdemey @boardofinno
52 tools for any company to innovate like a Startup /by @nickdemey @boardofinno52 tools for any company to innovate like a Startup /by @nickdemey @boardofinno
52 tools for any company to innovate like a Startup /by @nickdemey @boardofinno
 
EMC Deduplication Fundamentals
EMC Deduplication FundamentalsEMC Deduplication Fundamentals
EMC Deduplication Fundamentals
 
Virtualization 101: Everything You Need To Know To Get Started With VMware
Virtualization 101: Everything You Need To Know To Get Started With VMwareVirtualization 101: Everything You Need To Know To Get Started With VMware
Virtualization 101: Everything You Need To Know To Get Started With VMware
 

Ähnlich wie Veeam webinar - Deduplication best practices

Reduce time to complete backups and restores with Transparent Snapshots with ...
Reduce time to complete backups and restores with Transparent Snapshots with ...Reduce time to complete backups and restores with Transparent Snapshots with ...
Reduce time to complete backups and restores with Transparent Snapshots with ...Principled Technologies
 
Back up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager ApplianceBack up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager AppliancePrincipled Technologies
 
WSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutionsWSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutionsAccenture
 
Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...
Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...
Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...Principled Technologies
 
Scaling AEM (CQ5) Gem Session
Scaling AEM (CQ5) Gem SessionScaling AEM (CQ5) Gem Session
Scaling AEM (CQ5) Gem SessionMichael Marth
 
8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutions8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutionsServium
 
Veeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DRVeeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DRJoep Piscaer
 
Key Considerations For Deduplication In The Enterprise
Key Considerations For Deduplication In The EnterpriseKey Considerations For Deduplication In The Enterprise
Key Considerations For Deduplication In The EnterpriseQuantum
 
Emc sql server 2012 overview
Emc sql server 2012 overviewEmc sql server 2012 overview
Emc sql server 2012 overviewsolarisyougood
 
The Pensions Trust - VM Backup Experiences
The Pensions Trust - VM Backup ExperiencesThe Pensions Trust - VM Backup Experiences
The Pensions Trust - VM Backup Experiencesglbsolutions
 
2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.pptnadirpervez2
 
Backup exec 2015 end user presentation
Backup exec 2015 end user presentationBackup exec 2015 end user presentation
Backup exec 2015 end user presentationTania Macarlupú
 
Save time completing backups and restores with a Dell PowerProtect Data Manag...
Save time completing backups and restores with a Dell PowerProtect Data Manag...Save time completing backups and restores with a Dell PowerProtect Data Manag...
Save time completing backups and restores with a Dell PowerProtect Data Manag...Principled Technologies
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11CloudExpoEurope
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11aseager
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11aseager
 
S de2784 footprint-reduction-edge2015-v2
S de2784 footprint-reduction-edge2015-v2S de2784 footprint-reduction-edge2015-v2
S de2784 footprint-reduction-edge2015-v2Tony Pearson
 
The Best Storage For V Mware Environments Customer Presentation Jul201
The Best Storage For V Mware Environments Customer Presentation Jul201The Best Storage For V Mware Environments Customer Presentation Jul201
The Best Storage For V Mware Environments Customer Presentation Jul201Michael Hudak
 
Intro to Azure SQL database
Intro to Azure SQL databaseIntro to Azure SQL database
Intro to Azure SQL databaseSteve Knutson
 

Ähnlich wie Veeam webinar - Deduplication best practices (20)

Reduce time to complete backups and restores with Transparent Snapshots with ...
Reduce time to complete backups and restores with Transparent Snapshots with ...Reduce time to complete backups and restores with Transparent Snapshots with ...
Reduce time to complete backups and restores with Transparent Snapshots with ...
 
Back up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager ApplianceBack up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager Appliance
 
WSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutionsWSC Net App storage for windows challenges and solutions
WSC Net App storage for windows challenges and solutions
 
Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...
Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...
Upgrading to Windows Server 2019 on Dell EMC PowerEdge servers: A simple proc...
 
Scaling AEM (CQ5) Gem Session
Scaling AEM (CQ5) Gem SessionScaling AEM (CQ5) Gem Session
Scaling AEM (CQ5) Gem Session
 
8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutions8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutions
 
Veeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DRVeeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DR
 
Scaling CQ5
Scaling CQ5Scaling CQ5
Scaling CQ5
 
Key Considerations For Deduplication In The Enterprise
Key Considerations For Deduplication In The EnterpriseKey Considerations For Deduplication In The Enterprise
Key Considerations For Deduplication In The Enterprise
 
Emc sql server 2012 overview
Emc sql server 2012 overviewEmc sql server 2012 overview
Emc sql server 2012 overview
 
The Pensions Trust - VM Backup Experiences
The Pensions Trust - VM Backup ExperiencesThe Pensions Trust - VM Backup Experiences
The Pensions Trust - VM Backup Experiences
 
2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt
 
Backup exec 2015 end user presentation
Backup exec 2015 end user presentationBackup exec 2015 end user presentation
Backup exec 2015 end user presentation
 
Save time completing backups and restores with a Dell PowerProtect Data Manag...
Save time completing backups and restores with a Dell PowerProtect Data Manag...Save time completing backups and restores with a Dell PowerProtect Data Manag...
Save time completing backups and restores with a Dell PowerProtect Data Manag...
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
S de2784 footprint-reduction-edge2015-v2
S de2784 footprint-reduction-edge2015-v2S de2784 footprint-reduction-edge2015-v2
S de2784 footprint-reduction-edge2015-v2
 
The Best Storage For V Mware Environments Customer Presentation Jul201
The Best Storage For V Mware Environments Customer Presentation Jul201The Best Storage For V Mware Environments Customer Presentation Jul201
The Best Storage For V Mware Environments Customer Presentation Jul201
 
Intro to Azure SQL database
Intro to Azure SQL databaseIntro to Azure SQL database
Intro to Azure SQL database
 

Veeam webinar - Deduplication best practices

  • 1. Deduplication Best Practices With Microsoft Windows Server 2012 and Veeam Backup & Replication 6.5 Joep Piscaer, VMware vExpert, VCDX #101 j.piscaer@virtuallifestyle.nl @jpiscaer
  • 2. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 3. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 4. Introduction  Joep Piscaer ● Consulting Architect at OGD ict-diensten ● VMware VCDX5 #101, vExpert 2009, 2011, 2012 ● Know Veeam since 2007 and in love with them ever since (best. VMworld. parties. ever.)
  • 5. Past Projects  Past implementations of Veeam B&R ● Commonly see a VMware virtualization layer with Windows VMs on top ● Windows Server 2008 R2 as the base for Veeam implementation ● Starting to work with Windows Server 2012  Notable projects include ● Bi-directional DR for 200-250 VMs with 2 infrastructures ● 150+ VM backup and replication within a single large datacenter ● Application consistent backups of Zarafa Collaboration Platform without bringing database down (or any other downtime) ● Numerous smaller projects for DR or backup at customer sites
  • 6. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 8. What is deduplication?  Deduplication identifies identical data blocks in source VMs and stores each unique block only once
  • 9. Use case – why use deduplication?
  • 10. Use case – why use deduplication?  41,9% of CIOs affected by cost of storage required for storing backups
  • 11. Use case – why use deduplication?  Lack of disk space is a growing issue preventing adoption of replication.  22,6% of CIOs reported lack of disk space as an issue preventing adoption of replication in 2011; 35,5% report the same in 2013.
  • 12. Use case – why use deduplication?  Conclusions of Virtualization Data Protection Report 2013 ● Decrease cost of storage ● Increase usage of existing storage Make backups as storage-friendly as possible  To keep management cost down, don‟t introduce new or complex solutions ● Use „what we already know‟ and „what we already have‟ to solve these challenges ● Prevent re-training of administrators ● Increase ease of use of solution
  • 13. Use case – why use deduplication?  Why deduplication instead of feature x or technology y? ● Veeam captures entire VM including Guest OS, applications and data ● VMs usually share Guest OS type, middleware, etc. ● Hence: there tends to be a lot of identical data across VMs ● Deduplication slashes identical data  But mix and match solutions if available ● Don‟t forget compression ● Excluded unneeded data (VM and Guest OS swap files, etc)
  • 14. Use case – why use deduplication?  Backup files can be separated into two categories: ● Recent backups for fast restore and recoverability ● Older backups for archival purposes  With Veeam, these two types are „connected‟ ● The backup file chain contains „recent‟ and „old‟ backups ● Nearly impossible to separate on the storage layer
  • 15. Why use Windows Server 2012 deduplication?  The killer use case for using Data Deduplication in Windows Server 2012 is longer-term (>60 day) retention or archival of VMs on the same on-site storage platform. Other use cases include off-site replication and increased performance of forward incremental jobs  For 30-60 day retention, consider reverse incremental backup mode which offers similar storage efficiency.
  • 16. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 17. Planning for disk-based deduplication  Determine the amount of data being backed up  Determine the data retention policy for each dataset ● Determine if and when to use an archival system  Determine the biggest full uncompressed backup file size ● Windows Server 2012 Data Deduplication is post-process (and not in-line) ● So you need at least this amount of free storage space to complete the job  Determine the daily change rate ● Determine if Data Deduplication can handle that amount of data ● MSFT recommends to design for ~100GB/hr deduplication data processing
  • 18. Planning for disk-based deduplication  Estimate dedupe space savings on existing dataset ● Use „Data Deduplication Savings Evaluation Tool‟ (DDPEval.exe) ● Installed in C:WindowsSystem32 on 2012 host with Deduplication role But can be copied to any system running 2012, 2008 R2 and 7
  • 19. Planning for disk-based deduplication  Estimated sizing: ● Applicable to environments with less than 1.5T of daily data ingestion daily (10.5T weekly)  Limits ● Large full backups ( > 1.5T) require lengthy dedupe process ● Consider running active fulls monthly to split ingestion of large amounts of data
  • 21. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 22. Tying it together with Veeam  Veeam Backup & Replication 6.5 include full support for Microsoft Windows Server 2012 including ReFS volumes and global data deduplication  But support for Storage Spaces is experimental.  To restore files from deduplicated volumes, backup server must be installed on Windows Server 2012 with Data Deduplication feature enabled in the OS settings.
  • 23. Backup Repository settings  Deduplicating storage compatibility ● Align backup file data blocks ● Decompress backup data before storing
  • 24. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 25. Backup Job settings  Backup Mode ● Incremental ● Use „dedupe-friendly‟ compression ● Set storage optimization to „local target‟  In my experience, enabling Veeam dedupe and compression significantly speeds up job processing and do not interfere with Windows deduplication
  • 26. Backup Job settings  Why forward incremental? ● Less ingestion of new data on deduplicating repository ● Reversed incremental requires 3x IOps Deduplicating repository is usually I/O bound
  • 27. Backup Job settings  Synthetic or periodic active full? No difference after dedupe but synthetic prevents „full‟ job to hit source. Synthetic fulls requires data re-hydration while last full is read from disk and is building the new synthetic full. Try synthetic fulls first and monitor performance (time needed to build new synthetic full).  Synthetic full only takes up space before deduplication process kicks in; after dedupe, synthetic full is deduped againts previous synthetic full You want the previous synthetic full saved in deduped state the moment you create a new synthetic full, so make sure to keep enough restore points on disk.
  • 28. Backup Job settings  Why enable inline data deduplication? ● Source-based deduplication − Takes place at the backup source instead of target ● Decreases the amount of data sent over the network − Amount of network traffic is a consideration, too ● Speeds up job processing significantly ● Uses a large block size so doesn‟t interfere with Data Deduplication ● If we disable inline data deduplication: “The amount of data on disk remained the same, but the amount transferred across the wire is 40x more.”
  • 29. Backup Job settings  Why enable compression? ● Source-based compression ● Decreases the amount of data sent over the network ● Speeds up job processing significantly ● Dedupe-friendly compression uses a fixed dictionary and doesn‟t interfere with Data Deduplication ● Dedupe-friendly compression saves about 10-20% on the initial VBK/VIB file size. About the same (20-30%) is lost at Data Deduplication stage. Significantly faster restores and slightly faster instant recovery. ● “Effectively you are trading some hard disk space overall (because of less dedupe) for some up front network and disk bandwidth savings.”
  • 30. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 31. Global Data Deduplication  Deduplicate data across backup files produced by different backup jobs ● Veeam uses per-job deduplication, but space savings isn‟t the only consideration when choosing which VMs go into a job. ● Microsoft deduplicates on a per-volume basis Multiple backup files / jobs stored on a single volume  One less variable to consider when planning jobs ● Makes „more small jobs‟ a viable solution ● Focus more on more functional separation of VMs in jobs − Separate per (multi-tier) application (vCenter Intenvory VM Folders) − Replicate off-site using Cloud Edition
  • 32. Agenda  Introduction  Use case for deduplication  Configure Data Deduplication in Windows Server 2012  How to optimize backup repository for deduplication  How to optimize backup jobs for deduplication  How to leverage per-pool deduplication  Demo
  • 34. Demo – Dedupe: Jobs vs. Volumes  3 VM‟s in 3 separate backup jobs ● 24,2 GB before dedupe, 13,3 GB after dedupe  Copy a large file (~8GB) to all three VMs  Run backup jobs (regular incremental run) ● 45,9 GB before dedupe, 19,5GB after dedupe  Run deduplication process on repository ● Run dedup with „Start-DedupJob -type Optimization -volume E:‟ ● Check status running dedupe job with „Get-DedupJob‟ ● Check space savings after dedupe job with „Get-Dedupstatus‟  Source VM’s occupy 65 GB on source datastore ● That’s a 3,33:1 deduplication rate!
  • 35. Demo – Instant VM Restore  Re-hydration of deduplicated backup files takes time and performance will suffer.  Most recent backups typically not deduplicated yet because Data Deduplication is post-process, so instant recovery using vPower NFS won‟t suffer in performance  Don‟t set Data Deduplication schedule to dedupe (most) recent backup files as these are used for restores in 99% of the cases. Set data aging to > 1-2 days.  Weigh the benefits of disk savings against increased RTO.
  • 36. Q&A Joep Piscaer VMware vExpert, VCDX #101 j.piscaer@virtuallifestyle.nl @jpiscaer

Hinweis der Redaktion

  1. 30112012
  2. http://forums.veeam.com/viewtopic.php?f=2&t=14910Note that I'm not saying this isn't cool, improving the storage efficiency for forward incremental using Windows 2012 dedupe is awesome, especially for the use cases where it makes sense (retention longer than 60 days, staging to tape or offsite, faster incremental performance, etc.), but just pointing out the fact that reverse incremental (a feature of Veeam since V1) offers similar storage efficiency for retention periods of 30-60 days and environments of this size. I'd love if you would continue to share your results as your retention builds as I'm continuing to collect both lab and real world results.
  3. http://forums.veeam.com/viewtopic.php?f=2&t=14002Determine if and when to use an archival system since online disk-based systems are great for shorter-term retention but might not be for archival and compliancy purposes
  4. http://technet.microsoft.com/en-us/library/hh831602.aspxhttp://technet.microsoft.com/en-us/library/hh831700.aspx
  5. http://forums.veeam.com/viewtopic.php?f=2&t=14910
  6. 01 AddRole02 Create NTFS volumewithdeduplicationenabled.Let op: werkt alleen op NTFS en niet op ReFSLet uit: waarom files olderthan ‘x’: kans op restoren dergelijke oude files niet aannemelijk.Leg uit: default dedupschedule en backupwindows bijten elkaar normaal gesproken; pas schedule aan. Backupserver doet overdag niet veel, dus juist dan dedup laten lopen03 & 04 simplededupexample without Veeam
  7. http://forums.veeam.com/viewtopic.php?f=2&t=15766About the repository configurations, is it the same machine as also proxy role? In that case you do not need to enable those configurations, they are used to reduce at the minimum data travelling via network from proxy to repository, and let nonetheless reporitory expand again data before writing them to the storage, this is pretty useful to save bandwidth while shipping data to a dedupe appliance with a linux/windows "head" in front of it. If it's the same server, there is really no need to compress and decompress data inside the same VM.http://www.veeam.com/blog/how-to-get-unbelievable-deduplication-results-with-windows-server-2012-and-veeam-backup-replication.htmlThe Align Backup file data blocks setting is recommnded for Dedupe appliances however Windows Server 2012 dedupe is not an appliance. It is volume based and uses software to break the data into chunks and store them in a chunk store. In my experience I have not seen a benefit from using this setting.
  8. Previous chain will only be removed by retention when the latest incremental backup in that chain is no longer needed for restorehttp://jpaul.me/?p=1729
  9. http://forums.veeam.com/viewtopic.php?f=2&t=14002&start=15#p73149“I would expect it to be very similar to other dedupe appliances. Typically "dedupe friendly" compression provides only a 10-20% reduction in the initial size of the VBK and VIB files, while costing roughly that same amount dedupe savings, perhaps slightly more. Saving 10-20% may not sound like much, however, for customers backing up 10's or 100's of TB, this can be a significant savings in network bandwidth and it also generally makes for faster restores, and sometimes even slightly faster instant recovery since 10-20% less data must be read from the backup repository, so it can be a reasonable compromise. Effectively you are trading some hard disk space overall (because of less dedupe) for some up front network and disk bandwidth savings. If you're happy with your current performance and want to maximize dedupe, I would leave compression disabled.”http://forums.veeam.com/viewtopic.php?f=2&t=8916&start=30#p61601To be completely fair, I took significant artistic license with this example, it's not 100% technically accurate, but the goal was to outline the concept, and show why Veeam dedupe does not interfere with the dedupe on the appliance, although it will slightly decrease the reported dedupe ratio from the appliance perspective since we obviously reduce the amount of data sent in the first place. In the example above, the dedupe appliance would likely report a 6:1 dedupe ratio of Veeam dedupe was disabled, but a 4:1 if Veeam dedupe was enabled, because we eliminated to data before it got to the appliance. The final amount of data on storage would be exactly the same.If you really want to use compression when writing to a dedupe appliance, using "Low" compression is probably the best bet. This compressing uses a fixed dictionary and is somewhat predictable. It will still lower dedupe, in testing by about 20-30% or so, but it will provide some reduced data going to the dedupe appliance which can make backups faster.http://forums.veeam.com/viewtopic.php?f=2&t=15166Typical compression using the "dedupe-friendly" method can be 10-20%, significantly less than the 50-75% (or more) compression available from the other algorithms, however, this will have some negative impact on dedupe, normally reducing it's effectiveness by a similar ratio.
  10. Benadrukdatdedupe op storagelaageengrote impact heeft op hoe je de jobs ‘organiseert’, waardoor de jobs veelflexibelerworden en gemakkelijker in tezetten en tewijzigen.Stel je voordat je een VM in eenandere job plaatst; zonderdedupkost je dat storage (vanwegeretentie in oude job); met dedupkost je datgeen storage.
  11. TomSightler