vSAN provides software-defined storage that pools server storage resources and delivers them as a shared datastore for VMs. It integrates deeply with VMware stacks for simplified management and supports a variety of use cases. vSAN leverages new hardware technologies to provide high performance at low cost through space efficiency techniques and storage policies that control availability, capacity reservation, and QoS.
4. HCI fills a growing market gap
[Diagram: flexible enterprise storage. HCI storage (pools of SSDs, scaled up or scaled out) sits between traditional storage, which is reliable and fast but complex, and cloud storage, which is simple and flexible but lacks enterprise control.]
5. Powered by VMware vSAN and vSphere
Runs on any standard x86 server
Pools HDD/SSD into a shared datastore
Delivers enterprise-grade scale and performance
Managed through per-VM storage policies
Deeply integrated with the VMware stack
[Diagram: vSphere and vSAN presenting pooled devices as a single vSAN Datastore.]
6. Supporting a broad variety of use cases
[Diagram: vSAN use cases: business-critical apps, virtual desktops (VDI), DR/DA, test/dev, DMZ, ROBO, management, staging.]
7. Storage devices becoming faster with better endurance
[Chart: $/Gig vs. IOPS for storage devices ($1 to $20 per gig, 250K to 1M IOPS), with cost falling as performance rises.]
Today: low-latency devices are too expensive for persistent storage; device latency >> network latency.
Future: SSD is the new capacity disk; high-capacity NVMe; new classes of persistent memory (NVDIMM, DRAM); network latency >> device latency.
8. vSAN with Next-Generation Hardware
vSAN leverages next-gen hardware
Re-platform to deliver the lowest $/Gig and $/IOPS
[Diagram: vSAN with an NVMe/NVDIMM caching tier and an NVMe SSD persistence tier.]
All-flash: up to 150K IOPS per host
10. vSAN objects and components
[Diagram: a RAID-1 object with two RAID-0 mirror copies (stripe-1a/1b and stripe-2a/2b) plus a witness component.]
vSAN is an object store
Each object has multiple components, allowing you to meet availability and performance requirements
Data is distributed based on the VM storage policy
11. …
Ease of Day 0, 1 and 2 operations
[Diagram: vSphere and vSAN managed through the vSphere UI and vSphere APIs: configuration, health checks, performance monitoring, capacity reports.]
12. Storage policies
VM / VMDK policy profile
Policy "Gold":
– Availability: FTT = 2
– Capacity reservation: space efficient, 40GB
– IOPS limit: 1000
[Diagram: the policy applied to a primary cluster running vSphere and vSAN.]
Application lifecycle management through policy
Placement and configuration by policy
Control of QoS at the VM / VMDK level
Simple, scalable automation platform
14. Space efficiency
Nearline deduplication and compression at the disk group level
– Enabled at the cluster level
– Deduplicated when de-staging from the cache tier to the capacity tier
– Fixed block length deduplication (4KB blocks)
– Compression after deduplication
RAID-5 and RAID-6 (inline erasure coding)
– RAID-5 needs a 3+1 configuration, with only 33% overhead
– RAID-6 needs a 4+2 configuration, with only 50% overhead
Beta
15. Accelerating Innovation
Starting from vSAN 5.5:
vSAN 6.0 (March 2015): All-Flash, 64-node clusters, 2x hybrid speed
vSAN 6.1 (September 2015): Stretched Cluster, replication with a 5-minute RPO, 2-node for ROBO
vSAN 6.2 (March 2016): deduplication, compression, Quality of Service
17. 2-node Direct Connect
[Diagram: two directly connected hosts backing the vSAN datastore, with vSAN management and witness traffic going to a remote witness.]
Ability to connect the two nodes directly using crossover cables
Separate vSAN data traffic from witness traffic (2 node only)
Two cables between hosts for higher availability of network
Allows for Layer-2 and Layer-3 topologies and strict separation of traffic streams
18. Provide block storage through vSAN iSCSI
vSAN iSCSI Target Service enables Block Storage!
– Extends Support for Physical Oracle RAC
– Storage for physical workloads
Provides all core vSAN functionality for the iSCSI target
– Dedupe and compression, RAID-1, RAID-5, RAID-6, checksums…
[Diagram: iSCSI initiators connect over the iSCSI network to an iSCSI target; the iSCSI objects are stored on the vSAN datastore.]
19. Support latest hardware innovations
[Diagram: vSphere and the vSAN datastore on 512e drives.]
Support for NVMe for workloads requiring high performance
512e drive support to enable larger-capacity drives
Latest networking technologies like 25/40/100Gbps
Support for highly transactional, low-latency workloads using in-memory technologies
20. vSAN for Cloud Native Apps
vSphere Integrated Containers
• Run containerized storage in production
• Native vSphere container data volumes support
• Leverage existing vSphere/vSAN features
vSAN for Photon Platform
• Non-vSphere SDDC stack for rapidly deploying and managing
containerized applications at scale
• Support for container volumes shared among a cluster of hosts
• Developer-friendly APIs for storage provisioning and consumption
• IT-friendly APIs/GUI for infrastructure management and operation
vSphere Docker Volume Driver
• Containerized storage for test/dev
• Works on existing vSphere deployments and datastores
• Download: https://github.com/vmware/docker-volume-vsphere
[Diagram: vSphere Integrated Containers and Docker Swarm running on a vSphere + vSAN cluster; Photon Controller with vSAN for VMware Photon.]
This graph shows the projected decline of traditional storage (red area) and how it is replaced by server-based storage in the enterprise (blue), but also by cloud-based storage offerings (you know, the EBSes and S3s of the world).
It is an indicator of a fundamental shift that is going on in the IT industry.
So, where is all that data going?
business-critical applications,
end user computing (VDI),
disaster recovery,
remote office/branch office (ROBO) …
One growing area is cloud-native applications (CNA), tying in with the DevOps movement.
Until recently, capacity of flash storage was more expensive than that of spinning disks. Even though their performance ($/IOPS) has been very cost effective, the industry was still using hybrid architectures with caching/tiering to get the best of both worlds: cheap performance with capacity costs close to that of HDDs.
High performing devices like PCIe and NVMe have been even more costly. But things are changing rapidly. The capacities of flash devices are growing and at the same time the capacity cost is going down. Performance is also going up rapidly thanks to more scalable hardware and the efficiency of NVMe, a protocol that has been designed with low latency devices in mind.
So, the entire curve of capacity cost vs. performance is shifting to the left and upper part of the graph, with forthcoming non-volatile memory products occupying the high cost / super high performance part of the market.
But there is another change that is fundamental for the design of storage products: latencies….
What do all these changes mean for vSAN?
Fast, high-endurance devices are used to create a distributed, persistent write-back cache. The main goal is to absorb the bulk of the write rate from apps and thus allow the use of lower-endurance, low-cost SSDs for the capacity tier. The capacity tier also serves the bulk of reads.
The rules of the game are changing:
as new 3D XPoint devices are emerging, capacities are increasing and endurance (the amount of data that can be written per day) is becoming a non-issue. Low-cost NVMe drives are beginning to ship.
Non-volatile memory devices with very low latencies (but still expensive) are coming to market.
Thanks to vSAN’s flexible tiered architecture, we can quickly adapt to the changes in storage technology.
RAID-0 and RAID-1 were the only distributed RAID options up to and including version 6.1.
In 6.2, RAID-5 and RAID-6 options were added.
Today, vSAN as part of vSphere offers a range of tools that give the vSphere admin the capability to manage their infrastructure in a unified way.
Also, vSAN and HCI management flows are an inherent part of the vSphere UI, as well as integrated with other VMware products like vSphere Operations and LogInsight.
Infrastructure is only one part of the scalable management quest.
The other is Application lifecycle management. Move from dev, to testing, to QA to production – apply new policy to change performance and availability profile.
Today, Policy-based automation has completely revolutionized how customers manage their applications. No need for manual LUN provisioning and management. The admin specifies in terms of descriptive policies what they need for their apps, not how to do it. The platform, vSAN for example, automatically enforces them, monitors and performs automatic remediation when needed.
Much like how NSX policies allow automation of networking management, vSAN uses policies to simplify powerful management of storage.
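The idea can be illustrated with a toy model of the "Gold" policy from the storage-policies slide (FTT = 2, space efficiency, a 40GB reservation, a 1000 IOPS limit). The field names and the compliance check below are hypothetical, not vSAN's actual SPBM schema or API:

```python
# Hypothetical declarative policy model; field names are invented
# for illustration and are not vSAN's real SPBM schema.
GOLD = {
    "failures_to_tolerate": 2,     # FTT = 2
    "space_efficient": True,       # dedupe/compression, erasure coding
    "capacity_reservation_gb": 40,
    "iops_limit": 1000,
}

def check_compliance(vm_state, policy):
    """Return the policy fields the VM's observed state violates.

    The admin declares *what* is needed; the platform compares the
    observed state against the policy and remediates any drift.
    """
    return [key for key, want in policy.items()
            if vm_state.get(key) != want]
```

Moving an app from dev to production then becomes a policy swap rather than a re-provisioning exercise, which is the lifecycle-management point made above.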
Stretched storage with vSAN will allow you to split the vSAN cluster across 2 sites, so that if a site fails, you would be able to seamlessly failover to the other site without any loss of data. vSAN in a stretched storage deployment will accomplish this by synchronously mirroring data across the 2 sites. The failover will be initiated by a witness VM that resides in a central place, accessible by both sites.
All Flash Only.
“High level description”
Dedupe and compression happens during destaging from the caching tier to the capacity tier. You enable it on a cluster level and deduplication/compression happens on a per disk group basis. Bigger disk groups will result in a higher deduplication ratio. After the blocks are deduplicated they will be compressed. This results in up to 7x space reduction, of course fully dependent on the workload and type of VMs.
“Lower level description”
Compression (LZ4) would be performed during destaging from the caching tier to the capacity tier. 4KB is the block size for deduplication. For each unique 4k block compression would be performed and if the output block size is less than or equal to 2KB, a compressed block would be saved in place of the 4K block. If the output block size is greater than 2KB, the block would be written uncompressed and tracked as such. The reason is to avoid block alignment issues, as well as reduce the CPU hit for decompressing the data which is greater than compression for data with low compression ratios. All of this data reduction is after the write acknowledgement.
Deduplication domains are within each disk group. This avoids the need for a global lookup table (significant resource overhead) and lets those resources go toward tracking a smaller, more meaningful block size. By purposefully avoiding dedupe of write-hot data in the cache and compression of incompressible data, significant CPU/memory resources are saved.
RAID 5/6
Sometimes RAID 5 and RAID 6 over the network are also referred to as erasure coding. This is done inline; there is no post-processing required.
Since VMware has a design goal of not relying on data locality, this implementation of erasure coding does not bring any negative results by distributing the RAID-5/6 stripe across multiple hosts.
In this case RAID-5 requires a minimum of 4 hosts as it uses 3+1 logic. With 4 hosts, 1 can fail without data loss. This results in a significant reduction of required disk capacity: with RAID-1 a 20GB disk would require 40GB of disk capacity, but with RAID-5 over the network the requirement is only ~27GB. There is another option if higher availability is desired.
Use case Information:
Erasure coding offers guaranteed capacity reduction, unlike deduplication and compression. For customers who have no-thin-provisioning policies, data that is already compressed and deduplicated, or encrypted data, this offers known/fixed capacity gains.
Note: Feature is supported with stretch clusters, ROBO edition
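The capacity numbers quoted above (and the RAID-6 4+2 case from the space-efficiency slide) are easy to sanity-check. A minimal sketch of the arithmetic; the helper name is ours, not vSAN's:

```python
def raid_footprint_gb(logical_gb, data_blocks, parity_blocks):
    """Raw capacity consumed by a logical disk under an N+M layout.

    RAID-1 mirroring is modeled as 1 data 'block' plus 1 parity
    'block' per extra copy; erasure coding as N data + M parity.
    """
    return logical_gb * (data_blocks + parity_blocks) / data_blocks

# RAID-1 (FTT=1): two full copies of a 20GB disk, 100% overhead.
assert raid_footprint_gb(20, 1, 1) == 40.0
# RAID-5 (3+1): ~27GB for the same disk, i.e. only 33% overhead.
assert round(raid_footprint_gb(20, 3, 1), 1) == 26.7
# RAID-6 (4+2): 30GB, i.e. 50% overhead, tolerating two host failures.
assert raid_footprint_gb(20, 4, 2) == 30.0
```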
Stretched Cluster for 0-minute RPO and near-continuous protection
For non-Stretched Cluster environments, lowered the recovery point objective (RPO) with vSphere replication on vSAN to only 5 minutes.
In addition, key management features were added, including the integrated Health Service solution with new monitoring capabilities.
vSAN 6.2 represents another major advance.
Space-saving features like deduplication, compression and erasure coding were added.
In addition, new Quality of Service controls help guarantee SLAs.
The 3 C’s this brings…
1. Cost. Allows customers with old, slow switching (sometimes 100Mbps!) to use 10Gbps for vMotion and vSAN with a pair of low-cost Cat6 or TwinAx cables.
2. Complexity. No need for multicast configuration, or VLANs for vMotion and vSAN.
3. Compliance. Large enterprises have concerns with exposing storage networks even over encrypted VPN or private MPLS networks. Separating the witness traffic from the vSAN data traffic reduces the footprint for remotely accessing data. Note this function can be used even without crossover cables.
We made a decision to use some of the FreeBSD implementations publicly available under a BSD-style license, using direct integration with the vmkernel rather than a dedicated virtual appliance.
Support for SCSI version 3 and above
Support CHAP and mutual CHAP
Multi-pathing support (MPIO)
All hosts are participating in iSCSI.
vSAN will continue integrating with the latest enhancements in hardware for drives, memory, controllers and servers.
vSAN was the first HCI product to support NVMe, and we plan to take advantage of the latest hardware innovations in next-gen NVRAM, NVDIMM and faster networking.
512e drives will allow support for 1.8TB 10K RPM drives for hybrid vSAN.
The storage industry is hitting a capacity limit with the 512n sector size currently used in rotating storage media. To address this, the industry has proposed new Advanced Format drives that use a 4K native sector size. These AF drives allow disk drive vendors to build high-capacity drives and also provide better performance, more efficient space utilization and, above all, improved reliability and error correction capability.
To understand the difference between Photon and VIC: http://cormachogan.com/2016/06/28/compare-contrast-photon-controller-vs-vic-vsphere-integrated-containers/
KEY MESSAGES / TALK TRACK:
These new features help simplify your tasks today but we are also looking ahead to make sure we create a platform that simplifies future IT demands
To that end, we are excited to provide a glimpse into where we want to take vSAN
In the future we look forward to complementing the larger VMware Photon investment into Cloud Native Apps with vSAN for VMware Photon
We are designing what we think will be the best storage for DevOps
What It Is
Persistent storage for cloud native apps and next gen apps provided by vSAN with all existing features
Tightly integrated with cluster managers (Mesos, Kubernetes, Swarm, etc)
Managed solely via REST APIs and accessible to both IT admins and developers
SPBM policies exposed as disk flavors in Photon platform
Why It Matters
End-to-end VMware: Tightly integrated solution for cloud native applications
DevOps Focus: Promotes the developer to first class user in the datacenter and in control of storage provisioning activities (DevOps)
Agile Storage Operations: Enables agility and scalability of storage operations with an API-friendly management model
Stretched storage with vSAN will allow you to split the vSAN cluster across 2 sites, so that if a site fails, you would be able to seamlessly failover to the other site without any loss of data. vSAN in a stretched storage deployment accomplishes this by synchronously mirroring data across the 2 sites. The failover is initiated by a witness VM that resides in a central place, accessible by both sites.
Bandwidth to witness is 10Mbps, or 2MB per 1000 components (worst-case scenario: very little traffic is observed during steady state, but we need to calculate for owner migration or site failure)
Note that currently the VR limit is only 2000 protected/replicated VMs compared to vSAN's 6400 VM limit, so you do need to think about that!
ROBO – Remote Office/Branch Office
Many customers have been asking for 2-Node vSAN support. This will be a special use case for stretched clusters with 2-Node in the ROBO and a very tiny witness VM in the central data center.
Resource requirements for the witness (Tiny config):
Memory: 8 GB
CPU: 2vCPU
Storage: 8GB for boot disk, 15GB for capacity and 10GB for the caching tier (both cache and capacity tier are VMDKs created on HDD; no physical flash is needed)
Larger configs are needed to support more witness components/virtual machines.
Sometimes RAID 5 and RAID 6 over the network are also referred to as erasure coding. This is done inline; there is no post-processing required.
Since VMware has a design goal of not relying on data locality, this implementation of erasure coding does not bring any negative results by distributing the RAID-5/6 stripe across multiple hosts.
In this case RAID-5 requires a minimum of 4 hosts as it uses 3+1 logic. With 4 hosts, 1 can fail without data loss. This results in a significant reduction of required disk capacity: with RAID-1 a 20GB disk would require 40GB of disk capacity, but with RAID-5 over the network the requirement is only ~27GB. There is another option if higher availability is desired.
Use case Information:
Erasure coding offers guaranteed capacity reduction, unlike deduplication and compression. For customers who have no-thin-provisioning policies, data that is already compressed and deduplicated, or encrypted data, this offers known/fixed capacity gains.
This can be applied on a granular basis (Per VMDK) using the Storage Policy Based Management system.
30% Savings.
Note: All Flash vSAN only.
Note: Not supported with stretched clusters
Note: this does not require the cluster size to be a multiple of 4, just 4 or more.
With RAID-6 two host failures can be tolerated, similar to FTT=2 using RAID-1.
With RAID-1 (FTT=2) a 20GB disk would require 60GB of disk capacity, but with RAID-6 over the network this is just 30GB.
Note that the parity is distributed across all hosts and there is no dedicated parity host or anything like that.
Again, this is sometimes referred to by others as erasure coding. In this case a 4+2 configuration is used, which means 6 hosts is the minimum to be able to use this configuration.
This can be applied on a granular basis (Per VMDK) using the Storage Policy Based Management system.
50% savings
Note: All Flash vSAN only
Note: this does not require the cluster size be a multiple of six, just six or more.
Not supported with stretched clusters
Cluster wide setting (Default is on). Can be disabled on a per object basis using storage policies.
Software checksums enable customers to detect corruptions caused by hardware/software components, including memory, drives, etc., during read or write operations. For drives, there are two basic kinds of corruption. The first is latent sector errors, which are typically the result of a physical disk drive malfunction. The other is silent corruption, which can happen without warning (typically called silent data corruption). Undetected or completely silent errors could lead to lost or inaccurate data and significant downtime. There is no effective means of detection without end-to-end integrity checking.
During the read/write operations vSAN will check for the validity of the data based on checksum. If the data is not valid then it should take the necessary steps to either correct the data or report it to the user to take action. These actions could be:
Fetch the data from another copy of the data for RAID-1, RAID-5/6, etc.
This is what we call recoverable data.
If there is no valid copy of the data the error SHALL be returned
This is what we call Non-recoverable errors
Reporting:
In case of errors the issues will be reported in the UI and logs. This will include impacted blocks and their associated VMs.
A customer will be able to see the list of the VMs/Blocks that are hit by non-recoverable errors.
A customer will be able to see the historical/trending errors on each drive
CRC32 is the algorithm used (CPU offload support reduces overhead)
There will be two levels of scrubbing:
Component-level scrubbing: every block of each component is checked. On a checksum mismatch, the scrubber tries to repair the block by reading other components.
Object-level scrubbing: for every block of the object, the data of each mirror (or the parity blocks in RAID-5/6) is read and checked. Inconsistent data causes all data in that stripe to be marked bad.
Repair can happen during normal I/O at DOM Owner or by scrubber.
The repair path for mirror and RAID-5/6 are different. When checksum verification fails, the scrubber or DOM Owner will read the other copy of the data (or other data in the same stripe in case of RAID-5/6), rebuild the correct data and write it out to the bad location.
End-to-end checksum of the data to prevent data integrity issues that could be caused by silent disk errors (the checksum is calculated and stored on the write path)
Detect silent corruptions when reading the data through checksum data
When checksum verification fails, vSAN will read the other copy of the data (or other data in the same stripe in case of RAID-5/6), rebuild the correct data and write it out to the bad location
Checksums are based on a 4K block size
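The read path described above (verify the checksum, repair from another copy when possible, surface a non-recoverable error otherwise) can be sketched as follows. zlib's CRC32 stands in for the offloaded implementation, and the repair source is simplified to a single mirror; this is illustrative, not vSAN's DOM code:

```python
import zlib

def write_block(data):
    """Checksum is computed and stored on the write path (CRC32 here,
    matching the algorithm named above; hardware offload is ignored)."""
    return data, zlib.crc32(data)

def read_block(stored, checksum, mirror):
    """Verify on read; on mismatch, repair from the other copy.

    'mirror' stands in for the other replica (or a RAID-5/6 stripe
    rebuild). Writing the repaired block back to the bad location
    is omitted from this sketch.
    """
    if zlib.crc32(stored) == checksum:
        return stored                      # clean read
    if zlib.crc32(mirror) == checksum:     # recoverable error:
        return mirror                      # a good copy was found
    raise IOError("non-recoverable checksum error")  # no valid copy
```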
This enables per-VMDK IOPS limits. They can be deployed via SPBM, tying them to existing policy frameworks.
Service providers can use this to create differentiated service offerings using the same cluster/pool of storage.
Customers wanting to mix diverse workloads will be interested in keeping workloads from impacting each other.
IOPS limits are normalized to a 32KB block size: I/Os larger than 32KB are charged as multiple I/Os. A 500 IOPS limit therefore allows only 250 64KB I/Os per second, while 8/16/32KB I/Os each count as one, giving the full 500 IOPS.
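The normalization rule above can be sketched as a small helper; the function names are illustrative only:

```python
NORMALIZED_IO_BYTES = 32 * 1024  # I/O is charged in 32KB units

def charged_ios(io_size_bytes):
    """How many normalized I/Os a single I/O of this size costs
    (ceiling division, minimum of one)."""
    return max(1, -(-io_size_bytes // NORMALIZED_IO_BYTES))

def max_ios_per_second(iops_limit, io_size_bytes):
    """Largest number of same-sized I/Os per second under the limit."""
    return iops_limit // charged_ios(io_size_bytes)

assert max_ios_per_second(500, 8 * 1024) == 500    # small I/Os: full limit
assert max_ios_per_second(500, 32 * 1024) == 500
assert max_ios_per_second(500, 64 * 1024) == 250   # 64KB counts as two
```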
vSAN can support IPv4-only, IPv6-only, and dual-stack IPv4/IPv6 configurations.