Sheepdog Overview
Liu Yuan
2013.4.27
Sheepdog – Distributed Object Storage
● Replicated shared storage for VM
● Most intelligent storage in OSS
– Self-healing
– Self-managing
– No configuration file
– One-liner setup
● Scale-out (1000+ nodes)
● Integrates well with QEMU/Libvirt/OpenStack
Agenda
● Background Knowledge
● Node management
● Data management
● Thin-provisioning
● Sheepfs
● Features from the future
Background Knowledge
● VM Storage stack
● QEMU/KVM stack
● Virtual Disk
● IO Requests Type
● Write Cache
● QEMU Snapshot
VM Storage Stack
Guest File System
Guest Block Driver
QEMU Image Format
QEMU Disk Emulation
QEMU Format Protocol
POSIX file, Raw device, Sheepdog, Ceph
The Sheepdog block driver in QEMU is implemented at the protocol layer
● Supports all QEMU formats
● Raw format as default
● Best performance
● Snapshots are supported by the Sheepdog protocol
QEMU/KVM Stack
[Diagram: QEMU/KVM stack. Labels: VM (VCPU, VCPU), Kernel (KVM), PCPU, VM_ENTRY, VM_EXIT, IO Requests, eventfd, Virtual Disk, QEMU, Sheepdog, Network]
Virtual Disk
● Transports
– ATA, SCSI, Virtio
– Virtio – Designed for VM
● Simpler interface, better performance
● Virtio-scsi
– Enhancement of virtio-blk
– Advanced DISCARD operation support
● Write-cache
– Essential for distributed backend storage to boost performance
IO Requests Type of VD
● Read/Write
● Discard
– The VM's file system (ext4, XFS) transparently informs the underlying storage backend to release blocks
● FLUSH
– Ensures dirty data reaches the underlying backend storage
● Write Cache Enable (WCE)
– VM uses it to change the VD cache mode on the fly
Write Cache
● Not a memory cache like page cache
– Direct I/O (O_DIRECT) bypasses the page cache but not the write cache
– O_SYNC or fsync(2) flushes the write cache
● All modern disks have one, and it is well supported by the OS
● Most virtual devices emulate a write cache
– As safe as a well-behaved hard-disk cache
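To make the O_SYNC / fsync(2) point concrete, here is a minimal Python sketch of a write that is only reported as complete after a flush. The helper name `durable_write` and the file name are hypothetical; on POSIX systems `os.fsync` wraps fsync(2).

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data, then ask the OS to flush it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # Without this call the data may still sit in a volatile cache.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(durable_write(os.path.join(tmp, "block.img"), b"\0" * 4096))
```

Opening the file with `os.O_SYNC` instead would make every write behave this way, at the cost of one flush per write.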
QEMU Snapshot
● Two types of state
– Memory state (VM state) and disk state
● Users can optionally save
– VM state only
– VM state + disk state
– Disk state only
● Internal snapshot & external snapshot
– Sheepdog chooses external snapshots
Node management
● Node Add/Delete
● Dual NIC
Node Add/Delete
● One-liner to add or delete node
– Add node
● $ sheep /store # use corosync or
● $ sheep /store -c zookeeper:IP
– Delete node
● $ kill sheep
– Supports adding or killing nodes in groups
● Relies on Corosync or Zookeeper
– Membership change events
– Cluster-wide ordered message
Pic. from http://www.osrg.net/sheepdog/
Dual NIC
● One for control messages (heartbeat), the other for data transfer
– If the data NIC is down, data transfer falls back to the control NIC
– But if the control NIC is down, the node is considered dead
● Single NIC
– Control and data share it
Data Management
● Object Management
● VM Request Management
● Auto-weighting
● Multi-disk
● Object Cache
● Journaling
Object Management
● Data is stored as replicated objects
– Each object is a plain fixed-size POSIX file
● Objects are auto-rebalanced on node add/delete/crash events
● Replicas are auto-recovered
● Different copy number for each VDI
● Supports SAN-like, SAN-less, or even mixed architectures
Pic. from http://www.osrg.net/sheepdog/
VM Request Management
● Parallel request handling
– Every node can handle requests concurrently
● Requests are served even during node change events
– VM requests are prioritized over replica recovery requests
– VM requests are retried until they succeed during node change events
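The prioritization of VM requests over replica recovery can be pictured with a small priority queue. This is only a sketch of the idea, not Sheepdog's scheduler; `RequestQueue`, `VM_IO` and `RECOVERY` are hypothetical names.

```python
import heapq
import itertools

# Lower number = served first. These constants are illustrative only.
VM_IO, RECOVERY = 0, 1


class RequestQueue:
    """Serve VM I/O before recovery work, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, kind: int, name: str) -> None:
        heapq.heappush(self._heap, (kind, next(self._seq), name))

    def next_request(self) -> str:
        _kind, _seq, name = heapq.heappop(self._heap)
        return name

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = RequestQueue()
    q.submit(RECOVERY, "recover object A")
    q.submit(VM_IO, "vm write")
    q.submit(RECOVERY, "recover object B")
    q.submit(VM_IO, "vm read")
    while len(q):
        print(q.next_request())
```

A strict priority like this can starve recovery under constant VM load; whether and how the real daemon bounds that is not stated in the slides.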
Auto-weighting
● Node storage is auto-weighted
– Nodes of different sizes store only their proportional share
● Uses consistent hashing + virtual nodes
● Users can specify exported space
– Uses all the free space by default
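One way to picture consistent hashing with capacity-weighted virtual nodes is the sketch below. It assumes SHA-1 as the hash and 64 virtual nodes per capacity unit; both choices, and the names `Ring` and `locate`, are illustrative rather than taken from Sheepdog.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, capacities: Dict[str, int], vnodes_per_unit: int = 64):
        # A node with more exported space gets proportionally more virtual
        # nodes, and therefore a proportionally larger share of objects.
        self._points = sorted(
            (_hash("%s#%d" % (node, i)), node)
            for node, cap in capacities.items()
            for i in range(cap * vnodes_per_unit)
        )
        self._keys = [h for h, _ in self._points]

    def locate(self, oid: str, copies: int = 1) -> List[str]:
        """Walk clockwise from the object's hash, collecting distinct nodes."""
        start = bisect.bisect_right(self._keys, _hash(oid))
        chosen: List[str] = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == copies:
                    break
        return chosen


if __name__ == "__main__":
    ring = Ring({"small": 1, "big": 3})
    print(ring.locate("vdi-object-42", copies=2))
```

Adding or removing a node only moves the objects whose ring segment changes owner, which is what keeps rebalancing proportional to the size of the change.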
Multi-disk
● A single daemon manages multiple disks
– $ sheep /disk1,/disk2{,disk3...}
– Auto-weighting
– Auto-rebalance
– Recovers objects from other sheep
● Simply put, MD = raid0 + auto-recovery
● Eliminates the need for hardware RAID
– Supports hot-plug/unplug
Object cache
● Sheepdog's write cache for the virtual disk
– $ sheep -w size=100G /store
● $ qemu -drive cache={writeback|writethrough|off}
– Supports writeback, writethrough, directio
– LRU algorithm for reclaiming
– Shares objects between VMs created from the same base
● Use an SSD for the object cache to get a boost
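The writeback, writethrough and LRU behaviour listed above can be sketched with an in-memory model. A plain dict stands in for the cluster, and the class and method names are hypothetical; this is not Sheepdog's implementation.

```python
from collections import OrderedDict


class ObjectCache:
    """LRU object cache in front of a backend (a dict stands in for the cluster)."""

    def __init__(self, backend: dict, capacity: int, writeback: bool = True):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.backend = backend
        self.capacity = capacity
        self.writeback = writeback
        self._cache = OrderedDict()  # oid -> data, least recently used first
        self._dirty = set()          # oids not yet pushed to the backend

    def read(self, oid):
        if oid not in self._cache:
            self._insert(oid, self.backend[oid])  # pull on a miss
        self._cache.move_to_end(oid)
        return self._cache[oid]

    def write(self, oid, data):
        if self.writeback:
            self._dirty.add(oid)          # pushed later, on flush or eviction
        else:
            self.backend[oid] = data      # writethrough: push immediately
        self._insert(oid, data)

    def flush(self):
        """Push every dirty object, as a FLUSH from the VM would require."""
        for oid in sorted(self._dirty):
            self.backend[oid] = self._cache[oid]
        self._dirty.clear()

    def _insert(self, oid, data):
        self._cache[oid] = data
        self._cache.move_to_end(oid)
        while len(self._cache) > self.capacity:
            victim, victim_data = self._cache.popitem(last=False)
            if victim in self._dirty:     # never drop unpushed data
                self.backend[victim] = victim_data
                self._dirty.discard(victim)
```

In writeback mode a write is acknowledged before it reaches the backend, so durability depends on the guest issuing FLUSH, which is the same contract a disk write cache offers.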
Object cache
[Diagram: the VM's virtual disk issues R&W and FLUSH requests to the object cache, which does PUSH & PULL against the Sheepdog cluster]
Journaling
● $ sheep -j dir=/path/to/journal /store
● Sheepdog uses O_SYNC writes by default
● Object writes are fairly random
● Logs all write operations as appends to a rotated log file
– Transforms random writes into sequential writes
– Object writes can then drop O_SYNC
● Boosts performance + avoids partial writes
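A minimal sketch of the journaling idea: each write is first appended and synced to a log, and the log can be replayed onto the object files after a crash. The record layout (object id, offset, length), the file naming, and the function names are all assumptions made for illustration; log rotation is left out.

```python
import os
import struct
import tempfile

# Hypothetical record header: object id, offset within the object, payload length.
HEADER = struct.Struct("<QQI")


def journal_append(journal_fd: int, oid: int, offset: int, data: bytes) -> None:
    """Append one write record and sync it; appends keep the I/O sequential."""
    os.write(journal_fd, HEADER.pack(oid, offset, len(data)) + data)
    os.fsync(journal_fd)


def replay(journal_path: str, store_dir: str) -> int:
    """Re-apply complete records to object files; stop at a torn final record."""
    applied = 0
    with open(journal_path, "rb") as journal:
        while True:
            head = journal.read(HEADER.size)
            if len(head) < HEADER.size:
                break
            oid, offset, length = HEADER.unpack(head)
            data = journal.read(length)
            if len(data) < length:
                break  # partial record: the write was never acknowledged
            path = os.path.join(store_dir, "%016x" % oid)
            mode = "r+b" if os.path.exists(path) else "w+b"
            with open(path, mode) as obj:
                obj.seek(offset)
                obj.write(data)
            applied += 1
    return applied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        jpath = os.path.join(tmp, "journal.log")
        fd = os.open(jpath, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        journal_append(fd, 1, 0, b"hello")
        journal_append(fd, 1, 5, b"world")
        os.close(fd)
        print(replay(jpath, tmp))
```

Because an incomplete trailing record is skipped on replay, an object is either updated with a whole write or not at all, which is the "avoid partial write" property.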
Thin-provisioning
● Sparse Volume
● Discard Operation
● COW Snapshot
Sparse Volume
● Only one inode object is allocated for a new VDI by default
– Instant creation of a new VDI
● Data objects are created on demand
● Users can preallocate data objects
– Not recommended, performance gain is very limited
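On-demand allocation can be modelled as a map from object index to data that starts empty and grows only when a region is first written. The 4 MiB default object size and the `SparseVolume` class are assumptions for this sketch; the slides say only that objects are fixed-size.

```python
class SparseVolume:
    """A volume whose fixed-size data objects are allocated on first write."""

    def __init__(self, size: int, object_size: int = 4 * 1024 * 1024):
        self.size = size
        self.object_size = object_size
        # Plays the role of the inode's object map: index -> object data.
        self.objects = {}

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside the volume")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.object_size)
            chunk = data[pos:pos + self.object_size - start]
            # Allocate the object only now, the first time it is touched.
            obj = self.objects.setdefault(index, bytearray(self.object_size))
            obj[start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        while len(out) < length:
            index, start = divmod(offset + len(out), self.object_size)
            n = min(length - len(out), self.object_size - start)
            obj = self.objects.get(index)
            # Unallocated regions read back as zeros.
            out += obj[start:start + n] if obj is not None else bytes(n)
        return bytes(out)
```

Creating the volume costs nothing beyond the empty map, which mirrors why a new VDI can be created instantly.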
Discard operation
● Releases objects when users delete files inside the VM
● Only IDE and virtio-scsi devices are supported
– CentOS 6.3+
– OS running vanilla kernel 3.4+
– Requires QEMU 1.5+
Snapshot
● Live snapshot (VM state + vdisk)
– Save the snapshot in Sheepdog
● QEMU monitor > savevm tag
– Restore the snapshot on the fly
● QEMU monitor > loadvm tag
– Restore the snapshot at boot
● $ qemu -hda sheepdog -loadvm tag
● Live or off-line snapshot (vdisk only)
– $ qemu-img snapshot sheepdog:disk
Snapshot cont.
● Tree structure snapshots
[Diagram: snapshot tree rooted at "base"]
● Rollback to any snapshot and make your branch
Snapshot cont.
● All snapshots are COW based
– Only an inode object is created for the snapshot
– Instantly taken
● Supports incremental snapshot backup
● Read a snapshot out of the cluster
– $ collie vdi read -s tag disk
● Snapshots are stored in Sheepdog storage, so they are shared by all nodes
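The copy-on-write behaviour and the snapshot tree can be sketched as follows. The model assumes whole-object writes and uses hypothetical `Vdi` and `Cluster` classes: taking a snapshot copies only the object map (the "inode"), and data objects are duplicated lazily on the first write.

```python
class Vdi:
    def __init__(self, object_map=None, parent=None):
        self.object_map = dict(object_map or {})  # index -> object id
        self.parent = parent
        self.children = []
        self.frozen = False  # snapshots are read-only


class Cluster:
    def __init__(self):
        self.store = {}      # object id -> data
        self._next_oid = 0

    def _new_object(self, data: bytes) -> int:
        oid = self._next_oid
        self._next_oid += 1
        self.store[oid] = data
        return oid

    def snapshot(self, vdi: Vdi) -> Vdi:
        """Freeze vdi and return a writable child sharing all its data objects."""
        vdi.frozen = True
        child = Vdi(vdi.object_map, parent=vdi)
        vdi.children.append(child)
        return child

    def write(self, vdi: Vdi, index: int, data: bytes) -> None:
        if vdi.frozen:
            raise PermissionError("snapshots are read-only")
        oid = vdi.object_map.get(index)
        shared = vdi.parent is not None and vdi.parent.object_map.get(index) == oid
        if oid is None or shared:
            # Copy-on-write: leave the parent's object untouched.
            vdi.object_map[index] = self._new_object(data)
        else:
            self.store[oid] = data

    def read(self, vdi: Vdi, index: int):
        oid = vdi.object_map.get(index)
        return self.store[oid] if oid is not None else None
```

Calling `snapshot` on the same frozen VDI a second time yields a sibling branch, which is how rolling back and branching produce a tree.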
Sheepfs
● FUSE-based pseudo file system to export
Sheepdog's virtual disks
– $ sheepfs /mountpoint
● Mounts a vdisk into the local file system hierarchy as a block file
– $ echo vdisk > /mountpoint/vdi/mount
– Then /mountpoint/volume/vdisk will show up
Features from the future
● Cluster-wide snapshot
– Useful for backup and inter-cluster VDI migration/sharing
– Dedup, compression, incremental snapshot
● QEMU-SD connection auto-restart
– Useful for upgrading sheep without stopping the VM
● QEMU-SD multi-connection
– Higher-availability VMs
Thank You

Overview of sheepdog

  • 2. Sheepdog – Distributed Object Storage ● Replicated shared storage for VMs ● Most intelligent storage in OSS – Self-healing – Self-managing – No configuration file – One-liner setup ● Scale-out (1000+ nodes) ● Integrates well with QEMU/Libvirt/OpenStack
  • 3. Agenda ● Background Knowledge ● Node management ● Data management ● Thin-provisioning ● Sheepfs ● Features from the future
  • 4. Background Knowledge ● VM Storage stack ● QEMU/KVM stack ● Virtual Disk ● IO Request Types ● Write Cache ● QEMU Snapshot
  • 5. VM Storage Stack ● Layers as drawn on the slide: Guest File System, Guest Block Driver, QEMU Image Format, QEMU Disk Emulation, QEMU Format Protocol (POSIX file, raw device, Sheepdog, Ceph) ● The Sheepdog block driver in QEMU is implemented at the protocol layer – Supports all QEMU image formats – Raw format is the default – Best performance – Snapshots are supported by the Sheepdog protocol
  • 6. QEMU/KVM Stack [Diagram labels: VM, VCPU, PCPU, Kernel, KVM, VM_ENTRY, VM_EXIT, IO Requests, eventfd, Virtual Disk, QEMU, Sheepdog, Network]
  • 7. Virtual Disk ● Transports – ATA, SCSI, Virtio – Virtio is designed for VMs ● Simpler interface, better performance ● Virtio-scsi – Enhancement of virtio-blk – Supports the advanced DISCARD operation ● Write-cache – Essential for boosting the performance of distributed backend storage
  • 8. IO Request Types of a VD ● Read/Write ● Discard – The VM's FS (EXT4, XFS) transparently informs the underlying storage backend to release blocks ● FLUSH – Ensures dirty data reaches the underlying backend storage ● Write Cache Enable (WCE) – The VM uses it to change the VD cache mode on the fly
  • 9. Write Cache ● Not a memory cache like the page cache – Direct IO (O_DIRECT) bypasses the page cache but not the write cache – O_SYNC or fsync(2) flushes the write cache ● All modern disks have one, and it is well supported by the OS ● Most virtual devices emulate a write cache – As safe as a well-behaved hard-disk cache
  • 10. QEMU Snapshot ● Two types of state – Memory state (VM state) and disk state ● Users can optionally save – VM state only – VM state + disk state – Disk state only ● Internal snapshot & external snapshot – Sheepdog chooses external snapshot
  • 11. Node management ● Node Add/Delete ● Dual NIC
  • 12. Node Add/Delete ● One-liner to add or delete a node – Add a node ● $ sheep /store # use corosync or ● $ sheep /store -c zookeeper:IP – Delete a node ● $ kill sheep – Group add/kill is supported ● Relies on Corosync or ZooKeeper for – Membership change events – Cluster-wide ordered messages
  • 14. Dual NIC ● One NIC for control messages (heartbeat), the other for data transfer – If the data NIC is down, data transfer falls back to the control NIC – But if the control NIC is down, the node is considered dead ● Single NIC – Control and data traffic share it
  • 15. Data Management ● Object Management ● VM Request Management ● Auto-weighting ● Multi-disk ● Object Cache ● Journaling
  • 16. Object Management ● Data are stored as replicated objects – Each object is a plain fixed-size POSIX file ● Objects are auto-rebalanced at node add/delete/crash events ● Replicas are auto-recovered ● Different copy number for each VDI ● Supports SAN-like, SAN-less, or even mixed architectures
  • 18. VM Request Management ● Parallel request handling – Every node can handle requests concurrently ● Requests are served even during node change events – VM requests are prioritized against replica recovery requests – VM requests are retried until they succeed during node change events
  • 19. Auto-weighting ● Node storage is auto-weighted – Nodes of different sizes store only their proportional share ● Uses consistent hashing + virtual nodes ● Users can specify the exported space – All free space is used by default
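The auto-weighting idea on slide 19 (consistent hashing plus virtual nodes, with each node holding a share proportional to its size) can be illustrated with a minimal sketch. This is not Sheepdog's implementation: the class name `WeightedRing`, the SHA-1 based hash, the `capacity_units` parameter, and the choice of 64 virtual nodes per unit are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit ring position (the hash choice is illustrative)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class WeightedRing:
    """Consistent-hash ring where a node's virtual-node count tracks its capacity."""

    def __init__(self, vnodes_per_unit: int = 64):
        self.vnodes_per_unit = vnodes_per_unit
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name

    def add_node(self, name: str, capacity_units: int) -> None:
        # A node with more capacity gets more points, hence more objects.
        for i in range(capacity_units * self.vnodes_per_unit):
            point = _hash(f"{name}#{i}")
            if point in self._owner:   # extremely unlikely collision; skip it
                continue
            bisect.insort(self._points, point)
            self._owner[point] = name

    def remove_node(self, name: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: n for p, n in self._owner.items() if n != name}

    def locate(self, object_id: str, copies: int = 1) -> list:
        """Walk clockwise from the object's position, collecting distinct nodes."""
        if not self._points:
            return []
        start = bisect.bisect_right(self._points, _hash(object_id))
        found = []
        for step in range(len(self._points)):
            point = self._points[(start + step) % len(self._points)]
            node = self._owner[point]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


if __name__ == "__main__":
    ring = WeightedRing()
    ring.add_node("node-a", capacity_units=1)
    ring.add_node("node-b", capacity_units=3)
    counts = {"node-a": 0, "node-b": 0}
    for i in range(10000):
        counts[ring.locate(f"obj-{i}")[0]] += 1
    print(counts)  # node-b is expected to hold the larger share
```

Because only the ring points of the added or removed node change, an object either keeps its owner or moves to the new node, which is what keeps rebalancing traffic limited when the membership changes.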
  • 20. Multi-disk ● A single daemon manages multiple disks – $ sheep /disk1,/disk2{,disk3...} – Auto-weighting – Auto-rebalance – Recovers objects from other sheep ● Simply put, MD = raid0 + auto-recovery ● Eliminates the need for hardware RAID – Supports hot-plug/unplug
  • 21. Object cache ● Sheepdog's write cache for the Virtual Disk – $ sheep -w size=100G /store ● $ qemu -drive cache={writeback|writethrough|off} – Supports writeback, writethrough, directio – LRU algorithm for reclaiming – Objects are shared between VMs created from the same base ● Use an SSD for the object cache to get a boost
  • 22. Object cache [Diagram labels: VM, Virtual Disk, R&W, FLUSH, Object Cache, PUSH & PULL, Sheepdog Cluster]
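To make the object cache slides concrete, here is a toy writeback cache with LRU reclaim. It is a sketch under stated assumptions, not Sheepdog's code: the `ObjectCache` class, the plain `dict` standing in for the cluster, and counting capacity in objects rather than bytes are all hypothetical simplifications.

```python
from collections import OrderedDict


class ObjectCache:
    """Toy writeback cache: capacity counted in objects, LRU reclaim."""

    def __init__(self, capacity: int, backend: dict):
        self.capacity = capacity
        self.backend = backend          # stands in for the Sheepdog cluster
        self._entries = OrderedDict()   # object id -> (data, dirty flag)

    def read(self, oid):
        if oid in self._entries:
            self._entries.move_to_end(oid)   # mark as most recently used
            return self._entries[oid][0]
        data = self.backend[oid]             # PULL from the cluster on a miss
        self._insert(oid, data, dirty=False)
        return data

    def write(self, oid, data):
        # Writeback mode: the write completes once it is in the cache.
        self._insert(oid, data, dirty=True)

    def flush(self):
        """PUSH every dirty object to the backend, as a guest FLUSH would require."""
        for oid, (data, dirty) in list(self._entries.items()):
            if dirty:
                self.backend[oid] = data
                self._entries[oid] = (data, False)

    def _insert(self, oid, data, dirty):
        self._entries[oid] = (data, dirty)
        self._entries.move_to_end(oid)
        while len(self._entries) > self.capacity:
            old_oid, (old_data, old_dirty) = self._entries.popitem(last=False)
            if old_dirty:                    # never drop unflushed data
                self.backend[old_oid] = old_data
```

A writethrough variant would push to the backend inside `write` itself; the sketch only models the writeback case.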
  • 23. Journaling ● $ sheep -j dir=/path/to/journal /store ● Sheepdog uses O_SYNC writes by default ● Object writes are fairly random ● All write operations are logged as append writes to a rotated log file – Transforms random writes into sequential writes – Object writes can then drop O_SYNC ● Boosts performance + avoids partial writes
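The journaling scheme on slide 23 can be sketched as follows: each write is first appended to a log and synced, after which the object file itself can be written without O_SYNC; after a crash, complete log records are replayed. The record layout (`HEADER`), the object file naming, and the function names are assumptions for illustration and do not reflect Sheepdog's on-disk format. Log rotation is left out, and `os.pwrite` makes the sketch Unix-only.

```python
import os
import struct

# Illustrative record header: object id, offset, payload length.
HEADER = struct.Struct("<QQI")


def journal_append(journal_fd: int, oid: int, offset: int, data: bytes) -> None:
    """Append one write record and make it durable before the object is touched."""
    os.write(journal_fd, HEADER.pack(oid, offset, len(data)) + data)
    os.fsync(journal_fd)   # sequential, append-only sync


def apply_write(store_dir: str, oid: int, offset: int, data: bytes) -> None:
    """Write into the object file without forcing it to disk."""
    path = os.path.join(store_dir, f"{oid:016x}")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def replay(journal_path: str, store_dir: str) -> int:
    """Re-apply complete records after a crash; a truncated tail record is ignored."""
    applied = 0
    with open(journal_path, "rb") as log:
        while True:
            header = log.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            oid, offset, length = HEADER.unpack(header)
            data = log.read(length)
            if len(data) < length:   # partial record from an interrupted append
                break
            apply_write(store_dir, oid, offset, data)
            applied += 1
    return applied
```

In normal operation a caller would invoke `journal_append` followed by `apply_write` for every request; `replay` is only needed on restart.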
  • 24. Thin-provisioning ● Sparse Volume ● Discard Operation ● COW Snapshot
  • 25. Sparse Volume ● By default, only one inode object is allocated for a new VDI – Instant creation of a new VDI ● Data objects are created on demand ● Users can preallocate data objects – Not recommended; the performance gain is very limited
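A small model of the sparse volume behaviour on slide 25: the volume starts with an empty table (playing the role of the inode object), data objects appear only when first written, and unwritten ranges read back as zeros. The `SparseVolume` class and the 4 MiB default object size are assumptions for this sketch; the slides do not state the object size.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size, not taken from the slides


class SparseVolume:
    """Volume whose data objects are allocated on first write."""

    def __init__(self, size: int, object_size: int = OBJECT_SIZE):
        self.size = size
        self.object_size = object_size
        self.inode = {}   # object index -> bytearray; empty at creation

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside the volume")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.object_size)
            chunk = data[pos:pos + self.object_size - start]
            if index not in self.inode:          # allocate on demand
                self.inode[index] = bytearray(self.object_size)
            self.inode[index][start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or offset + length > self.size:
            raise ValueError("read outside the volume")
        out = bytearray()
        while len(out) < length:
            index, start = divmod(offset + len(out), self.object_size)
            take = min(self.object_size - start, length - len(out))
            obj = self.inode.get(index)
            # Unallocated objects read back as zeros.
            out += obj[start:start + take] if obj is not None else bytes(take)
        return bytes(out)

    def allocated_bytes(self) -> int:
        return len(self.inode) * self.object_size
```

Creating a volume of any size costs nothing beyond the empty table, which mirrors the "instant creation" point on the slide.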
  • 26. Discard operation ● Releases objects when users delete files inside the VM ● Only supported on IDE and virtio-scsi devices – CentOS 6.3+ – OS running vanilla kernel 3.4+ – Requires QEMU 1.5+
  • 27. Snapshot ● Live snapshot (VM state + vdisk) – Save the snapshot in Sheepdog ● QEMU monitor > savevm tag – Restore the snapshot on the fly ● QEMU monitor > loadvm tag – Restore the snapshot at boot ● $ qemu -hda sheepdog -loadvm tag ● Live or off-line snapshot (vdisk only) – $ qemu-img snapshot sheepdog:disk
  • 28. Snapshot cont. ● Tree-structured snapshots on top of a base ● Roll back to any snapshot and create your own branch
  • 29. Snapshot cont. ● All snapshots are COW based – Only an inode object is created for the snapshot – Taken instantly ● Supports incremental snapshot backup ● Read the snapshot out of the cluster – $ collie vdi read -s tag disk ● Snapshots are stored in Sheepdog storage, so they are shared by all nodes
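The COW snapshot behaviour described on slides 27 to 29 (a snapshot copies only the inode, data objects are copied on the first write afterwards, and any snapshot can be branched) can be modelled in a few lines. The `Cluster` and `Volume` classes are hypothetical, and writes replace whole objects for brevity; this is a sketch of the technique, not of Sheepdog's data structures.

```python
import itertools


class Cluster:
    """Stands in for the shared object store."""

    def __init__(self):
        self.objects = {}                  # object id -> bytes
        self._next_id = itertools.count(1)

    def put(self, data: bytes) -> int:
        oid = next(self._next_id)
        self.objects[oid] = data
        return oid


class Volume:
    """A volume is an inode: a table from object index to object id."""

    def __init__(self, cluster: Cluster, table=None):
        self.cluster = cluster
        self.table = dict(table or {})
        self.owned = set()   # object ids created since the last snapshot

    def write(self, index: int, data: bytes) -> None:
        oid = self.table.get(index)
        if oid in self.owned:
            self.cluster.objects[oid] = data   # private object: update in place
        else:
            new_oid = self.cluster.put(data)   # shared or missing: copy on write
            self.table[index] = new_oid
            self.owned.add(new_oid)

    def read(self, index: int):
        oid = self.table.get(index)
        return None if oid is None else self.cluster.objects[oid]

    def snapshot(self) -> "Volume":
        """Freeze the current table; only the table is copied, never the data."""
        frozen = Volume(self.cluster, self.table)
        self.owned.clear()   # every object is now shared with the snapshot
        return frozen


if __name__ == "__main__":
    cluster = Cluster()
    disk = Volume(cluster)
    disk.write(0, b"v1")
    snap = disk.snapshot()
    disk.write(0, b"v2")
    branch = Volume(cluster, snap.table)   # roll back to the snapshot and branch
    print(snap.read(0), disk.read(0), branch.read(0))  # b'v1' b'v2' b'v1'
```

Taking a snapshot touches no data objects at all, which is why the slide can describe it as instant.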
  • 30. Sheepfs ● FUSE-based pseudo file system that exports Sheepdog's virtual disks – $ sheepfs /mountpoint ● Mount a vdisk into the local file system hierarchy as a block file – $ echo vdisk > /mountpoint/vdi/mount – Then /mountpoint/volume/vdisk will show up
  • 31. Features from the future ● Cluster-wide snapshot – Useful for backup and inter-cluster VDI migration/sharing – Dedup, compression, incremental snapshot ● QEMU-SD connection auto-restart – Useful for upgrading sheep without stopping the VM ● QEMU-SD multi-connection – Higher-availability VM