Petabyte Scale Out Rocks
CEPH as replacement for Openstack's Swift & Co.
Udo Seidel
22JAN2014
Agenda
● Background
● Architecture of CEPH
● CEPH Storage
● Openstack
● Dream team: CEPH and Openstack
● Summary
Background
CEPH – what?
● So-called parallel distributed cluster file system
● Started as part of PhD studies at UCSC
● Public announcement in 2006 at the 7th OSDI
● File system shipped with Linux kernel since 2.6.34
● Name derived from pet octopus - Cephalopods
Shared file systems – short intro
● Multiple servers access the same data
● Different approaches
● Network based, e.g. NFS, CIFS
● Clustered
– Shared disk, e.g. CXFS, CFS, GFS(2), OCFS2
– Distributed parallel, e.g. Lustre, GlusterFS and CEPHFS
Architecture of CEPH
CEPH and storage
● Distributed file system => distributed storage
● Does not use traditional disks or RAID arrays
● Does use so-called OSDs
– Object based Storage Devices
– Intelligent disks
Object Based Storage I
● Objects of quite general nature
● Files
● Partitions
● ID for each storage object
● Separation of metadata operations and storing file data
● HA not covered at all
● Object based Storage Devices
Object Based Storage II
● OSD software implementation
● Usually an additional layer between computer and storage
● Presents object-based file system to the computer
● Use a “normal” file system to store data on the storage
● Delivered as part of CEPH
● File systems: Lustre, EXOFS
CEPH – the full architecture I
● 4 components
● Object based Storage Devices
– Any computer
– Form a cluster (redundancy and load balancing)
● Meta Data Servers
– Any computer
– Form a cluster (redundancy and load balancing)
● Cluster Monitors
– Any computer
● Clients ;-)
CEPH – the full architecture II
CEPH and Storage
CEPH and OSD
● User-land implementation
● Any computer can act as OSD
● Uses BTRFS as native file system
● Since 2009
● Before that: self-developed EBOFS
● Provides functions of OSD-2 standard
– Copy-on-write
– snapshots
● No redundancy on disk or even computer level
CEPH and OSD – file systems
● BTRFS preferred
● Non-default configuration for mkfs
● XFS and EXT4 possible
● XATTR (size) is key -> EXT4 less recommended
● XFS recommended for production (sample ceph.conf sketch below)
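A minimal ceph.conf sketch for the file-system advice above, assuming XFS-backed OSDs; the option names follow the CEPH documentation of that era, the values are illustrative only:

  [osd]
      osd mkfs type = xfs
      osd mkfs options xfs = -f -i size=2048
      osd mount options xfs = rw,noatime,inode64

The larger XFS inode size leaves room for the extended attributes (XATTRs) mentioned above.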
OSD failure approach
● Any OSD expected to fail
● New OSD dynamically added/integrated
● Data distributed and replicated
● Redistribution of data after change in OSD landscape
Data distribution
● File striped
● File pieces mapped to object IDs
● Assignment of so-called placement group to object ID
● Via hash function
● Placement group (PG): logical container of storage objects
● Calculation of list of OSDs out of PG
● CRUSH algorithm (toy sketch below)
CRUSH I
● Controlled Replication Under Scalable Hashing
● Considers several pieces of information
● Cluster setup/design
● Actual cluster landscape/map
● Placement rules
● Pseudo-random -> quasi-statistical distribution
● Cannot cope with hot spots
● Clients, MDS and OSD can calculate object location (illustrative sketch below)
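Not actual CRUSH code, but a small weighted rendezvous-hashing sketch in Python that shows the same property: placement is a pure function of PG, cluster map and weights, so clients, MDS and OSDs can all compute the location independently, without a central lookup server:

  import hashlib, math

  # illustrative cluster map: OSD name -> weight
  OSDS = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}

  def score(pg, osd, weight):
      h = int(hashlib.md5(("%s:%s" % (pg, osd)).encode()).hexdigest(), 16)
      u = (h + 1) / (2.0**128 + 1)   # uniform value in (0, 1)
      return -weight / math.log(u)   # weighted rendezvous hashing

  def osds_for_pg(pg, replicas=2):
      ranked = sorted(OSDS, key=lambda o: score(pg, o, OSDS[o]), reverse=True)
      return ranked[:replicas]

  print(osds_for_pg(17))  # same input -> same OSD list on every node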
CRUSH II
Data replication
● N-way replication
● N OSDs per placement group
● OSDs in different failure domains
● First non-failed OSD in PG -> primary
● Read and write to primary only
● Writes forwarded by primary to replica OSDs
● Final write commit after all writes on replica OSDs
● Replication traffic within OSD network (toy sketch below)
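A toy sketch (not CEPH code) of that write path: the client talks only to the primary, the primary forwards to the replicas, and the write is committed only after all replicas have acknowledged:

  class OSD:
      def __init__(self, name):
          self.name, self.store = name, {}

      def write(self, oid, data):
          self.store[oid] = data
          return True  # ack

  def replicated_write(pg_osds, oid, data):
      primary, replicas = pg_osds[0], pg_osds[1:]    # first non-failed OSD = primary
      primary.write(oid, data)                       # write hits the primary first
      acks = [r.write(oid, data) for r in replicas]  # primary forwards to replicas
      return all(acks)                               # final commit after all acks

  pg = [OSD("osd.2"), OSD("osd.0"), OSD("osd.3")]
  print(replicated_write(pg, "myfile.00000000", b"chunk"))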
CEPH cluster monitors
● Status information of CEPH components critical
● First contact point for new clients
● Monitors track changes of cluster landscape
● Update cluster map
● Propagate information to OSDs
CEPH cluster map I
● Objects: computers and containers
● Container: bucket for computers or containers
● Each object has ID and weight
● Maps physical conditions
● rack location
● fire cells
CEPH cluster map II
● Reflects data rules
● Number of copies
● Placement of copies
● Updated version sent to OSDs
● OSDs distribute cluster map within OSD cluster
● OSD re-calculates PG membership via CRUSH
– data responsibilities
– Order: primary or replica
● New I/O accepted after information sync
CEPH - RADOS
● Reliable Autonomic Distributed Object Storage
● Direct access to OSD cluster via librados
● Support: C, C++, Java, Python, Ruby, PHP
● Drop/skip of POSIX layer (CEPHFS) on top
● Visible to all CEPH cluster members => shared storage (Python example below)
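A short example with the Python binding named above; it assumes a reachable cluster, a readable /etc/ceph/ceph.conf plus keyring, and an existing pool called 'data':

  import rados

  cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
  cluster.connect()

  ioctx = cluster.open_ioctx("data")  # pool name is an assumption
  ioctx.write_full("hello", b"stored via librados - no POSIX layer")
  print(ioctx.read("hello"))

  ioctx.close()
  cluster.shutdown()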
CEPH Block Device
● Aka RADOS block device (RBD)
● Upstream since kernel 2.6.37
● RADOS storage exposed as block device
● /dev/rbd
● qemu/KVM storage driver via librados/librbd
● Alternative to
● shared SAN/iSCSI for HA environments
● Storage HA solutions for qemu/KVM (Python example below)
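A sketch of creating an RBD image through the Python bindings (librbd on top of librados); the pool name and image size are assumptions:

  import rados
  import rbd

  cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
  cluster.connect()
  ioctx = cluster.open_ioctx("rbd")  # default RBD pool

  rbd.RBD().create(ioctx, "vm-disk-1", 10 * 1024**3)  # 10 GiB image

  image = rbd.Image(ioctx, "vm-disk-1")
  print(image.size())
  image.close()

  ioctx.close()
  cluster.shutdown()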
The RADOS picture
CEPH Object Gateway
● Aka RADOS Gateway (RGW)
● RESTful API
● Amazon S3 -> s3 tools work
● Swift APIs
● Proxy HTTP to RADOS
● Tested with Apache and lighttpd (S3 example below)
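Because the gateway speaks the Amazon S3 dialect, stock S3 tooling works; a sketch with the boto library of that era, where host, keys and bucket name are placeholders:

  import boto
  import boto.s3.connection

  conn = boto.connect_s3(
      aws_access_key_id="ACCESS_KEY",      # placeholder
      aws_secret_access_key="SECRET_KEY",  # placeholder
      host="rgw.example.com",              # the RGW web server
      is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )

  bucket = conn.create_bucket("demo-bucket")
  key = bucket.new_key("hello.txt")
  key.set_contents_from_string("served by RADOS, spoken as S3")
  print([k.name for k in bucket.list()])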
CEPH Take Aways
● Scalable
● Flexible configuration
● No SPOF, but built on commodity hardware
● Accessible via different interfaces
Openstack
What?
● Infrastructure as a Service (IaaS)
● 'Open source' version of AWS
● New versions every 6 months
● Current called Havana
● Previous called Grizzly
● Managed by Openstack Foundation
Openstack architecture
Openstack Components
● Keystone – identity
● Glance – image
● Nova – compute
● Cinder – block
● Swift – object
● Neutron – network
● Horizon – dashboard
About Glance
● There since almost the beginning
● Image store
● Server
● Disk
● Several formats
● raw
● qcow2
● ...
● Shared/non-shared
About Cinder
● Kind of new
● Part of Nova before
● Separate since Folsom
● Block storage
● For compute instances
● Managed by Openstack users
● Several storage back-ends possible
About Swift
● Since the beginning
● Replacement for Amazon S3 -> cloud storage
● Scalable
● Redundant
● Object store
Openstack Object Store
● Proxy
● Object
● Container
● Account
● Auth
Dream Team: CEPH and Openstack
Why CEPH in the first place?
● Full-blown storage solution
● Support
● Operational model
● Cloud'ish
● Separation of duties
● One solution for different storage needs
Integration
● Full integration with RADOS
● Since Folsom
● Two parts
● Authentication
● Technical access
● Both parties must be aware
● Independent for each of the storage components
● Different RADOS pools
Authentication
● CEPH part
● Key rings
● Configuration
● For Cinder and Glance
● Openstack part
● Keystone
– Only for Swift
– Needs RGW (example commands below)
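On the CEPH side this typically means one key per service; a sketch along the lines of the upstream CEPH/Openstack guide, where the pool names 'images' and 'volumes' and the PG counts are assumptions:

  ceph osd pool create images 128
  ceph osd pool create volumes 128
  ceph auth get-or-create client.glance mon 'allow r' \
      osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
  ceph auth get-or-create client.cinder mon 'allow r' \
      osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'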
Access to RADOS/RBD I
● Via API/libraries => no CEPHFS needed
● Easy for Glance/Cinder
● CEPH keyring configuration
● Update of ceph.conf
● Update of Glance/Cinder API configuration (sketch below)
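A sketch of those touch points, based on the Folsom/Havana-era integration guides; exact option names varied by release, so treat these as illustrative:

  # /etc/ceph/ceph.conf - make the service keyrings known
  [client.glance]
      keyring = /etc/ceph/ceph.client.glance.keyring
  [client.cinder]
      keyring = /etc/ceph/ceph.client.cinder.keyring

  # /etc/glance/glance-api.conf
  default_store = rbd
  rbd_store_user = glance
  rbd_store_pool = images

  # /etc/cinder/cinder.conf
  volume_driver = cinder.volume.drivers.rbd.RBDDriver
  rbd_pool = volumes
  rbd_user = cinder
  rbd_secret_uuid = <libvirt secret uuid>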
Access to RADOS/RBD II
● Swift => more work needed
● CEPHFS => again: not needed
● CEPH Object Gateway
– Web server
– RGW software
– Keystone certificates
– No support for v2 Swift authentication
● Keystone authentication
– Endpoint configuration -> RGW (example below)
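The Keystone side then points the object-store endpoint at the gateway instead of Swift; a sketch with the keystone CLI of that era, with IDs and URLs as placeholders:

  keystone service-create --name swift --type object-store \
      --description "Object store via CEPH Object Gateway"
  keystone endpoint-create --service-id <service id> \
      --publicurl http://rgw.example.com/swift/v1 \
      --internalurl http://rgw.example.com/swift/v1 \
      --adminurl http://rgw.example.com/swift/v1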
Integration – the full picture
[Diagram: Keystone and Swift clients talk to the CEPH Object Gateway; Glance, Cinder and Nova (via qemu/kvm) talk to the CEPH Block Device; both sit on top of RADOS]
Integration pitfalls
● CEPH versions not in sync
● Authentication
● CEPH Object Gateway setup
Why CEPH - reviewed
● Previous arguments still valid :-)
● Modular usage
● High integration
● Built-in HA
● No need for POSIX compatible interface
● Works even with other IaaS implementations
Summary
Take Aways
● Sophisticated storage engine
● Mature OSS distributed storage solution
● Other use cases
● Can be used elsewhere
● Full integration in Openstack
References
● http://ceph.com
● http://www.openstack.org
Thank you!
Petabyte Scale Out Rocks
CEPH as replacement for Openstack's Swift & Co.
Udo Seidel
