An Introduction to Ceph
by 果凍
About Me
● Works at inwinstack
○ formerly the In Win cloud application R&D center
● python, django, linux, openstack, docker
● kjellytw at gmail dot com
● http://www.blackwhite.tw/
Outline
● What a storage system is
● How storage systems evolved
● An introduction to Ceph
● Using Ceph from Python
● A few Ceph commands
● Yahoo's Ceph architecture
What Is a Storage System?
● Capacity is finite; data is not
● When one disk is not enough, add another
● A storage system is a system for managing a lot of data
Storage System
● Feature:
○ Replication
○ High capacity
○ Consistency
● Optional feature:
○ Over Allocation
○ Snapshot
○ Deduplication
Evolution of Storage Systems - 1: Every Host Uses Its Own Disks
(Diagram: a host with a disk controller in front of several disks; the file system and directory layer sit on the host. A JBOD host is the same picture with more disks behind the controller.)
Evolution of Storage Systems - 2
(Diagram: the disks move into an external array behind a disk or SCSI controller; the array exports LUNs to hosts over FC or iSCSI, while the file system and directory layer stay on the host.)
Evolution of Storage Systems - 3
(Diagram: a NAS owns the disks, the controller, and the file system (it may itself consume LUNs over FC or iSCSI), and exports the directory layer to hosts over the network.)
What’s Next?
● storage cluster:
○ capacity
○ performance
● storage cluster examples:
○ IBRIX
○ PanFS
○ Ceph
Storage Cluster - 1
(Diagram: 1. the client sends data to a controller; 2. the controller stores the data on the storage nodes.)
Storage Cluster - 2
(Diagram: 1. the client asks the controller where the data is stored; 2. the client then reads and writes the storage nodes directly.)
Storage Cluster - 3
(Diagram: 1. the client gets the cluster information from a monitor; 2. the client computes where the data should go based on that cluster information; 3. the client reads and writes the storage nodes directly.)
Ceph
● A clustered storage system
● Software defined storage
○ Cost-performance tradeoff
○ Flexible interfaces
○ Different storage abstractions
Ceph
● A distributed object store and file system
designed to provide excellent performance,
reliability and scalability.
● Open source and freely-available, and it
always will be.
● Object Storage (rados)
● Block Storage (rbd)
● File System (cephfs)
● Object Storage:
○ You get/put objects by key through the interface the object store provides.
○ Example: S3
● Block Storage:
○ Provides a virtual disk that behaves just like a real disk.
● File System:
○ Just like a NAS
Ceph Motivating Principles
● Everything must scale horizontally
● No single point of failure
● Commodity hardware
● Self-manage whenever possible
Ceph Architectural Features
● Object locations get computed.
○ CRUSH algorithm
○ Ceph OSD Daemon uses it to compute where
replicas of objects should be stored (and for
rebalancing).
○ Ceph Clients use the CRUSH algorithm to efficiently
compute information about object location
● No centralized interface.
● OSDs Service Clients Directly
Ceph Components
RADOS
● A Scalable, Reliable Storage Service for
Petabyte-scale Storage Clusters
● Component:
○ ceph-mon
○ ceph-osd
○ librados
● data hierarchy:
○ pool
○ object
RADOS
● ceph-mon:
○ maintaining a master copy of the cluster map
○ These do not serve stored objects to clients
RADOS
● ceph-osd:
○ Stores objects on a local file system
○ Serves objects over the network to clients directly
○ Uses the CRUSH algorithm to compute where objects live
○ One per disk (or RAID group)
○ Peering
○ Checks its own state and the state of other OSDs
and reports back to the monitors
RADOS
● librados:
○ The client retrieves the latest copy of the cluster map,
so it knows about all of the monitors, OSDs, and
metadata servers in the cluster
○ The client uses CRUSH and the cluster map to compute
where an object lives
○ It then accesses the OSDs directly
CRUSH
● Controlled Replication Under Scalable
Hashing
● A hash-based placement algorithm
● Generates a position from:
○ pg (placement group)
○ cluster map
○ rule set
What Problem Does CRUSH Solve?
How to Decide Where an Object Goes?
● Way 1: look up the table.
○ Easy to implement
○ Hard to scale horizontally
key1 → node1
key2 → node1
key3 → node2
key4 → node1
How to Decide Where an Object Goes?
● Way 2: hash:
○ Easy to implement
○ But too much data moves on rebalance
Before: A: 0~33, B: 33~66, C: 66~99
After adding node D: A: 0~25, B: 25~50, D: 50~75, C: 75~100
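The rebalance cost of plain modulo hashing shows up in a tiny simulation (a toy model added for illustration, not Ceph code): going from 3 to 4 nodes relocates three quarters of all keys.

```python
# Toy model of "Way 2": placement(key) = key % node_count.
# When the node count changes from 3 to 4, most keys change node.

def placement(key, node_count):
    return key % node_count

keys = range(12000)
moved = sum(1 for k in keys if placement(k, 3) != placement(k, 4))
print(moved / len(keys))  # 0.75: three quarters of the keys relocate
```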
How to Decide Where an Object Goes?
● Way 3: hash with a static table:
○ look up the table after hashing
○ this is what OpenStack Swift does
(Diagram: data1..data5 hash onto virtual partitions 1~4; a static map then assigns each virtual partition to node1, node2, or node3.)
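A minimal sketch of the hash-plus-static-table idea (illustrative names only, not the real Swift ring implementation):

```python
# Sketch of "hash with a static table": hash a key to a virtual partition,
# then look the partition up in a prebuilt partition -> node map. Only the
# map has to change on rebalance, and only remapped partitions move.
import zlib

PARTITIONS = 4
# static map, rebuilt only when the cluster changes (illustrative values)
partition_to_node = {0: "node1", 1: "node1", 2: "node2", 3: "node3"}

def node_for(key: str) -> str:
    part = zlib.crc32(key.encode()) % PARTITIONS
    return partition_to_node[part]

print(node_for("data1"))
```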
How to Decide Where an Object Goes?
● Way 4: CRUSH
○ Fast calculation, no lookup
○ Repeatable, deterministic
○ Statistically uniform distribution
○ Stable mapping
○ Rule-based configuration
Why We Need Placement Groups
● A layer of indirection between the Ceph
OSD daemons and the Ceph clients.
○ Decouples OSDs from clients.
○ Makes rebalancing easy.
Placement Group
(Diagram: objects map into placement groups, placement groups belong to pools, and each placement group is stored on several OSDs.)
How to Compute Object Location
object id: foo, pool: bar
hash("foo") % 256 = 0x23
"bar" => pool id 3
Placement Group: 3.23
(see OSDMap.h)
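The mapping above can be sketched in Python. One hedge: real Ceph hashes object names with its own rjenkins hash and a "stable mod" (see OSDMap.h); crc32 here only stands in to show the shape of the computation, so the actual PG values differ.

```python
# Sketch of the object -> placement-group step: hash the object name, take it
# modulo pg_num, and prefix the pool id. crc32 stands in for Ceph's rjenkins.
import zlib

def pg_for(object_id: str, pool_id: int, pg_num: int = 256) -> str:
    ps = zlib.crc32(object_id.encode()) % pg_num   # placement seed in the pool
    return "%d.%x" % (pool_id, ps)                 # pool_id.hex, like "3.23"

print(pg_for("foo", pool_id=3))
```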
How to Compute Object Location
(Diagram: Placement Group 3.23 goes through CRUSH, which returns the set of OSDs that store it.)
How Does the Client Access an Object?
(Diagram: 1. the client gets the cluster information from a monitor; 2. the client computes where the data should go based on that cluster information; 3. the client reads and writes the storage nodes directly.)
How Does the Client Write an Object?
How Does the Client Read an Object?
● Read from the primary OSD, or
● Send reads to all replicas; the quickest reply wins
and the others are ignored.
(Diagram: one client reads only from the primary OSD; another fans reads out to every replica OSD.)
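The "quickest reply wins" strategy can be simulated with asyncio tasks standing in for replica OSDs (a toy model, not Ceph client code):

```python
# Toy model of fan-out reads: query every replica, take the first answer,
# cancel the rest. The latencies and OSD names are made up for the demo.
import asyncio

async def read_replica(name: str, latency: float) -> str:
    await asyncio.sleep(latency)  # pretend network + disk latency
    return name

async def read_fastest() -> str:
    tasks = [asyncio.create_task(read_replica(n, t))
             for n, t in [("osd.0", 0.03), ("osd.1", 0.01), ("osd.2", 0.02)]]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for p in pending:          # ignore (cancel) the slower replicas
        p.cancel()
    return done.pop().result()

print(asyncio.run(read_fastest()))  # osd.1, the lowest-latency replica
```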
Rados Gateway
● HTTP REST gateway for the RADOS object
store
● Rich API:
○ S3 API
○ Swift API
● Integrates with OpenStack Keystone
● Stateless
radosgw with an S3 client
radosgw with a Swift client
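A minimal sketch of talking to radosgw through its S3-compatible API with boto3; the endpoint, credentials, and bucket name are placeholders for your own cluster:

```python
# Sketch: radosgw speaks the S3 protocol, so a stock S3 client works against
# it once pointed at the gateway endpoint. All values below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://radosgw.example.com:7480",  # your radosgw endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello ceph")
obj = s3.get_object(Bucket="demo-bucket", Key="hello.txt")
print(obj["Body"].read())
```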
Why Do We Need radosgw?
● We want a RESTful API:
○ S3 API
○ Swift API
● We don't want outsiders to see the cluster
internals.
RBD
● RADOS Block Devices
○ Provide virtual disks that behave just like real disks.
● Image striping (via librbd)
● Integrates with the Linux kernel and KVM
Why Does RBD Stripe Images?
● Avoids big files.
● Parallelism
● Random access
offset range:   0~2500  2500~5000  5000~7500  7500~10000
object 01XXXX:  OSD1    OSD3       OSD4       OSD6
object 02XXXX:  OSD8    OSD2       OSD3       OSD5
object 03XXXX:  OSD1    OSD6       OSD2       OSD3
(Diagram: librbd spreads the stripes across OSD1, OSD3, OSD4, …)
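The random-access point in code: a byte offset in the image maps directly to one stripe object, so a read touches only that object's OSDs. A sketch assuming rbd's default 4 MiB objects and an illustrative object-name format:

```python
# Sketch of offset -> stripe-object mapping. 4 MiB is rbd's default object
# size; the "prefix.%016x" name format is illustrative, not exact rbd code.
OBJECT_SIZE = 4 * 1024 * 1024

def object_for_offset(image_prefix: str, offset: int) -> tuple:
    index = offset // OBJECT_SIZE    # which stripe object holds this offset
    within = offset % OBJECT_SIZE    # offset inside that object
    return ("%s.%016x" % (image_prefix, index), within)

print(object_for_offset("rbd_data.1234", 9_000_000))
```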
Does librados Support Striping?
● No, librados doesn't support striping.
● But you can use libradosstriper
○ (poorly documented)
OpenStack and RBD
● You can use RBD with Nova, Glance, or
Cinder.
● Cinder uses RBD to provide volumes.
● Glance uses RBD to store images.
● Why does Glance use RBD instead of librados or
radosgw?
○ Copy-on-write
librados(python)
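A minimal librados session from Python, using the `rados` bindings that ship with Ceph; it assumes a reachable cluster, a readable ceph.conf, and a pool named my_pool (all placeholders):

```python
# Sketch: put/get an object and set an xattr through the rados bindings.
# Requires a running cluster; conffile and pool name are placeholders.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("my_pool")
    ioctx.write_full("obj_name", b"hello ceph")        # put an object
    print(ioctx.read("obj_name"))                      # get it back
    ioctx.set_xattr("obj_name", "attr_name", b"value") # attach an xattr
    ioctx.close()
finally:
    cluster.shutdown()
```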
librbd(python)
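The same from the block-device side with the `rbd` bindings; cluster config, pool, and image names are again placeholders:

```python
# Sketch: create a 1 GiB image and do raw reads/writes through librbd.
# Requires a running cluster; conffile and names are placeholders.
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("my_pool")
    rbd.RBD().create(ioctx, "volume", 1024 * 1024 * 1024)  # 1 GiB image
    with rbd.Image(ioctx, "volume") as image:
        image.write(b"hello", 0)   # write at byte offset 0
        print(image.read(0, 5))    # read those 5 bytes back
    ioctx.close()
finally:
    cluster.shutdown()
```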
rados command
● rados lspools
● rados mkpool my_pool
● rados create obj_name -p my_pool
● rados put obj_name file_path -p my_pool
● rados get obj_name file_path -p my_pool
rados command
● rados getxattr obj_name attr_name
● rados setxattr obj_name attr_name value
● rados rmxattr obj_name attr_name
rados command
● rados lssnap -p pool_name
● rados mksnap snap_name -p pool_name
● rados rmsnap snap_name -p pool_name
● rados rollback obj_name snap_name -p pool_name
rados command
● rados import backup_dir pool_name
● rados export pool_name backup_dir
rbd command
● rbd create --size 1000 volume -p pools
● rbd map volume -p pools
○ /dev/rbd*
● rbd unmap /dev/rbd0
● rbd import file_path image_name
● rbd export image_name file_path
rbd command
● rbd snap ls <image-name>
● rbd snap create <image-name>@<snap-name>
● rbd snap rollback <image-name>@<snap-name>
● rbd snap rm <image-name>@<snap-name>
● rbd snap purge <image-name>
● rbd snap protect <image-name>@<snap-name>
● rbd snap unprotect <image-name>@<snap-name>
Yahoo Ceph Cluster Architecture
● COS is built from many
small Ceph clusters
○ to keep the OSD count per cluster bounded.
● Many gateways sit
behind a load
balancer.
Summary
● How storage systems evolved
● An introduction to Ceph
● CRUSH compared with the alternatives
We're Hiring
● Interested in:
○ openstack
○ ceph
Q & A
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ontico
 
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Sage Weil
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversLinaro
 
Ceph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldCeph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldSage Weil
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph clusterMirantis
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDBSage Weil
 
XSKY - ceph luminous update
XSKY - ceph luminous updateXSKY - ceph luminous update
XSKY - ceph luminous updateinwin stack
 
2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph2019.06.27 Intro to Ceph
2019.06.27 Intro to CephCeph Community
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about cephEmma Haruka Iwao
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonSage Weil
 
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud StorageCeph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud StorageSage Weil
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookDanny Al-Gaaf
 
Unified readonly cache for ceph
Unified readonly cache for cephUnified readonly cache for ceph
Unified readonly cache for cephzhouyuan
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016John Spray
 
What's new in Luminous and Beyond
What's new in Luminous and BeyondWhat's new in Luminous and Beyond
What's new in Luminous and BeyondSage Weil
 
Ceph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to JewelCeph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to JewelColleen Corrice
 

Was ist angesagt? (19)

Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)
 
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)Distributed Storage and Compute With Ceph's librados (Vault 2015)
Distributed Storage and Compute With Ceph's librados (Vault 2015)
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM servers
 
Ceph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldCeph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud world
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph cluster
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
XSKY - ceph luminous update
XSKY - ceph luminous updateXSKY - ceph luminous update
XSKY - ceph luminous update
 
2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit Boston
 
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud StorageCeph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
 
Unified readonly cache for ceph
Unified readonly cache for cephUnified readonly cache for ceph
Unified readonly cache for ceph
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
librados
libradoslibrados
librados
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016
 
What's new in Luminous and Beyond
What's new in Luminous and BeyondWhat's new in Luminous and Beyond
What's new in Luminous and Beyond
 
Ceph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to JewelCeph Performance: Projects Leading up to Jewel
Ceph Performance: Projects Leading up to Jewel
 

Andere mochten auch

Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬Hang Geng
 
Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05
Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05
Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05Hang Geng
 
Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03Hang Geng
 
Openstack swift, how does it work?
Openstack swift, how does it work?Openstack swift, how does it work?
Openstack swift, how does it work?kao kuo-tung
 
Immutable infrastructure 介紹與實做:以 kolla 為例
Immutable infrastructure 介紹與實做:以 kolla 為例Immutable infrastructure 介紹與實做:以 kolla 為例
Immutable infrastructure 介紹與實做:以 kolla 為例kao kuo-tung
 
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Jens Hadlich
 
Red Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed_Hat_Storage
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep DiveRed_Hat_Storage
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明National Cheng Kung University
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
 
Divein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetupDivein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetupJiaying Ren
 
Rgw multisite-overview v2
Rgw multisite-overview v2Rgw multisite-overview v2
Rgw multisite-overview v2Jiaying Ren
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turkbuildacloud
 

Andere mochten auch (15)

Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
Ceph中国社区9.19 Ceph FS-基于RADOS的高性能分布式文件系统02-袁冬
 
Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05
Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05
Ceph中国社区9.19 Ceph IO 路径 和性能分析-王豪迈05
 
Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03Ceph中国社区9.19 Some Ceph Story-朱荣泽03
Ceph中国社区9.19 Some Ceph Story-朱荣泽03
 
Openstack swift, how does it work?
Openstack swift, how does it work?Openstack swift, how does it work?
Openstack swift, how does it work?
 
Immutable infrastructure 介紹與實做:以 kolla 為例
Immutable infrastructure 介紹與實做:以 kolla 為例Immutable infrastructure 介紹與實做:以 kolla 為例
Immutable infrastructure 介紹與實做:以 kolla 為例
 
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
 
Red Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed Hat Storage for Mere Mortals
Red Hat Storage for Mere Mortals
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Python to scala
Python to scalaPython to scala
Python to scala
 
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
Divein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetupDivein ceph objectstorage-cephchinacommunity-meetup
Divein ceph objectstorage-cephchinacommunity-meetup
 
Rgw multisite-overview v2
Rgw multisite-overview v2Rgw multisite-overview v2
Rgw multisite-overview v2
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turk
 

Ähnlich wie Introduction to Ceph Storage - A Distributed Object Store and File System

Ceph in 2023 and Beyond.pdf
Ceph in 2023 and Beyond.pdfCeph in 2023 and Beyond.pdf
Ceph in 2023 and Beyond.pdfClyso GmbH
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETNikos Kormpakis
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsCeph Community
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and MetricsRicardo Lourenço
 
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH Ceph Community
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific DashboardCeph Community
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganetikawamuray
 
Rook - cloud-native storage
Rook - cloud-native storageRook - cloud-native storage
Rook - cloud-native storageKarol Chrapek
 
Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018Orit Wasserman
 
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Community
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringShapeBlue
 
What's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyWhat's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyCeph Community
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterPatrick Quairoli
 
INFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage ArhcitectureINFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage ArhcitectureThomas Uhl
 

Ähnlich wie Introduction to Ceph Storage - A Distributed Object Store and File System (20)

Discoblocks.pptx.pdf
Discoblocks.pptx.pdfDiscoblocks.pptx.pdf
Discoblocks.pptx.pdf
 
DEVIEW 2013
DEVIEW 2013DEVIEW 2013
DEVIEW 2013
 
Scale 10x 01:22:12
Scale 10x 01:22:12Scale 10x 01:22:12
Scale 10x 01:22:12
 
Ceph in 2023 and Beyond.pdf
Ceph in 2023 and Beyond.pdfCeph in 2023 and Beyond.pdf
Ceph in 2023 and Beyond.pdf
 
Strata - 03/31/2012
Strata - 03/31/2012Strata - 03/31/2012
Strata - 03/31/2012
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
Ceph Research at UCSC
Ceph Research at UCSCCeph Research at UCSC
Ceph Research at UCSC
 
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganeti
 
Rook - cloud-native storage
Rook - cloud-native storageRook - cloud-native storage
Rook - cloud-native storage
 
Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018
 
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOcean
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
 
What's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyWhat's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon Valley
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage Cluster
 
INFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage ArhcitectureINFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
INFINISTORE(tm) - Scalable Open Source Storage Arhcitecture
 

Mehr von kao kuo-tung

用 Open source 改造鍵盤
用 Open source 改造鍵盤用 Open source 改造鍵盤
用 Open source 改造鍵盤kao kuo-tung
 
Why is a[1] fast than a.get(1)
Why is a[1]  fast than a.get(1)Why is a[1]  fast than a.get(1)
Why is a[1] fast than a.get(1)kao kuo-tung
 
減少重複的測試程式碼的一些方法
減少重複的測試程式碼的一些方法減少重複的測試程式碼的一些方法
減少重複的測試程式碼的一些方法kao kuo-tung
 
Openstack taskflow 簡介
Openstack taskflow 簡介Openstack taskflow 簡介
Openstack taskflow 簡介kao kuo-tung
 
Async: ways to store state
Async:  ways to store stateAsync:  ways to store state
Async: ways to store statekao kuo-tung
 
Docker 原理與實作
Docker 原理與實作Docker 原理與實作
Docker 原理與實作kao kuo-tung
 
那些年,我們一起看的例外
那些年,我們一起看的例外那些年,我們一起看的例外
那些年,我們一起看的例外kao kuo-tung
 
Python 中 += 與 join比較
Python 中 += 與 join比較Python 中 += 與 join比較
Python 中 += 與 join比較kao kuo-tung
 
Garbage collection 介紹
Garbage collection 介紹Garbage collection 介紹
Garbage collection 介紹kao kuo-tung
 
Python 如何執行
Python 如何執行Python 如何執行
Python 如何執行kao kuo-tung
 
C python 原始碼解析 投影片
C python 原始碼解析 投影片C python 原始碼解析 投影片
C python 原始碼解析 投影片kao kuo-tung
 
recover_pdb 原理與介紹
recover_pdb 原理與介紹recover_pdb 原理與介紹
recover_pdb 原理與介紹kao kuo-tung
 

Mehr von kao kuo-tung (13)

用 Open source 改造鍵盤
用 Open source 改造鍵盤用 Open source 改造鍵盤
用 Open source 改造鍵盤
 
Why is a[1] fast than a.get(1)
Why is a[1]  fast than a.get(1)Why is a[1]  fast than a.get(1)
Why is a[1] fast than a.get(1)
 
減少重複的測試程式碼的一些方法
減少重複的測試程式碼的一些方法減少重複的測試程式碼的一些方法
減少重複的測試程式碼的一些方法
 
Openstack taskflow 簡介
Openstack taskflow 簡介Openstack taskflow 簡介
Openstack taskflow 簡介
 
Async: ways to store state
Async:  ways to store stateAsync:  ways to store state
Async: ways to store state
 
Openstack 簡介
Openstack 簡介Openstack 簡介
Openstack 簡介
 
Docker 原理與實作
Docker 原理與實作Docker 原理與實作
Docker 原理與實作
 
那些年,我們一起看的例外
那些年,我們一起看的例外那些年,我們一起看的例外
那些年,我們一起看的例外
 
Python 中 += 與 join比較
Python 中 += 與 join比較Python 中 += 與 join比較
Python 中 += 與 join比較
 
Garbage collection 介紹
Garbage collection 介紹Garbage collection 介紹
Garbage collection 介紹
 
Python 如何執行
Python 如何執行Python 如何執行
Python 如何執行
 
C python 原始碼解析 投影片
C python 原始碼解析 投影片C python 原始碼解析 投影片
C python 原始碼解析 投影片
 
recover_pdb 原理與介紹
recover_pdb 原理與介紹recover_pdb 原理與介紹
recover_pdb 原理與介紹
 

Kürzlich hochgeladen

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Introduction to Ceph Storage - A Distributed Object Store and File System

  • 2. About Me ● Works at inwinstack ○ formerly the In Win cloud application R&D center ● python, django, linux, openstack, docker ● kjellytw at gmail dot com ● http://www.blackwhite.tw/
  • 3. Outline ● What a storage system is ● Evolution of storage systems ● Introduction to Ceph ● Using Ceph from Python ● Some Ceph commands ● Yahoo's Ceph cluster architecture
  • 5. Storage System ● Features: ○ Replication ○ High capacity ○ Consistency ● Optional features: ○ Over-allocation ○ Snapshot ○ Deduplication
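The deduplication feature above can be illustrated with content-addressed chunks. This is a toy sketch, not any real storage system's mechanism; the function name and fixed-size chunks are made up for the example:

```python
import hashlib

def dedup_store(chunks, store=None):
    """Store chunks keyed by content hash; identical chunks are kept only once."""
    store = {} if store is None else store
    refs = []
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # write the chunk only the first time we see it
        refs.append(key)
    return refs, store

refs, store = dedup_store([b"hello", b"world", b"hello"])
# three references point at only two physically stored chunks
```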
  • 6. Evolution of Storage Systems - 1. Each host uses its own disks: Host DISK DISK DISK disk controller file system directory layer; with JBOD: Host DISK DISK DISK disk controller file system directory layer
  • 7. Evolution of Storage Systems - 2: Host DISK DISK DISK disk controller or SCSI controller, etc. file system directory layer LUN FC protocol or iSCSI
  • 8. Evolution of Storage Systems - 3: Host NAS DISK DISK DISK disk controller or SCSI controller, etc. file system LUN FC protocol or iSCSI directory layer
  • 9. What’s Next? ● storage cluster: ○ capacity ○ performance ● storage cluster examples: ○ IBRIX ○ PanFS ○ ceph
  • 10. Storage Cluster - 1 Client Controller Storage node Storage node 1. Client sends data to the controller to store it. 2. Controller stores the data on the storage nodes.
  • 11. Storage Cluster - 2 Client Controller Storage node Storage node 1. Ask the controller where the data is stored. 2. Client stores the data to storage directly.
  • 12. Storage Cluster - 3 Client monitor Storage node Storage node 1. Get the cluster information. 2. Client computes the position where the data should be put based on the cluster information. 3. Client stores the data to storage directly.
  • 13. Ceph ● A clustered storage system ● Software-defined storage ○ Cost-performance tradeoff ○ Flexible interfaces ○ Different storage abstractions
  • 14. Ceph ● A distributed object store and file system designed to provide excellent performance, reliability and scalability. ● Open source and freely-available, and it always will be. ● Object Storage (rados) ● Block Storage (rbd) ● File System (cephfs)
  • 15. ● Object Storage: ○ You get/put objects by key through the interface the object store provides. ○ example: S3 ● Block Storage: ○ Block storage provides you a virtual disk. The virtual disk behaves just like a real disk. ● File System: ○ Just like a NAS
  • 16. Ceph Motivating Principles ● Everything must scale horizontally ● No single point of failure ● Commodity hardware ● Self-manage whenever possible
  • 17. Ceph Architectural Features ● Object locations are computed, not looked up. ○ CRUSH algorithm ○ Ceph OSD Daemons use it to compute where replicas of objects should be stored (and for rebalancing). ○ Ceph clients use the CRUSH algorithm to efficiently compute object locations. ● No centralized interface. ● OSDs serve clients directly
  • 19. RADOS ● A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters ● Component: ○ ceph-mon ○ ceph-osd ○ librados ● data hierarchy: ○ pool ○ object
  • 20. RADOS ● ceph-mon: ○ Maintains a master copy of the cluster map ○ Monitors do not serve stored objects to clients
  • 21. RADOS ● ceph-osd: ○ Stores objects on a local file system ○ Provides access to objects over the network to clients directly ○ Uses the CRUSH algorithm to compute where an object lives ○ One per disk (or RAID group) ○ Peering ○ Checks its own state and the state of other OSDs and reports back to the monitors
  • 22. RADOS ● librados: ○ Clients retrieve the latest copy of the cluster map, so they know about all of the monitors, OSDs, and metadata servers in the cluster ○ Clients use CRUSH and the cluster map to compute where an object lives ○ Direct access to the OSDs
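The outline promises Python usage; as a hedged sketch of the librados flow through the python-rados bindings (the pool name, object name, and conf path below are illustrative, and a running cluster plus readable ceph.conf are required to actually execute it):

```python
# Sketch of the librados flow from Python using the python-rados bindings.
try:
    import rados
except ImportError:           # bindings not installed; keep the sketch importable
    rados = None

def put_object(pool, name, data, conffile="/etc/ceph/ceph.conf"):
    """Connect to the monitors, then write one object straight to the OSDs."""
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()                   # fetch the cluster map from the monitors
    ioctx = cluster.open_ioctx(pool)    # I/O context bound to one pool
    try:
        ioctx.write_full(name, data)    # the client talks to the primary OSD directly
    finally:
        ioctx.close()
        cluster.shutdown()

# put_object("my_pool", "obj_name", b"hello ceph")
```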
  • 23. CRUSH ● Controlled Replication Under Scalable Hashing ● Hash algorithm ● Generate position based on: ○ pg(Placement Group) ○ cluster map ○ rule set
  • 24. What Problem Does CRUSH Solve?
  • 25. How to Decide Where to Put the Object? ● Way 1: look up a table. ○ Easy to implement ○ Hard to scale horizontally key1 node1 key2 node1 key3 node2 key4 node1
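Way 1 in miniature: every client must hold (and keep in sync) the full key-to-node table, which is exactly what makes it hard to scale:

```python
# Way 1: an explicit lookup table, mirroring the slide's example.
placement = {"key1": "node1", "key2": "node1", "key3": "node2", "key4": "node1"}

def locate(key):
    # Every client needs the complete, ever-growing table to answer this.
    return placement[key]

print(locate("key3"))  # node2
```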
  • 26. How to Decide Where to Put the Object? ● Way 2: hash: ○ Easy to implement ○ But too much data movement on rebalance A: 0~33 B: 33~66 C: 66~99 → after adding a new node: A: 0~25 B: 25~50 D: 50~75 C: 75~100
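The rebalancing cost of way 2 is easy to measure: adding one node to a plain `hash % N` scheme remaps most keys. md5 here is just a deterministic stand-in hash:

```python
import hashlib

def node_for(key, n_nodes):
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % n_nodes

keys = [f"obj-{i}" for i in range(10000)]
before = {k: node_for(k, 3) for k in keys}
after = {k: node_for(k, 4) for k in keys}   # grow the cluster by one node
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of objects move")  # roughly 75%
```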
  • 28. How to Decide Where to Put the Object? ● Way 3: hash with static table: ○ look up a table after hashing ○ OpenStack Swift hash: data1 data2 data3 data4 data5 → virtual partition 1 virtual partition 2 virtual partition 3 virtual partition 4 → map → node1 node2 node3
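Way 3 in miniature: keys hash to a fixed set of virtual partitions, and rebalancing only edits the small partition-to-node table. The partition count and node names are made up for the sketch:

```python
import hashlib

N_PARTITIONS = 4  # fixed when the cluster is created, like Swift's partition power
partition_map = {0: "node1", 1: "node1", 2: "node2", 3: "node3"}

def partition_of(key):
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % N_PARTITIONS      # key -> partition never changes

def node_for(key):
    return partition_map[partition_of(key)]  # partition -> node is a tiny table

# Rebalancing edits only the table; keys keep their partitions.
partition_map[1] = "node4"
```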
  • 29. How to Decide Where the Object Put? ● Way 4: CRUSH ○ Fast calculation, no lookup ○ Repeatable, deterministic ○ Statistically uniform distribution ○ Stable mapping ○ Rule-based configuration
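CRUSH itself is rule-based and hierarchy-aware, which is beyond a slide-sized sketch. As a toy stand-in that shares the listed properties (fast calculation, no lookup, repeatable, deterministic, statistically uniform, stable mapping) — but is NOT CRUSH — here is rendezvous (highest-random-weight) hashing:

```python
import hashlib

def score(key, node):
    return int.from_bytes(hashlib.md5(f"{key}:{node}".encode()).digest()[:8], "big")

def place(key, nodes, replicas=2):
    """Pick the `replicas` highest-scoring nodes for this key: pure computation."""
    return sorted(nodes, key=lambda n: score(key, n), reverse=True)[:replicas]

osds = ["osd0", "osd1", "osd2", "osd3"]
# Repeatable and deterministic: no lookup table anywhere.
assert place("obj-a", osds) == place("obj-a", osds)
```

Removing a node only remaps the keys that lived on it; everything else keeps its placement, which is the "stable mapping" property the slide lists.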
  • 30. Why We Need Placement Groups ● A layer of indirection between the Ceph OSD Daemon and the Ceph Client. ○ Decouples OSDs from clients. ○ Easy to rebalance.
  • 31. Placement Group Pool Pool Placement Group Placement Group Placement Group OSD OSD OSD OSD Object Object Object
  • 32. How to Compute Object Location object id: foo pool: bar hash(“foo”) % 256 = 0x23 “bar” => 3 Placement Group: 3.23 OSDMap.h
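The slide's two steps can be sketched in Python. Note that Ceph really uses its own rjenkins-based string hash and a stable-mod variant (see OSDMap.h); md5 here is a stand-in, so the concrete pgid will differ from the slide's 0x23:

```python
import hashlib

def pg_for(obj_name, pool_id, pg_num):
    """object id + pool -> placement group id, rendered as 'pool.pg' in hex."""
    h = int.from_bytes(hashlib.md5(obj_name.encode()).digest()[:4], "big")
    return f"{pool_id}.{h % pg_num:x}"

print(pg_for("foo", 3, 256))  # some '3.xx'; CRUSH then maps that PG to OSDs
```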
  • 33. How to Compute Object Location Placement Group: 3.23 CRUSH OSD OSD OSD
  • 34. How Does the Client Access an Object Client monitor Storage node Storage node 1. Get the cluster information. 2. Client computes the position where the data should be put based on the cluster information. 3. Client stores the data to storage directly.
  • 35. How Does the Client Write an Object
  • 36. How Does the Client Read an Object ● Read from the primary OSD, or ● Send reads to all replicas; the quickest reply "wins" and the others are ignored. Client Primary OSD Client OSD OSD OSD
  • 37. RADOS Gateway ● HTTP REST gateway for the RADOS object store ● Rich API ○ S3 API ○ Swift API ● Integrates with OpenStack Keystone. ● Stateless
  • 38. radosgw with an S3 client
  • 40. Why We Need radosgw ● We want a RESTful API ○ S3 API ○ Swift API ● We don’t want clients to know the cluster status.
  • 41. RBD ● RADOS Block Device ○ Provides a virtual disk just like a real disk. ● Image striping (by librbd) ● Integrates with the Linux kernel and KVM
  • 42. Why Does RBD Need to Stripe Images? ● Avoid big files. ● Parallelism ● Random access 0~2500 2500~5000 5000~7500 7500~10000 01XXXX OSD1 OSD3 OSD4 OSD6 02XXXX OSD8 OSD2 OSD3 OSD5 03XXXX OSD1 OSD6 OSD2 OSD3 librbd OSD1 OSD4 OSD3
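A sketch of the simple layout: the virtual disk is cut into fixed-size RADOS objects, so any byte offset maps to exactly one backing object. The 4 MB size is the rbd default; the rbd_data.<id> naming below is illustrative:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # rbd's default object size

def rbd_object_for(image_id, offset):
    """Map a byte offset in the virtual disk to (backing object, offset within it)."""
    object_no = offset // OBJECT_SIZE
    return f"rbd_data.{image_id}.{object_no:016x}", offset % OBJECT_SIZE

name, off = rbd_object_for("abc123", 9 * 1024 * 1024)
# 9 MB lands in the third 4 MB object, 1 MB into it
```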
  • 43. Does librados Support Striping? ● No, librados doesn’t support striping. ● But you can use libradosstriper ○ Poorly documented.
  • 44. OpenStack Uses RBD ● You can use RBD with Nova, Glance, or Cinder. ● Cinder uses RBD to provide volumes. ● Glance uses RBD to store images. ● Why does Glance use RBD instead of librados or radosgw? ○ Copy-on-write
  • 49. rados command ● rados lspools ● rados mkpool my_pool ● rados create obj_name -p my ● rados put obj_name file_path -p my ● rados get obj_name file_path -p my
  • 50. rados command ● rados getxattr obj_name attr_name ● rados setxattr obj_name attr_name value ● rados rmxattr obj_name attr_name
  • 51. rados command ● rados lssnap -p pool_name ● rados mksnap snap_name -p pool_name ● rados rmsnap snap_name -p pool_name ● rados rollback obj_name snap_name -p pool_name
  • 52. rados command ● rados import backup_dir pool_name ● rados export pool_name backup_dir
  • 53. rbd command ● rbd create --size 1000 volume -p pools ● rbd map volume -p pools ○ /dev/rbd* ● rbd unmap /dev/rbd0 ● rbd import file_path image_name ● rbd export image_name file_path
  • 54. rbd command ● rbd snap ls <image-name> ● rbd snap create <snap-name> ● rbd snap rollback <snap-name> ● rbd snap rm <snap-name> ● rbd snap purge <image-name> ● rbd snap protect <snap-name> ● rbd snap unprotect <snap-name>
  • 55. Yahoo Ceph Cluster Architecture ● COS contains many Ceph clusters ○ to limit the number of OSDs per cluster. ● There are many gateways behind the load balancer.
  • 56. Summary ● Evolution of storage systems ● Introduction to Ceph ● CRUSH compared with other approaches
  • 57. We’re Hiring ● Interested in ○ openstack ○ ceph
  • 58. Q & A