Performance Analysis with ceph
雲儲存性能分析
Alex Lau 劉俊賢
Software Consultant 研發工程師/顧問
(AvengerMoJo) alau@suse.com
Agenda 議程
SES5 is based on Luminous – The Why? 為何分析性能?
Ceph performance – The How? 如何分析性能?
Ceph analysis – The What? 分析結果是怎樣?
The Future? 大家有什麼展望?
SES5 is based on Luminous – The Why?
為何分析性能?
SUSE Enterprise Storage 5 is based on Ceph Luminous
4
•  Supports x86-64 and AArch64
•  Easy-to-use web UI: openATTIC 3.x
•  Simple DeepSea cluster orchestration
•  BlueStore ready, with data compression
•  CephFS and NFS-Ganesha ready
Ceph Info 基本資料
Code developers: 782 (22 core, 53 regular, 705 casual)
Total downloads: 160,015,454
Unique downloads: 21,264,047
(Architecture diagram: client servers and applications/file shares on Windows, Linux and Unix reach RADOS, the common object store, through the block device (RBD, iSCSI), object storage (S3, SWIFT) and file (CephFS*) interfaces; storage servers run the OSDs and monitor servers run the MONs, all connected over the cluster network.)
5
What has SUSE Enterprise Storage actually done?
Based on Ceph
2013: SUSE joins the Ceph community
2015.01: SUSE Enterprise Storage 1.0
2015.11: SUSE Enterprise Storage 2.0
2016.01: SUSE Enterprise Storage 2.1
2016.07: SUSE Enterprise Storage 3.0
2016: SUSE acquires IT-Novum (main developer of the openATTIC storage management tool)
2016.09: SUSE takes over HPE software assets
2016.11: SUSE Enterprise Storage 4.0
2017.09: SUSE Enterprise Storage 5.0 (Beta now)
6
7
SUSE Enterprise Storage
Roadmap
2016 2017 2018
V3
V4
V5
SUSE Enterprise Storage 4 SUSE Enterprise Storage 5*
Built On
•  Ceph Jewel release
•  SLES 12 SP 2 (Server)
Manageability
•  SES openATTIC management
•  Initial Salt integration DeepSea
Interoperability
•  AArch64
•  CephFS (production use cases)
•  NFS Ganesha (Tech Preview)
•  NFS access to S3 buckets (Tech Preview)
•  CIFS Samba (Tech Preview)
•  RDMA/Infiniband (Tech Preview)
Availability
•  Multisite object replication
•  Asynchronous block mirroring
Built On
•  Ceph Luminous release
•  SLES 12 SP 3 (Server)
Manageability
•  SES openATTIC management phase 2
•  SUSE Manager integration
Interoperability
•  NFS Ganesha
•  NFS access to S3 buckets
Availability
•  Asynchronous block mirroring
•  Erasure coded block pool
Efficiency
•  BlueStore back-end
•  Data compression
SUSE and the Ceph community
SUSE built a Ceph development team in 2013 and formally joined Ceph code development
SUSE is one of the eight member organizations of the Ceph advisory board

Consistently among the top three code contributors; second most contributions in the v12 Luminous release, with the top individual contributor, Ricardo
1.  10648 Ricardo Dias rdias@suse.com
2.  6422 Sage Weil sweil@redhat.com
Took on developers from Inktank and Mirantis
Increased R&D investment: acquired the storage management tool vendor it-novum (openATTIC) in mid-2016

8
Customer needs! 客戶的需要!
•  How high does it need to be?
•  How much do we want to pay?
•  Are we counting 4K random writes?
•  Do we need erasure coding?
•  Can we have the numbers for BlueStore
vs FileStore?
•  What about CephFS over NFS?
•  ……
What’s our cluster IOPS?	
•  我們需要多高的IOPS?
•  我們想花多少錢?
•  我們只需要看4K隨機寫?
•  我們有糾刪碼的需要嗎?
•  我們能看看BlueStore對比FileStore嗎?
•  我們能看看cephfs上的NFS能怎樣?
•  …….
我們的存儲叢集能跑多少IOPS?	
9
System Engineer needs! 售前售後工程師的需要!
Putting them up here already shows how much we care!
	
10
11
The DevOps Needs!開發運維的需要!
Can we have some baseline numbers?
Can we have tools to re-run the performance analysis?
BTW, we need all of that on our live cluster!
能否告訴正常時叢集該跑多快?
能否每天再測試報告叢集現在的性能?
不是分開測而是實際運作中的叢集性能哦!
12
The Developers Needs! 開發者的需要!
How do BlueStore and FileStore compare for 4K, 16K, 64K, 1M and 4M sequential and
random read/write performance, on HDD vs SSD vs NVMe, in pure and mixed pools, when an OSD is
down and the recovery process has started, while also tracking how much
memory is needed at 50% capacity, without stopping the clients from
writing to the cluster….
In short … very complicated analysis
…. I'll stop there… basically lots and lots of requirements.
p.s. If you can offer us hardware to test on, come find me; drinks on me after the talk… :)
13
The Community Project Needs! 社區的性能有關項目!
Ceph Brag
•  It requires CBT
- https://github.com/ceph/cbt
•  Setting up the CBT environment can
be complicated
•  It requires a separate multi-node
setup
•  It is an intrusive test that can't be run on
a live cluster
•  Weekly performance meetings: http://ceph.com/performance/
Ceph Brag
•  CBT是現在測試的標準
- https://github.com/ceph/cbt
•  設定CBT環境並不簡單
•  需要多台機器進行測試
•  影響到內容不能向運作中的系統做測試
•  每週的性能有關會
- http://ceph.com/performance/	
14
The Goal
•  Discover hardware and driver issues from
time to time and check for degradation
•  Intrusive / non-intrusive performance
testing before and after the cluster
is set up
•  Able to run against a live, production
cluster without stopping operation
•  Low dependencies, flexible, open source,
and eventually able to share results and
align with Ceph Brag
•  Dynamically use different tools for
monitoring and tracing, orchestrated by
a central admin node, in a way that fits
the container and cloud story
•  可以不定時發現硬體或驅動出現問題
•  在叢集設定前可以具破壞性的測試但在運
作中的叢集可以用一樣的工具去做同類但
不具破壞性的做法
•  測試可以在正在運作中的系統上進行而不
影響一般運作
•  不用依賴太多三方面的需要, 可以靈活使
用, 開源, 最終可以和Ceph Brag共享一樣
的發報方法
•  可以使用不同的管理和跟蹤工具, 同時可
以利用中央管理工具預備可以和容器和雲
管理整合	
15
Ceph Performance – The How?
如何分析性能?
16
Hardware Testing
Network:
ping -c1 -q -W1
Specific for jumbo frames
ping -Mdo -s8972 -c1 -q -W1
Bandwidth:
iperf3 -fm -A0 -t10 -c<server ip> -p<port>
Hard disk:
Read:
hdparm -t <osd_partition>
Write:
dd if=/dev/zero of=<osd_data_mount_point>/test
conv=fdatasync bs=4K count=10000
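As an illustration, the same checks can be looped over the whole cluster from one node; a minimal sketch, assuming a Bash shell and reusing the node IPs and OSD mount point that appear later in the results (adjust to your own environment):
# hypothetical sketch: quick network / disk sanity pass over a few nodes
for ip in 192.168.128.1 192.168.128.2 192.168.128.3; do
    echo "== ${ip} =="
    ping -c1 -q -W1 ${ip} | grep rtt              # avg rtt should stay well under 1 ms
    ping -Mdo -s8972 -c1 -q -W1 ${ip} | grep rtt  # jumbo-frame path check
done
hdparm -t /dev/sdb                                # raw read speed of one OSD disk
dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/test \
   conv=fdatasync bs=4K count=10000               # direct write through the OSD mount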
17
Cluster Testing
OSD bench (1G write with 4M blocks by default)
ceph tell osd.0 bench
RADOS bench
rados -p <pool> bench <time> <write|seq|rand> -t <threads> --no-cleanup
RBD bench
rbd -p <pool> bench-write <image> --io-size <e.g. 4096>
--io-threads <1, 4, 16 etc.> --io-total <total size, e.g. 209715200>
--io-pattern <rand | seq>
Newer versions have --io-type
rbd -p <pool> bench --io-type read rbd_test --io-size 4096
--io-threads 1 --io-total 209715200 --io-pattern rand
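For example, the OSD bench can be run across every OSD in one pass; a minimal sketch, assuming it is run from a node with an admin keyring:
# hypothetical sketch: run the default 1G / 4M-block bench on every OSD
for id in $(ceph osd ls); do
    echo "osd.${id}:"
    ceph tell osd.${id} bench
done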
18
Client Testing
FIO can be used against any of the following:
fio --ioengine=libaio --iodepth=32 --direct=1 --rw=write
--bs=4K --size=200MB --filename=/mnt/200M_data --numjobs=1
--name=cephfs_seqwrite_4K --output=cephfs_write_seq_4K.out
RBD block device
--ioengine can be either librbd, or libaio (the default) on a mounted image
iSCSI export
can be used from Windows or any other OS that supports iSCSI
CephFS mount
mount.cephfs mon1,mon2,mon3:/ /mnt/
NFS/CIFS mount
mount.nfs nfs-server-ip:/ /mnt/
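A minimal sketch of the librbd path, assuming fio was built with RBD support and that a test image rbd_test already exists in the pool (pool, image and client names are placeholders):
# hypothetical sketch: 4K random write against an RBD image through librbd
fio --ioengine=rbd --clientname=admin --pool=<pool> --rbdname=rbd_test \
    --direct=1 --rw=randwrite --bs=4K --iodepth=32 --numjobs=1 \
    --size=200MB --name=rbd_randwrite_4K --output=rbd_randwrite_4K.out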
19
Lttng Testing
Kernel tracing made easy!
lttng create -o .
lttng enable-channel --num-subbuf 16 --subbuf-size 8M -k c0
lttng enable-event --kernel --all
lttng enable-event --syscall -a -k -c c0
lttng start
<do something>
lttng stop
lttng destroy
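The recorded trace can then be read back as text, for example with babeltrace (a sketch; the trace path is whatever lttng create -o pointed at, and event names can differ between lttng versions):
# hypothetical sketch: dump and count the recorded kernel events
babeltrace ./<trace-output-dir> | less
babeltrace ./<trace-output-dir> | grep syscall_entry_write | wc -l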
20
Mixing with Salt
This is already in the next version of DeepSea
Network:
salt-run net.ping exclude=<non-cluster-ip>
salt-run net.jumbo_ping exclude=<non-cluster-ip>
Bandwidth:
salt-run net.iperf cluster=ceph output=full
salt-run net.iperf exclude=<non-cluster-ip>
21
Experiment
https://github.com/AvengerMoJo/Ceph-Saltstack
Since some of these tests can be intrusive to the OSDs, they are not in DeepSea yet.
salt "*" ceph_sles.disk_info
salt "*" ceph_sles.bench_disk /dev/sdb /dev/sdc /dev/sde
( dd-writes to a mount point that the partition can be mounted on )
22
Lttng ust user space tracing
Client-side tracing only, with librbd
compile ceph librbd with -finstrument-functions
compile ceph with -DWITH_LTTNG=ON
export LD_PRELOAD=/usr/lib64/liblttng-ust-cyg-profile.so
Call rbd to do something < write >
Stop tracing and collect the result for reporting
Salt can be used to trace multiple nodes and collect all the reports for you
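Putting the steps above together, a minimal sketch of one client-side ust session (trace path, pool and image names are assumptions, not from the slides):
# hypothetical sketch: function-level ust tracing around a single rbd bench write
lttng create rbd-ust -o ./rbd-ust-trace
lttng enable-event --userspace 'lttng_ust_cyg_profile*'
lttng start
LD_PRELOAD=/usr/lib64/liblttng-ust-cyg-profile.so \
    rbd -p <pool> bench --io-type write rbd_test --io-size 4096 --io-total 4194304
lttng stop
lttng destroy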
23
Collecting Reports with Salt
Call salt to start lttng on every node of the cluster
salt-run lttng.run
cmd="[fio --ioengine=libaio --iodepth=256 --direct=1
--rw=write --bs=4K --size=20MB
--filename=/mnt/cephfs/20M_data --numjobs=1
--name=write_4K --output=/tmp/cephfs/write_seq_4K.out]"
cmd_server=salt-master
Then collect all the results back on the master.
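One way to pull the per-node output files back is with the standard Salt modules; a sketch, assuming file_recv is enabled on the master and using the output path from the fio command above:
# hypothetical sketch: gather the fio logs from every minion onto the master
salt '*' cmd.run 'tail -n 5 /tmp/cephfs/write_seq_4K.out'
salt '*' cp.push /tmp/cephfs/write_seq_4K.out
# pushed files land under /var/cache/salt/master/minions/<minion-id>/files/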
	
24
Ceph Analysis – The What?
分析結果是怎樣?
25
Example: we use the following hardware from SUSE
Enterprise Storage partners
HPE - OEM
- Apollo – Apollo 4200 * 3
•  OSD nodes, each with 18 HDDs and 2 SSDs
- ProLiant – DL380/DL360 * 3
•  MON nodes
Network Ping
Regular Ping:
ping -c1 -q -W1 (an rtt higher than 1 ms may indicate a problem)
--- 192.168.128.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.298/0.298/0.298/0.000 ms
Jumbo Frame:
ping -Mdo -s8972 -c1 -q -W1 (roughly double the normal ping rtt)
--- 192.168.128.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.649/0.649/0.649/0.000 ms
27
Network Bandwidth
Bandwidth: iperf3 -fm -A0 -t10 -c<server ip> -p<port>
Connecting to host 192.168.128.1, port 5201
[ 4] local 192.168.128.77 port 41008 connected to 192.168.128.1 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 119 MBytes 1002 Mbits/sec 0 368 KBytes
[ 4] 1.00-2.00 sec 118 MBytes 990 Mbits/sec 0 402 KBytes
[ 4] 2.00-3.00 sec 118 MBytes 992 Mbits/sec 0 420 KBytes
[ 4] 3.00-4.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes
[ 4] 4.00-5.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes
[ 4] 5.00-6.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes
[ 4] 6.00-7.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes
[ 4] 7.00-8.00 sec 118 MBytes 992 Mbits/sec 0 420 KBytes
[ 4] 8.00-9.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes
[ 4] 9.00-10.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.16 GBytes 992 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.15 GBytes 991 Mbits/sec receiver
	
 28
DeepSea Network Test
salt-run net.ping exclude=<non-cluster-ip>
Succeeded: 8 addresses from 7 minions average rtt 0.15 ms
salt-run net.jumbo_ping exclude=<non-cluster-ip>
Succeeded: 8 addresses from 7 minions average rtt 0.26 ms
salt-run net.iperf exclude=192.168.128.9 cluster=ceph output=full
	
29
192.168.128.1:
8644.0 Mbits/sec
192.168.128.2:
10360.0 Mbits/sec
192.168.128.3:
9336.0 Mbits/sec
	
192.168.128.4:
9588.56 Mbits/sec
192.168.128.5:
10187.0 Mbits/sec
192.168.128.6:
10465.0 Mbits/sec
Disk Direct Read Write
Read: hdparm -t
HDD: Timing buffered disk reads: 618 MB in 3.01 seconds = 205.58 MB/sec
SSD: Timing buffered disk reads: 1510 MB in 3.00 seconds = 503.25 MB/sec
Write: direct dd writing zeros to the disk mount point
/usr/bin/dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/test conv=fdatasync
bs=4K count=10000
10000+0 records in
10000+0 records out
HDD: 40960000 bytes (41 MB, 39 MiB) copied, 0.257385 s, 159 MB/s
SSD: 40960000 bytes (41 MB, 39 MiB) copied, 0.117944 s, 347 MB/s
With RAID card cache enabled
HDD: 40960000 bytes (41 MB, 39 MiB) copied, 0.038911 s, 1.1 GB/s
30
OSD Bench (without external communication)
OSD Bench (1G write with 4M block default)
ceph tell osd.<num> bench
SSD
{
"bytes_written": 1073741824, (1G)
"blocksize": 4194304, (4M)
"bytes_per_sec": 260063627 (248M)
}
RAID 0 HDD with cache
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"bytes_per_sec": 464233957 (442M)
}
31
RADOS Bench: rados -p hdd3 bench 20 write -t 32 -b 4096 --no-cleanup
Maintaining 32 concurrent writes of 4096 bytes to objects of size 4096 for up
to 20 seconds or 0 objects
2017-07-11 13:59:02.450884 min lat: 0.000861743 max lat: 0.193705 avg
lat: 0.0033328
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
20 9 191944 191935 37.4835 36.7539 0.00325612 0.0033328
Total time run: 20.008286
Total writes made: 191944
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 37.4735
Stddev Bandwidth: 2.88217
Max bandwidth (MB/sec): 44.332
Min bandwidth (MB/sec): 32.9258
Average IOPS: 9593
Stddev IOPS: 737
Max IOPS: 11349
Min IOPS: 8429
Average Latency(s): 0.00333329
Stddev Latency(s): 0.00611436
Max latency(s): 0.193705
Min latency(s): 0.000861743
	
Byte Size = 4M
Total time run: 20.186976
Total writes made: 4985
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 987.766
Stddev Bandwidth: 178.349
Max bandwidth (MB/sec): 1200
Min bandwidth (MB/sec): 596
Average IOPS: 246
Stddev IOPS: 44
Max IOPS: 300
Min IOPS: 149
Average Latency(s): 0.129235
Stddev Latency(s): 0.228253
Max latency(s): 6.50408
Min latency(s): 0.0230147
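Because the writes above used --no-cleanup, the same objects can be read back for sequential and random read numbers before being removed; a sketch against the same pool:
# hypothetical sketch: read benchmarks over the objects left behind, then clean up
rados -p hdd3 bench 20 seq -t 32
rados -p hdd3 bench 20 rand -t 32
rados -p hdd3 cleanup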
	
32
RBD Bench: rbd write with seq and rand
rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern seq
elapsed: 2 ops: 51200 ops/sec: 17947.54 bytes/sec: 73513129.85
rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern seq
elapsed: 2 ops: 51200 ops/sec: 22725.83 bytes/sec: 93085010.58
rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern rand
elapsed: 12 ops: 51200 ops/sec: 4010.68 bytes/sec: 16427739.46
rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern rand
elapsed: 13 ops: 51200 ops/sec: 3844.75 bytes/sec: 15748101.36
HDD with RAID card cache runs faster sequentially than randomly
33
RBD Feature enable vs disable
With all RBD image features enabled: striping, object-map, fast-diff, deep-flatten, journaling
rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern seq
elapsed: 2 ops: 51200 ops/sec: 17143.89 bytes/sec: 70221393.28
rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern seq
elapsed: 2 ops: 51200 ops/sec: 22173.81 bytes/sec: 90823913.76
rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern rand
elapsed: 13 ops: 51200 ops/sec: 3711.78 bytes/sec: 15203449.44
rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16
--io-total 209715200 --io-pattern rand
elapsed: 33 ops: 51200 ops/sec: 1508.59 bytes/sec: 6179179.33
HDD with RAID card cache runs much worse randomly
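For reference, the dynamic image features can be toggled per image between runs (striping and deep-flatten are create-time only); a sketch, reusing the pool and image names above:
# hypothetical sketch: flip image features between bench-write runs
rbd feature enable ssd/rbd_test exclusive-lock object-map fast-diff journaling
rbd feature disable ssd/rbd_test journaling fast-diff object-map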
	
34
RBD Block XFS mount hdd (cache) 4K IOPS benchmark
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=write --bs=4K --size=200MB
--filename=/mnt/hdd/200M_data --numjobs=1 --name=seq_write_4K
write: io=204800KB, bw=17667KB/s, iops=4416, runt= 11592msec
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randwrite --bs=4K --size=200MB
--filename=/mnt/hdd/200M_data --numjobs=1 --name=write_4K
write: io=204800KB, bw=104118KB/s, iops=26029, runt= 1967msec
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=read --bs=4K --size=200MB
--filename=/mnt/hdd/200M_data --numjobs=1 --name=read_4K
read : io=204800KB, bw=71235KB/s, iops=17808, runt= 2875msec
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randread --bs=4K --size=200MB
--filename=/mnt/hdd/200M_data --numjobs=1 --name=read_4K
read : io=204800KB, bw=228317KB/s, iops=57079, runt= 897msec
35
RBD Block XFS mount ssd 4K IOPS benchmark
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=write --bs=4K --size=200MB
--filename=/mnt/ssd/200M_data --numjobs=1 --name=seq_write_4K
write: io=204800KB, bw=16394KB/s, iops=4098, runt= 12492msec
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randwrite --bs=4K --size=200MB
--filename=/mnt/ssd/200M_data --numjobs=1 --name=write_4K
write: io=204800KB, bw=104757KB/s, iops=26189, runt= 1955msec
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=read --bs=4K --size=200MB
--filename=/mnt/ssd/200M_data --numjobs=1 --name=read_4K
read : io=204800KB, bw=90619KB/s, iops=22654, runt= 2260msec
fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randread --bs=4K --size=200MB
--filename=/mnt/ssd/200M_data --numjobs=1 --name=read_4K
read : io=204800KB, bw=302959KB/s, iops=75739, runt= 676msec
36
RBD Block XFS hdd (cache) 1M throughput benchmark
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=write --bs=1M --size=4GB
--filename=/mnt/hdd/4G_data --numjobs=1 --name=seq_write_1M
write: io=4096.0MB, bw=1043.6MB/s, iops=1043, runt= 3925msec
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randwrite --bs=1M --size=4GB
--filename=/mnt/hdd/4G_data --numjobs=1 --name=write_1M
write: io=4096.0MB, bw=1146.6MB/s, iops=1146, runt= 3574msec
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=read --bs=1M --size=4GB
--filename=/mnt/hdd/4G_data --numjobs=1 --name=read_1M
read : io=4096.0MB, bw=316790KB/s, iops=309, runt= 13240msec
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randread --bs=1M --size=4GB
--filename=/mnt/hdd/4G_data --numjobs=1 --name=read_1M
read : io=4096.0MB, bw=183647KB/s, iops=179, runt= 22839msec
37
RBD Block XFS ssd 1M throughput benchmark
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=write --bs=1M --size=4GB
--filename=/mnt/ssd/4G_data --numjobs=1 --name=seq_write_4G
write: io=4096.0MB, bw=540225KB/s, iops=527, runt= 7764msec
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randwrite --bs=1M --size=4GB
--filename=/mnt/ssd/4G_data --numjobs=1 --name=write_4G
write: io=4096.0MB, bw=554142KB/s, iops=541, runt= 7569msec
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=read --bs=1M --size=4GB
--filename=/mnt/ssd/4G_data --numjobs=1 --name=read_4G
read : io=4096.0MB, bw=96998KB/s, iops=94, runt= 43241msec
fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randread --bs=1M --size=4GB
--filename=/mnt/ssd/4G_data --numjobs=1 --name=read_4G
read : io=4096.0MB, bw=113158KB/s, iops=110, runt= 37066msec
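When many of these fio runs accumulate, the bw/iops lines can be pulled out of the saved output files in one pass; a small sketch (the result directory is an assumption):
# hypothetical sketch: summarize a directory of fio result files
grep -H -E 'iops=|bw=' /tmp/fio_results/*.out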
38
Lttng tracing: regular RBD vs journaling-enabled RBD
Default:
SEC OPS OPS/SEC BYTES/SEC
elapsed: 0 ops: 1 ops/sec: 23.64 bytes/sec: 94.56
Journaling:
bench type write io_size 4 io_threads 1 bytes 4 pattern random
SEC OPS OPS/SEC BYTES/SEC
elapsed: 0 ops: 1 ops/sec: 9.54 bytes/sec: 38.14
39
Using Trace Compass
Eclipse-based UI, easy to use, cross-platform (Java)
http://tracecompass.org/
40
Regular Layering RBD “tp_librbd”
41
RBD with remote Journaling “tp_librbd_journ”
42
https://docs.google.com/drawings/d/1s9pRVygdPzBtQyYrvPueF8Cm6Xi1d4oy5gNMOjx2Wzo
43
(Trace diagram: the regular librbd write path. In the rbd client thread: Aio_write2() → AioCompletion(), waiting on lock(), copy bufferlist, aio_write() → ImageWriteRequest() → ::queue(). In the tp_librbd thread: ThreadPoolPointer::void_process() → ImageWriteRequest() → append_journal_event() → Send_object_cache_request().)
https://docs.google.com/drawings/d/1s9pRVygdPzBtQyYrvPueF8Cm6Xi1d4oy5gNMOjx2Wzo
44
(Trace diagram: the write path with journaling. In the tp_librbd thread: ThreadPoolPointer::void_process() → ImageWriteRequest() → append_journal_event() → Send_object_cache_request(), waiting on lock(). In the tp_librbd_journ thread: _void_process() → FunctionContext::finish() → ObjectWriteOperation::append() → IoCtx::aio_operate() → Context::complete() / fn_anonymous, waiting on lock().)
Let's look at a real trace
45
The Future?
以後的展望?
46
The Future
•  Set up partners in the community with production hardware that can run the non-
intrusive test cases on a regular basis, to compare different Ceph clusters
•  Include more client-side multi-node tracing ( container / cloud ready )
•  FileStore and BlueStore, with detailed features turned on and off
•  More in-depth ARM AArch64 Ceph performance reports
•  Upstream is already doing a lot of analysis on recovery-related timing options
•  Chinese translation of the performance analysis tools and methods
•  提供服務給社區運作中的叢集非破壞式的測試方法, 給大家參考和對比差異
•  提供更完整的多點跟蹤功能 (容器內/雲內)
•  針對Filestore和Bluestore做更深人的測試,觀察打開不同功能後的性能比較
•  再深人去了解ARM64運作ceph時的性能報告
•  上游正著手觀察Recovery有關的不同選項做分析
•  翻譯有關這麼多不同性能測試工具和方法的文檔
47
Hardware tuning
•  Use SSD or NVMe SSD for the journal
•  Put MON node data on SSD
•  Switch the HDD controller to write-back mode
•  Set the CPU to performance mode
•  Enable CPU HyperThreading

48
OS tuning
•  Network settings: use jumbo frames, configure the MTU
•  IO scheduler: deadline or noop
•  NUMA settings: distribute IRQs across CPUs per socket
•  iSCSI performance parameter tuning
•  Kernel parameter tuning
49
Ceph tuning
General principles
•  Tune performance according to the application workload
•  Use separate pools with an appropriate number of PGs
•  Disable OSD scrub during performance testing (see the sketch below)
•  Performance is set by the worst disk – profile each disk individually
•  Disable debug logging
•  Disable authentication
•  If the RAID controller has battery/flash protection:
•  osd mount options xfs = nobarrier, rw, noatime, inode64, logbufs=8
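A minimal sketch of the scrub and debug toggles mentioned above (these flags are cluster-wide; remember to re-enable them once the benchmark window is over):
# hypothetical sketch: quiet the cluster for a benchmark run
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph tell osd.* injectargs '--debug-osd 0/0 --debug-ms 0/0'
# ... run the benchmarks ...
ceph osd unset noscrub
ceph osd unset nodeep-scrub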
50
Questions and Answers
52
Weitere ähnliche Inhalte

Was ist angesagt?

Build a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong ZhuBuild a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong ZhuCeph Community
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on CephCeph Community
 
Ceph Day Melabourne - Community Update
Ceph Day Melabourne - Community UpdateCeph Day Melabourne - Community Update
Ceph Day Melabourne - Community UpdateCeph Community
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
MySQL Head to Head Performance
MySQL Head to Head PerformanceMySQL Head to Head Performance
MySQL Head to Head PerformanceKyle Bader
 
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraBackup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraCeph Community
 
Ceph, the future of Storage - Sage Weil
Ceph, the future of Storage - Sage WeilCeph, the future of Storage - Sage Weil
Ceph, the future of Storage - Sage WeilCeph Community
 
Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014Ian Colle
 
OpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of AlabamaOpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of AlabamaKamesh Pemmaraju
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red_Hat_Storage
 
Ceph Performance Profiling and Reporting
Ceph Performance Profiling and ReportingCeph Performance Profiling and Reporting
Ceph Performance Profiling and ReportingCeph Community
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph Community
 
Architecting Ceph Solutions
Architecting Ceph SolutionsArchitecting Ceph Solutions
Architecting Ceph SolutionsRed_Hat_Storage
 
2015 open storage workshop ceph software defined storage
2015 open storage workshop   ceph software defined storage2015 open storage workshop   ceph software defined storage
2015 open storage workshop ceph software defined storageAndrew Underwood
 
Ceph on All Flash Storage -- Breaking Performance Barriers
Ceph on All Flash Storage -- Breaking Performance BarriersCeph on All Flash Storage -- Breaking Performance Barriers
Ceph on All Flash Storage -- Breaking Performance BarriersCeph Community
 
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...Red_Hat_Storage
 
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Community
 
Walk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCWalk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCCeph Community
 

Was ist angesagt? (20)

Build a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong ZhuBuild a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
 
Ceph Day Melabourne - Community Update
Ceph Day Melabourne - Community UpdateCeph Day Melabourne - Community Update
Ceph Day Melabourne - Community Update
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
MySQL Head to Head Performance
MySQL Head to Head PerformanceMySQL Head to Head Performance
MySQL Head to Head Performance
 
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraBackup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
 
Ceph, the future of Storage - Sage Weil
Ceph, the future of Storage - Sage WeilCeph, the future of Storage - Sage Weil
Ceph, the future of Storage - Sage Weil
 
Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014
 
OpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of AlabamaOpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of Alabama
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
 
Ceph Performance Profiling and Reporting
Ceph Performance Profiling and ReportingCeph Performance Profiling and Reporting
Ceph Performance Profiling and Reporting
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der Ster
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
 
Architecting Ceph Solutions
Architecting Ceph SolutionsArchitecting Ceph Solutions
Architecting Ceph Solutions
 
2015 open storage workshop ceph software defined storage
2015 open storage workshop   ceph software defined storage2015 open storage workshop   ceph software defined storage
2015 open storage workshop ceph software defined storage
 
Ceph on All Flash Storage -- Breaking Performance Barriers
Ceph on All Flash Storage -- Breaking Performance BarriersCeph on All Flash Storage -- Breaking Performance Barriers
Ceph on All Flash Storage -- Breaking Performance Barriers
 
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
 
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
 
Walk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCWalk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoC
 

Ähnlich wie SUSE - performance analysis-with_ceph

Performance analysis with_ceph
Performance analysis with_cephPerformance analysis with_ceph
Performance analysis with_cephAlex Lau
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentDoKC
 
[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...
[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...
[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...Rakuten Group, Inc.
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreDataStax Academy
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSCeph Community
 
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamComputeIn-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamComputePatrick McGarry
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightRed_Hat_Storage
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightColleen Corrice
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…Sergey Dzyuban
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
Common Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementCommon Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementMatt Ray
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Community
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephDanielle Womboldt
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Azure VM 101 - HomeGen by CloudGen Verona - Marco Obinu
Azure VM 101 - HomeGen by CloudGen Verona - Marco ObinuAzure VM 101 - HomeGen by CloudGen Verona - Marco Obinu
Azure VM 101 - HomeGen by CloudGen Verona - Marco ObinuMarco Obinu
 

Ähnlich wie SUSE - performance analysis-with_ceph (20)

Performance analysis with_ceph
Performance analysis with_cephPerformance analysis with_ceph
Performance analysis with_ceph
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch government
 
[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...
[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...
[Rakuten TechConf2014] [F-4] At Rakuten, The Rakuten OpenStack Platform and B...
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
 
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamComputeIn-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer Spotlight
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer Spotlight
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Common Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementCommon Challenges in DevOps Change Management
Common Challenges in DevOps Change Management
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in Ceph
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Azure VM 101 - HomeGen by CloudGen Verona - Marco Obinu
Azure VM 101 - HomeGen by CloudGen Verona - Marco ObinuAzure VM 101 - HomeGen by CloudGen Verona - Marco Obinu
Azure VM 101 - HomeGen by CloudGen Verona - Marco Obinu
 

Mehr von inwin stack

Migrating to Cloud Native Solutions
Migrating to Cloud Native SolutionsMigrating to Cloud Native Solutions
Migrating to Cloud Native Solutionsinwin stack
 
Cloud Native 下的應用網路設計
Cloud Native 下的應用網路設計Cloud Native 下的應用網路設計
Cloud Native 下的應用網路設計inwin stack
 
當電子發票遇見 Google Cloud Function
當電子發票遇見 Google Cloud Function當電子發票遇見 Google Cloud Function
當電子發票遇見 Google Cloud Functioninwin stack
 
運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發inwin stack
 
The last mile of digital transformation AI大眾化:數位轉型的最後一哩
The last mile of digital transformation AI大眾化:數位轉型的最後一哩The last mile of digital transformation AI大眾化:數位轉型的最後一哩
The last mile of digital transformation AI大眾化:數位轉型的最後一哩inwin stack
 
整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案
整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案
整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案inwin stack
 
An Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native JourneyAn Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native Journeyinwin stack
 
維運Kubernetes的兩三事
維運Kubernetes的兩三事維運Kubernetes的兩三事
維運Kubernetes的兩三事inwin stack
 
Serverless framework on kubernetes
Serverless framework on kubernetesServerless framework on kubernetes
Serverless framework on kubernetesinwin stack
 
Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】inwin stack
 
Web後端技術的演變
Web後端技術的演變Web後端技術的演變
Web後端技術的演變inwin stack
 
以 Kubernetes 部屬 Spark 大數據計算環境
以 Kubernetes 部屬 Spark 大數據計算環境以 Kubernetes 部屬 Spark 大數據計算環境
以 Kubernetes 部屬 Spark 大數據計算環境inwin stack
 
Setup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes FederationSetup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes Federationinwin stack
 
基於 K8S 開發的 FaaS 專案 - riff
基於 K8S 開發的 FaaS 專案 - riff基於 K8S 開發的 FaaS 專案 - riff
基於 K8S 開發的 FaaS 專案 - riffinwin stack
 
使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster 使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster inwin stack
 
Extend the Kubernetes API with CRD and Custom API Server
Extend the Kubernetes API with CRD and Custom API ServerExtend the Kubernetes API with CRD and Custom API Server
Extend the Kubernetes API with CRD and Custom API Serverinwin stack
 
利用K8S實現高可靠應用
利用K8S實現高可靠應用利用K8S實現高可靠應用
利用K8S實現高可靠應用inwin stack
 
Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)
Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)
Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)inwin stack
 
Distributed tensorflow on kubernetes
Distributed tensorflow on kubernetesDistributed tensorflow on kubernetes
Distributed tensorflow on kubernetesinwin stack
 
Build your own kubernetes apiserver and resource type
Build your own kubernetes apiserver and resource typeBuild your own kubernetes apiserver and resource type
Build your own kubernetes apiserver and resource typeinwin stack
 

Mehr von inwin stack (20)

Migrating to Cloud Native Solutions
Migrating to Cloud Native SolutionsMigrating to Cloud Native Solutions
Migrating to Cloud Native Solutions
 
Cloud Native 下的應用網路設計
Cloud Native 下的應用網路設計Cloud Native 下的應用網路設計
Cloud Native 下的應用網路設計
 
當電子發票遇見 Google Cloud Function
當電子發票遇見 Google Cloud Function當電子發票遇見 Google Cloud Function
當電子發票遇見 Google Cloud Function
 
運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發
 
The last mile of digital transformation AI大眾化:數位轉型的最後一哩
The last mile of digital transformation AI大眾化:數位轉型的最後一哩The last mile of digital transformation AI大眾化:數位轉型的最後一哩
The last mile of digital transformation AI大眾化:數位轉型的最後一哩
 
整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案
整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案
整合Cloud Foundry 和 Kubernetes 技術打造企業級雲應用平台解決方案
 
An Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native JourneyAn Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native Journey
 
維運Kubernetes的兩三事
維運Kubernetes的兩三事維運Kubernetes的兩三事
維運Kubernetes的兩三事
 
Serverless framework on kubernetes
Serverless framework on kubernetesServerless framework on kubernetes
Serverless framework on kubernetes
 
Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】
 
Web後端技術的演變
Web後端技術的演變Web後端技術的演變
Web後端技術的演變
 
以 Kubernetes 部屬 Spark 大數據計算環境
以 Kubernetes 部屬 Spark 大數據計算環境以 Kubernetes 部屬 Spark 大數據計算環境
以 Kubernetes 部屬 Spark 大數據計算環境
 
Setup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes FederationSetup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes Federation
 
基於 K8S 開發的 FaaS 專案 - riff
基於 K8S 開發的 FaaS 專案 - riff基於 K8S 開發的 FaaS 專案 - riff
基於 K8S 開發的 FaaS 專案 - riff
 
使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster 使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster
 
Extend the Kubernetes API with CRD and Custom API Server
Extend the Kubernetes API with CRD and Custom API ServerExtend the Kubernetes API with CRD and Custom API Server
Extend the Kubernetes API with CRD and Custom API Server
 
利用K8S實現高可靠應用
利用K8S實現高可靠應用利用K8S實現高可靠應用
利用K8S實現高可靠應用
 
Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)
Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)
Integrate Kubernetes into CORD(Central Office Re-architected as a Datacenter)
 
Distributed tensorflow on kubernetes
Distributed tensorflow on kubernetesDistributed tensorflow on kubernetes
Distributed tensorflow on kubernetes
 
Build your own kubernetes apiserver and resource type
Build your own kubernetes apiserver and resource typeBuild your own kubernetes apiserver and resource type
Build your own kubernetes apiserver and resource type
 

Kürzlich hochgeladen

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Kürzlich hochgeladen (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

SUSE - performance analysis-with_ceph

  • 1. Performance Analysis with ceph 雲儲存性能分析 Alex Lau 劉俊賢 Software Consultant 研發工程師/顧問 (AvengerMoJo) alau@suse.com
  • 2. Agenda 議程 SES5 is base on Luminous – The Why? 為何分析性能? Ceph performance – The How? 如何分析性能? Ceph analysis – The What? 分析結果是怎樣? The Future? 大家有什麼展望?
  • 3. SES5 is base on Luminous – The Why? 為何分析性能?
  • 4. SUSE Enterprise Storage 5 base on Ceph Luminous 4 •  Support x86-64 and AArch64 •  Easy to use WebUI openATTIC 3.x •  Simple DeepSea Cluster orchestration •  BlueStore ready with data compression •  Cephfs and NFS-Ganesha ready
  • 5. Ceph Info 基本資料 Code Developers 782 Core Regular Casual 22 53 705 Total downloads 160,015,454 Unique downloads 21,264,047 Client Servers (Windows, Linux, Unix) RADOS (Common Object Store) Block Devices Server Object Storage File Interface Storage Server Storage Server Storage Server Storage Server Server Server Storage Server Storage Server Applications File Share OSD OSD OSD OSD OSD OSD NetworkCluster MON RBD iSCSI S3 SWIFT CephFS* Monitors MONMON 5
  • 6. SUSE Enterprise Storage其實做了什麼? 基於Ceph 2013: SUSE 加入 Ceph 社區 2015.01 : SUSE Enterprise Storage 1.0 2015.11: SUSE Enterprise Storage 2.0 2016.01: SUSE Enterprise Storage 2.1 2016.07: SUSE Enterprise Storage 3.0 2016: SUSE 收购 IT-Novum(主要開發存儲管理工具 openATTIC) 2016.09: SUSE 合併 HPE 軟體 2016.11: SUSE Enterprise Storage 4.0 2017.09: SUSE Enterprise Storage 5.0 (Beta Now) 6
  • 7. 7 SUSE Enterprise Storage Roadmap 2016 2017 2018 V3 V4 V5 SUSE Enterprise Storage 4 SUSE Enterprise Storage 5* Built On •  Ceph Jewel release •  SLES 12 SP 2 (Server) Manageability •  SES openATTIC management •  Initial Salt integration DeepSea Interoperability •  AArch64 •  CephFS (production use cases) •  NFS Ganesha (Tech Preview) •  NFS access to S3 buckets (Tech Preview) •  CIFS Samba (Tech Preview) •  RDMA/Infiniband (Tech Preview) Availability •  Multisite object replication •  Asynchronous block mirroring Built On •  Ceph Luminous release •  SLES 12 SP 3 (Server) Manageability •  SES openATTIC management phase 2 •  SUSE Manager integration Interoperability •  NFS Ganesha •  NFS access to S3 buckets Availability •  Asynchronous block mirroring •  Erasure coded block pool Efficiency •  BlueStore back-end •  Data compression
  • 8. SUSE 和 Ceph社區的關系 2013年建立Ceph開發團隊,正式加入Ceph的代碼支持 SUSE 是 Ceph理事會8大理事會員之一 一直以來代碼貢獻頭3名, 上次v12 Luminous發報第2多, 單個人第1 Ricardo 1.  10648 Ricardo Dias rdias@suse.com 2.  6422 Sage Weil sweil@redhat.com 有收到 Inktank 和 Mirantis 開發人員 加大研發投入: 2016年中收購存儲管理工具: it-novum (openATTIC) 8
  • 9. Customer needs! 客戶的需要! •  How high do we need? •  How much do we want to pay? •  Are we counting 4K Random Write? •  Do we need Erasure Coding? •  Can we have the number on BlueStore vs FileStore? •  What about cephfs over nfs? •  …… What’s our cluster IOPS? •  我們需要多高的IOPS? •  我們想花多少錢? •  我們只需要看4K隨機寫? •  我們有糾刪碼的需要嗎? •  我們能看看BlueStore對比FileStore嗎? •  我們能看看cephfs上的NFS能怎樣? •  ……. 我們的存儲叢集能跑多少IOPS? 9
  • 10. System Engineer needs! 售前售後工程師的需要! Putting them here is already show how much we care! 10
  • 11. 11
  • 12. The DevOps Needs!開發運維的需要! Can we have some base line number? Can we have tools to re-do performance analysis? BTW we need all that to our live cluster! 能否告訴正常時叢集該跑多快? 能否每天再測試報告叢集現在的性能? 不是分開測而是實際運作中的叢集性能哦! 12
  • 13. The Developers Needs! 開發者的需要! How much is BlueStore vs Filestore 4K, 16K, 64K, 1M, 4M write seq, random, read write performance in hdd vs ssd vs nvme pure and mixed pool, when an OSD is down, and the recovery process is started and meanwhile tracking how much memory needed when the capacity are 50% full without stopping the client keep writing to the cluster…. In short … very complicated analysis …. 不寫了.. 很多很多要求就是. p.s. 大家有硬體提供我們測試請找我, 會後我請你… J 13
  • 14. The Community Project Need! 社區的性能有關項目! Ceph Brag •  It requires CBT - https://github.com/ceph/cbt •  Setting up the CBT environment can be complicated •  It requires another multi-node setup •  It is an intrusive test that cannot be run against a live cluster •  http://ceph.com/performance/ Ceph Brag •  CBT是現在測試的標準 - https://github.com/ceph/cbt •  設定CBT環境並不簡單 •  需要多台機器進行測試 •  影響到內容不能向運作中的系統做測試 •  每週的性能相關會議 - http://ceph.com/performance/ 14
  • 15. The Goal •  Discover hardware and driver issues from time to time, to check for degradation •  Intrusive / non-intrusive performance testing before and after the cluster is set up •  Able to run against a live, production cluster without stopping operation •  Low dependencies, flexible, open source, and eventually able to share results and align with Ceph Brag •  Dynamically use different tools for monitoring and tracing, but orchestrated by a central admin node so it fits the container and cloud story •  可以不定時發現硬體或驅動出現問題 •  在叢集設定前可以具破壞性的測試但在運作中的叢集可以用一樣的工具去做同類但不具破壞性的做法 •  測試可以在正在運作中的系統上進行而不影響一般運作 •  不用依賴太多三方面的需要, 可以靈活使用, 開源, 最終可以和Ceph Brag共享一樣的發報方法 •  可以使用不同的管理和跟蹤工具, 同時可以利用中央管理工具預備可以和容器和雲管理整合 15
  • 16. Ceph Performance – The How? 如何分析性能? 16
  • 17. Hardware Testing Network: ping -c1 -q -W1 <peer ip> Jumbo frame check: ping -M do -s8972 -c1 -q -W1 <peer ip> Bandwidth: iperf3 -fm -A0 -t10 -c <server ip> -p <port> Hard disk: Read: hdparm -t <osd_partition> Write: dd if=/dev/zero of=<osd_data_mount_point>/test conv=fdatasync bs=4K count=10000 17
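The checks above can be chained into a small wrapper so every node runs the same baseline. A minimal sketch, assuming a peer node at 192.168.128.1 running iperf3 -s, an OSD disk at /dev/sdb, and MTU 9000 on the cluster network (all placeholders to adjust):

    #!/bin/bash
    # hardware baseline sketch - PEER, IPERF_PORT and DEV are assumptions
    PEER=192.168.128.1
    IPERF_PORT=5201
    DEV=/dev/sdb
    ping -c1 -q -W1 "$PEER"                           # plain latency, expect < 1 ms rtt
    ping -M do -s 8972 -c1 -q -W1 "$PEER"             # jumbo frame path check (8972 + 28 bytes header = 9000)
    iperf3 -fm -A0 -t10 -c "$PEER" -p "$IPERF_PORT"   # bandwidth against the iperf3 server on PEER
    hdparm -t "$DEV"                                  # buffered sequential read from the raw disk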
  • 18. Cluster Testing OSD Bench (1G write with 4M block by default) ceph tell osd.0 bench RADOS Bench: rados -p <pool> bench <seconds> <write|seq|rand> -t <threads> --no-cleanup RBD bench: rbd -p <pool> bench-write <image> --io-size <e.g. 4096> --io-threads <1,4,16 etc.> --io-total <total size, e.g. 209715200> --io-pattern <rand|seq> Newer versions have --io-type: rbd -p <pool> bench --io-type read rbd_test --io-size 4096 --io-threads 1 --io-total 209715200 --io-pattern rand 18
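As a usage sketch, the RADOS bench calls can be run back-to-back against a dedicated test pool; the pool name bench and the PG count are assumptions, and the seq/rand reads need the objects left behind by --no-cleanup:

    ceph osd pool create bench 128 128
    rados -p bench bench 20 write -b 4096 -t 32 --no-cleanup   # 4K writes, 32 threads
    rados -p bench bench 20 seq -t 32                          # sequential reads of those objects
    rados -p bench bench 20 rand -t 32                         # random reads
    rados -p bench cleanup                                     # drop the benchmark objects
    # deleting the pool requires mon_allow_pool_delete=true
    ceph osd pool delete bench bench --yes-i-really-really-mean-it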
  • 19. Client Testing FIO can be used against any of the following: fio --ioengine=libaio --iodepth=32 --direct=1 --rw=write --bs=4K --size=200MB --filename=/mnt/200M_data --numjobs=1 --name=cephfs_seqwrite_4K --output=cephfs_write_seq_4K.out RBD block: --ioengine can be the rbd engine (librbd), or the mapped/mounted image tested with libaio as the default iSCSI export: can be tested from Windows or any other OS that supports iSCSI CephFS mount: mount.cephfs mon1,mon2,mon3:/ /mnt/ NFS/CIFS mount: mount.nfs nfs-server-ip:/ /mnt/ 19
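For the RBD case, fio's rbd ioengine talks to librbd directly, so no mapping or filesystem is needed (fio must be built with rbd support). A sketch, assuming pool ssd, an existing image rbd_test, and the admin keyring:

    fio --ioengine=rbd --clientname=admin --pool=ssd --rbdname=rbd_test \
        --direct=1 --rw=randwrite --bs=4K --iodepth=32 --numjobs=1 \
        --size=200MB --name=rbd_randwrite_4K --output=rbd_randwrite_4K.out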
  • 20. Lttng Testing Kernel tracing made easy! lttng create -o . lttng enable-channel --num-subbuf 16 --subbuf-size 8M -k c0 lttng enable-event --kernel --all lttng enable-event --syscall -a -k -c c0 lttng start <do something> lttng stop lttng destroy 20
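Wrapped around a workload, the session looks roughly like this; the session name, output directory, and the rbd bench used as the traced workload are assumptions:

    #!/bin/bash
    set -e
    lttng create rbd-trace -o /tmp/rbd-trace
    lttng enable-channel --kernel --num-subbuf 16 --subbuf-size 8M c0
    lttng enable-event --kernel --all -c c0
    lttng enable-event --kernel --syscall -a -c c0
    lttng start
    rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200   # the workload to trace
    lttng stop
    lttng destroy
    babeltrace /tmp/rbd-trace > /tmp/rbd-trace.txt   # text dump; the CTF trace can also be opened in Trace Compass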
  • 21. Mixing with Salt It is already in the next version of DeepSea Network: salt-run net.ping exclude=<non-cluster-ip> salt-run net.jumbo_ping exclude=<non-cluster-ip> Bandwidth: salt-run net.iperf cluster=ceph output=full salt-run net.iperf exclude=<non-cluster-ip> 21
  • 22. Experiment https://github.com/AvengerMoJo/Ceph-Saltstack Since some of these tests can be intrusive to OSDs, they are not in DeepSea yet. salt "*" ceph_sles.disk_info salt "*" ceph_sles.bench_disk /dev/sdb /dev/sdc /dev/sde (dd write to a mount point the partition can be mounted on) 22
  • 23. Lttng ust user-space tracing Client-side tracing only, with librbd Compile librbd with -finstrument-functions Compile ceph with -DWITH_LTTNG=ON export LD_PRELOAD=/usr/lib64/liblttng-ust-cyg-profile.so Call rbd to do something <write> Stop tracing and collect the result for reporting Salt can run the tracing on multiple nodes and collect all the reports for you 23
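A minimal sketch of that flow, assuming librbd was rebuilt as described above and the trace is written to /tmp/rbd-ust; the session name and the rbd bench workload are placeholders:

    lttng create rbd-ust -o /tmp/rbd-ust
    lttng enable-event --userspace 'lttng_ust_cyg_profile*'   # function entry/exit events from the preloaded helper
    lttng start
    LD_PRELOAD=/usr/lib64/liblttng-ust-cyg-profile.so \
        rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 1 --io-total 20971520
    lttng stop
    lttng destroy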
  • 24. Collecting Reporting with Salt Call salt to start lttng on every node of the cluster: salt-run lttng.run cmd="[fio --ioengine=libaio --iodepth=256 --direct=1 --rw=write --bs=4K --size=20MB --filename=/mnt/cephfs/20M_data --numjobs=1 --name=write_4K --output=/tmp/cephfs/write_seq_4K.out]" cmd_server=salt-master Then collect all the results back on the master. 24
  • 25. Ceph Analysis – The What? 分析結果是怎樣? 25
  • 26. The following examples use hardware from SUSE Enterprise Storage partner HPE (OEM): Apollo 4200 * 3 (OSD nodes, with 18 HDD and 2 SSD); ProLiant DL380/DL360 * 3 (MON nodes)
  • 27. Network Ping Regular ping: ping -c1 -q -W1 (an rtt higher than 1 ms may indicate something is wrong) --- 192.168.128.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.298/0.298/0.298/0.000 ms Jumbo frame: ping -M do -s8972 -c1 -q -W1 (roughly double the normal ping rtt) --- 192.168.128.5 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.649/0.649/0.649/0.000 ms 27
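A small loop makes the 1 ms rule easy to check across the whole cluster; a sketch, with the node list as a placeholder:

    #!/bin/bash
    # print the average rtt per node so outliers stand out - NODES is an assumption
    NODES="192.168.128.1 192.168.128.2 192.168.128.3"
    for n in $NODES; do
        avg=$(ping -c1 -q -W1 "$n" | awk -F'/' '/^rtt/ {print $5}')   # 5th "/"-separated field is the avg
        echo "$n avg rtt ${avg:-timeout} ms"
    done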
  • 28. Network Bandwidth Bandwidth: iperf3 -fm -A0 -t10 -c <server ip> -p <port> Connecting to host 192.168.128.1, port 5201 [ 4] local 192.168.128.77 port 41008 connected to 192.168.128.1 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 119 MBytes 1002 Mbits/sec 0 368 KBytes [ 4] 1.00-2.00 sec 118 MBytes 990 Mbits/sec 0 402 KBytes [ 4] 2.00-3.00 sec 118 MBytes 992 Mbits/sec 0 420 KBytes [ 4] 3.00-4.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes [ 4] 4.00-5.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes [ 4] 5.00-6.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes [ 4] 6.00-7.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes [ 4] 7.00-8.00 sec 118 MBytes 992 Mbits/sec 0 420 KBytes [ 4] 8.00-9.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes [ 4] 9.00-10.00 sec 118 MBytes 991 Mbits/sec 0 420 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 1.16 GBytes 992 Mbits/sec 0 sender [ 4] 0.00-10.00 sec 1.15 GBytes 991 Mbits/sec receiver 28
  • 29. DeepSea Network Test salt-run net.ping exclude=<non-cluster-ip> Succeeded: 8 addresses from 7 minions average rtt 0.15 ms salt-run net.jumbo_ping exclude=<non-cluster-ip> Succeeded: 8 addresses from 7 minions average rtt 0.26 ms salt-run net.iperf exclude=192.168.128.9 cluster=ceph output=full 29 192.168.128.1: 8644.0 Mbits/sec 192.168.128.2: 10360.0 Mbits/sec 192.168.128.3: 9336.0 Mbits/sec 192.168.128.4: 9588.56 Mbits/sec 192.168.128.5: 10187.0 Mbits/sec 192.168.128.6: 10465.0 Mbits/sec
  • 30. Disk Direct Read Write Read: hdparm -t HDD: Timing buffered disk reads: 618 MB in 3.01 seconds = 205.58 MB/sec SSD: Timing buffered disk reads: 1510 MB in 3.00 seconds = 503.25 MB/sec Write: dd zeros directly into the disk mount point /usr/bin/dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/test conv=fdatasync bs=4K count=10000 10000+0 records in 10000+0 records out HDD: 40960000 bytes (41 MB, 39 MiB) copied, 0.257385 s, 159 MB/s SSD: 40960000 bytes (41 MB, 39 MiB) copied, 0.117944 s, 347 MB/s With RAID card cache enabled HDD: 40960000 bytes (41 MB, 39 MiB) copied, 0.038911 s, 1.1 GB/s 30
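To repeat the dd write test on every OSD of a node in one go, a sketch assuming the default FileStore data path layout /var/lib/ceph/osd/ceph-<id>:

    #!/bin/bash
    for osd in /var/lib/ceph/osd/ceph-*; do
        echo "== $osd =="
        dd if=/dev/zero of="$osd/dd_test" conv=fdatasync bs=4K count=10000 2>&1 | tail -n1
        rm -f "$osd/dd_test"     # remove the ~40 MB test file afterwards
    done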
  • 31. OSD Bench (without external communication) OSD Bench (1G write with 4M block default) ceph tell osd.<num> bench SSD { "bytes_written": 1073741824, (1G) "blocksize": 4194304, (4M) "bytes_per_sec": 260063627 (248M) } RAID 0 HDD with cache { "bytes_written": 1073741824, "blocksize": 4194304, "bytes_per_sec": 464233957 (442M) } 31
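The total size and block size can also be passed explicitly (total bytes, then bytes per write); note that small block sizes are capped by osd_bench_small_size_max_iops, so a 4M-block run is the safer sketch:

    ceph tell osd.0 bench                      # defaults: 1 GiB total, 4 MiB blocks
    ceph tell osd.0 bench 268435456 4194304    # 256 MiB total, 4 MiB blocks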
  • 32. RADOS Bench: rados -p hdd3 bench 20 write -t 32 -b 4096 --no-cleanup Maintaining 32 concurrent writes of 4096 bytes to objects of size 4096 for up to 20 seconds or 0 objects 2017-07-11 13:59:02.450884 min lat: 0.000861743 max lat: 0.193705 avg lat: 0.0033328 sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s) 20 9 191944 191935 37.4835 36.7539 0.00325612 0.0033328 Total time run: 20.008286 Total writes made: 191944 Write size: 4096 Object size: 4096 Bandwidth (MB/sec): 37.4735 Stddev Bandwidth: 2.88217 Max bandwidth (MB/sec): 44.332 Min bandwidth (MB/sec): 32.9258 Average IOPS: 9593 Stddev IOPS: 737 Max IOPS: 11349 Min IOPS: 8429 Average Latency(s): 0.00333329 Stddev Latency(s): 0.00611436 Max latency(s): 0.193705 Min latency(s): 0.000861743 Byte Size = 4M Total time run: 20.186976 Total writes made: 4985 Write size: 4194304 Object size: 4194304 Bandwidth (MB/sec): 987.766 Stddev Bandwidth: 178.349 Max bandwidth (MB/sec): 1200 Min bandwidth (MB/sec): 596 Average IOPS: 246 Stddev IOPS: 44 Max IOPS: 300 Min IOPS: 149 Average Latency(s): 0.129235 Stddev Latency(s): 0.228253 Max latency(s): 6.50408 Min latency(s): 0.0230147 32
  • 33. RBD Bench: rbd write with seq and rand rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern seq elapsed: 2 ops: 51200 ops/sec: 17947.54 bytes/sec: 73513129.85 rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern seq elapsed: 2 ops: 51200 ops/sec: 22725.83 bytes/sec: 93085010.58 rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern rand elapsed: 12 ops: 51200 ops/sec: 4010.68 bytes/sec: 16427739.46 rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern rand elapsed: 13 ops: 51200 ops/sec: 3844.75 bytes/sec: 15748101.36 HDD with RAID card cache runs faster with sequential than with random I/O 33
  • 34. RBD Feature enable vs disable With all the rbd features enabled: striping, object-map, fast-diff, deep-flatten, journaling rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern seq elapsed: 2 ops: 51200 ops/sec: 17143.89 bytes/sec: 70221393.28 rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern seq elapsed: 2 ops: 51200 ops/sec: 22173.81 bytes/sec: 90823913.76 rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern rand elapsed: 13 ops: 51200 ops/sec: 3711.78 bytes/sec: 15203449.44 rbd -p hdd bench-write rbd_test --io-size 4096 --io-threads 16 --io-total 209715200 --io-pattern rand elapsed: 33 ops: 51200 ops/sec: 1508.59 bytes/sec: 6179179.33 HDD with RAID cache runs much worse with random I/O 34
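A sketch of how a test image might be prepared for the two runs; the pool and image names are assumptions, journaling requires exclusive-lock, and striping/deep-flatten can only be set at create time:

    rbd create ssd/rbd_test --size 1024 \
        --image-feature layering,exclusive-lock,object-map,fast-diff
    rbd feature enable ssd/rbd_test journaling      # run the "features on" benchmark
    rbd -p ssd bench-write rbd_test --io-size 4096 --io-threads 16 \
        --io-total 209715200 --io-pattern rand
    rbd feature disable ssd/rbd_test journaling     # back to the lean profile for comparison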
  • 35. RBD Block XFS mount hdd (cache) 4K IOPS benchmark fio --ioengine=libaio --iodepth=256 --direct=1 --rw=write --bs=4K --size=200MB --filename=/mnt/hdd/200M_data --numjobs=1 --name=seq_write_4K write: io=204800KB, bw=17667KB/s, iops=4416, runt= 11592msec fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randwrite --bs=4K --size=200MB --filename=/mnt/hdd/200M_data --numjobs=1 --name=write_4K write: io=204800KB, bw=104118KB/s, iops=26029, runt= 1967msec fio --ioengine=libaio --iodepth=256 --direct=1 --rw=read --bs=4K --size=200MB --filename=/mnt/hdd/200M_data --numjobs=1 --name=read_4K read : io=204800KB, bw=71235KB/s, iops=17808, runt= 2875msec fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randread --bs=4K --size=200MB --filename=/mnt/hdd/200M_data --numjobs=1 --name=read_4K read : io=204800KB, bw=228317KB/s, iops=57079, runt= 897msec 35
  • 36. RBD Block XFS mount ssd 4K IOPS benchmark fio --ioengine=libaio --iodepth=256 --direct=1 --rw=write --bs=4K --size=200MB --filename=/mnt/ssd/200M_data --numjobs=1 --name=seq_write_4K write: io=204800KB, bw=16394KB/s, iops=4098, runt= 12492msec fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randwrite --bs=4K --size=200MB --filename=/mnt/ssd/200M_data --numjobs=1 --name=write_4K write: io=204800KB, bw=104757KB/s, iops=26189, runt= 1955msec fio --ioengine=libaio --iodepth=256 --direct=1 --rw=read --bs=4K --size=200MB --filename=/mnt/ssd/200M_data --numjobs=1 --name=read_4K read : io=204800KB, bw=90619KB/s, iops=22654, runt= 2260msec fio --ioengine=libaio --iodepth=256 --direct=1 --rw=randread --bs=4K --size=200MB --filename=/mnt/ssd/200M_data --numjobs=1 --name=read_4K read : io=204800KB, bw=302959KB/s, iops=75739, runt= 676msec 36
  • 37. RBD Block XFS hdd (cache) 1M Throughput benchmark fio --ioengine=libaio --iodepth=16 --direct=1 --rw=write --bs=1M --size=4GB --filename=/mnt/hdd/4G_data --numjobs=1 --name=seq_write_1M write: io=4096.0MB, bw=1043.6MB/s, iops=1043, runt= 3925msec fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randwrite --bs=1M --size=4GB --filename=/mnt/hdd/4G_data --numjobs=1 --name=write_1M write: io=4096.0MB, bw=1146.6MB/s, iops=1146, runt= 3574msec fio --ioengine=libaio --iodepth=16 --direct=1 --rw=read --bs=1M --size=4GB --filename=/mnt/hdd/4G_data --numjobs=1 --name=read_1M read : io=4096.0MB, bw=316790KB/s, iops=309, runt= 13240msec fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randread --bs=1M --size=4GB --filename=/mnt/hdd/4G_data --numjobs=1 --name=read_1M read : io=4096.0MB, bw=183647KB/s, iops=179, runt= 22839msec 37
  • 38. RBD Block XFS ssd 1M Throughput benchmark fio --ioengine=libaio --iodepth=16 --direct=1 --rw=write --bs=1M --size=4GB --filename=/mnt/ssd/4G_data --numjobs=1 --name=seq_write_4G write: io=4096.0MB, bw=540225KB/s, iops=527, runt= 7764msec fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randwrite --bs=1M --size=4GB --filename=/mnt/ssd/4G_data --numjobs=1 --name=write_4G write: io=4096.0MB, bw=554142KB/s, iops=541, runt= 7569msec fio --ioengine=libaio --iodepth=16 --direct=1 --rw=read --bs=1M --size=4GB --filename=/mnt/ssd/4G_data --numjobs=1 --name=read_4G read : io=4096.0MB, bw=96998KB/s, iops=94, runt= 43241msec fio --ioengine=libaio --iodepth=16 --direct=1 --rw=randread --bs=1M --size=4GB --filename=/mnt/ssd/4G_data --numjobs=1 --name=read_4G read : io=4096.0MB, bw=113158KB/s, iops=110, runt= 37066msec 38
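The fio invocations from the last four slides can also be kept in one job file so the IOPS and throughput runs stay identical between HDD and SSD; a sketch using the same file names and sizes as above (stonewall keeps the jobs sequential):

    ; bench.fio - run with: fio bench.fio
    [global]
    ioengine=libaio
    direct=1
    numjobs=1

    [iops_randwrite_4K]
    filename=/mnt/ssd/200M_data
    rw=randwrite
    bs=4K
    size=200MB
    iodepth=256

    [throughput_write_1M]
    stonewall
    filename=/mnt/ssd/4G_data
    rw=write
    bs=1M
    size=4GB
    iodepth=16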
  • 39. Lttng tracing: regular RBD vs journaling-enabled RBD Default: SEC OPS OPS/SEC BYTES/SEC elapsed: 0 ops: 1 ops/sec: 23.64 bytes/sec: 94.56 Journaling: bench type write io_size 4 io_threads 1 bytes 4 pattern random SEC OPS OPS/SEC BYTES/SEC elapsed: 0 ops: 1 ops/sec: 9.54 bytes/sec: 38.14 39
  • 40. Using Trace Compass Eclipse-based UI, easy to use, cross-platform (Java) http://tracecompass.org/ 40
  • 41. Regular Layering RBD “tp_librbd” 41
  • 42. RBD with remote Journaling “tp_librbd_journ” 42
  • 43. https://docs.google.com/drawings/d/1s9pRVygdPzBtQyYrvPueF8Cm6Xi1d4oy5gNMOjx2Wzo 43 [Call-flow diagram: an rbd aio_write2() creates an AioCompletion(), copies the bufferlist, and queues an ImageWriteRequest() to the tp_librbd thread pool; the worker (ThreadPoolPointer::void_process()) runs the ImageWriteRequest(), appends the journal event and sends the object cache request, with waits on locks in between]
  • 44. https://docs.google.com/drawings/d/1s9pRVygdPzBtQyYrvPueF8Cm6Xi1d4oy5gNMOjx2Wzo 44 [Call-flow diagram, continued: append_journal_event() in the tp_librbd worker hands off to the tp_librbd_journ thread pool, which appends via ObjectWriteOperation::append() and IoCtx::aio_operate() and completes through FunctionContext::finish(), Context::complete() and an anonymous callback (fn_anonymous), again with waits on locks in between]
  • 45. Let's look at a real trace 45
  • 47. The Future •  Set up partnerships in the community with production hardware that can run the non-intrusive test cases on a regular basis, to compare different Ceph clusters •  Include more client-side multi-node tracing (container / cloud ready) •  FileStore and BlueStore with detailed features on and off •  More in-depth ARM AArch64 Ceph performance reports •  Upstream is already doing a lot of analysis of recovery-related timing options •  Chinese translation of the documentation for these performance analysis tools and methods •  提供服務給社區運作中的叢集非破壞式的測試方法, 給大家參考和對比差異 •  提供更完整的多點跟蹤功能 (容器內/雲內) •  針對Filestore和Bluestore做更深入的測試,觀察打開不同功能後的性能比較 •  再深入去了解ARM64運作ceph時的性能報告 •  上游正著手觀察Recovery有關的不同選項做分析 •  翻譯有關這麼多不同性能測試工具和方法的文檔 47
  • 48. Hardware Tuning 硬體調優 •  Use SSD or NVMe SSD for the journal •  Put the MON node data on SSD •  Set the HDD disk controller to write-back mode •  Set the CPU to performance mode •  Enable CPU Hyper-Threading 48
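Hedged command sketches for the settings above (device names are placeholders; RAID controllers use their own CLI for write-back cache):

    cpupower frequency-set -g performance     # CPU governor -> performance
    lscpu | grep -i 'thread(s) per core'      # verify Hyper-Threading is active
    hdparm -W1 /dev/sdb                       # enable the write cache on a plain HBA-attached disk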
  • 49. OS Tuning OS 調優 •  Network settings: use jumbo frames, configure the MTU •  IO scheduler: deadline or noop •  NUMA settings: distribute IRQs across CPUs per socket •  Tune iSCSI performance parameters •  Tune kernel parameters 49
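A few hedged examples of those knobs (interface and disk names are assumptions):

    ip link set dev eth0 mtu 9000                       # jumbo frames on the cluster NIC
    echo deadline > /sys/block/sdb/queue/scheduler      # deadline for HDD; noop is common for SSD
    cat /sys/class/net/eth0/device/numa_node            # which NUMA node owns the NIC, for IRQ pinning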
  • 50. Ceph Tuning Ceph調優 General principles •  Tune performance according to the application workload •  Use separate pools with an appropriate number of PGs •  Disable OSD scrub during performance testing •  Performance is determined by the worst disk – profile every disk •  Disable debug logging •  Disable authentication •  If the RAID controller has power-loss protection: •  osd mount options xfs = nobarrier, rw, noatime, inode64, logbufs=8 50
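A sketch of how those principles map to commands and ceph.conf settings; only use the auth/nobarrier lines on a test cluster or with power-loss-protected cache, as noted above:

    ceph osd set noscrub && ceph osd set nodeep-scrub       # pause scrubbing during the benchmark
    # ... run the tests ...
    ceph osd unset noscrub && ceph osd unset nodeep-scrub

    # ceph.conf fragment
    [global]
    auth_cluster_required = none
    auth_service_required = none
    auth_client_required = none
    debug_ms = 0/0
    debug_osd = 0/0
    [osd]
    osd_mount_options_xfs = nobarrier,rw,noatime,inode64,logbufs=8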
  • 52. 52