OpenInfra Day
Ceph Issue Case Study
2019.07.11
Open Source Consulting, 이영주
( yjlee@osci.kr )
Contents
01. Architecture
02. Issue
03. Resolution
Architecture
01
01. Architecture
● Overall architecture
(Diagram: Deploy node, Controller Node, Compute Node, Storage Node, Firewall, Router)
01. Architecture
● Ceph architecture
(Diagram: three OSD nodes ceph-osd1, ceph-osd2, ceph-osd3 hosting osd.0 through osd.8, and three monitors ceph-mon1, ceph-mon2, ceph-mon3)
01. Architecture
● Ceph object flow
PG: Placement Group — the group of OSDs an object is stored on; the number of members follows the replica count.
OSD: Object Storage Daemon — where objects are ultimately stored.
Monitor: watches for changes among the Ceph OSDs and builds the CRUSH map.
[root@ceph-osd01 ~]# rados ls -p vms
rbd_data.1735e637a64d5.0000000000000000
rbd_header.1735e637a64d5
rbd_directory
rbd_children
rbd_info
rbd_data.1735e637a64d5.0000000000000003
rbd_data.1735e637a64d5.0000000000000002
rbd_id.893f4f3d-f6d9-4521-997c-72caa861ac24_disk
rbd_data.1735e637a64d5.0000000000000001
rbd_object_map.1735e637a64d5
[root@ceph-osd01 ~]#
The default object size is 4 MB.
CRUSH: Controlled Replication Under Scalable Hashing — the algorithm used to distribute objects across OSDs.
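To see where CRUSH actually places one of these objects, the mapping can be queried directly from the cluster. A minimal sketch, assuming the vms pool and the object names from the rados ls output above:

# Ask the cluster which PG and which OSDs an object maps to; output looks like
#   osdmap eNNN pool 'vms' (N) object '...' -> pg N.xxxx (N.xxx) -> up ([8,4,7], p8) acting ([8,4,7], p8)
ceph osd map vms rbd_data.1735e637a64d5.0000000000000000

# Every data object of an RBD image shares the image's block_name_prefix,
# which `rbd info` reports for the image.
rbd info vms/893f4f3d-f6d9-4521-997c-72caa861ac24_disk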
Issue
02
02. Issue
Ceph architecture
(Diagram: ceph-osd1, ceph-osd2, ceph-osd3 with osd.0 through osd.8, and monitors ceph-mon1, ceph-mon2, ceph-mon3)
One of the OSDs reached 90% utilization, so reads and writes were blocked.
[root@osc-ceph01 ~]# ceph pg dump |grep -i full_ratio
dumped all in format plain
full_ratio 0.9
nearfull_ratio 0.8
[root@osc-ceph01 ~]# ceph daemon mon.`hostname` config show |grep -i osd_full_ratio
"mon_osd_full_ratio": "0.9",
[root@osc-ceph01 ~]#
02. Issue
- Ceph community troubleshooting guide
Reference: http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-osd/#no-free-drive-space
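A minimal sketch of the checks that guide walks through before deciding whether to delete data, add OSDs, or temporarily raise the ratios (the ratio command differs by release: pre-Luminous clusters use `ceph pg set_full_ratio`, Luminous and later use `ceph osd set-full-ratio`):

# Cluster-wide and per-pool usage
ceph df

# Per-OSD utilization; shows which OSD crossed the 0.9 full_ratio
ceph osd df

# Emergency headroom only, and only until space is actually freed
# (pre-Luminous syntax shown):
# ceph pg set_full_ratio 0.95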
02. Issue
Ceph architecture
(Diagram: ceph-osd1, ceph-osd2, ceph-osd3 with osd.0 through osd.8, and monitors ceph-mon1, ceph-mon2, ceph-mon3)
Deleted pg 1.11f from osd.8 to free up space.
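The slide does not show the exact commands used to drop the PG copy; what follows is a hedged sketch of how a single PG is usually exported and removed from a stopped FileStore OSD with ceph-objectstore-tool (paths are illustrative):

# Stop the full OSD so its object store is quiescent
systemctl stop ceph-osd@8

# Keep a backup of the PG before removing it
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 \
  --journal-path /var/lib/ceph/osd/ceph-8/journal \
  --pgid 1.11f --op export --file /root/pg-1.11f.export

# Remove the local copy to free space, then bring the OSD back
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 \
  --journal-path /var/lib/ceph/osd/ceph-8/journal \
  --pgid 1.11f --op remove
systemctl start ceph-osd@8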
02. Issue
[root@ceph-mon02 ~]# ceph -s
cluster f5078395-0236-47fd-ad02-8a6daadc7475
health HEALTH_ERR
1 pgs are stuck inactive for more than 300 seconds
162 pgs backfill_wait
37 pgs backfilling
322 pgs degraded
1 pgs down
2 pgs peering
4 pgs recovering
119 pgs recovery_wait
1 pgs stuck inactive
322 pgs stuck unclean
199 pgs undersized
recovery 592647/43243812 objects degraded (1.370%)
recovery 488046/43243812 objects misplaced (1.129%)
1 mons down, quorum 0,2 ceph-mon01,ceph-mon03
monmap e1: 3 mons at {ceph-mon01=10.10.50.201:6789/0,ceph-mon02=10.10.50.202:6789/0,ceph-mon03=10.10.50.203:6789/0}
election epoch 480, quorum 0,2 ceph-mon01,ceph-mon03
osdmap e27606: 128 osds: 125 up, 125 in; 198 remapped pgs
flags sortbitwise
pgmap v58287759: 10240 pgs, 4 pools, 54316 GB data, 14076 kobjects
157 TB used, 71440 GB / 227 TB avail
592647/43243812 objects degraded (1.370%)
488046/43243812 objects misplaced (1.129%)
9916 active+clean
162 active+undersized+degraded+remapped+wait_backfill
119 active+recovery_wait+degraded
37 active+undersized+degraded+remapped+backfilling
4 active+recovering+degraded
1 down+peering
1 peering
1 PG has been unreachable for more than 300 seconds (1.11f) ...
162 PGs are waiting for backfill because OSDs went down
37 PGs are backfilling because they fell outside the pglog range
322 PGs are degraded because they cannot hold 3 full copies
The 1 down PG in question... (1.11f)
2 PGs are peering, deciding their state (recovery vs. backfill)
119 PGs are waiting for recovery
4 PGs are recovering from the pglog (I/O to those PGs is blocked)
1 PG is inactive because no OSD holding it is up (1.11f)
322 PGs are stuck unclean, short of full 3-way replication
199 PGs are undersized, below the pool's replica count
1 monitor is down
The 3 OSDs holding pg 1.11f are down
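To tie each of those counters back to concrete PGs, the per-PG detail can be pulled out directly; a minimal sketch:

# Which health checks are firing and which PGs they name
ceph health detail

# PGs stuck in a bad state past the threshold
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# Full peering/recovery state of the problem PG
ceph pg 1.11f query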
02. Issue
Architecture
(Diagram: ceph-osd1, ceph-osd2, ceph-osd3 with osd.0 through osd.8, and monitors ceph-mon1, ceph-mon2, ceph-mon3)
Images pool: holds the OpenStack images.
Volumes pool: pg 1.11f holds a small piece of every OpenStack volume.
When even a single PG is down, none of the data in that pool can be used.
[root@osc-ceph01 ~]# ceph pg dump |head
dumped all in format plain
...
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary
last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
1.11f 0 0 0 0 0 0 3080 3080 active+clean 2019-07-10 08:12:46.623592 921'8580 10763:10591 [8,4,7] 8 [8,4,7] 8
921'8580 2019-07-10 08:12:46.623572 921'8580 2019-07-07 19:44:32.652377
...
The primary PG is responsible for all I/O.
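The up/acting set and the primary of a PG can also be read without dumping the whole PG map; a minimal sketch:

# Prints something like: osdmap eNNN pg 1.11f (1.11f) -> up [8,4,7] acting [8,4,7]
ceph pg map 1.11f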
Resolution
03
03. Resolution
writeout_from: 30174'649621, trimmed:
-1> 2018-10-24 15:28:44.487997 7fb622e2d700 5 write_log with: dirty_to: 0'0,
dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: false, divergent_priors: 0,
writeout_from: 30174'593316, trimmed:
0> 2018-10-24 15:28:44.502006 7fb61de23700 -1 osd/SnapMapper.cc: In function
'void SnapMapper::add_oid(const hobject_t&,
const std::set<snapid_t>&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
thread 7fb61de23700 time 2018-10-24 15:28:44.497739
osd/SnapMapper.cc: 228: FAILED assert(r == -2)
Analysis result...
A conflict between the three replicated copies of the PG brings down the OSDs that hold it.
This is fixed in Red Hat Ceph 3.1 (Luminous), so upgrade!!
However...
- Red Hat OpenStack 9 (Mitaka) does not support Red Hat Ceph 3.1.
- Before upgrading to Red Hat Ceph 3.1, OpenStack must first be upgraded to 10 (Newton).
- Red Hat OpenStack 9 is deployed with TripleO (and its upgrade process is extremely complex...).
- The Red Hat Ceph upgrade would have to be done while the cluster is in an error state.
- (frustrated keyboard mashing...)
03. Resolution
OpenStack upgrade
- Failed...
- Reinstalled, then recovered all VMs.
Ceph 3.1 upgrade
- Upgraded manually, without using ceph-ansible.
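The deck does not detail the manual procedure; below is a hedged sketch of the usual rolling order for a package-based upgrade to Luminous (monitors first, then a ceph-mgr per monitor, then OSD hosts one by one), assuming the standard RPM package and systemd unit names:

# On each monitor host, one at a time
yum update -y ceph-mon
systemctl restart ceph-mon.target

# Luminous expects a ceph-mgr alongside each monitor
yum install -y ceph-mgr
systemctl start ceph-mgr.target

# On each OSD host, one at a time, waiting for active+clean in between
yum update -y ceph-osd
systemctl restart ceph-osd.target

# Once every daemon reports Luminous
ceph versions
ceph osd require-osd-release luminous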
03. Resolution
(Diagram: ceph-osd1, ceph-osd2, ceph-osd3 with osd.0 through osd.8)
vms pool: stores the VM images created by Nova.
12345_disk: existing VM RBD
67890_disk: new VM RBD
New VM1: ID=67890
Existing VM1: ID=12345
Recovery procedure
- Create a new VM (ID 67890)
- Delete the rbd 67890_disk in the vms pool
- Rename 12345_disk to 67890_disk
- Apply this to every VM... (see the loop sketch after the commands below)
[root@ceph01 ~]# rbd list -p vms
12345_disk
67890_disk
[root@ceph01 ~]# rbd rm -p vms 67890_disk
Removing image: 100% complete...done.
[root@ceph01 ~]#
[root@ceph01 ~]# rbd mv -p vms 12345_disk 67890_disk
[root@ceph01 ~]# rbd ls -p vms
67890_disk
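Doing the rename by hand for every VM does not scale; a minimal sketch of a loop, assuming a hypothetical mapping file vm-map.txt that lists one "OLD_ID NEW_ID" pair per line:

#!/bin/bash
# vm-map.txt (hypothetical): "12345 67890" per line - old VM id, then new VM id
while read -r OLD NEW; do
  rbd rm -p vms "${NEW}_disk"                 # drop the empty disk of the newly created VM
  rbd mv -p vms "${OLD}_disk" "${NEW}_disk"   # give the old data the new VM's name
done < vm-map.txt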
03. Resolution
After the Red Hat Ceph 3.1 upgrade...
- A similar problem occurred.
- The OSDs holding pg 1.11f kept flapping up and down.
[root@ceph-mon01 osc]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
noscrub,nodeep-scrub flag(s) set
5 scrub errors
Possible data damage: 1 pg inconsistent
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
flags noscrub,nodeep-scrub
data:
pools: 4 pools, 10240 pgs
objects: 12200k objects, 46865 GB
usage: 137 TB used, 97628 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+inconsistent
io:
client: 0 B/s rd, 1232 kB/s wr, 19 op/s rd, 59 op/s wr
[root@ceph-mon01 osc]# ceph health detail
HEALTH_ERR noscrub,nodeep-scrub flag(s) set; 5 scrub errors; Possible data
damage: 1 pg inconsistent
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
OSD_SCRUB_ERRORS 5 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 1.11f is active+clean+inconsistent, acting [113,105,10]
[root@ceph-mon01 osc]#
OTL...
03. Resolution
This time, however, the problematic object could be pinned down.
[root@ceph-mon01 ~]# rados list-inconsistent-obj 1.11f --format=json-pretty
{
"epoch": 34376,
"inconsistents": [
{
"object": {
"name": "rbd_data.39edab651c7b53.0000000000003600",
"nspace": "",
"locator": "",
03. Resolution
Object rbd_data.39edab651c7b53.0000000000003600 turned out to belong to the root filesystem volume of a customer's DB service VM.
Fortunately the DB data itself was unaffected...
The RBD image holding that DB VM's root filesystem was deleted, but the cluster still stayed in HEALTH_ERR...
[root@ceph-mon01 osc]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
4 scrub errors
Possible data damage: 1 pg inconsistent, 1 pg snaptrim_error
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
data:
pools: 4 pools, 10240 pgs
objects: 12166k objects, 46731 GB
usage: 136 TB used, 98038 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+inconsistent+snaptrim_error
io:
client: 0 B/s rd, 351 kB/s wr, 15 op/s rd, 51 op/s wr
[root@ceph-mon01 osc]# ceph health detail
HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent, 1 pg
snaptrim_error
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent, 1 pg snaptrim_error
pg 1.11f is active+clean+inconsistent+snaptrim_error, acting [113,105,10]
[root@ceph-mon01 osc]#
03. Resolution
- The errors were being caused by snapshot id 54 (0x36) of the problematic object.
- ??? But that object had already been deleted??
2018-11-16 18:45:00.163319 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 10: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
2018-11-16 18:45:00.163330 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 105: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
2018-11-16 18:45:00.163333 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 113: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
$ printf "%d\n" 0x36
54
[root@ceph-osd08 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-113 --pgid 1.11f --op list | grep 39edab651c7b53
Error getting attr on : 1.11f_head,#-3:f8800000:::scrub_1.11f:head#, (61) No data available
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":54,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":63,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":-2,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
03. Resolution
- Let's find the RBD image that owns the problematic object!
[root@ceph-mon01 osc]# rbd info volume-13076ffc-6520-4db8-b238-1ba6108bfe52 -p volumes
rbd image 'volume-13076ffc-6520-4db8-b238-1ba6108bfe52':
size 53248 MB in 13312 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.62cb510d494de
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
[root@ceph-mon01 osc]#
[root@ceph-mon01 osc]# cat rbd-info.sh
#!/bin/bash
for i in `rbd list -p volumes`
do
rbd info volumes/$i |grep rbd_data.39edab651c7b53
echo --- $i done ----
done
[root@ceph-mon01 osc]# bash rbd-info.sh
rbd info shows an image's object prefix (block_name_prefix).
A script that scans every RBD image for the problematic object's prefix.
[root@ceph-mon01 osc]# bash rbd-info.sh
--- rbdtest done ----
--- volume-00b0de1a-bfab-40e0-a444-b6c2d0de3905 done ----
--- volume-02d9c884-fc30-4700-87fd-950855ae361d done ----
...
[root@ceph-mon01 osc]#
The result... as expected, nothing found...
03. Resolution
- Since the volume holding that snapshot had already been deleted, the condition behind the error no longer existed.
- So the advice was to run repair again.
[root@ceph-mon01 ~]# date ; ceph pg repair 1.11f
Wed Nov 28 18:16:25 KST 2018
instructing pg 1.11f on osd.113 to repair
[root@ceph-mon01 ~]# ceph health detail
HEALTH_ERR noscrub,nodeep-scrub flag(s) set; Possible data damage: 1 pg repair
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
PG_DAMAGED Possible data damage: 1 pg repair
pg 1.11f is active+clean+scrubbing+deep+repair, acting [113,105,10]
[root@ceph-mon01 ~]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
noscrub,nodeep-scrub flag(s) set
Possible data damage: 1 pg repair
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
flags noscrub,nodeep-scrub
data:
pools: 4 pools, 10240 pgs
objects: 12321k objects, 47365 GB
usage: 138 TB used, 96138 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+scrubbing+deep+repair
io:
client: 598 kB/s rd, 1145 kB/s wr, 18 op/s rd, 63 op/s wr
Repairing pg 1.11f
03. Resolution
- Check the Ceph log.
[root@ceph-mon01 ~]# ceph -w
...
2018-11-28 18:21:26.654955 osd.113 [ERR] 1.11f repair stat mismatch, got 3310/3312 objects, 91/92 clones, 3243/3245
dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 67/68 whiteouts, 13579894784/13584089088 bytes, 0/0 hit_set_archive bytes.
2018-11-28 18:21:26.655657 osd.113 [ERR] 1.11f repair 1 errors, 1 fixed
2018-11-28 18:19:28.979704 mon.ceph-mon01 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg repair)
2018-11-28 18:20:30.652593 mon.ceph-mon01 [WRN] Health check update: nodeep-scrub flag(s) set (OSDMAP_FLAGS)
2018-11-28 18:20:35.394445 mon.ceph-mon01 [INF] Health check cleared: OSDMAP_FLAGS (was: nodeep-scrub flag(s) set)
2018-11-28 18:20:35.394457 mon.ceph-mon01 [INF] Cluster is now healthy
Huh..?! fixed???
03. Resolution
- HEALTH_OK
[root@ceph-mon01 ~]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
data:
pools: 4 pools, 10240 pgs
objects: 12321k objects, 47366 GB
usage: 138 TB used, 96138 GB / 232 TB avail
pgs: 10216 active+clean
24 active+clean+scrubbing+deep
io:
client: 424 kB/s rd, 766 kB/s wr, 18 op/s rd, 72 op/s wr
Q&A
Open Source Consulting, 이영주
( yjlee@osci.kr )
Thank you
감사합니다
Cloud & Collaboration
T. 02-516-0711 E. sales@osci.kr
5F, 32 Teheran-ro 83-gil, Gangnam-gu, Seoul (Samseong-dong, Narakium Samseong-dong A Building)
www.osci.kr