All-Flash Ceph Configuration and Optimization
Feb. 18, 2016
SDS Tech. Lab, Corporate R&D Center
SK Telecom
OpenStack Days in Korea
Why are we focusing on all-flash Ceph?

Tech. trend of storage systems: from hybrid scale-up storage systems toward all-flash scale-out storage systems, by way of hybrid scale-out and all-flash scale-up systems. Scaling out increases effective capacity; moving to all-flash raises performance.

Requirements for an All-IT network/infra storage system: scalability, availability, performance.
What is Ceph?
(Ceph stack diagram, http://docs.ceph.com/docs/master/_images/stack.png: apps access objects, hosts/VMs access virtual disks, and clients access files & directories, all layered over the same object store)
• Ceph is a unified, distributed, massively scalable open source storage solution
  - Object, block, and file storage
• Mostly LGPL open source project
• Failure is normal
• Self-managing
• Scales out on commodity hardware
• Everything runs in software
Ceph Architecture
(Architecture diagram) A Ceph storage system is a set of OSDs plus three Monitors connected by the storage network; clients (KVM via librbd, applications via krbd or librados) reach the cluster over the service network. Monitors distribute cluster maps to clients and OSDs, and I/O then flows directly between clients and OSDs.
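The "Application + librados" path in the diagram can be exercised directly from Python. The snippet below is a minimal sketch under assumptions not stated in the slides: the python-rados bindings are installed, /etc/ceph/ceph.conf points at a reachable cluster, and a pool named "rbd" exists.

```python
import rados

# Minimal librados sketch (assumptions: python-rados installed, a reachable cluster
# described by /etc/ceph/ceph.conf, and an existing pool named "rbd").
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()                      # contacts the Monitors and fetches cluster maps
try:
    ioctx = cluster.open_ioctx("rbd")  # I/O context bound to one pool
    ioctx.write_full("demo-object", b"hello ceph")  # goes directly to the primary OSD
    print(ioctx.read("demo-object"))
    ioctx.close()
finally:
    cluster.shutdown()
```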
Ceph Operation: Ceph Block Device
(Diagram) An application issues I/O through librbd and librados to the primary OSD (OSD #0), which replicates synchronously to OSD #1. Each OSD runs FileStore over XFS on a disk or RAID group and hosts several PGs (PG#0 to PG#3 in the diagram): the journal is written with O_DIRECT and O_DSYNC, while data is written with buffered I/O.
• Data placement: CRUSH algorithm
• A Ceph block device is a sequence of fixed-size objects (default: 4MB); e.g., a 1GB block image = 256 objects
• A hash maps each object to a PG
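As a rough illustration of the layout rules just described, here is a small sketch. It is not Ceph's actual code: the object name and the CRC-based hash are stand-ins, and CRUSH itself is not implemented.

```python
import zlib

OBJECT_SIZE = 4 * 1024 * 1024      # default 4MB RBD object size

def object_count(image_bytes: int) -> int:
    """Number of fixed-size objects backing a block image (1GB -> 256)."""
    return (image_bytes + OBJECT_SIZE - 1) // OBJECT_SIZE

def object_to_pg(object_name: str, pg_num: int) -> int:
    """Stand-in for the object-to-PG hash (Ceph uses its own hash, not CRC32)."""
    return zlib.crc32(object_name.encode()) % pg_num

print(object_count(1 * 1024**3))                              # 256 objects for a 1GB image
print(object_to_pg("rbd_data.abc123.0000000000000010", 128))  # hypothetical object name
# CRUSH (not shown) then maps the chosen PG to a primary OSD and its replicas.
```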
Ceph OSD Node Configuration
• Journal / data disk configuration. Common combinations (journal / data):
  - SSD / HDD
  - No external journal disk / SSD
  - PCIe SSD / SATA SSD
  - NVRAM / SSD
  (Chart: 4KB random write IOPS and latency by journal type, SSD journal vs. NVRAM journal)
• Number of OSDs per node
  - 1 OSD per disk or per RAID group
  - Ceph OSD daemons are CPU-intensive processes
  (Chart: 4KB random write IOPS and latency with 3, 4, 6, 8, and 12 OSDs per node)
Ceph on All-Flash Performance Issues
(Chart: 4KB random write IOPS and latency vs. thread count)
  Threads:      4      8       16      32      64      128
  IOPS:         6,945  11,585  14,196  15,885  15,298  16,603
  Latency (ms): 2.9    3.5     5.7     10.1    21.9    39.7
(Chart: 4KB random read IOPS and latency vs. thread count)
  Threads:      4      8      16     32      64      128
  IOPS:         3,716  4,910  7,243  13,694  80,062  102,090
  Latency (ms): 5.4    8.1    11.0   11.7    5.9     10.9
Issue: low throughput and high latency
• SSD spec:
  - 4KB random read: up to 95K IOPS
  - 4KB random write: up to 85K IOPS (sustained 14K IOPS)
  - Latency < 1ms
• Ideal throughput:
  - 4KB random read: 95K x 10 drives x 4 nodes → 3,800K IOPS
  - 4KB random write: 14K x 10 drives x 4 nodes / 2 replicas → 280K IOPS
Test conditions: sustained performance, 2x write (80% of usable space); 5 clients using krbd.
Measured vs. ideal:
  - 4KB random read: 102K IOPS measured vs. 3,800K IOPS ideal
  - 4KB random write: 17K IOPS measured vs. 280K IOPS ideal
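The ideal figures are straight multiplications from the SSD spec; a quick sketch of the arithmetic, with the drive count, node count, and replica factor taken from the slide:

```python
# Ideal-throughput arithmetic for the cluster described above.
DRIVES_PER_NODE = 10
NODES = 4
REPLICAS = 2

read_iops_per_ssd = 95_000     # 4KB random read, per SSD spec
write_iops_per_ssd = 14_000    # 4KB random write, sustained, per SSD spec

ideal_read = read_iops_per_ssd * DRIVES_PER_NODE * NODES                # reads hit one replica
ideal_write = write_iops_per_ssd * DRIVES_PER_NODE * NODES // REPLICAS  # each write lands on 2 OSDs

print(f"ideal 4KB random read : {ideal_read:,} IOPS")    # 3,800,000
print(f"ideal 4KB random write: {ideal_write:,} IOPS")   # 280,000
```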
Ceph IO Flow in OSD
(Class diagram) Inside the OSD, the Messenger delivers requests to a PG, which uses a PGBackend (ReplicatedBackend or ECBackend) on top of an ObjectStore. ObjectStore implementations include KeyValueStore, MemStore, BlueStore, and JournalingObjectStore, whose concrete implementation here is FileStore.
1. Journal: libaio (O_DIRECT && O_DSYNC) → Committed
2. Data: buffered I/O, syncfs() later → Applied
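The two durability points correspond to two very different write paths. The sketch below is plain Python, not Ceph code: it uses a synchronous write where FileStore uses libaio, the file paths are placeholders, and O_DIRECT is Linux-only and needs a filesystem that supports it (e.g. XFS). It only illustrates the distinction between a journal-style write that bypasses the page cache and waits for the device, and a data-style buffered write that becomes durable at a later sync.

```python
import mmap
import os

BLOCK = 4096   # O_DIRECT requires block-aligned buffers, offsets, and sizes

# Journal-style write: O_DIRECT bypasses the page cache, O_DSYNC returns only after
# the data is stable on the device ("Committed"). /var/tmp/journal.bin is a placeholder.
fd = os.open("/var/tmp/journal.bin",
             os.O_CREAT | os.O_WRONLY | os.O_DIRECT | os.O_DSYNC, 0o644)
buf = mmap.mmap(-1, BLOCK)                       # anonymous mmap gives page-aligned memory
buf.write(b"journal entry".ljust(BLOCK, b"\0"))  # fill exactly one aligned block
os.write(fd, buf)                                # blocks until the device acknowledges
os.close(fd)

# Data-style write: ordinary buffered I/O lands in the page cache; durability comes
# from a later sync ("Applied"). os.sync() stands in for FileStore's periodic syncfs().
with open("/var/tmp/object.bin", "wb") as f:
    f.write(b"object data")
os.sync()
```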
Ceph Write IO Flow: Receiving a Request
(Flow diagram; primary and secondary OSD, each with a Messenger, Operation WQ, operation threads, and FileStore)
1. The client sends the write request over the public network.
2. The primary OSD's Messenger receives the write request.
3. The request is queued in the Operation WQ.
4. An operation thread takes the PG lock and performs the operation.
5. The ReplicatedBackend sends replication operations to the secondary OSD over the cluster network; the secondary queues the rep op in its own Operation WQ and its operation threads process it, again under the PG lock, into its FileStore.
6. The primary enqueues the transaction to FileStore and releases the PG lock.
The PG lock is taken and released around operation processing on both the primary and the secondary OSD.
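Steps 4 to 6 run under a per-PG lock, which is the contention point called out in the optimization table later on. The toy model below uses ordinary Python threading and is not Ceph code; the PG count and sleep times are made up. It shows the behavior: ops targeting the same PG are serialized by that PG's lock, while ops on different PGs can proceed in parallel on other worker threads.

```python
import queue
import threading
import time

PG_COUNT = 4                                           # made-up number of PGs
pg_locks = {pg: threading.Lock() for pg in range(PG_COUNT)}
op_wq = queue.Queue()                                  # the "Operation WQ"

def worker():
    while True:
        pg, cost = op_wq.get()
        with pg_locks[pg]:        # "PG Lock" ... "PG Unlock" around the op
            time.sleep(cost)      # do the operation, enqueue transaction to FileStore
        op_wq.task_done()

for _ in range(2):                # two op-processing worker threads
    threading.Thread(target=worker, daemon=True).start()

start = time.time()
op_wq.put((0, 0.2))               # two ops on the same PG run back to back
op_wq.put((0, 0.2))
op_wq.put((1, 0.2))               # an op on another PG overlaps with the first
op_wq.join()
print(f"elapsed: {time.time() - start:.2f}s")   # ~0.4s: PG 0 serialized, PG 1 in parallel
```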
Ceph Write IO Flow: in FileStore
(Flow diagram; FileJournal with its writer and finisher threads, and the FileStore data path with its operation and finisher threads)
1. The transaction is queued to the journal writeq (under the PG lock).
2. The writer thread processes the queued journal transactions.
3. The journal entry is written to the journal disk with AIO.
4. The AIO completes.
5. The op is queued to the FileStore Operation WQ.
6. The op is queued to the journal finisher thread → Committed (a RepOp reply is sent to the primary if this is a secondary OSD).
7. Operation threads write the data to the data disk with buffered I/O (under the PG lock).
8. The op is queued to a finisher thread → Applied.
Journal completion and data completion are tracked separately: an op is Committed once it is stable in the journal and Applied once the data write has been handed to the file system.
Optimization
Item: PG Lock
  Issue: more than 30% of total latency is spent acquiring the PG lock.
  • OP-processing worker threads get blocked, delaying unrelated OPs
  • The PG lock's critical section is large
  • ACK processing on the secondary OSD is delayed, increasing I/O latency
Item: Ceph & system tuning
  Issue: results fluctuate widely between measurement runs.
  • Ceph configuration variables: changing them individually is not effective; an optimal combination is needed
  • High CPU usage by the memory allocator
  • TCP/IP Nagle algorithm
Item: Log
  Issue: performance changes significantly depending on whether logging is disabled.
  • Time spent on logging in the OSD's I/O processing path
Item: Transaction
  Issue: transaction handling has a large impact on performance.
  • Inefficient transaction processing: unnecessary operations, lock contention
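Of the tuning items above, the Nagle issue is the easiest to illustrate: small, latency-sensitive messages (such as replication ACKs) get batched and delayed unless TCP_NODELAY is set. The snippet below is a generic socket-level illustration, not Ceph code; Ceph's messenger exposes an equivalent switch (ms tcp nodelay).

```python
import socket

# Generic illustration of disabling Nagle's algorithm for small, latency-sensitive messages.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # send small writes immediately
```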
VM Performance: Test Environment
Service network (10GbE) and storage network (10GbE).
Physical clients (x5)
  - Vendor / model: DELL R720XD
  - Processor: Intel Xeon E5-2670v3 @ 2.60GHz x 2 (10 cores)
  - Memory: 128GB
  - OS: CentOS 7.0
OSD node / monitor (x4)
  - Vendor / model: DELL R630
  - Processor: Intel Xeon E5-2690v3 @ 2.60GHz x 2 (12 cores)
  - Memory: 128GB; NIC: 10GbE
  - OS: CentOS 7.0; journal: RAMDISK
Switches (x2)
  - Vendor / model: Cisco Nexus 5548UP 10G
Disks
  - SSD: SK Hynix 480GB SSD, 10 per OSD node
  - RAID: RAID 0 groups of 3, 3, 2, and 2 SSDs (4 RAID groups), i.e. 4 devices and 4 OSD daemons per OSD node
Ceph version
  - SKT Ceph and Community (0.94.4)
VMs (up to 4 per physical client)
  - Guest OS spec: 2 cores, 4GB memory, librbd
FIO test configuration
  - Run time 300s, ramp time 10s
  - Threads 8, queue depth 8
Sustained performance: 2x write (80% of usable space)
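For reference, the FIO settings above correspond roughly to an invocation like the sketch below. This is an assumption about how the jobs might have been expressed (standard fio options; /dev/vdb is a hypothetical RBD-backed virtual disk inside a guest), not a configuration taken from the slides.

```python
import shlex

# Assemble an fio command line matching the stated test parameters.
# The target device (/dev/vdb) is a hypothetical RBD-backed disk inside a VM.
fio_cmd = [
    "fio",
    "--name=vm-4k-randwrite",
    "--ioengine=libaio", "--direct=1",
    "--filename=/dev/vdb",
    "--rw=randwrite", "--bs=4k",
    "--numjobs=8", "--thread",          # Threads: 8
    "--iodepth=8",                      # Queue depth: 8
    "--runtime=300", "--ramp_time=10",  # Run time: 300s, ramp time: 10s
    "--time_based", "--group_reporting",
]
print(shlex.join(fio_cmd))
```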
VM Performance Comparison: Random Workload
(Chart: IOPS bars for SKT Ceph vs. Community Ceph with latency markers; values as plotted)
  - 4KB random write: SKT Ceph 71 KIOPS at 3.4ms vs. Community 3 KIOPS at 5.7ms
  - 32KB random write: SKT Ceph 43 KIOPS at 2.7ms vs. Community 3 KIOPS at 5.5ms
  - 4KB random read: SKT Ceph 185 KIOPS at 3.5ms vs. Community 114 KIOPS at 3.4ms
  - 4KB random read (second series, as labeled in the source): SKT Ceph 118 KIOPS at 2.0ms vs. Community 71 KIOPS at 2.5ms
VM Performance Comparison: Sequential Workload
(Chart: bandwidth bars for SKT Ceph vs. Community Ceph with latency markers; values as plotted)
  - 1MB sequential write: SKT Ceph 2,669 MB/s vs. Community 2,729 MB/s; latencies 59.7 ms and 28.3 ms
  - 4MB sequential write: SKT Ceph 2,768 MB/s vs. Community 2,948 MB/s; latencies 172.4 ms and 425.2 ms
  - 1MB sequential read: SKT Ceph 4,287 MB/s vs. Community 4,281 MB/s; latencies 73.2 ms and 36.7 ms
  - 4MB sequential read: SKT Ceph 4,281 MB/s vs. Community 4,296 MB/s; latencies 293.6 ms and 292.7 ms
SKT AF-Ceph
AFC-S: 4 data nodes + 1 management node (based on commodity servers and SSDs)
  - Monitor node (management server)
  - Data nodes (OSD nodes): NVRAM journal, 10x SATA SSD data store
System configuration
  - Configuration: 4 data nodes + 1 monitor node
  - Rack space: 5U
  - SSD: 40x SATA SSD (in 4U)
  - NVRAM: 8GB NVRAM
  - Capacity: 40TB total / 20TB usable (with 1TB SSDs); 80TB total / 40TB usable (with 2TB SSDs)
  - Node hardware: Intel Xeon E5-2690v3 2-socket CPU, 128GB RAM (DDR3 1866MHz), 2x 10GbE for service & storage
AFC-N: 2U microserver (4 data nodes) + 1U NVMe all-flash JBOF
  - NV-Array (all-flash JBOF) with NV-Drives (NVMe SSDs); E5 2-socket server (4 nodes in 2U)
  - High performance (PCIe 3.0)
  - High density (24x 2.5" NVMe SSDs: up to 96TB)
  - Planned for Q4 2016
SKT AF-Ceph (management features)
  - Real-time monitoring: multi dashboard, rule-based alarms, drag & drop admin, REST API
  - Real-time graphs: graph merge, drag & zooming
  - Auto configuration
  - Cluster management, RBD management, object storage management
End of Document
Contact: 엄주관 (Jugwan Eom), jugwan.eom@sk.com