Ceph Day Taipei - How ARM Microserver Cluster Performs in Ceph
1. How an ARM-based Microserver Cluster Performs in Ceph
Ambedded Technology (晨宇創新)
Aaron 周振倫
2. Agenda
• About Ambedded
• What are the issues of using a single server node with multiple Ceph OSDs?
• Using one ARM microserver per Ceph OSD
• The benefits
• The basic high-availability Ceph cluster
• Scale it out
• Does the network matter?
• How fast can it self-heal a failed OSD?
• Ambedded makes Ceph easy
• How much you can save on energy
3. About Ambedded Technology
• Y2013: Founded in Taipei, Taiwan; office in the National Taiwan University Innovation Center
• Y2014: Launched the Gen 1 microserver-architecture storage server product; demoed at the ARM Global Partner Meeting in Cambridge, UK
• Y2015: Partnership with a European customer for a cloud storage service; 1,500+ microservers and 5.5PB installed and in operation since 2014
• Y2016: Launched the first-ever Ceph storage appliance powered by the Gen 2 ARM microserver; awarded 2016 Best of Interop Las Vegas storage product, beating VMware Virtual SAN
4. Issues of Using a Single Server Node with Multiple Ceph OSDs
• The smallest failure domain is the set of OSDs inside one server: a single server failure takes many OSDs down at once.
• CPU utilization is only 30%-40% when the network is saturated: the bottleneck is the network, not the computing.
• Power consumption and heat dissipation eat into your money.
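A back-of-the-envelope sketch of the first point. The cluster sizes below are illustrative assumptions, not figures from the slides; the point is how the fraction of the cluster lost to one node failure differs between the two architectures:

```python
# Illustrative comparison (assumed sizes, not from the slides): fraction of
# a 36-OSD cluster lost when one failure domain (a node) goes down.
cluster_osds = 36

def blast_radius(osds_per_node: int, total_osds: int = cluster_osds) -> float:
    """Fraction of the cluster's OSDs taken down by one node failure."""
    return osds_per_node / total_osds

# Traditional server: 12 OSDs share one motherboard, PSU, and NIC.
print(f"traditional: {blast_radius(12):.0%}")   # 33%
# Microserver: each OSD has its own CPU, RAM, and network port.
print(f"microserver: {blast_radius(1):.0%}")    # 3%
```

The larger the blast radius, the more data Ceph must re-replicate at once after a failure, and the longer the cluster runs degraded.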
5. One OSD per Microserver
[Diagram] ARM microserver clusters: three clusters of N microservers, each microserver running exactly one OSD, with 40Gb uplinks to the network.
- 1-to-1 mapping reduces the failure risk
- Aggregated network bandwidth without a bottleneck
[Diagram] Traditional servers #1-#3: each server runs N OSDs over a single 10Gb link, serving Client #1 and Client #2.
- 1-to-many mapping raises the impact of a server failure
- CPU utilization stays low due to the network bottleneck
6. The Benefits of Using a 1-Node-to-1-OSD Architecture in Ceph
• Truly no single point of failure
• The smallest failure domain is one OSD
• The MTBF of a microserver is much higher than that of an all-in-one motherboard
• Dedicated hardware resources provide a stable OSD service
• Aggregated network bandwidth with failover
• Low power consumption and cooling cost
• OSD, MON, and gateways all run in the same boxes
• 3 units form a high-availability cluster
7. Mars 200: 8-Node ARM Microserver Cluster
• 8x hot-swappable 1.6GHz ARM v7 dual-core microservers, each with:
- 2GB DRAM
- 8GB flash
- 5Gbps LAN
- < 5 watts power consumption
• Storage:
- 8x hot-swappable SATA3 HDD/SSD
- 8x SATA3 journal SSD
• 300-watt redundant power supply
• OOB BMC port
• Dual hot-swappable uplink switches:
- Total 4x 10Gbps
- SFP+/10GBASE-T combo
9. Scale-Out Test (SSD)
[Chart: 4K random read/write IOPS vs. number of OSDs]
OSDs | 4K Random Read IOPS | 4K Random Write IOPS
7    | 62,546              | 8,955
14   | 125,092             | 17,910
21   | 187,639             | 26,866
Both read and write IOPS scale almost perfectly linearly with the number of OSDs.
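The linearity claim can be checked directly from the published chart figures: a quick sketch comparing each configuration's measured IOPS against perfect linear extrapolation from the smallest one.

```python
# IOPS figures taken from the slide's scale-out chart.
read_iops = {7: 62_546, 14: 125_092, 21: 187_639}
write_iops = {7: 8_955, 14: 17_910, 21: 26_866}

def scaling_efficiency(iops: dict) -> float:
    """Measured IOPS at the largest scale, divided by the ideal value
    extrapolated linearly from the smallest configuration."""
    base, top = min(iops), max(iops)
    ideal = iops[base] * top / base
    return iops[top] / ideal

print(f"read:  {scaling_efficiency(read_iops):.3f}")   # 1.000
print(f"write: {scaling_efficiency(write_iops):.3f}")  # 1.000
```

A value of 1.000 means tripling the OSD count tripled the throughput, i.e. no shared bottleneck up to 21 OSDs.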
10. The Network Does Matter
16x OSD, 4K random write:
Clients | 20Gb uplink BW (MB/s) / IOPS | 40Gb uplink BW (MB/s) / IOPS | Increase
1       | 7.2 / 1,800                  | 11 / 2,824                   | 57%
2       | 13 / 3,389                   | 20 / 5,027                   | 48%
4       | 22 / 5,570                   | 35 / 8,735                   | 57%
10      | 39 / 9,921                   | 60 / 15,081                  | 52%
20      | 53 / 13,568                  | 79 / 19,924                  | 47%
30      | 63 / 15,775                  | 90 / 22,535                  | 43%
40      | 68 / 16,996                  | 96 / 24,074                  | 42%
The purpose of this test is to measure the improvement when the uplink bandwidth is increased from 20Gb to 40Gb (the Mars 200 has 4x 10Gb uplink ports). The results show a 42-57% improvement in IOPS.
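The quoted 42-57% range follows directly from the IOPS columns of the table; a short sketch recomputing it:

```python
# Recompute the 20Gb -> 40Gb IOPS improvement from the table's figures.
# Key: client count -> (IOPS at 20Gb uplink, IOPS at 40Gb uplink).
results = {
    1: (1_800, 2_824), 2: (3_389, 5_027), 4: (5_570, 8_735),
    10: (9_921, 15_081), 20: (13_568, 19_924),
    30: (15_775, 22_535), 40: (16_996, 24_074),
}

gains = {c: round((after - before) / before * 100)
         for c, (before, after) in results.items()}
print(gains)  # every configuration gains between 42% and 57%
```

Note the gain holds even at 40 clients, where the 20Gb configuration has clearly saturated its uplinks.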
11. OSD Self-Heal vs. RAID Rebuild
Test condition                     | Microserver Ceph cluster         | Disk array
Disk number/capacity               | 16x 10TB OSDs                    | 16x 3TB disks
Data protection                    | Replica = 2                      | RAID 5
Data stored on the failed disk     | 3TB                              | Not applicable
Time to re-heal/rebuild            | 5 hours 10 min                   | 41 hours
Administrator involvement          | Re-heal starts automatically     | Rebuild starts only after replacing the disk
Scope of recovery                  | Only the lost data is re-healed  | The whole disk capacity is rebuilt
Recovery time vs. number of disks  | More disks -> less recovery time | More disks -> longer rebuild time
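The last two table rows come from how the two schemes recover. A rough model (the 50 MB/s per-disk throughput is an assumption for illustration, not a measured figure from the slides):

```python
# Rough model of recovery time: Ceph re-creates only the lost replicas,
# in parallel across the surviving OSDs; RAID rewrites the entire
# replacement disk through a single controller.

def ceph_heal_hours(lost_tb: float, osds: int, per_osd_mb_s: float = 50.0) -> float:
    """Lost data is re-replicated in parallel by the surviving OSDs."""
    throughput_mb_s = per_osd_mb_s * (osds - 1)
    return lost_tb * 1e6 / throughput_mb_s / 3600

def raid_rebuild_hours(disk_tb: float, rebuild_mb_s: float = 50.0) -> float:
    """The whole disk is rewritten sequentially, regardless of data stored."""
    return disk_tb * 1e6 / rebuild_mb_s / 3600

print(f"Ceph, 3TB lost, 16 OSDs: {ceph_heal_hours(3, 16):.1f} h")
print(f"RAID 5, 3TB disk rebuild: {raid_rebuild_hours(3):.1f} h")
```

The model captures both trends in the table: adding OSDs shortens Ceph recovery (more parallel streams), while RAID rebuild time depends only on the raw disk size.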
12. Ceph Storage Appliance
ARM microserver + Ceph + Unified Virtual Storage Manager = Ceph Storage Appliance
• 2U 8-node: front-panel disk access
• 1U 8-node: high density
13. We Make Ceph Simple
Unified Virtual Storage Manager (UniVir Store): Dashboard, Cluster Manager, CRUSH Map
14. What You Can Do with UniVir Store
• Deploy OSDs, MONs, and MDSs
• Create pools, RBD images, iSCSI LUNs, and S3 users
• Support for replica (1-10) and erasure code (K+M)
• OpenStack backend storage management
• Create CephFS
• Snapshot, clone, and flatten images
• CRUSH map configuration
• CephX user access-rights management
• Scale out your cluster
15. How Much You Can Save on Energy
(200W - 60W) x 24h x 365 days / 1000 x $0.2/kWh x 40 units x 2 (power & cooling)
= $19,622 per rack per year
This electricity cost is based on the Taiwan rate; it could be double or triple in Japan or Germany.
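The slide's arithmetic, written out: a 140W saving per unit (200W traditional vs. 60W microserver), 40 units per rack, doubled for cooling, at the Taiwan rate of $0.2/kWh.

```python
# Reproduce the slide's annual energy-saving calculation.
watts_saved = 200 - 60        # per unit: traditional server vs. microserver
hours_per_year = 24 * 365
rate_usd_per_kwh = 0.2        # Taiwan electricity rate
units_per_rack = 40
cooling_factor = 2            # every watt of IT load needs ~1 watt of cooling

saving = (watts_saved * hours_per_year / 1000   # kWh saved per unit per year
          * rate_usd_per_kwh * units_per_rack * cooling_factor)
print(f"${saving:,.0f}/rack/year")  # $19,622/rack/year
```

At a European or Japanese rate of $0.4-$0.6/kWh, the same formula gives roughly $39k-$59k per rack per year.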