Living with a Cephalopod: Daily Care & Feeding of Ceph Storage

SYSADMIN’S TOOLBOX
TOOLS FOR RUNNING CEPH IN PRODUCTION
Paul Evans
principal architect
daystrom technology group
Paul at Daystrom dot com
Ceph Day San Francisco, March 12, 2015

WHAT’S IN THIS TALK
• Tools to help Understand Ceph Status
• Tips for Troubleshooting Faster
• Won’t Cover it All
• Maybe some Fun

HOW THINGS ARE IN THE LAB
AND THEN THERE IS PRODUCTION

Is there a Simple way to run Ceph that
isn’t Rocket Science?

WHAT COULD BE SIMPLER THAN THE… CLI?
Ceph’s CLI is great, but…
REALITY: many operations teams already juggle too many technologies.
Do they need to learn another CLI?

Need Info Fast? GUI
InkScope, VSM, Calamari, ceph-dash

GUI TOOL OPTIONS
                      Calamari                VSM               InkScope                ceph-dash
Backing From          Red Hat                 Intel             Orange Labs             Christian Eichelmann
Latest Version        1.2.3                   2014.12-0.9.1     1.1                     1.0
Release Date          Sep 2014                Dec 2014          Jan 2015                Feb 2015
Capabilities          Monitor + Light Config  Monitor + Config  Monitor + Light Config  Monitor Only
Compatibility         Wide                    Limited           Wide                    Wide

MONITORING
                        Calamari  VSM         InkScope  ceph-dash
Mon Status              Y         Y           Y         Y
OSD Status              Y         Y           Y         Y
OSD-Host Mapping        Y         Y           Y         Y
PG Status               Y         Y           Y         Y
PG-OSD Mapping          N         N           Y         N
MDS Status              N         Y           Y         N
Host Status             Y         Y           Y         Y
Capacity Utilization    Y         via Groups  Y         Y
Throughput (Cluster)    N         Y           Y         Y
IOPS (Cluster)          Y         Y           Y         Y
Errors/Warnings         Y         Y           Y         Y
View Logs               Y         N           N         N
Send Alerts (email)     N         N           N         via nagios plug-in
Charts/Graphs           Y         N           N         via nagios plug-in

MANAGEMENT
                                    Calamari     VSM      InkScope ceph-dash
Deploy a Cluster                    N            Y        N        N
Deploy Hosts (add/remove)           N            Y        N        N
Deploy Storage Groups (create)      N            Y        N        N
Cluster Services (daemons)          OSD only     Y        N(?)     N
Cluster Settings (ops flags)        Y            N        Y        N
Cluster Settings (parameters)       Y            N        View     N
Cluster Settings (CRUSH map/rules)  N            Partial  View     N
Cluster Settings (EC Profiles)      N            Y        Y        N
OSD (start/stop/in/out)             Partial      Y        Y        N
Pools (Replicated)                  Y (limited)  Y        Y        N
Pools (EC & Tiering)                N            Y        Partial  N
RBDs                                N            Partial  N        N
S3/Swift Users/Buckets              N            N        Y        N
Link to OpenStack Nova              N            Y        N        N

NOW THAT WE HAVE VISIBILITY…
What are we looking for?

WHAT WE WANT
✓ Monitor Quorum
✓ Working OSDs
✓ Happy Placement Groups

PG STATES
✓ I’m Making Things Right
๏ I (may) Need Help
All Good - Bring Data
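A minimal Python sketch of that triage, assuming you feed it PG state strings as Ceph prints them (for example `active+clean` or `down+peering`). Which flags count as "may need help" is my own reading of the slide, not an official Ceph classification. The `NEEDS_HELP` and `HEALTHY_EXTRAS` sets are hypothetical and should be tuned for your cluster.

```python
from collections import Counter

# Hypothetical triage set: flags that usually deserve a human look.
NEEDS_HELP = {"down", "incomplete", "stale", "inconsistent", "unknown"}
# Flags that can appear alongside active+clean without indicating trouble.
HEALTHY_EXTRAS = {"scrubbing", "deep"}


def classify_pg_state(state: str) -> str:
    """Sort a PG state string into one of the three buckets on the slide."""
    flags = set(state.split("+"))
    if flags & NEEDS_HELP:
        return "may need help"
    if {"active", "clean"} <= flags <= {"active", "clean"} | HEALTHY_EXTRAS:
        return "all good"
    # Anything else (peering, recovering, backfilling, remapped, ...) is
    # treated as the cluster working to make things right.
    return "making things right"


def summarize(states: list[str]) -> Counter:
    """Count how many PGs fall into each bucket."""
    return Counter(classify_pg_state(s) for s in states)


if __name__ == "__main__":
    print(summarize(["active+clean", "active+remapped", "down+peering"]))
```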

THE ‘NO GUI’ OPTION
ceph@m01:~$ ceph osd tree
# id weight type name up/down reweight
-1 213.6 root default
-2 43.44 host s01
9 3.62 osd.9 down 0
10 3.62 osd.10 down 0
0 3.62 osd.0 down 0
5 3.62 osd.5 down 0
1 3.62 osd.1 down 0
7 3.62 osd.7 down 0
3 3.62 osd.3 down 0
4 3.62 osd.4 down 0
2 3.62 osd.2 down 0
11 3.62 osd.11 down 0
-4 43.44 host s03
24 3.62 osd.24 up 1
25 3.62 osd.25 up 1
26 3.62 osd.26 up 1
27 3.62 osd.27 up 1
28 3.62 osd.28 up 1
29 3.62 osd.29 up 1
30 3.62 osd.30 up 1
31 3.62 osd.31 up 1
32 3.62 osd.32 up 1
33 3.62 osd.33 up 1
ceph osd tree
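When every OSD on one host goes down at once, as with `s01` above, the host itself (power, network, or a controller) is often a better suspect than ten simultaneous disk failures. The following Python sketch groups down OSDs by host. It assumes the plain-text column layout shown above (`id weight type name up/down reweight`). Newer releases add columns, so `ceph osd tree --format json` is a sturdier input for real scripts. `down_osds_by_host` is a hypothetical helper name.

```python
from collections import defaultdict


def down_osds_by_host(tree_text: str) -> dict[str, list[str]]:
    """Group OSDs marked 'down' under the host bucket they belong to."""
    down = defaultdict(list)
    host = None
    for line in tree_text.splitlines():
        cols = line.split()
        if len(cols) < 4 or cols[0].startswith("#"):
            continue  # skip the header row and anything too short to parse
        if cols[2] == "host":
            host = cols[3]  # remember which host the following OSDs sit under
        elif cols[2].startswith("osd.") and cols[3] == "down":
            down[host].append(cols[2])
    return dict(down)
```

Given the output above, it would report all ten OSDs under `s01` and nothing under `s03`.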
ceph health detail
HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down
...
pg 0.5 is down+peering
pg 1.4 is down+peering
...
osd.1 is down since epoch 69, last address 192.168.106.220:6801/865
ceph health
ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
ceph pg dump_stuck
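For scripts or a cron job, the one-line summary at the top of `ceph health detail` can be turned into numbers. This Python sketch assumes the semicolon-separated format shown above. Later Ceph releases changed how health output is worded, so for anything long-lived `ceph health --format json` is the safer input. `parse_health_summary` is a hypothetical helper name.

```python
import re


def parse_health_summary(line: str) -> tuple[str, dict[str, int]]:
    """Split a one-line health summary into its status and per-state PG counts."""
    status, _, rest = line.strip().partition(" ")
    pg_counts = {}
    for part in rest.split(";"):
        # Matches fragments like "12 pgs down" or "6 pgs stuck unclean";
        # object and OSD counts ("114/3300 degraded ...") are skipped.
        m = re.fullmatch(r"\s*(\d+) pgs (.+?)\s*", part)
        if m:
            pg_counts[m.group(2)] = int(m.group(1))
    return status, pg_counts


if __name__ == "__main__":
    summary = ("HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; "
               "1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded "
               "(3.455%); 1/3 in osds are down")
    print(parse_health_summary(summary))
```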

THE ‘NO GUI’ OPTION
pg_stat objects bytes status up up_pri
10.f8 3042 25474961408 active+remapped [58,2147483647,24,20,55,59,2147483647,27] 58
10.7fa 3029 25375584256 active+remapped [51,20,60,28,2147483647,61,2147483647,11] 51
10.716 2990 25052532736 inactive [9,44,10,55,24,2147483647,47,2147483647] 9
ceph pg dump_stuck
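The `2147483647` entries in the up sets above are not real OSD ids. 2147483647 (0x7fffffff) is CRUSH's "none" placeholder (`CRUSH_ITEM_NONE` in the Ceph source), meaning CRUSH found no OSD for that position. The eight-wide sets suggest an erasure-coded pool, where each position holds a specific shard, so a hole likely means that shard currently has no home. The following Python sketch lists the empty positions. `missing_positions` is a hypothetical helper name.

```python
CRUSH_ITEM_NONE = 2147483647  # 0x7fffffff: CRUSH's "no OSD found" placeholder


def missing_positions(up_set: str) -> list[int]:
    """Return the positions in an up set such as '[58,2147483647,24]' with no OSD."""
    osds = [int(x) for x in up_set.strip("[] ").split(",") if x.strip()]
    return [pos for pos, osd in enumerate(osds) if osd == CRUSH_ITEM_NONE]


if __name__ == "__main__":
    for pgid, up in [("10.f8", "[58,2147483647,24,20,55,59,2147483647,27]"),
                     ("10.716", "[9,44,10,55,24,2147483647,47,2147483647]")]:
        print(pgid, "has no OSD at position(s)", missing_positions(up))
```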

VSM: TROUBLESHOOTING
Repeated auto-out, or inability to restart an auto-out OSD, suggests a failed or failing disk.
A set of auto-out OSDs that share the same journal SSD suggests a failed or failing journal SSD.
VSM periodically probes the drive path; a missing drive path indicates complete disk (or controller) failure.
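The journal-SSD rule of thumb is easy to automate if you keep your own map of which journal device each OSD uses. In this Python sketch, `journal_of` and `suspect_journals` are hypothetical names. They are not part of VSM or Ceph, and the mapping would come from your own inventory or deployment notes.

```python
from collections import defaultdict


def suspect_journals(auto_out: set[str], journal_of: dict[str, str],
                     min_osds: int = 2) -> list[str]:
    """Return journal devices shared by at least `min_osds` auto-out OSDs."""
    by_journal = defaultdict(list)
    for osd in auto_out:
        dev = journal_of.get(osd)  # OSDs missing from the inventory are ignored
        if dev is not None:
            by_journal[dev].append(osd)
    return sorted(dev for dev, osds in by_journal.items() if len(osds) >= min_osds)


if __name__ == "__main__":
    # Hypothetical inventory: three OSDs journal on sdm, two on sdn.
    journal_of = {"osd.0": "/dev/sdm", "osd.1": "/dev/sdm", "osd.2": "/dev/sdm",
                  "osd.3": "/dev/sdn", "osd.4": "/dev/sdn"}
    print(suspect_journals({"osd.0", "osd.1", "osd.3"}, journal_of))
```

Here `/dev/sdm` is flagged because two of its OSDs are out. The lone auto-out OSD on `/dev/sdn` points back at its own data disk instead.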

VSM: TROUBLESHOOTING - Restore OSDs
Managing Storage Devices
Restore OSDs
Wait
Select
Sort
Confirm
Verify (may need to sort again)

When it’s time to go deep…
/var/log/ceph/ceph.log
/var/log/ceph/ceph-mon.[host].log
/var/log/ceph/ceph-osd.[xx].log
ceph tell osd.[xx] injectargs --debug-osd 0/5
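The `0/5` in the `injectargs` example is two levels. The first sets what gets written to the log file, and the second sets what the daemon keeps in memory and dumps if it crashes. The following Python sketch builds that command for any OSD. `injectargs_cmd` is a hypothetical helper. Passing the option as a single argument is equivalent to quoting it on the shell (`injectargs '--debug-osd 0/5'`). Set the level back down when you're done, because verbose OSD logs grow quickly.

```python
import shlex


def injectargs_cmd(osd_id: int, subsystem: str, log_level: int, mem_level: int) -> list[str]:
    """Build argv for changing one OSD's debug level at runtime (hypothetical helper)."""
    if log_level < 0 or mem_level < 0:
        raise ValueError("debug levels must be non-negative")
    option = f"--debug-{subsystem} {log_level}/{mem_level}"  # log-file level / in-memory level
    return ["ceph", "tell", f"osd.{osd_id}", "injectargs", option]


if __name__ == "__main__":
    cmd = injectargs_cmd(12, "osd", 0, 5)
    print(shlex.join(cmd))  # ceph tell osd.12 injectargs '--debug-osd 0/5'
    # To actually run it: subprocess.run(cmd, check=True)
```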

REMINDER ABOUT CLUSTERS
Clusters rarely do things instantly.
Clusters can be like a flock of sheep: they start moving in the right direction slowly, then pick up speed.
(Don’t run it off a cliff.)

VSM: FAST DEPLOY - Create Cluster
Getting Started
Create new Ceph cluster
All servers present
Correct subnets and IP addresses
Correct number of disks identified
At least three monitors, and an odd number of monitors
Servers located in correct zone
Servers responsive
One zone with server-level replication
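Why "at least three" and "odd"? Ceph monitors need a strict majority to form a quorum. Adding a fourth monitor raises the majority from 2 to 3 without letting you lose any more monitors. This quick Python check shows the arithmetic. `monitor_check` is a hypothetical pre-flight helper, not a VSM function.

```python
def monitor_check(num_mons: int) -> tuple[bool, int, int]:
    """Return (meets_guideline, quorum_size, failures_tolerated)."""
    quorum = num_mons // 2 + 1      # strict majority of monitors
    tolerated = num_mons - quorum   # monitors you can lose and keep quorum
    meets_guideline = num_mons >= 3 and num_mons % 2 == 1
    return meets_guideline, quorum, tolerated


if __name__ == "__main__":
    for n in range(1, 8):
        ok, quorum, tolerated = monitor_check(n)
        print(f"{n} mons: quorum {quorum}, tolerates {tolerated} failure(s), ok={ok}")
```

Three and four monitors both tolerate a single failure, and five and six both tolerate two, so the even count buys nothing.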

VSM: FAST DEPLOY - Create Cluster
Step 1
Step 2: Confirm

VSM: FAST DEPLOY
Create Cluster - Status Sequence
Getting Started

IN TIME YOUR CLUSTER WILL LEARN TO FOLLOW YOU

The 3 Keys to Ceph in Production
Happy PGs
Happy Monitors
Happy OSDs

thank you!
