SYSADMIN’S TOOLBOX
TOOLS FOR RUNNING CEPH IN
PRODUCTION
Paul Evans
principal architect
daystrom technology group
Paul at Daystrom dot com
san francisco
ceph day
March 12, 2015
WHAT’S IN THIS TALK
• Tools to help Understand Ceph Status
• Tips for Troubleshooting Faster
• Won’t Cover it All
• Maybe some Fun
HOW THINGS ARE IN THE
LAB
AND THEN THERE IS
PRODUCTION
Is there a Simple way to run Ceph that
isn’t Rocket Science?
WHAT COULD BE SIMPLER
THAN THE….CLI ?
Ceph’s CLI is Great, but…
REALITY: many Operations Teams juggle
too many technologies already…
Do they need to learn another CLI?
Need Info Fast? GUI
InkScope VSM
Calamari ceph-dash
GUI TOOL OPTIONS
Tool            Calamari                VSM               InkScope                ceph-dash
Backing From    Red Hat                 Intel             Orange Labs             Christian Eichelmann
Latest Version  1.2.3                   2014.12-0.9.1     1.1                     1.0
Release Date    Sep 2014                Dec 2014          Jan 2015                Feb 2015
Capabilities    Monitor + Light Config  Monitor + Config  Monitor + Light Config  Monitor Only
Compatibility   Wide                    Limited           Wide                    Wide
MONITORING
Feature               Calamari  VSM         InkScope  ceph-dash
Mon Status            Y         Y           Y         Y
OSD Status            Y         Y           Y         Y
OSD-Host Mapping      Y         Y           Y         Y
PG Status             Y         Y           Y         Y
PG-OSD Mapping        N         N           Y         N
MDS Status            N         Y           Y         N
Host Status           Y         Y           Y         Y
Capacity Utilization  Y         via Groups  Y         Y
Throughput (Cluster)  N         Y           Y         Y
IOPS (Cluster)        Y         Y           Y         Y
Errors/Warnings       Y         Y           Y         Y
View Logs             Y         N           N         N
Send Alerts (email)   N         N           N         via nagios plug-in
Charts/Graphs         Y         N           N         via nagios plug-in
MANAGEMENT
Feature                             Calamari     VSM      InkScope  ceph-dash
Deploy a Cluster                    N            Y        N         N
Deploy Hosts (add/remove)           N            Y        N         N
Deploy Storage Groups (create)      N            Y        N         N
Cluster Services (daemons)          OSD only     Y        N(?)      N
Cluster Settings (ops flags)        Y            N        Y         N
Cluster Settings (parameters)       Y            N        View      N
Cluster Settings (CRUSH map/rules)  N            Partial  View      N
Cluster Settings (EC Profiles)      N            Y        Y         N
OSD (start/stop/in/out)             Partial      Y        Y         N
Pools (Replicated)                  Y (limited)  Y        Y         N
Pools (EC & Tiering)                N            Y        Partial   N
RBDs                                N            Partial  N         N
S3/Swift Users/Buckets              N            N        Y         N
Link to OpenStack Nova              N            Y        N         N
CALAMARI
VIRTUAL STORAGE MANAGER
CEPH-DASH
INKSCOPE
NOW THAT WE HAVE VISIBILITY….
What are we looking for?
WHAT WE WANT
✓ Monitor Quorum
✓ Working OSDs
✓ Happy Placement Groups
OSD WORKBENCH
PG STATES
✓ I’m Making Things Right
๏ I (may) Need Help
All Good - Bring Data
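The three buckets above can be turned into a quick triage, sketched below. The slide does not list which PG state names fall into which bucket, so the grouping here is an assumption made for illustration, and `triage` is a hypothetical helper rather than anything shipped with Ceph.

```python
# Hypothetical triage of a PG status string such as "active+remapped".
# ASSUMPTION: the grouping of state names below is illustrative only;
# it is not taken from the slides.
NEEDS_HELP = {"down", "incomplete", "stale", "inconsistent"}
SELF_HEALING = {"peering", "recovering", "backfilling", "remapped", "degraded"}


def triage(status: str) -> str:
    """Return one of the three buckets for a '+'-joined PG status."""
    states = set(status.split("+"))
    if states & NEEDS_HELP:
        return "may need help"
    if states & SELF_HEALING:
        return "making things right"
    if states == {"active", "clean"}:
        return "all good"
    # Anything not recognised is flagged for a human to look at.
    return "may need help"


if __name__ == "__main__":
    for s in ("active+clean", "active+remapped", "down+peering"):
        print(s, "->", triage(s))
```

The "needs help" check runs first on purpose: a PG that is `down+peering` is trying to recover, but it will not get there without attention.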
THE ‘NO GUI’ OPTION
ceph@m01:~$ ceph osd tree
# id  weight  type name        up/down  reweight
-1    213.6   root default
-2    43.44     host s01
9     3.62        osd.9        down     0
10    3.62        osd.10       down     0
0     3.62        osd.0        down     0
5     3.62        osd.5        down     0
1     3.62        osd.1        down     0
7     3.62        osd.7        down     0
3     3.62        osd.3        down     0
4     3.62        osd.4        down     0
2     3.62        osd.2        down     0
11    3.62        osd.11       down     0
-4    43.44     host s03
24    3.62        osd.24       up       1
25    3.62        osd.25       up       1
26    3.62        osd.26       up       1
27    3.62        osd.27       up       1
28    3.62        osd.28       up       1
29    3.62        osd.29       up       1
30    3.62        osd.30       up       1
31    3.62        osd.31       up       1
32    3.62        osd.32       up       1
33    3.62        osd.33       up       1
ceph osd tree
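When the tree is long, it can help to reduce it to "which OSDs are down, on which host". The sketch below does that for the plain-text layout shown above (id, weight, then either a bucket type and name, or `osd.N` with its up/down state). `down_osds_by_host` and `SAMPLE` are hypothetical names for illustration, and other Ceph releases may print different columns.

```python
# Sketch: list down OSDs per host from plain-text `ceph osd tree` output.
# ASSUMPTION: the column layout is the one shown on the slide:
#   id, weight, then "<bucket-type> <name>" or "osd.N <up|down> <reweight>".
from collections import defaultdict


def down_osds_by_host(tree_text: str) -> dict:
    down = defaultdict(list)
    host = None
    for line in tree_text.splitlines():
        parts = line.split()
        # Skip the header comment and anything too short to be a data row.
        if len(parts) < 4 or line.lstrip().startswith("#"):
            continue
        if parts[2] == "host":
            host = parts[3]
        elif parts[2].startswith("osd.") and parts[3] == "down":
            down[host].append(parts[2])
    return dict(down)


SAMPLE = """\
# id weight type name up/down reweight
-1 213.6 root default
-2 43.44 host s01
9 3.62 osd.9 down 0
10 3.62 osd.10 down 0
-4 43.44 host s03
24 3.62 osd.24 up 1
"""

if __name__ == "__main__":
    print(down_osds_by_host(SAMPLE))
```

In the output above, every listed OSD on s01 is down while s03 is fully up, so grouping by host makes that pattern obvious at a glance.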
ceph health detail
HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down
...
pg 0.5 is down+peering
pg 1.4 is down+peering
...
osd.1 is down since epoch 69, last address 192.168.106.220:6801/865
ceph health
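The first line of the health output packs several findings into one string. A minimal sketch for splitting it into a status and its individual items follows. It assumes the semicolon-separated format shown above (other Ceph releases may format this line differently), and `parse_health_summary` is a hypothetical helper.

```python
# Sketch: split the first line of `ceph health detail` into a status
# and its semicolon-separated summary items.
# ASSUMPTION: the line looks like the one on the slide.
def parse_health_summary(line: str):
    status, _, rest = line.strip().partition(" ")
    items = [item.strip() for item in rest.split(";") if item.strip()]
    return status, items


SAMPLE = ("HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; "
          "1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); "
          "1/3 in osds are down")

if __name__ == "__main__":
    status, items = parse_health_summary(SAMPLE)
    print(status)
    for item in items:
        print(" -", item)
```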
ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
ceph pg dump_stuck
THE ‘NO GUI’ OPTION
pg_stat objects bytes status up up_pri
10.f8 3042 25474961408 active+remapped [58,2147483647,24,20,55,59,2147483647,27] 58
10.7fa 3029 25375584256 active+remapped [51,20,60,28,2147483647,61,2147483647,11] 51
10.716 2990 25052532736 inactive [9,44,10,55,24,2147483647,47,2147483647] 9
ceph pg dump_stuck
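The value 2147483647 in the bracketed sets above is not a real OSD id. It is 0x7fffffff, which Ceph prints (as CRUSH_ITEM_NONE) when no OSD is mapped to that slot. The sketch below pulls those empty slots out of a row; `unmapped_slots` is a hypothetical helper, and it assumes the first bracketed list on the row is the set of interest.

```python
# Sketch: find unmapped slots in the first [...] set of a
# `ceph pg dump_stuck` row.
import re

NO_OSD = 2147483647  # 0x7fffffff, printed when no OSD is mapped to a slot


def unmapped_slots(row: str):
    """Return (pg id, indexes of slots in the first [...] set with no OSD)."""
    pgid = row.split()[0]
    match = re.search(r"\[([\d,]+)\]", row)
    if match is None:
        return pgid, []
    osds = [int(x) for x in match.group(1).split(",")]
    return pgid, [i for i, osd in enumerate(osds) if osd == NO_OSD]


ROW = ("10.f8 3042 25474961408 active+remapped "
       "[58,2147483647,24,20,55,59,2147483647,27] 58")

if __name__ == "__main__":
    print(unmapped_slots(ROW))
```

Each of the three rows on the slide has two such empty slots out of eight.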
VSM: TROUBLESHOOTING
• Repeated auto-out, or inability to restart an auto-out OSD, suggests a failed or failing disk
• VSM periodically probes the drive path; a missing drive path indicates complete disk failure
• A set of auto-out OSDs that share the same journal SSD suggests a failed or failing journal SSD
• VSM periodically probes the drive path; a missing drive path indicates complete disk (or controller) failure
VSM: TROUBLESHOOTING
Restore OSDs
Managing Storage Devices
Restore OSDs
Wait
Select
Sort
Confirm
Verify (may need to sort again)
When it’s time to go deep…
/var/log/ceph/ceph.log
/var/log/ceph/ceph-mon.[host].log
/var/log/ceph/ceph-osd.[xx].log
ceph tell osd.[xx] injectargs --debug-osd 0/5
REMINDER ABOUT CLUSTERS
Clusters rarely do things instantly.
A cluster can be like a Flock of Sheep - it
starts to move in the right direction
slowly and then picks up speed
(don’t run it off a cliff)
VSM: FAST DEPLOY
Create Cluster
Getting Started
Create new Ceph cluster
• All servers present
• Correct subnets and IP addresses
• Correct number of disks identified
• At least three monitors and an odd number of monitors
• Servers located in correct zone
• Servers responsive
• One Zone with server-level replication
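The "at least three, odd number" rule for monitors in the checklist above comes down to majority arithmetic. A minimal sketch follows; the function names are hypothetical, and it assumes only that quorum needs a strict majority of monitors.

```python
# Sketch: majority arithmetic behind "at least three monitors,
# and an odd number of them".
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of the monitor count."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """Monitors that can be lost while a majority still remains."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "monitors: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "down")
```

Four monitors tolerate no more failures than three, which is why an even count adds cost without adding resilience.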
VSM: FAST DEPLOY
Create Cluster
Step 1
Step 2: Confirm
VSM: FAST DEPLOY
Create Cluster - Status Sequence
Getting Started
Remember to…
IN TIME YOUR CLUSTER WILL
LEARN TO FOLLOW YOU
The 3 Keys to
Ceph in
Production
Happy PGs
Happy Monitors
Happy OSDs
thank you!
Paul Evans
principal architect
Paul at Daystrom dot com
daystrom technology group
san francisco
ceph days

Living with a Cephalopod: Daily Care & Feeding of Ceph Storage