SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
1 sysadmin vs 250 clusters
Etienne Menguy
SysadminDays
November 19, 2019
OVHcloud
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
2
1 500 000 customers
2200 employees
380 000 Bare-metal servers
Ceph at OVHcloud
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
3
Public Cloud
Virtual
machines
Additional
disks
Additional
disks
Additional
disks
Additional
disks
Cloud Disk Array
As A
Service
Evolution
„2015
• 4 dev
• 1 ops
• 8 clusters
• 4 regions
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
4
„2019
• 9 dev
• 250 clusters
• 10 regions
Daily work
„1 sysadmin
• Monitoring
• Prodding
• Support
• Training
• Deploying regions, servers
• And the daily surprises
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
5
8 devs
• Ceph as a service
• Infra as code
• Code review
• Tests
• R&D
Ceph setup
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
6
FlashcacheFlashcacheFlashcache
LXC
Data
LXC
Data
LXC
Data
NVME
Partition
Partition
Partition
x12
HDD
HDD
x12
HDD
Flashcache
LXC
Data
Bare-metal server
40Gbps NIC
Ceph as a service
„Autonomous users
• Creating cluster
• Managing users, pools, rights
• Managing network
• Cluster growth
„Backup management
• 500TB/day
• Ceph -> Swift
• Ceph -> Ceph
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
7
„Managing our infrastructure
• Cluster upgrade
• Deploy new ceph versions
• Manage tasks
• Host management
• Network management
• Containers management
Infrastructure
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
8
Serveurs
Conteneurs
VM
Instances
BDD
Puppet
API
Python
API
OVH
RabbitMQ
Celery
Task management
„ RabbitMQ
„ Celery
• https://github.com/ovh/celery-dyrygent
• Complex workflow
• Reliable
• Monitoring
• Web interface
• Planned tasks
• NVME replacement
• Self healing
• Triggered by monitoring probe
• Executes any operation
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
9
Example
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
10
start
Check
operation
safety
Lower disk
weight
Wait
cluster_health_ok
Remove disk
from cluster
Yes
No
Weight
equals 0
Continuous delivery
„CDS
• https://github.com/ovh/cds
„Each pull request
• Lint
• Unit test
„Daily prodding
• All tests executed
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
11
Infra as code
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
12
Inconsistent hardware
„Hardware profile
• 12 profils on production
• CPU
• NVME
• HDD
„Firmwares
„Ceph versions
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
13
• Generic tools
• 1 profile = 1 cluster
Monitoring
„ Automatic downtimes by tasks
„ Some alarms on working hours
„ Services/hosts aggregation
„ 143 000 services
„ 25 000 hosts
„ 3 infrastructures
• 6 masters
• 12 satellites
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
14
Metrics
„ Clusters metrics
• Usage
• Latency
„ Hardware
• Cpu, mermory usage
• Cache hit ratio
„ Service
• KPI
• Usage per openstack region
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
15
„ Metrics Data Platform
• https://www.ovh.com/fr/data-platforms/metrics/
„ 13 Millions series
„ 13 Billions points per day
„ Performance
• IO/s
• Latency
Logs
„ Infrastructure
• OS
• Ceph
„ Applications
• CAAS
• Celery / RabbitMQ
• Uniq step/task ID
„ API
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
16
„ Logs Data Platform
• https://www.ovh.com/fr/data-
platforms/logs/
„ 15 000 logs/second
„ Graylog
„ Filebeat
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
p ag e 17
Conclusion
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
p ag e 18
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on CephCeph Community
 
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...OpenNebula Project
 
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebula Project
 
Introducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud DatabasesIntroducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud DatabasesOVHcloud
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph Ceph Community
 
Developing a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsDeveloping a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsCeph Community
 
Disk health prediction for Ceph
Disk health prediction for CephDisk health prediction for Ceph
Disk health prediction for CephCeph Community
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld
 
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...Ceph Community
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationRamkumar Nottath
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Community
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerFernando Barrientos
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Danielle Womboldt
 
Ceph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFSCeph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFSCeph Community
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Community
 

Was ist angesagt? (20)

VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
 
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
 
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
 
Introducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud DatabasesIntroducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud Databases
 
TDS-16489U - Dual Processor
TDS-16489U - Dual ProcessorTDS-16489U - Dual Processor
TDS-16489U - Dual Processor
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
 
Developing a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsDeveloping a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure Environments
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
 
Disk health prediction for Ceph
Disk health prediction for CephDisk health prediction for Ceph
Disk health prediction for Ceph
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed Switch
 
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migration
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual Controller
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
 
Ceph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFSCeph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFS
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph
 

Ähnlich wie 1 sysadmin vs 250 clusters de stockage

Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph Community
 
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...Insight Technology, Inc.
 
CloudStack usage service
CloudStack usage serviceCloudStack usage service
CloudStack usage serviceShapeBlue
 
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...donaghmccabe
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceEvan McGee
 
Red Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftRed Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftJeremy Eder
 
Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11ShapeBlue
 
Whats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlinesWhats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlinesShapeBlue
 
KACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewKACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewDell World
 
The 12 Factor App
The 12 Factor AppThe 12 Factor App
The 12 Factor Apprudiyardley
 
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltStack
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209mffiedler
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internalsTokyo Azure Meetup
 
Using the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackUsing the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackShapeBlue
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...InfluxData
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)Tibo Beijen
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 

Ähnlich wie 1 sysadmin vs 250 clusters de stockage (20)

Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
 
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
 
Online spanish meetup #2
Online spanish meetup #2Online spanish meetup #2
Online spanish meetup #2
 
CloudStack usage service
CloudStack usage serviceCloudStack usage service
CloudStack usage service
 
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a Microservice
 
Red Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftRed Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShift
 
Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11
 
Whats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlinesWhats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlines
 
KACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewKACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting Overview
 
The 12 Factor App
The 12 Factor AppThe 12 Factor App
The 12 Factor App
 
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
 
Using the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackUsing the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStack
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 
Lattice yapc-slideshare
Lattice yapc-slideshareLattice yapc-slideshare
Lattice yapc-slideshare
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 

Mehr von OVHcloud

OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups OVHcloud
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsOVHcloud
 
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...OVHcloud
 
Webinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud DatabasesWebinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud DatabasesOVHcloud
 
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...OVHcloud
 
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...OVHcloud
 
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...OVHcloud
 
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilitéOVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilitéOVHcloud
 
OVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML servingOVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML servingOVHcloud
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 
Les APIs OpenStack
Les APIs OpenStackLes APIs OpenStack
Les APIs OpenStackOVHcloud
 
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...OVHcloud
 
Industrialize Machine Learning
Industrialize Machine Learning Industrialize Machine Learning
Industrialize Machine Learning OVHcloud
 
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...OVHcloud
 
Online passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattackOnline passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattackOVHcloud
 
OVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sectorOVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sectorOVHcloud
 
The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?OVHcloud
 
Maximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructureMaximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructureOVHcloud
 
One year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloudOne year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloudOVHcloud
 
Rethinking the Public Cloud user experience
Rethinking the Public Cloud user experienceRethinking the Public Cloud user experience
Rethinking the Public Cloud user experienceOVHcloud
 

Mehr von OVHcloud (20)

OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
 
Webinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud DatabasesWebinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud Databases
 
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
 
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
 
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
 
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilitéOVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
 
OVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML servingOVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML serving
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Les APIs OpenStack
Les APIs OpenStackLes APIs OpenStack
Les APIs OpenStack
 
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
 
Industrialize Machine Learning
Industrialize Machine Learning Industrialize Machine Learning
Industrialize Machine Learning
 
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
 
Online passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattackOnline passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattack
 
OVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sectorOVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sector
 
The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?
 
Maximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructureMaximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructure
 
One year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloudOne year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloud
 
Rethinking the Public Cloud user experience
Rethinking the Public Cloud user experienceRethinking the Public Cloud user experience
Rethinking the Public Cloud user experience
 

Kürzlich hochgeladen

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Kürzlich hochgeladen (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

1 sysadmin vs 250 clusters de stockage

  • 1. 1 sysadmin vs 250 clusters Etienne Menguy SysadminDays November 19, 2019
  • 2. OVHcloud D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 2 1 500 000 customers 2200 employees 380 000 Bare-metal servers
  • 3. Ceph at OVHcloud D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 3 Public Cloud Virtual machines Additional disks Additional disks Additional disks Additional disks Cloud Disk Array As A Service
  • 4. Evolution „2015 • 4 dev • 1 ops • 8 clusters • 4 regions D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 4 „2019 • 9 dev • 250 clusters • 10 regions
  • 5. Daily work „1 sysadmin • Monitoring • Prodding • Support • Training • Deploying regions, servers • And the daily surprises D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 5 8 devs • Ceph as a service • Infra as code • Code review • Tests • R&D
  • 6. Ceph setup D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 6 FlashcacheFlashcacheFlashcache LXC Data LXC Data LXC Data NVME Partition Partition Partition x12 HDD HDD x12 HDD Flashcache LXC Data Bare-metal server 40Gbps NIC
  • 7. Ceph as a service „Autonomous users • Creating cluster • Managing users, pools, rights • Managing network • Cluster growth „Backup management • 500TB/day • Ceph -> Swift • Ceph -> Ceph D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 7 „Managing our infrastructure • Cluster upgrade • Deploy new ceph versions • Manage tasks • Host management • Network management • Containers management
  • 8. Infrastructure D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 8 Serveurs Conteneurs VM Instances BDD Puppet API Python API OVH RabbitMQ Celery
  • 9. Task management „ RabbitMQ „ Celery • https://github.com/ovh/celery-dyrygent • Complex workflow • Reliable • Monitoring • Web interface • Planned tasks • NVME replacement • Self healing • Triggered by monitoring probe • Executes any operation D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 9
  • 10. Example D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 10 start Check operation safety Lower disk weight Wait cluster_health_ok Remove disk from cluster Yes No Weight equals 0
  • 11. Continuous delivery „CDS • https://github.com/ovh/cds „Each pull request • Lint • Unit test „Daily prodding • All tests executed D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 11
  • 12. Infra as code D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 12
  • 13. Inconsistent hardware „Hardware profile • 12 profils on production • CPU • NVME • HDD „Firmwares „Ceph versions D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 13 • Generic tools • 1 profile = 1 cluster
  • 14. Monitoring „ Automatic downtimes by tasks „ Some alarms on working hours „ Services/hosts aggregation „ 143 000 services „ 25 000 hosts „ 3 infrastructures • 6 masters • 12 satellites D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 14
  • 15. Metrics „ Clusters metrics • Usage • Latency „ Hardware • Cpu, mermory usage • Cache hit ratio „ Service • KPI • Usage per openstack region D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 15 „ Metrics Data Platform • https://www.ovh.com/fr/data-platforms/metrics/ „ 13 Millions series „ 13 Billions points per day „ Performance • IO/s • Latency
  • 16. Logs „ Infrastructure • OS • Ceph „ Applications • CAAS • Celery / RabbitMQ • Uniq step/task ID „ API D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 16 „ Logs Data Platform • https://www.ovh.com/fr/data- platforms/logs/ „ 15 000 logs/second „ Graylog „ Filebeat
  • 17. D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er p ag e 17 Conclusion
  • 18. D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er p ag e 18 Questions?