Upgrading CentOS on the
Facebook fleet
Davide Cavalca
Production Engineer
• Infrastructure primer
• OS evolution
• Road to CentOS 8
Agenda
Infrastructure
• OS team manages the bare metal experience of the fleet
• OS as a platform
• Individual teams are responsible for their own hosts
• Built on an Open Source foundation
• Linux, CentOS, rpm/yum/dnf, Chef, systemd
Infrastructure
How does it work?
• Community sets the direction
• We move fast; open source often moves faster
• We don’t need to write everything ourselves
• Sharing our code means sharing the maintenance and having others extend it
• DevConf.CZ 2017 talk: https://tinyurl.com/y7gx6nro
Infrastructure
Upstream first
• Stable releases
• Binary compatibility
• Security updates
• Mature and well understood tooling
• EPEL
• Close relationship with Fedora
Infrastructure
Why CentOS?
• Backports from Fedora Rawhide for stuff we care about
• Mostly plumbing and low-level packages
• %facebook macro to gate internal stuff
• GitHub: facebookincubator/rpm-backports
• CentOS + FTL = stable distro, moving fast
Infrastructure
FTL – Fast Thin Layer
OS evolution
• CentOS 5 → 6 (~2013-2016), 6 → 7 (2016-2018)
• No in-place updates: reprovision the host from scratch
• Clean slate to ensure a good state
• Opportunity to deprecate unwanted features or tools
OS updates
Major updates
• Incremental Rolling OS updates
• Every two weeks we sync down the latest updates…
• …and roll them out over two weeks
• ‘yum upgrade’ kicked off via fb_yum in Chef
• Easy stop button and opt out for individual packages
Rolling OS updates
Minor releases and security updates
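A rollout spread over two weeks can be sketched as a deterministic ramp: each host hashes into one of 14 buckets and becomes eligible once its bucket's day arrives. This is a minimal illustration, not the actual fb_yum logic. The function names, the one-bucket-per-day granularity, the SHA-256 hash, and the shape of the stop and opt-out inputs are all assumptions.

```python
import hashlib

ROLLOUT_DAYS = 14  # assumption: one bucket per day of the two-week window


def rollout_day(hostname: str, release_id: str) -> int:
    """Map a host to a stable day (0..ROLLOUT_DAYS-1) for one update batch."""
    digest = hashlib.sha256(f"{release_id}:{hostname}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % ROLLOUT_DAYS


def should_upgrade(hostname: str, release_id: str, days_since_sync: int,
                   stopped: bool = False) -> bool:
    """True once the host's day has arrived, unless the stop button is set."""
    if stopped:
        return False
    return days_since_sync >= rollout_day(hostname, release_id)


def packages_to_upgrade(pending: list, opted_out: set) -> list:
    """Drop packages that have been opted out, keeping the original order."""
    return [name for name in pending if name not in opted_out]
```

Including the batch identifier in the hash means a given host does not always land on the same day, so no machine is permanently first or last in line.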
• About a year from initial PoC to first production machine
• About two years to migrate 100% of the fleet
• Bulk work: systemd conversion, validation, reprovisioning
• Stateless vs stateful services
• Last hour surprises: regressions and hidden dependencies
• DevConf 2018 talk: https://tinyurl.com/yawmjp74
CentOS 6 to 7
This day in 2018...
• Widespread systemd adoption
• More workloads moving to containers
• Switch to image-based provisioning
• Packaging improvements
• Increased community involvement
After CentOS 7
What we’ve been working on
• Running our systemd backport on the fleet
• 243 everywhere, 244 in testing
• Internal CI/CD pipeline for regression testing
• GitHub: facebookincubator/systemd-compat-libs
• GitHub: facebookincubator/pystemd
• All Systems Go 2019 talk: https://tinyurl.com/v7lxmq3
After CentOS 7
systemd
• Global service.d dropins (PR#13942)
• DefaultMemory{Low,Min} (PR#12211)
• DisableControllers (PR#10567)
• ExecCondition (PR#12933)
• PrivateUsers for unprivileged user managers (PR#13823)
• systemd-internal cgroup limits validation (PR#13690)
After CentOS 7
systemd feature development
• dcrpm: automate detection and remediation of issues
• GitHub: facebookincubator/dcrpm
• rpmdb corruption, stuck processes, etc.
• Works on Linux and OSX (!)
• Runs before every Chef run
After CentOS 7
RPM improvements: mitigation
• Beyond bdb: A/B testing new database backends
• ndb vs lmdb: goodbye rpmdb corruption!
• lmdb issues: hardcoded size (PR#902), locking, key size limits (PR#899), ~2x timeouts vs ndb
• CentOS Dojo Boston 2019 talk: https://tinyurl.com/r9txeo7
• Fleet is 100% on ndb as of Jan 2020
After CentOS 7
RPM improvements: database
• Experimenting with CoW to speed up package installs
• cpio -> aligned extent data with no compression (kinda)
• RPM plugin uses reflinking to obtain file data
• RPM transcoder proxy to convert prebuilt packages
• Still in heavy development, details tbd
• Also: xz → zstd as default compression
After CentOS 7
RPM improvements: file format
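Reflinking lets a filesystem such as btrfs or XFS share data extents between two files instead of copying bytes, which is why uncompressed, extent-aligned payloads matter. The sketch below shows the general mechanism on Linux with the `FICLONE` ioctl, falling back to a plain copy when the filesystem does not support it. It is not the RPM plugin's code; `clone_or_copy` is a hypothetical helper written for illustration.

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE, as defined in <linux/fs.h>.
FICLONE = 0x40049409


def clone_or_copy(src_path: str, dst_path: str) -> bool:
    """Try to reflink src into dst; fall back to a byte-for-byte copy.

    Returns True if the data was shared via reflink, False if it was copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Ask the filesystem to make dst share all of src's extents.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # Unsupported filesystem or cross-device clone: copy the data.
            shutil.copyfileobj(src, dst)
            return False
```

On a filesystem without reflink support the function still produces an identical file, only without the space and time savings.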
Road to CentOS 8
• Goal: front-load as much bootstrapping work as possible
• RHEL as a proxy for CentOS
• What’s new, what’s different, what’s going to break
• One month from release to minimal deployment
• Two months from minimal deployment to dev environment
• CentOS Dojo Brussels 2019: https://tinyurl.com/qqkb8ns
RHEL 8 Beta
Bootstrapping a pilot
• Importing the package repositories
• Bootstrapping a base image for the installer
• Package changes: grub, network-scripts, python
• Missing packages and CodeReady Linux Builder
• Porting the internal package build pipeline
• Modularity surprises
RHEL 8 Beta
Packaging and provisioning
• node.centos? on RHEL
• Chasing hardcoded logic (e.g. node.centos7?)
• Package resources: yum_package vs dnf_package
• Package cache: YumCache vs PythonHelper
• DNF provider teething issues (Chef PR#8005, PR#8754)
RHEL 8 Beta
Chef bringup
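The problem with hardcoded platform checks can be illustrated outside of Chef. The sketch below is a Python analogue, not the Ruby helpers named on the slide: it parses `os-release` style text and contrasts an exact-match check with a family-plus-minimum-version check. The sample strings are abbreviated and illustrative, and all function names are hypothetical.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def major_version(fields: dict) -> int:
    """Return the major version from VERSION_ID ('8.0' and '8' both give 8)."""
    return int(fields.get("VERSION_ID", "0").split(".")[0])


def is_centos7_exact(fields: dict) -> bool:
    """Brittle: true only for one distro ID and one major version."""
    return fields.get("ID") == "centos" and major_version(fields) == 7


def is_el_at_least(fields: dict, minimum: int) -> bool:
    """More robust: match the RHEL family and compare major versions."""
    ids = {fields.get("ID", "")} | set(fields.get("ID_LIKE", "").split())
    return bool(ids & {"rhel", "centos"}) and major_version(fields) >= minimum


# Abbreviated sample inputs for illustration only.
RHEL_8 = 'ID="rhel"\nID_LIKE="fedora"\nVERSION_ID="8.0"\n'
CENTOS_7 = 'ID="centos"\nID_LIKE="rhel fedora"\nVERSION_ID="7"\n'
```

With these samples, `is_centos7_exact` is false on the RHEL 8 input, so any code path gated on it silently stops running, while `is_el_at_least(fields, 7)` holds for both.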
• Release notes, internal comms prep work
• Continue productionizing the pilot
• Mostly waiting while obsessively refreshing https://wiki.centos.org/About/Building_8
RHEL 8 Release
After the pilot
• CentOS 8 and CentOS Stream
• New repos: PowerTools and EPEL-playground
• Streamlining rolling OS updates
• About a month from release to open testing
• Began engaging partners and planning migration schedule
• Fleet migration started in earnest in Jan
CentOS 8 and CentOS Stream
Release time!
• Based on the CentOS Stream repositories
• Using our kernel and systemd backport
• btrfs on / by default
• cgroup2 only
CentOS 8 at Facebook
What’s different
• Sharding for default OS settings in provisioning
• Reuse kernel upgrades tooling to automate host reimaging
• Automated progress tracking
• OS team acts as consulting partner
CentOS 8 at Facebook
Migration process and tooling
• No 32-bit altarch release
• Python packaging changes
• Repository layout changes in EPEL
• nobody/nfsnobody UID change
• Modularity: build pipeline, overrides
CentOS 8 migration
Migration issues so far
• Targeting CentOS 7 EOS by June and EOL by December
• CentOS 8 container base images
• Wrap up the ndb conversion and make it the default
• Productionize and upstream the RPM CoW work
• ???
CentOS 8 migration
What’s next
Questions?
Upgrading CentOS Fleet at Facebook

Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 

Upgrading CentOS Fleet at Facebook

  • 2. Upgrading CentOS on the Facebook fleet Davide Cavalca Production Engineer
  • 3. • Infrastructure primer • OS evolution • Road to CentOS 8 Agenda
  • 6. • OS team manages the bare metal experience of the fleet • OS as a platform • Individual teams are responsible for their own hosts • Built on an Open Source foundation • Linux, CentOS, rpm/yum/dnf, Chef, systemd Infrastructure How does it work?
  • 7. • Community sets the direction • We move fast; open source often moves faster • We don’t need to write everything ourselves • Sharing our code means sharing the maintenance and having others extend it • DevConf.CZ 2017 talk: https://tinyurl.com/y7gx6nro Infrastructure Upstream first
  • 8. • Stable releases • Binary compatibility • Security updates • Mature and well understood tooling • EPEL • Close relationship with Fedora Infrastructure Why CentOS?
  • 9. • Backports from Fedora Rawhide for stuff we care about • Mostly plumbing and low-level packages • %facebook macro to gate internal stuff • GitHub: facebookincubator/rpm-backports • CentOS + FTL = stable distro, moving fast Infrastructure FTL – Fast Thin Layer
  • 11. • CentOS 5 → 6 (~2013-2016), 6 → 7 (2016-2018) • No in-place updates: reprovision the host from scratch • Clean slate to ensure a good state • Opportunity to deprecate unwanted features or tools OS updates Major updates
  • 12. • Incremental Rolling OS updates • Every two weeks we sync down the latest updates… • …and roll them out over two weeks • ‘yum upgrade’ kicked off via fb_yum in Chef • Easy stop button and opt out for individual packages Rolling OS updates Minor releases and security updates
  • 13. • About a year from initial PoC to first production machine • About two years to migrate 100% of the fleet • Bulk work: systemd conversion, validation, reprovisioning • Stateless vs stateful services • Last hour surprises: regressions and hidden dependencies • DevConf 2018 talk: https://tinyurl.com/yawmjp74 CentOS 6 to 7 This day in 2018...
  • 14. • Widespread systemd adoption • More workloads moving to containers • Switch to image-based provisioning • Packaging improvements • Increased community involvement After CentOS 7 What we’ve been working on
  • 15. • Running our systemd backport on the fleet • 243 everywhere, 244 in testing • Internal CI/CD pipeline for regression testing • GitHub: facebookincubator/systemd-compat-libs • GitHub: facebookincubator/pystemd • All Systems Go 2019 talk: https://tinyurl.com/v7lxmq3 After CentOS 7 systemd
  • 16. • Global service.d dropins (PR#13942) • DefaultMemory{Low,Min} (PR#12211) • DisableControllers (PR#10567) • ExecCondition (PR#12933) • PrivateUsers for unprivileged user managers (PR#13823) • systemd-internal cgroup limits validation (PR#13690) After CentOS 7 systemd feature development
  • 17. • dcrpm: automate detection and remediation of issues • GitHub: facebookincubator/dcrpm • rpmdb corruption, stuck processes, etc. • Works on Linux and OSX (!) • Runs before every Chef run After CentOS 7 RPM improvements: mitigation
  • 18. • Beyond bdb: A/B testing new database backends • ndb vs lmdb: goodbye rpmdb corruption! • lmdb issues: hardcoded size (PR#902), locking, key size limits (PR#899), ~2x timeouts vs ndb • CentOS Dojo Boston 2019 talk: https://tinyurl.com/r9txeo7 • Fleet is 100% on ndb as of Jan 2020 After CentOS 7 RPM improvements: database
  • 19. • Experimenting with CoW to speed up package installs • cpio -> aligned extent data with no compression (kinda) • RPM plugin uses reflinking to obtain file data • RPM transcoder proxy to convert prebuilt packages • Still in heavy development, details tbd • Also: xz → zstd as default compression After CentOS 7 RPM improvements: file format
  • 21. • Goal: front-load as much bootstrapping work as possible • RHEL as a proxy for CentOS • What’s new, what’s different, what’s going to break • One month from release to minimal deployment • Two months from minimal deployment to dev environment • CentOS Dojo Brussels 2019: https://tinyurl.com/qqkb8ns RHEL 8 Beta Bootstrapping a pilot
  • 22. • Importing the package repositories • Bootstrapping a base image for the installer • Package changes: grub, network-scripts, python • Missing packages and CodeReady Linux Builder • Porting the internal package build pipeline • Modularity surprises RHEL 8 Beta Packaging and provisioning
  • 23. • node.centos? on RHEL • Chasing hardcoded logic (e.g. node.centos7?) • Package resources: yum_package vs dnf_package • Package cache: YumCache vs PythonHelper • DNF provider teething issues (Chef PR#8005, PR#8754) RHEL 8 Beta Chef bringup
  • 24. • Release notes, internal comms prep work • Continue productionizing the pilot • Mostly waiting while obsessively refreshing https://wiki.centos.org/About/Building_8 RHEL 8 Release After the pilot
  • 25. • CentOS 8 and CentOS Stream • New repos: PowerTools and EPEL-playground • Streamlining rolling OS updates • About a month from release to open testing • Began engaging partners and planning migration schedule • Fleet migration started in earnest in Jan CentOS 8 and CentOS Stream Release time!
  • 26. • Based on the CentOS Stream repositories • Using our kernel and systemd backport • btrfs on / by default • cgroup2 only CentOS 8 at Facebook What’s different
  • 27. • Sharding for default OS settings in provisioning • Reuse kernel upgrades tooling to automate host reimaging • Automated progress tracking • OS team acts as consulting partner CentOS 8 at Facebook Migration process and tooling
  • 28. • No 32bit altarch release • Python packaging changes • Repository layout changes in EPEL • nobody/nfsnobody UID change • Modularity: build pipeline, overrides CentOS 8 migration Migration issues so far
  • 29. • Targeting CentOS 7 EOS by June and EOL by December • CentOS 8 container base images • Wrap up the ndb conversion and make it the default • Productionize and upstream the RPM CoW work • ??? CentOS 8 migration What’s next
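
Slides 12 and 27 mention rolling updates out over two weeks and sharding default OS settings during provisioning, without showing how hosts are assigned to shards. The following is a minimal sketch of one common approach, assuming hash-based sharding on the hostname and a linear ramp over the rollout window. The function names (`shard_of`, `in_rollout`, `rollout_percent`), the 100-shard granularity, and the linear schedule are all hypothetical and are not taken from the talk or from Facebook's tooling.

```python
import hashlib


def shard_of(hostname: str, shards: int = 100) -> int:
    """Map a hostname to a stable shard number in [0, shards).

    Hypothetical helper: a cryptographic hash is used only because it
    spreads hostnames evenly and gives the same answer on every run.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shards


def in_rollout(hostname: str, percent: int) -> bool:
    """Return True if the host falls within the first `percent` shards."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return shard_of(hostname) < percent


def rollout_percent(day: int, window_days: int = 14) -> int:
    """Share of the fleet eligible on a given day (0-based) of the window.

    Assumes a linear ramp; a real schedule could be front- or back-loaded.
    """
    if day < 0:
        return 0
    return min(100, (day + 1) * 100 // window_days)


if __name__ == "__main__":
    # Illustrative hostnames only.
    for host in ("web001.example.com", "db042.example.com"):
        eligible_on = next(
            day for day in range(14) if in_rollout(host, rollout_percent(day))
        )
        print(f"{host}: shard {shard_of(host)}, eligible from day {eligible_on}")
```

Because the shard is derived from the hostname alone, a host that is eligible at a given percentage stays eligible as the percentage grows, so a paused rollout can resume without hosts flipping in and out of the target set.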