6. • OS Team manages the bare metal experience of the fleet
• OS as a platform
• Individual teams are responsible for their own hosts
Infrastructure
How does it work?
7. • Working with the community on long-term direction
• We move fast; open source often moves faster
• We don’t need to write everything ourselves
• Sharing our code means sharing the maintenance and
having others extend it
• DevConf.CZ 2017 talk: tinyurl.com/y7gx6nro
Infrastructure
Upstream first
8. • Stable releases
• Binary compatibility
• Security updates
• Mature and well understood tooling
• EPEL
• Shared ecosystem with Fedora
Infrastructure
Why CentOS?
10. • Backports from Fedora Rawhide for stuff we care about
• Mostly plumbing and low-level packages
• %facebook macro to gate internal stuff
• GitHub: facebookincubator/rpm-backports
• CentOS + FTL = stable distro, moving fast
Infrastructure
FTL – Fast Thin Layer
11. • Rawhide backport of latest stable
• GitHub: facebookincubator/systemd-compat-libs
• Tracking upstream development in master
• New features: cgroup2, BPF, portable services, etc.
• fb_systemd and fb_timers cookbooks
• All Systems Go 2018 talk: tinyurl.com/ycecofb9
Infrastructure
Plumbing – systemd
12. • dbus is used for local IPC in systemd
• We run a backport of dbus-daemon
• Started testing dbus-broker, so far so good
• Still no way to upgrade/restart without rebooting
• pystemd: thin Cython wrapper on top of sd-bus
• Exposes sytemd dbus object model
• GitHub: facebookincubator/pystemd
Infrastructure
Plumbing – dbus
14. • Upstream kernel (or as close as possible)
• Development in master
• Internal stable and development branches
• Testing and rollout automation
• SCALE 15x talk: tinyurl.com/ybqgwn47
Infrastructure
Kernel
16. • Public mirror: mirror.facebook.net/centos/
• Upstream repos straight from CentOS
• Internal repos: site-packages and fb-runtime
Packaging
Server side – layout
17. • Stored in GlusterFS
• Served per-region via Apache2
• Cached by many varnish boxes
• Local CLIs + publisher to control input
Packaging
Server side – implementation
19. • yummy: mock wrapper with SCM integration
• Store specfiles + patches in source control
• Store blobs in GlusterFS-backed NFS
• Automation to sync to GitHub
• Can submit to central build system – required for signing
Packaging
Tooling – system packages
20. • packman: rpmbuild wrapper for simple RPMs
• Integrates with buck and our SCM tooling
• Build software, generate specfile, package it up
• Simple YAML configuration
• Can submit to central build system – required for signing
Packaging
Tooling – fb-runtime packages
21. • RPM: backported from Rawhide
• YUM: CentOS + backported PRs
• DNF: built from yum4 repos – soon to be Rawhide
• RPM, yum and DNF managed by cookbooks
• fb_rpm is open source
Packaging
Client side
23. • I/O thrash fsync() on close() (PR#187 on rpm)→
• rpmdb corruption issues → dcrpm
• Duplicate packages → package-cleanup wrapper
• Scaling Yum → pkgtorrent (merged upstream)
RPM and DNF at scale
RPM and DNF/YUM at scale
24. • Automate detection and remediation of common issues
• rpmdb corruption, stuck processes, etc.
• Works on Linux and OSX (!)
• Runs before every Chef run
• GitHub: facebookincubator/dcrpm
RPM and DNF at scale
dcrpm
25. RPM and DNF at scale
dcrpm – before and after
• RPM-related failures over time
• Baseline of errors still WIP
26. • Merge package-cleanup wrapper into dcrpm
• Test new database formats
• Test DNF in production
RPM and DNF at scale
Work in progress
28. • CentOS 5 6 and 6 7→ →
• No in-place updates: reprovision the host from scratch
• Easier to ensure a good state
• Side effect: clean slate offers opportunity to deprecate
unwanted features or tools
OS updates
Major releases
29. • About a year from initial PoC to first production machine
• About two years to migrate 100% of the fleet
• Documentation and automation
• Stateless vs stateful services
• Prerequisites and hidden dependencies
OS updates
CentOS 7 migration
30. • Rolling OS updates
• Every two weeks we sync down the latest updates…
• …and roll them out over two weeks
• Easy stop button and opt out for individual packages
OS updates
Minor releases and rolling updates
31. • /yum/centos/7.x/rollingupdates/YYYYMMDD
• Tool to automate syncing and validation
• ‘yum upgrade’* kicked off via fb_yum
• Manual per-machine canary opt-in
• Early canary 0.1% 1% 2% …→ → → →
OS updates
Rolling OS updates – implementation
33. • RHEL as a preview of CentOS
• Handful of non-prod testing hosts
• No major blockers (yet)
• Get a head start playing with modules, dnf, etc.
Early Testing
RHEL 8 Beta
34. • Will also be shipped as Yum 4
• Fully supports modularity
• Faster (as of last recent versions)
• Chef fully supports it
• Has better solutions for cleaning duplicates
• We still need to port pkgtorrent
DNF
The new hotness
35. • See https://docs.fedoraproject.org/en-US/modularity/
• Pick a “stream” from different modules
• Where is my $foo package?! Oh...
• Really exciting stuff! But...
• Some pieces still being worked out
Modularity
The new hotness
37. • CentOS has a great community
• It is part of the Fedora/Red Hat family
• Modularity will enable much of our existing practices
• Allows us to support ourselves and move faster/slower as
necessary
CentOS
Facebook loves CentOS!