This talk is a follow-up to "State of systemd @ Facebook", presented last year. We'll cover the latest developments, how we're leveraging new systemd features, and the design of our CI/CD pipeline for systemd, and finally discuss a number of interesting case studies.
• CentOS 7, starting to prep for CentOS 8
• ~3 yrs using systemd
• systemd is everywhere!
• Community involvement
• Building and leveraging new features
Recap
Where were we again?
• systemd 239 → 241 → 242 (→ 243...)
• RPM backports automatically synced to facebookincubator/rpm-backports
• The long tail (~2%) continues to be a pain point
• Broken hosts are broken
• Old release pins to work around bugs
Deployment and development
Staying up to date
• Prepping an internal systemd release takes a while
• Last-minute surprises, reliance on SMEs, bus factor
• Developing on master ↔ testing on release
• Integration testing
• Faster feedback
Deployment and development
Streamlining the release process
• Modified Fedora packaging
• Replace source tarball with one from git master
• Daily builds deployed to a small number of hosts
• Integration testing for containers infra
• Soon: Red Hat testsuite, bare metal integration testing
Deployment and development
CI/CD
Deployment and development
Feature development
• Support fast iteration development
• Leverage internal code review / CI / etc tooling
• Apply learnings from the kernel devel process
• Internal systemd repo, with a read-only mirror of GitHub branches and tags
• Feature branches off master for initial development
• Feature branches synced with PRs submitted upstream
• Release branches off tags for internal releases
• Easier patch tracking, integration with rpm build tooling
Deployment and development
Development plan
• A cross between Condition* and ExecStartPre
• Can pass, skip or fail unit based on the exit code
• General purpose sanity-checks for services
[Service]
ExecCondition=/usr/bin/checker /usr/bin/foo
ExecStart=/usr/bin/foo
New features
ExecCondition
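To make the pass / skip / fail behaviour concrete, here is a minimal sketch of what a checker such as /usr/bin/checker could look like. The exit code mapping follows the systemd documentation for ExecCondition= (0 passes, 1 through 254 skips the unit without marking it failed, 255 or abnormal termination fails it). The function name and the specific checks are assumptions made for illustration, not the checker used in the talk.

```python
import os

EXIT_PASS = 0    # systemd carries on with ExecStart=
EXIT_SKIP = 1    # any code in 1..254: unit is skipped, not marked failed
EXIT_FAIL = 255  # unit is marked failed


def exit_code_for(binary: str) -> int:
    """Hypothetical sanity check for the binary the service will run."""
    if not os.path.exists(binary):
        # Nothing to run on this host: skip quietly.
        return EXIT_SKIP
    if not os.access(binary, os.X_OK):
        # Present but not executable: treat as a real failure.
        return EXIT_FAIL
    return EXIT_PASS
```

A real checker script would finish with something like sys.exit(exit_code_for(sys.argv[1])), so that systemd sees the code as the process exit status.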
• See Tejun and Dan’s talk for all things cgroup2
• See Daniel and Anita’s talk for oomd
• See Johannes’ talk for senpai
• DisableControllers= for transient units (#12336)
• OOMPolicy= for cgroup2 (#12037)
New features
Resource management
New features
• GitHub: facebookincubator/pystemd
• Thin Cython wrapper on top of sd-bus
• Now extended to support almost all D-Bus properties
• Sockets too!
In [1]: from pystemd.examples.start_multiple_transient_unit import start_webserver
In [2]: start_webserver(listen_stream="0.0.0.0:7055")
started myservice.86.1567623644.3155806 as socket and service
pystemd
New features
• Config generation improvements in fb_systemd
• Override management
fb_systemd_override 'run-as-nobody' do
  unit_name 'foo.service'
  content({
    'Service' => {
      'User' => 'nobody',
    },
  })
end
Chef
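An override like the one above ends up as a systemd drop-in file for foo.service. The sketch below only illustrates that mapping from a nested hash to drop-in text; it is not the fb_systemd implementation (which is a Chef cookbook written in Ruby). The function names and the file name derived from the resource name are assumptions.

```python
def render_dropin(content: dict) -> str:
    """Render {'Section': {'Key': 'value'}} as unit-file drop-in text."""
    lines = []
    for section, settings in content.items():
        lines.append(f"[{section}]")
        for key, value in settings.items():
            lines.append(f"{key}={value}")
        lines.append("")  # blank line after each section
    return "\n".join(lines)


def dropin_path(unit_name: str, override_name: str) -> str:
    """Drop-ins live in <unit>.d/; the .conf file name here is an assumption."""
    return f"/etc/systemd/system/{unit_name}.d/{override_name}.conf"
```

For the resource above, render_dropin({'Service': {'User': 'nobody'}}) yields a [Service] section containing User=nobody, which systemd merges on top of the packaged unit.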
New features
• Linter for systemd units
• Surface good/bad practices
• Customizable policy and ruleset
• Internal testing and integration
• Planning to open source by end of year
Coming soon: systemd unit linter
Case studies
Implicit dependencies are hard /1
• NTP not starting on boot on some hosts
• But only on >239...
• PrivateTmp implies a dependency on tmp.mount…
• ...which we mask in Chef on firstboot…
• ...after it was already started
• tmp.mount ends up being active *and* masked → sadness
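One way to spot this state across a fleet is to look for units that are both active and masked. The sketch below parses Key=Value output such as that printed by `systemctl show -p LoadState -p ActiveState tmp.mount`. It assumes that a unit masked after it was started reports LoadState=masked while ActiveState stays active; the helper names are hypothetical and this is not the tooling used in the talk.

```python
def parse_show(output: str) -> dict:
    """Parse 'Key=Value' lines as printed by `systemctl show`."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key] = value
    return props


def is_active_and_masked(props: dict) -> bool:
    """True for the contradictory state described above (assumed property values)."""
    return (
        props.get("LoadState") == "masked"
        and props.get("ActiveState") == "active"
    )
```

Feeding it the captured output for each mount or service of interest gives a quick yes/no per unit, which can then be aggregated per host.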
Case studies
Implicit dependencies are hard /2
• Missing directories on hosts → tmpfiles not being created
• systemd-tmpfiles-setup depends on local-fs.target...
• ...which depends on swap.target
• fb_swap rollout added masked units as dependencies
• The whole tree gets pruned → no tmpfiles :-(
• Debugged with systemd-analyze + pid1 debug logging