If patch management is a problem in your infrastructure, then this talk is for you. This talk will walk through our journey to automate the patch management of our infrastructure. Details will be shared about our architecture, phased approach and implementation. We will also showcase our new Puppet module that can be used as a framework for patch management with Bolt and Puppet.
7. Requirements
•Security
• More often (weekly)
• As fast as possible (<1 day)
• Reports
•DevOps
• HA groups
• Customizable workflows
• Cross-platform
• Windows Update + Chocolatey
15. patching::available_updates
•Check for available updates
•Windows
- Windows Update Agent API
- choco outdated
•RHEL
- yum -q check-update
•Ubuntu
- apt upgrade --simulate
• Output = Array of updates
bolt plan run patching::available_updates
17. patching::snapshot_vmware
•VMware only (for now)
- Bolt control node
- rbvmomi gem
•Optional
•Customizable
•Pluggable
- Dynamic Dispatch
bolt plan run patching::snapshot_vmware
18. Dynamic Dispatch in Bolt
plan patching (
  TargetSpec $nodes,
  String $snapshot_plan,
) {
  # lots of things…
  run_plan($snapshot_plan,
    nodes  => $nodes,
    action => 'create')
}
plan patching::snapshot_vmware (
  TargetSpec $nodes,
  String $action,
) { … }
bolt plan run patching snapshot_plan=patching::snapshot_vmware
Requirement: plans must conform to the same “interface”
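The pattern on this slide can be illustrated outside of Bolt as a lookup from a plan name (a string) to a callable chosen at runtime. The following is a minimal Python sketch of the idea, not Bolt's implementation: `PLANS`, the `plan` decorator, the `run_plan` stand-in, and `mymodule::snapshot_noop` are hypothetical names made up for the example.

```python
# Registry mapping plan names to callables (hypothetical stand-in for Bolt's plan lookup).
PLANS = {}


def plan(name):
    """Register a function under a plan name so it can be called by string."""
    def register(func):
        PLANS[name] = func
        return func
    return register


@plan("patching::snapshot_vmware")
def snapshot_vmware(nodes, action):
    return [f"{action} VMware snapshot on {node}" for node in nodes]


@plan("mymodule::snapshot_noop")  # hypothetical replacement plan
def snapshot_noop(nodes, action):
    return [f"skip {action} on {node}" for node in nodes]


def run_plan(name, **params):
    """Dispatch to whichever plan the caller named; resolved at runtime."""
    return PLANS[name](**params)


def patching(nodes, snapshot_plan):
    # The caller decides which snapshot plan runs by passing its name as a string.
    return run_plan(snapshot_plan, nodes=nodes, action="create")
```

Both registered plans accept the same `nodes` and `action` parameters, which is the common "interface" requirement: any plan that honours that signature can be swapped in without touching `patching`.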
19. patching::pre_update
•Service health checks
•Backups
•Stop services
•etc
•Runs script on remote node
Linux = /opt/patching/bin/pre_update.sh
Windows = C:\ProgramData\patching\bin\pre_update.ps1
•Customizable
bolt plan run patching::pre_update
20. Customizing with vars
inventory.yaml
---
vars:
  patching_pre_patch_plan: 'mymodule::pre_patch'
  patching_pre_update_script_linux: '/my/custom/patching/script.sh'
  patching_pre_update_script_windows: 'C:\my\custom\patching\script.ps1'
plan
plan patching::pre_update (
  TargetSpec $n,
  String $script_linux = '/opt/patching/bin/pre_update.sh',
  String $script_windows = 'C:\ProgramData\patching\bin\pre_update.ps1',
) {
  $vars = get_targets($n)[0].vars
  $_script_linux = pick($vars['patching_pre_update_script_linux'], $script_linux)
  $_script_windows = pick($vars['patching_pre_update_script_windows'], $script_windows)
  # … do things
}
22. patching::update
•Windows
- Windows Update Agent API
• Special snowflake scheduled task…
- choco upgrade all
•RHEL
- yum update
•Ubuntu
- apt-get dist-upgrade
bolt task run patching::update
23. Logs and Results
•Linux
- Writes stdout log to /var/log/patching.log
- Writes results to /var/log/patching.json
•Windows
- Writes logs to C:\ProgramData\patching\log\patching.log
- Writes results to C:\ProgramData\patching\log\patching.json
24. patching::post_update
•Start services
•Waiting for services
•Health check
•etc
•Pluggable same as pre_update
- Linux = /opt/patching/bin/post_update.sh
- Windows = C:\ProgramData\patching\bin\post_update.ps1
bolt plan run patching::post_update
27. Patching Now
•500+ VMs
• 5x environments
•1 engineer
•< 1 day
•Every week
• Dev = latest
• Prod = Dev last week
28. Lessons Learned - Bolt
•Simple tasks
•Tie tasks together with plans
•Standardize parameters
•Standardize results
•Keep large binaries out of files/
29. Lessons Learned - Linux
•Bash lowest common denominator
•+100s of systems in a group
•Remember to update cache
•Careful of /tmp and noexec
30. Lessons Learned - Windows
•Connect timeouts
- 200+ seconds
•100 nodes max
•Long tasks = bad
•Slow updates
•Slow file transfer over WinRM
•PowerShell versions
•WUA = PITA
- NO BUZZ WORDS HERE
-
- Automated patching with bolt
- IT SERVICES PROVIDER
- Cincy
- Work in Managed Services
-
- Goal is to make IT suck less
- Solving IT problems with modern tools and techniques
- Allowing customers to focus on their business problems
- How often are YOU patching???
- Weekly, Monthly, Quarterly, Yearly
-
- 1 year ago
- CVEs and Zero Days
- Us and our customers
-
- Manual
- Slow (days -> weeks)
- Long nights
- Apps broken before patching
- Lack of shutdowns/startups
- Forgotten snapshots
- Forgotten monitoring
- Landscape?
- Ohio in middle of the Brown Field
-
- Windows - 2008 - 2012 - 2016
- Linux
- RHEL 6 & 7
- Ubuntu 14.04, 16.04, 18.04
- Windows
- SCCM
- WSUS
- RHEL = Satellite
- Ubuntu = ??
-
- Everything is a “suggestion”
- Randomly in a window
- No custom steps
- Security
- More often (weekly)
- Faster (1 day or less)
- Reports of available patches
-
- DevOps
- HA groups
- Customizable workflows
- Cross-platform
- Windows Update + Chocolatey
- Built on bolt
-
- Open source for community
-
- Eat our own dogfood
-
- Forge
- Patching NOT one size
- Framework of building blocks
- Agent-less
- Everything is a plan and a task
- Task does work
- Plan calls Task
- “User friendly” output
- Group inventory
- Common interfaces
- Customizable
- Vars
- Parameters
- NOT MAGIC
- Windows clients register to
- WSUS
- Chocolatey
-
- Red Hat register to
- Satellite (Foreman + Katello)
-
- Ubuntu
- internet
-
- Bolt orchestrates everything
- NOTE: Puppet agent not necessary (customers)
- TODO promote content
- Available updates
- Create snapshot
- Pre
- app shutdowns
- Update
- Post
- Reboot
- Delete snapshot
- Input of Array[TargetSpec]
- Targets have patching_order var
- Group by common patching order
- sort() on patching order
- Result is sorted array of groups (targets)
- Inventory YAML on the left
-
- Result on the right
-
- Puts data into an array
-
- Sorted by patching order
-
- If multiple inventory groups share the same patching_order, they are merged into one group
-
- Allows the inventory to be organized along a different dimension, say by application
- Runs plan to get ordered groups
-
- Iterate over each group
-
- Gather facts for each group
-
- Facilitates us being able to patch sets of nodes in ORDER
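The grouping described in these notes (targets carry a patching_order var, targets with the same order are merged, and the result is sorted) can be sketched in a few lines. This is a hedged Python illustration of the logic only, not the module's code; the target names and the `ordered_groups` helper are hypothetical.

```python
from collections import defaultdict


def ordered_groups(targets):
    """Group target names by their patching_order var, lowest order first."""
    by_order = defaultdict(list)
    for target in targets:
        by_order[target["vars"]["patching_order"]].append(target["name"])
    # Sorting the keys yields the groups in the order they should be patched.
    return [{"order": order, "targets": by_order[order]} for order in sorted(by_order)]


# Hypothetical inventory: web and SQL nodes come from different inventory groups
# but share patching_order values, so they end up in the same patching group.
TARGETS = [
    {"name": "sql02", "vars": {"patching_order": 2}},
    {"name": "web01", "vars": {"patching_order": 1}},
    {"name": "sql01", "vars": {"patching_order": 1}},
    {"name": "web02", "vars": {"patching_order": 2}},
]

groups = ordered_groups(TARGETS)
```

With this sample data one member of each pair lands in the first group and the other in the second, so an HA pair is never patched at the same time.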
- Queries the node for available updates
-
- Windows
- Windows Update Agent API
- choco outdated
-
- RHEL
- yum check-update
-
- Ubuntu
- apt upgrade --simulate
-
- Output = Array of updates
- Windows output on the left
- windows update
- chocolatey
- “providers”
-
- RHEL
- Debian
-
- Common
- name
- version
-
- Allow data custom to each
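To make the "array of updates" shape concrete, here is a small Python sketch that turns text in the three-column layout printed by `yum -q check-update` into a list of entries with the common name and version fields plus provider-specific extras. It is an assumption-laden illustration: the sample text is invented, the `arch` and `repo` keys are hypothetical extras, and wrapped lines or the "Obsoleting Packages" section are simply skipped.

```python
def parse_check_update(output):
    """Turn `yum -q check-update`-style text into a list of update dicts."""
    updates = []
    for line in output.splitlines():
        fields = line.split()
        # A normal entry has exactly three columns: name.arch, version, repo.
        if len(fields) != 3:
            continue
        name, _, arch = fields[0].rpartition(".")
        if not name:
            continue  # no ".arch" suffix, so this is not a package line
        updates.append({
            "name": name,          # common field
            "version": fields[1],  # common field
            "arch": arch,          # extra data specific to this provider
            "repo": fields[2],     # extra data specific to this provider
        })
    return updates


# Invented sample in the same column layout.
SAMPLE = """
kernel.x86_64          3.10.0-1062.el7     rhel-7-server-rpms
openssl-libs.x86_64    1:1.0.2k-19.el7     rhel-7-server-rpms
Obsoleting Packages
"""

updates = parse_check_update(SAMPLE)
```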
- VMware only, for now
- Installs rbvmomi
- bolt control node
- Optional
- Customizable with vars
- create
- delete
- allow us to wait overnight
- quiesce
- memory
-
- Pluggable with dynamic dispatch
- Dynamic dispatch from CS
- determine path at runtime
-
- Pass plan/task to execute as string
-
- Plans/tasks need common “interface”
-
- Example
- run example with plan snapshot
-
- example runs snapshot plan
-
- snapshot plan has ‘action’ interface
- Custom processes before patching
- service health checks (in case it’s already broken)
- backups
- stop services
- etc
- Default
- runs script on remote node
- Customizable
- Inventory file up top
-
- Vars section for global customizations
-
- Default = hard coded
-
- pick() to read from “vars”
-
- Allows customizing at runtime / CLI
-
- Order of precedence
- CLI
- Inventory Var
- default in the plan
-
- Great pattern
- customizing global up top
-
- regular_nodes group gets that
-
- customizing for a group
-
- sql_nodes for graceful SQL failover
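The order of precedence in these notes (CLI, then inventory var, then the default in the plan) can be sketched with a small `pick()` look-alike. This is a hedged Python illustration, not the module's code: it assumes the CLI value is `None` when the parameter is not passed, and the `sql_nodes` script path is hypothetical.

```python
def pick(*values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    raise ValueError("pick(): all values were undefined")


KEY = "patching_pre_update_script_linux"
DEFAULT = "/opt/patching/bin/pre_update.sh"  # hard-coded default in the plan

# Global customization "up top" in the inventory, inherited by every group.
GLOBAL_VARS = {KEY: "/my/custom/patching/script.sh"}
# Per-group customization; the sql_nodes path is a made-up example.
GROUP_VARS = {
    "regular_nodes": {},
    "sql_nodes": {KEY: "/opt/sql/graceful_failover.sh"},
}


def resolve_script(group, cli_value=None):
    """Apply the precedence: CLI value, then inventory var, then default."""
    merged = {**GLOBAL_VARS, **GROUP_VARS[group]}  # group vars override global vars
    return pick(cli_value, merged.get(KEY), DEFAULT)
```

Here `regular_nodes` picks up the global script, `sql_nodes` gets its own, and a value passed at runtime overrides both.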
- Windows
- choco upgrade all: EASY
- Special snowflake windows update
- Scheduled task
- RHEL
- yum update
- Ubuntu
- apt-get dist-upgrade
- Write logs on every node
- Can come back and query them later
-
- Great for debugging
-
- Great for reporting
- Same as pre_patch
- different script
-
- Start services
- Wait for sockets/services
- Health checks
- Customizable with strategy
- never
- always
- only required
-
- Windows
- check a bunch of registry and other Win32 APIs
-
- RHEL
- needs-restarting
-
- Ubuntu
- existence of “/var/run/reboot-required”
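The reboot decision described here combines a strategy with a per-OS "is a reboot pending?" check. The following Python sketch shows that combination for the Ubuntu case only. It is an illustration under assumptions: the strategy spellings (`never`, `always`, `only_required`) and both function names are hypothetical, and the Windows and RHEL checks are left out.

```python
import os


def reboot_pending_ubuntu(marker="/var/run/reboot-required"):
    """Ubuntu signals a pending reboot by the existence of a marker file."""
    return os.path.exists(marker)


def should_reboot(strategy, pending):
    """Decide whether to reboot given the strategy and the pending-reboot check."""
    if strategy == "always":
        return True
    if strategy == "never":
        return False
    if strategy == "only_required":
        return pending
    raise ValueError(f"unknown reboot strategy: {strategy!r}")
```

A caller would typically write `should_reboot(strategy, reboot_pending_ubuntu())`; keeping the check separate from the decision lets each platform supply its own check while sharing the strategy logic.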
- Opinionated workflow
-
- Uses all of the components we just talked about
-
- Customizable / pluggable
- vars
- dynamic dispatch
-
- Super easy way to get started
-
- Fully expect people to make their own workflows
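As a rough picture of how the building blocks chain together, the sketch below runs the steps listed earlier in these notes on each ordered group in turn. It is a hedged Python illustration, not the module's plan: the step names, `run_workflow`, and the `run_step` callback are hypothetical, and stopping at the first failed step is an assumption made for the example.

```python
# Step names mirror the workflow listed in the notes; spellings are illustrative.
STEPS = [
    "available_updates",
    "snapshot_create",
    "pre_update",
    "update",
    "post_update",
    "reboot",
    "snapshot_delete",
]


def run_workflow(groups, run_step):
    """Run every step on each group in order; stop at the first failure.

    `groups` is a list of target lists, already sorted by patching order.
    `run_step(step, targets)` returns True on success.
    Returns the list of (step, targets) pairs that were attempted.
    """
    attempted = []
    for targets in groups:
        for step in STEPS:
            attempted.append((step, tuple(targets)))
            if not run_step(step, targets):
                return attempted  # assumption: later groups are left untouched
    return attempted
```

Because `run_step` is passed in, a custom workflow can reuse the same loop with different step implementations.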
- 500+ VMs
- 6x internal and customer environments
-
- 1 engineer
- < 1 day
-
- Every week
- dev = latest
- prod = dev from week before
- Simple tasks
- Tie tasks together with plans
- Standardize parameters
- Standardize results
- Keep large binaries out of files/
- files/ come from some other module
- take advantage of isolated boltdir + puppetfile
- Bash lowest common denominator
- python
- perl
- ruby
-
- +100s of systems in a group
-
- Remember to update cache
-
- Careful of /tmp and noexec
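One way to catch the /tmp and noexec problem before a run is to look at the mount options. The Python sketch below parses text in the `/proc/mounts` column layout and reports whether a mount point carries the `noexec` option. It is a simplified illustration: `is_noexec` is a hypothetical helper, the sample text is invented, and a /tmp that is not a separate mount is treated as executable.

```python
def is_noexec(mounts_text, mount_point="/tmp"):
    """Return True if mount_point is listed with the noexec option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            return "noexec" in fields[3].split(",")
    return False  # not a separate mount; simplification for this sketch


# Invented sample in the /proc/mounts layout.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda2 /var ext4 rw,relatime 0 0
"""
```

On a Linux node the input would come from reading `/proc/mounts`; here the text is passed in so the check can be exercised anywhere.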
- Connection timeouts
- 200+ seconds
-
- 100 nodes max per group
-
- Long tasks can time out in WinRM randomly
-
- Updates SUPER slow
-
- Slow file transfers with WinRM
-
- PowerShell versions matter
- cmdlets don’t exist
-
- Windows Update API == PITA
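The "100 nodes max per group" lesson amounts to splitting a large target list into fixed-size batches. A minimal Python sketch follows; the `batches` helper and the generated node names are hypothetical, and the limit of 100 is simply the figure quoted in these notes.

```python
def batches(targets, size=100):
    """Split a list of targets into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [targets[i:i + size] for i in range(0, len(targets), size)]


# Hypothetical list of 250 Windows nodes.
WINDOWS_NODES = [f"win{n:03d}" for n in range(250)]
node_batches = batches(WINDOWS_NODES)
```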