Best practices for Ansible
Introduction
What is Ansible?
● A configuration management system
● Agentless design: the ‘controller’ (the admin’s localhost) supervises everything
● No mandatory data server to work with.
● Uses ssh as the primary transport, but many other transports exist too.
An example
nginx:
● Install
● Configure reverse-proxy for an application
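A minimal sketch of what those two steps could look like as a single play. The group name webservers, the variable app_port and the template app.conf.j2 are assumptions made for illustration, not part of the original deck.

```yaml
# A hypothetical single-play playbook
- hosts: webservers            # assumed inventory group
  become: yes
  vars:
    app_port: 8080             # assumed port of the proxied application
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present

    - name: Configure reverse proxy for the application
      template:
        src: app.conf.j2       # assumed template containing a proxy_pass line
        dest: /etc/nginx/conf.d/app.conf
      notify: reload nginx

  handlers:
    - name: reload nginx
      service:
        name: nginx
        state: reloaded
```

The handler runs only when the template task reports a change, so a second run with an unchanged template should leave nginx alone.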
Name of things
● Task + task + task => tasklist
● Tasks + vars + defaults => role
● Tasklist + hosts => play
● Play + play + … = playbook
● Playbooks + inventories = ansible repo (unofficial)
modules
● Each module configures one specific thing on the host
● Examples:
○ template
○ apt
○ systemd
○ stat
○ postgresql_user
○ object_storage
○ cron
○ crm_resource
○ …
○ ~ 2200 modules in ansible 2.4
variables & templates
Ansible lets you use variables to pass arguments to modules.
- Each variable is processed with the jinja2 template engine
- Tasks can register variables; there is also a set_fact module
- Each task, play and role may have its own locally scoped variables
- Nested definition is OK
- Recursion is prohibited
- Variables are expanded at the moment of use (in modules and conditions)
- Dedicated templates for configs are processed the same way as variables
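The points above can be sketched in one small play. All variable names here (app_name, app_dir, kernel, kernel_release, greeting) are hypothetical and chosen only for illustration.

```yaml
- hosts: all
  vars:
    app_name: foo                          # play-scoped variable
    app_dir: '/opt/{{ app_name }}'         # nested definition, expanded when used
  tasks:
    - name: Read the kernel release
      command: uname -r
      register: kernel                     # registered variable
      changed_when: false

    - name: Derive a fact from the registered result
      set_fact:
        kernel_release: '{{ kernel.stdout }}'

    - name: Show the values
      debug:
        msg: '{{ app_dir }} on kernel {{ kernel_release }} ({{ greeting }})'
      vars:
        greeting: 'hello from {{ inventory_hostname }}'   # task-local variable
```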
handlers
● Are called if the notifying task reported ‘changed’
● Are called once per play
● Can be flushed (called) earlier with meta: flush_handlers
● Have play-wide visibility
● Roles can notify each other’s handlers:
○ It’s complicated. Try to avoid this.
● Can listen to other handlers’ notifications
● Are called in order of declaration, not in order of notifications
● Error handling/retry policy: at most once
○ This is bad
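A short sketch of notify, listen and flush_handlers working together. The service name foo, the topic name and the cache path are assumptions for illustration only.

```yaml
- hosts: webservers                  # assumed inventory group
  become: yes
  tasks:
    - name: Configure foo
      template:
        src: foo.conf.j2
        dest: /etc/foo.conf
      notify: foo config changed     # queued only if this task reports 'changed'

    - name: Run queued handlers now rather than at the end of the play
      meta: flush_handlers

    - name: Later tasks can rely on the restarted service
      debug:
        msg: foo has been restarted if its config changed

  handlers:
    # Handlers run in this declaration order, not in notification order.
    - name: restart foo
      service:
        name: foo
        state: restarted
      listen: foo config changed

    - name: clear foo cache
      file:
        path: /var/cache/foo/index   # hypothetical cache file
        state: absent
      listen: foo config changed
```

Both handlers subscribe to the same topic, so the task does not need to know how many handlers react to it.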
handlers and includes
                       include_role                                                      import_role
                       inner action          outer action          inner + outer action  inner action          outer action          inner + outer action
INNER + OUTER handler  outer only            outer only            outer only            inner only            inner only            inner only
INNER handler only     inner                 NOT FOUND             NOT FOUND             inner                 inner                 inner
OUTER handler only     outer                 outer                 outer                 outer                 outer                 outer
https://github.com/amarao/ansible_import_include_and_handlers
Conditionals
● Evaluated at the moment of execution
● Evaluated on every iteration for loops
● Evaluated separately for each task in a ‘block’
● Have a special hack for ‘is defined’
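A hedged illustration of the three cases: a condition inside a loop, a condition on a block, and an ‘is defined’ test. The variables packages, skip_packages and optional_setting are made up for this sketch.

```yaml
- hosts: all
  vars:
    packages: [vim, htop, tcpdump]   # hypothetical list
    skip_packages: [tcpdump]
  tasks:
    - name: Condition is re-evaluated for every item of the loop
      debug:
        msg: 'would install {{ item }}'
      with_items: '{{ packages }}'
      when: item not in skip_packages

    - name: Condition on a block is checked for each task inside it
      block:
        - debug:
            msg: first task in the block
        - debug:
            msg: second task in the block
      when: ansible_os_family == 'Debian'

    - name: Undefined variables are safe to test with 'is defined'
      debug:
        msg: 'optional_setting is {{ optional_setting }}'
      when: optional_setting is defined
```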
Loops
- All of them are slow and clumsy.
- Ansible 2.5: with_items → loop.
- Complicated branching is bad.
- Complexity is bad.
loop_control:
  loop_var: user
  label: '{{user.short_name}} at {{user_department}}'
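For context, a complete task using loop_control might look like the sketch below. The users list and its keys (short_name, department, full) are hypothetical data, not taken from the deck.

```yaml
- hosts: all
  vars:
    users:                                  # hypothetical data
      - {short_name: alice, department: ops, full: 'Alice A.'}
      - {short_name: bob, department: dev, full: 'Bob B.'}
  tasks:
    - name: Create accounts
      user:
        name: '{{ user.short_name }}'
        comment: '{{ user.full }}'
      become: yes
      loop: '{{ users }}'                   # Ansible 2.5+; with_items on older versions
      loop_control:
        loop_var: user                      # rename 'item' to something meaningful
        label: '{{ user.short_name }} at {{ user.department }}'
```

The label keeps the run output to one short line per item instead of dumping the whole dictionary.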
idempotency
● Each task either fails, changes something, succeeds with no change, or is skipped
● Each task should report ‘changed’ only if changes were actually made.
● A second run of the same task should yield ‘no change’
Important for:
- Testing
- Stability and audit
- Handler calls
“Ansible is not a programming language.”
(an Ansible developer)
What does ‘big’ mean for an Ansible project?
Kubespray
● 911 files
● 49132 lines
Openstack-ansible
● 1196 files
● 52504 lines
Openshift-ansible
● 1668 files
● 175745 lines
● Estimated YAML multiplier for line count: ~3x
Not-a-code consequences
● Global variables everywhere
● foo: ‘{{foo + 1}}’ is officially broken. Forever.
● A practical call stack depth: 3-5
● It’s hard to change values in dictionaries and lists
● Data queries are crazy and complicated (json_query filter in Jinja2)
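To illustrate why changing values in dictionaries and lists feels awkward: variables are not mutated in place, a new value is built with filters and stored under a name. A minimal sketch with hypothetical data (settings, ports) follows.

```yaml
- hosts: localhost
  gather_facts: no
  vars:
    settings: {port: 80, workers: 2}       # hypothetical data
    ports: [80, 443]
  tasks:
    - name: Changing one key means building a new dictionary
      set_fact:
        settings_tuned: "{{ settings | combine({'workers': 4}) }}"

    - name: Appending to a list means building a new list
      set_fact:
        ports_extended: '{{ ports + [8080] }}'

    - name: Show both results
      debug:
        msg: '{{ settings_tuned }} / {{ ports_extended }}'
```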
Sources of pain
● Dependencies
● Slow execution over ssh
● Memory hogging on includes (partially fixed in 2.4.3 and 2.5)
● Data query
● Rudimentary modularity
● Name conflicts
● Non-typed interfaces between roles
● Horrible error reporting for jinja2 templates/filters
● Unpredictable visibility for global variables
● Variable precedence is complicated and is broken in include_role.
Ansible is a muscle, not a skeleton
● Everything is permitted
● Most errors are detected at runtime
○ Or they even silently succeed with incorrect behavior
● No universally accepted style guide (* try ansible-lint)
● No well-known design patterns
● Best practices are at elementary-school level
Why do we still use Ansible?
Because it’s the best we have so far.
Some bones to build a skeleton
1. Execution flow: tasks and roles are assigned to hosts
2. Hosts are the first class objects to work with
3. Groups and group inheritance to keep relations between hosts
4. Group variables
5. A simple iteration over lists
6. Transparent access to hosts ‘by ansible magic’
... I wish this list were longer...
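Hosts, groups, group inheritance and group variables can all be seen in one small YAML inventory. The host names, group names and the variable below are hypothetical.

```yaml
# inventory/staging.yaml: hypothetical hosts and groups
all:
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
    databases:
      hosts:
        db1.example.com:
    foo_project:                 # parent group: contains the members of its children
      children:
        webservers:
        databases:
      vars:
        foo_listen_port: 8080    # group variable, visible on every member host
```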
Best practices
(High level)
No overengineering
It’s not Java or Python. Every act of overengineering bites you badly.
● Play is better than role
● Role is better than play, repeated twice in two different playbooks
● Tasklist in a role is better than a second role
● If you can join two roles through a play, use the play
○ If you can’t - use a wrapper role
● A play for a host is better than delegate_to in a task
● delegate_to is better than poking into hostvars of another host
● Every time you iterate over hosts in a group, God kills a cat
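One way to read “a play for a host is better than delegate_to” is sketched below. The groups appservers and databases, the host db1.example.com and the database name are assumptions; postgresql_db lives in the community.postgresql collection on newer Ansible versions.

```yaml
# Instead of delegating from the application hosts ...
- hosts: appservers                       # assumed group
  tasks:
    - name: Create database (delegated)
      postgresql_db:
        name: foo
      become: yes
      become_user: postgres
      delegate_to: db1.example.com        # hypothetical host
      run_once: yes

# ... give the database hosts their own play.
- hosts: databases                        # assumed group
  become: yes
  become_user: postgres
  tasks:
    - name: Create database
      postgresql_db:
        name: foo

- hosts: appservers
  become: yes
  tasks:
    - name: Configure foo
      template:
        src: foo.conf.j2
        dest: /etc/foo.conf
```

In the second form every task runs on the host named in the play header, so --limit and --list-hosts show what will really be touched.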
Project layout: partitioning
● Common basics: users, basic packages (vim/iptables), hostname, ssh keys
● Project-specific simple configuration (standard software && simple configs)
● Non-trivial configuration for standard software: e.g. databases, pacemaker
● Non-standard software (custom apps, git deploy, venv, etc)
● Ad-hoc scripts, cron jobs, etc
● Monitoring
● Bootstrap code (run-once tasks, initialization, etc)
● Upgrade procedure(s)
● Recovery procedures
Project layout
Included in site.yaml:
● Users and basic software
● Software installation and configuration
● Database creation
● Monitoring
Used separately:
● Bootstrap
● Update procedure
● Recovery procedure
● Helper scripts for staging
○ Copy data from production
○ Tests for recovered system
○ Creation/teardown for staging
● Inventory update/generation
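A possible site.yaml following this split, as a sketch only. All playbook file names are hypothetical; import_playbook is available since Ansible 2.4.

```yaml
# site.yaml: the 'full run'
- import_playbook: users.yaml
- import_playbook: software.yaml
- import_playbook: databases.yaml
- import_playbook: monitoring.yaml

# Deliberately NOT imported here; run by hand or by a dedicated job:
#   bootstrap.yaml, update.yaml, recovery.yaml, staging_copy_data.yaml
```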
Scope reduction
Each piece of code should work within its own domain.
If we configure application foo, we shouldn’t touch random bits outside of foo:
❌ NO
● add nginx configuration for foo
● use this magic query to find the database IP
● transform the list of users from the global userlist to foo format
✅ YES
● Use a wrapper role to configure nginx (include_role, import_role)
● Use a role to search for the database IP
● Pass the userlist explicitly from a playbook or another wrapper role
Explicit dependencies
There is no sane way to describe dependencies.
- Old style (dependencies in meta) does not work and is being deprecated.
- New style (include_role/import_role) ignores meta dependencies.
The only way to create a dependency is to do it manually:
- import_role when role_foo_called is not defined
- set_fact: role_foo_called inside the role
Or just call it twice if it’s fast.
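A sketch of the manual dependency guard built from import_role and set_fact. The role name foo and the flag role_foo_called follow the naming used in this deck; the file placement is an assumption. Because a condition on import_role is applied to every imported task, the set_fact is placed last in the role so that the earlier tasks still run.

```yaml
# In the calling role or play:
- name: Pull in role foo unless it has already run on this host
  import_role:
    name: foo
  when: role_foo_called is not defined

# roles/foo/tasks/main.yaml, LAST task of the role:
- name: Remember that role foo has run
  set_fact:
    role_foo_called: true
```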
Name it! Name it right!
Examples:
● Everything should have a hypernym (a common name for several things)
○ E.g. ‘configuration playbooks’ vs ‘script playbooks’
○ Configuration playbooks should be linted to perfection
○ Script playbooks may have unconditional ‘command/shell’ with ‘changed always’ status
● Different types of groups
○ E.g. ‘execution groups’ vs ‘groups for variables’
○ Groups for variables should never have tasks assigned (e.g. hosts: database_settings)
● Name your components!
○ E.g. ‘bgp-push’ vs ‘bgp-pull’, ‘agents’, ‘central’, ‘external_access’, etc.
“Naming things” is the second hard problem in computing
Best practices
(low-level details)
Simple tricks
● ansible -i staging --list-hosts all
● ansible-playbook -i staging site.yaml --list-tags
○ Tags should have meaning!
● ansible-playbook -i staging site.yaml --check --diff
Ansible-lint!
● Points to subtle errors in playbook code
● Best practices (handlers vs “when: foo|changed” filter)
● Clarity. If lint understands it, people understand it.
● Forces more semantics onto shell/command
How much time does it take?
● ~ 30 lint warnings per hour.
● I cleared my project within 4 hours. There were 3 real-life bugs and 10 minor improvements, all found by ansible-lint.
Shell and command modules
● Main source of chaos if used carelessly
● Rules:
○ If they gather information: changed_when: False
○ If they are idempotent: find a way to report changes.
○ If they are not idempotent: use them only after a query:
■ when: "'foo' in previous_query.stdout"
■ when: previous_query.rc == 2
● You can refactor if those modules are idempotent
● You cannot refactor if those modules are not idempotent
shell drama
And if I can’t detect changes or failure?
You are doing it wrong.
Find a way.
shell example
The ip link set up command always returns 0 and never prints any output.
❌ NO
- name: Link up
  shell: |
    ip link set up dev {{dev}}
✅ YES
- name: Check link status
  command: ip link show {{dev}}
  register: link_status
  changed_when: False
- name: Link up
  command: ip link set up dev {{dev}}
  when: "'UP' not in link_status.stdout"
shell example #2
foobar does not report failures at all.
We want to run foobar add, and we can run foobar list to see the result.
❌ NO
- name: Add to foobar
  shell: |
    foobar add {{obj}}
✅ YES
- name: Check foobar status
  command: foobar list
  register: old_foobar_output
  changed_when: False
- name: Add to foobar
  shell: |
    foobar add {{obj}} && foobar list
  register: new_foobar
  when: obj not in old_foobar_output.stdout
  failed_when: obj not in new_foobar.stdout
Apt: update_cache
Theoretical question: does refreshing the cache count as a change or not?
For practical reasons the answer is: no change.
Option 1: integrate into install
- name: Install foo
  become: yes
  apt:
    name: foo
    state: '{{foo_install_state}}'
    update_cache: '{{apt_update_cache}}'
    cache_valid_time: '{{apt_cache_valid_time}}'
Option 2: use without changes
- name: Update apt cache
  become: yes
  apt:
    update_cache: yes
    cache_valid_time: '{{cache_time}}'
  changed_when: False
Best practices
(workflow)
Staging
MUST HAVE STAGING AT ANY COST
Staging:
● Finds your bugs before production
● Helps to refactor
● Forces you to think of modularity
Development environment
Primary staging:
● Virtual machines or real servers. Imitate production as closely as possible
Development environment(s):
● Almost like staging, but faster and with omissions
● LXC (or Docker) on localhost speeds up runs by ~30-50%
● Create containers with Ansible, drop them with Ansible
● Automate rebuild
CI/CD
● Delegate all Ansible tasks to CI/CD server (Jenkins?)
● One job for production, one for staging
● Software updates and other workflow tasks - separate jobs
● Production should be updated only through CI/CD server
○ Keep logs
○ Keep last deployed commit* in those logs
● *Do you use git for your playbooks? You should.
● Run production ‘full ansible run’ often.
○ Make it safe. Second full run = zero changes. Mandatory to have.
● Run staging ‘full ansible run’ before production for all changes.
○ It guards production and saves your face.
New and reinstalled servers
Bootstrap.yaml:
● Forget old ssh keys
● Remember new ones
● Install python and ssh keys, create users
● Install all upgrades, restart server
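A rough sketch of such a bootstrap playbook for a Debian-like host. It assumes that hosts are reachable under their inventory names, that the reboot module is available (Ansible 2.7+), and that the user name admin and the key file files/admin.pub exist; all of these are illustrative assumptions. On newer Ansible versions authorized_key ships in the ansible.posix collection.

```yaml
# bootstrap.yaml: run with --limit <new_host>
- hosts: all
  gather_facts: no                  # Python may not be installed yet
  become: yes
  tasks:
    - name: Forget the old SSH host key
      known_hosts:
        name: '{{ inventory_hostname }}'
        state: absent
      delegate_to: localhost
      become: no

    - name: Remember the new SSH host key (trust on first use)
      known_hosts:
        name: '{{ inventory_hostname }}'
        key: "{{ lookup('pipe', 'ssh-keyscan -t ed25519 ' ~ inventory_hostname) }}"
      delegate_to: localhost
      become: no

    - name: Install Python so that regular modules can run
      raw: test -e /usr/bin/python3 || (apt-get update && apt-get install -y python3 && echo BOOTSTRAP_PYTHON_INSTALLED)
      register: python_bootstrap
      changed_when: "'BOOTSTRAP_PYTHON_INSTALLED' in python_bootstrap.stdout"

    - name: Create the admin user
      user:
        name: admin                 # hypothetical user
        shell: /bin/bash

    - name: Install the admin SSH key
      authorized_key:
        user: admin
        key: "{{ lookup('file', 'files/admin.pub') }}"   # hypothetical key file

    - name: Install all upgrades
      apt:
        upgrade: dist
        update_cache: yes

    - name: Restart the server
      reboot:
```

The raw task prints a marker only when it actually installs something, so even this step reports ‘changed’ honestly.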
Per role tests
+ Ansible way to test roles
+ Easier to debug
- Time consuming
- No inter-role integration
- Often meaningless without a context
Variables & environments
Places to hide a variable
● Inventory (host, group_name:vars)
● inventory/host_vars
● inventory/group_vars
● host_vars
● group_vars [all.yaml, group_name.yaml]
● roles/*/defaults
● roles/*/vars
● ‘vars:’ in any task or role
● register in any task
● include_vars
● defaults/vars of imported role
Ansible variables without supervision
Rules to keep sanity
● host_vars are banned anywhere except an inventory
● roles/*/vars should be avoided
● Roles should avoid exposing variables to other roles in the same play(book)
○ Reduce global state, OK?
○ If they do, this is called an ‘interface’. Document it.
■ Example: search-for-database-ip can set a variable db_ip.
● Environment-specific variables are kept in the inventory
● Project-specific variables are kept in group_vars
● Roles should use defaults for rarely changed variables
● Use local ‘vars:’ statement for task-local calculations
Variables and environments
Environments:
● production/
● staging/
● lab1/
Variables:
● user_list -> group_vars/all.yaml
● domain_prefix -> inventory/group_vars/all.yaml
● foo_listen_port -> group_vars/foo.yaml
● db_password -> inventory/group_vars/dbaccess.yaml
● retry_timeout -> roles/foo/defaults/main.yaml
Rule of thumb
You must be able to add another environment by creating a new inventory (file/directory) with no changes outside that inventory.
How long to think before adding a variable
Where the variable lives                                              Time to think   Docs
roles/foo/tasks/*.yaml (vars section for task)                        5 seconds       no docs
roles/foo/defaults/main.yaml                                          30 seconds      role docs
roles/foo/tasks/*.yaml (register)                                     1 minute        no docs
roles/foo/tasks/*.yaml (set_fact, role-internal)                      1 minute        no docs
group_vars                                                            10 minutes      role or project docs
Inventory                                                             30 minutes      role or project docs
roles/foo/tasks/*.yaml (set_fact, external use outside of the role)   60+ minutes     role and project docs. Mandatory!
For use in a command line (ansible-playbook -e)                       60+ minutes     role and project docs. Mandatory!
Assertions and validations
fail module with ‘when’ (from ceph-ansible):
- name: validating variables
  fail:
    msg: "please choose scenario"
  when:
    - osd_group_name is defined
    - osd_group_name in group_names
    - not containerized_deployment
    - osd_scenario == 'dummy'
assert module:
- name: Check ansible version
  run_once: True
  assert:
    that: "ansible_version.full|version_compare('2.4','>=')"
    msg: "You must update Ansible to at least 2.4"
  delegate_to: localhost
  tags:
    - always
Tags
Tags proliferation
- name: Configure foo
  become: yes
  template: src=foo.conf.j2 dest=/etc/foo.conf
  notify: restart foo
  tags:
    - foo
    - configure
    - restart
    - become
    - ip
    - dont_do_like_this
Concise tags
Including tags:
● One tag - one scenario
● --tags your_tag should finish successfully both:
○ For a new installation
○ For an existing installation
● If the same tag covers several plays in a playbook, it may be better to split them into a separate playbook and use import_playbook.
Excluding tags:
● Should be used with --skip-tags
● For long or complicated operations only.
● Each ‘always’ tag should have an additional tag for skipping:
- debug: var=foo
  tags:
    - always
    - debug_foo
tag examples
- apt (all operations with apt, in all roles)
- registrations (all operations with registration in a project API, in all roles)
- foo_upgrade (all apt operations to install components of foo project)
- git (all operations related to git pull/clone)
- ip (all operations related to adding/removing IP addresses on server)
- discovery (all ‘search-for-*-ip’ roles)
- services (tasks to configure shinken services, ~80 of them, shinken only)
- drop (specific for copy-database.yaml, tasks to drop database)
--limit
To limit or not to limit?
Line in a template:
allow_ip = {% for h in groups.all %} {{(hostvars[h]).ansible_default_ipv4.address}} {% endfor %}
ansible-playbook -i inventory test.yaml ✅
ansible-playbook -i inventory test.yaml --limit host1 ❌
fatal: [host2]: FAILED! => {"changed": false, "msg": "dict has no element ansible_default_ipv4"}
Solutions
We need information about all hosts, but we have used --limit
1. Forbid --limit in the project 😟
2. Write a partial content 😓
3. Lineinfile on per-host basis 😦
4. Gather facts for all hosts forcefully 😥
5. Use fact cache 😕
6. Use external database 😖
7. Skip task if not a full run 🤔
Partial content
{% for h in groups.all %}
{% if (hostvars[h]).ansible_default_ipv4 is defined %}
{{(hostvars[h]).ansible_default_ipv4.address}}
{% endif %}
{% endfor %}
Good: none
Bad:
- incomplete config
- ‘changed’ on every run with a different --limit ❌
Lineinfile
- name: Add host to config
  lineinfile:
    path: /etc/foo.conf
    line: 'host {{(hostvars[item]).ansible_default_ipv4.address}}'
  when: (hostvars[item]).ansible_default_ipv4 is defined
  with_items: '{{groups.all}}'
Good: survives --limit without spurious changes or a broken config
Bad: old values are not removed
Note: can be used only if the config uses one IP per line
Forceful fact gathering
- setup:
    gather_subset: network
  delegate_to: '{{item}}'
  delegate_facts: yes
  with_items: '{{groups.all}}'
  when: (hostvars[item]).ansible_default_ipv4 is not defined
  tags:
    - always
    - gather_facts
Good:
- No random ‘changed’
- Always a full config
- Removes old values
- Fast (see the ‘when’ part)
Bad:
- Fails if any host is down or is not provisioned yet
Fact cache
● Do as in forceful fact gathering
● Set fact caching in ansible.cfg
● Hope it will be there
Good:
- Works most of the time
Bad:
- ‘always’ minus ‘most of the time’ = bugs sometimes
External database
● Register each host in etcd/consul
● Query data on each run
Good:
- Works with --limit
Bad:
- External service dependency (down/provision)
- Removal of old entities is a problem
Skip if not full run
- name: Configure foo
  template: src=foo.conf.j2 dest=/etc/foo.conf
  when: full_run
  vars:
    full_run: '{{play_hosts == groups.all}}'
Good:
- Works perfectly with --limit
- Won’t fail if some host is down and --limit was used
- Fast
- Updates and removes old data as needed on each full run
Bad:
- Does not update the config when --limit is used
✅
templates
Template & task relationship
● Keep templates as simple as possible
● Use ‘vars:’ section for explicit variable declaration
● Never use global variables in a template. Exceptions:
○ Iterations over all hosts
○ Ansible built-in variables
○ A special global variable documented in a project and in a role
○ Very complicated queries. Use comments in the task to list the variables used inside the template.
Simplify
If a template is small, use ‘copy’ with the ‘content’ argument to inline it
- copy:
    dest: /etc/foobar.conf
    content: |
      source_ip = {{ansible_default_ipv4.address}}
Debugging templates: variables
- debug:
    var: '{{item}}'
  with_items:
    - myvar1
    - myvar2
    - ansible_default_ipv4
    - all_other_variables_in_template
Debugging templates: Jinja2
Explicit templating in a separate playbook (e.g. temp.yaml)
- template:
    src: roles/somerole/templates/foo.conf.j2
    dest: /tmp/foo.conf
  delegate_to: localhost
  connection: local
  vars:
    some_var: ...
    another_var: ...
Templates everywhere
You don’t need to use ‘template’ to use jinja2. Every variable is a {{template}}.
- copy
- lineinfile
- blockinfile
- all file names for all copy/stat/file modules
- arguments to shell and command modules
- all other modules (apt, postgresql_user, etc)
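A few of these places side by side, as a sketch. The variable foo_user and the /tmp path are hypothetical; the last task assumes facts have been gathered.

```yaml
- hosts: all
  vars:
    foo_user: foo                        # hypothetical value
  tasks:
    - name: Jinja2 in a file name and in inline content
      copy:
        dest: '/tmp/motd-{{ inventory_hostname }}.txt'
        content: "Managed by Ansible for {{ foo_user }}\n"

    - name: Jinja2 in a command argument
      command: 'id {{ foo_user }}'
      register: foo_id
      changed_when: false

    - name: Jinja2 in lineinfile
      lineinfile:
        path: /etc/hosts
        line: '{{ ansible_default_ipv4.address }} {{ inventory_hostname }}'
      become: yes
```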
External Jinja2
- name: Ugly example
  foo:
    argument: '{{(hostvars[var1]).cust_facts[3]|json_query("[?name=" + ..
- name: Better example
  foo: argument={{foo_argument}}
  vars:
    foo_argument: "{{lookup('template', 'foo_arguments.j2')}}"
Roles
Roles: structure
1. Use defaults for rarely changed values. Do not use hard-coded constants.
2. Split the role into parts
3. Allow role parts to be called independently
4. Allow parts of the role to be reused
5. Use call caching
Nginx: install + configure site
roles/nginx/tasks/main.yaml:
- import_tasks: install.yaml
- import_tasks: configure_site.yaml
Calling one part of the role:
- import_role:
    name: nginx
    tasks_from: configure_site.yaml
  vars:
    nginx_site: ...
Call caching:
- name: install nginx
  apt: name=nginx state=present
  when: nginx_installed is not defined
  register: nginx_installed
Files in roles: vendor in role
Good:
- Easy to do: copy: src=myfile dest=/var/lib/foo/myfile
- Single authority
- Versions
Bad:
- Golden artifacts are kept in the ansible repo
Files in roles: external source
Good:
- A tidy git.
Bad:
- Need external storage.
- Version control.
Examples
private apt repo || private git repo || swift container (bad!)
Wrapper role
We have application server foo which should reside behind nginx.
● Foo wants a database IP, and an address and port to listen on
● Nginx needs a port to proxy_pass to, a domain, and ssl settings
Role foo configures foo only.
Role nginx configures any nginx site, and it needs a bunch of additional variables.
The wrapper role glues them together, but does not change anything in foo or nginx.
Wrapper role
- name: Configure foo for {{foo_source_ip}}
  include_role: name=foo tasks_from=configure_foo
  vars:
    local_api_ip: '{{foo_local_ip}}'
    local_api_port: '{{foo_local_port}}'
- name: Configure nginx for {{foo_source_ip}}
  include_role: name=nginx tasks_from=configure_site
  vars:
    nginx_sites:
      - name: 'rttgod_{{foo_source_ip}}'
        listen_address: '{{foo_source_ip}}'
        port: '{{foo_external_api_port}}'
        locations:
          proxy_pass: 'http://{{foo_local_ip}}:{{foo_local_port}}'
include_role vs import_role
import_role:
- Behaves as if the role’s tasks were written in place of the import.
- Can override handlers
- Defaults are respected (the imported role uses its own defaults, but does not change the parent’s defaults)
- Does not support loops
- Supports conditions:
  - A condition is applied to each task in the imported role.
include_role:
- Supports loops
- Absolute mess
- Broken in each new ansible release in a new way (hello, 2.5):
  - Delegation
  - Handlers
  - Defaults vs set_fact
  - Parent’s variable access
- include_tasks is much more reasonable, but requires more files and lines.
A proper looping with an include in a role
- name: Loop over something
  include_tasks: per_something.yaml
  with_items: '{{something}}'
- name: in per_something.yaml
  import_role: name=foo
  vars:
    var1: '{{item}}'
- name: A task in role ‘foo’
  foo: arg={{var1}}
  delegate_to: ...
Works in ansible 2.5!
handlers
● Avoid cross-role handlers (except for wrapper roles)
● Use meta: flush_handlers
At-least-once persistent handlers
role/tasks/main.yaml:
- name: setup foo
  apt: name=foo state=present
  notify: foo installed
- … other tasks here…
- meta: flush_handlers
- name: check if restart is needed
  stat: path={{foo_flag}}
  register: foo_restart_flag
- block:
    - name: Restart foo
      service: name=foo state=restarted
    - name: cleanup restart flag
      file: path={{foo_flag}} state=absent
  when: foo_restart_flag.stat.exists
handlers/main.yaml:
- name: foo installed
  file:
    path: '{{foo_flag}}'
    state: touch
role/vars/main.yaml:
foo_flag: /var/run/foo-inst.flag
Plugins
Plugin types
module ≠ plugin
- lookup_plugins/
- Load data from external sources
- Perform calculations and queries
- Iterate
- action_plugins/
- Do stuff on hosts
- vars_plugins
- inventory_plugins
All plugins are written in Python and can be stored in a ‘*_plugins/’ directory next to a playbook, or within a role.
Lookup plugins
1. Try to do it with Ansible.
2. Try to do it with an in-line jinja2 template
3. Try to do it with an in-line json_query
4. Try to do it with an external jinja2 template
5. If not, write a plugin
Rule of thumb: if the jinja2 template is more than ⅓ the size of the plugin (and its tests), write a plugin. If less, use jinja2.
Python in ansible complicates reading! A lot.
A plugin without tests is worse than jinja2 of any complexity.
Lookup plugins: an example
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

from ansible.plugins.lookup import LookupBase


class LookupModule(LookupBase):
    def run(self, terms, **kwargs):
        # Take the input from terms, falling back to keyword arguments
        data = terms or kwargs
        assigned_something = data['assigned_something']
        assigned_others = data['assigned_others']
        somethings = data['somethings']
        foo_source_ips = []
        for something in somethings:
            # 'entry' avoids shadowing the outer 'data' variable
            for entry in something.get('datas', []):
                if entry['other'] in assigned_others:
                    foo_source_ips.append(entry['foo_source_ip'])
        return foo_source_ips
Lookup plugins: an example
- name: Register IP
  uri:
    method: PUT
    url: '{{url}}'
    body_format: json
    body: '{"something": "{{item["something"]}}", "other": "{{item["other"]["data"]}}"}'
    status_code:
      - 200
      - 201
      - 304
  register: reg_status
  changed_when: reg_status.status in [200, 201]
  with_my_custom_filter: '{{something}}'
Lookup plugins: json_query equivalent
- name: looping over
  include_tasks: process_other.yaml
  with_items: '{{selected_datas}}'
  loop_control:
    loop_var: data
    label: '{{other}} @ {{data.foo_source_ip|default("no ip")}}'
  when: data.foo_source_ip is defined and data.other in assigned_others
  vars:
    somethings: '{{global_config["somethings"]}}'
    query: "[?name=='{{assigned_something}}'].datas"
    selected_datas: '{{global_config.somethings|json_query(query)}}'
    foo_source_ip: '{{data.foo_source_ip}}'
    something: '{{assigned_something}}'
    other: '{{data.other}}'
Other plugins
I have no experience with them, sorry.
Key ideas for action plugins, when to write them:
- Too many overly complicated command/shell tasks in a playbook/role
- Reusability is needed
- Better test coverage
- Complicated data types in use
Refactoring
Adding features Cleaning up the mess
Refactoring when adding features
● Use small steps
● Write a plan for refactoring before changing anything
● Paper drawing is advised.
● Use the ‘not changed’ status to confirm that refactoring does not change anything
● Use ansible-playbook --check --diff
● Refactor in two steps:
○ Change internals without changing the result
○ Then make small, simple changes that change the result
● Do not forget to add cleanup code if needed
○ Drop it later
● Each step should have a separate commit with a multi-line description
○ You can do this, I believe in you!
Refactoring when cleaning up mess
- Find scenarios for execution
- Eliminate false ‘changed’
- Reduce spread between files (no hostvars!)
- Split plays into playbooks
- Split tasklist into roles
- Replace hardcoded values with variables
- In templates too!
- Do you remember about staging?
- Reduce complexity of queries and iterations
- Replace ‘shell/command’ with modules
- Ansible-lint
Refactoring example: Scraps from my table
● Write down all ideas, even discarded ones.
● Write down all variables and file names you’ve introduced or changed
● Draw arrows between objects
THE END
Final advice:
● Every role and every playbook cuts corners.
● Cut as few corners as possible.
● Each ‘cut corner’ has consequences.
● The amount of time dedicated to a role or a playbook is a function of its importance.
Be safe, be reasonable, and let ansible-lint be with you.

 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Best practices for ansible

  • 2. Introduction What is Ansible? ● A configuration management system ● Agentless design: the ‘controller’ (the admin’s localhost) supervises everything ● No mandatory data server to work with. ● Uses ssh as the primary transport, but there are many other transports too.
  • 3. An example nginx: ● Install ● Configure reverse-proxy for an application
  • 5. Name of things ● Task + task + task => tasklist ● Tasks + vars + defaults => role ● Tasklist + hosts => play ● Play + play + … = playbook ● Playbooks + inventories = ansible repo (unofficial)
  • 6. modules ● Each module configures a specific thing on the host ● Examples: ○ template ○ apt ○ systemd ○ stat ○ postgresql_user ○ object_storage ○ cron ○ crm_resource ○ … ○ ~ 2200 modules in ansible 2.4
  • 7. variables & templates Ansible lets you use variables to pass arguments to modules. - Each variable is processed with the jinja2 template engine - Tasks can register variables, and there is a set_fact module - Each task, play and role may have its own local-scoped variables - Nested definitions are OK - Recursion is prohibited - Variables are expanded at the moment of use (in modules and conditions) - Dedicated templates for configs are processed the same way as variables
  • 8. handlers ● Are called if the notifying task reported ‘changed’ ● Are called once per play ● Can be flushed (called) earlier with meta: flush_handlers ● Have play visibility ● Roles can notify each other’s handlers: ○ It’s complicated. Try to avoid this. ● Can listen to other handlers’ notifications ● Are called in order of declaration, not in order of notification ● Error handling/retry policy: at most once ○ This is bad
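The handler rules above can be sketched in a single play. This is a minimal illustration, assuming a hypothetical `webservers` group and hypothetical nginx template files; none of these names come from the slides. Both tasks notify the same handler, one by name and one through a `listen` topic, and the handler still runs only once when it is flushed.

```yaml
# Hypothetical play: group, file, and topic names are assumptions.
- hosts: webservers
  become: yes
  tasks:
    - name: Configure nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx          # queued only if this task reports 'changed'

    - name: Configure site
      template:
        src: site.conf.j2
        dest: /etc/nginx/conf.d/site.conf
      notify: web config changed     # a topic that the handler listens to

    - name: Run queued handlers now instead of at the end of the play
      meta: flush_handlers

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
      listen: web config changed
```

If the play fails between the notification and the flush, the queued restart is lost, which is the "at most once" problem the slide complains about.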
  • 9. handlers and includes (https://github.com/amarao/ansible_import_include_and_handlers)
                                include_role                                      import_role
                                inner action  outer action  inner + outer action  inner action  outer action  inner + outer action
      INNER + OUTER handler     outer only    outer only    outer only            inner only    inner only    inner only
      INNER handler only        inner         NOT FOUND     NOT FOUND             inner         inner         inner
      OUTER handler only        outer         outer         outer                 outer         outer         outer
  • 10. Conditionals ● evaluated at the moment of execution ● Evaluated on every iteration for loops ● Separately for each entry in ‘block’ ● Have a special hack for ‘is defined’
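A small sketch of these evaluation rules, assuming a hypothetical `services` list of dictionaries and a hypothetical `deploy_foo` variable. It uses the `loop` keyword, which needs Ansible 2.5 or later.

```yaml
# 'services' and 'deploy_foo' are made-up variables for illustration.
- name: The condition is re-evaluated for every item, at execution time
  debug:
    msg: "{{ item.name }} is enabled"
  loop: "{{ services }}"
  when: item.enabled | default(false)

- block:
    - name: First task
      command: /bin/true
      changed_when: false

    - name: Second task
      command: /bin/true
      changed_when: false
  # The condition is attached to each task in the block separately.
  # 'is defined' does not fail when the variable is missing.
  when: deploy_foo is defined
```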
  • 11. Loops - All of them are slow and clumsy. - Ansible 2.5: with_items → loop. - Complicated branching is bad. - Complexity is bad.
      loop_control:
        loop_var: user
        label: '{{user.short_name}} at {{user_department}}'
  • 12. idempotency ● Each task either fails, changes something, succeeds with no change, or is skipped ● Each task should report a change only if changes were made. ● A second run of the same task should yield ‘no change’ Important for: - Testing - Stability and audit - Handler calls
  • 13. Ansible is not a programming language. ansible developer
  • 14. What does ‘big’ mean for an ansible project? Kubespray ● 911 files ● 49132 lines Openstack-ansible ● 1196 files ● 52504 lines Openshift-ansible ● 1668 files ● 175745 lines ● Estimated yaml multiplier for line count: ~x3
  • 15. Not-a-code consequences ● Global variables everywhere ● foo: ‘{{foo + 1}}’ is officially broken. Forever. ● A practical call stack depth: 3-5 ● It’s hard to change values in dictionaries and lists ● Data queries are crazy and complicated (json_query filter in Jinja2):
  • 16. Sources of pain ● Dependencies ● Slow execution over ssh ● Memory hogging on includes (partially fixed in 2.4.3 and 2.5) ● Data query ● Rudimentary modularity ● Name conflicts ● Non-typed interfaces between roles ● Horrible error reporting for jinja2 templates/filters ● Unpredictable visibility for global variables ● Variable precedence is complicated and is broken in include_role.
  • 17. Ansible is a muscle, not a skeleton ● Everything is permitted ● Most errors are detected at runtime ○ Or they even succeed silently with incorrect behavior ● No universally accepted style guide (* try ansible-lint) ● No well-known design patterns ● Best practices are at elementary-school level Why do we still use Ansible? Because it’s the best we have so far.
  • 18. Some bones to build a skeleton 1. Execution flow: tasks and roles are assigned to hosts 2. Hosts are the first class objects to work with 3. Groups and group inheritance to keep relations between hosts 4. Group variables 5. A simple iteration over lists 6. Transparent access to hosts ‘by ansible magic’ ... I wish this list were longer...
  • 20. No overengineering It’s not java or python. Every act of overengineering bites you badly. ● Play is better than role ● Role is better than play, repeated twice in two different playbooks ● Tasklist in a role is better than a second role ● If you can join two roles through a play, use the play ○ If you can’t - use a wrapper role ● Play for host is better than delegate_to in task ● Delegate_to is better than poking into hostvars of other host ● Every time you iterate over hosts in a group, God kills a cat
  • 21. Project layout: partitioning ● Common basics: users, basic packages (vim/iptables), hostname, ssh keys ● Project-specific simple configuration (standard software && simple configs) ● Non-trivial configuration for standard software: e.g. databases, pacemaker ● Non-standard software (custom apps, git deploy, venv, etc) ● Ad-hoc scripts, cron jobs, etc ● Monitoring ● Bootstrap code (run-once tasks, initialization, etc) ● Upgrade procedure(s) ● Recovery procedures
  • 22. Project layout Included in site.yaml ● Users and basic software ● Software installation and configuration ● Database creation ● Monitoring Used separately: ● Bootstrap ● Update procedure ● Recovery procedure ● Helper scripts for staging ○ Copy data from production ○ Tests for recovered system ○ Creation/teardown for staging ● Inventory update/generation
  • 23. Scope reduction Each piece of code should work within its own domain: If we configure application foo we shouldn’t touch random bits outside of foo: ❌ NO ● add nginx configuration for foo ● use this magic query to find database IP ● transform list of users from global userlist to foo format ✅ YES ● Use wrapper role to configure nginx (include_role, import_role) ● Use role to search database IP ● Pass userlist explicitly from playbook or another wrapper role
  • 24. Explicit dependencies There is no sane way to describe dependencies. - Old style (dependencies in meta) does not work and is being deprecated. - New style include_role/import_role ignores meta dependencies. The only way to create a dependency is to do it manually. - import_role when role_foo_called is not defined - set_fact: role_foo_called inside the role Or just call it twice if it’s fast.
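The manual dependency pattern from this slide could look roughly like the sketch below. The role name `foo` and the flag `role_foo_called` are the slide's own; the file layout is an assumption. Because a condition on import_role is applied to every task of the imported role, the flag is set in the last task here, so earlier tasks of the same call are not skipped.

```yaml
# In the role or play that depends on 'foo':
- name: Run role foo unless it has already run on this host
  import_role:
    name: foo
  when: role_foo_called is not defined

# roles/foo/tasks/main.yaml (assumed path), as the LAST task of the role:
- name: Mark role foo as called
  set_fact:
    role_foo_called: true
```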
  • 25. Name it! Name it right! Examples: ● Everything should have a hypernym (a common name for several things) ○ E.g. ‘configuration playbooks’ VS ‘script playbooks’ ○ Configuration playbooks should be linted to perfection ○ Script playbooks may have unconditional ‘command/shell’ with ‘changed always’ status ● Different types of groups ○ E.g. ‘execution groups’ VS ‘groups for variables’ ○ Groups for variables should never have assigned tasks (e.g. hosts: database_settings) ● Name your components! ○ E.g. ‘bgp-push’ VS ‘bgp-pull’, ‘agents’, ‘central’, ‘external_access’, etc. “Naming things” is the 2nd hard computing problem
  • 27. Simple tricks ● ansible -i staging --list-hosts all ● ansible-playbook -i staging site.yaml --list-tags ○ Tags should have meaning! ● ansible-playbook -i staging site.yaml --check --diff
  • 28. Ansible-lint !!!!!!!!!!111 one one one ● Points to subtle errors in the playbook code ● Best practices (handlers vs “when: foo|changed” filter) ● Clarity. If lint understands it, people understand it. ● Forces more semantics on shell/command How much time does it take? ● ~ 30 lint warnings per hour. ● I cleared my project within 4 hours. There were 3 real-life bugs and 10 minor improvements, all found by ansible-lint
  • 29. Shell and command modules ● Main source of chaos if used carelessly ● Rules: ○ If they gather information: changed_when: False ○ If they are idempotent: find a way to report changes. ○ If they are not idempotent: use only after a query: ■ when: 'foo' in previous_query.stdout ■ when: previous_query.rc == 2 ● You can refactor if those modules are idempotent ● You can not refactor if those modules are not idempotent
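The second rule (an idempotent command that still has to report changes) has no example on the slides, so here is a hedged sketch. It assumes a hypothetical `foobar sync` subcommand that prints the word `updated` only when it actually changed something; both the subcommand and its output are invented for illustration.

```yaml
# 'foobar sync' and its 'updated' output are assumptions, not a real CLI.
- name: Sync foobar (safe to repeat, so only the status needs care)
  command: foobar sync
  register: foobar_sync
  # Report 'changed' only when the tool says it did something.
  changed_when: "'updated' in foobar_sync.stdout"
```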
  • 30. shell drama And if I can’t detect changes or failure? You are doing it wrong. Find a way.
  • 31. shell example The ip link set up command always returns 0 and never gives output.
      ❌ NO
      - name: Link up
        shell: |
          ip link set up dev {{dev}}
      ✅ YES
      - name: Check link status
        command: ip link show {{dev}}
        register: link_status
        changed_when: False
      - name: Link up
        command: ip link set up dev {{dev}}
        when: "'UP' not in link_status.stdout"
  • 32. shell example #2 foobar does not report failures at all. We want to execute foobar add, and we are able to run foobar list.
      ❌ NO
      - name: Add to foobar
        shell: |
          foobar add {{obj}}
      ✅ YES
      - name: Check foobar status
        command: foobar list
        register: old_foobar_output
        changed_when: False
      - name: Add to foobar
        shell: |
          foobar add {{obj}} && foobar list
        register: new_foobar
        when: obj not in old_foobar_output.stdout
        failed_when: obj not in new_foobar.stdout
  • 33. Apt: update_cache Theoretical question: did the cache update change anything or not? For practical reasons the answer is: no changes.
      Option 1: integrate into install
      - name: Install foo
        become: yes
        apt:
          name: foo
          state: '{{foo_install_state}}'
          update_cache: '{{apt_update_cache}}'
          cache_valid_time: '{{apt_cache_valid_time}}'
      Option 2: use without changes
      - name: Update apt cache
        become: yes
        apt:
          update_cache: yes
          cache_valid_time: '{{cache_time}}'
        changed_when: False
  • 35. Staging MUST HAVE STAGING AT ANY COST Staging: ● Finds your bugs before production ● Helps to refactor ● Forces you to think of modularity
  • 36. Development environment Primary staging: ● Virtual machines or real servers. Imitate production as closely as possible Development environment(s): ● Almost like staging, but faster and with omissions ● LXC (or docker) on localhost speeds up runs by ~30-50% ● Deploy containers with Ansible, drop them with Ansible ● Automate rebuild
  • 37. CI/CD ● Delegate all Ansible tasks to CI/CD server (Jenkins?) ● One job for production, one for staging ● Software updates and other workflow tasks - separate jobs ● Production should be updated only through CI/CD server ○ Keep logs ○ Keep last deployed commit* in those logs ● *Do you use git for your playbooks? You should. ● Run production ‘full ansible run’ often. ○ Make it safe. Second full run = zero changes. Mandatory to have. ● Run staging ‘full ansible run’ before production for all changes. ○ It guards production and saves your face.
  • 38. New and reinstalled servers Bootstrap.yaml: ● Forget old ssh keys ● Remember new ones ● Install python and ssh keys, create users ● Install all upgrades, restart the server
  • 39. Per role tests + Ansible way to test roles + Easier to debug - Time consuming - No inter-role integration - Often meaningless without a context
  • 41. Places to hide a variable ● Inventory (host, group_name:vars) ● inventory/host_vars ● inventory/group_vars ● host_vars ● group_vars [all.yaml, group_name.yaml] ● roles/default ● roles/vars ● ‘vars:’ in any task or role ● register in any task ● include_vars ● defaults/vars of imported role Ansible variables without supervision
  • 42. Rules to keep sanity ● host_vars are banned anywhere except an inventory ● Roles/vars should be avoided ● Roles should avoid exposing variables to other roles in the same play(book) ○ Reduce global state, OK? ○ If they do - this is called an ‘interface’. Document it. ■ Example: search-for-database-ip can set a variable db_ip. ● Environment-specific variables are kept in the inventory ● Project-specific variables are kept in group_vars ● Roles should use defaults for rarely changed variables ● Use a local ‘vars:’ statement for task-local calculations
  • 43. Variables and environments Environments: ● production/ ● staging/ ● lab1/ Variables: ● user_list -> group_vars/all.yaml ● domain_prefix -> inventory/group_vars/all.yaml ● foo_listen_port -> group_vars/foo.yaml ● db_password -> inventory/group_vars/dbaccess.yaml ● retry_timeout -> roles/foo/defaults/main.yaml Rule of thumb You must be able to add another environment by creating a new inventory (file/directory) with no changes outside that inventory.
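As an illustration of this split, the sketch below shows what three of those files might contain, one YAML document per file. The variable names are the slide's; every value, and the assumption that each environment directory holds its own group_vars, is made up for the example.

```yaml
# group_vars/all.yaml: project-wide, shared by every environment
user_list:
  - alice        # hypothetical user names
  - bob
---
# production/group_vars/all.yaml: environment-specific, lives in the inventory
domain_prefix: prod
---
# staging/group_vars/all.yaml: a new environment only adds files like this one
domain_prefix: staging
```

Adding `lab1/` then means creating `lab1/group_vars/all.yaml` and a host list, with nothing touched outside that directory.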
  • 44. How long to think before adding a variable
      Where the variable is added                                           Time to think   Docs
      roles/foo/tasks/*.yaml (vars section for task)                        5 seconds       no docs
      roles/foo/defaults/main.yaml                                          30 seconds      role docs
      roles/foo/tasks/*.yaml (register)                                     1 minute        no docs
      roles/foo/tasks/*.yaml (set_fact, role-internal)                      1 minute        no docs
      group_vars                                                            10 minutes      role or project docs
      Inventory                                                             30 minutes      role or project docs
      roles/foo/tasks/*.yaml (set_fact, external use outside of the role)   60+ minutes     role and project docs. Mandatory!
      For use in a command line (ansible-playbook -e)                       60+ minutes     role and project docs. Mandatory!
  • 45. Assertions and validations
      fail module with ‘when’ (from ceph-ansible):
      - name: validating variables
        fail:
          msg: "please choose scenario"
        when:
          - osd_group_name is defined
          - osd_group_name in group_names
          - not containerized_deployment
          - osd_scenario == 'dummy'
      assert module:
      - name: Check ansible version
        run_once: True
        assert:
          that: "ansible_version.full|version_compare('2.4','>=')"
          msg: "You must update Ansible to at least 2.4"
        delegate_to: localhost
        tags:
          - always
  • 46. Tags
  • 47. Tags proliferation
      - name: Configure foo
        template: src=foo.conf.j2 dest=/etc/foo.conf
        notify: restart foo
        tags:
          - foo
  • 48. Tags proliferation
      - name: Configure foo
        template: src=foo.conf.j2 dest=/etc/foo.conf
        notify: restart foo
        tags:
          - foo
          - configure
  • 49. Tags proliferation
      - name: Configure foo
        template: src=foo.conf.j2 dest=/etc/foo.conf
        notify: restart foo
        tags:
          - foo
          - configure
          - restart
  • 50. Tags proliferation
      - name: Configure foo
        become: yes
        template: src=foo.conf.j2 dest=/etc/foo.conf
        notify: restart foo
        tags:
          - foo
          - configure
          - restart
          - become
  • 51. Tags proliferation
      - name: Configure foo
        become: yes
        template: src=foo.conf.j2 dest=/etc/foo.conf
        notify: restart foo
        tags:
          - foo
          - configure
          - restart
          - become
          - ip
  • 52. Tags proliferation
      - name: Configure foo
        become: yes
        template: src=foo.conf.j2 dest=/etc/foo.conf
        notify: restart foo
        tags:
          - foo
          - configure
          - restart
          - become
          - ip
          - dont_do_like_this
  • 53. Concise tags Including tags: ● One tag - one scenario ● --tags your_tag should either: ○ Finish successfully for a new installation ○ Finish successfully for an existing installation ● If the same tag covers several plays in a playbook, maybe it’s better to split them into a separate playbook and use import_playbook. Excluding tags: ● Should be used with --skip-tags ● For long or complicated operations only. ● Each ‘always’ tag should have an additional tag for skipping:
      - debug: var=foo
        tags:
          - always
          - debug_foo
  • 54. tag examples - apt (all operations with apt, in all roles) - registrations (all operations with registration in a project API, in all roles) - foo_upgrade (all apt operations to install components of foo project) - git (all operations related to git pull/clone) - ip (all operations related to adding/removing IP addresses on server) - discovery ( all ‘search-for-*-ip’ roles) - services (tasks to configure shinken services, ~80 of them, shinken only) - drop (specific for copy-database.yaml, tasks to drop database)
  • 56. To limit or not to limit? Line in a template:
      allow_ip = {% for h in groups.all %} {{(hostvars[h]).ansible_default_ipv4.address}} {% endfor %}
      ansible-playbook -i inventory test.yaml ✅
      ansible-playbook -i inventory test.yaml --limit host1 ❌
      fatal: [host2]: FAILED! => {"changed": false, "msg": "dict has no element ansible_default_ipv4"}
  • 57. Solutions We need information about all hosts, but we have used --limit 1. Forbid limits in the project 😟 2. Write partial content 😓 3. Lineinfile on a per-host basis 😦 4. Gather facts for all hosts forcefully 😥 5. Use fact cache 😕 6. Use external database 😖 7. Skip task if not a full run 🤔
  • 58. Partial content
      {% for h in groups.all %}
      {% if (hostvars[h]).ansible_default_ipv4 is defined %}
      {{(hostvars[h]).ansible_default_ipv4.address}}
      {% endif %}
      {% endfor %}
      Good: none Bad: - incomplete config - ‘changed’ each time a different --limit is used ❌
  • 59. Lineinfile
      - name: Add host to config
        lineinfile:
          path: /etc/foo.conf
          line: "host {{(hostvars[item]).ansible_default_ipv4.address}}"
        when: (hostvars[item]).ansible_default_ipv4 is defined
        with_items: '{{groups.all}}'
      Good: survives --limit with no changes or broken config Bad: old values are not removed Note: Can be used only if the config uses one IP per line
  • 60. Forceful fact gathering
      - setup:
          gather_subset: network
        delegate_to: '{{item}}'
        delegate_facts: yes
        with_items: '{{groups.all}}'
        when: (hostvars[item]).ansible_default_ipv4 is not defined
        tags:
          - always
          - gather_facts
      Good: - no random ‘changed’ - Always full config - removes old values - fast (see the ‘when’ part) Bad: - fails if any host is down or is not provisioned yet
  • 61. Fact cache ● Do as in forceful fact gathering ● Set fact caching in ansible.cfg ● Hope it will be there Good: - Works most of the time Bad: always - most = bugs sometimes
  • 62. External database ● Register each host in etcd/consul ● Query data on each run Good: Works with --limit Bad: External service dependency (down/provision) Removal of the old entities is a problem
  • 63. Skip if not full run
      - name: Configure foo
        template: src=foo.conf.j2 dest=/etc/foo.conf
        when: full_run
        vars:
          full_run: '{{play_hosts == groups.all}}'
      Good: - Works perfectly with --limit - Won’t fail if some host is down and --limit was used - Fast - Updates and removes old data as needed on each full run Bad: - Does not update config if --limit ✅
  • 65. Template & task relationship ● Keep templates as simple as possible ● Use ‘vars:’ section for explicit variable declaration ● Never use global variables in a template. Exceptions: ○ Iterations over all hosts ○ Ansible built-in variables ○ A special global variable documented in a project and in a role ○ Very complicated queries. Use comments in the task to list used variables inside the template.
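One way to read the "explicit variable declaration" rule is sketched below: the task lists everything the template may read in its own `vars:` section, so the template never reaches for globals. The `foo_conf_*` names are hypothetical; `foo_listen_port` and `db_ip` are borrowed from other slides and assumed to be defined elsewhere.

```yaml
- name: Configure foo
  template:
    src: foo.conf.j2
    dest: /etc/foo.conf
  vars:
    # Everything foo.conf.j2 is allowed to use is declared here (names assumed).
    foo_conf_listen_ip: "{{ ansible_default_ipv4.address }}"
    foo_conf_listen_port: "{{ foo_listen_port }}"
    foo_conf_db_ip: "{{ db_ip }}"
```

The template then refers only to the `foo_conf_*` names, which makes it easy to render it separately when debugging.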
  • 66. Simplify If a template is small, use ‘copy’ with the ‘content’ argument to inline it
      - copy:
          dest: /etc/foobar.conf
          content: |
            source_ip = {{ansible_default_ipv4.address}}
  • 67. Debugging templates: variables
      - debug:
          var: '{{item}}'
        with_items:
          - myvar1
          - myvar2
          - ansible_default_ipv4
          - all_other_variables_in_template
  • 68. Debugging templates: Jinja2 Explicit templatization in a separate playbook (e.g. temp.yaml)
      - template:
          src: roles/somerole/templates/foo.conf.j2
          dest: /tmp/foo.conf
        delegate_to: localhost
        connection: local
        vars:
          some_var: ...
          another_var: ...
  • 69. Templates everywhere You don’t need to use ‘template’ to use jinja2. Every variable is a {{template}}. - copy - lineinfile - blockinfile - all file names for all copy/stat/file modules - arguments to shell and command modules - all other modules (apt, postgresql_user, etc)
  • 70. External Jinja2
      - name: Ugly example
        foo:
          argument: '{{(hostvars[var1]).cust_facts[3]|json_query("[?name="+ ..
      - name: Better example
        foo: argument={{foo_argument}}
        vars:
          foo_argument: "{{lookup('template', 'foo_arguments.j2')}}"
  • 71. Roles
  • 72. Roles: structure 1. Use defaults for rarely changed values. Do not use hard-coded constants. 2. Split the role into parts 3. Allow role parts to be called independently 4. Allow parts of the role to be reused 5. Use call caching
      Nginx: install + configure site
      roles/nginx/tasks/main.yaml:
      - import_tasks: install.yaml
      - import_tasks: configure_site.yaml
      Calling one part of the role:
      - import_role:
          name: nginx
          tasks_from: configure_site.yaml
        vars:
          nginx_site: ...
      Call caching:
      - name: install nginx
        apt: name=nginx state=present
        when: nginx_installed is not defined
        register: nginx_installed
  • 73. Files in roles: vendor in role Good: - Easy to do: copy: src=myfile dest=/var/lib/foo/myfile - Single authority - Versions Bad: - Keeps golden artifacts in the ansible repo
  • 74. Files in roles: external source Good: - A tidy git. Bad: - Need external storage. - Version control. Examples private apt repo || private git repo || swift container (bad!)
  • 75. Wrapper role We have an application server foo which should reside behind nginx. ● Foo wants the database IP and the address and port to listen on ● Nginx needs the port to proxy_pass to, the domain, and ssl settings Role foo configures foo only. Role nginx configures any nginx site and needs a bunch of additional variables. The wrapper role glues them together, but does not change anything in foo or nginx.
  • 76. Wrapper role
      - name: Configure foo for {{foo_source_ip}}
        include_role: name=foo tasks_from=configure_foo
        vars:
          local_api_ip: '{{foo_local_ip}}'
          local_api_port: '{{foo_local_port}}'
      - name: Configure nginx for {{foo_source_ip}}
        include_role: name=nginx tasks_from=configure_site
        vars:
          nginx_sites:
            - name: 'rttgod_{{foo_source_ip}}'
              listen_address: '{{foo_source_ip}}'
              port: '{{foo_external_api_port}}'
              locations:
                proxy_pass: 'http://{{foo_local_ip}}:{{foo_local_port}}'
  • 77. Include_role VS import_role import_role: - Behaves as if the role’s tasks were written in place of the import. - Can override handlers - Defaults are respected (the imported role uses its own defaults, but does not change the parent’s defaults) - Does not support loops - Supports conditions: - A condition is applied to each task in the imported role.
  • 78. Include_role VS import_role include_role: - Supports loops - Absolute mess - Broken in each new ansible release in a new way (hello, 2.5): - Delegation - Handlers - Defaults vs set_fact - Parent’s variable access - include_tasks is much more reasonable, but requires more files and lines.
  • 79. Proper looping with an include in a role. Works in ansible 2.5!
      - name: Loop over something
        include_tasks: per_something.yaml
        with_items: '{{something}}'
      per_something.yaml:
      - name: in per_something.yaml
        import_role: name=foo
        vars:
          var1: '{{item}}'
      Role ‘foo’:
      - name: A task in role ‘foo’
        foo: arg=var1
        delegate_to: ...
  • 81. handlers ● Avoid cross-role handlers (except for wrapper roles) ● Use meta: flush_handlers
  • 82. ‘At least once’ persistent handlers
      role/tasks/main.yaml:
      - name: setup foo
        apt: name=foo state=present
        notify: foo installed
      # … other tasks here …
      - meta: flush_handlers
      - name: check if restart is needed
        stat: path={{foo_flag}}
        register: foo_restart_flag
      - block:
          - name: Restart foo
            service: name=foo state=restarted
          - name: cleanup restart flag
            file: path={{foo_flag}} state=absent
        when: foo_restart_flag.stat.exists
      handlers/main.yaml:
      - name: foo installed
        file:
          path: '{{foo_flag}}'
          state: touch
      role/vars/main.yaml:
      foo_flag: /var/run/foo-inst.flag
  • 84. Plugin types module ≠ plugin - lookup_plugins/ - Load data from external sources - Perform calculations and queries - Iterate - action_plugins/ - Do stuff on hosts - vars_plugins - inventory_plugins All plugins are written in Python, and can be stored in ‘*_plugins/’ directory near a playbook, or within a role.
  • 85. Lookup plugins 1. Try to do it with ansible. 2. Try to do it with an in-line jinja2 template 3. Try to do it with an in-line json_query 4. Try to do it with an external jinja2 template 5. If not, write a plugin Rule of thumb: if the jinja2 template is more than ⅓ of the size of the plugin (and its tests), write a plugin. If less, use jinja2. Python in ansible complicates reading! A lot. A plugin without tests is worse than jinja2 of any complexity.
  • 86. Lookup plugins: an example
      from __future__ import (absolute_import, division, print_function)
      __metaclass__ = type

      from ansible.plugins.lookup import LookupBase


      class LookupModule(LookupBase):
          def run(self, terms, **kwargs):
              data = terms or kwargs
              assigned_something = data['assigned_something']
              assigned_others = data['assigned_others']
              somethings = data['somethings']
              foo_source_ips = []
              for something in somethings:
                  # 'entry' instead of 'data', to avoid shadowing the input above
                  for entry in something.get('datas', []):
                      if entry['other'] in assigned_others:
                          foo_source_ips.append(entry['foo_source_ip'])
              return foo_source_ips
  • 87. Lookup plugins: an example
      - name: Register IP
        uri:
          method: PUT
          url: '{{url}}'
          body_format: json
          body: '{"something": "{{item["something"]}}", "other": "{{item["other"]["data"]}}"}'
          status_code:
            - 200
            - 201
            - 304
        register: reg_status
        changed_when: reg_status.status in [200, 201]
        with_my_custom_filter: '{{something}}'
  • 88. Lookup plugins: json_query equivalent
      - name: looping over
        include_tasks: process_other.yaml
        with_items: '{{selected_datas}}'
        loop_control:
          loop_var: data
          label: '{{other}} @ {{data.foo_source_ip|default("no ip")}}'
        when: data.foo_source_ip is defined and data.other in assigned_others
        vars:
          somethings: '{{global_config["somethings"]}}'
          query: "[?name=='{{assigned_something}}'].datas"
          selected_datas: '{{global_config.somethings|json_query(query)}}'
          foo_source_ip: '{{data.foo_source_ip}}'
          something: '{{assigned_something}}'
          other: '{{data.other}}'
  • 89. Other plugins I have no experience with them, sorry. Key ideas for action plugins, when to write them: - Too many too complicated command/shells in a playbook/role - Needed reusability - Better test coverage - Complicated data types in use
  • 92. Refactoring when adding features ● Use small steps ● Write a plan for refactoring before changing anything ● Paper drawing is advised. ● Use the ‘not changed’ status to check that the refactoring did not change anything ● Use ansible-playbook --check --diff ● Refactor in two steps: ○ Change internals without changing the result ○ Make small, simple changes which change the result ● Do not forget to add cleanup code if needed ○ Drop it later ● Each step should have a separate commit with a multi-line description ○ You can do this, I believe in you!
  • 93. Refactoring when cleaning up mess - Find scenarios for execution - Eliminate false ‘changed’ - Reduce spread between files (no hostvars!) - Split plays into playbooks - Split tasklist into roles - Replace hardcoded values with variables - In templates too! - Do you remember about staging? - Reduce complexity of queries and iterations - Replace ‘shell/command’ with modules - Ansible-lint
  • 94. Refactoring example: Scraps from my table ● Write all ideas, even discarded. ● Write all variables and file names you’ve introduced or changed ● Draw arrows between objects
  • 95. THE END Final advice: ● Every role and every playbook cuts corners. ● Cut as few corners as possible. ● Each ‘cut corner’ has consequences. ● The amount of time dedicated to a role or to a playbook is a function of its importance. Be safe, be reasonable, and let ansible-lint be with you.

Editor's notes

  1. - about ansible, pre 2.0, bad 2.3, 2.4, small revolution at 2.5 - about my experience - expectation on audience. Someone knew some things better than me - some I stole from others, some are my own inventions Not in this presentation: vault, tower, network
  2. Why it’s simple Why it’s complicated
  3. A play or a playbook can not be in a role!
  4. Few examples here, they cover almost everything.
  5. Origin of Jinja Explain ‘moment of usage’
  6. Will explain ‘at least once’ VS ‘at most once’
  7. - delegate_to/include/loop will be explained later
  8. 2.5 - just a cosmetics
  9. It’s bad. Too many places, too many ways of thinking
  10. Why so much on tags? Because tags are useful, but ansible gives no hint on how to use them and when to stop. I wanted to give counterexamples, but they are hard to show because it’s hard to show inconsistency on a short slide
  11. It should be in the refactoring part too. Pay attention to this.
  12. It doesn’t matter what this photo is about. The key is the spirit: what to do. There are many objects and their relationships are complicated. Draw it.