code_style - sk4zuzu/one-deploy GitHub Wiki
The one-deploy
project has been designed from the start as an Ansible Collection, so it's convenient to install it using the ansible-galaxy
CLI command. Ansible collection structure differs from a classic Ansible project, you can read more about it here.
In #10 we've merged Ceph support. The ceph-ansible repository (that we borrow roles from) is not a collection, so the integration had to be done via a git submodule with some awkward ansible.cfg
setup:
[defaults]
collections_paths = ./ansible_collections/
action_plugins = ./vendor/ceph-ansible/plugins/actions/
callback_plugins = ./vendor/ceph-ansible/plugins/callback/
filter_plugins = ./vendor/ceph-ansible/plugins/filter/
roles_path = ./vendor/ceph-ansible/roles/
library = ./vendor/ceph-ansible/library/
module_utils = ./vendor/ceph-ansible/module_utils/
Important
We've made this exception and learned to live with it, but in general we don't want to repeat such mistakes. Please, make sure one-deploy
stays a collection as close to its purest form as possible.
There is nothing special about roles implemented in one-deploy
. There are few design choices however you should probably know:
- Most of the roles depend on the
opennebula.deploy.common
role, it's the place where you can put some global defaults or pre-generate global facts. - Some of the roles are divided into parts (sub-roles) that should be executed in dedicated inventory groups.
- Each role contains a README file written in markdown.
Our inventory markup language of choice is YAML, since it's more flexible than the usual INI format. In particular it allows to neatly express more complex inventories, as you can see in the example below. The vn
object would be much harder to define (and less readable) in an INI-based inventory file.
---
all:
vars:
ansible_user: ubuntu
ensure_keys_for: [ubuntu, root]
one_pass: opennebula
one_version: '6.8'
features: { evpn: true }
ds: { mode: ssh }
vn:
evpn0:
managed: true
template:
VN_MAD: vxlan
VXLAN_MODE: evpn
IP_LINK_CONF: nolearning=
PHYDEV: eth0
AUTOMATIC_VLAN_ID: "YES"
GUEST_MTU: 1450
AR:
TYPE: IP4
IP: 172.17.2.200
SIZE: 48
NETWORK_ADDRESS: 172.17.2.0
NETWORK_MASK: 255.255.255.0
GATEWAY: 172.17.2.1
DNS: 1.1.1.1
one_vip: 10.2.50.86
one_vip_cidr: 24
one_vip_if: eth0
router:
hosts:
n1a1: { ansible_host: 10.2.50.10 }
n1a2: { ansible_host: 10.2.50.11 }
frontend:
hosts:
n1a1: { ansible_host: 10.2.50.10 }
n1a2: { ansible_host: 10.2.50.11 }
node:
hosts:
n1b1: { ansible_host: 10.2.50.20 }
n1b2: { ansible_host: 10.2.50.21 }
The Makefile is optional. Please don't rely on it for implementing any of the core functionalities — just don't put them there. 👎
For example, manual federated deployment requires sequential execution with multiple different inventory files, so the Makefile is not the correct place to implement that. Just let users execute the manual steps manually. ☝️😌
The opennebula.deploy.precheck
role is executed in the opennebula.deploy.pre
playbook. The main purpose of this role is to verify various conditions globally at the cluster level, and prevent execution of the opennebula.deploy.site
playbook if the conditions are not met. While implementing new features or updating existing ones, you should consider adding new pre-checks or updating existing pre-checks. Please make sure you design and test ansible.builtin.assert
logic statements properly — don't rely on intuition, just identify and test all of the test cases. 👍
Warning
The opennebula.deploy.precheck
role doesn't depend on other roles and doesn't load defaults. It was a conscious decision to repeat definitions of defaults inside logic statements.
YAML 1.1 used to support the following boolean values y Y yes Yes YES n N no No NO true True TRUE false False FALSE on On ON off Off OFF
(sic!), which is ridiculous as somebody could use on no
to handle all booleans for off reason (pun intended).
Luckily YAML 1.2 properly recognizes true True TRUE false False FALSE
only.
Important
Ansible on the other hand still supports multiple variants, but we don't care (not all ideas are good ideas)! Please, always use true/false
💢.
Please take a look at the idiom below, this is the way we recommend for implementing complex conditionals. Note, there is no name:
attribute that we skip on purpose, so the whole expression is similar to a regular if statement.
Important
Please add the name:
attribute to most tasks except this when/block
idiom and (if you wish) include/import
statements.
- when: custom_fact == 'custom_value'
block:
- name: Task description
ansible.builtin.set_fact:
custom_fact: another_custom_value
- name: Task description
ansible.builtin.set_fact:
another_custom_fact: custom_value
Warning
A note about ansible-lint
. The code above fails to validate with ansible-lint
due to missing name:
attribute, that's the main reason we don't use ansible-lint
. Also, ansible-lint
is annoying to use in general, so we don't use it. We're open to discussing it if you really (x2) think you can change our minds. 🤔
Please take a look at the example task below. You can see that it actually processes multiple JSON payloads, so you can (arguably) consider it to be complex. Now imagine all the code under vars:
is written as a single line or split into multiple tasks. 💢
Dividing Jinja2 pipelines into smaller and named local vars (prefixing them with _ is the pattern that we adopted to differentiate them from the actual facts) can make the whole functional expression easier to digest. Also, it allows for inserting comments, so just do it 👍.
- name: Detect or select a Leader
ansible.builtin.set_fact:
# If we cannot detect anything, we default to the first host from the current Federation group.
leader: >-
{{ federation.groups.frontend[0] if _running is falsy else _leader }}
# This fact is used to prevent extra/unneeded restart of OpenNebula (triggered via handlers)
# during initial deployment of a Front-end.
oned_no_restart: >-
{{ federation.groups.frontend
if _running is falsy else
federation.groups.frontend | difference(_peer_names) }}
vars:
# We process shell results from all Front-ends at once.
_results: >-
{{ federation.groups.frontend | map('extract', hostvars, ['shell']) | list }}
_running: >-
{{ _results | selectattr('failed', 'false') | list }}
_documents: >-
{{ _running | map(attribute='stdout') | map('from_json') | list }}
_server_pools: >-
{{ _documents | map(attribute='ZONE.SERVER_POOL') | select | list }}
_peers: >-
{{ [_server_pools[0].SERVER] | flatten | list }}
_peer_names: >-
{{ _peers | map(attribute='NAME') | list }}
_leader: >-
{{ (_peers | selectattr('STATE', '==', '3') | first).NAME }}
To speed up Ansible roles we always minimize total number of tasks and loops. If you can achieve something in a single step (no looping, no task splitting, no switch-like statements) with some Jinja2 code (even if it's slightly more complex), just go for it 👍.
All Jinja2 and Ansible filters are allowed, use them at will. You can also use Python-derived attributes, please take a look at links below.
Warning
Please don't add custom filters and modules implemented in Python. The filters and attributes mentioned above should be enough for most cases. Do it only in very specific, well-justified cases, otherwise please just don't do it.
Some examples of what we consider "correct" vs "incorrect" implementations:
Reduce number of tasks:
INCORRECT:
- name: Task description
ansible.builtin.set_fact:
custom_fact: "{{ (custom_fact | d([])) + ['<%s>' % item] }}"
loop: [1, 2, 3, 4, 5]
CORRECT:
- name: Task description
ansible.builtin.set_fact:
custom_fact: >-
{{ _items | map('regex_replace', '^(.*)$', "<\g<1>>") }}
vars:
_items: [1, 2, 3, 4, 5]
- name: Task description
ansible.builtin.set_fact:
custom_fact: >-
{%- set output = [] -%}
{%- for item in _items -%}
{{- output.append('<%s>' % item) -}}
{%- endfor -%}
{{- output -}}
vars:
_items: [1, 2, 3, 4, 5]
Avoid code branching:
INCORRECT:
- name: Task description
ansible.builtin.apt:
name: [vim, qemu-utils]
update_cache: true
when: ansible_os_family == 'Debian'
- name: Task description
ansible.builtin.yum:
name: [vim, qemu-img]
update_cache: true
when: ansible_os_family == 'RedHat'
CORRECT:
- name: Task description
ansible.builtin.package:
name: "{{ _common + _specific[ansible_os_family] }}"
update_cache: true
vars:
_common: [vim]
_specific:
Debian: [qemu-utils]
RedHat: [qemu-img]
Since Federation support was merged in #38, run_once
can no longer be used freely in most of the roles, only in certain special, selected cases. The reason is that the opennebula
role tries to deploy multiple federated Front-ends in parallel, so run_once
has to be emulated in multiple subgroups of machines at the same time.
You can read more about how run_once
works in the Ansible docs, but to summarize, it basically picks the first machine from all play_hosts
.
So, instead of run_once
you could be using constructions such as:
- when: inventory_hostname == federation.groups.frontend[0] # instead of `run_once: true`
block:
- name: Do the work
ansible.builtin.set_fact:
some_result: 123
- name: Use the result somewhere else
ansible.builtin.set_fact:
some_result: "{{ hostvars[federation.groups.frontend[0]].some_result }}"
- Create your personal fork of the one-deploy repository.
- Add some page inside the wiki section (it can be anything).
- After 2. you'll be able to clone git repository dedicated for the wiki, like so:
git clone [email protected]:you/one-deploy.wiki.git
. - Setup the upstream remote:
git remote add upstream [email protected]:OpenNebula/one-deploy.wiki.git
. - Fetch the current documentation from the upstream repository:
git fetch upstream
. - Rebase your master branch:
git rebase upstream/master
. - Remove commit(s) that are not in upstream:
git reset --hard HEAD~1
(for example). - Push master to your wiki fork with force:
git push origin master -f
. - Access the wiki in your fork to verify if the procedure worked.
You can find Molecule-based integration tests inside the one-deploy
checkout, which can be basically used to deploy pre-configured environments inside existing OpenNebula instances.
We haven't implemented per-role Molecule unit tests (yet), but nonetheless feel free to add some container-based ones if you think it's properly justified.
So far testing has been manual labor, you have to actually deploy something to see if your implementation is correct.