Ideas virt_net - ansible/community GitHub Wiki

Everything started with issues with the virt_net modules:

This page collects ideas for discussion to find the right fix or improvement.

General paradigm

From the Ansible use case "configuration management":

"Ansible features a state-driven resource model that describes the desired state of computer systems and services, not the paths to get them to this state. No matter what state a system is in, Ansible understands how to transform it to the desired state (and also supports a "dry run" mode to preview needed changes). This allows reliable and repeatable IT infrastructure configuration, avoiding the potential failures from scripting and script-based solutions that describe explicit and often irreversible actions rather than the end goal."

Good example from https://hvops.com/articles/ansible-vs-shell-scripts (with slight wording improvements):

---
- hosts: all
  tasks:

  - name: Ensure the PGP key is installed
    apt_key: >
      state=present
      id=AC40B2F7
      url="http://keyserver.ubuntu.com/pks/lookup?op=get&fingerprint=on&search=0x561F9B9CAC40B2F7"

  - name: Ensure https support for apt is installed
    apt: >
      state=present
      pkg=apt-transport-https

  - name: Ensure the passenger apt repository is configured
    apt_repository: >
      state=present
      repo='deb https://oss-binaries.phusionpassenger.com/apt/passenger raring main'

  - name: Ensure nginx is installed
    apt: >
      state=present
      pkg=nginx-full

  - name: Ensure passenger is installed
    apt: >
      state=present
      pkg=passenger
      update_cache=yes

  - name: Ensure the nginx configuration is correct
    copy: >
      src=/app/config/nginx.conf
      dest=/etc/nginx/nginx.conf

  - name: Ensure nginx is running
    service: >
      name=nginx
      state=started

Some critical / skeptical words: https://regebro.wordpress.com/2014/09/17/a-script-is-not-configuration

Operational scenarios and usage examples

This section focuses on the user's context when using virt_net. The purpose is to understand the user's workflow. Looking only at libvirt network features is insufficient.

Developer using a virtual machine for testing on her own machine

An Ansible developer wants to run a virtual machine as a staging environment for her Ansible configuration. It could also be a network of several virtual machines. I mainly assume that the virtual machine runs on the local host in user space.

Basic steps:

  1. Boot up a fresh virtual machine from a fresh image
  2. Bootstrap Ansible playbook
  3. Test everything
  4. Clean up in the end

As part of the first step, we must ensure the virtual staging network is set up as needed.

Example: Adapt the default network

---
- name: Ensure the test environment is set up correctly
  hosts: localhost
  tasks:
    - name: Ensure the default network is defined correctly and running
      community.libvirt.virt_net:
        state: present
        xml: '{{ lookup("template", "network_default.xml") }}'

I do not repeat parameters here that are already part of the XML template. In particular, I left out the parameter name in this example to see how it feels. The combination of name and xml has issues in the current implementation (see the section Parameter name). However, the default network already exists. The user does not need to specify an XML definition if she is happy with libvirt's default definition. In that case, she needs the parameter name.

---
- name: Ensure the test environment is set up correctly
  hosts: localhost
  tasks:
    - name: Ensure the default network is running
      community.libvirt.virt_net:
        name: default
        state: present

Example: Additional network

This network can be non-persistent, but persistence would work in any case.

---
- name: Ensure the test environment is set up correctly
  hosts: localhost
  tasks:
    - name: Ensure the network *development* is defined correctly and running
      community.libvirt.virt_net:
        state: present
        xml: '{{ lookup("template", "network_development.xml") }}'

After running the tests, the developer could clean up the development environment.

---
- name: Ensure a cleaned up development environment
  hosts: localhost
  tasks:
    - name: Ensure the network *development* is removed
      community.libvirt.virt_net:
        state: absent
        name: development

Because the parameter name is sometimes present and sometimes absent, it is harder to match corresponding definitions when there are several network definitions.

Testing virtual machines in a CI environment

Testing systems in a continuous integration environment is basically the next step after the previous use case. The CI system might select the right test machine, bootstrap the virtual machine, run the test cases and clean up everything in the end. Ansible can help to create the virtual machine as well as to deploy the current software in it. Note: I think libvirt is a good fit for small setups. For bigger setups, we have the usual suspects like OKD, OpenStack etc.

The Ansible playbook is very similar to the previous use case: non-persistent setup (everything managed by Ansible), but no local host.

---
- name: Ensure the test environment is set up correctly
  hosts: "{{ staging_host }}"
  tasks:
    - name: Ensure the network *development* is defined correctly and running
      community.libvirt.virt_net:
        state: present
        xml: '{{ lookup("template", "network_development.xml") }}'

Run a service in a virtual machine on a dedicated host

Run a service XY in a virtual machine. Again this is for small environments. The host could be selected by the infrastructure file or by a simple management component. In this case, we would configure the autostart option.

---
- name: Ensure the service XY is running
  hosts: all
  tasks:
    - name: Ensure the network *storage* is defined correctly and running
      community.libvirt.virt_net:
        state: present
        autostart: yes
        xml: '{{ lookup("template", "network_storage.xml") }}'

Related work

Similar modules are

Issues with the current implementation and further design aspects

Parameter name

Background

The current implementation allows conflicting network names to be specified in the referenced XML file and in the playbook. The implementation does not handle this case explicitly, and the documentation does not mention the issue. The behaviour is undefined and leads to effects like the one described in ansible-collections/community.libvirt#47.
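A minimal sketch of the conflicting case, assuming a network called development in the playbook and an inline definition that declares the name storage. Both names, the bridge name and the address range are made up for illustration. Only the parameters name, state and xml come from the module.

```yaml
---
- name: Illustrate conflicting network names (not a pattern to copy)
  hosts: localhost
  tasks:
    - name: Ensure the network *development* is present
      community.libvirt.virt_net:
        name: development          # name given in the playbook
        state: present
        # Inline definition instead of a template, to keep the sketch
        # self-contained. Its <name> element disagrees with the parameter above.
        xml: |
          <network>
            <name>storage</name>
            <bridge name="virbr10"/>
            <ip address="192.168.110.1" netmask="255.255.255.0"/>
          </network>
```

Which of the two names ends up in libvirt is exactly what the current implementation leaves undefined.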

Docker compose module
  • If the pod is defined inline, the parameter project_name is required. It is outside of the inline definition.
  • If the pod is read from the file system, the project name is derived from the file system path (parameter project_src is required).
Kubernetes module
  • An inline definition or referenced definition overwrites the top level parameter name.
OpenStack subnet module
  • Everything is defined inline. There is no parameter to reference an external source, so no conflicting name can occur.

Proposal

I consider the network name helpful in the playbook for clarity. For this reason, I would see this parameter as required and a definition in the referenced file as optional. The module should set or overwrite the name parameter after reading the referenced definition file.

Templates and network / domain definition in the Ansible files

Docker compose module

TODO ...

State, persistent, active

Background

Docker compose module
  • Parameter state (absent or present) defines the persistence.
  • Additional parameters restarted and stopped. They can conflict with each other (the documentation does not define the behaviour) and might assume a certain precondition. From my point of view, they easily lead to user errors without offering a clear benefit.
Kubernetes module
  • Parameter state: “Determines if an object should be created, patched, or deleted. When set to present, an object will be created, if it does not already exist. If set to absent, an existing object will be deleted. If set to present, an existing object will be patched, if its attributes differ from those specified using resource_definition or src.”
  • No further distinction between running or not.
OpenStack subnet module
  • Just parameter state (absent or present). No further distinction between defined or not.
Current virt_net
  • Parameter state with the possible values active, inactive, present, absent. Allows no clear state definition. Inconsistent with other modules.
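To illustrate the current state model, here is a sketch of what it takes today to get a network that is defined, running and started at boot. It follows the pattern of the module's documented examples, as far as I can tell. The network name storage and the template file are placeholders borrowed from the usage examples above.

```yaml
---
- name: Ensure the network *storage* is usable (current virt_net)
  hosts: all
  tasks:
    - name: Define the network from the template
      community.libvirt.virt_net:
        command: define
        name: storage
        xml: '{{ lookup("template", "network_storage.xml") }}'

    - name: Ensure the network is active (it must be defined first)
      community.libvirt.virt_net:
        name: storage
        state: active

    - name: Ensure the network is started at boot
      community.libvirt.virt_net:
        name: storage
        autostart: yes
```

The user has to spell out three tasks in the right order, because the definition, the running state and the autostart flag are handled separately.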

Proposal simple

Following the OpenStack subnet module, a network can be present or absent. This proposal considers Ansible as the main configuration source. There is no need for a separate libvirt database. If the autostart option is chosen, the network must be defined in libvirt. These considerations result in the following simple proposal.

  • state: absent: The network is not visible in libvirt. If this is not the case, it must be destroyed and undefined. In the end, the network is neither running nor defined.
  • state: present: The network is visible and active. If this is not the case, it must be defined and started.

As you can see, the proposal disregards the intermediate state of a network being defined but not active. This is an intermediate state we must see in the facts database, but not a state of practical use in a production setup with Ansible. The network is always defined in the Ansible database. The proposal is simple to use and understand, aligns with the OpenStack modules and avoids some pitfalls of the current virt_net implementation.

All other libvirt states can be relevant for development purposes to test something. For this, we have more appropriate tools / scripting languages like Python or shell.

Machines of execution

We can distinguish four machines involved in bootstrapping a virtual machine with Ansible.

  1. Machine that executes the playbook
  2. Machine on which the libvirt client runs
  3. Machine the libvirt client connects to
  4. Instantiated and booted virtual machine that needs to be set up via Ansible
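In a playbook, the first three machines in the list above map to different settings. The sketch below assumes a control node that runs ansible-playbook (machine 1), an inventory host called libvirt_client on which the module and therefore the libvirt client runs (machine 2), and a hypervisor reached through the module's uri parameter (machine 3). The host names and the connection URI are invented for illustration.

```yaml
---
# Machine 1 is the control node that runs ansible-playbook.
- name: Ensure the network *development* exists on the hypervisor
  hosts: libvirt_client            # machine 2: the module and libvirt client run here
  tasks:
    - name: Ensure the network *development* is defined
      community.libvirt.virt_net:
        # machine 3: the libvirt daemon the client connects to
        uri: qemu+ssh://admin@hypervisor.example.org/system
        state: present
        name: development
        xml: '{{ lookup("template", "network_development.xml") }}'

# Machine 4 is the booted guest. It would be configured by a separate play
# once it is reachable, for example a play with "hosts: staging_vm".
```

Note that the template lookup is evaluated on machine 1, while the module itself runs on machine 2.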

TODO ...

Event API or procedural API

TODO ...

Retrieving information, gather facts

Do we need the commands info, facts, get_xml, status and list_nets all together?
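For reference, a sketch that calls the read-only commands side by side, which makes the overlap easier to judge. The network name default is an assumption, and I do not describe the exact structure of each result here. The debug task simply prints whatever the module returns.

```yaml
---
- name: Compare the read-only commands of virt_net
  hosts: localhost
  gather_facts: false
  tasks:
    - name: List all networks
      community.libvirt.virt_net:
        command: list_nets
      register: net_list

    # According to the module documentation, the facts are exposed
    # as ansible_libvirt_networks.
    - name: Gather facts about all networks
      community.libvirt.virt_net:
        command: facts

    - name: Get information about all networks
      community.libvirt.virt_net:
        command: info
      register: net_info

    - name: Get the status of the network *default*
      community.libvirt.virt_net:
        command: status
        name: default
      register: net_status

    - name: Get the XML definition of the network *default*
      community.libvirt.virt_net:
        command: get_xml
        name: default
      register: net_xml

    - name: Show what each command returned
      ansible.builtin.debug:
        var: item
      loop:
        - '{{ net_list }}'
        - '{{ net_info }}'
        - '{{ net_status }}'
        - '{{ net_xml }}'
```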

From VMware module: "Note that this play disables the gather_facts parameter, since you don’t want to collect facts about localhost."

TODO ...

Non-idempotent commands

Background

The current virt_net module has some commands that are not idempotent. They describe actions, not target states of the system.

  • define
  • create
  • start
  • stop
  • destroy
  • undefine
  • modify

Related modules do not use such commands either.

Docker compose module
  • No direct commands like these.
Kubernetes module
  • No such commands.
OpenStack subnet module
  • No such commands.

Such commands also contradict the paradigm described in the section General paradigm.
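The difference between an action and a target state shows up directly in the task wording. A small sketch with the existing parameters, where the network name development is a placeholder:

```yaml
---
- name: Action versus target state
  hosts: localhost
  tasks:
    # Action: names the step to perform.
    - name: Start the network *development*
      community.libvirt.virt_net:
        command: create
        name: development

    # Target state: describes the result and leaves the steps to the module.
    - name: Ensure the network *development* is active
      community.libvirt.virt_net:
        state: active
        name: development
```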

Proposal

I propose deprecating these commands. Users who need them can use Python or shell scripting directly. These tools are made for scripting.
