Ubuntu LXD Cluster - hpaluch/hpaluch.github.io GitHub Wiki

Ubuntu LXD Cluster

How to configure LXD Cluster in Ubuntu with FAN Overlay Network.

NOTE: A FAN network is needed to allow node-to-node communication between containers on public clouds. Why?

Public clouds (AWS, Azure and probably GCE) do not allow the use of a network bridge, because a bridge uses more than one MAC address and more than one IP address.

In Azure, for example, you have to:

  • use only the single MAC address assigned by the cloud
  • use only IP addresses assigned by the cloud (you can use more than one IP address, but you have to request them via the cloud API).

To have a full network for containers (including multicast) you can choose one of these solutions:

  • Ubuntu FAN
  • NTop's N2N
  • Flannel (mainly for K8s)
  • OpenVPN (complex setup)
  • and possibly others

Setup

We need 3 Ubuntu 20 nodes. I use an Azure Virtual Network with these parameters:

  • Subnet: 10.101.0.0/24
  • HW: Standard_B2ms - 2 vCPU, 8 GB RAM, 32GB Premium LRS HDD
  • Nodes:
      • juju-cs1, IP: 10.101.0.4
      • juju-cs2, IP: 10.101.0.5
      • juju-cs3, IP: 10.101.0.6

To make sure that we have the latest stable LXD snap, we reinstall it:

$ snap list lxd

Name  Version  Rev    Tracking      Publisher   Notes
lxd   4.0.8    21835  4.0/stable/…  canonical✓  -

$ sudo snap remove --purge lxd

lxd removed

$ sudo snap install lxd

lxd 4.20 from Canonical✓ installed

$ snap list lxd

Name  Version  Rev    Tracking       Publisher   Notes
lxd   4.20     21858  latest/stable  canonical✓  -
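
Because the same reinstall is repeated on every node, it may help to check the tracked channel in a script rather than by eye. The helper below is only a sketch: `snap_tracking` is a hypothetical name (not part of snapd), and it assumes the column layout of `snap list` shown above, where Tracking is the fourth column.

```bash
# Print the "Tracking" column for one snap, reading `snap list` output on stdin.
# snap_tracking is a hypothetical helper, not a snapd command.
snap_tracking() {
    # $1 = snap name; only the row whose Name column matches is printed
    awk -v name="$1" '$1 == name { print $4 }'
}

# Usage on a node (not run here):
#   snap list lxd | snap_tracking lxd
```

For the listing above this would print the value of the Tracking column, which can then be compared across the nodes.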

Setup: Create LXD Cluster

To create the cluster, run these commands on juju-cs1:

# to create cluster you can run init without sudo
lxd init

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What IP address or DNS name should be used to reach this node? [default=10.101.0.4]:
Are you joining an existing cluster? (yes/no) [default=no]:
What name should be used to identify this node in the cluster? [default=juju-cs1]:
Setup password authentication on the cluster? (yes/no) [default=no]: yes
Trust password for new clients:
Again:
Do you want to configure a new local storage pool? (yes/no) [default=yes]:
Name of the storage backend to use (btrfs, dir, lvm, zfs) [default=zfs]: dir
Do you want to configure a new remote storage pool? (yes/no) [default=no]:
Would you like to connect to a MAAS server? (yes/no) [default=no]:
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]:
Would you like to create a new Fan overlay network? (yes/no) [default=yes]:
What subnet should be used as the Fan underlay? [default=auto]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

We can verify cluster (1 node so far) using:

$ lxc cluster ls

+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+
|   NAME   |           URL           |  ROLES   | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| juju-cs1 | https://10.101.0.4:8443 | database | x86_64       | default        |             | ONLINE | Fully operational |
+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+

Listing containers (should be empty):

$ lxc ls

+------+-------+------+------+------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+------+-------+------+------+------+-----------+----------+

Now we will launch a sample container and set up Apache2 there to test communication later:

lxc launch  ubuntu:20.04 ubu-cs1

Verify that the container started and was assigned an IP address from the FAN network (typically it starts with 240.):

$ lxc ls

+---------+---------+--------------------+------+-----------+-----------+----------+
|  NAME   |  STATE  |        IPV4        | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+---------+---------+--------------------+------+-----------+-----------+----------+
| ubu-cs1 | RUNNING | 240.4.0.106 (eth0) |      | CONTAINER | 0         | juju-cs1 |
+---------+---------+--------------------+------+-----------+-----------+----------+

# login to container
lxc exec ubu-cs1 -- bash

root@ubu-cs1:~#
apt-get install -y apache2 # this will verify Internet connection
echo "<h1>Apache on `hostname`</h1>" > /var/www/html/index.html
curl 127.0.0.1
# Must return: <h1>Apache on ubu-cs1</h1>
# Press Ctrl-d to exit container ubu-cs1

We should also verify that we can access the container from the host (use the IP address from lxc ls):

$ curl 240.4.0.106
<h1>Apache on ubu-cs1</h1>

Setup: Add 2nd node to LXD Cluster

First log in to your 1st node (where you created the cluster) and run (replace juju-cs2 with the name of your 2nd node):

lxc cluster add juju-cs2
Member juju-cs2 join token:
NOTE_THIS_TOKEN_INCLUDING_TRAILING=

Now login to your 2nd node - juju-cs2 in my example and do this:

Again reinstall LXD to latest version using:

sudo snap remove --purge lxd
sudo snap install lxd
snap list lxd

Join cluster:

# NOTE: For Joining cluster command sudo is required:
$ sudo lxd init

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What IP address or DNS name should be used to reach this node? [default=10.101.0.5]:
Are you joining an existing cluster? (yes/no) [default=no]: yes
Do you have a join token? (yes/no/[token]) [default=no]: yes
Please provide join token: ENTER_TOKEN_FROM_1ST_NODE
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "source" property for storage pool "local":
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

Now you should see both members of the cluster using this command (on any node):

$ lxc cluster ls


+----------+-------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
|   NAME   |           URL           |      ROLES       | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+----------+-------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| juju-cs1 | https://10.101.0.4:8443 | database         | x86_64       | default        |             | ONLINE | Fully operational |
+----------+-------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| juju-cs2 | https://10.101.0.5:8443 | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+----------+-------------------------+------------------+--------------+----------------+-------------+--------+-------------------+

Now create 2nd Ubuntu container on juju-cs2 node (with --target parameter):

$ lxc launch --target juju-cs2 ubuntu:20.04 ubu-cs2

Creating ubu-cs2
Starting ubu-cs2

Now note FAN IP Addresses (starting with 240. in my case):

$ lxc ls

+---------+---------+--------------------+------+-----------+-----------+----------+
|  NAME   |  STATE  |        IPV4        | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+---------+---------+--------------------+------+-----------+-----------+----------+
| ubu-cs1 | RUNNING | 240.4.0.106 (eth0) |      | CONTAINER | 0         | juju-cs1 |
+---------+---------+--------------------+------+-----------+-----------+----------+
| ubu-cs2 | RUNNING | 240.5.0.92 (eth0)  |      | CONTAINER | 0         | juju-cs2 |
+---------+---------+--------------------+------+-----------+-----------+----------+
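
The FAN addresses above appear to follow a simple pattern: the last octet of the node's address becomes the second octet of the container address (10.101.0.4 -> 240.4.0.x, 10.101.0.5 -> 240.5.0.x). The sketch below encodes only that observation. It assumes the 240.0.0.0/8 overlay and a /24 underlay as seen on this page, and `fan_prefix` is a hypothetical helper name, not an LXD command.

```bash
# Guess the FAN prefix used for containers on a given node.
# Assumption: overlay 240.0.0.0/8 and a /24 underlay, as observed on this page.
fan_prefix() {
    # Split the node's IPv4 address into its four octets.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # The last underlay octet ($4) becomes the second overlay octet.
    echo "240.$4.0"
}

fan_prefix 10.101.0.4   # prints 240.4.0
fan_prefix 10.101.0.5   # prints 240.5.0
```

If a container shows an address outside the prefix computed for its node, the underlay subnet chosen by lxd init is probably different from the one assumed here.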

Log in to the container on the 2nd node and install Apache:

# login to container
lxc exec ubu-cs2 -- bash
# install Apache2 and setup unique welcome page
root@ubu-cs2:~#
apt-get install apache2
echo "<h1>Apache on `hostname`</h1>" > /var/www/html/index.html
curl 127.0.0.1
# Must return: <h1>Apache on ubu-cs2</h1>

Now, still inside your container, try to access Apache on ubu-cs1, which is running on the 1st node (juju-cs1). You can get its FAN IP address using lxc ls on any node:

root@ubu-cs2:~#
curl 240.4.0.106
# must return welcome page
# from container `ubu-cs1` running on node `juju-cs1`:
<h1>Apache on ubu-cs1</h1>

Now try the same thing from ubu-cs1 to ubu-cs2:

# run this command from any node
lxc exec ubu-cs1 -- curl 240.5.0.92

<h1>Apache on ubu-cs2</h1>

Now repeat same steps for 3rd node (juju-cs3).

When finished your cluster should look like this:

lxc cluster ls

+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+
|   NAME   |           URL           |  ROLES   | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| juju-cs1 | https://10.101.0.4:8443 | database | x86_64       | default        |             | ONLINE | Fully operational |
+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| juju-cs2 | https://10.101.0.5:8443 | database | x86_64       | default        |             | ONLINE | Fully operational |
+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| juju-cs3 | https://10.101.0.6:8443 | database | x86_64       | default        |             | ONLINE | Fully operational |
+----------+-------------------------+----------+--------------+----------------+-------------+--------+-------------------+
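
To check the member state from a script instead of reading the table, one option is to count the ONLINE rows. This is a minimal sketch that parses the table output shown above; `online_members` is a hypothetical helper name.

```bash
# Count cluster members reported as ONLINE.
# Reads the table printed by `lxc cluster ls` on stdin.
online_members() {
    # -F: match the text literally, -c: print the number of matching lines
    grep -cF '| ONLINE |'
}

# Usage on any node (not run here):
#   [ "$(lxc cluster ls | online_members)" -eq 3 ] && echo "all 3 members online"
```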

Your container list should look like:

lxc ls

+---------+---------+--------------------+------+-----------+-----------+----------+
|  NAME   |  STATE  |        IPV4        | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+---------+---------+--------------------+------+-----------+-----------+----------+
| ubu-cs1 | RUNNING | 240.4.0.106 (eth0) |      | CONTAINER | 0         | juju-cs1 |
+---------+---------+--------------------+------+-----------+-----------+----------+
| ubu-cs2 | RUNNING | 240.5.0.92 (eth0)  |      | CONTAINER | 0         | juju-cs2 |
+---------+---------+--------------------+------+-----------+-----------+----------+
| ubu-cs3 | RUNNING | 240.6.0.232 (eth0) |      | CONTAINER | 0         | juju-cs3 |
+---------+---------+--------------------+------+-----------+-----------+----------+
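
With three containers, the manual curl checks from the previous steps can be repeated for each pair. The sketch below wraps one such check. `check_peer` is a hypothetical helper that uses only `lxc exec` and `curl`, and it assumes curl is available inside the containers, as in the steps above.

```bash
# Check that one container can fetch the Apache page of a peer over the FAN.
# check_peer is a hypothetical helper name.
check_peer() {
    src="$1"      # container that issues the request, e.g. ubu-cs1
    dst_ip="$2"   # FAN address of the peer, taken from `lxc ls`
    if lxc exec "$src" -- curl -fsS --max-time 5 "http://$dst_ip/" >/dev/null; then
        echo "OK   $src -> $dst_ip"
    else
        echo "FAIL $src -> $dst_ip"
        return 1
    fi
}

# Usage with the addresses from the listing above (run on any node):
#   check_peer ubu-cs1 240.5.0.92
#   check_peer ubu-cs1 240.6.0.232
#   check_peer ubu-cs3 240.4.0.106
```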

Summary

Congratulations! You now have a working 3-node cluster where all containers can transparently communicate across all nodes thanks to the FAN network.

Bonus - Adding LXD to Juju

WARNING! Work in Progress

If you want to manage your LXD Cluster with Juju we can follow guide from

WARNING!

Our Juju deployment has one serious drawback: the public addresses of deployed applications are reachable only from the cluster nodes (where the FAN interface is configured). Currently they can't be accessed externally (using the normal node IP) without manual tweaks (such as DNAT or something similar).

Do this to setup Juju with LXD cluster:

  • log in to any node, e.g. juju-cs1 or juju-cs2

  • install juju snap:

    sudo snap install juju --classic
    juju 2.9.18 from Canonical✓ installed
  • bootstrap localhost LXD cloud:

  • WARNING! Do not run juju clouds yet - it will configure localhost LXD without asking for anything!!

  • add cloud:

    $ juju add-cloud
    
    Select cloud type: lxd
    Enter a name for your lxd cloud: lxdcs
    Enter the API endpoint url for the remote LXD server: https://10.101.0.4:8443
    Enter region [default]:
    Enter the API endpoint url for the region [use cloud api url]:
    Enter another region? (y/N):
  • add credentials:
    juju add-credential lxdcs
    
    Enter credential name: lxdcscred
    Select region [any region, credential is not region specific]:
    Select auth type [interactive]:
    Enter trust-password: # from juju-cs1 node setup
  • bootstrap controller:
    juju bootstrap lxdcs
    # no question asked :-)
  • now poll juju status until everything is active and idle:
    juju status -m controller
    
    Model       Controller     Cloud/Region   Version  SLA          Timestamp
    controller  lxdcs-default  lxdcs/default  2.9.18   unsupported  16:43:52Z
    
    Machine  State    DNS          Inst id        Series  AZ  Message
    0        started  240.4.0.116  juju-4aa468-0  focal       Running

Now comes the moment of truth:

  • we will follow: https://juju.is/docs/olm/get-started-on-a-localhost
  • deploy hello-juju:
    juju deploy hello-juju
    
    
    Located charm "hello-juju" in charm-hub, revision 8
    Deploying "hello-juju" from charm-hub charm "hello-juju", revision 8 in channel stable
  • poll until everything is Workload=active Agent=idle and Machine State=started:
    juju status --watch 2s
    
    Model    Controller     Cloud/Region   Version  SLA          Timestamp
    default  lxdcs-default  lxdcs/default  2.9.18   unsupported  16:47:14Z
    
    App         Version  Status  Scale  Charm       Store     Channel  Rev  OS      Message
    hello-juju           active      1  hello-juju  charmhub  stable     8  ubuntu
    
    Unit           Workload  Agent  Machine  Public address  Ports   Message
    hello-juju/0*  active    idle   0        240.5.0.4       80/tcp
    
    Machine  State    DNS        Inst id        Series  AZ  Message
    0        started  240.5.0.4  juju-0d2428-0  focal       Running
  • if the status does not change for a few minutes, then something crashed badly...
  • now expose hello-juju (not sure if this is needed in our configuration)
    juju expose hello-juju
  • now deploy database - MUST be postgres:
    juju deploy postgresql
    
    Located charm "postgresql" in charm-hub, revision 235
    Deploying "postgresql" from charm-hub charm "postgresql", revision 235 in channel stable
  • poll until everything is Workload=active Agent=idle and Machine State=started:
    juju status --watch 2s
    
    
    Model    Controller     Cloud/Region   Version  SLA          Timestamp
    default  lxdcs-default  lxdcs/default  2.9.18   unsupported  16:54:09Z
    
    App         Version  Status  Scale  Charm       Store     Channel  Rev  OS      Message
    hello-juju           active      1  hello-juju  charmhub  stable     8  ubuntu
    postgresql  12.9     active      1  postgresql  charmhub  stable   235  ubuntu  Live master (12.9)
    
    Unit           Workload  Agent  Machine  Public address  Ports     Message
    hello-juju/0*  active    idle   0        240.5.0.4       80/tcp
    postgresql/0*  active    idle   1        240.6.0.142     5432/tcp  Live master (12.9)
    
    Machine  State    DNS          Inst id        Series  AZ  Message
    0        started  240.5.0.4    juju-0d2428-0  focal       Running
    1        started  240.6.0.142  juju-0d2428-1  focal       Running
  • now final step - connecting hello-juju to postgresql:
    juju relate postgresql:db hello-juju
  • again wait until everything is active, idle and started; notice the additional --relations argument:
    juju status --relations --watch 2s
    
    Model    Controller     Cloud/Region   Version  SLA          Timestamp
    default  lxdcs-default  lxdcs/default  2.9.18   unsupported  16:55:58Z
    
    App         Version  Status  Scale  Charm       Store     Channel  Rev  OS      Message
    hello-juju           active      1  hello-juju  charmhub  stable     8  ubuntu
    postgresql  12.9     active      1  postgresql  charmhub  stable   235  ubuntu  Live master (12.9)
    
    Unit           Workload  Agent  Machine  Public address  Ports     Message
    hello-juju/0*  active    idle   0        240.5.0.4       80/tcp
    postgresql/0*  active    idle   1        240.6.0.142     5432/tcp  Live master (12.9)
    
    Machine  State    DNS          Inst id        Series  AZ  Message
    0        started  240.5.0.4    juju-0d2428-0  focal       Running
    1        started  240.6.0.142  juju-0d2428-1  focal       Running
    
    Relation provider       Requirer                Interface    Type     Message
    postgresql:coordinator  postgresql:coordinator  coordinator  peer
    postgresql:db           hello-juju:db           pgsql        regular
    postgresql:replication  postgresql:replication  pgpeer       peer
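
The repeated "poll until active and idle" steps above can also be scripted. The sketch below is one possible approach under stated assumptions: `wait_for` and `unit_ready` are hypothetical helpers (not Juju commands), and `unit_ready` assumes the plain-text `juju status` layout shown above, where a unit row starts with the unit name followed by the Workload and Agent columns.

```bash
# Retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper: wait_for TRIES DELAY_SECONDS COMMAND...
wait_for() {
    tries="$1"; delay="$2"; shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        # Sleep only if another attempt follows.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Succeeds when the given unit shows Workload=active and Agent=idle
# in the plain-text `juju status` output.
unit_ready() {
    juju status | grep -Eq "^$1\*? +active +idle "
}

# Usage (not run here): up to 60 attempts, 5 seconds apart
#   wait_for 60 5 unit_ready hello-juju/0
#   wait_for 60 5 unit_ready postgresql/0
```

A bounded number of attempts is used so that a deployment that "crashed badly" ends the loop with a non-zero exit code instead of waiting forever.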

Now try using Public address from juju status:

curl -fsS 240.5.0.4
# should return html and lot of styles

To see the counter (increased by every hit to the homepage, i.e. by the command above) you can try:

curl -fsS 240.5.0.4/greetings

{"greetings":3}

Please notice that calling /greetings does NOT increase counter. Only calling Homepage increases that counter.
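
The counter behaviour described above can be observed with a small before/after comparison. This is a sketch only: `greetings_count` is a hypothetical helper, and it assumes the JSON body has exactly the shape shown above ({"greetings":N}).

```bash
# Extract the number from the JSON body returned by /greetings,
# e.g. {"greetings":3}. greetings_count is a hypothetical helper.
greetings_count() {
    sed -n 's/.*"greetings":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (not run here), with the Public address from `juju status`:
#   before=$(curl -fsS 240.5.0.4/greetings | greetings_count)
#   curl -fsS 240.5.0.4 >/dev/null     # one hit on the homepage
#   after=$(curl -fsS 240.5.0.4/greetings | greetings_count)
#   echo "counter went from $before to $after"
```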

Inspecting database:

  • now connect to postgresql Unit:
    $ juju ssh postgresql/0
    ...
    ubuntu@juju-d85c7d-2:~$ sudo bash
    root@juju-d85c7d-2:/home/ubuntu# su - postgres
    postgres@juju-d85c7d-2:~$ psql -l
                                      List of databases
        Name    |  Owner   | Encoding | Collate | Ctype |       Access privileges
    ------------+----------+----------+---------+-------+--------------------------------
     hello-juju | postgres | UTF8     | C       | C     | =Tc/postgres                  +
                |          |          |         |       | postgres=CTc/postgres         +
                |          |          |         |       | "juju_hello-juju"=CTc/postgres
     postgres   | postgres | UTF8     | C       | C     |
     template0  | postgres | UTF8     | C       | C     | =c/postgres                   +
                |          |          |         |       | postgres=CTc/postgres
     template1  | postgres | UTF8     | C       | C     | =c/postgres                   +
                |          |          |         |       | postgres=CTc/postgres
    (4 rows)
    
    # now peek into db hello-juju:
    $ psql -d hello-juju
    hello-juju=# \dt
    
                List of relations
     Schema |   Name   | Type  |      Owner
    --------+----------+-------+-----------------
     public | greeting | table | juju_hello-juju
    (1 row)
    
    hello-juju=# select * from greeting;
     id |          created_at
    ----+-------------------------------
      1 | 2021-11-13 15:24:29.899203+00
      2 | 2021-11-13 15:24:31.793331+00
      3 | 2021-11-13 15:24:40.734693+00
      4 | 2021-11-13 15:24:47.48249+00
    (4 rows)
    
    #exit psql
    hello-juju=# \q
    # exit container
    # Ctrl-d

OK - done.

Where to find sources:

Deploying Mediawiki

Now we will deploy additional App - MediaWiki + MySQL

  • at first create dedicated model for mediawiki:
    $ juju add-model mediawiki-mdl
    
    Added 'mediawiki-mdl' model on lxd-cs/default with credential 'lxd-cs-creds' for user 'admin'
  • switch from default model to mediawiki-mdl:
    $ juju switch mediawiki-mdl
    
    lxd-cs-default:admin/mediawiki-mdl (no change)
  • verify that model is empty:
    $ juju status
    
    ...
    Model "admin/mediawiki-mdl" is empty.
  • deploy mysql charm:

    $ juju deploy mysql
    
    Located charm "mysql" in charm-hub, revision 58
    Deploying "mysql" from charm-hub charm "mysql", revision 58 in channel stable
  • deploy mediawiki charm:

    $ juju deploy mediawiki
    
    Located charm "mediawiki" in charm-hub, revision 28
    Deploying "mediawiki" from charm-hub charm "mediawiki", revision 28 in channel stable
  • now wait a few minutes, polling juju status until every Agent is idle and every Workload is active or blocked, as shown below:

    $ juju status
    
    ...
    Unit          Workload  Agent  Machine  Public address  Ports     Message
    mediawiki/0*  blocked   idle   1        240.4.0.212               Database required
    mysql/0*      active    idle   0        240.5.0.239     3306/tcp  Ready
    ...
  • create relation mediawiki -> MySQL and expose mediawiki (likely a no-op in our case)

    $ juju add-relation mysql mediawiki:db
    
    $ juju expose mediawiki
  • again poll juju status until all Workloads are in state active and Agents are idle:

    $ juju status
    
    Unit          Workload  Agent  Machine  Public address  Ports     Message
    mediawiki/0*  active    idle   1        240.4.0.212     80/tcp    Ready
    mysql/0*      active    idle   0        240.5.0.239     3306/tcp  Ready
  • NOTE: To access mediawiki (IP 240.4.0.212 in my example) from your browser you need some kind of tunnel. In my case I use SSH port forwarding.

  • please see https://charmhub.io/mediawiki for more information on wiki configuration
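
The SSH forward mentioned in the note above could look like the sketch below. Everything except the unit address is an assumption: `wiki_tunnel` is a hypothetical helper, the SSH login has to be replaced with a user and node address reachable from your workstation, and local port 8080 is an arbitrary choice.

```bash
# Forward a local port to the MediaWiki unit through one of the cluster nodes.
# wiki_tunnel is a hypothetical helper: wiki_tunnel USER@NODE [UNIT_IP] [LOCAL_PORT]
wiki_tunnel() {
    node="$1"                     # SSH login on a cluster node (placeholder)
    unit_ip="${2:-240.4.0.212}"   # Public address of mediawiki/0 from `juju status`
    local_port="${3:-8080}"       # arbitrary free port on the workstation
    # -N: do not run a remote command, -L: local port forwarding
    ssh -N -L "${local_port}:${unit_ip}:80" "$node"
}

# Usage from the workstation (not run here; replace the placeholder login):
#   wiki_tunnel user@node
```

While the tunnel is open, the wiki should answer on http://localhost:8080/ in the local browser, although MediaWiki may redirect to whatever server name it was configured with.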

Cluster vs HA

Note that having a cluster does not automatically mean having high availability. In our case there are these single points of failure:

  1. no replicated storage for database (on MicroK8s we can use OpenEBS+Jiva CSI)
  2. juju is connected to single Node of cluster, if such node fails, juju will fail too
  3. there is no floating IP address for endpoints - we can manually start web app on different node, however it will change its IP address

