OpenStack StarlingX - hpaluch/hpaluch.github.io GitHub Wiki
StarlingX is packaged Kubernetes and (optionally) OpenStack with a few additions (CephFS, and a Helm application manager - Airship Armada, to be replaced by FluxCD in the future) for so-called "Edge Cloud" deployments.
To quote https://docs.starlingx.io/
StarlingX is a fully integrated edge cloud software stack that provides everything needed to deploy an edge cloud on one, two, or up to 100 servers.
Edge means that every location (for example a company branch office) has one or a few local servers running a fully autonomous OpenStack (plus a few other components) for reliable, low-latency computing - in other words, independent of the Internet connection to an external cloud.
- one important feature is that all required repositories (YUM for the OS and a Docker Registry) are mirrored locally, so you should be able to reinstall your applications even when the Internet connection is broken.
- another important feature is sysinv (System Inventory) - all resources (including CPUs, memory, disk partitions, LVM (LVs, VGs, PVs), network interfaces, Ceph, ...) are managed using the system command (which is just a client to the API server), and the known system state is enforced using the Puppet provisioning tool.
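For illustration, a few system invocations that appear later in this guide (run on the controller after sourcing the platform credentials):
source /etc/platform/openrc    # load Keystone credentials for the system client
system host-list               # list hosts known to System Inventory
system host-show controller-0  # show details of one host
system host-disk-list 1        # list disks of the host with id 1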
Many parts of StarlingX are developed and supported by WindRiver. Please see https://www.windriver.com/studio/operator/starlingx for more information.
Project homepage is at https://www.starlingx.io/
Please note that such production server(s) should have at least 32GB RAM and 500GB of SSD disk space, as described in the hardware requirements:
However, for testing we may try a nested VM using Libvirt. There is an official installation guide at https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex.html that we will mostly follow.
I will use an Ubuntu 20.04 LTS VM in Azure as the "Host PC".
Our Azure VM must meet 3 requirements:
- it must support nested virtualization: see https://azure.microsoft.com/en-us/blog/introducing-the-new-dv3-and-ev3-vm-sizes/ for a list of supported VM sizes
- it must have at least ~20GB of RAM (the nested VM requires around 18 GB of RAM)
- it must have at least 8 cores (the nested VM requires 6 cores)
- I have selected Standard_D8s_v3 - WARNING! As of 2022-07-16 such a VM costs around $300/month (!)
- I strongly recommend monitoring your spending and adding the Auto Shutdown feature (omitted in the script below!)
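If you want to enable Auto Shutdown from the CLI, a minimal sketch (assuming the resource group and VM name used in the script below; the time is interpreted as UTC):
# enable daily auto-shutdown at 22:00 UTC for the VM created below
az vm auto-shutdown -g OsStxRG -n openstack-stx --time 2200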
 
You have to update at least these variables before running the script below:
- subnet=xxxxx - point it to your Subnet in your Virtual Network (vNet)
- ssh_key_path=`pwd`/hp_vm2.pub - point it to the SSH public key that you will use to connect to the VM
Here is my create_vm_ubuntu_for_stx.sh script to set up the VM openstack-stx with a public IP.
Run it in the Azure Cloud Shell (Bash) in the Azure portal:
#!/bin/bash
set -ue -o pipefail
# Your SubNet ID
subnet=/subscriptions/xxx/resourceGroups/VpnGatewayRG101/providers/Microsoft.Network/virtualNetworks/VNet101/subnets/FrontEnd 
ssh_key_path=`pwd`/hp_vm2.pub 
rg=OsStxRG
loc=germanywestcentral
vm=openstack-stx
IP=$vm-ip
opts="-o table"
# URN from command:
# az vm image list --all -l germanywestcentral -f 0001-com-ubuntu-server-focal -p canonical -s 20_04-lts-gen2 -o table 
image=Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest
set -x
az group create -l $loc -n $rg $opts
az network public-ip create -g $rg -l $loc --name $IP --sku Basic $opts
az vm create -g $rg -l $loc \
    --image $image  \
    --nsg-rule NONE \
    --subnet $subnet \
    --public-ip-address "$IP" \
    --storage-sku Premium_LRS \
    --size Standard_D8s_v3 \
    --os-disk-size-gb 128 \
    --ssh-key-values $ssh_key_path \
    --admin-username azureuser \
    -n $vm $opts
set +x
cat <<EOF
You may access this VM in 2 ways:
1. using Azure VPN Gateway 
2. Using Public IP - in such case you need to add appropriate
   SSH allow in rule to NSG rules of this created VM
EOF
exit 0
Follow the above instructions and log in to the VM to continue.
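If you go the public-IP route mentioned in the note printed by the script, a minimal sketch of adding the SSH allow rule (assumptions: az vm create created a default NSG named openstack-stxNSG in the same resource group - check with az network nsg list -g OsStxRG; replace <your-public-ip> with your own address):
az network nsg rule create -g OsStxRG --nsg-name openstack-stxNSG \
    -n AllowSshFromMyIp --priority 300 --direction Inbound --access Allow \
    --protocol Tcp --destination-port-ranges 22 \
    --source-address-prefixes <your-public-ip>/32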
Note: in the text below I will use these terms:
- Host - the parent Azure VM (openstack-stx)
- Libvirt VM - the nested VM running the StarlingX controller, with libvirt machine name (called a domain - a term from Xen times) simplex-controller-0. Once this nested VM is installed it will have the hostname controller-0
Verify that your Azure VM supports nested virtualization:
$ ls -l /dev/kvm
crw-rw---- 1 root kvm 10, 232 Jul 16 06:42 /dev/kvm
If the above device does not exist you need to use a different type (called Size in Azure) of VM.
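An alternative quick check is to look for the hardware virtualization flag in /proc/cpuinfo (on Intel-based Azure sizes nested virtualization shows up as vmx):
# count CPUs that expose hardware virtualization (vmx=Intel, svm=AMD); must be > 0
egrep -c '(vmx|svm)' /proc/cpuinfo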
NOTE!
I just recently found a great article, Deploy a virtual StarlingX Simplex node, at:
- https://ericho.github.io/2019-09-12-deploy-virtual-starlingx/
- https://opendev.org/starlingx/test/src/branch/master/automated-robot-suite/README.rst where the author uses this StarlingX test project:
git clone https://opendev.org/starlingx/test.git
cd automated-robot-suite
It seems to be even more comfortable (you just specify which Python test suite to run).
However I did not try it yet.
Inside the VM prepare the system:
sudo apt-get update
sudo apt-get dist-upgrade
# reboot is recommended if the kernel or critical system components (libc)
# were updated.
Please ensure that your shell is Bash:
echo $SHELL
/bin/bash
Now we will get the sources and install the required packages:
sudo apt-get install git
cd
git clone https://opendev.org/starlingx/tools.git stx-tools
cd stx-tools/
git describe --always --long
# my version is: vr/stx.6.0-404-g591be74
cd deployment/libvirt/
sudo ./install_packages.sh 
Package libvirt-bin is not available, but is referred to by another package.
Ubuntu 20.04 LTS no longer contains the above package, so we have to fix it manually:
sudo apt-get install virt-manager
Optional: if you want to run the virsh command without sudo, you can add yourself to the libvirt group using:
sudo /usr/sbin/usermod -G libvirt -a $USER
Log out and log back in to the Host (Azure VM) so this change takes effect.
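To verify the group membership after logging back in (a quick check, not part of the original guide):
# the libvirt group should appear in the list of your groups
id -nG | grep -wo libvirt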
Run again:
sudo ./install_packages.sh
Now you can safely ignore all libvirt-bin related errors. Manually restart the right service:
sudo systemctl restart libvirtd
And again follow the guide:
sudo apt install -y apparmor-profiles
sudo apt-get install -y ufw
sudo ufw disable
sudo ufw status
# should output:
# Status: inactive
Now run setup_network.sh:
./setup_network.sh
Verify that this script really set up the network (a small check script follows this list):
- it should have created 4 bridges:
$ ip -br l | fgrep stxbr
stxbr1 UNKNOWN 76:7f:f1:6e:f0:37 <BROADCAST,MULTICAST,UP,LOWER_UP>
stxbr2 UNKNOWN e6:f2:94:07:73:9e <BROADCAST,MULTICAST,UP,LOWER_UP>
stxbr3 UNKNOWN e2:fa:74:8c:ed:95 <BROADCAST,MULTICAST,UP,LOWER_UP>
stxbr4 UNKNOWN 7a:cc:b7:d0:aa:87 <BROADCAST,MULTICAST,UP,LOWER_UP>
- the first bridge stxbr1 should have a hardcoded IP address assigned:
$ ip -br -4 a | fgrep stxbr
stxbr1 UNKNOWN 10.10.10.1/24
- there should be a NAT rule that allows Internet access from stxbr1:
$ sudo /sbin/iptables -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT)
target      prot opt source          destination
LIBVIRT_PRT all  --  anywhere        anywhere
MASQUERADE  all  --  10.10.10.0/24   anywhere
- the last rule for 10.10.10.0/24 is what allows Internet access from stxbr1
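A minimal sketch that automates these checks (assuming the bridge names and the 10.10.10.0/24 subnet created by setup_network.sh):
#!/bin/bash
# check the bridges, the stxbr1 address and the NAT rule created by setup_network.sh
set -u
for br in stxbr1 stxbr2 stxbr3 stxbr4; do
    ip link show "$br" >/dev/null 2>&1 || echo "ERROR: bridge $br is missing"
done
ip -br -4 addr show stxbr1 | grep -q '10\.10\.10\.1/24' || echo "ERROR: stxbr1 has no 10.10.10.1/24 address"
sudo iptables -t nat -L POSTROUTING -n | grep -q '10\.10\.10\.0/24' || echo "ERROR: NAT rule for 10.10.10.0/24 not found"
echo "Checks finished."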
If all the above requirements are met you can continue:
- we have to follow the guide and download the ISO
- in your browser go to: http://mirror.starlingx.cengn.ca/mirror/starlingx/
- and look for the latest ISO
- in my case this link was the right one:
 
So back in the VM, download the above ISO using:
cd
curl -fLO http://mirror.starlingx.cengn.ca/mirror/starlingx/release/6.0.0/centos/flock/outputs/iso/bootimage.iso
# optional for Azure - copy ISO to SSD for better speed:
sudo cp ~/bootimage.iso /mnt
Before creating the VM I recommend increasing its memory and CPUs. In my case I made these changes:
diff --git a/deployment/libvirt/controller_allinone.xml b/deployment/libvirt/controller_allinone.xml
index 6f7272e..ec209a1 100644
--- a/deployment/libvirt/controller_allinone.xml
+++ b/deployment/libvirt/controller_allinone.xml
@@ -1,8 +1,8 @@
 <domain type='kvm' id='164'>
   <name>NAME</name>
-  <memory unit='GiB'>18</memory>
-  <currentMemory unit='GiB'>18</currentMemory>
-  <vcpu placement='static'>6</vcpu>
+  <memory unit='GiB'>26</memory>
+  <currentMemory unit='GiB'>26</currentMemory>
+  <vcpu placement='static'>7</vcpu>
   <resource>
     <partition>/machine</partition>
   </resource>
@@ -16,7 +16,7 @@
   </features>
   <cpu match='exact'>
     <model fallback='forbid'>Nehalem</model>
-    <topology sockets='1' cores='6' threads='1'/>
+    <topology sockets='1' cores='7' threads='1'/>
     <feature policy='optional' name='vmx'/>
(My Azure VM has 32GB RAM and 8 vCPUs - so it should be safe.)
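You can either edit deployment/libvirt/controller_allinone.xml by hand or, if you saved the diff above to a file (a hypothetical name, controller_allinone_resources.patch), apply it with git:
cd ~/stx-tools
# apply the memory/vCPU changes to the all-in-one controller template
git apply ~/controller_allinone_resources.patch
git diff deployment/libvirt/controller_allinone.xml   # review the result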
Now create and start the VM controller-0 using:
cd ~/stx-tools/deployment/libvirt/
./setup_configuration.sh -c simplex -i /mnt/bootimage.iso
You can safely ignore the cannot open display: message.
- Now connect to the serial console using:
$ sudo virsh console simplex-controller-0
- Do NOT press ENTER yet!!! It would select the wrong type of installation. If you already pressed ENTER accidentally, press ESC to return to the main menu.
- to redraw the menu press Ctrl-L
- select All-in-one Controller Configuration -> Serial Console
- a complete Kickstart (Anaconda) installation will now proceed
- in my case it installed 1206 packages
- NOTE: you can disconnect from the serial console at any time using Ctrl-]
- and later reconnect with the same virsh console command (sometimes --force is needed if your connection was canceled abruptly)
After this nested VM simplex-controller-0 reboots, we can follow:
- https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex_install_kubernetes.html#id2
- there will be several errors, because there is no network configured yet
- log in as sysadmin/sysadmin
- you will be forced to change the password. Unfortunately there are many strict rules (including a dictionary check) that you must adhere to in order to change the password successfully...
 
Now we have to temporarily configure the network in this Libvirt VM. As sysadmin, find out which network interface to use:
$ ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
eth1000          UP             52:54:00:5a:eb:79 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1001          UP             52:54:00:5b:68:3a <BROADCAST,MULTICAST,UP,LOWER_UP>
enp2s1           UP             52:54:00:28:24:a3 <BROADCAST,MULTICAST,UP,LOWER_UP>
enp2s2           UP             52:54:00:e1:c9:34 <BROADCAST,MULTICAST,UP,LOWER_UP>
In our case the right device is enp2s1. If you are not sure, you can dump the network interface assignment:
# run on Host
$ sudo virsh domiflist simplex-controller-0
 Interface   Type     Source   Model    MAC
-----------------------------------------------------------
 vnet0       bridge   stxbr1   e1000    52:54:00:28:24:a3
 vnet1       bridge   stxbr2   e1000    52:54:00:e1:c9:34
 vnet2       bridge   stxbr3   virtio   52:54:00:5a:eb:79
 vnet3       bridge   stxbr4   virtio   52:54:00:5b:68:3a
- So stxbr1 has MAC 52:54:00:28:24:a3
- looking inside the VM:
ip -br l | fgrep 52:54:00:28:24:a3
enp2s1 UP 52:54:00:28:24:a3 <BROADCAST,MULTICAST,UP,LOWER_UP>
- NOTE: these MAC addresses are hardcoded in the setup scripts, so they should be the same for you
- now inside the Libvirt VM create a network setup script net_setup.sh with contents:
MY_DEV=enp2s1
export CONTROLLER0_OAM_CIDR=10.10.10.3/24
export DEFAULT_OAM_GATEWAY=10.10.10.1
sudo ip address add $CONTROLLER0_OAM_CIDR dev $MY_DEV
sudo ip link set up dev $MY_DEV
sudo ip route add default via $DEFAULT_OAM_GATEWAY dev $MY_DEV
- and SOURCE it:
. ./net_setup.sh
# sudo may ask for a password
# - enter sysadmin's password to proceed
- if the network is correctly set up then Internet access must work (however without DNS, because /etc/resolv.conf is empty) - so try in the Libvirt VM:
host www.cnn.com 8.8.8.8
# should return addresses...
- NOTE: the default Ansible configuration contains all necessary information.
You can just copy it to your HOME directory for reference using:
cp /usr/share/ansible/stx-ansible/playbooks/host_vars/bootstrap/default.yml ~/
- now cross your fingers and run the playbook (we run it using sudo because it sometimes asks for a password in the middle of the installation, which breaks ansible):
sudo ansible-playbook /usr/share/ansible/stx-ansible/playbooks/bootstrap.yml
- if you are bored while ansible is running, you can connect from your Host (Azure VM) to this Libvirt VM using:
ssh sysadmin@10.10.10.3
# WARNING! After another reboot this address will change
# to 10.10.10.2 !!!
- peek into /etc/os-release:
PRETTY_NAME="CentOS Linux 7 (Core)"
- find more details about the magic system command (without a d suffix):
$ rpm -qf /usr/bin/system
cgts-client-1.0-276.tis.x86_64
$ rpm -qi cgts-client | egrep '^(Summary|Packager)'
Packager    : Wind River <[email protected]>
Summary     : System Client and CLI
 
 
Ansible installation should end with messages like:
bootstrap/bringup-bootstrap-applications : Check if application already exists -- 16.76s
common/armada-helm : Launch Armada with Helm v3 ------------------------ 16.11s
bootstrap/bringup-bootstrap-applications : Upload application ---------- 14.14s
After ansible has finished we must configure the OAM (Operations, Administration and Management) network (the network where all StarlingX APIs and services are exposed) to enable the controller at all:
- https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex_install_kubernetes.html#id1
- scroll down to the section Configure controller-0 on the above web page
- enter these commands in the Libvirt VM (user sysadmin)
- find the OAM interface:
$ ip -br -4 a | fgrep 10.10.10.
enp2s1 UP 10.10.10.3/24
- configure enp2s1 as the OAM interface:
$ source /etc/platform/openrc
$ OAM_IF=enp2s1
$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | locked         | disabled    | online       |
+----+--------------+-------------+----------------+-------------+--------------+
$ system host-if-list 1
+--------------------------------------+------+----------+---------+---------+-------+----------+-------------+------------+
| uuid                                 | name | class    | type    | vlan id | ports | uses i/f | used by i/f | attributes |
+--------------------------------------+------+----------+---------+---------+-------+----------+-------------+------------+
| 8d66dac8-ba2c-499d-98aa-c76ca540af5c | lo   | platform | virtual | None    | []    | []       | []          | MTU=1500   |
+--------------------------------------+------+----------+---------+---------+-------+----------+-------------+------------+
# we have to replace loopback with our Admin interface (OAM):
$ system host-if-modify controller-0 $OAM_IF -c platform
$ system interface-network-assign controller-0 $OAM_IF oam
# verify setting:
$ system host-if-list 1
+--------------------------------------+--------+----------+----------+---------+-------------+----------+-------------+------------+
| uuid                                 | name   | class    | type     | vlan id | ports       | uses i/f | used by i/f | attributes |
+--------------------------------------+--------+----------+----------+---------+-------------+----------+-------------+------------+
| 43adf65d-1579-4770-afb1-923f095be6a2 | enp2s1 | platform | ethernet | None    | [u'enp2s1'] | []       | []          | MTU=1500   |
| 8d66dac8-ba2c-499d-98aa-c76ca540af5c | lo     | platform | virtual  | None    | []          | []       | []          | MTU=1500   |
+--------------------------------------+--------+----------+----------+---------+-------------+----------+-------------+------------+
 
Unfortunately there is still a lot to configure. Try this command again:
$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | locked         | disabled    | online       |
+----+--------------+-------------+----------------+-------------+--------------+
Notice locked and disabled. It means that our controller is not yet able to provide service.
Many Kubernetes applications require persistent storage. The relevant objects are:
- PV - Persistent Volume - configured by the administrator
- PVC - Persistent Volume Claim - k8s applications request persistent storage using these requests
 
So we must configure Ceph
- https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex_install_kubernetes.html#configure-controller-0
- chapter Optionally, initialize a Ceph-based Persistent Storage Backend
There are two options:
- Host-based Ceph
- K8s-based Ceph (Rook)
 
I always prefer Host over Containers, so let's try it:
$ system storage-backend-list
Empty output...
$ system storage-backend-add ceph --confirmed
System configuration has changed.
Please follow the administrator guide to complete configuring the system.
+--------------------------------------+------------+---------+------------+------+----------+----------------+
| uuid                                 | name       | backend | state      | task | services | capabilities   |
+--------------------------------------+------------+---------+------------+------+----------+----------------+
| 960f7afc-c309-47d6-bc0f-78afe530c5b1 | ceph-store | ceph    | configured | None | None     | min_replicatio |
|                                      |            |         |            |      |          | n: 1           |
|                                      |            |         |            |      |          | replication: 1 |
|                                      |            |         |            |      |          |                |
+--------------------------------------+------------+---------+------------+------+----------+----------------+
$ system host-disk-list 1                 
+--------------------------------------+-------------+------------+-------------+----------+---------------+-...
| uuid                                 | device_node | device_num | device_type | size_gib | available_gib | ...
+--------------------------------------+-------------+------------+-------------+----------+---------------+-...
| a205e8e0-c5aa-41f8-92bb-daec6381fab1 | /dev/sda    | 2048       | HDD         | 600.0    | 371.679       | ...
| 4880a1d2-e629-4513-baf2-399dfb064410 | /dev/sdb    | 2064       | HDD         | 200.0    | 199.997       | ...
| 1fdeb451-6d6f-4950-9d99-d8215c10ed47 | /dev/sdc    | 2080       | HDD         | 200.0    | 199.997       | ...
+--------------------------------------+-------------+------------+-------------+----------+---------------+ ...
# note UUID for /dev/sdb:
# 4880a1d2-e629-4513-baf2-399dfb064410 
$ system host-stor-add 1 4880a1d2-e629-4513-baf2-399dfb064410
$ system host-stor-list 1           
+--------------------------------------+----------+-------+-----------------------+-...
| uuid                                 | function | osdid | state                 | ...
+--------------------------------------+----------+-------+-----------------------+-...
| 292ae029-f652-4e6c-b046-18daacd80a76 | osd      | 0     | configuring-on-unlock | ...
+--------------------------------------+----------+-------+-----------------------+-...Notice configuring-on-unlock in column state
We want to use OpenStack, so we have to follow the For OpenStack only: sections in the guide.
Create a script setup_os.sh with contents:
#!/bin/bash
set -xeuo pipefail
DATA0IF=eth1000
DATA1IF=eth1001
export NODE=controller-0
PHYSNET0='physnet0'
PHYSNET1='physnet1'
SPL=/tmp/tmp-system-port-list
SPIL=/tmp/tmp-system-host-if-list
system host-port-list ${NODE} --nowrap > ${SPL}
system host-if-list -a ${NODE} --nowrap > ${SPIL}
DATA0PCIADDR=$(cat $SPL | grep $DATA0IF |awk '{print $8}')
DATA1PCIADDR=$(cat $SPL | grep $DATA1IF |awk '{print $8}')
DATA0PORTUUID=$(cat $SPL | grep ${DATA0PCIADDR} | awk '{print $2}')
DATA1PORTUUID=$(cat $SPL | grep ${DATA1PCIADDR} | awk '{print $2}')
DATA0PORTNAME=$(cat $SPL | grep ${DATA0PCIADDR} | awk '{print $4}')
DATA1PORTNAME=$(cat  $SPL | grep ${DATA1PCIADDR} | awk '{print $4}')
DATA0IFUUID=$(cat $SPIL | awk -v DATA0PORTNAME=$DATA0PORTNAME '($12 ~ DATA0PORTNAME) {print $2}')
DATA1IFUUID=$(cat $SPIL | awk -v DATA1PORTNAME=$DATA1PORTNAME '($12 ~ DATA1PORTNAME) {print $2}')
system datanetwork-add ${PHYSNET0} vlan
system datanetwork-add ${PHYSNET1} vlan
system host-if-modify -m 1500 -n data0 -c data ${NODE} ${DATA0IFUUID}
system host-if-modify -m 1500 -n data1 -c data ${NODE} ${DATA1IFUUID}
system interface-datanetwork-assign ${NODE} ${DATA0IFUUID} ${PHYSNET0}
system interface-datanetwork-assign ${NODE} ${DATA1IFUUID} ${PHYSNET1}
exit 0
And run it.
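For example, assuming you saved it as ~/setup_os.sh on the controller:
source /etc/platform/openrc   # the system client needs Keystone credentials
chmod +x ~/setup_os.sh
~/setup_os.sh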
Now we have to follow OpenStack-specific host configuration:
system host-label-assign controller-0 openstack-control-plane=enabled
system host-label-assign controller-0 openstack-compute-node=enabled
system host-label-assign controller-0 openvswitch=enabled
Now we have to follow the section For OpenStack Only: Set up disk partition for nova-local volume group, which is needed for stx-openstack nova ephemeral disks:
Create a script setup_os_storage.sh with contents:
#!/bin/bash
set -xeuo pipefail
export NODE=controller-0
echo ">>> Getting root disk info"
ROOT_DISK=$(system host-show ${NODE} | grep rootfs | awk '{print $4}')
ROOT_DISK_UUID=$(system host-disk-list ${NODE} --nowrap | grep ${ROOT_DISK} | awk '{print $2}')
echo "Root disk: $ROOT_DISK, UUID: $ROOT_DISK_UUID"
echo ">>>> Configuring nova-local"
NOVA_SIZE=34
NOVA_PARTITION=$(system host-disk-partition-add -t lvm_phys_vol ${NODE} ${ROOT_DISK_UUID} ${NOVA_SIZE})
NOVA_PARTITION_UUID=$(echo ${NOVA_PARTITION} | grep -ow "| uuid | [a-z0-9\-]* |" | awk '{print $4}')
system host-lvg-add ${NODE} nova-local
sleep 60
system host-pv-add ${NODE} nova-local ${NOVA_PARTITION_UUID}
sleep 60
exit 0
And run it.
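To verify the result you can list the volume groups and physical volumes known to System Inventory (sysinv commands that should exist in this release; treat the exact output as an assumption):
system host-lvg-list controller-0   # nova-local should now appear next to cgts-vg
system host-pv-list controller-0    # the new partition should be listed as a PV of nova-local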
Now the moment of truth - unlocking the controller. This time we shall see if all components really work:
$ system host-unlock controller-0
# WARNING! A restart will follow...
NOTE: In my case network startup took around 3 minutes. I don't know why...
After the reboot the main IP of the Libvirt VM changes, so to connect to the Libvirt VM you now have to use ssh sysadmin@10.10.10.2.
After the reboot, log in as sysadmin and verify the host status:
$ source /etc/platform/openrc
$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
It must be in the state unlocked, enabled and available.
Also try:
$ system application-list | sed -r 's/(.{75}).*/\1.../'
+--------------------------+---------+-----------------------------------+-...
| application              | version | manifest name                     | ...
+--------------------------+---------+-----------------------------------+-...
| cert-manager             | 1.0-26  | cert-manager-manifest             | ...
| nginx-ingress-controller | 1.1-18  | nginx-ingress-controller-manifest | ...
| oidc-auth-apps           | 1.0-61  | oidc-auth-manifest                | ...
| platform-integ-apps      | 1.0-44  | platform-integration-manifest     | ...
| rook-ceph-apps           | 1.0-14  | rook-ceph-manifest                | ...
+--------------------------+---------+-----------------------------------+-...
$ ceph -s
  cluster:
    id:     d993e564-bd99-4a4d-946c-a3aa090da4f9
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum controller-0 (age 23m)
    mgr: controller-0(active, since 21m)
    mds: kube-cephfs:1 {0=controller-0=up:active}
    osd: 1 osds: 1 up (since 20m), 1 in (since 20m)
 
  data:
    pools:   3 pools, 192 pgs
    objects: 22 objects, 2.2 KiB
    usage:   107 MiB used, 199 GiB / 199 GiB avail
    pgs:     192 active+clean
$ kubectl get ns
NAME              STATUS   AGE
armada            Active   108m
cert-manager      Active   102m
default           Active   109m
deployment        Active   102m
kube-node-lease   Active   109m
kube-public       Active   109m
kube-system       Active   109m
At least when using host-based Ceph (my case) I observed an improper shutdown, where the kernel RBD client reported a lost write to Ceph (which was already shut down).
TODO:
Even after a lot of work we have not installed OpenStack yet(!). We have to follow this guide:
- https://docs.starlingx.io/deploy_install_guides/r6_release/openstack/install.html
- log in to the Libvirt VM as sysadmin
First we have to increase the LV for Docker from 30GB to 60GB.
- verify current assignments:
$ system host-fs-list 1 | sed -r 's/.{39}//'
+---------+-------------+----------------+
| FS Name | Size in GiB | Logical Volume |
+---------+-------------+----------------+
| backup  | 25          | backup-lv      |
| docker  | 30          | docker-lv      |
| kubelet | 10          | kubelet-lv     |
| scratch | 16          | scratch-lv     |
+---------+-------------+----------------+
- theoretically it should be easy:
$ system host-fs-modify controller-0 docker=60
HostFs update failed: Not enough free space on cgts-vg. Current free space 16 GiB, requested total increase 30 GiB
- it can be confirmed with this command:
sudo vgs
  VG         #PV #LV #SN Attr   VSize    VFree
  cgts-vg      1  12   0 wz--n- <178.97g <16.16g
  nova-local   1   1   0 wz--n-  <34.00g       0
 
Now we clone the script that created the VG nova-local and reuse it to create a new disk partition and add it to the VG cgts-vg.
Create a script resize_os_docker.sh with these contents:
#!/bin/bash
set -xeuo pipefail
export NODE=controller-0
echo ">>> Getting root disk info"
ROOT_DISK=$(system host-show ${NODE} | grep rootfs | awk '{print $4}')
ROOT_DISK_UUID=$(system host-disk-list ${NODE} --nowrap | grep ${ROOT_DISK} | awk '{print $2}')
echo "Root disk: $ROOT_DISK, UUID: $ROOT_DISK_UUID"
echo ">>>> Extending VG cgts-vg +50GB"
NOVA_SIZE=50
NOVA_PARTITION=$(system host-disk-partition-add -t lvm_phys_vol ${NODE} ${ROOT_DISK_UUID} ${NOVA_SIZE})
NOVA_PARTITION_UUID=$(echo ${NOVA_PARTITION} | grep -ow "| uuid | [a-z0-9\-]* |" | awk '{print $4}')
sleep 60  # it takes time before PV is created
system host-pv-add ${NODE} cgts-vg ${NOVA_PARTITION_UUID}
sleep 60  # it takes time before PV is added to VG !!!
exit 0
And run it. Unfortunately there are a few timing races, so sometimes you need to redo steps manually. If the above script was successful you can verify it with:
sudo pvs
  PV         VG         Fmt  Attr PSize    PFree  
  /dev/sda5  cgts-vg    lvm2 a--  <178.97g <16.16g
  /dev/sda6  nova-local lvm2 a--   <34.00g      0 
  /dev/sda8  cgts-vg    lvm2 a--   <49.97g <49.97g # << NEW PV
NOTE: You will likely have /dev/sda7 as the partition (I made some experiments before running it).
And finally, after a while, the VG should see the new free space:
sudo vgs
  VG         #PV #LV #SN Attr   VSize    VFree 
  cgts-vg      2  12   0 wz--n- <228.94g 66.12g
  nova-local   1   1   0 wz--n-  <34.00g     0
Notice VFree 66GB - that should be enough for docker.
Now you can finally resume the guide at:
- https://docs.starlingx.io/deploy_install_guides/r6_release/openstack/install.html
- and run:
$ system host-fs-modify controller-0 docker=60
...+---------+-------------+----------------+
...| FS Name | Size in GiB | Logical Volume |
...+---------+-------------+----------------+
...| backup  | 25          | backup-lv      |
...| docker  | 60          | docker-lv      |
...| kubelet | 10          | kubelet-lv     |
...| scratch | 16          | scratch-lv     |
...+---------+-------------+----------------+
 
Now we have to find the latest OpenStack application:
- go to http://mirror.starlingx.cengn.ca/mirror/starlingx/release/6.0.0/centos/flock/outputs/helm-charts/
- and download a suitable package to your Libvirt VM (as sysadmin), for example:
[sysadmin@controller-0 ~(keystone_admin)]$ cd
curl -fLO http://mirror.starlingx.cengn.ca/mirror/starlingx/release/6.0.0/centos/flock/outputs/helm-charts/stx-openstack-1.0-140-centos-stable-latest.tgz
ls -l stx-openstack-1.0-140-centos-stable-latest.tgz
-rw-r--r-- 1 sysadmin sys_protected 1804273 Jul 16 15:43 stx-openstack-1.0-140-centos-stable-latest.tgz
- if you are brave, upload the application:
$ system application-upload stx-openstack-1.0-140-centos-stable-latest.tgz
# now poll using command:
$ system application-show stx-openstack
# it must report progress: completed, status: uploaded
- and install it:
system application-apply stx-openstack
- again poll with:
watch -n 60 system application-show stx-openstack
 
If you were lucky you can access OpenStack by following:
WARNING!
When I rebooted the machine with OpenStack installed, I saw these grave messages on the console:
rbd: lost write
It means that there was still an active RBD (RADOS Block Device) - a networked disk device connected to Ceph.
I suspect that systemd terminates services in the wrong order, thus risking data loss for any container using a PV from Ceph (!!!)
Kernel messages:
EXT4-fs error (device rbd0): __ext4_find_entry:1536: inode #2: comm start.py: reading directory lblock 0
libceph: connect (1)192.168.204.2:6789 error -101
libceph: connect (1)192.168.204.2:6789 error -101
To access OpenStack at all we have to create a different credentials file; following the docs, run:
sed '/export OS_AUTH_URL/c\export OS_AUTH_URL=http://keystone.openstack.svc.cluster.local/v3' /etc/platform/openrc > ~/openrc.os
Now you have to remember:
- to access OpenStack, run: source ~/openrc.os
- to access StarlingX (the system command, etc.), run: source /etc/platform/openrc
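If you keep mixing the two up, a tiny convenience sketch (hypothetical aliases, not from the guide) you could add to sysadmin's ~/.bashrc:
# switch credentials quickly: 'osenv' for OpenStack, 'stxenv' for StarlingX/sysinv
alias osenv='source ~/openrc.os'
alias stxenv='source /etc/platform/openrc'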
These commands should work:
source ~/openrc.os
openstack flavor list
openstack image list
WARNING! In my case OpenStack is excessively resource-hungry:
$ uptime
 07:36:40 up 46 min,  2 users,  load average: 18.09, 24.35, 19.93
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
15  0      0 3326296 242220 4103136    0    0   332   158 2115 1180 42 15 40  4  0
36  1      0 3295668 242304 4103532    0    0    72   668 11102 18076 52 12 32  4  0
34  1      0 3295984 242372 4103664    0    0     0  1320 9557 14930 54  8 36  2  0
46  0      0 3274756 242496 4103976    0    0    96  1032 13529 22478 58 13 28  0  0
You can see that the 1st column r (the number of processes in the run queue - processes that need CPU but have to wait for it) is sometimes up to 50...
It is also confirmed by alarms:
source /etc/platform/openrc
fm alarm-list
# if you want to see details run
fm alarm-list --uuid
# and pass the uuid to: fm alarm-show
So I'm not sure how this can be used for low-latency edge computing...
Another problem - after a reboot I had no luck:
openstack flavor list
Failed to discover available identity versions when contacting http://keystone.openstack.svc.cluster.local/v3. Attempting to parse version from URL.
Service Unavailable (HTTP 503)
To see all OpenStack components in Kubernetes we can use this command:
kubectl get pod -n openstack
There are 111 pods in my case(!).
To quickly find problematic pods we can try:
$ kubectl get pod -n openstack | fgrep Running | fgrep '0/'
mariadb-ingress-847cdb5dfb-4zgd6 0/1 Running 1  15h
mariadb-server-0                 0/1 Running 2  15h
Running 0 out of 1 is definitely a problem... Trying:
kubectl logs mariadb-server-0 -n openstack
# nothing suspicious
But:
kubectl describe pod/mariadb-server-0 -n openstack
# Hmm
  Normal   Killing                 11m                kubelet                  Container mariadb failed startup probe, will be restarted
Warning  Unhealthy  8m46s (x12 over 20m)  kubelet  Startup probe failed:
To get a terminal in the container:
kubectl exec -it mariadb-server-0 -n openstack sh
# use exit to exit the container
Also:
source /etc/platform/openrc 
fm alarm-list
+-------+----------------------------+---------------+----------+-------------+
| Alarm | Reason Text                | Entity ID     | Severity | Time Stamp  |
| ID    |                            |               |          |             |
+-------+----------------------------+---------------+----------+-------------+
| 270.  | Host controller-0 compute  | host=         | critical | 2022-07-17T |
| 001   | services failure           | controller-0. |          | 07:26:01.   |
|       |                            | services=     |          | 637858      |
|       |                            | compute       |          |             |
|       |                            |               |          |             |
...
However you need fm alarm-list --uuid to get the UUID for fm alarm-show (ooohhhh).
NOTE: After around 30 minutes this error resolved itself...
So I can now see flavors:
$ openstack flavor list | sed -r 's/.{39}/.../;s/.{12}$/.../'
...+-----------+-------+------+-----------+-------+...
...| Name      |   RAM | Disk | Ephemeral | VCPUs |...
...+-----------+-------+------+-----------+-------+...
...| m1.small  |  2048 |   20 |         0 |     1 |...
...| m1.large  |  8192 |   80 |         0 |     4 |...
...| m1.medium |  4096 |   40 |         0 |     2 |...
...| m1.xlarge | 16384 |  160 |         0 |     8 |...
...| m1.tiny   |   512 |    1 |         0 |     1 |...
...+-----------+-------+------+-----------+-------+...
Hmm:
openstack image list
# no image...
Fortunately I already wrote a guide on OpenStack-from-Scratch. So try:
source ~/openrc.os
cd
curl -OLf http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img
openstack image create --public --container-format bare \
   --disk-format qcow2 --file cirros-0.5.1-x86_64-disk.img cirros
openstack image list
+--------------------------------------+--------+--------+
| ID                                   | Name   | Status |
+--------------------------------------+--------+--------+
| 22958ef4-05f1-4a9f-b1e9-1ca9eb1f5ebf | cirros | active |
+--------------------------------------+--------+--------+
Now we can follow another of my guides: OpenStack AIO in Azure. We need some network:
openstack network list
# Hmm, empty output...
We have to follow (sort of) the guide from:
First we must determine the data network type using this command:
$ system datanetwork-list  
+--------------------------------------+----------+--------------+------+
| uuid                                 | name     | network_type | mtu  |
+--------------------------------------+----------+--------------+------+
| a4b7cc2e-68fe-47ad-b0ff-67dd147d85b0 | physnet0 | vlan         | 1500 |
| 2bba400c-4a2d-4111-8247-4db251b6ad31 | physnet1 | vlan         | 1500 |
+--------------------------------------+----------+--------------+------+
So we know which snippet from the above wiki to use - this one:
Create a script setup_os_net.sh with these contents:
#!/bin/bash
set -euo pipefail
set -x
ADMINID=$(openstack project show -f value -c id admin)
[[ $ADMINID =~ ^[a-f0-9]{32}$ ]] || {
	echo "Unable to get ID for project 'admin'" >&2
	exit 1
}
PHYSNET0='physnet0'
PHYSNET1='physnet1'
PUBLICNET0='public-net0'
PUBLICNET1='public-net1'
PUBLICSUBNET0='public-subnet0'
PUBLICSUBNET1='public-subnet1'
openstack network segment range create ${PHYSNET0}-a \
  --network-type vlan --physical-network ${PHYSNET0} \
  --minimum 400 --maximum 499 --private --project ${ADMINID}
openstack network segment range create ${PHYSNET1}-a \
  --network-type vlan --physical-network ${PHYSNET1} \
  --minimum 500 --maximum 599 --private --project ${ADMINID}
openstack network create --project ${ADMINID} \
  --provider-network-type=vlan --provider-physical-network=${PHYSNET0} \
  --provider-segment=400 ${PUBLICNET0}
openstack network create --project ${ADMINID} \
  --provider-network-type=vlan --provider-physical-network=${PHYSNET1} \
  --provider-segment=500 ${PUBLICNET1}
openstack subnet create --project ${ADMINID} ${PUBLICSUBNET0} \
  --network ${PUBLICNET0} --subnet-range 192.168.101.0/24
openstack subnet create --project ${ADMINID} ${PUBLICSUBNET1} \
  --network ${PUBLICNET1} --subnet-range 192.168.102.0/24
exit 0
And run it this way:
source ~/openrc.os
chmod +x ~/setup_os_net.sh
~/setup_os_net.sh
Now list the available networks using:
$ openstack network list
+--------------------------------------+-------------+--------------------------------------+
| ID                                   | Name        | Subnets                              |
+--------------------------------------+-------------+--------------------------------------+
| c9160a9d-4f4f-424f-b205-f17e2fbfadc6 | public-net0 | 46a911af-5a12-428e-82bc-10b26a344a81 |
| fa016989-93ac-41f6-9d70-dd3c690a433f | public-net1 | 6f8eeffd-78c4-4501-977e-7b8d23a96521 |
+--------------------------------------+-------------+--------------------------------------+
And finally try to run a VM:
openstack server create --flavor m1.tiny --image cirros \
    --nic net-id=c9160a9d-4f4f-424f-b205-f17e2fbfadc6 \
    test-cirros
Hmm, nearly done:
Unknown Error (HTTP 504)
But it seems that it just needs a bit of time:
$ openstack server list
+--------------------------------------+-------------+--------+----------+--------+---------+
| ID                                   | Name        | Status | Networks | Image  | Flavor  |
+--------------------------------------+-------------+--------+----------+--------+---------+
| 5490d592-f594-496c-9573-6b0922041f29 | test-cirros | BUILD  |          | cirros | m1.tiny |
+--------------------------------------+-------------+--------+----------+--------+---------+
A few minutes later:
$ openstack server list
+--------------------------------------+-------------+--------+----------------------------+--------+---------+
| ID                                   | Name        | Status | Networks                   | Image  | Flavor  |
+--------------------------------------+-------------+--------+----------------------------+--------+---------+
| 5490d592-f594-496c-9573-6b0922041f29 | test-cirros | ACTIVE | public-net0=192.168.101.70 | cirros | m1.tiny |
+--------------------------------------+-------------+--------+----------------------------+--------+---------+
# ACTIVE - umm in my case the system was critically overloaded...
# if you are lucky
openstack console log show test-cirros
openstack console url show test-cirros
WARNING! That VLAN network is not very useful (no DHCP, etc.). I'm not sure if FLAT is still supported - it was in older docs:
To shut down:
- log in as sysadmin to the Libvirt VM and run sudo init 0
- now Stop the Azure VM from the portal. Remember that if you just shut down the Azure VM internally (using sudo init 0) it will NOT stop billing!
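To actually stop compute billing from the CLI you can also deallocate the VM (a sketch using the names from the create script; disks are still billed):
# 'deallocate' releases the compute resources and stops compute billing
az vm deallocate -g OsStxRG -n openstack-stx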
Power Up:
- start the Azure VM openstack-stx
- log in to the Azure VM
- run:
cd ~/stx-tools/deployment/libvirt
./setup_network.sh
- if your user is a member of libvirt you can omit sudo in the two commands below
- start the nested VM:
sudo virsh start simplex-controller-0
- in my case I got an error:
error: Failed to start domain simplex-controller-0
error: Cannot access storage file '/mnt/bootimage.iso': No such file or directory
- in such a case, find the offending device and eject the media:
$ virsh domblklist simplex-controller-0
 Target   Source
--------------------------------------------------------------
 sda      /var/lib/libvirt/images/simplex-controller-0-0.img
 sdb      /var/lib/libvirt/images/simplex-controller-0-1.img
 sdc      /var/lib/libvirt/images/simplex-controller-0-2.img
 sdd      /mnt/bootimage.iso
$ virsh change-media simplex-controller-0 /mnt/bootimage.iso --eject
Successfully ejected media.
$ virsh domblklist simplex-controller-0
 Target   Source
--------------------------------------------------------------
 sda      /var/lib/libvirt/images/simplex-controller-0-0.img
 sdb      /var/lib/libvirt/images/simplex-controller-0-1.img
 sdc      /var/lib/libvirt/images/simplex-controller-0-2.img
 sdd      -
- problem fixed, start this VM again
- run the console to watch the progress:
sudo virsh console simplex-controller-0
- once simplex-controller-0 shows the login prompt you can log in on the serial console or via ssh sysadmin@10.10.10.2 (I prefer this, because the serial console has some quirks)
- verify that all k8s applications are X/X Running or Completed: kubectl get pods -A
- important - wait until the Ceph cluster is in the HEALTH_OK state using: ceph -s (see the wait sketch after this list)
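A minimal wait sketch you could run on the controller after power up (assumes ceph and kubectl work for sysadmin, as shown above):
#!/bin/bash
# wait until Ceph reports HEALTH_OK, then list any pods that are not Running/Completed
until ceph -s | grep -q HEALTH_OK; do
    echo "$(date '+%H:%M:%S') waiting for Ceph HEALTH_OK..."
    sleep 30
done
kubectl get pods -A --no-headers | egrep -v 'Running|Completed' || echo "All pods are Running or Completed"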
 
There is an official guide on how to rebuild StarlingX yourself:
- https://docs.starlingx.io/developer_resources/build_guide.html - I did not try it yet (at least 32GB RAM and 500GB disk required), but there are a few noteworthy things:
- https://opendev.org/starlingx/manifest/src/branch/master/default.xml - the list of git repositories, processed by Android's repo tool (see below)
- https://opendev.org/starlingx/tools (you already know this project)
 
The Dockerfile is interesting:
- https://opendev.org/starlingx/tools/src/branch/master/Dockerfile - you can see there:
- how hard it is today to pin exact CentOS 7 and EPEL 7 versions
- how to get the Go version of the Android repo tool (to fetch the repositories from /manifest/default.xml):
curl https://storage.googleapis.com/git-repo-downloads/repo > /usr/local/bin/repo && \
    chmod a+x /usr/local/bin/repo
 
- STX - StarlingX
- DC - Distributed Cloud
- WR - Wind River (key contributor to the StarlingX project)
- CGCS - original project name Wind River Carrier Grade Communications Server, renamed to StarlingX, see https://opendev.org/starlingx/config/commit/d9f2aea0fb228ed69eb9c9262e29041eedabc15d - you can see cgcs-X as a prefix in various RPM packages
- CGTS - original project name Wind River Carrier Grade Telecom (or Titanium???) Server (???), renamed to StarlingX - you can see cgts-X as a prefix in various RPM packages
- TIS - original project name Wind River Titanium Server platform, renamed to StarlingX, see https://opendev.org/starlingx/config/commit/d9f2aea0fb228ed69eb9c9262e29041eedabc15d - you can still see tis as the Vendor tag in StarlingX RPM packages
- sysinv - System Inventory - accessed with the famous system command in StarlingX
An alternative way to quickly install StarlingX in a nested VM, by Erich Cordoba (I did not try it yet):
- https://ericho.github.io/2019-09-12-deploy-virtual-starlingx/
 - https://opendev.org/starlingx/test/src/branch/master/automated-robot-suite/README.rst
 
StarlingX guides that are hard to find (for older StarlingX releases but still useful):
- https://wiki.openstack.org/wiki/StarlingX
 - https://kontron.com/products/cg2400/starlingx-cg2400-installation-guide-r1.0.pdf
 - Important network setup examples
 - How to setup cirros image and basic network:
 - Training materials
 
Azure VMs that support nested virtualization:
This work is licensed under a