OpenStack StarlingX - hpaluch/hpaluch.github.io GitHub Wiki
StarlingX is a packaged Kubernetes and (optionally) OpenStack with a few additions (CephFS, a Helm application manager - Airship Armada, or FluxCD in the future) for the so-called "Edge Cloud".
To quote https://docs.starlingx.io/:
StarlingX is a fully integrated edge cloud software stack that provides everything needed to deploy an edge cloud on one, two, or up to 100 servers.
Edge means that every location (for example a company branch office) has one or a few local servers with a fully autonomous OpenStack (and a few other components) installed, for reliable and low-latency computing (meaning: independent of the Internet connection to an external cloud).
- One important feature is that all required repositories (YUM for the OS and the Docker Registry) are mirrored locally. So you should be able to reinstall your applications even when the Internet connection is down.
- Another important feature is sysinv (System Inventory): all resources (including CPUs, Memory, Disk Partitions, LVM (LVs, VGs, PVs), network interfaces, Ceph, ...) are managed using the system command (which is just a client of the API server), and it enforces the known system state using the Puppet provisioning tool.
Many parts of StarlingX are developed and supported by Wind River. Please see https://www.windriver.com/studio/operator/starlingx for more information.
Project homepage is on:
Please note that production server(s) should have at least 32GB RAM and 500GB of SSD disk space, as can be found on:
However, for testing we can use a nested VM under Libvirt. There is an official installation guide on: https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex.html that we will mostly follow.
I will use an Ubuntu 20.04 LTS VM in Azure as the "Host PC".
Our Azure VM must meet 3 requirements:
- must support nested virtualization: see https://azure.microsoft.com/en-us/blog/introducing-the-new-dv3-and-ev3-vm-sizes/ for a list of supported VM sizes
- must have at least ~20GB of RAM (the nested VM requires around 18GB of RAM)
- must have at least 8 cores (the nested VM requires 6 cores)
- I have selected Standard_D8s_v3
- WARNING! As of 2022-07-16 such a VM costs around $300/month (!)
- I strongly recommend monitoring your spending and enabling the Auto Shutdown feature (omitted in the script!)
You have to update at least these variables before running the script below:
- subnet=xxxxx to point to your Subnet in your Virtual Net (vNet)
- ssh_key_path=`pwd`/hp_vm2.pub to point to the SSH public key that you will use to connect to the VM
Here is my create_vm_ubuntu_for_stx.sh script to set up the VM openstack-stx with a public IP. Run it in Azure Bash (Cloud Shell) in the Azure portal:
#!/bin/bash
set -ue -o pipefail
# Your SubNet ID
subnet=/subscriptions/xxx/resourceGroups/VpnGatewayRG101/providers/Microsoft.Network/virtualNetworks/VNet101/subnets/FrontEnd
ssh_key_path=`pwd`/hp_vm2.pub
rg=OsStxRG
loc=germanywestcentral
vm=openstack-stx
IP=$vm-ip
opts="-o table"
# URN from command:
# az vm image list --all -l germanywestcentral -f 0001-com-ubuntu-server-focal -p canonical -s 20_04-lts-gen2 -o table
image=Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest
set -x
az group create -l $loc -n $rg $opts
az network public-ip create -g $rg -l $loc --name $IP --sku Basic $opts
az vm create -g $rg -l $loc \
--image $image \
--nsg-rule NONE \
--subnet $subnet \
--public-ip-address "$IP" \
--storage-sku Premium_LRS \
--size Standard_D8s_v3 \
--os-disk-size-gb 128 \
--ssh-key-values $ssh_key_path \
--admin-username azureuser \
-n $vm $opts
set +x
cat <<EOF
You may access this VM in 2 ways:
1. using Azure VPN Gateway
2. Using Public IP - in such case you need to add an appropriate
inbound SSH allow rule to the NSG of this VM
EOF
exit 0
Follow the above instructions and log in to the VM to continue.
Note: in the text below I will use these names:
- Host - the parent Azure VM (openstack-stx)
- Libvirt VM - the nested VM running the StarlingX controller, with the libvirt machine name (called a domain - from Xen times) simplex-controller-0. Once this nested VM is installed it will have the hostname controller-0
Verify that your Azure VM supports nested virtualization:
$ ls -l /dev/kvm
crw-rw---- 1 root kvm 10, 232 Jul 16 06:42 /dev/kvm
If the above device does not exist, you need to use a different type (called Size in Azure) of VM.
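The /dev/kvm test above can be wrapped into a small script that also looks for the CPU virtualization flag. This is a minimal sketch, assuming that a usable nested setup shows either the Intel vmx or the AMD svm flag in /proc/cpuinfo; the function names are my own, not part of any StarlingX tooling.

```shell
#!/bin/bash
# Sketch: check that this VM can host KVM guests (nested virtualization).

# has_virt_flags [CPUINFO]: succeed if CPUINFO (default /proc/cpuinfo)
# lists the Intel (vmx) or AMD (svm) hardware virtualization flag.
has_virt_flags() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# check_nested_virt [CPUINFO] [KVMDEV]: print a verdict and return 0 only
# if both the CPU flag and the KVM character device are present.
check_nested_virt() {
    local cpuinfo=${1:-/proc/cpuinfo} kvmdev=${2:-/dev/kvm}
    if ! has_virt_flags "$cpuinfo"; then
        echo "no vmx/svm CPU flag: choose a different VM size"
        return 1
    fi
    if [ ! -c "$kvmdev" ]; then
        echo "$kvmdev missing: is the kvm module loaded?"
        return 1
    fi
    echo "nested virtualization looks available"
}

# Usage on the Host (Azure VM):
#   check_nested_virt
```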
NOTE!
I just recently found a great article, Deploy a virtual StarlingX Simplex node, on:
- https://ericho.github.io/2019-09-12-deploy-virtual-starlingx/
- https://opendev.org/starlingx/test/src/branch/master/automated-robot-suite/README.rst where the author uses this StarlingX test project:
git clone https://opendev.org/starlingx/test.git
cd test/automated-robot-suite
It seems to be even more comfortable (you just specify which Python test suite to run). However, I have not tried it yet.
Inside the VM, prepare the system:
sudo apt-get update
sudo apt-get dist-upgrade
# reboot recommended if kernel or critical system components (libc)
# were updated.
Please ensure that your shell is Bash:
echo $SHELL
/bin/bash
Now we will get the source and install the required packages:
sudo apt-get install git
cd
git clone https://opendev.org/starlingx/tools.git stx-tools
cd stx-tools/
git describe --always --long
# my version is: vr/stx.6.0-404-g591be74
cd deployment/libvirt/
sudo ./install_packages.sh
Package libvirt-bin is not available, but is referred to by another package.
Ubuntu 20.04 LTS no longer contains the above package, so we have to fix it manually:
sudo apt-get install virt-manager
Optional: if you want to run the virsh command without sudo, you can add yourself to the libvirt group using:
sudo /usr/sbin/usermod -G libvirt -a $USER
Log out and log in to the Host (Azure VM) again so that this change takes effect.
Run again:
sudo ./install_packages.sh
Now you can safely ignore all libvirt-bin related errors. Manually restart the right service:
sudo systemctl restart libvirtd
And again, follow the guide:
sudo apt install -y apparmor-profiles
sudo apt-get install -y ufw
sudo ufw disable
sudo ufw status
# should output:
# Status: inactive
Now run setup_network.sh:
./setup_network.sh
Verify that this script really set up the network:
- it should have created 4 bridges:
$ ip -br l | fgrep stxbr
stxbr1 UNKNOWN 76:7f:f1:6e:f0:37 <BROADCAST,MULTICAST,UP,LOWER_UP>
stxbr2 UNKNOWN e6:f2:94:07:73:9e <BROADCAST,MULTICAST,UP,LOWER_UP>
stxbr3 UNKNOWN e2:fa:74:8c:ed:95 <BROADCAST,MULTICAST,UP,LOWER_UP>
stxbr4 UNKNOWN 7a:cc:b7:d0:aa:87 <BROADCAST,MULTICAST,UP,LOWER_UP>
- the first bridge stxbr1 should have a hardcoded IP address assigned:
$ ip -br -4 a | fgrep stxbr
stxbr1 UNKNOWN 10.10.10.1/24
- this NAT rule should have been created, allowing Internet access from the above stxbr1:
$ sudo /sbin/iptables -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
LIBVIRT_PRT all -- anywhere anywhere
MASQUERADE all -- 10.10.10.0/24 anywhere
- the last rule (source 10.10.10.0/24) is the one that allows Internet access from stxbr1
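The three checks above can be scripted so they are easy to repeat after a Host reboot. A sketch, assuming the bridge names, the 10.10.10.0/24 subnet and the output formats shown above; each helper reads the command output on stdin, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: verify what setup_network.sh should have created.

# count_stx_bridges: count stxbrN lines in `ip -br l` output.
count_stx_bridges() {
    grep -c '^stxbr[0-9]'
}

# stxbr1_addr: print the IPv4 CIDR of stxbr1 from `ip -br -4 a` output.
stxbr1_addr() {
    awk '$1 == "stxbr1" {print $3}'
}

# has_oam_masquerade: succeed if `iptables -t nat -L POSTROUTING` output
# contains a MASQUERADE rule for the 10.10.10.0/24 source subnet.
has_oam_masquerade() {
    grep -Eq '^MASQUERADE .*10\.10\.10\.0/24'
}

# Usage on the Host:
#   ip -br l | count_stx_bridges        # expect 4
#   ip -br -4 a | stxbr1_addr           # expect 10.10.10.1/24
#   sudo /sbin/iptables -t nat -L POSTROUTING | has_oam_masquerade && echo "NAT ok"
```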
If all the above requirements are met, you can continue:
- we have to follow the guide and download the ISO
- in your browser go to: http://mirror.starlingx.cengn.ca/mirror/starlingx/
- and look for the latest ISO
- in my case it was the release 6.0.0 bootimage.iso used in the curl command below
So, back in the VM, download the ISO using:
cd
curl -fLO http://mirror.starlingx.cengn.ca/mirror/starlingx/release/6.0.0/centos/flock/outputs/iso/bootimage.iso
# optional for Azure - copy ISO to SSD for better speed:
sudo cp ~/bootimage.iso /mnt
Before creating the VM I recommend increasing its memory and CPUs. In my case I made these changes:
diff --git a/deployment/libvirt/controller_allinone.xml b/deployment/libvirt/controller_allinone.xml
index 6f7272e..ec209a1 100644
--- a/deployment/libvirt/controller_allinone.xml
+++ b/deployment/libvirt/controller_allinone.xml
@@ -1,8 +1,8 @@
<domain type='kvm' id='164'>
<name>NAME</name>
- <memory unit='GiB'>18</memory>
- <currentMemory unit='GiB'>18</currentMemory>
- <vcpu placement='static'>6</vcpu>
+ <memory unit='GiB'>26</memory>
+ <currentMemory unit='GiB'>26</currentMemory>
+ <vcpu placement='static'>7</vcpu>
<resource>
<partition>/machine</partition>
</resource>
@@ -16,7 +16,7 @@
</features>
<cpu match='exact'>
<model fallback='forbid'>Nehalem</model>
- <topology sockets='1' cores='6' threads='1'/>
+ <topology sockets='1' cores='7' threads='1'/>
<feature policy='optional' name='vmx'/>
(My Azure VM has 32GB RAM and 8 vCPUs - so it should be safe).
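Instead of editing controller_allinone.xml by hand, the same change can be applied with sed. This is a sketch that assumes the XML looks like the diff above (memory in GiB, a single element with a cores attribute); bump_resources is a hypothetical helper name. With 26 GiB and 7 vCPUs given to the nested VM, the 32 GB / 8 vCPU Host keeps roughly 6 GB and 1 vCPU for itself.

```shell
#!/bin/bash
# Sketch: set memory and vCPU count in the libvirt domain template.
# bump_resources XML MEM_GIB VCPUS edits XML in place (GNU sed -i).
bump_resources() {
    local xml=$1 mem_gib=$2 vcpus=$3
    sed -i \
        -e "s|<memory unit='GiB'>[0-9]*</memory>|<memory unit='GiB'>${mem_gib}</memory>|" \
        -e "s|<currentMemory unit='GiB'>[0-9]*</currentMemory>|<currentMemory unit='GiB'>${mem_gib}</currentMemory>|" \
        -e "s|<vcpu placement='static'>[0-9]*</vcpu>|<vcpu placement='static'>${vcpus}</vcpu>|" \
        -e "s|cores='[0-9]*'|cores='${vcpus}'|" \
        "$xml"
}

# Usage (before running setup_configuration.sh):
#   bump_resources ~/stx-tools/deployment/libvirt/controller_allinone.xml 26 7
```

Keeping the vcpu count and the topology cores equal mirrors what the diff does by hand.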
Now create and start the VM controller-0 using:
cd ~/stx-tools/deployment/libvirt/
./setup_configuration.sh -c simplex -i /mnt/bootimage.iso
You can safely ignore the "cannot open display:" message.
- Now connect to the serial console using:
$ sudo virsh console simplex-controller-0
- Do NOT press ENTER yet!!! It would select the wrong type of installation. If you already pressed ENTER accidentally, press ESC to return to the main menu.
- to redraw the menu press Ctrl-L
- select All-in-one Controller Configuration -> Serial Console
- now a complete Kickstart (Anaconda) installation will proceed
- in my case it installed 1206 packages
- NOTE: you can disconnect from the serial console at any time using Ctrl-]
- and later reconnect with the same virsh console command (sometimes --force is needed if your connection was cancelled abruptly)
After this nested VM simplex-controller-0 reboots, we can follow:
- https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex_install_kubernetes.html#id2
- there will be several errors, because there is no network configured yet
- log in as sysadmin/sysadmin
- you will be forced to change the password. Unfortunately there are many strict rules (including a dictionary check) that you must adhere to in order to change the password successfully...
Now we have to temporarily configure the network in this Libvirt VM. As sysadmin, look up which network interface to use:
$ ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth1000 UP 52:54:00:5a:eb:79 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1001 UP 52:54:00:5b:68:3a <BROADCAST,MULTICAST,UP,LOWER_UP>
enp2s1 UP 52:54:00:28:24:a3 <BROADCAST,MULTICAST,UP,LOWER_UP>
enp2s2 UP 52:54:00:e1:c9:34 <BROADCAST,MULTICAST,UP,LOWER_UP>
In our case the right device is enp2s1. If you are not sure, you can dump the network interface assignment:
# run on Host
$ sudo virsh domiflist simplex-controller-0
Interface Type Source Model MAC
-----------------------------------------------------------
vnet0 bridge stxbr1 e1000 52:54:00:28:24:a3
vnet1 bridge stxbr2 e1000 52:54:00:e1:c9:34
vnet2 bridge stxbr3 virtio 52:54:00:5a:eb:79
vnet3 bridge stxbr4 virtio 52:54:00:5b:68:3a
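The bridge to MAC to interface mapping can also be done with two small awk filters. A sketch, assuming the column layout of virsh domiflist and ip -br l shown in this section; the function names are my own.

```shell
#!/bin/bash
# Sketch: find the guest interface attached to a given libvirt bridge.

# mac_for_bridge BRIDGE: read `virsh domiflist` output (on the Host) on
# stdin; columns are Interface Type Source Model MAC.
mac_for_bridge() {
    awk -v br="$1" '$2 == "bridge" && $3 == br {print $5}'
}

# iface_for_mac MAC: read `ip -br l` output (in the Libvirt VM) on stdin;
# columns are name state MAC flags.
iface_for_mac() {
    awk -v mac="$1" '$3 == mac {print $1}'
}

# Usage:
#   on Host:        sudo virsh domiflist simplex-controller-0 | mac_for_bridge stxbr1
#   in Libvirt VM:  ip -br l | iface_for_mac 52:54:00:28:24:a3   # enp2s1 in my case
```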
- So stxbr1 has MAC 52:54:00:28:24:a3
- looking inside the VM:
ip -br l | fgrep 52:54:00:28:24:a3
enp2s1 UP 52:54:00:28:24:a3 <BROADCAST,MULTICAST,UP,LOWER_UP>
- NOTE: these MAC addresses are hardcoded in the setup scripts, so they should be the same for you
- now inside the Libvirt VM create the network setup script net_setup.sh with these contents:
MY_DEV=enp2s1
export CONTROLLER0_OAM_CIDR=10.10.10.3/24
export DEFAULT_OAM_GATEWAY=10.10.10.1
sudo ip address add "$CONTROLLER0_OAM_CIDR" dev "$MY_DEV"
sudo ip link set up dev "$MY_DEV"
sudo ip route add default via "$DEFAULT_OAM_GATEWAY" dev "$MY_DEV"
- and SOURCE it:
. ./net_setup.sh
# sudo may ask for a password
# - enter sysadmin's password to proceed
- if the network is set up correctly, Internet access must work (however without DNS, because /etc/resolv.conf is empty), so try in the Libvirt VM:
host www.cnn.com 8.8.8.8 # should return addresses...
- NOTE: the default Ansible configuration contains all the necessary information. You can just copy it to HOME for reference using:
cp /usr/share/ansible/stx-ansible/playbooks/host_vars/bootstrap/default.yml ~/
- now cross your fingers and run the playbook (we run it using sudo because it sometimes asks for a password in the middle of the installation, which breaks Ansible):
sudo ansible-playbook /usr/share/ansible/stx-ansible/playbooks/bootstrap.yml
- if you are bored while Ansible is running, you can connect from your Host (Azure VM) to this Libvirt VM using:
ssh sysadmin@10.10.10.3
# WARNING! After another reboot this address will change
# to 10.10.10.2 !!!
- peek into /etc/os-release:
PRETTY_NAME="CentOS Linux 7 (Core)"
- find more details about the magic system command (without the d suffix, i.e. not systemd):
$ rpm -qf /usr/bin/system
cgts-client-1.0-276.tis.x86_64
$ rpm -qi cgts-client | egrep '^(Summary|Packager)'
Packager : Wind River <[email protected]>
Summary : System Client and CLI
- peek into
Ansible installation should end with messages like:
bootstrap/bringup-bootstrap-applications : Check if application already exists -- 16.76s
common/armada-helm : Launch Armada with Helm v3 ------------------------ 16.11s
bootstrap/bringup-bootstrap-applications : Upload application ---------- 14.14s
After Ansible finishes, we must configure the OAM (Operations, Administration and Management) network (the network where all StarlingX APIs and services are exposed) to enable the controller at all:
- https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex_install_kubernetes.html#id1
- scroll down to the section Configure controller-0 on the above web page
- enter these commands in the Libvirt VM (user sysadmin):
- find the OAM interface:
$ ip -br -4 a | fgrep 10.10.10.
enp2s1 UP 10.10.10.3/24
- configure enp2s1 as the OAM interface:
$ source /etc/platform/openrc
$ OAM_IF=enp2s1
$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | locked         | disabled    | online       |
+----+--------------+-------------+----------------+-------------+--------------+
$ system host-if-list 1
+--------------------------------------+------+----------+---------+---------+-------+----------+-------------+------------+
| uuid                                 | name | class    | type    | vlan id | ports | uses i/f | used by i/f | attributes |
+--------------------------------------+------+----------+---------+---------+-------+----------+-------------+------------+
| 8d66dac8-ba2c-499d-98aa-c76ca540af5c | lo   | platform | virtual | None    | []    | []       | []          | MTU=1500   |
+--------------------------------------+------+----------+---------+---------+-------+----------+-------------+------------+
# we have to use our admin interface (OAM) instead of loopback:
$ system host-if-modify controller-0 $OAM_IF -c platform
$ system interface-network-assign controller-0 $OAM_IF oam
# verify setting:
$ system host-if-list 1
+--------------------------------------+--------+----------+----------+---------+-------------+----------+-------------+------------+
| uuid                                 | name   | class    | type     | vlan id | ports       | uses i/f | used by i/f | attributes |
+--------------------------------------+--------+----------+----------+---------+-------------+----------+-------------+------------+
| 43adf65d-1579-4770-afb1-923f095be6a2 | enp2s1 | platform | ethernet | None    | [u'enp2s1'] | []       | []          | MTU=1500   |
| 8d66dac8-ba2c-499d-98aa-c76ca540af5c | lo     | platform | virtual  | None    | []          | []       | []          | MTU=1500   |
+--------------------------------------+--------+----------+----------+---------+-------------+----------+-------------+------------+
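The two OAM commands can be wrapped in a function together with a check of the resulting interface class. A minimal sketch, assuming the `|`-separated table layout shown above (uuid, then name, then class); configure_oam and if_class are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: configure the OAM interface and check its class afterwards.
# Run as sysadmin after `source /etc/platform/openrc`.

# configure_oam HOST IF: mark IF as platform and assign it to the oam network.
configure_oam() {
    local host=$1 ifname=$2
    system host-if-modify "$host" "$ifname" -c platform &&
        system interface-network-assign "$host" "$ifname" oam
}

# if_class NAME: read `system host-if-list ... --nowrap` output on stdin
# and print the class column of interface NAME.
if_class() {
    awk -F'|' -v name="$1" '{
        n = $3; gsub(/ /, "", n)
        if (n == name) { c = $4; gsub(/ /, "", c); print c }
    }'
}

# Usage:
#   configure_oam controller-0 enp2s1
#   system host-if-list 1 --nowrap | if_class enp2s1    # expect: platform
```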
Unfortunately we still need to configure a lot of things. Try this command again:
$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | locked | disabled | online |
+----+--------------+-------------+----------------+-------------+--------------+
Notice locked and disabled. It means that our controller is not yet able to provide services.
Many Kubernetes applications require persistent storage. The related terms are:
- PV - Persistent Volume - configured by the administrator
- PVC - Persistent Volume Claim - k8s applications request persistent storage using these claims
So we must configure Ceph:
- https://docs.starlingx.io/deploy_install_guides/r6_release/virtual/aio_simplex_install_kubernetes.html#configure-controller-0
- chapter Optionally, initialize a Ceph-based Persistent Storage Backend
There are two options:
- Host-based Ceph
- K8s-based Ceph (Rook)
I always prefer Host over Containers, so let's try it:
$ system storage-backend-list
Empty output...
$ system storage-backend-add ceph --confirmed
System configuration has changed.
Please follow the administrator guide to complete configuring the system.
+--------------------------------------+------------+---------+------------+------+----------+----------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+------------+---------+------------+------+----------+----------------+
| 960f7afc-c309-47d6-bc0f-78afe530c5b1 | ceph-store | ceph | configured | None | None | min_replicatio |
| | | | | | | n: 1 |
| | | | | | | replication: 1 |
| | | | | | | |
+--------------------------------------+------------+---------+------------+------+----------+----------------+
$ system host-disk-list 1
+--------------------------------------+-------------+------------+-------------+----------+---------------+-...
| uuid | device_node | device_num | device_type | size_gib | available_gib | ...
+--------------------------------------+-------------+------------+-------------+----------+---------------+-...
| a205e8e0-c5aa-41f8-92bb-daec6381fab1 | /dev/sda | 2048 | HDD | 600.0 | 371.679 | ...
| 4880a1d2-e629-4513-baf2-399dfb064410 | /dev/sdb | 2064 | HDD | 200.0 | 199.997 | ...
| 1fdeb451-6d6f-4950-9d99-d8215c10ed47 | /dev/sdc | 2080 | HDD | 200.0 | 199.997 | ...
+--------------------------------------+-------------+------------+-------------+----------+---------------+ ...
# note UUID for /dev/sdb:
# 4880a1d2-e629-4513-baf2-399dfb064410
$ system host-stor-add 1 4880a1d2-e629-4513-baf2-399dfb064410
$ system host-stor-list 1
+--------------------------------------+----------+-------+-----------------------+-...
| uuid | function | osdid | state | ...
+--------------------------------------+----------+-------+-----------------------+-...
| 292ae029-f652-4e6c-b046-18daacd80a76 | osd | 0 | configuring-on-unlock | ...
+--------------------------------------+----------+-------+-----------------------+-...
Notice configuring-on-unlock in the state column.
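Instead of copying the disk UUID by hand, it can be looked up by its device node. A sketch, assuming the host-disk-list column layout above and that host-stor-add accepts the hostname as well as the host id 1 used above; add_osd and disk_uuid are hypothetical names.

```shell
#!/bin/bash
# Sketch: add a Ceph OSD by device node instead of by copied UUID.

# disk_uuid DEVICE: read `system host-disk-list` output on stdin and print
# the uuid column of the row whose device_node equals DEVICE.
disk_uuid() {
    awk -F'|' -v dev="$1" '{
        d = $3; gsub(/ /, "", d)
        if (d == dev) { u = $2; gsub(/ /, "", u); print u }
    }'
}

# add_osd HOST DEVICE: look up DEVICE and pass its UUID to host-stor-add.
add_osd() {
    local host=$1 dev=$2 uuid
    uuid=$(system host-disk-list "$host" --nowrap | disk_uuid "$dev")
    if [ -z "$uuid" ]; then
        echo "no disk $dev on $host" >&2
        return 1
    fi
    system host-stor-add "$host" "$uuid"
}

# Usage:
#   add_osd controller-0 /dev/sdb
```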
We want to use OpenStack, so we have to follow the "For OpenStack only:" section in the guide.
Create the script setup_os.sh with these contents:
#!/bin/bash
set -xeuo pipefail
DATA0IF=eth1000
DATA1IF=eth1001
export NODE=controller-0
PHYSNET0='physnet0'
PHYSNET1='physnet1'
SPL=/tmp/tmp-system-port-list
SPIL=/tmp/tmp-system-host-if-list
system host-port-list ${NODE} --nowrap > ${SPL}
system host-if-list -a ${NODE} --nowrap > ${SPIL}
DATA0PCIADDR=$(cat $SPL | grep $DATA0IF |awk '{print $8}')
DATA1PCIADDR=$(cat $SPL | grep $DATA1IF |awk '{print $8}')
DATA0PORTUUID=$(cat $SPL | grep ${DATA0PCIADDR} | awk '{print $2}')
DATA1PORTUUID=$(cat $SPL | grep ${DATA1PCIADDR} | awk '{print $2}')
DATA0PORTNAME=$(cat $SPL | grep ${DATA0PCIADDR} | awk '{print $4}')
DATA1PORTNAME=$(cat $SPL | grep ${DATA1PCIADDR} | awk '{print $4}')
DATA0IFUUID=$(cat $SPIL | awk -v DATA0PORTNAME=$DATA0PORTNAME '($12 ~ DATA0PORTNAME) {print $2}')
DATA1IFUUID=$(cat $SPIL | awk -v DATA1PORTNAME=$DATA1PORTNAME '($12 ~ DATA1PORTNAME) {print $2}')
system datanetwork-add ${PHYSNET0} vlan
system datanetwork-add ${PHYSNET1} vlan
system host-if-modify -m 1500 -n data0 -c data ${NODE} ${DATA0IFUUID}
system host-if-modify -m 1500 -n data1 -c data ${NODE} ${DATA1IFUUID}
system interface-datanetwork-assign ${NODE} ${DATA0IFUUID} ${PHYSNET0}
system interface-datanetwork-assign ${NODE} ${DATA1IFUUID} ${PHYSNET1}
exit 0
And run it.
Now we have to follow the OpenStack-specific host configuration:
system host-label-assign controller-0 openstack-control-plane=enabled
system host-label-assign controller-0 openstack-compute-node=enabled
system host-label-assign controller-0 openvswitch=enabled
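The three label assignments can also be run as a loop that stops at the first error. A sketch; assign_openstack_labels is a hypothetical helper and the labels are exactly the ones above.

```shell
#!/bin/bash
# Sketch: assign the OpenStack labels to a node, stopping on failure.
assign_openstack_labels() {
    local node=${1:-controller-0} label
    for label in openstack-control-plane=enabled \
                 openstack-compute-node=enabled \
                 openvswitch=enabled; do
        system host-label-assign "$node" "$label" || return 1
    done
}

# Usage:
#   assign_openstack_labels controller-0
#   system host-label-list controller-0    # should list all three labels
```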
Now we have to follow "For OpenStack Only: Set up disk partition for nova-local volume group, which is needed for stx-openstack nova ephemeral disks.":
Create the script setup_os_storage.sh:
#!/bin/bash
set -xeuo pipefail
export NODE=controller-0
echo ">>> Getting root disk info"
ROOT_DISK=$(system host-show ${NODE} | grep rootfs | awk '{print $4}')
ROOT_DISK_UUID=$(system host-disk-list ${NODE} --nowrap | grep ${ROOT_DISK} | awk '{print $2}')
echo "Root disk: $ROOT_DISK, UUID: $ROOT_DISK_UUID"
echo ">>>> Configuring nova-local"
NOVA_SIZE=34
NOVA_PARTITION=$(system host-disk-partition-add -t lvm_phys_vol ${NODE} ${ROOT_DISK_UUID} ${NOVA_SIZE})
NOVA_PARTITION_UUID=$(echo ${NOVA_PARTITION} | grep -ow "| uuid | [a-z0-9\-]* |" | awk '{print $4}')
system host-lvg-add ${NODE} nova-local
sleep 60
system host-pv-add ${NODE} nova-local ${NOVA_PARTITION_UUID}
sleep 60
exit 0
And run it.
Now the moment of truth - unlocking the controller. This time we shall see whether all components really work:
$ system host-unlock controller-0
# WARNING! Restart will follow....
NOTE: In my case network startup took around 3 minutes. I don't know why...
After the reboot the main IP of the Libvirt VM changes to 10.10.10.2 (see the WARNING in the bootstrap step), so use that address to connect to the Libvirt VM.
After reboot, login as sysadmin and verify host status:
$ source /etc/platform/openrc
$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
It must be in the states unlocked, enabled and available.
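After the unlock it takes a while until the host reaches this state, so polling can be scripted. A sketch, assuming the host-list table layout above; host_state and wait_host_available are hypothetical names, and the default of 60 tries every 30 seconds is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: wait until a host reports unlocked/enabled/available.

# host_state HOST: read `system host-list` output on stdin and print
# "administrative operational availability" for HOST.
host_state() {
    awk -F'|' -v host="$1" '{
        h = $3; gsub(/ /, "", h)
        if (h == host) {
            a = $5; o = $6; v = $7
            gsub(/ /, "", a); gsub(/ /, "", o); gsub(/ /, "", v)
            print a, o, v
        }
    }'
}

# wait_host_available HOST [TRIES] [DELAY]: poll until the state matches.
wait_host_available() {
    local host=$1 tries=${2:-60} delay=${3:-30} i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$(system host-list | host_state "$host")" = "unlocked enabled available" ]; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage (after source /etc/platform/openrc):
#   wait_host_available controller-0 && echo "controller-0 is ready"
```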
Also try:
$ system application-list | sed -r 's/(.{75}).*/\1.../'
+--------------------------+---------+-----------------------------------+-...
| application | version | manifest name | ...
+--------------------------+---------+-----------------------------------+-...
| cert-manager | 1.0-26 | cert-manager-manifest | ...
| nginx-ingress-controller | 1.1-18 | nginx-ingress-controller-manifest | ...
| oidc-auth-apps | 1.0-61 | oidc-auth-manifest | ...
| platform-integ-apps | 1.0-44 | platform-integration-manifest | ...
| rook-ceph-apps | 1.0-14 | rook-ceph-manifest | ...
+--------------------------+---------+-----------------------------------+-...
$ ceph -s
cluster:
id: d993e564-bd99-4a4d-946c-a3aa090da4f9
health: HEALTH_OK
services:
mon: 1 daemons, quorum controller-0 (age 23m)
mgr: controller-0(active, since 21m)
mds: kube-cephfs:1 {0=controller-0=up:active}
osd: 1 osds: 1 up (since 20m), 1 in (since 20m)
data:
pools: 3 pools, 192 pgs
objects: 22 objects, 2.2 KiB
usage: 107 MiB used, 199 GiB / 199 GiB avail
pgs: 192 active+clean
$ kubectl get ns
NAME STATUS AGE
armada Active 108m
cert-manager Active 102m
default Active 109m
deployment Active 102m
kube-node-lease Active 109m
kube-public Active 109m
kube-system Active 109m
At least when using host-based Ceph (my case), I observed an improper shutdown, where the kernel RBD client reported a lost write to Ceph (which was already shut down).
TODO:
Even after a lot of work we have not installed OpenStack yet(!). We have to follow this guide:
- https://docs.starlingx.io/deploy_install_guides/r6_release/openstack/install.html
- log in to the Libvirt VM as sysadmin
First we have to increase the LV for Docker from 30GB to 60GB.
- verify current assignments:
$ system host-fs-list 1 | sed -r 's/.{39}//'
+---------+-------------+----------------+
| FS Name | Size in GiB | Logical Volume |
+---------+-------------+----------------+
| backup  | 25          | backup-lv      |
| docker  | 30          | docker-lv      |
| kubelet | 10          | kubelet-lv     |
| scratch | 16          | scratch-lv     |
+---------+-------------+----------------+
- theoretically it should be easy:
$ system host-fs-modify controller-0 docker=60
HostFs update failed: Not enough free space on cgts-vg. Current free space 16 GiB, requested total increase 30 GiB
- it can be confirmed with this command:
sudo vgs
VG         #PV #LV #SN Attr   VSize   VFree
cgts-vg      1  12   0 wz--n- <178.97g <16.16g
nova-local   1   1   0 wz--n- <34.00g      0
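The error message is simple arithmetic: growing docker from 30 to 60 GiB needs 30 GiB, but cgts-vg has only about 16 GiB free. A sketch of that check before calling host-fs-modify, assuming the standard LVM vgs reporting options; fits_in_vg and vg_free_gib are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: check whether a filesystem resize fits into the free VG space.

# fits_in_vg FREE_GIB CURRENT_GIB TARGET_GIB: succeed if growing from
# CURRENT to TARGET needs no more than FREE (decimal values allowed).
fits_in_vg() {
    awk -v free="$1" -v cur="$2" -v tgt="$3" 'BEGIN { exit !(tgt - cur <= free) }'
}

# vg_free_gib VG: free space of VG in GiB as a bare number.
vg_free_gib() {
    sudo vgs --noheadings --nosuffix --units g -o vg_free "$1" | tr -d ' <'
}

# Usage:
#   fits_in_vg "$(vg_free_gib cgts-vg)" 30 60 || echo "extend cgts-vg first"
```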
Now we clone the script that created the VG nova-local and reuse it to create a new disk partition and add it to the VG cgts-vg.
Create the script resize_os_docker.sh with these contents:
#!/bin/bash
set -xeuo pipefail
export NODE=controller-0
echo ">>> Getting root disk info"
ROOT_DISK=$(system host-show ${NODE} | grep rootfs | awk '{print $4}')
ROOT_DISK_UUID=$(system host-disk-list ${NODE} --nowrap | grep ${ROOT_DISK} | awk '{print $2}')
echo "Root disk: $ROOT_DISK, UUID: $ROOT_DISK_UUID"
echo ">>>> Extending VG cgts-vg +50GB"
NOVA_SIZE=50
NOVA_PARTITION=$(system host-disk-partition-add -t lvm_phys_vol ${NODE} ${ROOT_DISK_UUID} ${NOVA_SIZE})
NOVA_PARTITION_UUID=$(echo ${NOVA_PARTITION} | grep -ow "| uuid | [a-z0-9\-]* |" | awk '{print $4}')
sleep 60 # it takes time before PV is created
system host-pv-add ${NODE} cgts-vg ${NOVA_PARTITION_UUID}
sleep 60 # it takes time before PV is added to VG !!!
exit 0
And run it. Unfortunately there are a few timing races, so sometimes you need to redo the failed steps manually. If the above script was successful, you can verify it with:
sudo pvs
PV VG Fmt Attr PSize PFree
/dev/sda5 cgts-vg lvm2 a-- <178.97g <16.16g
/dev/sda6 nova-local lvm2 a-- <34.00g 0
/dev/sda8 cgts-vg lvm2 a-- <49.97g <49.97g # << NEW PV
NOTE: You will likely have /dev/sda7 as the partition (I made some experiments before running it).
And finally, after a while, the VG should see the new free space:
sudo vgs
VG #PV #LV #SN Attr VSize VFree
cgts-vg 2 12 0 wz--n- <228.94g 66.12g
nova-local 1 1 0 wz--n- <34.00g 0
Notice VFree of 66GB - that should be enough for Docker.
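The fixed sleep 60 calls in resize_os_docker.sh are where the timing races come from. A generic retry helper is one way to make such steps more robust; this is a sketch, assuming that a failed attempt (e.g. host-pv-add on a partition that is not ready yet) can simply be repeated. retry is my own helper, not a StarlingX command.

```shell
#!/bin/bash
# Sketch: retry a command a limited number of times.
# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts; returns CMD's last status.
retry() {
    local tries=$1 delay=$2 n=1 rc
    shift 2
    while true; do
        "$@" && return 0
        rc=$?
        if [ "$n" -ge "$tries" ]; then
            return "$rc"
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Possible use in resize_os_docker.sh instead of the first sleep 60:
#   retry 10 15 system host-pv-add ${NODE} cgts-vg ${NOVA_PARTITION_UUID}
```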
Now you can finally resume the guide on
- https://docs.starlingx.io/deploy_install_guides/r6_release/openstack/install.html
- and run:
$ system host-fs-modify controller-0 docker=60
...+---------+-------------+----------------+
...| FS Name | Size in GiB | Logical Volume |
...+---------+-------------+----------------+
...| backup  | 25          | backup-lv      |
...| docker  | 60          | docker-lv      |
...| kubelet | 10          | kubelet-lv     |
...| scratch | 16          | scratch-lv     |
...+---------+-------------+----------------+
Now we have to find the latest OpenStack application:
- go to http://mirror.starlingx.cengn.ca/mirror/starlingx/release/6.0.0/centos/flock/outputs/helm-charts/
- and download a suitable package to your Libvirt VM (as sysadmin), for example:
[sysadmin@controller-0 ~(keystone_admin)]$ cd
[sysadmin@controller-0 ~(keystone_admin)]$ curl -fLO http://mirror.starlingx.cengn.ca/mirror/starlingx/release/6.0.0/centos/flock/outputs/helm-charts/stx-openstack-1.0-140-centos-stable-latest.tgz
[sysadmin@controller-0 ~(keystone_admin)]$ ls -l stx-openstack-1.0-140-centos-stable-latest.tgz
-rw-r--r-- 1 sysadmin sys_protected 1804273 Jul 16 15:43 stx-openstack-1.0-140-centos-stable-latest.tgz
- if you are brave, upload the application:
$ system application-upload stx-openstack-1.0-140-centos-stable-latest.tgz
# now poll using the command
$ system application-show stx-openstack
# it must report progress: completed, status: uploaded
- and install it:
system application-apply stx-openstack
- again poll with:
watch -n 60 system application-show stx-openstack
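Instead of watching the output by eye, the status field can be polled until it reaches the expected value. A sketch, assuming the Property/Value table of application-show and that failure states end in -failed (such as apply-failed); app_status and wait_app_status are hypothetical names.

```shell
#!/bin/bash
# Sketch: poll a StarlingX application until it reaches a wanted status.

# app_status APP: print the "status" row of `system application-show APP`.
app_status() {
    system application-show "$1" | awk -F'|' '{
        k = $2; gsub(/ /, "", k)
        if (k == "status") { v = $3; gsub(/ /, "", v); print v }
    }'
}

# wait_app_status APP WANTED [TRIES] [DELAY]: poll until status is WANTED;
# give up early on a status ending in "-failed".
wait_app_status() {
    local app=$1 want=$2 tries=${3:-60} delay=${4:-60} i=0 st
    while [ "$i" -lt "$tries" ]; do
        st=$(app_status "$app")
        echo "$(date +%T) $app: $st"
        if [ "$st" = "$want" ]; then
            return 0
        fi
        case $st in
            *-failed) return 1 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage:
#   system application-upload stx-openstack-1.0-140-centos-stable-latest.tgz
#   wait_app_status stx-openstack uploaded
#   system application-apply stx-openstack
#   wait_app_status stx-openstack applied
```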
If you were lucky you can access OpenStack by following:
WARNING!
When I rebooted the machine with OpenStack installed, I saw these grave messages on the console:
rbd: lost write
It means that there were active RBD (RADOS Block Device) devices - networked disk devices backed by Ceph.
I suspect that systemd terminates services in the wrong order, thus risking data loss for any containers using PVs from Ceph (!!!)
Kernel messages:
EXT4-fs error (device rbd0): __ext4_find_entry:1536: inode #2: comm start.py: reading directory lblock 0
libceph: connect (1)192.168.204.2:6789 error -101
libceph: connect (1)192.168.204.2:6789 error -101
To access OpenStack at all, we have to create a different context file. Following the docs, run:
sed '/export OS_AUTH_URL/c\export OS_AUTH_URL=http://keystone.openstack.svc.cluster.local/v3' /etc/platform/openrc > ~/openrc.os
Now you have to remember:
- to access OpenStack run:
source ~/openrc.os
- to access StarlingX (the system command, etc.) run:
source /etc/platform/openrc
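The two contexts can be wrapped in shell functions, for example in ~/.bashrc. A sketch: make_os_openrc does the same OS_AUTH_URL replacement as the sed command above, only with an s command instead of c; use_os and use_stx are hypothetical names.

```shell
#!/bin/bash
# Sketch: generate the OpenStack context file and switch between contexts.

# make_os_openrc SRC DST: copy SRC (normally /etc/platform/openrc) to DST
# with the OS_AUTH_URL line pointing at the in-cluster Keystone endpoint.
make_os_openrc() {
    sed 's|.*export OS_AUTH_URL.*|export OS_AUTH_URL=http://keystone.openstack.svc.cluster.local/v3|' "$1" > "$2"
}

# Switch the current shell to the StarlingX or the OpenStack context.
use_stx() { . /etc/platform/openrc; }
use_os() { . ~/openrc.os; }

# Usage:
#   make_os_openrc /etc/platform/openrc ~/openrc.os   # once
#   use_os; openstack flavor list
#   use_stx; system host-list
```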
These commands should work:
source ~/openrc.os
openstack flavor list
openstack image list
WARNING! In my case OpenStack is excessively hungry:
$ uptime
07:36:40 up 46 min, 2 users, load average: 18.09, 24.35, 19.93
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
15 0 0 3326296 242220 4103136 0 0 332 158 2115 1180 42 15 40 4 0
36 1 0 3295668 242304 4103532 0 0 72 668 11102 18076 52 12 32 4 0
34 1 0 3295984 242372 4103664 0 0 0 1320 9557 14930 54 8 36 2 0
46 0 0 3274756 242496 4103976 0 0 96 1032 13529 22478 58 13 28 0 0
You can see that the 1st column (r - the number of runnable processes, i.e. processes that need a CPU, most of them waiting for it) is sometimes up to 50...
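To put a number on it, the peak run queue can be extracted from vmstat and compared with the CPU count from nproc. A sketch, assuming the default vmstat output with two header lines; max_runq is a hypothetical helper. For the four samples above it reports 46.

```shell
#!/bin/bash
# Sketch: report the largest run queue ("r" column) seen by vmstat.
# max_runq: read vmstat output on stdin, skip the two header lines.
max_runq() {
    awk 'NR > 2 && $1 > max { max = $1 } END { print max + 0 }'
}

# Usage: sample 10 seconds and compare with the number of CPUs.
#   echo "max run queue: $(vmstat 1 10 | max_runq), CPUs: $(nproc)"
```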
It is also confirmed by alarms:
source /etc/platform/openrc
fm alarm-list
# if you want to see details run
fm alarm-list --uuid
# and pass uuid to: fm alarm-show
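The list-then-show dance can be scripted. A sketch, assuming that fm alarm-list --uuid prints the UUID as the first column and that each UUID fits on one line (a narrow terminal may wrap the table); alarm_uuids and show_all_alarms are hypothetical names.

```shell
#!/bin/bash
# Sketch: show details of every active alarm.

# alarm_uuids: read `fm alarm-list --uuid` output on stdin and print the
# UUID found at the start of each table row.
alarm_uuids() {
    sed -nE 's/^[| ]*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}).*/\1/p'
}

# show_all_alarms: run `fm alarm-show` for each listed UUID.
show_all_alarms() {
    local id
    fm alarm-list --uuid | alarm_uuids | while read -r id; do
        fm alarm-show "$id"
    done
}

# Usage (after source /etc/platform/openrc):
#   show_all_alarms | less
```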
So I'm not sure how this could be used for low-latency edge computing...
Another problem: after a reboot I had no luck:
openstack flavor list
Failed to discover available identity versions when contacting http://keystone.openstack.svc.cluster.local/v3. Attempting to parse version from URL.
Service Unavailable (HTTP 503)
To see all OpenStack components in Kubernetes we can use this command:
kubectl get pod -n openstack
There are 111 pods in my case(!).
To quickly find problematic pods we can try:
$ kubectl get pod -n openstack | fgrep Running | fgrep '0/'
mariadb-ingress-847cdb5dfb-4zgd6 0/1 Running 1 15h
mariadb-server-0 0/1 Running 2 15h
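fgrep '0/' only catches pods with zero ready containers; a pod showing 1/2 is just as unhealthy. A sketch of a stricter filter that compares both numbers of the READY column, assuming the default kubectl get pod columns (NAME READY STATUS ...) and skipping Completed job pods; not_ready_pods is a hypothetical name.

```shell
#!/bin/bash
# Sketch: list pods whose READY column is x/y with x < y.
# not_ready_pods: read `kubectl get pod` output on stdin (header skipped).
not_ready_pods() {
    awk 'NR > 1 {
        split($2, r, "/")
        if ($3 != "Completed" && r[1] + 0 < r[2] + 0) print $1, $2, $3
    }'
}

# Usage:
#   kubectl get pod -n openstack | not_ready_pods
```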
Running with 0 out of 1 containers ready is definitely a problem... Trying:
kubectl logs mariadb-server-0 -n openstack
# nothing suspicious
But:
kubectl describe pod/mariadb-server-0 -n openstack
# Hmm
Normal Killing 11m kubelet Container mariadb failed startup probe, will be restarted
Warning Unhealthy 8m46s (x12 over 20m) kubelet Startup probe failed:
To get terminal in container:
kubectl exec -it mariadb-server-0 -n openstack -- sh
# use exit to exit container
Also:
source /etc/platform/openrc
fm alarm-list
+-------+----------------------------+---------------+----------+-------------+
| Alarm | Reason Text | Entity ID | Severity | Time Stamp |
| ID | | | | |
+-------+----------------------------+---------------+----------+-------------+
| 270. | Host controller-0 compute | host= | critical | 2022-07-17T |
| 001 | services failure | controller-0. | | 07:26:01. |
| | | services= | | 637858 |
| | | compute | | |
| | | | | |
...
However, you need fm alarm-list --uuid to get the UUID for fm alarm-show (ooohhhh).
NOTE: After around 30 minutes this error resolved itself... So I can now see the flavors:
$ openstack flavor list | sed -r 's/.{39}/.../;s/.{12}$/.../'
...+-----------+-------+------+-----------+-------+...
...| Name | RAM | Disk | Ephemeral | VCPUs |...
...+-----------+-------+------+-----------+-------+...
...| m1.small | 2048 | 20 | 0 | 1 |...
...| m1.large | 8192 | 80 | 0 | 4 |...
...| m1.medium | 4096 | 40 | 0 | 2 |...
...| m1.xlarge | 16384 | 160 | 0 | 8 |...
...| m1.tiny | 512 | 1 | 0 | 1 |...
...+-----------+-------+------+-----------+-------+...
Hmm:
openstack image list
# no image...
Fortunately I have already written a guide on OpenStack-from-Scratch. So let's try:
source ~/openrc.os
cd
curl -OLf http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img
openstack image create --public --container-format bare \
--disk-format qcow2 --file cirros-0.5.1-x86_64-disk.img cirros
openstack image list
+--------------------------------------+--------+--------+
| ID | Name | Status |
+--------------------------------------+--------+--------+
| 22958ef4-05f1-4a9f-b1e9-1ca9eb1f5ebf | cirros | active |
+--------------------------------------+--------+--------+
Now we can follow another guide of mine: OpenStack AIO in Azure. We need some network:
openstack network list
# Hmm, empty output...
We have to follow (sort of) the guide from:
First we must know the data network type, using this command:
$ system datanetwork-list
+--------------------------------------+----------+--------------+------+
| uuid | name | network_type | mtu |
+--------------------------------------+----------+--------------+------+
| a4b7cc2e-68fe-47ad-b0ff-67dd147d85b0 | physnet0 | vlan | 1500 |
| 2bba400c-4a2d-4111-8247-4db251b6ad31 | physnet1 | vlan | 1500 |
+--------------------------------------+----------+--------------+------+
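The network type can also be checked from a script before choosing a snippet. Below is a minimal sketch of a hypothetical `datanet_type` helper that reads the table above on stdin. It assumes the column order shown (uuid, name, network_type, mtu).

```bash
# Hypothetical helper: print network_type of the named data network
# from `system datanetwork-list` output on stdin (columns: uuid|name|network_type|mtu).
datanet_type() {
  awk -F'|' -v want="$1" 'NF > 4 {
    name = $3; type = $4
    gsub(/[ \t]+/, "", name); gsub(/[ \t]+/, "", type)
    if (name == want) print type
  }'
}

# Example guard before running the VLAN snippet:
#   [ "$(system datanetwork-list | datanet_type physnet0)" = vlan ] || echo "physnet0 is not VLAN" >&2
```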
So we know which snippet from the above wiki to use - this one:
Create the script ~/setup_os_net.sh with the following contents:
#!/bin/bash
# Create two VLAN provider networks (one per data network) for project admin.
set -euo pipefail
set -x
# Look up the admin project ID; project IDs are 32 lowercase hex characters.
ADMINID=$(openstack project show -f value -c id admin)
[[ $ADMINID =~ ^[a-f0-9]{32}$ ]] || {
  echo "Unable to get ID for project 'admin'" >&2
  exit 1
}
PHYSNET0='physnet0'
PHYSNET1='physnet1'
PUBLICNET0='public-net0'
PUBLICNET1='public-net1'
PUBLICSUBNET0='public-subnet0'
PUBLICSUBNET1='public-subnet1'
# Reserve VLAN IDs 400-499 on physnet0 and 500-599 on physnet1 for admin.
openstack network segment range create "${PHYSNET0}-a" \
  --network-type vlan --physical-network "${PHYSNET0}" \
  --minimum 400 --maximum 499 --private --project "${ADMINID}"
openstack network segment range create "${PHYSNET1}-a" \
  --network-type vlan --physical-network "${PHYSNET1}" \
  --minimum 500 --maximum 599 --private --project "${ADMINID}"
# One network per physnet, each using the first VLAN ID of its range.
openstack network create --project "${ADMINID}" \
  --provider-network-type=vlan --provider-physical-network="${PHYSNET0}" \
  --provider-segment=400 "${PUBLICNET0}"
openstack network create --project "${ADMINID}" \
  --provider-network-type=vlan --provider-physical-network="${PHYSNET1}" \
  --provider-segment=500 "${PUBLICNET1}"
# One IPv4 subnet per network.
openstack subnet create --project "${ADMINID}" "${PUBLICSUBNET0}" \
  --network "${PUBLICNET0}" --subnet-range 192.168.101.0/24
openstack subnet create --project "${ADMINID}" "${PUBLICSUBNET1}" \
  --network "${PUBLICNET1}" --subnet-range 192.168.102.0/24
exit 0
And run it this way:
source ~/openrc.os
chmod +x ~/setup_os_net.sh
~/setup_os_net.sh
Now list available networks using:
$ openstack network list
+--------------------------------------+-------------+--------------------------------------+
| ID | Name | Subnets |
+--------------------------------------+-------------+--------------------------------------+
| c9160a9d-4f4f-424f-b205-f17e2fbfadc6 | public-net0 | 46a911af-5a12-428e-82bc-10b26a344a81 |
| fa016989-93ac-41f6-9d70-dd3c690a433f | public-net1 | 6f8eeffd-78c4-4501-977e-7b8d23a96521 |
+--------------------------------------+-------------+--------------------------------------+
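The network ID passed to --nic below does not have to be copied from this table by hand. A minimal sketch of a hypothetical `net_id` helper, using the same `-f value -c id` output options that setup_os_net.sh uses for the admin project:

```bash
# Hypothetical helper: print the ID of a network given its name.
net_id() {
  local id
  id=$(openstack network show -f value -c id "$1") || return 1
  # Network IDs are UUIDs; reject anything else (e.g. an error message).
  [[ $id =~ ^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$ ]] || return 1
  printf '%s\n' "$id"
}

# Example:
#   openstack server create --flavor m1.tiny --image cirros \
#     --nic net-id="$(net_id public-net0)" test-cirros
```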
And finally, try to run a VM:
openstack server create --flavor m1.tiny --image cirros \
--nic net-id=c9160a9d-4f4f-424f-b205-f17e2fbfadc6 \
test-cirros
Hmm, nearly done:
Unknown Error (HTTP 504)
But it seems that it just needs a bit of time:
$ openstack server list
+--------------------------------------+-------------+--------+----------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------------+--------+----------+--------+---------+
| 5490d592-f594-496c-9573-6b0922041f29 | test-cirros | BUILD | | cirros | m1.tiny |
+--------------------------------------+-------------+--------+----------+--------+---------+
A few minutes later:
$ openstack server list
+--------------------------------------+-------------+--------+----------------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------------+--------+----------------------------+--------+---------+
| 5490d592-f594-496c-9573-6b0922041f29 | test-cirros | ACTIVE | public-net0=192.168.101.70 | cirros | m1.tiny |
+--------------------------------------+-------------+--------+----------------------------+--------+---------+
# ACTIVE - umm in my case the system was critically overloaded...
# if you are lucky
openstack console log show test-cirros
openstack console url show test-cirros
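Because the create call may return HTTP 504 while the server keeps building, polling its status can replace repeated manual `openstack server list` calls. A minimal sketch of a hypothetical `wait_for_server` helper; the 600 s timeout and 10 s interval are arbitrary defaults, not values from StarlingX.

```bash
# Hypothetical helper: poll until a server is ACTIVE (return 0),
# ERROR or timed out (return 1). Args: NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_server() {
  local name=$1 timeout=${2:-600} interval=${3:-10} waited=0 status=UNKNOWN
  while [ "$waited" -lt "$timeout" ]; do
    # A transient API error (like the HTTP 504 above) just means "try again".
    status=$(openstack server show -f value -c status "$name" 2>/dev/null) || status=UNKNOWN
    case $status in
      ACTIVE) echo "$name is ACTIVE"; return 0 ;;
      ERROR)  echo "$name is in ERROR state" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $name (last status: $status)" >&2
  return 1
}

# Example: wait_for_server test-cirros && openstack console log show test-cirros
```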
WARNING! That VLAN network is not very useful (no DHCP etc.). Not sure if FLAT is still supported; it was in older docs:
To shut down:
- log in as sysadmin to the Libvirt VM and run sudo init 0
- now Stop the Azure VM from the portal. Remember that if you just shut down the Azure VM internally (using sudo init 0), it will NOT stop billing!
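Instead of the portal, the Azure VM can also be stopped (deallocated) and started with the Azure CLI from any machine where `az login` was done. Deallocation stops compute charges; managed disks are still billed. A minimal sketch with hypothetical helper names; the resource group is a placeholder you must supply, and openstack-stx is the VM name used in this guide.

```bash
# Hypothetical helpers around the Azure CLI. Args: RESOURCE_GROUP [VM_NAME]
stop_azure_vm() {
  if [ -z "${1:-}" ]; then
    echo "usage: stop_azure_vm RESOURCE_GROUP [VM_NAME]" >&2
    return 1
  fi
  # Deallocate (not just power off) so that compute billing stops.
  az vm deallocate --resource-group "$1" --name "${2:-openstack-stx}"
}

start_azure_vm() {
  if [ -z "${1:-}" ]; then
    echo "usage: start_azure_vm RESOURCE_GROUP [VM_NAME]" >&2
    return 1
  fi
  az vm start --resource-group "$1" --name "${2:-openstack-stx}"
}
```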
Power Up:
- start the Azure VM openstack-stx
- log in to the Azure VM
- run
  cd ~/stx-tools/deployment/libvirt
  ./setup_network.sh
- if your user is a member of the libvirt group, you can omit sudo in the two commands below
- run the VM
  sudo virsh start simplex-controller-0
- in my case I got an error:
  error: Failed to start domain simplex-controller-0
  error: Cannot access storage file '/mnt/bootimage.iso': No such file or directory
- in such a case, find the offending device and eject it:
  $ virsh domblklist simplex-controller-0
   Target   Source
  --------------------------------------------------------------
   sda      /var/lib/libvirt/images/simplex-controller-0-0.img
   sdb      /var/lib/libvirt/images/simplex-controller-0-1.img
   sdc      /var/lib/libvirt/images/simplex-controller-0-2.img
   sdd      /mnt/bootimage.iso
  $ virsh change-media simplex-controller-0 /mnt/bootimage.iso --eject
  Successfully ejected media.
  $ virsh domblklist simplex-controller-0
   Target   Source
  --------------------------------------------------------------
   sda      /var/lib/libvirt/images/simplex-controller-0-0.img
   sdb      /var/lib/libvirt/images/simplex-controller-0-1.img
   sdc      /var/lib/libvirt/images/simplex-controller-0-2.img
   sdd      -
  Problem fixed; start this VM again.
- run the console to see progress
  sudo virsh console simplex-controller-0
- once simplex-controller-0 shows a login prompt, you can log in on the serial console or via ssh [email protected] (I prefer this, because the serial console has some quirks)
- verify that all k8s applications are X/X Running or Completed:
  kubectl get pods -A
- important: wait until the Ceph cluster is in HEALTH_OK state using: ceph -s
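The last two checks can be combined into one wait loop. A minimal sketch with hypothetical helper names: it assumes the default `kubectl get pods -A --no-headers` column order (NAMESPACE NAME READY STATUS ...) and uses `ceph health`, whose output starts with HEALTH_OK on a healthy cluster. The 30 s sleep and 60 tries are arbitrary.

```bash
# Hypothetical helper: print pods that are neither Completed nor fully Running,
# reading `kubectl get pods -A --no-headers` output on stdin.
pods_not_ready() {
  awk '{
    split($3, r, "/")                          # READY column, e.g. "1/1"
    if ($4 == "Completed") next
    if ($4 == "Running" && r[1] == r[2]) next
    print $1 "/" $2 " " $3 " " $4
  }'
}

# Hypothetical helper: wait until all pods are ready and Ceph reports HEALTH_OK.
wait_platform_ready() {
  local tries=${1:-60} out
  while [ "$tries" -gt 0 ]; do
    out=$(kubectl get pods -A --no-headers 2>/dev/null) || out=""
    # Empty output (e.g. API server still down) does not count as ready.
    if [ -n "$out" ] && [ -z "$(printf '%s\n' "$out" | pods_not_ready)" ] &&
       ceph health 2>/dev/null | grep -q '^HEALTH_OK'; then
      echo "all pods ready, Ceph HEALTH_OK"
      return 0
    fi
    tries=$((tries - 1))
    sleep 30
  done
  echo "platform not ready yet" >&2
  return 1
}
```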
There is an official guide on how to build StarlingX yourself:
- https://docs.starlingx.io/developer_resources/build_guide.html
I have not tried it yet (at least 32GB RAM and 500GB disk are required), but there are a few noteworthy things:
- https://opendev.org/starlingx/manifest/src/branch/master/default.xml - list of git repositories
  - processed by Android's repo tool (see below)
- https://opendev.org/starlingx/tools (you already know this project)
  The interesting part is its Dockerfile:
  - https://opendev.org/starlingx/tools/src/branch/master/Dockerfile You can see there:
    - how hard it is today to pin to exact CentOS 7 and EPEL 7 versions.
    - how to get the Android repo tool (to fetch repos from /manifest/default.xml):
      curl https://storage.googleapis.com/git-repo-downloads/repo > /usr/local/bin/repo && \
      chmod a+x /usr/local/bin/repo
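To see which repositories the repo tool would fetch, the project names can be pulled out of a downloaded default.xml. A minimal sketch with a hypothetical `manifest_projects` helper; it only assumes the standard repo manifest format, where each <project> element carries a name="..." attribute (possibly spread over several lines).

```bash
# Hypothetical helper: list project names from a repo manifest file.
manifest_projects() {
  # Join lines so multi-line <project ...> elements become one match,
  # then extract the value of the name="..." attribute of each element.
  tr '\n' ' ' < "$1" |
    grep -o '<project[[:space:]][^>]*>' |
    sed -nE 's/.*[[:space:]]name="([^"]*)".*/\1/p'
}

# Example:
#   curl -Lf -o default.xml https://opendev.org/starlingx/manifest/raw/branch/master/default.xml
#   manifest_projects default.xml
```

The raw-file URL in the example follows the usual Gitea layout of opendev.org and is an assumption; downloading the file from the web page above works just as well.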
- STX - StarlingX
- DC - Distributed Cloud
- WR - WindRiver (key contributor to the StarlingX project)
- CGCS - original project name Wind River Carrier Grade Communications Server, renamed to StarlingX, see https://opendev.org/starlingx/config/commit/d9f2aea0fb228ed69eb9c9262e29041eedabc15d You can see cgcs-X as a prefix in various RPM packages
- CGTS - original project name Wind River Carrier Grade Telecom (or Titanium???) Server (???), renamed to StarlingX. You can see cgts-X as a prefix in various RPM packages
- TIS - original project name Wind River Titanium Server platform, renamed to StarlingX, see https://opendev.org/starlingx/config/commit/d9f2aea0fb228ed69eb9c9262e29041eedabc15d You can still see tis as the Vendor tag in StarlingX RPM packages.
- sysinv - System Inventory, accessed with the famous system command in StarlingX
An alternative way to quickly install StarlingX in a nested VM, by Erich Cordoba (I have not tried it yet):
- https://ericho.github.io/2019-09-12-deploy-virtual-starlingx/
- https://opendev.org/starlingx/test/src/branch/master/automated-robot-suite/README.rst
Hard-to-find StarlingX guides (for old StarlingX, but still useful):
- https://wiki.openstack.org/wiki/StarlingX
- https://kontron.com/products/cg2400/starlingx-cg2400-installation-guide-r1.0.pdf
- Important network setup examples
- How to setup cirros image and basic network:
- Training materials
Azure VMs that support nested virtualization: