Hypershift Agent CI on PowerVS for RH upstream - hypershift-on-power/hack GitHub Wiki
Setting up a test workflow in Red Hat upstream CI for an agent-based hosted cluster on Power.
Introduction:
The requirement is to add a conformance test job for an agent-based hosted cluster with Power worker nodes in Red Hat's Prow CI. Since Red Hat's CI framework can access machines only on a public network, we decided to use IBM Cloud's PowerVS service to host the worker nodes for the agent cluster. This document covers the different aspects of setting up this workflow in Red Hat's CI framework.
Setup infra:
Setting up the infra means finding a way to boot the VSIs in the IBM Cloud PowerVS service from the discovery ISO generated by the agent cluster. Since PowerVS has no option to boot directly from an ISO, we need to use network-based boot, with the required services hosted on a bastion node. Worker nodes are created in the same network where the bastion is configured; they network-boot from the bastion using the discovery ISO and then join the agent cluster.
Detailed instructions on how to configure PXE boot on a bastion are in this doc: https://github.com/hypershift-on-power/hack/wiki/Configure-PXE-boot-for-Hypershift-Agent-Power-CI
Configure CI:
Configuring CI involves creating a workflow, creating a periodic Prow job, and invoking the workflow from that Prow job.
Create workflow:
Creating the workflow is the most essential step in this process. To learn more about the CI concepts, please read this.
We have created a workflow called hypershift-mce-power-conformance, which consists of various steps.
The idea is to create a management (hub) cluster using the existing AWS IPI workflow and to host the agent cluster on top of it.
Below is the content of the workflow, followed by a simple explanation of each step.
pre:
- ref: ingress-aws-nlb-manifest
- chain: ipi-aws-pre
- ref: hypershift-mce-install
- ref: hypershift-mce-power-create
test:
- chain: hypershift-mce-power-test
post:
- chain: hypershift-mce-power-dump
- ref: hypershift-mce-power-destroy
- chain: ipi-aws-post
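In the openshift/release step registry, a workflow file usually wraps these three phases under `workflow.steps`. The following is a minimal sketch of how the complete file might look, assuming the standard registry layout; the `documentation` text is illustrative and not copied from the actual PR.

```yaml
# Sketch of a step-registry workflow file.
# The documentation string below is illustrative, not taken from the PR.
workflow:
  as: hypershift-mce-power-conformance
  documentation: |-
    Runs conformance tests on an agent-based hosted cluster
    with Power worker nodes hosted on IBM Cloud PowerVS.
  steps:
    pre:
    - ref: ingress-aws-nlb-manifest
    - chain: ipi-aws-pre
    - ref: hypershift-mce-install
    - ref: hypershift-mce-power-create
    test:
    - chain: hypershift-mce-power-test
    post:
    - chain: hypershift-mce-power-dump
    - ref: hypershift-mce-power-destroy
    - chain: ipi-aws-post
```

A `ref` entry points to a single step, while a `chain` entry points to a named sequence of steps.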
Pre steps:
Pre steps are executed before test steps. If any of the pre steps fails, CI skips the test steps and goes straight to the post steps.
- ingress-aws-nlb-manifest - This step creates the configuration required to set up ingress with a network load balancer in the IPI management cluster. This allows downloading large files from the MCE operator; we faced issues downloading the discovery ISOs with the normal application load balancer.
- ipi-aws-pre - This step sets up an AWS IPI management cluster. We are going to host our agent cluster here.
- hypershift-mce-install - This step installs the MCE version specified in the MCE_VERSION env var.
- hypershift-mce-power-create - This step sets up the worker nodes and the agent cluster. It is explained in detail below.
Test steps:
- hypershift-mce-power-test - This step runs the actual conformance test on the agent cluster created in the pre steps. It waits for all the cluster operators to become available.
Post steps:
- hypershift-mce-power-dump - This step runs the dump command on the agent cluster as well as on the management cluster, so that we can debug if something goes wrong.
- hypershift-mce-power-destroy - This step destroys the resources created for the agent cluster, such as worker nodes and DNS entries, and cleans up the bastion node.
- ipi-aws-post - This step destroys the AWS IPI management cluster.
Important Steps:
hypershift-mce-power-create:
This is the main step in this workflow. At a high level, it executes the following steps in the order listed.
- Worker node creation on IBM Cloud PowerVS
- Agent cluster creation
- Discovery ISO generation
- Setting up PXE boot on the bastion node with the generated ISO
- Rebooting the nodes to boot them from the ISO
- Approving the nodes and scaling the node pool to add them to the cluster
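All of the operations above run inside a single registry step. As a rough sketch of how such a step might be declared, assuming the usual step-registry `ref` layout: the `from` image, the env var name `POWERVS_WORKER_COUNT`, its default, and the resource requests are hypothetical placeholders rather than values from the actual PR.

```yaml
# Sketch of a step-registry ref file for the create step.
# Everything marked "assumed" or "hypothetical" is illustrative only.
ref:
  as: hypershift-mce-power-create
  from: cli                      # assumed image
  commands: hypershift-mce-power-create-commands.sh
  env:
  - name: MCE_VERSION            # named in this document; default is assumed
    default: ""
    documentation: MCE version installed on the management cluster.
  - name: POWERVS_WORKER_COUNT   # hypothetical variable
    default: "2"
    documentation: Number of PowerVS worker nodes to create.
  resources:
    requests:
      cpu: 100m                  # assumed
      memory: 100Mi              # assumed
  documentation: |-
    Creates the PowerVS worker nodes and the agent cluster, configures
    PXE boot on the bastion, and adds the nodes to the node pool.
```

The script named in `commands` would then carry out the listed operations in order.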
hypershift-mce-power-destroy:
This step cleans up the resources created externally for the agent cluster. At a high level, it executes the following steps in the order listed.
- Worker node deletion
- Cleaning up the PXE boot configuration on the bastion node that was created during the create step
Deletion of the agent cluster in the management cluster is skipped, since the cluster does not hold any real-world objects: the nodes and DNS entries it uses are created externally.
Create prow job config:
A Prow job config needs to be created inside the config dir like this. Create a config named e2e-mce-power-conformance and refer to it in the Prow job in the next step. Use this config to target the workflow to run and to declare the Prow-job-level env vars required.
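As an illustration of the shape such a config entry might take, here is a minimal sketch of a ci-operator test entry. The cron schedule, the `cluster_profile`, and the `MCE_VERSION` value are assumptions and are not taken from the actual config.

```yaml
# Sketch of a ci-operator test entry for a periodic job.
# cron, cluster_profile and the MCE_VERSION value are assumed.
tests:
- as: e2e-mce-power-conformance
  cron: 0 6 * * *              # assumed schedule
  steps:
    cluster_profile: aws       # assumed profile for the AWS IPI management cluster
    env:
      MCE_VERSION: "2.4"       # assumed value
    workflow: hypershift-mce-power-conformance
```

The `workflow` field selects the registry workflow to run, and the `env` map is where job-level env vars are declared.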
Create prow job:
The Prow job needs to be declared in the jobs dir like this block.
Links:
Red Hat Jira - https://issues.redhat.com/browse/MULTIARCH-3502
PR for the Prow job and workflow in Red Hat CI - https://github.com/openshift/release/pull/41195
PR containing scripts to configure the bastion node - https://github.com/ppc64le-cloud/hypershift-agent-automation/pull/2