Agent-based Hosted Cluster on PowerVM using MCE with Assisted Service and HyperShift

Introduction:

  • This document provides the steps to create an agent-based hosted cluster on PowerVM using HyperShift.
  • It uses the MCE operator from OperatorHub to install the necessary operators, such as assisted-service and HyperShift, along with the other required custom resources.

Steps:

  1. Install the MCE operator
  2. Create AgentServiceConfig
  3. Create Hosted Control Plane
  4. Create InfraEnv
  5. Add agents
  6. Scale Node Pool
  7. Confirm cluster is working fine

Install the MCE operator

  • Install the latest version of the MultiCluster Engine (MCE) operator from OperatorHub by following this link.

    • If you are sure that the MCE operator currently available in OperatorHub is the latest, you can install it directly; otherwise, follow the next instruction to get the latest version of MCE.

    • The latest version can be made available by creating a stage catalog source, which requires updating the cluster pull secret with credentials for brew.registry.redhat.io and adding the brew mirror information to /etc/containers/registries.conf on the worker nodes.

      • Instructions to update the global pull secret can be found here; a typical command is sketched below.
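      # Sketch: merge the updated pull secret into the cluster (the path ${HOME}/pull-secret.json is an assumption)
      $ oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=${HOME}/pull-secret.json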
      • Once the pull secret is updated, the nodes need to be replaced with a new set of nodes, which injects the updated pull secret into the worker nodes.
      • Update /etc/containers/registries.conf with the below content on all the worker nodes.
      [[registry]]
        location = "registry.stage.redhat.io"
        insecure = false
        blocked = false
        mirror-by-digest-only = true
        prefix = ""
      
        [[registry.mirror]]
          location = "brew.registry.redhat.io"
          insecure = false
      
      [[registry]]
        location = "registry.redhat.io/multicluster-engine"
        insecure = false
        blocked = false
        mirror-by-digest-only = true
        prefix = ""
      
        [[registry.mirror]]
          location = "brew.registry.redhat.io/multicluster-engine"
          insecure = false
      
      • Reboot the worker nodes.
      • Create stage catalog
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        name: redhat-operators-stage
        namespace: openshift-marketplace
      spec:
        sourceType: grpc
        publisher: redhat
        displayName: Red Hat Operators v4.13 Stage
        image: quay.io/openshift-release-dev/ocp-release-nightly:iib-int-index-art-operators-4.13
      • Once the catalog is created, the latest MCE operator will be available to install from OperatorHub.
  • Once the MCE operator is installed, create an instance of MultiClusterEngine; this can be done in the console itself while installing the MCE operator, or from the CLI as sketched below.
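
A minimal sketch of creating the MultiClusterEngine instance from the CLI (the name multiclusterengine is an assumption; any name works):

cat <<EOF | oc create -f -
apiVersion: multicluster.openshift.io/v1
kind: MultiClusterEngine
metadata:
  name: multiclusterengine
spec: {}
EOF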

  • Enable the hypershift-preview component in the MultiClusterEngine instance to install the HyperShift operator and the Hosted Cluster CRDs.

$ oc edit mce
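
In the editor, the component toggles live under spec.overrides.components; a sketch of the relevant entry (the component name hypershift-preview applies to this MCE version and may be renamed in later releases):

spec:
  overrides:
    components:
    - name: hypershift-preview
      enabled: true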
  • The assisted-service operator requires the BareMetalHost (BMH) CRD. Install the CRD using the following instruction if your OpenShift cluster doesn't contain it.
# IBM Cloud's ROKS solution doesn't have this CRD, hence this step is mandatory if you use ROKS as management cluster.
$ oc apply -f https://raw.githubusercontent.com/openshift/assisted-service/master/hack/crds/metal3.io_baremetalhosts.yaml
  • List the ClusterImageSet resources to verify whether the release you want to use has a corresponding ClusterImageSet.
# This would list the ClusterImageSet available in the cluster
$ oc get ClusterImageSet
  • If a ClusterImageSet does not exist for the OCP release you want to use, create one as shown below.
# Create a ClusterImageSet for OCP 4.13.0-multi
cat <<EOF | oc create -f -
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: img4.13.0-multi-appsub
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.13.0-ec.3-multi
EOF
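
To verify it was created (a quick sanity check using the name from the example above):

$ oc get clusterimageset img4.13.0-multi-appsub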

Create AgentServiceConfig

Here we need to create an AgentServiceConfig custom resource, which tells the operator how much storage is needed for the various components, such as the database and the filesystem, and also defines which OpenShift versions to maintain. Here is an example for OCP 4.12.0; please substitute the correct values for your environment.

export DB_VOLUME_SIZE="10Gi"
export FS_VOLUME_SIZE="10Gi"
export OCP_VERSION="4.12.0"
export OCP_MAJMIN=${OCP_VERSION%.*}
export ARCH="ppc64le"
export OCP_RELEASE_VERSION=$(curl -s https://mirror.openshift.com/pub/openshift-v4/${ARCH}/clients/ocp/${OCP_VERSION}/release.txt | awk '/machine-os / { print $2 }')
export ISO_URL="https://mirror.openshift.com/pub/openshift-v4/${ARCH}/dependencies/rhcos/${OCP_MAJMIN}/${OCP_VERSION}/rhcos-${OCP_VERSION}-${ARCH}-live.${ARCH}.iso"
export ROOT_FS_URL="https://mirror.openshift.com/pub/openshift-v4/${ARCH}/dependencies/rhcos/${OCP_MAJMIN}/${OCP_VERSION}/rhcos-${OCP_VERSION}-${ARCH}-live-rootfs.${ARCH}.img"

envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: ${DB_VOLUME_SIZE}
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: ${FS_VOLUME_SIZE}
  osImages:
    - openshiftVersion: "${OCP_VERSION}"
      version: "${OCP_RELEASE_VERSION}"
      url: "${ISO_URL}"
      rootFSUrl: "${ROOT_FS_URL}"
      cpuArchitecture: "${ARCH}"
EOF
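
Once the AgentServiceConfig is applied, the assisted-service and assisted-image-service pods should come up. A quick check (assuming MCE is installed in the multicluster-engine namespace):

$ oc get pods -n multicluster-engine | grep assisted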
  • If the stage catalog with the brew mirror was used to install the latest MCE operator, the AgentServiceConfig needs to be updated with the brew mirror information. To do that, perform the following steps.
    • Create a config map in the same namespace where MCE is installed, which is usually multicluster-engine.
    cat <<EOF | oc create -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: mirror-config
      namespace: multicluster-engine # please verify that this namespace is where MCE is installed. 
      labels:
        app: assisted-service
    data:
      registries.conf: |
        unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
    
        [[registry]]
          location = "registry.stage.redhat.io"
          insecure = false
          blocked = false
          mirror-by-digest-only = true
          prefix = ""
    
          [[registry.mirror]]
            location = "brew.registry.redhat.io"
            insecure = false
    
        [[registry]]
          location = "registry.redhat.io/multicluster-engine"
          insecure = false
          blocked = false
          mirror-by-digest-only = true
          prefix = ""
    
          [[registry.mirror]]
            location = "brew.registry.redhat.io/multicluster-engine"
            insecure = false
    EOF
    • Reference the config map in the AgentServiceConfig you created.
    $ oc edit AgentServiceConfig agent
    Add the below field in spec:
    mirrorRegistryRef:
      name: mirror-config

Create Hosted Control Plane

  • Build hypershift binary
git clone https://github.com/openshift/hypershift.git
cd hypershift
make build
  • Create agent cluster

When you create a hosted cluster with the Agent platform, HyperShift installs the Agent CAPI provider in the Hosted Control Plane (HCP) namespace.

#!/usr/bin/env bash

export CLUSTERS_NAMESPACE="clusters"
export HOSTED_CLUSTER_NAME="example"
export HOSTED_CONTROL_PLANE_NAMESPACE="${CLUSTERS_NAMESPACE}-${HOSTED_CLUSTER_NAME}"

# domain managed via IBM CIS
# Please add correct domain as per your IBM CIS instance
# export BASEDOMAIN="<CIS_DOMAIN>"
export BASEDOMAIN="hypershift-ppc64le.com"

# Please make sure auth info for `brew.registry.redhat.io` exists if it's used as mirror registry to install MCE operator.
export PULL_SECRET_FILE=${HOME}/.hypershift/pull_secret.txt

export OCP_RELEASE=4.12.0-multi
export MACHINE_CIDR=192.168.122.0/24
# Typically the namespace is created by the hypershift-operator
# but agent cluster creation generates a capi-provider role that
# needs the namespace to already exist
oc create ns ${HOSTED_CONTROL_PLANE_NAMESPACE}

bin/hypershift create cluster agent \
    --name=${HOSTED_CLUSTER_NAME} \
    --pull-secret="${PULL_SECRET_FILE}" \
    --agent-namespace=${HOSTED_CONTROL_PLANE_NAMESPACE} \
    --base-domain=${BASEDOMAIN} \
    --api-server-address=api.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN} \
    --ssh-key ${HOME}/.ssh/id_rsa.pub \
    --release-image=quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE} --render > cluster-agent.yaml
  • Modify the rendered file

Change the servicePublishingStrategy to LoadBalancer and Route (because the ROKS cluster is deployed in a cloud environment and its nodes are in a private network that cannot be accessed by the workers directly).

Note: If the management cluster is in the same network as the workers, and the workers can talk to the nodes in the management cluster, then no changes are needed.

  - service: APIServer
    servicePublishingStrategy:
      nodePort:
        address: api.example.hypershift-ppc64le.com
      type: NodePort
  - service: OAuthServer
    servicePublishingStrategy:
      nodePort:
        address: api.example.hypershift-ppc64le.com
      type: NodePort
  - service: OIDC
    servicePublishingStrategy:
      nodePort:
        address: api.example.hypershift-ppc64le.com
      type: None
  - service: Konnectivity
    servicePublishingStrategy:
      nodePort:
        address: api.example.hypershift-ppc64le.com
      type: NodePort
  - service: Ignition
    servicePublishingStrategy:
      nodePort:
        address: api.example.hypershift-ppc64le.com
      type: NodePort
  - service: OVNSbDb
    servicePublishingStrategy:
      nodePort:
        address: api.example.hypershift-ppc64le.com
      type: NodePort

to

  - service: APIServer
    servicePublishingStrategy:
      type: LoadBalancer
  - service: OAuthServer
    servicePublishingStrategy:
      type: Route
  - service: OIDC
    servicePublishingStrategy:
      type: None
  - service: Konnectivity
    servicePublishingStrategy:
      type: Route
  - service: Ignition
    servicePublishingStrategy:
      type: Route
  - service: OVNSbDb
    servicePublishingStrategy:
      type: Route
  • Create it
$ oc apply -f cluster-agent.yaml
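
Before updating DNS, you can confirm the hosted control plane is coming up (resource and namespace names as used in this example):

$ oc get hostedcluster -n ${CLUSTERS_NAMESPACE}
$ oc get pods -n ${HOSTED_CONTROL_PLANE_NAMESPACE}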
  • Update the DNS

Once the hosted cluster pods are deployed, list the Kubernetes API server service and create the DNS entries (in this use case, the entries are created in IBM Cloud CIS).

$ oc get svc kube-apiserver -n ${HOSTED_CONTROL_PLANE_NAMESPACE}
NAME             TYPE           CLUSTER-IP    EXTERNAL-IP                           PORT(S)          AGE
kube-apiserver   LoadBalancer   172.21.75.9   77e7905f-us-east.lb.appdomain.cloud   6443:32729/TCP   6h50m
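
A sketch of the record to create in IBM Cloud CIS, using the hostnames from this example (substitute your cluster name, base domain, and load balancer hostname):

api.example.hypershift-ppc64le.com    CNAME    77e7905f-us-east.lb.appdomain.cloud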

Note: Add the *.apps.example entry once the worker IP address is available, later in this document.

Create InfraEnv

An InfraEnv is an environment to which hosts booting the live ISO can join as Agents. In this case, the Agents will be created in the same namespace as our HostedControlPlane.

export SSH_PUB_KEY=$(cat $HOME/.ssh/id_rsa.pub)
export CLUSTERS_NAMESPACE="clusters"
export HOSTED_CLUSTER_NAME="example"
export HOSTED_CONTROL_PLANE_NAMESPACE="${CLUSTERS_NAMESPACE}-${HOSTED_CLUSTER_NAME}"
export ARCH="ppc64le"

envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: ${HOSTED_CLUSTER_NAME}
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
spec:
  cpuArchitecture: $ARCH
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: ${SSH_PUB_KEY}
EOF
  • Once the InfraEnv is created, a minimal ISO is generated by the assisted service, which can be used to bring up the worker nodes. Get the ISO download URL from the below command.
$ oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get InfraEnv ${HOSTED_CLUSTER_NAME} -ojsonpath="{.status.isoDownloadURL}"
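
To download the ISO locally (a sketch; the output filename is arbitrary, and you may need curl -k if the image service uses a self-signed certificate):

$ curl -L -o rhcos-live-minimal.iso "$(oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get InfraEnv ${HOSTED_CLUSTER_NAME} -ojsonpath='{.status.isoDownloadURL}')"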

(Optional) Create static IP config for workers

If you want to configure a static IP for the workers, you need to create an instance of the NMStateConfig resource and use a label to refer to it in the InfraEnv.

Ensure the appropriate MAC address is used to map the network interface when setting the static IP address. The interface name will be overridden with the interface name present in the VM.

apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
  name: static-ip-test-nmstate-config
  namespace: clusters-static-ip-test
  labels:
    infraenv: static-ip-test-ppc64le
spec:
  config:
    interfaces:
      - name: eth0
        type: ethernet
        state: up
        mac-address: fa:16:3e:f0:41:4b
        ipv4:
          enabled: true
          address:
            - ip: 9.114.97.133
              prefix-length: 24
          dhcp: false
    dns-resolver:
      config:
        server:
          - 9.3.1.200
    routes:
      config:
        - destination: 0.0.0.0/0
          next-hop-address: 9.114.96.1
          next-hop-interface: eth0
          table-id: 254
  interfaces:
    - name: "eth0"
      macAddress: "fa:16:3e:f0:41:4b"

Use the below label mapping in the InfraEnv to use the NMStateConfig created above.

  nmStateConfigLabelSelector:
    matchLabels:
      infraenv: static-ip-test-ppc64le

Add Agents

  • Create a VM under the PowerVM hypervisor with the downloaded ISO as the boot medium.

  • After some time, you will see the agents:

$ oc get agents -n ${HOSTED_CONTROL_PLANE_NAMESPACE} -o=wide
NAMESPACE          NAME                                   CLUSTER   APPROVED   ROLE          STAGE   HOSTNAME            REQUESTED HOSTNAME
clusters-example   5e65c2be-ea88-3796-533f-4a4b5c2e420b             false      auto-assign           52-54-00-ee-66-ff
  • Once you see the agents, approve each agent.
$ oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} patch agent 5e65c2be-ea88-3796-533f-4a4b5c2e420b -p '{"spec":{"approved":true}}' --type merge
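
If multiple agents are present, a sketch to approve them all at once:

$ oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agents -o name \
    | xargs -r -I{} oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} patch {} -p '{"spec":{"approved":true}}' --type merge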

Accessing the HostedCluster

We have the HostedControlPlane running and the Agents ready to join the HostedCluster. Before we join the Agents let's access the HostedCluster.

First, we need to generate the kubeconfig:

$ hypershift create kubeconfig --namespace clusters --name example > example.kubeconfig

If we access the cluster we will see that we don't have any nodes and that the ClusterVersion is trying to reconcile the OCP release:

$ oc --kubeconfig example.kubeconfig get clusterversion,nodes
NAME                                         VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
clusterversion.config.openshift.io/version             False       True          8m6s    Unable to apply 4.12.0: some cluster operators have not yet rolled out

In order to get the cluster in a running state we need to add some nodes to it. Let's do it.

Scale Node Pool

We add nodes to our HostedCluster by scaling the NodePool object. In this case, we will start by scaling the NodePool to one node. You can scale the node pool as shown below, which will randomly choose an agent that is ready to be bound to the cluster.

$ oc -n clusters scale nodepool example --replicas 1
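
You can watch the NodePool report its replicas as nodes join (-w streams updates):

$ oc -n clusters get nodepool example -w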

If you want to bind a specific agent, you need to update the node pool with an agent label selector, as shown below, and then scale the node pool with the command mentioned above.

    agent:
      agentLabelSelector:
        matchLabels:
          inventory.agent-install.openshift.io/cpu-architecture: x86_64

Make sure the agent you want to bind has this inventory.agent-install.openshift.io/cpu-architecture: x86_64 label. By default, agents are populated with the cpu-architecture label.
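
A sketch of adding the selector with a patch, assuming it sits under spec.platform.agent as in the snippet above (on PowerVM the architecture label value would typically be ppc64le):

$ oc -n clusters patch nodepool example --type merge \
    -p '{"spec":{"platform":{"agent":{"agentLabelSelector":{"matchLabels":{"inventory.agent-install.openshift.io/cpu-architecture":"ppc64le"}}}}}}'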

Confirm cluster is working fine

  • Once the agent is approved and the nodepool is scaled, the agent goes through the installation process, during which you will see a couple of reboots. After quite some time, the agent reaches the Done stage, which means it has been successfully added to the hosted cluster and you can start using it.
  • The agents go through different states and finally join the hosted cluster as OpenShift Container Platform nodes. The states pass from binding to discovering to insufficient to installing to installing-in-progress to added-to-existing-cluster. The below commands' outputs indicate that the resources are in a good state.
$ oc get agents -A -o=wide
NAMESPACE          NAME                                   CLUSTER   APPROVED   ROLE     STAGE   HOSTNAME            REQUESTED HOSTNAME
clusters-example   5e65c2be-ea88-3796-533f-4a4b5c2e420b   example   true       worker   Done    52-54-00-ee-66-ff   worker-0.hypershift-ppc64le.com
  • Status of the agents
$ oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state}{"\n"}{end}'

BMH:  Agent: 5e65c2be-ea88-3796-533f-4a4b5c2e420b State: added-to-existing-cluster
  • Hypershift resources
$ oc get clusters -A -o=wide
NAMESPACE          NAME            PHASE         AGE     VERSION
clusters-example   example-znwfk   Provisioned   6h43m

$ oc get machines -o=wide -A
NAMESPACE          NAME                       CLUSTER         NODENAME                          PROVIDERID                                     PHASE     AGE     VERSION
clusters-example   example-656c864db8-8zgzv   example-znwfk   worker-0.hypershift-ppc64le.com   agent://5e65c2be-ea88-3796-533f-4a4b5c2e420b   Running   6h35m   4.12.0

$ oc get agentmachines -A
NAMESPACE          NAME            AGE
clusters-example   example-zm2kf   6h35m
  • Access the hostedcluster via kubeconfig
$ hypershift create kubeconfig --namespace clusters --name example > example.kubeconfig

$ export KUBECONFIG=example.kubeconfig

$ oc get nodes -o=wide
NAME                              STATUS   ROLES    AGE   VERSION           INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                  CONTAINER-RUNTIME
worker-0.hypershift-ppc64le.com   Ready    worker   19m   v1.25.4+77bec7a   192.168.122.149   <none>        Red Hat Enterprise Linux CoreOS 412.86.202301061548-0 (Ootpa)   4.18.0-372.40.1.el8_6.ppc64le   cri-o://1.25.1-5.rhaos4.12.git6005903.el8

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          6h57m   Unable to apply 4.12.0: some cluster operators are not available
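
The remaining cluster operators continue to roll out as capacity is added; you can watch their progress with:

$ oc get clusteroperators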