openshift 4.7 hybrid small topology 1.4 ahr manual - apigee/ahr GitHub Wiki

Apigee Hybrid 1.4 at OpenShift 4.7 at GCP

[ ] WIP: 0%

Install OpenShift 4.7 at GCP

REF: https://docs.openshift.com/container-platform/4.7/installing/installing_gcp/installing-gcp-account.html

Network Planning

?. Define CIDRs

https://access.redhat.com/documentation/en-us/reference_architectures/2017/html/deploying_and_managing_openshift_container_platform_3_on_google_cloud_platform/components_and_considerations

  • Machine Network: 10.0.0.0/16

    • Machine (Bastion) Subnet: 10.0.0.0/20
    • Control Plane Subnet: 10.0.16.0/20
    • Compute Subnet: 10.0.32.0/20
  • Cluster Network: 10.4.0.0/14 (host prefix 23)

  • Service Network: 10.8.0.0/16

?. Prepare environment variables configuration

cat <<"EOF" > ~/ocp-network.env
export OPENSHIFT_SA=ocp-sa

export BASE_DOMAIN=exco-ocp.com
export NETWORK=ocp
export REGION=us-central1
export ZONE_1=us-central1-a
export ZONE_2=us-central1-c

export NETWORK_CIDR=10.0.0.0/9


export BASTION_NETWORK=ocp
#export BASTION_REGION=europe-west1
#export BASTION_SUBNET=machine-subnet
#export BASTION_ZONE=europe-west1-b
export BASTION_REGION=us-central1
export BASTION_ZONE=us-central1-a
export BASTION_SUBNET=machine-subnet


# last: 10.0.255.255     65,536
export MACHINE_NETWORK_CIDR=10.0.0.0/16

export MACHINE_SUBNET=machine-subnet
# last: 10.0.15.255   4,096
export MACHINE_SUBNET_CIDR=10.0.0.0/20

# The IP address pools for pods.
# last: 10.7.255.255   262,144
export CLUSTER_NETWORK_CIDR=10.4.0.0/14
export CLUSTER_NETWORK_HOST_PREFIX=23

# The IP address pools for services. 
# last: 10.8.255.255   65,536
export SERVICE_NETWORK_CIDR=10.8.0.0/16

export CONTROL_PLANE_SUBNET=control-plane-subnet
# last: 10.0.31.255   4,096
export CONTROL_PLANE_SUBNET_CIDR=10.0.16.0/20

export COMPUTE_SUBNET=compute-subnet
# last: 10.0.47.255   4,096
export COMPUTE_SUBNET_CIDR=10.0.32.0/20
EOF
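
The `# last:` comments in the file above can be re-derived rather than trusted. The following is a minimal sketch, not part of ahr or gcloud: `cidr_info` is a hypothetical helper that uses only POSIX shell arithmetic to print the last address and the address count of a CIDR block.

```shell
# Print the last address and the number of addresses of an IPv4 CIDR block.
# The function body runs in a subshell, so IFS and variables do not leak.
cidr_info() (
    cidr=$1
    prefix=${cidr#*/}
    IFS=.
    set -- ${cidr%/*}              # split the dotted quad into $1..$4
    size=$(( 1 << (32 - prefix) ))
    first=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Clear the host bits, then set them all to get the last address.
    last=$(( (first & ~(size - 1) & 0xFFFFFFFF) | (size - 1) ))
    printf '%s last: %d.%d.%d.%d size: %d\n' "$cidr" \
        "$(( (last >> 24) & 255 ))" "$(( (last >> 16) & 255 ))" \
        "$(( (last >> 8) & 255 ))" "$(( last & 255 ))" "$size"
)

cidr_info 10.0.0.0/16     # machine network
cidr_info 10.4.0.0/14     # cluster (pod) network
cidr_info 10.8.0.0/16     # service network
```

Running it for each CIDR in `~/ocp-network.env` is a quick way to catch a mistyped range before any GCP resource is created.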

CloudShell: Configure ocp VPC

?. Define PROJECT environment variable

export PROJECT=<project-id>

?. Source environment variables

source ~/ocp-network.env

?. Create ocp network and firewall rules

gcloud compute networks create $NETWORK --project=$PROJECT --subnet-mode=custom

gcloud compute firewall-rules create $NETWORK-allow-internal --network $NETWORK --allow tcp,udp,icmp --source-ranges $NETWORK_CIDR

gcloud compute firewall-rules create $NETWORK-allow-access --network $NETWORK --allow tcp:22,tcp:3389,icmp

?. Create subnets

gcloud compute networks subnets create $MACHINE_SUBNET --project=$PROJECT --range=$MACHINE_SUBNET_CIDR --network=$NETWORK --region=$BASTION_REGION

gcloud compute networks subnets create $CONTROL_PLANE_SUBNET --project=$PROJECT --range=$CONTROL_PLANE_SUBNET_CIDR --network=$NETWORK --region=$REGION

gcloud compute networks subnets create $COMPUTE_SUBNET --project=$PROJECT --range=$COMPUTE_SUBNET_CIDR --network=$NETWORK --region=$REGION
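
The three subnets are expected to fall inside the machine network that install-config.yaml later declares as machineNetwork. As a hedged sketch, the check below verifies that containment offline. `ip_to_int` and `cidr_contains` are hypothetical helper names, and the CIDR values are copied from `~/ocp-network.env`.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1                      # split a.b.c.d into $1..$4
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
)

# Succeed when the CIDR in $2 lies entirely inside the CIDR in $1.
cidr_contains() (
    outer_prefix=${1#*/}
    inner_prefix=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - outer_prefix)) & 0xFFFFFFFF ))
    outer_net=$(( $(ip_to_int "${1%/*}") & mask ))
    inner_net=$(( $(ip_to_int "${2%/*}") & mask ))
    [ "$inner_prefix" -ge "$outer_prefix" ] && [ "$inner_net" -eq "$outer_net" ]
)

for subnet in 10.0.0.0/20 10.0.16.0/20 10.0.32.0/20; do
    if cidr_contains 10.0.0.0/16 "$subnet"; then
        echo "$subnet is inside 10.0.0.0/16"
    else
        echo "$subnet is OUTSIDE 10.0.0.0/16"
    fi
done
```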

?. Create Cloud NAT instance

gcloud compute routers create ocp-router \
    --network $NETWORK \
    --region=$REGION

gcloud compute routers nats create ocp-nat \
    --region=$REGION \
    --router=ocp-router \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges

CloudShell: Create Bastion VM

?. WARNING: For simplicity we made the SA a project owner. It is still better than using your own credentials on a shared VM. You might want to consider a list of more granular roles before doing this in production.

export INSTALLER_SA_ID=installer-sa

gcloud iam service-accounts create $INSTALLER_SA_ID

roles='roles/owner'

for r in $roles; do
    gcloud projects add-iam-policy-binding $PROJECT \
        --member=serviceAccount:$INSTALLER_SA_ID@$PROJECT.iam.gserviceaccount.com \
        --role=$r
done

gcloud compute instances create bastion \
    --network $BASTION_NETWORK \
    --subnet $BASTION_SUBNET \
    --zone=$BASTION_ZONE \
    --machine-type=e2-standard-2 \
    --service-account $INSTALLER_SA_ID@$PROJECT.iam.gserviceaccount.com \
    --scopes cloud-platform

Bastion: Create OCP Cluster

?. SSH into bastion host

?. Make sure required utilities are installed

sudo apt install -y mc dnsutils git jq

?. On the bastion host, repeat the cat <<"EOF" > ~/ocp-network.env command from the Network Planning section.

cat <<"EOF" > ~/ocp-network.env
...
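
Whether the heredoc delimiter is quoted changes what ends up in the file. The sketch below illustrates the difference with a throwaway variable (`DEMO_REGION` is a made-up name). For ocp-network.env both forms give the same result because its body contains no `$` references, but the distinction matters for install-config.yaml, where the variables are meant to be expanded.

```shell
DEMO_REGION=us-central1

# Quoted delimiter: the body is written verbatim, $DEMO_REGION stays literal.
quoted=$(cat <<"EOF"
export REGION=$DEMO_REGION
EOF
)

# Unquoted delimiter: the shell expands $DEMO_REGION while writing the body.
unquoted=$(cat <<EOF
export REGION=$DEMO_REGION
EOF
)

echo "$quoted"     # export REGION=$DEMO_REGION
echo "$unquoted"   # export REGION=us-central1
```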

?. Define the PROJECT environment variable again, this time in the bastion shell

export PROJECT=<project-id>

?. Source Environment Variables

source ~/ocp-network.env

?. Set current project

gcloud config set project $PROJECT

?. Enable the required API services in GCP

gcloud services enable \
    compute.googleapis.com \
    cloudapis.googleapis.com \
    cloudresourcemanager.googleapis.com \
    dns.googleapis.com \
    iamcredentials.googleapis.com \
    iam.googleapis.com \
    servicemanagement.googleapis.com \
    serviceusage.googleapis.com \
    storage-api.googleapis.com \
    storage-component.googleapis.com

Generating an SSH private key and adding it to the agent

?. Create an ssh key with no passphrase for troubleshooting OCP nodes

ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_openshift

?. Start ssh-agent

eval "$(ssh-agent -s)"

?. Add key to the session

ssh-add ~/.ssh/id_openshift

Prerequisites: OCP install utility, Service Account, and Pull Secret

?. Create directories for downloads, OpenShift configuration files, and OpenShift installation artefacts and logs.

mkdir -p ~/_downloads
mkdir -p ~/openshift-config
mkdir -p ~/openshift-install

?. Create OpenShift Service Account with required roles

gcloud iam service-accounts create $OPENSHIFT_SA \
    --description="OpenShift SA" \
    --display-name="OpenShift SA"

?. Define roles

export roles='roles/owner'


for r in $roles; do
    gcloud projects add-iam-policy-binding $PROJECT \
        --member=serviceAccount:$OPENSHIFT_SA@$PROJECT.iam.gserviceaccount.com \
        --role=$r
done

?. Download SA's key in JSON format

gcloud iam service-accounts keys create ~/openshift-config/$OPENSHIFT_SA.json --iam-account=$OPENSHIFT_SA@$PROJECT.iam.gserviceaccount.com

?. Download the pull secret from your Red Hat account and upload it to the bastion host. Keep the file name as ~/pull-secret.txt for compatibility with the commands below.

Use Settings cog/Upload file if you have it on your local file system, or copy-paste it.

?. Download OCP Installer

REF: https://cloud.redhat.com/openshift/install

cd ~/_downloads
curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.7.52/openshift-install-linux.tar.gz

curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.7.52/openshift-client-linux.tar.gz

?. Untar oc, kubectl, and openshift-install utilities to the ~/bin directory

mkdir ~/bin
source ~/.profile

tar xvf ~/_downloads/openshift-install-linux.tar.gz -C ~/bin
tar xvf ~/_downloads/openshift-client-linux.tar.gz -C ~/bin

?. Define SSH_KEY and PULL_SECRET variables

export SSH_KEY=$(cat ~/.ssh/id_openshift.pub)
export PULL_SECRET=$(cat ~/pull-secret.txt)

?. Validate non-empty values of SSH_KEY and PULL_SECRET variables

echo $SSH_KEY

echo $PULL_SECRET
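
Eyeballing the echo output works, but an explicit check fails loudly when a file was missing and the variable came out empty. A minimal sketch follows. `require_vars` is a hypothetical helper, not part of ahr, and it uses `eval` for indirect lookup, so pass it only variable names you typed yourself.

```shell
# Report every named variable that is empty or unset; return non-zero if any is.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"   # indirect lookup of the variable called $name
        if [ -z "$value" ]; then
            echo "ERROR: $name is empty or unset" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_vars SSH_KEY PULL_SECRET || echo "Fix the variables above before continuing"
```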

OCP cluster installation configuration file

?. Define install-config.yaml in the ~/openshift-config folder.

Remember, openshift-install deletes install-config.yaml once it has generated its config files in the ~/openshift-install directory. If you want to be able to start from scratch, keep the original install config file in the ~/openshift-config directory.

cat <<EOF > ~/openshift-config/install-config.yaml
apiVersion: v1
baseDomain: $BASE_DOMAIN
controlPlane: 
  hyperthreading: Enabled   
  name: master
  platform:
    gcp:
      type: e2-standard-4
      zones:
      - $ZONE_1
      - $ZONE_2
      osDisk:
        diskType: pd-ssd
        diskSizeGB: 48
  replicas: 3
compute: 
- hyperthreading: Enabled 
  name: worker
  platform:
    gcp:
      type: e2-standard-4
      zones:
      - $ZONE_1
      - $ZONE_2
      osDisk:
        diskType: pd-standard
        diskSizeGB: 96
  replicas: 2
metadata:
  name: hybrid-cluster 
networking:
  clusterNetwork:
  - cidr: $CLUSTER_NETWORK_CIDR
    hostPrefix: $CLUSTER_NETWORK_HOST_PREFIX
  machineNetwork:
  - cidr: $MACHINE_NETWORK_CIDR
  networkType: OpenShiftSDN
  serviceNetwork:
  - $SERVICE_NETWORK_CIDR
platform:
  gcp:
    projectID: $PROJECT
    region: $REGION
    network: $NETWORK
    controlPlaneSubnet: $CONTROL_PLANE_SUBNET
    computeSubnet: $COMPUTE_SUBNET 
pullSecret: '$PULL_SECRET'
fips: false 
sshKey: "$SSH_KEY"
publish: Internal
EOF
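
The clusterNetwork cidr and hostPrefix together determine how many nodes the pod network can serve, because each node is assigned one subnet of hostPrefix length out of the cluster network. The arithmetic for the values used here can be sketched as follows, with the prefix lengths copied by hand from `~/ocp-network.env`.

```shell
cluster_prefix=14   # from CLUSTER_NETWORK_CIDR=10.4.0.0/14
host_prefix=23      # from CLUSTER_NETWORK_HOST_PREFIX=23

# Number of /23 per-node subnets that fit into the /14 cluster network.
node_subnets=$(( 1 << (host_prefix - cluster_prefix) ))

# Usable pod addresses in one per-node subnet (network and broadcast excluded).
usable_pod_ips=$(( (1 << (32 - host_prefix)) - 2 ))

echo "per-node subnets available: $node_subnets"      # 512
echo "usable pod IPs per node:    $usable_pod_ips"    # 510
```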

Deploying the cluster

?. Point the GOOGLE_CREDENTIALS variable at the service account key so that openshift-install runs as the OpenShift SA

export GOOGLE_CREDENTIALS=~/openshift-config/$OPENSHIFT_SA.json

?. Copy the `install-config.yaml` file into the ~/openshift-install folder.

cp ~/openshift-config/install-config.yaml ~/openshift-install

?. Launch OCP installation. It could take 30-40 minutes to install it.

cd ~/openshift-install

time openshift-install create cluster --dir .

Sample Output:

INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/student-00-0a66f0db2708/openshift-install/auth/kubeconfig' 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.hybrid-cluster.exco-ocp.com 
INFO Login to the console with user: "kubeadmin", and password: "D6HT6-bN7hu-yJq9k-FGN4h" 
INFO Time elapsed: 36m44s                         

real    36m44.574s
user    0m51.641s
sys     0m3.263s

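Make a note of the kubeconfig file location, as well as the OCP Console URL and the kubeadmin user/password. openshift-install also stores the password in ~/openshift-install/auth/kubeadmin-password.

API Server: <internal-load-balancer>:6443

The API server serves requests via the configured internal load balancer $OCP_INFRA_ID-api-internal, or on port 6443 of any master node.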

?. Configure openshift-install's kubeconfig as account default

mkdir -p ~/.kube
cp ~/openshift-install/auth/kubeconfig ~/.kube/config

?. Configure kubectl auto-completion

sudo apt install -y bash-completion

source ~/.profile

source <(kubectl completion bash)

?. Get infrastructure ID

export OCP_INFRA_ID=$(oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster)

echo $OCP_INFRA_ID-master

?. Identify IP Address for api server's internal LB

gcloud compute forwarding-rules describe $OCP_INFRA_ID-api-internal --region $REGION --format="value(IPAddress)"

10.0.16.3

?. A sample request to get an OCP version from a bastion host looks like:

curl -k https://10.0.16.3:6443/version

Sample Output:

{
  "major": "1",
  "minor": "20",
  "gitVersion": "v1.20.0+7d0a2b2",
  "gitCommit": "7d0a2b269a27413f5f125d30c9d726684886c69a",
  "gitTreeState": "clean",
  "buildDate": "2021-04-16T13:08:35Z",
  "goVersion": "go1.15.7",
  "compiler": "gc",
  "platform": "linux/amd64"
}

As we deliberately installed a private-only cluster, you need to use your preferred technique to access it from your computer.

It is trivial to configure GCLB to expose it.

Alternatively, an easy and secure way is to use IAP TCP forwarding via the gcloud command.

API Server Exposed via IAP Tunnel

?. Bastion: Verify correct OCP cluster infrastructure ID value

echo $OCP_INFRA_ID-master

Bastion-side configuration

?. Bastion: Create a firewall rule

gcloud compute firewall-rules create $NETWORK-apiserver-allow-ingress-from-iap \
  --direction=INGRESS \
  --action=allow \
  --rules=tcp:6443 \
  --network=$NETWORK \
  --target-tags=$OCP_INFRA_ID-master,$OCP_INFRA_ID-worker \
  --source-ranges=35.235.240.0/20

?. Bastion: Grant $USER permission to use IAP

gcloud projects add-iam-policy-binding $PROJECT \
    --member=user:$USER@qwiklabs.net  \
    --role=roles/iap.tunnelResourceAccessor

Local Terminal-side configuration

On your PC/laptop, open a new terminal window. We need to keep it open while the tunnel is running.

?. Local Terminal: at the terminal of your PC, log into correct account

gcloud auth login

?. Local Terminal: Set the environment variables to the values appropriate for your cluster

export OCP_INFRA_ID=<populate>
export HOST=$OCP_INFRA_ID-master-0
export PROJECT=<populate>
export ZONE=us-central1-a

?. Local Terminal: Start the tunnel to API Server

gcloud compute start-iap-tunnel $HOST 6443 \
    --local-host-port=localhost:6443 \
    --zone=$ZONE \
    --project $PROJECT

Expected output:

Testing if tunnel connection works.
Listening on port [6443].

?. Another Local Terminal: Open another terminal window.

?. Another Local Terminal: Sudo-edit /etc/hosts. Append the following lines.

NOTE: Make sure the correct cluster name and base domain are used:

127.0.0.1    api.hybrid-cluster.exco-ocp.com
127.0.0.1    console-openshift-console.apps.hybrid-cluster.exco-ocp.com
127.0.0.1    oauth-openshift.apps.hybrid-cluster.exco-ocp.com
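
To avoid hand-editing mistakes, the three entries can be generated from the cluster name and base domain. This is only a sketch: `ocp_hosts_entries` is a hypothetical helper, and the arguments below are the metadata.name and BASE_DOMAIN values used in this walkthrough.

```shell
# Print /etc/hosts lines that point the cluster's API, console and OAuth
# host names at the local ends of the tunnels.
ocp_hosts_entries() {
    cluster=$1
    base_domain=$2
    for host in "api.$cluster" \
                "console-openshift-console.apps.$cluster" \
                "oauth-openshift.apps.$cluster"; do
        printf '127.0.0.1    %s.%s\n' "$host" "$base_domain"
    done
}

ocp_hosts_entries hybrid-cluster exco-ocp.com
```

Review the output first, then append it yourself, for example by piping it to `sudo tee -a /etc/hosts`.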

?. Another Local Terminal: Download the kubectl config file and set the session KUBECONFIG variable to point to it

export KUBECONFIG=$PWD/config

?. Another Local Terminal: Verify connection setup

kubectl get nodes

The list of nodes of your cluster will be displayed.

OCP Console

The OCP Console is served through an internal load balancer that the OCP cluster configures for the router-default service in the openshift-ingress namespace.

?. Another Local Terminal: In another local terminal window, forward the router's port 443 to local port 443.

sudo `which kubectl` --kubeconfig $KUBECONFIG port-forward -n openshift-ingress service/router-default 443:443

Expected Output:

Forwarding from 127.0.0.1:443 -> 443
Forwarding from [::1]:443 -> 443

Keep this window open as long as you are working with OCP Console

?. Open https://console-openshift-console.apps.hybrid-cluster.exco-ocp.com URL in your browser.

Ignore certificate warning and proceed.

Use kubeadmin and its password from the install log output to login.

OpenShift Resource Usage Optimisation

The given Apigee hybrid instance topology needs around 6.6 vCPUs, which requires 3 worker nodes. A typical default OCP installation has 3 masters and 3 workers. Because we want the bastion host in the same region (otherwise the openshift-install process gets trickier), we are constrained by the GCP regional CPU quota limit of 24 per region. The bootstrap node is removed after installation, but not before it prevents the second worker node from coming up. There are therefore two cluster operations we want to perform to have enough compute:

  • add another worker node (4 vCPUs)

  • configure masters to be able to run workloads.

?. Display list of machine sets

oc get machinesets -n openshift-machine-api

?. Scale a worker-a machine set to two replicas

oc scale --replicas=2 machineset $OCP_INFRA_ID-worker-a -n openshift-machine-api

?. Configure master nodes as schedulable

REF: https://docs.openshift.com/container-platform/4.7/nodes/nodes/nodes-nodes-working.html#nodes-nodes-working-master-schedulable_nodes-nodes-working

oc patch schedulers.config.openshift.io cluster --type json \
     -p '[{"op": "add", "path": "/spec/mastersSchedulable", "value": true}]'

This command removes the NoSchedule taint from your master nodes. It also adds the worker label to the master nodes.

Install Apigee Hybrid at OpenShift Cluster

?. Clone ahr repository

cd ~
git clone https://github.com/apigee/ahr.git

?. Add ahr utilities directory to the PATH

export AHR_HOME=~/ahr
export PATH=$AHR_HOME/bin:$PATH

?. Define HYBRID_HOME installation folder

export HYBRID_HOME=~/apigee-hybrid-install
mkdir -p $HYBRID_HOME

?. Define the location of the install environment configuration file

export HYBRID_ENV=$HYBRID_HOME/hybrid-1.4.env

?. Clone hybrid 1.4 template configuration

mkdir -p $HYBRID_HOME
cp $AHR_HOME/examples/hybrid-sz-s-1.4.sh $HYBRID_ENV

?. Source configuration environment variables

source $HYBRID_ENV

?. Enable required APIs

ahr-verify-ctl api-enable

GKE Hub: Cluster membership registration

?. Configure an SCC for the gke-connect agent containers [Anthos prerequisite requirement]. Create a manifest file.

cat <<EOF > $HYBRID_HOME/gke-connect-scc.yaml 
# Connect Agent SCC 
apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: gke-connect-scc
allowPrivilegeEscalation: false
# This is redundant with non-root + disallow privilege escalation,
# but we can provide it for defense in depth.
requiredDropCapabilities:
- ALL
runAsUser:
  type: MustRunAsNonRoot
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: MustRunAs 
  ranges:
  - min: 1
    max: 65535
fsGroup:
  type: MustRunAs 
  ranges:
  - min: 1
    max: 65535
volumes:
- secret
readOnlyRootFilesystem: true
seccompProfiles:
- docker/default
users:
groups:
  - system:serviceaccounts:gke-connect
EOF

?. Apply the manifest

oc create -f $HYBRID_HOME/gke-connect-scc.yaml

?. Register Cluster with GCP Anthos hub

ahr-cluster-ctl anthos-hub-register

?. Create KSA account for anthos-user whose credentials [token] we will use to log into the cluster.

ahr-cluster-ctl anthos-user-ksa-create

?. Use commands from the output of the previous command to get the token and register the cluster via GCP Console Kubernetes Engine/Cluster web page

You can now use Anthos UI to control state and status of your OCP cluster.

?. Install cert manager [Apigee hybrid prerequisite]

kubectl apply --validate=false -f $CERT_MANAGER_MANIFEST

?. Grant anyuid SCC (based on this link) to install ASM.

oc adm policy add-scc-to-group anyuid system:serviceaccounts:istio-system

Provision Static IP Address for Istio Ingress Gateway's Load Balancer

Create a regional static IP address we will use for istio ingress gateway configuration.

gcloud compute addresses create runtime-ip --region $REGION

?. Fetch the ip address of the runtime-ip address and persist it in the $HYBRID_ENV config file.

export RUNTIME_IP=$(gcloud compute addresses describe runtime-ip --region $REGION --format='value(address)')

sed -i -E "s/^(export RUNTIME_IP=).*/\1$RUNTIME_IP/g" $HYBRID_ENV
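
The sed expression rewrites only the line that starts with `export RUNTIME_IP=` and leaves the rest of the file alone. The sketch below replays it against a throwaway file. The file contents and both addresses are made-up documentation values, and `-i` without a suffix assumes GNU sed, as found on the bastion host.

```shell
env_file=$(mktemp)
cat <<'EOF' > "$env_file"
export REGION=us-central1
export RUNTIME_IP=203.0.113.10
export RUNTIME_HOST_ALIAS=api.example.com
EOF

NEW_IP=198.51.100.7
# \1 re-inserts the captured "export RUNTIME_IP=" prefix; .* drops the old value.
sed -i -E "s/^(export RUNTIME_IP=).*/\1$NEW_IP/g" "$env_file"

updated_line=$(grep '^export RUNTIME_IP=' "$env_file")
region_line=$(grep '^export REGION=' "$env_file")
echo "$updated_line"   # export RUNTIME_IP=198.51.100.7
rm -f "$env_file"
```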

Install Istio [a.k.a ASM]

?. Define variables to generate istio-operator.yaml manifest file from provided template

export ASM_PROFILE=asm-multicloud
sed -i -E "s/^(export ASM_PROFILE=).*/\1$ASM_PROFILE/g" $HYBRID_ENV
export ASM_RELEASE=$(echo "$ASM_VERSION"|awk '{sub(/\.[0-9]+-asm\.[0-9]+/,"");print}')
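
The awk expression strips the patch and `-asm.N` suffix so that only the major.minor release remains, which is what the template file name is built from. A small sketch follows. The version string is an illustrative value, not necessarily the one set in your $HYBRID_ENV, and `asm_release` is a hypothetical wrapper.

```shell
# Reduce an ASM version such as <major>.<minor>.<patch>-asm.<n> to <major>.<minor>.
asm_release() {
    echo "$1" | awk '{sub(/\.[0-9]+-asm\.[0-9]+/,"");print}'
}

asm_release 1.8.6-asm.2   # 1.8
```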

?. Make a local copy of the istio operator template in case we need to customise it.

cp $AHR_HOME/templates/istio-operator-$ASM_RELEASE-$ASM_PROFILE.yaml $HYBRID_HOME/istio-operator-ocp-template.yaml

?. Resolve template into a final manifest state.

ahr-cluster-ctl template $HYBRID_HOME/istio-operator-ocp-template.yaml > $ASM_CONFIG

The $ASM_CONFIG file now contains the IstioOperator spec we are going to use for ASM installation.

?. Download ASM distribution. It contains istioctl binary. This source syntax will also add istioctl location to the PATH value.

source <(ahr-cluster-ctl asm-get $ASM_VERSION)

?. Install ASM/istio

istioctl install -f $ASM_CONFIG

Apigee Organization Provisioning

?. Fetch the apigeectl distribution and set up the PATH variable

source <(ahr-runtime-ctl get-apigeectl $HYBRID_HOME/$APIGEECTL_TARBALL)

?. Provisioning Apigee org, env, and env-group

ahr-runtime-ctl install-profile small asm-gcp -c apigee-org

Apigee Hybrid Runtime Installation

?. Apigee hybrid SCC. Apply the following before proceeding with any apigeectl commands. We need permissions to allow apigee logger containers to access local file system.

cat <<EOF > $HYBRID_HOME/apigee-scc.yaml 
# Apigee SCC
apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: apigee-scc
allowedCapabilities:
- 'IPC_LOCK'
- 'SYS_RESOURCE'
priority: 9
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
users:
groups:
  - system:serviceaccounts:apigee
---
apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: apigee-system-scc
priority: 9
runAsUser:
  type: MustRunAsRange
  uidRangeMin: 999
  uidRangeMax: 999
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: MustRunAs
  ranges:
  - min: 998
    max: 998
fsGroup:
  type: MustRunAs
  ranges:
  - min: 998
    max: 998
users:
groups:
  - system:serviceaccounts:apigee-system
EOF
oc create -f $HYBRID_HOME/apigee-scc.yaml

?. Make RUNTIME_CONFIG file

ahr-runtime-ctl install-profile small asm-gcp -c runtime-config

This file is important. When we need to change hybrid instance settings, we will do it via this manifest. Keep it safe.

?. Install hybrid runtime

ahr-runtime-ctl install-profile small asm-gcp -c runtime

Ping Proxy Deployment and Testing

?. Use provided script to import and deploy a ping proxy

$AHR_HOME/proxies/deploy.sh

?. In the same terminal session, execute following command to call ping API

curl --cacert $RUNTIME_SSL_CERT https://$RUNTIME_HOST_ALIAS/ping -v --resolve "$RUNTIME_HOST_ALIAS:443:$RUNTIME_IP" --http1.1

?. In another terminal session, you would first need to initialise these variables

export RUNTIME_IP=<as-appropriate>
export RUNTIME_HOST_ALIAS=<as-appropriate>
export RUNTIME_SSL_CERT=<as-appropriate>

?. Copy the RUNTIME_SSL_CERT file to the other VM, or use `-k` to skip certificate verification.

curl --cacert $RUNTIME_SSL_CERT https://$RUNTIME_HOST_ALIAS/ping -v --resolve "$RUNTIME_HOST_ALIAS:443:$RUNTIME_IP" --http1.1

Troubleshooting OCP installation process

If you venture to change the installation process, here are some commands to help you when things go wrong.

  • Collect logs from VMs created during bootstrap phase
openshift-install --dir ~/openshift-install gather bootstrap --key ~/.ssh/id_openshift
  • [After you make any changes], wait till bootstrap phase completes
openshift-install wait-for bootstrap-complete --log-level debug --dir ~/openshift-install
  • [After you make any changes], wait till install phase completes
openshift-install wait-for install-complete --log-level debug --dir ~/openshift-install
  • [When you reach bootstrap installed point] Get list of deployments
oc --kubeconfig=$HOME/openshift-install/auth/kubeconfig --namespace=openshift-machine-api get deployments
  • Ssh into a bootstrap or master node for troubleshooting
ssh core@<node-ip> -i ~/.ssh/id_openshift
  • Nuke the cluster
openshift-install destroy cluster --dir ~/openshift-install

Troubleshooting: Problems

Until 4.8, if the constraints/storage.uniformBucketLevelAccess constraint is set up in a GCP org, openshift-install fails almost immediately after it starts.

openshift-install in GCP projects: https://bugzilla.redhat.com/show_bug.cgi?id=1936375 "Error: googleapi: Error 412: Request violates constraint 'constraints/storage.uniformBucketLevelAccess', conditionNotMet"
