Multi Region Cluster, Large footprint - apigee/ahr GitHub Wiki
The main tasks to create a dual data centre are:
- Plan your cluster, hybrid topology layout, and required infrastructure components;
- Create clusters;
- Create multi-Region Cassandra ring;
- Install Runtimes in each data centre;
- Configure GTM.
?. Let's define configuration of the installation.
Project: emea-cs-hybrid-demo6
Cluster Type: Multi-Zonal
cluster | config | region/zones | data x 3 | runtime x 3
---|---|---|---|---
common | dc-all.sh | | e2-standard-4 | e2-standard-4
dc1-cluster | dc1-cluster-l-1.1.0.sh | us-east1 b, c, d | |
dc2-cluster | dc2-cluster-l-1.1.0.sh | asia-east1 a, b, c | |
NOTE: Keep an eye on the total core count as well as memory requirements. COMING SOON: resource calculator...
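Until the resource calculator exists, a rough estimate for the layout in the table above can be done with shell arithmetic. This is a minimal sketch that assumes e2-standard-4 provides 4 vCPUs and 16 GB of memory per node, and that "x 3" means three nodes per node pool in each data centre; the variable names are illustrative and not part of ahr.

```shell
#!/usr/bin/env bash
# Rough capacity estimate for the dual data centre layout.
# Assumption: e2-standard-4 = 4 vCPUs and 16 GB of memory per node.
DCS=2               # dc1-cluster and dc2-cluster
DATA_NODES=3        # "data x 3" node pool per data centre
RUNTIME_NODES=3     # "runtime x 3" node pool per data centre
VCPU_PER_NODE=4
MEM_GB_PER_NODE=16

NODES=$(( DCS * (DATA_NODES + RUNTIME_NODES) ))
TOTAL_VCPU=$(( NODES * VCPU_PER_NODE ))
TOTAL_MEM_GB=$(( NODES * MEM_GB_PER_NODE ))

echo "nodes=$NODES vcpu=$TOTAL_VCPU mem_gb=$TOTAL_MEM_GB"
```

Compare the totals with the CPU quotas of the project in each region before creating the clusters.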
TODO: [ ] those are generic ahr usage instructions. move to a separate page.
?. Clone ahr and source ahr environment
NOTE: ahr scripts require yq for yaml file processing. To install yq on linux:
curl -L https://github.com/mikefarah/yq/releases/download/3.2.1/yq_linux_amd64 -o ~/bin/yq
chmod +x ~/bin/yq
mkdir -p ~/apigee-hybrid
cd ~/apigee-hybrid
git clone https://github.com/apigee/ahr
export AHR_HOME=~/apigee-hybrid/ahr
?. Create a project directory:
export HYBRID_HOME=~/apigee-hybrid/dual-dc-hybrid-110
mkdir -p $HYBRID_HOME
?. Copy examples of multi-region environment variable files.
cp $AHR_HOME/examples/dc-all.sh $HYBRID_HOME
cp $AHR_HOME/examples/dc1-cluster-l-1.1.0.sh $HYBRID_HOME
cp $AHR_HOME/examples/dc2-cluster-l-1.1.0.sh $HYBRID_HOME
?. Runtime Configuration
vi $HYBRID_HOME/dc-all.sh
Consider variables common to every data centre. Put them into the dc-all.sh config file. Define the differing elements in each data centre's own file.
Four main groups of variables are:
- Hybrid version
- Project definition
- Cluster parameters
- Runtime configuration
TODO:? [ ] Expand??
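The split between common and per-data-centre variables can be illustrated with a small, self-contained sketch. The file contents below and the REGION variable are hypothetical stand-ins, not the real ahr example files; the point is only that common values are loaded once, while per-DC values are loaded inside a subshell so that they do not leak between data centres.

```shell
#!/usr/bin/env bash
# Illustrative only: file contents and REGION are hypothetical stand-ins.
unset CLUSTER REGION
DEMO_HOME=$(mktemp -d)

cat > "$DEMO_HOME/dc-all.sh" <<'EOF'
export PROJECT=emea-cs-hybrid-demo6
EOF

cat > "$DEMO_HOME/dc1-cluster.sh" <<'EOF'
export CLUSTER=dc1-cluster
export REGION=us-east1
EOF

cat > "$DEMO_HOME/dc2-cluster.sh" <<'EOF'
export CLUSTER=dc2-cluster
export REGION=asia-east1
EOF

# Common settings are loaded once into the current shell
# ("." is the portable spelling of "source").
. "$DEMO_HOME/dc-all.sh"

# Per-DC settings are loaded inside a subshell, so they do not leak
# into the parent shell or into the other data centre's commands.
DC1_VIEW=$( . "$DEMO_HOME/dc1-cluster.sh"; echo "$PROJECT $CLUSTER $REGION" )
DC2_VIEW=$( . "$DEMO_HOME/dc2-cluster.sh"; echo "$PROJECT $CLUSTER $REGION" )

echo "$DC1_VIEW"
echo "$DC2_VIEW"
echo "CLUSTER in parent shell: '${CLUSTER:-}'"

rm -rf "$DEMO_HOME"
```

The same subshell pattern, written as ( source ...; command ), is what keeps the dc1 and dc2 invocations independent of each other.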
Changes in this case:
export PROJECT=emea-cs-hybrid-demo6
# as we plan to use apigeeconnect, in this version mart ip and hostname must be defined but will not be used.
export MART_HOST_ALIAS=$ORG-mart.hybrid-apigee.net
export MART_IP=35.197.194.6
TODO: configure lb first!
For each DC:
- regions and zones
- runtime IPs
?. We keep things nice and tidy, and define cluster credentials config file in the project directory.
export KUBECONFIG=$HYBRID_HOME/config-dual-dc
?. Configure kubectl aliases and autocomplete, ahr-*-ctl path, and current project setting.
source $HYBRID_HOME/dc-all.sh
source $AHR_HOME/bin/ahr-env
Check that the project setting reflects the correct project.
?. OPTIONAL: You can cd to the project directory; however, to keep things CI/CD-friendly, all file invocations use full paths and are therefore independent of the current location.
cd $HYBRID_HOME
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
ahr-cluster-ctl template $CLUSTER_TEMPLATE > $CLUSTER_CONFIG;
ahr-cluster-ctl create
)
( source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;
ahr-cluster-ctl template $CLUSTER_TEMPLATE > $CLUSTER_CONFIG;
ahr-cluster-ctl create )
After the clusters are created, your config-dual-dc has two DCs configured, dc1-cluster and dc2-cluster.
?. Source kubectl configuration for dc1 and check the cluster version
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)
kubectl version
TIP: Your session will expire. Copy and paste the following statements into your terminal to reset it to point to the project level [+ DC1 cluster [+ project directory]]:
# for project level
export AHR_HOME=~/apigee-hybrid/ahr
export HYBRID_HOME=~/apigee-hybrid/dual-dc-hybrid-110
export KUBECONFIG=$HYBRID_HOME/config-dual-dc
source $HYBRID_HOME/dc-all.sh
source $AHR_HOME/bin/ahr-env
# for DC-cluster level
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
# go to project directory
cd $HYBRID_HOME
In our case, we create a set of project SAs that are used for each cluster.
ahr-sa-ctl create all
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
ahr-verify
)
(
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
ahr-verify
)
TIP: ahr-verify stops on error. Use
ahr-verify --stoponerror=false
if you want to check all known violations.
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)
ahr-runtime-ctl template $RUNTIME_TEMPLATE > $RUNTIME_CONFIG;
)
(
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)
ahr-runtime-ctl template $RUNTIME_TEMPLATE > $RUNTIME_CONFIG;
)
ahr-runtime-ctl get
# dc1
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)
ahr-runtime-ctl apigeectl init -f $RUNTIME_CONFIG
)
To set up the Cassandra ring, we follow these steps:
- Install Cassandra in DC1
- Boot up a new region DC2 with an external seed from DC1
- Change seed host in DC2 back to its local cluster
- Reconfigure replication and rebuild nodes
This time, we will execute the manual steps from the official documentation. Besides being error-prone, these steps are also harder to automate for CI/CD inclusion. To solve this problem, ahr includes the ahr-cs-ctl command, which converts those steps into three actions:
ahr-cs-ctl keyspaces-list
ahr-cs-ctl keyspaces-expand
ahr-cs-ctl nodetool <args>
?. Install Cassandra in dc1
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
cd $APIGEECTL_HOME
apigeectl -c cassandra apply -f $RUNTIME_CONFIG
)
TIP: If you need to delete Cassandra component completely, don't forget about PVCs.
(
cd $APIGEECTL_HOME
apigeectl -c cassandra delete -f $RUNTIME_CONFIG
kubectl delete pvc -l app=apigee-cassandra
)
?. Check Cassandra ring status
kubectl --context dc1-cluster -n apigee exec -it apigee-cassandra-0 -- nodetool status
Output:
Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.44.0.5 90.72 KiB 256 100.0% bcbee035-2984-4e09-bec3-d117fbbbf80d ra-1
UN 10.44.3.3 94.15 KiB 256 100.0% 744e362b-d911-470d-8cd0-ff5dec0da4d8 ra-1
UN 10.44.5.3 112.3 KiB 256 100.0% 43c8ed0a-bf87-44da-ac6d-fc5ece8298f5 ra-1
?. Configure dc2-cluster as active cluster
# dc2
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)
?. To install dc1 cluster Root CA certificate and key to the dc2:
- create namespace cert-manager in dc2
- fetch certificates from dc1, apigee-ca
- put certificates to dc2
?. Create cert-manager namespace in dc2
kubectl create namespace cert-manager
?. Replicate apigee-ca key and certificate to dc2-cluster
kubectl --context=dc1-cluster get secret apigee-ca --namespace=cert-manager --export -o yaml | kubectl --context=dc2-cluster apply --namespace=cert-manager -f -
?. Installing Supporting Components at dc2
ahr-runtime-ctl apigeectl init -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl wait-for-ready -f $RUNTIME_CONFIG
Seed hosts are local cluster members. To boot up a new region an external seed host is required. Once a region boots up you need to change the seed hosts back to their local clusters in your runtime config yaml and then reapply the configuration.
cassandra:
multiRegionSeedHost: <ip-address-of-first-cs-node-in-dc1>
datacenter: "dc-2"
rack: "ra-1"
IMPORTANT: There is a bug in 1.1.x versions of Hybrid that prevents correct processing of the .cassandra.multiRegionSeedHost property. You have hit this problem if you see an error like:
Debug: Name does not resolve ERROR io.apigee.common.format.ErrorMessages - getFormattedMessage() : Unable to locate a resource bundle for error code apigee-cassandra-0.apigee-cassandra.apigee.svc.cluster.local,10.44.5.9: Name does not resolve apigee-cassandra-0: node: gke-dc2-cluster-apigee-data-999924fc-219f.
We need to patch the 4_cps-cassandra-setup.yaml file
?. vi $APIGEECTL_HOME/templates/4_cps-cassandra-setup.yaml
?. Edit line 6 from
{{- $cassSeed = (printf "%s,%s" $cassSeed .cassandra.multiRegionSeedHost) }}
to
{{- $cassSeed = (printf "%s" .cassandra.multiRegionSeedHost) }}
?. Look up the result of the nodetool status command and note the IP address of the first Cassandra node. In our case, it's 10.44.0.5. We will use this dc1 node as an external seed node for dc2.
export DC1_CS_SEED_NODE=10.44.0.5
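Instead of copying the address by hand, it can be extracted from the nodetool status output. The following is a minimal sketch; first_up_node is a hypothetical helper name, and the sample input is the dc1 output shown earlier on this page.

```shell
#!/usr/bin/env bash
# Print the address of the first node reported as UN (Up/Normal).
first_up_node() {
  awk '$1 == "UN" { print $2; exit }'
}

# Sample input copied from the nodetool status output shown earlier.
SAMPLE='Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.44.0.5 90.72 KiB 256 100.0% bcbee035-2984-4e09-bec3-d117fbbbf80d ra-1
UN 10.44.3.3 94.15 KiB 256 100.0% 744e362b-d911-470d-8cd0-ff5dec0da4d8 ra-1
UN 10.44.5.3 112.3 KiB 256 100.0% 43c8ed0a-bf87-44da-ac6d-fc5ece8298f5 ra-1'

DC1_CS_SEED_NODE=$(printf '%s\n' "$SAMPLE" | first_up_node)
echo "$DC1_CS_SEED_NODE"

# Against a live cluster (drop -it so that the output can be piped):
#   export DC1_CS_SEED_NODE=$(kubectl --context dc1-cluster -n apigee \
#     exec apigee-cassandra-0 -- nodetool status | first_up_node)
```

Check the captured value with echo before merging it into the runtime config.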
?. Add multiRegionSeedHost, datacenter, and rack properties to the dc2-runtime.yaml config file. The dc2 environment must be the active one.
echo $CLUSTER
yq m -i $RUNTIME_CONFIG - <<EOF
cassandra:
multiRegionSeedHost: $DC1_CS_SEED_NODE
datacenter: "dc-2"
rack: "ra-1"
EOF
?. Install the cassandra component into dc2
ahr-runtime-ctl apigeectl -c cassandra apply -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl -c cassandra wait-for-ready -f $RUNTIME_CONFIG
You can now see 6 PVCs on the Storage page of Kubernetes Engine.
?. Run the nodetool status command at dc1 to see that the Cassandra ring now has 3 nodes in each dc.
kubectl --context dc1-cluster -n apigee exec -it apigee-cassandra-0 -- nodetool status
Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.44.0.5 426.04 KiB 256 100.0% bcbee035-2984-4e09-bec3-d117fbbbf80d ra-1
UN 10.44.3.3 427.08 KiB 256 100.0% 744e362b-d911-470d-8cd0-ff5dec0da4d8 ra-1
UN 10.44.5.3 420.85 KiB 256 100.0% 43c8ed0a-bf87-44da-ac6d-fc5ece8298f5 ra-1
Datacenter: dc-2
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.32.5.3 79.04 KiB 256 0.0% 17376e22-12ba-4b67-9b26-e0772cad2276 ra-1
UN 10.32.2.4 106.69 KiB 256 0.0% 8372f1a9-acae-418d-8b6f-abb37cd46dea ra-1
UN 10.32.0.4 79.04 KiB 256 0.0% 0f558429-f3ec-4cd7-ad34-b6e783e7d29a ra-1
?. Create an interactive container. The bash shell will start.
kubectl run -i --tty --restart=Never --rm --image google/apigee-hybrid-cassandra-client:1.0.0 cqlsh
?. At the container bash prompt, execute cqlsh utility. Use
cqlsh apigee-cassandra-0.apigee-cassandra.apigee.svc.cluster.local -u ddl_user --ssl
# Password: iloveapis123
?. At the cqlsh prompt, run the following commands to change replication and to check the keyspace state before and after.
NOTE: A keyspace name contains the project id with dashes replaced by underscores. Because the CQL statements are executed inside cqlsh, shell variable expansion is not available; replace the project id in each keyspace name manually.
Example: for kvm_$PROJECT_hybrid and the emea-cs-hybrid-demo6 project, use kvm_emea_cs_hybrid_demo6_hybrid.
SELECT * from system_schema.keyspaces;
ALTER KEYSPACE cache_emea_cs_hybrid_demo6_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE kms_emea_cs_hybrid_demo6_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE kvm_emea_cs_hybrid_demo6_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE perses WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE quota_emea_cs_hybrid_demo6_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
SELECT * from system_schema.keyspaces;
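Because cqlsh cannot expand shell variables, one option is to generate the ALTER KEYSPACE statements in a regular shell first and then paste them at the cqlsh prompt. The sketch below assumes the same five keyspaces and replication settings as above; alter_keyspaces_cql is a hypothetical helper and not part of ahr.

```shell
#!/usr/bin/env bash
# Print ALTER KEYSPACE statements for a given project id.
# Dashes in the project id become underscores in the keyspace names.
alter_keyspaces_cql() {
  project_id=$1
  suffix="$(printf '%s' "$project_id" | tr - _)_hybrid"
  replication="{'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3}"
  for ks in "cache_$suffix" "kms_$suffix" "kvm_$suffix" perses "quota_$suffix"; do
    echo "ALTER KEYSPACE $ks WITH replication = $replication;"
  done
}

CQL=$(alter_keyspaces_cql emea-cs-hybrid-demo6)
echo "$CQL"
```

Run the function with your own project id, for example alter_keyspaces_cql "$PROJECT", and compare the keyspace names with the SELECT output before applying the statements.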
?. Exit from cqlsh and from the cqlsh container.
exit
exit
?. Rebuild the nodes in dc2, streaming data from dc-1.
kubectl --context=dc2-cluster -n apigee exec apigee-cassandra-0 -- nodetool rebuild dc-1
kubectl --context=dc2-cluster -n apigee exec apigee-cassandra-1 -- nodetool rebuild dc-1
kubectl --context=dc2-cluster -n apigee exec apigee-cassandra-2 -- nodetool rebuild dc-1
?. You can verify the rebuild process using the logs -f command. Example for the first CS node:
kubectl --context=dc2-cluster -n apigee logs apigee-cassandra-0 -f
...
INFO 22:54:33 rebuild from dc: dc-1, (All keyspaces), (All tokens)
INFO 22:54:34 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Executing streaming plan for Rebuild
INFO 22:54:34 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Starting streaming to /10.44.0.5
INFO 22:54:36 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8, ID#0] Beginning stream session with /10.44.0.5
INFO 22:54:37 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8 ID#0] Prepare completed. Receiving 6 files(5.122KiB), sending 0 files(0.000KiB)
INFO 22:54:38 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Session with /10.44.0.5 is complete
INFO 22:54:38 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] All sessions completed
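For scripted runs, the log can be checked for the completion message instead of being watched by eye. This is a minimal sketch; rebuild_finished is a hypothetical helper, and it only looks for the "All sessions completed" line from the log excerpt above, so an older streaming session in the same log would also match.

```shell
#!/usr/bin/env bash
# Succeeds when the log read from stdin reports a completed streaming plan.
rebuild_finished() {
  grep -q 'All sessions completed'
}

# Sample lines copied from the log excerpt above.
SAMPLE_LOG='INFO 22:54:33 rebuild from dc: dc-1, (All keyspaces), (All tokens)
INFO 22:54:34 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Executing streaming plan for Rebuild
INFO 22:54:38 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Session with /10.44.0.5 is complete
INFO 22:54:38 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] All sessions completed'

if printf '%s\n' "$SAMPLE_LOG" | rebuild_finished; then
  REBUILD_STATE=finished
else
  REBUILD_STATE=pending
fi
echo "$REBUILD_STATE"

# Against a live cluster (without -f, so that the command returns):
#   kubectl --context=dc2-cluster -n apigee logs apigee-cassandra-0 | rebuild_finished
```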
TIP: Something might go wrong with your Cassandra install, and you would need to rebuild the second cluster. I know, because that's what happened to me.
In this case, you need to remove the CS PVCs and repair the Cassandra topology by removing the nodes that correspond to the deleted PVCs.
To do this, run nodetool status on dc1 and note the UUIDs of the non-existent nodes.
Execute the nodetool removenode operation to clean the topology, e.g.:
kubectl --context dc1-cluster -n apigee exec -it apigee-cassandra-0 -- nodetool removenode 0f558429-f3ec-4cd7-ad34-b6e783e7d29a
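When several nodes have to be removed, their host IDs can be collected from the nodetool status output. The sketch below is an assumption-laden illustration: down_node_ids is a hypothetical helper, and the sample is the dc-2 output from this page hand-edited so that one node appears as DN (Down/Normal).

```shell
#!/usr/bin/env bash
# Print the Host ID of every node whose status starts with D (down).
down_node_ids() {
  awk '$1 ~ /^D[NLJM]$/' |
    grep -Eo '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'
}

# Hand-edited sample: the last node is marked DN for illustration.
SAMPLE='Datacenter: dc-2
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.32.5.3 79.04 KiB 256 0.0% 17376e22-12ba-4b67-9b26-e0772cad2276 ra-1
UN 10.32.2.4 106.69 KiB 256 0.0% 8372f1a9-acae-418d-8b6f-abb37cd46dea ra-1
DN 10.32.0.4 79.04 KiB 256 0.0% 0f558429-f3ec-4cd7-ad34-b6e783e7d29a ra-1'

DOWN_IDS=$(printf '%s\n' "$SAMPLE" | down_node_ids)
echo "$DOWN_IDS"

# Against a live cluster, each id could then be passed to removenode:
#   for id in $DOWN_IDS; do
#     kubectl --context dc1-cluster -n apigee exec apigee-cassandra-0 -- nodetool removenode "$id"
#   done
```

Review the list before removing anything; a node that is only temporarily down should not be removed from the ring.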
?. Remove ' multiRegionSeedHost: 10.44.5.9' in dc2-runtime.yaml
[ ] delete pod/apigee-cps-setup-emea-cs-hybrid-demo2
[ ] re-apply; check: apigee-cps-setup-emea-cs-hybrid-demo2 in dc2-cluster
?. Remove multiRegionSeedHost: property from .yaml file and delete/apply apigee-cps-setup* component to switch external seed node to the local datacentre seed node.
yq d -i $RUNTIME_CONFIG cassandra.multiRegionSeedHost
kubectl delete pod apigee-cps-setup-emea-cs-hybrid-demo6
ahr-runtime-ctl apigeectl -c cassandra apply -f $RUNTIME_CONFIG
?. Check the status of the ring and observe that both datacenters have replicated data correctly.
kubectl --context dc1-cluster -n apigee exec -it apigee-cassandra-0 -- nodetool status
Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.44.0.5 466.28 KiB 256 100.0% bcbee035-2984-4e09-bec3-d117fbbbf80d ra-1
UN 10.44.3.3 451.89 KiB 256 100.0% 744e362b-d911-470d-8cd0-ff5dec0da4d8 ra-1
UN 10.44.5.3 430.05 KiB 256 100.0% 43c8ed0a-bf87-44da-ac6d-fc5ece8298f5 ra-1
Datacenter: dc-2
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.32.5.4 660.88 KiB 256 100.0% 17376e22-12ba-4b67-9b26-e0772cad2276 ra-1
UN 10.32.2.5 677.26 KiB 256 100.0% 8372f1a9-acae-418d-8b6f-abb37cd46dea ra-1
UN 10.32.0.5 614.42 KiB 256 100.0% 0f558429-f3ec-4cd7-ad34-b6e783e7d29a ra-1
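The same check can be scripted for CI/CD use. This is a minimal sketch under stated assumptions: ring_is_healthy is a hypothetical helper that reads nodetool status output from stdin and succeeds only when every datacenter lists exactly the expected number of nodes and all of them are UN (Up/Normal). It does not inspect the Owns column.

```shell
#!/usr/bin/env bash
# Usage: nodetool status output | ring_is_healthy <expected nodes per datacenter>
ring_is_healthy() {
  awk -v want="$1" '
    $1 == "Datacenter:"  { dc = $2; if (!(dc in seen)) { seen[dc] = 1; ndc++ }; next }
    $1 ~ /^[UD][NLJM]$/  { if ($1 == "UN") up[dc]++; else bad++ }
    END {
      for (d in seen) if (up[d] + 0 != want + 0) bad++
      exit ((bad > 0 || ndc == 0) ? 1 : 0)
    }'
}

# Sample input copied from the nodetool status output above.
SAMPLE='Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.44.0.5 466.28 KiB 256 100.0% bcbee035-2984-4e09-bec3-d117fbbbf80d ra-1
UN 10.44.3.3 451.89 KiB 256 100.0% 744e362b-d911-470d-8cd0-ff5dec0da4d8 ra-1
UN 10.44.5.3 430.05 KiB 256 100.0% 43c8ed0a-bf87-44da-ac6d-fc5ece8298f5 ra-1
Datacenter: dc-2
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.32.5.4 660.88 KiB 256 100.0% 17376e22-12ba-4b67-9b26-e0772cad2276 ra-1
UN 10.32.2.5 677.26 KiB 256 100.0% 8372f1a9-acae-418d-8b6f-abb37cd46dea ra-1
UN 10.32.0.5 614.42 KiB 256 100.0% 0f558429-f3ec-4cd7-ad34-b6e783e7d29a ra-1'

if printf '%s\n' "$SAMPLE" | ring_is_healthy 3; then
  RING_STATE=healthy
else
  RING_STATE=unhealthy
fi
echo "$RING_STATE"
```

Against a live cluster, pipe the kubectl exec command shown above (without -it) into ring_is_healthy 3.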
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)
ahr-runtime-ctl apigeectl apply -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl wait-for-ready -f $RUNTIME_CONFIG
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)
# apigeectl apply in dc2
ahr-runtime-ctl apigeectl apply -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl wait-for-ready -f $RUNTIME_CONFIG
[ ] clear setsync
[ ] remove SAs
[ ] delete clusters
[ ] remove PVCs
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
ahr-cluster-ctl delete
)
(
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
ahr-cluster-ctl delete
)