Cilium multi cluster POC on AWS EKS

Purpose

  • Connect two Kubernetes clusters so that pods can reach each other directly via Pod IP and the clusters can back each other up.
  • Prepare for Istio multi-cluster.

EKS/ Cilium Creation

Requirement For ClusterMesh (I)

PodCIDR ranges in all clusters must be non-conflicting.

  • Challenge: EKS does not support PodCIDR allocation.
  • Solution: Combine Cilium with the AWS VPC CNI
    • Create two EKS clusters with different VPC CIDRs (10.248.0.0/18; 192.168.0.0/16)
    • Use the AWS VPC CNI to assign Pod IP addresses, which therefore fall within the VPC CIDR
    • Install and use Cilium in CNI chaining mode ("chainingMode")

Requirement For ClusterMesh (II)

etcd must be managed by Cilium using the etcd-operator; use a TLS-protected etcd cluster with Cilium.

  • Solution: Install Cilium with managed etcd

Command

# For EKS creation, you can use Terraform or eksctl, specifying the VPC CIDR range.
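# For example, with eksctl (a sketch only; cluster name, region and node count are
# illustrative, and flags may differ across eksctl versions):
$eksctl create cluster --name ctryc2 --region us-west-2 --vpc-cidr 10.248.0.0/18 --nodes 3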

# For the Cilium installation, you can use the following:
$curl -LO https://github.com/cilium/cilium/archive/v1.6.8.tar.gz
$tar xzvf v1.6.8.tar.gz
$cd cilium-1.6.8/
$cd install/kubernetes/
$helm3 template cilium \
    --namespace kube-system \
    --version 1.6.8 \
    --set global.etcd.enabled=true \
    --set global.etcd.managed=true \
    --set global.cni.chainingMode=aws-cni \
    --set global.masquerade=false \
    --set global.tunnel=disabled \
    --set global.nodeinit.enabled=true \
    > cilium.yaml
$cat cilium.yaml
$kubectl create -f cilium.yaml
$kubectl get pods -A -o wide
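
# (Optional) Verify that the chained Cilium and the managed etcd come up; the label below
# matches the selector used later for the external etcd service. The pod name is a placeholder.
$kubectl -n kube-system get pods -l etcd_cluster=cilium-etcd
$kubectl -n kube-system exec -ti <cilium-pod-name> -- cilium status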

EKS VPC Peering

Requirement For ClusterMesh:

Nodes in all clusters must have IP connectivity to each other, and the network between clusters must allow inter-cluster communication.

  • Solution (see the AWS CLI sketch below):
    • Create an AWS VPC peering connection between the two VPCs
    • Update the route tables of each VPC so that the other VPC's CIDR is routed over the peering connection
    • Change the inbound rules of the security group bound to each cluster's nodes to allow "All Traffic" from the peer's internal range (10.248.0.0/18; 192.168.0.0/16)
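
    A hedged AWS CLI sketch of these steps (all resource IDs are placeholders, and flags may vary with your AWS CLI version):
    $aws ec2 create-vpc-peering-connection --vpc-id vpc-AAAA --peer-vpc-id vpc-BBBB
    $aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-XXXX
    $aws ec2 create-route --route-table-id rtb-AAAA --destination-cidr-block 192.168.0.0/16 --vpc-peering-connection-id pcx-XXXX
    $aws ec2 create-route --route-table-id rtb-BBBB --destination-cidr-block 10.248.0.0/18 --vpc-peering-connection-id pcx-XXXX
    $aws ec2 authorize-security-group-ingress --group-id sg-AAAA --protocol all --cidr 192.168.0.0/16
    $aws ec2 authorize-security-group-ingress --group-id sg-BBBB --protocol all --cidr 10.248.0.0/18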

Grant Cilium the Required Privileges

Requirement For ClusterMesh (I):

Cilium interacts with the Linux kernel to install BPF programs which then perform networking tasks and implement security rules. In order to install BPF programs system-wide, CAP_SYS_ADMIN privileges are required. These privileges must be granted to cilium-agent.

  • Solution: Edit the cilium DaemonSet and add the SYS_ADMIN capability (see the patch sketch below).
  • Command: kubectl edit ds -n kube-system cilium; make sure SYS_ADMIN is added, as below:
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
                - SYS_ADMIN
                - SYS_MODULE
              privileged: true
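
    A non-interactive alternative (a hedged sketch; it assumes the agent is the first container in the DaemonSet and that a capabilities.add list already exists):
    $kubectl -n kube-system patch ds cilium --type=json \
      -p='[{"op":"add","path":"/spec/template/spec/containers/0/securityContext/capabilities/add/-","value":"SYS_ADMIN"}]'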
    

Requirement For ClusterMesh (II):

Cilium requires access to the host networking namespace.

  • Solution: Edit the cilium DaemonSet so that it runs in the host network namespace (see the check below).
  • Command: kubectl edit ds -n kube-system cilium; make sure the following exists:
    hostNetwork: true
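
    A quick way to confirm this without opening the editor (a hedged check using jsonpath; it should print "true"):
    $kubectl -n kube-system get ds cilium -o jsonpath='{.spec.template.spec.hostNetwork}'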
    

Clustermesh Setup

Specify unique cluster name and ID

For each cluster, make the cluster name and ID unique.

kubectl -n kube-system edit cm cilium-config
[ ... add/edit ... ]
cluster-name: cluster1
cluster-id: "1"
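
The same edit can also be applied non-interactively (a hedged sketch; use cluster2 / "2" in the second cluster):

kubectl -n kube-system patch cm cilium-config --type merge -p '{"data":{"cluster-name":"cluster1","cluster-id":"1"}}'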

Expose the Cilium etcd to other clusters

Create an external etcd service in the kube-system namespace of each cluster by applying the YAML below.

apiVersion: v1
kind: Service
metadata:
  name: cilium-etcd-external
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
  type: LoadBalancer
  ports:
  - port: 2379
  selector:
    app: etcd
    etcd_cluster: cilium-etcd
    io.cilium/app: etcd-operator
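
After applying it, confirm that the service is assigned an internal load balancer address (a hedged check):

kubectl -n kube-system get svc cilium-etcd-external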

Extract the TLS keys and generate the etcd configuration

  • Clone the cilium/clustermesh-tools repository.
    $git clone https://github.com/cilium/clustermesh-tools.git
    $cd clustermesh-tools
    
  • Extract the TLS certificate, key and root CA authority for each cluster.
    $./extract-etcd-secrets.sh
    
  • Repeat this step for all clusters and copy the results into the same folder.
  • Generate a single Kubernetes secret from all the keys and certificates extracted.
    $./generate-secret-yaml.sh > clustermesh.yaml
    
  • Ensure that the etcd service names can be resolved.
    $./generate-name-mapping.sh > ds.patch
    
    #The ds.patch will look like:
    #spec:
    #  template:
    #    spec:
    #      hostAliases:
    #      - ip: "10.138.0.18"
    #        hostnames:
    #        - cluster1.mesh.cilium.io
    #      - ip: "10.138.0.19"
    #        hostnames:
    #        - cluster2.mesh.cilium.io
    
    #Apply it
    $kubectl -n kube-system patch ds cilium -p "$(cat ds.patch)"
    
  • Apply clustermesh.yaml, which contains the prepared secrets/keys.
    kubectl -n kube-system apply -f clustermesh.yaml
    
  • Restart all pods of cilium, cilium-operator, cilium-etcd, etcd-operator and coredns.
    kubectl -n kube-system delete pods --all
    
  • Wait 10 - 40 minutes, until all pods are running and no longer restarting.
  • Check that cilium node list returns the full list of nodes discovered across both clusters.
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ k exec -ti cilium-scwh8 -n ks -- cilium node list
    Name                                                  IPv4 Address     Endpoint CIDR   IPv6 Address   Endpoint CIDR
    ctryc2/ip-10-248-11-160.us-west-2.compute.internal    10.248.11.160    10.160.0.0/16                  
    ctryc2/ip-10-248-5-152.us-west-2.compute.internal     10.248.5.152     10.136.0.0/16                  
    ctryc2/ip-10-248-7-35.us-west-2.compute.internal      10.248.7.35      10.35.0.0/16                   
    ctryc3/ip-192-168-36-48.us-west-2.compute.internal    192.168.36.48    10.48.0.0/16                   
    ctryc3/ip-192-168-4-254.us-west-2.compute.internal    192.168.4.254    10.254.0.0/16                  
    ctryc3/ip-192-168-67-107.us-west-2.compute.internal   192.168.67.107   10.107.0.0/16                  
    ctryc3/ip-192-168-84-58.us-west-2.compute.internal    192.168.84.58    10.58.0.0/16 
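
  • As an extra sanity check, the generated secret should be present in each cluster (a hedged check; the secret name assumed here is the one produced by generate-secret-yaml.sh):
    $kubectl -n kube-system get secret cilium-clustermesh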
    

Test

  • A pod in one cluster can access a pod in the other cluster directly via its Pod IP

    # Pods IP in one cluster (tryc2)
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ k get pods -n legacy -o wide
    NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
    httpbin-5446f4d9b4-x4jfw   1/1     Running   0          18h   10.248.0.25    ip-10-248-5-152.us-west-2.compute.internal   <none>           <none>
    sleep-5bbf6b4f77-5zp2t     1/1     Running   0          18h   10.248.6.155   ip-10-248-7-35.us-west-2.compute.internal    <none>           <none>
    
    # Pods IP in another cluster (tryc3)
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k get pods -n legacy -o wide
    NAME                       READY   STATUS    RESTARTS   AGE   IP               NODE                                          NOMINATED NODE   READINESS GATES
    httpbin-5446f4d9b4-6hvnb   1/1     Running   0          18h   192.168.13.185   ip-192-168-4-254.us-west-2.compute.internal   <none>           <none>
    sleep-5bbf6b4f77-v7dfq     1/1     Running   0          18h   192.168.68.184   ip-192-168-84-58.us-west-2.compute.internal   <none>           <none>
    
    # The pod in cluster tryc2 can access the pod ip in cluster tryc3 directly
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ k exec -ti sleep-5bbf6b4f77-5zp2t  -n legacy -- curl 192.168.13.185/ip
    {
      "origin": "10.248.7.35"
    }
    
    # The pod in cluster tryc3 can access the pod ip in cluster tryc2 directly
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k exec -ti sleep-5bbf6b4f77-v7dfq  -n legacy -- curl 10.248.0.25/ip
    {
      "origin": "192.168.84.58"
    }
    
  • Global services are supported, pointing to pods across the two clusters. You can implement this by adding io.cilium/global-service: "true" to the Service annotations (see the sketch below the example).

    # Deploy in cluster1
    kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.7.1/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml
    
    # Deploy in cluster2
    kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.7.1/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml
    
    # Access the service; responses come from pods in different clusters
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k exec -ti x-wing-5db7fc5c8f-2xhxj -- curl rebel-base
    {"Galaxy": "Alderaan", "Cluster": "Cluster-2"}
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k exec -ti x-wing-5db7fc5c8f-2xhxj -- curl rebel-base
    {"Galaxy": "Alderaan", "Cluster": "Cluster-1"}
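
  • A minimal sketch of such a global Service (name, port and selector follow the rebel-base example above and are otherwise illustrative). The same Service, carrying the annotation, must be deployed in both clusters:

    apiVersion: v1
    kind: Service
    metadata:
      name: rebel-base
      annotations:
        io.cilium/global-service: "true"
    spec:
      ports:
      - port: 80
      selector:
        name: rebel-base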
    

Debug

  • Refer to http://docs.cilium.io/en/stable/gettingstarted/clustermesh/#troubleshooting. Use a command like the one below to make sure each cilium pod has connected to the remote cluster:
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ kubectl get pods -n kube-system -l  k8s-app=cilium |grep cilium|awk '{print $1}'|xargs -i sh -c 'echo "\n";kubectl logs {} -n kube-system|grep " remote "'
    
    level=info msg="New remote cluster configuration" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Connection to remote cluster established" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Established connection to remote etcd" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus="etcd: 1/1 connected, lease-ID=1cd771353dce2904, lock lease-ID=1cd771353dce2906, has-quorum=true: https://ctryc3.mesh.cilium.io:2379 - 3.3.12 (Leader)" subsys=clustermesh
    
    level=info msg="New remote cluster configuration" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Connection to remote cluster established" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Established connection to remote etcd" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus="etcd: 1/1 connected, lease-ID=6da571353d29d95b, lock lease-ID=6da571353d29d95d, has-quorum=true: https://ctryc3.mesh.cilium.io:2379 - 3.3.12 (Leader)" subsys=clustermesh
    
    level=info msg="New remote cluster configuration" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Connection to remote cluster established" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Established connection to remote etcd" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus="etcd: 1/1 connected, lease-ID=73cd71353d503821, lock lease-ID=73cd71353d503823, has-quorum=true: https://ctryc3.mesh.cilium.io:2379 - 3.3.12" subsys=clustermesh
    
  • If you find that a cilium pod is not healthy, restart it manually (see the sketch below).
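
    A hedged sketch of such a manual restart (the pod name is taken from the output above and is illustrative; the DaemonSet recreates the pod automatically):
    $kubectl -n kube-system delete pod cilium-scwh8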

Concerns / Needs Improvement

I think this setup is focused on a POC and still needs improvement for a production environment.

  1. etcd nodes operated by the etcd-operator do not use persistent storage. Once the etcd cluster loses quorum, it is automatically re-created by the cilium-etcd-operator. Cilium will automatically recover and re-create all state in etcd. This operation can take a couple of seconds (John: it may even take 10 minutes to recover after all kube-system pods restart) and may cause minor disruptions as ongoing distributed locks are invalidated and security identities have to be re-allocated. (http://docs.cilium.io/en/stable/gettingstarted/k8s-install-etcd-operator/#limitations)
  2. A cilium pod may sometimes become unhealthy or fail to join the clustermesh until you restart it manually.

Reference
