Cilium multi cluster POC on AWS EKS

Purpose

  • Connect two Kubernetes clusters so that pods can reach each other directly via Pod IP and the clusters can back each other up.
  • Prepare for Istio multi-cluster.

EKS/ Cilium Creation

Requirement For ClusterMesh (I)

PodCIDR ranges in all clusters must be non-conflicting.

  • Challenge: EKS does not support PodCIDR allocation.
  • Solution: Combine Cilium with the AWS VPC CNI
    • Create two EKS clusters with different VPC CIDRs (10.248.0.0/18; 192.168.0.0/16)
    • Use the AWS VPC CNI to assign Pod IP addresses, which therefore fall within the VPC CIDR
    • Install and use Cilium in CNI chaining mode ("chainingMode")

Requirement For ClusterMesh (II)

etcd must be managed by Cilium using the etcd-operator; use a TLS-protected etcd cluster with Cilium.

  • Solution: Install Cilium with managed etcd

Command

# For EKS creation, you can use Terraform or eksctl, specifying the VPC CIDR range.
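# For example, with eksctl (a sketch only; cluster name, region and node count are
# illustrative, and flags may differ across eksctl versions):
$eksctl create cluster --name ctryc2 --region us-west-2 --vpc-cidr 10.248.0.0/18 --nodes 3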

# For the Cilium installation, you can use the following:
$curl -LO https://github.com/cilium/cilium/archive/v1.6.8.tar.gz
$tar xzvf v1.6.8.tar.gz
$cd cilium-1.6.8/
$cd install/kubernetes/
$helm3 template cilium \
    --namespace kube-system \
    --version 1.6.8 \
    --set global.etcd.enabled=true \
    --set global.etcd.managed=true \
    --set global.cni.chainingMode=aws-cni \
    --set global.masquerade=false \
    --set global.tunnel=disabled \
    --set global.nodeinit.enabled=true \
    > cilium.yaml
$cat cilium.yaml
$kubectl create -f cilium.yaml
$kubectl get pods -A -o wide
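
# (Optional) Verify that the chained Cilium and the managed etcd come up; the label below
# matches the selector used later for the external etcd service. The pod name is a placeholder.
$kubectl -n kube-system get pods -l etcd_cluster=cilium-etcd
$kubectl -n kube-system exec -ti <cilium-pod-name> -- cilium status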

EKS VPC Peering

Requirement For ClusterMesh:

Nodes in all clusters must have IP connectivity to each other, and the network between clusters must allow inter-cluster communication.

  • Solution (see the AWS CLI sketch below):
    • Create an AWS VPC peering connection between the two VPCs
    • Update the route tables of each VPC so that the other VPC's CIDR is routed over the peering connection
    • Change the inbound rules of the security group bound to each cluster's nodes to allow "All Traffic" from the peer's internal range (10.248.0.0/18; 192.168.0.0/16)
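
    A hedged AWS CLI sketch of these steps (all resource IDs are placeholders, and flags may vary with your AWS CLI version):
    $aws ec2 create-vpc-peering-connection --vpc-id vpc-AAAA --peer-vpc-id vpc-BBBB
    $aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-XXXX
    $aws ec2 create-route --route-table-id rtb-AAAA --destination-cidr-block 192.168.0.0/16 --vpc-peering-connection-id pcx-XXXX
    $aws ec2 create-route --route-table-id rtb-BBBB --destination-cidr-block 10.248.0.0/18 --vpc-peering-connection-id pcx-XXXX
    $aws ec2 authorize-security-group-ingress --group-id sg-AAAA --protocol all --cidr 192.168.0.0/16
    $aws ec2 authorize-security-group-ingress --group-id sg-BBBB --protocol all --cidr 10.248.0.0/18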

Grant Cilium the Required Privileges

Requirement For ClusterMesh (I):

Cilium interacts with the Linux kernel to install BPF programs which then perform networking tasks and implement security rules. In order to install BPF programs system-wide, CAP_SYS_ADMIN privileges are required. These privileges must be granted to cilium-agent.

  • Solution: Edit the cilium DaemonSet and add the SYS_ADMIN capability (see the patch sketch below).
  • Command: kubectl edit ds -n kube-system cilium; make sure SYS_ADMIN is added, as below:
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
                - SYS_ADMIN
                - SYS_MODULE
              privileged: true
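
    A non-interactive alternative (a hedged sketch; it assumes the agent is the first container in the DaemonSet and that a capabilities.add list already exists):
    $kubectl -n kube-system patch ds cilium --type=json \
      -p='[{"op":"add","path":"/spec/template/spec/containers/0/securityContext/capabilities/add/-","value":"SYS_ADMIN"}]'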
    

Requirement For ClusterMesh (II):

Cilium requires access to the host networking namespace.

  • Solution: Edit the cilium DaemonSet so that it runs in the host network namespace (see the check below).
  • Command: kubectl edit ds -n kube-system cilium; make sure the following exists:
    hostNetwork: true
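
    A quick way to confirm this without opening the editor (a hedged check using jsonpath; it should print "true"):
    $kubectl -n kube-system get ds cilium -o jsonpath='{.spec.template.spec.hostNetwork}'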
    

Clustermesh Setup

Specify unique cluster name and ID

For each cluster, make the cluster name and ID unique.

kubectl -n kube-system edit cm cilium-config
[ ... add/edit ... ]
cluster-name: cluster1
cluster-id: "1"
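
The same edit can also be applied non-interactively (a hedged sketch; use cluster2 / "2" in the second cluster):

kubectl -n kube-system patch cm cilium-config --type merge -p '{"data":{"cluster-name":"cluster1","cluster-id":"1"}}'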

Expose the Cilium etcd to other clusters

Create an external etcd service in the kube-system namespace of each cluster by applying the YAML below.

apiVersion: v1
kind: Service
metadata:
  name: cilium-etcd-external
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
  type: LoadBalancer
  ports:
  - port: 2379
  selector:
    app: etcd
    etcd_cluster: cilium-etcd
    io.cilium/app: etcd-operator
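
After applying it, confirm that the service is assigned an internal load balancer address (a hedged check):

kubectl -n kube-system get svc cilium-etcd-external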

Extract the TLS keys and generate the etcd configuration

  • Clone the cilium/clustermesh-tools repository.
    $git clone https://github.com/cilium/clustermesh-tools.git
    $cd clustermesh-tools
    
  • Extract the TLS certificate, key and root CA authority for each cluster.
    $./extract-etcd-secrets.sh
    
  • Repeat this step for all clusters and copy the results into the same folder.
  • Generate a single Kubernetes secret from all the keys and certificates extracted.
    $./generate-secret-yaml.sh > clustermesh.yaml
    
  • Ensure that the etcd service names can be resolved.
    $./generate-name-mapping.sh > ds.patch
    
    #The ds.patch will look like:
    #spec:
    #  template:
    #    spec:
    #      hostAliases:
    #      - ip: "10.138.0.18"
    #        hostnames:
    #        - cluster1.mesh.cilium.io
    #      - ip: "10.138.0.19"
    #        hostnames:
    #        - cluster2.mesh.cilium.io
    
    #Apply it
    $kubectl -n kube-system patch ds cilium -p "$(cat ds.patch)"
    
  • Apply clustermesh.yaml, which contains the prepared secrets/keys.
    kubectl -n kube-system apply -f clustermesh.yaml
    
  • Restart all pods of cilium, cilium-operator, cilium-etcd, etcd-operator and coredns.
    kubectl -n kube-system delete pods --all
    
  • Wait 10 - 40 minutes, until all pods are running and no longer restarting.
  • Check that cilium node list returns the full list of nodes discovered across both clusters.
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ k exec -ti cilium-scwh8 -n ks -- cilium node list
    Name                                                  IPv4 Address     Endpoint CIDR   IPv6 Address   Endpoint CIDR
    ctryc2/ip-10-248-11-160.us-west-2.compute.internal    10.248.11.160    10.160.0.0/16                  
    ctryc2/ip-10-248-5-152.us-west-2.compute.internal     10.248.5.152     10.136.0.0/16                  
    ctryc2/ip-10-248-7-35.us-west-2.compute.internal      10.248.7.35      10.35.0.0/16                   
    ctryc3/ip-192-168-36-48.us-west-2.compute.internal    192.168.36.48    10.48.0.0/16                   
    ctryc3/ip-192-168-4-254.us-west-2.compute.internal    192.168.4.254    10.254.0.0/16                  
    ctryc3/ip-192-168-67-107.us-west-2.compute.internal   192.168.67.107   10.107.0.0/16                  
    ctryc3/ip-192-168-84-58.us-west-2.compute.internal    192.168.84.58    10.58.0.0/16 
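
  • As an extra sanity check, the generated secret should be present in each cluster (a hedged check; the secret name assumed here is the one produced by generate-secret-yaml.sh):
    $kubectl -n kube-system get secret cilium-clustermesh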
    

Test

  • A pod in one cluster can access a pod in the other cluster directly via its Pod IP

    # Pods IP in one cluster (tryc2)
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ k get pods -n legacy -o wide
    NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
    httpbin-5446f4d9b4-x4jfw   1/1     Running   0          18h   10.248.0.25    ip-10-248-5-152.us-west-2.compute.internal   <none>           <none>
    sleep-5bbf6b4f77-5zp2t     1/1     Running   0          18h   10.248.6.155   ip-10-248-7-35.us-west-2.compute.internal    <none>           <none>
    
    # Pods IP in another cluster (tryc3)
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k get pods -n legacy -o wide
    NAME                       READY   STATUS    RESTARTS   AGE   IP               NODE                                          NOMINATED NODE   READINESS GATES
    httpbin-5446f4d9b4-6hvnb   1/1     Running   0          18h   192.168.13.185   ip-192-168-4-254.us-west-2.compute.internal   <none>           <none>
    sleep-5bbf6b4f77-v7dfq     1/1     Running   0          18h   192.168.68.184   ip-192-168-84-58.us-west-2.compute.internal   <none>           <none>
    
    # The pod in cluster tryc2 can access the pod ip in cluster tryc3 directly
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ k exec -ti sleep-5bbf6b4f77-5zp2t  -n legacy -- curl 192.168.13.185/ip
    {
      "origin": "10.248.7.35"
    }
    
    # The pod in cluster tryc3 can access the pod ip in cluster tryc2 directly
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k exec -ti sleep-5bbf6b4f77-v7dfq  -n legacy -- curl 10.248.0.25/ip
    {
      "origin": "192.168.84.58"
    }
    
  • Global services are supported, pointing to pods across the two clusters. You can implement this by adding io.cilium/global-service: "true" to the Service annotations (see the sketch below the example).

    # Deploy in cluster1
    kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.7.1/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml
    
    # Deploy in cluster2
    kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.7.1/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml
    
    # Access the service; responses come from pods in different clusters
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k exec -ti x-wing-5db7fc5c8f-2xhxj -- curl rebel-base
    {"Galaxy": "Alderaan", "Cluster": "Cluster-2"}
    tryc3@ip-172-31-0-31:~/clustermesh-tools$ k exec -ti x-wing-5db7fc5c8f-2xhxj -- curl rebel-base
    {"Galaxy": "Alderaan", "Cluster": "Cluster-1"}
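
  • A minimal sketch of such a global Service (name, port and selector follow the rebel-base example above and are otherwise illustrative). The same Service, carrying the annotation, must be deployed in both clusters:

    apiVersion: v1
    kind: Service
    metadata:
      name: rebel-base
      annotations:
        io.cilium/global-service: "true"
    spec:
      ports:
      - port: 80
      selector:
        name: rebel-base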
    

Debug

  • Refer to http://docs.cilium.io/en/stable/gettingstarted/clustermesh/#troubleshooting. Use a command like the one below to make sure each cilium pod has connected to the remote cluster:
    tryc2@ip-172-31-0-31:~/istio-install/istio-1.3.8$ kubectl get pods -n kube-system -l  k8s-app=cilium |grep cilium|awk '{print $1}'|xargs -i sh -c 'echo "\n";kubectl logs {} -n kube-system|grep " remote "'
    
    level=info msg="New remote cluster configuration" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Connection to remote cluster established" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Established connection to remote etcd" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus="etcd: 1/1 connected, lease-ID=1cd771353dce2904, lock lease-ID=1cd771353dce2906, has-quorum=true: https://ctryc3.mesh.cilium.io:2379 - 3.3.12 (Leader)" subsys=clustermesh
    
    level=info msg="New remote cluster configuration" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Connection to remote cluster established" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Established connection to remote etcd" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus="etcd: 1/1 connected, lease-ID=6da571353d29d95b, lock lease-ID=6da571353d29d95d, has-quorum=true: https://ctryc3.mesh.cilium.io:2379 - 3.3.12 (Leader)" subsys=clustermesh
    
    level=info msg="New remote cluster configuration" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Connection to remote cluster established" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus= subsys=clustermesh
    level=info msg="Established connection to remote etcd" clusterName=ctryc3 config=/var/lib/cilium/clustermesh/ctryc3 kvstoreErr="<nil>" kvstoreStatus="etcd: 1/1 connected, lease-ID=73cd71353d503821, lock lease-ID=73cd71353d503823, has-quorum=true: https://ctryc3.mesh.cilium.io:2379 - 3.3.12" subsys=clustermesh
    
  • If you find that a cilium pod is not healthy, restart it manually (see the sketch below).
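
    A hedged sketch of such a manual restart (the pod name is taken from the output above and is illustrative; the DaemonSet recreates the pod automatically):
    $kubectl -n kube-system delete pod cilium-scwh8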

Concerns / Needs Improvement

I think this setup is focused on a POC and still needs improvement for a production environment.

  1. etcd nodes operated by the etcd-operator do not use persistent storage. Once the etcd cluster loses quorum, it is automatically re-created by the cilium-etcd-operator. Cilium will automatically recover and re-create all state in etcd. This operation can take a couple of seconds (John: it may even take 10 minutes to recover after all kube-system pods restart) and may cause minor disruptions as ongoing distributed locks are invalidated and security identities have to be re-allocated. (http://docs.cilium.io/en/stable/gettingstarted/k8s-install-etcd-operator/#limitations)
  2. A cilium pod may sometimes become unhealthy or fail to join the clustermesh until you restart it manually.

Reference
