Deploy FATE on TKG with KubeFATE

A TKG (Tanzu Kubernetes Grid) cluster is a platform based on open-source Kubernetes that is built, signed, and supported by VMware. A Tanzu Kubernetes cluster can be configured and run on the Supervisor Cluster by using the Tanzu Kubernetes Grid Service; the Supervisor Cluster is a vSphere cluster with vSphere with Tanzu enabled.

Now, let's start the tutorial.

Environment

Item         Version
Kubernetes   v1.18.15+vmware.1
KubeFATE     v1.6.0-a

Cluster Planning

Work List

We will install FATE on two clusters, but all operations will be performed on the same host. Therefore, two working directories need to be created, named PartyA and PartyB respectively.

(PartyA)$ # This represents the command running in PartyA
(PartyB)$ # This represents the command running in PartyB
$ # This represents a command run in both working directories
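
For example, the two working directories could be prepared as follows (the layout is only a suggestion; any two separate directories on the host will do):

$ mkdir -p PartyA PartyB   # one working directory per party, both on the same host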

Information

Item                  PartyA               PartyB
PartyID               9999                 10000
KubeFATE serviceurl   partya.example.com   partyb.example.com
Kubernetes context    tkc-1                tkc-2
Ingress IP            192.168.18.131       192.168.20.135

The Kubernetes context is obtained by logging in to the TKC; the ingress IP can be viewed after the ingress-controller is installed.

Prepare

Check Environment

Here we have prepared two TKC clusters. Log in with kubectl vsphere login and then check the Kubernetes version.
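
For reference, logging in to each TKC with the vSphere kubectl plugin looks roughly like the following; the Supervisor address, username, and vSphere namespace are placeholders for your own environment:

(PartyA)$ kubectl vsphere login --server=<supervisor-ip> \
    --vsphere-username <user>@vsphere.local \
    --tanzu-kubernetes-cluster-name tkc-1 \
    --tanzu-kubernetes-cluster-namespace <vsphere-namespace> \
    --insecure-skip-tls-verify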

(PartyA)$ kubectl --context=tkc-1 version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.15+vmware.1", GitCommit:"9a9f80f2e0b85ce6280dd9b9f1e952a7dbf49087", GitTreeState:"clean", BuildDate:"2021-01-19T22:59:52Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
(PartyB)$ kubectl --context=tkc-2 version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.15+vmware.1", GitCommit:"9a9f80f2e0b85ce6280dd9b9f1e952a7dbf49087", GitTreeState:"clean", BuildDate:"2021-01-19T22:59:52Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Prepare Image

If you cannot access Docker Hub, you need to download the images and load them onto the corresponding worker nodes, or push them to your own Harbor registry. For this installation, we pull the images directly from Docker Hub.
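
If you do have to work offline, a minimal sketch of preparing an image on a machine with Internet access looks like this; federatedai/kubefate:v1.4.1 is only an example image, check the FATE/KubeFATE v1.6.0 release notes for the full list of required images and tags:

$ docker pull federatedai/kubefate:v1.4.1                       # example image; repeat for each required image
$ docker save federatedai/kubefate:v1.4.1 | gzip > kubefate-v1.4.1.tar.gz
$ # copy the archive to the worker node (or push to your own Harbor), then load it there:
$ docker load < kubefate-v1.4.1.tar.gz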

Prepare chart

Since we can access GitHub directly, we don't need to download the chart manually. In an offline environment, you need to download the chart file fate-v1.6.0-a.tgz and upload it yourself.
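
Once the KubeFATE service is running (see the Install KubeFATE section below), the chart can typically be uploaded with the KubeFATE CLI; the commands below assume the downloaded archive is in the current working directory:

$ kubefate chart upload -f ./fate-v1.6.0-a.tgz
$ kubefate chart ls        # verify the chart is registered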

Check that Kubernetes supports an ingress-controller

Install an ingress-controller (for example ingress-nginx). After the installation is complete, obtain the ingress IP through kubectl get svc -n ingress-nginx.
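
For example, ingress-nginx can be installed from its official cloud-provider manifest; the controller version below is only an example, pick the release that matches your Kubernetes version:

(PartyA)$ kubectl --context=tkc-1 apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.46.0/deploy/static/provider/cloud/deploy.yaml
(PartyB)$ kubectl --context=tkc-2 apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.46.0/deploy/static/provider/cloud/deploy.yaml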

(PartyA)$ kubectl --context=tkc-1 get svc -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.108.42.103   192.168.18.131   80:32250/TCP,443:32437/TCP   1d
ingress-nginx-controller-admission   ClusterIP      10.99.180.187   <none>           443/TCP                      1d
(PartyB)$ kubectl --context=tkc-2 get svc -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.102.226.6     192.168.20.135   80:30036/TCP,443:30941/TCP   1d
ingress-nginx-controller-admission   ClusterIP      10.104.177.237   <none>           443/TCP                      1d

Begin

After the environment setup is finished, we can start to install KubeFATE and FATE on TKG.

Install KubeFATE

Download the installation package file.

$ curl -LO https://github.com/FederatedAI/KubeFATE/releases/download/v1.6.0/kubefate-k8s-v1.6.0.tar.gz && tar -xzf ./kubefate-k8s-v1.6.0.tar.gz
$ ls
cluster-serving.yaml  cluster-spark.yaml  cluster.yaml  config.yaml  examples  kubefate  kubefate-k8s-v1.6.0.tar.gz  kubefate.yaml  rbac-config.yaml
Install KubeFATE command line

Install the KubeFATE command tool.

$ chmod +x ./kubefate && sudo mv ./kubefate /usr/bin

*The KubeFATE command line tool is written in Golang; you can also compile it yourself.*

Install KubeFATE service

Install the KubeFATE service on the two TKC Kubernetes clusters respectively.

(PartyA)$ kubectl --context=tkc-1 apply -f ../rbac-config.yaml
(PartyB)$ kubectl --context=tkc-2 apply -f ../rbac-config.yaml
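
Assuming the stock rbac-config.yaml is used (it creates the kube-fate namespace and the service account that KubeFATE runs under), a quick sanity check is that the namespace now exists:

(PartyA)$ kubectl --context=tkc-1 get namespace kube-fate
(PartyB)$ kubectl --context=tkc-2 get namespace kube-fate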

Modify the KubeFATE serviceurl

(PartyA)$ cat ./kubefate.yaml
...
spec:
  rules:
    - host: partya.example.com
      http:
        paths:
...
(PartyB)$ cat ./kubefate.yaml
...
spec:
  rules:
    - host: partyb.example.com
      http:
        paths:
...

Write to hosts file

(PartyA)$ echo "192.168.18.131 partya.example.com" >> /etc/hosts
(PartyB)$ echo "192.168.20.135 partyb.example.com" >> /etc/hosts

If you use a private image registry, you need to modify the image-related fields in kubefate.yaml.
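
After adjusting kubefate.yaml, the KubeFATE service itself still has to be deployed from it on each cluster; with the stock manifest this is simply an apply:

(PartyA)$ kubectl --context=tkc-1 apply -f ./kubefate.yaml
(PartyB)$ kubectl --context=tkc-2 apply -f ./kubefate.yaml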

Modify the serviceurl in config.yaml.

(PartyA)$ cat config.yaml
# TODO
# persistent layer

log:
  level: info
user:
  username: admin
  password: admin

serviceurl: partya.example.com
(PartyB)$ cat config.yaml
# TODO
# persistent layer

log:
  level: info
user:
  username: admin
  password: admin

serviceurl: partyb.example.com
Check KubeFATE

Use kubefate version to check whether the KubeFATE environment is installed correctly.

(PartyA)$ kubefate version
* kubefate commandLine version=v1.4.1
* kubefate service version=v1.4.1
(PartyB)$ kubefate version
* kubefate commandLine version=v1.4.1
* kubefate service version=v1.4.1

If the kubefate service version line appears, the installation was successful.
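
If the service version is not returned, a reasonable first check is whether the KubeFATE pods are running (assuming the default kube-fate namespace created by rbac-config.yaml):

(PartyA)$ kubectl --context=tkc-1 get pods -n kube-fate
(PartyB)$ kubectl --context=tkc-2 get pods -n kube-fate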

Install FATE

The KubeFATE environment of both parties is now in place, so FATE can be installed.

Configure cluster.yaml

Our TKC supports the LoadBalancer service type, so FATE is exposed through a LoadBalancer.

(PartyA)$ cat cluster.yaml
name: fate-9999
namespace: fate-9999
chartName: fate
chartVersion: v1.6.0-a
partyId: 9999
registry: ""
imageTag: ""
pullPolicy: 
imagePullSecrets: 
- name: myregistrykey
persistence: false
istio:
  enabled: false
podSecurityPolicy:
  enabled: true                  # TKC clusters enable the PodSecurityPolicy admission controller by default, so this must be set to true.
modules:
  - rollsite
  - clustermanager
  - nodemanager
  - mysql
  - python
  - fateboard
  - client

backend: eggroll

rollsite: 
  type: LoadBalancer
  nodePort: 30091
(PartyB)$ cat cluster.yaml
name: fate-10000
namespace: fate-10000
chartName: fate
chartVersion: v1.6.0-a
partyId: 10000
registry: ""
imageTag: ""
pullPolicy: 
imagePullSecrets: 
- name: myregistrykey
persistence: false
istio:
  enabled: false
podSecurityPolicy:
  enabled: true                  # TKC clusters enable the PodSecurityPolicy admission controller by default, so this must be set to true.
modules:
  - rollsite
  - clustermanager
  - nodemanager
  - mysql
  - python
  - fateboard
  - client

backend: eggroll

rollsite: 
  type: LoadBalancer
  nodePort: 30101
Deploy cluster.yaml

Create the corresponding namespace before deploying FATE.

(PartyA)$ kubectl --context=tkc-1 create namespace fate-9999
(PartyB)$ kubectl --context=tkc-2 create namespace fate-10000

Then use the kubefate command to deploy.

(PartyA)$ kubefate cluster install -f cluster.yaml
(PartyB)$ kubefate cluster install -f cluster.yaml

Wait for the deployment to succeed.

(PartyA)$ kubefate job describe <jobID>   # You will get <jobID> when you install cluster in the previous step
(PartyB)$ kubefate job describe <jobID>   # You will get <jobID> when you install cluster in the previous step

Wait for the job status to become Success, which indicates that the deployment has succeeded.

You can also use kubefate cluster list and kubefate cluster describe <clusterID> to check the status of the cluster; a Running status indicates that the deployment was successful.
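
For example (the cluster ID is the one shown in the kubefate cluster list output):

(PartyA)$ kubefate cluster list
(PartyA)$ kubefate cluster describe <clusterID>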

Interworking configuration

Since the LoadBalancer IP is allocated dynamically by the load balancer service, the two parties' address information needs to be configured after FATE is installed.

(PartyA)$ kubectl --context=tkc-1 get svc/rollsite -n fate-9999
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
rollsite      LoadBalancer   10.103.75.93     192.168.18.132  9370:30091/TCP   12m
(PartyB)$ kubectl --context=tkc-2 get svc/rollsite -n fate-10000
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
rollsite      LoadBalancer   10.99.31.113     192.168.20.136  9370:30101/TCP   12m

Obtain the LoadBalancer IP of the rollsite service for both parties through the above commands.

Configure each other's information in cluster.yaml:

(PartyA)$ cat cluster.yaml
...
rollsite: 
  type: LoadBalancer
  nodePort: 30091
  partyList:
  - partyId: 10000
    partyIp: 192.168.20.136
    partyPort: 9370
(PartyB)$ cat cluster.yaml
...
rollsite: 
  type: LoadBalancer
  nodePort: 30101
  partyList:
  - partyId: 9999
    partyIp: 192.168.18.132
    partyPort: 9370

Then update the configuration information:

(PartyA)$ kubefate cluster update -f cluster.yaml
(PartyB)$ kubefate cluster update -f cluster.yaml

When the job status becomes Success, or the cluster status becomes Running, the update has succeeded.

Test the intercommunication status of both parties

Run the toy_example test:

(PartyA)$ kubectl --context=tkc-1 exec -it svc/fateflow -c python -n fate-9999 -- bash
(app-root) bash-4.2# cd /data/projects/fate/examples/toy_example/
(app-root) bash-4.2# python run_toy_example.py 9999 10000 1
...

Finally, if the log contains a line similar to "success to calculate secure_sum, it is 2000.0000000000002", the toy_example interoperability test was successful.

View FATEBoard

We deployed the FATEBoard component in the previous deployment, so by default you can view the FATEBoard page by visiting http://party{partyId}.fateboard.example.com (for example, http://party9999.fateboard.example.com for PartyA).

You need to add the hostnames to the hosts file before viewing:

(PartyA)$ echo "192.168.18.131 party9999.fateboard.example.com" >> /etc/hosts
(PartyB)$ echo "192.168.20.135 party10000.fateboard.example.com" >> /etc/hosts

Then you can view the FATEBoard page through the URL.

tkg_fate_board

View Notebook

The Notebook page is similar to FATEBoard. We also deployed the client component in the previous deployment, so by default you can view the Notebook page by visiting http://party{partyId}.notebook.example.com (for example, http://party9999.notebook.example.com for PartyA).

You need to add the hostnames to the hosts file before viewing:

(PartyA)$ echo "192.168.18.131 party9999.notebook.example.com" >> /etc/hosts
(PartyB)$ echo "192.168.20.135 party10000.notebook.example.com" >> /etc/hosts

Then you can view the Notebook page through the URL.

tkg_notebook

Customize UI URL

The previous configuration uses the default URLs. We can use custom URLs through configuration, similar to the following:

(PartyA)$ cat cluster.yaml
...
host:
  fateboard: party9999.fateboard.vmware.com
  client: party9999.notebook.vmware.com
...
(PartyB)$ cat cluster.yaml
...
host:
  fateboard: party10000.fateboard.vmware.com
  client: party10000.notebook.vmware.com
...

Then configure the hosts file:

(PartyA)$ echo "192.168.18.131 party9999.notebook.vmware.com" >> /etc/hosts
(PartyB)$ echo "192.168.20.135 party10000.notebook.vmware.com" >> /etc/hosts

Then run kubefate cluster update -f cluster.yaml to update the cluster configuration. After the update is complete, you can access the UI through the custom URLs.
