Deploy FATE Clusters with KubeFATE on Kubernetes

Overview

Introduction to FATE

FATE, an open source project initiated by WeBank's AI Department, aims to provide a secure computing framework to support the federated AI ecosystem. It implements a variety of secure computing protocols for big data collaboration in line with data protection regulations. With a modular, scalable modeling pipeline, a clear visual interface, and a flexible scheduling system, FATE is ready to use out of the box and delivers strong operational performance.

Introduction to Kubernetes

Kubernetes, an open source container orchestration engine originally developed by Google, supports automated deployment, scaling, and management of containerized applications. When deploying an application in a production environment, you usually need to deploy multiple instances of the application to balance the load of application requests.

In Kubernetes, we can create multiple containers, run an application instance in each container, and then manage, discover, and access this group of application instances through Kubernetes' built-in load balancing. These details do not require complex manual configuration or processing by O&M personnel.

Why Do We Deploy FATE on Kubernetes?

With the use of federated learning, the training set and model gradually become larger over time. In the production environment, we will encounter the following problems:

  1. How can a FATE cluster adapt to the various security and compliance requirements within an enterprise organization, as well as to its IT environments (networks, security domains, etc.)?
  2. A single server can no longer support the compute demands of federated learning, so how can we deploy multiple compute nodes and manage them easily?
  3. If problems arise in some nodes, can they self-heal to ensure service reliability?
  4. Can the deployment scale horizontally to adapt to business growth?
  5. Can FATE versions be properly managed and upgraded?
  6. Can there be different federated clusters within one organization, and how can we manage multiple clusters to meet the needs of different businesses, partners, and application scenarios?

Kubernetes is by far the most popular infrastructure platform. Many implementations have proved that Kubernetes is well suited as a platform for the O&M of large-scale distributed systems within an enterprise. According to statistics from Ovum, as of the end of 2019 half of all big data workloads were running on Kubernetes. Our team also recommends Kubernetes as the platform for running FATE federated learning clusters in production. KubeFATE provides a solution for deploying FATE on Kubernetes and for its subsequent O&M.

Prerequisites

Before learning how to deploy FATE on Kubernetes, you may want to read up on the following:

You may opt to skip this part if you already have a good understanding of Kubernetes and FATE.

Kubernetes Knowledge Base

FATE Knowledge Base

Related Documents

KubeFATE

KubeFATE is a supporting project for the containerized deployment of FATE, covering FATE's docker-compose deployment, Kubernetes deployment, Notebook support, etc.

Why Do We Use KubeFATE?

KubeFATE is developed mainly through open source contributions from VMware China R&D Center's lab, WeBank, GitHub community users, and others. As containerization technology matures, many excellent projects have emerged, such as Kubernetes, Docker, and Harbor. Containerized deployment solves many deployment problems and greatly improves R&D and O&M efficiency, and it will remain an important tool for O&M in the future. Kubernetes is the leading platform for running containerized deployments in production, and KubeFATE should be your first choice for deploying FATE on it.

Introduction to KubeFATE

KubeFATE is an implementation of FATE's Kubernetes deployment, developed in Go. It operates Kubernetes through a server service deployed in the cluster, allowing FATE deployments to be driven from outside the cluster and enabling easy, fast FATE cluster deployment and O&M through simple command lines.

The project URL of KubeFATE is https://github.com/FederatedAI/KubeFATE/tree/master/k8s-deploy.

KubeFATE Deployment Architecture

(KubeFATE deployment architecture diagram)

Knowledge Related to KubeFATE

FATE can be deployed by KubeFATE through the command line or the REST API. The operable resources are as follows:

Cluster

Cluster is the main resource of KubeFATE. Every successful FATE deployment by KubeFATE generates a cluster, and each cluster corresponds to a group of Kubernetes resources, including two types, namely FATE (Training) and FATE-Serving.

There are five primary command line and API operations: install, update, delete, describe, and list.
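
For example, these operations map to subcommands of the kubefate command line (a quick sketch; run kubefate cluster --help for the exact subcommand names and flags of your version):

$ kubefate cluster install -f cluster.yaml    # create a FATE cluster from a YAML file
$ kubefate cluster update -f cluster.yaml     # apply changes made to the YAML file
$ kubefate cluster describe <cluster_id>      # show details of one cluster
$ kubefate cluster list                       # list all clusters managed by KubeFATE
$ kubefate cluster delete <cluster_id>        # remove a cluster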

Job

Job is the intermediate resource generated while KubeFATE deploys a cluster. It is responsible for carrying out the three types of cluster operations on Kubernetes: install, update, and delete.

The basic execution process is divided into four steps: 1. generating job metadata, 2. executing cluster operations of the corresponding type, 3. checking whether operations are successful, and 4. updating job status.

There are mainly three command line and API operations: list, delete, and describe.
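
For example (a sketch; run kubefate job --help for details):

$ kubefate job list                # list jobs and their status
$ kubefate job describe <job_id>   # show the progress and sub-jobs of one job
$ kubefate job delete <job_id>     # delete a job record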

Chart

Chart is a YAML template file used by KubeFATE to store different types and versions of FATE. It is a superset of a Helm chart: compared with an ordinary Helm chart, it has an additional value-template file. All chart files can be downloaded from https://github.com/FederatedAI/KubeFATE/tree/gh-pages/package.

There are mainly three command line and API operations: upload, list, and delete.
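
For example (a sketch; the exact subcommand names and flags may differ between versions, so check kubefate chart --help):

$ kubefate chart upload -f fate-v1.5.0.tgz    # upload a chart package downloaded from the link above
$ kubefate chart list                         # list the charts stored in KubeFATE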

User

This is KubeFATE's representation of command line authentication information.
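
The command line reads these credentials from config.yaml in the installation package. Its contents look roughly like the following (an illustrative excerpt; field names may vary slightly between releases, so check the file shipped with your version):

log:
  level: info
user:
  username: admin
  password: admin
serviceurl: example.com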

Deployment Process

Deploy KubeFATE

Download KubeFATE Installation Package

The KubeFATE installation package can be downloaded from the KubeFATE releases page on GitHub (https://github.com/FederatedAI/KubeFATE/releases):

$ version=v1.5.0

# Download the corresponding version of KubeFATE
$ wget https://github.com/FederatedAI/KubeFATE/releases/download/${version}/kubefate-k8s-${version}.tar.gz

Unpack it and install the kubefate command line tool:

$ mkdir -p kubefate
$ tar -zxvf kubefate-k8s-${version}.tar.gz -C kubefate
$ cd kubefate
$ chmod +x ./kubefate && sudo mv ./kubefate /usr/bin

After unpacking, you can see these files:

$ ls
cluster-serving.yaml cluster-spark.yaml  cluster.yaml  config.yaml  examples  kubefate  kubefate.yaml  rbac-config.yaml

Install KubeFATE Server

You need to deploy KubeFATE server on Kubernetes before using KubeFATE.

This part contains the namespace and the RBAC permissions for the KubeFATE server. The official default permission is cluster-admin. If you are familiar with the RBAC mechanism of Kubernetes, you may adjust it yourself.

It also contains a secret key used by KubeFATE, MySQL's username and password, and KubeFATE's username and password. It is recommended to modify them before deployment.
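
For reference, these values live in a Kubernetes Secret inside rbac-config.yaml, roughly like the excerpt below (illustrative; the exact field names and namespace should be taken from the file in your release):

apiVersion: v1
kind: Secret
metadata:
  name: kubefate-secret
  namespace: kube-fate
type: Opaque
stringData:
  kubefateUsername: admin
  kubefatePassword: admin
  mariadbUsername: kubefate
  mariadbPassword: kubefate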

$ kubectl apply -f rbac-config.yaml

Next, you can deploy KubeFATE server.

This deployment consists of two parts: KubeFATE and MariaDB (MySQL), with a total of 5 Kubernetes components, namely Deployment and Service for both KubeFATE and MariaDB, and Ingress of KubeFATE.

$ kubectl apply -f kubefate.yaml
deployment.apps/kubefate created
deployment.apps/mariadb created
service/mariadb created
service/kubefate created
ingress.networking.k8s.io/kubefate created

You can see that the 5 Kubernetes components were successfully created. Once the pod created by each deployment is in Running status, the KubeFATE server has been successfully deployed.
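
You can watch the pods with kubectl until both are Running (assuming the default kube-fate namespace defined in rbac-config.yaml):

$ kubectl get pods -n kube-fate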

KubeFATE Command Line Connection

The KubeFATE command line is a client of the KubeFATE server API. As the architecture diagram shows, FATE is deployed by the KubeFATE server calling the Kubernetes API, while the KubeFATE command line communicates with the KubeFATE server through the URL example.com exposed by Ingress. The command line can therefore be used on any machine that can reach the Ingress; you only need to configure the hosts file.

Example:

$ echo "192.168.100.123 example.com"  >> /etc/hosts

192.168.100.123 is the IP address of Ingress.

Use the command kubefate version to check for connectivity.

$ kubefate version
* kubefate service version=v1.2.0
* kubefate commandLine version=v1.2.0

If any error occurs, there are generally two causes:

  1. There is no ingress-controller;
  2. KubeFATE's pod did not run successfully (It will take some time to initialize the database for the first time).
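
A few kubectl commands help narrow this down (assuming the default kube-fate namespace):

$ kubectl get ingressclass                    # confirm that an ingress controller is installed
$ kubectl get pods -n kube-fate               # check whether the kubefate and mariadb pods are Running
$ kubectl logs -n kube-fate deploy/kubefate   # inspect the KubeFATE server logs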

When KubeFATE has been deployed, you can use it to deploy FATE.

Using KubeFATE to Deploy FATE

The command kubefate cluster install installs a specific FATE cluster. Use the --help parameter to make better use of KubeFATE's commands.

Before installation, the cluster's parameters can be customized in cluster.yaml. For configuration details, refer to the introduction to the configuration settings below.

$ kubefate cluster install -f cluster.yaml
create job success, job id=94107328-958e-4fa6-8fa7-9a2b975114de

KubeFATE generates a job for each cluster installation, update, and deletion. You can use kubefate job describe <job_id> to view the progress of the corresponding operation.

FATE cluster is a general term that covers both FATE (Training) and FATE-Serving.

KubeFATE has three types of jobs: install, update, and delete.

  • Install: Create a cluster

    KubeFATE first creates a job record in the database, then creates a cluster record and checks whether the corresponding version of the chart exists in the database (if not, it downloads that version of the chart and stores it in the database); finally, it calls helm to install the cluster and updates the job and cluster records once the installation is complete.

  • Update: Update a cluster

    KubeFATE first creates a job record in the database, then updates the cluster record and checks whether the chart version required by the updated cluster exists in the database (if not, it downloads that version of the chart and stores it in the database); finally, it calls helm to update the cluster and updates the job and cluster records once the update is complete.

  • Delete: Delete a cluster

    KubeFATE first creates a job record in the database, then changes the cluster status and calls helm to delete the cluster; finally, it deletes the cluster record and updates the job record once the deletion is complete.

Verify Deployment

View job Status

By viewing the job's information, you will know the progress of the corresponding FATE cluster installation:

$ kubefate job describe 94107328-958e-4fa6-8fa7-9a2b975114de
UUID     	94107328-958e-4fa6-8fa7-9a2b975114de
StartTime	2020-11-25 03:03:41
EndTime  	2020-11-25 03:05:38
Duration 	117s
Status   	Success
Creator  	admin
ClusterId	9e693e93-bf2a-4229-8485-ea922ed33dcf
Result   	Cluster install success
SubJobs  	mysql                PodStatus: Running, SubJobStatus: Success, Duration:    83s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:05:05
         	nodemanager-0        PodStatus: Running, SubJobStatus: Success, Duration:    11s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:03:53
         	nodemanager-1        PodStatus: Running, SubJobStatus: Success, Duration:    11s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:03:53
         	python               PodStatus: Running, SubJobStatus: Success, Duration:   116s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:05:38
         	rollsite             PodStatus: Running, SubJobStatus: Success, Duration:    11s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:03:53
         	client               PodStatus: Running, SubJobStatus: Success, Duration:   116s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:05:38
         	clustermanager       PodStatus: Running, SubJobStatus: Success, Duration:    11s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:03:53
         	fateboard            PodStatus: Running, SubJobStatus: Success, Duration:   116s, StartTime: 2020-11-25 03:03:41, EndTime: 2020-11-25 03:05:38

The deployment is complete when the job's status changes to Success.

SubJobs shows the status of the underlying sub-job for each component in the current job.

View FATE Cluster Information

Use the command kubefate cluster describe <cluster_id> to view the deployed FATE cluster's information

$ kubefate cluster describe 9e693e93-bf2a-4229-8485-ea922ed33dcf
UUID        	9e693e93-bf2a-4229-8485-ea922ed33dcf
Name        	fate-10000                          
NameSpace   	fate-10000                          
ChartName   	fate                                
ChartVersion	v1.5.0                              
Revision    	1                                   
Age         	9m3s                                
Status      	Running                             
Spec        	backend: eggroll                    
            	chartName: fate                     
            	chartVersion: v1.5.0                
            	istio:                              
            	  enabled: false                    
            	modules:                            
            	- rollsite                          
            	- clustermanager                    
            	- nodemanager                       
            	- mysql                             
            	- python                            
            	- fateboard                         
            	- client                            
            	name: fate-10000                    
            	namespace: fate-10000               
            	partyId: 10000                      
            	persistence: false                  
            	pullPolicy: null                    
            	python:                             
            	  grpcNodePort: 30102               
            	  httpNodePort: 30107               
            	  type: NodePort                    
            	registry: ""                        
            	rollsite:                           
            	  nodePort: 30101                   
            	  partyList:                        
            	  - partyId: 9999                   
            	    partyIp: 192.168.9.1         
            	    partyPort: 30091                
            	  type: NodePort                    
            	servingIp: 192.168.10.1             
            	servingPort: 30105                  
            	                                    
Info        	dashboard:                          
            	- party10000.notebook.example.com       
            	- party10000.fateboard.example.com      
            	ip: 192.168.10.2                   
            	pod:                                
            	- clustermanager-76bb7d4dd4-hhpw6   
            	- mysql-57b7d859bc-pw4x5            
            	- nodemanager-0-8d85fd46c-pwcz2     
            	- nodemanager-1-6d67b96bc-qp4bx     
            	- python-9c857bbcc-lgx2d            
            	- rollsite-6b685d468d-bcrzw         
            	status:                             
            	  modules:                          
            	    client: Running                 
            	    clustermanager: Running         
            	    fateboard: Running              
            	    mysql: Running                  
            	    nodemanager-0: Running          
            	    nodemanager-1: Running          
            	    python: Running                 
            	    rollsite: Running  

When Status is "Running", the deployed FATE is running normally.

Other cluster information:

  • Name, NameSpace, ChartName, and ChartVersion are basic information corresponding to the configuration file's fields.
  • Status represents the status of the deployed FATE ("Running" means it is running normally).
  • Revision represents the number of updates; a successful creation also counts as an update.
  • Spec corresponds to the cluster.yaml used at deployment time.
  • Info is information unique to the current FATE cluster:
    • dashboard lists the ingress entry points included in the FATE deployment
    • ip is the IP address of a Kubernetes node that can be used for NodePort access
    • pod lists all the Kubernetes pods of the current FATE cluster
    • status shows the status of all containers in the current FATE cluster

Check if FATE is Running Normally

To check whether FATE works properly, you can run some FATE test jobs; refer to the FATE examples or the usage of Notebook for details.

FATE Interconnection

The implementation of FATE federated learning depends on data exchange between multiple parties. There are two multi-party interconnection modes, namely P2P mode (net deployment mode) and exchange mode (star deployment mode).

P2P mode (net deployment mode):

(P2P mode diagram)

Exchange mode (star deployment mode):

(Exchange mode diagram)

The external connection information of a Party contains three items:

  • PartyID

    The partyID must be specified at the time of FATE deployment.

  • IP Address

    rollsite is exposed externally through a NodePort, so the IP address is the NodeIP (you can also get it from Info.ip when viewing the cluster's details).

  • Port

    This is the configured rollsite.nodePort.

P2P Mode

P2P mode means that, within a federated network, a host Party's configuration contains the cluster entry information (i.e., PartyID, IP address, and port) of all the client parties it needs to connect to, and the client parties' configurations must likewise contain the host's information.

When a new Party wants to join the network, it must be assigned a PartyID that is unique in the network and add the information of all the parties it needs to connect to into its rollsite.partyList configuration item; the corresponding parties must also add the new Party's information into their rollsite.partyList.
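
For example, the rollsite section of party 10000's cluster.yaml connecting to an existing party 9999 would look roughly like this (illustrative addresses and ports; the partyList format is the same as in the FATE Exchange example at the end of this document):

rollsite:
  type: NodePort
  nodePort: 30101
  partyList:
  - partyId: 9999
    partyIp: 192.168.9.1
    partyPort: 30091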

Exchange Mode

Also known as star deployment mode, this means that each party only needs to configure the information of the exchange cluster and can then connect to the other parties through the exchange. The exchange is responsible for managing the cluster information of all parties.

If you use exchange mode for deployment, you only need to configure rollsite.exchange to connect to the exchange cluster. The exchange cluster itself needs to be configured with the information of all parties (see the FATE Exchange configuration later in this document).
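
A minimal sketch of the party-side setting, assuming the exchange's rollsite is reachable at 192.168.0.1:30000 (illustrative values):

rollsite:
  type: NodePort
  nodePort: 30101
  exchange:
    ip: 192.168.0.1
    port: 30000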

Spark Mode

When FATE uses the Spark compute engine, the cluster uses a different connection mode. It is similar to P2P mode, except that the interconnection involves two additional components: nginx and rabbitmq.

When FATE uses Spark to connect the parties, it is necessary to configure the route_table of nginx and the route_table of rabbitmq.

  • The route_table of nginx needs to be configured with nginx of the corresponding party's cluster and grpcNodePort of python.

  • The route_table of rabbitmq needs to be configured with the corresponding party's rabbitmq.

As of the current version v1.5.0, FATE on Spark does not support exchange mode.

Introduction to the Configuration Settings for FATE deployment by KubeFATE

KubeFATE can be used to deploy two types of clusters, FATE (Training) and FATE-Serving. The deployment configuration file is in YAML format.

Common Portions

  • name: cluster name; duplicate names are not allowed
  • namespace: the corresponding Kubernetes namespace. Currently, we recommend deploying only one FATE cluster per namespace when deploying with KubeFATE
  • chartName: type of FATE cluster, either fate or fate-serving
  • chartVersion: version of the FATE cluster. More versions are available at https://github.com/FederatedAI/KubeFATE/tree/gh-pages/package
  • partyId: a FATE term used to identify different clusters
  • registry: image registry, Docker Hub by default; a different registry can be set, for example a Chinese mirror: registry: "hub.c.163.com/federatedai"
  • pullPolicy: Kubernetes image pull policy, IfNotPresent by default if left unset
  • persistence: whether the cluster supports data persistence
  • istio: enable istio or not (what is istio?)
  • modules: KubeFATE supports modular deployment, where you can select different modules for deployment
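
Putting the common portion together, the head of a cluster.yaml typically looks like the sketch below (illustrative values for party 10000, matching the Spec shown in the cluster describe output above):

name: fate-10000
namespace: fate-10000
chartName: fate
chartVersion: v1.5.0
partyId: 10000
registry: ""
pullPolicy: IfNotPresent
persistence: false
istio:
  enabled: false
modules:
  - rollsite
  - clustermanager
  - nodemanager
  - mysql
  - python
  - fateboard
  - client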

Other configuration items depend on the deployment mode you select.

FATE (Training) Configuration Settings

Deploy FATE (Training) cluster.

Required components include [python, mysql]

If using the eggroll engine, you will also need [rollsite, clustermanager, nodemanager]

If using the Spark engine, you will also need [spark, hdfs, nginx, rabbitmq]

Optional components [fateboard, client]

FATE supports two computing engines, eggroll and Spark. The following is the detailed description of the two configurations.

  • backend: the computing engine used by FATE (eggroll or spark)
  • python: some configuration settings for fateflow
    • type: exposure mode for the fateflow service port, corresponding to the service type of Kubernetes
    • httpNodePort: if NodePort is selected above, this is the configuration of fateflow's http port
    • grpcNodePort: if NodePort is selected above, this is the configuration of fateflow's grpc port
    • nodeSelector: assign Pod to a certain node, nodeselector
    • spark: configuration for FATE on Spark (see the FATE on Spark section below)
    • hdfs: configuration for FATE on Spark (see the FATE on Spark section below)
    • rabbitmq: configuration for FATE on Spark (see the FATE on Spark section below)
    • nginx: configuration for FATE on Spark (see the FATE on Spark section below)
  • mysql: some configuration settings for mysql (do not configure this if using an external mysql)
    • ip: Kubernetes-internal address of mysql (do not modify)
    • port: port of mysql (do not modify)
    • database: database name of mysql used by FATE
    • user: username of mysql
    • password: password of mysql
    • subPath: persistent path
    • existingClaim: use existing PVC or not
    • storageClass: persistent storageClass
    • accessMode: ReadWriteOnce
    • size: size of PV
    • nodeSelector: assign Pod to a certain node, nodeselector

Configure these settings if you use an external mysql server

  • externalMysqlIp: IP address of mysql
  • externalMysqlPort: port of mysql
  • externalMysqlDatabase: database name of mysql
  • externalMysqlUser: username of mysql
  • externalMysqlPassword: password of mysql

Configure these settings only if you need to connect to FATE-Serving

  • servingIp: IP address for the servingServer of FATE-Serving
  • servingPort: port for the servingServer of FATE-Serving

FATE on eggroll

If using the eggroll compute engine, in addition to basic components, you will also need to install the [rollsite, clustermanager, nodemanager] components.

When cluster.yaml is used, a FATE cluster of FATE on eggroll is deployed by default.

The default deployment implementation, represented on Kubernetes, has the following resources:

  • Service: clustermanager, fateboard, fateflow, fateflow-client, mysql, nodemanager-0, nodemanager-1, notebook, rollsite
  • Deployment: clustermanager, mysql, nodemanager-0, nodemanager-1, python, rollsite
  • Ingress: client, fateboard
  • ConfigMap: eggroll-config, fateboard-config, mysql-config, nodemanager-0-config, nodemanager-1-config, python-config, rollsite-config

The following are the component configurations required to use the eggroll compute engine

  • rollsite: some configuration settings for the rollsite component
    • type: exposure mode for the rollsite port, corresponding to the service type of Kubernetes
    • nodePort: if NodePort is selected above, you can configure the specific port number here. The default range of Kubernetes is (30000-32767)
    • exchange: information of the exchange (i.e., IP address and port) that rollsite connects to
    • partyList: information of the other party connected by FATE. If you want to connect to a FATE that has been deployed by KubeFATE, you can view FATE cluster information based on the above to get partyId, NodePort, and NodeIP.
    • nodeSelector: assign Pod to a certain node, nodeselector
  • nodemanager: some configuration settings for the nodemanager component
    • count: number of nodemanagers deployed
    • sessionProcessorsPerNode: sessionProcessorsPerNode configuration of nodemanager
    • subPath: persistent path of nodemanager
    • storageClass: persistent storageClass
    • existingClaim: use existing PVC or not
    • accessMode: access mode
    • size: size of PV required
    • nodeSelector: assign Pod to a certain node, nodeselector
  • clustermanager: some configuration settings for the clustermanager component
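
Combining the keys above, an illustrative eggroll section of cluster.yaml could look like this (placeholder addresses and ports):

rollsite:
  type: NodePort
  nodePort: 30101
  partyList:
  - partyId: 9999
    partyIp: 192.168.9.1
    partyPort: 30091
nodemanager:
  count: 2
  sessionProcessorsPerNode: 2
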
FATE on Spark

If using the Spark compute engine, in addition to basic components, you will also need to install the [spark, hdfs, nginx, rabbitmq] components.

When cluster-spark.yaml is used, a FATE cluster of FATE on Spark is deployed by default.

The default deployment implementation, represented on Kubernetes, has the following resources:

  • Service: fateboard, fateflow, fateflow-client, mysql, namenode, datanode, nginx, notebook, rabbitmq, spark-master, spark-worker-1
  • Deployment: datanode, mysql, namenode, nginx, python, rabbitmq, spark-master, spark-worker
  • Ingress: client, fateboard, rabbitmq, spark
  • ConfigMap: datanode-env, eggroll-config, fateboard-config, mysql-config, namenode-config, namenode-env, nginx-config, python-config, rabbitmq-config

The following are the component configurations required to use the Spark compute engine

  • spark: some configuration settings for the Spark component

    • master: configuration of the master node
      • Image: image of the master
      • ImageTag: TAG
      • replicas: number of pod replicas
      • cpu: number of CPU requests
      • memory: number of RAM requests
      • nodeSelector: assign Pod to a certain node, nodeselector
      • type: type of Service resources corresponding to kubernetes
    • worker: configuration of the worker node(s)
      • Image: image of worker(s)
      • ImageTag: TAG
      • replicas: number of pod replicas
      • cpu: number of CPU requests
      • memory: number of RAM requests
      • nodeSelector: assign Pod to a certain node, nodeselector
      • type: type of Service resources corresponding to kubernetes
  • hdfs: some configuration settings for the hdfs component

    • namenode: configuration of the namenode
      • nodeSelector: assign Pod to a certain node, nodeselector
      • type: type of Service resources corresponding to kubernetes
    • datanode: configuration of the datanode
      • nodeSelector: assign Pod to a certain node, nodeselector
      • type: type of Service resources corresponding to kubernetes
  • nginx: some configuration settings for the nginx component

    • nodeSelector: assign Pod to a certain node, nodeselector
    • type: exposure mode of nginx port, corresponding to the service type of Kubernetes
    • nodePort: if NodePort is selected above, you can configure the specific port number here.
    • route_table: configure the proxy and fateflow information of the other parties that FATE connects to. If you want to connect to a FATE cluster that has already been deployed by KubeFATE, you can view its cluster information as described above to obtain the required addresses and ports.
      • <party_id>: id of the other party
        • proxy: proxy information of the other party, corresponding to the IP address and port of nginx of the other party
        • fateflow: fateflow information of the other party, corresponding to the IP address and grpcNodePort of python module of the other party
  • rabbitmq: some configuration settings for the rabbitmq component

    • nodeSelector: assign Pod to a certain node, nodeselector
    • type: exposure mode of rabbitmq port, corresponding to the service type of Kubernetes
    • nodePort: if NodePort is selected above, you can configure the specific port number here.
    • default_user: default_user configuration for rabbitmq
    • default_pass: default_pass configuration for rabbitmq
    • user: user configuration for rabbitmq
    • password: password configuration for rabbitmq
    • route_table: configure the rabbitmq information of the other parties that FATE connects to
      • <party_id>: id of the other party
        • host: entry ip of rabbitmq of the other party
        • port: entry port of rabbitmq of the other party
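
The nested structure described above can be sketched as follows (illustrative addresses and ports; the exact key names inside the proxy and fateflow entries are assumptions here, so verify them against the sample files in the examples directory of the installation package):

nginx:
  type: NodePort
  nodePort: 30093
  route_table:
    9999:               # partyId of the other party
      proxy:            # the other party's nginx entry (host/port key names assumed)
      - host: 192.168.9.1
        port: 30093
      fateflow:         # the other party's python grpcNodePort entry (host/port key names assumed)
      - host: 192.168.9.1
        port: 30092
rabbitmq:
  type: NodePort
  nodePort: 30094
  route_table:
    9999:
      host: 192.168.9.1
      port: 30094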

FATE Serving Configuration Settings

FATE-Serving deployment follows the same process as described above. For the shared configuration items, refer to the Common Portions section.

FATE-Serving deployment requires three modules, namely [servingProxy, servingRedis, servingServer]

When cluster-serving.yaml is used, a FATE-Serving cluster is deployed by default.

The default deployment implementation, represented on Kubernetes, has the following resources:

  • Service: serving-proxy, serving-redis, serving-server
  • Deployment: serving-proxy, serving-redis, serving-server
  • Ingress: serving-proxy
  • ConfigMap: serving-proxy-config, serving-redis-config, serving-server-config

The following are the component configurations for the FATE-Serving cluster

  • servingProxy: some configuration settings for the servingProxy component

    • nodePort: NodePort of servingProxy, used for connections from the other party's serving cluster
    • ingerssHost: ingress host configuration of servingProxy
    • partyList: configuration for connecting to the other party, i.e., IP address and port of component servingProxy of the other party
    • nodeSelector: assign Pod to a certain node
  • servingServer: some configuration settings for the servingServer component

    • type: exposure mode of servingServer port, corresponding to the service type of Kubernetes
    • nodePort: NodePort of servingServer, used for connections from the python module of the local FATE (Training) cluster
    • fateflow: entry of fateflow httpNodePort of local FATE (Training) cluster
    • subPath: persistent path of servingServer
    • storageClass: persistent storageClass
    • existingClaim: use existing PVC or not
    • accessMode: access mode
    • size: size of PV required
    • nodeSelector: assign Pod to a certain node
  • servingRedis: some configuration settings for the servingRedis (i.e., ordinary redis) component

    • password: password for redis
    • nodeSelector: assign pod to a certain node
    • subPath: persistent path of redis
    • storageClass: persistent storageClass
    • existingClaim: use existing PVC or not
    • accessMode: access mode
    • size: size of PV required
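
A rough cluster-serving.yaml sketch combining the keys above (illustrative values; nested key names such as those under fateflow are assumptions, so verify against the cluster-serving.yaml shipped in the installation package):

name: fate-serving-9999
namespace: fate-serving-9999
chartName: fate-serving
chartVersion: v1.5.0
partyId: 9999
modules:
  - servingProxy
  - servingRedis
  - servingServer

servingProxy:
  nodePort: 30096
  partyList:
  - partyId: 10000
    partyIp: 192.168.10.1
    partyPort: 30106
servingServer:
  type: NodePort
  nodePort: 30095
  fateflow:
    ip: 192.168.9.1
    port: 30097    # httpNodePort of the local FATE (Training) cluster's python module
servingRedis:
  password: fate_dev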

Other Configuration Settings

FATE Exchange

If you deploy only a single rollsite, it can be used as the exchange.

$ cat <<EOF | sudo tee cluster-exchange.yaml 
name: fate-exchange
namespace: fate-exchange
chartName: fate
chartVersion: v1.5.0
partyId: exchange
modules:
  - rollsite

rollsite: 
  type: NodePort
  nodePort: 30001
  partyList:
  - partyId: 9999
    partyIp: 192.168.9.1
    partyPort: 30091
  - partyId: 10000
    partyIp: 192.168.10.1
    partyPort: 30101
EOF

FATE-Serving Exchange

If you deploy only a single servingProxy, it can be used as the serving-exchange.

$ cat <<EOF | sudo tee serving-exchange.yaml 
name: fate-serving-exchange
namespace: fate-serving-exchange
chartName: fate-serving
chartVersion: v1.5.0
partyId: exchange
modules:
  - servingProxy

servingProxy:
  nodePort: 30006
  partyList:
  - partyId: 9999
    partyIp: 192.168.9.1
    partyPort: 30091
  - partyId: 10000
    partyIp: 192.168.10.1
    partyPort: 30101
EOF

As of the current version v1.5.0, FATE on Spark does not support exchange mode.

Get Involved

If you have any suggestions or ideas about KubeFATE, you are welcome to submit them through issues and PRs on KubeFATE's GitHub.

Related Links
