Deep Analysis to Customized Deployment of Federated Learning Components in KubeFATE - FederatedAI/KubeFATE GitHub Wiki
Overview
KubeFATE supports two deployment environments: Docker-Compose and Kubernetes. The former provides an experimental environment for a quick start, and the latter is designed for FATE clusters in production systems. This document focuses on the Kubernetes deployment environment and is intended to help advanced users customize FATE deployments, for example by customizing deployment modules or adding and removing FATE modules.
KubeFATE comprises two parts: the KubeFATE CLI and the KubeFATE service. The KubeFATE CLI is an executable binary that can be downloaded directly to the client for use. The KubeFATE service is generally deployed on the same Kubernetes cluster as FATE and FATE-Serving, with a configured Service Account, so that the KubeFATE service has the permissions to create Pods, Services, and Ingresses. For specific steps, refer to:
The KubeFATE CLI provides commands in four main groups: cluster management (`kubefate cluster`), job management (`kubefate job`), chart management (`kubefate chart`), and user management (`kubefate user`). For details, refer to `kubefate help`. After receiving a command from the user, the CLI manages FATE and FATE-Serving clusters by calling the RESTful API of the KubeFATE service. The underlying layer of the KubeFATE service is based on Helm v3: the instructions passed in by the user are rendered into a Helm Chart and deployed as a job.
Terminology
The terms mentioned in this document are defined as follows:
- Client: the machine on which the user runs the KubeFATE CLI, for example a laptop, a Mac, or a Linux machine. It is not required to be in the Kubernetes cluster, but must be able to reach the Ingress created in Kubernetes;
- Kubernetes administrator machine: the machine on which `kubectl` can be used. It may or may not be in the Kubernetes cluster, but it must be able to reach the Kubernetes API Server and have sufficient permissions to create Service Accounts, Services, Ingresses, and Pods. Administrator privileges are required to apply the RBAC.yaml that comes with the KubeFATE project;
- Server: the machine on which the Kubernetes cluster is deployed.
Helm 3 and Helm Chart
Versions from Helm v3.0.0 onward are commonly referred to as Helm 3, which was released in November 2019. It has major changes from the previous version (referred to as Helm 2). The most significant difference is that the Tiller component was removed. Helm 2 has a typical client-server structure, where Tiller acts as a service that interacts with the Helm client and operates on the Kubernetes cluster through the Kubernetes API. This role closely resembles that of the KubeFATE service, so we have chosen to develop on top of Helm 3. Helm 3 has no server side; the client interacts directly with the Kubernetes API and obtains the cluster's status dynamically.

Helm 3 is designed to simplify permissions management and avoid the problems caused by status synchronization. The drawbacks of this design are that permissions management depends entirely on Kubernetes, that configuration is complicated, and that compatibility with third-party components requires a lot of work on the client. Because the configuration of FATE itself is relatively complex, we manage the status information inside the KubeFATE service in order to simplify configuration and usage for users and to stay compatible with upper-level systems. With this architecture in mind, we can draw the following two conclusions:
- KubeFATE must be used with Helm 3, which is incompatible with Helm 2. If calls fail, check whether Helm 2 is pre-installed on the client; if so, it has to be uninstalled.
- It is difficult to synchronize the status of Helm 3 with that of KubeFATE. In rare unexpected situations, the two statuses may diverge; a common remedy is to delete the existing cluster using Helm commands. These situations will be fixed gradually in subsequent versions. If you encounter one, you may report it to the KubeFATE project by opening an issue.
Helm Chart is the package format used by Helm. A Chart is a collection of files describing related Kubernetes resources. Charts follow a specific directory layout and can be packaged into versioned archives for deployment. In addition, the Helm community provides many ready-made Charts, which can be downloaded via `helm pull ${chart_repo}/${chart_name}`.
(The introduction in the following section will partially quote the official Helm Chart documentation)
Chart File Structure
For example, a Chart for WordPress can be stored in the `wordpress/` directory with the following structure:
    wordpress/
      Chart.yaml          # A YAML file containing information about the chart
      LICENSE             # Optional: a plain text file containing the license for the chart
      README.md           # Optional: a human-readable README file
      values.yaml         # The default configuration values for this chart
      values.schema.json  # Optional: a JSON Schema for imposing a structure on the values.yaml file
      charts/             # A directory containing any charts upon which this chart depends
      crds/               # Custom Resource Definitions
      templates/          # A directory of templates that, when combined with values, generate valid Kubernetes manifest files
      templates/NOTES.txt # Optional: a plain text file containing short usage notes
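As a rough illustration of this layout, the following Python sketch checks a directory for the entries listed above. The helper names (`check_chart_layout`, `make_demo_chart`) are hypothetical, and treating only `Chart.yaml` as mandatory is an assumption of this sketch rather than a statement of what Helm enforces.

```python
from pathlib import Path
import tempfile

# Entries taken from the layout above. Treating only Chart.yaml as
# mandatory is an assumption made for this sketch.
REQUIRED = ["Chart.yaml"]
OPTIONAL = ["LICENSE", "README.md", "values.yaml", "values.schema.json",
            "charts", "crds", "templates"]


def check_chart_layout(chart_dir):
    """Return (missing_required, present_optional) for a chart directory."""
    root = Path(chart_dir)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    present = [name for name in OPTIONAL if (root / name).exists()]
    return missing, present


def make_demo_chart(parent):
    """Create a tiny throwaway chart skeleton for the demonstration."""
    root = Path(parent) / "wordpress"
    (root / "templates").mkdir(parents=True)
    (root / "Chart.yaml").write_text(
        "apiVersion: v2\nname: wordpress\nversion: 0.1.0\n")
    (root / "values.yaml").write_text("replicaCount: 1\n")
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        chart = make_demo_chart(tmp)
        missing, present = check_chart_layout(chart)
        print("missing required entries:", missing)
        print("optional entries present:", present)
```

A check like this could be run before packaging a customized chart; it says nothing about whether the file contents are valid.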
To develop a Helm Chart, you can use `helm create NAME [flags]` to initialize the Chart directory, where `[flags]` can be:

        --debug                       enable verbose output
        --kube-apiserver string       the address and the port for the Kubernetes API server
        --kube-as-group stringArray   Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
        --kube-as-user string         Username to impersonate for the operation
        --kube-context string         name of the kubeconfig context to use
        --kube-token string           bearer token used for authentication
        --kubeconfig string           path to the kubeconfig file
    -n, --namespace string            namespace scope for this request
        --registry-config string      path to the registry config file (default "~/.config/helm/registry.json")
        --repository-cache string     path to the file containing cached repository indexes (default "~/.cache/helm/repository")
        --repository-config string    path to the file containing repository names and URLs (default "~/.config/helm/repositories.yaml")

The key files include:
Chart.yaml File
This is the metadata description of the Chart, containing `apiVersion`, `name`, `version`, `dependencies`, and other fields. We can use this file to control the version of the Chart. Another important concept is Chart dependencies, which are declared under `dependencies` in the `Chart.yaml` file. As mentioned earlier, the Helm community provides ready-made Charts for download and deployment, so when we implement our own Chart, we can add dependencies to reuse existing community Charts as part of the cluster deployment. For example, a WordPress deployment depends on Apache as the HTTP server and MySQL as the database, so you can add contents similar to the following to `Chart.yaml`:
    dependencies:
      - name: apache
        version: 1.2.3
        repository: https://example.com/charts
      - name: mysql
        version: 3.2.1
        repository: https://another.example.com/charts
Here:
- the name field is the name of the chart you need;
- the version field is the version of the chart you need;
- the repository field is the full URL of the chart repository.
After the dependencies are defined, you can download the dependent Charts into the `charts/` directory with `helm dependency update`.
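To make the effect of the dependency declaration concrete, the sketch below lists the chart archives one would expect to find under `charts/` after the update, using the `name-version.tgz` naming that Helm uses for downloaded dependencies. The dependency data is copied from the example above as plain Python data so that no YAML library is needed; the function name `expected_archives` is hypothetical.

```python
# Dependencies from the Chart.yaml example above, written as plain
# Python data so that the sketch needs no YAML parser.
dependencies = [
    {"name": "apache", "version": "1.2.3",
     "repository": "https://example.com/charts"},
    {"name": "mysql", "version": "3.2.1",
     "repository": "https://another.example.com/charts"},
]


def expected_archives(deps, charts_dir="charts"):
    """Return the archive paths expected under charts/ after the update."""
    return [f"{charts_dir}/{d['name']}-{d['version']}.tgz" for d in deps]


if __name__ == "__main__":
    for path in expected_archives(dependencies):
        print(path)
```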
Templates Directory and values.yaml
Helm Chart templates are written in the Go template language, with some additional functions. All template files are stored in the `templates/` folder. When Helm renders a Chart, it passes every file in that directory through the template engine. The user supplies default values for the templates through the `values.yaml` file, and templates access those values through the `.Values` object. For example, for a Chart of a Deis database, the template file is defined as follows:
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: deis-database
      namespace: deis
      labels:
        app.kubernetes.io/managed-by: deis
    spec:
      replicas: 1
      selector:
        app.kubernetes.io/name: deis-database
      template:
        metadata:
          labels:
            app.kubernetes.io/name: deis-database
        spec:
          serviceAccount: deis-database
          containers:
            - name: deis-database
              image: {{ .Values.imageRegistry }}/postgres:{{ .Values.dockerTag }}
              imagePullPolicy: {{ .Values.pullPolicy }}
              ports:
                - containerPort: 5432
              env:
                - name: DATABASE_STORAGE
                  value: {{ default "minio" .Values.storage }}
The `values.yaml` file corresponding to the Chart then needs to provide:
- imageRegistry: source registry of the Docker image
- dockerTag: tag of the Docker image
- pullPolicy: pull policy of Kubernetes
- storage: storage backend, set to "minio" by default
A matching `values.yaml` can be configured as follows:
imageRegistry: "quay.io/deis"
dockerTag: "latest"
pullPolicy: "Always"
storage: "s3"
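The substitution performed on the Deis template can be imitated with a few lines of Python. This is a toy stand-in, not Helm's template engine: it only understands `{{ .Values.key }}` and `{{ default "fallback" .Values.key }}`, and the names `render`, `TEMPLATE`, and `VALUES` are hypothetical. It is meant only to show how the values above end up in the rendered text and when the `default` fallback applies.

```python
import re

# Toy stand-in for the template engine: it recognises only
# {{ .Values.key }} and {{ default "fallback" .Values.key }}.
PATTERN = re.compile(
    r'\{\{\s*(?:default\s+"(?P<fallback>[^"]*)"\s+)?'
    r'\.Values\.(?P<key>\w+)\s*\}\}'
)

# Three lines taken from the Deis template above.
TEMPLATE = (
    "image: {{ .Values.imageRegistry }}/postgres:{{ .Values.dockerTag }}\n"
    "imagePullPolicy: {{ .Values.pullPolicy }}\n"
    'value: {{ default "minio" .Values.storage }}'
)

# The same settings as the values.yaml example above.
VALUES = {
    "imageRegistry": "quay.io/deis",
    "dockerTag": "latest",
    "pullPolicy": "Always",
    "storage": "s3",
}


def render(template, values):
    """Substitute .Values references, honouring an optional default."""
    def replace(match):
        value = values.get(match.group("key"))
        if value in (None, ""):
            # Unset or empty: use the fallback if one was given,
            # otherwise this sketch renders empty text.
            return match.group("fallback") or ""
        return str(value)
    return PATTERN.sub(replace, template)


if __name__ == "__main__":
    print(render(TEMPLATE, VALUES))
    # Without a storage value, the default "minio" is used instead.
    without_storage = {k: v for k, v in VALUES.items() if k != "storage"}
    print(render(TEMPLATE, without_storage))
```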
In addition, the template provides the following default predefined values for use:
- Release.Name: the name of the release (not the chart);
- Release.Namespace: the namespace the chart was released to;
- Release.Service: the service that conducted the release;
- Release.IsUpgrade: set to true if the current operation is an upgrade or rollback;
- Release.IsInstall: set to true if the current operation is an install;
- Chart: the contents of the Chart.yaml file. The chart version is therefore available as Chart.Version, and the maintainers are in Chart.Maintainers;
- Files: a map-like object containing all non-special files in the chart. It does not give you access to templates, but it does give access to other files that are present (unless they are excluded by .helmignore). Files can be accessed using {{ index .Files "file.name" }} or the {{ .Files.Get name }} function. You can also use {{ .Files.GetBytes }} to get the contents of a file as an array of bytes;
- Capabilities: a map-like object containing information about the Kubernetes version ({{ .Capabilities.KubeVersion }}) and the supported Kubernetes API versions ({{ .Capabilities.APIVersions.Has "batch/v1" }}).
The above is the basic knowledge of Helm Chart. We suggest readers refer to the official Helm Chart documentation for more details. This knowledge is the basis for customizing the installation of FATE with KubeFATE.
KubeFATE Architecture and the Rendering Process
The schematic diagram of the KubeFATE architecture and FATE deployment is as follows:
From the perspective of the KubeFATE service, FATE clusters are all deployed in a Kubernetes environment. The KubeFATE service must have permission to access the `kube-apiserver` of the Kubernetes cluster used for FATE deployment. Generally, they are deployed in the same Kubernetes cluster and use a service account. For details, refer to the example in the code and this series of documents: Using KubeFATE to Deploy FATE Cluster on Kubernetes. The computer in the diagram is a `client`, which accesses the `REST APIs` module of the KubeFATE service through the `KubeFATE CLI`. The `REST APIs` can also be connected to other management software, such as FATE-Cloud, so that KubeFATE acts as an infrastructure provider within an organization. Under the API layer, we use the Facade design pattern to combine different interfaces, which externally call:
- Helm: the interface to Helm 3, mainly used for cluster deployment, deletion, upgrade, etc.;
- Kubernetes APIs: used for FATE module health monitoring, etc.

Cluster information, user authentication information, and the rendered Helm Charts are all cached in MySQL.
As can be seen from the architecture diagram, if we need to customize the deployed cluster, such as adding or removing a module, integrating third-party software, customizing module content, etc., we actually need to customize the deployed Helm Chart. In the code, we provide the following for reference:
- Each version comes with the default Charts of FATE and FATE-Serving, available at https://github.com/FederatedAI/KubeFATE/tree/master/helm-charts. You can switch between versions using GitHub tags;
- The KubeFATE CLI has dedicated Chart management commands:
  - `kubefate chart upload`: upload a new Chart;
  - `kubefate chart ls`: list existing Charts in KubeFATE. Uploaded Charts are cached in MySQL by type and version;
  - `kubefate chart delete`: delete existing Charts in KubeFATE.
- In our Chart, a Makefile is provided to initialize and package the Helm Chart. We suggest creating a new Helm Chart by starting from our default Chart and modifying it.
On top of ordinary Helm Charts, we add another layer of abstraction, which is the rendering process of KubeFATE. Taking FATE v1.5.0 as an example, the process is shown below:
We use the command `kubefate cluster install` to pass in the `cluster.yaml` file, which contains the `chartName` and `chartVersion` fields. The KubeFATE service first checks whether a corresponding Chart is stored locally in MySQL. If not, it queries `FATECLOUD_REPO_URL`. This field is defined in the YAML file used to deploy the KubeFATE service, which is `k8s-deploy/kubefate.yaml` in the code. When deploying KubeFATE, we can point it to a custom HTTP address. In an offline deployment environment, we can either upload the required chart files using `kubefate chart upload`, or create an internal repository according to Standard for Helm Chart Repository. In addition, because Harbor complies with the OCI standard, we can directly use Harbor as a private internal Chart repository. For details, refer to Managing Helm Charts.
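The lookup order described above (local store first, remote repository second) can be sketched as follows. This is only an illustration of the order, not KubeFATE's implementation: `find_chart`, the in-memory `local_store` dictionary standing in for MySQL, the `fetch_remote` callable standing in for a download from `FATECLOUD_REPO_URL`, and the chart names and versions in the usage example are all hypothetical.

```python
def find_chart(name, version, local_store, fetch_remote):
    """Return (source, chart) following the local-then-remote order.

    local_store stands in for the charts kept in MySQL, and fetch_remote
    stands in for a download from FATECLOUD_REPO_URL. Both are
    hypothetical placeholders for this sketch.
    """
    key = (name, version)
    if key in local_store:
        return "local", local_store[key]
    chart = fetch_remote(name, version)
    if chart is None:
        raise LookupError(
            f"chart {name} {version} not found locally or remotely")
    return "remote", chart


if __name__ == "__main__":
    # An uploaded chart is found locally; anything else goes to the remote.
    local = {("fate", "v1.5.0"): "<uploaded chart archive>"}

    def remote(name, version):
        return "<downloaded chart archive>" if name == "fate-serving" else None

    print(find_chart("fate", "v1.5.0", local, remote))
    print(find_chart("fate-serving", "v2.0.0", local, remote))
```

In an offline environment the remote step cannot succeed, which is why the required charts have to be uploaded or served from an internal repository beforehand.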
After the KubeFATE service finds the Helm Chart required for deployment, it reads the Chart in. On top of the standard Helm 3 process, we add an extra layer of template rendering. In KubeFATE, `cluster.yaml` is where users choose which FATE modules to deploy and configure each module. Therefore, each KubeFATE Chart contains a `values-template.yaml`. We still use the standard Go template language to render it into the `values.yaml` file expected by standard Helm 3.
After obtaining the user-defined `values.yaml` file, KubeFATE calls Helm 3, which creates a FATE v1.5.0 cluster based on the `values.yaml` file and the Helm Chart `templates/` directory.
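The two rendering stages described above can be sketched in Python. Note the swapped-in technique: the real process uses Go templates, while this sketch uses Python's `string.Template` with `$name` placeholders purely to show the data flow from the cluster settings, through the values template, to a manifest. The field names (`party_id`, `image_tag`, `partyId`, `imageTag`), the template contents, and the helper functions are hypothetical.

```python
from string import Template

# Stage 1 input: a stand-in for the chart's values template file.
# $name placeholders replace the Go template syntax of the real file.
VALUES_TEMPLATE = Template("partyId: $party_id\nimageTag: $image_tag\n")

# Stage 2 input: a stand-in for one file in the chart's templates/ directory.
MANIFEST_TEMPLATE = Template("name: fate-$partyId\nimage: fateflow:$imageTag\n")


def parse_flat_yaml(text):
    """Parse 'key: value' lines; sufficient for this flat example only."""
    pairs = (line.split(":", 1) for line in text.splitlines() if line.strip())
    return {key.strip(): value.strip() for key, value in pairs}


def render_cluster(cluster):
    """Cluster settings -> values text -> manifest text."""
    # Extra layer added by KubeFATE: cluster settings fill the values template.
    values_text = VALUES_TEMPLATE.substitute(cluster)
    # Ordinary Helm step: the resulting values fill the chart templates.
    manifest = MANIFEST_TEMPLATE.substitute(parse_flat_yaml(values_text))
    return values_text, manifest


if __name__ == "__main__":
    values_text, manifest = render_cluster(
        {"party_id": "9999", "image_tag": "1.5.0-release"})
    print(values_text)
    print(manifest)
```

The sketch also shows why a variable exposed to the user has to appear in the values template: `substitute` fails if the cluster settings lack a placeholder the template expects, and the manifest stage only sees what the first stage produced.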
The following summarizes the points to note when customizing a KubeFATE chart:
- To create a new chart for FATE or FATE-Serving, we suggest you copy an existing chart and modify it, to ensure that `values-template.yaml` is included;
- `cluster.yaml` is the user-facing interface, so you need to consider which variables should be exposed to the user. When deciding to expose a variable in the `cluster.yaml` file, please make sure that `values-template.yaml` has been configured accordingly and can generate a suitable `values.yaml` file for the Chart to use;
- After the `values.yaml` file is generated, the rest is a standard Helm 3 process. We suggest you get familiar with how Charts are developed in Helm 3. Features such as hooks that are not covered in this document can also be used;
- Helm Charts have a community ecosystem that allows us to integrate with other systems through dependencies. You are also welcome to submit customized Helm Charts to KubeFATE through a PR. Currently, the Chart directory of KubeFATE is `./helm-charts`. You may choose a suitable folder name and place your Chart in this directory in the PR.