Deploy an exchange-centered multi-party federated learning network with KubeFATE - FederatedAI/KubeFATE GitHub Wiki
Federated learning is a machine learning framework that protects data privacy. It lets multiple institutions use their data jointly and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulation. This unlocks the value of data without compromising its privacy or security.
Deploying a federated learning application requires multiple institutions to participate together and join the federated network. As the number of participants grows, however, the network becomes hard to manage.
In the traditional model, institutions are linked to each other directly. When the federated network grows, the configuration that each institution must maintain about all the others becomes very complicated.
The exchange deployment model makes it easy to build a network with many members. Members discover each other through the central exchange, which greatly simplifies managing the federated learning network.
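The following is a rough link count that illustrates the scaling argument. It is not a figure measured with KubeFATE. If n parties are linked directly, every pair needs its own link. With an exchange, each party needs only one link, to the exchange:

$$
L_{\text{direct}}(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad L_{\text{exchange}}(n) = n
$$

With n = 10 parties, that is 45 direct links versus 10. When one more party is added, the direct model requires updating all n existing parties. The exchange model requires changes only on the new party and the exchange.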
KubeFATE v1.6.0 supports the exchange deployment mode for both the Spark and eggroll computing engines.
This guide uses KubeFATE to deploy a federated learning network with an exchange as its central node. The network contains one exchange and several parties.
The deployment here includes 3 parties and 1 exchange. Each role runs on its own Kubernetes (k8s) cluster, and KubeFATE has already been deployed on every cluster (see Deploy KubeFATE).
| Role | Party ID | k8s version | k8s node IP | KubeFATE version | FATE version |
|---|---|---|---|---|---|
| exchange | 1 | v1.19.9 | 192.168.100.1 | v1.4.1 | v1.6.0 |
| party-9999 | 9999 | v1.19.9 | 192.168.100.9 | v1.4.1 | v1.6.0 |
| party-10000 | 10000 | v1.19.9 | 192.168.100.10 | v1.4.1 | v1.6.0 |
| party-8888 | 8888 | v1.19.9 | 192.168.100.8 | v1.4.1 | v1.6.0 |
Let's use the FATE-Exchange chart to deploy an exchange cluster.
There are two types of exchange, which correspond to FATE's two computing engines (eggroll, Spark).
-
eggroll (rollsite)
The core of an eggroll-type exchange cluster is the rollsite component.
$ cat cluster-exchange.yaml
name: fate-exchange
namespace: fate-exchange
chartName: fate-exchange
chartVersion: v1.6.0
partyId: 1
registry: ""
imageTag: "1.6.0-release"
pullPolicy:
imagePullSecrets:
- name: myregistrykey
persistence: false
istio:
  enabled: false
modules:
  - rollsite
rollsite:
  type: NodePort
  nodePort: 30000
  partyList:
  - partyId: 10000
    partyIp: 192.168.100.10
    partyPort: 30101
  - partyId: 9999
    partyIp: 192.168.100.9
    partyPort: 30091
  - partyId: 8888
    partyIp: 192.168.100.8
    partyPort: 30081
-
Spark (ATS)
When you deploy an exchange for the Spark computing engine, you first need to set up the certificates between each party and the exchange. Refer to the document on certificate generation for Pulsar and ATS (Apache Traffic Server) to generate the exchange certificate, then import it into k8s:
kubectl create secret generic traffic-server-cert -n fate-exchange \
  --from-file=proxy.cert.pem=proxy.fate.org/proxy.cert.pem \
  --from-file=proxy.key.pem=proxy.fate.org/proxy.key.pem \
  --from-file=ca.cert.pem=certs/ca.cert.pem
Next, configure the YAML file. The core of a Spark-type exchange consists of two components: nginx and trafficServer.
$ cat cluster-exchange.yaml
name: fate-exchange
namespace: fate-exchange
chartName: fate-exchange
chartVersion: v1.6.0
partyId: 1
registry: ""
imageTag: "1.6.0-release"
pullPolicy:
imagePullSecrets:
- name: myregistrykey
persistence: false
istio:
  enabled: false
modules:
  - trafficServer
  - nginx
trafficServer:
  type: NodePort
  nodePort: 30001
  route_table:
    sni:
    - fqdn: 10000.fate.org
      tunnelRoute: 192.168.100.10:30109
    - fqdn: 9999.fate.org
      tunnelRoute: 192.168.100.9:30099
    - fqdn: 8888.fate.org
      tunnelRoute: 192.168.100.8:30089
nginx:
  nodeSelector:
  type: NodePort
  httpNodePort: 30003
  grpcNodePort: 30008
  route_table:
    8888:
      proxy:
      - host: 192.168.100.8
        http_port: 30083
        grpc_port: 30088
      fateflow:
      - host: 192.168.100.8
        http_port: 30087
        grpc_port: 30082
    9999:
      proxy:
      - host: 192.168.100.9
        http_port: 30093
        grpc_port: 30098
      fateflow:
      - host: 192.168.100.9
        http_port: 30097
        grpc_port: 30092
    10000:
      proxy:
      - host: 192.168.100.10
        http_port: 30103
        grpc_port: 30108
      fateflow:
      - host: 192.168.100.10
        http_port: 30107
        grpc_port: 30102
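Both exchange files are derived from the same per-party facts: node IP and NodePorts. Generating the party sections from one table keeps them consistent as the network grows. The sketch below is a minimal illustration under assumptions. The Party type and the helper functions are hypothetical and are not part of KubeFATE. The port values are copied from the two example files above. Those files give each party distinct ports for eggroll and for Spark, so one table holds both even though a real party runs only one engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """Hypothetical record of one party's endpoints, as seen by the exchange."""
    party_id: int
    node_ip: str
    rollsite_port: int       # party rollsite nodePort (eggroll exchange)
    pulsar_https_port: int   # party pulsar httpsNodePort (Spark/ATS exchange)
    proxy_http_port: int     # party nginx httpNodePort
    proxy_grpc_port: int     # party nginx grpcNodePort
    fateflow_http_port: int  # party python httpNodePort
    fateflow_grpc_port: int  # party python grpcNodePort


def rollsite_party_list(parties):
    """Entries for rollsite.partyList in an eggroll exchange."""
    return [{"partyId": p.party_id, "partyIp": p.node_ip,
             "partyPort": p.rollsite_port} for p in parties]


def traffic_server_sni(parties):
    """Entries for trafficServer.route_table.sni in a Spark exchange."""
    return [{"fqdn": f"{p.party_id}.fate.org",
             "tunnelRoute": f"{p.node_ip}:{p.pulsar_https_port}"} for p in parties]


def nginx_route_table(parties):
    """nginx.route_table in a Spark exchange, keyed by integer party ID."""
    return {
        p.party_id: {
            "proxy": [{"host": p.node_ip, "http_port": p.proxy_http_port,
                       "grpc_port": p.proxy_grpc_port}],
            "fateflow": [{"host": p.node_ip, "http_port": p.fateflow_http_port,
                          "grpc_port": p.fateflow_grpc_port}],
        }
        for p in parties
    }


def render_party_list(parties):
    """Render rollsite.partyList as YAML text indented like cluster-exchange.yaml."""
    lines = ["  partyList:"]
    for entry in rollsite_party_list(parties):
        lines.append(f"  - partyId: {entry['partyId']}")
        lines.append(f"    partyIp: {entry['partyIp']}")
        lines.append(f"    partyPort: {entry['partyPort']}")
    return "\n".join(lines)


# Values copied from the example cluster-exchange.yaml files above.
PARTIES = [
    Party(10000, "192.168.100.10", 30101, 30109, 30103, 30108, 30107, 30102),
    Party(9999, "192.168.100.9", 30091, 30099, 30093, 30098, 30097, 30092),
    Party(8888, "192.168.100.8", 30081, 30089, 30083, 30088, 30087, 30082),
]

if __name__ == "__main__":
    print(render_party_list(PARTIES))
```

Only the partyList is rendered as text, to keep the sketch short. The Spark sections are returned as plain dicts, which a YAML library could serialize.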
After configuring the YAML file, deploy it with the kubefate CLI of the exchange's k8s cluster:
(exchange)$ kubefate cluster install -f ./cluster-exchange.yaml
Check that the cluster status is Running to confirm that the deployment succeeded:
(exchange)$ kubefate cluster ls
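To script the "is it Running yet?" check, you can parse the listing. The sketch below is a minimal example under an explicit assumption: that kubefate cluster ls prints a whitespace-separated table with a header row containing NAME and STATUS columns and no spaces inside cells. The exact output format is not documented here, so verify it against your kubefate version. The helper names are hypothetical.

```python
import subprocess


def cluster_statuses(ls_output):
    """Map cluster NAME -> STATUS from a whitespace-separated table.

    Assumes the first non-empty line is a header containing NAME and STATUS
    and that no cell contains spaces (an assumption about the CLI output).
    """
    lines = [line for line in ls_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_idx, status_idx = header.index("NAME"), header.index("STATUS")
    statuses = {}
    for line in lines[1:]:
        cells = line.split()
        if len(cells) > max(name_idx, status_idx):
            statuses[cells[name_idx]] = cells[status_idx]
    return statuses


def is_running(ls_output, name):
    """True if the named cluster is listed with STATUS == Running."""
    return cluster_statuses(ls_output).get(name) == "Running"


def check_live(name):
    """Run the real CLI (requires kubefate on PATH) and check one cluster."""
    result = subprocess.run(["kubefate", "cluster", "ls"],
                            capture_output=True, text=True, check=True)
    return is_running(result.stdout, name)
```

On the exchange's machine, check_live("fate-exchange") could be polled in a loop with a sleep between attempts.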
When a new party wants to join a running exchange cluster, its information must be added to the exchange. Modify cluster-exchange.yaml to add the new party, then apply the change with KubeFATE's update command:
(exchange)$ kubefate cluster update -f ./cluster-exchange.yaml
The change takes effect after a short wait, because the exchange needs a little time to reload the party information.
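Editing the partyList by hand is error-prone when parties join or change ports, so here is a minimal sketch of an idempotent "add or update" on the parsed file. It assumes cluster-exchange.yaml has already been loaded into a dict, for example with PyYAML's yaml.safe_load. Loading and saving are not shown. The function name and the party 7777 used in the test are hypothetical.

```python
def upsert_party(config, party_id, party_ip, party_port):
    """Add a party to rollsite.partyList, or update its entry if present.

    `config` is cluster-exchange.yaml parsed into a dict. Entries are assumed
    to hold only partyId, partyIp and partyPort, as in the example file.
    Returns True if the config changed (so an update is needed).
    """
    party_list = config.setdefault("rollsite", {}).setdefault("partyList", [])
    new_entry = {"partyId": party_id, "partyIp": party_ip, "partyPort": party_port}
    for i, entry in enumerate(party_list):
        if entry.get("partyId") == party_id:
            if entry == new_entry:
                return False  # already up to date; nothing to apply
            party_list[i] = new_entry  # same party, new address or port
            return True
    party_list.append(new_entry)  # brand-new party
    return True
```

Run kubefate cluster update -f ./cluster-exchange.yaml only when the function returns True, after writing the dict back to the file.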
In a federated network that already has an exchange, adding a new party is simple. You only need to configure the connection between the new party and the exchange, and the party joins the network.
For the configuration of exchange, refer to the [exchange update configuration](#Update configuration).
Each computing engine connects to the exchange in a different way. The examples below use Party-9999.
-
eggroll (rollsite)
Set the exchange field of rollsite so that it points to the exchange cluster.
$ cat cluster.yaml
name: fate-9999
namespace: fate-9999
chartName: fate
chartVersion: v1.6.0
partyId: 9999
registry: ""
imageTag: "1.6.0-release"
pullPolicy:
persistence: false
istio:
  enabled: false
modules:
  - rollsite
  - clustermanager
  - nodemanager
  - mysql
  - python
  - fateboard
  - client
backend: eggroll
rollsite:
  type: NodePort
  nodePort: 30091
  exchange:
    ip: 192.168.100.1
    port: 30000
-
Spark (Pulsar)
When you deploy FATE with Spark as the computing engine, you need to set up the certificates shared with the exchange. Refer to the document on certificate generation for Pulsar and ATS to generate this party's certificate, then import it into k8s:
kubectl create secret generic pulsar-cert -n fate-9999 \
  --from-file=broker.cert.pem=9999.fate.org/broker.cert.pem \
  --from-file=broker.key-pk8.pem=9999.fate.org/broker.key-pk8.pem \
  --from-file=ca.cert.pem=certs/ca.cert.pem
A Spark-based FATE cluster must configure python, nginx and pulsar so that they link with the exchange.
$ cat cluster.yaml
name: fate-9999
namespace: fate-9999
chartName: fate
chartVersion: v1.6.0
partyId: 9999
registry: ""
imageTag: "1.6.0-release"
pullPolicy:
imagePullSecrets:
- name: myregistrykey
persistence: false
istio:
  enabled: false
modules:
  - python
  - mysql
  - fateboard
  - client
  - spark
  - hdfs
  - nginx
  - pulsar
backend: spark
python:
  type: NodePort
  httpNodePort: 30097
  grpcNodePort: 30092
nginx:
  type: NodePort
  httpNodePort: 30093
  grpcNodePort: 30098
  exchange:
    ip: 192.168.100.1
    httpPort: 30003
    grpcPort: 30008
pulsar:
  type: NodePort
  httpNodePort: 30094
  httpsNodePort: 30099
  exchange:
    ip: 192.168.100.1
    port: 30001
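Each party file and the exchange file must agree on the same ports, for example 30091 for Party-9999's rollsite or 30099 for its pulsar httpsNodePort. A mismatch only shows up later as a failed connection, so a cross-check before installing can help. The sketch below is a minimal example. It assumes both YAML files are already parsed into dicts (for example with yaml.safe_load) and that you supply the exchange and party node IPs from the table at the top of this page. Both check functions are hypothetical helpers. A YAML parser reads keys such as 8888: as integers, so nginx.route_table is looked up by integer party ID.

```python
def check_eggroll(party, exchange, exchange_ip, party_ip):
    """Return mismatches between an eggroll party config and the exchange config."""
    problems = []
    rs = party["rollsite"]
    if rs["exchange"] != {"ip": exchange_ip, "port": exchange["rollsite"]["nodePort"]}:
        problems.append("party rollsite.exchange does not point at the exchange rollsite")
    entries = [e for e in exchange["rollsite"]["partyList"]
               if e["partyId"] == party["partyId"]]
    if not entries:
        problems.append(f"party {party['partyId']} missing from exchange partyList")
    elif (entries[0]["partyIp"], entries[0]["partyPort"]) != (party_ip, rs["nodePort"]):
        problems.append("exchange partyList entry does not match party rollsite nodePort")
    return problems


def check_spark(party, exchange, exchange_ip, party_ip):
    """Return mismatches between a Spark party config and the exchange config."""
    problems = []
    pid = party["partyId"]
    ex_nginx, ex_ats = exchange["nginx"], exchange["trafficServer"]
    if party["nginx"]["exchange"] != {"ip": exchange_ip,
                                     "httpPort": ex_nginx["httpNodePort"],
                                     "grpcPort": ex_nginx["grpcNodePort"]}:
        problems.append("party nginx.exchange does not match exchange nginx ports")
    if party["pulsar"]["exchange"] != {"ip": exchange_ip, "port": ex_ats["nodePort"]}:
        problems.append("party pulsar.exchange does not match exchange trafficServer port")
    sni = {e["fqdn"]: e["tunnelRoute"] for e in ex_ats["route_table"]["sni"]}
    if sni.get(f"{pid}.fate.org") != f"{party_ip}:{party['pulsar']['httpsNodePort']}":
        problems.append("trafficServer sni route does not match party pulsar httpsNodePort")
    route = ex_nginx["route_table"].get(pid, {})
    want_proxy = [{"host": party_ip, "http_port": party["nginx"]["httpNodePort"],
                   "grpc_port": party["nginx"]["grpcNodePort"]}]
    want_flow = [{"host": party_ip, "http_port": party["python"]["httpNodePort"],
                  "grpc_port": party["python"]["grpcNodePort"]}]
    if route.get("proxy") != want_proxy or route.get("fateflow") != want_flow:
        problems.append("exchange nginx route_table does not match party nginx/python ports")
    return problems
```

An empty list means every port the two files share is consistent. Each string names the field to fix.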
After configuring the YAML file, use the party's kubefate to deploy the FATE cluster:
(party-9999)$ kubefate cluster install -f ./cluster.yaml
Check that the cluster status is Running to confirm that the deployment succeeded:
(party-9999)$ kubefate cluster ls
Following the [Party configuration](#Configure Party) above, configure Party-8888 and Party-10000 in the same way, then deploy their FATE clusters so that they join the federated network.
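Because the party files differ only in party ID and ports, you can render the eggroll cluster.yaml for Party-8888 and Party-10000 from the Party-9999 file shown above. The sketch below does this with plain string formatting. The function name is hypothetical, and the version defaults are copied from the example. The rollsite ports 30081 and 30101 must match the partyPort values in the exchange's partyList.

```python
def eggroll_party_yaml(party_id, rollsite_port, exchange_ip, exchange_port,
                       version="v1.6.0", image_tag="1.6.0-release"):
    """Render an eggroll cluster.yaml for one party, mirroring Party-9999's file."""
    modules = ["rollsite", "clustermanager", "nodemanager", "mysql",
               "python", "fateboard", "client"]
    lines = [
        f"name: fate-{party_id}",
        f"namespace: fate-{party_id}",
        "chartName: fate",
        f"chartVersion: {version}",
        f"partyId: {party_id}",
        'registry: ""',
        f'imageTag: "{image_tag}"',
        "pullPolicy:",
        "persistence: false",
        "istio:",
        "  enabled: false",
        "modules:",
        *[f"  - {m}" for m in modules],
        "backend: eggroll",
        "rollsite:",
        "  type: NodePort",
        f"  nodePort: {rollsite_port}",  # must equal partyPort on the exchange
        "  exchange:",
        f"    ip: {exchange_ip}",
        f"    port: {exchange_port}",    # exchange rollsite nodePort
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for pid, port in [(8888, 30081), (10000, 30101)]:
        print(f"# cluster.yaml for party {pid}")
        print(eggroll_party_yaml(pid, port, "192.168.100.1", 30000))
```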
With the steps above, we have deployed a federated learning network of three parties that are interconnected through the exchange and use eggroll as the computing engine. The following tests check that the network works.
Run toy_example between each pair of parties to confirm that the two parties can communicate with each other.
-
Party-9999 and Party-10000
Open a shell in the python container of Party-9999, then run the toy example:
kubectl -n fate-9999 exec -it svc/fateflow -c python -- bash
cd ../examples/toy_example/
python run_toy_example.py 9999 10000 1
-
Party-10000 and Party-8888
kubectl -n fate-10000 exec -it svc/fateflow -c python -- bash
cd ../examples/toy_example/
python run_toy_example.py 10000 8888 1
-
Party-8888 and Party-9999
kubectl -n fate-8888 exec -it svc/fateflow -c python -- bash
cd ../examples/toy_example/
python run_toy_example.py 8888 9999 1
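With n parties there are n(n-1)/2 pairs to test, so listing them programmatically helps as the network grows. The sketch below uses itertools.combinations to produce one toy_example command per unordered pair. Each command runs from the first party's namespace. The direction of a pair may differ from the manual commands above, and the trailing 1 is copied verbatim from those commands. The helper name is hypothetical.

```python
from itertools import combinations


def toy_example_plan(party_ids):
    """Return (namespace, command) for each unordered pair of parties.

    The command is meant to be run inside the python container of the
    first party of the pair, as in the manual steps above.
    """
    plan = []
    for guest, host in combinations(party_ids, 2):
        command = (f"cd ../examples/toy_example/ && "
                   f"python run_toy_example.py {guest} {host} 1")
        plan.append((f"fate-{guest}", command))
    return plan


if __name__ == "__main__":
    for namespace, command in toy_example_plan([9999, 10000, 8888]):
        print(f"kubectl -n {namespace} exec -it svc/fateflow -c python -- bash")
        print(f"  then run: {command}")
```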
If the log shows a line similar to "success to calculate secure_sum, it is 2000.0000000000002", the toy_example communication test has succeeded.
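The logged sum is not exactly 2000 because of floating-point rounding, so an automated check should compare with a tolerance rather than match the string. The sketch below is a minimal version. It assumes the log contains the phrase quoted above and takes 2000 as the expected value from that sample line. The function name and the tolerance are illustrative choices.

```python
import math
import re

# Matches the number after the phrase quoted in the sample log line above.
SECURE_SUM_RE = re.compile(
    r"success to calculate secure_sum, it is (-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def toy_example_passed(log_text, expected=2000.0, rel_tol=1e-9):
    """True if the log reports a secure_sum close to the expected value."""
    match = SECURE_SUM_RE.search(log_text)
    if match is None:
        return False  # the success line never appeared
    return math.isclose(float(match.group(1)), expected, rel_tol=rel_tol)
```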
Once the pairwise communication tests pass, run the three-party min_test to test multi-party training.
The min_test task requires three parties: Guest, Host and Arbiter. We use Party-10000 as the Guest, Party-9999 as the Host, and Party-8888 as the Arbiter.
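The role assignment determines both where data is uploaded and which IDs are passed to run_task.py, so it can be kept in one mapping. The sketch below builds the upload steps and the launch step from that mapping. It assumes, as this guide does, that every role is a distinct party, that data is uploaded on all three parties, and that the task is launched from the Guest's cluster. The command strings are copied from the steps below. The helper name is hypothetical.

```python
def min_test_plan(roles):
    """Build (namespace, command) steps for min_test from a role mapping.

    `roles` maps "guest", "host" and "arbiter" to party IDs.
    """
    required = ("guest", "host", "arbiter")
    missing = set(required) - set(roles)
    if missing:
        raise ValueError(f"missing roles: {sorted(missing)}")
    if len({roles[r] for r in required}) != len(required):
        raise ValueError("this sketch expects three distinct parties")
    uploads = [
        (f"fate-{roles[r]}",
         "cd ../examples/scripts; python upload_default_data.py -m 1")
        for r in required
    ]
    launch = (
        f"fate-{roles['guest']}",
        "cd ../examples/min_test_task; python run_task.py -m 1 "
        f"-gid {roles['guest']} -hid {roles['host']} -aid {roles['arbiter']}",
    )
    return uploads, launch
```

For this guide's assignment, min_test_plan({"guest": 10000, "host": 9999, "arbiter": 8888}) yields uploads in fate-10000, fate-9999 and fate-8888, and a launch from fate-10000.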
-
First, upload the min_test data set to each party's FATE cluster.
Run the following commands on the k8s master of each party:
kubectl -n fate-<partyID> exec -it svc/fateflow -c python -- bash
cd ../examples/scripts; python upload_default_data.py -m 1
<partyID> is the ID of the party deployed on that k8s cluster.
-
Launch a training task from one of the parties.
Here we launch it from Party-10000:
kubectl -n fate-10000 exec -it svc/fateflow -c python -- bash
cd ../examples/min_test_task; python run_task.py -m 1 -gid 10000 -hid 9999 -aid 8888
-
View the task results.
min_test takes some time to run. Once the task finishes, you can view the result in the command-line log.
You can also check the FATE-Board web page: http://10000.fateboard.example.com for more task information.
Finally, you can use the federated learning network to train your own model.