Deploy an exchange central multi parties federated learning network with KubeFATE - FederatedAI/KubeFATE GitHub Wiki

Federated learning is a machine learning framework that protects data privacy. It helps multiple institutions use their data and build machine learning models jointly while meeting the requirements of user privacy protection, data security, and government regulations. In this way the value of the data can be realized without compromising its privacy and security.

Problems encountered

Deploying a federated learning application requires multiple institutions to participate and join a federated network. However, a network with many participants is difficult to manage.

In the traditional model, institutions are connected by simple direct links. As the federated network grows, managing the configuration between institutions becomes very complicated.

Solution

The exchange deployment model makes it easy to build a network with many member organizations. Members discover each other through the central exchange, which makes the federated learning network much simpler to manage.

KubeFATE support for exchange

KubeFATE v1.6.0 supports the exchange deployment mode for both the Spark and eggroll computing engines.

Federated learning organization building

Organization

Use KubeFATE to deploy a federated learning network with exchange as the central node. This network contains an exchange and several parties.

Deployment structure diagram

(Figure: exchange structure diagram)

Introduction to deployment environment

The deployment here includes 3 parties and 1 exchange. Each role has its own independent k8s cluster, and KubeFATE has already been deployed on every cluster (see Deploy KubeFATE).

| party | party ID | k8s version | k8s node IP | kubefate version | FATE version |
| --- | --- | --- | --- | --- | --- |
| exchange | 1 | v1.19.9 | 192.168.100.1 | v1.4.1 | v1.6.0 |
| party-9999 | 9999 | v1.19.9 | 192.168.100.9 | v1.4.1 | v1.6.0 |
| party-10000 | 10000 | v1.19.9 | 192.168.100.10 | v1.4.1 | v1.6.0 |
| party-8888 | 8888 | v1.19.9 | 192.168.100.8 | v1.4.1 | v1.6.0 |

Deploy exchange

Let's use the FATE-Exchange chart to deploy an exchange cluster.

Configuration

There are two types of exchange, which correspond to FATE's two computing engines (eggroll, Spark).

  • eggroll(rollsite)

    The core of an eggroll-type exchange cluster is the rollsite component.

    $ cat cluster-exchange.yaml
    name: fate-exchange
    namespace: fate-exchange
    chartName: fate-exchange
    chartVersion: v1.6.0
    partyId: 1
    registry: ""
    imageTag: "1.6.0-release"
    pullPolicy: 
    imagePullSecrets: 
    - name: myregistrykey
    persistence: false
    istio:
      enabled: false
    modules:
      - rollsite
    
    rollsite: 
      type: NodePort
      nodePort: 30000
      partyList:
      - partyId: 10000
        partyIp: 192.168.100.10
        partyPort: 30101
      - partyId: 9999
        partyIp: 192.168.100.9
        partyPort: 30091
      - partyId: 8888
        partyIp: 192.168.100.8
        partyPort: 30081
  • Spark(ATS)

    When deploying an exchange that uses Spark as the computing engine, you first need to set up the certificates between each party and the exchange. Refer to the "pulsar and certificate generation of ATS" document to generate the exchange certificate, then import it into k8s:

    kubectl create secret generic traffic-server-cert -n fate-exchange \
    	--from-file=proxy.cert.pem=proxy.fate.org/proxy.cert.pem \
    	--from-file=proxy.key.pem=proxy.fate.org/proxy.key.pem \
    	--from-file=ca.cert.pem=certs/ca.cert.pem

    Then configure the YAML file. The core of an exchange for the Spark computing engine consists of two components: nginx and trafficServer.

    $ cat cluster-exchange.yaml
    name: fate-exchange
    namespace: fate-exchange
    chartName: fate-exchange
    chartVersion: v1.6.0
    partyId: 1
    registry: ""
    imageTag: "1.6.0-release"
    pullPolicy: 
    imagePullSecrets: 
    - name: myregistrykey
    persistence: false
    istio:
      enabled: false
    modules:
      - trafficServer
      - nginx
    
    trafficServer:
      type: NodePort
      nodePort: 30001
      route_table: 
        sni:
        - fqdn: 10000.fate.org
          tunnelRoute: 192.168.100.10:30109
        - fqdn: 9999.fate.org
          tunnelRoute: 192.168.100.9:30099
        - fqdn: 8888.fate.org
          tunnelRoute: 192.168.100.8:30089
    
    nginx:
      nodeSelector: 
      type: NodePort
      httpNodePort: 30003
      grpcNodePort: 30008
      route_table: 
        8888: 
          proxy: 
            - host: 192.168.100.8
              http_port: 30083
              grpc_port: 30088 
          fateflow: 
            - host: 192.168.100.8
              http_port: 30087
              grpc_port: 30082
        9999: 
          proxy: 
            - host: 192.168.100.9
              http_port: 30093
              grpc_port: 30098 
          fateflow: 
            - host: 192.168.100.9
              http_port: 30097
              grpc_port: 30092
        10000: 
          proxy: 
            - host: 192.168.100.10
              http_port: 30103
              grpc_port: 30108 
          fateflow: 
            - host: 192.168.100.10
              http_port: 30107
              grpc_port: 30102
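
Because every party must be listed by hand in the exchange configuration, a quick consistency check before installing can catch copy-paste mistakes. The following is a minimal sketch for the eggroll (rollsite) variant only. `validate_party_list` is a hypothetical helper, not part of KubeFATE, and the port range used is the Kubernetes default NodePort range, which your cluster may override.

```python
# Sketch: sanity-check a rollsite partyList before running `kubefate cluster install`.
# The entries mirror cluster-exchange.yaml above; validate_party_list is hypothetical.
import ipaddress

PARTY_LIST = [
    {"partyId": 10000, "partyIp": "192.168.100.10", "partyPort": 30101},
    {"partyId": 9999, "partyIp": "192.168.100.9", "partyPort": 30091},
    {"partyId": 8888, "partyIp": "192.168.100.8", "partyPort": 30081},
]

# Default Kubernetes NodePort range (assumption: the cluster keeps the default).
NODE_PORT_RANGE = range(30000, 32768)


def validate_party_list(party_list):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen_ids = set()
    seen_endpoints = set()
    for entry in party_list:
        party_id = entry["partyId"]
        endpoint = (entry["partyIp"], entry["partyPort"])
        if party_id in seen_ids:
            problems.append(f"duplicate partyId {party_id}")
        seen_ids.add(party_id)
        if endpoint in seen_endpoints:
            problems.append(f"duplicate endpoint {endpoint[0]}:{endpoint[1]}")
        seen_endpoints.add(endpoint)
        try:
            ipaddress.ip_address(entry["partyIp"])
        except ValueError:
            problems.append(f"party {party_id}: invalid IP {entry['partyIp']}")
        if entry["partyPort"] not in NODE_PORT_RANGE:
            problems.append(
                f"party {party_id}: port {entry['partyPort']} outside NodePort range"
            )
    return problems


if __name__ == "__main__":
    for problem in validate_party_list(PARTY_LIST):
        print(problem)
```

With the values from this document the function returns an empty list. A duplicated party ID or a port such as 80 would be reported.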

Deploy

Once the YAML file is configured, deploy it with the exchange's kubefate:

(exchange)$ kubefate cluster install -f ./cluster-exchange.yaml

Check that the cluster status is Running to confirm that the deployment succeeded.

(exchange)$ kubefate cluster ls

Update configuration

When a new party wants to join an already running exchange cluster, its information must be added to the exchange. Modify the cluster-exchange.yaml file to add the new party, then use the KubeFATE update command to apply the change to the exchange cluster.

(exchange)$ kubefate cluster update -f ./cluster-exchange.yaml

Then wait a short while for the change to take effect, because the exchange reloads party information at a small time interval.
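
For the eggroll exchange, the edit to cluster-exchange.yaml amounts to appending one entry to `rollsite.partyList`. The sketch below shows that step in isolation. `add_party` and `render_party_list` are hypothetical helpers, and party 7777 with its IP and port is an invented example, not a party from this deployment. The Spark exchange needs the equivalent change in the `trafficServer` and `nginx` route tables instead.

```python
# Sketch: add a party to the rollsite partyList and render the YAML fragment.
# add_party / render_party_list are hypothetical; party 7777 is an invented example.

def add_party(party_list, party_id, ip, port):
    """Return a new list with the party added, replacing any entry with the same partyId."""
    kept = [p for p in party_list if p["partyId"] != party_id]
    return kept + [{"partyId": party_id, "partyIp": ip, "partyPort": port}]


def render_party_list(party_list, indent="  "):
    """Render the partyList block in the same layout as cluster-exchange.yaml."""
    lines = [f"{indent}partyList:"]
    for p in party_list:
        lines.append(f"{indent}- partyId: {p['partyId']}")
        lines.append(f"{indent}  partyIp: {p['partyIp']}")
        lines.append(f"{indent}  partyPort: {p['partyPort']}")
    return "\n".join(lines)


# Current entries, copied from the eggroll cluster-exchange.yaml in this document.
current = [
    {"partyId": 10000, "partyIp": "192.168.100.10", "partyPort": 30101},
    {"partyId": 9999, "partyIp": "192.168.100.9", "partyPort": 30091},
    {"partyId": 8888, "partyIp": "192.168.100.8", "partyPort": 30081},
]

updated = add_party(current, 7777, "192.168.100.7", 30071)
print(render_party_list(updated))
```

Replacing an existing entry instead of appending a duplicate keeps the operation safe to repeat, which is convenient if the update has to be re-run.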

Add participants to exchange

In a federated network with an existing exchange, adding a new party is simple. You only need to configure the connection between the party and the exchange, and the party joins the network.

For the configuration of the exchange, refer to [exchange update configuration](#update-configuration).

Configure Party

Each computing engine connects to the exchange in a different way. The examples below use Party-9999.

  • Eggroll(rollsite)

    Configure the exchange field of rollsite to connect to the exchange cluster.

    $ cat cluster.yaml
    name: fate-9999
    namespace: fate-9999
    chartName: fate
    chartVersion: v1.6.0
    partyId: 9999
    registry: ""
    imageTag: "1.6.0-release"
    pullPolicy: 
    persistence: false
    istio:
      enabled: false
    modules:
      - rollsite
      - clustermanager
      - nodemanager
      - mysql
      - python
      - fateboard
      - client
    
    backend: eggroll
    
    rollsite: 
      type: NodePort
      nodePort: 30091
      exchange:
        ip: 192.168.100.1
        port: 30000
  • Spark(Pulsar)

    When deploying a FATE cluster that uses Spark as the computing engine, you need to set up the certificates used to communicate with the exchange. Refer to the "pulsar and certificate generation of ATS" document to generate the party's certificate (the broker certificate for 9999.fate.org in this example) and import it into k8s:

    kubectl create secret generic pulsar-cert -n fate-9999 \
    	--from-file=broker.cert.pem=9999.fate.org/broker.cert.pem \
    	--from-file=broker.key-pk8.pem=9999.fate.org/broker.key-pk8.pem \
    	--from-file=ca.cert.pem=certs/ca.cert.pem

    A FATE cluster that uses the Spark engine needs python, nginx, and pulsar each configured to connect to the exchange.

    $ cat cluster.yaml
    name: fate-9999
    namespace: fate-9999
    chartName: fate
    chartVersion: v1.6.0
    partyId: 9999
    registry: ""
    imageTag: "1.6.0-release"
    pullPolicy: 
    imagePullSecrets: 
    - name: myregistrykey
    persistence: false
    istio:
      enabled: false
    modules:
      - python
      - mysql
      - fateboard
      - client
      - spark
      - hdfs
      - nginx
      - pulsar
    
    backend: spark
    
    python:
      type: NodePort
      httpNodePort: 30097
      grpcNodePort: 30092
    
    nginx:
      type: NodePort
      httpNodePort: 30093
      grpcNodePort: 30098
      exchange:
        ip: 192.168.100.1
        httpPort: 30003
        grpcPort: 30008
    
    pulsar:
      type: NodePort
      httpNodePort: 30094
      httpsNodePort: 30099
      exchange:
        ip: 192.168.100.1
        port: 30001
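
In the Spark variant, several port numbers appear both in the party's cluster.yaml and in the exchange's cluster-exchange.yaml. The sketch below cross-checks them for Party-9999. The pairing of fields is inferred from the matching values in the two examples in this document, not taken from FATE documentation, so treat it as an assumption. `find_mismatches`, the `nodeIp` key, and the flattened dictionary layout (one host per route instead of a list) are hypothetical.

```python
# Sketch: cross-check a Spark party's cluster.yaml against cluster-exchange.yaml.
# Values are copied from the examples in this document; the field pairing is an
# inference from matching numbers, and find_mismatches is a hypothetical helper.

EXCHANGE_IP = "192.168.100.1"

EXCHANGE = {
    "trafficServer": {
        "nodePort": 30001,
        "sni": {"9999.fate.org": "192.168.100.9:30099"},
    },
    "nginx": {
        "httpNodePort": 30003,
        "grpcNodePort": 30008,
        "route_table": {
            9999: {
                "proxy": {"host": "192.168.100.9", "http_port": 30093, "grpc_port": 30098},
                "fateflow": {"host": "192.168.100.9", "http_port": 30097, "grpc_port": 30092},
            },
        },
    },
}

PARTY = {
    "partyId": 9999,
    "nodeIp": "192.168.100.9",  # k8s node IP from the environment table
    "python": {"httpNodePort": 30097, "grpcNodePort": 30092},
    "nginx": {
        "httpNodePort": 30093,
        "grpcNodePort": 30098,
        "exchange": {"ip": "192.168.100.1", "httpPort": 30003, "grpcPort": 30008},
    },
    "pulsar": {
        "httpsNodePort": 30099,
        "exchange": {"ip": "192.168.100.1", "port": 30001},
    },
}


def find_mismatches(party, exchange, exchange_ip):
    """Return descriptions of settings that disagree between the two files."""
    route = exchange["nginx"]["route_table"][party["partyId"]]
    sni = exchange["trafficServer"]["sni"][f"{party['partyId']}.fate.org"]
    pairs = [
        # What the party says about the exchange.
        ("nginx.exchange.ip", party["nginx"]["exchange"]["ip"], exchange_ip),
        ("nginx.exchange.httpPort", party["nginx"]["exchange"]["httpPort"], exchange["nginx"]["httpNodePort"]),
        ("nginx.exchange.grpcPort", party["nginx"]["exchange"]["grpcPort"], exchange["nginx"]["grpcNodePort"]),
        ("pulsar.exchange.ip", party["pulsar"]["exchange"]["ip"], exchange_ip),
        ("pulsar.exchange.port", party["pulsar"]["exchange"]["port"], exchange["trafficServer"]["nodePort"]),
        # What the exchange says about the party.
        ("route proxy host", route["proxy"]["host"], party["nodeIp"]),
        ("route proxy http_port", route["proxy"]["http_port"], party["nginx"]["httpNodePort"]),
        ("route proxy grpc_port", route["proxy"]["grpc_port"], party["nginx"]["grpcNodePort"]),
        ("route fateflow http_port", route["fateflow"]["http_port"], party["python"]["httpNodePort"]),
        ("route fateflow grpc_port", route["fateflow"]["grpc_port"], party["python"]["grpcNodePort"]),
        ("sni tunnelRoute", sni, f"{party['nodeIp']}:{party['pulsar']['httpsNodePort']}"),
    ]
    return [f"{name}: {a!r} != {b!r}" for name, a, b in pairs if a != b]


if __name__ == "__main__":
    print(find_mismatches(PARTY, EXCHANGE, EXCHANGE_IP) or "configurations agree")
```

The function raises `KeyError` if the party is missing from the exchange's route tables, which is itself a useful signal that the exchange has not been updated yet.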

Deploy

After configuring the YAML file, use the party's kubefate to deploy the FATE cluster:

(party-9999)$ kubefate cluster install -f ./cluster.yaml

Check that the cluster status is Running to confirm that the deployment succeeded.

(party-9999)$ kubefate cluster ls

Add multi-parties in turn

Following the [Party configuration](#configure-party) above, configure Party-8888 and Party-10000, then deploy the corresponding FATE clusters so that they join the federated network.
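
For the eggroll engine, the party files differ only in the party ID and the rollsite NodePort, so they can be generated from one template. This is a minimal sketch: `render_party_yaml` is a hypothetical helper, the template is copied from the Party-9999 eggroll example, and the rollsite ports (30081 for Party-8888, 30101 for Party-10000) are the ones listed for those parties in the exchange's partyList.

```python
# Sketch: generate eggroll cluster.yaml content for the remaining parties.
# render_party_yaml is hypothetical; the template mirrors the Party-9999 example.
from string import Template

PARTY_TEMPLATE = Template('''\
name: fate-$party_id
namespace: fate-$party_id
chartName: fate
chartVersion: v1.6.0
partyId: $party_id
registry: ""
imageTag: "1.6.0-release"
pullPolicy:
persistence: false
istio:
  enabled: false
modules:
  - rollsite
  - clustermanager
  - nodemanager
  - mysql
  - python
  - fateboard
  - client

backend: eggroll

rollsite:
  type: NodePort
  nodePort: $rollsite_port
  exchange:
    ip: $exchange_ip
    port: $exchange_port
''')


def render_party_yaml(party_id, rollsite_port,
                      exchange_ip="192.168.100.1", exchange_port=30000):
    """Fill in the per-party values; the exchange defaults come from this document."""
    return PARTY_TEMPLATE.substitute(
        party_id=party_id,
        rollsite_port=rollsite_port,
        exchange_ip=exchange_ip,
        exchange_port=exchange_port,
    )


if __name__ == "__main__":
    # Rollsite ports as registered in the exchange's partyList.
    for party_id, port in {8888: 30081, 10000: 30101}.items():
        print(f"# ----- cluster.yaml for party {party_id} -----")
        print(render_party_yaml(party_id, port))
```

Each rendered file is then installed on the matching party's cluster with `kubefate cluster install -f ./cluster.yaml`, as shown for Party-9999.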

Test

With the deployment above, we now have a federated learning network interconnected through the exchange. It contains three parties and uses eggroll as the computing engine. The tests below check that the network is usable.

Multi-parties connection test

Run the toy_example test between each pair of parties to confirm that they can communicate with each other.

  1. Party-9999 and Party-10000

    Enter the python container of Party-9999 through the command line. Then run the toy command.

    kubectl -n fate-9999 exec -it svc/fateflow -c python -- bash
    cd ../examples/toy_example/
    python run_toy_example.py 9999 10000 1
  2. Party-10000 and Party-8888

    kubectl -n fate-10000 exec -it svc/fateflow -c python -- bash
    cd ../examples/toy_example/
    python run_toy_example.py 10000 8888 1
  3. Party-8888 and Party-9999

    kubectl -n fate-8888 exec -it svc/fateflow -c python -- bash
    cd ../examples/toy_example/
    python run_toy_example.py 8888 9999 1

Finally, if the log contains a line similar to "success to calculate secure_sum, it is 2000.0000000000002", the toy_example communication test has succeeded.
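
If you save the test output to a file, the check can be automated. The sketch below searches log text for the success message quoted above and compares the reported sum with a tolerance, since the trailing digits in 2000.0000000000002 look like ordinary floating-point rounding. `toy_example_passed` is a hypothetical helper, and the default expected value of 2000 is taken only from the log line in this document, so adjust it if your run reports a different sum.

```python
# Sketch: decide from captured log text whether the toy_example test succeeded.
# toy_example_passed is hypothetical; the message text comes from this document.
import math
import re

SUCCESS_PATTERN = re.compile(
    r"success to calculate secure_sum, it is (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def toy_example_passed(log_text, expected=2000.0):
    """Return True if the success line is present and the sum is close to `expected`."""
    match = SUCCESS_PATTERN.search(log_text)
    if match is None:
        return False
    # Compare with a relative tolerance instead of ==, to absorb rounding error.
    return math.isclose(float(match.group(1)), expected, rel_tol=1e-9)


if __name__ == "__main__":
    sample = "success to calculate secure_sum, it is 2000.0000000000002"
    print(toy_example_passed(sample))
```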

Multi-parties training test

If the communication tests between each pair of parties pass, we can run a three-party min_test to test multi-party training.

The min_test task requires three parties: a Guest, a Host, and an Arbiter. We use Party-10000 as the Guest, Party-9999 as the Host, and Party-8888 as the Arbiter.
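
The role assignment maps directly onto the `-gid`, `-hid`, and `-aid` options of `run_task.py` used in this section. As a small illustration, the sketch below builds that command line from the three party IDs and rejects an assignment that reuses a party. `min_test_command` is a hypothetical helper, and the value 1 for `-m` is simply copied from the command in this document.

```python
# Sketch: build the run_task.py command line from a role assignment.
# min_test_command is hypothetical; the option names come from this document.
import shlex


def min_test_command(guest, host, arbiter, mode=1):
    """Return the shell command that launches min_test with the given roles."""
    if len({guest, host, arbiter}) != 3:
        raise ValueError("guest, host and arbiter must be three different parties")
    args = [
        "python", "run_task.py",
        "-m", str(mode),
        "-gid", str(guest),
        "-hid", str(host),
        "-aid", str(arbiter),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(min_test_command(guest=10000, host=9999, arbiter=8888))
```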

  1. First upload the min_test data set on the FATE of each Party.

    Run the following commands on the k8s master corresponding to each Party

    kubectl -n fate-<partyID> exec -it svc/fateflow -c python -- bash
    cd ../examples/scripts; python upload_default_data.py -m 1

    <partyID> represents the ID of the party currently deployed by k8s.

  2. Launch a training task on one of the parties.

    Let's launch a task at Party-10000

    kubectl -n fate-10000 exec -it svc/fateflow -c python -- bash
    cd ../examples/min_test_task; python run_task.py -m 1 -gid 10000 -hid 9999 -aid 8888
  3. View the task results.

    The test of min_test needs to run for some time. Wait for the task to end. You can view the running result through the log on the command line.

    You can also check the FATE-Board web page: http://10000.fateboard.example.com for more task information.

    (Figure: fateboard)

Finally, you can use the federated learning network to train your own model.

Next step

Use FATE Client to Build Jobs in Jupyter Notebook
