Federated GNN - rhqaq/paper_reading GitHub Wiki

FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning

Background

  • Federated learning has proven valuable in CV and NLP, but it is still under-explored for graph learning tasks
  • Existing federated learning frameworks/libraries do not support GNNs
  • Without a unified library for federated graph learning (FGL), unnecessary development and research costs are incurred

Contributions

  1. Unified View for Modularized and Flexible Programming: built on top of FederatedScope

  2. Unified and Comprehensive Benchmarks: GraphDataZoo (various graph-dataset splitting mechanisms) and GNNModelZoo (various SOTA FGL algorithms)

  3. Efficient and Automated Model Tuning: an automatic hyperparameter tuning module, plus tuning for personalized learning

  4. Privacy Attacks and Defence: the first such library to include an attack module; FGL additionally requires sharing node embeddings and neighbor generators; defence mechanisms are packaged as plugins

  • We utilize FS-G to conduct extensive experimental studies to validate the implementation correctness, verify its efficacy, and better understand the characteristics of FGL.
  • Furthermore, we employ FS-G to serve three real-world E-commerce scenarios, and the collaboratively learned GNN outperforms their locally learned counterparts, which confirms the business value of FS-G.
  • We have open-sourced FS-G for the community, which we believe can ease the innovation of FGL algorithms, promote their applications, and benefit more real-world business.

Characteristics of Federated Learning with GNNs

  • Ordinary federated learning only needs to exchange model parameters, whereas GNNs also need to exchange node embeddings and neighbor information
  • FS-G treats every exchange as sending a message, which makes it easier to implement SOTA FGL algorithms
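This "everything is a message" view can be sketched as follows. Note this is only an illustrative sketch; the class and handler names here (`Message`, `Worker`, `register_handler`) are ours and do not reproduce FederatedScope's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Illustrative sketch: model parameters, node embeddings, and neighbor
# information are all just typed messages between server and clients.
@dataclass
class Message:
    msg_type: str          # e.g. "model_para", "node_embedding"
    sender: int            # -1 denotes the server in this sketch
    receiver: List[int]
    content: Any = None

class Worker:
    """Both server and clients react to messages via registered handlers."""
    def __init__(self):
        self.handlers: Dict[str, Callable[[Message], Message]] = {}

    def register_handler(self, msg_type: str, fn) -> None:
        self.handlers[msg_type] = fn

    def handle(self, msg: Message) -> Message:
        return self.handlers[msg.msg_type](msg)

# A client answering an embedding request -- the extra exchange that
# distinguishes FGL from ordinary federated learning.
client = Worker()
client.register_handler(
    "node_embedding",
    lambda msg: Message("embedding_reply", sender=1, receiver=[-1],
                        content={"n1": [0.1, 0.2]}))
request = Message("node_embedding", sender=-1, receiver=[1])
reply = client.handle(request)
print(reply.msg_type)  # embedding_reply
```

New exchange types (e.g. a neighbor generator) then only require registering one more handler rather than changing the training loop.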

GraphDataZoo

Helps researchers construct distributed (federated) data

(1) Node-level task: Each instance is a node which is associated with its label. To make prediction for a node, its 𝑘-hop neighborhood is often considered as the input to a GNN.

(2) Link-level task: The goal is to predict whether any given node pair is connected or the label of each given link (e.g., the rating a user assigns to an item).

(3) Graph-level task: Each instance is an individual graph which is associated with its label. For the link/node-level tasks, the transductive setting is prevalent, where both the labeled and unlabeled links/nodes appear in the same graph. As for the graph-level task, a standalone dataset often consists of a collection of graphs.

  • Federated learning requires decentralized data, so an existing graph dataset must be split into several independent parts.
  • For node-/link-level tasks, each client should hold a subgraph.
  • For graph-level tasks, each client should hold a subset of all the graphs.
  • Some splitters split a given dataset by specific meta data or the node attribute value, expecting to simulate realistic FL scenarios.
  • Some other splitters are designed to provide various kinds of non-i.i.d.-ness, including covariate shift, concept drift, and prior probability shift.
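
A splitter for node-level tasks can be sketched as below: partition the nodes across clients and give each client the induced subgraph. The function name `random_splitter` and its interface are our own illustration, not GraphDataZoo's actual API.

```python
import random

# Sketch of a random splitter: assign nodes to clients round-robin after
# shuffling, and keep only the edges whose endpoints land on the same
# client (each client holds a subgraph, as required for node-level FGL).
def random_splitter(nodes, edges, num_clients, seed=0):
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    # Round-robin assignment keeps client sizes balanced (differ by <= 1).
    owner = {v: i % num_clients for i, v in enumerate(shuffled)}
    parts = [{"nodes": [], "edges": []} for _ in range(num_clients)]
    for v in shuffled:
        parts[owner[v]]["nodes"].append(v)
    for u, v in edges:
        if owner[u] == owner[v]:          # keep only intra-client edges
            parts[owner[u]]["edges"].append((u, v))
    return parts

nodes = range(6)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
parts = random_splitter(nodes, edges, num_clients=2)
print([len(p["nodes"]) for p in parts])  # [3, 3]
```

A community-based splitter would replace the round-robin assignment with community detection, which yields the more realistic (non-random) partitions mentioned above; note that edges crossing clients are simply dropped in this sketch.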

Experimental datasets are also provided

GNNModelZoo

Models specified for different levels of tasks face heterogeneous inputs and/or outputs, and thus need different architectures. There are four classes of neural network modules:

(1) Encoder: embeds the raw node attributes or edge attributes, e.g., atom encoder and bond encoder.

(2) GNN: learns discriminative representations for the nodes from their original representations (raw or encoded) and the graph structures.

(3) Decoder: recovers these hidden representations back into original node attributes or adjacency relationships.

(4) Readout: aggregates node representations into a graph representation, e.g., mean pooling.
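The data flow through these four module classes can be sketched with plain Python standing in for real neural layers (the functions below are toy stand-ins, not GNNModelZoo code):

```python
# Toy sketch of the four module classes: Encoder -> GNN -> Decoder/Readout.

def encoder(raw_attrs):
    # Embed raw categorical node attributes as one-hot vectors.
    return {v: [1.0 if i == a else 0.0 for i in range(3)]
            for v, a in raw_attrs.items()}

def gnn_layer(h, adj):
    # One message-passing step: average each node's neighbors' features.
    return {v: [sum(h[u][i] for u in nbrs) / len(nbrs)
                for i in range(len(h[v]))]
            for v, nbrs in adj.items()}

def decoder(h, u, v):
    # Link decoder: inner product of two node representations.
    return sum(a * b for a, b in zip(h[u], h[v]))

def readout(h):
    # Mean pooling over nodes -> a single graph representation.
    dims = len(next(iter(h.values())))
    return [sum(vec[i] for vec in h.values()) / len(h) for i in range(dims)]

raw = {0: 0, 1: 1, 2: 2}               # raw node attributes
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # a triangle graph
h = gnn_layer(encoder(raw), adj)
print(readout(h))  # ≈ [0.33, 0.33, 0.33]
```

A node-level task would stop after the GNN, a link-level task would use the decoder, and a graph-level task would use the readout — which is exactly why the heterogeneous input/output combinations require different architectures.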

Various variants are provided, such as GPR-GNN

Users can also build models beyond the GNNs included in the library

Model Tuning Module

Since federated learning involves many rounds of communication, each hyperparameter trial is costly

  • Multi-fidelity HPO is applied:
  • (1) run only a limited number of FL rounds for each trial instead of a complete FL course; (2) sample a small fraction of clients, i.e., 𝐾 (𝐾 ≪ 𝑁), in each round.
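These two fidelity-reduction tricks can be sketched as below. `run_fl` is a stand-in for a real federated training loop (here it just scores a learning rate on a toy objective so the example is self-contained); the function names are ours, not FS-G's.

```python
import random

# Low-fidelity HPO sketch: each trial runs only a few rounds and samples
# K << N clients, so many configurations can be screened cheaply.

def run_fl(lr, num_rounds, client_ids):
    # Pretend "loss after training": lr closer to 0.1 is better. A real
    # system would actually train for `num_rounds` on `client_ids`.
    return (lr - 0.1) ** 2 + 0.001 * num_rounds / max(len(client_ids), 1)

def low_fidelity_search(candidate_lrs, num_clients=100, k=5, rounds=3, seed=0):
    rng = random.Random(seed)
    results = {}
    for lr in candidate_lrs:
        sampled = rng.sample(range(num_clients), k)  # trick (2): K << N
        results[lr] = run_fl(lr, rounds, sampled)    # trick (1): few rounds
    return min(results, key=results.get)

best = low_fidelity_search([0.001, 0.01, 0.1, 1.0])
print(best)  # 0.1
```

The surviving configurations from such a cheap screen would then be re-run at full fidelity, which is the usual multi-fidelity pattern.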

Each client is allowed to adopt a personalized local model, and the library can suggest whether a client should switch to personalized treatment

OFF-THE-SHELF ATTACK AND DEFENCE ABILITIES

Researchers can pick attack and defence mechanisms in their experiments to evaluate the privacy-preserving ability of federated learning
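Since FGL shares node embeddings in addition to model parameters, a defence plugin naturally wraps outgoing content before it leaves the client. The sketch below uses Gaussian noise as a stand-in for a real DP mechanism; the names are ours, not FS-G's plugin API.

```python
import random

# Plugin-style defence sketch: perturb a shared node embedding with
# Gaussian noise before sending it, illustrating how a defence can be
# dropped in without touching the training logic.

def gaussian_noise_defence(sigma, seed=None):
    rng = random.Random(seed)
    def wrap(embedding):
        return [x + rng.gauss(0.0, sigma) for x in embedding]
    return wrap

defence = gaussian_noise_defence(sigma=0.1, seed=42)
shared = defence([0.5, -0.2, 1.0])   # what actually leaves the client
print(len(shared))  # 3
```

An attack module would play the opposite role — e.g. try to reconstruct private attributes from `shared` — letting researchers measure the privacy/utility trade-off of a given `sigma`.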

Experiments

We utilize FS-G to conduct extensive experiments, aiming to validate the correctness of FS-G's implementation and to establish the long-awaited benchmarks for FGL.

Three different settings are considered in this study:

(1) Local: each client trains a GNN model on its own data.

(2) FGL: FedAvg, FedOpt, and FedProx are used to collaboratively train GNN models.

(3) Global: a GNN model is trained on the complete dataset.
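The FGL setting's simplest baseline, FedAvg, can be sketched as a size-weighted average of client parameters (FedOpt and FedProx modify the server update and the local objective respectively, not shown here):

```python
# Minimal FedAvg sketch: the server averages client model parameters
# weighted by how much local (sub)graph data each client holds.

def fedavg(client_params, client_sizes):
    total = sum(client_sizes)
    dims = len(client_params[0])
    return [sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dims)]

# Two clients with different amounts of local data:
params = [[1.0, 0.0], [0.0, 1.0]]
sizes = [3, 1]
print(fedavg(params, sizes))  # [0.75, 0.25]
```

Comparing this collaboratively averaged model against the Local and Global settings above is what quantifies the benefit of federation.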

Node-level Tasks

Models: GCN, GraphSAGE, GAT, and GPR-GNN. The datasets are split in two ways: by community and randomly.

Link-level Tasks

Graph-level Tasks

Summary

  1. We implement an FGL package, FS-G, to facilitate FGL research and applications
  2. With FS-G, FGL algorithms can be expressed in a unified way, validated against comprehensive and unified benchmarks, and further tuned efficiently
  3. Meanwhile, FS-G provides rich plug-in attack and defence utilities to evaluate the privacy-leakage level of the FGL algorithms of interest