Federated GNN - rhqaq/paper_reading GitHub Wiki
FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning
Background
- Federated learning has proven useful in CV and NLP, but it is still under-explored for graph learning tasks
- Existing federated learning frameworks/libraries do not support GNNs
- The lack of a unified library for federated graph learning adds unnecessary development and research cost
Contributions
- Unified View for Modularized and Flexible Programming. Built on top of FederatedScope
- Unified and Comprehensive Benchmarks. GraphDataZoo (multiple graph dataset splitting mechanisms) and GNNModelZoo (multiple SOTA FGL algorithms)
- Efficient and Automated Model Tuning. An automated hyperparameter tuning module, including tuning for personalized learning
- Privacy Attacks and Defence. The first such library to include an attack module; federated graph learning additionally requires sharing node embeddings and neighbor generators; defence capabilities are packaged as plugins
- We utilize FS-G to conduct extensive experimental studies to validate the implementation correctness, verify its efficacy, and gain a better understanding of the characteristics of FGL.
- Furthermore, we employ FS-G to serve three real-world E-commerce scenarios, and the collaboratively learned GNN outperforms their locally learned counterparts, which confirms the business value of FS-G.
- We have open-sourced FS-G for the community, which we believe can ease the innovation of FGL algorithms, promote their applications, and benefit more real-world business.
Characteristics of Federated GNN Learning
- Generic federated learning only needs to exchange model parameters, whereas federated GNN learning also needs to exchange node embeddings and neighbor information
- FS-G treats every such exchange as message passing, which makes it easy to implement SOTA FGL algorithms
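The message abstraction above can be sketched as follows. This is a hypothetical, minimal illustration (the class and method names are illustrative, not the real FS-G API): every exchange, whether model parameters, node embeddings, or neighbor information, is wrapped in one uniform message type and dispatched by type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Message:
    sender: str
    receiver: str
    msg_type: str  # e.g. "model_para", "node_embedding", "neighbors"
    content: Any = None

class MessageHub:
    """Toy router: a handler is registered per message type."""
    def __init__(self) -> None:
        self.inbox: List[Message] = []
        self.handlers: Dict[str, Callable[[Message], Any]] = {}

    def register(self, msg_type: str, handler: Callable[[Message], Any]) -> None:
        self.handlers[msg_type] = handler

    def receive(self, msg: Message) -> Any:
        # Log the message, then dispatch to the matching handler (if any).
        self.inbox.append(msg)
        handler = self.handlers.get(msg.msg_type)
        return handler(msg) if handler else None
```

Under this abstraction, supporting a new FGL algorithm mostly amounts to registering handlers for new message types, rather than changing the communication layer.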
GraphDataZoo
Helps researchers construct decentralized graph data
(1) Node-level task: Each instance is a node which is associated with its label. To make prediction for a node, its 𝑘-hop neighborhood is often considered as the input to a GNN.
(2) Link-level task: The goal is to predict whether any given node pair is connected or the label of each given link (e.g., the rating a user assigns to an item).
(3) Graph-level task: Each instance is an individual graph which is associated with its label. For the link/node-level tasks, the transductive setting is prevalent, where both the labeled and unlabeled links/nodes appear in the same graph. As for the graph-level task, a standalone dataset often consists of a collection of graphs.
- Federated learning requires decentralized data, so an existing graph dataset must be split into several independent parts.
- For node/link-level tasks, each client should hold a subgraph.
- For graph-level tasks, each client should hold a subset of all the graphs.
- Some splitters split a given dataset by specific meta data or the node attribute value, expecting to simulate realistic FL scenarios.
- Some other splitters are designed to provide various non-i.i.d.ness, including covariate shift, concept drift, and prior probability shift.
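A toy splitter in the spirit of GraphDataZoo can be sketched as below (the function name and signature are hypothetical, not the library's API): nodes are partitioned uniformly at random among clients, and each client keeps the subgraph induced by its own nodes, so cross-client edges are dropped.

```python
import random
from typing import Dict, Iterable, List, Tuple

def random_node_splitter(nodes: Iterable[int],
                         edges: List[Tuple[int, int]],
                         num_clients: int,
                         seed: int = 0) -> List[dict]:
    """Randomly partition nodes among clients; each client gets the
    induced subgraph over its node set (cross-client edges dropped)."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    # Round-robin assignment over the shuffled order.
    owner: Dict[int, int] = {n: i % num_clients for i, n in enumerate(shuffled)}
    parts = [{"nodes": [], "edges": []} for _ in range(num_clients)]
    for n, c in owner.items():
        parts[c]["nodes"].append(n)
    for u, v in edges:
        if owner[u] == owner[v]:  # keep only intra-client edges
            parts[owner[u]]["edges"].append((u, v))
    return parts
```

A meta-data-based splitter would replace the random assignment with grouping by an attribute value (e.g., community membership), which is how the more realistic non-i.i.d. splits are simulated.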
A collection of experimental datasets is also provided
GNNModelZoo
Models specified for tasks at different levels face heterogeneous inputs and/or outputs, and thus require different architectures. There are four classes of neural network modules: (1) Encoder: embeds the raw node attributes or edge attributes, e.g., atom encoder and bond encoder.
(2) GNN: learns discriminative representations for the nodes from their original representations (raw or encoded) and the graph structures.
(3) Decoder: recovers these hidden representations back into original node attributes or adjacency relationships.
(4) Readout: aggregates node representations into a graph representation, e.g., mean pooling.
Various model variants are included, such as GPR-GNN
Users can also build models beyond the GNNs shipped with the library
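How the module classes compose can be sketched as below, with plain Python lists standing in for tensors. This is an illustrative toy, not the FS-G implementation: the encoder is a trivial scaling, the GNN layer is mean aggregation over neighbors plus self, and the readout is mean pooling.

```python
from typing import Dict, List

Vec = List[float]

def encoder(raw_attrs: List[Vec]) -> List[Vec]:
    """Embed raw node attributes; here just a toy linear scaling."""
    return [[x * 0.5 for x in attrs] for attrs in raw_attrs]

def gnn_layer(h: List[Vec], adj: Dict[int, List[int]]) -> List[Vec]:
    """One message-passing round: mean over neighbors plus self."""
    out = []
    for i, hi in enumerate(h):
        neigh = [h[j] for j in adj[i]] + [hi]
        out.append([sum(col) / len(neigh) for col in zip(*neigh)])
    return out

def readout(h: List[Vec]) -> Vec:
    """Mean pooling: node representations -> one graph representation."""
    return [sum(col) / len(h) for col in zip(*h)]
```

A node-level model would stop after `gnn_layer` and attach a classifier per node; a graph-level model chains all three stages; a link-level model would instead score pairs of node representations (the decoder's role).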
Model Tuning Module
- Since federated learning involves many communication rounds, each hyperparameter trial is costly
- Multi-fidelity HPO is applied:
- (1) run only a limited number of FL rounds for each trial instead of a complete FL course; (2) sample only a small fraction of clients in each round, e.g., 𝐾 (𝐾 ≪ 𝑁).
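The two fidelity knobs above can be sketched as one toy trial function. This is a hypothetical illustration (the function, config keys, and the "quality" score are made up for the sketch; the body is a stand-in for real local training and aggregation):

```python
import random
from typing import Dict, List

def low_fidelity_trial(config: Dict[str, float],
                       clients: List[Dict[str, float]],
                       num_rounds: int = 5,
                       k: int = 2,
                       seed: int = 0) -> float:
    """Evaluate a hyperparameter config cheaply by (1) running only
    num_rounds FL rounds and (2) sampling K << N clients per round.
    The score update is a toy stand-in for local training."""
    rng = random.Random(seed)
    score = 0.0
    for _ in range(num_rounds):
        sampled = rng.sample(clients, k)  # K << N client subsample
        score += config["lr"] * sum(c["quality"] for c in sampled)
    return score / num_rounds
```

An HPO loop would call such a low-fidelity trial for many candidate configs and promote only the best ones to a full FL course, which is the essence of multi-fidelity methods like successive halving.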
Each client is allowed to adopt a personalized local model, and the system can indicate to a client whether it should switch to personalized training
OFF-THE-SHELF ATTACK AND DEFENCE ABILITIES
Researchers can pick attack and defence mechanisms in their experiments to evaluate the privacy-preserving ability of federated learning
Experiments
FS-G is used to conduct extensive experiments, aiming to validate the correctness of its implementations and to establish the long-awaited benchmarks for FGL.
Three different settings are considered in this study:
(1) Local: each client trains a GNN model on its own data.
(2) FGL: FedAvg, FedOpt, and FedProx are adopted to collaboratively train GNN models.
(3) Global: a GNN model is trained on the complete dataset.
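The FGL setting relies on FedAvg-style aggregation; a minimal sketch over flattened parameter vectors (the helper name and representation are illustrative, not the paper's code):

```python
from typing import List

def fedavg(client_params: List[List[float]],
           client_sizes: List[int]) -> List[float]:
    """Weighted average of client parameter vectors, with weights
    proportional to each client's number of training samples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    agg = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            agg[i] += p * (n / total)
    return agg
```

FedOpt and FedProx differ from this baseline in how the server applies the aggregated update (a server-side optimizer) and in adding a proximal term to each client's local objective, respectively.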
Node-level Tasks
Models: GCN, GraphSage, GAT, and GPR-GNN. The datasets are split in two ways: by community and randomly
Link-level Tasks
Graph-level Tasks
Summary
- We implement an FGL package, FS-G, to facilitate FGL research and applications
- With FS-G, FGL algorithms can be expressed in a unified way, validated against comprehensive and unified benchmarks, and further tuned efficiently
- Meanwhile, FS-G provides rich plugin-style attack and defence utilities to evaluate the privacy leakage of the FGL algorithms of interest