Overview of Deep Learning on Graph Embeddings - lshhhhh/deep-learning-study GitHub Wiki

https://towardsdatascience.com/overview-of-deep-learning-on-graph-embeddings-4305c10ad4a4

Graph embedding은 간단하지만 매우 효과적인 방법으로 graph를 ML task에 최적의 format이 되도록 변환한다. 이러한 단순함 때문에 종종 상당히 scalable하며 구현이 용이하다. 성능이나 효율성에 영향을 주지 않고 대부분의 네트워크와 그래프에 적용될 수 있다.

Graph embedding은 다른 레벨의 granularity(단위의 세밀함: node 레벨, sub-graph 레벨, graph walks와 같은 strategies)로 다양한 방법으로 할 수 있다. 가장 인기있는 몇몇 방법을 소개해보겠다.

DeepWalk

Graph walk embedding
각 node를 arbitrary representation vector로 나타내고, 그래프를 traverse한다. 한 node vector과 그 다음 step의 node vector를 나란히 옆에 배열하는 방식의 행렬로 traversal의 step들을 aggregate할 수 있다. 그리고 그렇게 graph를 나타낸 행렬을 RNN에 feed할 수 있다. Traversals에서 일부를 잘라낸 step들을 RNN의 input으로도 사용할 수 있을거다. 이것은 문장 안의 단어 vector들이 합쳐지는 것과 유사하다.
DeepWalk는 아래 식을 이용하여 일련의 random walks를 완성하고자 한다.
목표: 여태까지 random walk로 방문한 이전의 모든 node들 다음으로 v_i가 관찰될 확률 예측

v_i: i번째 node Pr: probability Φ: mapping func (node v -> latent representation(NN의 input))

NN은 walk동안 어떤 node가 얼마나 자주 encounter되는지를 이용하여 node feature이나 classification에 대한 예측을 한다. 예측하는 방법으로 Word2vec에서와 같이 skip-gram을 이용한다. ("context" <-> "connectivity, structural role, and node features")
DeepWalk는 time complexity가 O(|V|)로 상대적으로 효율적이지만, 노드가 추가될 때마다 모델이 embedding을 다시 학습하고 새 node로부터 배워야한다는 점에서 transductive하다.
- Transductive Learning: domain adaptation, test x에 대한 정보가 주어진 상황에서 test y를 맞추는 문제, Test 데이터의 x에 대한 정보에는 node의 graph structure도 포함된다. 예를 들면, test data에 있는 node가 이미 training graph에 등장했던 node들이다. 즉 test graph가 training graph에 포함된 상황이다.
- Inductive Learning: supervised learning의 상황. text x에 대한 정보가 전혀 주어지지 않은 상황에서 test y를 맞추는 문제, test A matrix와 training A matrix가 서로 다르다.
- 출처: https://newsight.tistory.com/313

Node2vec

Graph walk embedding
Node를 word와 같이 다루는 점에서 DeepWalk와 비슷
Node2vec은 parameter p, q로 부터 생성된 walk bias variable α를 가진다. p는 BFS 방식을 우선시하고, q는 DFS 방식을 우선시한다. 다음 walk를 어디로 할지는 1/p나 1/q의 확률로 결정된다. 작동 방식을 떠올려보면 BFS는 local neighbor들을 학습하기에 이상적이고, DFS는 global variable들을 학습하기 좋을 것이다. Task에 따라 두 priority를 switch하며 진행할 수 있을 것이다. 즉 single graph에 대해, Node2vec은 parameter의 값에 따라 다른 결과를 낼 수 있다.
Node2vec도 DeepWalk와 같이 walks의 latent embedding을 NN의 input으로 사용해 node를 분류한다.
실험을 통해 증명된 것은
- BFS: structural roles (hubs, bridges, outliers, etc.)에 따른 분류에 사용하기 좋고,
- DFS: community 중심 분류에 좋다는 것이다.

Graph2vec

Graph의 sub-graph embedding
Doc2vec에서의 equation의 변형으로 그 논문에서의 inspiration과 비슷하고, 같은 방법으로 증명될 수 있다.
"document : sentences : words" <-> "graph : sub-graphs : nodes"

V: vocab list d~: doc embedding vector w_j~: w_j embedding vector
Subgraph의 latent vector가 NN의 input으로 들어간다고 짤막하게 소개하는데, 궁금할 때 더 찾아봐야겠음

Structural Deep Network Embedding (SDNE)

앞의 technique들과 달리 random walk는 사용하지 않는다. 두가지 metrics로부터 학습한다.
- First-order proximity: 두 node가 edge를 공유하면 similar하다고 생각 - pairwise similarity
  - Laplacian Eigenmap embedding 알고리즘: similar node들이 멀게 mapping되면 penalty => loss function
- Second-order proximity: 두 node가 많은 이웃한/인접한 node들을 공유한다면 similar하다고 생각
  - Graph의 adjacency matrix를 autoencoder의 input으로 => reconstruction loss function
- 두 loss function을 함께 최소화하도록 하여 graph embedding을 만든다.

Large-scale Information Network Embedding (LINE)

각 node 쌍에 대하여 두 joint probability distribution을 정의하고 KL divergence를 이용하여 그 차이를 최소화한다. 두 distribution는 1) adjacency matrix, 2) 두 node embedding의 dot product이다.

Hierarchical Representation Learning for Networks

이전에 언급했던 embedding/walking based 모델들은 local optima에 빠질 위험이 있다.
Local optima를 피하기 위해서 더 나은 weight init이 필요하고, 관련된 node들을 supernode로 묶기 위해 graph coarsening이라는 것을 한다.
HARP는 기본적으로 더 빠른 학습을 하기 위해 graph를 간단히 하는 graph 전처리 step이다. 이 알고리즘을 다른 앞에서 설명한 알고리즘과 합하면 큰 향상을 시킬 수 있다.
Graph를 coarsening한 후, 가장 coarse한 supernode의 embedding을 만들고, supernode들을 사용하여 graph 전체의 embedding을 만든다.

(참고) From Theory to Practice Representing Graphs

https://medium.com/basecs/from-theory-to-practice-representing-graphs-cfd782c5be38