Attributed DeepWalk

Algorithm

  1. Calculate edge weights based on feature similarity and structural similarity → EdgeWeightCalculator
  2. Perform random walks (that take these edge weights into account) → RandomWalkGenerator
  3. Generate node embeddings by training the Skip-gram model (using the random walk results) → SkipGram
  4. Impute missing features using nodes that have similar embeddings → Imputation Using Embeddings

Source: original paper & lecture slides
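
To make step 2 concrete, here is a minimal sketch (Python, not the repo's actual code) of a random walk whose transition probabilities are proportional to the precomputed edge weights; the adjacency representation and function name are assumptions for illustration.

```python
import random

def weighted_random_walk(adj, start, walk_length, rng=random):
    """One walk of up to `walk_length` nodes; `adj[u]` maps each
    neighbor v of u to the precomputed edge weight w(u, v)."""
    walk = [start]
    for _ in range(walk_length - 1):
        u = walk[-1]
        neighbors = list(adj[u])
        if not neighbors:  # dead end: stop the walk early
            break
        weights = [adj[u][v] for v in neighbors]
        # The next node is drawn proportionally to the edge weights.
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy graph: the heavy 0-1 edge makes walks oscillate between 0 and 1.
adj = {0: {1: 0.9, 2: 0.1}, 1: {0: 0.9, 2: 0.1}, 2: {0: 0.5, 1: 0.5}}
print(weighted_random_walk(adj, start=0, walk_length=10))
```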

Parameters

  1. Calculating the edge weights
    • no_edge_weights
    • fusion_coefficient (see the fusion sketch after this list)
      • Typical values: 0.5-0.6[^1]
  2. Random Walks
    • walk_length
      • Typical values: 80[^1], 10-80[^2]
    • num_walks
      • Typical values: 10[^1], 1-50[^2]
  3. Training the Skip-gram model
    • embedding_size
      • Typical values: 128[^1][^2], 32-256[^2]
    • context_window
      • Typical values: 10[^1][^2]
    • num_negative_samples
      • Typical values: 5-20 for small datasets and 2-5 for "large" datasets[^3]
    • smoothing_exponent
      • Typical values: 0.75[^3]
      • Strictly speaking, the distribution from which the negative samples are drawn is itself a parameter.
        • In Word2vec, the unigram distribution raised to the power of 3/4 outperformed other distributions[^3].
        • Measuring the empirical distribution and then smoothing it by exponentiation (see the sketch after this list) is probably more accurate, but also more expensive.
    • num_epochs
      • Typical values: 1[^1][^2], 1-3[^4]
    • learning_rate
      • Typical values: 2.5 %[^2][^4]
  4. Imputing features
    • top_similar
    • similarity_metric
      • Choices: "cosine" or "dot_product"
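
As referenced under fusion_coefficient above: a minimal sketch of one plausible fusion, assuming the edge weight is the convex combination $\lambda \cdot \text{sim}_\text{feat} + (1-\lambda) \cdot \text{sim}_\text{struct}$ with cosine feature similarity and Jaccard structural similarity. The actual similarity definitions in EdgeWeightCalculator may differ.

```python
import numpy as np

def fused_edge_weight(features, neighbors, i, j, fusion_coefficient=0.5):
    """Fuse feature similarity and structural similarity into one edge weight.
    `features[i]` is node i's attribute vector, `neighbors[i]` its neighbor set."""
    # Feature similarity: cosine similarity, clipped to [0, 1].
    fi, fj = features[i], features[j]
    cos = fi @ fj / (np.linalg.norm(fi) * np.linalg.norm(fj) + 1e-12)
    sim_feat = max(cos, 0.0)
    # Structural similarity: Jaccard overlap of the two neighborhoods.
    union = neighbors[i] | neighbors[j]
    sim_struct = len(neighbors[i] & neighbors[j]) / len(union) if union else 0.0
    lam = fusion_coefficient
    return lam * sim_feat + (1.0 - lam) * sim_struct
```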
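
And for smoothing_exponent: a sketch of the "measure the distribution, then exponentiate" idea described above. Node frequencies are counted over the generated walks, raised to the exponent (0.75 by default, as in Word2vec), renormalized, and sampled from; the function names are assumptions.

```python
import numpy as np

def smoothed_noise_distribution(walks, num_nodes, smoothing_exponent=0.75):
    """Empirical node-frequency distribution over all walks,
    smoothed by exponentiation and renormalized."""
    counts = np.zeros(num_nodes)
    for walk in walks:
        for node in walk:
            counts[node] += 1
    probs = counts ** smoothing_exponent
    return probs / probs.sum()

def draw_negatives(probs, num_negative_samples, rng=None):
    rng = rng or np.random.default_rng()
    # Negative samples are drawn according to the smoothed distribution.
    return rng.choice(len(probs), size=num_negative_samples, p=probs)
```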

Potential parameters & tuning

  1. How to use the embeddings for imputation
    • Find the k most similar neighbors and use their features
      • What similarity metric to use for the node embeddings
        • dot product
        • cosine similarity
        • Euclidean distance
      • How to use the features
        • average
        • weighted average, e.g. $f_i = \frac{\sum_{j \in N(i)} \text{sim}(i,j) \cdot f_j}{\sum_{j \in N(i)} \text{sim}(i,j)}$ (see the sketch after this list)
    • Train a model that predicts features based on node embeddings
      • Perform supervised training using the subset of nodes where the features are observed
  2. Somehow increase the features' influence on the embeddings
    • Within SkipGram
      • Before starting the training, initialize the embedding vectors with the feature vectors (projected to the embedding dimension, of course)
      • Add a term to the cost function that penalizes differences in embedding vectors between nodes with similar features
        • This could use the precomputed feature similarity values (from EdgeWeightCalculator) for determining the cost
    • Within Random Walks
      • make it over-proportionally more likely to visit a node if the edge weight is high, e.g. by raising the weights to a power greater than 1 instead of using them linearly
  3. Tune Random Walks
    • dynamically adjust transition probabilities
      • e.g. make it less likely to revisit the node you came from (like in node2vec)
  4. Add an additional smoothing step after the imputation
    • E.g. each (missing) feature vector is iteratively smoothed as $f_i^{(t+1)} = \alpha \, f_i^{(t)} + (1-\alpha) \frac{\sum_{j \in N(i)} \text{sim}(i,j) \, f_j^{(t)}}{\sum_{j \in N(i)} \text{sim}(i,j)}$ (see the sketch after this list)
    • Non-missing features may or may not be fixed here
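
A minimal sketch of the k-most-similar imputation from item 1, using cosine similarity on the embeddings and the similarity-weighted average given above; names and details are illustrative assumptions, not the repo's API.

```python
import numpy as np

def impute_from_embeddings(embeddings, features, missing, top_similar=10):
    """For each node with missing features, average the features of its
    `top_similar` most similar observed nodes, weighted by cosine similarity."""
    # Row-normalize the embeddings so dot products become cosine similarities.
    E = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12)
    observed = np.array([i for i in range(len(features)) if i not in missing])
    for i in missing:
        sims = E[observed] @ E[i]                        # cosine sim to observed nodes
        top = observed[np.argsort(sims)[-top_similar:]]  # k most similar
        w = np.clip(E[top] @ E[i], 0.0, None)            # non-negative weights
        features[i] = (w[:, None] * features[top]).sum(0) / (w.sum() + 1e-12)
    return features
```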
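
And a sketch of the post-imputation smoothing from item 4, iterating the update above with observed features held fixed (one of the two options mentioned); here $\text{sim}(i,j)$ comes from a precomputed dense weight matrix W, which is an assumption for this example.

```python
import numpy as np

def smooth_features(features, W, observed_mask, alpha=0.5, num_iters=10):
    """Iteratively blend each node's features with the similarity-weighted
    average of its neighbors; W[i, j] = sim(i, j), 0 for non-neighbors."""
    F = features.copy()
    fixed = F[observed_mask].copy()
    row_sums = W.sum(axis=1, keepdims=True) + 1e-12
    for _ in range(num_iters):
        neighbor_avg = (W @ F) / row_sums
        F = alpha * F + (1.0 - alpha) * neighbor_avg
        F[observed_mask] = fixed  # variant: keep observed features fixed
    return F
```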

[^1]: Attributed DeepWalk paper

[^2]: DeepWalk paper

[^3]: Negative Sampling paper

[^4]: Word2vec paper
