# Attributed DeepWalk
The pipeline consists of four steps:

- Calculate edge weights based on feature similarity and structural similarity → EdgeWeightCalculator
- Perform random walks (that take these edge weights into account) → RandomWalkGenerator
- Generate node embeddings by training the Skip-gram model (using the random walk results) → SkipGram
- Impute missing features using nodes that have similar embeddings → Imputation Using Embeddings
Source: original paper & lecture slides
## Calculating the edge weights

Parameters:
- `no_edge_weights`
- `fusion_coefficient`
  - Typical values: 0.5–0.6[^1]
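As a sketch of how the two similarities might be fused: `fusion_coefficient` linearly blends a structural term with a feature term. The concrete choices below (Jaccard neighbor overlap and cosine similarity) are assumptions for illustration, not necessarily what EdgeWeightCalculator implements.

```python
import numpy as np

def fused_edge_weight(adj, features, u, v, fusion_coefficient=0.5):
    """Blend structural and feature similarity for edge (u, v).

    Sketch only: Jaccard neighbor overlap as the structural part and
    cosine similarity as the feature part are assumptions, not
    necessarily what EdgeWeightCalculator implements.
    """
    nu, nv = set(np.nonzero(adj[u])[0]), set(np.nonzero(adj[v])[0])
    structural = len(nu & nv) / max(len(nu | nv), 1)            # Jaccard overlap
    fu, fv = features[u], features[v]
    denom = np.linalg.norm(fu) * np.linalg.norm(fv)
    feature_sim = float(fu @ fv) / denom if denom > 0 else 0.0  # cosine
    return fusion_coefficient * structural + (1 - fusion_coefficient) * feature_sim
```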
## Random Walks
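A minimal sketch of a walk that takes the precomputed edge weights into account, assuming a dense `(n, n)` weight matrix with zeros for non-edges; RandomWalkGenerator's actual data layout may differ.

```python
import numpy as np

def weighted_random_walk(weights, start, walk_length, rng=None):
    """One random walk where the next node is sampled proportionally
    to the (precomputed) edge weights of the current node.

    `weights` is assumed to be a dense (n, n) matrix of fused edge
    weights with zeros for non-edges -- an assumption of this sketch.
    """
    rng = rng or np.random.default_rng()
    walk = [start]
    for _ in range(walk_length - 1):
        w = weights[walk[-1]]
        total = w.sum()
        if total == 0:                       # dead end: stop the walk early
            break
        walk.append(rng.choice(len(w), p=w / total))
    return walk
```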
## Training the Skip-gram model

Parameters:
- `embedding_size`
- `context_window`
- `num_negative_samples`
  - Typical values: 5–20 for small datasets and 2–5 for "large" datasets[^3]
- `smoothing_exponent`
  - Typical value: 0.75[^3]
  - Actually, the distribution from which the negative samples are drawn is itself a parameter.
  - The unigram distribution with an exponent of 3/4 outperformed other distributions in Word2vec[^3].
  - Our way of actually measuring the distribution and then smoothing it by exponentiating is probably more accurate (but also more expensive); see the sketch after this list.
- `num_epochs`
- `learning_rate`
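A sketch of the "measure the distribution, then smooth it" approach mentioned above: count how often each node appears in the walk corpus, raise the counts to `smoothing_exponent`, and renormalize. With exponent 1.0 this is the plain unigram distribution; 0.75 flattens it in favor of rare nodes.

```python
import numpy as np

def negative_sampling_distribution(walks, num_nodes, smoothing_exponent=0.75):
    """Empirical node frequency over the walk corpus, smoothed by exponentiation.

    With exponent 1.0 this is the plain unigram distribution; 0.75
    flattens it, boosting rare nodes (the value that worked best in
    Word2vec's negative sampling).
    """
    counts = np.zeros(num_nodes)
    for walk in walks:
        for node in walk:
            counts[node] += 1
    smoothed = counts ** smoothing_exponent
    return smoothed / smoothed.sum()

# Drawing num_negative_samples negatives per positive pair:
# rng.choice(num_nodes, size=num_negative_samples, p=dist)
```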
## Imputing features

Parameters:
- `top_similar`
- `similarity_metric`
  - Choices: `"cosine"` or `"dot_product"`
## How to use the embeddings for imputation

- Find the k most similar neighbors and use their features
  - What similarity metric to use for the node embeddings?
    - dot product
    - cosine similarity
    - Euclidean distance
  - How to use the features?
    - average
    - weighted average, e.g.
      $f_i = \frac{\sum_{j \in N(i)} \text{sim}(i,j) \cdot f_j}{\sum_{j \in N(i)} \text{sim}(i,j)}$
      (a sketch follows this list)
- Train a model that predicts features based on node embeddings
  - Perform supervised training using the subset of nodes where the features are observed
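A sketch of the k-most-similar variant with the weighted average from the formula above. The boolean `observed` mask for tracking which nodes have known features is an assumption of this sketch.

```python
import numpy as np

def impute_node(i, embeddings, features, observed, top_similar=10,
                similarity_metric="cosine"):
    """Impute node i's features from its most similar embedded neighbors,
    using the weighted average from the formula above.

    `observed` is a boolean array marking nodes with known features --
    an assumption about how missingness is tracked in this sketch.
    """
    emb = embeddings
    if similarity_metric == "cosine":
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        emb = emb / np.clip(norms, 1e-12, None)
    sims = emb @ emb[i]                     # dot product (= cosine if normalized)
    sims[i] = -np.inf                       # never use the node itself
    sims[~observed] = -np.inf               # only borrow from observed nodes
    top = np.argsort(sims)[-top_similar:]   # k most similar candidates
    w = np.clip(sims[top], 0, None)         # negative similarities get weight 0
    if w.sum() == 0:
        return features[observed].mean(axis=0)  # fallback: global mean
    return (w[:, None] * features[top]).sum(axis=0) / w.sum()
```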
## Somehow increase the features' influence on the embeddings

- Within SkipGram
  - Before starting the training, initialize the embedding vectors with the feature vectors (scaled to the correct dimension, of course); sketched after this list
  - Add a term to the cost function that penalizes differences in embedding vectors between nodes with similar features
    - This could use the precomputed feature similarity values (from EdgeWeightCalculator) for determining the cost
- Within Random Walks
  - Make it over-proportionally more likely to visit a node if the edge weight is high (not just linearly)
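A sketch of the initialization idea: project the feature vectors down (or up) to `embedding_size` with a random Gaussian matrix. The projection choice and the small symmetry-breaking noise are assumptions, not a prescribed method.

```python
import numpy as np

def feature_initialized_embeddings(features, embedding_size, rng=None):
    """Initialize Skip-gram embeddings from the feature vectors.

    A random Gaussian projection handles the dimension mismatch; the
    choice of projection (and the small noise that breaks exact ties
    between identical feature vectors) are assumptions of this sketch.
    """
    rng = rng or np.random.default_rng(0)
    num_nodes, feat_dim = features.shape
    projection = rng.normal(scale=1.0 / np.sqrt(feat_dim),
                            size=(feat_dim, embedding_size))
    init = features @ projection
    init += rng.normal(scale=0.01, size=init.shape)  # break exact ties
    return init
```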
## Tune Random Walks

- Dynamically adjust transition probabilities
  - E.g. make it less likely to revisit the node you came from (like in node2vec); sketched below
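A sketch of the node2vec-style idea, using only node2vec's return parameter p: the weight of the edge back to the previous node is divided by p before normalizing, so p > 1 makes immediate backtracking less likely (the in-out parameter q is omitted for brevity).

```python
import numpy as np

def biased_step(weights, current, previous, p=2.0, rng=None):
    """Sample the next node, down-weighting a return to `previous`.

    Only node2vec's return parameter p is sketched here; `current` is
    assumed to have at least one positively weighted neighbor.
    """
    rng = rng or np.random.default_rng()
    w = weights[current].astype(float)
    if previous is not None:
        w[previous] /= p                    # p > 1: revisiting is less likely
    return rng.choice(len(w), p=w / w.sum())
```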
## Add an additional smoothing step after the imputation

- E.g. each (missing) feature is iteratively smoothed like this (sketched below):
  $f_i^{(t+1)} = \alpha \, f_i^{(t)} + (1-\alpha) \frac{\sum_{j \in N(i)} \text{sim}(i,j) \, f_j^{(t)}}{\sum_{j \in N(i)} \text{sim}(i,j)}$
- Non-missing features may or may not be fixed here
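The smoothing update above as a sketch; since observed features "may or may not be fixed", clamping them back each iteration is left as a flag. The `(n, n)` similarity matrix `sim` (zeros for non-neighbors) is assumed as input, whether it comes from the edge weights or from embedding similarity.

```python
import numpy as np

def smooth_features(features, sim, observed, alpha=0.5, num_iters=10,
                    fix_observed=True):
    """Iteratively apply f_i <- alpha*f_i + (1-alpha)*weighted neighbor mean.

    `sim` is an (n, n) matrix of similarities with zeros for
    non-neighbors; its exact source is left open in this sketch.
    """
    f = features.copy()
    row_sums = np.clip(sim.sum(axis=1, keepdims=True), 1e-12, None)
    for _ in range(num_iters):
        neighbor_mean = (sim @ f) / row_sums
        f = alpha * f + (1 - alpha) * neighbor_mean
        if fix_observed:                    # optionally clamp known features
            f[observed] = features[observed]
    return f
```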
[^1]: Attributed DeepWalk paper
[^2]: DeepWalk paper
[^3]: Negative Sampling paper
[^4]: Word2vec paper