# Attributed DeepWalk
The pipeline consists of four steps:

- Calculate edge weights based on feature similarity and structural similarity → EdgeWeightCalculator
- Perform random walks (that take these edge weights into account) → RandomWalkGenerator
- Generate node embeddings by training the Skip-gram model (using the random walk results) → SkipGram
- Impute missing features using nodes that have similar embeddings → Imputation Using Embeddings
Source: original paper & lecture slides
## Calculating the edge weights

Parameters:
- `no_edge_weights`
- `fusion_coefficient`
  - Typical values: 0.5–0.6[^1]
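As a sketch of how the two similarities might be fused: `fusion_coefficient` linearly blends a structural term with a feature term. The concrete choices below (Jaccard neighbor overlap and cosine similarity) are assumptions for illustration, not necessarily what EdgeWeightCalculator implements.

```python
import numpy as np

def fused_edge_weight(adj, features, u, v, fusion_coefficient=0.5):
    """Blend structural and feature similarity for edge (u, v).

    Sketch only: Jaccard neighbor overlap as the structural part and
    cosine similarity as the feature part are assumptions, not
    necessarily what EdgeWeightCalculator implements.
    """
    nu, nv = set(np.nonzero(adj[u])[0]), set(np.nonzero(adj[v])[0])
    structural = len(nu & nv) / max(len(nu | nv), 1)            # Jaccard overlap
    fu, fv = features[u], features[v]
    denom = np.linalg.norm(fu) * np.linalg.norm(fv)
    feature_sim = float(fu @ fv) / denom if denom > 0 else 0.0  # cosine
    return fusion_coefficient * structural + (1 - fusion_coefficient) * feature_sim
```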
## Random Walks
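A minimal sketch of a walk that takes the precomputed edge weights into account, assuming a dense `(n, n)` weight matrix with zeros for non-edges; RandomWalkGenerator's actual data layout may differ.

```python
import numpy as np

def weighted_random_walk(weights, start, walk_length, rng=None):
    """One random walk where the next node is sampled proportionally
    to the (precomputed) edge weights of the current node.

    `weights` is assumed to be a dense (n, n) matrix of fused edge
    weights with zeros for non-edges -- an assumption of this sketch.
    """
    rng = rng or np.random.default_rng()
    walk = [start]
    for _ in range(walk_length - 1):
        w = weights[walk[-1]]
        total = w.sum()
        if total == 0:                       # dead end: stop the walk early
            break
        walk.append(rng.choice(len(w), p=w / total))
    return walk
```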
## Training the Skip-gram model

Parameters:
- `embedding_size`
- `context_window`
- `num_negative_samples`
  - Typical values: 5–20 for small datasets and 2–5 for "large" datasets[^3]
- `smoothing_exponent`
  - Typical value: 0.75[^3]
  - Actually, the distribution from which the negative samples are drawn is itself a parameter.
  - The unigram distribution with an exponent of 3/4 outperformed other distributions in Word2vec[^3].
  - Our way of actually measuring the distribution and then smoothing it by exponentiating is probably more accurate (but also more expensive); see the sketch after this list.
- `num_epochs`
- `learning_rate`
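A sketch of the "measure the distribution, then smooth it" approach mentioned above: count how often each node appears in the walk corpus, raise the counts to `smoothing_exponent`, and renormalize. With exponent 1.0 this is the plain unigram distribution; 0.75 flattens it in favor of rare nodes.

```python
import numpy as np

def negative_sampling_distribution(walks, num_nodes, smoothing_exponent=0.75):
    """Empirical node frequency over the walk corpus, smoothed by exponentiation.

    With exponent 1.0 this is the plain unigram distribution; 0.75
    flattens it, boosting rare nodes (the value that worked best in
    Word2vec's negative sampling).
    """
    counts = np.zeros(num_nodes)
    for walk in walks:
        for node in walk:
            counts[node] += 1
    smoothed = counts ** smoothing_exponent
    return smoothed / smoothed.sum()

# Drawing num_negative_samples negatives per positive pair:
# rng.choice(num_nodes, size=num_negative_samples, p=dist)
```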
## Imputing features

Parameters:
- `top_similar`
- `similarity_metric`
  - Choices: `"cosine"` or `"dot_product"`
## How to use the embeddings for imputation

- Find the k most similar neighbors and use their features
  - What similarity metric to use for the node embeddings?
    - dot product
    - cosine similarity
    - Euclidean distance
  - How to use the features?
    - average
    - weighted average, e.g.
      $f_i = \frac{\sum_{j \in N(i)} \text{sim}(i,j) \cdot f_j}{\sum_{j \in N(i)} \text{sim}(i,j)}$
      (a sketch follows this list)
- Train a model that predicts features based on node embeddings
  - Perform supervised training using the subset of nodes where the features are observed
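A sketch of the k-most-similar variant with the weighted average from the formula above. The boolean `observed` mask for tracking which nodes have known features is an assumption of this sketch.

```python
import numpy as np

def impute_node(i, embeddings, features, observed, top_similar=10,
                similarity_metric="cosine"):
    """Impute node i's features from its most similar embedded neighbors,
    using the weighted average from the formula above.

    `observed` is a boolean array marking nodes with known features --
    an assumption about how missingness is tracked in this sketch.
    """
    emb = embeddings
    if similarity_metric == "cosine":
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        emb = emb / np.clip(norms, 1e-12, None)
    sims = emb @ emb[i]                     # dot product (= cosine if normalized)
    sims[i] = -np.inf                       # never use the node itself
    sims[~observed] = -np.inf               # only borrow from observed nodes
    top = np.argsort(sims)[-top_similar:]   # k most similar candidates
    w = np.clip(sims[top], 0, None)         # negative similarities get weight 0
    if w.sum() == 0:
        return features[observed].mean(axis=0)  # fallback: global mean
    return (w[:, None] * features[top]).sum(axis=0) / w.sum()
```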
## Somehow increase the features' influence on the embeddings

- Within SkipGram
  - Before starting the training, initialize the embedding vectors with the feature vectors (scaled to the correct dimension, of course); sketched after this list
  - Add a term to the cost function that penalizes differences in embedding vectors between nodes with similar features
    - This could use the precomputed feature similarity values (from EdgeWeightCalculator) for determining the cost
- Within Random Walks
  - Make it over-proportionally more likely to visit a node if the edge weight is high (not just linearly)
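A sketch of the initialization idea: project the feature vectors down (or up) to `embedding_size` with a random Gaussian matrix. The projection choice and the small symmetry-breaking noise are assumptions, not a prescribed method.

```python
import numpy as np

def feature_initialized_embeddings(features, embedding_size, rng=None):
    """Initialize Skip-gram embeddings from the feature vectors.

    A random Gaussian projection handles the dimension mismatch; the
    choice of projection (and the small noise that breaks exact ties
    between identical feature vectors) are assumptions of this sketch.
    """
    rng = rng or np.random.default_rng(0)
    num_nodes, feat_dim = features.shape
    projection = rng.normal(scale=1.0 / np.sqrt(feat_dim),
                            size=(feat_dim, embedding_size))
    init = features @ projection
    init += rng.normal(scale=0.01, size=init.shape)  # break exact ties
    return init
```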
## Tune Random Walks

- Dynamically adjust transition probabilities
  - E.g. make it less likely to revisit the node you came from (like in node2vec); sketched below
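A sketch of the node2vec-style idea, using only node2vec's return parameter p: the weight of the edge back to the previous node is divided by p before normalizing, so p > 1 makes immediate backtracking less likely (the in-out parameter q is omitted for brevity).

```python
import numpy as np

def biased_step(weights, current, previous, p=2.0, rng=None):
    """Sample the next node, down-weighting a return to `previous`.

    Only node2vec's return parameter p is sketched here; `current` is
    assumed to have at least one positively weighted neighbor.
    """
    rng = rng or np.random.default_rng()
    w = weights[current].astype(float)
    if previous is not None:
        w[previous] /= p                    # p > 1: revisiting is less likely
    return rng.choice(len(w), p=w / w.sum())
```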
## Add an additional smoothing step after the imputation

- E.g. each (missing) feature is iteratively smoothed like this (sketched below):
  $f_i^{(t+1)} = \alpha \, f_i^{(t)} + (1-\alpha) \frac{\sum_{j \in N(i)} \text{sim}(i,j) \, f_j^{(t)}}{\sum_{j \in N(i)} \text{sim}(i,j)}$
- Non-missing features may or may not be fixed here
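The smoothing update above as a sketch; since observed features "may or may not be fixed", clamping them back each iteration is left as a flag. The `(n, n)` similarity matrix `sim` (zeros for non-neighbors) is assumed as input, whether it comes from the edge weights or from embedding similarity.

```python
import numpy as np

def smooth_features(features, sim, observed, alpha=0.5, num_iters=10,
                    fix_observed=True):
    """Iteratively apply f_i <- alpha*f_i + (1-alpha)*weighted neighbor mean.

    `sim` is an (n, n) matrix of similarities with zeros for
    non-neighbors; its exact source is left open in this sketch.
    """
    f = features.copy()
    row_sums = np.clip(sim.sum(axis=1, keepdims=True), 1e-12, None)
    for _ in range(num_iters):
        neighbor_mean = (sim @ f) / row_sums
        f = alpha * f + (1 - alpha) * neighbor_mean
        if fix_observed:                    # optionally clamp known features
            f[observed] = features[observed]
    return f
```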
[^1]: Attributed DeepWalk paper
[^2]: DeepWalk paper
[^3]: Negative Sampling paper
[^4]: Word2vec paper