RandomWalkGenerator - axkoro/graph-impute GitHub Wiki

Design Decisions

Sampling Random Neighbours

  • For sampling the random next neighbour in our walks, we're using Inverse Transform Sampling.
    1. Build an array of the cumulative sums of all probabilities / weights.
    2. Generate a random number from a uniform distribution between 0 and 1 (or the sum of all weights, if using weights instead of probabilites).
    3. Search through the array to find the first index where the entry is larger than the generated number.
      • The neighbour corresponding to this index is the result of the sample.
  • Even though the asymptotic runtime of this method is worse than that of the alias method (see NegativeSampler), we decided on using this method.
    • Reason: The constant factors of building the Alias method are larger („$O(n) \neq O(n)$“) and don't amortize themselves if we only use the distribution for one lookup.
    • Asymptotic runtimes
      • Alias Method: O(n) setup, O(1) lookup.
      • ITS: O(n) setup, O(log n) lookup.
    • Cost Example:\
      • For a max degree of 9458 (like on the GitHub graph), binary search takes roughly log₂(9458) ≈ 14 comparisons per lookup. Since only one lookup is performed per distribution build, the initialization cost dominates and this cost is much smaller for our method than the alias method.
⚠️ **GitHub.com Fallback** ⚠️