Design Decisions

For sampling the random next neighbour in our walks, we're using Inverse Transform Sampling.
1. Build an array of the cumulative sums of all probabilities / weights.
2. Generate a random number from a uniform distribution between 0 and 1 (or the sum of all weights, if using weights instead of probabilites).
3. Search through the array to find the first index where the entry is larger than the generated number.
  - The neighbour corresponding to this index is the result of the sample.
Even though the asymptotic runtime of this method is worse than that of the alias method (see NegativeSampler), we decided on using this method.
- Reason: The constant factors of building the Alias method are larger („$O(n) \neq O(n)$“) and don't amortize themselves if we only use the distribution for one lookup.
- Asymptotic runtimes
  - Alias Method: O(n) setup, O(1) lookup.
  - ITS: O(n) setup, O(log n) lookup.
- Cost Example:\
  - For a max degree of 9458 (like on the GitHub graph), binary search takes roughly log₂(9458) ≈ 14 comparisons per lookup. Since only one lookup is performed per distribution build, the initialization cost dominates and this cost is much smaller for our method than the alias method.

RandomWalkGenerator - axkoro/graph-impute GitHub Wiki