kNN - axkoro/graph-impute GitHub Wiki

Potential Parameters

  • Neighborhood: k-hop or k-nearest
  • Size of the neighborhood $k$
  • How to use the observed features in the neighborhood for imputation
    • Mean
    • Median (only really useful for non-continuous features)
    • Weighted mean (e.g. weighted by the distance to the source node)
  • Whether to use imputed features for future imputations
    • If not: flag imputed features, so that they may be ignored for future imputations
  • What to do if feature is not present in neighborhood
    • Use global average
    • Continue searching until the feature is found (or until it was found a certain number of times)
    • Use constant value

Potential Optimizations

  • Strategy for filling features that couldn't be imputed from the neighbourhood
    • Before: calculate global average of the feature on-demand and every time this case occurs
    • Current: calculate global average of the feature on-demand and store the result (e.g. in a HashMap), instead of calculating it repeatedly
      • might lead to worse quality, because changes in the actual global average will not be represented
      • will increase solving times, especially in graphs with a lot of isolated nodes
  • Parallelization
  • Use iterators instead of copying vectors (e.g. in get_neighbours)
  • In the second for-loop: Iterate over neighbourhood instead of iteration over the features (see draft)
    • Might improve cache efficiency because all operations on the feature vector of a neighbour are performed in a batch (see principle of locality).
    • Will need three additional arrays, on the other hand.
    • → Benchmark this

Design Decisions

Initial draft with explanation for design decisions

⚠️ **GitHub.com Fallback** ⚠️