kNN - axkoro/graph-impute GitHub Wiki
- Neighborhood: k-hop or k-nearest
- Size of the neighborhood $k$
- How to use the observed features in the neighborhood for imputation
- Mean
- Median (only really useful for non-continuous features)
- Weighted mean (e.g. weighted by the distance to the source node)
- Whether to use imputed features for future imputations
- If not: flag imputed features so that later imputations can ignore them
- What to do if the feature is not present in the neighborhood
- Use global average
- Keep expanding the search until the feature is found (or until it has been found a certain number of times)
- Use constant value
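
The aggregation options and the fallback listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the names `Aggregation`, `Observation` and `impute` are hypothetical, the weighted mean assumes inverse-distance weights with strictly positive distances, and the fallback value (global average or constant) is assumed to be supplied by the caller.

```rust
/// How observed neighbour values are combined (hypothetical names).
#[derive(Clone, Copy)]
pub enum Aggregation {
    Mean,
    Median,
    /// Assumed weighting: 1 / distance, so closer neighbours count more.
    WeightedMean,
}

/// One observed value of the feature in the neighbourhood, together with
/// the distance (e.g. hop count) of that neighbour from the source node.
pub struct Observation {
    pub value: f64,
    pub distance: f64,
}

/// Impute a single feature from the neighbourhood observations.
/// Returns `fallback` if the feature was not observed in the neighbourhood.
pub fn impute(obs: &[Observation], agg: Aggregation, fallback: f64) -> f64 {
    if obs.is_empty() {
        return fallback;
    }
    match agg {
        Aggregation::Mean => {
            obs.iter().map(|o| o.value).sum::<f64>() / obs.len() as f64
        }
        Aggregation::Median => {
            let mut values: Vec<f64> = obs.iter().map(|o| o.value).collect();
            values.sort_by(|a, b| a.total_cmp(b));
            let mid = values.len() / 2;
            if values.len() % 2 == 0 {
                // Even count: average the two middle values.
                (values[mid - 1] + values[mid]) / 2.0
            } else {
                values[mid]
            }
        }
        Aggregation::WeightedMean => {
            // Accumulate the weighted sum and the sum of weights in one pass.
            let (num, den) = obs.iter().fold((0.0, 0.0), |(n, d), o| {
                let w = 1.0 / o.distance; // assumes distance > 0
                (n + w * o.value, d + w)
            });
            num / den
        }
    }
}
```

Passing the fallback in as a plain value keeps the choice between "global average" and "constant value" outside the aggregation code; the "continue searching" option would instead enlarge the slice of observations before calling `impute`.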
- Strategy for filling features that couldn't be imputed from the neighbourhood
- Before: recalculate the global average of the feature on demand, every time this case occurs
- Current: calculate the global average of the feature on demand once and store the result (e.g. in a HashMap) instead of recalculating it
- Might lead to worse quality, because later changes in the actual global average are not reflected in the stored value
- Should reduce solving times, especially in graphs with a lot of isolated nodes
- Parallelization
- Use iterators instead of copying vectors (e.g. in `get_neighbours`)
- In the second for-loop: iterate over the neighbourhood instead of iterating over the features (see draft)
- Might improve cache efficiency because all operations on the feature vector of a neighbour are performed in a batch (see principle of locality).
- On the other hand, this will need three additional arrays.
- → Benchmark this
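
As a rough illustration of two of the ideas above (storing the global average in a `HashMap`, and iterating over the neighbourhood rather than over the features), the following sketch may help when setting up the benchmark. It is not the project's draft: `Features`, `GlobalMeanCache` and `impute_node` are hypothetical names, missing values are assumed to be represented as `None`, and only a plain mean is computed, so two scratch arrays (sums and counts) are enough here.

```rust
use std::collections::HashMap;

/// Feature vector of one node; `None` marks a missing value (assumed layout).
pub type Features = Vec<Option<f64>>;

/// Computes the global mean of a feature on demand and stores the result,
/// so it is calculated at most once per feature.
pub struct GlobalMeanCache {
    cache: HashMap<usize, Option<f64>>,
}

impl GlobalMeanCache {
    pub fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Mean of feature `f` over all nodes that observe it, or `None`
    /// if no node observes it. Later changes to `nodes` are not reflected.
    pub fn get(&mut self, nodes: &[Features], f: usize) -> Option<f64> {
        *self.cache.entry(f).or_insert_with(|| {
            let (sum, n) = nodes
                .iter()
                .filter_map(|x| x[f])
                .fold((0.0, 0usize), |(s, n), v| (s + v, n + 1));
            if n > 0 { Some(sum / n as f64) } else { None }
        })
    }
}

/// Neighbour-major imputation for one node: each neighbour's feature vector
/// is read once, front to back, while per-feature sums and counts accumulate.
/// `neighbours` is an iterator, so no neighbour list has to be copied.
pub fn impute_node(
    nodes: &[Features],
    node: usize,
    neighbours: impl Iterator<Item = usize>,
    cache: &mut GlobalMeanCache,
) -> Features {
    let dim = nodes[node].len();
    let mut sums = vec![0.0; dim];
    let mut counts = vec![0usize; dim];
    for nb in neighbours {
        for (f, value) in nodes[nb].iter().enumerate() {
            if let Some(v) = value {
                sums[f] += *v;
                counts[f] += 1;
            }
        }
    }
    (0..dim)
        .map(|f| match (nodes[node][f], counts[f]) {
            (Some(v), _) => Some(v),              // already observed: keep it
            (None, 0) => cache.get(nodes, f),     // not in neighbourhood: cached global mean
            (None, n) => Some(sums[f] / n as f64), // neighbourhood mean
        })
        .collect()
}
```

The cache makes the trade-off noted above explicit: the first lookup scans all nodes, every later lookup for the same feature is a map access, and the stored value goes stale if the underlying data changes afterwards.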
Initial draft with explanation for design decisions