Potential Parameters

Neighborhood: k-hop or k-nearest
Size of the neighborhood $k$
How to use the observed features in the neighborhood for imputation
- Mean
- Median (only really useful for non-continuous features)
- Weighted mean (e.g. weighted by the distance to the source node)
Whether to use imputed features for future imputations
- If not: flag imputed features, so that they may be ignored for future imputations
What to do if feature is not present in neighborhood
- Use global average
- Continue searching until the feature is found (or until it was found a certain number of times)
- Use constant value

Potential Optimizations

Strategy for filling features that couldn't be imputed from the neighbourhood
- Before: calculate global average of the feature on-demand and every time this case occurs
- Current: calculate global average of the feature on-demand and store the result (e.g. in a HashMap), instead of calculating it repeatedly
  - might lead to worse quality, because changes in the actual global average will not be represented
  - will increase solving times, especially in graphs with a lot of isolated nodes
Parallelization
Use iterators instead of copying vectors (e.g. in get_neighbours)
In the second for-loop: Iterate over neighbourhood instead of iteration over the features (see draft)
- Might improve cache efficiency because all operations on the feature vector of a neighbour are performed in a batch (see principle of locality).
- Will need three additional arrays, on the other hand.
- → Benchmark this

Initial draft with explanation for design decisions