Implementation Details - TeamCohen/ProPPR GitHub Wiki
Parameter Vector
Parallelization
ProPPR's implementation of the parameter vector supports multithreaded access. Blocking is reduced using Java's ConcurrentHashMap
, which permits simultaneous access to disjoint portions of the map by partitioning it and giving each partition its own lock. To ensure that increment operations don't run into each other, we use the following loop:
ConcurrentHashMap paramVector;
...
double oldvalue = paramVector.get(key);
while( !paramVector.replace(key, oldvalue, oldvalue+increment))) {
oldvalue = paramVector.get(key);
try {
Thread.sleep(1);
} catch (InterruptedException e) {}
}
where ConcurrentHashMap.replace
returns true and updates the key if the current value matches oldvalue, and returns false (and doesn't update) otherwise. This means that if a key was updated by another thread after our get()
but before we could put()
, the parameter vector remains unpolluted and retains all updates.
Lazy Regularization
In normal regularization, all parameters are regularized at every example, even parameters the example didn't need. We can dramatically reduce the complexity of updates by only performing regularizaion on parameters active in the current example. To do this, we add a clock field for each parameter listing the last time it was updated. At the next update time, we apply n regularization cycles to catch the parameter up to the current clock.
This is implemented by wrapping the parameter vector in an abstract class which outwardly behaves as a Java Map
but inwardly permits subclasses to maintain an inner backing store. The normal-regularization subclass uses an inner map with parameters as keys and double
weights as values. For lazy regularization, we define a class TimesampedWeight
which wraps a double
weight with an int
clocktick, and use this as the value type. The lazy regularization subclass intercedes as appropriate to set, retrieve, and update the clocktick, and maintains the master clock. SRW classes which use lazy regularization specify a lazy parameter vector int their setup routine, and notify the lazy parameter vector when a new example is received so it can advance the clock.