Tuning
Tuning the grounding parameters
A minimal grounding command looks something like this:
$ java -cp .:${CLASSPATH} edu.cmu.ml.proppr.Grounder --programFiles \
textcat.wam:toylabels.cfacts:toywords.graph --queries train.examples \
--grounded train.examples.grounded
The same command with all tunable parameters specified looks something like this:
$ java -cp .:${CLASSPATH} edu.cmu.ml.proppr.Grounder --programFiles \
textcat.wam:toylabels.cfacts:toywords.graph --queries train.examples \
--grounded train.examples.grounded --apr eps=1e-5:alph=0.01
When grounding, all you can really change is how far into the graph the prover will walk before it gives up. This is determined by epsilon and alpha, which are set through the --apr (approximate pagerank) command line option.
- epsilon - If the weight on the current node divided by its out degree falls below this value, the prover terminates the current path. This places an upper bound on the error of the weight of a node.
- alpha - At each step, alpha of the weight on the current node is sent to the restart (query) node, and the rest goes to the node's descendants. This means ProPPR prefers short proofs, since longer paths have more of their weight siphoned off to the query node. Additionally, alpha and epsilon together place a bound on the size of the graph at 1 / (alpha * epsilon), as in the worked example below.
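For example, with the values from the full grounding command above (eps=1e-5, alph=0.01), the bound works out to 1 / (0.01 * 1e-5) = 10,000,000, so the size of each grounded graph is capped at roughly ten million.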
For practical purposes, this means:
- If you know the query and solution nodes are connected, but ProPPR is not producing any solutions, decrease epsilon to get the prover to walk deeper into the graph before giving up. (You can also achieve this by decreasing alpha, which prevents weight from hemorrhaging off a path before it can reach a solution, but this is less stable and might take longer to converge.) See the example command after this list.
- If your graph has a lot of fanout (nodes with a high out degree) and you want to keep the descendants of high-fanout nodes, decrease alpha to something on the order of 1/(expected out degree).
- If your results are not repeatable between runs, check the unnormalized weights of solutions (run QueryAnswerer with --unnormalized) and decrease epsilon so that the error on the unnormalized weights is acceptable.
- If your graphs are too large, increase epsilon so that the prover doesn't walk as far.
- If grounding takes too long, increase alpha to cycle weight through the graph faster and increase the speed of convergence.
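As an illustrative example (the file names match the commands above; the eps and alph values are placeholders, not recommendations), a deeper-walking grounding run might look like this:
$ java -cp .:${CLASSPATH} edu.cmu.ml.proppr.Grounder --programFiles \
textcat.wam:toylabels.cfacts:toywords.graph --queries train.examples \
--grounded train.examples.grounded --apr eps=1e-7:alph=0.001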
You want epsilon and alpha to be as large as possible without compromising either recall (the ability of ProPPR to reach solutions) or the accuracy of the pagerank approximation.
Tuning the learning parameters
A minimal training command looks something like this:
java -cp .:${CLASSPATH} edu.cmu.ml.proppr.Trainer --train train.examples.grounded --params params.wts --epochs 20
The same command with most tunable parameters specified looks something like this:
java -cp .:${CLASSPATH} edu.cmu.ml.proppr.Trainer --train train.examples.grounded --params params.wts --epochs 20 --srw l2plocal:mu=0.001:eta=1.0:delta=0.4 --apr alph=0.01:depth=100
NOTE: Most of the SRW implementations use the power-iteration method to compute pagerank, which is most similar to PprProver. DprSRW uses the same method to compute pagerank as DprProver, but it is much less well-tested.
SRW parameters:
- mu - determines how much of the regularization loss (L1 or L2, depending on the SRW implementation) is taken off a parameter at each example. More mu means more regularization, less mu means less regularization. Set it to 0 to turn regularization off.
- eta - determines the base learning rate. More eta means faster learning/larger steps, less eta means slower learning/smaller steps.
- delta - determines how much negative labels are boosted. Normally the scores for solutions are close to zero, which means that for positive labels (which want the score to be 1.0) the loss is large, so the gradient is large, while for negative labels (which want the score to be 0.0) the loss is small, so the gradient is small. As a result, learning is disproportionately affected by the positive labels. In negative instance boosting, we take the maximum score over the positive labels, increase it by delta, then (with some additional smarts) use this as a scaling factor for the negative label scores. Negative instance boosting is disabled for delta > 0.5, which is also the default. See the example setting after this list.
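As an illustrative example of the --srw syntax (the values are placeholders, not recommendations), the setting below weakens regularization, raises the learning rate, and enables negative instance boosting by keeping delta below 0.5:
--srw l2plocal:mu=0.0001:eta=2.0:delta=0.4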
For practical purposes, this means:
- If learning doesn't seem to be working, turn off regularization by setting mu=0. If regularization is too high relative to the gradient, your parameter vector can't make any progress. This often happens for larger graphs or larger numbers of training examples.
  - If learning still isn't working, run with --traceLosses to see if you're getting convergence in the parameter vector. If the changes made at each epoch are still large by the final iteration, increase --epochs. If this doesn't work (or takes way too long to run), increase the learning rate by setting eta > 1 (see the example command after this list).
- If negative examples are being ranked too highly, turn on negative instance boosting by setting delta < 0.5.
- more later...
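Putting the advice above together, an illustrative troubleshooting run (the values are placeholders, not recommendations) disables regularization, raises the learning rate, traces the losses, and trains for more epochs:
java -cp .:${CLASSPATH} edu.cmu.ml.proppr.Trainer --train train.examples.grounded --params params.wts --epochs 50 --traceLosses --srw l2plocal:mu=0:eta=2.0:delta=0.4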
Pagerank parameters:
- more later...