vw hyperopt plans - martinpopel/vowpal_wabbit GitHub Wiki
Plans for a new vw-hyperopt script
We already have vw-hypersearch, but it can handle only one hyperparameter, and its golden-section search works only for unimodal (e.g. convex) functions.
vw-experiment
vw-experiment is a simple script that computes train and test loss. It will be used by vw-hyperopt, but it is also useful on its own.
```
vw-experiment \
  --train=train.dat \
  --test=test.dat \
  --vw=../vw \
  --train_loss_examples=1e5
```
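As a rough sketch, vw-experiment could build two vw invocations: one training run and one test-only run on the resulting model. The vw flags used (`-d`, `-c`, `-f`, `-t`, `-i`) are real vw options, but the wrapper function below (its name, the `model.vw` filename, and the overall logic) is my own assumption based on this plan, not the actual script:

```python
# Hypothetical sketch of the vw command lines vw-experiment might run.
# The vw flags are real; the wrapper logic is an assumption from this plan.

def experiment_commands(vw="../vw", train="train.dat", test="test.dat"):
    """Return the (train, test) vw invocations as argument lists."""
    model = "model.vw"
    train_cmd = [vw, "-d", train, "-c", "-f", model]  # -c: cache for speedup
    test_cmd = [vw, "-d", test, "-t", "-i", model]    # -t: test only, no learning
    return train_cmd, test_cmd

train_cmd, test_cmd = experiment_commands()
print(" ".join(train_cmd))
print(" ".join(test_cmd))
```

The commands are returned as argument lists (rather than shell strings) so they could be passed to a process spawner without quoting issues.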
vw-hyperopt
Example usage:
```
vw-hyperopt --train=train.dat --test=test.dat \
  vw --loss_function=[hinge,logistic,squared] \
  --l1=[1e-10..0.005]L -q=[ff]O -b=[18..23]IO --passes=[2,4,8]O
```
Semantics:
- `[a,b,c]` ... try the listed values (numbers or strings) for a given parameter
- `[a,b,c]O` ... try also omitting the parameter
- `[min..max]` ... range of real values
- `[min..max]I` ... range of integer values
- `[min..max]L` ... range of real values with logarithmic scale
- `[min..max]O` ... try also omitting the parameter
- modifiers I, L and O can be combined
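A minimal sketch of how this spec syntax could be parsed. The grammar follows the list above; the function name `parse_spec` and the returned dict layout are my own assumptions:

```python
# Parse the proposed [a,b,c] / [min..max] search-spec syntax with I/L/O
# modifiers.  Grammar from the semantics list above; API shape is assumed.
import re

def parse_spec(spec):
    """Parse e.g. '[1e-10..0.005]L' or '[hinge,logistic]O' into a dict."""
    m = re.fullmatch(r"\[(.+)\]([ILO]*)", spec)
    if not m:
        raise ValueError("not a search spec: " + spec)
    body, mods = m.groups()
    out = {"omit_allowed": "O" in mods}   # O: also try omitting the parameter
    if ".." in body:                      # range form
        lo, hi = body.split("..")
        out.update(kind="range",
                   min=float(lo), max=float(hi),
                   integer="I" in mods,   # I: integer range
                   log="L" in mods)       # L: logarithmic scale
    else:                                 # explicit list of values
        out.update(kind="list", values=body.split(","))
    return out
```

For example, `parse_spec("[18..23]IO")` yields an integer range that may also be omitted, matching the `-b=[18..23]IO` usage above.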
VW parameters with special handling:
ALWAYS:
- `-c --cache` is always added for speedup

FORBIDDEN:
- `-k --kill_cache` is not forwarded to vw (but the cache file is deleted)
- `-d --data` is overridden by `--train` and `--test`
- `-t --testonly`
- `-f --final_regressor`
- `-a --audit`
- `--readable_model arg`
- `--invert_hash arg`

QUESTIONABLE:
- `-i --initial_regressor`
- `--holdout_off`
- `--save_resume`
- `--cache_file`
vw-hyperopt parameters:
- `--train` ... training data [required]
- `--test` ... development test data [recommended]
- `--train_loss_examples=N` ... number of examples for computing train loss (via `vw --examples -t -d train.dat`). 0 means do not compute train loss. "all" means use the whole train.dat. Default is 100,000.
- `--save_models` ... all / only the best
- `--save_logs`
- `--jobs=N` ... N parallel jobs, default = autodetect based on number of cores
- `--noise` ... compute also the irreducible error (loss) via vw-overfit
- `--plot` ... tikz, png
- `--search` ... exhaustive, random, ...
- We could have also `--randseed`, `--timeout`, `--rounds` (of hill-climbing)
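The option list above could be declared with a standard argument parser. The option names and defaults come from the plan; everything else here (the `build_parser` helper, the string-valued `--train_loss_examples` to allow "all") is an assumption:

```python
# Sketch of vw-hyperopt's own option declarations, based on the list above.
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="vw-hyperopt")
    p.add_argument("--train", required=True, help="training data [required]")
    p.add_argument("--test", help="development test data [recommended]")
    # kept as a string so that "all" and "0" are both legal values
    p.add_argument("--train_loss_examples", default="100000",
                   help="examples for train loss; 0 = skip, 'all' = whole file")
    p.add_argument("--jobs", type=int, default=0,
                   help="parallel jobs; 0 = autodetect from CPU count")
    p.add_argument("--search", default="exhaustive",
                   choices=["exhaustive", "random"])
    return p

args = build_parser().parse_args(["--train", "train.dat", "--test", "test.dat"])
```

Making `--train` required and `--test` optional mirrors the [required]/[recommended] annotations above.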
Related links:
- http://fastml.com/optimizing-hyperparams-with-hyperopt/
- http://www.eng.uwaterloo.ca/~jbergstr/research.html#modelsearch
- http://nlpers.blogspot.cz/2014/10/hyperparameter-search-bayesian.html
- https://github.com/HIPS/Spearmint
- http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
Drawing plots
It would be nice if vw-hyperopt could produce (e.g. png) plots with:
- test loss if `--test`
- train loss if `--train_loss_examples`
- irreducible error (loss) if `--noise`
- progressive validation error if `--pve`
- time_train if `--time_train`
- time_test if `--time_test`
My understanding of Variance-Bias Tradeoff
(Note that "corresponds to" here means "is an estimate of".) Train loss corresponds to Bias^2 + noise. Test loss corresponds to Bias^2 + noise + Variance. The difference between train loss and test loss corresponds to the Variance. The amount of Variance corresponds to the amount of over-training. (http://scott.fortmann-roe.com/docs/BiasVariance.html)
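The decomposition above can be sketched numerically. Assuming the three losses have already been measured, the Variance (over-training) estimate is just the gap between test and train loss; the function name and the example numbers below are illustrative, not from vw:

```python
# Rough numeric sketch of the decomposition above: given measured losses,
# estimate Variance = test - train and Bias^2 = train - noise.

def decompose(train_loss, test_loss, noise=0.0):
    """Estimate Variance and Bias^2 from train/test loss (see text above)."""
    variance = test_loss - train_loss  # gap between test and train loss
    bias_sq = train_loss - noise       # train loss minus irreducible error
    return {"variance": variance, "bias^2": bias_sq, "noise": noise}

# e.g. train loss 0.30, test loss 0.42, irreducible error 0.25
est = decompose(0.30, 0.42, noise=0.25)
```

In this illustrative example the Variance estimate (0.12) dominates the Bias^2 estimate (0.05), which would point toward the over-training remedies listed below.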
Rationale:
If the test loss curve is close to the noise curve, no more hyperparameter tuning can help; you must add new features to the training data.
If over-training is the problem, there are several things you can do about it:
- get more training data
- apply (higher) regularization (`--l1` or `--l2`)
- try bagging with `-B`
- hold back on the options listed below for fighting high Bias (except the first one or two)
If high Bias is the problem (i.e. underfitting):
- make sure the training data is shuffled
- higher `-b` (`--bit_precision`)
- lower/no regularization
- more `--passes` or higher `--learning_rate`
- get more features, either truly new features or nonlinear combinations via `--quadratic`, `--cubic`, `--stage_poly`, `--lrq`, `--ngram`, `--nn` etc.