Fine Tuning The Adam Optimizer
Neural-Style offers two optimizers: L-BFGS and Adam. Of the two, Adam is the more efficient, particularly in memory usage, but it tends to suffer in output quality. However, it appears that some of these issues with Adam can be corrected by adjusting the parameters the optimizer uses.
In the optim library that Neural-Style uses, adam.lua contains all of the usable parameters for the Adam optimizer:
```
ARGS:
- 'opfunc' : a function that takes a single input (X), the point
             of an evaluation, and returns f(X) and df/dX
- 'x'      : the initial point
- 'config' : a table with configuration parameters for the optimizer
- 'config.learningRate'      : learning rate
- 'config.learningRateDecay' : learning rate decay
- 'config.beta1'             : first moment coefficient
- 'config.beta2'             : second moment coefficient
- 'config.epsilon'           : for numerical stability
- 'config.weightDecay'       : weight decay
- 'state'  : a table describing the state of the optimizer; after each
             call the state is modified
```
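To see how these options fit together outside of Neural-Style, here is a minimal, self-contained sketch that uses optim.adam to minimize a toy quadratic. The objective and the values chosen here are purely illustrative and are not part of Neural-Style.

```lua
require 'torch'
require 'optim'

-- Toy objective: f(x) = 0.5 * ||x - target||^2, with gradient (x - target).
local target = torch.Tensor{3, 3, 3}
local x = torch.zeros(3)

-- opfunc must return f(X) and df/dX for the point being evaluated.
local function opfunc(point)
  local diff = point - target
  return 0.5 * diff:dot(diff), diff
end

-- Same keys as documented in adam.lua; any key left unset falls back
-- to the defaults hard-coded in that file.
local config = {
  learningRate = 0.1,
  beta1 = 0.9,     -- first moment coefficient
  beta2 = 0.999,   -- second moment coefficient
  epsilon = 1e-8,  -- numerical stability term
}
local state = {}   -- modified in place on every call

for i = 1, 500 do
  optim.adam(opfunc, x, config, state)
end
print(x)  -- close to the target after enough iterations
```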
Of these parameters, beta1 and epsilon appear to be the most useful for correcting some of Adam's issues, such as gray/grey spots in the output.
See the comments starting from here or the post here for some experiments involving Adam's parameters and their effects on style transfer outputs.
This modified version of neural_style.lua was created to simplify experimentation with Adam's parameters, but experimentation can also be done by manually modifying lines 233-236 of neural_style.lua.
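One way a modified script can expose these values without repeated hand edits is through extra command-line options. The sketch below only shows the idea: the -beta1 and -epsilon flag names and their defaults are illustrative assumptions, not options in the stock script, and they may not match the modified version linked above.

```lua
-- Hypothetical command-line options for experimentation (not flags in the
-- stock neural_style.lua); the defaults mirror adam.lua's built-in values.
cmd:option('-beta1', 0.9, 'Adam first moment coefficient')
cmd:option('-epsilon', 1e-8, 'Adam numerical stability term')

-- Later, where the Adam branch builds its optimizer state:
optim_state = {
  learningRate = params.learning_rate,
  beta1 = params.beta1,
  epsilon = params.epsilon,
}
```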
The optimal parameters for Adam appear to be:
```lua
optim_state = {
  learningRate = params.learning_rate,
  beta1 = 0.99,
  epsilon = 1e-1,
}
```
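Any key left out of that table falls back to the defaults hard-coded in adam.lua (learningRate = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8). For context, here is a rough sketch of how Neural-Style's Adam path consumes the table on each iteration; feval and img stand for the loss closure and the image tensor from neural_style.lua, and the exact loop may differ slightly between versions.

```lua
-- Rough sketch of the Adam loop: feval returns the total loss and the
-- gradient with respect to the image, and the same optim_state table is
-- reused so Adam's running moment estimates persist across iterations.
for t = 1, params.num_iterations do
  local x, losses = optim.adam(feval, img, optim_state)
end
```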