Fine Tuning The Adam Optimizer

Neural-Style offers two optimizers: L-BFGS and Adam. Of the two, Adam is more memory-efficient, but its results tend to suffer in quality. However, many of these issues with Adam can be corrected by adjusting the parameters the optimizer uses.

In the optim library that Neural-Style uses, adam.lua documents all of the parameters that the Adam optimizer accepts:

    ARGS:
    - 'opfunc' : a function that takes a single input (X), the point
                 of evaluation, and returns f(X) and df/dX
    - 'x'      : the initial point
    - 'config' : a table with configuration parameters for the optimizer
    - 'config.learningRate'      : learning rate
    - 'config.learningRateDecay' : learning rate decay
    - 'config.beta1'             : first moment coefficient
    - 'config.beta2'             : second moment coefficient
    - 'config.epsilon'           : for numerical stability
    - 'config.weightDecay'       : weight decay
    - 'state'                    : a table describing the state of the optimizer;
                                   after each call the state is modified
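
For a concrete sense of how these arguments fit together, here is a minimal sketch of a call to optim.adam on a toy quadratic problem (not part of neural-style itself); the beta1, beta2, and epsilon values shown are optim's defaults:

    require 'torch'
    require 'optim'

    -- Toy problem: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    local x = torch.randn(10)
    local function feval(x)
      return 0.5 * x:dot(x), x:clone()
    end

    local config = {
      learningRate = 0.1,
      beta1 = 0.9,      -- optim's default first-moment coefficient
      beta2 = 0.999,    -- optim's default second-moment coefficient
      epsilon = 1e-8,   -- optim's default numerical-stability term
    }
    local state = {}    -- optim.adam updates this table in place on every call

    for i = 1, 100 do
      x = optim.adam(feval, x, config, state)
    end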

Of these parameters, beta1 and epsilon appear to be the most effective at correcting some of Adam's issues, such as the gray/grey spots that can appear in its output.

See the comments starting from here, or the post here, for some experiments involving Adam's parameters and their effects on style transfer outputs.

This modified version of neural_style.lua was created to simplify experimentation with Adam's parameters, but you can also experiment by manually modifying lines 233-236 of neural_style.lua, as sketched below.
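
As a rough sketch of what such a change looks like (the -adam_beta1 and -adam_epsilon flag names below are made up for this example and are not flags in the stock script), the Adam branch of the optim_state setup can be extended like this:

    -- Hypothetical command-line flags for experimentation; the stock script
    -- does not define these, so choose whatever names you prefer.
    cmd:option('-adam_beta1', 0.9, 'First-moment coefficient for Adam')
    cmd:option('-adam_epsilon', 1e-8, 'Numerical-stability term for Adam')

    -- Then, where neural_style.lua builds optim_state for the Adam optimizer:
    optim_state = {
      learningRate = params.learning_rate,
      beta1 = params.adam_beta1,
      epsilon = params.adam_epsilon,
    }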

The optimal parameters for Adam appear to be:

    optim_state = {
      learningRate = params.learning_rate,
      beta1 = 0.99,     -- up from optim's default of 0.9
      epsilon = 1e-1,   -- up from optim's default of 1e-8
    }
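
These values only change the config table; the per-iteration call that consumes it stays the same. Roughly, neural_style.lua's Adam branch drives the optimization like this (a sketch, not the verbatim script):

    -- feval returns the total style/content loss and its gradient with respect
    -- to img. optim_state also carries Adam's moment estimates between
    -- iterations, so the same table must be passed on every call.
    for t = 1, params.num_iterations do
      local x, losses = optim.adam(feval, img, optim_state)
    end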