Optimizer Options

This page describes all optimizers, schedulers, and their options currently implemented in neosr.

Optimizers


Adam, adam

See the PyTorch documentation for all options.

[train.optim_g]
type = "adam"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0

AdamW, adamw

See the PyTorch documentation for all options.

[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0

NAdam, nadam

See the PyTorch documentation for all options.

[train.optim_g]
type = "nadam"
lr = 5e-4
betas = [ 0.98, 0.99 ]
weight_decay = 0.01
decoupled_weight_decay = true

Adan, adan

[train.optim_g]
type = "adan"
lr = 5e-4
betas = [ 0.98, 0.92, 0.99 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0.0
no_prox = true
foreach = true

AdamW_Win, adamw_win

[train.optim_g]
type = "adamw_win"
lr = 5e-4
betas = [ 0.98, 0.999 ]
reckless_steps = [ 2.0, 8.0 ]
eps = 1e-8
weight_decay = 0.02
amsgrad = false
max_grad_norm = 0.0
acceleration_mode = "win2" # alternative: "win"

AdamW_SF, adamw_sf

[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
eps = 1e-8
weight_decay = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true

[!NOTE] The parameter schedule_free MUST be present in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, do so through the warmup_steps option (in iterations) instead of warmup_iter, since the latter is implemented outside the optimizer.
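
For example, a minimal sketch of enabling optimizer-side warmup (the value 1000 is illustrative, not a recommendation):

[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
eps = 1e-8
weight_decay = 0
warmup_steps = 1000 # warmup handled inside the schedule-free optimizer
schedule_free = true

warmup_iter is left unset here, since the note above recommends using warmup_steps instead.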


Adan_SF, adan_sf

[train.optim_g]
type = "adan_sf"
lr = 8e-4
betas = [ 0.98, 0.92, 0.987 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true

[!NOTE] The parameter schedule_free MUST be present in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, do so through the warmup_steps option (in iterations) instead of warmup_iter, since the latter is implemented outside the optimizer.


SOAP_SF, soap_sf

[train.optim_g]
type = "soap_sf"
lr = 1e-3
schedule_free = true
warmup_steps = 1600
beta = 0.9
beta2_scale = 0.8
eps = 1e-8
weight_decay = 0.01
precondition_frequency = 10
max_precond_dim = 2048
merge_dims = true
precondition_1d = false
normalize_grads = false
correct_bias = true
r = 0.0
weight_lr_power = 2.0
gradient_clip_val = 0.1
split = false
foreach = true
mars = false
caution = false
mars_gamma = 0.0025

[!NOTE] A learning rate of 1e-3 is recommended for the generator and 1e-4 for the discriminator. The parameter schedule_free MUST be present in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, do so through the warmup_steps option (in iterations) instead of warmup_iter, since the latter is implemented outside the optimizer.
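
As a minimal sketch of the recommended learning-rate split, assuming the discriminator optimizer is configured under [train.optim_d], mirroring [train.optim_g]:

[train.optim_g]
type = "soap_sf"
lr = 1e-3
schedule_free = true
# remaining options as in the example above

[train.optim_d]
type = "soap_sf"
lr = 1e-4
schedule_free = true
# remaining options as in the example above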


Sharpness-Aware Minimization

fsam

FriendlySAM can be enabled by using the following options:

[train]
sam = "fsam"
sam_init = 1000

[!IMPORTANT] When training from scratch with low batch sizes (less than 8), SAM can cause NaN. In that case, use sam_init to start SAM only after N iterations. Be careful when using AMP (automatic mixed precision): due to limitations in PyTorch's GradScaler, SAM does not scale gradients to the appropriate precision ranges, which can also lead to NaN.


Schedulers

MultiStepLR, multisteplr

[train.scheduler]
type = "multisteplr"
milestones = [ 60000, 120000 ]
gamma = 0.5

This scheduler multiplies the learning rate by gamma at each of the milestones (in iterations).
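
For example, with an initial learning rate of 5e-4 and the milestones above, the learning rate drops to 2.5e-4 (5e-4 * 0.5) at iteration 60000 and to 1.25e-4 (5e-4 * 0.5^2) at iteration 120000.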


CosineAnnealing, cosineannealing

[train.scheduler]
type = "cosineannealing"
T_max = 160000
eta_min = 4e-5

This scheduler decays the learning rate along a cosine curve, reaching eta_min at T_max (iterations).
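
Assuming cosineannealing follows PyTorch's CosineAnnealingLR (its T_max and eta_min options match that scheduler), the learning rate at iteration t is lr(t) = eta_min + (lr_initial - eta_min) * (1 + cos(pi * t / T_max)) / 2. With the values above and an initial learning rate of 5e-4, the learning rate decays smoothly from 5e-4 down to 4e-5 over 160000 iterations.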