# Optimizer Options

This page describes all optimizers, schedulers, and their options currently implemented in neosr.
## Optimizers
### Adam, `adam`
See the PyTorch documentation for all options.
```toml
[train.optim_g]
type = "adam"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
```
### AdamW, `adamw`
See the PyTorch documentation for all options.
```toml
[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
```
### NAdam, `nadam`
See the PyTorch documentation for all options.
```toml
[train.optim_g]
type = "nadam"
lr = 5e-4
betas = [ 0.98, 0.99 ]
weight_decay = 0.01
decoupled_weight_decay = true
```
### Adan, `adan`
```toml
[train.optim_g]
type = "adan"
lr = 5e-4
betas = [ 0.98, 0.92, 0.99 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0.0
no_prox = true
foreach = true
```
### AdamW_Win, `adamw_win`
```toml
[train.optim_g]
type = "adamw_win"
lr = 5e-4
betas = [ 0.98, 0.999 ]
reckless_steps = [ 2.0, 8.0 ]
eps = 1e-8
weight_decay = 0.02
amsgrad = false
max_grad_norm = 0.0
acceleration_mode = "win2" # "win"
```
### AdamW_SF, `adamw_sf`
```toml
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
eps = 1e-8
weight_decay = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
```
> [!NOTE]
> The parameter `schedule_free` MUST be in the configuration file for this optimizer to work as intended. Enabling `ema` is recommended. If you wish to use warmup, do so through the option `warmup_steps` (iters) instead of `warmup_iter`, since the latter is implemented outside the optimizer.
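For example, to warm up over the first few thousand iterations, the warmup option goes inside the optimizer block rather than in `[train]`. A minimal sketch, using a hypothetical warmup length of 8000 iterations:

```toml
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
warmup_steps = 8000  # hypothetical value; warmup is handled inside the optimizer
schedule_free = true
```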
### Adan_SF, `adan_sf`
```toml
[train.optim_g]
type = "adan_sf"
lr = 8e-4
betas = [ 0.98, 0.92, 0.987 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
```
> [!NOTE]
> The parameter `schedule_free` MUST be in the configuration file for this optimizer to work as intended. Enabling `ema` is recommended. If you wish to use warmup, do so through the option `warmup_steps` (iters) instead of `warmup_iter`, since the latter is implemented outside the optimizer.
### SOAP_SF, `soap_sf`
```toml
[train.optim_g]
type = "soap_sf"
lr = 1e-3
schedule_free = true
warmup_steps = 1600
beta = 0.9
beta2_scale = 0.8
eps = 1e-8
weight_decay = 0.01
precondition_frequency = 10
max_precond_dim = 2048
merge_dims = true
precondition_1d = false
normalize_grads = false
correct_bias = true
r = 0.0
weight_lr_power = 2.0
gradient_clip_val = 0.1
split = false
foreach = true
mars = false
caution = false
mars_gamma = 0.0025
```
> [!NOTE]
> A learning rate of `1e-3` is recommended for the generator and `1e-4` for the discriminator. The parameter `schedule_free` MUST be in the configuration file for this optimizer to work as intended. Enabling `ema` is recommended. If you wish to use warmup, do so through the option `warmup_steps` (iters) instead of `warmup_iter`, since the latter is implemented outside the optimizer.
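Since the note above recommends a separate learning rate for the discriminator, a matching section could look like the following sketch. It assumes the discriminator optimizer is configured under `[train.optim_d]` and accepts the same options as `[train.optim_g]`:

```toml
# Sketch only: assumes [train.optim_d] takes the same options as [train.optim_g]
[train.optim_d]
type = "soap_sf"
lr = 1e-4
schedule_free = true
```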
### Sharpness-Aware Minimization, `fsam`
FriendlySAM can be enabled by using the following options:
```toml
[train]
sam = "fsam"
sam_init = 1000
```
> [!IMPORTANT]
> When training from scratch and with low batch sizes (less than 8), SAM could cause `NaN`. In that case, use `sam_init` to start SAM only after N iterations. Be careful when using AMP (automatic mixed precision): due to limitations of PyTorch's GradScaler, SAM does not scale gradients to the appropriate precision ranges, which could lead to `NaN`.
## Schedulers
### MultiStepLR, `multisteplr`
```toml
[train.scheduler]
type = "multisteplr"
milestones = [ 60000, 120000 ]
gamma = 0.5
```
This scheduler multiplies the learning rate by `gamma` at each of the `milestones` (iters).
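For example, with the configuration above and an initial `lr` of `5e-4`, the learning rate becomes `2.5e-4` at iteration 60000 and `1.25e-4` at iteration 120000.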
### CosineAnnealing, `cosineannealing`
```toml
[train.scheduler]
type = "cosineannealing"
T_max = 160000
eta_min = 4e-5
```
This scheduler gradually decays the learning rate down to `eta_min` over `T_max` (iters).
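For reference, this follows the standard cosine annealing curve (as in PyTorch's `CosineAnnealingLR`, without warm restarts), where `eta_max` is the initial learning rate and `t` the current iteration:

$$\eta_t = \eta_{\min} + \tfrac{1}{2}\,(\eta_{\max} - \eta_{\min})\left(1 + \cos\!\left(\frac{t}{T_{\max}}\,\pi\right)\right)$$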