# Optimizer Options

This page describes all optimizers, schedulers, and their options currently implemented in neosr.
## Optimizers
### Adam, `adam`
See the PyTorch documentation for all options.
```toml
[train.optim_g]
type = "adam"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
```
### AdamW, `adamw`
See the PyTorch documentation for all options.
```toml
[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
```
### NAdam, `nadam`
See the PyTorch documentation for all options.
```toml
[train.optim_g]
type = "nadam"
lr = 5e-4
betas = [ 0.98, 0.99 ]
weight_decay = 0.01
decoupled_weight_decay = true
```
### Adan, `adan`
```toml
[train.optim_g]
type = "adan"
lr = 5e-4
betas = [ 0.98, 0.92, 0.99 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0.0
no_prox = true
foreach = true
```
### AdamW_Win, `adamw_win`
```toml
[train.optim_g]
type = "adamw_win"
lr = 5e-4
betas = [ 0.98, 0.999 ]
reckless_steps = [ 2.0, 8.0 ]
eps = 1e-8
weight_decay = 0.02
amsgrad = false
max_grad_norm = 0.0
acceleration_mode = "win2" # "win"
```
### AdamW_SF, `adamw_sf`
```toml
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
eps = 1e-8
weight_decay = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
```
> [!NOTE]
> The parameter `schedule_free` MUST be in the configuration file for this optimizer to work as intended. Enabling `ema` is recommended. If you wish to use warmup, do so through the option `warmup_steps` (iters) instead of `warmup_iter`, since the latter is implemented outside the optimizer.
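For example, to warm up over the first few thousand iterations, the warmup option goes inside the optimizer block rather than in `[train]`. A minimal sketch, using a hypothetical warmup length of 8000 iterations:

```toml
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
warmup_steps = 8000  # hypothetical value; warmup is handled inside the optimizer
schedule_free = true
```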
### Adan_SF, `adan_sf`
```toml
[train.optim_g]
type = "adan_sf"
lr = 8e-4
betas = [ 0.98, 0.92, 0.987 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
```
> [!NOTE]
> The parameter `schedule_free` MUST be in the configuration file for this optimizer to work as intended. Enabling `ema` is recommended. If you wish to use warmup, do so through the option `warmup_steps` (iters) instead of `warmup_iter`, since the latter is implemented outside the optimizer.
### SOAP_SF, `soap_sf`
```toml
[train.optim_g]
type = "soap_sf"
lr = 1e-3
schedule_free = true
warmup_steps = 1600
beta = 0.9
beta2_scale = 0.8
eps = 1e-8
weight_decay = 0.01
precondition_frequency = 10
max_precond_dim = 2048
merge_dims = true
precondition_1d = false
normalize_grads = false
correct_bias = true
r = 0.0
weight_lr_power = 2.0
gradient_clip_val = 0.1
split = false
foreach = true
mars = false
caution = false
mars_gamma = 0.0025
```
> [!NOTE]
> A learning rate of `1e-3` is recommended for the generator and `1e-4` for the discriminator. The parameter `schedule_free` MUST be in the configuration file for this optimizer to work as intended. Enabling `ema` is recommended. If you wish to use warmup, do so through the option `warmup_steps` (iters) instead of `warmup_iter`, since the latter is implemented outside the optimizer.
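Since the note above recommends a separate learning rate for the discriminator, a matching section could look like the following sketch. It assumes the discriminator optimizer is configured under `[train.optim_d]` and accepts the same options as `[train.optim_g]`:

```toml
# Sketch only: assumes [train.optim_d] takes the same options as [train.optim_g]
[train.optim_d]
type = "soap_sf"
lr = 1e-4
schedule_free = true
```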
### Sharpness-Aware Minimization, `fsam`
FriendlySAM can be enabled by using the following options:
```toml
[train]
sam = "fsam"
sam_init = 1000
```
> [!IMPORTANT]
> When training from scratch and with low batch sizes (less than 8), SAM could cause `NaN`. In that case, use `sam_init` to start SAM only after N iterations. Be careful when using AMP (automatic mixed precision): due to limitations of PyTorch's GradScaler, SAM does not scale gradients to the appropriate precision ranges, which could lead to `NaN`.
## Schedulers
### MultiStepLR, `multisteplr`
```toml
[train.scheduler]
type = "multisteplr"
milestones = [ 60000, 120000 ]
gamma = 0.5
```
This scheduler multiplies the learning rate by `gamma` at each of the `milestones` (iters).
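For example, with the configuration above and an initial `lr` of `5e-4`, the learning rate becomes `2.5e-4` at iteration 60000 and `1.25e-4` at iteration 120000.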
### CosineAnnealing, `cosineannealing`
```toml
[train.scheduler]
type = "cosineannealing"
T_max = 160000
eta_min = 4e-5
```
This scheduler gradually decays the learning rate down to `eta_min` over `T_max` (iters).
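For reference, this follows the standard cosine annealing curve (as in PyTorch's `CosineAnnealingLR`, without warm restarts), where `eta_max` is the initial learning rate and `t` the current iteration:

$$\eta_t = \eta_{\min} + \tfrac{1}{2}\,(\eta_{\max} - \eta_{\min})\left(1 + \cos\!\left(\frac{t}{T_{\max}}\,\pi\right)\right)$$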