# Losses

This page describes all losses and their options currently implemented in neosr.

## wavelet_guided, wavelet_init

The `wavelet_guided` loss enables the use of WGSR. As explained in the paper, its purpose is to stabilize GAN training and reduce artifacts. The `wavelet_init` option specifies the number of iterations to wait before enabling `wavelet_guided`.

```toml
[train]
wavelet_guided = true
wavelet_init = 80000
```

> [!NOTE]
> This loss works better for finetuning than for training from scratch. It is recommended you train the model for at least ~40k iterations before enabling it.

## pixel_opt

The `pixel_opt` option defines the pixel loss.

```toml
[train.pixel_opt]
type = "L1Loss"
loss_weight = 1.0
reduction = "mean"
```

The above options set the pixel loss to the L1 criterion with a weight of 1.0. Possible values for `type` are: `L1Loss`, `MSELoss` (also known as L2), `HuberLoss` and `chc` (Clipped Huber with Cosine Similarity Loss, which can improve color consistency and decrease noise; reduction is done using Huber loss).
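
For example, a minimal sketch switching the pixel criterion to `chc` (this assumes the remaining options keep their defaults; `reduction` is omitted since `chc` reduces via Huber loss):

```toml
# hypothetical variation: Clipped Huber with Cosine Similarity as pixel criterion
[train.pixel_opt]
type = "chc"
loss_weight = 1.0
```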

## mssim_opt, mssim_loss

The `mssim_opt` option defines the Multi-Scale SSIM loss. The implementation in neosr has been adapted from "A better pytorch-based implementation for the mean structural similarity. Differentiable simpler SSIM and MS-SSIM.". The options below are the defaults when calling the `mssim` function by itself:

```toml
[train.mssim_opt]
type = "mssim_loss"
loss_weight = 1.0
window_size = 11
sigma = 1.5
in_channels = 3
K1 = 0.01
K2 = 0.03
L = 1
```

## ncc_opt, ncc_loss

This option sets the NCC loss. It uses Normalized Cross-Correlation.

```toml
[train.ncc_opt]
type = "ncc_loss"
loss_weight = 1.0
```

## fdl_opt, fdl_loss

This option sets the Frequency Distribution Loss, which is a perceptual loss.

```toml
[train.fdl_opt]
type = "fdl_loss"
model = "dinov2" # "vgg", "resnet", "effnet"
num_proj = 24
phase_weight = 1.0
loss_weight = 1.0
patch_size = 4
stride = 1
#vgg_weights = None
#dino_layers = None
#dino_weights = None
```

This loss uses pretrained network features. Possible networks are `"dinov2"`, `"vgg"` (VGG19), `"resnet"` (ResNet-101) and `"effnet"` (EfficientNet v1). The default value for `num_proj` is 24 due to its heavy hit on training performance; the official implementation uses 256 instead. You may increase it at the end of a finetuning process to achieve better perceptual quality.
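
As an illustration, a final finetuning stage could raise the projection count to the official value (a sketch, not a tested recipe; expect noticeably slower iterations):

```toml
# hypothetical end-of-finetune setting: official num_proj value
[train.fdl_opt]
type = "fdl_loss"
model = "dinov2"
num_proj = 256
```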

The `*_weights` parameters are the weights for each stage (layer) when using the VGG or DINOv2 backends. `vgg_weights` must be a list of 5 float values (one per layer), while for DINOv2 it must be a list of at most 11 values, where the weights correspond to each layer in order. For example, the default values are:

```toml
[train.fdl_opt]
type = "fdl_loss"
vgg_weights = [0.5, 0.5, 1.0, 1.0, 1.0]
dino_layers = [0, 1, 2, 3, 4, 5, 6, 7]
dino_weights = [1.0, 0.5, 0.5, 1.0, 0.5, 0.5, 1.0, 0.1]
# layer 1 of dinov2 will be weighted at 0.5 (half) in this example
# layer 7 will be weighted 0.1
# avoid increasing the weight above 1.0
```

## perceptual_opt, vgg_perceptual_loss

This option sets the perceptual loss. It uses the VGG19 network to extract features from images.

```toml
[train.perceptual_opt]
type = "vgg_perceptual_loss"
loss_weight = 1.0
criterion = "huber"
patchloss = true
ipk = true
patch_weight = 1.0
vgg_type = "vgg19"
use_input_norm = true
range_norm = false

[train.perceptual_opt.layer_weights]
conv1_2 = 0.1
conv2_2 = 0.1
conv3_4 = 1.0
conv4_4 = 1.0
conv5_4 = 1.0
```

Possible values for `criterion` are: `l1`, `l2`, `huber` and `chc`.

The options `patchloss`, `ipk` and `patch_weight` configure Patch Loss; by default, these options are disabled. The option `patchloss` enables the Feature Patch Kernel, as described in the paper. The option `ipk` enables the Image Patch Kernel.

## dists_opt, dists_loss

This option enables DISTS (VGG16) as a perceptual loss. It can be used in combination with `perceptual_opt`.

```toml
[train.dists_opt]
type = "dists_loss"
loss_weight = 0.5
```

## gan_opt, gan_loss

This option enables GAN training.

```toml
[train.gan_opt]
type = "gan_loss"
gan_type = "bce"
loss_weight = 0.3
real_label_val = 1.0
fake_label_val = 0.0
```

Possible values for `gan_type` are: `bce`, `mse` or `huber`.
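
For instance, a minimal sketch using the Huber criterion for the GAN objective instead of BCE (other values kept from the defaults above):

```toml
# hypothetical variation: Huber criterion for GAN training
[train.gan_opt]
type = "gan_loss"
gan_type = "huber"
loss_weight = 0.3
```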

## ldl_opt, ldl_loss

This option sets the LDL loss. See the research paper for details.

```toml
[train.ldl_opt]
type = "ldl_loss"
loss_weight = 1.0
criterion = "huber"
ksize = 7
```

Possible values for `criterion` are: `l1`, `l2` and `huber`.

## ff_opt, ff_loss

This option sets the Focal Frequency Loss. See the research paper for details.

```toml
[train.ff_opt]
type = "ff_loss"
loss_weight = 1.0
alpha = 1.0
patch_factor = 1
ave_spectrum = true
log_matrix = false
batch_matrix = false
```

> [!NOTE]
> Focal Frequency loss can cause instabilities if enabled without using a pretrained model.

## gw_opt, gw_loss

This option enables the Gradient-Weighted Loss from the CDC research paper. In practice, this loss makes the network focus more on high frequencies.

```toml
[train.gw_opt]
type = "gw_loss"
loss_weight = 1.0
criterion = "chc_loss"
corner = true
```

Possible values for `criterion` are: `l1`, `l2`, `huber` and `chc`.

## kl_opt, kl_loss

This option enables the Kullback-Leibler divergence loss.

```toml
[train.kl_opt]
type = "kl_loss"
loss_weight = 1.0
```

> [!NOTE]
> KL-loss should only be enabled if using a pretrained model. Enabling it from scratch may cause incorrect results or NaN.

## match_lq_colors

This option matches color and luma from your LQ images instead of the GT images. It can increase stability if your dataset has too much variation in color/luma. Only applicable if `consistency_loss` is enabled.

```toml
[train]
match_lq_colors = true
```

## consistency_opt, consistency_loss

This option sets the color and luma consistency loss. It allows matching the brightness and colors of your generated images to the GT or LQ images (see the `match_lq_colors` option). The loss uses Oklab and CIE L* color space transforms, as well as Cosine Similarity.

```toml
[train.consistency_opt]
type = "consistency_loss"
loss_weight = 1.0
criterion = "chc" # "l1"
blur = true
cosim = true
saturation = 1.0
brightness = 1.0
```
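
Putting the two options together, a sketch of a config that matches consistency against the LQ images instead of GT (`match_lq_colors` only takes effect because `consistency_loss` is enabled):

```toml
# hypothetical combined setup: consistency loss matching LQ colors
[train]
match_lq_colors = true

[train.consistency_opt]
type = "consistency_loss"
loss_weight = 1.0
```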

## msswd_opt, msswd_loss

This option sets the Multiscale Sliced Wasserstein Distance loss, a color consistency loss.

```toml
[train.msswd_opt]
type = "msswd_loss"
num_scale = 3
num_proj = 24
loss_weight = 1.0
patch_size = 11
stride = 1
c = 3
```

The parameters `num_proj` and `num_scale` default to 24 and 3, respectively, due to their heavy hit on training performance; the official implementation uses 128 and 5 instead. You may increase them at the end of a finetuning process to achieve better perceptual quality.
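
For example, a final finetuning stage could adopt the official values (a sketch; expect a significant training slowdown):

```toml
# hypothetical end-of-finetune setting: official MS-SWD values
[train.msswd_opt]
type = "msswd_loss"
num_scale = 5
num_proj = 128
loss_weight = 1.0
```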