# Losses

This page describes all losses and their options currently implemented in neosr.

## wavelet_guided, wavelet_init

The `wavelet_guided` loss enables the use of WGSR. As explained in the paper, its purpose is to stabilize GAN training and reduce artifacts. The `wavelet_init` option specifies the number of iterations to wait before enabling `wavelet_guided`.

```toml
[train]
wavelet_guided = true
wavelet_init = 80000
```

> [!NOTE]
> This loss works better for finetuning than for training from scratch. It is recommended you train the model for at least ~40k iterations before enabling it.

## pixel_opt

The `pixel_opt` option defines the pixel loss.

```toml
[train.pixel_opt]
type = "L1Loss"
loss_weight = 1.0
reduction = "mean"
```

The above options set the pixel loss to the L1 criterion with a weight of 1.0. Possible values for `type` are: `L1Loss`, `MSELoss` (also known as L2), `HuberLoss` and `chc` (Clipped Huber with Cosine Similarity Loss, which can improve color consistency and decrease noise; reduction is done using Huber loss).
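
For example, a minimal sketch switching the pixel criterion to `chc` (this assumes the remaining options keep their defaults; `reduction` is omitted since `chc` reduces via Huber loss):

```toml
# hypothetical variation: Clipped Huber with Cosine Similarity as pixel criterion
[train.pixel_opt]
type = "chc"
loss_weight = 1.0
```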

## mssim_opt, mssim_loss

The `mssim_opt` option defines the Multi-Scale SSIM loss. The implementation in neosr has been adapted from "A better pytorch-based implementation for the mean structural similarity. Differentiable simpler SSIM and MS-SSIM.". The options below are the defaults when calling the `mssim` function by itself:

```toml
[train.mssim_opt]
type = "mssim_loss"
loss_weight = 1.0
window_size = 11
sigma = 1.5
in_channels = 3
K1 = 0.01
K2 = 0.03
L = 1
```

## ncc_opt, ncc_loss

This option sets the NCC loss. It uses Normalized Cross-Correlation.

```toml
[train.ncc_opt]
type = "ncc_loss"
loss_weight = 1.0
```

## fdl_opt, fdl_loss

This option sets the Frequency Distribution Loss, which is a perceptual loss.

```toml
[train.fdl_opt]
type = "fdl_loss"
model = "dinov2" # "vgg", "resnet", "effnet"
num_proj = 24
phase_weight = 1.0
loss_weight = 1.0
patch_size = 4
stride = 1
#vgg_weights = None
#dino_layers = None
#dino_weights = None
```

This loss uses pretrained network features. Possible networks are `"dinov2"`, `"vgg"` (VGG19), `"resnet"` (ResNet-101) and `"effnet"` (EfficientNet v1). The default value for `num_proj` is 24 due to its heavy hit on training performance; the official implementation uses 256 instead. You may increase it at the end of a finetuning process to achieve better perceptual quality.
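
As an illustration, a final finetuning stage could raise the projection count to the official value (a sketch, not a tested recipe; expect noticeably slower iterations):

```toml
# hypothetical end-of-finetune setting: official num_proj value
[train.fdl_opt]
type = "fdl_loss"
model = "dinov2"
num_proj = 256
```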

The `*_weights` parameters are the weights for each stage (layer) when using the VGG or DINOv2 backends. `vgg_weights` must be a list of 5 float values (one per layer), while for DINOv2 it must be a list of at most 11 values, where the weights correspond to each layer in order. For example, the default values are:

```toml
[train.fdl_opt]
type = "fdl_loss"
vgg_weights = [0.5, 0.5, 1.0, 1.0, 1.0]
dino_layers = [0, 1, 2, 3, 4, 5, 6, 7]
dino_weights = [1.0, 0.5, 0.5, 1.0, 0.5, 0.5, 1.0, 0.1]
# layer 1 of dinov2 will be weighted at 0.5 (half) in this example
# layer 7 will be weighted 0.1
# avoid increasing the weight above 1.0
```

## perceptual_opt, vgg_perceptual_loss

This option sets the perceptual loss. It uses the VGG19 network to extract features from images.

```toml
[train.perceptual_opt]
type = "vgg_perceptual_loss"
loss_weight = 1.0
criterion = "huber"
patchloss = true
ipk = true
patch_weight = 1.0
vgg_type = "vgg19"
use_input_norm = true
range_norm = false

[train.perceptual_opt.layer_weights]
conv1_2 = 0.1
conv2_2 = 0.1
conv3_4 = 1.0
conv4_4 = 1.0
conv5_4 = 1.0
```

Possible values for `criterion` are: `l1`, `l2`, `huber` and `chc`.

The options `patchloss`, `ipk` and `patch_weight` configure Patch Loss; by default, these options are disabled. The option `patchloss` enables the Feature Patch Kernel, as described in the paper. The option `ipk` enables the Image Patch Kernel.

## dists_opt, dists_loss

This option enables DISTS (VGG16) as a perceptual loss. It can be used in combination with `perceptual_opt`.

```toml
[train.dists_opt]
type = "dists_loss"
loss_weight = 0.5
```

## gan_opt, gan_loss

This option enables GAN training.

```toml
[train.gan_opt]
type = "gan_loss"
gan_type = "bce"
loss_weight = 0.3
real_label_val = 1.0
fake_label_val = 0.0
```

Possible values for `gan_type` are: `bce`, `mse` or `huber`.
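
For instance, a minimal sketch using the Huber criterion for the GAN objective instead of BCE (other values kept from the defaults above):

```toml
# hypothetical variation: Huber criterion for GAN training
[train.gan_opt]
type = "gan_loss"
gan_type = "huber"
loss_weight = 0.3
```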

## ldl_opt, ldl_loss

This option sets the LDL loss. See the research paper for details.

```toml
[train.ldl_opt]
type = "ldl_loss"
loss_weight = 1.0
criterion = "huber"
ksize = 7
```

Possible values for `criterion` are: `l1`, `l2` and `huber`.

## ff_opt, ff_loss

This option sets the Focal Frequency Loss. See the research paper for details.

```toml
[train.ff_opt]
type = "ff_loss"
loss_weight = 1.0
alpha = 1.0
patch_factor = 1
ave_spectrum = true
log_matrix = false
batch_matrix = false
```

> [!NOTE]
> Focal Frequency loss can cause instabilities if enabled without using a pretrained model.

## gw_opt, gw_loss

This option enables the Gradient-Weighted Loss from the CDC research paper. In practice, this loss makes the network focus more on high frequencies.

```toml
[train.gw_opt]
type = "gw_loss"
loss_weight = 1.0
criterion = "chc_loss"
corner = true
```

Possible values for `criterion` are: `l1`, `l2`, `huber` and `chc`.

## kl_opt, kl_loss

This option enables the Kullback-Leibler divergence loss.

```toml
[train.kl_opt]
type = "kl_loss"
loss_weight = 1.0
```

> [!NOTE]
> KL-loss should only be enabled if using a pretrained model. Enabling it from scratch may cause incorrect results or NaN.

## match_lq_colors

This option matches color and luma from your LQ images instead of the GT images. It can increase stability if your dataset has too much variation in color/luma. Only applicable if `consistency_loss` is enabled.

```toml
[train]
match_lq_colors = true
```

## consistency_opt, consistency_loss

This option sets the color and luma consistency loss. It allows matching the brightness and colors of your generated images to the GT or LQ images (see the `match_lq_colors` option). The loss uses Oklab and CIE L* color space transforms, as well as Cosine Similarity.

```toml
[train.consistency_opt]
type = "consistency_loss"
loss_weight = 1.0
criterion = "chc" # "l1"
blur = true
cosim = true
saturation = 1.0
brightness = 1.0
```
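
Putting the two options together, a sketch of a config that matches consistency against the LQ images instead of GT (`match_lq_colors` only takes effect because `consistency_loss` is enabled):

```toml
# hypothetical combined setup: consistency loss matching LQ colors
[train]
match_lq_colors = true

[train.consistency_opt]
type = "consistency_loss"
loss_weight = 1.0
```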

## msswd_opt, msswd_loss

This option sets the Multiscale Sliced Wasserstein Distance loss, a color consistency loss.

```toml
[train.msswd_opt]
type = "msswd_loss"
num_scale = 3
num_proj = 24
loss_weight = 1.0
patch_size = 11
stride = 1
c = 3
```

The parameters `num_proj` and `num_scale` default to 24 and 3, respectively, due to their heavy hit on training performance; the official implementation uses 128 and 5 instead. You may increase them at the end of a finetuning process to achieve better perceptual quality.
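
For example, a final finetuning stage could adopt the official values (a sketch; expect a significant training slowdown):

```toml
# hypothetical end-of-finetune setting: official MS-SWD values
[train.msswd_opt]
type = "msswd_loss"
num_scale = 5
num_proj = 128
loss_weight = 1.0
```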