# CFG Parameters in the different layers (Sudhakar17/darknet GitHub Wiki)
## Image processing `[N x C x H x W]`
`[convolutional]` - convolutional layer
- `batch_normalize=1` - if 1, batch normalization will be used; if 0, it will not (0 by default)
- `filters=64` - number of kernel filters (1 by default)
- `size=3` - kernel size of the filter (1 by default)
- `groups=32` - number of groups for grouped (depth-wise) convolution (1 by default)
- `stride=1` - stride (offset step) of the kernel filter (1 by default)
- `padding=1` - size of padding (0 by default)
- `pad=1` - if 1, `padding = size/2` will be used; if 0, the `padding=` parameter will be used (0 by default)
- `dilation=1` - size of dilation (1 by default)
- `activation=leaky` - activation function after the convolution: `logistic` (by default), `loggy`, `relu`, `elu`, `selu`, `relie`, `plse`, `hardtan`, `lhtan`, `linear`, `ramp`, `leaky`, `tanh`, `stair`, `relu6`, `swish`, `mish`
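As a sketch, a typical `[convolutional]` block in a `.cfg` file might look like this (the values are illustrative, not taken from any particular model):

```ini
# 3x3 convolution with 64 filters, stride 1, batch normalization,
# and "same" padding (pad=1 -> padding = size/2 = 1)
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
```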
`[activation]` - separate activation layer
- `activation=leaky` - activation function: `linear` (by default), `loggy`, `relu`, `elu`, `selu`, `relie`, `plse`, `hardtan`, `lhtan`, `ramp`, `leaky`, `tanh`, `stair`

`[batchnorm]` - separate batch-normalization layer
`[maxpool]` - max-pooling layer (takes the maximum value)
- `size=2` - size of the max-pooling kernel
- `stride=2` - stride (offset step) of the max-pooling kernel
`[avgpool]` - average-pooling layer: input `W x H x C` -> output `1 x 1 x C`
`[shortcut]` - residual connection (ResNet)
- `from=-3,-5` - relative layer numbers; performs element-wise addition of several layers: the previous layer and the layers specified in the `from=` parameter
- `weights_type=per_feature` - weights will be used for the shortcut: `y[i] = w1*layer1[i] + w2*layer2[i] ...`
  - `per_feature` - one weight per layer/feature
  - `per_channel` - one weight per channel
  - `none` - weights will not be used (by default)
- `weights_normalization=softmax` - normalization of the shortcut weights:
  - `softmax` - softmax normalization
  - `relu` - relu normalization
  - `none` - no weights normalization - unbound weights (by default)
- `activation=linear` - activation function after the shortcut/residual connection (linear by default)
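For illustration, a minimal residual block built from two `[convolutional]` layers and a `[shortcut]` (a sketch, assuming the block's input has 64 channels so the shapes match for element-wise addition):

```ini
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=linear

# add the block input (3 layers back) to the previous layer's output
[shortcut]
from=-3
activation=leaky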
`[upsample]` - upsample layer (increases the `W x H` resolution of the input by duplicating elements)
- `stride=2` - factor for increasing both width and height (`new_w = w*stride`, `new_h = h*stride`)
`[scale_channels]` - scales channels (SE: squeeze-and-excitation blocks, or ASFF: adaptively spatial feature fusion) - multiplies elements of one layer by elements of another layer
- `from=-3` - relative layer number; multiplies all elements of channel `N` from layer `-3` by the single element of channel `N` from the previous layer `-1` (i.e. `for (int i = 0; i < b*c*h*w; ++i) output[i] = from_layer[i] * previous_layer[i/(w*h)];`)
- `scale_wh=0` - SE-layer (previous layer is `1x1xC`); `scale_wh=1` - ASFF-layer (previous layer is `WxHx1`)
- `activation=linear` - activation function after the scale_channels layer (linear by default)
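A sketch of how these pieces could form an SE-style block: `[avgpool]` squeezes the feature map to `1 x 1 x C`, two 1x1 convolutions form the excitation (the filter counts below are illustrative, assuming the incoming feature map has 256 channels), and `[scale_channels]` multiplies the original feature map by the resulting per-channel weights:

```ini
[avgpool]            # squeeze: W x H x C -> 1 x 1 x C

[convolutional]      # reduce channels
filters=16
size=1
activation=leaky

[convolutional]      # restore channels, gate to (0, 1)
filters=256
size=1
activation=logistic

[scale_channels]     # multiply the feature map 4 layers back channel-wise
from=-4
```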
`[sam]` - Spatial Attention Module (SAM) - multiplies elements of one layer by elements of another layer
- `from=-3` - relative layer number (this layer and the previous layer must have the same size `W x H x C`)
`[reorg3d]` - reorg layer (resizes `W x H x C`)
- `stride=2` - if `reverse=0`, the input will be resized to `W/2 x H/2 x C*4`; if `reverse=1`, then to `W*2 x H*2 x C/4` (1 by default)
- `reverse=1` - if 0 (by default), decreases `W x H`; if 1, increases `W x H`
`[reorg]` - OLD reorg layer from Yolo v2 - has incorrect logic (resizes `W x H x C`) - deprecated
- `stride=2` - if `reverse=0`, the input will be resized to `W/2 x H/2 x C*4`; if `reverse=1`, then to `W*2 x H*2 x C/4` (1 by default)
- `reverse=1` - if 0 (by default), decreases `W x H`; if 1, increases `W x H`
`[route]` - concatenation layer: `Concat` for several input layers, or `Identity` for a single input layer
- `layers = -1, 61` - layers that will be concatenated; output size is `W x H x (C_layer_1 + C_layer_2)`
  - if an index is < 0, it is a relative layer number (`-1` means the previous layer)
  - if an index is >= 0, it is an absolute layer number
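For example, the upsample-and-concatenate pattern used in the Yolo v3 head (a sketch; the absolute index `61` assumes the layer numbering of a yolov3-style cfg):

```ini
# jump back to an earlier feature map in the head
[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

# concatenate the upsampled map with absolute layer 61:
# output channels = C of layer -1 + C of layer 61
[route]
layers = -1, 61
```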
`[yolo]` - detection layer for Yolo v3 / v4
- `mask = 3,4,5` - indexes of the `anchors` that are used in this [yolo]-layer
- `anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326` - initial sizes of bounding boxes that will be adjusted
- `num=9` - total number of anchors
- `classes=80` - number of classes of objects that can be detected
- `ignore_thresh = .7` - keeps duplicated detections if `IoU(detect, truth) > ignore_thresh`, which will be fused during NMS (used for training only)
- `truth_thresh = 1` - adjusts duplicated detections if `IoU(detect, truth) > truth_thresh`, which will be fused during NMS (used for training only)
- `jitter=.3` - randomly crops and resizes images, changing the aspect ratio from `x(1 - 2*jitter)` to `x(1 + 2*jitter)` (data-augmentation parameter, used only from the last layer)
- `random=1` - randomly resizes the network every 10 iterations from `1/1.4` to `1.4x` (data-augmentation parameter, used only from the last layer)
- `resize=1.5` - randomly resizes the image in the range `1/1.5x - 1.5x`
- `max=200` - maximum number of objects per image during training
- `counters_per_class=100,10,1000` - number of objects per class in the training dataset, used to compensate for class imbalance
- `label_smooth_eps=0.1` - label smoothing
- `scale_x_y=1.05` - eliminates grid sensitivity
- `iou_thresh=0.2` - uses multiple anchors per object if `IoU(Obj, Anchor) > 0.2`
- `iou_loss=mse` - IoU loss: `mse`, `giou`, `diou`, `ciou`
- `iou_normalizer=0.07` - normalizer for delta-IoU
- `cls_normalizer=1.0` - normalizer for delta-Objectness
- `max_delta=5` - limits the delta for each entry
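A representative `[yolo]` head as a sketch: the preceding 1x1 `[convolutional]` must produce `(classes + 5) * number-of-masks` filters, here `(80 + 5) * 3 = 255` (values follow the common COCO configuration, not a specific file):

```ini
[convolutional]
size=1
stride=1
pad=1
filters=255          # (classes + 5) * number of masks = (80 + 5) * 3
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
```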
`[crnn]` - convolutional RNN layer (recurrent)
- `batch_normalize=1` - if 1, batch normalization will be used; if 0, it will not (0 by default)
- `size=1` - convolutional kernel size of the filter (1 by default)
- `pad=0` - if 1, `padding = size/2` will be used; if 0, the `padding=` parameter will be used (0 by default)
- `output = 1024` - number of kernel filters in the one output convolutional layer (1 by default)
- `hidden=1024` - number of kernel filters in the two (input and hidden) convolutional layers (1 by default)
- `activation=leaky` - activation function for each of the 3 convolutional layers in the [crnn]-layer (logistic by default)
`[conv_lstm]` - convolutional LSTM layer (recurrent)
- `batch_normalize=1` - if 1, batch normalization will be used; if 0, it will not (0 by default)
- `size=3` - convolutional kernel size of the filter (1 by default)
- `padding=1` - convolutional size of padding (0 by default)
- `pad=1` - if 1, `padding = size/2` will be used; if 0, the `padding=` parameter will be used (0 by default)
- `stride=1` - convolutional stride (offset step) of the kernel filter (1 by default)
- `dilation=1` - convolutional size of dilation (1 by default)
- `output=256` - number of kernel filters in each of the 8 or 11 convolutional layers (1 by default)
- `groups=4` - number of groups for grouped (depth-wise) convolution (1 by default)
- `state_constrain=512` - constrains LSTM-state values to `[-512; +512]` after each inference (`time_steps*32` by default)
- `peephole=0` - if 1, peephole connections will be used (3 additional conv layers); if 0, they will not (1 by default)
- `bottleneck=0` - if 1, a reduced optimal version of the conv-lstm layer will be used
- `activation=leaky` - activation function for each of the 8 or 11 convolutional layers in the [conv_lstm]-layer (linear by default)
- `lstm_activation=tanh` - activation for G (gate: `g = tanh(wg + ug)`) and C (memory cell: `h = o * tanh(c)`)
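A minimal `[conv_lstm]` block as a sketch, using only the parameters documented above (for recurrent training, `time_steps` must also be set in the `[net]` section):

```ini
# convolutional LSTM for video / sequential feature maps
[conv_lstm]
batch_normalize=1
size=3
pad=1
output=256
peephole=0
activation=leaky
lstm_activation=tanh
```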

## Free-form data processing `[Inputs]`
`[connected]` - fully connected layer
- `output=256` - number of outputs (1 by default), so the number of connections equals `inputs*outputs`
- `activation=leaky` - activation after the layer (logistic by default)
`[dropout]` - dropout layer
- `probability=0.5` - dropout probability - the fraction of inputs that will be zeroed (0.5 = 50% by default)
- `dropblock=1` - use as DropBlock
- `dropblock_size_abs=7` - size of the DropBlock in pixels (7x7)
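The two modes side by side, as a sketch (probabilities are illustrative):

```ini
# plain dropout: zero 50% of individual activations
[dropout]
probability=0.5

# DropBlock variant: zero contiguous 7x7 regions instead of single activations
[dropout]
probability=0.1
dropblock=1
dropblock_size_abs=7
```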
`[softmax]` - SoftMax CE (cross-entropy) layer - categorical cross-entropy for multi-class classification
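A minimal classification tail as a sketch: a `[connected]` layer producing one output per class (1000 here is illustrative, e.g. for ImageNet), followed by `[softmax]`:

```ini
[connected]
output=1000
activation=linear

[softmax]
```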
`[contrastive]` - contrastive loss layer for supervised and unsupervised learning (requires `[net] contrastive=1` and optionally `[net] unsupervised=1`)
- `classes=1000` - number of classes
- `temperature=1.0` - temperature
`[cost]` - cost layer; calculates the (linear) Delta and the (squared) Loss
- `type=sse` - cost type: `sse` (L2), `masked`, `smooth` (smooth-L1) (SSE by default)
`[rnn]` - fully connected RNN layer (recurrent)
- `batch_normalize=1` - if 1, batch normalization will be used; if 0, it will not (0 by default)
- `output = 1024` - number of outputs in the one output connected layer (1 by default)
- `hidden=1024` - number of outputs in the two (input and hidden) connected layers (1 by default)
- `activation=leaky` - activation after the layer (logistic by default)

`[lstm]` - fully connected LSTM layer (recurrent)
- `batch_normalize=1` - if 1, batch normalization will be used; if 0, it will not (0 by default)
- `output = 1024` - number of outputs in all connected layers (1 by default)

`[gru]` - fully connected GRU layer (recurrent)
- `batch_normalize=1` - if 1, batch normalization will be used; if 0, it will not (0 by default)
- `output = 1024` - number of outputs in all connected layers (1 by default)