[Paper Review] YOLOv3: An Incremental Improvement - penny4860/study-note GitHub Wiki

1. Summary

Key Points

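Builds a 1-stage detector that is faster than RetinaNet.
Comparison with RetinaNet
- multi-scale features
  - RetinaNet uses 5 scales, YOLOv3 uses 3 scales
- subnet structure
  - RetinaNet predicts with separate cls/reg subnets
  - YOLOv3 predicts with a single subnet
Why is YOLOv3 faster?
- it uses fewer scales (5 -> 3)
- its subnet is a single path instead of being split into cls/reg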

Questions

2. Contents

2. The Deal

2.1. Bounding Box Prediction

Box decoding method: https://github.com/penny4860/tf2-eager-yolo3/blob/81766c3ff0bd6f411068e41fcf89be2f60417bcb/yolo/post_proc/decoder.py#L69
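As a rough illustration of box decoding, here is a minimal sketch of the decoding equations from the YOLOv3 paper. It is not a copy of the linked decoder.py; the function name `decode_box` and its argument names are hypothetical and chosen only to mirror the paper's symbols (t_x, t_y, t_w, t_h for raw outputs, c_x, c_y for the grid cell offset, p_w, p_h for the anchor size).

```python
import math


def sigmoid(x):
    """Logistic function for a single float."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Decode raw network outputs into a box (hypothetical helper).

    The center is expressed in grid units; the width and height come out
    in the same unit as the anchor size (p_w, p_h).
    """
    b_x = sigmoid(t_x) + c_x   # center x stays inside the cell at c_x
    b_y = sigmoid(t_y) + c_y   # center y stays inside the cell at c_y
    b_w = p_w * math.exp(t_w)  # anchor width scaled by exp(t_w)
    b_h = p_h * math.exp(t_h)  # anchor height scaled by exp(t_h)
    return b_x, b_y, b_w, b_h
```

With all raw outputs at zero, the box sits at the center of its grid cell and has exactly the anchor's size, which is a convenient sanity check.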
Box assignment rule during training
- foreground sample : only the 1 box that best matches the gt-box is assigned
  - objectness = 1.0
  - class and box are assigned
- background sample
  - objectness = 0.0
  - class and box are not assigned, and no loss is computed for them.
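The assignment rule above can be sketched as follows. This is a minimal sketch under an assumption: "best matching" is computed here as the IoU between the gt-box and each anchor when both are centered at the same point (shape-only IoU), which is a common choice but is not stated in this note. The names `shape_iou` and `assign_objectness` are hypothetical; the anchor sizes are the COCO anchors listed in the YOLOv3 paper.

```python
# COCO anchors (width, height) from the YOLOv3 paper.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def shape_iou(wh_a, wh_b):
    """IoU of two boxes sharing the same center, each given as (w, h)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def assign_objectness(gt_wh, anchors=ANCHORS):
    """Return (best_index, objectness targets) for one gt-box.

    Only the best-matching anchor is a foreground sample (objectness 1.0);
    every other anchor is background (objectness 0.0) and would receive
    no class or box target.
    """
    ious = [shape_iou(gt_wh, anchor) for anchor in anchors]
    best = max(range(len(anchors)), key=lambda i: ious[i])
    objectness = [1.0 if i == best else 0.0 for i in range(len(anchors))]
    return best, objectness
```

For example, a 60x120 gt-box is matched to the (59, 119) anchor, and that anchor alone receives objectness 1.0.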

2.2. Class Prediction

Instead of softmax, an independent logistic classifier is applied to each label.

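import numpy as np

def _sigmoid(x):
    # Assumed helper (not shown in the original note): element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-x))

def _activate_probs(objectness, classes, obj_thresh=0.3):
    # 1. sigmoid activation (independent per class, no softmax)
    objectness_prob = _sigmoid(objectness)
    classes_probs = _sigmoid(classes)
    # 2. conditional probability: class score weighted by objectness
    classes_conditional_probs = classes_probs * objectness_prob
    # 3. thresholding: zero the scores where objectness is not above obj_thresh
    classes_conditional_probs *= objectness_prob > obj_thresh
    return objectness_prob, classes_conditional_probs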

2.3. Predictions Across Scales

YOLO architecture (shapes are for a 416x416 input)
- Backbone : feature extractor
  - input : image
  - output : (C3, C4, C5)
- FPN
  - input : (C3, C4, C5)
    - [52, 52, 256], [26, 26, 512], [13, 13, 1024]
  - output : (P3, P4, P5)
    - [52, 52, 128], [26, 26, 256], [13, 13, 512]
- subnet (detection net)
  - input : (P3, P4, P5)
    - [52, 52, 128], [26, 26, 256], [13, 13, 512]
  - output : (D3, D4, D5)
    - [52, 52, 3*(n_cls+4+1)], [26, 26, 3*(n_cls+4+1)], [13, 13, 3*(n_cls+4+1)]
The detection tensor predicts three things at once for every grid cell and anchor:
- objectness : 1
- box : 4
- class : n_classes
Predictions are made at 3 scales, with 3 anchors per grid cell.
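To make the channel arithmetic of the detection tensor concrete, here is a small sketch. The function names are hypothetical, and the per-anchor channel order (box, then objectness, then classes) is an assumption based on the usual Darknet layout; this note does not specify the order.

```python
def detection_channels(n_classes, n_anchors=3):
    """Channel depth of one detection tensor: n_anchors * (4 + 1 + n_classes)."""
    return n_anchors * (4 + 1 + n_classes)


def anchor_slices(n_classes, n_anchors=3):
    """Channel slices per anchor, assuming the order [box(4), objectness(1), classes]."""
    per_anchor = 4 + 1 + n_classes
    layout = []
    for a in range(n_anchors):
        start = a * per_anchor
        layout.append({
            "box": slice(start, start + 4),
            "objectness": slice(start + 4, start + 5),
            "classes": slice(start + 5, start + per_anchor),
        })
    return layout


def total_predictions(grid_sizes=(52, 26, 13), n_anchors=3):
    """Number of candidate boxes summed over all scales and anchors."""
    return sum(g * g for g in grid_sizes) * n_anchors
```

With 80 classes, each detection tensor has 255 channels, and a 416x416 input yields 10647 candidate boxes across the three scales.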

2.4. Feature Extractor

Uses a modified version of ResNet's skip connection (shortcut) / bottleneck.
- Pattern in Darknet-53
  - [(1x1)-128, (3x3)-256, residual]
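The pattern above can be traced as tensor shapes without any deep learning framework. This is only a shape-level sketch, assuming stride 1 and "same" padding so that the spatial size is preserved; the function names are hypothetical, and the weight count ignores batch normalization parameters.

```python
def darknet_residual_shapes(h, w, c_in, c_mid=128, c_out=256):
    """Trace (height, width, channels) through one residual unit."""
    if c_in != c_out:
        # The shortcut adds the input to the output, so channels must match.
        raise ValueError("residual add needs c_in == c_out")
    return [
        ("input", (h, w, c_in)),
        ("conv 1x1", (h, w, c_mid)),      # squeeze channels: 256 -> 128
        ("conv 3x3", (h, w, c_out)),      # expand channels: 128 -> 256
        ("residual add", (h, w, c_out)),  # element-wise sum with the input
    ]


def conv_weight_count(c_in, c_mid=128, c_out=256):
    """Convolution weights in the unit (1x1 conv plus 3x3 conv)."""
    return 1 * 1 * c_in * c_mid + 3 * 3 * c_mid * c_out
```

Unlike the three-layer ResNet bottleneck (1x1, 3x3, 1x1), this unit uses only two convolutions, which is one way to read "modified" in the note above.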

2.5. Training

Hard negative mining is not used.
Multi-scale training is used.
Heavy data augmentation is used.
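Multi-scale training can be sketched as picking a new input resolution every so often during training. The note gives no details, so everything specific below is an assumption: the 320 to 608 range in steps of 32 follows the YOLOv2 recipe, and the function names are hypothetical.

```python
import random


def pick_input_size(rng, min_size=320, max_size=608, stride=32):
    """Pick a random square input size that is a multiple of the largest stride."""
    return rng.choice(range(min_size, max_size + 1, stride))


def grid_sizes(input_size, strides=(8, 16, 32)):
    """Grid resolution of the three detection scales for a given input size."""
    return [input_size // s for s in strides]
```

For a 416 input, `grid_sizes(416)` gives the 52, 26, and 13 grids used elsewhere in this note; other input sizes change the grids but not the network weights, since the network is fully convolutional.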

3. How we do

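Performance
- Under the old detection metric (AP at IoU 0.5), there is little difference from RetinaNet,
- but under the COCO metric (AP averaged over IoU 0.5 to 0.95), there is a gap.
Conclusion : YOLO's performance drops as the IoU threshold increases.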
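A small sketch shows why the choice of IoU threshold matters. The same slightly misaligned prediction counts as a correct detection at a loose threshold but not at a strict one. The function names are hypothetical, and boxes are given as (x1, y1, x2, y2).

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def is_true_positive(pred, gt, iou_thresh):
    """A prediction matches the gt-box only if its IoU reaches the threshold."""
    return iou(pred, gt) >= iou_thresh


gt = (0, 0, 10, 10)
pred = (2, 0, 12, 10)  # shifted right by 2 pixels, IoU = 80 / 120

# COCO-style thresholds 0.50, 0.55, ..., 0.95
coco_thresholds = [0.5 + 0.05 * i for i in range(10)]
matched = sum(is_true_positive(pred, gt, t) for t in coco_thresholds)
```

Here the prediction is a true positive at IoU 0.5 but a miss at 0.75, and it matches at only 4 of the 10 COCO thresholds, so a detector with looser localization loses more under the COCO metric.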

4. Things didn't work

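focal loss
- Reportedly not used because it hurt performance (mAP dropped by about 2 points).
- Because objectness is predicted separately, YOLO seems to be less sensitive to the imbalance problem.
Assigning gt-boxes with IoU thresholds
- Faster R-CNN assigns gt-boxes using IoU thresholds
  - IoU of 0.7 or higher : positive (matched to a GT)
  - IoU below 0.3 : negative
  - everything in between is not used for training.
- RetinaNet is similar, but with slightly different thresholds.
  - 0.5, 0.4
- YOLO assigns only the best-matching box.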
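The dual-threshold rule can be sketched as a single function. This is a simplified sketch: the name `label_by_iou` is hypothetical, and it leaves out details of the real detectors (for example, Faster R-CNN also marks the highest-IoU anchor for each gt-box as positive).

```python
def label_by_iou(max_iou, pos_thresh, neg_thresh):
    """Label an anchor from its highest IoU with any gt-box."""
    if max_iou >= pos_thresh:
        return "positive"
    if max_iou < neg_thresh:
        return "negative"
    return "ignore"  # between the two thresholds: excluded from training


# Thresholds (positive, negative) as listed in this note.
FASTER_RCNN = (0.7, 0.3)
RETINANET = (0.5, 0.4)
```

An anchor with IoU 0.6 is ignored under the Faster R-CNN thresholds but positive under the RetinaNet ones. YOLO's rule differs in kind: it picks the single best-matching box per gt-box rather than thresholding every anchor.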