object_dect - yubo105139/paper GitHub Wiki

TOC

1. Object Detection 【1】

Deep-learning-based object detection has made considerable progress, but a large gap remains for small object detection.

image-20210401163525912

As shown, the AP for small-scale objects is the lowest, only about half that of large-scale objects; small object detection performance still lags noticeably behind that of medium- and large-scale objects.

2. Small Object Detection

2.1 Definition of small object scale

image-20210401163336937

2.2 Why small objects are hard to detect

(1) The scale distribution of object instances in datasets is imbalanced: relative to medium and large objects, small objects are underrepresented both in sample count and in pixel proportion;

(2) Small object instances are blurry and low in resolution, so they carry little information;

(3) Small object instances have weak feature representations: as the convolutional layers deepen, fewer and fewer of their features can be extracted.

2.3 Four main classes of solutions to the above difficulties in small object detection

(1) Imbalanced data distribution between large and small objects 【2】【3】

【2】 Using the semantic segmentation annotations of small objects, extract the small-object masks and copy-paste them several times into the original image, and resample the training images that contain small objects, to balance the sample count and the pixel share of small objects.

image-20210401170315943
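The copy-paste step above can be sketched as follows; this is a minimal illustration in which 2-D grids stand in for images and a bounding box stands in for the segmentation mask (the function name and all parameters are ours, not the paper's):

```python
import random

def copy_paste_small_objects(image, mask_bbox, num_copies=2, max_tries=20, rng=None):
    """Sketch of copy-paste augmentation: replicate a small object's pixels
    (given by its bounding box) at random positions that do not overlap the
    original object. Returns the bounding boxes of the pasted copies."""
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = mask_bbox                       # source region of the object
    patch = [row[x0:x1] for row in image[y0:y1]]
    ph, pw = y1 - y0, x1 - x0
    new_boxes = []
    for _ in range(num_copies):
        for _ in range(max_tries):
            nx, ny = rng.randrange(0, w - pw), rng.randrange(0, h - ph)
            # reject positions that overlap the original object
            if not (nx < x1 and nx + pw > x0 and ny < y1 and ny + ph > y0):
                for dy in range(ph):
                    image[ny + dy][nx:nx + pw] = list(patch[dy])
                new_boxes.append((nx, ny, nx + pw, ny + ph))
                break
    return new_boxes
```

A real implementation would paste along the mask boundary rather than the box, and would also update the annotation file with the new instances.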

【3】 Use the accumulated training error of small-object samples as a feedback signal: when the small-object loss proportion is too low, reduce the scale of the input data, guiding the network to balance the training error between large and small objects.

image-20210401170745318
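A minimal sketch of the feedback rule and the collage construction: τ = 0.1 and k = 4 follow the ablations quoted under 【3】 below, while the function names and the nearest-neighbour downscaling are our simplifications.

```python
def select_next_batch_mode(loss_small, loss_total, tau=0.1):
    """Feedback rule: when the share of the loss contributed by small objects
    falls below tau, the next iteration trains on down-scaled collage images
    so that small objects are better represented."""
    ratio = loss_small / loss_total if loss_total > 0 else 0.0
    return "collage" if ratio < tau else "regular"

def make_collage_2x2(images):
    """Stitch four equally sized 2-D grids into one canvas of the same size
    by nearest-neighbour downscaling each to a quadrant (k = 4 components)."""
    h, w = len(images[0]), len(images[0][0])
    half_h, half_w = h // 2, w // 2
    canvas = [[0] * w for _ in range(h)]
    for idx, img in enumerate(images[:4]):
        oy, ox = (idx // 2) * half_h, (idx % 2) * half_w
        for y in range(half_h):
            for x in range(half_w):
                canvas[oy + y][ox + x] = img[y * 2][x * 2]
    return canvas
```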

(2) Multi-scale images 【4】【5】【6】

【4】 Multi-scale training: split the samples by size into several scale groups; during training, every few dozen batches, randomly switch to samples from a different scale group as the next stage's training data; testing likewise uses multi-scale testing, as shown below.

image-20210401171742453
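The multi-scale training schedule can be sketched as below, following YOLOv2's published recipe (every 10 batches, a new input resolution is drawn from the multiples of 32 in [320, 608]); the helper name is ours:

```python
import random

def multi_scale_schedule(num_batches, period=10, seed=0):
    """YOLOv2-style multi-scale training schedule: every `period` batches,
    randomly pick a new input resolution from multiples of 32 in [320, 608];
    all images in the following batches are resized to that resolution."""
    rng = random.Random(seed)
    sizes = list(range(320, 608 + 1, 32))
    current = 416  # YOLOv2's default input size
    schedule = []
    for b in range(num_batches):
        if b > 0 and b % period == 0:
            current = rng.choice(sizes)
        schedule.append(current)
    return schedule
```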

【5】 During multi-scale training, among the many candidate boxes output by the RPN and the RCNN head, filter out objects that are too small at the small scale, objects that are too large at the large scale, and both the too-small and too-large objects at the middle scale, as a form of scale-range regularization.

image-20210401172147890
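The scale-range regularization can be sketched as a simple filter on candidate boxes; the (w, h) box representation and the sqrt-area criterion are our simplifications of the paper's valid-range rule:

```python
def filter_by_scale_range(boxes, valid_range):
    """At each image scale, keep only the objects whose (resized) extent lies
    in that scale's valid range; others are ignored during training.
    `boxes` are (w, h) pairs; `valid_range` is (lo, hi) on sqrt(area)."""
    lo, hi = valid_range
    kept = []
    for w, h in boxes:
        side = (w * h) ** 0.5
        if lo <= side <= hi:
            kept.append((w, h))
    return kept
```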

【6】 Crop sub-regions of multi-scale images with fixed-size windows and resize them to a uniform size, so that objects of different sizes fall into a relatively balanced range while most of the redundant, object-free background is filtered out; use the RPN to select a number of background region boxes as negative samples for the training set.

image-20210401172939378
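Chip selection can be sketched greedily as below; real SNIPER samples chips from a regular grid and handles negative chips separately, so this only illustrates the covering idea (all names are ours):

```python
def positive_chips(gt_boxes, chip_size, img_w, img_h):
    """Greedy sketch of positive-chip selection: choose fixed-size windows
    until every ground-truth box valid at this scale is fully covered by at
    least one chip. Each new chip is simply centred on an uncovered box and
    clamped to the image bounds."""
    def covers(chip, box):
        cx0, cy0, cx1, cy1 = chip
        x0, y0, x1, y1 = box
        return cx0 <= x0 and cy0 <= y0 and cx1 >= x1 and cy1 >= y1

    chips = []
    for box in gt_boxes:
        if any(covers(c, box) for c in chips):
            continue                       # already covered by an earlier chip
        x0, y0, x1, y1 = box
        cx = min(max((x0 + x1) / 2 - chip_size / 2, 0), img_w - chip_size)
        cy = min(max((y0 + y1) / 2 - chip_size / 2, 0), img_h - chip_size)
        chips.append((cx, cy, cx + chip_size, cy + chip_size))
    return chips
```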

(3) Multi-scale features 【7】【8】【9】

【7】 Small-object features carry less and less information as the convolutional layers deepen, so deep features are fused and propagated back to shallow layers, combining global and local information for a comprehensive detection decision and improving small object detection.

image-20210401173711330
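One top-down fusion step of FPN can be sketched with nested lists standing in for feature tensors; the 1x1 lateral convolution and the 3x3 convolution after the sum in the real FPN are omitted:

```python
def fpn_merge(deep, lateral):
    """One FPN top-down step: upsample the deeper (coarser) 2-D feature map
    by 2x nearest-neighbour and add it element-wise to the lateral (finer)
    feature map of twice the resolution."""
    up = [[deep[y // 2][x // 2] for x in range(2 * len(deep[0]))]
          for y in range(2 * len(deep))]
    return [[up[y][x] + lateral[y][x] for x in range(len(lateral[0]))]
            for y in range(len(lateral))]
```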

【8】 When FPN fuses deep features into shallow ones, the fusion is affected by variation in the dataset's scale distribution; adding a weight parameter to control the proportion of deep-feature information injected into the shallow features reduces the negative impact of scale variation.

image-20210401174049703
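The weight parameter (the fusion factor of 【8】) slots into the top-down step by scaling the deep signal before the addition; this sketch again uses nested lists for feature maps and omits the convolutions:

```python
def weighted_fpn_merge(deep, lateral, alpha):
    """FPN top-down step with a fusion factor: scale the upsampled deep
    feature by alpha before adding it to the lateral feature, so the amount
    of deep semantics injected into shallow layers can be tuned to the
    dataset's scale distribution (alpha < 1 suppresses it, which the paper
    reports benefits tiny objects)."""
    up = [[alpha * deep[y // 2][x // 2] for x in range(2 * len(deep[0]))]
          for y in range(2 * len(deep))]
    return [[up[y][x] + lateral[y][x] for x in range(len(lateral[0]))]
            for y in range(len(lateral))]
```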

【9】 Fusing deep features into shallow ones in FPN improves small object detection, but not enough; the authors propose a feature texture transfer (FTT) module that further strengthens the representation of shallow features to further improve small object detection.

image-20210401175047228
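FTT upsamples the deep feature with sub-pixel convolution rather than interpolation; the pixel-shuffle rearrangement at its core can be sketched as follows (pure Python, [channel][y][x] layout, following the standard channel-to-space ordering):

```python
def pixel_shuffle(feat, r=2):
    """Sub-pixel (pixel-shuffle) upsampling: rearrange r*r channel groups
    into an r-times larger spatial grid instead of interpolating.
    `feat` is [channel][y][x] with the channel count a multiple of r*r."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    out_c = c // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(out_c)]
    for oc in range(out_c):
        for y in range(h * r):
            for x in range(w * r):
                # each output pixel pulls from one of the r*r source channels
                ic = oc * r * r + (y % r) * r + (x % r)
                out[oc][y][x] = feat[ic][y // r][x // r]
    return out
```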

(4) Super-resolution based 【10】【11】

【10】 One reason small object detection is poor is the low resolution: both the size and the information content of the feature maps are lacking. Super-resolving small objects therefore enriches their feature maps and improves detection. The authors use a GAN to generate high-resolution feature maps for the subsequent detector, and introduce a perceptual loss based on the detection error, forming a conditional GAN that steers the super-resolved features toward benefiting detection.

image-20210401175534084
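The generator objective can be sketched as an adversarial term plus the detection-error ("perceptual") term; the weighting `lam` and the exact loss forms are assumptions for illustration:

```python
import math

def generator_loss(adv_score, det_loss, lam=1.0):
    """Sketch of a Perceptual-GAN-style generator objective: an adversarial
    term (fool the discriminator into scoring the super-resolved features as
    large-object features, adv_score in (0, 1]) plus the downstream detection
    loss on the generated features, weighted by lam."""
    return -math.log(adv_score) + lam * det_loss
```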

【11】 When a GAN super-resolves small-object feature maps, the high-resolution feature map used as training supervision and the low-resolution feature map are aligned on the same object's center in the high- and low-resolution images, mapped to the corresponding positions on their respective feature maps, and identically sized feature regions are cropped for an L1 or L2 loss. Although the cropped feature regions have the same size, the two images differ in size by a factor of two, so the relative receptive field of the low-resolution features is twice that of the high-resolution features, and the image content the two features express does not match. The authors replace all ordinary convolutions on the high-resolution branch with atrous (dilated) convolutions to enlarge the receptive field, so that the relative receptive field of the high-resolution features matches that of the low-resolution features.

image-20210402093943442
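The receptive-field argument can be checked numerically with the standard receptive-field recursion: two 3x3 stride-1 convolutions see 5 pixels, while their dilation-2 counterparts see 9, roughly matching the 2x relative gap described above. Layer tuples and the helper name are ours:

```python
def receptive_field(layers):
    """Receptive-field calculator. Each layer is (kernel, stride, dilation);
    a dilation d turns a kernel k into an effective kernel d*(k-1)+1, and the
    receptive field grows by (k_eff - 1) times the accumulated stride."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1
        rf += (k_eff - 1) * jump
        jump *= s
    return rf
```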

2.4 Summary

image-20210406173342776

image-20210406173954360

The papers above mostly use Faster R-CNN as the baseline and extend it with the proposed improvements, mainly because that makes effectiveness easy to analyze and compare academically. For engineering use, where detection speed must be traded off against accuracy, a recent one-stage detector can serve as the baseline instead; candidates are EfficientDet (2020-04) and YOLOv5 (2020-06), both with open-source PyTorch implementations. On such a baseline, the four classes of methods above can all be tried; since they do not conflict logically and attack the problem from different angles, they can be introduced individually or stacked together in experiments.

3. References and key technical points:

【1】Deep Learning for Generic Object Detection: A Survey-2019

image-20210324134330507

recognition task

image-20210324134450472

object detection progress ?

image-20210324134533822

image-20210324135642276

object detection dataset ?

image-20210319091012465

PASCAL VOC and MS COCO version

image-20210324134705889

metrics ?

image-20210324135519006

image-20210324135544062

two stage to one stage

R-CNN -> Fast R-CNN -> Faster R-CNN (RPN) -> R-FCN -> YOLO -> SSD

image-20210324140714025

image-20210324135819820

image-20210324135840123

image-20210324135858487

image-20210324135918761

【2】Augmentation for small object detection-2019

image-20210324134216594


code: https://paperswithcode.com/paper/augmentation-for-small-object-detection#code [unofficial] [pytorch]

small medium large object defines

image-20210324141031965

small objects suffer poor AP compared to non-small objects

image-20210324140915083

small objects match fewer anchors

image-20210319133452369

how to resolve? by Oversampling and Augmentation

oversampling by copying small-object images in the dataset

image-20210324141800202

augmentation by replicating small-object masks several times within an image

image-20210324142051919

small object augmentation before transformations

image-20210319160904109

results ?

oversampling performs best at 3x

image-20210319170805972

augmentation best at original+aug

image-20210324142510263

image-20210324142414511

【3】Dynamic Scale Training for Object Detection-2021

image-20210324170026942

code : https://paperswithcode.com/paper/stitcher-feedback-driven-data-provider-for [official] [pytorch]

Brief

image-20210324170315037

Novels

image-20210324170200298

Intuition:

image-20210324171057137

  1. Attributes the performance drop to scale variation, i.e., an imbalance across images of different scales

image-20210324171407030

  1. It is the imbalance over images that matters, so concentrate on the minority scales.

image-20210324172327616

How to solve ?

image-20210324172546268

image-20210324172527327

image-20210324172644200

Collage Images

image-20210324173212920

DST applied to Faster R-CNN-FPN ? AP +2.3% on MS COCO compared to the baseline (Faster R-CNN-FPN).

image-20210324174910779

DST applied to the one-stage RetinaNet ? AP +2.5% on MS COCO compared to the baseline.

image-20210324175005814

Determine the threshold τ = 0.1 through grid search

image-20210324110250723

Determine the number of collage components k = 4

image-20210324110511918

Feedback signal taken from the regression loss

image-20210324111235965

speed ?

image-20210324110205644

【4】Multi Scale Training/Testing from YOLOv2

《YOLO9000: Better, Faster, Stronger》

《Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition》

multi-scale training stems from the two papers above

code : https://github.com/longcw/yolo2-pytorch [unofficial] [pytorch]

Multi Scale Training

image-20210401100733478

One Stage Multi Scale Testing

image-20210401100950410

Two Stage Multi Scale Testing

image-20210401101129626

【5】An Analysis of Scale Invariance in Object Detection SNIP-2018

image-20210401094320895

code: not yet

observation: large scale variation across objects among small, medium, and large instances.

image-20210330142221304

scale variation results in domain shift

image-20210331090831336

SNIP training and inference based on RPN

image-20210331114807441

result:

comparison with multi-scale training

image-20210331162057337

comparison with state-of-the-art detectors

image-20210331162016982

【6】SNIPER: Efficient Multi-Scale Training-2018

image-20210330111135119

code : https://github.com/mahyarnajibi/SNIPER [official] [mxnet]

brief

image-20210330143101745

intuition

image-20210330144156211

how ?

  1. SNIPER generates chips

image-20210330141928026

  1. SNIPER Positive chip selection

    image-20210331162417650

image-20210330141549867

  1. SNIPER negative chip selection

    image-20210331162637180

image-20210330141636296

result ? AR AP SOTA

  1. recall is not affected by negative chip sampling

image-20210330134950298

  1. scale = 3 with negative mining achieves best AP

image-20210330134847204

  1. compared with SOTA SNIP: +1.7 AP, +2.3 AP_small

image-20210330140150000

speed ?

  1. training time reduced to 1/3

image-20210330141108955

  1. inference runs at 5 frames per second on a GPU

image-20210330143405824

【7】Feature Pyramid Networks for Object Detection-2017

image-20210324134031478

code: https://github.com/jwyang/fpn.pytorch [unofficial] [pytorch]

Brief

image-20210324143745424

FPN structure ?

image-20210324142803688

applied to Faster R-CNN, achieving SOTA results at the time

image-20210324143143713

【8】Effective Fusion Factor in FPN for Tiny Object Detection-2020

image-20210324144037320

code : https://github.com/ucas-vg/Effective-Fusion-Factor [pytorch]

brief: proposes a novel concept, the fusion factor, which affects the performance of small object detection

image-20210324144225746

intuition ?

image-20210324144740634

how?

image-20210322164444178

fusion factor α apply to FPN framework

image-20210324145854193

how to obtain an effective fusion factor α ?

image-20210324155655735

= SUM( IoU( X, GT, Anchors ), iou_th ), where X is the image dataset

image-20210324154937040

image-20210324154602430

result ?

miss rate decreased slightly

image-20210324160112456

AP boost of about 1%

image-20210324160159919

【9】Extended Feature Pyramid Network for Small Object Detection-2020

image-20210324160632349

code : https://github.com/gene-chou/EFPN-detectron2 [unofficial] [pytorch]

novels ?

image-20210323154005444

intuition ?

  1. Both small and medium objects are detected on the lowest level of FPN
  2. AP and AR decline sharply when instances turn small

image-20210323093553501

structure ?

FPN + FTT + Supervision

image-20210324162229525

Extended Level ?

image-20210324164244259

image-20210324164342431

FTT ?

image-20210323133555855

image-20210323133531475

Foreground-Background-Balanced Loss: adds a foreground attention loss according to the ground truth.

image-20210323154328144

result ?

+2% on Tsinghua-Tencent 100K

image-20210324163026394

+1.2% on MS COCO

image-20210324163617945

Effect of each component in EFPN: boost small objects.

image-20210324164850649

【10】Perceptual Generative Adversarial Networks for Small Object Detection -2017

image-20210331170336662

code: not yet

brief and novel

image-20210401095906618

image-20210401095936438

intuition

image-20210331171532157

math

image-20210401102221321

structure:

image-20210401145324517

result

image-20210402094716149

【11】Precise Supervision of Feature Super-Resolution for Small Object Detection

image-20210330144938006

code: not yet

brief and intuition:

image-20210330145322432

network structure

image-20210330145618869

relative receptive fields analysis

image-20210330145536898

image-20210330150031452

image-20210330150117011

solve the mismatch of relative receptive fields with atrous (dilated) convolution layers

image-20210330150503649

result and performance:

  1. AP_small +7.7% , AP +3.7% on Tsinghua-Tencent 100K

image-20210330150856410

  1. AP_small +5.2% , AP +1.8% on PASCAL VOC , AP_small +4.9% , AP +2.2% on MS COCO

image-20210330151026535

Small object detection summary

  1. Overview: whether in detection or segmentation algorithms, small objects (area below 32 × 32) are harder to detect or segment than medium and large objects; on the COCO detection dataset, small-object detection performance is generally less than half that of large objects.

image-20210401153602655

【12】Temporally Identity-Aware SSD with Attentional LSTM-2020

image-20210408133140145

visual example

image-20210408133505008

Two adjacent frames are fed in; TSSD outputs the detection and recognition results, and OTA then outputs the ID of each object.

structure

image-20210408135552576

image-20210408135605644

The detection backbone is SSD, which outputs feature maps at six scales. Each enters an AC-LSTM (convolution, then an attention layer, then an LSTM) that fuses information across adjacent frames and across shallow and deep features, then regresses box offsets and predicts object classes. The attention module also outputs attention maps trained with a cross-entropy loss. The inter-frame relationship is constrained by minimizing the residual between each frame's sum of top-k confidences and a moving average of that sum over consecutive past frames.

image-20210408145824344
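The inter-frame constraint described above (residual between each frame's top-k confidence sum and a moving average over past frames) can be sketched as follows; the exponential-moving-average form and the momentum value are our assumptions:

```python
def temporal_consistency_loss(topk_conf_sums, momentum=0.9):
    """Sketch of an inter-frame association constraint: per frame, take the
    sum of the top-k detection confidences, keep an exponential moving
    average over past frames, and penalise the residual between the current
    sum and that average so detections stay stable across frames."""
    ema = topk_conf_sums[0]
    total = 0.0
    for s in topk_conf_sums[1:]:
        total += abs(s - ema)                      # residual against history
        ema = momentum * ema + (1 - momentum) * s  # update the moving average
    return total / max(len(topk_conf_sums) - 1, 1)
```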

loss

image-20210408142604552

image-20210408145840985

image-20210408145856935

speed

image-20210408151126751

performance

image-20210408165602929

image-20210408165715936

Conclusions:

SSD's six feature scales benefit small object detection; the attention layer helps suppress most irrelevant background; the LSTM fuses multi-frame information, which may improve detection performance; the attention maps help OTA distinguish the IDs of multiple objects; and the inter-frame loss term improves the stability of detection across frames. Speed remains real-time, at roughly half of the original SSD's. AP improves by 2.4%, which still trails other SOTA methods such as D&T [21].