1. Residual Attention Network for Image Classification

  1. Introduction
  • Background
  • Attention models are widely used for sequence (time-series) models, but are rarely applied to feedforward networks such as those used for image recognition.
  • Recent advances in image recognition let networks grow much deeper via ResNet → the paper applies attention to a deep ResNet-based CNN, aiming to improve accuracy.
  • Model structure and contributions

1. Stacked network structure

  • A model built by stacking multiple Attention Modules; Attention Modules of different types can also be combined in one network.

2. Attention Residual Learning

  • Simply stacking Attention Modules naively makes accuracy drop.
  • Attention residual learning, modeled on Residual Networks, lets networks hundreds of layers deep be trained (see the formula below).
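
For reference, the paper's core formula (detailed in Section 3.1 below): with trunk-branch output F(x) and a soft mask M(x) ∈ [0, 1], the module output is

$$H_{i,c}(x) = \left(1 + M_{i,c}(x)\right) \cdot F_{i,c}(x)$$

where i ranges over spatial positions and c over channels. When M(x) ≈ 0, H(x) ≈ F(x): the identity mapping survives (as in ResNet), so stacks hundreds of layers deep remain trainable.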

3. Bottom-up top-down feedforward attention

  • Bottom-up attention: driven by the input itself (e.g., differences from the background)

  • Top-down attention: driven by prior knowledge (e.g., knowing what to look for)

    โ†’ ์•ˆ์ • ์ธต์„ ๋Š˜๋ฆฌ๊ณ  ์ •ํ™•๋„ ํ–ฅ์ƒ, End-to-End ๊นŠ์€ ๋„คํŠธ์›Œํฌ์— ์‰ฝ๊ฒŒ ์ ์šฉ, ํšจ์œจ์ ์ธ ๊ณ„์‚ฐ ์˜ˆ์ œ ์ด๋ฏธ์ง€ : Attention Mask์— ๋”ฐ๋ฅธ ๋‹ค๋ฅธ Feature๋ฅผ ๋ณด์—ฌ์คŒ

  • Points to note in the figure

    • Different Attention Modules learn different attention masks
    • Attention modules in shallow layers suppress the empty background regions
    • Attention modules in deep layers highlight the balloon itself
  2. Related Work

  3. Residual Attention Network (the proposed model)

3.1. Attention Residual Learning

  • Simply multiplying the Attention Module's mask with the CNN output makes accuracy drop

    • Because mask values lie in [0, 1], repeated multiplication shrinks feature values (and hence gradients) as layers get deeper
    • Important feature values from the trunk CNN may be weakened
  • Attention Residual Learning formulation: H(x) = (1 + M(x)) · F(x)

    • Roles of the soft mask branch M(x) ∈ [0, 1]:

      1. enhance good features
      2. suppress noise from the trunk features
    • Stacked Attention Modules compensate for each other's weaknesses and progressively refine the feature map → in the balloon figure, the features grow more refined as layers deepen (a minimal PyTorch sketch follows this list)
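
A minimal PyTorch sketch of attention residual learning. The trunk and mask branches here are toy placeholders (the paper uses stacks of residual units for the trunk and the bottom-up top-down branch of Section 3.2 for the mask); only the combination rule H(x) = (1 + M(x)) · F(x) is the point.

```python
import torch
import torch.nn as nn

class AttentionResidual(nn.Module):
    """Combine a trunk branch F(x) with a soft mask branch M(x) in [0, 1]
    via H(x) = (1 + M(x)) * F(x) (attention residual learning)."""

    def __init__(self, channels: int):
        super().__init__()
        # Placeholder trunk branch; the paper stacks residual units here.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Placeholder mask branch; the paper uses the bottom-up top-down
        # structure of Section 3.2, ending in a sigmoid so M(x) is in [0, 1].
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.trunk(x)
        m = self.mask(x)
        # When M(x) ~ 0 this reduces to the identity F(x), so stacking
        # many modules neither erases trunk features nor kills gradients.
        return (1 + m) * f
```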

3.2. Soft Mask Branch

  • A structure that combines two processing steps:
      1. Fast feed-forward sweep → quickly collects global information from the whole image
      2. Top-down feedback step → combines that global information with the original feature maps
  • A sigmoid at the output normalizes the mask to the range [0, 1] (see the sketch below)
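
A minimal sketch of the soft mask branch's bottom-up top-down structure, assuming a single downsample/upsample pair (the paper nests several max-pooling stages with residual units and skip connections between matching scales):

```python
import torch.nn as nn
import torch.nn.functional as F

class SoftMaskBranch(nn.Module):
    """Bottom-up: downsample to gather global information quickly.
    Top-down: upsample back to the input resolution, then squash
    the result to [0, 1] with a sigmoid."""

    def __init__(self, channels: int):
        super().__init__()
        self.down = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # bottom-up sweep
        self.mid = nn.Sequential(                                     # low-resolution processing
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.out = nn.Sequential(                                     # 1x1 convs + sigmoid, as in the paper
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        size = x.shape[-2:]
        y = self.mid(self.down(x))                      # fast feed-forward sweep
        y = F.interpolate(y, size=size, mode="bilinear",
                          align_corners=False)          # top-down feedback step
        return self.out(y)                              # mask M(x) in [0, 1]
```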

3.3. Spatial Attention and Channel Attention

  • Changing the activation function of the mask adds different constraints to the attention (sketched below):
      1. Mixed Attention → a plain sigmoid over every channel and spatial position
      2. Channel Attention → L2 normalization across the channels at every spatial position → removes spatial information
      3. Spatial Attention → normalization within each channel's feature map, followed by a sigmoid → yields a mask that carries only spatial information
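
A sketch of the three mask activations as standalone PyTorch functions (the eps terms are added here for numerical safety and are not part of the paper's formulas):

```python
import torch

def mixed_attention(x):
    """f1: plain sigmoid for every spatial position and channel."""
    return torch.sigmoid(x)

def channel_attention(x, eps=1e-6):
    """f2: L2-normalize across channels at each spatial position of an
    (N, C, H, W) tensor, discarding spatial information."""
    return x / (x.norm(p=2, dim=1, keepdim=True) + eps)

def spatial_attention(x, eps=1e-6):
    """f3: standardize each channel's feature map over its spatial
    positions, then apply a sigmoid, keeping only spatial information."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True)
    return torch.sigmoid((x - mean) / (std + eps))
```

In the paper's comparison, mixed attention (the unconstrained sigmoid) gives the best results of the three.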