[Deep Learning] Hyperparameter settings: MnasNet, EfficientNet - penny4860/study-note GitHub Wiki

Summary

1. Learning Rate

Uses warmup, following "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"
Method
- First 5 epochs
  - ramp up gradually from 0 to 0.256
    - [0.0512, 0.1024, 0.1536, 0.2048, 0.256]
- After that, decay by a factor of 0.97 every 2.4 epochs: lr = 0.256 * 0.97**(int(epoch/2.4))
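A minimal Python sketch of the schedule above, assuming `epoch` is a (possibly fractional) count of completed epochs and that warmup is a linear ramp starting at 0. The function and parameter names are illustrative, not taken from the MnasNet/EfficientNet code. It reproduces the listed values at epochs 1 to 4 and approaches 0.256 at the end of epoch 5. Because the decay exponent in the formula counts from epoch 0, the rate steps down to 0.256 * 0.97**2 ≈ 0.2409 as soon as warmup ends.

```python
import math


def learning_rate(epoch: float,
                  peak_lr: float = 0.256,
                  warmup_epochs: float = 5.0,
                  decay_rate: float = 0.97,
                  decay_epochs: float = 2.4) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Warmup: linear ramp from 0 toward peak_lr over the first warmup_epochs.
    After warmup: staircase decay by decay_rate every decay_epochs, with the
    exponent counted from epoch 0, as in 0.256 * 0.97**(int(epoch/2.4)).
    """
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    return peak_lr * decay_rate ** math.floor(epoch / decay_epochs)


if __name__ == "__main__":
    for e in [1, 2, 3, 4, 5, 10, 20]:
        print(f"epoch {e:>3}: lr = {learning_rate(e):.5f}")
```

If the jump at the end of warmup is undesirable, one option is to subtract `warmup_epochs` from `epoch` before computing the decay exponent; that is a design choice, not what the formula above states.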

2. Optimizer: RMSProp

Parameters
- decay : 0.9 (for the running average of squared gradients)
- momentum : 0.9
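How does RMSProp with momentum differ from Adam?
- RMSProp with momentum
  - exponentially filters the velocity, i.e. the update after it has already been divided by sqrt(ms)
  - ms(t) = decay*ms(t-1) + (1-decay)*grad^2
  - v(t) = mu*v(t-1) + lr*grad/sqrt(ms(t) + eps), w(t) = w(t-1) - v(t)
    - mu : momentum coefficient
- Adam
  - exponentially filters the raw gradient (and, separately, its square)
  - 1st moment : exponential filtering of the gradient
  - 2nd moment : exponential filtering of grad^2
  - step = lr * (1st moment) / (sqrt(2nd moment) + eps), using bias-corrected moments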
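To make the difference concrete, here is a scalar sketch of one update step for each optimizer, written from the formulas above (eps inside the square root for RMSProp, as in the note). The Adam defaults (beta1 = 0.9, beta2 = 0.999), the eps values, and zero-initialized optimizer state are assumptions of this sketch, not settings from the note.

```python
import math


def rmsprop_momentum_step(w, g, ms, mom, lr, decay=0.9, momentum=0.9, eps=1e-8):
    """One RMSProp-with-momentum step for a scalar weight w with gradient g."""
    ms = decay * ms + (1 - decay) * g * g               # running average of g**2
    mom = momentum * mom + lr * g / math.sqrt(ms + eps)  # filter the *normalized update*
    return w - mom, ms, mom


def adam_step(w, g, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g       # 1st moment: filtered raw gradient
    v = beta2 * v + (1 - beta2) * g * g   # 2nd moment: filtered g**2
    m_hat = m / (1 - beta1 ** t)          # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


if __name__ == "__main__":
    # First step from zero state, starting at w = 1.0 with gradient g = 1.0.
    w_rms, _, _ = rmsprop_momentum_step(1.0, 1.0, ms=0.0, mom=0.0, lr=0.1)
    w_adam, _, _ = adam_step(1.0, 1.0, m=0.0, v=0.0, t=1, lr=0.1)
    print(f"RMSProp+momentum: {w_rms:.5f}   Adam: {w_adam:.5f}")
```

With zero-initialized state, Adam's bias correction makes the first step almost exactly `lr`, while RMSProp's first step is about lr / sqrt(1 - decay) ≈ 3.16 * lr, because ms(1) = (1 - decay) * g**2 is not bias-corrected.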

3. Weight Decay

Parameters
- 1e-5
Applied to kernels (weights) only: http://cs231n.github.io/neural-networks-2/
- biases are usually not regularized
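A sketch of kernel-only weight decay, assuming parameters are held in a dict keyed by hypothetical names ending in `/kernel` or `/bias` (a naming scheme chosen purely for illustration). Only kernels contribute to the penalty, which would then be added to the training loss. Some frameworks scale the squared norm by 1/2 (for example, `tf.nn.l2_loss` computes `sum(t**2) / 2`), so the effective coefficient depends on which convention is used.

```python
def kernel_l2_penalty(params: dict[str, list[float]],
                      weight_decay: float = 1e-5) -> float:
    """Weight-decay term over kernel tensors only; biases are skipped.

    Assumes hypothetical parameter names ending in '/kernel' or '/bias'.
    """
    total = 0.0
    for name, values in params.items():
        if name.endswith("/kernel"):  # biases (and anything else) are skipped
            total += sum(v * v for v in values)
    return weight_decay * total


if __name__ == "__main__":
    params = {
        "conv1/kernel": [0.5, -1.0, 2.0],
        "conv1/bias": [10.0],          # not regularized
        "dense/kernel": [1.0, 1.0],
    }
    # total_loss = data_loss + kernel_l2_penalty(params)
    print(kernel_l2_penalty(params))
```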

4. Batch Norm

momentum : 0.99
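In Keras/TensorFlow, the `momentum` of `BatchNormalization` is the weight on the old moving statistic, so 0.99 means each batch contributes 1% and the moving mean has an effective averaging window of roughly 1/(1 - 0.99) = 100 batches. A minimal sketch of that update (the function name is illustrative; the moving variance is updated with the same rule):

```python
def update_moving_mean(moving_mean: float, batch_mean: float,
                       momentum: float = 0.99) -> float:
    """Keras/TF-style BN moving-average update: momentum weights the old value."""
    return momentum * moving_mean + (1.0 - momentum) * batch_mean


if __name__ == "__main__":
    # Start from 0 and feed a constant batch mean of 1.0.
    m = 0.0
    for step in range(1, 301):
        m = update_moving_mean(m, 1.0)
        if step in (10, 100, 300):
            print(f"after {step:>3} batches: moving_mean = {m:.3f}")
```

Starting from 0, the moving mean reaches 1 - 0.99**100 ≈ 0.63 after 100 batches. PyTorch uses the opposite convention (`running = (1 - momentum) * running + momentum * batch`), so the equivalent setting there is `momentum=0.01`.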

5. MnasNet search method

Each candidate model is trained on ImageNet for only 5 epochs and then evaluated.