Caffe Example : 2.Training LeNet on MNIST with Caffe (Kor) - ys7yoo/BrainCaffe GitHub Wiki

Caffe๋ฅผ ์ด์šฉํ•œ MNIST์ƒ์˜ LeNet ํ›ˆ๋ จ์‹œํ‚ค๊ธฐ (Training LeNet on MNIST with Caffe)

This tutorial assumes that you have successfully compiled Caffe; if not, refer to the installation page. Throughout this tutorial, we assume that your Caffe installation is located at CAFFE_ROOT.

1. Prepare Datasets

First, we need to download the data from the MNIST website and convert its format. To do this, simply run the following commands:

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh

If you get an error because wget or gunzip is not installed, you will need to install each of them first. After running the commands above, there will be two datasets, mnist_train_lmdb and mnist_test_lmdb.

2. LeNet: the MNIST Classification Model

Before actually running the training program, let's explain what is going to happen. We will use the LeNet network, which is known to work well on digit classification tasks. We will use a slightly different version from the original LeNet implementation, replacing the sigmoid activations with Rectified Linear Unit (ReLU) activations for the neurons.

LeNet์˜ ๊ตฌ์„ฑ์€ ImageNet์•ˆ์— ๊ฒƒ๋“ค๊ณผ ๊ฐ™์€ ๋” ํฐ ๋ชจ๋ธ๋“ค์—์„œ ์—ฌ์ „ํžˆ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋Š” CNN์˜ ์ •์ˆ˜(ํ•ต์‹ฌ์š”์†Œ)๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ, Pooling ๊ณ„์ธต์— ์˜ํ•œ convolution ๊ณ„์ธต๊ณผ ๋‹ค๋ฅธ Pooling๊ณ„์ธต์— ์˜ํ•œ convolution ๊ณ„์ธต, ๊ธฐ์กด์˜ ๋‹ค์ค‘๊ณ„์ธต ์ธ์ง€(conventional multilayer perceptrons)์™€ ์œ ์‚ฌํ•œ ์ด ๋‘˜๊ณผ ๋ชจ๋‘ ์—ฐ๊ฒฐ๋œ ๊ณ„์ธต์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ์šฐ๋ฆฌ๋Š” $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt์— ๊ณ„์ธต๋“ค์„ ์ •์˜ํ•ด์™”๋‹ค.

3. Define the MNIST Network

This section explains the lenet_train_test.prototxt model definition, which specifies the LeNet model for MNIST handwritten digit classification. We assume that you are familiar with Google Protobuf and have read the protobuf definitions used by Caffe, which can be found in $CAFFE_ROOT/src/caffe/proto/caffe.proto.
Specifically, we will write a protobuf as a caffe::NetParameter (or, in Python, caffe.proto.caffe_pb2.NetParameter). We will start by giving the network a name:

name: "LeNet"

๋ฐ์ดํ„ฐ ๊ณ„์ธต ์ž‘์„ฑํ•˜๊ธฐ (Writing the Data Layer)

Currently, we will read the MNIST data from the lmdb we created earlier in the demo. This is defined by a data layer:

layer {
  name: "mnist"
  type: "Data"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "mnist_train_lmdb"
    backend: LMDB
    batch_size: 64
  }
  top: "data"
  top: "label"
}

Specifically, this layer has the name mnist and the type Data, and it reads the data from the given lmdb source. We will use a batch size of 64, and scale the incoming pixels so that they fall in the range [0, 1). Why 0.00390625? Because it is 1 divided by 256. And finally, this layer produces two blobs: one is the data blob, and the other is the label blob.
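The scaling can be checked with a couple of lines of Python:

```python
# Why scale = 0.00390625: it is 1/256, so raw byte intensities 0..255
# map into the half-open range [0, 1).
scale = 0.00390625
assert scale == 1 / 256

pixels = [0, 128, 255]                 # example raw MNIST intensities
scaled = [p * scale for p in pixels]
print(scaled)                          # [0.0, 0.5, 0.99609375]
```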

Writing the Convolution Layer

Let's define the first convolution layer:

layer {
  name: "conv1"
  type: "Convolution"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "data"
  top: "conv1"
}

This layer takes the data blob (the one produced by the data layer) and produces the conv1 layer. It produces 20 output channels, with a convolutional kernel size of 5, carried out with a stride of 1.
The fillers allow us to randomly initialize the values of the weights and biases. For the weight filler, we will use the xavier algorithm, which automatically determines the scale of the initialization based on the number of input and output neurons. For the bias filler, we will simply initialize it as constant, with the default filling value of 0.
lr_mults are the learning-rate adjustments for the layer's learnable parameters. In this case, we will set the weight learning rate to be the same as the learning rate given by the solver during runtime, and the bias learning rate to be twice as large as that (this usually leads to better convergence rates).
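As a rough illustration of what the xavier filler does, here is a sketch assuming the fan-in variant (weights drawn uniformly from [-sqrt(3/n), +sqrt(3/n)], where n is the number of inputs per output value); the exact behavior is defined in Caffe's filler code, not here.

```python
import math
import random

def xavier_uniform(fan_in, n_weights, rng=random.Random(0)):
    """Sketch of a fan-in xavier initialization: uniform in [-b, b], b = sqrt(3/fan_in)."""
    bound = math.sqrt(3.0 / fan_in)
    return [rng.uniform(-bound, bound) for _ in range(n_weights)]

# conv1: each output value sees a 5x5 patch of 1 input channel -> fan_in = 25.
w = xavier_uniform(5 * 5 * 1, 10)

# lr_mult simply scales the solver's learning rate per parameter blob:
base_lr = 0.01
print(base_lr * 1, base_lr * 2)   # effective weight / bias learning rates
```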

Writing the Pooling Layer

Phew. Pooling layers are actually much easier to define:

layer {
  name: "pool1"
  type: "Pooling"
  pooling_param {
    kernel_size: 2
    stride: 2
    pool: MAX
  }
  bottom: "conv1"
  top: "pool1"
}

This says that we will perform max pooling with a pool kernel size of 2 and a stride of 2 (so no overlap between neighboring pooling regions). Similarly, you can write up the second convolution and pooling layers. Check $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt for details.
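A toy Python version of this pooling step, for illustration (non-overlapping 2x2 windows, each reduced to its maximum):

```python
def max_pool_2x2(img):
    """2x2 max pooling with stride 2 over a square 2D list."""
    n = len(img)
    return [[max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
             for c in range(0, n, 2)]
            for r in range(0, n, 2)]

img = [[1, 2, 5, 6],
       [3, 4, 7, 8],
       [0, 0, 1, 0],
       [0, 9, 0, 2]]
print(max_pool_2x2(img))   # [[4, 8], [9, 2]]
```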

Writing the Fully Connected Layer

Writing a fully connected layer is also simple:

layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool2"
  top: "ip1"
}

This defines a fully connected layer (known in Caffe as an InnerProduct layer) with 500 outputs. All the other lines look familiar, right?
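For illustration, an InnerProduct layer computes y = Wx + b on its flattened input. The shapes below are toy values, not those of ip1:

```python
def inner_product(x, W, b):
    """y = Wx + b, with W given as a list of rows (one row per output)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

x = [1.0, 2.0, 3.0]              # toy flattened input
W = [[0.1, 0.2, 0.3],            # 2 outputs x 3 inputs
     [1.0, 0.0, -1.0]]
b = [0.5, 0.0]
y = inner_product(x, W, b)
print(y)                         # approximately [1.9, -2.0]
```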

Writing the ReLU Layer

A ReLU layer is also simple:

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}

Since ReLU is an element-wise operation, it can be done in-place to save some memory. This is achieved by simply giving the bottom and top blobs the same name. Of course, do NOT use duplicated blob names for other layer types!
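A toy sketch of the in-place computation: ReLU is just max(0, x) applied element-wise, so the same buffer can be overwritten instead of allocating a new one.

```python
def relu_inplace(blob):
    """Overwrite each element with max(0, x) instead of allocating a new buffer."""
    for i, v in enumerate(blob):
        blob[i] = max(0.0, v)
    return blob

blob = [-1.5, 0.0, 2.0]
relu_inplace(blob)
print(blob)   # [0.0, 0.0, 2.0]
```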
After the ReLU layer, we will write another InnerProduct layer:

layer {
  name: "ip2"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "ip1"
  top: "ip2"
}

Writing the Loss Layer

Finally, we will write the loss layer!

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
}

The softmax_loss layer implements both the softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs: the first one is the prediction, and the second one is the label provided by the data layer (remember it?). It does not produce any output; all it does is compute the loss function value, report it when backpropagation starts, and initialize the gradient with respect to ip2. This is where all the magic starts.
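What this layer computes can be sketched in a few lines of Python: a numerically stable softmax followed by the negative log-likelihood of the true label. The scores below are made-up ip2 outputs for a single image.

```python
import math

def softmax_with_loss(scores, label):
    """Numerically stable softmax, then multinomial logistic loss for one example."""
    m = max(scores)                          # shift by the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return -math.log(exps[label] / total)    # -log p(true class)

scores = [1.0, 3.0, 0.2]   # toy ip2 outputs
loss = softmax_with_loss(scores, label=1)
print(round(loss, 4))      # small loss, since class 1 has the highest score
```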

Additional Notes: Writing Layer Rules

Layer definitions can include rules for whether and when they are included in the network definition, like the one below:

layer {
  // ...layer definition...
  include: { phase: TRAIN }
}

This is a rule, which controls layer inclusion in the network based on the current network's state. You can refer to $CAFFE_ROOT/src/caffe/proto/caffe.proto for more information about layer rules and the model schema.
์œ„์˜ ์˜ˆ์—์„œ, ์ด ๊ณ„์ธต์€ ์˜ค์ง TRAIN ๋‹จ๊ณ„์—์„œ ํฌํ•จ๋  ๊ฒƒ์ด๋‹ค. ๋งŒ์•ฝ TRAIN์„ TEST๋กœ ๋ฐ”๊พธ๋ฉด, ์ด ๊ณ„์ธต์€ ์˜ค์ง ํ…Œ์ŠคํŠธ ๋‹จ๊ณ„์—์„œ๋งŒ ์‚ฌ์šฉ๋  ๊ฒƒ์ด๋‹ค. ์ž๋™์ ์œผ๋กœ, ์ฆ‰ ๊ณ„์ธต ๊ทœ์น™์—†์ด, ๊ณ„์ธต์€ ํ•ญ์ƒ ๋„คํŠธ์›Œํฌ์— ํฌํ•จ๋œ๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ, lenet_train_test.prototxt๋Š” (๊ฐ๊ธฐ ๋‹ค๋ฅธ ์ผํšŒ ์ฒ˜๋ฆฌ๋Ÿ‰์„ ๊ฐ€์ง€๋ฉด์„œ) ํ•˜๋‚˜๋Š” ํ›ˆ๋ จ๋‹จ๊ณ„์—์„œ, ํ•˜๋‚˜๋Š” ์‹คํ—˜ ๋‹จ๊ณ„์—์„œ ์ •์˜๋œ ๋‘๊ฐœ์˜ ๋ฐ์ดํ„ฐ ๊ณ„์ธต์„ ๊ฐ€์ง„๋‹ค. ๋˜ํ•œ lenet_solver.prototxt์—์„œ ์ •์˜๋œ ๊ฒƒ์— ๋”ฐ๋ผ, ๋งค 100๋ฒˆ์„ ๋ฐ˜๋ณตํ•˜์—ฌ ๋ชจ๋ธ ์ •ํ™•๋„๋ฅผ ๋ณด๊ณ ํ•˜๋Š”๊ฒƒ์„ ์œ„ํ•œ TEST ๋‹จ๊ณ„์—์„œ๋งŒ ํฌํ•จ๋˜๋Š” ์ •ํ™•๋„ ๊ณ„์ธต(Accuracy layer)๊ฐ€ ์กด์žฌํ•œ๋‹ค.

4. Define the MNIST Solver

Check out the comments explaining each line in the prototxt $CAFFE_ROOT/examples/mnist/lenet_solver.prototxt:

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have a test batch size of 100 and 100 test
# iterations, covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# Snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# Solver mode: CPU or GPU
solver_mode: GPU

5. ๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œํ‚ค๊ธฐ์™€ ์‹คํ—˜ํ•˜๊ธฐ (Training and Testing the Model)

๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œํ‚ค๋Š” ๊ฒƒ์€ ๋„คํŠธ์›Œํฌ ์ •์˜ protobuf์™€ ํ•ด๊ฒฐ์‚ฌ protobuf ํŒŒ์ผ๋“ค์„ ์ž‘์„ฑํ•œ ํ›„์—๋Š” ๊ฐ„๋‹จํ•˜๋‹ค. ๊ฐ„๋‹จํžˆ train_lenet.sh ์ด๋‚˜ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ช…๋ น์–ด๋ฅผ ์‹คํ–‰ํ•˜๋ผ:

cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh

train_lenet.sh is a simple script, but here is a quick explanation: as mentioned, the main tool for training is caffe with the action train and the solver protobuf text file as its argument.
When you run the code, you will see a lot of messages flying by like this:

I1203 net.cpp:66] Creating Layer conv1
I1203 net.cpp:76] conv1 <- data
I1203 net.cpp:101] conv1 -> conv1
I1203 net.cpp:116] Top shape: 20 24 24
I1203 net.cpp:127] conv1 needs backward computation.

These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:

I1203 net.cpp:142] Network initialization done.
I1203 solver.cpp:36] Solver scaffolding done.
I1203 solver.cpp:44] Solving LeNet

Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:

I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
I1203 solver.cpp:66] Iteration 100, loss = 0.26044
...
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9785
I1203 solver.cpp:111] Test score #1: 0.0606671

๊ฐ๊ฐ์˜ ๋ฐ˜๋ณต ํ›ˆ๋ จ์— ๋Œ€ํ•˜์—ฌ, lr์€ ์ด ๋ฐ˜๋ณต์˜ ํ•™์Šต์œจ์ด๊ณ , loss๋Š” ํ›ˆ๋ จํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. ํ…Œ์ŠคํŠธ ๋‹จ๊ณ„์˜ ์ถœ๋ ฅ์— ๋Œ€ํ•˜์—ฌ, score 0๋Š” ์ •ํ™•๋„, score 1์€ ํ…Œ์ŠคํŒ… ์†์‹ค ํ•จ์ˆ˜์ด๋‹ค.
And after a few minutes, you are done!

I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9897
I1203 solver.cpp:111] Test score #1: 0.0324599
I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
I1203 solver.cpp:78] Optimization Done.

The final model, stored as a binary protobuf file, is saved at

lenet_iter_10000

which you can deploy as a trained model in your application, if you are training on a real-world application dataset.

Um… How about GPU training? (Um… How about GPU training?)

You just did! All the training was carried out on the GPU. In fact, if you would like to train on the CPU, all you need to do is change one line in lenet_solver.prototxt:

# solver mode: CPU or GPU
solver_mode: CPU

and you will be using the CPU for training. Isn't that easy? MNIST is a small dataset, so training with a GPU does not really bring much benefit, due to communication overhead. On larger datasets with more complex models, such as ImageNet, the computation speed difference will be very significant.

How to reduce the learning rate at fixed steps? (How to reduce the learning rate at fixed steps?)

Look at lenet_multistep_solver.prototxt.
