transfer learning - taoualiw/My-Knowledge-Base GitHub Wiki
In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size.
It is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.
The three major Transfer Learning scenarios look as follows:
- ConvNet as fixed feature extractor.
- Take a ConvNet pretrained on ImageNet,
- Remove the last fully-connected layer (this layer's outputs are the 1000 class scores for the original ImageNet task). This gives a high-dimensional vector for every image, containing the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLU'd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet.
- Treat the rest of the ConvNet as a fixed feature extractor for the new dataset.
- Train a linear classifier (e.g. Linear SVM or Softmax classifier) for the fixed features.
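The steps above can be sketched in PyTorch. The tiny `Sequential` backbone below is a hypothetical stand-in for a real pretrained network (in practice you would load, e.g., a torchvision ResNet with ImageNet weights); the shapes and layer sizes are illustrative only:

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained ConvNet backbone (illustrative; in practice,
# load a real pretrained model and drop its final fully-connected layer).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),   # codes come out ReLU'd
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Freeze the backbone: it is a fixed feature extractor, not trained further.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

# Extract CNN codes for the new dataset (random images as placeholders).
images = torch.randn(16, 3, 32, 32)
with torch.no_grad():
    codes = backbone(images)          # shape (16, 8): one code per image

# Train only a fresh linear (softmax) classifier on top of the fixed codes.
new_head = nn.Linear(8, 5)            # 5 classes in the hypothetical new task
opt = torch.optim.SGD(new_head.parameters(), lr=0.1)
labels = torch.randint(0, 5, (16,))
for _ in range(10):
    loss = nn.functional.cross_entropy(new_head(codes), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the codes are computed once under `torch.no_grad()`, only the small linear head receives gradients, which is what makes this scenario cheap even on large datasets.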
- Fine-tuning the ConvNet.
- Not only replace and retrain the classifier on top of the ConvNet on the new dataset, but also fine-tune the weights of the pretrained network by continuing backpropagation.
- It is possible to fine-tune all the layers of the ConvNet, or to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network.
- Earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers become progressively more specific to the details of the classes in the original dataset.
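A minimal sketch of this scenario, again with a toy network standing in for a real pretrained model: the head is replaced for the new task, the earliest (most generic) layer is kept fixed, and the remaining layers are fine-tuned with a small learning rate. All names and sizes are illustrative:

```python
import torch
import torch.nn as nn

# Toy "pretrained" network (illustrative stand-in for a real checkpoint).
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),    # early, generic features
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),   # later, more class-specific
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1000),                         # original 1000-way head
)

# Replace the classifier head for the new 5-class task.
net[6] = nn.Linear(16, 5)

# Keep the earliest conv layer fixed (edge/color-blob detectors transfer well);
# fine-tune everything else.
for p in net[0].parameters():
    p.requires_grad = False

# Small learning rate: we only want to gently adapt the pretrained weights.
opt = torch.optim.SGD(
    [p for p in net.parameters() if p.requires_grad], lr=1e-3
)

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 5, (8,))
loss = nn.functional.cross_entropy(net(x), y)
opt.zero_grad()
loss.backward()
opt.step()
```

The optimizer is built only from parameters with `requires_grad=True`, so the frozen layer keeps its pretrained weights while the rest of the network continues backpropagation.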
- Pretrained models.
- Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others, who can use the networks for fine-tuning (e.g. via a Model Zoo).
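Releasing and reusing a checkpoint boils down to saving and restoring a state dict. A minimal sketch (the file name and tiny model are made up; real releases live in model zoos):

```python
import torch
import torch.nn as nn

# The author trains a model and releases its final checkpoint.
model = nn.Linear(4, 2)   # stand-in for a fully trained ConvNet
torch.save(model.state_dict(), "convnet_checkpoint.pt")

# Someone else builds the same architecture, restores the released weights,
# and then fine-tunes on their own task.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("convnet_checkpoint.pt"))
```

Note that `load_state_dict` requires the restoring side to construct the same architecture; only the learned parameters travel in the checkpoint file.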