
microsoft-beit-base-patch16-224-pt22k-ft22k

Overview

BEiT (Bidirectional Encoder representation from Image Transformers) is a Vision Transformer (ViT) pre-trained with Masked Image Modeling (MIM), a self-supervised pre-training objective inspired by BERT from NLP, followed by intermediate fine-tuning on the ImageNet-22k dataset. It is then fine-tuned for image classification. In BEiT, images have two views of representation, image patches and visual tokens, which serve as input and output during pre-training, respectively. During the self-supervised pre-training stage, a percentage of image patches is randomly masked, and the model is trained to predict the visual tokens corresponding to the masked patches.

Through pre-training, the model acquires an internal representation of images, enabling the extraction of features useful for downstream tasks. After pre-training, a simple linear classifier is used as a task layer on top of the pre-trained encoder for image classification: the patch representations are aggregated by average pooling, and the resulting global representation is fed to a softmax classifier.
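For illustration, here is a minimal sketch of running this checkpoint with the Hugging Face transformers library, using the huggingface_model_id listed under Tags below; the example image URL is only a placeholder:

```python
from PIL import Image
import requests
import torch
from transformers import BeitForImageClassification, BeitImageProcessor

# Placeholder image URL; substitute any RGB image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_id = "microsoft/beit-base-patch16-224-pt22k-ft22k"
processor = BeitImageProcessor.from_pretrained(model_id)
model = BeitForImageClassification.from_pretrained(model_id)

# Resize/normalize the image, then run a forward pass through the
# pre-trained encoder and its linear classification head.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```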

For more details, refer to the BEiT-paper.

Training Details

Training Data

The BEiT model is pre-trained on ImageNet-22k, encompassing 14 million images and 21,000 classes, and fine-tuned on the same dataset.

Training Procedure

In the preprocessing step, images are resized to the same resolution, 224x224, and augmented with random resized cropping, horizontal flipping, and color jittering. They are then normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5), as sketched below.
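As an illustration only, the described preprocessing could be expressed with torchvision transforms roughly as follows; the color-jitter strength is an assumed value, since the exact augmentation hyperparameters are not given on this page:

```python
from torchvision import transforms

# Approximation of the preprocessing described above. The color-jitter
# strength (0.4) is an assumption, not a documented training value.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),           # random resized crop to 224x224
    transforms.RandomHorizontalFlip(),           # random horizontal flip
    transforms.ColorJitter(0.4, 0.4, 0.4),       # color jittering (assumed strength)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],   # normalize RGB channels
                         std=[0.5, 0.5, 0.5]),
])
```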

For more details on self-supervised pre-training (ImageNet-22k) followed by supervised fine-tuning (ImageNet-1k), refer to sections 2 and 3 of the original paper.

Evaluation Results

For BEiT image classification benchmark results, refer to Table 1 of the original-paper.

License

apache-2.0

Inference Samples

| Inference type | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- |
| Real time | image-classification-online-endpoint.ipynb | image-classification-online-endpoint.sh |
| Batch | image-classification-batch-endpoint.ipynb | image-classification-batch-endpoint.sh |
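The linked samples cover deployment end to end. As a quick orientation, a minimal sketch of invoking an already-deployed online endpoint with the Azure ML Python SDK v2 might look like the following; the workspace details, endpoint name, and request file are placeholders, not values from this page:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder workspace details; fill in your own subscription,
# resource group, and workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Invoke a deployed online endpoint with a JSON request file in the
# "Sample input" format shown later on this page.
response = ml_client.online_endpoints.invoke(
    endpoint_name="beit-image-classification",  # hypothetical endpoint name
    request_file="request.json",
)
print(response)
```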

Finetuning Samples

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- | --- | --- |
| Image Multi-class classification | Image Multi-class classification | fridgeObjects | fridgeobjects-multiclass-classification.ipynb | fridgeobjects-multiclass-classification.sh |
| Image Multi-label classification | Image Multi-label classification | multilabel fridgeObjects | fridgeobjects-multilabel-classification.ipynb | fridgeobjects-multilabel-classification.sh |

Evaluation Samples

| Task | Use case | Dataset | Python sample (Notebook) |
| --- | --- | --- | --- |
| Image Multi-class classification | Image Multi-class classification | fridgeObjects | image-multiclass-classification.ipynb |
| Image Multi-label classification | Image Multi-label classification | multilabel fridgeObjects | image-multilabel-classification.ipynb |

Sample input and output

Sample input

```json
{
  "input_data": ["image1", "image2"]
}
```

Note: "image1" and "image2" string should be in base64 format or publicly accessible urls.

Sample output

```json
[
  [
    {
      "label": "can",
      "score": 0.91
    },
    {
      "label": "carton",
      "score": 0.09
    }
  ],
  [
    {
      "label": "carton",
      "score": 0.9
    },
    {
      "label": "can",
      "score": 0.1
    }
  ]
]
```
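A short sketch of reading the top prediction per image from a response in this shape:

```python
import json

# Example response string, shaped like the sample output above:
# one list of {label, score} dictionaries per input image.
response = """
[
  [{"label": "can", "score": 0.91}, {"label": "carton", "score": 0.09}],
  [{"label": "carton", "score": 0.9}, {"label": "can", "score": 0.1}]
]
"""

predictions = json.loads(response)
for i, scores in enumerate(predictions):
    top = max(scores, key=lambda entry: entry["score"])
    print(f"image {i}: {top['label']} (score {top['score']:.2f})")
```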

Visualization of inference result for a sample image

(Image: multi-class classification visualization)

Version: 19

Tags

- huggingface_model_id: microsoft/beit-base-patch16-224-pt22k-ft22k
- license: apache-2.0
- model_specific_defaults: ordereddict({'apply_deepspeed': 'true', 'apply_ort': 'true'})
- task: image-classification
- hiddenlayerscanned
- training_dataset: imagenet-1k, imagenet-21k
- SharedComputeCapacityEnabled
- author: Microsoft
- inference_compute_allow_list: ['Standard_DS3_v2', 'Standard_D4a_v4', 'Standard_D4as_v4', 'Standard_DS4_v2', 'Standard_D8a_v4', 'Standard_D8as_v4', 'Standard_DS5_v2', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_FX4mds', 'Standard_F8s_v2', 'Standard_FX12mds', 'Standard_F16s_v2', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E4s_v3', 'Standard_E8s_v3', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
- evaluation_compute_allow_list: ['Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
- finetune_compute_allow_list: ['Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']

View in Studio: https://ml.azure.com/registries/azureml/models/microsoft-beit-base-patch16-224-pt22k-ft22k/version/19

License: apache-2.0

Properties

SharedComputeCapacityEnabled: True

SHA: 9da301148150e37e533abef672062fa49f6bda4f

finetuning-tasks: image-classification

finetune-min-sku-spec: 4|1|28|176 (vCPUs | GPUs | memory in GB | storage in GB)

finetune-recommended-sku: Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

evaluation-min-sku-spec: 4|1|28|176 (vCPUs | GPUs | memory in GB | storage in GB)

evaluation-recommended-sku: Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

inference-min-sku-spec: 4|0|14|28 (vCPUs | GPUs | memory in GB | storage in GB)

inference-recommended-sku: Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
