components multimodal_classification_pipeline - Azure/azureml-assets GitHub Wiki
Pipeline component for multimodal classification models.
Version: 0.0.3
View in Studio: https://ml.azure.com/registries/azureml/components/multimodal_classification_pipeline/version/0.0.3
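The component can be pulled from the shared azureml registry with the azure-ai-ml SDK (v2). Below is a minimal sketch, assuming the SDK is installed and DefaultAzureCredential can sign you in; it only fetches the component object that the later snippets on this page reuse.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the shared "azureml" registry; no workspace is needed to read components.
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Fetch the pipeline component at the version documented on this page.
multimodal_pipeline = registry_client.components.get(
    name="multimodal_classification_pipeline", version="0.0.3"
)
```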
Compute parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
compute_model_import | Compute to be used for the model_selector step, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
compute_preprocess | Compute to be used for the preprocess step, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
compute_finetune | Compute to be used for the finetune step, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
instance_count | Number of nodes to be used for finetuning (used for distributed training). | integer | 1 | True | |
process_count_per_instance | Number of GPUs to be used per node for finetuning; should be equal to the number of GPUs per node in the compute SKU used for finetune. | integer | 1 | True | |
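As a sketch of how these parameters are typically supplied, the dict below collects placeholder values (the compute name 'FT-Cluster' and the node/GPU counts are assumptions to adapt to your own SKU); it is splatted into the component call in the finetuning section further down.

```python
# Placeholder compute and distribution settings; all names/counts are examples only.
compute_settings = dict(
    compute_model_import="FT-Cluster",   # required: compute for the model_selector step
    compute_preprocess="FT-Cluster",     # required: compute for the preprocess step
    compute_finetune="FT-Cluster",       # required: compute for the finetune step
    instance_count=2,                    # nodes used for distributed finetuning
    process_count_per_instance=4,        # must equal the GPUs per node on the finetune SKU
)
```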
Model Selector Component
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
data_modalities | Modalities to be supported. | string | text-image-tabular | | ['text-image', 'text-image-tabular'] |
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
pytorch_model_path | Input folder path containing the PyTorch model in the azureml registry. | custom_model | | True | |
mlflow_model_path | Path to the multimodal model in the azureml registry. | mlflow_model | | False | |
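A hedged sketch of wiring the model input: the registry path below is a placeholder, not a specific published model; substitute the multimodal model you intend to finetune.

```python
from azure.ai.ml import Input

# Hypothetical model reference; fill in a real model name and version from the registry.
model_settings = dict(
    mlflow_model_path=Input(
        type="mlflow_model",
        path="azureml://registries/azureml/models/<model-name>/versions/<version>",
    ),
)
```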
Data Preprocessing Component
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
problem_type | Specify whether it is a single-label or multi-label classification task. | string | multimodal-classification-singlelabel | False | ['multimodal-classification-singlelabel', 'multimodal-classification-multilabel'] |
label_column | Label column name. | string | | False | |
image_column | Image column name. | string | | False | |
drop_columns | Columns to ignore. | string | | True | |
numerical_columns_overrides | Columns to treat as numerical. Overrides automatic column purpose detection. | string | | True | |
categorical_columns_overrides | Columns to treat as categorical. Overrides automatic column purpose detection. | string | | True | |
text_columns_overrides | Columns to treat as text. Overrides automatic column purpose detection. | string | | True | |
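Continuing the running example, a sketch of the preprocessing inputs; every column name here is hypothetical and must match your dataset's schema, and passing multiple columns as a comma-separated string is an assumption, not confirmed by this page.

```python
# Hypothetical column names; the comma-separated multi-column format is an assumption.
preprocess_settings = dict(
    problem_type="multimodal-classification-singlelabel",
    label_column="label",                  # required
    image_column="image_path",             # required
    drop_columns="row_id",                 # optional: columns to ignore
    text_columns_overrides="review_text",  # optional: force text treatment
)
```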
Inputs
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
training_data | Enter the training mltable path. | mltable | | False | |
validation_data | Enter the validation mltable path. | mltable | | False | |
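Both data inputs take MLTable assets. A sketch using named data assets (the asset names and versions are placeholders):

```python
from azure.ai.ml import Input

# Placeholder MLTable asset references in "azureml:<name>:<version>" form.
data_settings = dict(
    training_data=Input(type="mltable", path="azureml:multimodal_train:1"),
    validation_data=Input(type="mltable", path="azureml:multimodal_valid:1"),
)
```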
Finetuning Component Training parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
number_of_epochs | Number of training epochs. | integer | 15 | True | |
max_steps | If set to a positive number, the total number of training steps to perform. Overrides 'number_of_epochs'. When using a finite iterable dataset, training may stop before reaching the set number of steps once all data is exhausted. | integer | -1 | True | |
training_batch_size | Train batch size. | integer | 1 | True | |
validation_batch_size | Validation batch size. | integer | 1 | True | |
auto_find_batch_size | Flag to enable automatic discovery of the batch size. If the provided 'training_batch_size' causes an Out Of Memory (OOM) error, enabling auto_find_batch_size finds a workable batch size by iteratively halving 'training_batch_size' until the OOM is resolved. | string | false | True | ['true', 'false'] |
optimizer | Optimizer to be used while training. | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
learning_rate | Initial learning rate. The scheduler defaults to linear. | number | 2e-05 | True | |
warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate. | integer | 0 | True | |
weight_decay | The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer. | number | 0.0 | True | |
adam_beta1 | The beta1 hyperparameter for the AdamW optimizer. | number | 0.9 | True | |
adam_beta2 | The beta2 hyperparameter for the AdamW optimizer. | number | 0.999 | True | |
adam_epsilon | The epsilon hyperparameter for the AdamW optimizer. | number | 1e-08 | True | |
gradient_accumulation_steps | Number of update steps to accumulate the gradients for before performing a backward/update pass. | integer | 64 | True | |
learning_rate_scheduler | The scheduler type to use. | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 32 | True | ['32', '16'] |
random_seed | Random seed that will be set at the beginning of training. | integer | 42 | True | |
evaluation_strategy | The evaluation strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
evaluation_steps_interval | Evaluation interval expressed as a fraction of an epoch's steps. Overrides evaluation_steps if not 0. | number | 0.0 | True | |
evaluation_steps | Number of update steps between two evals if evaluation_strategy='steps'. | integer | 500 | True | |
logging_strategy | The logging strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
logging_steps | Number of update steps between two logs if logging_strategy='steps'. | integer | 500 | True | |
primary_metric | Specify the metric to use to compare two different models. | string | loss | True | ['loss', 'f1_macro', 'mcc', 'accuracy', 'precision_macro', 'recall_macro'] |
resume_from_checkpoint | If set to true, loads the optimizer, scheduler, and trainer state to resume finetuning. | string | false | True | ['true', 'false'] |
save_total_limit | If a value is passed, limits the total number of checkpoints; older checkpoints in output_dir are deleted. A value of -1 saves all checkpoints. | integer | -1 | True | |
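Pulling the sketches above together, here is a hedged end-to-end pipeline definition that also overrides a handful of the training parameters from this table; the override values are illustrative, and everything else keeps its default. The output names used in the return statement are documented in the Outputs table at the end of this page.

```python
from azure.ai.ml.dsl import pipeline

@pipeline(name="multimodal_classification_job")
def multimodal_cls_pipeline():
    step = multimodal_pipeline(
        **compute_settings,
        **model_settings,
        **preprocess_settings,
        **data_settings,
        # Illustrative training overrides; all other parameters keep their defaults.
        number_of_epochs=10,
        training_batch_size=4,
        learning_rate=5e-5,
        precision="16",            # mixed precision to reduce memory footprint
        primary_metric="accuracy",
    )
    return {
        "mlflow_model": step.outputs.mlflow_model_folder,
        "pytorch_model": step.outputs.pytorch_model_folder,
    }

pipeline_job = multimodal_cls_pipeline()
```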
Early Stopping Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_early_stopping | Enable early stopping. | string | false | True | ['true', 'false'] |
early_stopping_patience | Stop training when the specified metric worsens for early_stopping_patience evaluation calls. | integer | 1 | True | |
early_stopping_threshold | Denotes how much the specified metric must improve to satisfy early stopping conditions. | number | 0.0 | True |
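If early stopping is wanted, a small sketch of the corresponding overrides (note the string-typed boolean, matching the table above); merge them into the component call shown earlier, e.g. `multimodal_pipeline(**compute_settings, ..., **early_stopping_settings)`.

```python
# Illustrative early-stopping overrides; the threshold depends on your primary_metric scale.
early_stopping_settings = dict(
    apply_early_stopping="true",     # string enum, not a Python bool
    early_stopping_patience=3,       # evaluations without improvement before stopping
    early_stopping_threshold=0.01,   # minimum improvement that resets patience
)
```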
Deepspeed Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_deepspeed | If set to true, enables DeepSpeed for training. | string | false | True | ['true', 'false'] |
deepspeed_config | DeepSpeed config to be used for finetuning. | uri_file | | True | |
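A sketch of enabling DeepSpeed with a local config file; the path is a placeholder, and the JSON contents are up to you (the file is uploaded as a uri_file when the job is submitted).

```python
from azure.ai.ml import Input

# Placeholder local path to a DeepSpeed JSON config.
deepspeed_settings = dict(
    apply_deepspeed="true",
    deepspeed_config=Input(type="uri_file", path="./deepspeed_config.json"),
)
```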
ORT Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_ort | If set to true, will use ONNX Runtime (ORT) for training. | string | false | True | ['true', 'false'] |
MLFlow Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
save_as_mlflow_model | If set to true, saves the model in MLflow format with pyfunc as the flavor. | string | true | True | ['true', 'false'] |
Outputs
Name | Description | Type |
---|---|---|
mlflow_model_folder | Output directory to save the finetuned model as an MLflow model. | mlflow_model
pytorch_model_folder | Output directory to save the finetuned model as a PyTorch model. | custom_model
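To close the running example, a hedged sketch of submitting the assembled pipeline job to a workspace; all workspace coordinates are placeholders.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder workspace coordinates; fill in your own subscription/resource group/workspace.
ws_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

submitted = ws_client.jobs.create_or_update(
    pipeline_job, experiment_name="multimodal-classification"
)
print(submitted.studio_url)  # link to monitor the run in Studio
```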