components multimodal_classification_pipeline - Azure/azureml-assets GitHub Wiki

Multimodal Classification Pipeline

multimodal_classification_pipeline

Overview

Pipeline component for multimodal classification models.

Version: 0.0.3

View in Studio: https://ml.azure.com/registries/azureml/components/multimodal_classification_pipeline/version/0.0.3

Inputs

Compute parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| compute_model_import | Compute to be used for model_selector, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
| compute_preprocess | Compute to be used for preprocess, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
| compute_finetune | Compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
| instance_count | Number of nodes to be used for finetuning (used for distributed training). | integer | 1 | True | |
| process_count_per_instance | Number of GPUs to be used per node for finetuning. Should equal the number of GPUs per node in the compute SKU used for finetuning. | integer | 1 | True | |
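The advice on process_count_per_instance follows from how multi-node training is usually launched: one process per GPU, so instance_count × process_count_per_instance processes in total. The sketch below enumerates those processes under the common PyTorch distributed convention (global rank = node rank × processes per node + local rank). That this component's launcher uses exactly this convention is an assumption, and the gpus_per_node parameter is hypothetical, added only to show the consistency check.

```python
from typing import List, Optional, Tuple


def rank_layout(instance_count: int = 1,
                process_count_per_instance: int = 1,
                gpus_per_node: Optional[int] = None) -> List[Tuple[int, int, int]]:
    """Return (node_rank, local_rank, global_rank) for every training process.

    gpus_per_node is a hypothetical input: the GPU count of the finetune SKU.
    """
    if gpus_per_node is not None and process_count_per_instance != gpus_per_node:
        raise ValueError(
            f"process_count_per_instance={process_count_per_instance} should equal "
            f"the {gpus_per_node} GPUs per node of the finetune compute SKU")
    # One process per GPU; global ranks are numbered node by node.
    return [
        (node, local_rank, node * process_count_per_instance + local_rank)
        for node in range(instance_count)
        for local_rank in range(process_count_per_instance)
    ]


if __name__ == "__main__":
    # Two 4-GPU nodes give 8 processes (world size 8).
    for node, local_rank, global_rank in rank_layout(2, 4, gpus_per_node=4):
        print(node, local_rank, global_rank)
```

For example, two nodes with four GPUs each give eight processes, and global rank 5 is local rank 1 on node 1.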

Model Selector Component

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| data_modalities | Modalities to be supported. | string | text-image-tabular | | ['text-image', 'text-image-tabular'] |
| pytorch_model_path | Input folder path containing a PyTorch model in the azureml registry. | custom_model | | True | |
| mlflow_model_path | Path to the multimodal model in the azureml registry. | mlflow_model | | False | |

Data Preprocessing Component

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| problem_type | Specify whether the task is single-label or multi-label classification. | string | multimodal-classification-singlelabel | False | ['multimodal-classification-singlelabel', 'multimodal-classification-multilabel'] |
| label_column | Label column name. | string | | False | |
| image_column | Image column name. | string | | False | |
| drop_columns | Columns to ignore. | string | | True | |
| numerical_columns_overrides | Columns to treat as numerical. Overrides automatic column purpose detection. | string | | True | |
| categorical_columns_overrides | Columns to treat as categorical. Overrides automatic column purpose detection. | string | | True | |
| text_columns_overrides | Columns to treat as text. Overrides automatic column purpose detection. | string | | True | |
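The override inputs sit on top of automatic column purpose detection. The sketch below shows one plausible way such overrides combine with drop_columns, label_column and image_column. It assumes the override strings are comma-separated column lists and that detection yields a simple column-to-purpose mapping; both the string format and the purpose names ("numerical", "categorical", "text", "image") are assumptions, not the component's documented behavior.

```python
from typing import Dict, List, Optional


def _split(value: Optional[str]) -> List[str]:
    """Parse an assumed comma-separated column list such as "a, b"."""
    if not value:
        return []
    return [col.strip() for col in value.split(",") if col.strip()]


def resolve_column_purposes(inferred: Dict[str, str],
                            label_column: str,
                            image_column: str,
                            drop_columns: Optional[str] = None,
                            numerical_columns_overrides: Optional[str] = None,
                            categorical_columns_overrides: Optional[str] = None,
                            text_columns_overrides: Optional[str] = None) -> Dict[str, str]:
    """Apply drop_columns and the *_overrides inputs on top of detected purposes.

    `inferred` stands in for a hypothetical automatic detection result.
    """
    dropped = set(_split(drop_columns))
    # Keep every feature column except the label and anything the user dropped.
    purposes = {col: purpose for col, purpose in inferred.items()
                if col != label_column and col not in dropped}
    for purpose, override in (("numerical", numerical_columns_overrides),
                              ("categorical", categorical_columns_overrides),
                              ("text", text_columns_overrides)):
        for col in _split(override):
            # Overrides may only target remaining, non-image feature columns.
            if col not in purposes or col == image_column:
                raise ValueError(f"cannot override column {col!r}")
            purposes[col] = purpose
    # The image column is always treated as an image feature.
    purposes[image_column] = "image"
    return purposes


if __name__ == "__main__":
    detected = {"price": "numerical", "zip": "numerical", "title": "text",
                "img": "text", "label": "categorical", "id": "numerical"}
    print(resolve_column_purposes(detected, "label", "img",
                                  drop_columns="id",
                                  categorical_columns_overrides="zip"))
```

Here a ZIP code that detection would read as a number is forced to be categorical, which is the typical reason to use an override.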

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| training_data | Path to the training data MLTable. | mltable | | False | |
| validation_data | Path to the validation data MLTable. | mltable | | False | |

Finetuning Component Training parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| number_of_epochs | Number of training epochs. | integer | 15 | True | |
| max_steps | If set to a positive number, the total number of training steps to perform; overrides number_of_epochs. When using a finite iterable dataset, training may stop before reaching this number of steps if all data is exhausted. | integer | -1 | True | |
| training_batch_size | Training batch size. | integer | 1 | True | |
| validation_batch_size | Validation batch size. | integer | 1 | True | |
| auto_find_batch_size | Flag to enable automatic batch size search. If the provided training_batch_size causes an Out Of Memory (OOM) error, enabling auto_find_batch_size finds a workable batch size by repeatedly halving training_batch_size until the OOM is resolved. | string | false | True | ['true', 'false'] |
| optimizer | Optimizer to be used while training. | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
| learning_rate | Initial learning rate. The schedule defaults to linear (see learning_rate_scheduler). | number | 2e-05 | True | |
| warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate. | integer | 0 | True | |
| weight_decay | Weight decay to apply (if not zero) to all layers except bias and LayerNorm weights in the AdamW optimizer. | number | 0.0 | True | |
| adam_beta1 | The beta1 hyperparameter for the AdamW optimizer. | number | 0.9 | True | |
| adam_beta2 | The beta2 hyperparameter for the AdamW optimizer. | number | 0.999 | True | |
| adam_epsilon | The epsilon hyperparameter for the AdamW optimizer. | number | 1e-08 | True | |
| gradient_accumulation_steps | Number of update steps to accumulate gradients for before performing a backward/update pass. | integer | 64 | True | |
| learning_rate_scheduler | The scheduler type to use. | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
| precision | Apply mixed precision training. This can reduce the memory footprint by performing operations in half precision. | string | 32 | True | ['32', '16'] |
| random_seed | Random seed set at the beginning of training. | integer | 42 | True | |
| evaluation_strategy | The evaluation strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
| evaluation_steps_interval | Evaluation interval expressed as a fraction of the steps in one epoch. Overrides evaluation_steps if not 0. | number | 0.0 | True | |
| evaluation_steps | Number of update steps between two evaluations if evaluation_strategy='steps'. | integer | 500 | True | |
| logging_strategy | The logging strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
| logging_steps | Number of update steps between two logs if logging_strategy='steps'. | integer | 500 | True | |
| primary_metric | Metric used to compare two different models. | string | loss | True | ['loss', 'f1_macro', 'mcc', 'accuracy', 'precision_macro', 'recall_macro'] |
| resume_from_checkpoint | If true, loads the optimizer, scheduler, and Trainer state for finetuning. | string | false | True | ['true', 'false'] |
| save_total_limit | If a value is passed, limits the total number of checkpoints by deleting older checkpoints in output_dir. If the value is -1, all checkpoints are saved. | integer | -1 | True | |
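Several of these parameters interact: gradient accumulation and the number of GPUs scale the batch seen by each optimizer update, max_steps overrides the epoch count, the linear scheduler warms up and then decays learning_rate, and auto_find_batch_size halves the batch until it fits. The sketch below reproduces that arithmetic under common trainer conventions; the exact step counting inside the component is an assumption, `world_size` stands for instance_count × process_count_per_instance, and the `fits` callback is a hypothetical stand-in for "this batch size does not run out of memory".

```python
import math
from typing import Callable


def effective_batch_size(training_batch_size: int = 1,
                         gradient_accumulation_steps: int = 64,
                         instance_count: int = 1,
                         process_count_per_instance: int = 1) -> int:
    """Samples that contribute to one optimizer update across all GPUs."""
    world_size = instance_count * process_count_per_instance
    return training_batch_size * gradient_accumulation_steps * world_size


def total_optimizer_steps(num_train_samples: int,
                          number_of_epochs: int = 15,
                          max_steps: int = -1,
                          training_batch_size: int = 1,
                          gradient_accumulation_steps: int = 64,
                          world_size: int = 1) -> int:
    """Approximate number of optimizer updates; a positive max_steps wins."""
    if max_steps > 0:
        return max_steps
    # Each process sees about 1/world_size of the data per epoch.
    batches_per_epoch = math.ceil(num_train_samples / (training_batch_size * world_size))
    # One update per gradient_accumulation_steps batches, at least one per epoch.
    steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return steps_per_epoch * number_of_epochs


def linear_warmup_lr(step: int, total_steps: int,
                     learning_rate: float = 2e-05, warmup_steps: int = 0) -> float:
    """Linear warmup from 0 to learning_rate, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


def find_batch_size(training_batch_size: int, fits: Callable[[int], bool]) -> int:
    """Halve the batch size until the hypothetical `fits` probe succeeds."""
    batch_size = training_batch_size
    while batch_size >= 1:
        if fits(batch_size):
            return batch_size
        batch_size //= 2
    raise RuntimeError("even a batch size of 1 does not fit in memory")


if __name__ == "__main__":
    steps = total_optimizer_steps(num_train_samples=1000)
    print(effective_batch_size(), steps, linear_warmup_lr(0, steps))
```

With the defaults, each update averages over 64 samples, so 1,000 training samples yield only 15 updates per epoch; lowering gradient_accumulation_steps or raising max_steps changes this directly.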

Early Stopping Parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| apply_early_stopping | Enable early stopping. | string | false | True | ['true', 'false'] |
| early_stopping_patience | Stop training when the specified metric worsens for early_stopping_patience evaluation calls. | integer | 1 | True | |
| early_stopping_threshold | Denotes how much the specified metric must improve to satisfy early stopping conditions. | number | 0.0 | True | |
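Patience and threshold work together with primary_metric: an evaluation counts as an improvement only if the metric moves in the right direction by more than early_stopping_threshold, and training stops after early_stopping_patience evaluations in a row without one. The sketch below assumes that "loss" is minimized while every other primary_metric option is maximized, and treats "no improvement beyond the threshold" as a bad evaluation; it is an illustration of these semantics, not the component's code.

```python
from typing import Optional, Sequence


def early_stop_index(metric_history: Sequence[float],
                     primary_metric: str = "loss",
                     early_stopping_patience: int = 1,
                     early_stopping_threshold: float = 0.0) -> Optional[int]:
    """Return the 0-based evaluation index at which training would stop, or None."""
    # Assumption: loss is minimized; f1_macro, accuracy, etc. are maximized.
    greater_is_better = primary_metric != "loss"
    best: Optional[float] = None
    bad_evals = 0
    for i, value in enumerate(metric_history):
        better = value > best if (best is not None and greater_is_better) else (
            best is not None and value < best)
        improved = best is None or (
            better and abs(value - best) > early_stopping_threshold)
        if improved:
            best = value
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= early_stopping_patience:
                return i
    return None


if __name__ == "__main__":
    # Loss rises at the third evaluation; with patience 1 training stops there.
    print(early_stop_index([0.9, 0.8, 0.85]))
```

A nonzero threshold makes tiny gains count as stagnation, so a run with patience 1 can stop even while the metric is still creeping upward.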

DeepSpeed Parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| apply_deepspeed | If set to true, enables DeepSpeed for training. | string | false | True | ['true', 'false'] |
| deepspeed_config | DeepSpeed config to be used for finetuning. | uri_file | | True | |
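deepspeed_config is a uri_file, i.e. a DeepSpeed JSON configuration file. The sketch below writes a minimal one enabling ZeRO optimization. The keys are standard DeepSpeed config keys, but the "auto" values assume the component hands the file to the Hugging Face Trainer DeepSpeed integration, which fills them from the trainer's own settings (batch size, gradient accumulation, precision); that hand-off is an assumption, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Union


def build_deepspeed_config(zero_stage: int = 2) -> dict:
    """Return a minimal DeepSpeed config dict (hypothetical helper)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2 or 3")
    return {
        # "auto" assumes Hugging Face Trainer integration resolves these values
        # from training_batch_size and gradient_accumulation_steps.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        # Likewise assumed to follow the precision input.
        "fp16": {"enabled": "auto"},
        "zero_optimization": {"stage": zero_stage},
    }


def write_deepspeed_config(path: Union[str, Path], zero_stage: int = 2) -> Path:
    """Serialize the config to a JSON file usable as a uri_file input."""
    path = Path(path)
    path.write_text(json.dumps(build_deepspeed_config(zero_stage), indent=2))
    return path


if __name__ == "__main__":
    print(json.dumps(build_deepspeed_config(zero_stage=2), indent=2))
```

Leaving batch size and precision as "auto" avoids the file disagreeing with the pipeline's own training_batch_size, gradient_accumulation_steps and precision inputs.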

ORT Parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| apply_ort | If set to true, uses ONNX Runtime training. | string | false | True | ['true', 'false'] |

MLflow Parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| save_as_mlflow_model | If set to true, saves the model as an MLflow model with pyfunc as the flavour. | string | true | True | ['true', 'false'] |

Outputs

Finetuning Component

| Name | Description | Type |
| --- | --- | --- |
| mlflow_model_folder | Output directory to save the finetuned model as an MLflow model. | mlflow_model |
| pytorch_model_folder | Output directory to save the finetuned model as a PyTorch model. | custom_model |