components multimodal_classification_pipeline - Azure/azureml-assets GitHub Wiki
Pipeline component for multimodal classification models.
Version: 0.0.3
View in Studio: https://ml.azure.com/registries/azureml/components/multimodal_classification_pipeline/version/0.0.3
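The component can be pulled from the shared azureml registry with the azure-ai-ml SDK (v2). Below is a minimal sketch, assuming the SDK is installed and DefaultAzureCredential can sign you in; it only fetches the component object that the later snippets on this page reuse.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the shared "azureml" registry; no workspace is needed to read components.
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Fetch the pipeline component at the version documented on this page.
multimodal_pipeline = registry_client.components.get(
    name="multimodal_classification_pipeline", version="0.0.3"
)
```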
Compute parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
compute_model_import | Compute to be used for the model_selector step, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
compute_preprocess | Compute to be used for the preprocess step, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
compute_finetune | Compute to be used for the finetune step, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. | string | | False | |
instance_count | Number of nodes to be used for finetuning (used for distributed training). | integer | 1 | True | |
process_count_per_instance | Number of GPUs to be used per node for finetuning; should be equal to the number of GPUs per node in the compute SKU used for finetune. | integer | 1 | True | |
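As a sketch of how these parameters are typically supplied, the dict below collects placeholder values (the compute name 'FT-Cluster' and the node/GPU counts are assumptions to adapt to your own SKU); it is splatted into the component call in the finetuning section further down.

```python
# Placeholder compute and distribution settings; all names/counts are examples only.
compute_settings = dict(
    compute_model_import="FT-Cluster",   # required: compute for the model_selector step
    compute_preprocess="FT-Cluster",     # required: compute for the preprocess step
    compute_finetune="FT-Cluster",       # required: compute for the finetune step
    instance_count=2,                    # nodes used for distributed finetuning
    process_count_per_instance=4,        # must equal the GPUs per node on the finetune SKU
)
```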
Model Selector Component
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
data_modalities | Modalities to be supported. | string | text-image-tabular | | ['text-image', 'text-image-tabular'] |
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
pytorch_model_path | Input folder path containing the PyTorch model in the azureml registry. | custom_model | | True | |
mlflow_model_path | Path to the multimodal model in the azureml registry. | mlflow_model | | False | |
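A hedged sketch of wiring the model input: the registry path below is a placeholder, not a specific published model; substitute the multimodal model you intend to finetune.

```python
from azure.ai.ml import Input

# Hypothetical model reference; fill in a real model name and version from the registry.
model_settings = dict(
    mlflow_model_path=Input(
        type="mlflow_model",
        path="azureml://registries/azureml/models/<model-name>/versions/<version>",
    ),
)
```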
Data Preprocessing Component
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
problem_type | Specify whether it is a single-label or multi-label classification task. | string | multimodal-classification-singlelabel | False | ['multimodal-classification-singlelabel', 'multimodal-classification-multilabel'] |
label_column | Label column name. | string | | False | |
image_column | Image column name. | string | | False | |
drop_columns | Columns to ignore. | string | | True | |
numerical_columns_overrides | Columns to treat as numerical. Overrides automatic column purpose detection. | string | | True | |
categorical_columns_overrides | Columns to treat as categorical. Overrides automatic column purpose detection. | string | | True | |
text_columns_overrides | Columns to treat as text. Overrides automatic column purpose detection. | string | | True | |
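Continuing the running example, a sketch of the preprocessing inputs; every column name here is hypothetical and must match your dataset's schema, and passing multiple columns as a comma-separated string is an assumption, not confirmed by this page.

```python
# Hypothetical column names; the comma-separated multi-column format is an assumption.
preprocess_settings = dict(
    problem_type="multimodal-classification-singlelabel",
    label_column="label",                  # required
    image_column="image_path",             # required
    drop_columns="row_id",                 # optional: columns to ignore
    text_columns_overrides="review_text",  # optional: force text treatment
)
```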
Inputs
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
training_data | Enter the training mltable path. | mltable | | False | |
validation_data | Enter the validation mltable path. | mltable | | False | |
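Both data inputs take MLTable assets. A sketch using named data assets (the asset names and versions are placeholders):

```python
from azure.ai.ml import Input

# Placeholder MLTable asset references in "azureml:<name>:<version>" form.
data_settings = dict(
    training_data=Input(type="mltable", path="azureml:multimodal_train:1"),
    validation_data=Input(type="mltable", path="azureml:multimodal_valid:1"),
)
```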
Finetuning Component Training parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
number_of_epochs | Number of training epochs. | integer | 15 | True | |
max_steps | If set to a positive number, the total number of training steps to perform. Overrides 'number_of_epochs'. When using a finite iterable dataset, training may stop before reaching the set number of steps once all data is exhausted. | integer | -1 | True | |
training_batch_size | Train batch size. | integer | 1 | True | |
validation_batch_size | Validation batch size. | integer | 1 | True | |
auto_find_batch_size | Flag to enable automatic discovery of the batch size. If the provided 'training_batch_size' causes an Out Of Memory (OOM) error, enabling auto_find_batch_size finds a workable batch size by iteratively halving 'training_batch_size' until the OOM is resolved. | string | false | True | ['true', 'false'] |
optimizer | Optimizer to be used while training. | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
learning_rate | Initial learning rate. The scheduler defaults to linear. | number | 2e-05 | True | |
warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate. | integer | 0 | True | |
weight_decay | The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer. | number | 0.0 | True | |
adam_beta1 | The beta1 hyperparameter for the AdamW optimizer. | number | 0.9 | True | |
adam_beta2 | The beta2 hyperparameter for the AdamW optimizer. | number | 0.999 | True | |
adam_epsilon | The epsilon hyperparameter for the AdamW optimizer. | number | 1e-08 | True | |
gradient_accumulation_steps | Number of update steps to accumulate the gradients for before performing a backward/update pass. | integer | 64 | True | |
learning_rate_scheduler | The scheduler type to use. | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 32 | True | ['32', '16'] |
random_seed | Random seed that will be set at the beginning of training. | integer | 42 | True | |
evaluation_strategy | The evaluation strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
evaluation_steps_interval | Evaluation interval expressed as a fraction of an epoch's steps. Overrides evaluation_steps if not 0. | number | 0.0 | True | |
evaluation_steps | Number of update steps between two evals if evaluation_strategy='steps'. | integer | 500 | True | |
logging_strategy | The logging strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
logging_steps | Number of update steps between two logs if logging_strategy='steps'. | integer | 500 | True | |
primary_metric | Specify the metric to use to compare two different models. | string | loss | True | ['loss', 'f1_macro', 'mcc', 'accuracy', 'precision_macro', 'recall_macro'] |
resume_from_checkpoint | If set to true, loads the optimizer, scheduler, and trainer state to resume finetuning. | string | false | True | ['true', 'false'] |
save_total_limit | If a value is passed, limits the total number of checkpoints; older checkpoints in output_dir are deleted. A value of -1 saves all checkpoints. | integer | -1 | True | |
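Pulling the sketches above together, here is a hedged end-to-end pipeline definition that also overrides a handful of the training parameters from this table; the override values are illustrative, and everything else keeps its default. The output names used in the return statement are documented in the Outputs table at the end of this page.

```python
from azure.ai.ml.dsl import pipeline

@pipeline(name="multimodal_classification_job")
def multimodal_cls_pipeline():
    step = multimodal_pipeline(
        **compute_settings,
        **model_settings,
        **preprocess_settings,
        **data_settings,
        # Illustrative training overrides; all other parameters keep their defaults.
        number_of_epochs=10,
        training_batch_size=4,
        learning_rate=5e-5,
        precision="16",            # mixed precision to reduce memory footprint
        primary_metric="accuracy",
    )
    return {
        "mlflow_model": step.outputs.mlflow_model_folder,
        "pytorch_model": step.outputs.pytorch_model_folder,
    }

pipeline_job = multimodal_cls_pipeline()
```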
Early Stopping Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_early_stopping | Enable early stopping. | string | false | True | ['true', 'false'] |
early_stopping_patience | Stop training when the specified metric worsens for early_stopping_patience evaluation calls. | integer | 1 | True | |
early_stopping_threshold | Denotes how much the specified metric must improve to satisfy early stopping conditions. | number | 0.0 | True |
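If early stopping is wanted, a small sketch of the corresponding overrides (note the string-typed boolean, matching the table above); merge them into the component call shown earlier, e.g. `multimodal_pipeline(**compute_settings, ..., **early_stopping_settings)`.

```python
# Illustrative early-stopping overrides; the threshold depends on your primary_metric scale.
early_stopping_settings = dict(
    apply_early_stopping="true",     # string enum, not a Python bool
    early_stopping_patience=3,       # evaluations without improvement before stopping
    early_stopping_threshold=0.01,   # minimum improvement that resets patience
)
```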
Deepspeed Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_deepspeed | If set to true, enables DeepSpeed for training. | string | false | True | ['true', 'false'] |
deepspeed_config | DeepSpeed config to be used for finetuning. | uri_file | | True | |
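A sketch of enabling DeepSpeed with a local config file; the path is a placeholder, and the JSON contents are up to you (the file is uploaded as a uri_file when the job is submitted).

```python
from azure.ai.ml import Input

# Placeholder local path to a DeepSpeed JSON config.
deepspeed_settings = dict(
    apply_deepspeed="true",
    deepspeed_config=Input(type="uri_file", path="./deepspeed_config.json"),
)
```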
ORT Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_ort | If set to true, will use ONNX Runtime (ORT) for training. | string | false | True | ['true', 'false'] |
MLFlow Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
save_as_mlflow_model | If set to true, saves the model in MLflow format with pyfunc as the flavor. | string | true | True | ['true', 'false'] |
Outputs
Name | Description | Type |
---|---|---|
mlflow_model_folder | Output directory to save the finetuned model as an MLflow model. | mlflow_model
pytorch_model_folder | Output directory to save the finetuned model as a PyTorch model. | custom_model
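To close the running example, a hedged sketch of submitting the assembled pipeline job to a workspace; all workspace coordinates are placeholders.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder workspace coordinates; fill in your own subscription/resource group/workspace.
ws_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

submitted = ws_client.jobs.create_or_update(
    pipeline_job, experiment_name="multimodal-classification"
)
print(submitted.studio_url)  # link to monitor the run in Studio
```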