components nlp_textclassification_multiclass - Azure/azureml-assets GitHub Wiki
Pipeline component for AutoML NLP Multiclass Text classification
Version: 0.0.2
View in Studio: https://ml.azure.com/registries/azureml/components/nlp_textclassification_multiclass/version/0.0.2
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
compute_model_import | compute to be used for model_selector, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
compute_preprocess | compute to be used for preprocess, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
compute_finetune | compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
num_nodes_finetune | number of nodes to be used for finetuning (used for distributed training) | integer | 1 | True | |
process_count_per_instance_finetune | number of GPUs to be used per node for finetuning; should equal the number of GPUs per node in the compute SKU used for finetune | integer | 1 | True | |
model_name | model id used to load model checkpoint. | string | bert-base-uncased | | |
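The two distributed-training inputs above combine into the total number of training processes. The following is a minimal sketch, assuming one process per GPU; the helper name `total_finetune_processes` is hypothetical and not part of the component.

```python
def total_finetune_processes(num_nodes_finetune: int = 1,
                             process_count_per_instance_finetune: int = 1) -> int:
    """Total training processes across all nodes (assumes one process per GPU)."""
    if num_nodes_finetune < 1 or process_count_per_instance_finetune < 1:
        raise ValueError("both values must be positive integers")
    return num_nodes_finetune * process_count_per_instance_finetune


# Example: 2 nodes, each with a 4-GPU SKU.
print(total_finetune_processes(2, 4))
```

With the defaults (1 node, 1 process per node) the result is a single training process.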
Data PreProcess parameters (See docs to learn more)
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
label_column_name | label key name | string | | False | |
Dataset parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
training_data | Enter the train file path | uri_file | | False | |
validation_data | Enter the validation file path | uri_file | | False | |
Training parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
training_batch_size | Train batch size | integer | 32 | True | |
validation_batch_size | Validation batch size | integer | 32 | True | |
number_of_epochs | Number of epochs to train | integer | 3 | True | |
gradient_accumulation_steps | Gradient accumulation steps | integer | 1 | True | |
learning_rate | Initial learning rate. The scheduler defaults to linear. | number | 5e-05 | True | |
warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate | integer | 0 | True | |
weight_decay | The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | |
learning_rate_scheduler | The scheduler type to use | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
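To illustrate how `learning_rate`, `warmup_steps`, and `gradient_accumulation_steps` interact, here is a minimal sketch. It assumes the `linear` scheduler follows the common pattern of linear warmup from 0 followed by linear decay to 0, and that `training_batch_size` applies per process. Both are assumptions, and the function names are hypothetical rather than part of the component.

```python
def linear_schedule_lr(step: int, total_steps: int,
                       learning_rate: float = 5e-05,
                       warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to learning_rate.
        return learning_rate * step / max(1, warmup_steps)
    # Decay phase: ramp from learning_rate down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


def samples_per_update(training_batch_size: int = 32,
                       gradient_accumulation_steps: int = 1) -> int:
    """Samples consumed per optimizer update in one process (assumed per-process batch)."""
    return training_batch_size * gradient_accumulation_steps


# Halfway through a 10-step warmup the rate is half of learning_rate.
print(linear_schedule_lr(step=5, total_steps=100, learning_rate=1.0, warmup_steps=10))
print(samples_per_update(32, 4))
```

The other scheduler options in the enum (for example `cosine`) change only the decay shape, not the warmup ramp described for `warmup_steps`.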
AutoML NLP parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
enable_long_range_text | Whether to enable long range text handling | boolean | True | True | |
precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 16 | True | ['32', '16'] |
MLFlow Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
enable_full_determinism | Ensure reproducible behavior during distributed training | string | false | True | ['true', 'false'] |
evaluation_strategy | The evaluation strategy to adopt during training | string | epoch | True | ['epoch', 'steps'] |
evaluation_steps_interval | Evaluation interval expressed as a fraction of the steps in one epoch. Overrides evaluation_steps if not 0. | number | 0.0 | True | |
evaluation_steps | Number of update steps between two evals if evaluation_strategy='steps' | integer | 500 | True | |
logging_strategy | The logging strategy to adopt during training. | string | steps | True | ['epoch', 'steps'] |
logging_steps | Number of update steps between two logs if logging_strategy='steps' | integer | 500 | True | |
primary_metric | Specify the metric to use to compare two different models | string | accuracy | True | ['loss', 'f1_macro', 'mcc', 'accuracy', 'precision_macro', 'recall_macro'] |
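The relationship between `evaluation_steps_interval` and `evaluation_steps` can be sketched as follows. This is only an illustration: the function name is hypothetical, and rounding down with a minimum of one step is an assumption, since the table does not state how the component rounds.

```python
import math


def resolve_evaluation_steps(steps_per_epoch: int,
                             evaluation_steps_interval: float = 0.0,
                             evaluation_steps: int = 500) -> int:
    """Number of update steps between two evaluations when evaluation_strategy='steps'."""
    if evaluation_steps_interval > 0:
        # A non-zero fraction takes precedence over evaluation_steps.
        # Assumption: round down, but never go below one step.
        return max(1, math.floor(steps_per_epoch * evaluation_steps_interval))
    return evaluation_steps


# Evaluate four times per epoch when an epoch has 1000 update steps.
print(resolve_evaluation_steps(steps_per_epoch=1000, evaluation_steps_interval=0.25))
```

Leaving `evaluation_steps_interval` at its default of 0.0 keeps the fixed `evaluation_steps` value.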
Deepspeed Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_deepspeed | If set to true, will enable deepspeed for training | string | true | True | ['true', 'false'] |
ORT Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_ort | If set to true, will use ONNX Runtime training | string | true | True | ['true', 'false'] |
deepspeed_config | Deepspeed config to be used for finetuning | uri_file | | True | |
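Because several inputs are strings restricted to an enum, a quick local check before submitting a job can catch typos early. The sketch below copies the enum values and required inputs from the tables above; the helper `check_inputs` and the `inputs` dictionary layout are hypothetical and not part of the component or the Azure ML SDK.

```python
from typing import Any, Dict, List

# Enum values as listed in the input tables. Note they are all strings,
# including precision ('32' / '16') and the 'true' / 'false' flags.
ALLOWED_VALUES = {
    "learning_rate_scheduler": {"linear", "cosine", "cosine_with_restarts",
                                "polynomial", "constant", "constant_with_warmup"},
    "precision": {"32", "16"},
    "enable_full_determinism": {"true", "false"},
    "evaluation_strategy": {"epoch", "steps"},
    "logging_strategy": {"epoch", "steps"},
    "primary_metric": {"loss", "f1_macro", "mcc", "accuracy",
                       "precision_macro", "recall_macro"},
    "apply_deepspeed": {"true", "false"},
    "apply_ort": {"true", "false"},
}

# Inputs whose Optional column is False in the tables.
REQUIRED_INPUTS = ("compute_model_import", "compute_preprocess", "compute_finetune",
                   "label_column_name", "training_data", "validation_data")


def check_inputs(inputs: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for name in REQUIRED_INPUTS:
        if name not in inputs:
            problems.append(f"missing required input: {name}")
    for name, allowed in ALLOWED_VALUES.items():
        if name in inputs and inputs[name] not in allowed:
            problems.append(f"{name}={inputs[name]!r} is not one of {sorted(allowed)}")
    return problems


example = {
    "compute_model_import": "FT-Cluster",
    "compute_preprocess": "FT-Cluster",
    "compute_finetune": "FT-Cluster",
    "label_column_name": "label",            # placeholder column name
    "training_data": "train.jsonl",          # placeholder path
    "validation_data": "validation.jsonl",   # placeholder path
    "precision": 16,                          # wrong type: the enum expects the string '16'
}
for problem in check_inputs(example):
    print(problem)
```

Passing `precision` as the integer `16` is reported because the component declares the input as a string enum.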
Outputs
Name | Description | Type |
---|---|---|
pytorch_model_folder_finetune | Output dir to save the finetune model and other metadata | uri_folder |
mlflow_model_folder_finetune | Output dir to save the finetune model as mlflow model | mlflow_model |