components nlp_textclassification_multilabel - Azure/azureml-assets GitHub Wiki
Pipeline component for AutoML NLP Multilabel Text classification
Version: 0.0.9
View in Studio: https://ml.azure.com/registries/azureml/components/nlp_textclassification_multilabel/version/0.0.9
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| compute_model_import | Compute to be used for model_selector, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| compute_preprocess | Compute to be used for preprocess, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| compute_finetune | Compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| num_nodes_finetune | Number of nodes to be used for finetuning (used for distributed training) | integer | 1 | True | |
| process_count_per_instance_finetune | Number of GPUs to be used per node for finetuning; should equal the number of GPUs per node in the compute SKU used for finetune | integer | 1 | True | |
| model_name | Model id used to load the model checkpoint. | string | bert-base-uncased | | |
Data PreProcess parameters (See docs to learn more)
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| label_column_name | Label key name | string | | False | |
Dataset parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| training_data | Path to the training file | uri_file | | False | |
| validation_data | Path to the validation file | uri_file | | False | |
Training parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| training_batch_size | Train batch size | integer | 32 | True | |
| validation_batch_size | Validation batch size | integer | 32 | True | |
| number_of_epochs | Number of epochs to train | integer | 3 | True | |
| gradient_accumulation_steps | Gradient accumulation steps | integer | 1 | True | |
| learning_rate | Start learning rate. Defaults to linear scheduler. | number | 5e-05 | True | |
| warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate | integer | 0 | True | |
| weight_decay | The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | |
| learning_rate_scheduler | The scheduler type to use | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
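
The interaction between `learning_rate`, `warmup_steps`, and the default `linear` scheduler can be sketched as below. This is a minimal illustration that assumes the common linear-warmup-then-linear-decay convention; the function name `linear_schedule_lr` and the `total_steps` argument are hypothetical and not part of the component, and the component's actual scheduler implementation may differ.

```python
def linear_schedule_lr(step, total_steps, learning_rate=5e-05, warmup_steps=0):
    """Learning rate at an optimizer step for a linear schedule with warmup.

    Assumption: warmup ramps linearly from 0 to learning_rate, after which
    the rate decays linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to learning_rate.
        return learning_rate * step / max(1, warmup_steps)
    # Decay phase: fall linearly from learning_rate down to 0.
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the schedule at a few steps for a 100-step run with 10 warmup steps.
    for s in (0, 5, 10, 55, 100):
        print(s, linear_schedule_lr(s, total_steps=100, warmup_steps=10))
```

With the default `warmup_steps` of 0 the warmup branch is never taken, so training starts at the full `learning_rate`.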
AutoML NLP parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| enable_long_range_text | Enable long range text handling | boolean | True | True | |
| precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 16 | True | ['32', '16'] |
MLFlow Parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| enable_full_determinism | Ensure reproducible behavior during distributed training | string | false | True | ['true', 'false'] |
| evaluation_strategy | The evaluation strategy to adopt during training | string | epoch | True | ['epoch', 'steps'] |
| evaluation_steps_interval | Evaluation interval expressed as a fraction of the steps in one epoch. Overrides evaluation_steps if not 0. | number | 0.0 | True | |
| evaluation_steps | Number of update steps between two evals if evaluation_strategy='steps' | integer | 500 | True | |
| logging_strategy | The logging strategy to adopt during training. | string | steps | True | ['epoch', 'steps'] |
| logging_steps | Number of update steps between two logs if logging_strategy='steps' | integer | 500 | True | |
| primary_metric | Specify the metric to use to compare two different models | string | accuracy | True | ['loss', 'f1_macro', 'mcc', 'accuracy', 'precision_macro', 'recall_macro'] |
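
One way to read the relationship between `evaluation_steps_interval` and `evaluation_steps` is sketched below. The helper `resolve_evaluation_steps`, the `num_train_examples` and `world_size` arguments, and the rounding rule are all assumptions made for illustration; the component's own conversion is not documented here and may differ.

```python
import math


def resolve_evaluation_steps(num_train_examples, training_batch_size=32,
                             gradient_accumulation_steps=1, world_size=1,
                             evaluation_steps_interval=0.0, evaluation_steps=500):
    """Return the number of update steps between two evaluations.

    Assumption: one update step consumes
    training_batch_size * gradient_accumulation_steps * world_size examples,
    where world_size stands for nodes * processes per node.
    """
    examples_per_update = (training_batch_size
                           * gradient_accumulation_steps
                           * world_size)
    steps_per_epoch = math.ceil(num_train_examples / examples_per_update)
    if evaluation_steps_interval > 0:
        # A non-zero fraction takes precedence over evaluation_steps.
        # Assumed rounding: truncate, but evaluate at least every step.
        return max(1, int(steps_per_epoch * evaluation_steps_interval))
    # A zero interval leaves the explicit evaluation_steps in effect.
    return evaluation_steps


if __name__ == "__main__":
    # 6400 examples at batch size 32 give 200 steps per epoch.
    print(resolve_evaluation_steps(6400, evaluation_steps_interval=0.25))
    print(resolve_evaluation_steps(6400))
```

Either value only matters when `evaluation_strategy` is `steps`; with the default `epoch` strategy, evaluation runs once per epoch.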
Deepspeed Parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| apply_deepspeed | If set to true, enables DeepSpeed for training | string | true | True | ['true', 'false'] |
ORT Parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| apply_ort | If set to true, uses ONNX Runtime (ORT) training | string | true | True | ['true', 'false'] |
| deepspeed_config | DeepSpeed config to be used for finetuning | uri_file | | True | |
Outputs
| Name | Description | Type |
|---|---|---|
| pytorch_model_folder_finetune | Output dir to save the finetune model and other metadata | uri_folder |
| mlflow_model_folder_finetune | Output dir to save the finetune model as mlflow model | mlflow_model |
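
Before submitting a job, the inputs can be checked against the enums and required fields listed in the tables above. The sketch below is a hypothetical helper, not part of the component or the Azure ML SDK; the names `ENUMS`, `REQUIRED_INPUTS`, and `find_input_problems` are illustrative, and the required list assumes that the inputs marked `False` in the tables are non-optional.

```python
# Allowed values copied from the Enum column of the input tables.
ENUMS = {
    "learning_rate_scheduler": ["linear", "cosine", "cosine_with_restarts",
                                "polynomial", "constant",
                                "constant_with_warmup"],
    "precision": ["32", "16"],
    "enable_full_determinism": ["true", "false"],
    "evaluation_strategy": ["epoch", "steps"],
    "logging_strategy": ["epoch", "steps"],
    "primary_metric": ["loss", "f1_macro", "mcc", "accuracy",
                       "precision_macro", "recall_macro"],
    "apply_deepspeed": ["true", "false"],
    "apply_ort": ["true", "false"],
}

# Assumption: inputs marked False in the tables are required.
REQUIRED_INPUTS = [
    "compute_model_import", "compute_preprocess", "compute_finetune",
    "label_column_name", "training_data", "validation_data",
]


def find_input_problems(inputs):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for name in REQUIRED_INPUTS:
        if name not in inputs:
            problems.append(f"missing required input: {name}")
    for name, allowed in ENUMS.items():
        if name in inputs and inputs[name] not in allowed:
            problems.append(
                f"{name}={inputs[name]!r} is not one of {allowed}")
    return problems


if __name__ == "__main__":
    example = {name: "FT-Cluster" for name in REQUIRED_INPUTS}
    example["primary_metric"] = "auc"  # not in the documented enum
    for problem in find_input_problems(example):
        print(problem)
```

Note that the enum-valued inputs are typed as strings, so `precision` is expected as `"16"` or `"32"` and the boolean-like flags as `"true"` or `"false"`.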