components ft_nlp_common_validation - Azure/azureml-assets GitHub Wiki
Component that validates the finetune job against the Validation Service.
Version: 0.0.65
View in Studio: https://ml.azure.com/registries/azureml/components/ft_nlp_common_validation/version/0.0.65
Inputs
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
mlflow_model_path | MLflow model asset path. Special characters like \ and ' are invalid in the parameter value. | mlflow_model | | True | |
compute_finetune | Compute to use for the finetune component, e.g., provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and that cluster is used. | string | serverless | True | |
compute_model_import | Compute to use for the model import component, e.g., provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and that cluster is used. | string | serverless | True | |
compute_data_preprocess | Compute to use for the data preprocess component, e.g., provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and that cluster is used. | string | serverless | True | |
task_name | Finetuning task type | string | SingleLabelClassification | True | |
num_nodes_finetune | Number of nodes to use for finetuning (used for distributed training) | integer | 1 | True | |
number_of_gpu_to_use_finetuning | Number of GPUs to use for finetuning | integer | 1 | True | |
train_file_path | Path to the registered training data asset. The supported data formats are jsonl, json, csv, tsv, and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
validation_file_path | Path to the registered validation data asset. The supported data formats are jsonl, json, csv, tsv, and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
test_file_path | Path to the registered test data asset. The supported data formats are jsonl, json, csv, tsv, and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
train_mltable_path | Path to the registered training data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
validation_mltable_path | Path to the registered validation data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
test_mltable_path | Path to the registered test data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
user_column_names | Comma-separated list of column names to be used for training | string | | True | |
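
Several of the inputs above state that \ and ' are invalid in the parameter value, and the `*_file_path` inputs list the supported data formats. The following is a minimal local pre-check sketch of those two rules, not the component's own validation logic. The helper names `find_invalid_chars` and `has_supported_format` are hypothetical, and checking the format by file extension is an assumption.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Characters the parameter descriptions call out as invalid.
INVALID_CHARS = ("\\", "'")

# Data formats listed for the *_file_path inputs.
SUPPORTED_FORMATS = {"jsonl", "json", "csv", "tsv", "parquet"}


def find_invalid_chars(value: str) -> list[str]:
    """Return the invalid characters that appear in a parameter value."""
    return [ch for ch in INVALID_CHARS if ch in value]


def has_supported_format(path: str) -> bool:
    """Check a path's file extension against the supported formats.

    Assumption: the format can be inferred from the extension.
    """
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    return suffix in SUPPORTED_FORMATS


if __name__ == "__main__":
    print(find_invalid_chars("FT-Cluster"))
    print(find_invalid_chars("my'cluster"))
    print(has_supported_format("data/train.jsonl"))
```

Running such a check before submitting a job can surface a bad compute name or data path early, though the Validation Service remains the authority on what is accepted.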
Validation parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
system_properties | Validation parameters propagated from pipeline. | string | | True | |
num_train_epochs | Number of training epochs | integer | | True | |
max_steps | Maximum number of training steps | integer | | True | |
per_device_train_batch_size | Batch size per GPU/CPU for training | integer | | True | |
per_device_eval_batch_size | Batch size per GPU/CPU for evaluation | integer | | True | |
auto_find_batch_size | If set to true, will enable auto_find_batch_size for training | string | false | True | ['true', 'false'] |
learning_rate | Learning rate for optimizer | number | | True | |
adam_beta1 | Beta1 hyperparameter for the Adam optimizer | number | | True | |
adam_beta2 | Beta2 hyperparameter for the Adam optimizer | number | | True | |
adam_epsilon | Epsilon hyperparameter for the Adam optimizer | number | | True | |
apply_deepspeed | If set to true, will enable deepspeed for training | string | false | True | ['true', 'false'] |
precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 32 | True | ['32', '16'] |
apply_lora | If set to true, will enable LoRA for training | string | false | True | ['true', 'false'] |
apply_ort | If set to true, will use ONNX Runtime training | string | false | True | ['true', 'false'] |
deepspeed_stage | Selects which default DeepSpeed config to use: stage 2 or stage 3. The default is stage 2. This parameter applies only when the user does not pass any config information via the deepspeed port. | string | 2 | True | ['2', '3'] |
ignore_mismatched_sizes | If set to false, an error is raised when some of the weights from the checkpoint do not have the same size as the weights of the model. | string | true | True | ['true', 'false'] |
max_seq_length | Maximum length to use when the pad_to_max_length parameter is set to true. The default, -1, pads up to the model's max length; any other value pads to max_seq_length. | integer | -1 | True | |
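
The flag-style parameters above are typed as strings with a fixed set of allowed values, and max_seq_length uses -1 to mean the model's max length. The sketch below illustrates both rules under stated assumptions: the allowed values are copied from the Enum column, while the helper names `check_enum_params` and `padding_length` and the `model_max_length` argument are hypothetical and do not come from the component.

```python
# Allowed values copied from the Enum column of the table above.
ENUM_PARAMS = {
    "auto_find_batch_size": {"true", "false"},
    "apply_deepspeed": {"true", "false"},
    "precision": {"32", "16"},
    "apply_lora": {"true", "false"},
    "apply_ort": {"true", "false"},
    "deepspeed_stage": {"2", "3"},
    "ignore_mismatched_sizes": {"true", "false"},
}


def check_enum_params(params: dict) -> dict:
    """Return {name: value} for each enum parameter with a disallowed value.

    Parameters that are not enum-typed are ignored.
    """
    return {
        name: value
        for name, value in params.items()
        if name in ENUM_PARAMS and value not in ENUM_PARAMS[name]
    }


def padding_length(max_seq_length: int, model_max_length: int) -> int:
    """Resolve the padding target: -1 falls back to the model's max length."""
    return model_max_length if max_seq_length == -1 else max_seq_length


if __name__ == "__main__":
    print(check_enum_params({"precision": "16", "apply_lora": "yes"}))
    print(padding_length(-1, 512))
```

Because the values are compared as strings, passing a Python boolean such as `True` instead of the string `"true"` is reported as a disallowed value, which matches the string type declared for these parameters.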
Outputs
Name | Description | Type |
---|---|---|
validation_info | Validation status. | uri_file |
Environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/81