
Common Validation Component

ft_nlp_common_validation

Overview

Component that validates the finetune job against the Validation Service.

Version: 0.0.65

View in Studio: https://ml.azure.com/registries/azureml/components/ft_nlp_common_validation/version/0.0.65

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| mlflow_model_path | MLflow model asset path. Special characters like \ and ' are invalid in the parameter value. | mlflow_model | | True | |
| compute_finetune | Compute to be used for the finetune component, e.g., provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the respective cluster is used. | string | serverless | True | |
| compute_model_import | Compute to be used for the model import component, e.g., provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the respective cluster is used. | string | serverless | True | |
| compute_data_preprocess | Compute to be used for the data preprocess component, e.g., provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the respective cluster is used. | string | serverless | True | |
| task_name | Finetuning task type. | string | SingleLabelClassification | True | |
| num_nodes_finetune | Number of nodes to be used for finetuning (used for distributed training). | integer | 1 | True | |
| number_of_gpu_to_use_finetuning | Number of GPUs to use for finetuning. | integer | 1 | True | |
| train_file_path | Path to the registered training data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
| validation_file_path | Path to the registered validation data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
| test_file_path | Path to the registered test data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
| train_mltable_path | Path to the registered training data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
| validation_mltable_path | Path to the registered validation data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
| test_mltable_path | Path to the registered test data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
| user_column_names | Comma-separated list of column names to be used for training. | string | | True | |
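
Several input descriptions above state that \ and ' are invalid in the parameter value. The following is a minimal sketch of a local check that could be run before submitting a job. It assumes nothing about how the component or the Validation Service enforces this rule; `find_invalid_values` and the `example` dictionary are hypothetical and exist only for illustration.

```python
# Hypothetical pre-submission check; not part of ft_nlp_common_validation.
INVALID_CHARS = ("\\", "'")  # the two characters the input descriptions call out


def find_invalid_values(params: dict) -> dict:
    """Map each parameter name to the invalid characters found in its string value."""
    problems = {}
    for name, value in params.items():
        if not isinstance(value, str):
            continue  # only string-valued parameters can contain these characters
        found = [ch for ch in INVALID_CHARS if ch in value]
        if found:
            problems[name] = found
    return problems


# Illustrative values only; the parameter names come from the Inputs table.
example = {
    "compute_finetune": "FT-Cluster",
    "compute_model_import": "FT'Cluster",  # contains a single quote
    "num_nodes_finetune": 1,
}
print(find_invalid_values(example))
```

An empty result means none of the supplied string values contain either character.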

Validation parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| system_properties | Validation parameters propagated from the pipeline. | string | | True | |
| num_train_epochs | Number of training epochs. | integer | | True | |
| max_steps | Maximum number of training steps. | integer | | True | |
| per_device_train_batch_size | Batch size per GPU/CPU for training. | integer | | True | |
| per_device_eval_batch_size | Batch size per GPU/CPU for evaluation. | integer | | True | |
| auto_find_batch_size | If set to true, enables auto_find_batch_size for training. | string | false | True | ['true', 'false'] |
| learning_rate | Learning rate for the optimizer. | number | | True | |
| adam_beta1 | Beta1 hyperparameter for the Adam optimizer. | number | | True | |
| adam_beta2 | Beta2 hyperparameter for the Adam optimizer. | number | | True | |
| adam_epsilon | Epsilon hyperparameter for the Adam optimizer. | number | | True | |
| apply_deepspeed | If set to true, enables DeepSpeed for training. | string | false | True | ['true', 'false'] |
| precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 32 | True | ['32', '16'] |
| apply_lora | If set to true, enables LoRA. | string | false | True | ['true', 'false'] |
| apply_ort | If set to true, uses ONNX Runtime training. | string | false | True | ['true', 'false'] |
| deepspeed_stage | Selects which default DeepSpeed config is used: stage 2 or stage 3. The default is stage 2. This parameter applies only when the user does not pass any config information via the deepspeed port. | string | 2 | True | ['2', '3'] |
| ignore_mismatched_sizes | If this flag is not set, an error is raised when some of the weights from the checkpoint do not have the same size as the weights of the model. | string | true | True | ['true', 'false'] |
| max_seq_length | Controls the maximum length to use when the pad_to_max_length parameter is set to true. The default is -1, which means padding is done up to the model's max length; otherwise inputs are padded to max_seq_length. | integer | -1 | True | |
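
The Default and Enum columns above can be mirrored in a small local check. The sketch below copies the enum values and defaults from the table; `resolve_flags` is a hypothetical helper and is not a description of how the Validation Service itself behaves.

```python
# Enum-constrained parameters and their defaults, copied from the table above.
ENUMS = {
    "auto_find_batch_size": ("true", "false"),
    "apply_deepspeed": ("true", "false"),
    "precision": ("32", "16"),
    "apply_lora": ("true", "false"),
    "apply_ort": ("true", "false"),
    "deepspeed_stage": ("2", "3"),
    "ignore_mismatched_sizes": ("true", "false"),
}
DEFAULTS = {
    "auto_find_batch_size": "false",
    "apply_deepspeed": "false",
    "precision": "32",
    "apply_lora": "false",
    "apply_ort": "false",
    "deepspeed_stage": "2",
    "ignore_mismatched_sizes": "true",
}


def resolve_flags(overrides: dict) -> dict:
    """Hypothetical helper: merge overrides into the defaults, rejecting non-enum values."""
    resolved = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in ENUMS:
            raise KeyError(f"unknown enum parameter: {name}")
        if value not in ENUMS[name]:
            raise ValueError(f"{name}={value!r} is not one of {list(ENUMS[name])}")
        resolved[name] = value
    return resolved


# Illustrative overrides: half precision with LoRA enabled.
flags = resolve_flags({"precision": "16", "apply_lora": "true"})
print(flags["precision"], flags["apply_lora"], flags["deepspeed_stage"])
```

All of these parameters are typed as string in the table, so the sketch expects `"16"` and `"true"` rather than `16` or `True`.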

Outputs

| Name | Description | Type |
| --- | --- | --- |
| validation_info | Validation status. | uri_file |

Environment

azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/81
