components ft_nlp_common_validation - Azure/azureml-assets GitHub Wiki

Common Validation Component

ft_nlp_common_validation

Overview

Validates the finetune job against the Validation Service.

Version: 0.0.80

View in Studio: https://ml.azure.com/registries/azureml/components/ft_nlp_common_validation/version/0.0.80

Inputs

Name	Description	Type	Default	Optional	Enum
mlflow_model_path	MLflow model asset path. Special characters like \ and ' are invalid in the parameter value.	mlflow_model		True
compute_finetune	Compute to be used for the finetune component, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and that cluster is used.	string	serverless	True
compute_model_import	Compute to be used for the model import component, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and that cluster is used.	string	serverless	True
compute_data_preprocess	Compute to be used for the data preprocess component, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and that cluster is used.	string	serverless	True
task_name	Finetuning task type	string	SingleLabelClassification	True
num_nodes_finetune	Number of nodes to be used for finetuning (used for distributed training)	integer	1	True
number_of_gpu_to_use_finetuning	Number of GPUs to use for finetuning	integer	1	True
train_file_path	Path to the registered training data asset. The supported data formats are `jsonl`, `json`, `csv`, `tsv` and `parquet`. Special characters like \ and ' are invalid in the parameter value.	uri_file		True
validation_file_path	Path to the registered validation data asset. The supported data formats are `jsonl`, `json`, `csv`, `tsv` and `parquet`. Special characters like \ and ' are invalid in the parameter value.	uri_file		True
test_file_path	Path to the registered test data asset. The supported data formats are `jsonl`, `json`, `csv`, `tsv` and `parquet`. Special characters like \ and ' are invalid in the parameter value.	uri_file		True
train_mltable_path	Path to the registered training data asset in `mltable` format. Special characters like \ and ' are invalid in the parameter value.	mltable		True
validation_mltable_path	Path to the registered validation data asset in `mltable` format. Special characters like \ and ' are invalid in the parameter value.	mltable		True
test_mltable_path	Path to the registered test data asset in `mltable` format. Special characters like \ and ' are invalid in the parameter value.	mltable		True
user_column_names	Comma separated list of column names to be used for training	string		True
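
The input descriptions above state a few constraints that can be checked locally before a job is submitted: parameter values must not contain `\` or `'`, file inputs must be in one of the five listed formats, and `user_column_names` is a comma-separated list. The following is a minimal sketch of such checks, not the component's own implementation. The function names are hypothetical, and inferring the data format from the file extension is an assumption, since this page does not say how the format is detected.

```python
from pathlib import PurePosixPath
from typing import List

# Characters the input descriptions call out as invalid in parameter values.
INVALID_CHARS = ("\\", "'")

# Data formats listed for train_file_path, validation_file_path and test_file_path.
SUPPORTED_FORMATS = {"jsonl", "json", "csv", "tsv", "parquet"}


def has_invalid_chars(value: str) -> bool:
    """Return True if the value contains a backslash or a single quote."""
    return any(ch in value for ch in INVALID_CHARS)


def is_supported_data_file(path: str) -> bool:
    """Assumption: the data format is inferred from the file extension."""
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    return suffix in SUPPORTED_FORMATS


def parse_column_names(user_column_names: str) -> List[str]:
    """Split the comma-separated user_column_names value, dropping blanks."""
    return [name.strip() for name in user_column_names.split(",") if name.strip()]
```

For example, `has_invalid_chars("FT-Cluster")` is false, while a compute name containing a single quote would be flagged.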

Validation parameters

Name	Description	Type	Default	Optional	Enum
system_properties	Validation parameters propagated from pipeline.	string		True
num_train_epochs	Number of training epochs	integer		True
max_steps	Maximum number of training steps	integer		True
per_device_train_batch_size	Batch size per GPU/CPU for training	integer		True
per_device_eval_batch_size	Batch size per GPU/CPU for evaluation	integer		True
auto_find_batch_size	If set to true, will enable auto_find_batch_size for training	string	false	True	['true', 'false']
learning_rate	Learning rate for optimizer	number		True
adam_beta1	Beta1 hyperparameter for the Adam optimizer	number		True
adam_beta2	Beta2 hyperparameter for the Adam optimizer	number		True
adam_epsilon	Epsilon hyperparameter for the Adam optimizer	number		True
apply_deepspeed	If set to true, will enable deepspeed for training	string	false	True	['true', 'false']
precision	Numeric precision used for training. Setting 16 applies mixed precision training, which can reduce memory footprint by performing operations in half precision.	string	32	True	['32', '16']
apply_lora	If set to true, will enable LoRA for training	string	false	True	['true', 'false']
apply_ort	If set to true, will use ONNX Runtime for training	string	false	True	['true', 'false']
deepspeed_stage	Selects which default DeepSpeed config to use: stage 2 or stage 3. The default is stage 2. This parameter applies only when the user does not pass any config information via the deepspeed port.	string	2	True	['2', '3']
ignore_mismatched_sizes	Not setting this flag will raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model.	string	true	True	['true', 'false']
max_seq_length	Maximum length to pad to when the pad_to_max_length parameter is set to `true`. The default of -1 pads up to the model's max length; any other value pads to `max_seq_length`.	integer	-1	True
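
Several parameters in the table above are typed as `string` but accept only the values in the Enum column, so `'true'` and `'2'` are valid where a Python `True` or `2` is not. A minimal sketch of a local pre-check follows, together with the `max_seq_length` rule as the table describes it. The helper names and the `model_max_length` argument are hypothetical, and this is not the Validation Service's own logic.

```python
# Allowed values copied from the Enum column of the table above.
ENUM_PARAMS = {
    "auto_find_batch_size": {"true", "false"},
    "apply_deepspeed": {"true", "false"},
    "precision": {"32", "16"},
    "apply_lora": {"true", "false"},
    "apply_ort": {"true", "false"},
    "deepspeed_stage": {"2", "3"},
    "ignore_mismatched_sizes": {"true", "false"},
}


def find_enum_errors(params: dict) -> list:
    """Return one message per supplied parameter whose value is outside its enum."""
    errors = []
    for name, allowed in ENUM_PARAMS.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name}={params[name]!r} not in {sorted(allowed)}")
    return errors


def resolve_pad_length(max_seq_length: int, model_max_length: int) -> int:
    """-1 means pad to the model's max length; otherwise pad to max_seq_length."""
    return model_max_length if max_seq_length == -1 else max_seq_length
```

Parameters that are omitted are skipped, since every parameter in the table is optional. How the component treats a `max_seq_length` larger than the model's max length is not stated here, so the sketch does not clamp it.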

Outputs

Name	Description	Type
validation_info	Validation status.	uri_file

Environment

azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/112
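
The environment is referenced by a registry asset URI. As an illustration of its parts, the sketch below splits a URI of the form shown above into registry, asset type, name and version. The pattern is inferred from this single example and the function name is hypothetical, so treat it as an assumption rather than a full grammar for Azure ML asset URIs.

```python
import re

# Pattern for the registry asset URI form shown above:
# azureml://registries/<registry>/<asset type>/<name>/versions/<version>
ASSET_URI = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)/(?P<asset_type>[^/]+)/"
    r"(?P<name>[^/]+)/versions/(?P<version>[^/]+)$"
)


def parse_asset_uri(uri: str) -> dict:
    """Split a registry asset URI into its named parts."""
    match = ASSET_URI.match(uri)
    if match is None:
        raise ValueError(f"not a registry asset URI: {uri}")
    return match.groupdict()
```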
