components nlp_multilabel_datapreprocessing - Azure/azureml-assets GitHub Wiki

DataPreProcessing for AutoMLNLPMultilabel

nlp_multilabel_datapreprocessing

Overview

Component that preprocesses data for the AutoML NLP multilabel classification task.

Version: 0.0.80

View in Studio: https://ml.azure.com/registries/azureml/components/nlp_multilabel_datapreprocessing/version/0.0.80

Inputs

Sequence Classification task arguments

Name	Description	Type	Default	Optional	Enum
label_column_name	Name of the column that contains the labels	string		False
batch_size	Number of examples to batch before calling the tokenization function	integer	32	True
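
The batch_size description can be illustrated with a minimal sketch: examples are grouped into chunks of at most batch_size items, and the tokenization function is called once per chunk. This is not the component's actual implementation. The helper names (iter_batches, tokenize_in_batches) and the whitespace tokenizer are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(examples: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` examples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(examples), batch_size):
        yield list(examples[start:start + batch_size])


def tokenize_in_batches(
    examples: Sequence[str],
    tokenize_fn: Callable[[List[str]], List[List[str]]],
    batch_size: int = 32,
) -> List[List[str]]:
    """Call `tokenize_fn` once per batch and concatenate the results."""
    tokenized: List[List[str]] = []
    for batch in iter_batches(examples, batch_size):
        tokenized.extend(tokenize_fn(batch))
    return tokenized


def whitespace_tokenize(batch: List[str]) -> List[List[str]]:
    """Hypothetical stand-in for a real tokenizer: split each text on whitespace."""
    return [text.split() for text in batch]


if __name__ == "__main__":
    texts = [f"example text {i}" for i in range(70)]
    # With the default of 32, 70 examples are split into three batches.
    print([len(batch) for batch in iter_batches(texts, batch_size=32)])
    print(len(tokenize_in_batches(texts, whitespace_tokenize)))
```

A larger batch_size means fewer calls to the tokenization function at the cost of more examples held in memory per call.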

Inputs

Name	Description	Type	Default	Optional	Enum
train_file_path	Path to the training file	uri_file		False
valid_file_path	Path to the validation file	uri_file		False

Dataset parameters

Name	Description	Type	Default	Optional	Enum
model_selector_output	Output folder of the model selector, containing model metadata such as the config, checkpoints, and tokenizer config	uri_folder		False

AutoML NLP parameters

Name	Description	Type	Default	Optional	Enum
enable_long_range_text	Whether to enable long range text handling	boolean	True	True
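
As a rough sketch of how the input tables above fit together, the snippet below collects the required inputs and fills in the documented defaults for the optional ones. It is plain Python and does not call the Azure ML SDK. The function name resolve_inputs and the example paths are hypothetical; only the input names and defaults come from the tables.

```python
from typing import Any, Dict

# Inputs marked Optional = False in the tables above.
REQUIRED_INPUTS = (
    "label_column_name",
    "train_file_path",
    "valid_file_path",
    "model_selector_output",
)

# Inputs marked Optional = True, with the defaults listed in the tables above.
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "batch_size": 32,
    "enable_long_range_text": True,
}


def resolve_inputs(user_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Check required inputs and apply defaults for omitted optional ones."""
    known = set(REQUIRED_INPUTS) | set(OPTIONAL_DEFAULTS)
    unknown = sorted(set(user_inputs) - known)
    if unknown:
        raise ValueError(f"Unknown inputs: {unknown}")
    missing = [name for name in REQUIRED_INPUTS if name not in user_inputs]
    if missing:
        raise ValueError(f"Missing required inputs: {missing}")
    # User-supplied values take precedence over the defaults.
    return {**OPTIONAL_DEFAULTS, **user_inputs}


if __name__ == "__main__":
    # The paths below are placeholders, not real locations.
    resolved = resolve_inputs(
        {
            "label_column_name": "labels",
            "train_file_path": "./data/train.csv",
            "valid_file_path": "./data/valid.csv",
            "model_selector_output": "./model_selector_output",
        }
    )
    print(resolved["batch_size"], resolved["enable_long_range_text"])
```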

Outputs

Name	Description	Type
output_dir	Folder in which the preprocessed data is stored	uri_folder

Environment

azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/109
