components nlp_multilabel_datapreprocessing - Azure/azureml-assets GitHub Wiki
Component that preprocesses data for the AutoML NLP multilabel classification task.
Version: 0.0.2
View in Studio: https://ml.azure.com/registries/azureml/components/nlp_multilabel_datapreprocessing/version/0.0.2
Sequence Classification task arguments
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
label_column_name | label column name | string | | False | |
batch_size | Number of examples to batch before calling the tokenization function | integer | 32 | True | |
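As an illustration of what `batch_size` controls, the following is a minimal sketch of grouping examples into batches before each call to a tokenization function. It is not the component's implementation: `batched`, `toy_tokenize`, and `preprocess` are hypothetical names, and the whitespace tokenizer is only a stand-in for a real tokenizer.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(examples: Iterable[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` examples."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    batch: List[str] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final batch may be shorter than batch_size
        yield batch


def toy_tokenize(batch: List[str]) -> List[List[str]]:
    """Hypothetical stand-in for a real tokenizer: split on whitespace."""
    return [text.split() for text in batch]


def preprocess(
    examples: Iterable[str],
    tokenize: Callable[[List[str]], List[List[str]]] = toy_tokenize,
    batch_size: int = 32,
) -> Tuple[List[List[str]], int]:
    """Tokenize examples batch by batch; also return the number of tokenizer calls."""
    tokenized: List[List[str]] = []
    calls = 0
    for batch in batched(examples, batch_size):
        tokenized.extend(tokenize(batch))  # one tokenizer call per batch
        calls += 1
    return tokenized, calls
```

With five examples and `batch_size=2`, the tokenization function in this sketch is called three times: twice with full batches and once with the single remaining example.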
Inputs
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
train_file_path | Enter the train file path | uri_file | | False | |
valid_file_path | Enter the validation file path | uri_file | | False | |
Dataset parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
model_selector_output | Output folder of the model selector, containing model metadata such as config, checkpoints, and tokenizer config | uri_folder | | False | |
AutoML NLP parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
enable_long_range_text | label key name | boolean | True | True | |
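The input tables above can be summarized as a small validation helper. This is only a sketch under stated assumptions: `ParamSpec`, `INPUT_SPECS`, and `resolve_inputs` are hypothetical names, not part of any Azure ML SDK, and the table rows are read as having a default of 32 for `batch_size`, a default of True for `enable_long_range_text`, and no default for the four required inputs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ParamSpec:
    """One row of the input tables (hypothetical helper type)."""
    type: str
    optional: bool
    default: Optional[Any] = None


# Assumed reading of the tables: required inputs have no default value.
INPUT_SPECS: Dict[str, ParamSpec] = {
    "label_column_name": ParamSpec("string", optional=False),
    "batch_size": ParamSpec("integer", optional=True, default=32),
    "train_file_path": ParamSpec("uri_file", optional=False),
    "valid_file_path": ParamSpec("uri_file", optional=False),
    "model_selector_output": ParamSpec("uri_folder", optional=False),
    "enable_long_range_text": ParamSpec("boolean", optional=True, default=True),
}


def resolve_inputs(user_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Check names against the tables and fill in defaults for optional inputs."""
    unknown = sorted(set(user_inputs) - set(INPUT_SPECS))
    if unknown:
        raise ValueError(f"unknown inputs: {', '.join(unknown)}")
    missing = sorted(
        name
        for name, spec in INPUT_SPECS.items()
        if not spec.optional and name not in user_inputs
    )
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")
    # Start from the defaults of optional inputs, then apply user values.
    resolved = {
        name: spec.default for name, spec in INPUT_SPECS.items() if spec.optional
    }
    resolved.update(user_inputs)
    return resolved
```

For example, calling `resolve_inputs` with only the four required inputs (using placeholder paths of your own) returns a dictionary that also contains `batch_size` and `enable_long_range_text` set to their defaults, while omitting a required input raises a `ValueError` naming it.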
Outputs
Name | Description | Type |
---|---|---|
output_dir | folder to store preprocessed outputs of input data | uri_folder |
Environment
azureml://registries/azureml/environments/acft-hf-nlp-gpu/labels/latest