components nlp_multilabel_datapreprocessing - Azure/azureml-assets GitHub Wiki

DataPreProcessing for AutoMLNLPMultilabel

nlp_multilabel_datapreprocessing

Overview

Component that preprocesses data for the AutoML NLP multilabel classification task.

Version: 0.0.2

View in Studio: https://ml.azure.com/registries/azureml/components/nlp_multilabel_datapreprocessing/version/0.0.2

Inputs

Sequence Classification task arguments

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| label_column_name | label column name | string | | False | |
| batch_size | Number of examples to batch before calling the tokenization function | integer | 32 | True | |
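
The role of batch_size can be illustrated with a minimal sketch: examples are grouped into lists of up to batch_size items, and the tokenization function is called once per group. This is not the component's actual implementation. The helper names `batched`, `toy_tokenize`, and `tokenize_in_batches` are hypothetical, and the whitespace tokenizer is only a stand-in for a real one.

```python
from typing import Callable, Iterable, Iterator, List


def batched(examples: Iterable[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield lists of up to batch_size examples (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    batch: List[str] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The final batch may hold fewer than batch_size examples.
        yield batch


def toy_tokenize(batch: List[str]) -> List[List[str]]:
    """Stand-in tokenizer (assumption): splits each text on whitespace."""
    return [text.split() for text in batch]


def tokenize_in_batches(
    examples: Iterable[str],
    batch_size: int = 32,
    tokenize: Callable[[List[str]], List[List[str]]] = toy_tokenize,
) -> List[List[str]]:
    """Call the tokenization function once per batch and collect the results."""
    tokenized: List[List[str]] = []
    for batch in batched(examples, batch_size):
        tokenized.extend(tokenize(batch))
    return tokenized
```

A larger batch_size means fewer calls to the tokenization function, at the cost of holding more examples in memory per call.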

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| train_file_path | Enter the train file path | uri_file | | False | |
| valid_file_path | Enter the validation file path | uri_file | | False | |

Dataset parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| model_selector_output | output folder of model selector containing model metadata like config, checkpoints, tokenizer config | uri_folder | | False | |

AutoML NLP parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| enable_long_range_text | label key name | boolean | True | True | |
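
As a quick way to see which inputs must be supplied and which fall back to defaults, the tables above can be mirrored in a small local check. This is a sketch only: `COMPONENT_INPUTS` and `resolve_inputs` are hypothetical names, not part of any Azure ML SDK, and the paths in the usage note are placeholders.

```python
from typing import Any, Dict

# Names, types, defaults and optionality copied from the input tables above.
COMPONENT_INPUTS: Dict[str, Dict[str, Any]] = {
    "label_column_name": {"type": "string", "default": None, "optional": False},
    "batch_size": {"type": "integer", "default": 32, "optional": True},
    "train_file_path": {"type": "uri_file", "default": None, "optional": False},
    "valid_file_path": {"type": "uri_file", "default": None, "optional": False},
    "model_selector_output": {"type": "uri_folder", "default": None, "optional": False},
    "enable_long_range_text": {"type": "boolean", "default": True, "optional": True},
}


def resolve_inputs(user_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Check required inputs and fill in documented defaults (hypothetical helper)."""
    unknown = sorted(set(user_inputs) - set(COMPONENT_INPUTS))
    if unknown:
        raise ValueError(f"unknown inputs: {unknown}")
    missing = sorted(
        name
        for name, spec in COMPONENT_INPUTS.items()
        if not spec["optional"] and name not in user_inputs
    )
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    # Start from the defaults of the optional inputs, then apply user values.
    resolved = {
        name: spec["default"]
        for name, spec in COMPONENT_INPUTS.items()
        if spec["optional"]
    }
    resolved.update(user_inputs)
    return resolved
```

For example, calling `resolve_inputs` with only `label_column_name`, `train_file_path`, `valid_file_path`, and `model_selector_output` set returns those values plus `batch_size=32` and `enable_long_range_text=True`.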

Outputs

| Name | Description | Type |
| --- | --- | --- |
| output_dir | folder to store preprocessed outputs of input data | uri_folder |

Environment

azureml://registries/azureml/environments/acft-hf-nlp-gpu/labels/latest
