# nlp_textclassification_ner

## Overview

Pipeline component for AutoML NLP NER (named entity recognition).

Version: 0.0.2

View in Studio: https://ml.azure.com/registries/azureml/components/nlp_textclassification_ner/version/0.0.2

## Inputs

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| compute_model_import | Compute to be used for model_selector, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| compute_preprocess | Compute to be used for preprocess, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| compute_finetune | Compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| compute_test_model | Compute to be used for test_model, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| num_nodes_finetune | Number of nodes to be used for finetuning (used for distributed training) | integer | 1 | True | |
| process_count_per_instance_finetune | Number of GPUs to be used per node for finetuning; should equal the number of GPUs per node in the compute SKU used for finetune | integer | 1 | True | |
| model_name | Model id used to load the model checkpoint | string | bert-base-uncased | | |
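
The snippet below is a minimal sketch of fetching this component from the `azureml` registry with the Azure ML Python SDK v2 (`azure-ai-ml`). The credential setup is an assumption; the component name and version come from the Studio link above.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the shared `azureml` registry that hosts this component
# (authentication via DefaultAzureCredential is an assumption).
registry_client = MLClient(
    credential=DefaultAzureCredential(), registry_name="azureml"
)

# Fetch the pipeline component by the name and version documented above.
ner_component = registry_client.components.get(
    name="nlp_textclassification_ner", version="0.0.2"
)
print(ner_component.display_name)
```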

### Dataset parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| training_data | Enter the train file path | uri_file | | False | |
| validation_data | Enter the validation file path | uri_file | | False | |
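
Both dataset inputs are `uri_file` references. A minimal sketch of wiring them up with the SDK v2 `Input` type follows; the datastore paths are hypothetical, and the assumption that AutoML NLP NER consumes CoNLL-style token/label text files should be verified against the AutoML documentation.

```python
from azure.ai.ml import Input

# Hypothetical datastore paths -- replace with your own train/validation files.
# (Assumption: AutoML NLP NER expects CoNLL-style .txt files, i.e. one
# "token label" pair per line with blank lines separating sentences.)
training_data = Input(
    type="uri_file",
    path="azureml://datastores/workspaceblobstore/paths/ner/train.txt",
)
validation_data = Input(
    type="uri_file",
    path="azureml://datastores/workspaceblobstore/paths/ner/valid.txt",
)
```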

### Training parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| training_batch_size | Train batch size | integer | 32 | True | |
| validation_batch_size | Validation batch size | integer | 32 | True | |
| number_of_epochs | Number of epochs to train | integer | 3 | True | |
| gradient_accumulation_steps | Number of update steps to accumulate gradients over before performing a backward/update pass | integer | 1 | True | |
| learning_rate | Starting learning rate, decayed by the learning rate scheduler (linear by default) | number | 5e-05 | True | |
| warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate | integer | 0 | True | |
| weight_decay | The weight decay to apply (if not zero) to all layers except bias and LayerNorm weights in the AdamW optimizer | number | 0.0 | True | |
| learning_rate_scheduler | The scheduler type to use | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
| precision | Apply mixed precision training, which can reduce the memory footprint by performing operations in half precision | string | 16 | True | ['32', '16'] |
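
If `training_batch_size` is a per-device micro-batch (the Hugging Face trainer convention, assumed here), it combines with `gradient_accumulation_steps` and the distributed-training inputs to determine the global batch size used for each optimizer step. The arithmetic below is the standard calculation, shown for illustration; it is not the component's internal code.

```python
# Standard effective-batch-size arithmetic for distributed training with
# gradient accumulation (illustrative only; not the component's internals).
training_batch_size = 32                  # per-GPU micro-batch (default)
gradient_accumulation_steps = 1           # default
process_count_per_instance_finetune = 1   # GPUs per node (default)
num_nodes_finetune = 1                    # default

effective_batch_size = (
    training_batch_size
    * gradient_accumulation_steps
    * process_count_per_instance_finetune
    * num_nodes_finetune
)
print(effective_batch_size)  # 32 with the defaults above
```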

### MLFlow Parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| enable_full_determinism | Ensure reproducible behavior during distributed training | string | false | True | ['true', 'false'] |
| evaluation_strategy | The evaluation strategy to adopt during training | string | epoch | True | ['epoch', 'steps'] |
| evaluation_steps_interval | The evaluation interval, expressed as a fraction of an epoch in steps. Overwrites evaluation_steps if not 0. | number | 0.0 | True | |
| evaluation_steps | Number of update steps between two evaluations if evaluation_strategy='steps' | integer | 500 | True | |
| logging_strategy | The logging strategy to adopt during training | string | steps | True | ['epoch', 'steps'] |
| logging_steps | Number of update steps between two logs if logging_strategy='steps' | integer | 500 | True | |
| primary_metric | Specify the metric to use to compare two different models | string | accuracy | True | ['loss', 'f1_macro', 'mcc', 'accuracy', 'precision_macro', 'recall_macro'] |
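
To make the `evaluation_steps_interval` override concrete: a non-zero interval is a fraction of an epoch, converted to optimizer steps, and the result replaces `evaluation_steps`. The sketch below shows one plausible reading of that conversion; the exact rounding the component uses internally is an assumption.

```python
import math

num_train_examples = 10_000        # hypothetical dataset size
global_batch_size = 32             # see the effective-batch-size sketch above
evaluation_steps_interval = 0.5    # evaluate twice per epoch
evaluation_steps = 500             # default; ignored when interval is non-zero

steps_per_epoch = math.ceil(num_train_examples / global_batch_size)  # 313
if evaluation_steps_interval != 0.0:
    # Assumed rounding; the component's internal behavior may differ.
    evaluation_steps = max(1, int(steps_per_epoch * evaluation_steps_interval))
print(evaluation_steps)  # 156 for this example
```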

### Deepspeed Parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| apply_deepspeed | If set to true, enables DeepSpeed for training | string | true | True | ['true', 'false'] |
| deepspeed_config | Deepspeed config to be used for finetuning | uri_file | | True | |

### ORT Parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| apply_ort | If set to true, uses ONNX Runtime training | string | true | True | ['true', 'false'] |
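
The `deepspeed_config` input above takes a JSON file. The snippet below writes a minimal configuration for illustration; the specific fields shown (fp16, ZeRO stage 1, and the "auto" values that defer to the trainer) follow common DeepSpeed/Hugging Face usage and are assumptions rather than this component's documented schema.

```python
import json

# A minimal, illustrative DeepSpeed config. The "auto" values defer to the
# trainer (a Hugging Face integration convention); whether this component
# honors them is an assumption.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```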

## Outputs

| Name | Description | Type |
| ---- | ----------- | ---- |
| pytorch_model_folder_finetune | Output directory to save the finetuned model and other metadata | uri_folder |
| mlflow_model_folder_finetune | Output directory to save the finetuned model as an MLflow model | mlflow_model |
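
Putting the pieces together, the sketch below wires the component into a pipeline job, submits it, and surfaces the two outputs listed above. Workspace identifiers, compute names, and data paths are placeholders/assumptions; see the registry-fetch and dataset snippets earlier on this page.

```python
from azure.ai.ml import Input, MLClient
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

# Registry client to fetch the component; workspace client to run the job.
registry_client = MLClient(credential=credential, registry_name="azureml")
workspace_client = MLClient(
    credential=credential,
    subscription_id="<SUBSCRIPTION_ID>",      # placeholder
    resource_group_name="<RESOURCE_GROUP>",   # placeholder
    workspace_name="<WORKSPACE>",             # placeholder
)

ner_component = registry_client.components.get(
    name="nlp_textclassification_ner", version="0.0.2"
)

@pipeline()
def automl_nlp_ner_pipeline(train_file, valid_file):
    ner_job = ner_component(
        compute_model_import="FT-Cluster",  # assumption: a cluster by this name
        compute_preprocess="FT-Cluster",
        compute_finetune="FT-Cluster",
        compute_test_model="FT-Cluster",
        training_data=train_file,
        validation_data=valid_file,
        model_name="bert-base-uncased",
    )
    # Map the component outputs documented above to pipeline outputs.
    return {
        "pytorch_model_folder": ner_job.outputs.pytorch_model_folder_finetune,
        "mlflow_model_folder": ner_job.outputs.mlflow_model_folder_finetune,
    }

pipeline_job = automl_nlp_ner_pipeline(
    train_file=Input(
        type="uri_file",
        path="azureml://datastores/workspaceblobstore/paths/ner/train.txt",
    ),
    valid_file=Input(
        type="uri_file",
        path="azureml://datastores/workspaceblobstore/paths/ner/valid.txt",
    ),
)
submitted = workspace_client.jobs.create_or_update(
    pipeline_job, experiment_name="nlp-ner-demo"
)
print(submitted.studio_url)
```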