models AutoML Text Classification - Azure/azureml-assets GitHub Wiki

AutoML-Text-Classification

Overview

Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers build high-quality models more efficiently, which improves productivity and makes the work easier to scale.

AutoML Text Classification lets you classify texts into predefined groups. Your dataset should be a set of labeled texts, where each text's tag assigns it to a predefined group.

With this functionality, you can:

  • Directly use datasets coming from Azure Machine Learning data labeling.
  • Utilize labeled data to create NLP models without any training code.
  • Enhance model performance by selecting an appropriate algorithm from a large selection of models and fine-tuning its hyperparameters, or let AutoML find the best model for you.
  • Either download the resulting model or deploy it as an endpoint in Azure Machine Learning.
  • Scale the operationalization process with the help of Azure Machine Learning's MLOps and ML Pipelines capabilities.

See How to train NLP models for more information.

Training Details

Training Data

To create NLP models, it is necessary to provide labeled text data as input for model training. For text classification, the dataset can contain several text columns and exactly one label column.

Please see documentation for data preparation requirements.
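You can check the layout rule above (several text columns, exactly one label column) locally before uploading a dataset. The following is a minimal sketch that assumes the data is a CSV file with a header row. The helper name `validate_text_classification_csv` and the sample rows are hypothetical and are not part of Azure Machine Learning. The sketch treats every non-label column as a text column and checks only the column layout and non-empty labels. The official data preparation requirements may impose further constraints.

```python
import csv
import io

def validate_text_classification_csv(csv_text, label_column):
    """Hypothetical helper: check one-or-more text columns plus exactly one label column.

    Returns (text_columns, labels) or raises ValueError if the layout is invalid.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    # Exactly one label column is allowed.
    if columns.count(label_column) != 1:
        raise ValueError(f"expected exactly one label column named {label_column!r}")
    # Assumption: every other column holds text.
    text_columns = [c for c in columns if c != label_column]
    if not text_columns:
        raise ValueError("expected at least one text column")
    labels = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        label = (row[label_column] or "").strip()  # short rows yield None
        if not label:
            raise ValueError(f"row {row_number} has an empty label")
        labels.append(label)
    return text_columns, labels

# Illustrative (made-up) rows with two text columns and one label column.
sample = (
    "title,body,sentiment\n"
    "Great food,Loved every dish,positive\n"
    "Slow service,Waited an hour,negative\n"
)
text_columns, labels = validate_text_classification_csv(sample, "sentiment")
print(text_columns)  # ['title', 'body']
print(labels)        # ['positive', 'negative']
```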

Language Setting

Language selection currently defaults to English, but Automated ML supports 104 languages by leveraging language-specific and multilingual pre-trained text DNN models. See Language setting for details.

Training Procedure

You can run individual trials, or perform manual sweeps that explore multiple hyperparameter values near the more promising models and hyperparameter configurations.

For more information, see Model sweeping and hyperparameter tuning.
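A small local simulation can illustrate the idea of a sweep: broad exploration first, then trials near the most promising configuration. This is a conceptual sketch, not AutoML's sweeping implementation. The search-space keys and model names are illustrative. The `toy_score` function is a hypothetical stand-in for training one trial and measuring validation accuracy. The +/-20% learning-rate perturbation is an assumption chosen for the example.

```python
import random

# Hypothetical search space; these keys are not AutoML's actual parameter names.
SEARCH_SPACE = {
    "model_name": ["bert-base-cased", "roberta-base", "distilbert-base-uncased"],
    "learning_rate": (1e-5, 5e-5),  # sampled uniformly from [low, high]
    "num_epochs": [2, 3, 4],
}

def toy_score(config):
    """Hypothetical stand-in for 'train one trial, return validation accuracy'."""
    model_bonus = {"bert-base-cased": 0.0, "roberta-base": 0.02,
                   "distilbert-base-uncased": -0.01}
    lr_penalty = abs(config["learning_rate"] - 3e-5) * 1000  # best near 3e-5
    return 0.85 + model_bonus[config["model_name"]] - lr_penalty + 0.01 * config["num_epochs"]

def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    low, high = SEARCH_SPACE["learning_rate"]
    return {
        "model_name": rng.choice(SEARCH_SPACE["model_name"]),
        "learning_rate": rng.uniform(low, high),
        "num_epochs": rng.choice(SEARCH_SPACE["num_epochs"]),
    }

def perturb(config, rng):
    """Return a neighbour: learning rate scaled by up to +/-20%, clipped to range."""
    low, high = SEARCH_SPACE["learning_rate"]
    neighbour = dict(config)
    scaled = config["learning_rate"] * rng.uniform(0.8, 1.2)
    neighbour["learning_rate"] = min(max(scaled, low), high)
    return neighbour

def sweep(num_random, num_refine, seed=0):
    """Explore randomly, then refine around the best configuration found so far."""
    if num_random < 1:
        raise ValueError("need at least one random trial to refine around")
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    # Stage 1: broad random exploration of the search space.
    for _ in range(num_random):
        config = sample_config(rng)
        score = toy_score(config)
        if score > best_score:
            best_config, best_score = config, score
    # Stage 2: try values near the most promising configuration.
    for _ in range(num_refine):
        candidate = perturb(best_config, rng)
        score = toy_score(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config, best_score

best_config, best_score = sweep(num_random=8, num_refine=8)
print(best_config, round(best_score, 4))
```

Because the refinement stage only replaces the incumbent when a neighbour scores higher, it can never make the best result worse than the exploration stage alone. In AutoML itself, you configure the search space and sampling on the job rather than writing this loop.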

License

apache-2.0

Finetuning Samples

| Task | Dataset | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- | --- |
| Multiclass Text Classification | Yelp review | automl-nlp-multiclass-sentiment-mlflow.ipynb | cli-automl-text-classification-newsgroup.yml |
| Multilabel Text Classification | arXiv paper abstract | automl-nlp-multilabel-paper-cat.ipynb | cli-automl-text-classification-multilabel-paper-cat.yml |

Sample input and output

Sample input

{
    "input_data": {
        "input_string": ["Today was an amazing day!", "It was an unfortunate series of events."]
    }
}

Sample output

[
    {
        "0": "Fake"
    },
    {
        "0": "Fake"
    }
]
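Once you deploy the model as an endpoint, a client sends texts in the sample input shape and reads back one label per input text. The sketch below uses only the Python standard library. The function names are hypothetical, and the scoring URI and key are placeholders for your own endpoint's values. The Bearer-token header assumes key-based authentication on an Azure Machine Learning online endpoint. The sketch also assumes the endpoint returns the JSON list shown in the sample output rather than a JSON-encoded string.

```python
import json
import urllib.request

def build_payload(texts):
    """Wrap input texts in the request shape shown in the sample input."""
    return {"input_data": {"input_string": list(texts)}}

def parse_predictions(response_body):
    """Extract one label per input from the sample-output shape:
    a JSON list of objects whose "0" key holds the predicted label."""
    return [item["0"] for item in json.loads(response_body)]

def score(scoring_uri, api_key, texts, timeout=60):
    """POST texts to a deployed endpoint and return the predicted labels.

    scoring_uri and api_key are placeholders; key-based Bearer auth is assumed.
    """
    request = urllib.request.Request(
        scoring_uri,
        data=json.dumps(build_payload(texts)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read().decode("utf-8"))

# Offline check against the sample input and output above (no network call).
payload = build_payload(["Today was an amazing day!", "It was an unfortunate series of events."])
sample_output = '[{"0": "Fake"}, {"0": "Fake"}]'
print(json.dumps(payload))
print(parse_predictions(sample_output))  # ['Fake', 'Fake']
```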

Version: 2

Tags

SharedComputeCapacityEnabled

license : apache-2.0

task : text-classification

finetune_compute_allow_list : ['Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']

inference_compute_allow_list : ['Standard_D4a_v4', 'Standard_D4as_v4', 'Standard_DS4_v2', 'Standard_D8a_v4', 'Standard_D8as_v4', 'Standard_DS5_v2', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_FX4mds', 'Standard_F8s_v2', 'Standard_FX12mds', 'Standard_F16s_v2', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E4s_v3', 'Standard_E8s_v3', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']

View in Studio: https://ml.azure.com/registries/azureml/models/AutoML-Text-Classification/version/2

License: apache-2.0

Properties

SharedComputeCapacityEnabled: True

finetuning-tasks: token-classification

finetune-min-sku-spec: 4|1|28|176

finetune-recommended-sku: Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

inference-min-sku-spec: 4|0|16|32

inference-recommended-sku: Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

languages: en
