components oss_text_generation_pipeline - Azure/azureml-assets GitHub Wiki
FTaaS Pipeline component for text generation
Version: 0.0.25
View in Studio: https://ml.azure.com/registries/azureml/components/oss_text_generation_pipeline/version/0.0.25
Compute parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
instance_type_data_import | Instance type to use for the data_import component when running on virtual cluster compute, e.g. Singularity.D8_v3. The parameter compute_data_import must be set to 'virtual cluster' for instance_type to be used. | string | Singularity.D8_v3 | True | |
instance_type_finetune | Instance type to use for the finetune component when running on virtual cluster compute, e.g. Singularity.ND40_v2. The parameter compute_finetune must be set to 'virtual cluster' for instance_type to be used. | string | Singularity.ND40_v2 | True | |
number_of_gpu_to_use_finetuning | Number of GPUs to use per node for finetuning. Should equal the number of GPUs per node in the compute SKU used for finetune. | integer | 1 | True | |
Continual-Finetuning model path
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
mlflow_model_path | MLflow model asset path. Special characters like \ and ' are invalid in the parameter value. | mlflow_model | | False | |
Preprocessing parameters
TODO: remove text key if the format is made similar to OpenAI.
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
text_key | Key for the text in an example. Format your data keeping in mind that, during finetuning, text is concatenated with ground_truth in the form text + ground_truth. For example, "text"="knock knock\n", "ground_truth"="who's there" is treated as "knock knock\nwho's there". | string | | False | |
ground_truth_key | Key for ground_truth in an example. A separate column is used for ground_truth to enable use cases such as summarization, translation and question answering, which can be recast as text generation where both text and ground_truth are needed. This separation is also useful for calculating metrics. For example, "text"="Summarize this dialog:\n{input_dialogue}\nSummary:\n", "ground_truth"="{summary of the dialogue}". | string | | True | |
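The concatenation described for text_key and ground_truth_key can be illustrated with a minimal sketch. The helper `build_training_text` and the column names "text" and "ground_truth" are hypothetical and chosen only for illustration; the component's actual preprocessing code is not shown on this page and may differ.

```python
import json


def build_training_text(example: dict, text_key: str, ground_truth_key: str) -> str:
    """Join the two columns in the form text + ground_truth (hypothetical helper)."""
    return example[text_key] + example[ground_truth_key]


# One JSONL record using the illustrative column names "text" and "ground_truth".
line = json.dumps({"text": "knock knock\n", "ground_truth": "who's there"})
example = json.loads(line)

combined = build_training_text(example, text_key="text", ground_truth_key="ground_truth")
print(repr(combined))
```

Because no separator is inserted between the two columns, any newline or space that should appear between prompt and completion has to be part of the text value itself, as in the "knock knock\n" example.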
Dataset path Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
train_file_path | Path to the registered training data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | False | |
validation_file_path | Path to the registered validation data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
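A local pre-flight check of the two constraints stated above (supported formats, and no \ or ' in the value) might look like the following sketch. The function `check_data_path` is hypothetical, and inferring the format from the file extension is an assumption; the component itself may detect the format differently.

```python
from pathlib import PurePosixPath

# Formats and invalid characters as listed in the table above.
SUPPORTED_FORMATS = {"jsonl", "json", "csv", "tsv", "parquet"}
INVALID_CHARS = {"\\", "'"}


def check_data_path(path: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    bad = sorted(c for c in INVALID_CHARS if c in path)
    if bad:
        problems.append(f"invalid characters in path: {bad}")
    # Assumption: the data format is identified by the file extension.
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    if suffix not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    return problems


print(check_data_path("data/train.jsonl"))
print(check_data_path("data/it's.csv"))
```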
Finetuning parameters
Training parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
max_seq_length | Maximum sequence length. Default is 4096. | integer | 4096 | True | |
num_train_epochs | Number of training epochs. | integer | 1 | True | |
per_device_train_batch_size | Training batch size per device. | integer | 1 | True | |
learning_rate | Initial learning rate. | number | 0.0003 | True | |
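Since all four training parameters are optional, a caller only needs to supply the values that differ from the defaults. The sketch below shows one way to merge overrides with the defaults from the table and reject obviously mistyped values. The names `TRAINING_DEFAULTS` and `training_params` are hypothetical and not part of the component.

```python
# Defaults copied from the training parameters table.
TRAINING_DEFAULTS = {
    "max_seq_length": 4096,
    "num_train_epochs": 1,
    "per_device_train_batch_size": 1,
    "learning_rate": 0.0003,
}

# "integer" maps to int, "number" to int or float.
EXPECTED_TYPES = {
    "max_seq_length": int,
    "num_train_epochs": int,
    "per_device_train_batch_size": int,
    "learning_rate": (int, float),
}


def training_params(**overrides):
    """Return the defaults updated with the given overrides (hypothetical helper)."""
    unknown = set(overrides) - set(TRAINING_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown training parameters: {sorted(unknown)}")
    for name, value in overrides.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, EXPECTED_TYPES[name]):
            raise TypeError(f"{name} has unexpected type {type(value).__name__}")
    return {**TRAINING_DEFAULTS, **overrides}


print(training_params(num_train_epochs=3, learning_rate=1e-4))
```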
Validation parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
system_properties | Validation parameters propagated from pipeline. | string | | True | |
Compute parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
compute_data_import | Compute to be used for model_import, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the named cluster is used. | string | virtual cluster | True | |
compute_finetune | Compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the named cluster is used. | string | virtual cluster | True | |
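The interaction between the compute_* and instance_type_* parameters can be summarized in a short sketch: the instance type applies only when the compute value is 'virtual cluster', and is ignored when a named cluster is given. The function `resolve_compute` is hypothetical and only mirrors the rule stated in the tables; it is not the component's implementation.

```python
VIRTUAL_CLUSTER = "virtual cluster"  # default value of compute_data_import / compute_finetune


def resolve_compute(compute: str = VIRTUAL_CLUSTER,
                    instance_type: str = "Singularity.ND40_v2") -> dict:
    """Return the effective compute target and instance type (hypothetical helper)."""
    if compute == VIRTUAL_CLUSTER:
        # instance_type is honored only on virtual cluster compute.
        return {"compute": compute, "instance_type": instance_type}
    # A named cluster was given, so instance_type is ignored.
    return {"compute": compute, "instance_type": None}


print(resolve_compute())
print(resolve_compute(compute="FT-Cluster"))
```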
Model parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
model_asset_id | Asset ID of the model. | string | | False | |
Model registration
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
registered_model_name | Name of the registered model. | string | | True | |
Outputs
Name | Description | Type |
---|---|---|
output_model | Output directory in which the finetuned LoRA weights are saved. | uri_folder |