components diffusers_text_to_image_dreambooth_pipeline - Azure/azureml-assets GitHub Wiki
Pipeline component for text-to-image DreamBooth training using the diffusers library and transformers models.
Version: 0.0.10
View in Studio: https://ml.azure.com/registries/azureml/components/diffusers_text_to_image_dreambooth_pipeline/version/0.0.10
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
compute_model_import | Compute to be used for model_import eg. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | False | ||
compute_finetune | Compute to be used for finetune eg. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | False | ||
instance_count | Number of nodes to be used for finetuning (used for distributed training) | integer | 1 | True | |
process_count_per_instance | Number of gpus to be used per node for finetuning, should be equal to number of gpu per node in the compute SKU used for finetune | integer | 1 | True |
Model Selector Component Model family
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
model_family | Which framework the model belongs to. | string | HuggingFaceImage | True | ['HuggingFaceImage'] |
model_name | Select a model from AzureML Model Assets to see all supported models. For HuggingFace models that are not available in the AzureML model registry, enter the HuggingFace model_name here. The model will be downloaded from the HuggingFace hub using this model_name and is subject to the third-party license terms available on the HuggingFace model details page. It is the user's responsibility to comply with the model's license terms. | string | True | ||
pytorch_model | Pytorch Model registered in AzureML Asset. | custom_model | True | ||
mlflow_model | Mlflow Model registered in AzureML Asset. | mlflow_model | True | ||
download_from_source | Download model directly from HuggingFace instead of system registry | boolean | False | True |
Finetuning Component component input: Instance data dir
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
instance_data_dir | A folder containing the training data of instance images. | uri_folder | False | ||
class_data_dir | (Optional) A folder containing the training data of class images. You can place existing images in class_data_dir, and the training job will generate any additional images so that num_class_images are present in class_data_dir during training time. | uri_folder | True | ||
task_name | Which task the model is solving. | string | ['stable-diffusion-text-to-image'] |
Instance prompt
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
instance_prompt | The prompt with identifier specifying the instance. | string | True | ||
resolution | The image resolution for training. | integer | 512 | True |
Lora parameters
LoRA reduces the number of trainable parameters by learning pairs of rank-decomposition matrices while freezing the original weights. This vastly reduces the storage requirement for large models adapted to specific tasks and enables efficient task-switching during deployment, all without introducing inference latency. LoRA also outperforms several other adaptation methods including adapter, prefix-tuning, and fine-tuning.
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_lora | If "true" enables lora. | boolean | True | False | |
lora_alpha | alpha attention parameter for lora. | integer | 128 | True | |
lora_r | lora dimension | integer | 8 | True | |
lora_dropout | lora dropout value | number | 0.0 | True |
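
The parameter savings described for LoRA can be checked with simple arithmetic. The sketch below assumes the common convention from the LoRA paper, where a frozen weight of shape `d_out x d_in` is adapted by two factors of rank `lora_r` and the update is scaled by `lora_alpha / lora_r`. The layer size of 768 is a hypothetical example, and whether this component applies exactly this scaling is an assumption, not something the table states.

```python
def lora_trainable_params(d_out: int, d_in: int, lora_r: int) -> int:
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return lora_r * (d_out + d_in)


def lora_scaling(lora_alpha: int, lora_r: int) -> float:
    """Scale applied to the low-rank update under the assumed alpha / r convention."""
    return lora_alpha / lora_r


# Hypothetical 768 x 768 projection layer, with the table defaults lora_r=8, lora_alpha=128.
d_out, d_in = 768, 768
full_params = d_out * d_in
lora_params = lora_trainable_params(d_out, d_in, lora_r=8)

print(f"full fine-tuning: {full_params} trainable parameters")
print(f"LoRA (r=8):       {lora_params} trainable parameters")
print(f"fraction trained: {lora_params / full_params:.4f}")
print(f"update scaling:   {lora_scaling(128, 8)}")
```

Raising `lora_r` increases the trainable parameter count linearly, while `lora_alpha` only changes the scale of the update under this convention.
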
Tokenizer
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
tokenizer_max_length | The maximum length of the tokenizer. If not set, will default to the tokenizer's max length. | integer | True |
Text Encoder
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
text_encoder_type | Text encoder to be used. | string | True | ['CLIPTextModel', 'T5EncoderModel'] | |
text_encoder_name | HuggingFace id of the text encoder. This model should be of the type specified in text_encoder_type. If not specified, the default from the model will be used. | string | True | ||
train_text_encoder | Whether to train the text encoder. If set, the text encoder should be float32 precision. | boolean | False | True | |
pre_compute_text_embeddings | Whether or not to pre-compute text embeddings. If text embeddings are pre-computed, the text encoder will not be kept in memory during training and will leave more GPU memory available for training the rest of the model. This is not compatible with train_text_encoder. | boolean | True | True | |
text_encoder_use_attention_mask | Whether to use attention mask for the text encoder | boolean | False | True |
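
Since pre_compute_text_embeddings is described as incompatible with train_text_encoder, it can help to check the two flags before submitting a job. The helper below is a hypothetical pre-submission check, not part of the component; how the component itself reacts to the conflicting combination is not stated in the table.

```python
def check_text_encoder_flags(train_text_encoder: bool = False,
                             pre_compute_text_embeddings: bool = True) -> None:
    """Raise early if both flags are enabled (hypothetical client-side check)."""
    if train_text_encoder and pre_compute_text_embeddings:
        raise ValueError(
            "pre_compute_text_embeddings drops the text encoder from memory, "
            "so it cannot be combined with train_text_encoder."
        )


# The defaults (train_text_encoder=False, pre_compute_text_embeddings=True) are compatible.
check_text_encoder_flags()

# To train the text encoder, pre-computation has to be turned off explicitly.
check_text_encoder_flags(train_text_encoder=True, pre_compute_text_embeddings=False)
```
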
UNET related
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
class_labels_conditioning | The optional class_label conditioning to pass to the UNet. Available values: timesteps. | string | True |
Noise Scheduler
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
noise_scheduler_name | Noise scheduler to be used. | string | True | ['DPMSolverMultistepScheduler', 'DDPMScheduler', 'PNDMScheduler'] | |
noise_scheduler_num_train_timesteps | The number of diffusion steps to train the model. | integer | True | ||
noise_scheduler_variance_type | Clip the variance when adding noise to the denoised sample. | string | True | ['fixed_small', 'fixed_small_log', 'fixed_large', 'fixed_large_log', 'learned', 'learned_range'] | |
noise_scheduler_prediction_type | Prediction type of the scheduler function; can be epsilon (predicts the noise of the diffusion process), sample (directly predicts the noisy sample) or v_prediction (see section 2.4 of the Imagen Video paper). | string | True | ['epsilon', 'sample', 'v_prediction'] | |
noise_scheduler_timestep_spacing | The way the timesteps should be scaled. Refer to Table 2 of the paper "Common Diffusion Noise Schedules and Sample Steps are Flawed" for more information. | string | True | ||
noise_scheduler_steps_offset | An offset added to the inference steps. You can use a combination of offset=1 and set_alpha_to_one=False to make the last step use step 0 for the previous alpha product, as in Stable Diffusion. | integer | True | ||
extra_noise_scheduler_args | Optional additional arguments that are supplied to noise scheduler. The arguments should be semi-colon separated key value pairs and should be enclosed in double quotes. For example, "clip_sample_range=1.0; clip_sample=True" for DDPMScheduler. | string | True | ||
offset_noise | Fine-tuning against a modified noise. See https://www.crosslabs.org//blog/diffusion-with-offset-noise for more information. | boolean | True |
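
The semi-colon separated format used by extra_noise_scheduler_args can be illustrated with a small parser. This is a minimal sketch of how such a string might be turned into keyword arguments. The function name `parse_extra_args` and the use of `ast.literal_eval` for value conversion are assumptions for illustration; the component's actual parsing logic is not documented here.

```python
import ast


def parse_extra_args(raw: str) -> dict:
    """Parse 'key=value; key=value' into a dict, converting Python literals where possible."""
    args = {}
    # Drop surrounding whitespace and the enclosing double quotes, then split on ';'.
    for pair in raw.strip().strip('"').split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        value = value.strip()
        try:
            args[key.strip()] = ast.literal_eval(value)  # numbers, booleans, etc.
        except (ValueError, SyntaxError):
            args[key.strip()] = value  # keep anything else as a plain string
    return args


print(parse_extra_args('"clip_sample_range=1.0; clip_sample=True"'))
```
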
Prior preservation loss
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
with_prior_preservation | Flag to add prior preservation loss. | boolean | True | ||
class_prompt | The prompt to specify images in the same class as provided instance images. | string | True | ||
num_class_images | Minimal class images for prior preservation loss. If there are not enough images already present in class_data_dir, additional images will be sampled with class_prompt. | integer | 100 | True | |
prior_generation_precision | Choose prior generation precision between fp32, fp16 and bf16 (bfloat16). Bf16 requires PyTorch >= 1.10 and an Nvidia Ampere GPU. Defaults to fp16 if a GPU is available, else fp32. | string | fp32 | True | ['fp32', 'fp16', 'bf16'] |
prior_loss_weight | The weight of prior preservation loss. | number | 1.0 | True | |
sample_batch_size | Batch size (per device) for sampling class images when training with_prior_preservation set to True. | integer | 4 | True |
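
The prior preservation parameters interact in a way that is easy to work out by hand. The sketch below shows how many class images would need to be sampled and how the two loss terms are typically combined. The function names are hypothetical, and the loss formula (instance loss plus weighted prior loss) follows the usual DreamBooth formulation, which this component is assumed to share.

```python
import math


def class_images_to_generate(existing: int, num_class_images: int = 100) -> int:
    """Images still missing from class_data_dir to reach num_class_images."""
    return max(0, num_class_images - existing)


def sampling_batches(missing: int, sample_batch_size: int = 4) -> int:
    """Per-device sampling batches needed to produce the missing images."""
    return math.ceil(missing / sample_batch_size)


def total_loss(instance_loss: float, prior_loss: float,
               prior_loss_weight: float = 1.0) -> float:
    """Assumed combination of the instance loss and the prior preservation loss."""
    return instance_loss + prior_loss_weight * prior_loss


missing = class_images_to_generate(existing=30)   # 30 images already in class_data_dir
print(missing, sampling_batches(missing))
print(total_loss(instance_loss=0.5, prior_loss=0.25))
```
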
Validation parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
num_validation_images | Number of images to generate using instance_prompt. Images are stored in the output/checkpoint-* directories. Note that this will increase the training time. If num_validation_images is set to 0, the run will generate 5 images in the last checkpoint. | integer | 0 |
Training related
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
number_of_workers | Number of subprocesses to use for data loading (PyTorch only). 0 means that the data will be loaded in the main process. | integer | 6 | True | |
number_of_epochs | Number of training epochs. If left empty, will be chosen automatically based on the task type and model selected. | integer | True | ||
max_steps | If set to a positive number, the total number of training steps to perform. Overrides 'number_of_epochs'. In case of using a finite iterable dataset the training may stop before reaching the set number of steps when all data is exhausted. If left empty, will be chosen automatically based on the task type and model selected. | integer | True | ||
training_batch_size | Train batch size. If left empty, will be chosen automatically based on the task type and model selected. | integer | 1 | True | |
auto_find_batch_size | Flag to enable automatic batch size search. If the provided 'per_device_train_batch_size' runs Out Of Memory (OOM), enabling auto_find_batch_size will find a working batch size by iteratively reducing 'per_device_train_batch_size' by a factor of 2 until the OOM is fixed. | boolean | False | True |
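
The halving strategy described for auto_find_batch_size can be sketched as a retry loop. This is an illustration only: `find_batch_size` and `fake_train_step` are hypothetical names, and Python's built-in `MemoryError` stands in for a real GPU out-of-memory error.

```python
def find_batch_size(train_step, starting_batch_size: int) -> int:
    """Halve the batch size until train_step stops raising an out-of-memory error."""
    batch_size = starting_batch_size
    while batch_size >= 1:
        try:
            train_step(batch_size)
            return batch_size
        except MemoryError:  # stand-in for a GPU OOM error
            batch_size //= 2
    raise RuntimeError("no batch size fits in memory")


def fake_train_step(batch_size: int, limit: int = 3) -> None:
    """Simulated training step that 'runs out of memory' above a fixed limit."""
    if batch_size > limit:
        raise MemoryError


# Starting at 16, the loop tries 16, 8, 4, then succeeds at 2.
print(find_batch_size(fake_train_step, starting_batch_size=16))
```
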
learning rate and learning rate scheduler
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
learning_rate | Start learning rate. Defaults to linear scheduler. If left empty, will be chosen automatically based on the task type and model selected. | number | True | ||
learning_rate_scheduler | The scheduler type to use. If left empty, will be chosen automatically based on the task type and model selected. | string | True | ['warmup_linear', 'warmup_cosine', 'warmup_cosine_with_restarts', 'warmup_polynomial', 'constant', 'warmup_constant'] | |
warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate. If left empty, will be chosen automatically based on the task type and model selected. | integer | 0 | True |
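
A linear warmup from 0 to learning_rate over warmup_steps can be written in a few lines. The sketch below covers only the warmup phase; what happens afterwards depends on learning_rate_scheduler, so the function simply returns the base rate once warmup is over. The function name and the example values are assumptions for illustration.

```python
def warmup_lr(step: int, learning_rate: float, warmup_steps: int) -> float:
    """Learning rate during linear warmup; the base rate once warmup is over."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return learning_rate
    return learning_rate * step / warmup_steps


# Hypothetical values: base rate 1e-4 reached after 100 warmup steps.
for step in (0, 25, 50, 100):
    print(step, warmup_lr(step, learning_rate=1e-4, warmup_steps=100))
```
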
optimizer
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
optimizer | optimizer to be used while training. 'adamw_ort_fused' optimizer is only supported for ORT training. If left empty, will be chosen automatically based on the task type and model selected. | string | True | ['adamw_hf', 'adamw', 'sgd', 'adafactor', 'adagrad', 'adamw_ort_fused'] | |
weight_decay | The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW and sgd optimizer. If left empty, will be chosen automatically based on the task type and model selected. | number | 0 | True | |
extra_optim_args | Optional additional arguments that are supplied to the SGD optimizer. The arguments should be semi-colon separated key value pairs and should be enclosed in double quotes. For example, "momentum=0.5; nesterov=True" for sgd. Make sure to use valid parameter names for the chosen optimizer. For exact parameter names, refer to https://pytorch.org/docs/1.13/generated/torch.optim.SGD.html#torch.optim.SGD for SGD. Parameters supplied in extra_optim_args take precedence over parameters supplied via other arguments such as weight_decay. If weight_decay is provided both via the "weight_decay" parameter and via extra_optim_args, the value specified in extra_optim_args will be used. | string | True |
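
The precedence rule between weight_decay and extra_optim_args amounts to a dictionary merge in which the extra arguments are applied last. The sketch below assumes the key=value string has already been parsed into a dict; `resolve_optimizer_kwargs` is a hypothetical helper, not the component's own code.

```python
def resolve_optimizer_kwargs(weight_decay: float, extra_optim_args: dict) -> dict:
    """Merge optimizer settings so that extra_optim_args overrides weight_decay."""
    kwargs = {"weight_decay": weight_decay}
    kwargs.update(extra_optim_args)  # later values win on key conflicts
    return kwargs


# weight_decay is given twice; the value from extra_optim_args (0.01) is kept.
print(resolve_optimizer_kwargs(
    weight_decay=0.0,
    extra_optim_args={"momentum": 0.5, "nesterov": True, "weight_decay": 0.01},
))
```
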
gradient accumulation
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
gradient_accumulation_step | Number of update steps to accumulate the gradients for, before performing a backward/update pass. If left empty, will be chosen automatically based on the task type and model selected. | integer | True | ||
max_grad_norm | Maximum gradient norm (for gradient clipping). If left empty, will be chosen automatically based on the task type and model selected. | number | True |
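
Gradient accumulation changes the number of samples that contribute to each optimizer update. The sketch below computes that effective batch size, assuming training_batch_size is a per-device value and that every GPU on every node runs one training process. Both assumptions should be checked against the actual compute setup.

```python
def effective_batch_size(training_batch_size: int,
                         gradient_accumulation_step: int,
                         instance_count: int = 1,
                         process_count_per_instance: int = 1) -> int:
    """Samples contributing to one optimizer update across all devices."""
    return (training_batch_size
            * gradient_accumulation_step
            * instance_count
            * process_count_per_instance)


# Hypothetical setup: batch size 1, 4 accumulation steps, 2 nodes with 4 GPUs each.
print(effective_batch_size(1, 4, instance_count=2, process_count_per_instance=4))
```
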
mixed precision training
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 32 | True | ['32', '16'] |
random seed
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
random_seed | Random seed that will be set at the beginning of training. | integer | 42 | True |
logging strategy parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
logging_strategy | The logging strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
logging_steps | Number of update steps between two logs if logging_strategy='steps'. | integer | 500 | True | |
save_total_limit | If a value is passed, limits the total number of checkpoints and deletes the older checkpoints in output_dir. If the value is -1, all checkpoints are saved. | integer | 5 | True |
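
The effect of save_total_limit can be pictured as a pruning step over checkpoint directories. The sketch below assumes directories named `checkpoint-<step>`, where a larger step means a newer checkpoint; the helper name and the naming scheme are assumptions for illustration.

```python
def checkpoints_to_delete(checkpoints: list, save_total_limit: int = 5) -> list:
    """Return the oldest checkpoints that exceed save_total_limit (-1 keeps all)."""
    if save_total_limit == -1:
        return []
    # Sort by the numeric step suffix so that 'checkpoint-1000' follows 'checkpoint-900'.
    ordered = sorted(checkpoints, key=lambda name: int(name.rsplit("-", 1)[1]))
    excess = max(0, len(ordered) - save_total_limit)
    return ordered[:excess]


saved = [f"checkpoint-{step}" for step in range(100, 800, 100)]  # 7 checkpoints
print(checkpoints_to_delete(saved))                       # the two oldest
print(checkpoints_to_delete(saved, save_total_limit=-1))  # nothing is deleted
```
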
save mlflow model
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
save_as_mlflow_model | Save as mlflow model with pyfunc as flavour. | boolean | True | True |
Finetuning Component outputs
Name | Description | Type |
---|---|---|
mlflow_model_folder | Output dir to save the finetune model as mlflow model. | mlflow_model |
pytorch_model_folder | Output dir to save the finetune model as torch model. | custom_model |