components oss_distillation_generate_data_batch_postprocess - Azure/azureml-assets GitHub Wiki

OSS Distillation Generate Data Postprocess Batch Scoring

oss_distillation_generate_data_batch_postprocess

Overview

Component to prepare data returned in batch from the teacher model endpoint

Version: 0.0.1

View in Studio: https://ml.azure.com/registries/azureml/components/oss_distillation_generate_data_batch_postprocess/version/0.0.1

Inputs

Name	Description	Type	Default	Optional	Enum
train_file_path	Path to the registered training data asset. The supported data formats are `jsonl`, `json`, `csv`, `tsv` and `parquet`.	uri_file
validation_file_path	Path to the registered validation data asset. The supported data formats are `jsonl`, `json`, `csv`, `tsv` and `parquet`.	uri_file		True
hash_train_data	jsonl file containing the hash for each payload.	uri_file		False
hash_validation_data	jsonl file containing the hash for each payload.	uri_file		True
batch_score_train_result	Path to the directory containing jsonl file(s) that have the result for each payload.	uri_folder
batch_score_validation_result	Path to the directory containing jsonl file(s) that have the result for each payload.	uri_folder		True
min_endpoint_success_ratio	The minimum value of (successful_requests / total_requests) required for inference to be classified as successful. If (successful_requests / total_requests) < min_endpoint_success_ratio, the experiment is marked as failed. The default is 0.7. A value of 0 allows all requests to fail, while a value of 1 means no request may fail.	number	0.7
enable_chain_of_thought	Enable Chain of thought for data generation	string	false	True
enable_chain_of_density	Enable Chain of density for text summarization	string	false	True
data_generation_task_type	Data generation task type. Supported values: NLI (generate Natural Language Inference data); CONVERSATION (generate conversational data, single-turn or multi-turn); NLU_QA (generate Natural Language Understanding data for question answering); MATH (generate math data for numerical responses); SUMMARIZATION (generate a key summary for an article).	string			['NLI', 'CONVERSATION', 'NLU_QA', 'MATH', 'SUMMARIZATION']
connection_config_file	Connection config file for batch scoring	uri_file
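
The threshold rule described for min_endpoint_success_ratio can be illustrated with a minimal Python sketch. The function name `meets_success_ratio` and its argument validation are assumptions made for illustration; they are not taken from the component's source, which may count requests or handle edge cases differently.

```python
def meets_success_ratio(
    successful_requests: int,
    total_requests: int,
    min_endpoint_success_ratio: float = 0.7,
) -> bool:
    """Return True when the share of successful requests reaches the threshold.

    Hypothetical helper: mirrors the rule in the input description, where a
    run fails if successful_requests / total_requests is below the minimum.
    """
    if not 0.0 <= min_endpoint_success_ratio <= 1.0:
        raise ValueError("min_endpoint_success_ratio must be between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= successful_requests <= total_requests:
        raise ValueError("successful_requests must be between 0 and total_requests")

    # The experiment is marked as failed only when the ratio is strictly below the minimum.
    return successful_requests / total_requests >= min_endpoint_success_ratio


if __name__ == "__main__":
    # With the default threshold of 0.7, 70 of 100 successes passes and 69 does not.
    print(meets_success_ratio(70, 100))
    print(meets_success_ratio(69, 100))
```

With a threshold of 0 every run passes regardless of failures, and with a threshold of 1 a single failed request is enough to fail the run, which matches the two extremes named in the description.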

Outputs

Name	Description	Type
generated_batch_train_file_path	Generated train data	uri_file
generated_batch_validation_file_path	Generated validation data	uri_file

Environment

azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/76
