components dataset_preprocessor - Azure/azureml-assets GitHub Wiki
Dataset Preprocessor
Version: 0.0.9
View in Studio: https://ml.azure.com/registries/azureml/components/dataset_preprocessor/version/0.0.9
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
dataset | Path to load the dataset. | uri_file | | False | |
template_input | JSON-serialized dictionary that defines the preprocessing to perform on the dataset. Each key is a column name enclosed in double quotes, and its value is a Jinja template expression used to extract the corresponding value from the dataset. Example format: {"<user_column_name>": {{key in the json file for this column}}, ....}. The processed output is written to a .jsonl file in this format: {"<user_column_name>": "", ....}. | string | | True | |
script_path | Path to the custom preprocessor Python script provided by the user. If both this input and `template_input` are provided, `template_input` is ignored. Use this [base template](https://github.com/Azure/azureml-assets/tree/main/assets/aml-benchmark/scripts/custom_dataset_preprocessors/base_preprocessor_template.py) to create a custom preprocessor script. | uri_file | | True | |
encoder_config | JSON-serialized dictionary that defines a value mapping. Must contain the key-value pair "column_name": "<actual_column_name>", naming the column whose values need mapping, followed by key-value pairs that make up the idtolabel or labeltoid mapper. Example format: {"column_name":"label", "0":"NEUTRAL", "1":"ENTAILMENT", "2":"CONTRADICTION"} | string | | True | |
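
The effect of an encoder_config value can be sketched in plain Python. This is a minimal illustration, not the component's actual implementation: the helper name `apply_encoder` and the sample rows are hypothetical, and leaving unmapped values unchanged is an assumption of this sketch. It uses the example configuration from the table above.

```python
import json


def apply_encoder(rows, encoder_config):
    """Map the values of one column using an encoder_config JSON string.

    Hypothetical helper for illustration only.
    """
    config = json.loads(encoder_config)
    # "column_name" names the column to map; the remaining pairs form the mapper.
    column = config.pop("column_name")
    mapper = config
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        # JSON object keys are always strings, so look up the string form.
        key = str(new_row[column])
        # Assumption: values without a mapping are left unchanged.
        new_row[column] = mapper.get(key, new_row[column])
        encoded.append(new_row)
    return encoded


encoder_config = (
    '{"column_name":"label", "0":"NEUTRAL", "1":"ENTAILMENT", "2":"CONTRADICTION"}'
)
# Made-up sample rows standing in for a preprocessed dataset.
rows = [
    {"premise": "A man sleeps.", "label": 1},
    {"premise": "A dog runs.", "label": 0},
]
encoded_rows = apply_encoder(rows, encoder_config)
print(encoded_rows)
```

Because the mapper keys arrive as JSON strings, an integer label such as 1 has to be converted to "1" before the lookup, which is why the sketch calls `str()` on the value.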
Name | Description | Type |
---|---|---|
output_dataset | Path to the processed output .jsonl file. | uri_file |
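
A downstream step might read the processed .jsonl file one JSON object per line, as in the sketch below. The helper name `read_jsonl`, the file name, and the sample rows are hypothetical. The file is written to a temporary directory here only so that the example is self-contained.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Read a .jsonl file into a list of dictionaries, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


# Made-up rows standing in for the component's processed output.
sample_rows = [
    {"premise": "A man sleeps.", "label": "ENTAILMENT"},
    {"premise": "A dog runs.", "label": "NEUTRAL"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the real path comes from output_dataset.
    sample_path = os.path.join(tmp_dir, "output_dataset.jsonl")
    with open(sample_path, "w", encoding="utf-8") as handle:
        for row in sample_rows:
            handle.write(json.dumps(row) + "\n")
    loaded_rows = read_jsonl(sample_path)

print(loaded_rows)
```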
Environment: azureml://registries/azureml/environments/model-evaluation/labels/latest