components dataset_downloader - Azure/azureml-assets GitHub Wiki

Dataset Downloader

dataset_downloader

Overview

Downloads a dataset to the blob store, either from HuggingFace by name or through a custom dataset loading script.

Version: 0.0.9

View in Studio: https://ml.azure.com/registries/azureml/components/dataset_downloader/version/0.0.9

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| dataset_name | Name of the dataset to download from HuggingFace; must be null if script_path is specified. | string | | True | |
| configuration | To download a specific sub-dataset, specify its configuration name; specify 'all' to download all configurations, or comma-separated values to download multiple configurations (e.g. config1,config2). Otherwise, leave it null. | string | | True | |
| split | To download a specific split of the dataset, specify the split name; specify 'all' to download all splits. | string | | False | |
| script_path | Path to the dataset loading script. The script must follow the HuggingFace dataset loading script template. For examples, see https://github.com/Azure/azureml-assets/tree/main/assets/aml-benchmark/scripts/data_loaders. | uri_file | | True | |
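
The input rules above can be checked before a job is submitted. The following is a minimal sketch, not the component's own code: `parse_configuration` and `check_inputs` are hypothetical helper names, `my_dataset` is a placeholder, and the rule that at least one of dataset_name or script_path must be set is an assumption that the table does not state explicitly.

```python
from typing import Dict, List, Optional


def parse_configuration(configuration: Optional[str]) -> List[str]:
    """Split the `configuration` input into individual configuration names.

    Returns an empty list for a null or blank value, and ["all"] for 'all'.
    """
    if configuration is None or not configuration.strip():
        return []
    # Comma-separated values select multiple configurations.
    return [name.strip() for name in configuration.split(",") if name.strip()]


def check_inputs(
    split: str,
    dataset_name: Optional[str] = None,
    configuration: Optional[str] = None,
    script_path: Optional[str] = None,
) -> Dict[str, object]:
    """Validate the documented input combination (hypothetical helper)."""
    # Documented rule: dataset_name must be null if a script is specified.
    if dataset_name and script_path:
        raise ValueError("dataset_name must be null when script_path is specified")
    # Assumption (not stated in the table): one of the two sources is needed.
    if not dataset_name and not script_path:
        raise ValueError("specify either dataset_name or script_path")
    # split is the only input marked as non-optional.
    if not split:
        raise ValueError("split is required")
    return {
        "dataset_name": dataset_name,
        "configurations": parse_configuration(configuration),
        "split": split,
        "script_path": script_path,
    }


# Placeholder values; replace them with a real dataset name and configurations.
inputs = check_inputs(
    split="all",
    dataset_name="my_dataset",
    configuration="config1,config2",
)
print(inputs["configurations"])
```

Passing both dataset_name and script_path raises a `ValueError`, which mirrors the constraint in the dataset_name description.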

Outputs

| Name | Description | Type |
| --- | --- | --- |
| output_dataset | Path to the directory where the dataset will be downloaded. | uri_folder |

Environment

azureml://registries/azureml/environments/model-evaluation/labels/latest
