components llm_rag_qa_data_generation - Azure/azureml-assets GitHub Wiki
Generates a test dataset of questions and answers based on the input documents.
A chunk of text is read from each input document and sent to the specified LLM with a prompt to create a question and answer based on that text. Each resulting question, answer, and context set is saved to either a csv or jsonl file. Short-answer, long-answer, summary, and boolean-based QAs are generated.
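
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual implementation: `generate_qa_dataset`, `batched`, and `fake_generate_qa` are hypothetical names, the LLM call is replaced by a stub, and batches are processed sequentially here, whereas the component sends each batch in parallel.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def generate_qa_dataset(
    chunks: Sequence[str],
    generate_qa: Callable[[str], Dict[str, str]],
    dataset_size: int = 100,
    chunk_batch_size: int = 5,
) -> List[Dict[str, str]]:
    """Collect up to `dataset_size` QA records, reading chunks in batches."""
    records: List[Dict[str, str]] = []
    for batch in batched(chunks, chunk_batch_size):
        # Sequential stand-in for the parallel LLM requests.
        records.extend(generate_qa(chunk) for chunk in batch)
        if len(records) >= dataset_size:
            break
    return records[:dataset_size]


def fake_generate_qa(chunk: str) -> Dict[str, str]:
    """Hypothetical stub standing in for the LLM prompt and response."""
    return {
        "question": f"What is stated in: {chunk}?",
        "answer": chunk,
        "context": chunk,
    }
```

With twelve chunks, `dataset_size=7`, and `chunk_batch_size=5`, this sketch processes two batches and returns the first seven records.
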
Version: 0.0.74
Preview
View in Studio: https://ml.azure.com/registries/azureml/components/llm_rag_qa_data_generation/version/0.0.74
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
openai_api_version | Version of the OpenAI API to use when communicating with the LLM. | string | 2023-03-15-preview | ||
openai_api_type | Type of OpenAI endpoint hosting the model. Defaults to azure for AOAI endpoints. | string | azure | ||
input_data | Uri folder of documents containing chunks of data. | uri_folder | |||
llm_config | JSON configuration specifying which model to use for question generation. Must contain the following keys: 'type' (value must be 'azure_open_ai' or 'azure'), 'model_name' (name of model to use for summary), 'deployment_name' (name of deployment for model), 'temperature' (randomness in response, float from 0 to 1), 'max_tokens' (number of tokens for response). | string | {"type": "azure_open_ai", "model_name": "gpt-35-turbo", "deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000} | ||
llm_connection | Workspace connection resource ID for the completion model. | string | False | ||
dataset_size | Number of questions to generate. | integer | 100 | ||
chunk_batch_size | Number of chunks to read and send to the LLM in parallel. | integer | 5 | ||
output_format | File type to save the dataset as. Options are 'csv' and 'json'. | string | json | ||
deployment_validation | Uri file containing information on whether the Azure OpenAI deployments, if used, have been validated. | uri_file | | True | |
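
Because `llm_config` is passed as a JSON string, it can help to check it before submitting a job. The sketch below validates the keys and value ranges listed in the table above. `parse_llm_config` is a hypothetical helper, not part of the component, and the requirement that `max_tokens` be a positive integer is an assumption, since the table only describes it as a number of tokens.

```python
import json

REQUIRED_KEYS = {"type", "model_name", "deployment_name", "temperature", "max_tokens"}
ALLOWED_TYPES = {"azure_open_ai", "azure"}

# Default value copied from the llm_config row of the inputs table.
DEFAULT_LLM_CONFIG = (
    '{"type": "azure_open_ai", "model_name": "gpt-35-turbo", '
    '"deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000}'
)


def parse_llm_config(raw: str) -> dict:
    """Parse an llm_config JSON string and check the documented constraints."""
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"llm_config is missing keys: {sorted(missing)}")
    if config["type"] not in ALLOWED_TYPES:
        raise ValueError("'type' must be 'azure_open_ai' or 'azure'")
    if not 0 <= config["temperature"] <= 1:
        raise ValueError("'temperature' must be a float from 0 to 1")
    # Assumption: max_tokens must be a positive integer.
    if not isinstance(config["max_tokens"], int) or config["max_tokens"] <= 0:
        raise ValueError("'max_tokens' must be a positive integer")
    return config


config = parse_llm_config(DEFAULT_LLM_CONFIG)
print(config["deployment_name"])
```
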
Name | Description | Type |
---|---|---|
output_data | csv or jsonl file containing the question, answer, context, and metadata sets | uri_folder |
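
As an illustration of the two output formats, the sketch below serializes a list of QA records either as JSON Lines or as csv using only the Python standard library. It assumes that the 'json' option of `output_format` produces one JSON object per line (jsonl), as the descriptions on this page suggest. The field names `question`, `answer`, and `context`, and the helper `serialize_records`, are hypothetical; the component's real column names and metadata fields may differ.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical column names for illustration only.
FIELDS = ["question", "answer", "context"]


def serialize_records(records: List[Dict[str, str]], output_format: str = "json") -> str:
    """Return the records as JSON Lines ('json') or csv text ('csv')."""
    if output_format == "json":
        # One JSON object per line.
        return "".join(json.dumps(record) + "\n" for record in records)
    if output_format == "csv":
        buffer = io.StringIO(newline="")
        writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError("output_format must be 'csv' or 'json'")


records = [
    {"question": "What is X?", "answer": "X is a placeholder.", "context": "X is a placeholder."},
]
jsonl_text = serialize_records(records, "json")
# Each line of the JSON Lines text parses back into one record.
loaded = [json.loads(line) for line in jsonl_text.splitlines()]
```
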
azureml:llm-rag@latest