
LLM - Generate QnA Test Data

llm_rag_qa_data_generation

Overview

Generates a test dataset of questions and answers based on the input documents.

A chunk of text is read from each input document and sent to the specified LLM with a prompt to create a question and answer based on that text. These question, answer, and context sets are saved as either a csv or jsonl file. Short-answer, long-answer, summary, and boolean-based QAs are generated.
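
The flow described above can be sketched roughly as follows. This is an illustration only, not the component's actual implementation: the prompt text, the `ask_llm` callable, the helper names, and the record field names (`question`, `answer`, `context`) are all assumptions made for the example.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical prompt; the component's real prompt text is not shown on this page.
PROMPT = "Write one question and its answer based only on this text:\n{chunk}"


def generate_qa(chunks: Iterable[str], ask_llm: Callable[[str], dict], dataset_size: int) -> list[dict]:
    """Build up to dataset_size records of question, answer, and context."""
    records = []
    for chunk in chunks:
        if len(records) >= dataset_size:
            break
        # ask_llm is assumed to return {"question": ..., "answer": ...}.
        qa = ask_llm(PROMPT.format(chunk=chunk))
        records.append({"question": qa["question"], "answer": qa["answer"], "context": chunk})
    return records


def save_records(records: list[dict], path: Path, output_format: str) -> None:
    """Write records as csv, or as one JSON object per line for any other format."""
    if output_format == "csv":
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["question", "answer", "context"])
            writer.writeheader()
            writer.writerows(records)
    else:
        with path.open("w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
```

In this sketch `ask_llm` stands in for whatever client calls the configured deployment, so the loop can be exercised with a stub before any endpoint is involved.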

Version: 0.0.74

Tags

Preview

View in Studio: https://ml.azure.com/registries/azureml/components/llm_rag_qa_data_generation/version/0.0.74

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| openai_api_version | Version of OpenAI API to use for communicating with LLM. | string | 2023-03-15-preview | | |
| openai_api_type | Type of OpenAI endpoint hosting model. Defaults to azure for AOAI endpoints. | string | azure | | |
| input_data | URI folder of documents containing chunks of data. | uri_folder | | | |
| llm_config | JSON configuration for what model to use for question generation. Must contain the following keys: 'type' (value must be 'azure_open_ai' or 'azure'), 'model_name' (name of model to use for summary), 'deployment_name' (name of deployment for model), 'temperature' (randomness in response, float from 0 to 1), 'max_tokens' (number of tokens for response). | string | {"type": "azure_open_ai", "model_name": "gpt-35-turbo", "deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000} | | |
| llm_connection | Workspace connection resource ID for the completion model. | string | | False | |
| dataset_size | Number of questions to generate. | integer | 100 | | |
| chunk_batch_size | Number of chunks to be read and sent to LLM in parallel. | integer | 5 | | |
| output_format | File type to save the dataset as. Options are 'csv' and 'json'. | string | json | | |
| deployment_validation | URI file recording whether the Azure OpenAI deployments, if used, have been validated. | uri_file | | True | |
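
Because `llm_config` is passed as a JSON string, it can help to check it locally before submitting a job. The following is a minimal sketch that enforces only the constraints stated in the description above. The function name `check_llm_config` is hypothetical, and the component's own validation may differ.

```python
import json

# Keys and allowed values taken from the llm_config description above.
REQUIRED_KEYS = {"type", "model_name", "deployment_name", "temperature", "max_tokens"}
ALLOWED_TYPES = {"azure_open_ai", "azure"}


def check_llm_config(raw: str) -> dict:
    """Parse an llm_config JSON string and check the documented constraints."""
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"llm_config is missing keys: {sorted(missing)}")
    if config["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {config['type']!r}")
    if not 0 <= config["temperature"] <= 1:
        raise ValueError("temperature must be between 0 and 1")
    return config


# The documented default value for llm_config.
DEFAULT_LLM_CONFIG = (
    '{"type": "azure_open_ai", "model_name": "gpt-35-turbo", '
    '"deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000}'
)
config = check_llm_config(DEFAULT_LLM_CONFIG)
```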

Outputs

| Name | Description | Type |
| --- | --- | --- |
| output_data | csv or jsonl file containing the question, answer, context, and metadata sets | uri_folder |
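
Once `output_data` has been downloaded to a local folder, the generated rows can be loaded for inspection. This is a rough sketch that assumes the folder holds files with `.csv` or `.jsonl` extensions. The file names and column names are not specified on this page, so the hypothetical helper `load_qa_rows` returns plain dictionaries without assuming particular fields.

```python
import csv
import json
from pathlib import Path


def load_qa_rows(folder: str) -> list[dict]:
    """Read every .csv and .jsonl file in a folder into a list of dicts."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix == ".jsonl":
            with path.open(encoding="utf-8") as f:
                # One JSON object per non-empty line.
                rows.extend(json.loads(line) for line in f if line.strip())
        elif path.suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as f:
                rows.extend(csv.DictReader(f))
    return rows
```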

Environment

azureml:llm-rag@latest
