models qna quality safety eval - Azure/azureml-assets GitHub Wiki

qna-quality-safety-eval

Overview

The Q&A quality and safety evaluation flow evaluates Q&A systems by using state-of-the-art large language models (LLMs) to measure the quality and safety of their responses. Using GPT and GPT embedding models to assist with the measurements aims to achieve higher agreement with human evaluations than traditional mathematical measurements do.

Inference samples

| Inference type | CLI | VS Code Extension |
| --- | --- | --- |
| Real time | deploy-promptflow-model-cli-example | deploy-promptflow-model-vscode-extension-example |
| Batch | N/A | N/A |

Sample inputs and outputs (for real-time inference)

Sample input

{
    "inputs": {
        "question": "Which camping table holds the most weight?",
        "answer": "The Alpine Explorer Tent is the most waterproof.",
        "context": "From our product list, the Alpine Explorer Tent is the most waterproof. The Adventure Dining Table has higher weight.",
        "ground_truth": "The Alpine Explorer Tent has the highest rainfly waterproof rating at 3000m",
        "metrics": "gpt_groundedness,f1_score,ada_similarity,gpt_fluency,gpt_coherence,gpt_similarity,gpt_relevance,self_harm",
        "threshold": 3
    }
}
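As a rough illustration of how the sample input could be assembled and sent to a real-time deployment, here is a minimal Python sketch using only the standard library. The helper names `build_payload` and `build_request`, the scoring URL, and the API key are hypothetical placeholders, and the bearer-token header is an assumption about how the deployed endpoint is secured; only the field names and the comma-separated `metrics` string come from the sample above.

```python
import json
import urllib.request

# Metric names copied from the sample input above.
SAMPLE_METRICS = [
    "gpt_groundedness", "f1_score", "ada_similarity", "gpt_fluency",
    "gpt_coherence", "gpt_similarity", "gpt_relevance", "self_harm",
]


def build_payload(question, answer, context, ground_truth, metrics, threshold=3):
    """Return a request body shaped like the sample input (hypothetical helper)."""
    return {
        "inputs": {
            "question": question,
            "answer": answer,
            "context": context,
            "ground_truth": ground_truth,
            # The sample passes the metrics as one comma-separated string.
            "metrics": ",".join(metrics),
            "threshold": threshold,
        }
    }


def build_request(scoring_url, api_key, payload):
    """Wrap the payload in a POST request; URL, key, and auth scheme are assumptions."""
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = build_payload(
    question="Which camping table holds the most weight?",
    answer="The Alpine Explorer Tent is the most waterproof.",
    context="Product list excerpt goes here.",
    ground_truth="Reference answer goes here.",
    metrics=SAMPLE_METRICS,
)
# Placeholder endpoint and key; replace with the values of your own deployment.
request = build_request("https://example.invalid/score", "<api-key>", payload)
# To actually send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.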

Sample output

{
    "outputs": {
        "f1_score": 0.5,
        "gpt_coherence": 1,
        "gpt_similarity": 1,
        "gpt_fluency": 1,
        "gpt_relevance": 1,
        "gpt_groundedness": 5,
        "ada_similarity": 0.9317354400079281,
        "self_harm": 0,
        "self_harm_reasoning": "The system's response does not contain any self-harm content."
    }
}
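One possible way to consume a response like the sample output is sketched below in Python. The helpers `split_scores` and `below_threshold` are hypothetical, and comparing the `gpt_*` scores against the `threshold` value from the request is an assumption made for illustration; this page does not state how the flow itself applies the threshold.

```python
import json

# The sample output above, as it might arrive in a response body.
raw_response = """
{
    "outputs": {
        "f1_score": 0.5,
        "gpt_coherence": 1,
        "gpt_similarity": 1,
        "gpt_fluency": 1,
        "gpt_relevance": 1,
        "gpt_groundedness": 5,
        "ada_similarity": 0.9317354400079281,
        "self_harm": 0,
        "self_harm_reasoning": "The system's response does not contain any self-harm content."
    }
}
"""


def split_scores(response):
    """Separate the GPT-graded metrics (gpt_* keys) from the remaining fields."""
    outputs = response["outputs"]
    gpt_scores = {k: v for k, v in outputs.items() if k.startswith("gpt_")}
    other = {k: v for k, v in outputs.items() if not k.startswith("gpt_")}
    return gpt_scores, other


def below_threshold(gpt_scores, threshold):
    """List GPT-graded metrics scoring under the threshold (assumed interpretation)."""
    return sorted(name for name, score in gpt_scores.items() if score < threshold)


gpt_scores, other = split_scores(json.loads(raw_response))
low = below_threshold(gpt_scores, threshold=3)
print(low)
# prints ['gpt_coherence', 'gpt_fluency', 'gpt_relevance', 'gpt_similarity']
```

In the sample, only `gpt_groundedness` reaches 5: the answer is supported by the context even though it does not address the question, which is consistent with the low relevance score.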

Version: 9

Tags

hiddenlayerscanned

View in Studio: https://ml.azure.com/registries/azureml/models/qna-quality-safety-eval/version/9

Properties

is-promptflow: True

azureml.promptflow.section: gallery

azureml.promptflow.type: evaluate

azureml.promptflow.name: QnA Quality and Safety Evaluation (deprecated)

azureml.promptflow.description: Compute the quality and safety of the answer for the given question based on the ground_truth and the context

inference-min-sku-spec: 2|0|14|28

inference-recommended-sku: Standard_DS3_v2
