models QA Evaluator - Azure/azureml-assets GitHub Wiki
Score range | Float [0-1] for the F1 score evaluator: the higher the score, the more similar the response is to the ground truth. Integer [1-5] for the AI-assisted quality evaluators for question-answering (QA) scenarios, where 1 is bad and 5 is good. |
What is this metric? | Comprehensively measures the groundedness, coherence, and fluency of a response in QA scenarios, as well as the textual similarity between the response and its ground truth. |
How does it work? | The QA evaluator combines prompt-based AI-assisted evaluators, which use a language model as a judge of the response to a user query: GroundednessEvaluator (needs input context), RelevanceEvaluator, CoherenceEvaluator, FluencyEvaluator, and SimilarityEvaluator (needs input ground_truth). It also includes a natural language processing (NLP) metric, F1ScoreEvaluator, which computes an F1 score on the tokens shared between the response and its ground truth. See the definitions and scoring rubrics for these AI-assisted evaluators and the F1 score evaluator. |
When to use it? | Use it when assessing the readability and user-friendliness of your model's generated responses in real-world applications. |
What does it need as input? | Query, Response, Context, Ground Truth |
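
The F1 score on shared tokens described above can be illustrated with a minimal sketch. This is not the F1ScoreEvaluator implementation: the function name `token_f1` is hypothetical, and the normalization (lowercasing and whitespace splitting) is an assumption, so the real evaluator may tokenize differently and return different values.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between a response and its ground truth (illustrative only)."""
    # Assumption: lowercase and split on whitespace; the actual evaluator
    # may apply a different normalization or tokenizer.
    response_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not response_tokens or not truth_tokens:
        return 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    shared = Counter(response_tokens) & Counter(truth_tokens)
    num_shared = sum(shared.values())
    if num_shared == 0:
        return 0.0

    precision = num_shared / len(response_tokens)  # shared / response length
    recall = num_shared / len(truth_tokens)        # shared / ground-truth length
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions the score lands in the [0-1] range given in the table: a response that shares every token with the ground truth scores 1.0, and one that shares none scores 0.0. Word order does not affect the result, which is one reason the QA evaluator pairs this metric with the AI-assisted evaluators.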
Version: 1
Preview
View in Studio: https://ml.azure.com/registries/azureml/models/QA-Evaluator/version/1
is-promptflow: True
is-evaluator: True