models QA Evaluator - Azure/azureml-assets GitHub Wiki



Score range: Float [0-1] for the F1 score evaluator, where a higher score means the response is more similar to the ground truth. Integer [1-5] for the AI-assisted quality evaluators for question-and-answering (QA) scenarios, where 1 is bad and 5 is good.
What is this metric? Comprehensively measures the groundedness, relevance, coherence, and fluency of a response in QA scenarios, as well as the textual similarity between the response and its ground truth.
How does it work? The QA evaluator combines prompt-based AI-assisted evaluators that use a language model as a judge of the response to a user query: GroundednessEvaluator (needs input context), RelevanceEvaluator, CoherenceEvaluator, FluencyEvaluator, and SimilarityEvaluator (needs input ground_truth). It also includes a Natural Language Processing (NLP) metric, F1ScoreEvaluator, which computes the F1 score over the tokens shared between the response and its ground truth. See the definitions and scoring rubrics for these AI-assisted evaluators and the F1 score evaluator.
When to use it? Use it when assessing the readability and user-friendliness of your model's generated responses in real-world applications.
What does it need as input? Query, Response, Context, Ground Truth
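
The F1 score used by F1ScoreEvaluator is the harmonic mean of token precision and token recall, computed over the tokens shared by the response and the ground truth. A minimal Python sketch of that computation follows; the lowercase-and-strip-punctuation tokenization is an assumption for illustration, and the actual F1ScoreEvaluator may normalize text differently.

```python
import re
from collections import Counter


def normalize_tokens(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split on whitespace.

    This simple normalization is an assumption, not the evaluator's exact rule.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def f1_score(response: str, ground_truth: str) -> float:
    """Token-level F1 between a response and its ground truth, in [0, 1]."""
    response_tokens = normalize_tokens(response)
    truth_tokens = normalize_tokens(ground_truth)
    # Multiset intersection: each shared token counts at most min(count_a, count_b) times.
    shared = Counter(response_tokens) & Counter(truth_tokens)
    num_shared = sum(shared.values())
    if num_shared == 0:
        return 0.0
    precision = num_shared / len(response_tokens)
    recall = num_shared / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the response "Paris" against the ground truth "The capital is Paris" shares one token: precision is 1/1 and recall is 1/4, so F1 = 2 × 1 × 0.25 / 1.25 = 0.4.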
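
To show how the four inputs could be routed to the sub-evaluators, here is a hypothetical Python sketch of the composite. The `judges` callables stand in for the language-model-based evaluators, and the function name `run_qa_evaluation` and the result keys are illustrative assumptions, not the azureml-assets or Azure AI evaluation API. Following the notes above, only groundedness receives the context, and only similarity and F1 receive the ground truth.

```python
from typing import Callable, Dict

# Hypothetical judge signature: keyword inputs in, integer score (1 = bad, 5 = good) out.
# A real judge would prompt a language model with a scoring rubric.
Judge = Callable[..., int]


def run_qa_evaluation(
    judges: Dict[str, Judge],
    f1: Callable[[str, str], float],
    query: str,
    response: str,
    context: str,
    ground_truth: str,
) -> Dict[str, float]:
    """Run each sub-evaluator on the inputs it needs and merge the scores."""
    return {
        # Groundedness is the only judge that receives the context.
        "groundedness": judges["groundedness"](
            query=query, response=response, context=context
        ),
        "relevance": judges["relevance"](query=query, response=response),
        "coherence": judges["coherence"](query=query, response=response),
        "fluency": judges["fluency"](query=query, response=response),
        # Similarity (LLM judge) and F1 (token overlap) both use the ground truth.
        "similarity": judges["similarity"](
            query=query, response=response, ground_truth=ground_truth
        ),
        "f1_score": f1(response, ground_truth),
    }
```

Passing stub judges such as `lambda **kwargs: 4` is enough to exercise the routing without calling a model.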

Version: 1




is-promptflow: True

is-evaluator: True
