# Adding a New Evaluator
This page explains how to add a new Evaluator to the SQA system. The Evaluator is used to evaluate the results of the retrieval process.
The implementations of the Evaluators are located in the [`experimentation/evaluation/implementations`](../blob/experiments/sqa-system/sqa_system/experimentation/evaluation/implementations) folder. To add a new Evaluator, create a Python file for it in the implementations folder. The file name should be the name of the Evaluator. Inside the file, you can then add your Python implementation of the Evaluator.
The Evaluator should be implemented as a subclass of the Evaluator class located in the [`experimentation/evaluation/base`](../blob/experiments/sqa-system/sqa_system/experimentation/evaluation/base) folder. The Evaluator class is the base class for all Evaluators in the SQA system. It provides a common interface for all Evaluators and is itself a subclass of the Scorer class provided by the Weave library. All evaluations that are run are automatically stored in the Weights & Biases dashboard.
You need to implement the following properties and methods in the class (a minimal sketch is shown after this list):

- `score()`: This is the main method of the Evaluator. It receives the output of the retrieval process as well as the ground truth from the QA dataset, such as the golden answer, golden triples, and golden document chunks. Based on this data, the evaluator calculates the score. The method returns a dictionary where the key is the name of the metric and the value is the score; multiple metrics may be calculated in the same evaluator. The output of the retrieval process is passed as a converted dictionary of the `PipeIOData` class, which is located in the [`core/data/models/`](../blob/experiments/sqa-system/sqa_system/core/data/models/) directory.
- `summarize()`: This method is used to aggregate the scores of the evaluator (averaging). It is useful for implementing micro-averaging, as macro-averaging is already applied by default.
- `ADDITIONAL_CONFIG_PARAMS`: This is a list of additional configuration parameters that are required by the evaluator. The parameters are of type `AdditionalConfigParameter` and allow additional parameters to be added to the configuration of the evaluator. For example, if the evaluator requires a specific threshold, you can add it to this list.
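The following is a minimal, hypothetical sketch of such an evaluator. The import paths, the constructor arguments of `AdditionalConfigParameter`, the exact method signatures, and the key names in the converted `PipeIOData` dictionary are assumptions and may differ in the actual codebase:

```python
from typing import Optional

# The import paths below are assumptions and may differ in the actual codebase.
from sqa_system.experimentation.evaluation.base import Evaluator
from sqa_system.core.config.models import AdditionalConfigParameter


class ContainsAnswerEvaluator(Evaluator):
    """Hypothetical evaluator that checks whether the generated answer
    contains the golden answer from the QA dataset."""

    # Additional parameters exposed to the configuration
    # (the constructor arguments of AdditionalConfigParameter are assumptions).
    ADDITIONAL_CONFIG_PARAMS = [
        AdditionalConfigParameter(name="case_sensitive", param_type=bool, default_value=False),
    ]

    def score(self, output: dict) -> dict:
        # 'output' is the converted dictionary of the PipeIOData class; the key
        # names "generated_answer" and "golden_answer" are assumptions.
        generated = (output.get("generated_answer") or "").lower()
        golden = (output.get("golden_answer") or "").lower()
        return {"contains_answer": 1.0 if golden and golden in generated else 0.0}

    def summarize(self, score_rows: list) -> Optional[dict]:
        # Micro-average over all individual question scores
        # (macro-averaging is already applied by default).
        values = [row["contains_answer"] for row in score_rows if "contains_answer" in row]
        if not values:
            return None
        return {"contains_answer": {"mean": sum(values) / len(values)}}
```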
After implementing the Evaluator, you need to register it in the EvaluatorFactory class located in the [`experimentation/evaluation/factory/`](../blob/experiments/sqa-system/sqa_system/experimentation/evaluation/factory/) folder. The EvaluatorFactory class is responsible for creating instances of the Evaluator based on the configuration.

To add your evaluator, first add the import at the top of the file. Then extend the EvaluatorType enum with the name of your evaluator. The value of the enum is used in the configuration file to identify the evaluator:
```python
class EvaluatorType(Enum):
    """The type of the evaluator."""
    RAGAS = "ragas"
    HITATONE = "hit_at_k"
    MAP = "map_at_k"
    MRR = "mrr_at_k"
    BASIC_SCORE = "basic_score"
    BERT_SCORE = "bert_score"
    EXACT_MATCH = "exact_match"
    ROUGE = "rouge_score"
    BLEU = "bleu_score"
    INSTRUCTION_FOLLOWING = "instruction_following"
    # Add your evaluator here
```
Next, you need to add the evaluator to the create() method of the EvaluatorFactory class. The create() method is responsible for creating instances of the Evaluator based on the configuration. Add a new if statement that checks whether the evaluator type is equal to your evaluator type; if it is, create an instance of your evaluator and return it.
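As a hypothetical illustration, continuing the example from above (the enum member, the config attribute name, and the evaluator constructor are assumptions):

```python
# Inside EvaluatorFactory.create(); the surrounding code and names are assumptions.
if config.evaluator_type == EvaluatorType.CONTAINS_ANSWER.value:
    return ContainsAnswerEvaluator(config=config)
```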
Finally, you need to add your evaluator to the get_evaluator_class() method. This method is used to get the class of the evaluator based on the configuration. Add a new if statement that checks whether the evaluator type is equal to your evaluator type; if it is, return the class of your evaluator.
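Again as a hypothetical sketch with assumed names:

```python
# Inside EvaluatorFactory.get_evaluator_class(); names are assumptions.
if evaluator_type == EvaluatorType.CONTAINS_ANSWER:
    return ContainsAnswerEvaluator
```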
🥳 That's it! You have successfully added a new Evaluator to the SQA system, which can now be referenced in a configuration, for example:
```json
{
    "additional_params": {
        "model_type": "microsoft/deberta-xlarge-mnli"
    },
    "evaluator_type": "bert_score"
}
```
This applies the BERTScore evaluator to the pipeline using the `microsoft/deberta-xlarge-mnli` model.