# Adding a New Evaluator
This page explains how to add a new Evaluator to the SQA system. The Evaluator is used to evaluate the results of the retrieval process.
The implementations of the Evaluators are located in the [`experimentation/evaluation/implementations`](../blob/experiments/sqa-system/sqa_system/experimentation/evaluation/implementations) folder. To add a new Evaluator, create a Python file for it in the implementations folder. The file name should be the name of the Evaluator. Inside the file, you can then add your Python implementation of the Evaluator.
The Evaluator should be implemented as a subclass of the Evaluator class located in the [`experimentation/evaluation/base`](../blob/experiments/sqa-system/sqa_system/experimentation/evaluation/base) folder. The Evaluator class is the base class for all Evaluators in the SQA system. It provides a common interface for all Evaluators and is itself a subclass of the Scorer class provided by the Weave library. All evaluations that are run are automatically stored in the Weights & Biases dashboard.
You need to implement the following properties and methods in the class (a minimal sketch is shown after this list):

- `score()`: This is the main method of the Evaluator. It receives the output of the retrieval process as well as the ground truth from the QA dataset, such as the golden answer, golden triples, and golden document chunks. Based on this data, the evaluator calculates the score. The method returns a dictionary where the key is the name of the metric and the value is the score; multiple metrics may be calculated in the same evaluator. The output of the retrieval process is passed as a converted dictionary of the `PipeIOData` class, which is located in the [`core/data/models/`](../blob/experiments/sqa-system/sqa_system/core/data/models/) directory.
- `summarize()`: This method is used to aggregate the scores of the evaluator (averaging). It is useful for implementing micro-averaging, as macro-averaging is already applied by default.
- `ADDITIONAL_CONFIG_PARAMS`: This is a list of additional configuration parameters that are required by the evaluator. The parameters are of type `AdditionalConfigParameter` and allow additional parameters to be added to the configuration of the evaluator. For example, if the evaluator requires a specific threshold, you can add it to this list.
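The following is a minimal, hypothetical sketch of such an evaluator. The import paths, the constructor arguments of `AdditionalConfigParameter`, the exact method signatures, and the key names in the converted `PipeIOData` dictionary are assumptions and may differ in the actual codebase:

```python
from typing import Optional

# The import paths below are assumptions and may differ in the actual codebase.
from sqa_system.experimentation.evaluation.base import Evaluator
from sqa_system.core.config.models import AdditionalConfigParameter


class ContainsAnswerEvaluator(Evaluator):
    """Hypothetical evaluator that checks whether the generated answer
    contains the golden answer from the QA dataset."""

    # Additional parameters exposed to the configuration
    # (the constructor arguments of AdditionalConfigParameter are assumptions).
    ADDITIONAL_CONFIG_PARAMS = [
        AdditionalConfigParameter(name="case_sensitive", param_type=bool, default_value=False),
    ]

    def score(self, output: dict) -> dict:
        # 'output' is the converted dictionary of the PipeIOData class; the key
        # names "generated_answer" and "golden_answer" are assumptions.
        generated = (output.get("generated_answer") or "").lower()
        golden = (output.get("golden_answer") or "").lower()
        return {"contains_answer": 1.0 if golden and golden in generated else 0.0}

    def summarize(self, score_rows: list) -> Optional[dict]:
        # Micro-average over all individual question scores
        # (macro-averaging is already applied by default).
        values = [row["contains_answer"] for row in score_rows if "contains_answer" in row]
        if not values:
            return None
        return {"contains_answer": {"mean": sum(values) / len(values)}}
```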
After implementing the Evaluator, you need to register it in the EvaluatorFactory class located in the [`experimentation/evaluation/factory/`](../blob/experiments/sqa-system/sqa_system/experimentation/evaluation/factory/) folder. The EvaluatorFactory class is responsible for creating instances of the Evaluator based on the configuration.

To add your evaluator, first add the import at the top of the file. Then extend the EvaluatorType enum with the name of your evaluator. The value of the enum is used in the configuration file to identify the evaluator:
```python
class EvaluatorType(Enum):
    """The type of the evaluator."""
    RAGAS = "ragas"
    HITATONE = "hit_at_k"
    MAP = "map_at_k"
    MRR = "mrr_at_k"
    BASIC_SCORE = "basic_score"
    BERT_SCORE = "bert_score"
    EXACT_MATCH = "exact_match"
    ROUGE = "rouge_score"
    BLEU = "bleu_score"
    INSTRUCTION_FOLLOWING = "instruction_following"
    # Add your evaluator here
```
Next, you need to add the evaluator to the create() method of the EvaluatorFactory class. The create() method is responsible for creating instances of the Evaluator based on the configuration. Add a new if statement that checks whether the evaluator type is equal to your evaluator type; if it is, create an instance of your evaluator and return it.
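As a hypothetical illustration, continuing the example from above (the enum member, the config attribute name, and the evaluator constructor are assumptions):

```python
# Inside EvaluatorFactory.create(); the surrounding code and names are assumptions.
if config.evaluator_type == EvaluatorType.CONTAINS_ANSWER.value:
    return ContainsAnswerEvaluator(config=config)
```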
Finally, you need to add your evaluator to the get_evaluator_class() method. This method is used to get the class of the evaluator based on the configuration. Add a new if statement that checks whether the evaluator type is equal to your evaluator type; if it is, return the class of your evaluator.
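Again as a hypothetical sketch with assumed names:

```python
# Inside EvaluatorFactory.get_evaluator_class(); names are assumptions.
if evaluator_type == EvaluatorType.CONTAINS_ANSWER:
    return ContainsAnswerEvaluator
```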
🥳 That's it! You have successfully added a new Evaluator to the SQA system, which can now be referenced in a configuration, for example:
```json
{
    "additional_params": {
        "model_type": "microsoft/deberta-xlarge-mnli"
    },
    "evaluator_type": "bert_score"
}
```
This applies the BERTScore evaluator to the pipeline using the `microsoft/deberta-xlarge-mnli` model.