# Creating a New Experiment
On this page, we will guide you through the process of creating a new experiment in the SQA system. This includes setting up the experiment configuration, running the experiment, and generating the plots and tables for the results.
Experiments are run by the `ExperimentRunner` class, which is located in the experiments folder of the SQA system: ../blob/experiments/sqa_system/sqa-system/experimentation/experiment_runner.py. The experiment runner is initialized with an `ExperimentConfig` object, which contains all the information necessary to run the experiment. The `ExperimentConfig` class is located in the config models folder of the SQA system: ../blob/experiments/sqa-system/sqa_system/core/config/models/experiment_config.py.
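The snippets on this page use these classes directly. Assuming the package layout mirrors the file paths linked above (this is an assumption; verify the import paths against your checkout of the repository), the imports would look roughly like this:

```python
# NOTE: These import paths are inferred from the file locations linked above
# and may differ if the package re-exports the classes elsewhere.
from sqa_system.core.config.models.experiment_config import ExperimentConfig
from sqa_system.experimentation.experiment_runner import (
    ExperimentRunner,
    ExperimentRunnerSettings,
)
```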
The `ExperimentConfig` class can be initialized from a dictionary and, at its core, contains the following attributes:
```python
YOUR_EXPERIMENT_CONFIG = ExperimentConfig.from_dict({
    "additional_params": {},  # This can stay empty
    # The base pipeline config is used to run the experiment. If parameter
    # ranges are provided, the experiment runner creates a new pipeline
    # config for each combination of parameters.
    "base_pipeline_config": {
        "additional_params": {},  # This can stay empty
        # Here the pipeline is defined as a list of pipes that are executed
        # in sequence. The order of the pipes is important, as the output of
        # one pipe is the input of the next.
        "pipes": [
            {
                ...
            }
        ],
        # The parameter ranges are used to create a new pipeline config for
        # each combination of parameters. They are defined as a list of
        # dictionaries.
        "parameter_ranges": [
            {
                "config_name": ...,     # The name of the config in the pipeline config to be changed
                "parameter_name": ...,  # The name of the parameter in that config to be changed
                "values": [ ... ],      # The values to use for the parameter; the experiment runner creates a new pipeline config for each combination of values
            }
        ],
        # The configurations for the evaluators that evaluate the results of
        # the experiment, defined as a list of dictionaries.
        "evaluators": [
            {
                ...
            }
        ]
    }
})
```
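If parameter ranges are given, the experiment runner builds one pipeline config per combination of values. The sketch below only illustrates this cross-product conceptually; it is not the actual implementation, and the config and parameter names are hypothetical:

```python
from itertools import product

# Hypothetical parameter ranges, purely for illustration.
parameter_ranges = [
    {"config_name": "retriever_config", "parameter_name": "top_k", "values": [5, 10]},
    {"config_name": "llm_config", "parameter_name": "temperature", "values": [0.0, 0.7]},
]

# One pipeline config variant is derived per combination of values,
# i.e. 2 x 2 = 4 variants of the base pipeline config in this example.
value_lists = [entry["values"] for entry in parameter_ranges]
for combination in product(*value_lists):
    overrides = {
        (entry["config_name"], entry["parameter_name"]): value
        for entry, value in zip(parameter_ranges, combination)
    }
    print(overrides)
```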
When constructing the `ExperimentConfig` object, it is helpful to look at some examples. The experiments folder of the SQA system contains several experiments we have run, each defining such `ExperimentConfig` objects, for example here: ../blob/experiments/sqa-system/experiments/1_experiment/runs/1_parameter_selection/
Once you have created the `ExperimentConfig` object, the `ExperimentRunner` is initialized with it. In addition, you can pass settings that configure the behavior of the experiment runner, for example the folder where the results are stored:
```python
runner = ExperimentRunner(
    experiment_config=YOUR_EXPERIMENT_CONFIG,
    settings=ExperimentRunnerSettings(
        results_folder_path="results",          # The folder where the results are stored
        qa_data_path=qa_dataset_path,           # The path to the QA dataset to use for the experiment
        debugging=True,                         # If True, the experiment runs at debug log level, printing more information into the log file
        log_to_results_folder=True,             # If True, the log file is stored in the results folder
        weave_project_name="experiment_tests",  # The name of the Weave project where the results are stored
        number_of_workers=5                     # How many workers run the experiment; multiple questions are queried in parallel
    )
)
results = runner.run()  # Returns a DataFrame with the results
```
🥳 Now your experiment will be run and the results will be stored in the specified folder.
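Because `run()` returns a pandas DataFrame, you can also inspect the results directly in Python. Continuing from the snippet above, and keeping in mind that the exact columns depend on your pipes and evaluators, a quick look could be:

```python
# Quick sanity check of the returned results; the available columns depend
# on the configured pipes and evaluators.
print(results.shape)
print(results.columns.tolist())
print(results.head())

# Optionally keep a flat copy next to the generated results folder.
results.to_csv("results_overview.csv", index=False)
```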
After the experiment has run, it generates a results folder with the following structure:
```
[THE UNIQUE IDENTIFIER OF THE EXPERIMENT CONFIG]/
├── configs/
│   ├── [THE UNIQUE IDENTIFIER OF THE PIPELINE CONFIG].json  # The pipeline config used to run the experiment.
│   └── ...              # There may be multiple of these files if parameter ranges were used and the experiment was run with multiple pipeline configs.
├── predictions/
│   ├── [THE UNIQUE IDENTIFIER OF THE PIPELINE CONFIG].csv   # The predictions of the experiment as a CSV file, where each row is a question with the corresponding pipeline outputs and the metrics calculated by the configured evaluators.
│   └── ...              # There may be multiple of these files if parameter ranges were used and the experiment was run with multiple pipeline configs.
└── experiment.log       # The log file of the experiment, containing everything printed during the run. It is stored in the results folder if `log_to_results_folder` was set to true.
```
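If you want a quick look at the predictions without the visualizer, you can load the CSV files directly. A minimal sketch, assuming the layout above and that `results` is the `results_folder_path` used earlier:

```python
import glob
import os

import pandas as pd

# Collect every prediction file from all experiment runs below "results/".
prediction_files = glob.glob(
    os.path.join("results", "**", "predictions", "*.csv"), recursive=True)

for path in prediction_files:
    predictions = pd.read_csv(path)
    # Each row is a question with the pipeline outputs and evaluator metrics;
    # the exact columns depend on your pipeline and evaluator configuration.
    print(path, predictions.shape)
```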
To generate the plots and tables, we need to extract them from the prediction CSV files. This is done with the `ExperimentVisualizer` class, which is located in the experimentation/utils/visualizer folder: ../blob/experiments/sqa_system/sqa-system/experimentation/utils/visualizer/. It provides many different plotting functions to visualize the results of the experiment.
We have prepared a template that you can use to generate the plots and tables. The script automatically retrieves all prediction files from the subfolders of the directory it is placed in and generates the plots and tables for each of them. You can therefore simply copy the script into the results folder of your experiment and run it:
```python
# A simple script to generate visualizations for the results of the
# experiments.
# ----------------
# -- HOW TO USE --
# ----------------
# 1. Place this script in the same directory as the results of the
#    experiment(s) you want to visualize. It will automatically retrieve
#    all prediction files from the subfolders.
# 2. Run the script.
# (3.) Optionally, you can change the variables inside of the script to
#      change what is plotted.
import os

from sqa_system.core.data.file_path_manager import FilePathManager
from sqa_system.experimentation.utils.visualizer.experiment_visualizer import (
    ExperimentVisualizer, ExperimentVisualizerSettings, PlotType)

# Here you replace the name of the configs with a unique
# name. Place the id on the left and the name on the right.
CONFIG_TO_NAME_MAPPING: dict = {}

# The base config (or name if you replace it above)
# is highlighted in red in some plots. Should be a string.
BASE_CONFIG: str = None

FPM = FilePathManager()
CURRENT_DIRECTORY = os.path.dirname(os.path.realpath(__file__))
VISUALIZATIONS_DIR = FPM.combine_paths(
    CURRENT_DIRECTORY, "result_visualizations")
QA_DATASET_PATH = FPM.combine_paths(
    FPM.get_parent_directory(CURRENT_DIRECTORY, 5),
    "qa_datasets",
    "qa_datasets",
    "reduced",
    "reduced_deep_distributed_graph_dataset.csv"
)


def plot_average_metrics_per_config():
    """
    Generates a plot with the y-axis being the average of the metrics,
    the x-axis being the names of the metrics and the hue being the
    configuration hashes.
    """
    visualizer = ExperimentVisualizer(
        ExperimentVisualizerSettings(
            data_folder_path=CURRENT_DIRECTORY,
            should_print=False,
            should_save_to_file=True,
            baseline_config=BASE_CONFIG,
            config_to_name_mapping=CONFIG_TO_NAME_MAPPING,
            save_folder_path=VISUALIZATIONS_DIR,
            qa_file_path=QA_DATASET_PATH,
        )
    )
    visualizer.run(
        plots_to_generate=[
            PlotType.AVERAGE_METRICS_PER_CONFIG,
            PlotType.TABLE
        ]
    )


def plot_average_metrics_by_column():
    """
    This function generates a plot with the y-axis being the average of the
    metrics, the x-axis being the names of the metrics and the hue being the
    unique values of the column to group by.
    """
    visualizer = ExperimentVisualizer(
        ExperimentVisualizerSettings(
            data_folder_path=CURRENT_DIRECTORY,
            should_print=False,
            should_save_to_file=True,
            save_folder_path=VISUALIZATIONS_DIR,
            baseline_config=BASE_CONFIG,
            config_to_name_mapping=CONFIG_TO_NAME_MAPPING,
            qa_file_path=QA_DATASET_PATH,
            # It is recommended to add the configs to visualize here;
            # otherwise many plots are generated for each config that is
            # found.
            configs_to_visualize=[]
        )
    )
    visualizer.run(
        plots_to_generate=[
            PlotType.AVERAGE_METRICS_BY_COLUMN,
        ],
        column_to_group_by="retrieval_operation"
    )


def plot_metric_by_column():
    """
    This function generates, per metric, a plot with the metric's average
    value on the y-axis, the unique values of the column on the x-axis and
    the hue being the configuration hashes.
    """
    visualizer = ExperimentVisualizer(
        ExperimentVisualizerSettings(
            data_folder_path=CURRENT_DIRECTORY,
            should_print=False,
            should_save_to_file=True,
            file_type="png",
            save_folder_path=VISUALIZATIONS_DIR,
            baseline_config=BASE_CONFIG,
            config_to_name_mapping=CONFIG_TO_NAME_MAPPING,
            qa_file_path=QA_DATASET_PATH,
            # Here you need to define the names of the metrics you want to
            # visualize. It has to be inside of a dictionary.
            metrics={
                "retrieval_operation": [
                    "recall_triples",
                ]
            }
        )
    )
    visualizer.run(
        plots_to_generate=[
            PlotType.METRIC_BY_COLUMN_PER_CONFIG
        ],
        column_to_group_by="retrieval_operation"
    )
    visualizer.run(
        plots_to_generate=[
            PlotType.METRIC_BY_COLUMN_PER_CONFIG
        ],
        column_to_group_by="use_case"
    )


if __name__ == '__main__':
    plot_average_metrics_per_config()
    plot_average_metrics_by_column()
    plot_metric_by_column()
```
🥳 The script will now create plots and tables that you can use to analyze the results.