plot_contrastive.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki
import json - JSON encoder and decoder (json documentation)
import os - provides a portable way of using operating-system-dependent functionality (os documentation)
import tempfile - used to create temporary files and directories (tempfile documentation)

import baker - easy, powerful access to Python functions from the command line (baker documentation)
import matplotlib - comprehensive library for creating static, animated and interactive visualizations in Python (matplotlib documentation)
import mlflow - open-source platform for managing the end-to-end machine learning lifecycle (mlflow documentation)
import numpy as np - the fundamental package for scientific computing with Python (numpy documentation)
import pandas as pd - flexible and easy-to-use open-source data analysis and manipulation tool (pandas documentation)
from matplotlib import pyplot as plt - state-based interface to matplotlib that provides a MATLAB-like way of plotting (matplotlib.pyplot documentation)
from sklearn.metrics import accuracy_score - used to compute the accuracy classification score (sklearn.metrics.accuracy_score documentation)
from sklearn.metrics import confusion_matrix - used to compute the confusion matrix to evaluate the accuracy of a classification (sklearn.metrics.confusion_matrix documentation)
from sklearn.metrics import f1_score - used to compute the F1 score (sklearn.metrics.f1_score documentation)
from sklearn.metrics import jaccard_score - used to compute the Jaccard similarity coefficient score (sklearn.metrics.jaccard_score documentation)
from sklearn.metrics import precision_score - used to compute the precision score (sklearn.metrics.precision_score documentation)
from sklearn.metrics import recall_score - used to compute the recall score (sklearn.metrics.recall_score documentation)

from nets.generators.fresh_generators import get_generator
from utils.plot_utils import collect_dataframes
plot_score_trend(values_dict, filename, key, style, std_alpha) (function) - Plot the trend of a score, given a dict of its values as input.
values_dict (arg) - Dict containing the values of the score to plot
filename (arg) - Path where to save the plot
key (arg) - Name of the score
style (arg) - Style to use in the plot
std_alpha (arg) - Alpha (transparency) value used when drawing the standard deviation (default: .2)
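The body of plot_score_trend is not shown on this page. As a minimal sketch only, assuming (hypothetically) that values_dict maps each k to a list of per-run values of the score, a trend plot of this kind can draw the mean across runs and shade one standard deviation around it with matplotlib's fill_between, using std_alpha as the band's transparency. The function name sketch_score_trend and the dict layout are illustrative assumptions, not the module's code.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so figures can be saved without a display
import numpy as np
from matplotlib import pyplot as plt


def sketch_score_trend(values_dict, filename, key, style="-o", std_alpha=0.2):
    """Hypothetical sketch: values_dict is assumed to map k -> list of per-run scores."""
    ks = sorted(values_dict)
    # mean and (population) standard deviation of the score across runs, for each k
    means = np.array([np.mean(values_dict[k]) for k in ks])
    stds = np.array([np.std(values_dict[k]) for k in ks])

    fig, ax = plt.subplots()
    ax.plot(ks, means, style, label=key)
    # shaded band of +/- one standard deviation around the mean
    ax.fill_between(ks, means - stds, means + stds, alpha=std_alpha)
    ax.set_xlabel("k (number of nearest neighbours)")
    ax.set_ylabel(key)
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
    return means, stds
```

For example, `sketch_score_trend({1: [0.80, 0.90], 3: [0.50, 0.70]}, "trend.png", "f1_macro")` saves the plot and returns the per-k means (0.85 and 0.6) together with their standard deviations.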
get_fresh_dataset_info(ds_path) (function) - Get some variables specific to the fresh dataset.
ds_path (arg) - Root directory of the fresh dataset (where the .dat files are located)
compute_scores(id_to_dataframe_dict, dest_file, k, zero_division) (function) - Estimate micro-, macro- and weighted-averaged scores (Jaccard similarity, recall, precision, F1 score) for a dataframe/key combination.
id_to_dataframe_dict (arg) - Dictionary mapping run IDs to result dataframes
dest_file (arg) - Filename to save the resulting scores to
k (arg) - Number of nearest neighbours used with the k-NN algorithm
zero_division (arg) - Value to return when there is a zero division; if set to "warn", this acts as 0, but warnings are also raised (default: 1.0)
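A minimal sketch of the averaging scheme described above, written directly against scikit-learn on two label sequences rather than on the run ID/dataframe dictionary (whose column layout is not documented here). The helper name averaged_scores is hypothetical, and the accuracy entry is included only because the module imports accuracy_score; whether compute_scores itself reports it is an assumption.

```python
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             precision_score, recall_score)


def averaged_scores(y_true, y_pred, zero_division=1.0):
    """Hypothetical helper: micro, macro and weighted averages of four metrics."""
    metrics = {
        "jaccard": jaccard_score,
        "recall": recall_score,
        "precision": precision_score,
        "f1": f1_score,
    }
    # accuracy is added as an assumption (the module imports accuracy_score)
    scores = {"accuracy": accuracy_score(y_true, y_pred)}
    for name, metric in metrics.items():
        for average in ("micro", "macro", "weighted"):
            # zero_division sets the value returned when a denominator is 0
            scores[f"{name}_{average}"] = metric(
                y_true, y_pred, average=average, zero_division=zero_division
            )
    return scores
```

With single-label multi-class data, micro-averaged precision, recall and F1 all equal the accuracy, so the macro and weighted averages are where per-family differences show up.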
plot_confusion_matrix(conf_mtx, filename, families) (function) - Plot the confusion matrix passed as input and save the resulting figure to file.
conf_mtx (arg) - Ndarray containing the confusion matrix to plot
filename (arg) - Path where to save the generated confusion matrix plot
families (arg) - List of families of interest
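As a hedged sketch of such a plot (not the module's implementation), the matrix can be drawn with imshow, using the family names as tick labels and writing each cell's count on top. The function name sketch_confusion_matrix_plot is hypothetical; scikit-learn's ConfusionMatrixDisplay is a ready-made alternative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so figures can be saved without a display
import numpy as np
from matplotlib import pyplot as plt


def sketch_confusion_matrix_plot(conf_mtx, filename, families):
    """Hypothetical sketch: draw conf_mtx as a heatmap with family names on both axes."""
    conf_mtx = np.asarray(conf_mtx)
    fig, ax = plt.subplots(figsize=(8, 8))
    im = ax.imshow(conf_mtx, cmap="Blues")
    fig.colorbar(im, ax=ax)

    ticks = np.arange(len(families))
    ax.set_xticks(ticks)
    ax.set_xticklabels(families, rotation=90)
    ax.set_yticks(ticks)
    ax.set_yticklabels(families)
    ax.set_xlabel("Predicted family")
    ax.set_ylabel("True family")

    # write the count of each cell at its centre (column j -> x, row i -> y)
    for i in range(conf_mtx.shape[0]):
        for j in range(conf_mtx.shape[1]):
            ax.text(j, i, str(conf_mtx[i, j]), ha="center", va="center")

    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)
```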
create_confusion_matrixes(results_file, families, knn_k_min, knn_k_max) (function) - Create confusion matrices for the contrastive learning model, using odd numbers of nearest neighbours (k) between knn_k_min and knn_k_max.
results_file (arg) - Complete path to a results.csv file that contains the output of a model run
families (arg) - List of families of interest
knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm
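The selection of k described above can be sketched as follows, assuming (this is not stated on the page) that both bounds are inclusive; with the 1 to 11 range used as the default elsewhere on this page, this gives k = 1, 3, 5, 7, 9, 11. Passing families as labels to scikit-learn's confusion_matrix fixes the row and column order. Both helper names are hypothetical.

```python
from sklearn.metrics import confusion_matrix


def odd_ks(knn_k_min, knn_k_max):
    """Hypothetical helper: odd values of k in [knn_k_min, knn_k_max], bounds included."""
    return [k for k in range(knn_k_min, knn_k_max + 1) if k % 2 == 1]


def family_confusion_matrix(true_families, predicted_families, families):
    """Hypothetical helper: rows are true families, columns predicted ones, in `families` order."""
    return confusion_matrix(true_families, predicted_families, labels=families)
```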
compute_run_scores(results_file, knn_k_min, knn_k_max, zero_division) (function) - Compute multi-class classification scores.
results_file (arg) - Path to results.csv containing the output of a model run
knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm (default: 1)
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm (default: 11)
zero_division (arg) - Sets the value to return when there is a zero division (default: 1.0)
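The page does not say how a predicted family is derived for each k. One common approach, shown here purely as an assumption, is a majority vote among the families of the k nearest neighbours, after which a score such as macro-averaged F1 is computed for every odd k. The names knn_vote and macro_f1_per_k, the input layout (one list of neighbour families per sample, nearest first) and the tie-breaking rule are all hypothetical.

```python
from collections import Counter

from sklearn.metrics import f1_score


def knn_vote(neighbour_families, k):
    """Hypothetical: majority family among the first k neighbours (assumed nearest first).

    Counter.most_common orders equal counts by first occurrence, so ties go to
    the family seen first, i.e. the one with the closest neighbour.
    """
    return Counter(neighbour_families[:k]).most_common(1)[0][0]


def macro_f1_per_k(true_families, neighbour_lists, knn_k_min=1, knn_k_max=11, zero_division=1.0):
    """Hypothetical: macro-averaged F1 of the k-NN vote for each odd k in the range."""
    scores = {}
    for k in range(knn_k_min, knn_k_max + 1):
        if k % 2 == 0:
            continue  # only odd values of k are evaluated
        predictions = [knn_vote(neighbours, k) for neighbours in neighbour_lists]
        scores[k] = f1_score(true_families, predictions, average="macro",
                             zero_division=zero_division)
    return scores
```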
compute_contrastive_learning_results(results_file, fresh_ds_path, knn_k_min, knn_k_max, zero_division) (function, baker command) - Take a contrastive model result file and produce multi-class classification scores and confusion matrices.
results_file (arg) - Path to results.csv containing the output of a model run
fresh_ds_path (arg) - Root directory of the fresh dataset (where the .dat files are located)
knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm (default: 1)
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm (default: 11)
zero_division (arg) - Value to return when there is a zero division; if set to "warn", this acts as 0, but warnings are also raised (default: 1.0)
plot_all_scores_trends(run_to_filename_json, knn_k_min, knn_k_max) (function, baker command) - Plot the trends of the contrastive model's classification scores.
run_to_filename_json (arg) - A JSON file containing a key-value map that links run IDs to the full path of a results file (including the file name), in the following format:
{
    "run_id_0": "/full/path/to/results.csv/for/run/0/results.csv",
    "run_id_1": "/full/path/to/results.csv/for/run/1/results.csv",
    ...
}
knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm (default: 1)
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm (default: 11)
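A small sketch of reading a run_to_filename_json map in the format described for plot_all_scores_trends, with a check that every listed results file exists before any plotting starts. The helper name load_run_to_filename is hypothetical, and the existence check is an added assumption rather than documented behaviour of plot_all_scores_trends.

```python
import json
import os


def load_run_to_filename(run_to_filename_json):
    """Hypothetical helper: read the run ID -> results.csv path map and validate it."""
    with open(run_to_filename_json, "r") as f:
        run_to_filename = json.load(f)
    # fail early if any results file listed in the map is missing
    missing = [path for path in run_to_filename.values() if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"results files not found: {missing}")
    return run_to_filename
```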
__main__ (main) - Start baker so that the script can be run from the command line, with function names and parameters used as the command-line interface (optparse-style options).