plot_utils.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki
import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
import pandas as pd - pandas is a flexible and easy to use open source data analysis and manipulation tool - pandas documentation
from matplotlib import pyplot as plt - state-based interface to matplotlib, provides a MATLAB-like way of plotting - matplotlib.pyplot documentation
from sklearn.metrics import accuracy_score - used to compute the Accuracy classification score - sklearn.metrics.accuracy_score documentation
from sklearn.metrics import f1_score - used to compute the F1 score - sklearn.metrics.f1_score documentation
from sklearn.metrics import precision_score - used to compute the Precision score - sklearn.metrics.precision_score documentation
from sklearn.metrics import recall_score - used to compute the Recall score - sklearn.metrics.recall_score documentation
from sklearn.metrics import roc_auc_score - used to compute the ROC AUC from prediction scores - sklearn.metrics.roc_auc_score documentation
from sklearn.metrics import roc_curve - used to compute the Receiver Operating Characteristic (ROC) curve - sklearn.metrics.roc_curve documentation
collect_dataframes(run_id_to_filename_dictionary) (function) - Load dataframes given a dictionary mapping run IDs to filenames.
run_id_to_filename_dictionary(arg) - Dictionary mapping run IDs to filenames
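A minimal sketch of what such a loader could look like, assuming every filename points to a CSV file that `pandas.read_csv` can parse. The helper name `collect_dataframes_sketch` is hypothetical, and the real function may do additional work that this page does not describe.

```python
import pandas as pd


def collect_dataframes_sketch(run_id_to_filename_dictionary):
    """Load one result dataframe per run ID (illustrative only)."""
    loaded = {}
    for run_id, filename in run_id_to_filename_dictionary.items():
        # Assumption: each results file is a plain CSV with a header row.
        loaded[run_id] = pd.read_csv(filename)
    return loaded
```

The result keeps the run IDs as keys, which is the shape that `plot_roc_with_confidence` documents for its `id_to_dataframe_dictionary` argument.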
get_binary_predictions(dataframe, key, target_fprs) (function) - Get binary predictions for a dataframe/key combination at specific False Positive Rates of interest.
dataframe(arg) - A pandas dataframe
key(arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
target_fprs(arg) - The FPRs at which you wish to estimate the TPRs (1-d numpy array)
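One way to turn continuous scores into binary predictions at a chosen FPR is to threshold at a quantile of the scores of the negative samples. The sketch below illustrates that idea only; it is an assumption about a reasonable approach, not necessarily how `get_binary_predictions` is implemented, and the name `binarize_at_fprs_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def binarize_at_fprs_sketch(dataframe, key, target_fprs):
    """Return labels and one row of binary predictions per target FPR."""
    labels = dataframe[f"label_{key}"].to_numpy()
    scores = dataframe[f"pred_{key}"].to_numpy()
    negative_scores = scores[labels == 0]
    # Assumption: the threshold for a target FPR is the (1 - FPR) quantile of
    # the negative-class scores, so roughly that fraction of negatives ends up
    # strictly above the threshold.
    thresholds = np.quantile(negative_scores, 1.0 - np.asarray(target_fprs))
    # Broadcast to shape (number of target FPRs, number of samples).
    predictions = (scores[None, :] > thresholds[:, None]).astype(int)
    return labels, predictions
```

With few negative samples, very small target FPRs (such as 1e-5) cannot be resolved; the threshold then lands at or near the highest negative score.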
get_all_predictions(result_dataframe, keys, target_fprs) (function) - Get labels and binarized predictions (for all keys) for a dataframe at specific False Positive Rates of interest.
result_dataframe(arg) - A pandas dataframe
keys(arg) - Keys (list) to extract results for
target_fprs(arg) - The FPRs at which you wish to estimate the TPRs; None (uses the default np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])) or a 1-d numpy array
get_tprs_at_fpr(result_dataframe, key, target_fprs) (function) - Estimate the True Positive Rate for a dataframe/key combination at specific False Positive Rates of interest.
result_dataframe(arg) - A pandas dataframe
key(arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
target_fprs(arg) - The FPRs at which you wish to estimate the TPRs; None (uses the default np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])) or a 1-d numpy array
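Estimating the TPR at fixed FPRs can be sketched as computing the ROC curve and linearly interpolating it at the requested points. The snippet below assumes that approach and uses the default FPR grid quoted above; `tprs_at_fpr_sketch` and `DEFAULT_TARGET_FPRS` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve

# Default grid as quoted in the argument description above.
DEFAULT_TARGET_FPRS = np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])


def tprs_at_fpr_sketch(result_dataframe, key, target_fprs=None):
    """Interpolate the ROC curve at the requested FPR values."""
    if target_fprs is None:
        target_fprs = DEFAULT_TARGET_FPRS
    fpr, tpr, _thresholds = roc_curve(result_dataframe[f"label_{key}"],
                                      result_dataframe[f"pred_{key}"])
    # Assumption: linear interpolation between ROC points is acceptable.
    return np.interp(target_fprs, fpr, tpr)
```

The returned array has one TPR estimate per entry of `target_fprs`, in the same order.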
get_score_per_fpr(score_function, result_dataframe, key, target_fprs, zero_division) (function) - Estimate the score for a dataframe/key combination using a provided score function at specific False Positive Rates of interest.
score_function(arg) - Score function to use
result_dataframe(arg) - A pandas dataframe
key(arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
target_fprs(arg) - The FPRs at which you wish to estimate the TPRs; None (uses the default np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])) or a 1-d numpy array
zero_division(arg) - Sets the value to return when there is a zero division. If set to "warn", this acts as 0, but warnings are also raised (default: 1.0)
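The `zero_division` argument matters most at very low FPRs, where the threshold can be so strict that no sample is predicted positive and precision becomes 0/0. The short example below shows the effect using scikit-learn's own `zero_division` parameter on made-up labels; it does not call `get_score_per_fpr` itself.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 0]  # nothing flagged, so precision is 0/0

# The undefined precision is replaced by the chosen value.
precision_as_one = precision_score(y_true, y_pred, zero_division=1.0)
precision_as_zero = precision_score(y_true, y_pred, zero_division=0)

# Recall is well defined here (0 of 2 positives found), so the
# zero_division setting does not affect it.
recall = recall_score(y_true, y_pred, zero_division=1.0)
```

With the documented default of 1.0, a precision curve can therefore show perfect precision at FPRs where the model flags nothing, which is worth keeping in mind when reading the scores.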
get_roc_curve(result_dataframe, key) (function) - Get the ROC curve for a single result in a dataframe.
result_dataframe(arg) - Result dataframe for a certain run
key(arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
get_auc_score(result_dataframe, key) (function) - Get the Area Under the Curve for the indicated key in the dataframe.
result_dataframe(arg) - Result dataframe for a certain run
key(arg) - The name of the result to get the AUC for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
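Both the ROC curve and the AUC can be read off the `label_<key>` and `pred_<key>` columns with the scikit-learn functions imported at the top of the module. The sketch below combines them in one hypothetical helper, `roc_and_auc_sketch`; the actual functions are separate and may differ in detail.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve


def roc_and_auc_sketch(result_dataframe, key):
    """Return ((fpr, tpr, thresholds), auc) for one key."""
    labels = result_dataframe[f"label_{key}"]
    scores = result_dataframe[f"pred_{key}"]
    fpr, tpr, thresholds = roc_curve(labels, scores)
    auc = roc_auc_score(labels, scores)
    return (fpr, tpr, thresholds), auc


# Example with made-up scores for the key 'malware'.
example = pd.DataFrame({"label_malware": [0, 0, 1, 1],
                        "pred_malware": [0.1, 0.4, 0.35, 0.8]})
(roc_fpr, roc_tpr, roc_thresholds), auc_value = roc_and_auc_sketch(example, "malware")
```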
interpolate_rocs(id_to_roc_dictionary, eval_fpr_points) (function) - Take several sets of ROC results and interpolate them to a common set of evaluation (FPR) values, which allows computing e.g. a mean ROC or the pointwise variance of the curve across multiple model fittings.
id_to_roc_dictionary(arg) - Results from get_roc_curve (run ID - ROC curve dictionary)
eval_fpr_points(arg) - The set of FPR values at which to interpolate the results; defaults to np.logspace(-6, 0, 1000)
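Interpolating onto a shared FPR grid can be sketched with `np.interp`, as below. This assumes each dictionary value is an `(fpr, tpr, thresholds)` tuple; the name `interpolate_rocs_sketch` and the return shape are hypothetical.

```python
import numpy as np


def interpolate_rocs_sketch(id_to_roc_dictionary, eval_fpr_points=None):
    """Resample every ROC curve onto the same FPR grid."""
    if eval_fpr_points is None:
        eval_fpr_points = np.logspace(-6, 0, 1000)
    interpolated = {}
    for run_id, (fpr, tpr, _thresholds) in id_to_roc_dictionary.items():
        # Linear interpolation of TPR as a function of FPR.
        interpolated[run_id] = np.interp(eval_fpr_points, fpr, tpr)
    return interpolated, eval_fpr_points


# Two made-up ROC curves resampled onto a small grid.
rocs = {
    "run_a": (np.array([0.0, 1.0]), np.array([0.0, 1.0]), None),
    "run_b": (np.array([0.0, 0.5, 1.0]), np.array([0.0, 1.0, 1.0]), None),
}
grid = np.array([0.0, 0.25, 0.5, 1.0])
resampled, _ = interpolate_rocs_sketch(rocs, grid)
mean_tpr = np.mean(list(resampled.values()), axis=0)
```

Once all curves share one grid, pointwise statistics such as the mean above reduce to simple array operations.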
compute_scores(results_file, key, zero_division) (function) - Estimate several score values (TPR at FPR, accuracy, recall, precision, F1 score) for a dataframe/key combination at specific False Positive Rates of interest.
results_file(arg) - Complete path to a results.csv file that contains the output of a model run
key(arg) - The key from the results to consider; defaults to "malware"
zero_division(arg) - Sets the value to return when there is a zero division. If set to "warn", this acts as 0, but warnings are also raised (default: 1.0)
plot_roc_with_confidence(id_to_dataframe_dictionary, key, filename, include_range, style, std_alpha, range_alpha) (function) - Compute the mean and standard deviation of the ROC curve from a sequence of results and plot it with shading.
id_to_dataframe_dictionary(arg) - Run ID - result dataframe dictionary
key(arg) - The name of the result to get the curve for
filename(arg) - The filename to save the resulting figure to
include_range(arg) - Plot the min/max value as well
style(arg) - Style (color, linestyle) to use in the plot (default: False)
std_alpha(arg) - The alpha value for the shading for standard deviation range (default: .2)
range_alpha(arg) - The alpha value for the shading for range, if plotted (default: .1)
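The shaded plot can be sketched as a mean curve plus `fill_between` bands. The snippet below assumes the TPRs have already been interpolated onto a common FPR grid (one array per run ID) and uses a log-scaled FPR axis; the helper name `plot_mean_roc_sketch`, its input shape, the fixed color, and the axis labels are all illustrative assumptions, not the module's actual behavior.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt


def plot_mean_roc_sketch(interpolated_tprs, eval_fpr_points, filename,
                         include_range=False, std_alpha=.2, range_alpha=.1):
    """Plot the mean ROC with a +/- 1 standard deviation band."""
    tprs = np.vstack(list(interpolated_tprs.values()))  # one row per run
    mean_tpr = tprs.mean(axis=0)
    std_tpr = tprs.std(axis=0)

    fig, ax = plt.subplots()
    ax.semilogx(eval_fpr_points, mean_tpr, color="k")
    # Standard deviation band, clipped to the valid TPR range [0, 1].
    ax.fill_between(eval_fpr_points,
                    np.clip(mean_tpr - std_tpr, 0, 1),
                    np.clip(mean_tpr + std_tpr, 0, 1),
                    color="k", alpha=std_alpha)
    if include_range:
        # Lighter band spanning the pointwise min/max across runs.
        ax.fill_between(eval_fpr_points, tprs.min(axis=0), tprs.max(axis=0),
                        color="k", alpha=range_alpha)
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    fig.savefig(filename)
    plt.close(fig)
    return mean_tpr, std_tpr
```

A lower `range_alpha` than `std_alpha` keeps the min/max band visually behind the standard deviation band, which matches the documented defaults of .1 and .2.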