plot.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki
- import json - JSON encoder and decoder - json documentation
- import os - provides a portable way of using operating system dependent functionality - os documentation
- import tempfile - used to create temporary files and directories - tempfile documentation

- import baker - easy, powerful access to Python functions from the command line - baker documentation
- import matplotlib - comprehensive library for creating static, animated, and interactive visualizations in Python - matplotlib documentation
- import mlflow - open source platform for managing the end-to-end machine learning lifecycle - mlflow documentation
- from logzero import logger - robust and effective logging for Python - logzero documentation
- from sklearn.metrics import jaccard_score - used to compute the Jaccard similarity coefficient score - sklearn.metrics.jaccard_score documentation

from nets.generators.generators import Dataset
from utils.plot_utils import *

plot_tag_results(dataframe, filename, tags) (function) - Produce multiple overlaid ROC plots (one for each tag individually) and save the overall figure to file.
- dataframe (arg) - Result dataframe
- filename (arg) - Name of the file in which to save the resulting plot
- tags (arg) - List of tags to extract results for
plot_tag_mean_results(id_to_dataframe_dictionary, filename, tags) (function) - Produce multiple overlaid mean ROC plots (one for each tag individually, averaged over the runs in the dictionary) and save the overall figure to file.
- id_to_dataframe_dictionary (arg) - Dictionary mapping each run ID to its result dataframe
- filename (arg) - Name of the file in which to save the resulting plot
- tags (arg) - List of tags to extract results for
compute_scores(results_file, tag, zero_division) (function) - Estimate score values (TPR at FPR, accuracy, recall, precision, F1 score) for a dataframe/key combination at specific false positive rates of interest.
- results_file (arg) - Complete path to a results.csv file that contains the output of a model run
- tag (arg) - The tag from the results to consider; defaults to "malware"
- zero_division (arg) - Sets the value to return when there is a zero division. If set to "warn", this acts as 0, but warnings are also raised
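
To make the idea of scores "at a specific false positive rate" concrete, here is a minimal pure-Python sketch. It is not the code in plot.py: the helper name `scores_at_fpr`, its signature, and the threshold-selection rule (lowest score threshold whose FPR does not exceed the target) are assumptions made for illustration, and the real function may interpolate the ROC curve instead.

```python
import math


def scores_at_fpr(labels, scores, target_fpr=0.01, zero_division=1.0):
    """Hypothetical helper: scores at the lowest threshold whose FPR <= target_fpr."""

    def safe_div(numerator, denominator):
        # Fall back to `zero_division` when the denominator is 0.
        return numerator / denominator if denominator else zero_division

    n_negative = sum(1 for y in labels if y == 0)
    threshold = math.inf  # predict nothing as positive until a threshold qualifies
    # Lowering the threshold can only raise the FPR, so stop at the first violation.
    for candidate in sorted(set(scores), reverse=True):
        false_pos = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= candidate)
        fpr = false_pos / n_negative if n_negative else 0.0
        if fpr > target_fpr:
            break
        threshold = candidate

    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    recall = safe_div(tp, tp + fn)  # recall is the TPR at the chosen threshold
    precision = safe_div(tp, tp + fp)
    return {
        "threshold": threshold,
        "tpr": recall,
        "accuracy": (tp + tn) / len(labels),
        "recall": recall,
        "precision": precision,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Toy data (made up for this example): 4 positives and 4 negatives.
result = scores_at_fpr(
    labels=[1, 1, 1, 0, 0, 0, 0, 1],
    scores=[0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6],
    target_fpr=0.25,
)
print(result)
```

The `zero_division` fallback matters when a tag has no positive samples or no positive predictions at the chosen threshold: precision and recall would otherwise divide by zero.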
compute_run_scores(results_file, use_malicious_labels, use_tag_labels, zero_division) (function) - Compute all scores for all tags.
- results_file (arg) - Path to results.csv containing the output of a model run
- use_malicious_labels (arg) - Whether or not (1/0) to compute malware/benignware label scores (default: 1)
- use_tag_labels (arg) - Whether or not (1/0) to compute the tag label scores (default: 1)
- zero_division (arg) - Sets the value to return when there is a zero division (default: 1.0)
compute_run_mean_scores(results_file, use_malicious_labels, use_tag_labels, zero_division) (function) - Estimate mean per-sample scores (Jaccard similarity and mean per-sample accuracy) for a dataframe at specific false positive rates of interest.
- results_file (arg) - Path to results.csv containing the output of a model run
- use_malicious_labels (arg) - Whether or not (1/0) to compute malware/benignware label scores (default: 1)
- use_tag_labels (arg) - Whether or not (1/0) to compute the tag label scores (default: 1)
- zero_division (arg) - Sets the value to return when there is a zero division. If set to "warn", this acts as 0, but warnings are also raised (default: 1.0)
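
The two per-sample scores can be illustrated with a small pure-Python sketch. The source imports `sklearn.metrics.jaccard_score` for the Jaccard similarity; the code below is not that implementation, only an illustration of the per-sample definition with a `zero_division` fallback. The function names are hypothetical, and reading "mean per-sample accuracy" as the fraction of labels predicted correctly for each sample, averaged over samples, is an assumption.

```python
def jaccard_per_sample(true_rows, pred_rows, zero_division=1.0):
    """Mean over samples of |true AND pred| / |true OR pred| for binary label rows."""
    total = 0.0
    for true_row, pred_row in zip(true_rows, pred_rows):
        intersection = sum(1 for t, p in zip(true_row, pred_row) if t == 1 and p == 1)
        union = sum(1 for t, p in zip(true_row, pred_row) if t == 1 or p == 1)
        # A sample with no true and no predicted labels has an empty union.
        total += intersection / union if union else zero_division
    return total / len(true_rows)


def mean_per_sample_accuracy(true_rows, pred_rows):
    """Assumed definition: per-sample fraction of matching labels, averaged over samples."""
    per_sample = [
        sum(1 for t, p in zip(true_row, pred_row) if t == p) / len(true_row)
        for true_row, pred_row in zip(true_rows, pred_rows)
    ]
    return sum(per_sample) / len(per_sample)


# Toy data (made up): 3 samples, 3 tags each; the second sample has no labels at all.
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 1]]

jaccard = jaccard_per_sample(y_true, y_pred, zero_division=1.0)
accuracy = mean_per_sample_accuracy(y_true, y_pred)
print(jaccard, accuracy)
```

With `zero_division=1.0`, the all-zero sample counts as a perfect match; with `zero_division=0.0` it would pull the mean down instead.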
plot_run_results(results_file, use_malicious_labels, use_tag_labels) (function) - Takes a result file from a feedforward neural network model that includes all tags, and produces multiple overlaid ROC plots, one for each tag.
- results_file (arg) - Path to results.csv containing the output of a model run
- use_malicious_labels (arg) - Whether or not (1/0) to compute malware/benignware label scores (default: 1)
- use_tag_labels (arg) - Whether or not (1/0) to compute the tag label scores (default: 1)
plot_mean_results(run_to_filename_json, all_tags) (function) - Computes the mean of the TPR over a range of FPRs (the ROC curve) across several sets of results (at least 2 runs) for all provided tags, and produces multiple overlaid ROC plots, one for each tag. The run_to_filename_json file must have the following format:
{
"run_id_0": "/full/path/to/results.csv/for/run/0/results.csv",
"run_id_1": "/full/path/to/results.csv/for/run/1/results.csv",
...
}
- run_to_filename_json (arg) - A JSON file that contains a key-value map linking run IDs to the full path to a results file (including the file name)
- all_tags (arg) - List of all tags to plot results of
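
A run-to-filename JSON file in the format described above can be written and read back with the standard library. The sketch below is only an illustration: the file name `run_to_filename.json` and the helpers `write_run_map` and `load_run_map` are hypothetical, the paths are the placeholders from the format description, and the check for at least 2 runs mirrors the stated requirement rather than the module's actual validation.

```python
import json
import os
import tempfile

# Placeholder paths copied from the format description; replace with real results files.
run_to_filename = {
    "run_id_0": "/full/path/to/results.csv/for/run/0/results.csv",
    "run_id_1": "/full/path/to/results.csv/for/run/1/results.csv",
}


def write_run_map(run_map, directory):
    """Write the run ID -> results path map to a JSON file and return its path."""
    path = os.path.join(directory, "run_to_filename.json")  # hypothetical file name
    with open(path, "w") as json_file:
        json.dump(run_map, json_file, indent=4)
    return path


def load_run_map(path):
    """Load the map and check that it lists at least 2 runs."""
    with open(path) as json_file:
        run_map = json.load(json_file)
    if len(run_map) < 2:
        raise ValueError("at least 2 runs are needed to compute a mean")
    return run_map


with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = write_run_map(run_to_filename, tmp_dir)
    loaded = load_run_map(json_path)

print(sorted(loaded))
```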
plot_single_roc_distribution(run_to_filename_json, tag_to_plot, linestyle, color, include_range, std_alpha, range_alpha) (function) - Computes the mean and standard deviation of the TPR over a range of FPRs (the ROC curve) across several sets of results (at least 2 runs) for a given tag. The run_to_filename_json file must have the following format:
{
"run_id_0": "/full/path/to/results.csv/for/run/0/results.csv",
"run_id_1": "/full/path/to/results.csv/for/run/1/results.csv",
...
}
- run_to_filename_json (arg) - A JSON file that contains a key-value map linking run IDs to the full path to a results file (including the file name)
- tag_to_plot (arg) - The tag from the results to plot (default: "malware")
- linestyle (arg) - The linestyle to use in the plot (defaults to the tag value in plot.style_dict)
- color (arg) - The color to use in the plot (defaults to the tag value in plot.style_dict)
- include_range (arg) - Whether to plot the min/max values as well (default: False)
- std_alpha (arg) - The alpha value of the shading for the standard deviation range (default: 0.2)
- range_alpha (arg) - The alpha value of the shading for the min/max range, if plotted (default: 0.1)
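
The statistics behind such a plot can be sketched without any plotting code: each run's ROC curve is resampled onto a shared FPR grid, then the mean, standard deviation, minimum and maximum of the TPR are taken at each grid point. This is an illustrative sketch, not the module's implementation; the helper names are hypothetical, and both linear interpolation and the population standard deviation are assumptions.

```python
import statistics


def interpolate_tpr(fpr_points, tpr_points, fpr_grid):
    """Linearly interpolate one ROC curve (fpr_points ascending) onto fpr_grid."""
    result = []
    for x in fpr_grid:
        if x <= fpr_points[0]:
            result.append(tpr_points[0])
        elif x >= fpr_points[-1]:
            result.append(tpr_points[-1])
        else:
            # First segment whose right end reaches x; its left end is strictly below x.
            for i in range(1, len(fpr_points)):
                if x <= fpr_points[i]:
                    x0, x1 = fpr_points[i - 1], fpr_points[i]
                    y0, y1 = tpr_points[i - 1], tpr_points[i]
                    result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
                    break
    return result


def roc_distribution(runs, fpr_grid):
    """Per-grid-point mean, std, min and max of the TPR over at least 2 runs."""
    if len(runs) < 2:
        raise ValueError("at least 2 runs are needed")
    curves = [interpolate_tpr(fpr, tpr, fpr_grid) for fpr, tpr in runs]
    columns = list(zip(*curves))  # one tuple of TPR values per grid point
    return {
        "mean": [statistics.mean(col) for col in columns],
        "std": [statistics.pstdev(col) for col in columns],  # population std (assumed)
        "min": [min(col) for col in columns],
        "max": [max(col) for col in columns],
    }


# Toy ROC curves (made up) for two runs, given as (fpr_points, tpr_points).
runs = [
    ([0.0, 0.5, 1.0], [0.0, 0.8, 1.0]),
    ([0.0, 0.5, 1.0], [0.0, 0.6, 1.0]),
]
dist = roc_distribution(runs, fpr_grid=[0.0, 0.25, 0.5, 1.0])
print(dist["mean"], dist["std"])
```

In a plot, the mean would be the drawn line, the band of mean ± std would be shaded with `std_alpha`, and the min/max band with `range_alpha` when `include_range` is set.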
compute_all_run_results(results_file, use_malicious_labels, use_tag_labels, zero_division) (function, baker command) - Takes a result file from a feedforward neural network model, produces results plots, and computes per-tag scores and mean per-sample scores.
- results_file (arg) - Path to results.csv containing the output of a model run
- use_malicious_labels (arg) - Whether or not (1/0) to compute malware/benignware label scores (default: 1)
- use_tag_labels (arg) - Whether or not (1/0) to compute the tag label scores (default: 1)
- zero_division (arg) - Sets the value to return when there is a zero division. If set to "warn", this acts as 0, but warnings are also raised (default: 1.0)
plot_all_roc_distributions(run_to_filename_json, use_malicious_labels, use_tag_labels) (function, baker command) - Plot ROC distributions for all tags.
- run_to_filename_json (arg) - A JSON file that contains a key-value map linking run IDs to the full path to a results file (including the file name)
- use_malicious_labels (arg) - Whether or not (1/0) to compute malware/benignware label scores (default: 1)
- use_tag_labels (arg) - Whether or not (1/0) to compute the tag label scores (default: 1)
__main__ (main) - Starts baker so that the script can be run from the command line, with function names and parameters serving as the command line interface (optparse-style options).