plot_utils.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki

Imported Modules

import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
import pandas as pd - pandas is a flexible and easy to use open source data analysis and manipulation tool - pandas documentation
from matplotlib import pyplot as plt - state-based interface to matplotlib, provides a MATLAB-like way of plotting - matplotlib.pyplot documentation
from sklearn.metrics import accuracy_score - used to compute the Accuracy classification score - sklearn.metrics.accuracy_score documentation
from sklearn.metrics import f1_score - used to compute the f1 score - sklearn.metrics.f1_score documentation
from sklearn.metrics import precision_score - used to compute the Precision score - sklearn.metrics.precision_score documentation
from sklearn.metrics import recall_score - used to compute the Recall score - sklearn.metrics.recall_score documentation
from sklearn.metrics import roc_auc_score - used to compute the ROC AUC from prediction scores - sklearn.metrics.roc_auc_score documentation
from sklearn.metrics import roc_curve - used to compute the Receiver operating characteristic (ROC) curve - sklearn.metrics.roc_curve documentation

Classes and functions

collect_dataframes(run_id_to_filename_dictionary) (function) - Load dataframes given a run ID - filename dict.

run_id_to_filename_dictionary (arg) - Run ID - filename dictionary
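
As a rough illustration of the loading step, the sketch below reads one dataframe per run ID with pandas. It assumes each filename points to a CSV results file; the name collect_dataframes_sketch and the in-memory buffers are hypothetical and stand in for the real function and files.

```python
import io

import pandas as pd


def collect_dataframes_sketch(run_id_to_filename_dictionary):
    """Return a run ID -> dataframe dictionary (assumes CSV inputs)."""
    return {run_id: pd.read_csv(source)
            for run_id, source in run_id_to_filename_dictionary.items()}


# In-memory buffers replace real results files for this example.
csv_text = "pred_malware,label_malware\n0.9,1\n0.2,0\n"
frames = collect_dataframes_sketch({
    "run_0": io.StringIO(csv_text),
    "run_1": io.StringIO(csv_text),
})
```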

get_binary_predictions(dataframe, key, target_fprs) (function) - Get binary predictions for a dataframe/key combination at specific False Positive Rates of interest.

dataframe (arg) - A pandas dataframe
key (arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
target_fprs (arg) - The FPRs at which to compute the binary predictions (1-d numpy array)
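
One possible way to obtain such predictions is to interpolate the ROC thresholds at the target FPRs and compare the scores against them. The sketch below assumes that approach and a return shape of one row per target FPR; neither is confirmed by this page, and binary_predictions_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve


def binary_predictions_sketch(dataframe, key, target_fprs):
    """Binarize pred_<key> once per target FPR (illustrative only)."""
    labels = dataframe["label_{}".format(key)]
    scores = dataframe["pred_{}".format(key)]
    fpr, _, thresholds = roc_curve(labels, scores)
    # Skip the first ROC point: its threshold is a sentinel above every score.
    cutoffs = np.interp(target_fprs, fpr[1:], thresholds[1:])
    # Row i holds the 0/1 predictions for the cutoff matching target_fprs[i].
    return (scores.to_numpy()[None, :] >= cutoffs[:, None]).astype(int)


df = pd.DataFrame({"label_malware": [0, 0, 1, 1],
                   "pred_malware": [0.1, 0.2, 0.8, 0.9]})
binary = binary_predictions_sketch(df, "malware", np.array([0.1, 0.4]))
```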

get_all_predictions(result_dataframe, keys, target_fprs) (function) - Get labels and binarized predictions (for all keys) for a dataframe at specific False Positive Rates of interest.

result_dataframe (arg) - A pandas dataframe
keys (arg) - Keys (list) to extract results for
target_fprs (arg) - The FPRs at which you wish to estimate the TPRs; None (uses the default np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])) or a 1-d numpy array

get_tprs_at_fpr(result_dataframe, key, target_fprs) (function) - Estimate the True Positive Rate for a dataframe/key combination at specific False Positive Rates of interest.

result_dataframe (arg) - A pandas dataframe
key (arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
target_fprs (arg) - The FPRs at which you wish to estimate the TPRs; None (uses the default np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])) or a 1-d numpy array
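
The estimate can be sketched as a linear interpolation of the ROC curve at the requested FPRs. This is a minimal version under that assumption, using the default FPR array quoted above; tprs_at_fpr_sketch is a hypothetical name, not the module's code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve

DEFAULT_FPRS = np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])


def tprs_at_fpr_sketch(result_dataframe, key, target_fprs=None):
    """Interpolate the TPR of the pred_<key> scores at each target FPR."""
    if target_fprs is None:
        target_fprs = DEFAULT_FPRS
    fpr, tpr, _ = roc_curve(result_dataframe["label_{}".format(key)],
                            result_dataframe["pred_{}".format(key)])
    return np.interp(target_fprs, fpr, tpr)


# Perfectly separable toy scores: the TPR is 1 at every positive FPR.
df = pd.DataFrame({"label_malware": [0, 0, 1, 1],
                   "pred_malware": [0.1, 0.2, 0.8, 0.9]})
tprs = tprs_at_fpr_sketch(df, "malware")
```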

get_score_per_fpr(score_function, result_dataframe, key, target_fprs, zero_division) (function) - Estimate the Score for a dataframe/key combination using a provided score function at specific False Positive Rates of interest.

score_function (arg) - Score function to use
result_dataframe (arg) - A pandas dataframe
key (arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
target_fprs (arg) - The FPRs at which you wish to estimate the score; None (uses the default np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])) or a 1-d numpy array
zero_division (arg) - Sets the value to return when there is a zero division. If set to “warn”, this acts as 0, but warnings are also raised (default: 1.0)
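
To show what zero_division changes, the simplified sketch below scores predictions binarized at explicit thresholds instead of at target FPRs. It assumes the score function accepts a zero_division keyword, as precision_score, recall_score and f1_score do; score_per_threshold_sketch and its signature are hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_score


def score_per_threshold_sketch(score_function, labels, scores, thresholds,
                               zero_division=1.0):
    """Apply score_function to the predictions binarized at each threshold."""
    results = []
    for threshold in thresholds:
        predictions = (scores >= threshold).astype(int)
        # zero_division is forwarded, so score_function must accept it.
        results.append(score_function(labels, predictions,
                                      zero_division=zero_division))
    return np.array(results)


labels = np.array([0, 1, 1, 1])
scores = np.array([0.6, 0.4, 0.8, 0.9])
thresholds = np.array([0.5, 0.95])  # nothing is predicted positive at 0.95

as_one = score_per_threshold_sketch(precision_score, labels, scores, thresholds)
as_zero = score_per_threshold_sketch(precision_score, labels, scores,
                                     thresholds, zero_division=0)
```

At the second threshold no sample is predicted positive, so precision is undefined and the returned value is whatever zero_division specifies.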

get_roc_curve(result_dataframe, key) (function) - Get the ROC curve for a single result in a dataframe.

result_dataframe (arg) - Result dataframe for a certain run
key (arg) - The name of the result to get the curve for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
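
Given the column naming convention above, a minimal sketch of this lookup could select the two columns and pass them to sklearn's roc_curve. The wrapper name roc_curve_sketch is hypothetical, and the real function may differ.

```python
import pandas as pd
from sklearn.metrics import roc_curve


def roc_curve_sketch(result_dataframe, key):
    """Return (fpr, tpr, thresholds) for the label_<key>/pred_<key> columns."""
    labels = result_dataframe["label_{}".format(key)]
    predictions = result_dataframe["pred_{}".format(key)]
    return roc_curve(labels, predictions)


df = pd.DataFrame({"label_malware": [0, 0, 1, 1],
                   "pred_malware": [0.1, 0.2, 0.8, 0.9]})
fpr, tpr, thresholds = roc_curve_sketch(df, "malware")
```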

get_auc_score(result_dataframe, key) (function) - Get the Area Under the Curve for the indicated key in the dataframe.

result_dataframe (arg) - Result dataframe for a certain run
key (arg) - The name of the result to compute the AUC for; if (e.g.) the key 'malware' is provided, the dataframe is expected to have columns named pred_malware and label_malware
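
A corresponding sketch for the AUC, assuming the value is computed with sklearn's roc_auc_score on the same pair of columns (auc_score_sketch is a hypothetical name):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score


def auc_score_sketch(result_dataframe, key):
    """Area under the ROC curve for the label_<key>/pred_<key> columns."""
    return roc_auc_score(result_dataframe["label_{}".format(key)],
                         result_dataframe["pred_{}".format(key)])


# Three of the four (malware, benign) score pairs are ranked correctly.
df = pd.DataFrame({"label_malware": [0, 0, 1, 1],
                   "pred_malware": [0.1, 0.4, 0.35, 0.8]})
auc = auc_score_sketch(df, "malware")
```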

interpolate_rocs(id_to_roc_dictionary, eval_fpr_points) (function) - This function takes several sets of ROC results and interpolates them to a common set of evaluation (FPR) values to allow for computing e.g. a mean ROC or pointwise variance of the curve across multiple model fittings.

id_to_roc_dictionary (arg) - Run ID - ROC curve dictionary, with each curve as returned by get_roc_curve
eval_fpr_points (arg) - The set of FPR values at which to interpolate the results; defaults to np.logspace(-6, 0, 1000)
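
The interpolation onto a common FPR grid can be sketched with np.interp. The example assumes each dictionary value is an (fpr, tpr, thresholds) tuple and that the function returns both the interpolated curves and the grid; both are assumptions, and interpolate_rocs_sketch is a hypothetical name.

```python
import numpy as np


def interpolate_rocs_sketch(id_to_roc_dictionary, eval_fpr_points=None):
    """Resample every ROC curve at the same FPR values."""
    if eval_fpr_points is None:
        eval_fpr_points = np.logspace(-6, 0, 1000)
    interpolated = {}
    for run_id, (fpr, tpr, _thresholds) in id_to_roc_dictionary.items():
        interpolated[run_id] = np.interp(eval_fpr_points, fpr, tpr)
    return interpolated, eval_fpr_points


# Two toy curves sampled at different FPR values.
rocs = {
    "run_0": (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.8, 1.0]), None),
    "run_1": (np.array([0.0, 0.25, 1.0]), np.array([0.0, 0.6, 1.0]), None),
}
curves, grid = interpolate_rocs_sketch(rocs, np.array([0.25, 0.5]))

# With a shared grid, pointwise statistics across runs are well defined.
mean_tpr = np.mean(list(curves.values()), axis=0)
```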

compute_scores(results_file, key, zero_division) (function) - Estimate some Score values (tpr at fpr, accuracy, recall, precision, f1 score) for a dataframe/key combination at specific False Positive Rates of interest.

results_file (arg) - Complete path to a results.csv file that contains the output of a model run.
key (arg) - The key from the results to consider; defaults to "malware"
zero_division (arg) - Sets the value to return when there is a zero division. If set to “warn”, this acts as 0, but warnings are also raised (default: 1.0)

plot_roc_with_confidence(id_to_dataframe_dictionary, key, filename, include_range, style, std_alpha, range_alpha) (function) - Compute the mean and standard deviation of the ROC curve from a sequence of results and plot it with shading.

id_to_dataframe_dictionary (arg) - Run ID - result dataframe dictionary
key (arg) - The name of the result to get the curve for
filename (arg) - The filename to save the resulting figure to
include_range (arg) - Plot the min/max value as well
style (arg) - Style (color, linestyle) to use in the plot (default: False)
std_alpha (arg) - The alpha value for the shading for standard deviation range (default: .2)
range_alpha (arg) - The alpha value for the shading for range, if plotted (default: .1)
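
The shading described above can be sketched with matplotlib's fill_between. This example starts from TPR curves already interpolated onto a common FPR grid. Its function name, its signature, the logarithmic x-axis and the fixed black line style are all assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to file without a display
import numpy as np
from matplotlib import pyplot as plt


def plot_mean_roc_sketch(fpr_points, tpr_matrix, filename,
                         include_range=False, std_alpha=.2, range_alpha=.1):
    """Plot the mean ROC with a +/- one standard deviation band."""
    mean = tpr_matrix.mean(axis=0)
    std = tpr_matrix.std(axis=0)
    fig, ax = plt.subplots()
    ax.semilogx(fpr_points, mean, color="k", linestyle="-")
    ax.fill_between(fpr_points, mean - std, mean + std,
                    color="k", alpha=std_alpha)
    if include_range:
        # Lighter band covering the min/max across runs.
        ax.fill_between(fpr_points, tpr_matrix.min(axis=0),
                        tpr_matrix.max(axis=0), color="k", alpha=range_alpha)
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    fig.savefig(filename)
    plt.close(fig)
    return mean, std


# Two synthetic curves (one row per run) on a shared FPR grid.
fpr_points = np.logspace(-3, 0, 50)
tpr_matrix = np.vstack([fpr_points ** 0.3, fpr_points ** 0.5])
out_path = os.path.join(tempfile.mkdtemp(), "roc.png")
mean, std = plot_mean_roc_sketch(fpr_points, tpr_matrix, out_path,
                                 include_range=True)
```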

Repository file structure

root/
|
├── src/
|   |
|   ├── FreshDatasetBuilder/
|   |   |
|   |   ├── emberFeatures/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── features.py  - - - - - - - - - - - - - - - (features python code 📖Wiki)
|   |   |   └── vectorize_features.py  - - - - - - - - - - (vectorize features python code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── fresh_dataset_utils.py - - - - - - - - - - (fresh dataset utils python code 📖Wiki)
|   |   |   └── malware_bazaar_api.py  - - - - - - - - - - (malware bazaar API python code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   └── build_fresh_dataset.py - - - - - - - - - - (fresh dataset builder python code 📖Wiki)
|   |
|   ├── Model/
|   |   |
|   |   ├── nets/
|   |   |   |
|   |   |   ├── generators/
|   |   |   |   |
|   |   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   |   ├── dataset.py - - - - - - - - - - - - - - - - (dataset (base) code 📖Wiki)
|   |   |   |   ├── dataset_alt.py - - - - - - - - - - - - - - (dataset_alt code 📖Wiki)
|   |   |   |   ├── fresh_dataset.py - - - - - - - - - - - - - (fresh_dataset code 📖Wiki)
|   |   |   |   ├── fresh_generators.py  - - - - - - - - - - - (fresh_generators code 📖Wiki)
|   |   |   |   ├── generators.py  - - - - - - - - - - - - - - (generators (base) code 📖Wiki)
|   |   |   |   ├── generators_alt1.py - - - - - - - - - - - - (generators_alt1 code 📖Wiki)
|   |   |   |   ├── generators_alt2.py - - - - - - - - - - - - (generators_alt2 code 📖Wiki)
|   |   |   |   └── generators_alt3.py - - - - - - - - - - - - (generators_alt3 code 📖Wiki)
|   |   |   |
|   |   |   ├── utils/
|   |   |   |   |
|   |   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   |   └── Net.py - - - - - - - - - - - - - - - - - - (Net code 📖Wiki)
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── ALOHA_net.py - - - - - - - - - - - - - - - (ALOHA_net code 📖Wiki)
|   |   |   ├── Contrastive_Model_net.py - - - - - - - - - (Contrastive_Model_net code 📖Wiki)
|   |   |   ├── Family_Classifier_net.py - - - - - - - - - (Family_Classifier_net code 📖Wiki)
|   |   |   ├── MTJE_net.py  - - - - - - - - - - - - - - - (MTJE_net code 📖Wiki)
|   |   |   ├── MTJE_net_cosine.py - - - - - - - - - - - - (MTJE_net_cosine code 📖Wiki)
|   |   |   └── MTJE_net_pairwise_distance.py  - - - - - - (MTJE_net_pairwise_distance code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── contrastive_utils.py - - - - - - - - - - - (contrastive_utils code 📖Wiki)
|   |   |   ├── opt_utils.py - - - - - - - - - - - - - - - (opt_utils code 📖Wiki)
|   |   |   ├── plot_utils.py  - - - - - - - - - - - - - - (plot_utils code 📖Wiki)
|   |   |   └── ranking_metrics.py - - - - - - - - - - - - (ranking_metrics code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   ├── evaluate.py  - - - - - - - - - - - - - - - (evaluate code 📖Wiki)
|   |   ├── evaluate_contrastive.py  - - - - - - - - - (evaluate_contrastive code 📖Wiki)
|   |   ├── evaluate_family_classifier.py  - - - - - - (evaluate_family_classifier code 📖Wiki)
|   |   ├── evaluate_fresh.py  - - - - - - - - - - - - (evaluate_fresh code 📖Wiki)
|   |   ├── gen3_speed_evaluation.py - - - - - - - - - (gen3_speed_evaluation code 📖Wiki)
|   |   ├── plot.py  - - - - - - - - - - - - - - - - - (plot code 📖Wiki)
|   |   ├── plot_contrastive.py  - - - - - - - - - - - (plot_contrastive code 📖Wiki)
|   |   ├── plot_family_classifier.py  - - - - - - - - (plot_family_classifier code 📖Wiki)
|   |   ├── plot_fresh.py  - - - - - - - - - - - - - - (plot_fresh code 📖Wiki)
|   |   ├── train.py - - - - - - - - - - - - - - - - - (train code 📖Wiki)
|   |   ├── train_contrastive.py - - - - - - - - - - - (train_contrastive code 📖Wiki)
|   |   └── train_family_classifier.py - - - - - - - - (train_family_classifier code 📖Wiki)
|   |
|   ├── Sorel20mDataset/
|   |   |
|   |   ├── generators/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── sorel_dataset.py - - - - - - - - - - - - - (sorel_dataset code 📖Wiki)
|   |   |   └── sorel_generators.py  - - - - - - - - - - - (sorel_generators code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── download_utils.py  - - - - - - - - - - - - (download_utils code 📖Wiki)
|   |   |   └── preproc_utils.py - - - - - - - - - - - - - (preproc_utils code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   ├── preprocess_dataset.py  - - - - - - - - - - (preprocess_dataset code 📖Wiki)
|   |   ├── preprocess_ds_multi.py - - - - - - - - - - (preprocess_ds_multi code 📖Wiki)
|   |   └── sorel20mDownloader.py  - - - - - - - - - - (sorel20mDownloader code 📖Wiki)
|   |
|   ├── utils/
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   └── workflow_utils.py  - - - - - - - - - - - - (workflow_utils code 📖Wiki)
|   |
|   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   ├── config.ini - - - - - - - - - - - - - - - - (configuration file 📖Wiki)
|   └── main.py  - - - - - - - - - - - - - - - - - (main code 📖Wiki)
|
├── MLproject  - - - - - - - - - - - - - - - - (MLproject file)
├── README.md  - - - - - - - - - - - - - - - - (README)
└── conda.yaml - - - - - - - - - - - - - - - - (conda yaml environment)