plot_contrastive.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki


Imported Modules

import json - json encoder and decoder - json documentation
import os - provides a portable way of using operating system dependent functionality - os documentation
import tempfile - used to create temporary files and directories - tempfile documentation

import baker - easy, powerful access to Python functions from the command line - baker documentation
import matplotlib - comprehensive library for creating static, animated, and interactive visualizations in Python - matplotlib documentation
import mlflow - open source platform for managing the end-to-end machine learning lifecycle - mlflow documentation
import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
import pandas as pd - pandas is a flexible and easy to use open source data analysis and manipulation tool - pandas documentation
from matplotlib import pyplot as plt - state-based interface to matplotlib, provides a MATLAB-like way of plotting - matplotlib.pyplot documentation
from sklearn.metrics import accuracy_score - used to compute the accuracy classification score - sklearn.metrics.accuracy_score documentation
from sklearn.metrics import confusion_matrix - used to compute the confusion matrix to evaluate the accuracy of a classification - sklearn.metrics.confusion_matrix documentation
from sklearn.metrics import f1_score - used to compute the F1 score - sklearn.metrics.f1_score documentation
from sklearn.metrics import jaccard_score - used to compute the Jaccard similarity coefficient score - sklearn.metrics.jaccard_score documentation
from sklearn.metrics import precision_score - used to compute the precision score - sklearn.metrics.precision_score documentation
from sklearn.metrics import recall_score - used to compute the recall score - sklearn.metrics.recall_score documentation

from nets.generators.fresh_generators import get_generator
from utils.plot_utils import collect_dataframes

Classes and functions

plot_score_trend(values_dict, filename, key, style, std_alpha) (function) - Plot a score trend given a dict of values as input.

values_dict (arg) - Dict containing the values of the score to plot
filename (arg) - Path where to save the plot
key (arg) - Name of the score
style (arg) - Style to use in the plot
std_alpha (arg) - Alpha (opacity) value of the standard deviation band (default: .2)
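To illustrate the kind of plot described above, here is a minimal sketch. It is not the repository's implementation. It assumes that `values_dict` maps each k to the list of score values collected from the different runs (an assumption, since the actual layout is not documented on this page). It draws the mean with the given `style` and shades a one-standard-deviation band whose opacity is `std_alpha`. The function name `plot_score_trend_sketch` and the toy values are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import numpy as np
from matplotlib import pyplot as plt


def plot_score_trend_sketch(values_dict, filename, key, style="-o", std_alpha=.2):
    """Plot mean +/- std of a score versus k.

    Assumption (hypothetical layout): values_dict maps each k to the list of
    score values obtained by the different runs for that k.
    """
    ks = sorted(values_dict)
    values = [np.asarray(values_dict[k], dtype=float) for k in ks]
    means = np.array([v.mean() for v in values])
    stds = np.array([v.std() for v in values])

    fig, ax = plt.subplots()
    ax.plot(ks, means, style, label=key)
    # shaded band of one standard deviation around the mean; std_alpha sets its opacity
    ax.fill_between(ks, means - stds, means + stds, alpha=std_alpha)
    ax.set_xlabel("k (nearest neighbours)")
    ax.set_ylabel(key)
    ax.legend(loc="best")
    fig.savefig(filename)
    plt.close(fig)
    return means, stds


# toy values, for illustration only
out = os.path.join(tempfile.mkdtemp(), "f1_macro_trend.png")
means, stds = plot_score_trend_sketch({1: [0.80, 0.84], 3: [0.82, 0.86], 5: [0.83, 0.85]},
                                      out, "f1_macro")
```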

get_fresh_dataset_info(ds_path) (function) - Get some fresh_dataset specific variables.

ds_path (arg) - Fresh dataset root directory (where to find .dat files)

compute_scores(id_to_dataframe_dict, dest_file, k, zero_division) (function) - Estimate micro, macro and weighted averaged score values (Jaccard similarity, recall, precision, F1 score) and the accuracy for a dataframe/key combination.

id_to_dataframe_dict (arg) - Run ID - result dataframe dictionary
dest_file (arg) - The filename to save the resulting scores to
k (arg) - Number of nearest neighbours used with the k-NN algorithm
zero_division (arg) - Sets the value to return when there is a zero division. If set to “warn”, this acts as 0, but warnings are also raised (default: 1.0)
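The following minimal sketch shows how such scores can be obtained with the sklearn functions the script imports. It assumes that the ground-truth and predicted family labels for a given k have already been extracted from the results dataframe. The helper name `compute_scores_sketch` and the score key names are hypothetical, and writing the scores to `dest_file` is left out.

```python
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             precision_score, recall_score)


def compute_scores_sketch(y_true, y_pred, zero_division=1.0):
    """Accuracy plus micro/macro/weighted averaged scores for one set of predictions."""
    scores = {"accuracy": accuracy_score(y_true, y_pred)}
    for average in ("micro", "macro", "weighted"):
        # each metric is computed once per averaging strategy
        scores[f"jaccard_{average}"] = jaccard_score(y_true, y_pred, average=average,
                                                     zero_division=zero_division)
        scores[f"recall_{average}"] = recall_score(y_true, y_pred, average=average,
                                                   zero_division=zero_division)
        scores[f"precision_{average}"] = precision_score(y_true, y_pred, average=average,
                                                         zero_division=zero_division)
        scores[f"f1_{average}"] = f1_score(y_true, y_pred, average=average,
                                           zero_division=zero_division)
    return scores


# toy family labels, for illustration only
scores = compute_scores_sketch(["a", "a", "b", "c"], ["a", "b", "b", "c"])
```

For single-label multi-class data, micro-averaged recall, precision and F1 all equal the accuracy. The macro average weights every family equally, and the weighted average weights each family by its support.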

plot_confusion_matrix(conf_mtx, filename, families) (function) - Plot and save to file a figure containing the confusion matrix passed as input.

conf_mtx (arg) - NumPy ndarray containing the confusion matrix to plot
filename (arg) - Path where to save the generated confusion matrix plot
families (arg) - List of families of interest
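The sketch below shows one possible way to draw such a figure with matplotlib. The function name, colour map and figure size are assumptions and may not match the script's actual choices. It follows the sklearn convention that rows of the confusion matrix are true families and columns are predicted families. The family names in the usage lines are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import numpy as np
from matplotlib import pyplot as plt


def plot_confusion_matrix_sketch(conf_mtx, filename, families):
    """Save a heatmap of conf_mtx (rows: true family, columns: predicted family)."""
    fig, ax = plt.subplots(figsize=(8, 8))
    im = ax.imshow(conf_mtx, cmap="Blues")
    fig.colorbar(im, ax=ax)
    ax.set_xticks(np.arange(len(families)))
    ax.set_yticks(np.arange(len(families)))
    ax.set_xticklabels(families, rotation=90)
    ax.set_yticklabels(families)
    ax.set_xlabel("Predicted family")
    ax.set_ylabel("True family")
    # write the sample count inside each cell
    for i in range(conf_mtx.shape[0]):
        for j in range(conf_mtx.shape[1]):
            ax.text(j, i, str(conf_mtx[i, j]), ha="center", va="center")
    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)


out = os.path.join(tempfile.mkdtemp(), "confusion_matrix.png")
plot_confusion_matrix_sketch(np.array([[5, 1], [0, 4]]), out, ["family_a", "family_b"])
```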

create_confusion_matrixes(results_file, families, knn_k_min, knn_k_max) (function) - Create confusion matrices for the contrastive learning model using odd numbers of nearest neighbours (k) between knn_k_min and knn_k_max.

results_file (arg) - Complete path to a results.csv file that contains the output of a model run
families (arg) - List of families of interest
knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm
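The k-NN step behind these matrices can be sketched as a majority vote over the families of the k closest neighbours, repeated for every odd k in the range. This sketch is hypothetical. It assumes that the families of each sample's neighbours are already available as a list sorted by increasing distance, since the actual layout of results.csv is not described on this page. The helper names and toy labels are made up for illustration.

```python
from collections import Counter

import numpy as np
from sklearn.metrics import confusion_matrix


def knn_predict(neighbour_families, k):
    """Majority vote among the first k (closest) neighbours of each sample."""
    # Counter.most_common keeps first-encountered order among equal counts,
    # so ties go to the family that appears closest to the sample
    return [Counter(row[:k]).most_common(1)[0][0] for row in neighbour_families]


def confusion_matrices_for_odd_k(true_families, neighbour_families, families,
                                 knn_k_min=1, knn_k_max=11):
    """Return {k: confusion matrix} for every odd k in [knn_k_min, knn_k_max]."""
    matrices = {}
    start = knn_k_min if knn_k_min % 2 == 1 else knn_k_min + 1  # first odd k
    for k in range(start, knn_k_max + 1, 2):
        preds = knn_predict(neighbour_families, k)
        matrices[k] = confusion_matrix(true_families, preds, labels=families)
    return matrices


true_families = ["a", "b", "c"]
neighbour_families = [["a", "b", "a", "c", "a"],
                      ["b", "b", "a", "a", "a"],
                      ["c", "a", "a", "b", "b"]]
mats = confusion_matrices_for_odd_k(true_families, neighbour_families, ["a", "b", "c"],
                                    knn_k_min=1, knn_k_max=5)
```

Passing `labels=families` keeps every matrix the same size and in the same family order, even when some family is never predicted for a given k.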

compute_run_scores(results_file, knn_k_min, knn_k_max, zero_division) (function) - Compute multi-class classification scores.

results_file (arg) - Path to results.csv containing the output of a model run
knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm (default: 1)
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm (default: 11)
zero_division (arg) - Sets the value to return when there is a zero division (default: 1.0)
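The following example makes the effect of `zero_division` concrete. When a family is never predicted, its precision is 0/0, and `zero_division` decides which value sklearn uses in its place. The toy labels are illustrative only.

```python
from sklearn.metrics import precision_score

y_true = ["a", "a", "b", "c"]
y_pred = ["a", "a", "b", "b"]  # family "c" is never predicted

# per-family precision: the value for "c" is 0 / 0, replaced by zero_division
p_one = precision_score(y_true, y_pred, labels=["a", "b", "c"], average=None,
                        zero_division=1.0)
p_zero = precision_score(y_true, y_pred, labels=["a", "b", "c"], average=None,
                         zero_division=0.0)

# the choice also moves the macro average
macro_one = precision_score(y_true, y_pred, average="macro", zero_division=1.0)
macro_zero = precision_score(y_true, y_pred, average="macro", zero_division=0.0)
```

With the default of 1.0, families that are never predicted do not lower macro-averaged precision. With 0.0, they count as completely wrong.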

compute_contrastive_learning_results(results_file, fresh_ds_path, knn_k_min, knn_k_max, zero_division) (function, baker command) - Take a contrastive model result file and produce multi-class classification scores and confusion matrices.

results_file (arg) - Path to results.csv containing the output of a model run
fresh_ds_path (arg) - Fresh dataset root directory (where to find .dat files)
knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm (default: 1)
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm (default: 11)
zero_division (arg) - Sets the value to return when there is a zero division. If set to “warn”, this acts as 0, but warnings are also raised (default: 1.0)

plot_all_scores_trends(run_to_filename_json, knn_k_min, knn_k_max) (function, baker command) - Plot contrastive model classification score trends.

run_to_filename_json (arg) - A JSON file that contains a key-value map linking run IDs to the full path of a results file (including the file name), in the following format:

    {
      "run_id_0": "/full/path/to/results.csv/for/run/0/results.csv",
      "run_id_1": "/full/path/to/results.csv/for/run/1/results.csv",
      ...
    }

knn_k_min (arg) - Min number of nearest neighbours to use when applying the k-NN algorithm (default: 1)
knn_k_max (arg) - Max number of nearest neighbours to use when applying the k-NN algorithm (default: 11)
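Here is a minimal sketch of the input handling and aggregation this command implies. It reads the run-ID-to-results-file map with `json`, then combines the per-k scores of all runs into a mean and standard deviation per k, which is what a trend plot with a std band needs. The function names, the `{run_id: {k: score}}` layout of the per-run scores and the toy values are assumptions made for illustration.

```python
import json
import os
import tempfile

import numpy as np


def load_run_to_filename(run_to_filename_json):
    """Read the {run_id: path_to_results_csv} map from a JSON file."""
    with open(run_to_filename_json, "r") as f:
        return json.load(f)


def aggregate_over_runs(per_run_scores):
    """Turn {run_id: {k: score}} (hypothetical layout) into {k: (mean, std)} across runs."""
    ks = sorted({k for scores in per_run_scores.values() for k in scores})
    result = {}
    for k in ks:
        vals = np.array([scores[k] for scores in per_run_scores.values() if k in scores],
                        dtype=float)
        result[k] = (vals.mean(), vals.std())
    return result


run_map = {
    "run_id_0": "/full/path/to/results.csv/for/run/0/results.csv",
    "run_id_1": "/full/path/to/results.csv/for/run/1/results.csv",
}
json_path = os.path.join(tempfile.mkdtemp(), "run_to_filename.json")
with open(json_path, "w") as f:
    json.dump(run_map, f)
mapping = load_run_to_filename(json_path)

# toy per-run scores for k = 1 and k = 3, for illustration only
agg = aggregate_over_runs({"run_id_0": {1: 0.80, 3: 0.82}, "run_id_1": {1: 0.84, 3: 0.86}})
```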

__main__ (main) - Start baker so that the script can be run from the command line, using function names as commands and function parameters as optparse-style options.


Repository file structure

root/
|
├── src/
|   |
|   ├── FreshDatasetBuilder/
|   |   |
|   |   ├── emberFeatures/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── features.py  - - - - - - - - - - - - - - - (features python code 📖Wiki)
|   |   |   └── vectorize_features.py  - - - - - - - - - - (vectorize features python code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── fresh_dataset_utils.py - - - - - - - - - - (fresh dataset utils python code 📖Wiki)
|   |   |   └── malware_bazaar_api.py  - - - - - - - - - - (malware bazaar API python code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   └── build_fresh_dataset.py - - - - - - - - - - (fresh dataset builder python code 📖Wiki)
|   |
|   ├── Model/
|   |   |
|   |   ├── nets/
|   |   |   |
|   |   |   ├── generators/
|   |   |   |   |
|   |   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   |   ├── dataset.py - - - - - - - - - - - - - - - - (dataset (base) code 📖Wiki)
|   |   |   |   ├── dataset_alt.py - - - - - - - - - - - - - - (dataset_alt code 📖Wiki)
|   |   |   |   ├── fresh_dataset.py - - - - - - - - - - - - - (fresh_dataset code 📖Wiki)
|   |   |   |   ├── fresh_generators.py  - - - - - - - - - - - (fresh_generators code 📖Wiki)
|   |   |   |   ├── generators.py  - - - - - - - - - - - - - - (generators (base) code 📖Wiki)
|   |   |   |   ├── generators_alt1.py - - - - - - - - - - - - (generators_alt1 code 📖Wiki)
|   |   |   |   ├── generators_alt2.py - - - - - - - - - - - - (generators_alt2 code 📖Wiki)
|   |   |   |   └── generators_alt3.py - - - - - - - - - - - - (generators_alt3 code 📖Wiki)
|   |   |   |
|   |   |   ├── utils/
|   |   |   |   |
|   |   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   |   └── Net.py - - - - - - - - - - - - - - - - - - (Net code 📖Wiki)
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── ALOHA_net.py - - - - - - - - - - - - - - - (ALOHA_net code 📖Wiki)
|   |   |   ├── Contrastive_Model_net.py - - - - - - - - - (Contrastive_Model_net code 📖Wiki)
|   |   |   ├── Family_Classifier_net.py - - - - - - - - - (Family_Classifier_net code 📖Wiki)
|   |   |   ├── MTJE_net.py  - - - - - - - - - - - - - - - (MTJE_net code 📖Wiki)
|   |   |   ├── MTJE_net_cosine.py - - - - - - - - - - - - (MTJE_net_cosine code 📖Wiki)
|   |   |   └── MTJE_net_pairwise_distance.py  - - - - - - (MTJE_net_pairwise_distance code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── contrastive_utils.py - - - - - - - - - - - (contrastive_utils code 📖Wiki)
|   |   |   ├── opt_utils.py - - - - - - - - - - - - - - - (opt_utils code 📖Wiki)
|   |   |   ├── plot_utils.py  - - - - - - - - - - - - - - (plot_utils code 📖Wiki)
|   |   |   └── ranking_metrics.py - - - - - - - - - - - - (ranking_metrics code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   ├── evaluate.py  - - - - - - - - - - - - - - - (evaluate code 📖Wiki)
|   |   ├── evaluate_contrastive.py  - - - - - - - - - (evaluate_contrastive code 📖Wiki)
|   |   ├── evaluate_family_classifier.py  - - - - - - (evaluate_family_classifier code 📖Wiki)
|   |   ├── evaluate_fresh.py  - - - - - - - - - - - - (evaluate_fresh code 📖Wiki)
|   |   ├── gen3_speed_evaluation.py - - - - - - - - - (gen3_speed_evaluation code 📖Wiki)
|   |   ├── plot.py  - - - - - - - - - - - - - - - - - (plot code 📖Wiki)
|   |   ├── plot_contrastive.py  - - - - - - - - - - - (plot_contrastive code 📖Wiki)
|   |   ├── plot_family_classifier.py  - - - - - - - - (plot_family_classifier code 📖Wiki)
|   |   ├── plot_fresh.py  - - - - - - - - - - - - - - (plot_fresh code 📖Wiki)
|   |   ├── train.py - - - - - - - - - - - - - - - - - (train code 📖Wiki)
|   |   ├── train_contrastive.py - - - - - - - - - - - (train_contrastive code 📖Wiki)
|   |   └── train_family_classifier.py - - - - - - - - (train_family_classifier code 📖Wiki)
|   |
|   ├── Sorel20mDataset/
|   |   |
|   |   ├── generators/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── sorel_dataset.py - - - - - - - - - - - - - (sorel_dataset code 📖Wiki)
|   |   |   └── sorel_generators.py  - - - - - - - - - - - (sorel_generators code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── download_utils.py  - - - - - - - - - - - - (download_utils code 📖Wiki)
|   |   |   └── preproc_utils.py - - - - - - - - - - - - - (preproc_utils code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   ├── preprocess_dataset.py  - - - - - - - - - - (preprocess_dataset code 📖Wiki)
|   |   ├── preprocess_ds_multi.py - - - - - - - - - - (preprocess_ds_multi code 📖Wiki)
|   |   └── sorel20mDownloader.py  - - - - - - - - - - (sorel20mDownloader code 📖Wiki)
|   |
|   ├── utils/
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   └── workflow_utils.py  - - - - - - - - - - - - (workflow_utils code 📖Wiki)
|   |
|   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   ├── config.ini - - - - - - - - - - - - - - - - (configuration file 📖Wiki)
|   └── main.py  - - - - - - - - - - - - - - - - - (main code 📖Wiki)
|
├── MLproject  - - - - - - - - - - - - - - - - (MLproject file)
├── README.md  - - - - - - - - - - - - - - - - (README)
└── conda.yaml - - - - - - - - - - - - - - - - (conda yaml environment)