evaluate_fresh.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki

Imported Modules

import configparser - implements a basic configuration language for Python programs - configparser documentation
import importlib - provides the implementation of the import statement in Python source code - importlib documentation
import json - json encoder and decoder - json documentation
import os - provides a portable way of using operating system dependent functionality - os documentation
import sys - system-specific parameters and functions - sys documentation
import tempfile - used to create temporary files and directories - tempfile documentation
import time - provides various time-related functions - time documentation
from copy import deepcopy - creates a new object and recursively copies the original object elements - copy documentation

import baker - easy, powerful access to Python functions from the command line - baker documentation
import mlflow - open source platform for managing the end-to-end machine learning lifecycle - mlflow documentation
import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
import pandas as pd - pandas is a flexible and easy to use open source data analysis and manipulation tool - pandas documentation
import psutil - used for retrieving information on running processes and system utilization - psutil documentation
import torch - tensor library like NumPy, with strong GPU support - pytorch documentation
from logzero import logger - robust and effective logging for Python - logzero documentation

from nets.generators.fresh_dataset import Dataset
from nets.generators.fresh_generators import get_generator
from utils.ranking_metrics import mean_reciprocal_rank
from utils.ranking_metrics import mean_average_precision
from utils.ranking_metrics import max_reciprocal_rank_index
from utils.ranking_metrics import min_reciprocal_rank_index
from utils.ranking_metrics import max_average_precision_index
from utils.ranking_metrics import min_average_precision_index

Classes and functions

distance_to_similarity(distances, a, function) (function) - Calculate similarity scores from distances by using an inversion function.

distances (arg) - Tensor containing the distances calculated between two embeddings
a (arg) - Inversion multiplication factor (default: 1.0)
function (arg) - Inversion function to use. Possible values are: 'exp', 'inv' or 'inv_pow' (default: 'exp')
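
As an illustration of what the three inversion functions might look like, here is a minimal sketch. The exact formulas are assumptions (the page only names the options), and plain Python lists stand in for the tensor of distances so the example runs without PyTorch.

```python
import math


def distance_to_similarity(distances, a=1.0, function='exp'):
    """Map non-negative distances to similarity scores in (0, 1].

    NOTE: the formulas below are assumed for illustration; the page only
    names the three options, not how each one is computed.
    """
    if function == 'exp':
        # Exponential decay: exp(-a * d)
        return [math.exp(-a * d) for d in distances]
    if function == 'inv':
        # Inverse: 1 / (1 + a * d)
        return [1.0 / (1.0 + a * d) for d in distances]
    if function == 'inv_pow':
        # Inverse of the squared distance: 1 / (1 + a * d^2)
        return [1.0 / (1.0 + a * d ** 2) for d in distances]
    raise ValueError(f"Unknown inversion function: {function!r}")
```

With any of these choices a distance of 0 maps to a similarity of 1, and larger distances map to smaller similarities; 'a' controls how quickly the similarity falls off.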

import_modules(net_type) (function) - Dynamically import network, dataset and generator modules depending on the provided arguments.

net_type (arg) - Network type (possible values: mtje, mtje_cosine, mtje_pairwise_distance, aloha)
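
A dynamic import of this kind could be sketched as follows. The mapping from 'net_type' to module path is an assumption inferred from the file names under Model/nets/ in the repository tree below; the helper names `resolve_net_module` and `import_net_module` are hypothetical and do not come from the script.

```python
import importlib

# Assumed mapping, inferred from the file names under Model/nets/.
_NET_MODULES = {
    'mtje': 'nets.MTJE_net',
    'mtje_cosine': 'nets.MTJE_net_cosine',
    'mtje_pairwise_distance': 'nets.MTJE_net_pairwise_distance',
    'aloha': 'nets.ALOHA_net',
}


def resolve_net_module(net_type):
    """Return the (assumed) module path for a given network type."""
    try:
        return _NET_MODULES[net_type.lower()]
    except KeyError:
        raise ValueError(f"Unknown net_type: {net_type!r}") from None


def import_net_module(net_type, importer=importlib.import_module):
    """Import the module for 'net_type'.

    'importer' is injectable only so the sketch can be exercised
    without the repository on the Python path.
    """
    return importer(resolve_net_module(net_type))
```

Keeping the name lookup separate from the import makes an unsupported 'net_type' fail early with a clear error instead of an ImportError.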

detach_and_copy_array(array) (function) - Detach numpy array or pytorch tensor and return a deep copy of it.

array (arg) - Numpy array or pytorch tensor to copy
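
One way to write such a helper is sketched below. It is an assumption rather than the script's actual code: it uses duck typing (checking for a `detach` method) instead of importing torch, so that the example stays self-contained.

```python
from copy import deepcopy


def detach_and_copy_array(array):
    """Return a deep copy of 'array' that is independent of the original.

    Assumption: objects exposing .detach() are treated as PyTorch tensors
    and are detached from the autograd graph before being copied.
    """
    if hasattr(array, 'detach'):
        array = array.detach()
    # deepcopy recursively copies the elements, so later in-place changes
    # to the original do not affect the returned copy.
    return deepcopy(array)
```

The deep copy matters when the original buffer is reused between batches: without it, stored results could be silently overwritten.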

normalize_embeddings(results_dict) (function) - Take a set of results dicts and break them out into a single dict of 1d arrays with appropriate column names that pandas can convert to a DataFrame.

results_dict (arg) - Results (predicted labels) dictionary

get_samples(model, generator, n_families, n_samples_to_get, other) (function) - Get 'n_samples_to_get' samples per family from the provided 'generator', choosing only among the samples not in 'other'.

model (arg) - Model to evaluate
generator (arg) - Dataset generator (dataloader) containing the data to retrieve (fresh dataset)
n_families (arg) - Number of families contained in the fresh dataset
n_samples_to_get (arg) - Number of samples to get per family from the Generator
other (arg) - Dictionary containing samples not to provide as result (default: None)
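
The selection logic described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the model forward pass is left out, each batch is assumed to be a pair (sample ids, family labels), and the 'shas' and 'families' dictionary keys and the function name `pick_samples_per_family` are hypothetical.

```python
def pick_samples_per_family(batches, n_families, n_samples_to_get, other=None):
    """Collect up to 'n_samples_to_get' samples per family, skipping 'other'.

    'batches' is assumed to yield (ids, family_labels) pairs; the real
    generator and the model call are not reproduced here.
    """
    excluded = set(other['shas']) if other else set()
    picked = {'shas': [], 'families': []}
    counts = [0] * n_families

    for ids, families in batches:
        for sample_id, family in zip(ids, families):
            # Skip samples already used elsewhere or families already full.
            if sample_id in excluded or counts[family] >= n_samples_to_get:
                continue
            picked['shas'].append(sample_id)
            picked['families'].append(family)
            counts[family] += 1
        # Stop early once every family has enough samples.
        if all(c >= n_samples_to_get for c in counts):
            break
    return picked
```

Passing the result of a first call as 'other' to a second call yields two disjoint sets, which is how query and anchor samples can be kept separate.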

evaluate_fresh_scores(ds_path, checkpoint_path, net_type, n_query_samples, min_n_anchor_samples, max_n_anchor_samples, n_evaluations, batch_size) (function) - Evaluate model on the Malware Family Prediction task.

ds_path (arg) - Path of the directory where to find the fresh dataset (containing .dat files)
checkpoint_path (arg) - Path to the model checkpoint to load
net_type (arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje')
n_query_samples (arg) - Number of query samples to retrieve, per-family (default: 23)
min_n_anchor_samples (arg) - Minimum number of anchor samples to use, per-family (default: 1)
max_n_anchor_samples (arg) - Maximum number of anchor samples to use, per-family (default: 10)
n_evaluations (arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15)
batch_size (arg) - How many samples per batch to load (default: 1000)

compute_ranking_scores(ranking_scores, global_ranks_to_save, rank_per_query) (function) - Compute ranking scores (MRR and MAP) and a bunch of interesting ranks to save to file from a list of ranks.

ranking_scores (arg) - Ranking scores previously computed
global_ranks_to_save (arg) - Global interesting ranks to save to file
rank_per_query (arg) - List of ranks computed by the model evaluation procedure
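
For reference, the two ranking scores can be computed as in the sketch below. These are the standard textbook definitions of MRR and MAP, written here as illustrative re-implementations; they are not the code in utils/ranking_metrics.py, and the input format (one list of binary relevance flags per query, ordered by rank) is an assumption.

```python
def mean_reciprocal_rank(relevance_lists):
    """Average of 1/position of the first relevant item for each query."""
    total = 0.0
    for relevance in relevance_lists:
        for position, hit in enumerate(relevance, start=1):
            if hit:
                total += 1.0 / position
                break  # only the first relevant item counts
    return total / len(relevance_lists)


def average_precision(relevance):
    """Mean of precision@k taken at every position k holding a relevant item."""
    hits = 0
    precisions = []
    for position, hit in enumerate(relevance, start=1):
        if hit:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(relevance_lists):
    """Average of the per-query average precision."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)
```

MRR only rewards how early the first same-family sample appears in the ranking, while MAP also accounts for where all the other same-family samples fall.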

evaluate_fresh_rankings(ds_path, checkpoint_path, net_type, n_query_samples, n_evaluations, batch_size) (function) - Evaluate model on the Malware Family ranking task.

ds_path (arg) - Path of the directory where to find the fresh dataset (containing .dat files)
checkpoint_path (arg) - Path to the model checkpoint to load
net_type (arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje')
n_query_samples (arg) - Number of query samples per-family to consider (default: 23)
n_evaluations (arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15)
batch_size (arg) - How many samples per batch to load (default: 1000)

evaluate_fresh(fresh_ds_path, checkpoint_path, net_type, min_n_anchor_samples, max_n_anchor_samples, n_query_samples, n_evaluations, batch_size) (function) - Evaluate the model on both the family prediction task and on the family ranking task.

fresh_ds_path (arg) - Path of the directory where to find the fresh dataset (containing .dat files)
checkpoint_path (arg) - Path to the model checkpoint to load
net_type (arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje')
min_n_anchor_samples (arg) - Minimum number of anchor samples to use, per-family (default: 1)
max_n_anchor_samples (arg) - Maximum number of anchor samples to use, per-family (default: 10)
n_query_samples (arg) - Number of query samples per-family to consider (default: 23)
n_evaluations (arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15)
batch_size (arg) - How many samples per batch to load (default: 1000)

__main__ (main) - Start baker so that the script can be run from the command line, with function names and parameters exposed as optparse-style options.

Repository file structure

root/
|
├── src/
|   |
|   ├── FreshDatasetBuilder/
|   |   |
|   |   ├── emberFeatures/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── features.py  - - - - - - - - - - - - - - - (features python code 📖Wiki)
|   |   |   └── vectorize_features.py  - - - - - - - - - - (vectorize features python code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── fresh_dataset_utils.py - - - - - - - - - - (fresh dataset utils python code 📖Wiki)
|   |   |   └── malware_bazaar_api.py  - - - - - - - - - - (malware bazaar API python code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   └── build_fresh_dataset.py - - - - - - - - - - (fresh dataset builder python code 📖Wiki)
|   |
|   ├── Model/
|   |   |
|   |   ├── nets/
|   |   |   |
|   |   |   ├── generators/
|   |   |   |   |
|   |   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   |   ├── dataset.py - - - - - - - - - - - - - - - - (dataset (base) code 📖Wiki)
|   |   |   |   ├── dataset_alt.py - - - - - - - - - - - - - - (dataset_alt code 📖Wiki)
|   |   |   |   ├── fresh_dataset.py - - - - - - - - - - - - - (fresh_dataset code 📖Wiki)
|   |   |   |   ├── fresh_generators.py  - - - - - - - - - - - (fresh_generators code 📖Wiki)
|   |   |   |   ├── generators.py  - - - - - - - - - - - - - - (generators (base) code 📖Wiki)
|   |   |   |   ├── generators_alt1.py - - - - - - - - - - - - (generators_alt1 code 📖Wiki)
|   |   |   |   ├── generators_alt2.py - - - - - - - - - - - - (generators_alt2 code 📖Wiki)
|   |   |   |   └── generators_alt3.py - - - - - - - - - - - - (generators_alt3 code 📖Wiki)
|   |   |   |
|   |   |   ├── utils/
|   |   |   |   |
|   |   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   |   └── Net.py - - - - - - - - - - - - - - - - - - (Net code 📖Wiki)
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── ALOHA_net.py - - - - - - - - - - - - - - - (ALOHA_net code 📖Wiki)
|   |   |   ├── Contrastive_Model_net.py - - - - - - - - - (Contrastive_Model_net code 📖Wiki)
|   |   |   ├── Family_Classifier_net.py - - - - - - - - - (Family_Classifier_net code 📖Wiki)
|   |   |   ├── MTJE_net.py  - - - - - - - - - - - - - - - (MTJE_net code 📖Wiki)
|   |   |   ├── MTJE_net_cosine.py - - - - - - - - - - - - (MTJE_net_cosine code 📖Wiki)
|   |   |   └── MTJE_net_pairwise_distance.py  - - - - - - (MTJE_net_pairwise_distance code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── contrastive_utils.py - - - - - - - - - - - (contrastive_utils code 📖Wiki)
|   |   |   ├── opt_utils.py - - - - - - - - - - - - - - - (opt_utils code 📖Wiki)
|   |   |   ├── plot_utils.py  - - - - - - - - - - - - - - (plot_utils code 📖Wiki)
|   |   |   └── ranking_metrics.py - - - - - - - - - - - - (ranking_metrics code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   ├── evaluate.py  - - - - - - - - - - - - - - - (evaluate code 📖Wiki)
|   |   ├── evaluate_contrastive.py  - - - - - - - - - (evaluate_contrastive code 📖Wiki)
|   |   ├── evaluate_family_classifier.py  - - - - - - (evaluate_family_classifier code 📖Wiki)
|   |   ├── evaluate_fresh.py  - - - - - - - - - - - - (evaluate_fresh code 📖Wiki)
|   |   ├── gen3_speed_evaluation.py - - - - - - - - - (gen3_speed_evaluation code 📖Wiki)
|   |   ├── plot.py  - - - - - - - - - - - - - - - - - (plot code 📖Wiki)
|   |   ├── plot_contrastive.py  - - - - - - - - - - - (plot_contrastive code 📖Wiki)
|   |   ├── plot_family_classifier.py  - - - - - - - - (plot_family_classifier code 📖Wiki)
|   |   ├── plot_fresh.py  - - - - - - - - - - - - - - (plot_fresh code 📖Wiki)
|   |   ├── train.py - - - - - - - - - - - - - - - - - (train code 📖Wiki)
|   |   ├── train_contrastive.py - - - - - - - - - - - (train_contrastive code 📖Wiki)
|   |   └── train_family_classifier.py - - - - - - - - (train_family_classifier code 📖Wiki)
|   |
|   ├── Sorel20mDataset/
|   |   |
|   |   ├── generators/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── sorel_dataset.py - - - - - - - - - - - - - (sorel_dataset code 📖Wiki)
|   |   |   └── sorel_generators.py  - - - - - - - - - - - (sorel_generators code 📖Wiki)
|   |   |
|   |   ├── utils/
|   |   |   |
|   |   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   |   ├── download_utils.py  - - - - - - - - - - - - (download_utils code 📖Wiki)
|   |   |   └── preproc_utils.py - - - - - - - - - - - - - (preproc_utils code 📖Wiki)
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   ├── preprocess_dataset.py  - - - - - - - - - - (preprocess_dataset code 📖Wiki)
|   |   ├── preprocess_ds_multi.py - - - - - - - - - - (preprocess_ds_multi code 📖Wiki)
|   |   └── sorel20mDownloader.py  - - - - - - - - - - (sorel20mDownloader code 📖Wiki)
|   |
|   ├── utils/
|   |   |
|   |   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   |   └── workflow_utils.py  - - - - - - - - - - - - (workflow_utils code 📖Wiki)
|   |
|   ├── __init__.py  - - - - - - - - - - - - - - - (python module init)
|   ├── config.ini - - - - - - - - - - - - - - - - (configuration file 📖Wiki)
|   └── main.py  - - - - - - - - - - - - - - - - - (main code 📖Wiki)
|
├── MLproject  - - - - - - - - - - - - - - - - (MLproject file)
├── README.md  - - - - - - - - - - - - - - - - (README)
└── conda.yaml - - - - - - - - - - - - - - - - (conda yaml environment)