evaluate_fresh.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki
-
import configparser- implements a basic configuration language for Python programs - configparser documentation -
import importlib- provides the implementation of the import statement in Python source code - importlib documentation -
import json- json encoder and decoder - json documentation -
import os- provides a portable way of using operating system dependent functionality - os documentation -
import sys- system-specific parameters and functions - sys documentation -
import tempfile- used to create temporary files and directories - tempfile documentation -
import time- provides various time-related functions - time documentation -
from copy import deepcopy- creates a new object and recursively copies the original object elements - copy documentation
-
import baker- easy, powerful access to Python functions from the command line - baker documentation -
import mlflow- open source platform for managing the end-to-end machine learning lifecycle - mlflow documentation -
import numpy as np- the fundamental package for scientific computing with Python - numpy documentation -
import pandas as pd- pandas is a flexible and easy to use open source data analysis and manipulation tool - pandas documentation -
import psutil- used for retrieving information on running processes and system utilization - psutil documentation -
import torch- tensor library like NumPy, with strong GPU support - pytorch documentation -
from logzero import logger- robust and effective logging for Python - logzero documentation
from nets.generators.fresh_dataset import Dataset
from nets.generators.fresh_generators import get_generator
from utils.ranking_metrics import mean_reciprocal_rank
from utils.ranking_metrics import mean_average_precision
from utils.ranking_metrics import max_reciprocal_rank_index
from utils.ranking_metrics import min_reciprocal_rank_index
from utils.ranking_metrics import max_average_precision_index
from utils.ranking_metrics import min_average_precision_index
distance_to_similarity(distances, a, function) (function) - Calculate similarity scores from distances by using an inversion function.
-
distances(arg) - Tensor containing the distances calculated between two embeddings -
a(arg) - Inversion multiplication factor (default: 1.0) -
function(arg) - Inversion function to use. Possible values are: 'exp', 'inv' or 'inv_pow' (default: 'exp')
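As a rough illustration of what the three inversion options could look like, the sketch below maps non-negative distances to similarity scores in (0, 1]. The formulas are assumptions chosen for illustration (this page does not spell them out), the name `distance_to_similarity_sketch` is hypothetical, and a plain list of floats stands in for the tensor of distances.

```python
import math


def distance_to_similarity_sketch(distances, a=1.0, function="exp"):
    """Map non-negative distances to similarity scores in (0, 1].

    The three formulas are illustrative assumptions, not necessarily
    the ones used by the project.
    """
    if function == "exp":
        # exponential decay: exp(-a * d)
        return [math.exp(-a * d) for d in distances]
    if function == "inv":
        # simple inversion: 1 / (1 + a * d)
        return [1.0 / (1.0 + a * d) for d in distances]
    if function == "inv_pow":
        # inversion of the halved squared distance: 1 / (1 + (a * d)^2 / 2)
        return [1.0 / (1.0 + (a * d) ** 2 / 2.0) for d in distances]
    raise ValueError(f"unknown inversion function: {function!r}")


inv_scores = distance_to_similarity_sketch([0.0, 1.0, 2.0], function="inv")
exp_scores = distance_to_similarity_sketch([0.0, 1.0, 2.0], function="exp")
pow_scores = distance_to_similarity_sketch([0.0, 1.0, 2.0], function="inv_pow")
```

In all three variants a distance of zero gives a similarity of 1, and larger values of `a` make the similarity fall off faster as the distance grows.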
import_modules(net_type) (function) - Dynamically import network, dataset and generator modules depending on the provided arguments.
-
net_type(arg) - Network type (possible values: mtje, mtje_cosine, mtje_pairwise_distance, aloha)
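Dynamic imports of this kind are typically built on `importlib.import_module`. The following is a minimal sketch under stated assumptions: the helper name `import_by_net_type` and the registry are hypothetical, and standard-library modules stand in for the project's network modules so that the snippet runs anywhere.

```python
import importlib


def import_by_net_type(net_type, registry):
    """Import and return the module registered for `net_type`."""
    try:
        module_name = registry[net_type]
    except KeyError:
        raise ValueError(f"unknown net_type: {net_type!r}") from None
    return importlib.import_module(module_name)


# Hypothetical registry: stdlib modules replace the real network modules.
DEMO_REGISTRY = {"mtje": "json", "aloha": "math"}

module = import_by_net_type("mtje", DEMO_REGISTRY)
```

Resolving the module name through a dictionary keeps the set of accepted `net_type` values explicit and lets an unknown value fail early with a clear error.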
detach_and_copy_array(array) (function) - Detach numpy array or pytorch tensor and return a deep copy of it.
-
array(arg) - Numpy array or pytorch tensor to copy
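The idea can be sketched with duck typing, as below. This is an assumption-laden illustration rather than the project's code: the name `detach_and_copy_sketch` is hypothetical, objects that expose a `detach()` method (as PyTorch tensors do) are detached first, and a nested list stands in for an array so that the example needs no third-party packages.

```python
from copy import deepcopy


def detach_and_copy_sketch(array):
    """Return an independent deep copy of `array`.

    Objects exposing `detach()` are detached from the autograd graph
    first; anything else is deep-copied as is.
    """
    if hasattr(array, "detach"):
        array = array.detach()
    return deepcopy(array)


original = [[1.0, 2.0], [3.0, 4.0]]
copied = detach_and_copy_sketch(original)
copied[0][0] = -1.0  # mutating the copy leaves the original untouched
```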
normalize_embeddings(results_dict) (function) - Take a set of results dicts and break them out into a single dict of 1d arrays with appropriate column names that pandas can convert to a DataFrame.
-
results_dict(arg) - Results (predicted labels) dictionary
get_samples(model, generator, n_families, n_samples_to_get, other) (function) - Get 'n_samples_to_get' samples per family from the provided 'generator', excluding the samples already in 'other'.
-
model(arg) - Model to evaluate -
generator(arg) - Dataset generator (dataloader) containing the data to retrieve (fresh dataset) -
n_families(arg) - Number of families contained in the fresh dataset -
n_samples_to_get(arg) - Number of samples to get per family from the Generator -
other(arg) - Dictionary containing samples not to provide as result (default: None)
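The per-family selection with exclusion can be sketched as follows. This is only an illustration under several assumptions: the helper `pick_samples` is hypothetical, it ignores the `model` argument entirely, the generator is replaced by a list of `(sample_id, family)` pairs, and `other` is assumed to map each family to the sample ids to skip.

```python
import random


def pick_samples(samples, n_families, n_samples_to_get, other=None, seed=0):
    """Pick up to `n_samples_to_get` sample ids per family, skipping ids in `other`."""
    excluded = {family: set(ids) for family, ids in (other or {}).items()}
    picked = {family: [] for family in range(n_families)}
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    for sample_id, family in shuffled:
        if sample_id in excluded.get(family, set()):
            continue  # already used elsewhere (e.g. as an anchor)
        if len(picked[family]) < n_samples_to_get:
            picked[family].append(sample_id)
    return picked


# Toy dataset: ten samples, family is the parity of the sample id.
samples = [(i, i % 2) for i in range(10)]
anchors = pick_samples(samples, n_families=2, n_samples_to_get=2)
queries = pick_samples(samples, n_families=2, n_samples_to_get=3, other=anchors, seed=1)
```

Passing the first selection as `other` to the second call keeps the two sets disjoint, which is what allows anchor and query samples to be drawn from the same dataset.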
evaluate_fresh_scores(ds_path, checkpoint_path, net_type, n_query_samples, min_n_anchor_samples, max_n_anchor_samples, n_evaluations, batch_size) (function) - Evaluate model on the Malware Family Prediction task.
-
ds_path(arg) - Path of the directory where to find the fresh dataset (containing .dat files) -
checkpoint_path(arg) - Path to the model checkpoint to load -
net_type(arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje') -
n_query_samples(arg) - Number of query samples to retrieve, per-family (default: 23) -
min_n_anchor_samples(arg) - Minimum number of anchor samples to use, per-family (default: 1) -
max_n_anchor_samples(arg) - Maximum number of anchor samples to use, per-family (default: 10) -
n_evaluations(arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15) -
batch_size(arg) - How many samples per batch to load (default: 1000)
compute_ranking_scores(ranking_scores, global_ranks_to_save, rank_per_query) (function) - Compute the ranking scores (MRR and MAP) from a list of ranks and select a set of notable ranks to save to file.
-
ranking_scores(arg) - Ranking scores previously computed -
global_ranks_to_save(arg) - Global interesting ranks to save to file -
rank_per_query(arg) - List of ranks computed by the model evaluation procedure
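For reference, the textbook definitions of the two scores can be sketched as below. The sketch assumes that each entry of `rank_per_query` is a list of binary relevance flags in ranked order (1 where the retrieved sample shares the query's family); the functions in `utils.ranking_metrics` may expect a different layout, and the `_sketch` names are hypothetical.

```python
def reciprocal_rank(relevance):
    """1 / position of the first relevant item, or 0.0 if there is none."""
    for position, relevant in enumerate(relevance, start=1):
        if relevant:
            return 1.0 / position
    return 0.0


def average_precision(relevance):
    """Mean of precision@k taken at every relevant position k."""
    hits = 0
    precisions = []
    for position, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_reciprocal_rank_sketch(rank_per_query):
    return sum(reciprocal_rank(r) for r in rank_per_query) / len(rank_per_query)


def mean_average_precision_sketch(rank_per_query):
    return sum(average_precision(r) for r in rank_per_query) / len(rank_per_query)


rank_per_query = [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
mrr = mean_reciprocal_rank_sketch(rank_per_query)      # (1/2 + 1 + 1/3) / 3
map_score = mean_average_precision_sketch(rank_per_query)  # (1/2 + 1 + 5/12) / 3
```

MRR only rewards the position of the first correct hit, while MAP also accounts for how early the remaining relevant items appear, so the two can differ on the same rankings.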
evaluate_fresh_rankings(ds_path, checkpoint_path, net_type, n_query_samples, n_evaluations, batch_size) (function) - Evaluate model on the Malware Family ranking task.
-
ds_path(arg) - Path of the directory where to find the fresh dataset (containing .dat files) -
checkpoint_path(arg) - Path to the model checkpoint to load -
net_type(arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha'. (default: 'mtje') -
n_query_samples(arg) - Number of query samples per-family to consider (default: 23) -
n_evaluations(arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15) -
batch_size(arg) - How many samples per batch to load (default: 1000)
evaluate_fresh(fresh_ds_path, checkpoint_path, net_type, min_n_anchor_samples, max_n_anchor_samples, n_query_samples, n_evaluations, batch_size) (function) - Evaluate the model on both the family prediction task and on the family ranking task.
-
fresh_ds_path(arg) - Path of the directory where to find the fresh dataset (containing .dat files) -
checkpoint_path(arg) - Path to the model checkpoint to load -
net_type(arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje') -
min_n_anchor_samples(arg) - Minimum number of anchor samples to use, per-family (default: 1) -
max_n_anchor_samples(arg) - Maximum number of anchor samples to use, per-family (default: 10) -
n_query_samples(arg) - Number of query samples per-family to consider (default: 23) -
n_evaluations(arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15) -
batch_size(arg) - How many samples per batch to load (default: 1000)
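The role of `n_evaluations` can be pictured as repeating the same evaluation on different random draws and summarising the spread of the resulting scores. The sketch below is an assumption about that general pattern, not the script's actual procedure: `repeat_evaluation` and `evaluate_once` are hypothetical names, and a random number stands in for a real score.

```python
import random
import statistics


def repeat_evaluation(evaluate_once, n_evaluations=15, seed=0):
    """Run `evaluate_once` several times and return (mean, sample std) of the scores."""
    if n_evaluations < 2:
        raise ValueError("at least two evaluations are needed for a std estimate")
    rng = random.Random(seed)
    scores = [evaluate_once(rng) for _ in range(n_evaluations)]
    return statistics.mean(scores), statistics.stdev(scores)


# Stand-in evaluation: a random score in [0, 1) instead of a real metric.
mean_score, std_score = repeat_evaluation(lambda rng: rng.random())

# A perfectly stable evaluation has zero spread.
const_mean, const_std = repeat_evaluation(lambda rng: 0.5)
```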
__main__ (main) - Start baker in order to make it possible to run the script and use function names and parameters as the command line interface, using optparse-style options