evaluate_fresh.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki

Imported Modules

  • import configparser - implements a basic configuration language for Python programs - configparser documentation
  • import importlib - provides the implementation of the import statement in Python source code - importlib documentation
  • import json - json encoder and decoder - json documentation
  • import os - provides a portable way of using operating system dependent functionality - os documentation
  • import sys - system-specific parameters and functions - sys documentation
  • import tempfile - used to create temporary files and directories - tempfile documentation
  • import time - provides various time-related functions - time documentation
  • from copy import deepcopy - creates a new object and recursively copies the original object elements - copy documentation

  • import baker - easy, powerful access to Python functions from the command line - baker documentation
  • import mlflow - open source platform for managing the end-to-end machine learning lifecycle - mlflow documentation
  • import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
  • import pandas as pd - pandas is a flexible and easy to use open source data analysis and manipulation tool - pandas documentation
  • import psutil - used for retrieving information on running processes and system utilization - psutil documentation
  • import torch - tensor library like NumPy, with strong GPU support - pytorch documentation
  • from logzero import logger - robust and effective logging for Python - logzero documentation

  • from nets.generators.fresh_dataset import Dataset
  • from nets.generators.fresh_generators import get_generator
  • from utils.ranking_metrics import mean_reciprocal_rank
  • from utils.ranking_metrics import mean_average_precision
  • from utils.ranking_metrics import max_reciprocal_rank_index
  • from utils.ranking_metrics import min_reciprocal_rank_index
  • from utils.ranking_metrics import max_average_precision_index
  • from utils.ranking_metrics import min_average_precision_index

Classes and functions

distance_to_similarity(distances, a, function) (function) - Calculate similarity scores from distances by using an inversion function.

  • distances (arg) - Tensor containing the distances calculated between two embeddings
  • a (arg) - Inversion multiplication factor (default: 1.0)
  • function (arg) - Inversion function to use. Possible values are: 'exp', 'inv' or 'inv_pow' (default: 'exp')
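A minimal sketch of what such an inversion could look like follows. The exact formulas behind 'exp', 'inv' and 'inv_pow' are not given on this page, so the three expressions below are assumptions chosen only to illustrate the idea (small distance maps to high similarity). The real function operates on tensors; this sketch uses plain Python lists so it runs with the standard library alone, and the name distance_to_similarity_sketch is hypothetical.

```python
import math


def distance_to_similarity_sketch(distances, a=1.0, function="exp"):
    """Map non-negative distances to similarity scores in (0, 1].

    The formulas are illustrative assumptions, not the project's code.
    """
    if function == "exp":
        # Exponential decay: exp(-a * d)
        return [math.exp(-a * d) for d in distances]
    if function == "inv":
        # Inverse: 1 / (1 + a * d)
        return [1.0 / (1.0 + a * d) for d in distances]
    if function == "inv_pow":
        # Inverse of the squared distance: 1 / (1 + a * d^2)
        return [1.0 / (1.0 + a * d ** 2) for d in distances]
    raise ValueError(f"unknown inversion function: {function!r}")


# A distance of 0 always maps to the maximum similarity of 1.0.
scores = distance_to_similarity_sketch([0.0, 1.0, 2.0], a=1.0, function="inv")
```

Under these assumed formulas, all three variants are monotonically decreasing in the distance, so the relative ordering of candidates is the same whichever one is chosen; only the spacing of the scores changes.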

import_modules(net_type) (function) - Dynamically import network, dataset and generator modules depending on the provided arguments.

  • net_type (arg) - Network type (possible values: mtje, mtje_cosine, mtje_pairwise_distance, aloha)
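Dynamic imports of this kind are usually built on importlib.import_module. The sketch below shows the general pattern only: the module paths the script actually resolves for each net_type are not listed on this page, so standard-library modules stand in for them, and the mapping MODULES_BY_TYPE and the function import_by_type are hypothetical names.

```python
import importlib

# Hypothetical mapping from a type string to (module path, attribute name).
# Standard-library modules stand in for the project's network modules.
MODULES_BY_TYPE = {
    "json_like": ("json", "dumps"),
    "math_like": ("math", "sqrt"),
}


def import_by_type(kind):
    """Import the module registered for `kind` and return the named attribute."""
    try:
        module_name, attribute = MODULES_BY_TYPE[kind]
    except KeyError:
        raise ValueError(f"unknown type: {kind!r}") from None
    # import_module resolves the dotted path at run time.
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


sqrt = import_by_type("math_like")
```

Resolving the module from a lookup table keeps the set of accepted values explicit and turns an unsupported value into a clear error instead of an ImportError deep inside the call.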

detach_and_copy_array(array) (function) - Detach numpy array or pytorch tensor and return a deep copy of it.

  • array (arg) - Numpy array or pytorch tensor to copy
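As a rough illustration, a detach-then-copy helper can be written with duck typing and deepcopy. This is a sketch under assumptions: it treats any object with a detach() method as a tensor, and the name detach_and_copy_sketch is hypothetical. The demonstration uses a nested list so it runs without numpy or torch installed.

```python
from copy import deepcopy


def detach_and_copy_sketch(array):
    """Return an independent deep copy of `array`.

    Assumption: objects exposing .detach() (such as PyTorch tensors) are
    detached from the autograd graph before being copied.
    """
    if hasattr(array, "detach"):
        array = array.detach()
    return deepcopy(array)


original = [[1, 2], [3, 4]]
copied = detach_and_copy_sketch(original)
copied[0][0] = 99  # mutating the copy leaves the original untouched
```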

normalize_embeddings(results_dict) (function) - Take a set of results dicts and break them out into a single dict of 1d arrays with appropriate column names that pandas can convert to a DataFrame.

  • results_dict (arg) - Results (predicted labels) dictionary

get_samples(model, generator, n_families, n_samples_to_get, other) (function) - Get 'n_samples_to_get' samples per family from the provided 'generator', excluding the samples in 'other'.

  • model (arg) - Model to evaluate
  • generator (arg) - Dataset generator (dataloader) containing the data to retrieve (fresh dataset)
  • n_families (arg) - Number of families contained in the fresh dataset
  • n_samples_to_get (arg) - Number of samples to get per family from the Generator
  • other (arg) - Dictionary containing samples not to provide as result (default: None)
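The selection logic described above can be sketched as follows. This is an illustration under several assumptions: the 'model' argument is left out (the sketch only selects samples and computes nothing from them), each batch is assumed to be a list of (sample_id, family) pairs, families are assumed to be integer labels from 0 to n_families - 1, and 'other' is assumed to map each family to a list of sample ids. The name get_samples_sketch is hypothetical.

```python
def get_samples_sketch(batches, n_families, n_samples_to_get, other=None):
    """Collect up to n_samples_to_get sample ids per family, skipping `other`."""
    # Flatten the ids that must not be returned.
    excluded = set()
    if other is not None:
        for ids in other.values():
            excluded.update(ids)

    selected = {family: [] for family in range(n_families)}
    for batch in batches:
        for sample_id, family in batch:
            if sample_id in excluded:
                continue
            if len(selected[family]) < n_samples_to_get:
                selected[family].append(sample_id)
        # Stop reading batches as soon as every family is full.
        if all(len(ids) == n_samples_to_get for ids in selected.values()):
            break
    return selected


batches = [
    [("a", 0), ("b", 1), ("c", 0)],
    [("d", 1), ("e", 0), ("f", 1)],
]
queries = get_samples_sketch(batches, n_families=2, n_samples_to_get=1)
# Passing the queries as `other` yields anchors that do not overlap with them.
anchors = get_samples_sketch(batches, n_families=2, n_samples_to_get=1, other=queries)
```

Calling the helper twice, with the first result passed as 'other', is one way to obtain disjoint query and anchor sets from the same generator.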

evaluate_fresh_scores(ds_path, checkpoint_path, net_type, n_query_samples, min_n_anchor_samples, max_n_anchor_samples, n_evaluations, batch_size) (function) - Evaluate model on the Malware Family Prediction task.

  • ds_path (arg) - Path of the directory where to find the fresh dataset (containing .dat files)
  • checkpoint_path (arg) - Path to the model checkpoint to load
  • net_type (arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje')
  • n_query_samples (arg) - Number of query samples to retrieve, per-family (default: 23)
  • min_n_anchor_samples (arg) - Minimum number of anchor samples to use, per-family (default: 1)
  • max_n_anchor_samples (arg) - Maximum number of anchor samples to use, per-family (default: 10)
  • n_evaluations (arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15)
  • batch_size (arg) - How many samples per batch to load (default: 1000)
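To make the roles of min_n_anchor_samples, max_n_anchor_samples and n_evaluations concrete, the sketch below repeats an evaluation several times for each anchor count and summarizes the scores as mean and standard deviation. It assumes the anchor range is swept inclusively, which this page does not state. The evaluation itself is replaced by a synthetic toy function, so the numbers it produces are placeholders and not results of the model; sweep_anchor_counts and toy_evaluation are hypothetical names.

```python
import random
import statistics


def sweep_anchor_counts(evaluate_once, min_n_anchor_samples=1,
                        max_n_anchor_samples=10, n_evaluations=15):
    """Run `evaluate_once(n_anchors)` repeatedly and summarize each anchor count."""
    summary = {}
    for n_anchors in range(min_n_anchor_samples, max_n_anchor_samples + 1):
        scores = [evaluate_once(n_anchors) for _ in range(n_evaluations)]
        # stdev needs at least two evaluations to be defined.
        summary[n_anchors] = (statistics.mean(scores), statistics.stdev(scores))
    return summary


rng = random.Random(0)


def toy_evaluation(n_anchors):
    # Synthetic stand-in for a real evaluation run: a made-up score plus noise.
    return min(1.0, 0.5 + 0.04 * n_anchors) + rng.uniform(-0.02, 0.02)


summary = sweep_anchor_counts(toy_evaluation, 1, 3, n_evaluations=5)
```

Repeating each configuration with freshly sampled anchors and queries is what allows a spread to be reported next to each average.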

compute_ranking_scores(ranking_scores, global_ranks_to_save, rank_per_query) (function) - Compute the ranking scores (MRR and MAP) from a list of ranks and select a set of notable ranks to save to file.

  • ranking_scores (arg) - Ranking scores previously computed
  • global_ranks_to_save (arg) - Global interesting ranks to save to file
  • rank_per_query (arg) - List of ranks computed by the model evaluation procedure
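For reference, MRR (mean reciprocal rank) and MAP (mean average precision) can be computed as in the sketch below. The format of rank_per_query is not specified on this page, so the sketch assumes each query is a list of 0/1 relevance flags in ranked order. The functions are self-contained stand-ins and do not reproduce the signatures of utils.ranking_metrics; reading the max/min index helpers as "best and worst query" is likewise an assumption.

```python
def reciprocal_rank(relevance):
    """1 / position of the first relevant item, or 0.0 if there is none."""
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


def average_precision(relevance):
    """Mean of precision@k taken at every relevant position k."""
    hits = 0
    precisions = []
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mrr(rank_per_query):
    return sum(reciprocal_rank(r) for r in rank_per_query) / len(rank_per_query)


def mean_ap(rank_per_query):
    return sum(average_precision(r) for r in rank_per_query) / len(rank_per_query)


# Two queries: relevant items at positions 1 and 3, then at position 2.
rank_per_query = [[1, 0, 1], [0, 1, 0]]
mrr_score = mrr(rank_per_query)
map_score = mean_ap(rank_per_query)

# Indices of the best and worst queries by reciprocal rank.
rr_values = [reciprocal_rank(r) for r in rank_per_query]
best_query = max(range(len(rr_values)), key=rr_values.__getitem__)
worst_query = min(range(len(rr_values)), key=rr_values.__getitem__)
```

MRR only looks at the first relevant result of each query, while MAP rewards placing all relevant results near the top, which is why the two are usually reported together.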

evaluate_fresh_rankings(ds_path, checkpoint_path, net_type, n_query_samples, n_evaluations, batch_size) (function) - Evaluate model on the Malware Family Ranking task.

  • ds_path (arg) - Path of the directory where to find the fresh dataset (containing .dat files)
  • checkpoint_path (arg) - Path to the model checkpoint to load
  • net_type (arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje')
  • n_query_samples (arg) - Number of query samples per-family to consider (default: 23)
  • n_evaluations (arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15)
  • batch_size (arg) - How many samples per batch to load (default: 1000)

evaluate_fresh(fresh_ds_path, checkpoint_path, net_type, min_n_anchor_samples, max_n_anchor_samples, n_query_samples, n_evaluations, batch_size) (function) - Evaluate the model on both the family prediction task and on the family ranking task.

  • fresh_ds_path (arg) - Path of the directory where to find the fresh dataset (containing .dat files)
  • checkpoint_path (arg) - Path to the model checkpoint to load
  • net_type (arg) - Network to use between 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' and 'aloha' (default: 'mtje')
  • min_n_anchor_samples (arg) - Minimum number of anchor samples to use, per-family (default: 1)
  • max_n_anchor_samples (arg) - Maximum number of anchor samples to use, per-family (default: 10)
  • n_query_samples (arg) - Number of query samples per-family to consider (default: 23)
  • n_evaluations (arg) - Number of evaluations to perform (for uncertainty estimates) (default: 15)
  • batch_size (arg) - How many samples per batch to load (default: 1000)

__main__ (main) - Start baker so that the script can be run from the command line, with function names and parameters exposed as commands and optparse-style options.

