train.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki

Imported Modules

  • import configparser - implements a basic configuration language for Python programs - configparser documentation
  • import importlib - provides the implementation of the import statement in Python source code - importlib documentation
  • import json - JSON encoder and decoder - json documentation
  • import os - provides a portable way of using operating system dependent functionality - os documentation
  • import shutil - high-level file operations, such as recursively copying an entire directory tree rooted at src to a directory named dst - shutil documentation
  • import sys - system-specific parameters and functions - sys documentation
  • import time - provides various time-related functions - time documentation
  • from collections import defaultdict - dict subclass that calls a factory function to supply missing values - collections documentation
  • from copy import deepcopy - creates a new object and recursively copies the original object elements - copy documentation
  • from urllib import parse - standard interface for breaking Uniform Resource Locator (URL) strings into components - urllib.parse documentation

  • import baker - easy, powerful access to Python functions from the command line - baker documentation
  • import mlflow - open source platform for managing the end-to-end machine learning lifecycle - mlflow documentation
  • import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
  • import psutil - used for retrieving information on running processes and system utilization - psutil documentation
  • import torch - tensor library like NumPy, with strong GPU support - pytorch documentation
  • from logzero import logger - robust and effective logging for Python - logzero documentation

  • from utils.opt_utils import get_opt_state
  • from utils.opt_utils import save_opt_state

Classes and functions

import_modules(net_type, gen_type) (function) - Dynamically import network, dataset and generator modules depending on the provided arguments.

  • net_type (arg) - Network type (possible values: mtje, mtje_cosine, mtje_pairwise_distance, aloha)
  • gen_type (arg) - Generator type (possible values: base, alt1, alt2, alt3)
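
The dynamic import described above can be sketched with `importlib.import_module`. This is only a minimal illustration: the allowed values come from the argument list, but the dotted module paths and the helpers `resolve_module_names` and `load` are hypothetical, since this page does not state how the modules are laid out in the repository.

```python
import importlib

# Allowed values, as listed in the argument descriptions above.
NET_TYPES = ("mtje", "mtje_cosine", "mtje_pairwise_distance", "aloha")
GEN_TYPES = ("base", "alt1", "alt2", "alt3")


def resolve_module_names(net_type, gen_type):
    """Validate both arguments and build dotted module paths.

    The path templates are hypothetical placeholders, not the
    repository's real layout.
    """
    if net_type not in NET_TYPES:
        raise ValueError(f"unknown net_type: {net_type!r}")
    if gen_type not in GEN_TYPES:
        raise ValueError(f"unknown gen_type: {gen_type!r}")
    return f"nets.{net_type}_net", f"nets.generators.{gen_type}_generator"


def load(dotted_name):
    """Import a module from its dotted name at run time."""
    return importlib.import_module(dotted_name)


net_module_name, gen_module_name = resolve_module_names("aloha", "alt1")

# Stand-in demonstration with a standard-library module, since the
# hypothetical paths above would only exist inside the project.
json_module = load("json")
```

Validating the arguments before importing keeps a typo in `net_type` or `gen_type` from surfacing as a less readable `ModuleNotFoundError`.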

train_network(ds_path, net_type, gen_type, run_id, training_run, batch_size, epochs, training_n_samples, validation_n_samples, use_malicious_labels, use_count_labels, use_tag_labels, feature_dimension, random_seed, workers) (function, baker command) - Train a feed-forward neural network on EMBER 2.0 features, optionally with additional targets as described in the ALOHA paper (https://arxiv.org/abs/1903.05700). The SMART tags are based on https://arxiv.org/abs/1905.06262.

  • ds_path (arg) - Path of the directory where to find the pre-processed dataset (containing .dat files)
  • net_type (arg) - Network to use, one of 'mtje', 'mtje_cosine', 'mtje_pairwise_distance' or 'aloha' (default: 'mtje')
  • gen_type (arg) - Generator (and dataset) class to use, one of 'base', 'alt1', 'alt2' or 'alt3' (default: 'base')
  • run_id (arg) - MLflow run id of a previously stopped run to resume (default: None)
  • training_run (arg) - Training run identifier; at least 2 runs are needed to plot base evaluation results with mean and confidence (default: 0)
  • batch_size (arg) - How many samples per batch to load (default: 8192)
  • epochs (arg) - How many epochs to train for (default: 10)
  • training_n_samples (arg) - Number of training samples to consider (used to access the right files) (default: 0 -> all)
  • validation_n_samples (arg) - Number of validation samples to consider (used to access the right files) (default: 0 -> all)
  • use_malicious_labels (arg) - Whether or not (1/0) to use malware/benignware labels as a target (default: 1)
  • use_count_labels (arg) - Whether or not (1/0) to use the counts as an additional target (default: 1)
  • use_tag_labels (arg) - Whether or not (1/0) to use the tags as additional targets (default: 1)
  • feature_dimension (arg) - The input dimension of the model (default: 2381 -> EMBER 2.0 feature size)
  • random_seed (arg) - If provided, seed random number generation with this value (default: None -> no seeding)
  • workers (arg) - How many workers (threads) the dataloader should use (default: 0 -> use multiprocessing.cpu_count())
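
Several of the arguments above use a sentinel default (0 or None) that is resolved at run time. The sketch below shows one way such defaults could be handled. The helper names `resolve_workers`, `as_flag` and `seed_everything` are hypothetical, and only the standard-library `random` generator is seeded so that the example runs without NumPy or PyTorch installed.

```python
import multiprocessing
import random


def resolve_workers(workers=0):
    """Return the worker count, treating 0 as 'use every CPU core'."""
    return workers if workers > 0 else multiprocessing.cpu_count()


def as_flag(value):
    """Convert a 1/0 command line value into a bool."""
    return bool(int(value))


def seed_everything(random_seed=None):
    """Seed random number generation only when a seed is provided."""
    if random_seed is not None:
        random.seed(random_seed)


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()  # same value as `first`, since the seed was reset
```

In the real script the same seed would presumably also be passed to NumPy and PyTorch, both of which appear in the import list; that part is an assumption and is left out here.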

__main__ (main) - Start baker so that the script can be run from the command line, with function names and parameters exposed as the command line interface, using optparse-style options

