config.ini - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki

Tool configuration file. Modify values inside as needed.

Configuration variables

Section general

  • device - desired device to train the model on, e.g. 'cuda:0' if a GPU is available, otherwise 'cpu'
  • workers - number of workers to use (if 0, the current system CPU count is used)
  • runs - number of training runs to do
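
For reference, a minimal [general] section might look like this (a sketch; the values shown are illustrative, not necessarily the shipped defaults):

```ini
[general]
; illustrative values, not necessarily the shipped defaults
device = cuda:0
workers = 4
runs = 3
```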

Section sorel20mDataset

  • training_n_samples - max number of training data samples to use (if -1, all available samples are used)
  • validation_n_samples - max number of validation data samples to use (if -1, all available samples are used)
  • test_n_samples - max number of test data samples to use (if -1, all available samples are used)
  • validation_test_split - (should not be changed) timestamp that divides the validation data (used to check convergence/overfitting) from test data (used to assess final performance)
  • train_validation_split - (should not be changed) timestamp that splits training data from validation data
  • total_training_samples - (should not be changed) total number of available training samples in the original Sorel20M dataset
  • total_validation_samples - (should not be changed) total number of available validation samples in the original Sorel20M dataset
  • total_test_samples - (should not be changed) total number of available test samples in the original Sorel20M dataset
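
A sketch of the corresponding section, assuming -1 is kept to use all available samples:

```ini
[sorel20mDataset]
; illustrative values: -1 means "use all available samples"
training_n_samples = -1
validation_n_samples = -1
test_n_samples = -1
; validation_test_split, train_validation_split and the total_*_samples
; keys are fixed properties of the original Sorel20M dataset and should
; be left at the values shipped with the repository
```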

Section aloha

  • batch_size - how many samples per batch to load
  • epochs - how many epochs to train for
  • use_malicious_labels - whether or not (1/0) to use malware/benignware labels as a target
  • use_count_labels - whether or not (1/0) to use the counts as an additional target
  • use_tag_labels - whether or not (1/0) to use the tags as additional targets
  • layer_sizes - defines the sizes (and number) of the initial linear layers of the aloha net. Examples:
    • [512,512,128]: there will be 3 initial layers (before the task branches) with sizes 512, 512 and 128 respectively
    • [512,256]: there will be 2 initial layers (before the task branches) with sizes 512 and 256 respectively
  • dropout_p - dropout probability between the first aloha net layers
  • activation_function - activation function between the first aloha net layers. Possible values:
    • elu: Exponential Linear Unit activation function
    • leakyRelu: leaky ReLU activation function
    • pRelu: parametric ReLU activation function (better used with weight decay = 0)
    • relu: Rectified Linear Unit activation function
  • normalization_function - normalization function between the first aloha net layers. Possible values:
    • layer_norm: the torch.nn.LayerNorm function
    • batch_norm: the torch.nn.BatchNorm1d function
  • loss_weights - label weights to be used during loss calculation (Notice: only the weights corresponding to enabled labels will be used). Example: {'malware': 1.0, 'count': 0.1, 'tags': 1.0}
  • optimizer - optimizer to use during training. Possible values:
    • adam: Adam algorithm
    • sgd: stochastic gradient descent
  • lr - learning rate to use during training
  • momentum - momentum to be used during training when using 'sgd' optimizer
  • weight_decay - weight decay (L2 penalty) to use with selected optimizer
  • gen_type - generator type. Possible values are:
    • base: use the basic generator (from the original SOREL20M code), modified to work with the pre-processed dataset
    • alt1: use alternative generator 1. Inspired by the 'index select' version of https://discuss.pytorch.org/t/dataloader-much-slower-than-manual-batching/27014/6, this version uses a new dataloader class, called FastTensorDataloader, to process tabular data. It was modified from the original version available at the above link to work with the pre-processed dataset (numpy memmap) and with multiple workers (in multiprocessing)
    • alt2: use alternative generator 2. Inspired by the 'shuffle in-place' version of https://discuss.pytorch.org/t/dataloader-much-slower-than-manual-batching/27014/6, this version uses a new dataloader class, called FastTensorDataloader, to process tabular data. It was modified from the original version available at the above link to work with the pre-processed dataset (numpy memmap) and with multiple workers (in multiprocessing)
    • alt3: use alternative generator 3. This version uses a new dataloader class, called FastTensorDataloader, which asynchronously (if workers > 1) loads the dataset into memory in randomly chosen chunks; the chunks are concatenated to form a 'chunk aggregate', the data inside the aggregate is shuffled, and batches are then extracted from it. Sample shuffling is therefore more localised, but loading speed is greatly increased
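
Putting the above together, an [aloha] section could look as follows (a sketch: all values are illustrative; only the layer_sizes and loss_weights examples come from the descriptions above):

```ini
[aloha]
; illustrative values, not necessarily the shipped defaults
batch_size = 8192
epochs = 10
use_malicious_labels = 1
use_count_labels = 1
use_tag_labels = 1
layer_sizes = [512,512,128]
dropout_p = 0.05
activation_function = elu
normalization_function = batch_norm
loss_weights = {'malware': 1.0, 'count': 0.1, 'tags': 1.0}
optimizer = adam
lr = 0.001
; momentum is only read when optimizer = sgd
momentum = 0.9
weight_decay = 0.0
gen_type = alt3
```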

Section mtje

  • batch_size - how many samples per batch to load
  • epochs - how many epochs to train for
  • use_malicious_labels - whether or not (1/0) to use malware/benignware labels as a target
  • use_count_labels - whether or not (1/0) to use the counts as an additional target
  • layer_sizes - defines the sizes (and number) of the initial linear layers of the mtje net. Examples:
    • [512,512,128]: there will be 3 initial layers (before the task branches) with sizes 512, 512 and 128 respectively
    • [512,256]: there will be 2 initial layers (before the task branches) with sizes 512 and 256 respectively
  • dropout_p - dropout probability between the first mtje net layers
  • activation_function - activation function between the first mtje net layers. Possible values:
    • elu: Exponential Linear Unit activation function
    • leakyRelu: leaky ReLU activation function
    • pRelu: parametric ReLU activation function (better used with weight decay = 0)
    • relu: Rectified Linear Unit activation function
  • normalization_function - normalization function between the first mtje net layers. Possible values:
    • layer_norm: the torch.nn.LayerNorm function
    • batch_norm: the torch.nn.BatchNorm1d function
  • loss_weights - label weights to be used during loss calculation (Notice: only the weights corresponding to enabled labels will be used). Example: {'malware': 1.0, 'count': 0.1, 'tags': 1.0}
  • optimizer - optimizer to use during training. Possible values:
    • adam: Adam algorithm
    • sgd: stochastic gradient descent
  • lr - learning rate to use during training
  • momentum - momentum to be used during training when using the 'sgd' optimizer
  • weight_decay - weight decay (L2 penalty) to use with the selected optimizer
  • gen_type - generator type. Possible values are:
    • base: use the basic generator (from the original SOREL20M code), modified to work with the pre-processed dataset
    • alt1: use alternative generator 1. Inspired by the 'index select' version of https://discuss.pytorch.org/t/dataloader-much-slower-than-manual-batching/27014/6, this version uses a new dataloader class, called FastTensorDataloader, to process tabular data. It was modified from the original version available at the above link to work with the pre-processed dataset (numpy memmap) and with multiple workers (in multiprocessing)
    • alt2: use alternative generator 2. Inspired by the 'shuffle in-place' version of https://discuss.pytorch.org/t/dataloader-much-slower-than-manual-batching/27014/6, this version uses a new dataloader class, called FastTensorDataloader, to process tabular data. It was modified from the original version available at the above link to work with the pre-processed dataset (numpy memmap) and with multiple workers (in multiprocessing)
    • alt3: use alternative generator 3. This version uses a new dataloader class, called FastTensorDataloader, which asynchronously (if workers > 1) loads the dataset into memory in randomly chosen chunks; the chunks are concatenated to form a 'chunk aggregate', the data inside the aggregate is shuffled, and batches are then extracted from it. Sample shuffling is therefore more localised, but loading speed is greatly increased
  • similarity_measure - similarity measure used to evaluate distances in the joint embedding space. Possible values are:
    • dot: dot product between vectors in the embedding space (the similarity measure used in the mtje paper)
    • cosine: cosine similarity between vectors in the embedding space
    • pairwise_distance: computes the pairwise distance and then transforms it into a similarity measure (between 0 and 1)
  • pairwise_distance_to_similarity_function - (only used if 'pairwise_distance' is selected as similarity_measure) distance-to-similarity function to use. These functions map values from the set of positive reals (R+) to the [0,1] interval. Possible values are:
    • exp: will compute e^(-x/a)
    • inv: will compute 1/(1+x/a)
    • inv_pow: will compute 1/(1+(x^2)/a)

    where 'a' is a multiplicative factor (see 'pairwise_a')

  • pairwise_a - (only used if 'pairwise_distance' is selected as similarity_measure) the multiplicative factor 'a' of the distance-to-similarity function
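
The [mtje] section takes the same form as [aloha] for the shared keys; a sketch of the mtje-specific part, with illustrative values:

```ini
[mtje]
; keys shared with [aloha] (batch_size, epochs, layer_sizes, ...) are
; omitted here for brevity; illustrative values below
similarity_measure = dot
; the two keys below are only read when
; similarity_measure = pairwise_distance
pairwise_distance_to_similarity_function = exp
pairwise_a = 1.0
```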


Section freshDataset

  • families - Malware Bazaar families of interest. NOTE: it is recommended to specify more families than 'number_of_families', since Malware Bazaar may not have 'amount_each' samples for some of them. The families are considered in the order they are listed.
  • number_of_families - number of families to consider. Families in excess, following the list order, will be ignored.
  • amount_each - amount of samples for each malware family to retrieve from Malware Bazaar
  • n_queries - number of query samples per family to consider
  • min_n_anchor_samples - minimum number of anchor samples to use, per family
  • max_n_anchor_samples - maximum number of anchor samples to use, per family
  • n_evaluations - number of evaluations to perform (for uncertainty estimates)
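
An illustrative [freshDataset] section (a sketch: the family names below are hypothetical examples of Malware Bazaar tags, not a recommended list; all numeric values are illustrative):

```ini
[freshDataset]
; 12 families are listed but only 10 are used, following the
; recommendation to specify more families than number_of_families
families = ['AgentTesla', 'Formbook', 'Ursnif', 'TrickBot', 'Heodo',
            'njRAT', 'LokiBot', 'RemcosRAT', 'Gozi', 'QakBot',
            'GuLoader', 'AveMariaRAT']
number_of_families = 10
amount_each = 1000
n_queries = 23
min_n_anchor_samples = 1
max_n_anchor_samples = 10
n_evaluations = 15
```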

Section familyClassifier

  • epochs - how many epochs to train the family classifier for
  • train_split_proportion - proportion of the whole fresh dataset to use for training the family classifier
  • valid_split_proportion - proportion of the whole fresh dataset to use for validating the family classifier
  • test_split_proportion - proportion of the whole fresh dataset to use for testing the family classifier
  • batch_size - how many samples per batch to load for the family classifier
  • optimizer - optimizer to use during training. Possible values:
    • adam: Adam algorithm
    • sgd: stochastic gradient descent
  • lr - learning rate to use during training
  • momentum - momentum to be used during training when using 'sgd' optimizer
  • weight_decay - weight decay (L2 penalty) to use with selected optimizer
  • layer_sizes - defines the sizes (and number) of the linear layers of the family classifier output head. Examples:
    • [128,256,64]: the family classifier layers will be 3 with sizes 128, 256, 64 respectively
    • [128,64]: the family classifier layers will be 2 with sizes 128, 64 respectively
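
A sketch of the section, assuming the split proportions are expressed as fractions of the whole fresh dataset (all values illustrative; layer_sizes is taken from the second example above):

```ini
[familyClassifier]
; illustrative values; split proportions assumed to be fractions
; summing to 1.0
epochs = 25
train_split_proportion = 0.7
valid_split_proportion = 0.1
test_split_proportion = 0.2
batch_size = 250
optimizer = adam
lr = 0.001
momentum = 0.9
weight_decay = 0.0
layer_sizes = [128,64]
```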

Section contrastiveLearning

  • epochs - how many epochs to train the contrastive model for
  • train_split_proportion - proportion of the whole fresh dataset to use for training the contrastive model
  • valid_split_proportion - proportion of the whole fresh dataset to use for validating the contrastive model
  • test_split_proportion - proportion of the whole fresh dataset to use for testing the contrastive model
  • batch_size - how many samples per batch to load for the contrastive model
  • optimizer - optimizer to use during training. Possible values:
    • adam: Adam algorithm
    • sgd: stochastic gradient descent
  • lr - learning rate to use during training
  • momentum - momentum to be used during training when using 'sgd' optimizer
  • weight_decay - weight decay (L2 penalty) to use with selected optimizer
  • hard - online triplet mining function to use when training the model with contrastive learning. Possible values:
    • 0: batch_all_triplet_loss online triplet mining function
    • 1: batch_hard_triplet_loss online triplet mining function
  • margin - margin to use in the triplet loss
  • squared - whether to use the squared Euclidean norm (1) or the plain Euclidean norm (0) as the distance metric
  • rank_size - size of the produced rankings
  • knn_k_min - minimum number of nearest neighbours to consider when classifying samples with the k-NN algorithm (only odd numbers between knn_k_min and knn_k_max, inclusive, will be used)
  • knn_k_max - maximum number of nearest neighbours to consider when classifying samples with the k-NN algorithm (only odd numbers between knn_k_min and knn_k_max, inclusive, will be used)
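
Finally, an illustrative [contrastiveLearning] section (a sketch; all values are illustrative). With knn_k_min = 1 and knn_k_max = 11, k-NN classification would be run for k in {1, 3, 5, 7, 9, 11}:

```ini
[contrastiveLearning]
; illustrative values; split proportions assumed to be fractions
epochs = 25
train_split_proportion = 0.7
valid_split_proportion = 0.1
test_split_proportion = 0.2
batch_size = 250
optimizer = adam
lr = 0.001
momentum = 0.9
weight_decay = 0.0
; 1 selects batch_hard_triplet_loss, 0 batch_all_triplet_loss
hard = 1
margin = 0.2
squared = 1
rank_size = 20
knn_k_min = 1
knn_k_max = 11
```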
