MTJE_net.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki

In this page

  • Imported Modules
  • Classes and functions

Imported Modules

  • import configparser - implements a basic configuration language for Python programs - configparser documentation
  • import os - provides a portable way of using operating system dependent functionality - os documentation
  • from copy import deepcopy - creates a new object and recursively copies the original object elements - copy documentation


  • from .generators.dataset import Dataset
  • from .utils.Net import Net as baseNet

Back to top

Classes and functions

Net (class) - Multi Task Joint Embedding Network which computes embedding similarity using the dot product (an illustrative usage sketch follows the list below).

  • __init__(self, use_malware, use_counts, use_tags, n_tags, feature_dimension, embedding_dimension, max_embedding_norm, layer_sizes, dropout_p, activation_function, normalization_function) (member function) - Initialize net.
    • use_malware (arg) - Whether to use the malicious label for the data points or not (default: True)
    • use_counts (arg) - Whether to use the counts for the data points or not (default: True)
    • use_tags (arg) - Whether to use the tags for the data points or not. NOTE: this is here just for compatibility with the training procedure. With the joint embedding network the tags will always be used, even if this flag is false. (default: True)
    • n_tags (arg) - Number of tags to predict (default: None)
    • feature_dimension (arg) - Dimension of the input data feature vector (default: 2381)
    • embedding_dimension (arg) - Joint latent space size (default: 32)
    • max_embedding_norm (arg) - Maximum norm to which the embedding vectors are constrained (default: 1)
    • layer_sizes (arg) - Layer sizes (array of sizes) (default: None -> use [512, 512, 128])
    • dropout_p (arg) - Dropout probability (default: 0.05)
    • activation_function (arg) - Non-linear activation function to use (may be "elu", "leakyRelu", "pRelu" or "relu") (default: "elu")
    • normalization_function (arg) - Normalization function to use (may be "layer_norm" or "batch_norm") (default: "batch_norm")
  • forward(self, data) (member function) - Forward batch of data through the net.
    • data (arg) - Current batch of data (features)
  • get_embedding(self, data) (member function) - Forward batch of data through the net and get resulting embedding.
    • data (arg) - Current batch of data (features)
  • get_similarity(self, first_embedding, second_embedding) (member function) - Get similarity scores between two embedding matrices (embeddings of batches of data).
    • first_embedding (arg) - Embeddings of a batch of data (dim: batch_dim_1 x 32)
    • second_embedding (arg) - Embeddings of a batch of data (dim: batch_dim_2 x 32)
  • compute_loss(predictions, labels, loss_wts) (static member function) - Compute Net losses (optionally with SMART tags and vendor detection count auxiliary losses).
    • predictions (arg) - A dictionary of results from the Net
    • labels (arg) - A dictionary of labels
    • loss_wts (arg) - Weights to assign to each head of the network (if that head exists); defaults to {'malware': 1.0, 'count': 0.1, 'tags': 1.0} (see the weighting sketch after this list)
  • normalize_results(labels_dict, results_dict, use_malware, use_count, use_tags) (static member function) - Take the labels and results dictionaries and break them out into a single dict of 1-d arrays with appropriate column names that pandas can convert to a DataFrame (see the flattening sketch after this list).
    • labels_dict (arg) - Labels (ground truth) dictionary
    • results_dict (arg) - Results (predicted labels) dictionary
    • use_malware (arg) - Whether to use malware/benignware labels as a target (default: False)
    • use_count (arg) - Whether to use the counts as an additional target (default: False)
    • use_tags (arg) - Whether to use SMART tags as additional targets. NOTE: this is here just for compatibility with the evaluation procedure. With the joint embedding network the tags will always be used, even if this flag is false. (default: False)
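
The following is a minimal usage sketch of the interface documented above. The import path, the number of tags and the batch sizes are illustrative assumptions, not values taken from the repository; the constructor arguments mirror the defaults listed for __init__.

```python
import torch

# Assumed import path -- adjust it to wherever MTJE_net.py lives in your checkout.
from nets.MTJE_net import Net

# Instantiate the network with the documented default values made explicit.
net = Net(use_malware=True,
          use_counts=True,
          use_tags=True,                        # tags are always used by this net
          n_tags=11,                            # assumed number of SMART tags
          feature_dimension=2381,
          embedding_dimension=32,
          max_embedding_norm=1,
          layer_sizes=None,                     # falls back to [512, 512, 128]
          dropout_p=0.05,
          activation_function="elu",
          normalization_function="batch_norm")
net.eval()                                      # disable dropout for a deterministic pass

features = torch.randn(16, 2381)                # batch of 16 input feature vectors

predictions = net(features)                     # forward(): dictionary of per-head results
embeddings = net.get_embedding(features)        # 16 x 32 embedding matrix

other = net.get_embedding(torch.randn(8, 2381))
scores = net.get_similarity(embeddings, other)  # dot-product similarity scores (16 x 8 pairs)
```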

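To make the role of loss_wts concrete, here is a hedged sketch of how per-head losses could be weighted and summed in compute_loss. The specific loss functions (binary cross-entropy for the malware and tags heads, Poisson negative log-likelihood for the vendor detection count head) are assumptions for illustration and may differ from the repository's implementation.

```python
import torch.nn.functional as F

def weighted_loss_sketch(predictions, labels, loss_wts=None):
    """Combine per-head losses, weighting each head by loss_wts (illustrative sketch)."""
    if loss_wts is None:
        loss_wts = {'malware': 1.0, 'count': 0.1, 'tags': 1.0}
    losses = {}
    if 'malware' in predictions and 'malware' in labels:
        losses['malware'] = F.binary_cross_entropy(predictions['malware'],
                                                   labels['malware'].float())
    if 'count' in predictions and 'count' in labels:
        # Poisson NLL is a natural fit for vendor detection counts (an assumption here).
        losses['count'] = F.poisson_nll_loss(predictions['count'],
                                             labels['count'].float())
    if 'tags' in predictions and 'tags' in labels:
        losses['tags'] = F.binary_cross_entropy(predictions['tags'],
                                                labels['tags'].float())
    # Weight each head's loss and sum them into the total training objective.
    total = sum(loss_wts.get(name, 1.0) * value for name, value in losses.items())
    return {'total': total, **losses}
```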
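
Similarly, the sketch below shows one way to break the labels and results dictionaries out into a single dict of 1-d arrays that pandas can turn into a DataFrame, in the spirit of normalize_results; the column-naming scheme and the handling of multi-column tag targets are assumptions.

```python
import numpy as np

def flatten_results_sketch(labels_dict, results_dict, tag_names=None):
    """Flatten label/prediction dicts into 1-d columns (illustrative sketch)."""
    tag_names = tag_names or []                       # assumed SMART tag names, if any
    columns = {}
    for source, prefix in ((labels_dict, 'label'), (results_dict, 'pred')):
        for key, values in source.items():
            values = np.asarray(values)
            if values.ndim == 1:                      # e.g. malware flag or count
                columns[f'{prefix}_{key}'] = values
            else:                                     # e.g. one column per SMART tag
                for i in range(values.shape[1]):
                    name = tag_names[i] if i < len(tag_names) else str(i)
                    columns[f'{prefix}_{key}_{name}'] = values[:, i]
    return columns           # pandas.DataFrame(columns) then yields one row per sample
```
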
Back to top
