vectorize_features.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki


Imported Modules


  • import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
  • from logzero import logger - robust and effective logging for Python - logzero documentation
  • from tqdm import tqdm - instantly makes loops show a smart progress meter - tqdm documentation

  • from .features import PEFeatureExtractor


Classes and functions

features_postproc_func(x) (function) - Features post-processing function.

  • x (arg) - Data point to which the post-processing function is applied

raw_feature_iterator(file_paths) (function) - Yield raw feature strings from the input file paths.

  • file_paths (arg) - List of files to read, one line at a time
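
A minimal sketch of such an iterator, assuming each raw features file holds one sample per line (the description above only says that files are read one line at a time). The temporary file names and contents in the usage part are hypothetical.

```python
import os
import tempfile


def raw_feature_iterator(file_paths):
    """Yield raw feature strings, one line at a time, from each file in order."""
    for path in file_paths:
        with open(path, "r") as fin:
            for line in fin:
                yield line


# Usage with two small temporary files (hypothetical names and contents).
tmp_dir = tempfile.mkdtemp()
paths = []
for name, rows in [("part_0.jsonl", ['{"a": 1}', '{"a": 2}']),
                   ("part_1.jsonl", ['{"a": 3}'])]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fout:
        fout.write("\n".join(rows) + "\n")
    paths.append(path)

# Lines keep their trailing newline, exactly as read from disk.
lines = list(raw_feature_iterator(paths))
```

Because the function is a generator, only one line is held in memory at a time, which matters when the raw features files are large.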

vectorize(irow, raw_features_string, X_path, y_path, S_path, extractor, nrows) (function) - Vectorize a single sample of raw features and write to a large numpy file.

  • irow (arg) - Raw feature index
  • raw_features_string (arg) - Raw feature string
  • X_path (arg) - Features vector destination filename
  • y_path (arg) - Labels vector destination filename
  • S_path (arg) - SHA hashes vector destination filename
  • extractor (arg) - PEFeatureExtractor instance
  • nrows (arg) - Total number of rows in raw features files
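
The sketch below illustrates one way a single sample could be written into large pre-allocated files, using np.memmap so that only row irow is touched. It rests on several assumptions not stated on this page: the raw feature string is JSON with "label" and "sha256" keys, the extractor exposes a dim attribute and a process_raw_features method, and the arrays use float32 and 64-byte string dtypes. ToyExtractor is a hypothetical stand-in for PEFeatureExtractor.

```python
import json
import os
import tempfile

import numpy as np


class ToyExtractor:
    """Hypothetical stand-in for PEFeatureExtractor (interface is assumed)."""
    dim = 3

    def process_raw_features(self, raw_obj):
        return np.asarray(raw_obj["values"], dtype=np.float32)


def vectorize(irow, raw_features_string, X_path, y_path, S_path, extractor, nrows):
    raw = json.loads(raw_features_string)
    # Open each pre-allocated file in read/write mode and fill only row irow.
    X = np.memmap(X_path, dtype=np.float32, mode="r+", shape=(nrows, extractor.dim))
    X[irow] = extractor.process_raw_features(raw)
    X.flush()
    y = np.memmap(y_path, dtype=np.float32, mode="r+", shape=(nrows,))
    y[irow] = raw["label"]  # assumed key
    y.flush()
    S = np.memmap(S_path, dtype="S64", mode="r+", shape=(nrows,))
    S[irow] = raw["sha256"].encode("ascii")  # assumed key
    S.flush()


# Usage: pre-allocate the three files, then write the sample for row 1.
tmp_dir = tempfile.mkdtemp()
X_path, y_path, S_path = (os.path.join(tmp_dir, n) for n in ("X.dat", "y.dat", "S.dat"))
nrows, extractor = 2, ToyExtractor()
for path, dtype, shape in [(X_path, np.float32, (nrows, extractor.dim)),
                           (y_path, np.float32, (nrows,)),
                           (S_path, "S64", (nrows,))]:
    np.memmap(path, dtype=dtype, mode="w+", shape=shape).flush()

sample = json.dumps({"values": [0.5, 1.0, 2.0], "label": 1, "sha256": "ab" * 32})
vectorize(1, sample, X_path, y_path, S_path, extractor, nrows)

X = np.memmap(X_path, dtype=np.float32, mode="r", shape=(nrows, extractor.dim))
y = np.memmap(y_path, dtype=np.float32, mode="r", shape=(nrows,))
S = np.memmap(S_path, dtype="S64", mode="r", shape=(nrows,))
```

Since every call writes to a distinct row, independent workers can fill the same files without coordinating, which is presumably why nrows and the destination paths are passed to each call.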

vectorize_unpack(args) (function) - Pass through function for unpacking vectorize arguments.

  • args (arg) - Vectorization arguments
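
Pool methods such as imap_unordered pass exactly one argument to the worker function, so a pass-through that unpacks a tuple is a common pattern. The sketch below shows that pattern with a hypothetical stand-in for vectorize. A thread pool is used only to keep the example self-contained; this page does not say which kind of pool the module uses.

```python
from multiprocessing.pool import ThreadPool


def vectorize(irow, raw_features_string, X_path, y_path, S_path, extractor, nrows):
    """Hypothetical stand-in that only reports which row it was asked to write."""
    return irow, len(raw_features_string)


def vectorize_unpack(args):
    """Unpack a single argument tuple into the positional arguments of vectorize."""
    return vectorize(*args)


# One tuple per sample; paths and extractor are placeholders here.
argument_tuples = [
    (0, "first", "X.dat", "y.dat", "S.dat", None, 2),
    (1, "second", "X.dat", "y.dat", "S.dat", None, 2),
]

with ThreadPool(2) as pool:
    # imap_unordered yields results as they complete, so sort for a stable order.
    results = sorted(pool.imap_unordered(vectorize_unpack, argument_tuples))
```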

vectorize_subset(X_path, y_path, S_path, raw_feature_paths, extractor, nrows) (function) - Vectorize a subset of data and write it to disk.

  • X_path (arg) - Features vector destination filename
  • y_path (arg) - Labels vector destination filename
  • S_path (arg) - SHA hashes vector destination filename
  • raw_feature_paths (arg) - List of files in which to look for raw features
  • extractor (arg) - PEFeatureExtractor instance
  • nrows (arg) - Total number of rows in raw features files
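
One way to organize this step, sketched under assumptions: each raw feature string is paired with its row index and the shared destination arguments, producing the tuples that vectorize_unpack expects. iter_vectorize_args is a hypothetical helper, not a function listed on this page, and the progress reporting suggested by the tqdm import is omitted.

```python
def iter_vectorize_args(raw_feature_strings, X_path, y_path, S_path, extractor, nrows):
    """Yield one argument tuple per sample; irow is the sample's position in the stream."""
    for irow, raw_features_string in enumerate(raw_feature_strings):
        yield irow, raw_features_string, X_path, y_path, S_path, extractor, nrows


# Hypothetical stand-in for the strings read from raw_feature_paths.
raw_feature_strings = ['{"label": 0}', '{"label": 1}', '{"label": 0}']
nrows = len(raw_feature_strings)

args = list(iter_vectorize_args(raw_feature_strings, "X.dat", "y.dat", "S.dat", None, nrows))
row_indices = [a[0] for a in args]
```

Deriving irow from the position in the stream keeps the row order of the output files aligned with the order of the raw feature lines, regardless of the order in which workers finish.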

create_vectorized_features(dataset_dest_dir, raw_features_paths, feature_version) (function) - Create feature vectors from raw features and write them to disk.

  • dataset_dest_dir (arg) - Directory in which to find the raw features and to write the dataset
  • raw_features_paths (arg) - List of all files containing raw features
  • feature_version (arg) - Ember features version (default: 2)

