vectorize_features.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki
- import json - JSON encoder and decoder - json documentation
- import os - provides a portable way of using operating system dependent functionality - os documentation
- from multiprocessing.pool import ThreadPool - pool of worker threads that jobs can be submitted to - multiprocessing documentation
- import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
- from logzero import logger - robust and effective logging for Python - logzero documentation
- from tqdm import tqdm - instantly makes loops show a smart progress meter - tqdm documentation
- from .features import PEFeatureExtractor
features_postproc_func(x) (function) - Features post-processing function.
- x (arg) - Data point to apply the post-processing function to
raw_feature_iterator(file_paths) (function) - Yield raw feature strings from the input file paths.
- file_paths (arg) - List of files to read, one line at a time
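The description above suggests a plain generator over lines. The following is a minimal sketch under that assumption, not necessarily the project's exact implementation; the temporary files and their contents are hypothetical stand-ins for raw feature files.

```python
import os
import tempfile


def raw_feature_iterator(file_paths):
    """Yield raw feature strings from the given files, one line at a time."""
    for path in file_paths:
        with open(path, "r") as fin:
            for line in fin:
                yield line


# Usage: two small temporary files stand in for raw feature files (hypothetical data)
tmp_dir = tempfile.mkdtemp()
paths = []
for name, rows in (("a.jsonl", ['{"label": 0}', '{"label": 1}']), ("b.jsonl", ['{"label": 1}'])):
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fout:
        fout.write("\n".join(rows) + "\n")
    paths.append(path)

lines = list(raw_feature_iterator(paths))
```

Because the function is a generator, the files are never loaded into memory as a whole; each line is handed to the consumer as soon as it is read.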
vectorize(irow, raw_features_string, X_path, y_path, S_path, extractor, nrows) (function) - Vectorize a single sample of raw features and write it to a large numpy file.
- irow (arg) - Raw feature index
- raw_features_string (arg) - Raw feature string
- X_path (arg) - Features vector destination filename
- y_path (arg) - Labels vector destination filename
- S_path (arg) - SHA hashes vector destination filename
- extractor (arg) - PEFeatureExtractor instance
- nrows (arg) - Total number of rows in raw features files
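One way to write a single row into a large on-disk array is `np.memmap`. The sketch below assumes that approach and is not taken from the project source: `StubExtractor`, its `dim` attribute and `process_raw_features` method, the JSON keys `values`, `label` and `sha256`, and the `S64` dtype for hashes are all illustrative assumptions.

```python
import json
import os
import tempfile

import numpy as np


class StubExtractor:
    """Hypothetical stand-in for PEFeatureExtractor (not the real class)."""
    dim = 4

    def process_raw_features(self, raw_features):
        # A real extractor would turn the raw feature dict into a fixed-size vector
        return np.asarray(raw_features["values"], dtype=np.float32)


def vectorize(irow, raw_features_string, X_path, y_path, S_path, extractor, nrows):
    raw_features = json.loads(raw_features_string)
    feature_vector = extractor.process_raw_features(raw_features)

    # Open each preallocated file in read/write mode and fill only row `irow`
    X = np.memmap(X_path, dtype=np.float32, mode="r+", shape=(nrows, extractor.dim))
    X[irow] = feature_vector
    y = np.memmap(y_path, dtype=np.float32, mode="r+", shape=(nrows,))
    y[irow] = raw_features["label"]
    S = np.memmap(S_path, dtype="S64", mode="r+", shape=(nrows,))
    S[irow] = raw_features["sha256"].encode("ascii")
    for arr in (X, y, S):
        arr.flush()


# Usage: preallocate the three files at their final size, then write row 1
tmp_dir = tempfile.mkdtemp()
X_path, y_path, S_path = (os.path.join(tmp_dir, n) for n in ("X.dat", "y.dat", "S.dat"))
nrows, extractor = 2, StubExtractor()
np.memmap(X_path, dtype=np.float32, mode="w+", shape=(nrows, extractor.dim)).flush()
np.memmap(y_path, dtype=np.float32, mode="w+", shape=(nrows,)).flush()
np.memmap(S_path, dtype="S64", mode="w+", shape=(nrows,)).flush()

sample = json.dumps({"sha256": "ab" * 32, "label": 1, "values": [1, 2, 3, 4]})
vectorize(1, sample, X_path, y_path, S_path, extractor, nrows)

X_out = np.memmap(X_path, dtype=np.float32, mode="r", shape=(nrows, extractor.dim))
y_out = np.memmap(y_path, dtype=np.float32, mode="r", shape=(nrows,))
S_out = np.memmap(S_path, dtype="S64", mode="r", shape=(nrows,))
```

Passing `nrows` to every call is what lets each invocation reopen the files with the correct shape, since a memory-mapped file carries no shape information of its own.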
vectorize_unpack(args) (function) - Pass-through function for unpacking vectorize arguments.
- args (arg) - Vectorization arguments
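A pass-through wrapper of this kind is typically needed because pool methods such as `imap_unordered` hand each work item to the worker as a single argument. The sketch below illustrates the pattern with a hypothetical three-argument worker, `describe`, standing in for `vectorize`.

```python
from multiprocessing.pool import ThreadPool


def describe(index, name, scale):
    """Hypothetical three-argument worker, standing in for vectorize."""
    return f"{index}:{name}:{scale}"


def describe_unpack(args):
    """Pass-through: expand one tuple into positional arguments."""
    return describe(*args)


argument_tuples = [(0, "a", 1.0), (1, "b", 2.0), (2, "c", 3.0)]
with ThreadPool(2) as pool:
    # imap_unordered passes each tuple as a single argument, hence the wrapper;
    # results arrive in completion order, so they are sorted here for display
    results = sorted(pool.imap_unordered(describe_unpack, argument_tuples))
```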
vectorize_subset(X_path, y_path, S_path, raw_feature_paths, extractor, nrows) (function) - Vectorize a subset of data and write it to disk.
- X_path (arg) - Features vector destination filename
- y_path (arg) - Labels vector destination filename
- S_path (arg) - SHA hashes vector destination filename
- raw_feature_paths (arg) - List of files in which to look for raw features
- extractor (arg) - PEFeatureExtractor instance
- nrows (arg) - Total number of rows in raw features files
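Given the imports listed at the top of this page, a plausible flow is: allocate the destination file once at its final size, then let a thread pool fill it row by row. The sketch below assumes that flow and simplifies it to a single labels file; `write_row`, `fill_subset` and `n_threads` are hypothetical names, not part of the module.

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool

import numpy as np


def write_row(irow, line, y_path, nrows):
    """Hypothetical simplified worker: store one parsed value in row `irow`."""
    y = np.memmap(y_path, dtype=np.float32, mode="r+", shape=(nrows,))
    y[irow] = float(line)
    y.flush()


def write_row_unpack(args):
    return write_row(*args)


def fill_subset(y_path, lines, nrows, n_threads=2):
    # Allocate the destination file once, at its final size, then release the handle
    y = np.memmap(y_path, dtype=np.float32, mode="w+", shape=(nrows,))
    del y

    # One argument tuple per row; the row index tells each worker where to write
    argument_iterator = ((irow, line, y_path, nrows) for irow, line in enumerate(lines))
    with ThreadPool(n_threads) as pool:
        for _ in pool.imap_unordered(write_row_unpack, argument_iterator):
            pass  # a progress bar such as tqdm could wrap this loop


y_path = os.path.join(tempfile.mkdtemp(), "y.dat")
fill_subset(y_path, ["0", "1", "1", "0"], nrows=4)
y_out = np.memmap(y_path, dtype=np.float32, mode="r", shape=(4,))
```

Since every worker writes to a distinct row, the order in which rows complete does not matter, which is why an unordered pool method is sufficient here.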
create_vectorized_features(dataset_dest_dir, raw_features_paths, feature_version) (function) - Create feature vectors from raw features and write them to disk.
- dataset_dest_dir (arg) - Directory containing the raw features, where the dataset is also written
- raw_features_paths (arg) - List of all files containing raw features
- feature_version (arg) - Ember features version (default: 2)
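Before any vectorization can start, the total row count and the destination file paths have to be known. The sketch below shows one way to derive both, assuming one sample per line in each raw features file; `count_rows`, `destination_paths` and the file names `X.dat`, `y.dat` and `S.dat` are hypothetical and may differ from what the project actually uses.

```python
import os
import tempfile


def count_rows(raw_features_paths):
    """Count lines across all raw feature files (assumes one sample per line)."""
    total = 0
    for path in raw_features_paths:
        with open(path, "r") as fin:
            total += sum(1 for _ in fin)
    return total


def destination_paths(dataset_dest_dir):
    # File names are hypothetical placeholders, not the project's actual names
    return tuple(os.path.join(dataset_dest_dir, name) for name in ("X.dat", "y.dat", "S.dat"))


# Usage with a temporary directory and a small made-up raw features file
dataset_dest_dir = tempfile.mkdtemp()
raw_path = os.path.join(dataset_dest_dir, "raw_features.jsonl")
with open(raw_path, "w") as fout:
    fout.write('{"label": 0}\n{"label": 1}\n{"label": 1}\n')

nrows = count_rows([raw_path])
X_path, y_path, S_path = destination_paths(dataset_dest_dir)
```

The resulting `nrows` and paths correspond to the arguments that `vectorize_subset` is documented to take, alongside the raw feature paths and a `PEFeatureExtractor` instance.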