features.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki

Imported Modules

  • import hashlib - common interface to many different secure hash and message digest algorithms - hashlib documentation
  • import re - provides regular expression matching operations - re documentation


Classes and functions

FeatureType (class) - Base class from which each feature type may inherit.

  • __repr__(self) (member function) - Get unambiguous object representation in string format.
  • raw_features(self, bytez, lief_binary) (member function) - Generate a JSON-able representation of the file (raw features).
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Generate a feature vector from the raw features.
    • raw_obj (arg) - Dictionary of raw features
  • feature_vector(self, bytez, lief_binary) (member function) - Calculate the feature vector directly from the sample. By default this chains raw_features and process_raw_features; override it only if combining the two steps gives a significant speedup.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
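
The relationship between the three member functions can be sketched as follows. This is a minimal illustration, not the code in features.py: the name and dim attributes are assumptions, and FileLength is a hypothetical toy feature invented for the example.

```python
class FeatureType:
    """Sketch of a base class; method names follow the listing above."""

    name = ""  # assumed attribute: short identifier of the feature group
    dim = 0    # assumed attribute: length of the processed feature vector

    def __repr__(self):
        return f"{self.name}({self.dim})"

    def raw_features(self, bytez, lief_binary):
        # Subclasses return a JSON-able representation of the file.
        raise NotImplementedError

    def process_raw_features(self, raw_obj):
        # Subclasses turn the raw representation into a feature vector.
        raise NotImplementedError

    def feature_vector(self, bytez, lief_binary):
        # Default path: extract the raw features, then vectorize them.
        return self.process_raw_features(self.raw_features(bytez, lief_binary))


class FileLength(FeatureType):
    """Hypothetical toy feature, not part of features.py."""

    name = "file_length"
    dim = 1

    def raw_features(self, bytez, lief_binary):
        return {"size": len(bytez)}

    def process_raw_features(self, raw_obj):
        return [float(raw_obj["size"])]
```

A subclass only has to supply the two steps; FileLength().feature_vector(data, None) then runs both in sequence.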

ByteHistogram (class) - Byte histogram over the entire binary file. The raw features are the non-normalized counts; normalization happens in process_raw_features.

  • __init__(self) (member function) - Initialize ByteHistogram class.
  • raw_features(self, bytez, lief_binary) (member function) - Generate raw byte histogram.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw byte histogram (normalizing).
    • raw_obj (arg) - Byte histogram raw features
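
The two steps can be illustrated with a pure-Python sketch. The function names are hypothetical and the real class may well use NumPy; only the count-then-normalize split is taken from the descriptions above.

```python
def byte_histogram_raw(bytez):
    """Count how often each of the 256 byte values occurs."""
    counts = [0] * 256
    for b in bytez:  # iterating over bytes yields integers 0..255
        counts[b] += 1
    return counts


def byte_histogram_processed(counts):
    """Normalize the raw counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # empty input: avoid division by zero
    return [c / total for c in counts]
```

Keeping the raw counts separate from the normalized vector means the raw form stays JSON-able and exact, while the processed form is independent of file size.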

ByteEntropyHistogram (class) - 2D byte/entropy histogram based loosely on (Saxe and Berlin, 2015). This roughly approximates the joint probability of byte value and local entropy. See Section 2.1.1 in https://arxiv.org/pdf/1508.03096.pdf for more info.

  • __init__(self) (member function) - Initialize ByteEntropyHistogram class.
  • _entropy_bin_counts(self, block) (member function) - Get bin frequencies (counts) and entropy bin index (Hbin).
    • block (arg) - Ndarray containing a piece (block) of the PE file binary data
  • raw_features(self, bytez, lief_binary) (member function) - Generate raw entropy byte histogram.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw byte entropy histogram (normalizing).
    • raw_obj (arg) - Byte entropy histogram raw features
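
One way to build such a histogram is sketched below in pure Python. The window size (2048), step (1024), the 16 x 16 bin layout and the entropy scaling are all assumptions made for this illustration; this page does not state the values used in features.py.

```python
import math


def entropy_bin_counts(block):
    """Return (entropy bin index, 16 coarse byte-value counts) for one block."""
    counts = [0] * 16
    for b in block:
        counts[b >> 4] += 1  # coarse byte value: the high nibble, 16 bins
    n = len(block)
    # Shannon entropy of the coarse histogram in bits (at most 4 for 16 bins).
    h = -sum((c / n) * math.log2(c / n) for c in counts if c)
    # Quantize the 0..4 bit range into 16 entropy bins, capping the top one.
    h_bin = min(int(h * 4), 15)
    return h_bin, counts


def byte_entropy_histogram(bytez, window=2048, step=1024):
    """Accumulate coarse byte counts into the row given by each block's entropy bin."""
    hist = [[0] * 16 for _ in range(16)]  # rows: entropy bin, columns: byte bin
    if not bytez:
        return hist
    if len(bytez) < window:
        blocks = [bytez]  # short file: treat it as a single block
    else:
        blocks = [bytez[i:i + window]
                  for i in range(0, len(bytez) - window + 1, step)]
    for block in blocks:
        h_bin, counts = entropy_bin_counts(block)
        for j, c in enumerate(counts):
            hist[h_bin][j] += c
    return hist
```

Flattening the 16 x 16 table and dividing by its sum would give the normalized vector described for process_raw_features.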

SectionInfo (class) - Information about section names, sizes and entropy. Uses hashing trick to summarize all this section info into a feature vector.

  • __init__(self) (member function) - Initialize SectionInfo class.
  • _properties(s) (static method) - Get section characteristics list.
    • s (arg) - Lief binary section
  • raw_features(self, bytez, lief_binary) (member function) - Generate raw section info.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw section info (hashing trick and stacking).
    • raw_obj (arg) - Section info raw features
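
The hashing trick mentioned here (and again for ImportsInfo, ExportsInfo and HeaderFileInfo) maps an open-ended set of tokens, such as section names, into a vector of fixed length. The sketch below is a stand-in built on hashlib.sha256; the hasher, the vector length and the token/weight pairing that features.py actually uses are not stated on this page, so hash_features and dim=50 are hypothetical.

```python
import hashlib


def hash_features(pairs, dim=50):
    """Fold (token, weight) pairs into a fixed-length vector."""
    vec = [0.0] * dim
    for token, weight in pairs:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "little")
        index = (value >> 1) % dim  # bucket chosen by the hash
        # Lowest bit picks a sign, so colliding tokens tend to cancel
        # rather than always add up.
        sign = 1.0 if value & 1 == 0 else -1.0
        vec[index] += sign * weight
    return vec
```

For example, hash_features([(".text", 4096.0), (".data", 512.0)]) always returns 50 numbers, no matter how many sections a file has or what they are called. Vectors from several such calls (sizes, entropies, characteristics) can then be concatenated, which is the "stacking" step.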

ImportsInfo (class) - Information about imported libraries and functions from the import address table. Note that the total number of imported functions is contained in GeneralFileInfo.

  • __init__(self) (member function) - Initialize ImportsInfo class.
  • raw_features(self, bytez, lief_binary) (member function) - Generate raw imports info.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw imports info (hashing trick and stacking).
    • raw_obj (arg) - Imports info raw features

ExportsInfo (class) - Information about exported functions. Note that the total number of exported functions is contained in GeneralFileInfo.

  • __init__(self) (member function) - Initialize ExportsInfo class.
  • raw_features(self, bytez, lief_binary) (member function) - Generate raw exports info.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw exports info (hashing trick and stacking).
    • raw_obj (arg) - Exports info raw features

GeneralFileInfo (class) - General information about the file.

  • __init__(self) (member function) - Initialize GeneralFileInfo class.
  • raw_features(self, bytez, lief_binary) (member function) - Generate raw general info.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw general info (stacking).
    • raw_obj (arg) - General file info raw features

HeaderFileInfo (class) - Machine, architecture, OS, linker and other information extracted from header.

  • __init__(self) (member function) - Initialize HeaderFileInfo class.
  • raw_features(self, bytez, lief_binary) (member function) - Generate raw header info.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw header info (hashing trick and stacking).
    • raw_obj (arg) - Header file info raw features

StringExtractor (class) - Extracts strings from raw byte stream.

  • __init__(self) (member function) - Initialize StringExtractor class.
  • raw_features(self, bytez, lief_binary) (member function) - Extract raw string info.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw string info (stacking).
    • raw_obj (arg) - String extractor raw features
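
String extraction from a raw byte stream is typically done with the re module listed under Imported Modules. The sketch below assumes a minimum run length of five printable ASCII characters and a small set of summary statistics; both the threshold and the dictionary keys are assumptions for illustration, not the fields produced by features.py.

```python
import re

# Runs of at least five printable ASCII bytes (space through tilde).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{5,}")
URL_PREFIX = re.compile(rb"https?://", re.IGNORECASE)


def string_stats(bytez):
    """Summarize the printable strings found in a byte stream."""
    strings = PRINTABLE_RUN.findall(bytez)
    lengths = [len(s) for s in strings]
    return {
        "numstrings": len(strings),
        "avlength": sum(lengths) / len(lengths) if lengths else 0.0,
        "urls": len(URL_PREFIX.findall(bytez)),
    }
```

Because the pattern is a bytes pattern, it can be applied to bytez directly without decoding the file.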

DataDirectories (class) - Extracts size and virtual address of the first 15 data directories.

  • __init__(self) (member function) - Initialize DataDirectories class.
  • raw_features(self, bytez, lief_binary) (member function) - Extract raw data directories info.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw data directories info.
    • raw_obj (arg) - Data directories raw features
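
A plausible processing step is to flatten the size and virtual address of each of the 15 directories into a vector of 30 values. The sketch assumes the raw features are a list of dictionaries with "size" and "virtual_address" keys; that layout and the function name are hypothetical.

```python
def process_data_directories(raw_obj, n_dirs=15):
    """Flatten (size, virtual address) of the first n_dirs entries."""
    vec = [0.0] * (2 * n_dirs)  # missing directories stay at zero
    for i, entry in enumerate(raw_obj[:n_dirs]):
        vec[2 * i] = float(entry["size"])
        vec[2 * i + 1] = float(entry["virtual_address"])
    return vec
```

Padding with zeros keeps the output length fixed even when a file declares fewer than 15 directories.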

PEFeatureExtractor (class) - Extract useful features from a PE file, and return as a vector of fixed size.

  • __init__(self, feature_version, print_feature_warning) (member function) - Initialize PEFeatureExtractor class.
    • feature_version (arg) - EMBER feature version
    • print_feature_warning (arg) - Whether to print warnings or not
  • raw_features(self, bytez, lief_binary) (member function) - Calculate sha256 hash and all raw features from the PE file.
    • bytez (arg) - PE file binary data
    • lief_binary (arg) - Lief parsing of PE file binaries
  • process_raw_features(self, raw_obj) (member function) - Process raw features and concatenate the results in a single, one dimensional, array.
    • raw_obj (arg) - Dictionary of raw features
  • feature_vector(self, bytez) (member function) - Extract raw features and then process them to get the final feature vector.
    • bytez (arg) - PE file binary data
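
The overall flow (hash the file, collect the raw features of every feature type, process each one, concatenate) can be sketched as below. TinyExtractor, Length and ByteSum are hypothetical stand-ins, and the sketch skips LIEF parsing entirely by passing None as lief_binary; it is not the PEFeatureExtractor implementation.

```python
import hashlib


class Length:
    """Toy feature: file size."""
    name, dim = "length", 1

    def raw_features(self, bytez, lief_binary):
        return len(bytez)

    def process_raw_features(self, raw_obj):
        return [float(raw_obj)]


class ByteSum:
    """Toy feature: sum of all byte values."""
    name, dim = "byte_sum", 1

    def raw_features(self, bytez, lief_binary):
        return sum(bytez)

    def process_raw_features(self, raw_obj):
        return [float(raw_obj)]


class TinyExtractor:
    def __init__(self, features):
        self.features = features
        self.dim = sum(f.dim for f in features)  # fixed output size

    def raw_features(self, bytez, lief_binary=None):
        # The SHA-256 digest identifies the sample alongside its features.
        raw = {"sha256": hashlib.sha256(bytez).hexdigest()}
        for f in self.features:
            raw[f.name] = f.raw_features(bytez, lief_binary)
        return raw

    def process_raw_features(self, raw_obj):
        # Concatenate per-feature vectors into one flat list.
        vec = []
        for f in self.features:
            vec.extend(f.process_raw_features(raw_obj[f.name]))
        return vec

    def feature_vector(self, bytez):
        return self.process_raw_features(self.raw_features(bytez))
```

Because every feature type reports a fixed dim, the concatenated vector has the same length for every input file.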
