features.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki
- import hashlib - common interface to many different secure hash and message digest algorithms - hashlib documentation
- import re - provides regular expression matching operations - re documentation
- import lief - cross-platform library able to parse, modify and abstract ELF, PE and MachO formats - lief documentation
- import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
- from logzero import logger - robust and effective logging for Python - logzero documentation
- from sklearn.feature_extraction import FeatureHasher - implements feature hashing, aka the hashing trick - sklearn.feature_extraction.FeatureHasher documentation
FeatureType (class) - Base class from which each feature type may inherit.
- __repr__(self) (member function) - Get unambiguous object representation in string format.
- raw_features(self, bytez, lief_binary) (member function) - Generate a JSON-able representation of the file (raw features).
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Generate a feature vector from the raw features.
  - raw_obj (arg) - Dictionary of raw features
- feature_vector(self, bytez, lief_binary) (member function) - Directly calculate the feature vector from the sample itself. This should only be implemented differently if there are significant speedups to be gained from combining the two functions.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
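A minimal sketch of how the three methods of such a base class could fit together. The `name` and `dim` attributes and the `FileSize` subclass are hypothetical and only there to exercise the interface; they are not taken from this page.

```python
class FeatureType:
    """Sketch of a base class that each feature type may inherit from."""

    name = ""  # assumed attribute: short identifier of the feature type
    dim = 0    # assumed attribute: length of the produced feature vector

    def __repr__(self):
        return f"{self.name}({self.dim})"

    def raw_features(self, bytez, lief_binary):
        """Return a JSON-able representation of the file."""
        raise NotImplementedError

    def process_raw_features(self, raw_obj):
        """Turn the raw features into a numeric feature vector."""
        raise NotImplementedError

    def feature_vector(self, bytez, lief_binary):
        """Default path: extract the raw features, then process them."""
        return self.process_raw_features(self.raw_features(bytez, lief_binary))


class FileSize(FeatureType):
    """Hypothetical subclass, used only for illustration."""

    name = "file_size"
    dim = 1

    def raw_features(self, bytez, lief_binary):
        return {"size": len(bytez)}

    def process_raw_features(self, raw_obj):
        return [float(raw_obj["size"])]
```

A subclass only overrides the two steps; `feature_vector` stays inherited unless a combined implementation is noticeably faster.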
ByteHistogram (class) - Byte histogram (raw, non-normalized counts) over the entire binary file.
- __init__(self) (member function) - Initialize ByteHistogram class.
- raw_features(self, bytez, lief_binary) (member function) - Generate raw byte histogram.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw byte histogram (normalizing).
  - raw_obj (arg) - Byte histogram raw features
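The split between raw counts and normalization can be sketched as follows. The function names are hypothetical, and the empty-file handling is an assumption rather than something this page specifies.

```python
import numpy as np


def raw_byte_histogram(bytez):
    """Count how often each of the 256 byte values occurs."""
    data = np.frombuffer(bytez, dtype=np.uint8)
    return np.bincount(data, minlength=256).tolist()


def normalized_byte_histogram(raw_counts):
    """Divide the counts by their total so the vector sums to 1."""
    counts = np.array(raw_counts, dtype=np.float32)
    total = counts.sum()
    # Assumption: an empty file yields an all-zero vector instead of NaNs.
    return counts / total if total > 0 else counts
```

Keeping the raw counts as a plain list makes them JSON-able, while the normalized vector no longer depends on file size.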
ByteEntropyHistogram (class) - 2D byte/entropy histogram based loosely on (Saxe and Berlin, 2015). This roughly approximates the joint probability of byte value and local entropy. See Section 2.1.1 in https://arxiv.org/pdf/1508.03096.pdf for more info.
- __init__(self) (member function) - Initialize ByteEntropyHistogram class.
- _entropy_bin_counts(self, block) (member function) - Get bin frequencies (counts) and entropy bin index (Hbin).
  - block (arg) - Ndarray containing a piece (block) of the PE file binary data
- raw_features(self, bytez, lief_binary) (member function) - Generate raw byte entropy histogram.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw byte entropy histogram (normalizing).
  - raw_obj (arg) - Byte entropy histogram raw features
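One way to build such a joint byte/entropy histogram is sketched below. The window size, the step, the 16 x 16 bin layout and the function names are assumptions chosen for illustration; this page does not state the values the class actually uses.

```python
import numpy as np

WINDOW = 2048  # assumed block size in bytes
STEP = 1024    # assumed stride between consecutive blocks


def entropy_bin_counts(block):
    """Return (entropy bin index, 16-bin coarse byte counts) for one block."""
    counts = np.bincount(block >> 4, minlength=16)  # 16 byte values per bin
    p = counts[counts > 0] / float(len(block))
    # The entropy of the 4-bit coarse values lies in [0, 4] bits;
    # multiplying by 4 spreads it over 16 bins, with 4.0 clamped to bin 15.
    entropy = float(-np.sum(p * np.log2(p)))
    h_bin = min(int(entropy * 4), 15)
    return h_bin, counts


def byte_entropy_histogram(bytez, window=WINDOW, step=STEP):
    """Accumulate the coarse byte counts of each block into its entropy row."""
    data = np.frombuffer(bytez, dtype=np.uint8)
    output = np.zeros((16, 16), dtype=np.int64)
    if len(data) == 0:
        return output
    if len(data) < window:
        starts = [0]  # short file: treat it as a single block
    else:
        starts = range(0, len(data) - window + 1, step)
    for start in starts:
        h_bin, counts = entropy_bin_counts(data[start:start + window])
        output[h_bin, :] += counts
    return output
```

Rows index local entropy and columns index coarse byte value, so dividing the matrix by its total gives a rough estimate of the joint distribution described above.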
SectionInfo (class) - Information about section names, sizes and entropy. Uses hashing trick to summarize all this section info into a feature vector.
- __init__(self) (member function) - Initialize SectionInfo class.
- _properties(s) (static method) - Get section characteristics list.
  - s (arg) - Lief binary section
- raw_features(self, bytez, lief_binary) (member function) - Generate raw section info.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw section info (hashing trick and stacking).
  - raw_obj (arg) - Section info raw features
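The hashing trick mentioned above maps a variable number of named values into a vector of fixed length. The sketch below is a simplified, standard-library stand-in for sklearn's FeatureHasher, not the class itself: it derives the bucket and the sign from an MD5 digest instead of MurmurHash3, and the function name, the default of 50 buckets and the section values are assumptions.

```python
import hashlib


def hash_features(pairs, n_features=50):
    """Project (name, value) pairs into a fixed-length vector."""
    vector = [0.0] * n_features
    for name, value in pairs:
        digest = hashlib.md5(name.encode("utf-8")).digest()
        # The first four digest bytes choose the bucket ...
        index = int.from_bytes(digest[:4], "little") % n_features
        # ... and one further bit chooses the sign, so that colliding
        # entries tend to cancel rather than always add up.
        sign = 1.0 if digest[4] & 1 else -1.0
        vector[index] += sign * value
    return vector


# Made-up section sizes, hashed by section name.
section_sizes = [(".text", 4096.0), (".data", 512.0), (".rsrc", 1024.0)]
size_vector = hash_features(section_sizes)
```

The output length is the same no matter how many sections a file has, which is what allows the per-section information to be stacked into a fixed-size feature vector.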
ImportsInfo (class) - Information about imported libraries and functions from the import address table. Note that the total number of imported functions is contained in GeneralFileInfo.
- __init__(self) (member function) - Initialize ImportsInfo class.
- raw_features(self, bytez, lief_binary) (member function) - Generate raw imports info.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw imports info (hashing trick and stacking).
  - raw_obj (arg) - Imports info raw features
ExportsInfo (class) - Information about exported functions. Note that the total number of exported functions is contained in GeneralFileInfo.
- __init__(self) (member function) - Initialize ExportsInfo class.
- raw_features(self, bytez, lief_binary) (member function) - Generate raw exports info.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw exports info (hashing trick and stacking).
  - raw_obj (arg) - Exports info raw features
GeneralFileInfo (class) - General information about the file.
- __init__(self) (member function) - Initialize GeneralFileInfo class.
- raw_features(self, bytez, lief_binary) (member function) - Generate raw general info.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw general info (stacking).
  - raw_obj (arg) - General file info raw features
HeaderFileInfo (class) - Machine, architecture, OS, linker and other information extracted from header.
- __init__(self) (member function) - Initialize HeaderFileInfo class.
- raw_features(self, bytez, lief_binary) (member function) - Generate raw header info.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw header info (hashing trick and stacking).
  - raw_obj (arg) - Header file info raw features
StringExtractor (class) - Extracts strings from raw byte stream.
- __init__(self) (member function) - Initialize StringExtractor class.
- raw_features(self, bytez, lief_binary) (member function) - Extract raw string info.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw string info (stacking).
  - raw_obj (arg) - String extractor raw features
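Extracting strings from a raw byte stream is typically done with the `re` module on bytes patterns. The sketch below is illustrative only: the minimum length of 5, the specific patterns and the dictionary keys are assumptions, not values documented on this page.

```python
import re

# Assumption: a "string" is a run of at least 5 printable ASCII bytes.
PRINTABLE = re.compile(rb"[\x20-\x7e]{5,}")
PATHS = re.compile(rb"c:\\", re.IGNORECASE)        # Windows drive paths
URLS = re.compile(rb"https?://", re.IGNORECASE)    # http and https URLs


def raw_string_features(bytez):
    """Summarize the printable strings found in the byte stream."""
    strings = PRINTABLE.findall(bytez)
    lengths = [len(s) for s in strings]
    return {
        "numstrings": len(strings),
        "avlength": sum(lengths) / len(lengths) if lengths else 0.0,
        "paths": len(PATHS.findall(bytez)),
        "urls": len(URLS.findall(bytez)),
    }
```

Only summary statistics are kept rather than the strings themselves, which keeps the raw features small and of predictable shape.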
DataDirectories (class) - Extracts size and virtual address of the first 15 data directories.
- __init__(self) (member function) - Initialize DataDirectories class.
- raw_features(self, bytez, lief_binary) (member function) - Extract raw data directories info.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw data directories info.
  - raw_obj (arg) - Data directories raw features
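Flattening the size and virtual address of 15 data directories gives a vector of 30 values. The sketch below assumes an interleaved layout (size, then virtual address, per directory) and assumes the raw features are a list of dictionaries with `size` and `virtual_address` keys; both the layout and the key names are hypothetical.

```python
NUM_DIRECTORIES = 15  # from the class description above


def process_data_directories(raw_obj):
    """Flatten up to 15 directory entries into a 30-element vector."""
    features = [0.0] * (2 * NUM_DIRECTORIES)
    for i, entry in enumerate(raw_obj[:NUM_DIRECTORIES]):
        features[2 * i] = float(entry["size"])
        features[2 * i + 1] = float(entry["virtual_address"])
    return features
```

Files with fewer entries leave the trailing slots at zero, so the vector length stays constant.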
PEFeatureExtractor (class) - Extract useful features from a PE file, and return as a vector of fixed size.
- __init__(self, feature_version, print_feature_warning) (member function) - Initialize PEFeatureExtractor class.
  - feature_version (arg) - EMBER feature version
  - print_feature_warning (arg) - Whether to print warnings or not
- raw_features(self, bytez, lief_binary) (member function) - Calculate sha256 hash and all raw features from the PE file.
  - bytez (arg) - PE file binary data
  - lief_binary (arg) - Lief parsing of PE file binaries
- process_raw_features(self, raw_obj) (member function) - Process raw features and concatenate the results in a single, one dimensional, array.
  - raw_obj (arg) - Dictionary of raw features
- feature_vector(self, bytez) (member function) - Extract raw features and then process them to get the final feature vector.
  - bytez (arg) - PE file binary data
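The way an extractor of this kind combines its feature types can be sketched as below. `ExtractorSketch` and the two stand-in feature classes are hypothetical, and the LIEF parsing step is left out entirely (`lief_binary` is passed as `None`), so this only illustrates the sha256 bookkeeping and the concatenation.

```python
import hashlib

import numpy as np


class ByteCount:
    """Hypothetical stand-in feature type producing one value."""
    name, dim = "byte_count", 1

    def raw_features(self, bytez, lief_binary):
        return len(bytez)

    def process_raw_features(self, raw_obj):
        return np.array([raw_obj], dtype=np.float32)


class FirstTwoBytes:
    """Hypothetical stand-in feature type producing two values."""
    name, dim = "first_two_bytes", 2

    def raw_features(self, bytez, lief_binary):
        return list(bytez[:2].ljust(2, b"\x00"))

    def process_raw_features(self, raw_obj):
        return np.array(raw_obj, dtype=np.float32)


class ExtractorSketch:
    def __init__(self, features):
        self.features = features
        self.dim = sum(f.dim for f in features)  # fixed output length

    def raw_features(self, bytez, lief_binary=None):
        # The sha256 identifies the sample; each feature type adds its own key.
        raw = {"sha256": hashlib.sha256(bytez).hexdigest()}
        for f in self.features:
            raw[f.name] = f.raw_features(bytez, lief_binary)
        return raw

    def process_raw_features(self, raw_obj):
        # Process every feature type and join the results into one 1-D array.
        vectors = [f.process_raw_features(raw_obj[f.name]) for f in self.features]
        return np.hstack(vectors).astype(np.float32)

    def feature_vector(self, bytez):
        return self.process_raw_features(self.raw_features(bytez))
```

Because every feature type reports a fixed `dim`, the concatenated vector has the same length for every file, which is the property the class description relies on.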