fresh_dataset.py - cmikke97/Automatic-Malware-Signature-Generation
- import json - json encoder and decoder - json documentation
- import os - provides a portable way of using operating system dependent functionality - os documentation
- import sys - system-specific parameters and functions - sys documentation

- import numpy as np - the fundamental package for scientific computing with Python - numpy documentation
- import torch - tensor library like NumPy, with strong GPU support - pytorch documentation
- from logzero import logger - robust and effective logging for Python - logzero documentation
- from torch.utils import data - used to import data.Dataset - torch.utils.data documentation
Dataset (class) - Fresh dataset class.
- __init__(self, S, X, y, sig_to_label_dict, return_shas) (member function) - Initialize fresh dataset given a set of already initialized tensors (memmaps).
  - S (arg) - Already initialized memmap containing the sha256 hashes of samples from the Fresh Dataset
  - X (arg) - Already initialized tensor (memmap) containing the features of samples from the Fresh Dataset
  - y (arg) - Already initialized tensor (memmap) containing the labels of samples from the Fresh Dataset
  - sig_to_label_dict (arg) - Signature-to-label dict
  - return_shas (arg) - Whether to return the sha256 of the data points or not (default: False)
- from_file(cls, ds_root, return_shas) (class method) - Open fresh dataset from file and initialize the corresponding Fresh Dataset instance.
  - ds_root (arg) - Fresh dataset root directory (where to find .dat files)
  - return_shas (arg) - Whether to return the sha256 of the data points or not (default: False)
- __len__(self) (member function) - Get Dataset total length.
- __getitem__(self, index) (member function) - Get item from dataset.
  - index (arg) - Index of the item to get
- sig_to_label(self, sig) (member function) - Convert family signature to numerical label.
  - sig (arg) - Family signature
- label_to_sig(self, label) (member function) - Convert numerical label to family signature.
  - label (arg) - Numerical label
- get_as_tensors(self) (member function) - Get dataset tensors (numpy memmap arrays).
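
The interface listed above can be illustrated with a minimal sketch. This is not the repository's implementation: the class name FreshDatasetSketch, the attribute names, the inverse dictionary and the order of the values returned by __getitem__ are all assumptions made for illustration. Plain in-memory numpy arrays stand in for the memmaps, and the class does not subclass torch.utils.data.Dataset so that the example runs with numpy alone. from_file is left out because the layout of the .dat files is not described on this page.

```python
import numpy as np


class FreshDatasetSketch:
    """Hypothetical stand-in mirroring the documented Dataset interface."""

    def __init__(self, S, X, y, sig_to_label_dict, return_shas=False):
        self.S = S  # sha256 hashes, one per sample
        self.X = X  # feature matrix, one row per sample
        self.y = y  # numerical family labels, one per sample
        self.sig_to_label_dict = sig_to_label_dict
        # Assumption: the inverse mapping is built once from the given dict.
        self.label_to_sig_dict = {v: k for k, v in sig_to_label_dict.items()}
        self.return_shas = return_shas

    def __len__(self):
        # Total number of samples in the dataset.
        return len(self.y)

    def __getitem__(self, index):
        features, label = self.X[index], self.y[index]
        if self.return_shas:
            # Assumed ordering: sha256 first, then features and label.
            return self.S[index], features, label
        return features, label

    def sig_to_label(self, sig):
        # Family signature -> numerical label.
        return self.sig_to_label_dict[sig]

    def label_to_sig(self, label):
        # Numerical label -> family signature.
        return self.label_to_sig_dict[label]

    def get_as_tensors(self):
        # Return the underlying arrays without copying them.
        return self.S, self.X, self.y


# Toy data: three samples, four features each, two made-up families.
S = np.array(["a" * 64, "b" * 64, "c" * 64])
X = np.arange(12, dtype=np.float32).reshape(3, 4)
y = np.array([0, 1, 0], dtype=np.int64)

ds = FreshDatasetSketch(S, X, y, {"familyA": 0, "familyB": 1}, return_shas=True)
sha, features, label = ds[1]
family = ds.label_to_sig(int(label))
```

With return_shas left at its default of False, indexing the sketch yields only the (features, label) pair, which is the shape most training loops expect; the sha256 is mainly useful when results need to be traced back to individual samples.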