generators.py - cmikke97/Automatic-Malware-Signature-Generation GitHub Wiki
`from multiprocessing import cpu_count` - used to get the number of CPUs in the system - multiprocessing documentation
`from torch.utils import data` - needed for the DataLoader, which is at the heart of the PyTorch data loading utility - torch.utils.data documentation
`from .dataset import Dataset`
GeneratorFactory (class) - Generator factory class.
`__init__(self, ds_root, batch_size, mode, num_workers, n_samples, use_malicious_labels, use_count_labels, use_tag_labels, return_shas, shuffle)` (member function) - Initialize the generator factory class.
- `ds_root` (arg) - Path of the directory containing the pre-processed dataset (.dat files)
- `batch_size` (arg) - How many samples per batch to load (default: None -> 1024)
- `mode` (arg) - Mode of use of the dataset object (may be 'train', 'validation' or 'test') (default: 'train')
- `num_workers` (arg) - How many subprocesses to use for data loading by the DataLoader (default: max_workers)
- `n_samples` (arg) - Number of samples to consider (used just to access the right pre-processed files) (default: None -> all)
- `use_malicious_labels` (arg) - Whether to return the malicious label for the data points or not (default: False)
- `use_count_labels` (arg) - Whether to return the counts for the data points or not (default: False)
- `use_tag_labels` (arg) - Whether to return the tags for the data points or not (default: False)
- `return_shas` (arg) - Whether to return the sha256 of the data points or not (default: False)
- `shuffle` (arg) - Set to True to have the data reshuffled at every epoch (default: None -> if mode is 'train' then shuffle is set to True, otherwise it is set to False)
`__call__(self)` (member function) - Generator factory call method.
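As an illustration of how the documented defaults interact, the following is a minimal sketch of a generator factory class (not the repository's actual implementation): it resolves `batch_size` to 1024 when unset, derives `shuffle` from `mode` as described above, and falls back to the system CPU count for `num_workers`. The dict returned by `__call__` stands in for the real `torch.utils.data.DataLoader`.

```python
from multiprocessing import cpu_count


class GeneratorFactory:
    """Illustrative sketch only; the real class wraps a torch DataLoader."""

    def __init__(self, ds_root, batch_size=None, mode='train',
                 num_workers=None, shuffle=None):
        if mode not in ('train', 'validation', 'test'):
            raise ValueError("invalid mode: {!r}".format(mode))
        self.ds_root = ds_root
        # documented default: None -> 1024
        self.batch_size = 1024 if batch_size is None else batch_size
        self.mode = mode
        # documented default: fall back to the system CPU count
        self.num_workers = cpu_count() if num_workers is None else num_workers
        # documented default: shuffle only when training
        self.shuffle = (mode == 'train') if shuffle is None else shuffle

    def __call__(self):
        # the real implementation would build and return a DataLoader here;
        # this sketch just returns the resolved settings
        return {'batch_size': self.batch_size,
                'shuffle': self.shuffle,
                'num_workers': self.num_workers}
```

Usage: `GeneratorFactory('/path/to/ds')()` yields training-mode settings with `shuffle=True`, while `mode='test'` flips `shuffle` to `False` unless explicitly overridden.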
`get_generator(ds_root, batch_size, mode, num_workers, n_samples, use_malicious_labels, use_count_labels, use_tag_labels, return_shas, shuffle)` (function) - Get generator based on the provided arguments.
- `ds_root` (arg) - Path of the directory containing the pre-processed dataset (.dat files)
- `batch_size` (arg) - How many samples per batch to load (default: 8192)
- `mode` (arg) - Mode of use of the dataset object (may be 'train', 'validation' or 'test') (default: 'train')
- `num_workers` (arg) - How many subprocesses to use for data loading by the DataLoader (if None -> set to current system cpu count) (default: None)
- `n_samples` (arg) - Number of samples to consider (used just to access the right pre-processed files) (default: None -> all)
- `use_malicious_labels` (arg) - Whether to return the malicious label for the data points or not (default: False)
- `use_count_labels` (arg) - Whether to return the counts for the data points or not (default: False)
- `use_tag_labels` (arg) - Whether to return the tags for the data points or not (default: False)
- `return_shas` (arg) - Whether to return the sha256 of the data points or not (default: False)
- `shuffle` (arg) - Set to True to have the data reshuffled at every epoch (default: None -> if mode is 'train' then shuffle is set to True, otherwise it is set to False)
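The default-resolution described above (a larger 8192 batch size, and `num_workers` derived from `cpu_count()` when left as None) can be sketched as a standalone helper. This is a hypothetical simplification, not the repository's code: it only resolves the documented defaults and returns them, where the real `get_generator` would construct and return the data generator itself.

```python
from multiprocessing import cpu_count


def get_generator(ds_root, batch_size=8192, mode='train',
                  num_workers=None, shuffle=None):
    """Sketch of the documented default handling (illustrative only)."""
    # documented default: None -> current system CPU count
    if num_workers is None:
        num_workers = cpu_count()
    # documented default: shuffle follows the mode when unset
    if shuffle is None:
        shuffle = (mode == 'train')
    # the real function would build the generator here; return settings instead
    return {'ds_root': ds_root, 'batch_size': batch_size, 'mode': mode,
            'num_workers': num_workers, 'shuffle': shuffle}
```

Note the differing batch-size defaults: the factory's `__init__` falls back to 1024, while this module-level entry point documents 8192.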