Inputs - Titousensei/sisyphus GitHub Wiki
Inputs are the data that the Pusher iterates over. The Pusher populates the current row with entries from one input at a time, until all the inputs are processed.
The most common Inputs are text files, gzipped or not, in TSV format:
- InputFile: a single file, with an option to skip a header.
- InputFileGroup: matches filenames against a wildcard and reads the matching files one after another.
- InputBinaryFile: a single file in binary format, with fixed-length records. Each column can be any number of bytes. Values are presented as decimal or hexadecimal.
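
To make the fixed-length record idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Sisyphus API: `FixedRecordSketch`, `decode`, and `readUnsigned` are hypothetical names, and the big-endian, unsigned interpretation of each column is an assumption made only for this illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sisyphus: decodes fixed-length binary records.
final class FixedRecordSketch {

    /** Reads `width` bytes (1 to 8) as an unsigned big-endian integer (assumed byte order). */
    static long readUnsigned(ByteBuffer buf, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (buf.get() & 0xFF);
        }
        return value;
    }

    /** Splits `data` into records and renders each column as decimal or hexadecimal text. */
    static List<String[]> decode(byte[] data, int[] widths, boolean hex) {
        int recordLength = 0;
        for (int w : widths) {
            recordLength += w;
        }
        List<String[]> rows = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);
        // Trailing bytes that do not fill a whole record are ignored.
        while (buf.remaining() >= recordLength) {
            String[] row = new String[widths.length];
            for (int c = 0; c < widths.length; c++) {
                long v = readUnsigned(buf, widths[c]);
                row[c] = hex ? Long.toHexString(v) : Long.toUnsignedString(v);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two records of 3 bytes each: a 1-byte column followed by a 2-byte column.
        byte[] data = {0x01, 0x00, 0x10, (byte) 0xFF, 0x01, 0x00};
        for (String[] row : decode(data, new int[] {1, 2}, false)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

With the sample bytes above, the decimal rendering yields the rows `1, 16` and `255, 256`; passing `true` for `hex` yields `1, 10` and `ff, 100`.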
Sisyphus can also iterate through the different types of hashtables: InputKey, InputKeyMap, InputKeyBinding, and InputKeyDouble.
There are also a few special Inputs used for sorting and joining: InputMergeSorted and InputJoinSorted.
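
This page does not describe how the merging input works internally. As a rough illustration of the general technique of merging already-sorted sources, the sketch below performs a k-way merge with a priority queue in plain Java. `MergeSortedSketch`, `Head`, and `merge` are hypothetical names, and nothing here is taken from the Sisyphus implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not the Sisyphus implementation.
final class MergeSortedSketch {

    /** Holds the current head row of one source plus the iterator over the rest of it. */
    private static final class Head {
        final String row;
        final Iterator<String> rest;

        Head(String row, Iterator<String> rest) {
            this.row = row;
            this.rest = rest;
        }
    }

    /** Merges sources that are each sorted ascending into one ascending list. */
    static List<String> merge(List<List<String>> sortedSources) {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.row.compareTo(b.row));
        for (List<String> source : sortedSources) {
            Iterator<String> it = source.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Emit the smallest head, then refill the heap from the same source.
            Head smallest = heap.poll();
            out.add(smallest.row);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = merge(Arrays.asList(
            Arrays.asList("apple", "melon"),
            Arrays.asList("banana", "cherry", "zucchini")));
        System.out.println(merged); // [apple, banana, cherry, melon, zucchini]
    }
}
```

Only one row per source is held in memory at a time, which is why this pattern requires every source to be sorted beforehand.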
Custom inputs can be implemented easily by extending a base class:
- InputYielder: the simplest way to implement an input, using a generator pattern. See examples.InputRange for a class that generates rows with a counter.
- InputCustom: to chain input pre-processing, similar to chaining Java Streams. See examples.InputSplitRows for a class that splits merged rows.
- Input: of course, you always have the option to extend the most generic base class.
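
The two patterns named above (a generator that produces rows from a counter, and a wrapper that pre-processes another input) can be sketched with plain `java.util.Iterator`. This is not the Sisyphus API: `CustomInputSketch`, `RangeRows`, `SplitRows`, and `drain` are hypothetical names, and reading "merged rows" as several values joined by a separator in the first column is an assumption made for the example. The real base classes are shown in examples.InputRange and examples.InputSplitRows.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

// Hypothetical illustration built on java.util.Iterator, not on Sisyphus classes.
final class CustomInputSketch {

    /** Generator-style source: one single-column row per counter value in [start, end). */
    static final class RangeRows implements Iterator<String[]> {
        private int next;
        private final int end;

        RangeRows(int start, int end) {
            this.next = start;
            this.end = end;
        }

        @Override public boolean hasNext() {
            return next < end;
        }

        @Override public String[] next() {
            if (!hasNext()) throw new NoSuchElementException();
            return new String[] {Integer.toString(next++)};
        }
    }

    /** Chained source: wraps an upstream source and emits one row per separated value. */
    static final class SplitRows implements Iterator<String[]> {
        private final Iterator<String[]> upstream;
        private final String separatorRegex;
        private final ArrayDeque<String[]> pending = new ArrayDeque<>();

        SplitRows(Iterator<String[]> upstream, String separator) {
            this.upstream = upstream;
            this.separatorRegex = Pattern.quote(separator); // treat the separator literally
        }

        @Override public boolean hasNext() {
            // Pull from upstream only when the buffered rows are exhausted.
            while (pending.isEmpty() && upstream.hasNext()) {
                for (String part : upstream.next()[0].split(separatorRegex)) {
                    pending.add(new String[] {part});
                }
            }
            return !pending.isEmpty();
        }

        @Override public String[] next() {
            if (!hasNext()) throw new NoSuchElementException();
            return pending.poll();
        }
    }

    /** Collects the first column of every row, for demonstration purposes. */
    static List<String> drain(Iterator<String[]> rows) {
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(rows.next()[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new RangeRows(0, 3))); // [0, 1, 2]
        Iterator<String[]> merged =
            Arrays.asList(new String[] {"a,b"}, new String[] {"c"}).iterator();
        System.out.println(drain(new SplitRows(merged, ","))); // [a, b, c]
    }
}
```

Because `SplitRows` accepts any `Iterator<String[]>`, wrappers of this kind can be stacked, which mirrors the chaining idea described for InputCustom.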