Modifiers - Titousensei/sisyphus GitHub Wiki
Users can create custom functions (user-defined functions, or "UDFs") by extending the abstract Modifier base class.
Typically, a Modifier generates one output value from one or more input values. Because Java varargs must come last, the constructor of most modifiers takes the form (String out_value, String... in_values). In some cases (string split, for instance), the order is reversed for the same reason: in_value comes first and the out_values varargs come last.
When implementing a Modifier, you only need to implement two things:
- Constructor, to declare the schema: super(in_cols, out_cols);
- public void compute(String[] input, String[] result);
The length of String[] input will be exactly the same as the length of in_cols, and the length of String[] result will be exactly the same as the length of out_cols. Their values will be in the same order as their corresponding schema declaration.
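As a rough illustration of the two pieces above, here is a minimal sketch of a custom Modifier. The `Modifier` class in this snippet is only a stand-in written so the example compiles on its own; the real base class ships with Sisyphus and may differ in detail. `ColumnConcat`, its `separator` parameter, and the column names are hypothetical and not part of the library.

```java
// Small demo: one output column built from two input columns.
class ModifierSketch {
    public static void main(String[] args) {
        Modifier m = new ColumnConcat(" ", "full_name", "first_name", "last_name");
        String[] input = { "Ada", "Lovelace" };   // same order as in_values
        String[] result = new String[1];          // one slot per output column
        m.compute(input, result);
        System.out.println(result[0]);            // prints "Ada Lovelace"
    }
}

// Stand-in for the Sisyphus base class (assumption: the real one
// accepts the input and output schemas in this order, as in super(in_cols, out_cols)).
abstract class Modifier {
    protected final String[] inCols;
    protected final String[] outCols;

    protected Modifier(String[] in_cols, String[] out_cols) {
        this.inCols = in_cols;
        this.outCols = out_cols;
    }

    public abstract void compute(String[] input, String[] result);
}

// Hypothetical UDF: joins all input columns into a single output column.
class ColumnConcat extends Modifier {
    private final String separator;

    // Output column first, inputs last, because varargs must be the final parameter.
    ColumnConcat(String separator, String out_value, String... in_values) {
        super(in_values, new String[] { out_value });
        this.separator = separator;
    }

    @Override
    public void compute(String[] input, String[] result) {
        // input.length matches in_values.length; result.length is 1 here.
        result[0] = String.join(separator, input);
    }
}
```

The constructor only declares the schema; all per-row work happens in `compute`, which writes into the `result` array it is given rather than returning a value.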
Here are a few useful Modifiers already provided by Sisyphus:
- KeyMapIncrement: increment the value of a KeyMap entry by a number
- KeyMapGetter: read a value of a KeyMap into a column
- KeyMapSetter: sets a value of a KeyMap to an int (example: to reset the counter)
- KeyDeleter: delete an entry in a Key (can't be used if this key is an input)
- ColumnCopy, ColumnSwap: copy the value of one column into another, or swap the values of two columns
- ColumnSet: sets the value of a column
- ColumnsHashLong: compute a single hash value for one or more columns combined (useful with Keys)
- ColumnTrim: output column = input with leading and trailing spaces removed
- ColumnToLower: output column = input to lower case
- ColumnRegex: match a regex against one input column, and write the groups to multiple output columns
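To show what the "one input, several outputs" shape can look like, here is a small sketch in the spirit of ColumnRegex. It is not the library's implementation: `RegexGroups` is a hypothetical class, the `Modifier` base class is a stand-in so the snippet compiles on its own, and writing empty strings when the regex does not match is an assumption made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Small demo: split a date column into three output columns.
class RegexSketch {
    public static void main(String[] args) {
        Modifier m = new RegexGroups("(\\d{4})-(\\d{2})-(\\d{2})",
                                     "date", "year", "month", "day");
        String[] result = new String[3];
        m.compute(new String[] { "2024-03-15" }, result);
        System.out.println(String.join(" ", result));   // prints "2024 03 15"
    }
}

// Stand-in for the Sisyphus base class (assumption, see lead-in).
abstract class Modifier {
    protected Modifier(String[] in_cols, String[] out_cols) { }

    public abstract void compute(String[] input, String[] result);
}

// Hypothetical multi-output modifier: input column first, output columns last,
// because the varargs parameter must be the final one.
class RegexGroups extends Modifier {
    private final Pattern pattern;

    RegexGroups(String regex, String in_value, String... out_values) {
        super(new String[] { in_value }, out_values);
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public void compute(String[] input, String[] result) {
        Matcher m = pattern.matcher(input[0]);
        boolean matched = m.matches();
        for (int i = 0; i < result.length; i++) {
            // Group 0 is the whole match, so capture groups start at index 1.
            String group = (matched && i < m.groupCount()) ? m.group(i + 1) : null;
            result[i] = (group != null) ? group : "";
        }
    }
}
```

Compiling the pattern once in the constructor keeps `compute` cheap, which matters because it is called once per row.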