Modifiers - Titousensei/sisyphus GitHub Wiki

Users can create custom functions (a.k.a. "UDF") by extending the Modifier (abstract) base class.

Typically, a Modifier generates one output value from one or more input values. Most modifier constructors take their parameters in the form (String out_value, String... in_values), because Java requires a varargs parameter to come last. In some cases (string split, for instance), in_value comes first and out_values comes last, for the same reason.

When implementing a Modifier, you only need to implement two things:

  • Constructor, to declare the schema: super(in_cols, out_cols);
  • public void compute(String[] input, String[] result);

The length of String[] input will be exactly the same as the length of in_cols, and the length of String[] result will be exactly the same as the length of out_cols. Their values will be in the same order as their corresponding schema declaration.
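The two pieces above can be sketched as follows. This is a minimal illustration, not code from Sisyphus: the `Modifier` class here is a hypothetical stand-in so that the example compiles on its own (the real base class may use different constructor types), and `ColumnJoin` and the column names are made up for the example.

```java
import java.util.Arrays;

// Hypothetical stand-in for the Sisyphus Modifier base class, so this sketch
// compiles on its own. The real class's constructor types may differ.
abstract class Modifier {
    protected final String[] inCols;
    protected final String[] outCols;

    protected Modifier(String[] inCols, String[] outCols) {
        this.inCols = inCols;
        this.outCols = outCols;
    }

    public abstract void compute(String[] input, String[] result);
}

// Hypothetical UDF: joins all input columns with a dash into one output column.
class ColumnJoin extends Modifier {
    // out_value comes first and in_values last, because varargs must be the
    // final parameter.
    ColumnJoin(String outValue, String... inValues) {
        // Declare the schema: input columns, then output columns.
        super(inValues, new String[] { outValue });
    }

    @Override
    public void compute(String[] input, String[] result) {
        // input has one entry per input column; result has one entry here.
        result[0] = String.join("-", input);
    }
}

class ColumnJoinDemo {
    public static void main(String[] args) {
        Modifier m = new ColumnJoin("full_name", "first", "last");
        String[] result = new String[1];
        m.compute(new String[] { "ada", "lovelace" }, result);
        System.out.println(Arrays.toString(result));
    }
}
```

Note that `compute` writes into the `result` array it is given rather than returning a value, which matches the `void` signature described above.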

Here are a few useful Modifiers already provided by Sisyphus:

  • KeyMapIncrement: increment the value of a KeyMap entry by a number
  • KeyMapGetter: read a value of a KeyMap into a column
  • KeyMapSetter: sets a value of a KeyMap to an int (example: to reset the counter)
  • KeyDeleter: delete an entry in a Key (can't be used if this key is an input)
  • ColumnCopy, ColumnSwap: copy or swap the value of one column into another
  • ColumnSet: sets the value of a column
  • ColumnsHashLong: compute a single hash value for one or more columns combined (useful with Keys)
  • ColumnTrim: output column = input with leading and trailing spaces removed
  • ColumnToLower: output column = input to lower case
  • ColumnRegex: match a regex against one input column, and write the captured groups to multiple output columns
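
To make the "one input, several outputs" shape of a modifier such as ColumnRegex more concrete, here is a rough sketch of the underlying idea using only `java.util.regex`. It is not the Sisyphus implementation: the class `RegexGroupsSketch`, its method signature, and the choice to write `null` when the input does not match are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the Sisyphus ColumnRegex source.
class RegexGroupsSketch {
    // Writes capture group i+1 into result[i]. Entries are set to null when
    // the input does not match or the pattern has fewer groups (an assumption).
    static void compute(Pattern pattern, String[] input, String[] result) {
        Matcher m = pattern.matcher(input[0]);
        boolean matched = m.matches();
        for (int i = 0; i < result.length; i++) {
            result[i] = (matched && i < m.groupCount()) ? m.group(i + 1) : null;
        }
    }

    public static void main(String[] args) {
        Pattern datePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        String[] result = new String[3]; // one slot per output column
        compute(datePattern, new String[] { "2014-03-09" }, result);
        System.out.println(String.join("/", result));
    }
}
```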
