PipeOp Specifications - mlr-org/mlr3pipelines GitHub Wiki
General rules:
- Inherit from `PipeOp` for general PipeOps, from `PipeOpTaskPreproc` for preprocessing PipeOps that have one task input and one task output, and from `PipeOpTaskPreprocSimple` for the subset of these that perform exactly the same operation during training and prediction.
- When inheriting from `PipeOp`, overwrite the `train_internal()` and `predict_internal()` functions. When inheriting from `PipeOpTaskPreproc`, overwrite `train_task()`/`train_dt()` and `predict_task()`/`predict_dt()`, and possibly `select_cols()` (for the `..._dt()` variants). When inheriting from `PipeOpTaskPreprocSimple`, overwrite `get_state()`/`get_state_dt()` and `transform()`/`transform_dt()`, and possibly `select_cols()` (for the `..._dt()` variants).
- Set the `train` and `predict` columns of the `$input` and `$output` tables to the types these operations accept. Do not check input values for types that are already specified in the `$input` and `$output` tables. Ok:
  ```r
  train_internal = function(inputs) {
    if (inputs[[1]]$nrow < 1) stop("Input too small")
    # ... (rest of training)
  }
  ```
  Bad (because the input type `"Task"` is already checked by the `train()` function):
  ```r
  train_internal = function(inputs) {
    assert_task(inputs[[1]])
    # ... (rest of training)
  }
  ```
- Inputs to `train_internal()`/`predict_internal()` are always passed by reference, so any R6 object that is modified must be cloned first. This does not apply to `train_task()`, `train_dt()`, etc. in `PipeOpTaskPreproc[Simple]`: these classes take care of cloning, so Tasks and data.tables can be modified in place.
- In `PipeOpTaskPreproc[Simple]`, `$state` must always be a named list. The `PipeOpTaskPreproc[Simple]` machinery adds a few slots: `$affected_cols`, `$intasklayout`, `$outtasklayout`, and `$dt_columns` (only if `train_task`/`predict_task`/`get_state`/`transform` are not overwritten). These names are therefore reserved and should not be set by classes inheriting from `PipeOpTaskPreproc[Simple]`. Even though `PipeOp$state` can be anything, it is recommended to keep it a named list as well.
- Every change made by the `$train()` method must be reflected in the `$state` variable, i.e.
  ```r
  po2 = po1$clone(deep = TRUE)
  po1$train(input)
  po2$state = po1$state
  po1 = po1$clone(deep = TRUE)
  ```
  must leave `po1` and `po2` identical. (The last `clone()` call is necessary to mirror the effects of `po2 = po1$clone(deep = TRUE)`.)
- `$predict()` must be idempotent, i.e. it must not modify the PipeOp itself:
  ```r
  po2 = po1$clone(deep = TRUE)
  po1$predict(input1)
  po1$predict(input2)
  po2$predict(input3)
  po1 = po1$clone(deep = TRUE)
  ```
  must leave `po1` and `po2` identical. (The last `clone()` call is needed for the same reason as above.)
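The rules for a general `PipeOp` (declared `$input`/`$output` types, no redundant type checks, cloning inputs received by reference, named-list `$state`) can be combined in one small class. The following is a minimal sketch, not an mlr3pipelines built-in: `PipeOpDropConstant` and its `"dropconstant"` id are hypothetical, and the sketch assumes a recent mlr3pipelines release in which the hooks this page calls `train_internal()`/`predict_internal()` are the private methods `.train()`/`.predict()`. If your installed version uses the names on this page, rename the methods accordingly.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical PipeOp (not part of mlr3pipelines): drops features that are
# constant in the training data. Assumes the private .train()/.predict()
# hook names of recent releases.
PipeOpDropConstant = R6::R6Class("PipeOpDropConstant",
  inherit = mlr3pipelines::PipeOp,
  public = list(
    initialize = function(id = "dropconstant") {
      super$initialize(id = id,
        # Declared types are checked by $train()/$predict(), so the
        # private methods below do not call assert_task() themselves.
        input = data.table::data.table(name = "input", train = "Task", predict = "Task"),
        output = data.table::data.table(name = "output", train = "Task", predict = "Task")
      )
    }
  ),
  private = list(
    .train = function(inputs) {
      # Inputs arrive by reference: clone before modifying the Task.
      task = inputs[[1]]$clone(deep = TRUE)
      dt = task$data(cols = task$feature_names)
      varies = vapply(dt, function(col) length(unique(col)) > 1, logical(1))
      # Everything needed at predict time is stored in a named-list state.
      self$state = list(keep = names(dt)[varies])
      list(task$select(self$state$keep))
    },
    .predict = function(inputs) {
      # Clone again: the caller's Task must stay untouched.
      task = inputs[[1]]$clone(deep = TRUE)
      list(task$select(self$state$keep))
    }
  )
)

toy = data.frame(x = 1:6, const = 1, y = factor(rep(c("a", "b"), 3)))
task = TaskClassif$new("toy", backend = toy, target = "y")
op = PipeOpDropConstant$new()
out = op$train(list(task))[[1]]
out$feature_names   # only "x" remains
task$feature_names  # the input Task still has both features
```

Passing something that is not a `Task` (e.g. `op$predict(list(42))`) should fail in the base class's type check before `.predict()` is ever called, which is why the sketch contains no `assert_task()`.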
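With `PipeOpTaskPreprocSimple`, only the state computation and the transformation have to be written; cloning and column bookkeeping are handled by the base class. A minimal sketch, again assuming the private hook names of recent releases (`.select_cols()`, `.get_state_dt()`, `.transform_dt()`, corresponding to `select_cols()`, `get_state_dt()` and `transform_dt()` on this page); `PipeOpCenterSketch` is a hypothetical class that centres numeric features on their training means.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical PipeOp (not part of mlr3pipelines). Assumes the private
# .select_cols()/.get_state_dt()/.transform_dt() hook names of recent releases.
PipeOpCenterSketch = R6::R6Class("PipeOpCenterSketch",
  inherit = mlr3pipelines::PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "center.sketch") {
      super$initialize(id = id)
    }
  ),
  private = list(
    # Only numeric and integer features are passed to the ..._dt() hooks.
    .select_cols = function(task) {
      ft = task$feature_types
      ft$id[ft$type %in% c("numeric", "integer")]
    },
    # Named-list state; "center" avoids the reserved slot names.
    .get_state_dt = function(dt, levels, target) {
      list(center = vapply(dt, mean, numeric(1)))
    },
    # Same operation at train and predict time, driven only by $state.
    .transform_dt = function(dt, levels) {
      sweep(as.matrix(dt), 2, self$state$center)
    }
  )
)

task = tsk("iris")
op = PipeOpCenterSketch$new()
out = op$train(list(task))[[1]]
colMeans(out$data(cols = out$feature_names))  # all (numerically) zero
```

Because `.transform_dt()` reads nothing but `self$state`, the same code serves training and prediction, which is the case `PipeOpTaskPreprocSimple` is meant for.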
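The last two rules can also be checked in practice. The sketch below uses the built-in `po("scale")` and `tsk("iris")`; instead of testing `po1` and `po2` for identity as described above, it compares what they predict and whether `$state` changes during prediction, which is a weaker, behavioural approximation of the requirement.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Rule: everything $train() changes must live in $state.
po1 = po("scale")
po2 = po1$clone(deep = TRUE)
po1$train(list(task))
po2$state = po1$state  # po2 was never trained, it only receives the state
pred1 = po1$predict(list(task))[[1]]$data()
pred2 = po2$predict(list(task))[[1]]$data()
stopifnot(isTRUE(all.equal(pred1, pred2)))

# Rule: $predict() must not modify the PipeOp.
state_before = po1$state
po1$predict(list(task))
po1$predict(list(task))
stopifnot(isTRUE(all.equal(state_before, po1$state)))
```

If a PipeOp kept training information outside `$state` (for example in a private field), `po2` would predict differently from `po1` and the first check would fail.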