Filter Expressions Testing for Train Dataset or Eval Dataset - ShifuML/shifu GitHub Wiki
In Shifu, filter expressions can be used to filter the training dataset and the eval datasets. Filter expressions follow the Apache Commons JEXL syntax: http://commons.apache.org/proper/commons-jexl/reference/syntax.html. However, an expression cannot be verified until the user runs a step such as stats, norm, or eval. If the expression is malformed, or a variable in the expression does not exist, it may produce unexpected results. For example, the user may find logs like the ones below, where no records were written:
Output(s):
Successfully stored 0 records (2180 bytes) in: "hdfs://.../..."
Counters:
Total records written : 0
Total bytes written : 2180
...
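A quick way to spot this symptom in a saved job log is to search for the zero-record line. The following is a minimal sketch, assuming the log contains a line shaped like the sample above; `stored_zero_records` is a hypothetical helper name for illustration, not part of Shifu.

```shell
#!/bin/sh
# Sketch: report whether a saved job log contains a "stored 0 records" line.
# Assumes the log line format shown in the sample output above.
# stored_zero_records is a hypothetical helper, not a Shifu command.
stored_zero_records() {
    # $1: path to a saved job log file
    # Returns 0 (true) if an output reported zero stored records.
    grep -q 'Successfully stored 0 records' "$1"
}

# Example use (job.log is a placeholder path):
#   if stored_zero_records job.log; then
#       echo "An output is empty; check the filter expression." >&2
#   fi
```

A match does not prove the filter is wrong, since an empty input would produce the same line, but it is a cheap first check before re-reading the expression.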
Since shifu-0.12.x, a test command is available to test the filters for the training dataset and the eval datasets. The command is:
$ shifu test -filter [EvalSetNames] [-n numOfRecords]
- If no EvalSetNames is specified, it will test the filter for the training dataset.
- To test the filters for multiple eval sets, specify the eval set names with a comma as the delimiter: EvalTest1,EvalTest2,EvalTest3
- By default, the test command will test the filter expression against 100 records. To test on more records, use -n to change it.
- * can be used as EvalSetNames. In that case, Shifu will test all possible filters in ModelConfig.json.
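The options above can be combined in a small wrapper. This is only a sketch: it assumes the flag is spelled `-filter` and that arguments are passed in the order given in the synopsis. `run_filter_test` and the `SHIFU_BIN` variable are hypothetical names introduced here, not part of Shifu.

```shell
#!/bin/sh
# SHIFU_BIN is a hypothetical override for this sketch; set it to "echo"
# to print the arguments instead of running shifu.
SHIFU_BIN="${SHIFU_BIN:-shifu}"

# run_filter_test [EvalSetNames] [numOfRecords]
#   EvalSetNames : comma-separated eval set names, or '*' for all filters;
#                  empty means the training dataset filter.
#   numOfRecords : record count passed to -n; empty keeps the default.
run_filter_test() {
    evalsets="$1"
    records="$2"
    # Build the argument list step by step so empty options are left out.
    set -- test -filter
    if [ -n "$evalsets" ]; then set -- "$@" "$evalsets"; fi
    if [ -n "$records" ]; then set -- "$@" -n "$records"; fi
    "$SHIFU_BIN" "$@"
}
```

For example, `run_filter_test` checks the training dataset filter, `run_filter_test 'EvalTest1,EvalTest2,EvalTest3'` checks three eval sets, and `run_filter_test '*' 1000` checks every filter against 1000 records. When typing the command directly, quote the `*` so the shell does not expand it to file names in the current directory.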
With the shifu test command, filter expressions can be validated at a very early stage, before running stats, norm, or eval.