CSV Format Support in Shifu

When using Shifu, the header path and header delimiter are normally specified in ModelConfig.json. For CSV format data, where the header is the first line of the data file, Shifu also works well without a separate header file.

How to run Shifu using a CSV format file

   "dataSet" : {
     "source" : "HDFS",
     "dataPath" : "train.csv",
     "dataDelimiter" : "|",
     "headerPath" : "",
     "headerDelimiter" : "",
     ...

With 'headerPath' left empty as above, the header is parsed from the first line of 'dataPath'. During training and data processing, that first line of 'train.csv' is then skipped, so it is not treated as a data record.
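
The behavior described above can be illustrated with a minimal Python sketch. This is not Shifu's implementation (Shifu itself is not written in Python); the function name `read_with_inline_header` and the sample rows are hypothetical. It also assumes that the header line is split with the same delimiter as the data, which the text above does not state explicitly.

```python
import io


def read_with_inline_header(lines, delimiter="|"):
    """Treat the first line as the header and the remaining lines as records.

    Illustration only: assumes the header uses the same delimiter as the data.
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return [], []  # empty input: no header, no records

    # The first line supplies the column names ...
    header = [name.strip() for name in first.rstrip("\r\n").split(delimiter)]

    # ... and is not counted as a data record.
    records = []
    for line in it:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        records.append(dict(zip(header, line.split(delimiter))))
    return header, records


# Hypothetical content standing in for 'train.csv'
sample = io.StringIO("id|age|label\n1|34|0\n2|51|1\n")
header, records = read_with_inline_header(sample)
print(header)        # column names taken from the first line
print(len(records))  # the header line is not included in the records
```

The point of the sketch is only that a single file serves both purposes: its first line defines the columns, and every later line is data.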

CSV format data is supported in the 'eval' step in the same way.

  "evals" : [ {
      "name" : "Eval1",
      "dataSet" : {
         "source" : "HDFS",
         "dataPath" : "test.csv",
         "dataDelimiter" : "|",
         "headerPath" : "",
         "headerDelimiter" : "",
         ...
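
As a quick sanity check before running, one could list which data sets in a configuration rely on the first-line header. The following Python sketch is a hypothetical helper, not part of Shifu: the function names are invented for illustration, and the embedded JSON is a cut-down stand-in that contains only the keys shown on this page (a real ModelConfig.json has many more fields).

```python
import json


def uses_inline_header(data_set):
    """True when 'headerPath' is missing or empty, i.e. the header comes from the data file."""
    return not (data_set.get("headerPath") or "").strip()


def inline_header_paths(model_config):
    """Return (name, dataPath) pairs for data sets that rely on a first-line header."""
    paths = []
    train = model_config.get("dataSet", {})
    if uses_inline_header(train):
        paths.append(("train", train.get("dataPath")))
    for ev in model_config.get("evals", []):
        data_set = ev.get("dataSet", {})
        if uses_inline_header(data_set):
            paths.append((ev.get("name"), data_set.get("dataPath")))
    return paths


# Hypothetical, reduced configuration using only the keys shown on this page
config = json.loads("""
{
  "dataSet": {"source": "HDFS", "dataPath": "train.csv",
              "dataDelimiter": "|", "headerPath": "", "headerDelimiter": ""},
  "evals": [
    {"name": "Eval1",
     "dataSet": {"source": "HDFS", "dataPath": "test.csv",
                 "dataDelimiter": "|", "headerPath": "", "headerDelimiter": ""}}
  ]
}
""")

print(inline_header_paths(config))
# [('train', 'train.csv'), ('Eval1', 'test.csv')]
```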

Please note that CSV format data in this form is supported since Shifu 0.10.0.