Loading data: ratings, ranks and partial orders - TAPeri/pl-toolbox GitHub Wiki

#Datasets: basic concepts

A dataset used to build a preference model needs to contain two elements: a set of objects and the relation or order among them.

##Objects

  • In PLT, all objects must be described by the same list of features (attributes) and be placed in a single object file.

  • Each line of the file contains the features of one object separated by a single character (comma by default).

  • (Optional: the first line of the file can contain the names of the features.)

  • (Optional: the first feature of each line can be used as the object ID. ID values need to be unique integers.)

  • Features that contain only numbers in integer (e.g. 1), floating point (e.g. 0.01) or scientific (e.g. 1e-10) format are interpreted as numeric.

  • Features with at least one value that is not numeric are treated as nominal.

  • The preprocessing tab can later be used to transform numeric features into nominal ones if needed.
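
The numeric/nominal rule above can be illustrated with a short Python sketch. This is not PLT's own loader: the function name `infer_feature_types`, the `has_header` flag, the fallback names `feature_0`, `feature_1`, ... and the regular expression are all assumptions made for illustration, and PLT may treat edge cases (signs, surrounding whitespace) differently.

```python
import csv
import io
import re

# Hypothetical pattern for the three formats named above:
# integer (1), floating point (0.01) and scientific (1e-10).
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def infer_feature_types(text, separator=",", has_header=False):
    """Label each column of an object file as 'numeric' or 'nominal'."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=separator) if row]
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        # No names line: invent placeholder names (an assumption of this sketch).
        names = [f"feature_{i}" for i in range(len(rows[0]))]
    types = {}
    for index, name in enumerate(names):
        column = [row[index].strip() for row in rows]
        # A single non-numeric value makes the whole feature nominal.
        all_numeric = all(NUMERIC.fullmatch(value) for value in column)
        types[name] = "numeric" if all_numeric else "nominal"
    return types


sample = "size,weight,colour\n1,0.01,red\n2,1e-10,blue\n"
print(infer_feature_types(sample, has_header=True))
# {'size': 'numeric', 'weight': 'numeric', 'colour': 'nominal'}
```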

##Ranks and ratings: total order

  • When the available order among objects is total (i.e. the relation between any pair of objects is known) and given as a numeric value assigned to each object, this value can be included as the last feature in the object file.
  • Only numeric values in integer (e.g. 1), floating point (e.g. 0.01) or scientific (e.g. 1e-10) format are permitted.
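
As a rough illustration of this layout, the sketch below reads an object file whose last column holds the rating and separates it from the features. The function name `split_ratings` and the sample data are hypothetical. Sorting with the highest value first is also an assumption: for rank data, where 1 is best, the order would need to be reversed.

```python
import csv
import io


def split_ratings(text, separator=","):
    """Split each line into its features and the trailing numeric rating."""
    features, ratings = [], []
    for row in csv.reader(io.StringIO(text), delimiter=separator):
        if not row:
            continue
        features.append(row[:-1])
        # float() accepts integer, floating point and scientific formats,
        # and raises ValueError for anything non-numeric.
        ratings.append(float(row[-1]))
    return features, ratings


sample = "0.2,red,3\n0.5,blue,1\n0.9,red,2.5\n"
features, ratings = split_ratings(sample)

# Total order implied by the ratings (assumed here: higher value = preferred).
ranking = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
print(ranking)  # [0, 2, 1]
```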

##Pairwise preferences or ranked subset: partial order

  • When the available order among objects is partial (i.e. only the relation between some pairs of objects is known) this information should be included in a separate order file.
  • Each line of the order file contains a list of object IDs, sorted from the most preferred to the least preferred. Note that when the object file does not contain object IDs, the line number of each object is used as its ID (starting at 0 and not counting the optional first line of feature names).
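
To make the order-file format concrete, here is a minimal Python sketch that expands each line into the pairwise preferences it implies. It is not PLT's parser: the function name `order_lines_to_pairs` is hypothetical, and the comma separator is an assumption, since the text above does not state which separator the order file uses.

```python
from itertools import combinations


def order_lines_to_pairs(text, separator=","):
    """Turn each ranked line into (preferred_id, other_id) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ids = [int(token) for token in line.split(separator)]
        # combinations() keeps the input order, so the first element of
        # every pair appears earlier in the line and is the preferred one.
        pairs.extend(combinations(ids, 2))
    return pairs


# Line 1: object 0 is preferred over object 2.
# Line 2: object 3 over object 1, and object 1 over object 4.
sample = "0,2\n3,1,4\n"
print(order_lines_to_pairs(sample))
# [(0, 2), (3, 1), (3, 4), (1, 4)]
```

Objects that never appear together on a line (for example 0 and 3 above) have no known relation, which is what makes the order partial.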

##Example datasets