Preprocessing - nave91/miner GitHub Wiki
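Data can be preprocessed following Menzies' conventions, in which one or more mark characters at the start of each column name in the header say how that column is treated: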
| Group | Mark | Type | nump | wordp | Meaning |
|-------|------|------|------|-------|---------|
| | `?` | | | | a column to ignore |
| dep | `=` | klass | | X | a label to predict |
| dep | `+` | more | X | | a goal to maximize |
| dep | `-` | less | X | | a goal to minimize |
| indep | `$` | num | X | | non-goal number |
| indep | anything else | term | | X | non-goal non-number |

Here `dep` groups the dependent (goal) columns and `indep` the independent ones; an X under `nump` means the column holds numbers, and an X under `wordp` means it holds words (symbols).
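The mark table can be read as a lookup from the mark characters at the start of a column name to that column's role. The Python sketch below is one way to encode it, not code from the miner repository: `MARKS`, `leading_marks`, and `classify` are hypothetical names, and `"ignore"` is a placeholder role chosen here for `?` columns. It assumes marks only count as a prefix, and that when a name carries several marks (as `?+$temperature` does in the example that follows) the first one found in table order (`?`, `=`, `+`, `-`, `$`) decides the role.

```python
# Hypothetical sketch of the mark table; not the miner implementation.
# Assumption: marks form a prefix of the column name, and when several are
# present the first one found in table order (?, =, +, -, $) decides the role.

MARK_CHARS = "?=+-$"

MARKS = [
    # (mark, role,     type,    nump,  wordp)
    ("?", "ignore", None,    False, False),
    ("=", "dep",    "klass", False, True),
    ("+", "dep",    "more",  True,  False),
    ("-", "dep",    "less",  True,  False),
    ("$", "indep",  "num",   True,  False),
]


def leading_marks(name):
    """Return the run of mark characters at the start of a column name."""
    stripped = name.strip()
    bare = stripped.lstrip(MARK_CHARS)
    return stripped[: len(stripped) - len(bare)]


def classify(name):
    """Return (role, type, nump, wordp) for one header column name."""
    marks = leading_marks(name)
    for mark, role, kind, nump, wordp in MARKS:
        if mark in marks:
            return role, kind, nump, wordp
    # No mark at all: the "anything else" row, a non-goal non-number column.
    return "indep", "term", False, True


print(classify("=play"))       # ('dep', 'klass', False, True)
print(classify("-$humidity"))  # ('dep', 'less', True, False)
print(classify("outlook"))     # ('indep', 'term', False, True)
```

Checking only the prefix keeps a name such as `max-temp` from being mistaken for a goal to minimize.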
For example, consider the weather dataset:
```
#data/weather1.csv
outlook,        # forecast
?+$temperature, # degrees Fahrenheit
-$humidity,     # % of dewpoint
windy,          # boolean
=play           # goal
#################################################
sunny    ,85 ,90 ,FALSE ,no
sunny    ,80 ,90 ,TRUE  ,no
overcast ,83 ,86 ,FALSE ,yes
rainy    ,70 ,96 ,FALSE ,yes
rainy    ,68 ,80 ,FALSE ,yes
rainy    ,65 ,?  ,TRUE  ,no
overcast ,64 ,65 ,TRUE  ,yes
sunny    ,72 ,?  ,FALSE ,no
sunny    ,69 ,70 ,FALSE ,yes
rainy    ,75 ,80 ,FALSE ,yes
sunny    ,75 ,70 ,TRUE  ,yes
overcast ,72 ,90 ,TRUE  ,yes
overcast ,81 ,75 ,FALSE ,yes
rainy    ,71 ,90 ,TRUE  ,no
```
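Each data row is a single comma-separated line whose cells carry stray spaces, and a `?` cell presumably marks a missing value (as in the `rainy ,65 ,? ,TRUE ,no` row). A minimal Python sketch of turning one such line into typed cells, assuming `#` starts a comment and that any cell parseable as a number should become one; `to_cell` and `parse_row` are hypothetical helpers, not part of miner:

```python
def to_cell(text):
    """Convert one cell: "?" -> None, numbers -> int/float, else the stripped string."""
    text = text.strip()
    if text == "?":
        return None  # assumed missing value
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text  # symbolic value such as "sunny" or "FALSE"


def parse_row(line):
    """Split one data line on commas, dropping any trailing '#' comment."""
    line = line.split("#", 1)[0]
    return [to_cell(cell) for cell in line.split(",")]


print(parse_row("sunny ,85 ,90 ,FALSE ,no"))  # ['sunny', 85, 90, 'FALSE', 'no']
print(parse_row("rainy ,65 ,? ,TRUE ,no"))    # ['rainy', 65, None, 'TRUE', 'no']
```

Under the mark convention it is the header mark, not the cell contents, that says whether a column is numeric, so a fuller reader would presumably consult the mark before converting a cell.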
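The header lists one column per line: the column name prefixed by its mark(s), a comma after every column except the last, and an optional `#` comment: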
```
outlook,        # forecast
?+$temperature, # degrees Fahrenheit
-$humidity,     # % of dewpoint
windy,          # boolean
=play           # goal
```
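Because every header line follows the same shape, it can be split mechanically into the column's marks, its bare name, and its comment. A short Python sketch, assuming one column per line as shown and that marks are drawn only from `?=+-$`; `parse_header` and `HEADER` are hypothetical names:

```python
MARK_CHARS = "?=+-$"


def parse_header(text):
    """Return (marks, name, comment) for each column declared in a header block.

    Assumes one column per line, written as  <marks><name>,  # <comment>
    A trailing comma, if present, is dropped; comment-only lines are skipped.
    """
    columns = []
    for line in text.splitlines():
        code, _, comment = line.partition("#")
        code = code.strip().rstrip(",").strip()
        if not code:
            continue  # blank or comment-only line
        name = code.lstrip(MARK_CHARS)
        marks = code[: len(code) - len(name)]
        columns.append((marks, name, comment.strip()))
    return columns


HEADER = """\
outlook,        # forecast
?+$temperature, # degrees Fahrenheit
-$humidity,     # % of dewpoint
windy,          # boolean
=play           # goal
"""

for column in parse_header(HEADER):
    print(column)
# ('', 'outlook', 'forecast')
# ('?+$', 'temperature', 'degrees Fahrenheit')
# ('-$', 'humidity', '% of dewpoint')
# ('', 'windy', 'boolean')
# ('=', 'play', 'goal')
```

Keeping the marks apart from the bare name separates the two concerns: the marks decide the column's role according to the mark table, while the name is what gets reported.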