7 Library API - adaa-polsl/RuleKit GitHub Wiki

RuleKit is implemented in Java 1.8 on the basis of the RapidMiner API and therefore extends the RapidMiner class hierarchy. The main classes provided by our library are ExpertRuleGenerator, which inherits from the RapidMiner AbstractLearner, and RulePerformanceEvaluator, which derives from AbstractPerformanceEvaluator.

The JavaDoc documentation for RuleKit can be found here.

7.1. Running an experiment

The following example shows how to use the RuleKit API to perform a classification analysis on the deals dataset, which concerns the problem of predicting whether a person making a purchase will be a future customer. The set is divided into separate training (deals-train.arff) and testing (deals-test.arff) subsets.

The analysis is preceded by importing all necessary RuleKit and RapidMiner packages:

import adaa.analytics.rules.logic.quality.ClassificationMeasure;
import adaa.analytics.rules.operator.ExpertRuleGenerator;
import adaa.analytics.rules.operator.RulePerformanceEvaluator;
import adaa.analytics.rules.utils.RapidMiner5;

import com.rapidminer.RapidMiner;
import com.rapidminer.example.Attributes;
import com.rapidminer.operator.IOContainer;
import com.rapidminer.operator.IOObject;
import com.rapidminer.operator.Model;
import com.rapidminer.operator.ModelApplier;
import com.rapidminer.operator.OperatorCreationException;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.learner.AbstractLearner;
import com.rapidminer.operator.performance.AbstractPerformanceEvaluator;
import com.rapidminer.operator.performance.PerformanceVector;
import com.rapidminer.operator.preprocessing.filter.ChangeAttributeRole;
import com.rapidminer.tools.OperatorService;
import com.rapidminer5.operator.io.ArffExampleSource;

The first step is to create the operators and link them in the RapidMiner process:

RapidMiner.init();
			
// create all operators
ArffExampleSource trainArff = RapidMiner5.createOperator(ArffExampleSource.class);
ArffExampleSource testArff = RapidMiner5.createOperator(ArffExampleSource.class);

ChangeAttributeRole trainRoleSetter = (ChangeAttributeRole)OperatorService.createOperator(ChangeAttributeRole.class);
ChangeAttributeRole testRoleSetter = (ChangeAttributeRole)OperatorService.createOperator(ChangeAttributeRole.class);

AbstractLearner ruleGenerator = RapidMiner5.createOperator(ExpertRuleGenerator.class);
ModelApplier applier = OperatorService.createOperator(ModelApplier.class);
AbstractPerformanceEvaluator evaluator = RapidMiner5.createOperator(RulePerformanceEvaluator.class);

// configure process workflow
com.rapidminer.Process process = new com.rapidminer.Process();
process.getRootOperator().getSubprocess(0).addOperator(trainArff);
process.getRootOperator().getSubprocess(0).addOperator(testArff);
process.getRootOperator().getSubprocess(0).addOperator(trainRoleSetter);
process.getRootOperator().getSubprocess(0).addOperator(testRoleSetter);
process.getRootOperator().getSubprocess(0).addOperator(ruleGenerator);
process.getRootOperator().getSubprocess(0).addOperator(applier);
process.getRootOperator().getSubprocess(0).addOperator(evaluator);

// training set is passed to the role setter and then to the rule generator
trainArff.getOutputPorts().getPortByName("output").connectTo(trainRoleSetter.getInputPorts().getPortByName("example set input"));	
trainRoleSetter.getOutputPorts().getPortByName("example set output").connectTo(ruleGenerator.getInputPorts().getPortByName("training set"));

// testing set is passed to the role setter and then to the model applier as unlabelled data
testArff.getOutputPorts().getPortByName("output").connectTo(testRoleSetter.getInputPorts().getPortByName("example set input"));	
testRoleSetter.getOutputPorts().getPortByName("example set output").connectTo(applier.getInputPorts().getPortByName("unlabelled data"));

// trained model is applied on unlabelled data
ruleGenerator.getOutputPorts().getPortByName("model").connectTo(applier.getInputPorts().getPortByName("model"));

// labelled data are used for performance evaluation 
applier.getOutputPorts().getPortByName("labelled data").connectTo(
		evaluator.getInputPorts().getPortByName("labelled data"));

// model characteristics are also passed to the evaluator
ruleGenerator.getOutputPorts().getPortByName("estimated performance").connectTo(
		evaluator.getInputPorts().getPortByName("performance"));

// return model and performance from the process
evaluator.getOutputPorts().getPortByName("performance").connectTo(
		process.getRootOperator().getSubprocess(0).getInnerSinks().getPortByIndex(0));
applier.getOutputPorts().getPortByName("model").connectTo(
		process.getRootOperator().getSubprocess(0).getInnerSinks().getPortByIndex(1));

After that, the operator parameters are set:

// set names of the input files
trainArff.setParameter(ArffExampleSource.PARAMETER_DATA_FILE, "../data/deals/deals-train.arff");
testArff.setParameter(ArffExampleSource.PARAMETER_DATA_FILE, "../data/deals/deals-test.arff");

// use "Future Customer" as the label attribute
trainRoleSetter.setParameter(ChangeAttributeRole.PARAMETER_NAME, "Future Customer");
trainRoleSetter.setParameter(ChangeAttributeRole.PARAMETER_TARGET_ROLE, Attributes.LABEL_NAME); 	
testRoleSetter.setParameter(ChangeAttributeRole.PARAMETER_NAME, "Future Customer");
testRoleSetter.setParameter(ChangeAttributeRole.PARAMETER_TARGET_ROLE, Attributes.LABEL_NAME);

// configure rule induction algorithm
ruleGenerator.setParameter(ExpertRuleGenerator.PARAMETER_MIN_RULE_COVERED, "8");
ruleGenerator.setParameter(ExpertRuleGenerator.PARAMETER_INDUCTION_MEASURE, ClassificationMeasure.getName(ClassificationMeasure.BinaryEntropy));
ruleGenerator.setParameter(ExpertRuleGenerator.PARAMETER_PRUNING_MEASURE, ClassificationMeasure.getName(ClassificationMeasure.UserDefined));
ruleGenerator.setParameter(ExpertRuleGenerator.PARAMETER_USER_PRUNING_EQUATION, "2 * p / n");
ruleGenerator.setParameter(ExpertRuleGenerator.PARAMETER_VOTING_MEASURE, ClassificationMeasure.getName(ClassificationMeasure.C2));

Finally, the process is executed and its results are collected:

IOContainer out = process.run();
IOObject[] objs = out.getIOObjects();

PerformanceVector performance = (PerformanceVector)objs[0];	
Model model = (Model)objs[1];

System.out.print(performance);
System.out.print(model);

The entire Java file can be found here.

7.2. Developing a new algorithm

The base class of all rule induction algorithms which work according to the separate-and-conquer heuristic is AbstractSeparateAndConquer, contained in the adaa.analytics.rules.logic.induction package. The package also contains three non-abstract classes derived from it, which allow the generation of classification, regression, and survival rules:

  • ClassificationSnC,
  • RegressionSnC,
  • SurvivalSnC.

These classes represent general algorithm pipelines and require additional objects describing how to grow and prune a single rule. These objects are instances of the following classes, all derived from AbstractFinder:

  • ClassificationFinder,
  • RegressionFinder,
  • SurvivalFinder.
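The division of responsibilities between the pipeline and the finder can be pictured with a simplified, self-contained sketch. Note that the class and method names below are illustrative only, not the actual RuleKit API: the separate-and-conquer loop repeatedly asks a finder to grow a rule, then removes the covered examples.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the separate-and-conquer scheme; names are
// illustrative and do not match the actual RuleKit classes.
public class SncSketch {

    // A "rule" here is just a numeric threshold on a single value.
    public interface Finder {
        double growRule(List<Double> uncovered); // returns a threshold
    }

    // Toy finder: covers every example >= the mean of the uncovered ones.
    public static class MeanFinder implements Finder {
        public double growRule(List<Double> uncovered) {
            double sum = 0;
            for (double x : uncovered) sum += x;
            return sum / uncovered.size();
        }
    }

    // The SnC loop: grow a rule, remove covered examples, repeat.
    public static List<Double> separateAndConquer(List<Double> examples, Finder finder) {
        List<Double> rules = new ArrayList<>();
        List<Double> uncovered = new ArrayList<>(examples);
        while (!uncovered.isEmpty()) {
            double threshold = finder.growRule(uncovered);
            rules.add(threshold);
            List<Double> rest = new ArrayList<>();
            for (double x : uncovered) if (x < threshold) rest.add(x);
            if (rest.size() == uncovered.size()) break; // no progress; stop
            uncovered = rest;
        }
        return rules;
    }
}
```

Exchanging the finder while keeping the loop unchanged mirrors how the SnC pipelines delegate rule growing and pruning to AbstractFinder subclasses.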

In order to develop a new induction algorithm, one can derive directly from the AbstractSeparateAndConquer class, use the existing separate-and-conquer scheme and provide one's own finder module derived from AbstractFinder, or combine both approaches. All the aforementioned classes have variants which take the user's knowledge into account:

  • ClassificationExpertSnC,
  • RegressionExpertSnC,
  • SurvivalExpertSnC,
  • ClassificationExpertFinder,
  • RegressionExpertFinder,
  • SurvivalExpertFinder.

They can be derived when user-guided induction is required (note that if no knowledge is specified, they simply invoke the base class members).
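The fallback behaviour of the expert variants can be illustrated with a minimal sketch (class names are again illustrative, not the actual RuleKit hierarchy): an expert subclass uses the supplied knowledge when present, and otherwise simply invokes the base-class member.

```java
// Minimal sketch of the expert-variant pattern; the classes below are
// illustrative and do not match the actual RuleKit hierarchy.
public class ExpertPattern {

    // Base finder: fully automatic rule growing.
    public static class Finder {
        public String grow() { return "automatic rule"; }
    }

    // Expert variant: uses user knowledge when present, otherwise
    // delegates to the base-class implementation.
    public static class ExpertFinder extends Finder {
        private final String knowledge; // null when none specified

        public ExpertFinder(String knowledge) { this.knowledge = knowledge; }

        @Override
        public String grow() {
            if (knowledge == null) return super.grow();
            return "rule guided by: " + knowledge;
        }
    }
}
```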

After implementing one's own induction algorithm, it has to be integrated into the adaa.analytics.rules.operator.ExpertRuleGenerator class. In the learn() method, there is a piece of code responsible for creating the proper algorithm variant (classification/regression/survival) depending on the type of the input data:

if (exampleSet.getAttributes().findRoleBySpecialName(SurvivalRule.SURVIVAL_TIME_ROLE) != null) {
    // survival problem
    params.setInductionMeasure(new LogRank());
    params.setPruningMeasure(new LogRank());
    params.setVotingMeasure(new LogRank());
    finder = new SurvivalLogRankExpertFinder(params);
    snc = new SurvivalLogRankExpertSnC((SurvivalLogRankExpertFinder)finder, params, knowledge);
} else if (exampleSet.getAttributes().getLabel().isNominal()) {
    // expert mode in classification problems
    finder = new ClassificationExpertFinder(params, knowledge);
    snc = new ClassificationExpertSnC((ClassificationExpertFinder)finder, params, knowledge);
} else {
    // expert mode in regression problems
    finder = new RegressionExpertFinder(params);
    snc = new RegressionExpertSnC((RegressionExpertFinder)finder, params, knowledge);
}

This fragment can be modified to add the newly implemented algorithm (e.g., on the basis of the value of some parameter of the operator).
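The extension point can be sketched with a toy dispatcher. This is not the actual learn() code, and the "custom algorithm" flag is a hypothetical operator parameter; the sketch only shows how a new branch would take precedence over the existing survival/classification/regression selection.

```java
// Toy illustration of extending the algorithm-selection fragment;
// the class and the useCustomAlgorithm flag are hypothetical.
public class AlgorithmDispatch {

    public enum Variant { SURVIVAL, CLASSIFICATION, REGRESSION, CUSTOM }

    // Mirrors the if/else chain in learn(): survival role checked first,
    // then label type, with a new operator parameter taking precedence.
    public static Variant select(boolean hasSurvivalTime, boolean nominalLabel,
                                 boolean useCustomAlgorithm) {
        if (useCustomAlgorithm) {
            return Variant.CUSTOM;          // newly added branch
        } else if (hasSurvivalTime) {
            return Variant.SURVIVAL;
        } else if (nominalLabel) {
            return Variant.CLASSIFICATION;
        } else {
            return Variant.REGRESSION;
        }
    }
}
```

In the real operator, the CUSTOM branch would construct the new finder and SnC objects in the same way the existing branches do.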