6. User-guided induction - adaa-polsl/RuleKit GitHub Wiki

The RuleKit suite supports user-guided rule induction, following the scheme introduced by the GuideR algorithm (Sikora et al., 2019).

6.1. Defining user's knowledge

The user's knowledge is specified by the following parameters:

<parameter_set name="paramset_1">
  	...
  	<param name="use_expert">true</param>
  	<param name="extend_using_preferred">...</param>
  	<param name="extend_using_automatic">...</param>
  	<param name="induce_using_preferred">...</param>
  	<param name="induce_using_automatic">...</param>
  	<param name="preferred_conditions_per_rule">...</param>
  	<param name="preferred_attributes_per_rule">...</param>
  	<param name="consider_other_classes">...</param>
  	<param name ="expert_rules">
		<entry name="rule-0">...</entry>
		<entry name="rule-1">...</entry>
		...
  	</param>
  	<param name ="expert_preferred_conditions">
		<entry name="preferred-condition-0">...</entry>
		<entry name="preferred-condition-1">...</entry>
		...
  	</param>
  	<param name ="expert_forbidden_conditions">
		<entry name="forbidden-condition-0">...</entry>
		<entry name="forbidden-condition-1">...</entry>
		...
  	</param>
</parameter_set>

Parameter meaning (symbols from the GuideR paper are given in parentheses):

  • use_expert - boolean indicating whether the user's knowledge should be used,
  • expert_rules (R) - set of initial rules,
  • expert_preferred_conditions (C, A) - multiset of preferred conditions (also used to specify preferred attributes via the special value Any),
  • expert_forbidden_conditions (C_forb, A_forb) - set of forbidden conditions (also used to specify forbidden attributes via the special value Any),
  • extend_using_preferred/extend_using_automatic - booleans indicating whether initial rules should be extended using preferred/automatic conditions and attributes,
  • induce_using_preferred/induce_using_automatic - booleans indicating whether new rules should be induced using preferred/automatic conditions and attributes,
  • preferred_conditions_per_rule (K_C)/preferred_attributes_per_rule (K_A) - maximum number of preferred conditions/attributes per rule,
  • consider_other_classes - boolean indicating whether automatic induction should be performed for classes for which no user's knowledge has been defined (classification only).
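For instance, the flags could be combined as follows to keep an initial rule unchanged while letting the remaining rules be induced automatically. This is a sketch built only from the parameters described above: the flag combination is an illustrative assumption (it is not one of the GuideR scenarios), and the rule entry reuses the gimpuls example that appears later on this page.

```xml
<!-- Illustrative flag combination (assumption): use the expert rule as given,
     do not extend it, and induce further rules with automatic conditions only. -->
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">true</param>
<!-- Class 1 has no user's knowledge below, so rules for it are induced
     automatically only if this flag is true (classification only). -->
<param name="consider_other_classes">true</param>
<param name="expert_rules">
	<entry name="rule-0">IF [[gimpuls = (-inf, 750)]] THEN class = {0}</entry>
</param>
<param name="expert_preferred_conditions">
</param>
<param name="expert_forbidden_conditions">
</param>
```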

Let us consider the following user's knowledge (superscripts on the C and A symbols indicate class labels, the subscript forb marks forbidden conditions/attributes, and an exponent on a set element gives its multiplicity):

  • R = { (IF gimpuls < 750 THEN class = 0), (IF gimpuls >= 750 THEN class = 1) },
  • C^0 = { (seismic = a) },
  • C^1 = { (seismic = b AND seismoacoustic = c)^5 },
  • A^1 = { gimpuls^inf },
  • C_forb^0 = { (seismoacoustic = b) },
  • A_forb^1 = { ghazard }.

The XML definition of this knowledge is presented below.

<param name ="expert_rules">
	<entry name="rule-1">IF [[gimpuls = (-inf, 750)]] THEN class = {0}</entry>
	<entry name="rule-2">IF [[gimpuls = &lt;750, inf)]] THEN class = {1}</entry>
</param>
<param name ="expert_preferred_conditions">
	<entry name="preferred-condition-1">1: IF [[seismic = {a}]] THEN class = {0}</entry>
	<entry name="preferred-condition-2">5: IF [[seismic = {b} AND seismoacoustic = {c}]] THEN class = {1}</entry>
	<entry name="preferred-attribute-1">inf: IF [[gimpuls = Any]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
	<entry name="forbidden-condition-1">IF [[seismoacoustic = {b}]] THEN class = {0}</entry>
	<entry name="forbidden-attribute-1">IF [[ghazard = Any]] THEN class = {1}</entry>
</param>

Please note the following:

  • Infinity is represented by the string inf (rule-1, preferred-attribute-1).
  • Conditions on continuous attributes are represented as intervals. Left-closed intervals are written with the &lt; entity, because < is reserved in XML syntax (rule-2).
  • The multiplicity of a multiset element is given before the element, followed by a colon (preferred-condition-1 and preferred-condition-2).
  • Preferred/forbidden attributes are defined as conditions with the special value Any (preferred-attribute-1, forbidden-attribute-1).
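Since entry values are ordinary XML character data, a standard CDATA section is another way to avoid escaping <: a conforming XML parser reports the content below as the same text as the &lt; form of rule-2. Whether RuleKit's parameter reader accepts CDATA has not been checked here, so treat this as an untested alternative; the &lt; entity remains the form used throughout this page.

```xml
<param name ="expert_rules">
	<!-- Same text as rule-2 under standard XML parsing (untested with RuleKit):
	     inside CDATA the < character can appear literally. -->
	<entry name="rule-2"><![CDATA[IF [[gimpuls = <750, inf)]] THEN class = {1}]]></entry>
</param>
```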

User-guided induction can also be executed from the RapidMiner plugin and the R package. In the former case, convenient wizards are provided for specifying expert rules, preferred conditions/attributes, and forbidden conditions/attributes (Figure 6.1). The traditional, parameter-based method of defining the expert's knowledge may also be used.

Figure 6.1. RapidMiner wizard for specifying user's rules, preferred conditions/attributes, and forbidden conditions/attributes.

6.2. Examples from GuideR paper

The datasets investigated in the GuideR study are:

  • classification: seismic-bumps - forecasting high energy seismic bumps in coal mines,
  • regression: methane - predicting methane concentration in a coal mine,
  • survival analysis: bmt - analyzing factors contributing to the patients’ survival following bone marrow transplants.

In the following subsections, we present all examined guided-induction scenarios with the relevant XML parameters. The complete XML experiment files for the test cases discussed in the GuideR paper can be found here.

Classification

guided-c1 The model consists of two initial rules:

  • IF gimpuls < 750 THEN class = 0
  • IF gimpuls >= 750 THEN class = 1
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name ="expert_rules">
	<entry name="rule-0">IF [[gimpuls = (-inf, 750)]] THEN class = {0}</entry>
	<entry name="rule-1">IF [[gimpuls = &lt;750, inf)]] THEN class = {1}</entry>
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
</param>	
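As a hypothetical variant (not one of the GuideR scenarios), the two initial rules of guided-c1 could serve as a starting point that is refined rather than used verbatim. Based on the parameter descriptions in 6.1, switching on extend_using_automatic should allow automatically selected conditions to be appended to the initial rules; only that flag differs from guided-c1 in this sketch.

```xml
<!-- Hypothetical variant of guided-c1: keep the initial rules,
     but allow them to be extended with automatic conditions. -->
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">true</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name ="expert_rules">
	<entry name="rule-0">IF [[gimpuls = (-inf, 750)]] THEN class = {0}</entry>
	<entry name="rule-1">IF [[gimpuls = &lt;750, inf)]] THEN class = {1}</entry>
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
</param>
```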

guided-c2 Attribute gimpuls is used at least once in the rules for each class:

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">true</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
	<entry name="preferred-attribute-0">1: IF [[gimpuls = Any]] THEN class = {0}</entry>
	<entry name="preferred-attribute-1">1: IF [[gimpuls = Any]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>

guided-c3 Every rule contains at least two of the gimpuls, genergy, and senergy attributes:

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name="preferred_attributes_per_rule">2</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
	<entry name="preferred-attribute-0">inf: IF [[genergy = Any]] THEN class = {0}</entry>
	<entry name="preferred-attribute-1">inf: IF [[senergy = Any]] THEN class = {0}</entry>
	<entry name="preferred-attribute-2">inf: IF [[gimpuls = Any]] THEN class = {0}</entry>
	<entry name="preferred-attribute-3">inf: IF [[genergy = Any]] THEN class = {1}</entry>
	<entry name="preferred-attribute-4">inf: IF [[senergy = Any]] THEN class = {1}</entry>
	<entry name="preferred-attribute-5">inf: IF [[gimpuls = Any]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>	

guided-c4 Each rule uses at least one of the seismic, seismoacoustic, and ghazard attributes, with an additional restriction on their values: class 0 rules may use values a and b, while class 1 rules may use values b, c, and d:

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">true</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name="consider_other_classes">false</param>
<param name="preferred_conditions_per_rule">1</param>
<param name ="expert_rules">	
</param>
<param name ="expert_preferred_conditions">
	<entry name="preferred-condition-01">inf: IF [[seismic = {a}]] THEN class = {0}</entry>
	<entry name="preferred-condition-02">inf: IF [[seismic = {b}]] THEN class = {0}</entry>
	<entry name="preferred-condition-03">inf: IF [[seismoacoustic = {a}]] THEN class = {0}</entry>
	<entry name="preferred-condition-04">inf: IF [[seismoacoustic = {b}]] THEN class = {0}</entry>
	<entry name="preferred-condition-05">inf: IF [[ghazard = {a}]] THEN class = {0}</entry>
	<entry name="preferred-condition-06">inf: IF [[ghazard = {b}]] THEN class = {0}</entry>
	
	<entry name="preferred-condition-11">inf: IF [[seismic = {b}]] THEN class = {1}</entry>
	<entry name="preferred-condition-12">inf: IF [[seismic = {c}]] THEN class = {1}</entry>
	<entry name="preferred-condition-13">inf: IF [[seismic = {d}]] THEN class = {1}</entry>
	<entry name="preferred-condition-14">inf: IF [[seismoacoustic = {b}]] THEN class = {1}</entry>
	<entry name="preferred-condition-15">inf: IF [[seismoacoustic = {c}]] THEN class = {1}</entry>
	<entry name="preferred-condition-16">inf: IF [[seismoacoustic = {d}]] THEN class = {1}</entry>
	<entry name="preferred-condition-17">inf: IF [[ghazard = {b}]] THEN class = {1}</entry>
	<entry name="preferred-condition-18">inf: IF [[ghazard = {c}]] THEN class = {1}</entry>
	<entry name="preferred-condition-19">inf: IF [[ghazard = {d}]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>

guided-c5 Attributes gimpuls, goimpuls, ghazard, and seismoacoustic are forbidden:

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">true</param>
<param name="consider_other_classes">false</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
	<entry name="forb-attribute-00">1: IF [[seismoacoustic = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-01">1: IF [[gimpuls = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-02">1: IF [[goimpuls = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-03">1: IF [[ghazard = Any]] THEN class = {0}</entry>

	<entry name="forb-attribute-10">1: IF [[seismoacoustic = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-11">1: IF [[gimpuls = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-12">1: IF [[goimpuls = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-13">1: IF [[ghazard = Any]] THEN class = {1}</entry>
</param>

guided-c6 Attributes from the nbumps family, as well as senergy, maxenergy, and seismic, are forbidden. The configuration is analogous to guided-c5.
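A sketch of the guided-c6 configuration, written by analogy with guided-c5. The page does not list the members of the nbumps family; the names nbumps, nbumps2 to nbumps7, and nbumps89 below are an assumption based on the public seismic-bumps dataset and should be checked against the attribute names in the actual data file. The entry names are arbitrary labels chosen for this sketch.

```xml
<!-- Sketch of guided-c6 with the same flags as guided-c5.
     ASSUMPTION: the nbumps family is nbumps, nbumps2 to nbumps7, and nbumps89. -->
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">true</param>
<param name="consider_other_classes">false</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
	<entry name="forb-attribute-0-nbumps">1: IF [[nbumps = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-nbumps2">1: IF [[nbumps2 = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-nbumps3">1: IF [[nbumps3 = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-nbumps4">1: IF [[nbumps4 = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-nbumps5">1: IF [[nbumps5 = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-nbumps6">1: IF [[nbumps6 = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-nbumps7">1: IF [[nbumps7 = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-nbumps89">1: IF [[nbumps89 = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-senergy">1: IF [[senergy = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-maxenergy">1: IF [[maxenergy = Any]] THEN class = {0}</entry>
	<entry name="forb-attribute-0-seismic">1: IF [[seismic = Any]] THEN class = {0}</entry>

	<entry name="forb-attribute-1-nbumps">1: IF [[nbumps = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-nbumps2">1: IF [[nbumps2 = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-nbumps3">1: IF [[nbumps3 = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-nbumps4">1: IF [[nbumps4 = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-nbumps5">1: IF [[nbumps5 = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-nbumps6">1: IF [[nbumps6 = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-nbumps7">1: IF [[nbumps7 = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-nbumps89">1: IF [[nbumps89 = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-senergy">1: IF [[senergy = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-maxenergy">1: IF [[maxenergy = Any]] THEN class = {1}</entry>
	<entry name="forb-attribute-1-seismic">1: IF [[seismic = Any]] THEN class = {1}</entry>
</param>
```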

Regression

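guided-r1 The model contains the conditions PD = 0 and PD = 1, each appearing in three rules. Since PD is handled as a numerical attribute, the two conditions are encoded as the intervals (-inf, 0.5) and <0.5, inf), respectively: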

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
	<entry name="preferred-condition-0">3: IF PD = &lt;0.5, inf) THEN MM116_pred = {NaN}</entry>
	<entry name="preferred-condition-1">3: IF PD = (-inf, 0.5) THEN MM116_pred = {NaN}</entry>
</param>	

guided-r2 The conjunction PD = 1 AND MM116 < 1 appears in five rules:

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
	<entry name="preferred-condition-0">5: IF PD = &lt;0.5, inf) AND MM116 = (-inf, 1.0) THEN MM116_pred = {NaN}</entry>
</param>

guided-r3 The conjunction PD = 0 AND MM116 > 1 appears in five rules. The configuration is analogous to guided-r2.
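By analogy with guided-r2, the preferred conjunction for guided-r3 might be encoded as below. PD = 0 reuses the (-inf, 0.5) interval from guided-r1, while MM116 > 1 is written as the open interval (1.0, inf) by analogy with the (-inf, 1.0) notation for MM116 < 1; both the encoding and the entry name are assumptions rather than copies of the original experiment file.

```xml
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
	<!-- Assumed encoding: PD = 0 as (-inf, 0.5), MM116 > 1 as the open interval (1.0, inf). -->
	<entry name="preferred-condition-0">5: IF PD = (-inf, 0.5) AND MM116 = (1.0, inf) THEN MM116_pred = {NaN}</entry>
</param>
```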

guided-r4 Attributes DMM116, MM116, and PD appear in every rule:

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
	<entry name="preferred-attribute-0">inf: IF PD = Any THEN MM116_pred = {NaN}</entry>
	<entry name="preferred-attribute-1">inf: IF MM116 = Any THEN MM116_pred = {NaN}</entry>
	<entry name="preferred-attribute-2">inf: IF DMM116 = Any THEN MM116_pred = {NaN}</entry>
</param>	

Survival analysis

guided-s1 Every rule contains the CD34 attribute (CD34kgx10d6) and contains neither ANCrecovery nor PLTrecovery:

<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name="preferred_attributes_per_rule">1</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
	<entry name="attr-preferred-0">inf: IF [CD34kgx10d6 = Any] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_forbidden_conditions">
	<entry name="condition-forbidden-0">IF ANCrecovery = Any THEN survival_status = {NaN}</entry>
	<entry name="condition-forbidden-1">IF PLTrecovery = Any THEN survival_status = {NaN}</entry>
</param>	

guided-s2 The model consists of four initial rules:

  • IF extcGvHD = No AND CD34 < 10 THEN ...
  • IF extcGvHD = No AND CD34 >= 10 THEN ...
  • IF extcGvHD = Yes AND CD34 < 10 THEN ...
  • IF extcGvHD = Yes AND CD34 >= 10 THEN ...
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name ="expert_rules">
	<entry name="rule-0">IF [[CD34kgx10d6 = (-inf, 10.0)]] AND [[extcGvHD = {0}]] THEN survival_status = {NaN}</entry>
	<entry name="rule-1">IF [[extcGvHD = {0}]] AND [[CD34kgx10d6 = &lt;10.0, inf)]] THEN survival_status = {NaN}</entry>
	<entry name="rule-2">IF [[CD34kgx10d6 = (-inf, 10.0)]] AND [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
	<entry name="rule-3">IF [[CD34kgx10d6 = &lt;10.0, inf)]] AND [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
</param>	

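guided-s3 As in the previous case, but the CD34 ranges may be altered and the rules can be extended with automatic conditions. To this end, the initial rules keep only the extcGvHD conditions, and CD34 is supplied as a preferred attribute with multiplicity 4 (one occurrence per rule), so its ranges are selected during induction: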

<param name="use_expert">true</param>
<param name="extend_using_preferred">true</param>
<param name="extend_using_automatic">true</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name="preferred_attributes_per_rule">1</param>
<param name ="expert_rules">
	<entry name="rule-0">IF [[extcGvHD = {0}]] THEN survival_status = {NaN}</entry>
	<entry name="rule-1">IF [[extcGvHD = {0}]] THEN survival_status = {NaN}</entry>
	<entry name="rule-2">IF [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
	<entry name="rule-3">IF [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_preferred_conditions">
	<entry name="attr-0">4: IF [CD34kgx10d6 = Any]  THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>	

guided-s4 The model consists of two initial rules:

  • IF CD34 < 10 THEN ...
  • IF CD34 >= 10 THEN ...
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name ="expert_rules">
	<entry name="rule-0">IF [[CD34kgx10d6 = (-inf, 10.0)]] THEN survival_status = {NaN}</entry>
	<entry name="rule-1">IF [[CD34kgx10d6 = &lt;10.0, inf)]] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
</param>	