Learning Algorithm - SmartDataAnalytics/DL-Learner GitHub Wiki

The implemented algorithms range from very simple (and usually inappropriate) approaches to sophisticated ones.

Brute Force

This algorithm tests all class expressions up to a specified length, which you can set using e.g. bruteForce.maxlength = 7.
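
For illustration, a minimal configuration fragment might look as follows. Only the option itself is taken from the description above; the algorithm selection line, its identifier and the comment syntax are assumptions for this sketch.

```
// select the brute force algorithm (identifier assumed for this sketch)
algorithm = bruteForce
// test all class expressions up to length 7
bruteForce.maxlength = 7
```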

Random Guesser

This algorithm randomly generates class expressions. To do this, it creates trees, which can be mapped to class expressions. Its main parameter is the number of created trees, which you can set using e.g. random.numberOfTrees = 5.
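
A corresponding configuration sketch, under the same assumptions as above (the algorithm selection line and its identifier are illustrative), could look like this:

```
// select the random guesser (identifier assumed for this sketch)
algorithm = random
// number of randomly generated trees to map to class expressions
random.numberOfTrees = 5
```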

Genetic Programming (GP)

GP is a well-known general problem-solving method, which can be adapted to class expression learning. The adaptation is straightforward. In DL-Learner, however, an additional genetic refinement operator was implemented, which has been shown to improve GP performance [hybrid_gp]. Some options are:

  • number of individuals: The individual count is the size of each generation in a GP algorithm. It is one of the most crucial parameters. Setting it to a higher value usually means investing more computational resources to increase the likelihood that a solution will be found. Usage: gp.numberOfIndividuals = 100.
  • refinement probability: This specifies how likely the genetic refinement operator is to be used, e.g. gp.refinementProbability = 0.6 means that it will be selected 60% of the time. The GP algorithm has 15 more options, documented in doc/configOptions.html. A small configuration sketch follows this list.
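
A minimal GP configuration sketch combining the two options above; the option names are taken from this section, while the algorithm selection line and its identifier are assumptions:

```
// select genetic programming (identifier assumed for this sketch)
algorithm = gp
// size of each generation
gp.numberOfIndividuals = 100
// use the genetic refinement operator 60% of the time
gp.refinementProbability = 0.6
```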

Refinement

This is a top-down refinement operator approach, which is described in [alc_learning_algorithm] and based on insights from [property_analysis]. Some options include:

  • target language: The standard target language of this algorithm is ALCQ(D). However, you can change the language, e.g. you can exclude the universal quantification (∀) constructor by using refinement.useAllConstructor = false. Similar options exist for existential quantification (∃), negation (¬), cardinality restrictions (e.g. ≥ n), and boolean datatypes.

  • maximum execution time: If there is no perfect solution to a given problem, the algorithm can potentially run forever (in practice it will run out of memory). It is therefore often useful to limit the execution time. You can use e.g. refinement.maxExecutionTimeInSeconds = 100 to say that the algorithm should run for at most 100 seconds. It will often run slightly longer than the maximum execution time, since it waits for the current internal loop of the algorithm to finish in order to stop gracefully (see the configuration sketch below).

    The algorithm supports a range of further options. For instance, one can specify which classes and properties must not occur in resulting class expressions.
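
Putting the options above together, a configuration sketch might look as follows; the algorithm selection line and its identifier are assumptions for this sketch:

```
// select the top-down refinement approach (identifier assumed for this sketch)
algorithm = refinement
// exclude the universal quantification constructor from the target language
refinement.useAllConstructor = false
// stop after at most 100 seconds (plus a short grace period)
refinement.maxExecutionTimeInSeconds = 100
```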

Refexamples (OCEL)

The previous algorithm has been extended to make more sophisticated use of background knowledge [mlj] and therefore run more efficiently on many problems. It also supports double datatypes and hasValue restrictions (which again can be turned on or off as desired), and it includes explicit noise handling through the noisePercentage option. This is currently the default and recommended algorithm for learning from positive and negative examples. More than 30 options can be set to control its behaviour; however, apart from the target language, the most important setting is the noise percentage, which should be optimised for the given problem.
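
As a rough sketch, selecting this algorithm and enabling its noise handling could look like this; the algorithm selection line, its identifier and the exact option prefix are assumptions, and the noise value is purely illustrative:

```
// select OCEL / refexamples (identifier assumed for this sketch)
algorithm = refexamples
// tolerate up to 5% noise in the examples (illustrative value)
refexamples.noisePercentage = 5
```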

Class Expression Learning for Ontology Engineering (CELOE)

Currently, CELOE is the best class learning algorithm available within DL-Learner. It uses the same refinement operator as OCEL, but a completely different heuristic. It guarantees that the returned class expressions are minimal in the sense that one cannot remove parts of them without obtaining a non-equivalent expression. Furthermore, it makes use of existing background knowledge in coverage checks. Statistical methods are used to improve the efficiency of the algorithm such that it scales to large knowledge bases. While it was originally designed for ontology engineering, it can also be used for other learning problems and might even be superior to the other algorithms in many cases (this has not been thoroughly tested yet). Note that many configuration options of OCEL were dropped for the sake of simplicity, but may be reintroduced if needed.

EL Tree Learner (ELTL)

This algorithm has EL as its target language, i.e. it is specialized for this relatively simple language. There are two algorithms, illustrated in the sketch after this list:

  • el learns EL concepts.
  • disjunctiveEL learns disjunctions of EL concepts.
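
As a final sketch, selecting one of the two variants might look as follows; the algorithm selection line and the exact identifiers are assumptions based on the names above:

```
// learn plain EL concepts (identifier assumed for this sketch)
algorithm = el

// alternatively, learn disjunctions of EL concepts
// algorithm = disjunctiveEL
```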