
Welcome to the XCSF wiki!

Please feel free to help improve these pages.


Overview

XCSF is rule-based and maintains a population of classifiers where each classifier $cl$ consists of:

  • a condition component $cl.C$ that determines whether the rule matches input $\vec{x}$
  • an action component $cl.A$ that selects an action $a$ to be performed for a given $\vec{x}$
  • a prediction component $cl.P$ that computes the expected payoff for performing $a$ upon receipt of $\vec{x}$

XCSF thus generates rules of the general form:

IF matches $\leftarrow cl.C(\vec{x})$ THEN perform action $a \leftarrow cl.A(\vec{x})$ and EXPECT payoff $p \leftarrow cl.P(\vec{x})$.

For each step within a learning trial, XCSF constructs a match set $[M]$ composed of classifiers in the population set $[P]$ whose $cl.C$ matches $\vec{x}$. If $[M]$ contains fewer than $\theta_{\text{mna}}$ actions, a covering mechanism generates classifiers with matching $cl.C$ and random $cl.A$.
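
A minimal sketch of this step, assuming interval (hyperrectangle) conditions and integer actions; the dictionary fields, initial values, and covering spread below are illustrative assumptions rather than the library's internal representation:

```python
import random

def build_match_set(population, x, n_actions, theta_mna):
    """Construct [M] from classifiers matching x, covering missing actions."""
    M = [cl for cl in population
         if all(lo <= xi <= hi for (lo, hi), xi in zip(cl["condition"], x))]
    # Covering: add matching classifiers with random (uncovered) actions
    # until at least theta_mna distinct actions are present in [M].
    while len({cl["action"] for cl in M}) < theta_mna:
        covered = {cl["action"] for cl in M}
        new_cl = {
            "condition": [(xi - 0.2, xi + 0.2) for xi in x],  # matches x
            "action": random.choice([a for a in range(n_actions)
                                     if a not in covered]),
            "prediction": 10.0, "error": 0.0, "fitness": 0.01,
            "num": 1, "exp": 0, "as": 1.0, "ts": 0,
        }
        population.append(new_cl)
        M.append(new_cl)
    return M
```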

For each possible action $a_k$ in $[M]$, XCSF estimates the expected payoff by computing the fitness-weighted average prediction $P(a_k)$. That is, for each action $a_k$, the system prediction $P_k = \sum_j F_j p_j / \sum_j F_j$, where the sums run over the classifiers in $[M]$ advocating $a_k$.
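
A short sketch of the fitness-weighted prediction array, assuming classifiers are represented as dictionaries with `action`, `prediction`, and `fitness` fields (an illustrative structure, not the library API):

```python
from collections import defaultdict

def system_prediction(M):
    """Compute P(a_k) = sum_j F_j * p_j / sum_j F_j for each action in [M]."""
    weighted = defaultdict(float)  # sum of F_j * p_j per action
    fitness = defaultdict(float)   # sum of F_j per action
    for cl in M:
        weighted[cl["action"]] += cl["fitness"] * cl["prediction"]
        fitness[cl["action"]] += cl["fitness"]
    return {a: weighted[a] / fitness[a] for a in weighted}
```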

A system action is then selected randomly or probabilistically during exploration, while the action with the highest predicted payoff $P_k$ is chosen during exploitation. Classifiers in $[M]$ advocating the chosen action form the action set $[A]$. The action is then performed and a scalar reward $r \in \mathbb{R}$ is received, along with the next sensory input.
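
Action selection might then look like the following sketch, where `P` is the prediction array computed above (purely illustrative; other probabilistic exploration schemes, such as roulette on $P_k$, are equally valid):

```python
import random

def select_action(P, explore):
    """Choose an action from the prediction array P (action -> P_k)."""
    if explore:
        return random.choice(list(P))  # random exploration
    return max(P, key=P.get)           # exploit: highest predicted payoff
```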

Upon reaching a terminal state within the environment (as is always the case in single-step problems), each classifier $cl_j \in [A]$ has its experience incremented and its error, fitness, and set size estimate $as$ updated using the Widrow-Hoff delta rule with learning rate $\beta \in [0,1]$ as follows (a minimal code sketch follows the list).

  • Error: $\epsilon_j \leftarrow \epsilon_j + \beta (|r - p_j| - \epsilon_j)$
  • Accuracy: $\kappa_j = 1$ if $\epsilon_j < \epsilon_0$, otherwise $\kappa_j = \alpha (\epsilon_j / \epsilon_0)^{-\nu}$.
    With target error threshold $\epsilon_0$, accuracy offset $\alpha \in [0,1]$, and slope $\nu$.
  • Relative accuracy: $\kappa_j' = (\kappa_j \cdot num_j) / \sum_j \kappa_j \cdot num_j$
    Where classifiers are initialised with numerosity $num = 1$.
  • Fitness: $F_j \leftarrow F_j + \beta(\kappa_j' - F_j)$
  • Set size estimate: $as_j \leftarrow as_j + \beta(|[A]| - as_j)$
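
A minimal sketch of these updates, assuming the same illustrative dictionary fields as above (the parameter defaults here are arbitrary examples):

```python
def update_action_set(A, reward, beta=0.1, eps0=0.01, alpha=0.1, nu=5.0):
    """Widrow-Hoff update of each classifier in [A] given scalar reward."""
    kappa_num = []
    for cl in A:
        cl["exp"] += 1
        # Error: eps <- eps + beta * (|r - p| - eps)
        cl["error"] += beta * (abs(reward - cl["prediction"]) - cl["error"])
        # Accuracy: 1 if eps < eps0, else alpha * (eps / eps0) ** -nu
        kappa = 1.0 if cl["error"] < eps0 else alpha * (cl["error"] / eps0) ** -nu
        kappa_num.append(kappa * cl["num"])
    total = sum(kappa_num)
    set_size = sum(cl["num"] for cl in A)  # |[A]| in micro-classifiers
    for cl, kn in zip(A, kappa_num):
        # Fitness: F <- F + beta * (kappa' - F), with relative accuracy kappa'
        cl["fitness"] += beta * (kn / total - cl["fitness"])
        # Set size estimate: as <- as + beta * (|[A]| - as)
        cl["as"] += beta * (set_size - cl["as"])
```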

Thereafter, $cl.C$, $cl.A$, and $cl.P$ are updated according to the representation adopted.

The evolutionary algorithm (EA) is applied to classifiers within $[A]$ if the average time since the EA was last applied to those classifiers exceeds $\theta_{\text{EA}}$. Upon invocation, the time stamp $ts$ of each classifier is updated. Two parents are chosen based on their fitness via roulette wheel (or tournament) selection, and $\lambda$ offspring are created via crossover with probability $\chi$ and mutation with probability $\mu$.
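
The triggering condition and parent selection could be sketched as follows (numerosity-weighted mean time stamp and fitness-proportionate roulette; field names are illustrative assumptions):

```python
import random

def ea_should_run(A, time, theta_ea):
    """True when the mean time since the EA last ran on [A] exceeds theta_EA."""
    num_sum = sum(cl["num"] for cl in A)
    mean_ts = sum(cl["ts"] * cl["num"] for cl in A) / num_sum
    return time - mean_ts > theta_ea

def roulette_parent(A):
    """Select one parent with probability proportional to fitness."""
    point = random.uniform(0, sum(cl["fitness"] for cl in A))
    total = 0.0
    for cl in A:
        total += cl["fitness"]
        if total >= point:
            return cl
    return A[-1]
```

Crossover and mutation themselves depend on the chosen condition, action, and prediction representations and are therefore not shown.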

Offspring parameters are initialised by setting the error and fitness to the parental averages, discounted by the reduction parameters for error $\epsilon_R$ and fitness $F_R$. Offspring $exp$ and $num$ are set to one. If subsumption is enabled and an offspring is subsumed by either parent with sufficient accuracy $(\epsilon_j < \epsilon_0)$ and experience $(exp_j > \theta_{\text{sub}})$, it is not included in $[P]$; instead the parent's $num$ is incremented.
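
A sketch of offspring initialisation and parental subsumption under the same illustrative representation (the additional check that the parent's condition is more general than the offspring's is representation-dependent and omitted here):

```python
def init_offspring(child, p1, p2, err_reduction=1.0, fit_reduction=0.1):
    """Set offspring error and fitness to discounted parental averages."""
    child["error"] = err_reduction * (p1["error"] + p2["error"]) / 2
    child["fitness"] = fit_reduction * (p1["fitness"] + p2["fitness"]) / 2
    child["exp"] = 1
    child["num"] = 1

def insert_or_subsume(child, p1, p2, population, eps0=0.01, theta_sub=100):
    """Increment a subsuming parent's numerosity instead of adding the child."""
    for parent in (p1, p2):
        if parent["error"] < eps0 and parent["exp"] > theta_sub:
            parent["num"] += 1
            return
    population.append(child)
```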

The resulting offspring are added to $[P]$ and the maximum (micro-classifier) population size $N$ is enforced by removing classifiers selected via roulette with the deletion vote.

The deletion vote is set proportionally to the set size estimate $as$. However, the vote is increased by a factor $\overline{F}/F_j$ for classifiers that are sufficiently experienced $(exp_j > \theta_{\text{del}})$ and with small fitness $(F_j < \delta\overline{F})$, where $\overline{F}$ is the $[P]$ mean fitness and typically $\delta = 0.1$. Classifiers selected for deletion have their (micro-classifier) $num$ decremented, and in the event that $num < 1$, are removed from $[P]$.
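
A sketch of the deletion mechanism under the same illustrative representation (the vote here is additionally weighted by numerosity so that deletion operates on micro-classifiers):

```python
import random

def deletion_vote(cl, mean_fitness, theta_del=20, delta=0.1):
    """Vote proportional to the set size estimate, boosted for experienced,
    low-fitness classifiers."""
    vote = cl["as"] * cl["num"]
    if cl["exp"] > theta_del and cl["fitness"] < delta * mean_fitness:
        vote *= mean_fitness / cl["fitness"]
    return vote

def enforce_population_limit(population, max_size):
    """Roulette-delete until the micro-classifier count is at most N."""
    while sum(cl["num"] for cl in population) > max_size:
        mean_fitness = sum(cl["fitness"] for cl in population) / len(population)
        votes = [deletion_vote(cl, mean_fitness) for cl in population]
        point = random.uniform(0, sum(votes))
        total = 0.0
        for cl, vote in zip(population, votes):
            total += vote
            if total >= point:
                cl["num"] -= 1
                if cl["num"] < 1:
                    population.remove(cl)
                break
```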

In multi-step problems, the previous action set $[A]_{-1}$ is updated after each step using the previous reward plus the $\gamma \in [0,1]$ discounted maximum system prediction of the current step (similar to Q-learning), and the EA may be run therein.
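
The payoff target used for that update can be sketched as below, where `P_current` is the prediction array of the current match set (names and the default $\gamma$ are illustrative):

```python
def multistep_payoff(prev_reward, P_current, gamma=0.95):
    """Q-learning-style target for updating [A]_{-1}: previous reward plus
    the gamma-discounted maximum system prediction of the current step."""
    return prev_reward + gamma * max(P_current.values())
```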

Schematic

A schematic illustration of XCSF is shown above. For supervised learning, a single (dummy) action can be used such that $[A] = [M]$ and the system prediction made directly accessible to the environment.

A number of interacting pressures have been identified. A set pressure provides more frequent reproduction opportunities for more general rules. In opposition is a fitness pressure which represses the reproduction of inaccurate and over-general rules. Many forms of $cl.C$, $cl.A$, and $cl.P$ have been used for classifier knowledge representation since the original ternary conditions, integer actions, and scalar predictions. Some of these are implemented here.


Features

This project implements both supervised learning, via updating match set predictions directly, and reinforcement learning, via updating action set predictions with an environment reward. All mutation rates are self-adaptive.

Python Library

See example notebooks and Python Library API.

Conditions

  • Always matching dummy conditions
  • Hyperrectangles (center-spread and unordered bound)
  • Hyperellipsoids (center-spread)
  • Neural networks
  • GP trees
  • Dynamical GP graphs
  • Ternary bitstrings (binarises inputs)
  • Both conditions and actions in single dynamical GP graphs
  • Both conditions and actions in single neural networks

Actions

  • Integers
  • Neural networks

Predictions

  • Piece-wise constant
  • Linear least squares
  • Quadratic least squares
  • Linear recursive least squares
  • Quadratic recursive least squares
  • Stochastic gradient descent neural networks