symbolic_regression_part4 - morinim/ultra GitHub Wiki

Symbolic regression - Custom Evaluator and Teams

Evolving multiple programs at the same time is great, but my problem requires multiple variables. How should I proceed?

Preliminary note

If you only need multiple variables (without multiple programs), src::search is enough: stop reading here and go back to the wiki / source code.

In general, prefer src::search whenever possible: it directly supports model metrics and validation strategies.

Complex problem

$$ \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{bmatrix} \cdot \begin{pmatrix} \boldsymbol{f_1}(x_1,x_2,x_3) \\ \boldsymbol{f_2}(x_1,x_2,x_3) \\ \vdots \\ \boldsymbol{f_n}(x_1,x_2,x_3) \end{pmatrix} $$
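Written out row by row, the system says that each $a_i$ is a linear combination of the $n$ evolved functions, weighted by the $i$-th row of the matrix. For every training case the $a_i$, the $b_{ij}$ and the $x_k$ are known data; only the $\boldsymbol{f_j}$ have to be evolved. The general row and the $n = 2$ case read:

$$ a_i = \sum_{j=1}^{n} b_{ij} \, \boldsymbol{f_j}(x_1,x_2,x_3), \qquad i = 1, \dots, n $$

$$ \begin{aligned} a_1 &= b_{11} \, \boldsymbol{f_1}(x_1,x_2,x_3) + b_{12} \, \boldsymbol{f_2}(x_1,x_2,x_3) \\ a_2 &= b_{21} \, \boldsymbol{f_1}(x_1,x_2,x_3) + b_{22} \, \boldsymbol{f_2}(x_1,x_2,x_3) \end{aligned} $$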

The case of multiple variables and multiple programs cannot be supported in a single, generic way: the user must customize the generic ultra::search class to match their requirements.

Setting up code

A painstaking extension of the previous example is technically viable, but we have a better option.

Instead of a user-defined terminal (c), we can use the predefined ultra::variable terminal. Variables are convenient placeholders filled at the beginning of program/individual execution with user-provided values.

In the main() function:

```cpp
prob.sset.insert<c>();
```

has been replaced with:

```cpp
prob.sset.insert<ultra::variable>(0, "x1");
prob.sset.insert<ultra::variable>(1, "x2");
prob.sset.insert<ultra::variable>(2, "x3");
```

The constructor of a variable takes two parameters:

  1. an index used to retrieve the value of the variable at execution time (e.g. 0; more on this below);
  2. the name of the variable (e.g. "x1").
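The role of the index can be illustrated with a toy terminal. The sketch below is hypothetical and is NOT Ultra's ultra::variable: `toy_variable` and its `eval` member are illustrative names, assuming that a variable simply reads the slot `x[index]` of the current example's input vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy terminal, NOT Ultra's ultra::variable: it only shows how
// an index ties a named variable to a slot of the example's input vector.
struct toy_variable
{
  toy_variable(std::size_t i, std::string n) : index(i), name(std::move(n)) {}

  // At execution time the variable simply reads x[index].
  double eval(const std::vector<double> &x) const
  {
    assert(index < x.size());
    return x[index];
  }

  std::size_t index;
  std::string name;
};
```

With `{0, "x1"}, {1, "x2"}, {2, "x3"}` and an example whose inputs are `{1.5, -2.0, 4.0}`, the variable named "x2" evaluates to -2.0: the name is only for display, the index does the lookup.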

A training case/example can be represented with a simple structure:

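```cpp
#include <algorithm>
#include <iterator>
#include <vector>
// ...plus the Ultra headers providing ultra::matrix and ultra::value_t.

struct example
{
  example(const std::vector<double> &ex_a, const ultra::matrix<double> &ex_b,
          const std::vector<double> &ex_x)
    : a(ex_a), b(ex_b)
  {
    // Convert the input values once and for all (double -> ultra::value_t).
    std::ranges::copy(ex_x, std::back_inserter(x));
  }

  std::vector<double>         a;     // expected values a_1 ... a_n
  ultra::matrix<double>       b;     // coefficient matrix b_ij
  std::vector<ultra::value_t> x {};  // values of the variables x1, x2, x3
};
```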

x contains the values of the variables for a given example: x[i] is the value of the variable created with index i (e.g. x[0] is the value of x1).

Our problem crunches real numbers, so the constructor takes vectors of doubles.

Ultra, however, tries to support many use cases by adopting ultra::value_t for storing/passing values. This requires converting the vector of doubles (ex_x) into a vector of value_ts (x).

std::ranges::copy performs the conversion once and for all, when the example is built (delaying the conversion until the values are passed to a program would be less efficient, since it would be repeated at every execution).


The training set is a collection of examples:

```cpp
using training_set = std::vector<example>;
```

Almost any iterable container could be used (e.g. std::list instead of std::vector).


Now we can take advantage of the existing sum_of_errors_evaluator class (see src/evaluator.h) to quickly write our evaluator.

sum_of_errors_evaluator is a template class that, given a program type (P), an error functor (F) and a dataset (D):

  • calculates the sum of the errors of a model/program over the training set;
  • converts the total error into a standardized fitness.

```cpp
template<Individual P, class F, class D = multi_dataset<dataframe>>
requires ErrorFunction<F, D>
class sum_of_errors_evaluator : public evaluator<D>
{
public:
  explicit sum_of_errors_evaluator(D &);

  [[nodiscard]] auto operator()(const P &) const;

  // ...
};
```

The error functor object (F) acquires a program via its constructor and calculates the error on a specific example:

```cpp
const F err_fctr(prg);           // the functor is bound to program `prg`
const auto error(err_fctr(ex));  // error of `prg` on a single example `ex`
```

Implementing F::operator() isn't hard, since most of the code from the previous example can be reused:

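```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <numeric>
#include <variant>
#include <vector>

// `N` (number of programs / equations) and `candidate_solution` (the team of
// programs) are assumed to be defined as in the previous example.
class error_functor
{
public:
  explicit error_functor(const candidate_solution &s) : s_(s) {}

  // Returns the sum of absolute differences between the expected vector
  // `ex.a` and the model output `ex.b * f(x)` for a single training example.
  double operator()(const example &ex) const
  {
    using namespace ultra;

    // Execute every program of the team, passing the example's variable
    // values; a program without a valid output contributes 0.0.
    std::vector<double> f(N);
    std::ranges::transform(s_, f.begin(),
                           [&ex](const auto &prg)
                           {
                             const auto ret(run(prg, ex.x));
                             return has_value(ret) ? std::get<D_DOUBLE>(ret)
                                                   : 0.0;
                           });

    // model = b * f (matrix-vector product).
    std::vector<double> model(N, 0.0);
    for (unsigned i(0); i < N; ++i)
      for (unsigned j(0); j < N; ++j)
        model[i] += ex.b(i, j) * f[j];

    // Sum of |a_i - model_i|.
    return std::inner_product(ex.a.begin(), ex.a.end(), model.begin(), 0.0,
                              std::plus{},
                              [](auto v1, auto v2)
                              {
                                return std::fabs(v1 - v2);
                              });
  }

private:
  candidate_solution s_;
};
```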

Note that run(prg) has been changed to run(prg, ex.x): this passes the variable values of the training case to the program, so that each variable terminal reads its value from ex.x.
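The arithmetic at the heart of error_functor can be checked without Ultra. The sketch below is a hypothetical helper, `l1_model_error`, that takes the program outputs `f` as plain numbers and replaces ultra::matrix with a `std::vector<std::vector<double>>` (an assumption made only to keep the example self-contained). It computes `model = b * f` and then the sum of absolute differences with the same `std::inner_product` call used above.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper mirroring the core of error_functor::operator():
// given the outputs f of the N programs, an N x N matrix b (row-major,
// vector of rows) and the expected vector a, compute model = b * f and
// return sum_i |a_i - model_i|.
double l1_model_error(const std::vector<double> &a,
                      const std::vector<std::vector<double>> &b,
                      const std::vector<double> &f)
{
  const std::size_t n(f.size());

  std::vector<double> model(n, 0.0);
  for (std::size_t i(0); i < n; ++i)
    for (std::size_t j(0); j < n; ++j)
      model[i] += b[i][j] * f[j];

  // inner_product with custom operations: the "sum" is std::plus and the
  // "product" is the absolute difference.
  return std::inner_product(a.begin(), a.end(), model.begin(), 0.0,
                            std::plus{},
                            [](double v1, double v2)
                            { return std::fabs(v1 - v2); });
}
```

With `b = {{1, 2}, {3, 4}}` and `f = {1, 2}` the model is `{5, 11}`; against `a = {5, 6}` the error is `0 + 5 = 5`.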

(For convenience, the complete code is in the examples/symbolic_regression05.cc file.)
