symbolic_regression_part2 - morinim/vita GitHub Wiki

Symbolic regression - Custom evaluator

...that is great BUT my problem needs a particular evaluator / requires a unique data access technique / has a peculiar way of doing things.

No problem at all, you can customize the evaluator!

Toy problem

Given a, b and c, find a function f such that $a = b \cdot f(c)$.

This problem is probably not of immediate interest, but it illustrates a trait shared by other, more complicated problems and serves to explain a more general problem-solving technique.
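
Because a, b and c are fixed numbers, the requirement can be restated in terms of the single value the evolved program has to output. The following line is only a rearrangement of the equation above, under the assumption that b is nonzero:

$$a = b \cdot f(c) \iff f(c) = \frac{a}{b} \qquad (b \neq 0)$$

Any program built from the available symbols whose output equals a / b is therefore an exact solution.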

Setting up code

const double a = vita::random::between(-10.0, 10.0);
const double b = vita::random::between(-10.0, 10.0);

a and b get two fixed, random values.

c is somewhat different: it's a terminal. The terminal set and the function set form the alphabet of the program to be evolved (f). The terminal set consists of the variables and the constants.

For our problem c is the only terminal required (in general we also add some numbers):

class c : public vita::terminal
{
public:
  c() : vita::terminal("c") {}

  vita::value_t eval(vita::symbol_params &) const override
  {
    static const double val(vita::random::between(-10.0, 10.0));
    return val;
  }
};

The constructor (c() : vita::terminal("c") {}) sets the name of the terminal (used for display purposes).

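The eval function returns a random value that is drawn once (val is a static local variable, initialised on the first call) and stays fixed afterwards.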
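
The "fixed random value" behaviour comes from a standard C++ feature, the function-local static. The sketch below reproduces it without Vita; random_between is a hypothetical stand-in for vita::random::between and is not the library's implementation.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for vita::random::between (assumption, not Vita code):
// returns a uniformly distributed double between lo and hi.
inline double random_between(double lo, double hi)
{
  assert(lo < hi);
  static std::mt19937 engine(std::random_device{}());
  return std::uniform_real_distribution<double>(lo, hi)(engine);
}

// Mirrors the body of c::eval: the static local is initialised on the first
// call only, so every later call returns the same value.
inline double fixed_random_value()
{
  static const double val(random_between(-10.0, 10.0));
  return val;
}
```

Calling fixed_random_value() repeatedly yields the same number within one run of the program, while a different run will most likely draw a different one.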


int main()
{
  vita::problem prob;

  // SETTING UP SYMBOLS
  prob.insert<c>();
  prob.insert<vita::real::add>();
  prob.insert<vita::real::sub>();
  prob.insert<vita::real::mul>();

  // ...
}

Note that the base problem class is used instead of the derived src_problem. src_problem offers many ready-to-use features (dataframes for training and validation, evaluator functions for scoring the goodness of a candidate solution...) but problem is more general and adaptable to different tasks (not only symbolic regression).

Besides the terminal c we use the functions add, sub, mul as building blocks (function set).
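
To make the idea of an alphabet concrete, here is a minimal sketch of how one terminal (c) and three functions (add, sub, mul) combine into programs. It uses a plain expression tree, which is a deliberate simplification and not the i_mep representation Vita evolves; node, terminal_c, make_function and eval are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, simplified expression tree (NOT Vita's representation).
struct node
{
  enum class kind {c, add, sub, mul};

  kind k;
  std::unique_ptr<node> lhs, rhs;  // both empty for the terminal
};

using node_ptr = std::unique_ptr<node>;

// Builds a leaf holding the terminal c.
node_ptr terminal_c()
{
  auto n(std::make_unique<node>());
  n->k = node::kind::c;
  return n;
}

// Builds an inner node applying a binary function to two sub-expressions.
node_ptr make_function(node::kind k, node_ptr lhs, node_ptr rhs)
{
  assert(k != node::kind::c && lhs && rhs);
  auto n(std::make_unique<node>());
  n->k = k;
  n->lhs = std::move(lhs);
  n->rhs = std::move(rhs);
  return n;
}

// Recursively evaluates the expression for a given value of c.
double eval(const node &n, double c)
{
  switch (n.k)
  {
  case node::kind::add: return eval(*n.lhs, c) + eval(*n.rhs, c);
  case node::kind::sub: return eval(*n.lhs, c) - eval(*n.rhs, c);
  case node::kind::mul: return eval(*n.lhs, c) * eval(*n.rhs, c);
  default:              return c;  // terminal
  }
}
```

For instance, nesting make_function(node::kind::add, terminal_c(), terminal_c()) inside a mul node with another terminal_c() encodes (c + c) * c. Evolution explores this kind of combination automatically.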


Now what is missing is the evaluator (aka fitness function):

using candidate_solution = vita::i_mep;

// Given an individual (i.e. a candidate solution of the problem), returns a
// score measuring how good it is.
class my_evaluator : public vita::evaluator<candidate_solution>
{
public:
  vita::fitness_t operator()(const candidate_solution &x) override
  {
    const auto ret(vita::run(x));

    const double f(vita::has_value(ret) ? std::get<vita::D_DOUBLE>(ret)
                                        : 0.0);

    const double model_output(b * f);

    const double delta(std::fabs(a - model_output));

    return {-delta};
  }
};

candidate_solution is just an alias for i_mep; i_mep (Multi Expression Programming) is a kind of linear representation used for Genetic Programming.

A line by line description of the evaluation process follows:

const auto ret(vita::run(x));

Runs the candidate_solution and stores its output.

ret is a std::variant (see vita::value_t for further details).

Variants allow efficient manipulation of different data types: here we are working with real numbers but Vita also supports integers and strings.

const double f(vita::has_value(ret) ? std::get<vita::D_DOUBLE>(ret)
                                    : 0.0);

std::get<vita::D_DOUBLE>(ret) extracts the real number from the variant.

The user must check that the variant is not empty (vita::has_value(ret)). This check is required because the evolution process generates many ill-behaved individuals that could blow up for specific input values.
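
The check-then-extract pattern can be tried in isolation with the standard library alone. In the sketch below the value alias, has_value and as_double_or_zero are hypothetical stand-ins: the alternatives actually listed in vita::value_t may differ, and Vita indexes the variant with vita::D_DOUBLE rather than by type.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical value type: an "empty" alternative plus a few payload types.
// This is an assumption for illustration, not the definition of vita::value_t.
using value = std::variant<std::monostate, double, int, std::string>;

// True when the variant holds an actual payload.
inline bool has_value(const value &v)
{
  return !std::holds_alternative<std::monostate>(v);
}

// Same pattern as the evaluator: fall back to 0.0 when nothing was produced.
// std::get throws std::bad_variant_access if another alternative is held, so
// the sketch assumes a non-empty value is always a double.
inline double as_double_or_zero(const value &v)
{
  assert(!has_value(v) || std::holds_alternative<double>(v));
  return has_value(v) ? std::get<double>(v) : 0.0;
}
```

A default-constructed value is empty and maps to 0.0, while value{2.5} yields 2.5.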

const double model_output(b * f);

const double delta(std::fabs(a - model_output));

delta measures the error as the absolute difference between a and the model output. Other norms may give better results, depending on the problem.
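
As an illustration of swapping the norm, the sketch below places the absolute error used above next to a squared error. The function names are hypothetical and nothing here is part of Vita; which measure works better has to be determined experimentally for each problem.

```cpp
#include <cassert>
#include <cmath>

// Absolute error: the measure used by my_evaluator.
inline double abs_error(double target, double output)
{
  return std::fabs(target - output);
}

// Squared error: penalises large deviations more heavily than small ones.
inline double squared_error(double target, double output)
{
  const double d(target - output);
  return d * d;
}

// Negates a non-negative error so that a smaller error gives a greater score.
inline double error_to_fitness(double error)
{
  assert(error >= 0.0);
  return -error;
}
```

Replacing std::fabs(a - model_output) with a squared difference in the evaluator would be enough to try the alternative.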

return {-delta};

The last instruction can be confusing, so let's see some details:

  • -delta instead of delta. Vita uses standardized fitness (greater is better) not raw fitness. See the comments in fitness.h;
  • {-delta} instead of -delta. Fitness is a vector type. Here it's just a one dimensional vector; in general it could have more dimensions (side note, by default evolution uses a lexicographic comparison for fitness).
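
The lexicographic comparison mentioned above can be sketched with std::vector, whose relational operators already compare element by element from the front. The fitness alias and the better function are hypothetical stand-ins and do not reproduce Vita's fitness_t class.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a fitness vector (assumption, not vita::fitness_t).
using fitness = std::vector<double>;

// True when lhs is strictly better than rhs: greater is better and components
// are compared lexicographically (the first differing component decides).
inline bool better(const fitness &lhs, const fitness &rhs)
{
  assert(lhs.size() == rhs.size());
  return lhs > rhs;  // std::vector comparison is lexicographic
}
```

With one dimension, better(fitness{-0.5}, fitness{-2.0}) holds because the smaller error gives the greater fitness. With two dimensions the second component matters only when the first ones are equal.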

https://xkcd.com/534/


All that remains is to put the pieces together:

int main()
{
  // ...

  // AD HOC EVALUATOR
  vita::search<candidate_solution> s(prob);
  s.training_evaluator<my_evaluator>();

  // SEARCHING
  const auto result(s.run());

  std::cout << "\nCANDIDATE SOLUTION\n"
            << vita::out::c_language << result.best.solution
            << "\n\nFITNESS\n" << result.best.score.fitness << '\n';
}

The search object (s) is instructed to use our evaluator (s.training_evaluator<my_evaluator>()) before being launched (s.run()).

(For convenience, the complete code is in the examples/symbolic_regression03.cc file.)

GO ON TO PART 3 →