Evolution strategies - davidar/scholarpedia GitHub Wiki
Evolution Strategies (ESs) are a sub-class of nature-inspired direct search (and optimization) methods belonging to the class of Evolutionary Algorithms (EAs) which use mutation, recombination, and selection applied to a population of individuals containing candidate solutions in order to evolve iteratively better and better solutions. Its roots date back to the mid 1960s when P. Bienert, I. Rechenberg, and H.-P. Schwefel at the Technical University of Berlin, Germany, developed the first bionics-inspired schemes for evolving optimal shapes of minimal drag bodies in a wind tunnel using Darwin's evolution principle.
Evolution Strategies can be applied in all fields of optimization including continuous, discrete, combinatorial search spaces <math>\mathcal{Y}</math> without and with constraints as well as mixed search spaces. Given the optimization problem
- <math>
The canonical versions of the ES are denoted by
- <math>
- <math>
The conceptual algorithm of the <math>(\mu/\rho \; \stackrel{+}{,} \;\lambda)</math>-ES is given below:
<math>(\mu/\rho \; \stackrel{+}{,} \; \lambda)</math>-Self-Adaptation-Evolution-Strategy
- Initialize parent population <math>\mathbf{P}_\mu = \{ \mathbf{a}_1, \ldots, \mathbf{a}_{\mu} \}\ .</math>
- Generate <math>\lambda</math> offspring <math>\tilde{\mathbf{a}}</math> forming the offspring population <math>\tilde{\mathbf{P}}_\lambda = \{ \tilde{\mathbf{a}}_1, \ldots, \tilde{\mathbf{a}}_\lambda\}</math> where each offspring <math>\tilde{\mathbf{a}}</math> is generated by:- Select (randomly) <math>\rho</math> parents from <math>\mathbf{P}_\mu</math> (if <math>\rho = \mu</math> take all parental individuals instead).
- Recombine the <math>\rho</math> selected parents <math>\mathbf{a}</math> to form a recombinant individual <math>\mathbf{r}\ .</math>
- Mutate the strategy parameter set <math>\mathbf{s}</math> of the recombinant <math>\mathbf{r}\ .</math>
- Mutate the objective parameter set <math>\mathbf{y}</math> of the recombinant <math>\mathbf{r}</math> using the mutated strategy parameter set to control the statistical properties of the object parameter mutation.
 
- Select new parent population (using deterministic truncation selection) from either - the offspring population <math>\tilde{\mathbf{P}}_\lambda</math> (this is referred to as comma-selection, usually denoted as "<math>(\mu,\lambda)</math>-selection"), or
- the offspring <math>\tilde{\mathbf{P}}_\lambda</math> and parent <math>\mathbf{P}_\mu</math>population (this is referred to as plus-selection, usually denoted as "<math>(\mu + \lambda)</math>-selection")
 
- Goto 2. until termination criterion fulfilled.
Depending on the search space and objective function <math>f(\mathbf{y})\ ,</math> the recombination and/or the mutation of the strategy parameters may or may not occur in specific instantiations of the algorithm. For example, a <math>(\mu/1 + \lambda)</math>-ES, or equivalently <math>(\mu + \lambda)</math>-ES, does not use recombination. It draws its new <math>\mu</math> parents for the next generation from both the old <math>\mu</math> parents and the <math>\lambda</math> offspring (generated from these parents) by taking the best <math>\mu</math> individuals (with respect to the observed <math>F(\mathbf{y})</math>).
Evolution Strategies of type <math>(\mu/\rho + 1)</math> are also referred to as steady state ESs, i.e., strategies without a generation gap: They produce only one offspring per generation. After evaluating its fitness <math>F(\mathbf{y})\ ,</math> the worst individual is removed from the population. Strategies of this type are especially useful on parallel computers when the times for calculating the individuals' fitnesses are non-constant, thus allowing for asynchronous parallel processing.
As to the termination condition, distance measures in the fitness and or search space as well as the maximum number of generations can be used.
Mutation and recombination operators in ESs are problem-specifically designed. As an example, consider the <math>(\mu/\mu_I, \lambda)</math>-Self-Adaptation-ES for unconstrained real-valued search spaces <math>\mathbb{R}^n\ .</math> An individual is defined as <math>\mathbf{a} = (\mathbf{y}, \sigma, F(\mathbf{y}))\ .</math> The ES generates <math>\lambda</math> offspring according to
- <math>
- <math>
Simple implementations of the <math>(\mu/\mu_I, \lambda)</math>-<math>\sigma</math>-Self-Adaptation-ES for Mathematica and Matlab/Octave can be found here.
While the <math>(\mu/\mu_I,\lambda)</math>-ES in Eq. (I) uses isotropically distributed mutations for the object parameter vector <math>\mathbf{y}\ ,</math> more advanced ESs use covariance matrix adaptation (CMA) techniques (CMA-ESs) to allow for correlated mutations in real-valued search spaces. Furthermore, hierarchically organized ESs (also referred to as Meta-ESs or Nested-ESs) as well as predator-prey ESs are in use. For example, simple Meta-ESs can be defined by
- <math>
The performance of an ES on a specific problem class depends crucially on the design of the ES-operators (mutation, recombination, selection) used and on the manner in which the ES-operators are adapted during the evolution process (adaptation schemes, e.g., <math>\sigma</math>-self-adaptation, covariance matrix adaptation, etc.). Ideally they should be designed in such a manner that they guarantee the evolvability of the system throughout the whole evolution process. Here are some principles and general guide lines:
- Selection is done by population truncation similar to that what breeders are doing when breeding animals or plants.
- Typical truncation ratios <math>\mu/\lambda</math> in strategies with comma-selection in continuous search spaces are in the range from 1/7 to 1/2.
- Using plus-selection (some kind of elitist selection) in conjunction with variation operators that allow for reaching any point in finite discrete search spaces in finite time guarantees stochastic convergence to the global optimizer. However, since this is a result that holds for infinite running time only, one cannot draw general conclusions regarding the finite time behavior of the ESs.
- Using plus-selection is recommended for combinatorial optimization problems.
- Evolution in ESs is usually modeled on the phenotypic level. The problem to be optimized is usually presented in its natural problem representation trying to respect the principle of strong causality. This means that the variation operators (mutation and recombination) should perform search steps in such a manner that small search steps result in small fitness changes and vice versa.
- Variation operators (mutation and recombination) should be designed on the basis of the 
- Mutation is the main source of variation, i.e., an ES always contains a mutation operator. Mutation should respect the reachability condition: Each (finite distant) point of the search space should be reachable in a finite number of steps. Mutations should be scalable (if possible) in order to allow for self-adaptation. The maximum entropy principle suggests the use of normally distributed mutations in <math>\mathbb{R}^n</math> search spaces and geometrically distributed mutations in <math>\mathbb{Z}^n</math> (integer) search spaces.
- Recombination is applied whenever possible and useful. It uses <math>\rho=2</math> or more parental individuals to generate a single recombinant (the case <math>\rho > 2</math> is referred to as multi-recombination). The main goal of recombination is the conservation of common components of the parents, i.e., the transfer of (beneficial) similarities to the next generation and to dampen the effect of malicious components of the parents' genes (genetic repair effect).
Consider an optimization problem <math>\mathbf{y}^* = \mathrm{argopt}_{\mathbf{y}} f(\mathbf{y})</math> where <math>\mathbf{y}</math> is a vector describing the permutation of <math>n</math> components. For example, <math>\mathbf{y} = (1, 3, 9, 2, \ldots)</math> describes the ordering of components, e.g., the ordering of city numbers a salesperson visits consecutively (Traveling Salesperson Problem), or the ordering of jobs in a job shop scheduling problem such that the overall cost <math>f(\mathbf{y})</math> are minimal. In ESs, this optimization problem is usually represented in its natural problem representation, i.e., the variation operators act directly on the ordering <math>\mathbf{y}\ .</math> An individual is defined as <math>\mathbf{a} = (\mathbf{y}, F(\mathbf{y}))\ .</math> The ES generates <math>\lambda</math> offspring according to
- <math>
 
They represent elementary move steps defining certain search neighborhoods (number of states that can be reached by one step). Unlike mutations in continuous search spaces, there exists always a minimal search step (representing the smallest mutation possible). The performance of the different permutation operators depends on the optimization problem to be solved. Trying to ensure the strong causality principle (i.e., small steps should result in small fitness changes) is one of the main design and selection principle for such operators.
CMA-ESs represent the state-of-the-art in evolutionary optimization in real-valued <math>\mathbb{R}^n</math> search spaces. The CMA-ES has been proposed by A. Gawelczyk, N. Hansen, and A. Ostermeier in the mid 1990s. Its basic difference to the <math>(\mu/\mu_I, \lambda)</math>-<math>\sigma</math>-Self-Adaptation-ES example is the shape of mutation distribution which is generated according to a covariance matrix <math>\mathbf{C}</math> which is adapted during evolution. Thus, the mutations can adapt to the local shape of the fitness landscape and convergence to the optimum can be increased considerably. It uses special statistics cumulated over the generations to control endogenous strategy-specific parameters (the covariance matrix <math>\mathbf{C}</math> and the global step size <math>\sigma</math>). This is in contrast to the (mutative) <math>\sigma</math>-self-adaptation approach considered before. A simplified (but well working) instantiation of the offspring update formulas of the <math>(\mu/\mu_I, \lambda)</math>-CMA strategy for small population sizes <math>\lambda</math> (small, compared to the search space dimensionality <math>n</math>) reads
<math>(\mu/\mu_I, \lambda)</math>-CMA-ES
- <math>\mbox{(L1):} \quad
- <math>\mbox{(L2):} \quad
- <math>\mbox{(L3):} \quad
- <math>\mbox{(L4):} \quad
- <math>\mbox{(L5):} \quad
- <math>\mbox{(L6):} \quad
In (L1), <math>\lambda</math> offspring <math>\mathbf{y}_l</math> are generated by transforming standard normally distributed random vectors using a transformation matrix <math>\sqrt{\mathbf{C}}</math> to be obtained, e.g., by Cholesky decomposition of the covariance matrix <math>\mathbf{C}\ ,</math> i.e., <math>\sqrt{\mathbf{C}} \sqrt{\mathbf{C}}^T = \mathbf{C}\ ,</math> and the global step size factor <math>\sigma\ .</math> In (L2) the best <math>\mu</math> mutations are recombined according to Equation (II), thus, forming the recombinant <math>\mathbf{y}</math> (center of mass individual) for the next generation. Algorithmically, there is only one recombinant (to be used as the final solution when the algorithm terminates). Vector <math>\langle \mathbf{w} \rangle</math> connects the recombinants of two consecutive generations, thus, <math>\langle \mathbf{w} \rangle/\sigma</math> represents the tendency of evolution in the search space. This information is cumulated in the <math>\mathbf{s}</math> vector (<math>\mathbf{s} = \mathbf{0}\ ,</math> initially chosen) exponentially decaying with time constant <math>\tau=\sqrt{n}\ .</math> The direction vector <math>\mathbf{s}</math> is used to update (rank one update) the covariance matrix <math>\mathbf{C}</math> (<math>\mathbf{C} = \mathbf{1}\ ,</math> initially chosen) with time constant <math>\tau_{\mathrm{c}} \propto n^2\ .</math> The remaining (L5) and (L6) are used to control the global step size <math>\sigma</math> using the cumulated step size adaptation (CSA) technique with time constant <math>\tau_\sigma = \sqrt{n}</math> (<math>\mathbf{s}_\sigma = \mathbf{0}\ ,</math> initially chosen). The recombinant <math>\langle \mathbf{N}(\mathbf{0}, \mathbf{1}) \rangle</math> is calculated using Equation (II).
Simple implementations of the CMA-ES for Mathematica and Matlab/Octave can be found here.
CMA-ES for large population sizes differ basically in the way how the covariance matrix <math>\mathbf{C}</math> is updated in (L4). To this end, (L4) is extended by a rank <math>\mu</math> update proportional to
- <math>
- Beyer, H.-G. and Schwefel, H.-P. (2002). Evolution Strategies: A Comprehensive Introduction. In Natural Computing, 1(1):3-52.
- Beyer, H.-G. (2001). The Theory of Evolution Strategies. Natural Computing Series. Springer, Berlin 2001.
- Hansen, N. and Ostermeier, A. (2001). Completely Derandomized Self-Adaptation in Evolution Strategies. In Evolutionary Computation, 9(1):159-195.
- Hansen, N. and Müller, S.D. and Koumoutsakos, P. (2003). Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). In Evolutionary Computation, 11(1):1-18.
- Rechenberg, I. (1994). Evolutionsstrategie '94. Frommann-Holzboog Verlag, Stuttgart, (in German).
- Schwefel, H.-P. (1995). Evolution and Optimum Seeking. Wiley, New York, NY.
- Jan A. Sanders (2006) Averaging. Scholarpedia, 1(11):1760.
- Tomasz Downarowicz (2007) . Scholarpedia, 2(11):3901. 
- Rob Schreiber (2007) MATLAB. Scholarpedia, 2(7):2929.
- Frank Hoppensteadt (2006) Predator-prey model. Scholarpedia, 1(10):1563.
- Hans-Georg Beyer's Website
- Evolutionary_Algorithms - Terms and Definitions
- Evolution Strategy Related Website of Bionics Chair at the Technical University Berlin
- Nikolaus Hansen's Website with CMA-ES Related Stuff
- H.-P. Schwefel's Two-Phase Nozzle Evolved by an (1+1)-ES
- Following a Moving Optimum by a (15,100)-ES
- A Collection of Evolution Strategy Demonstrations (to be found below)
- Evolution Strategies in Action: Animations Using MS Windows
Evolutionary Algorithms, Evolutionary Computation, Evolutionary Programming, Evolutionary Robotics, Evolvable Hardware, Evolving Intelligent Systems, Genetic Algorithms, Genetic Programming
Category:Computational Intelligence Category:Evolutionary Computation