bibliography - morinim/vita GitHub Wiki
1
- Multi Expression Programming
- M. Oltean, 2006-06-04, [Online]
Multi Expression Programming (MEP) is a Genetic Programming variant that uses a linear representation of chromosomes. MEP individuals are strings of genes encoding complex computer programs. When MEP individuals encode expressions, their representation is similar to the way in which compilers translate C or Pascal expressions into machine code. A unique MEP feature is the ability to store multiple solutions of a problem in a single chromosome. Usually, the best solution is chosen for fitness assignment. When solving symbolic regression or classification problems (or any other problems for which the training set is known before the problem is solved) MEP has the same complexity as other techniques storing a single solution in a chromosome (such as GP, CGP, GEP or GE). Evaluation of the expressions encoded in a MEP individual can be performed by a single parsing of the chromosome. Offspring obtained by crossover and mutation are always syntactically correct MEP individuals (computer programs). Thus, no extra processing for repairing newly obtained individuals is needed.
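The single-parse evaluation the abstract describes can be sketched in a few lines. This is a minimal, hypothetical encoding (not Oltean's exact one, nor vita's): a gene is either a terminal or an operator whose arguments reference earlier genes only, so one left-to-right pass values every gene, i.e. every candidate solution in the chromosome.

```python
import operator

# Hypothetical gene format: ('var', name) for a terminal, or
# (op, i, j) where i and j index EARLIER genes only.
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate_chromosome(chromosome, inputs):
    """One left-to-right parse yields the value of EVERY gene, i.e. all
    the expressions (candidate solutions) stored in the chromosome."""
    values = []
    for gene in chromosome:
        if gene[0] == 'var':                     # terminal gene
            values.append(inputs[gene[1]])
        else:                                    # operator gene
            op, i, j = gene
            values.append(OPS[op](values[i], values[j]))
    return values

# Gene 3 encodes x*x + y; genes 0-2 encode x, y and x*x.
chromo = [('var', 'x'), ('var', 'y'), ('*', 0, 0), ('+', 2, 1)]
print(evaluate_chromosome(chromo, {'x': 3, 'y': 4}))   # [3, 4, 9, 13]
```

For fitness assignment one would typically keep the gene whose value best matches the training target, which is what makes the multiple-solutions-per-chromosome feature free of extra cost.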
2
- Steady-State ALPS for Real-Valued Problems
- G.S. Hornby, 2009, Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pp. 795-802
- DOI: https://doi.org/10.1145/1569901.1570011
The objectives of this paper are to describe a steady-state version of the Age-Layered Population Structure (ALPS) Evolutionary Algorithm (EA) and to compare it against other GAs on real-valued problems. Motivation for this work comes from our previous success in demonstrating that a generational version of ALPS greatly improves search performance on a Genetic Programming problem. In making steady-state ALPS, some modifications were made to the method for calculating age and the method for moving individuals up age layers. To demonstrate that ALPS works well on real-valued problems we compare it against CMA-ES and Differential Evolution (DE) on five challenging, real-valued functions and on one real-world problem. While CMA-ES and DE outperform ALPS on the two unimodal test functions, ALPS is much better on the three multimodal test problems and on the real-world problem. Further examination shows that, unlike the other GAs, ALPS maintains a genotypically diverse population throughout the entire search process. These findings strongly suggest that the ALPS paradigm is better able to avoid premature convergence than the other GAs.
3
- Genetic Programming with Adaptive Representations
- J.P. Rosca and D.H. Ballard, 1994, Technical Report University of Rochester, Rochester, NY, USA
Machine learning aims at the acquisition of knowledge based either on experience from interaction with the external environment or on analysis of internal problem-solving traces. Both approaches can be implemented in the Genetic Programming (GP) paradigm. Hillis [1990] shows in an ingenious way how the first approach can work. There have not been any significant tests to prove that GP can take advantage of its own search traces. This paper presents an approach to automatic discovery of functions in GP based on the ideas of discovering useful building blocks by analyzing the evolution trace, generalizing blocks to define new functions and, finally, adapting the problem representation on the fly. Adaptation of the representation determines a hierarchical organization of the extended function set which enables a restructuring of the search space so that solutions can be found more easily. Complexity measures of solution trees are defined for an adaptive representation framework and empirical results are presented.
4
- Greedy Recombination and Genetic Search on the Space of Computer Programs
- W.A. Tackett, 1994, Proceedings of the Third Workshop on Foundations of Genetic Algorithms. Estes Park, Colorado, USA, July 31 - August 2 1994
- DOI: https://doi.org/10.1016/B978-1-55860-356-1.50017-0
Many natural organisms overproduce zygotes and subsequently decimate the ranks of offspring at some later stage of development. The basic purpose of this behavior is the reduction of parental resource investment in offspring which are less fit than others according to some metabolically cheap fitness measure. An important insight into this process is that for single-pair matings all offspring are products of the same parental genotypes: the selection taking place therefore seeks the most fit recombination of parental traits. This paper presents the Greedy Recombination operator RB(n) for genetic programming, which performs greedy selection among potential crossover sites of a mating pair. The properties of RB(n) are described both from a statistical standpoint and in terms of their effect upon search; comparisons are drawn to existing methods. We formulate the class of constructional problems, which allow precise control over the fitness structure in the space of expressions being searched. The constructional approach is used to create simple GP analogies of the “Royal Road” problems used to study classical GA. The effects of RB(n) upon search properties, fitness distributions, and genotypic variations are examined and contrasted with effects of selection and recombination methods.
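The overproduce-and-cull idea behind RB(n) can be sketched very compactly. This is a deliberately simplified illustration, assuming linear genomes and an explicit list of candidate crossover sites (the real operator works on program trees and samples n site pairs); only the brood-then-greedy-selection structure is the point:

```python
def rb_crossover(p1, p2, sites, cheap_fitness):
    """Produce a brood of offspring from the SAME parent pair (one per
    candidate crossover site), then greedily keep only the offspring
    that wins under an inexpensive fitness measure."""
    brood = [p1[:s] + p2[s:] for s in sites]
    return max(brood, key=cheap_fitness)

# With sum() as the cheap measure, the site keeping most 1-genes wins.
print(rb_crossover([0, 0, 0, 0], [1, 1, 1, 1], [1, 2, 3], sum))  # [0, 1, 1, 1]
```

The "metabolically cheap" measure need not be the full fitness function: any fast proxy keeps the culling inexpensive, which is exactly the parental-investment argument in the abstract.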
5
- Trivial Geography in Genetic Programming
- L. Spector and J. Klein, 2006, Genetic Programming Theory and Practice III. Genetic Programming, vol 9. Springer, Boston, MA
- DOI: https://doi.org/10.1007/0-387-28111-8_8
Geographical distribution is widely held to be a major determinant of evolutionary dynamics. Correspondingly, genetic programming theorists and practitioners have long developed, used, and studied systems in which populations are structured in quasi-geographical ways. Here we show that a remarkably simple version of this idea produces surprisingly dramatic improvements in problem-solving performance on a suite of test problems. The scheme is trivial to implement, in some cases involving little more than the addition of a modulus operation in the population access function, and yet it provides significant benefits on all of our test problems (ten symbolic regression problems and a quantum computing problem). We recommend the broader adoption of this form of “trivial geography” in genetic programming systems.
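"Little more than a modulus operation" really is the whole scheme. A minimal sketch, assuming tournament selection and a maximization-style fitness (the radius value and tournament size below are illustrative, not the paper's settings): parents for the individual at a given slot are drawn only from a small window of nearby slots on a one-dimensional ring.

```python
import random

def tournament_local(population, fitness, slot, radius, rounds=2):
    """Tournament selection restricted to a small neighbourhood 'above'
    `slot` on a one-dimensional ring; the modulus in the index is the
    entire 'geography'."""
    n = len(population)
    best = None
    for _ in range(rounds):
        k = (slot + random.randrange(radius)) % n     # local pick
        if best is None or fitness[k] > fitness[best]:
            best = k
    return population[best]
```

The offspring produced from such locally-chosen parents would then replace the occupant of `slot`, so good genetic material spreads gradually around the ring instead of sweeping the whole population at once.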
6
- Adapting Crossover in Evolutionary Algorithms
- W.M. Spears, 1995, Proceedings of the Fourth Annual Conference on Evolutionary Programming, MIT Press, pp. 367-384
One of the issues in evolutionary algorithms (EAs) is the relative importance of two search operators: mutation and crossover. Genetic algorithms (GAs) and genetic programming (GP) stress the role of crossover, while evolutionary programming (EP) and evolution strategies (ESs) stress the role of mutation. The existence of many different forms of crossover further complicates the issue. Despite theoretical analysis, it appears difficult to decide a priori which form of crossover to use, or even if crossover should be used at all. One possible solution to this difficulty is to have the EA be self-adaptive, i.e., to have the EA dynamically modify which forms of crossover to use and how often to use them, as it solves a problem. This paper describes an adaptive mechanism for controlling the use of crossover in an EA and explores the behavior of this mechanism in a number of different situations. An improvement to the adaptive mechanism is then presented. Surprisingly this improvement can also be used to enhance performance in a non-adaptive EA.
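One way to read "self-adaptive" here is Spears' 1-bit adaptation: each individual carries one extra tag gene naming the crossover form that produced it, and the tag is inherited like any other gene. The sketch below assumes bit-string genomes and a uniform-vs-two-point choice; details such as how the tag is perturbed are omitted.

```python
import random

def adaptive_crossover(p1, p2):
    """1-bit self-adaptation sketch: the last gene of each parent is a
    tag choosing the operator (1 = uniform, 0 = two-point); the child
    inherits a parent's tag, so operator usage itself evolves."""
    tag = random.choice((p1[-1], p2[-1]))
    a, b = p1[:-1], p2[:-1]
    if tag == 1:                                   # uniform crossover
        child = [random.choice(pair) for pair in zip(a, b)]
    else:                                          # two-point crossover
        c1, c2 = sorted(random.sample(range(1, len(a)), 2))
        child = a[:c1] + b[c1:c2] + a[c2:]
    return child + [tag]
```

Because the tag is selected along with the rest of the genome, operator forms that produce fitter children come to dominate the population without any externally tuned crossover rate per form.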
7
- Dynamic training subset selection for supervised learning in Genetic Programming
- C. Gathercole and P. Ross, 1994, Proceedings of the International Conference on Evolutionary Computation. The Third Conference on Parallel Problem Solving from Nature: Parallel Problem Solving from Nature (PPSN III), Yuval Davidor, Hans-Paul Schwefel and Reinhard Männer (Eds.). Springer-Verlag, London, UK, pp. 312-321
When using the Genetic Programming (GP) Algorithm on a difficult problem with a large set of training cases, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes how to reduce the number of such evaluations by selecting a small subset of the training data set on which to actually carry out the GP algorithm. Three subset selection methods described in the paper are:
- Dynamic Subset Selection (DSS), using the current GP run to select difficult and/or disused cases,
- Historical Subset Selection (HSS), using previous GP runs,
- Random Subset Selection (RSS).
Various runs have shown that GP+DSS can produce better results in less than 20% of the time taken by GP. GP+HSS can nearly match the results of GP, and, perhaps surprisingly, GP+RSS can occasionally approach the results of GP. GP+DSS also produced better, more general results than those reported in a paper for a variety of Neural Networks when used on a substantial problem, known as the Thyroid problem.
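The DSS idea above can be sketched as weighted sampling: a case's selection weight grows with how often programs get it wrong (difficulty) and how long since it was last in the subset (age). The exponents and the sampling-without-duplicates loop below are illustrative assumptions, not the paper's exact formula.

```python
import random

def select_subset(n_cases, difficulty, age, subset_size,
                  d_exp=1.0, a_exp=1.0):
    """DSS sketch: weight each training case by difficulty and age, then
    sample a subset favouring hard and/or disused cases.
    Assumes subset_size < n_cases so the loop terminates."""
    weights = [difficulty[i] ** d_exp + age[i] ** a_exp
               for i in range(n_cases)]
    chosen = set()
    while len(chosen) < subset_size:
        chosen.add(random.choices(range(n_cases), weights=weights)[0])
    return sorted(chosen)
```

After each generation one would reset the age of the selected cases, increment the rest, and update difficulties from how many programs misclassified each case, so the subset keeps drifting toward the current frontier of hard cases.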
8
- An Efficient Constraint Handling Method for Genetic Algorithms
- Kalyanmoy Deb, 2000, Computer Methods in Applied Mechanics and Engineering, vol. 186, issue 2-4, pp. 311-338
- DOI: https://doi.org/10.1016/S0045-7825(99)00389-8
Many real-world search and optimization problems involve inequality and/or equality constraints and are thus posed as constrained optimization problems. In trying to solve constrained optimization problems using genetic algorithms (GAs) or classical optimization methods, penalty function methods have been the most popular approach, because of their simplicity and ease of implementation. However, since the penalty function approach is generic and applicable to any type of constraint (linear or nonlinear), their performance is not always satisfactory. Thus, researchers have developed sophisticated penalty functions specific to the problem at hand and the search algorithm used for optimization. However, the most difficult aspect of the penalty function approach is to find appropriate penalty parameters needed to guide the search towards the constrained optimum. In this paper, GA's population-based approach and ability to make pair-wise comparison in tournament selection operator are exploited to devise a penalty function approach that does not require any penalty parameter. Careful comparisons among feasible and infeasible solutions are made so as to provide a search direction towards the feasible region. Once sufficient feasible solutions are found, a niching method (along with a controlled mutation operator) is used to maintain diversity among feasible solutions. This allows a real-parameter GA's crossover operator to continuously find better feasible solutions, gradually leading the search near the true optimum solution. GAs with this constraint handling approach have been tested on nine problems commonly used in the literature, including an engineering design problem. In all cases, the proposed approach has been able to repeatedly find solutions closer to the true optimum solution than that reported earlier.
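The penalty-parameter-free comparison at the heart of the method fits in a single function. A minimal sketch for minimisation, representing a solution as a pair (objective value, total constraint violation):

```python
def better(sol_a, sol_b):
    """Parameter-free tournament comparison (minimisation). A solution
    is a pair (objective value, total constraint violation >= 0):
      1. a feasible solution beats an infeasible one;
      2. of two feasible solutions, the lower objective wins;
      3. of two infeasible solutions, the smaller violation wins."""
    fa, va = sol_a
    fb, vb = sol_b
    if va == 0 and vb == 0:
        return sol_a if fa <= fb else sol_b      # rule 2
    if va == 0 or vb == 0:
        return sol_a if va == 0 else sol_b       # rule 1
    return sol_a if va <= vb else sol_b          # rule 3

print(better((5.0, 0.0), (1.0, 3.0)))   # feasible wins: (5.0, 0.0)
```

Note that infeasible solutions are never compared by objective value at all, which is what removes the need for a penalty parameter balancing objective against violation.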
9
- Genetic Programming - An Introduction
- Wolfgang Banzhaf, Peter Nordin, Robert E. Keller and Frank D. Francone, 1998, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
- ISBN: 1-55860-510-X
Since the early 1990s, genetic programming (GP)-a discipline whose goal is to enable the automatic generation of computer programs-has emerged as one of the most promising paradigms for fast, productive software development. GP combines biological metaphors gleaned from Darwin's theory of evolution with computer-science approaches drawn from the field of machine learning to create programs that are capable of adapting or recreating themselves for open-ended tasks. This unique introduction to GP provides a detailed overview of the subject and its antecedents, with extensive references to the published and online literature. In addition to explaining the fundamental theory and important algorithms, the text includes practical discussions covering a wealth of potential applications and real-world implementation techniques. Software professionals needing to understand and apply GP concepts will find this book an invaluable practical and theoretical guide.
10
- Differential Evolution: A simple evolution strategy for fast optimization
- Kenneth Price and Rainer Storn, 1997, Dr. Dobb's Journal #264, vol. 22, n. 4
- ISSN: 1044-789X
A new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented. By means of an extensive testbed, which includes the De Jong functions, it is demonstrated that the new method converges faster and with more certainty than both Adaptive Simulated Annealing and the Annealed Nelder & Mead approach. The new method requires few control variables, is robust, easy to use, and lends itself very well to parallel computation.
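The method's core is the classic DE/rand/1/bin step: perturb one random population vector by a scaled difference of two others, then mix with the target vector. A minimal sketch (the control-variable names F and CR follow Price and Storn; the default values are illustrative):

```python
import random

def de_trial(pop, i, f=0.8, cr=0.9):
    """Build the trial vector for target i with DE/rand/1/bin:
    mutant v = a + F*(b - c) over three distinct random donors,
    then binomial crossover with the target vector pop[i]."""
    a, b, c = random.sample([k for k in range(len(pop)) if k != i], 3)
    dim = len(pop[i])
    j_rand = random.randrange(dim)       # at least one mutated component
    return [pop[a][j] + f * (pop[b][j] - pop[c][j])
            if j == j_rand or random.random() < cr else pop[i][j]
            for j in range(dim)]
```

In the full algorithm the trial vector replaces `pop[i]` only if it is not worse, which gives DE its greedy yet population-diverse character.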
11
- Artificial Intelligence: A Modern Approach
- Stuart J. Russell, Peter Norvig, Prentice Hall, 2009 (3rd edition)
- ISBN: 0-13-604259-7
Artificial Intelligence: A Modern Approach (AIMA) is a university textbook on artificial intelligence, written by Stuart J. Russell and Peter Norvig. It was first published in 1995 and the third edition of the book was released 11 December 2009. It is used in over 1350 universities worldwide and has been called the most popular artificial intelligence textbook in the world. It is considered the standard text in the field of artificial intelligence.
The book is intended for an undergraduate audience but can also be used for graduate-level studies with the suggestion of adding some of the primary sources listed in the extensive bibliography.
12
- Multiclass Object Classification Using Genetic Programming
- Mengjie Zhang, Will Smart, 2004, Victoria University of Wellington, Technical Report CS-TR-04/2
- DOI: https://doi.org/10.1007/978-3-540-24653-4_38
We describe an approach to the use of genetic programming for multi-class object classification problems. Rather than using fixed static thresholds as boundaries to distinguish between different classes, this approach introduces two methods of classification where the boundaries between different classes can be dynamically determined during the evolutionary process. The two methods are centred dynamic class boundary determination and slotted dynamic class boundary determination. The two methods are tested on four object classification problems of increasing difficulty and are compared with the commonly used static class boundary method. The results suggest that, while the static class boundary method works well on relatively easy object classification problems, the two dynamic class boundary determination methods outperform the static method for more difficult, multiple class object classification problems.
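The centred variant can be sketched directly: instead of fixed thresholds, each class gets a centre computed from the evolved program's outputs on that class's training examples, and new examples go to the nearest centre. This is a minimal sketch of the idea, not the paper's full procedure:

```python
def class_centres(outputs, labels, n_classes):
    """Centred dynamic boundaries: the centre of each class is simply
    the mean program output over that class's training examples."""
    sums, counts = [0.0] * n_classes, [0] * n_classes
    for o, c in zip(outputs, labels):
        sums[c] += o
        counts[c] += 1
    return [s / max(1, n) for s, n in zip(sums, counts)]

def nearest_centre(output, centres):
    """An example is assigned to the class whose centre is closest
    to the program's output for that example."""
    return min(range(len(centres)), key=lambda c: abs(output - centres[c]))
```

Because the centres are recomputed from each program's own outputs, the class boundaries move with the program during evolution rather than being imposed statically.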
13
- Using Gaussian distribution to construct fitness functions in genetic programming for multiclass object classification
- Mengjie Zhang, Will Smart, 2006, Victoria University of Wellington, Technical Report CS-TR-05/5
- DOI: https://doi.org/10.1016/j.patrec.2005.07.024
This paper describes a new approach to the use of Gaussian distribution in genetic programming (GP) for multiclass object classification problems. Instead of using predefined multiple thresholds to form different regions in the program output space for different classes, this approach uses probabilities of different classes, derived from Gaussian distributions, to construct the fitness function for classification. Two fitness measures, overlap area and weighted distribution distance, have been developed. Rather than using the best evolved program in a population, this approach uses multiple programs and a voting strategy to perform classification. The approach is examined on three multiclass object classification problems of increasing difficulty and compared with a basic GP approach. The results suggest that the new approach is more effective and more efficient than the basic GP approach. Although developed for object classification, this approach is expected to be able to be applied to other classification problems.
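The probability side of the approach can be sketched as follows: fit one Gaussian per class to the program's outputs on that class's training examples, then classify by highest density. The fitness measures themselves (overlap area, weighted distribution distance) are not reproduced here; this sketch only shows the per-class Gaussian modelling the paper builds on, with an assumed floor on the standard deviation for degenerate classes.

```python
import math
from statistics import mean, pstdev

def fit_class_gaussians(outputs, labels, n_classes):
    """Model each class by the mean and standard deviation of the
    program outputs observed on its training examples."""
    params = []
    for c in range(n_classes):
        xs = [o for o, l in zip(outputs, labels) if l == c]
        params.append((mean(xs), pstdev(xs) or 1e-9))   # avoid sd == 0
    return params

def most_probable_class(output, params):
    """Class whose Gaussian assigns the output the highest density."""
    def pdf(x, mu, sd):
        return (math.exp(-0.5 * ((x - mu) / sd) ** 2)
                / (sd * math.sqrt(2 * math.pi)))
    return max(range(len(params)), key=lambda c: pdf(output, *params[c]))
```

A fitness measure like overlap area then rewards programs whose per-class Gaussians are well separated, so good classification falls out of the distributions rather than from hand-tuned thresholds.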
14
- Strongly Typed Genetic Programming
- David J. Montana, 1995, Evolutionary Computation, vol. 3, n. 2
- DOI: https://doi.org/10.1162/evco.1995.3.2.199
Genetic programming is a powerful method for automatically generating computer programs via the process of natural selection (Koza, 1992). However, in its standard form, there is no way to restrict the programs it generates to those where the functions operate on appropriate data types. In the case when the programs manipulate multiple data types and contain functions designed to operate on particular data types, this can lead to unnecessarily large search times and/or unnecessarily poor generalisation performance. Strongly typed genetic programming (STGP) is an enhanced version of genetic programming that enforces data-type constraints and whose use of generic functions and generic data types makes it more powerful than other approaches to type-constraint enforcement. After describing its operation, we illustrate its use on problems in two domains, matrix/vector manipulation and list manipulation, which require its generality. The examples are (1) the multidimensional least-squares regression problem, (2) the multidimensional Kalman filter, (3) the list manipulation function NTH, and (4) the list manipulation function MAPCAR.
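The type-constraint enforcement can be illustrated with tree generation: every primitive declares a return type and argument types, and a node is legal only where its return type matches what the parent expects. The primitive set and the grow probability below are made-up examples, not STGP's, but every tree the sketch produces is well typed by construction:

```python
import random

# Hypothetical typed primitive set: name -> (return type, argument types).
FUNCTIONS = {
    'dot':   ('scalar', ('vector', 'vector')),
    'scale': ('vector', ('scalar', 'vector')),
    'add':   ('scalar', ('scalar', 'scalar')),
}
TERMINALS = {'x': 'scalar', 'v': 'vector', 'w': 'vector'}

def random_tree(wanted_type, depth):
    """Grow a tree whose root returns `wanted_type`; only primitives
    with a matching return type are candidates at each node, so the
    whole tree is well typed by construction."""
    funcs = [f for f, (ret, _) in FUNCTIONS.items() if ret == wanted_type]
    terms = [t for t, typ in TERMINALS.items() if typ == wanted_type]
    if depth == 0 or not funcs or random.random() < 0.3:
        return random.choice(terms)
    f = random.choice(funcs)
    return (f,) + tuple(random_tree(a, depth - 1) for a in FUNCTIONS[f][1])
```

Crossover and mutation obey the same rule (a subtree may only be swapped in where its return type matches), which is what prunes the ill-typed part of the search space STGP is designed to avoid.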
15
- Discovery of Subroutines in Genetic Programming
- Justinian P. Rosca, Dana H. Ballard, 1996, Advances in genetic programming, volume 2, pages 177–201
- DOI: https://doi.org/10.7551/mitpress/1109.003.0014
A fundamental problem in learning from observation and interaction with an environment is defining a good representation, that is a representation which captures the underlying structure and functionality of the domain. This chapter discusses an extension of the genetic programming (GP) paradigm based on the idea that subroutines obtained from blocks of good representations act as building blocks and may enable a faster evolution of even better representations. This GP extension algorithm is called adaptive representation through learning (ARL). It has built-in mechanisms for (1) creation of new subroutines through discovery and generalization of blocks of code; (2) deletion of subroutines. The set of evolved subroutines extracts common knowledge emerging during the evolutionary process and acquires the necessary structure for solving the problem. ARL was successfully tested on the problem of controlling an agent in a dynamic and non-deterministic environment. Results with the automatic discovery of subroutines show the potential to better scale up the GP technique to complex problems.
16
- Evolving Teams of Predictors with Linear Genetic Programming
- Markus Brameier, Wolfgang Banzhaf, 2001, Genetic Programming and Evolvable Machines, volume 2, pages 381–407
- DOI: https://doi.org/10.1023/A:1012978805372
This paper applies the evolution of GP teams to different classification and regression problems and compares different methods for combining the outputs of the team programs. These include hybrid approaches where (1) a neural network is used to optimize the weights of programs in a team for a common decision and (2) a real-numbered vector (the representation of evolution strategies) of weights is evolved with each team in parallel. The cooperative team approach results in an improved training and generalization performance compared to the standard GP method. The higher computational overhead of team evolution is counteracted by using a fast variant of linear GP.
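The weighted combination of team outputs can be sketched in a few lines for the classification case. A minimal illustration, assuming team members are callables returning a class label and that the weight vector is evolved elsewhere:

```python
def team_predict(team, weights, x):
    """Combine member outputs by weighted vote: each program casts its
    class vote with an evolved weight; largest summed weight wins."""
    votes = {}
    for program, w in zip(team, weights):
        c = program(x)
        votes[c] = votes.get(c, 0.0) + w
    return max(votes, key=votes.get)

# Toy team of three 'programs' (plain lambdas here) voting on a sample:
team = [lambda x: 0, lambda x: 1, lambda x: 1]
print(team_predict(team, [0.9, 0.3, 0.3], None))   # 0: the heavy vote wins
```

For regression the same structure becomes a weighted sum of member outputs; in either case the weights are what the neural network or evolution-strategy vector of the paper optimizes.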
17
- Recombination, selection, and the genetic construction of computer programs
- Walter Tackett, 1994, Ph.D. thesis
- DOI: https://doi.org/10.13140/RG.2.2.36282.18880
Computational intelligence seeks, as a basic goal, to create artificial systems which mimic aspects of biological adaptation, behavior, perception, and reasoning. Toward that goal, genetic program induction ("Genetic Programming") has succeeded in automating an activity traditionally considered to be the realm of creative human endeavor. It has been applied successfully to the creation of computer programs which solve a diverse set of model problems. This naturally leads to questions such as: Why does it work? How does it fundamentally differ from existing methods? What can it do that existing methods cannot? The research described here seeks to answer those questions. This is done on several fronts. Analysis is performed which shows that Genetic Programming has a great deal in common with heuristic search, long studied in the field of Artificial Intelligence. It introduces a novel aspect to that method in the form of the recombination operator which generates successors by combining parts of favorable strategies. On another track, we show that Genetic Programming is a powerful tool which is suitable for real-world problems. This is done first by applying it to an extremely difficult induction problem and measuring performance against other state-of-the-art methods. We continue by formulating a model induction problem which not only captures the pathologies of the real world, but also parameterizes them so that variation in performance can be measured as a function of confounding factors. At the same time, we study how the effects of search can be varied through the effects of the selection operator. Combining the lessons of the search analysis with known properties of biological systems leads to the formulation of a new recombination operator which is shown to improve induction performance. In support of the analysis of selection and recombination, we define problems in which structure is precisely controlled. These allow fine discrimination of search performance which helps to validate analytic predictions. Finally, we address a truly unique aspect of Genetic Programming, namely the exploitation of symbolic procedural knowledge in order to provide "explanations" from genetic programs.