Zelinka Ivan
Tomas Bata University Zlin, Faculty of Technology Institute of Information
Technologies
Mostní 5139, 760 01 Zlín, Czech Republic, Phone: +420 577 603 102, E-Mail:
zelinka@ft.utb.cz
Keywords: analytic programming, genetic
programming, grammar evolution, evolution algorithms, symbolic
regression
Term "symbolic regression" (SR) represents process during which
are measured data fitted by suitable mathematical formula like
"
",
, etc. This process is amongst
mathematician quite well known and used when some data of unknown
process are obtained. For long time SR was domain only of humans
but for a few last decades it is also domain of computers. Idea
how to solve various problems by SR by means of evolutionary
algorithms (EAs), come from John Koza who used genetic algorithm
(GA) in so called genetic programming (GP) [1], [2]. Genetic
programming is basically symbolic regression which is done by
evolutionary algorithms instead of by humans. Ability to solve
really hard problems were proved during time many times and GP is
today on such level of performance that it is able to synthesise
(for example) extremely sophisticated electronic circuits [3]. It
is clear that importance of symbolic regression will increase due
to increasing complexity of solved problems in science and
industry.
Main attribute of symbolic regression is that it is
executed by evolutionary algorithms and whole evolutionary process
is working with so called functional set and terminal set.
Functional set is a set of all user defined or used functions
while terminal set is a set of all constant or variables.
In this participation are discussed three different
methods for symbolic regression - genetic programming, grammar
evolution (GE) and so called analytic programming (AP), which is
novelty tool for symbolic regression, based on different
principles in regard of GP or GE. Higher attention will be
dedicated to AP because of its novelty.
Symbolic regression is in fact based on existence of so called
evolutionary algorithms. This class of algorithms is based on
Darwinian theory of evolution and one of its main attributes is
that there is no calculated only one solution, but a class of
possible solutions at once. This class of possible and acceptable
solutions is called "population". Members of this populations are
called "individuals" and mathematically said, they represent
possible solution, i.e. solution which can be realised in real
world application. Main aim of evolutionary algorithms is to find
during evolutionary process the best solution of all. Evolutionary
algorithms differ among themselves in many points of view like for
example individual representation (binary, decimal) or offspring
creation (standard crossover, aritmetic operations, vector
operations, etc...). They also differ in philosophical
background on which they were developed and usually they are named
according to this point of view.
Symbolic regression is based on evolutionary algorithms
and its main aim is to "synthetise" in an evolutionary way such
"program" (mathematical formulas, computer programs, logical
expresssions, etc...) which will solve user defined problem as
well as possible. While domain of EAs is of numerical nature
(real, complex, integer, discrete), domain of symbolic regression
is of functional nature, i.e. it consist of function set like
(sin(), cos(), gamma(), MyFunction(),...) and so called
terminal set (t, x, p, ...). From mix of both sets is then
synthetised final program, which can be quite complicated in point
of view of its structure.
In the novadays there are three methods allowing to do
that: genetic programming, grammar evolution and analytic
programming.
Genetic programming [1], [2] is oldiest method of automatic
programme creation by means of genetic algorithm, which was
developed by american informatic J. R. Koza (see also
www.genetic-programming.org). This method is based on computer
language Lisp which is able to manipulate with symbolic
expressions. During existence of GP there was done by GP numerous
examples like data fitting, logical expressions synthesis, robot
trajectory optimisation, synthesis of a programme for artificial
ant movement, system identification, etc. Main principle of GP is
such that programs are represented in chromosomes like syntactic
trees. Based on GA principles, trees are cutted by crossover
operation and cutted parts are exchanged between themselves.
In this way there are created new individuals
(offsprings), whose represents in fact a new programs which are
evaluated by fitneess as is common in EAs. During creation of new
individual-program there are also applied other operators like
mutation etc.
Genetic programming also employ a new techniques like
automatically defined functions (ADF), which can be specified like
automatically defined iterations (ADI), automatically defined
loops (ADL), automatically defined recursions (ADR), etc. Term
"automatic" means that some functions created during evolution are
automatically included into basic set of functions.
Grammar evolution is the second method for symbolic regression
which basically stems from GP. According to the author (C. Ryan,
www.gramatical-evolution.org) GE is symbolic regression, which can
be done by arbitrarry programme language like Lisp, C++, Java,
XML, Perl, Fortran, etc. In contrary with GP, grammar evolution is
basically its extension which does not use direct symbolic
representation in Lisp, but use so called Backus-Naur form (BNF).
Based on BNF is GE able to do symbolic regression in above
mentiones computer languages.
During existence of GE there were done some comparative
simulations based on problems solved by Koza in GP. For full texts
please see www.gramatical-evolution.org.
Analytic programming (I. Zelinka, www.ft.utb.cz/people/zelinka/soma)
was inspired by two existing methods: by Hilbert spaces and by GP. Principles
and general philosophy of analytic programming (AP) stem from these two methods.
Into AP an idea about evolutionary creation of symbolic solutions is taken from
GP while from Hilbert spaces idea of functional spaces and building of resulting
function by means of searching process is adopted into AP. This process is usually
done by numerical methods like Ritz or Galerkin methods are [4]. Core of AP
is based on set of functions, operators and so-called terminals, which are usually
constants or independent variables as well as in GP and GE. Main aim of AP is
to synthesise suitable program (mathematical formulas, for example see results
of AP (1)-(4)) which would fit measured data as well as possible. For this reason
a discrete set handling idea was adopted into AP.
Discrete set handling (DSH) was proposed in [5], [6]. In analytic programming
DSH is used to create integer index, which is used in evolutionary process like
alternate individual, handled in EA by method of integer handling [5], [6].
An individual is created automatically in population like integer individual
whose parameters range cannot exceed the cardinality of used set of functions
and terminals.
 |
|
|
(1) |
 |
|
|
(2) |
 |
|
|
|
 |
|
|
(3) |
Today AP exists in three versions. All three versions
need for program synthesis the same sets of functions, terminals,
etc as Koza use in GP [1], [2]. The first basic version called
is basic version of AP. It use constants from
terminal set to synthetise programs in the same way like Koza [1].
The second one -
is modification in the sense of
constant estimation. In comparing with
in
only one nonspecified constant "K" is generated instead of
randomly generated constants. Constant "K" is after synthesising
indexed so that
,
, ...,
, are in formula
obtained and then all
are estimated by different or by the
same evolutionary method. Because EA "works under" EA (i.e.
program
K indexing
estimation of
) then this version is
called AP with metaevolution -
. Last modification is
which is based on
is such that instead of
other EA is K estimated by suitable non-linear fitting
method.
To verify that AP is viable there were done simulations
- experiments that were for each case many times repeated, see
[7], [8] and [9]. All simulations, especially the last one
comparative, has showed that AP is able to solve the same problems
like GP or GE at the same level of quality.
The method of symbolic regression described here are of three
kind. The first one, genetic programming, is the oldiest one and
can be used only by genetic algorithms and Lisp programme
language. The second one, grammar evolution is "unfolding" of a
genetic programming so that instead of Lisp there can be used
different computer programme languages. Hovewer, grammar evolution
still use binary representation of individuals and crossover
operations like genetic algorithms use. The third method, called
analytic programming, is independent on programme language and can
be used by any evolutionary algorithm, does not matter how
new
offspring are calculated.
It can be stated that all three algorithms can be used
for symbolic regression tasks and their hierarchy is such that on
the lowest level is genetic programming (due to the ability to be
used only by Lisp and genetic algorithm), on the higher level is
grammar evolution (due to the possibility use different computer
languages) and on the highest level is analytic programming (due
to ability to be used not only different computer languages but
also by any evolutionary algorithm like differential
evolution
(DE), simulated annealing (SA),...).
For complete information about all three methods it is
recommended to see literature at the end or visit homepages of all
three methods, i.e. for genetic programming:
www.genetic-programming.org, for grammar evolution:
www.gramatical-evolution.org and for analytic programming:
www.ft.utb.cz/people/zelinka/soma .
- 1
- Koza J.R. Genetic Programming, MIT Press, ISBN 0-262-11189-6, 1998
- 2
- Koza J.R., Bennet F.H., Andre D., Keane M.,
Genetic Programming III, Morgan Kaufnamm pub.,
ISBN 1-55860-543-6, 1999
- 3
- J. R. Koza, M. A. Keane, M. J. Streeter, Evolving Inventions,
ScientificAmerican, February 2003, p. 40-47, ISSN 0036-8733, see
also www.sciam.com
- 4
- Rektorys, Karel, Variational methods in Engineering
Problems and Problems of Mathematical Physics. Czech Ed., 1999,
ISBN 80-200-0714-8. English edition is also available on the
Internet, see www.amazon.com
- 5
- Lampinen Jouni, Zelinka Ivan. New Ideas in Optimization -
Mechanical Engineering Design Optimization by Differential
Evolution. Volume 1. London : McGraw-Hill, 1999. 20 p. ISBN
007-709506-5.
- 6
- Zelinka, Ivan, Artificial Intelligence in The Problems of Global
Optimization (Czech Ed.) BEN, 2002, 190 p. ISBN 80-7300-069-5
- 7
- Zelinka I.: Analytic Programming by Means of Soma Algorithm. Mendel '02, In: Proc. 8th International
Conference on Soft Computing Mendel'02,
Brno, Czech Republic,
2002, 93-101., ISBN 80-214-2135-5
- 8
- Zelinka I.: Analytic Programming by Means of Soma Algorithm. ICICIS'02, First International Conference on Intelligent Computing and Information Systems, Egypt, Cairo, 2002,
ISBN 977-237-172-3
- 9
- Zelinka I., Oplatkova Z.: Analytic Programming - Comparative Study. CIRAS'03,
The second International Conference on Computational Intelligence,
Robotics, and Autonomous Systems, Singapore, 2003