Using Genetic Algorithms for Solving the Comparison-Based Identification Problem of Multifactor Estimation Model ()
1. Introduction
Identification of the object mathematical model is to determine its parameters based on experimental investigation of the object. Identification is the most time-consuming and very important operation in the synthesis model.
The classical problem of identification is to determine the mathematical model y = F(x) of the object which consists of determining the transformation rules of the input x into output y or more precisely the form and parameters of operator F. Such identification is called direct because it is based on direct quantitative measurement of input and output signals of the object. However, in some cases, there is a need to identify an object, when the researcher has no direct access to information about the output signal. The objects considered in this paper, are assumed to be of this type.
In different situations, estimates given by the person to one or other properties of an object are subjective and cannot be directly measured by any physical devices. In such cases, the classical methods of the direct identification are not applicable. Alternative methods are indirect identification. The most convenient and widely used among these methods is the comparison-based identification [1].
2. Statement of the Problem
Suppose we have a set of alternatives (solutions) X = {xj}, , each of which is characterized by a set of individual criteria (characteristics) ki,. The values of individual criteria are clearly defined. Based on the analysis of this information a person shall select the most preferred solution from the set of solutions X, for example, i.e. he sets strict order relation on the set of alternatives X:
.
It means that, according to the utility theory [2], which postulates the existence of scalar quantify the preference of any alternative we can write:
, (1)
where—individual scalar evaluation of the usefulness of the alternatives.
On the basis of this information it is necessary to synthesize the mathematical model of individual choice of the decision maker, i.e, a model of generalized utility formation.
Currently, the most widely used two forms of utility functions are: the additive:
(2)
and multiplicative:
(3)
where λi isomorphism coefficients indicating dimension, significance, possible values range, partial criteria that lead to the isomorphism type.
The most informative situation is one in which the coefficients of isomorphism are given numerically. Since λi is a constant, then (3) can be rewritten as follows:
(4)
Analysis of (4) shows that the multiplicative estimation does not take into account the “weights” of partial criteria, since the product is a constant scaling multiplication factor and does not affect the relationship of different solutions. Therefore, additive utility function is more universal and widely used.
Equation (2) makes sense only if takes into account the importance of individual criteria and are at the same time isomorphism coefficients. Most often, defining such coefficients is a big problem, so it was decided to represent the additive utility function in the form:
(5)
where—dimensionless relative weight coefficients that satisfy the following restrictions:
, , (6)
and is normalized, i.e. transformed to the isomorphic type partial criteria.
The normalization is performed by the following formula:
(7)
where—value of the private criteria;,— the best and worst value (accordingly) of the private criteria that is among the domain of admissible values.
In such a way the problem of utility function synthesis reduces to the parametric identification of the relative importance coefficients. Expert evaluation methods or comparison-based identification methods are used for this purpose [3].
An additive utility function disadvantage is that it does not consider the possible nonlinear dependence of the utility function on the individual criteria absolute values and their mutual influence.
Great theoretical and practical interest is the solution of the general structure-parametric identification problem of the individual evaluation model under less restrictive assumptions about the structure of the model.
For this purpose the Kolmogorov-Gabor polynomial is suggested as a possible structure class.
, (8)
and genetic algorithms as a method for solving the general structure-parametric identification problem.
This approach allows us to describe any nonlinear dependence and does not impose any apriority restrictions on the additive or multiplicative utility functions, since polynomial (8) contains in its composition the first as well as higher degrees of characteristics and all their possible combinations.
3. Optimal Complexity Model Definition
The aim of the solution the comparatory structure-parametric identification problem is to synthesize an optimal complexity model, which provides the minimum error of approximation criteria of experimental data output model.
Any sequence of N experimental data can be accurately approximated by an N − 1 degree polynomial by solving a system of normal algebraic equations. However, this approximation does not mean that an adequate, high accuracy model with good prognostic features is synthesized. This is due to the fact that experimental data contain measurement and other uncontrolled random errors. Therefore, the polynomial of high complexity, not only approximates the desired signal, but random errors of experimental data as well. To overcome this drawback in [3,4] splitting the sample of experimental data into two sets of data: training and testing is proposed. The first subset is used for the synthesis of the model and determine its characteristics, for example the method of least squares, and the second—to check the accuracy of the model. It was found that increasing the complexity of the model improves the accuracy of approximation of the test sequence of the experimental data until it reaches a minimum, and then begins to decline due to the inclusion of “harmful” random components. Model, which gives a minimum test sequence approximation error, was named as the optimal complexity model [3].
This raises the problem of choosing criteria of accuracy evaluation of the mathematical model. In the case of the classical identification the most commonly used criteria is the least squares, for the implementation of which numerical input and output experimental data is necessary. In the case of identification of multifactor estimation model, as noted above, quantitative information about the output effects is not available. In this regard, a number of specific problems, considered below, came to the surface.
4. Solving the Comparatory Identification Problem by the Genetic Algorithms Method
In the above formulation, the comparatory structure-parametric identification problem can be solved by different methods and algorithms. But common to all of them is the need to implement a sequence of procedures:
• generation of the model structure;
• defining the quantitative values of its parameters;
• assessing the quality of the model.
Various combinations of algorithms, of different precision, complexity, versatility, for the first and second stages are possible. To obtain perfectness and versatility evaluation criteria in the field of the methods application there is a need for their investigation. With the help of computer experiments genetic programming algorithms were synthesized and investigated.
Genetic Algorithms (GAs) are based on the mechanisms of natural selection and implement a scheme of “survival of the fittest” among the considered structures, shaping and changing the search algorithm based on modeling the evolution of search. In each generation a new set of artificial sequences is created using part of the old set and the addition of new parts with “good properties” [5-9].
GA starts with a random set of solutions called population. Each element of the population is called a chromosome and represents a solution to the problem. The chromosomes evolve over multiple iterations, bearing the name of generations. In the process of iteration chromosome is estimated using the fitness-function [6,7,10].
In solving the problem of structure-parametric identification on the first step a population of chromosomes, describing the structure of the model is created. This is done by selecting a class of admissible structures. Kolmogorov-Gabor polynomial, taking out the free term and limiting it to only linear and quadratic terms (squares and pair wise combinations of variables), was chosen as this class. Then the polynomial will be written as follows:
(9)
It means that for n partial criteria the complete polynomial will have
(10)
terms, where is the number of combinations and is equal to:
(11)
Consequently, each chromosome of the population must contain N bits.
The validity of imposing such limitations on the complexity of the polynomial is based on the fact that after the normalization by formula (7) all partial characteristics have values. Squaring these numbers or the multiplication of any two of them lead to a rapid decrease in the values. In addition, each term of the polynomial is multiplied by a coefficient
(because).
Based on the fact that the calculation of utility function and weights with accuracy higher than two decimals is impractical, it can be concluded that it is impractical as well to include terms higher than the second order.
After the generation of the chromosomes population, which describes the model structure, in the second stage, for each of them, a parametric identification is provided by one of the following possible methods:
• Method of determining the Chebyshev point on the polyhedron described by the system of inequalities , [5];
• The genetic algorithms method.
The first of these methods is described in [2,5] and will not be considered. The implementation of the genetic algorithm is as follows. For each chromosome of the population, which characterizes a model structure, we determine the number of coefficients equal to the number of units in the chromosome. By definition, the coefficients must satisfy the following conditions:
, (12)
and is expressed to two decimal places. Hence the number of bits that must contain the chromosome of each coefficient is equal to [8,10]:
, (13)
where [ai, bi]—interval range of, pointed in (12), and resultant chromosome of all the coefficients ai, is
(14).
For each chromosome of the structure population we form population of chromosomes coefficients ai and solve the problem of the genetic selection of these coefficients values that maximize the match function. As a match function the number of satisfied inequalities of (9) can be taken. If necessary it can be divided into training and testing sets.
On the established populations an iterative procedure of genetic selection on the first and second populations to achieve the best value of the match function is implemented.
Example: Let us assume a situation where a decision maker has to choose the best option among five alternatives of computer systems with four partial criteria: processor frequency, memory size, hard disk capacity and price.
The decision maker represents the situation by constructing Table 1.
After that the maximum and minimum values of each criterion are defined and the quantitative normalized partial criteria are calculated by formula 7.
As a result of the above mentioned operations we get table 2 which represents the set of the alternatives with their normalized partial criteria.
Next, the DM selects the best, in his opinion, alternative. Let us assume that the DM chooses the fourth alternative. Next step is to calculate the additive utility function. First the weight coefficients are calculated and then inserted in the linear Kolmogorov-Gabor polynomial. Thus the additive utility function is expressed as:
(15)
We start the procedure of genetic algorithms.
Let there be two chromosomes: a parent that contains the complete Kolmogorov-Gabor polynomial (Figure 1), and a child that contains only the components of the first term (Figure 2).
Child chromosome will be for us the resultant, which is the shortest polynomial satisfying the condition
(16)
This condition (16) will be the criterion on which we will carry out the selection.
Next, using the above mentioned method of genetic algorithms to solve the comparison-based structural-parametric identification we get variant of a child chromosome that meets criterion (16).
The next step is to choose an optimal length utility function represented as the Kolmogorov-Gabor polynomial. That is, the shortest polynomial that satisfies (16)
Table 1. The set of alternatives with their partial criteria.
Table 2. Alternatives with normalized criteria.
and at the same time has the maximum utility function. To do this, we introduce one more condition:
, (17).
Thus, after finding the optimal length KolmogorovGabor polynomial satisfying (16), we check that polynomial. For the chosen alternative substituting partial criteria we obtain the following lengths:
(18).
Hence the utility functions of the different alternatives are: R4 = 1; R3 = 0.429; R2 = 0.4277; R5 = 0.04; R1 = 0.04.
The utility function of alternative R4 is maximum, and other alternatives are worse and thus the problem is correctly solved.
5. Conclusion
The use of nonlinear methods for the identification of the multifactor estimation model showed that the use of a new technique, using as a utility function the nonlinear Kolmogorov-Gabor polynomial and the use of the genetic algorithms to calculate the weights gives a considerable saving in time and accuracy performance. It is as well simpler and more evident for the decision maker than other methods.