1. Introduction
As a widely used statistical method, regression analysis plays an increasingly important role in model building and prediction. Traditional regression analysis requires precise data, but in practice the information obtained is often imprecise. Zadeh [1] founded fuzzy set theory in 1965, which provides the theoretical basis for fuzzy regression analysis. In 1980, Dubois [2] proposed using LR fuzzy numbers to represent the input, output or parameters of a system. On this basis, scholars used the extension principle to promote the application of the fuzzy linear regression (FLR) model to practical problems.
Tanaka [3] was the first to set up an FLR model with fuzzy-number parameters, estimating the parameters by linear programming under the criterion of minimizing a fuzziness index. They regarded the fuzzy parameters as reflecting the estimation deviations of the linear function. However, Tanaka’s model cannot be widely applied because it requires crisp input, whereas in some cases the input is itself a fuzzy number. This approach was later improved by Sakawa [4], who proposed a multi-objective linear regression analysis for fuzzy input and fuzzy output and established an interactive decision-making method based on linear programming to solve the resulting multi-objective linear programming problem.
In 1988, Diamond [5] set up an FLR model with triangular fuzzy numbers, estimating the parameters by the least squares method under the criterion of minimizing the deviations between the observed and the estimated values. In 2002, Wu [6] proposed a method for obtaining fuzzy least squares estimators based on the extension principle of fuzzy set theory. Starting from the usual least squares estimators, he also constructed the membership functions of the fuzzy least squares estimators. In the same year, Yang [7] combined linear-programming-based methods with fuzzy least squares methods and proposed two methods for estimating fuzzy parameters: approximate-distance and interval-distance fuzzy least squares. Modarres [8] argued that the predictive ability of the linear programming approach is not satisfactory and that the computation of the fuzzy least squares method is complicated. They therefore developed three mathematical programming models, called the risk-neutral, risk-averse and risk-seeking problems.
In recent years, scholars have improved and extended the fuzzy regression method, but some methods are not rigorous in their error estimation, and some impose special requirements on the object of study, such as requiring the observations to be symmetric triangular fuzzy numbers or constraining the regression parameters to be non-negative. Moreover, in multivariate fuzzy linear regression, their results are unsatisfactory when the two inputs differ greatly in magnitude. Therefore, Roldan [9] proposed the concept of a family of fuzzy semi-distances between fuzzy numbers and combined it with the least squares method to obtain the regression parameters. Then Alfonso [10] introduced a fuzzy regression procedure involving a class of fuzzy numbers defined by some level sets, called finite fuzzy numbers.
The main purpose of this paper is to introduce a fuzzy regression method that uses the fuzzy distance between left-right (LR) fuzzy numbers. To this end, Section 2 gives some preliminaries on the left-right fuzzy distance and the partial order. Section 3 studies some properties of the LR fuzzy distance and determines a concrete formula for calculating it. In Section 4, the fuzzy distance is used as the mean fuzzy error, the minimization of the mean fuzzy error is taken as the objective, and stepwise regression is used to solve it. In Section 5, we apply the proposed model and several previous models to four different types of examples and compare the models by SSE and by mean fuzzy error based on the partial order.
2. Preliminaries
We will use the following notation for fuzzy numbers. Let
,
and
.
Definition 2.1. (Aguilar [9] ) A fuzzy set on
is a map
. A fuzzy number (for short FN) on
is a fuzzy set
on
such that, for all
, the α-level set (or α-cut)
is a non-empty, closed subinterval of
. The kernel of a FN
is ker
and we will only consider FNs with compact support, its support is the closure
.
Let
be the set of all FNs (with compact support). Thus, for each
the α-level set
of
is a compact subinterval of
that can be expressed as
, where
is the inferior extreme and
is the superior extreme of the interval
. Following this notation, we will also denote the support of
by
. The number
is the center of the FN
, and its radius is
.
Proposition 2.1. (Wu [6] ) Let
and
be two fuzzy numbers. Then
and
are fuzzy numbers. Furthermore, we have
(1)
(2)
Definition 2.2. (Dubois [2] ) A (generalized) left-right fuzzy number (LRFN) is an FN
, where
(corners of
),
, defined by
(3)
where
are strictly increasing, continuous mappings such that
and
. Clearly, the kernel of
is
and its support is
. Let
be the family of all LRFNs.
Triangular fuzzy numbers (TFNs) are special cases of LRFNs (denoted by
) with
for all
and
. For short, we will denote a triangular fuzzy number by
. Let
be the family of all TFNs.
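As a rough illustration of the triangular case, the following Python sketch represents a TFN by three numbers and computes its membership values and α-cuts. The class name `TriangularFN` and the field names `left`, `kernel` and `right` are assumptions introduced here for illustration; they are not notation from the paper. The sketch assumes linear left and right branches, as in the usual triangular fuzzy number.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TriangularFN:
    """Hypothetical container for a triangular fuzzy number (illustration only)."""
    left: float    # left end of the support
    kernel: float  # point with membership degree 1
    right: float   # right end of the support

    def membership(self, x: float) -> float:
        """Membership degree of x, using linear left and right branches."""
        if x == self.kernel:
            return 1.0
        if self.left < x < self.kernel:
            return (x - self.left) / (self.kernel - self.left)
        if self.kernel < x < self.right:
            return (self.right - x) / (self.right - self.kernel)
        return 0.0

    def alpha_cut(self, alpha: float) -> Tuple[float, float]:
        """Closed interval of points whose membership is at least alpha.

        For alpha = 0 this returns the support, following the usual convention.
        """
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        lower = self.left + alpha * (self.kernel - self.left)
        upper = self.right - alpha * (self.right - self.kernel)
        return lower, upper
```

For example, `TriangularFN(1.0, 2.0, 4.0).alpha_cut(0.5)` gives the interval whose end points both have membership degree 0.5.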
Proposition 2.2. Given
and
are strictly increasing, continuous mapping.
, then there exists a unique LRFN
such that
(4)
(5)
Definition 2.3. (Alfonso [10] ) A function
is a left-right fuzzy random variable if its representation
is a random vector. The expected value of a left-right fuzzy variable
is the unique fuzzy set
in
whose representation is
.
A partition of the interval
is a set
such that
. The simplest partition of
is
. If
is a partition of
and
is a mapping defined on
, we will denote, for all
,
(6)
Therefore, for each
the α-level set
of
, we have
(7)
(8)
Definition 2.4. (Hierro [11] ) Let
be a point of a set S provided with a partial order
. Consider the set
and let
be a mapping. A distance function on
(or a metric) is a mapping
verifying, for all
, we have
1)
;
2) if
, then
;
3)
;
4)
.
We also say that
is a metric space w.r.t. the partial order
. The function
is:
1) a pseudometric if it satisfies 1), 3) and 4);
2) a semimetric (on
) if it satisfies 1) - 3);
3) a pseudosemimetric (on
) if it satisfies 1) and 3).
Definition 2.5. (Hierro [11] ) For
, define
w.r.t.
only if
,
,
and
for all
.
Theorem 2.1. (Hierro [11] ) For
, if
, then
and
, for all
.
Theorem 2.2. (Hierro [11] ) The relationship
on
w.r.t.
is reflexive, transitive and antisymmetric; hence
is a partial order on
.
Definition 2.6. (Hierro [11] ) Let
be two standard negations on
and let
,
,
and
be pseudo-semimetrics on their respective domains. For
and
, define
(9)
(10)
be the only LRFN determined by its α-cuts.
3. On Several Characterizations of the Fuzzy Distance and a Distance Measure Between LRFNs
Theorem 3.1. If
are the standard negation,
, then
. In addition, if
, then
is a semimetric on
.
Proposition 3.1. (Hierro [11] ) If
, and
are the standard negation, then D verifies the following properties for
:
1)
;
2) if
, then
;
3)
;
4)
.
whatever the metrics
and the partition
.
Therefore, D is a metric on
.
Proposition 3.2. Let D be a metric on
,
, then we have:
1)
;
2) If
when
where
, then
,
;
3) If
where
, then
.
Proposition 3.3. If
satisfy
w.r.t.
, only if
,
,
,
.
Theorem 3.2. Assume that, in Definition 2.6, we choose
,
are the standard negation,
, and
, for all
in their respective domains.
,
, then we define the distance measure
as:
(11)
where
(12)
Proof. If
and
, we argue from (9)-(10) as follows. First of all, notice that
(13)
(14)
On the other hand, by (7)
(15)
Similarly by (8)
(16)
Then we have
(17)
(18)
Finally, by (17)-(18) we have
(19)
4. The Fuzzy Regression Procedure Based on LRFN
In this section, we introduce the regression methodology that takes the minimization of the mean fuzzy error as the objective function. The regression parameters are obtained by the least squares method based on Theorem 3.2, and Proposition 3.3 is used to determine which model has the least fuzzy error.
Let
be a random vector of N LR fuzzy inputs. We then obtain a crisp random vector
by taking the four corners of each fuzzy input as explanatory variables, which gives a new crisp random vector
. The LR fuzzy response variable is
(suppose there are n samples). We will analyze the relationship between
and X. The regression model we consider can be formalized as:
(20)
where
are the residuals (i.e., real-valued random variables such that
) and the estimated variables
can be formalized as:
(21)
where
(
represents the kth corner) are regression parameters for
and
respectively.
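The construction above, in which each fuzzy input contributes its four corners as crisp explanatory variables and each corner of the response receives its own set of regression parameters, can be sketched as follows. This is a minimal illustration that assumes each response corner is an intercept plus a linear combination of all 4N input corners; the function names `corners_to_row` and `predict_corners` and the layout of `params` are hypothetical and are not taken from the paper.

```python
def corners_to_row(fuzzy_inputs):
    """Flatten N LR fuzzy inputs (four corners each) into one crisp row of 4N values."""
    row = []
    for corners in fuzzy_inputs:
        if len(corners) != 4:
            raise ValueError("each fuzzy input needs exactly four corners")
        row.extend(float(c) for c in corners)
    return row


def predict_corners(params, row):
    """Predict the four corners of the fuzzy response from a crisp row.

    params[k] is a pair (intercept, coefficients) for the k-th corner.
    This layout is an assumption made for illustration.
    """
    predicted = []
    for intercept, coeffs in params:
        if len(coeffs) != len(row):
            raise ValueError("one coefficient is needed per crisp explanatory variable")
        predicted.append(intercept + sum(b * x for b, x in zip(coeffs, row)))
    return predicted
```

Because every response corner may depend on every input corner, a model of this form can capture, for instance, an effect of the left support of an input on the kernel of the output.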
Considering the distance measure D defined in (11), we take the distance between the observed and the estimated values as the objective function to be minimized. In other words, we use the mean fuzzy error
as the objective function, which can be formalized as:
(22)
that is, we are looking for a function
such that the mean fuzzy error
(22) is as small as possible w.r.t. the partial order
w.r.t. on
. If the objective regression function
is given by (21), then the mean fuzzy error (22) becomes:
(23)
where
As a consequence, we must minimize each component to obtain the optimal solution. To sum up,
, four corners of
can be related to
.
For each of the possible combinations of
we calculate a mean fuzzy error using the previous equation. Finally, we sort the fuzzy models using the partial order
and, as the optimal solution of the fuzzy regression problem, we choose the fuzzy model with the lowest fuzzy error.
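Since a partial order may leave some models incomparable, selecting "the lowest" error amounts to finding the minimal elements of the set of errors. The sketch below illustrates this under a loud assumption: each mean fuzzy error is summarized by a tuple of numbers, and one error precedes another when it is no larger in every component. The actual conditions are those of Proposition 3.3; the functions `precedes` and `minimal_models` and the model labels in the usage note are hypothetical.

```python
def precedes(a, b):
    """Assumed componentwise order: a precedes b if a[i] <= b[i] for every i."""
    if len(a) != len(b):
        raise ValueError("errors must have the same number of components")
    return all(x <= y for x, y in zip(a, b))


def minimal_models(errors):
    """Return the names of models whose error is not strictly preceded by another.

    errors maps a model name to its error tuple. Several names may be
    returned when the models are incomparable under the partial order.
    """
    minimal = []
    for name, err in errors.items():
        dominated = any(
            other != err and precedes(other, err)
            for other_name, other in errors.items()
            if other_name != name
        )
        if not dominated:
            minimal.append(name)
    return minimal
```

With made-up errors `{"M1": (1, 1, 1, 1), "M2": (2, 2, 2, 2), "M3": (0.5, 3, 1, 1)}`, the model `M2` is preceded by `M1`, while `M1` and `M3` are incomparable, so both are returned.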
The main purpose of this paper is to develop an LR fuzzy methodology that is both easy to understand and powerful. We therefore use the stepwise linear regression method to solve the problem. Stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. Before each new variable is introduced, unimportant explanatory variables are eliminated, so that only significant variables are included in the regression equation. The final set of explanatory variables is then regarded as optimal. We propose to consider the models obtained by this approach in the solution of the fuzzy problem.
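To make the idea of automatic variable selection concrete, here is a simplified Python sketch. It is not the procedure used in the paper: it performs forward selection only, without the elimination step, and it uses a threshold on the gain in explained variation (`min_gain`, a hypothetical parameter) in place of significance tests. Ordinary least squares is solved through the normal equations so that the example needs no external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear columns?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_sse(rows, y, cols):
    """Fit y on an intercept plus the chosen columns; return (coefficients, SSE)."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, yi in zip(design, y))
    return beta, sse


def forward_select(rows, y, min_gain=0.05):
    """Greedy forward selection: add the column that lowers SSE most.

    Stops when the best candidate improves the fit by less than min_gain,
    measured as a fraction of the intercept-only SSE.
    """
    selected = []
    _, total = ols_sse(rows, y, selected)  # SSE of the intercept-only model
    best_sse = total
    remaining = list(range(len(rows[0])))
    while remaining:
        trials = []
        for c in remaining:
            try:
                trials.append((ols_sse(rows, y, selected + [c])[1], c))
            except ValueError:
                continue  # skip columns collinear with those already chosen
        if not trials:
            break
        sse, col = min(trials)
        if total == 0 or (best_sse - sse) / total < min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best_sse = sse
    return selected, ols_sse(rows, y, selected)[0]
```

In the setting of this section, `rows` would hold the crisp corner variables of the inputs and `y` one corner of the fuzzy response, so the selection would be run once per response corner.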
Applying the previous methodology, we will obtain the fuzzy regression models
. To evaluate the goodness-of-fit of the different models, we considered two numerical estimations of the following statistics:
1) Mean fuzzy error
, given by (23);
2)
.
5. Numerical Example
Example 1. Triangular fuzzy observations
In this example, the fuzzy input-output data from Sakawa [4] are used. The single explanatory variable and the dependent variable are represented as triangular fuzzy numbers for all observations, as listed in the left part of Table 1.
We use these data to regress the fuzzy response variable
on the fuzzy explanatory variable
. This problem can be reduced to one that can be solved with the methodology described in the previous sections by considering the crisp random vector
as the explanatory random variable.
To search for a suitable fuzzy regression model capable of expressing the statistical relationship between
and X, we apply the methodology explained previously. We used the stepwise method to select the explanatory variables needed to build the model.
Table 1. Comparison of the estimation errors from various models and criteria for Example 1.
Therefore, the proposed method (PM) with the lowest error is given by
,
following Sakawa [4] ’s (
) work, their model (SYM) is constructed as
,
Yang [7] ’s model (YLM) as
,
Kao [12] ’s model (KCM) as
,
Chen [13] ’s model (CHM) as
,
Wu [6] ’s (
) model (WuM) as
,
Modarres [8] ’s model (MEM) as
.
Figure 1. The predicted values of each model for Example 1.
Figure 1 depicts the observed values and the fuzzy estimated values of each model. The X axis represents the index of the input, not the input value. The Y axis represents the value of the output
. Since a fuzzy number cannot be represented as a crisp point in a coordinate system, we use a line segment with three vertices to represent a fuzzy number, in which the vertices from top to bottom represent the right support value (
), the
kernel value (
), and the left support value(
), respectively. Therefore, the closer the graph of a model is to the observed output, the better the fit of the model. The right part of Table 1 also lists the estimation errors of the earlier models based on the two criteria,
and SSE. Ranking
by Proposition 3.3 and comparing SSE, the table indicates that the performance of the proposed approach is satisfactory among the models in terms of total estimation error under both criteria.
Example 2. One crisp explanatory variable and non-triangular fuzzy response variable
Example 1 showed that when the data are single-variable symmetric triangular fuzzy numbers, the models perform similarly, except for SYM. The data [12] of this example were therefore changed slightly: the input was a single real-valued variable and the output was an LR fuzzy number.
Since SYM performed poorly in Example 1, it is no longer included in the comparison.
Following Yang [7] ’s work, their model is constructed as
,
in addition, Wu [6] and Chen [13] built their fuzzy regression models, respectively, as
,
,
Kao [12] ’s model as
,
Proposed Method’s model as
.
It can be seen from Figure 2 that the estimated values of the models are almost the same, and the right part of Table 2 shows that PM, YLM, CHM and KCM have almost the same fuzzy error, whereas the fuzzy error of WuM is relatively large. The estimated value of Wu’s model was still crisp in this case, which is a consequence of the parameters of WuM being crisp. The parameters of PM and YLM were all fuzzy numbers, so PM can be applied widely.
Example 3. Two explanatory variables and a response variable, all asymmetric
Since the input in the first two examples was a single variable, two independent variables are used in this example. The input and output were triangular fuzzy numbers, and the numerical difference between the two independent variables was large. The data set consists of 15 fuzzy observations with two fuzzy explanatory variables
,
and one fuzzy response variable
, listed in the left part of Table 3 and taken from Chen [13].
Figure 2. The predicted values of each model for Example 2.
Table 2. Comparison of the estimation errors from various models and criteria for Example 2.
Table 3. Comparison of the estimation errors from various models and criteria for Example 3.
In Example 2, YLM, CHM and PM had the same fuzzy error, and KCM had a larger fuzzy error. KCM was therefore not as good as PM and was removed from the comparison. Following Yang [7] ’s work, their model is constructed as
,
in addition, Wu [6] and Chen [13] built their fuzzy regression models, respectively, as
,
.
We use these data to regress the fuzzy response variable
on the fuzzy explanatory variables
,
. This problem can be reduced to one that can be solved with the methodology described in the previous sections by considering the crisp random vector
as the explanatory random variable. We then used the stepwise method to select the explanatory variables.
Proposed Method:
where
,
,
.
From the perspective of fuzzy error, PM was significantly better than the other models. It can be seen from Figure 3 that the fitted kernel values of these models differed little, but the estimated values were quite different from the observed values. As seen from Figure 3, the supports (x11, x14) of the estimations of CHM, YLM and WuM were much wider than those of the observations.
Example 4. Real-life data
Figure 3. The predicted values of each model for Example 3.
When the input was a single crisp variable, PM performed about as well as the other models; when the input was fuzzy and multivariate, PM was significantly better than the other models. It is therefore worth checking whether our model also performs well when the input is crisp and multivariate. A set of real-life data, listed in Table 4 and taken from Wei [14], is used in this example to demonstrate the proposed approach to the fuzzy regression problem. The example concerns the relationship between the heat released by a certain cement
and the two chemical components
,
.
Following Yang [7] ’s work, their model is constructed as
Chen [13] built up their fuzzy regression models as
In addition, Kao [12] ’s model as:
Proposed Method:
where
Table 5 lists the estimation errors from the earlier models based on the two criteria,
and SSE. Figure 4 also depicts the observed values and the fuzzy estimated values of each model. As in Example 2, the input of this example is crisp, and the mean fuzzy error
and SSE of our model are slightly greater than or equal to those of YLM. Unlike Example 2, this example has two independent variables, which distinguishes YLM and PM from CHM and KCM: the errors of CHM and KCM are clearly larger than those of YLM and PM.
Table 4. Crisp input and fuzzy output data set for Example 4.
Table 5. Comparison of the estimation errors from various models and criteria for Example 4.
Figure 4. The predicted values of each model for Example 4.
6. Conclusion
The four different types of examples above suggest that when the data structure is simple, PM is not significantly better than the other models but is also not inferior to them. This was the case in Example 1, where the data were single-variable symmetric triangular fuzzy numbers; in Example 2, where the input was a single crisp variable and the output was an LR fuzzy number; and in Example 4, where the input was multivariate and crisp and the output was an LR fuzzy number. When the data structure is complex, as in Example 3, where both the multivariate input and the output were LR fuzzy numbers, PM was significantly better than the other models. A possible explanation is that the previous models did not take into account the influence of the left and right values of the input on the center value of the output, or of the center value of the input on the left and right values of the output. In practice, more than one factor usually affects the dependent variable, which suggests that our model could have a wider range of application.