Robust Regression Analysis with LR-Type Fuzzy Input Variables and Fuzzy Output Variable

In this paper, we propose a fuzzy linear regression model with LR-type fuzzy input variables and a fuzzy output variable, whose fuzzy extent may differ. We then give an iterative solution of the proposed model based on the Weighted Least Squares estimation procedure, and prove some properties of the estimates. We also define a suitable goodness of fit index and its adjusted version, which are useful for evaluating the performance of the proposed model. Based on the Least Median Squares-Weighted Least Squares (LMS-WLS) estimation procedure, we give robust estimation steps for the proposed model. Two examples show that, compared with the well-known fuzzy Least Squares method, our model effectively reduces the influence of outliers.


Introduction
Fuzzy linear regression analysis is a well-known method for seeking the fuzzy relationship between input and output data. Fuzzy linear regression is useful in a fuzzy domain where model parameters and/or data are fuzzy, imprecise, or vague. The main approaches to fuzzy linear regression are the possibilistic approach introduced by Tanaka et al. [1] and the Least-Squares (LS) approach, which extends the LS criterion to the fuzzy setting [2]. The possibilistic approaches mainly involve a linear mathematical programming method, and their aim is to cover the spreads of the output up to an h-level [3]. In the least squares approach, on the other hand, the objective is to maximize the fit between the outputs estimated by the model and the observed outputs. For contributions on this subject see Refs. [2] [4]-[10]. The LS method has several theoretical and practical advantages, but it has a critical drawback: it is extremely sensitive to the presence of outliers. In the fuzzy regression literature, the outlier problem has been addressed with regard to both outlier detection criteria and robust estimation procedures. In the following, we briefly illustrate some contributions on robust estimation procedures.
Watada and Yabuuchi [11] propose a robust fuzzy regression model based on a hyperelliptic function. Chang and Lee [5] suggest a generalized fuzzy weighted least squares method for handling outliers, which weights observations by their degree of membership and relies on interaction with the decision maker. For simple regression, Yang and Ko [12] suggest a two-stage iterative weighted fuzzy least squares algorithm. Oussalsh and Schutter [13] make use of Least Trimmed Squares (LTS) and Least Median Squares (LMS) for the fuzzy regression model, and study the performance of the proposed model when the data are contaminated by outliers. Yang and Liu [14] suggest fuzzy least squares algorithms for interactive fuzzy linear regression models; the algorithm is robust against outliers for simple regression. Şanli and Apaydin [15] propose a robust estimation procedure, based on the least median squares, for the fuzzy linear regression model with fuzzy input and output.
In recent years, there has been a growing literature on robust fuzzy regression in a fuzzy domain. Varga [16] presents two robust estimations of unknown fuzzy parameters in the fuzzy regression model, and investigates the relationship between the proposed models for both fuzzy and non-fuzzy regression analysis. Choi and Buckley [17] use the Least Absolute Deviations (LAD) for the fuzzy regression model, and investigate the performance of the model when the data contain fuzzy outliers. Kula and Apaydin [18] propose a robust fuzzy regression analysis based on the ranking of fuzzy sets. On the basis of Modarres et al. [19] [20], Nasrabadi and Hashemi [21] present a robust nonlinear fuzzy regression model using multilayered feedforward neural networks in which weights, biases, and input and output variables are assumed to be fuzzy. Hu [22] suggests a genetic-algorithm-based method for determining two functional-link nets for the robust nonlinear interval regression model. A robust version of a spline-based estimate, which has the form of an MM-estimate, is presented by Maronna and Yohai [23]. D'Urso et al. [24] propose a robust fuzzy linear regression model with crisp inputs and fuzzy outputs based on the least median squares-weighted least squares estimation procedure. Based on the least trimmed squares estimation, Chachi and Roozbeh [25] propose an estimation procedure for determining the coefficients of the fuzzy regression model for crisp input-fuzzy output data (see also D'Urso et al. (2011) for a list of possible references on the topic of robust fuzzy regression analysis).
The rest of the paper is organized as follows. In Section 2, we set up the fuzzy regression model for fuzzy input variables (explanatory or independent variables) and a fuzzy output variable (dependent or response variable) according to Refs. [8] [9]. Then, in Section 3, the estimation procedure is described; it is based on the Weighted Least Squares (WLS) principle. The WLS objective function is defined in Section 3.1. An iterative WLS solution is shown in Section 3.2 and some relevant properties of this solution are proved in Section 3.3, while in Section 3.4 a special case of the model is discussed. In Section 4, we introduce some goodness of fit indices to assess model fitting. In Section 5, by considering the Least Median Squares and Weighted Least Squares (LMS-WLS) approach, we give the steps of the LMS-WLS estimation procedure with a fuzzy output variable and fuzzy input variables. Section 6 reports an example and a simulation study to illustrate the effectiveness of our model in the presence of outliers. Finally, Section 7 contains concluding remarks.

The Linear Regression Model with LR-Type Fuzzy Input Variables and Output Variable
Let us consider a fuzzy output variable Y and p fuzzy input variables $X_1, \ldots, X_p$. Y is an LR-type fuzzy variable, $Y = (m, l, u)_{LR}$, where m is the center, and l and u are the left spread and right spread, respectively; $X_j$ is also an LR-type fuzzy variable, $X_j = (S_j, V_j, Z_j)_{LR}$, where $S_j$ is the center, and $V_j$ and $Z_j$ are the left spread and right spread of the jth LR-type fuzzy input variable. Let m, l and u be the vectors of the observed centers, left spreads and right spreads, respectively. Firstly, we model the observed centers and the lower and upper boundaries of the response variable as sums of unknown theoretical values and their respective residuals. Here S, V and Z are the $n \times (p+1)$ matrices composed of the unit column and the centers, left spreads and right spreads of the fuzzy input variables, respectively; r, a and s are the column $(p+1)$-vectors containing the regression parameters relevant to the centers, left spreads and right spreads of the fuzzy explanatory variables; finally, 1 denotes the column n-vector of ones.
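The notation above can be made concrete with a minimal sketch. It assumes each LR-type fuzzy number is stored as a (center, left spread, right spread) triple; the names LRFuzzy and design_matrices are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LRFuzzy:
    """LR-type fuzzy number: a center with left and right spreads (hypothetical container)."""
    center: float
    left: float
    right: float


def design_matrices(rows):
    """Build S, V and Z, each n x (p+1), from n rows of p fuzzy inputs.

    The leading 1.0 in every row is the unit column mentioned in the text.
    """
    S = [[1.0] + [x.center for x in row] for row in rows]
    V = [[1.0] + [x.left for x in row] for row in rows]
    Z = [[1.0] + [x.right for x in row] for row in rows]
    return S, V, Z


# Made-up data: n = 3 units, p = 2 fuzzy input variables.
rows = [
    [LRFuzzy(2.0, 0.5, 0.4), LRFuzzy(10.0, 1.0, 1.5)],
    [LRFuzzy(3.0, 0.2, 0.3), LRFuzzy(12.0, 0.8, 0.9)],
    [LRFuzzy(5.0, 0.6, 0.6), LRFuzzy(9.0, 1.2, 1.1)],
]
S, V, Z = design_matrices(rows)
```

Each matrix has one row per unit and p + 1 columns, matching the dimensions of the parameter vectors r, a and s.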

Distance and Objective Function
In some cases, the membership functions of the dependent variable may vary across the observation units. This can occur if we allow for different levels of uncertainty associated with each response: for instance, one person might be extremely sure about her/his opinion, while another might be rather uncertain. These levels of uncertainty may then correspond to square root and parabolic membership functions, respectively. The very common triangular membership function can be seen as expressing a medium level of uncertainty. Based on the above consideration, and according to the WLS criterion, once the weights are determined, the parameters of model (2) should be estimated by minimizing the weighted squared distance between the observed values of the response variable Y and the corresponding estimated values. In this distance, the influence of the shape of the membership function is embodied in the matrices Λ and P, which are diagonal matrices of order n whose diagonal elements are determined by the membership functions of the individual observations; $\|\cdot\|_W$ is the weighted norm and W is a diagonal matrix whose elements are the weights $w_i$. On the basis of this distance, we can set the WLS objective function in terms of the parameters b, d, g, h, a, r, s of the model.
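As a rough illustration of such a distance, the sketch below sums three weighted squared differences per unit: between centers, between lower boundaries m − λl, and between upper boundaries m + ρu. This exact form, and the convention that the weighted squared norm is the sum of $w_i$ times the squared entries, are assumptions made for the example, since the displayed formula is not reproduced here; lam and rho stand for the diagonals of Λ and P, and the function name is hypothetical.

```python
def weighted_sq_distance(obs, est, lam, rho, w):
    """Weighted squared distance between observed and estimated LR fuzzy responses.

    obs, est: (m, l, u) tuples of equal-length lists (centers, left spreads, right spreads).
    lam, rho: diagonal elements of the shape matrices (assumed form).
    w: observation weights, applied as w_i * squared difference (assumed convention).
    """
    m, l, u = obs
    m_hat, l_hat, u_hat = est
    total = 0.0
    for i in range(len(m)):
        d_center = m[i] - m_hat[i]
        d_lower = (m[i] - lam[i] * l[i]) - (m_hat[i] - lam[i] * l_hat[i])
        d_upper = (m[i] + rho[i] * u[i]) - (m_hat[i] + rho[i] * u_hat[i])
        total += w[i] * (d_center ** 2 + d_lower ** 2 + d_upper ** 2)
    return total


# Two units with triangular membership functions (shape coefficient 0.5).
obs = ([10.0, 20.0], [2.0, 4.0], [2.0, 2.0])
est = ([9.0, 20.0], [2.0, 2.0], [2.0, 2.0])
shape = [0.5, 0.5]
d_all = weighted_sq_distance(obs, est, shape, shape, [1.0, 1.0])
d_first_only = weighted_sq_distance(obs, est, shape, shape, [1.0, 0.0])
```

Setting a weight to zero removes the corresponding unit from the distance, which is how the robust procedure can later discount suspected outliers.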

Iterative Weighted Least Squares Solution
In order to minimize (4), we equate to zero its partial derivatives with respect to the parameters, which gives the system of Equations (5)-(11). An iterative solution of this system can be based on a set of equations derived, in order, from Equations (5)-(11).

Properties of the WLS Solution of the Proposed Model
In this section we will prove some propositions showing useful properties of the WLS solution illustrated in Section 3.2.

Special Case of the Model
In the symmetric case, we iterate the procedure described in Section 3.2 and derive the corresponding symmetric iterative weighted least squares solutions.

Goodness of Fit
In this section, in order to measure the goodness of fit for a multiple regression model with an LR-type fuzzy output variable and fuzzy input variables, we define the coefficient of determination and its adjusted version.
Definition 1 For the LR-type fuzzy output variable $Y = (m, l, u)_{LR}$, we define:
The total weighted deviation of the fuzzy output variable, given by the weighted total sum of squares $SST_W$, where $\bar{m}$, $\bar{l}$ and $\bar{u}$ are the weighted mean values of m, l and u, respectively.
The weighted deviation "explained" by the model, given by the weighted regression sum of squares $SSR_W$.
The residual weighted deviation, i.e. the deviation not explained by the model, given by the weighted sum of squares of errors $SSE_W$.

To prove the decomposition (43), we have to verify that the term (44), which arises when the three squared norms are expanded, is null. After a little algebra, (44) can be rewritten in a form that is null, taking into account the findings of Proposition 1 and Proposition 2, (20) and (21).
Definition 2 The goodness of fit index for the model (2) estimated by WLS is defined as $R^2 = SSR_W / SST_W$. Given the relationship between $SST_W$, $SSR_W$ and $SSE_W$, we also have that $R^2 = 1 - SSE_W / SST_W$. When $R^2 = 0$, the model does not explain any of the variability of the LR-type fuzzy response variable. Conversely, we have $R^2 = 1$ when the model interpolates all the observations perfectly. Therefore, an estimated model is satisfactory, in the sense of the fit to the observed data, when $R^2 \approx 1$.
Definition 3 The adjusted coefficient of determination, $\bar{R}^2$, contains a correction factor based on the number of regression coefficients. $\bar{R}^2$ can be negative, and its value is always less than or equal to $R^2$.
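The relationship between the three sums of squares can be checked numerically on a simplified analogue. The sketch below uses a crisp weighted simple regression with one squared norm instead of the three used for fuzzy data, so it illustrates the idea rather than the paper's index; the data and function names are made up.

```python
def weighted_fit(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, ybar


def weighted_sums_of_squares(x, y, w):
    """Return (SST_W, SSR_W, SSE_W) computed around the weighted mean of y."""
    b0, b1, ybar = weighted_fit(x, y, w)
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    ssr = sum(wi * (fi - ybar) ** 2 for wi, fi in zip(w, fitted))
    sse = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    return sst, ssr, sse


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
w = [1.0, 1.0, 0.5, 1.0, 1.0]
sst, ssr, sse = weighted_sums_of_squares(x, y, w)
r2_from_ssr = ssr / sst
r2_from_sse = 1.0 - sse / sst
```

Because the fit is a weighted least squares fit with an intercept, the two expressions for the index agree up to rounding error.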

Steps of the LMS-WLS Estimation Procedure
In this section, we illustrate the steps of the suggested robust estimation procedure based on Least Median Squares-Weighted Least Squares (LMS-WLS) [24]. LMS is used to give the initial solution of WLS, which ensures the robustness of the model:
1. Given n observations on one LR-type fuzzy dependent variable and the fuzzy independent variables, we randomly select a sub-sample.
2. The parameters b, d, g, h, a, r and s are estimated on the selected sub-sample by means of (12)-(18), setting W = E.
3. Using the estimators $b^*, d^*, g^*, h^*, a^*, r^*, s^*$ obtained at the previous step, we compute the squared residuals.
4. Finally, we compute the median of the estimated squared residuals.
In order to enhance these estimates, we employ the WLS procedure, assigning a weight to each observation. A simple way to weight observations on the basis of residuals [24] uses the following quantities: $r_i^2$ is the ith squared residual obtained from LMS, $r_i/\hat{\sigma}$ are the standardized residuals, and c is a constant (usually, c = 2.5). WLS requires several iterations of solution (12)-(18). To initialize the recursive solution, we take the optimal estimates obtained with LMS as the starting points.
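The two phases can be sketched on a crisp simple regression, a deliberately simplified analogue of the fuzzy model. Several choices below are assumptions rather than the paper's specification: sub-samples of two points, a fixed number of random trials, the scale estimate 1.4826(1 + 5/(n − 2))√med, and hard 0/1 weights obtained by comparing standardized residuals with c. The function names are hypothetical.

```python
import random
from statistics import median


def fit_line(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def lms_wls(x, y, n_trials=200, c=2.5, seed=0):
    """LMS by random sub-sampling, followed by one reweighted LS fit."""
    rng = random.Random(seed)
    n = len(x)
    best = None
    for _ in range(n_trials):
        # Steps 1-2: draw a sub-sample and fit it exactly (two points define a line).
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        # Steps 3-4: squared residuals on all units and their median.
        med = median([(yk - b0 - b1 * xk) ** 2 for xk, yk in zip(x, y)])
        if best is None or med < best[0]:
            best = (med, b0, b1)
    med, b0, b1 = best
    # Assumed robust scale estimate and hard-rejection weights.
    scale = 1.4826 * (1.0 + 5.0 / (n - 2)) * med ** 0.5
    weights = [1.0 if abs(yk - b0 - b1 * xk) <= c * scale else 0.0
               for xk, yk in zip(x, y)]
    return fit_line(x, y, weights), weights


# Made-up data: y is roughly 2 + 3x, with the last response replaced by an outlier.
x = [float(k) for k in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [2.0 + 3.0 * xk + e for xk, e in zip(x, noise)]
y[9] = 100.0
(robust_b0, robust_b1), weights = lms_wls(x, y)
ls_b0, ls_b1 = fit_line(x, y, [1.0] * len(x))
```

On these data the ordinary LS slope is pulled far above 3 by the single outlier, whereas the reweighted fit gives that unit zero weight and stays close to the underlying line.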

Numerical Experiment
In order to evaluate the proposed model, we present two examples. In the WLS phase of the estimation procedure, the weights (48) are assigned to the data, with c = 2.5.

Example 1
In this example, we consider a fuzzy linear regression model with fuzzy output data and fuzzy input data given by triangular fuzzy numbers, putting Λ = 0.5E and P = 0.5E. We randomly generated two column vectors V and S from two uniform distributions defined on the intervals [0, 10] and [0, 50], respectively. The fuzzy input and output variables are then generated from these vectors, with random terms drawn from N(0, 1). Having simulated a fuzzy output variable and a fuzzy input variable (Y, X) on a sample of 8 units, we contaminated the dataset with one or more outliers in the centers and/or spreads of the fuzzy input variable and/or output variable. The various situations are shown in Figures 1-6, where the X-axis, Y-axis and Z-axis represent the spread of the input variable, the center of the input variable and the center of the output variable, respectively. The plane shows the model of the centers: if the estimates are very good, all points should lie on or close to the plane. Table 1 reports the LS and LMS-WLS estimates in the first and second columns, respectively.
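The input side of this design can be sketched as follows. Only the uniform draws for V and S and the replacement of a single entry by an outlier are shown, because the generating equations for the output are not reproduced here; the seed, the function names and the choice of contaminated units are illustrative assumptions.

```python
import random


def simulate_inputs(n=8, seed=1):
    """Draw the spread vector V from U[0, 10] and the center vector S from U[0, 50]."""
    rng = random.Random(seed)
    V = [rng.uniform(0.0, 10.0) for _ in range(n)]
    S = [rng.uniform(0.0, 50.0) for _ in range(n)]
    return V, S


def contaminate(values, index, outlier):
    """Return a copy of values with one entry replaced by an outlying value."""
    out = list(values)
    out[index] = outlier
    return out


V, S = simulate_inputs()
# Hypothetical contaminations: an outlying center in unit 4 and an outlying spread in unit 7
# (0-based indices 3 and 6).
S_contaminated = contaminate(S, 3, 60.0)
V_contaminated = contaminate(V, 6, 50.0)
```

Both outlying values lie outside the ranges of the generating distributions, so they are genuine outliers with respect to the clean sample.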
Figure 1 shows the results of the fuzzy regression model obtained with the original dataset, with LS (left panel) and LMS-WLS (right panel). The results are very similar, as can be seen from the value of $R^2$ and the parameter estimates reported in case (a) of Table 1.
The presence of any kind of outlier does not affect the LMS-WLS estimates, as can be seen from Figures 2-6 and Table 1.
On the contrary, outliers heavily distort the LS estimates. For example, in Figure 2(a) we see that the presence of a single outlier in m has a troublesome effect on the fit of the centers model to the data, and produces a large bias in the parameter estimates of the centers model, as can be seen from case (b) of Table 1. However, the parameter estimates for the models of the spreads are only slightly affected.
Figure 3(a) illustrates that the presence of a single outlier in the spreads of the fuzzy response variable has little effect on the fit of the centers model to the data, while the LS estimates for the spreads are heavily affected (Table 1, case (c)). Note that the model fit to the data decreases to a lesser extent than in other situations, since, in the computation of $R^2$ and $\bar{R}^2$, the weights of the spreads, given by Λ = P = 0.5E, are lower than the weight of the centers, which is equal to E.
The overall pattern of results remains the same in the cases where there is an outlier in the spreads or centers of the fuzzy explanatory variable. When we contaminate the data with a single outlier in the center or spread of the input variable, the LS estimates are distorted for the models of the centers (see Figure 4(a) and Figure 5(a)). We can see from cases (d) and (e) of Table 1 that the presence of a single outlier in the centers of the fuzzy input has a bigger impact on the parameter estimates. Finally, in Figure 6 we consider the more general situation that embodies all previous cases. Both the models of the centers and the models of the spreads are strongly affected. As a consequence, the fit performance of the model is also quite poor.
As said before, Table 1 reports the parameter estimates for all the cases considered, both for the LS and LMS-WLS model.

Example 2
This example consists of 14 fuzzy observations with two fuzzy explanatory variables and one fuzzy response variable from Wu [27], which are listed in Table 2. In this example, we set three different types of fuzzy numbers in the dataset, with higher, medium and lower fuzzy extent, respectively [28]. The setting is as follows: 2/3 for units i = 1, …, 5; 1/2 for units i = 6, …, 10; and 1/3 for units i = 11, …, 14. The LS and LMS-WLS estimates, obtained for different types of outliers in the dataset, are reported in the first and second columns of Table 3, respectively.
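A minimal sketch of this setting is given below. It assumes that the three values are the unit-specific diagonal elements of the shape matrices Λ and P, which is an interpretation rather than something stated explicitly here; the function name is hypothetical.

```python
from fractions import Fraction


def shape_diagonal():
    """Per-unit shape coefficients for the 14 observations (assumed diagonal of Λ and P)."""
    groups = [
        (range(1, 6), Fraction(2, 3)),    # units 1-5
        (range(6, 11), Fraction(1, 2)),   # units 6-10
        (range(11, 15), Fraction(1, 3)),  # units 11-14
    ]
    by_unit = {}
    for units, value in groups:
        for i in units:
            by_unit[i] = value
    return [by_unit[i] for i in range(1, 15)]


diag = shape_diagonal()
```

Exact fractions are used only to keep the three values recognizable; floating-point numbers would serve equally well in an actual fit.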
As in the previous example, the LMS-WLS estimates do not change noticeably whether or not outliers are present, which confirms the effectiveness of the proposed estimation procedure.
If there is an outlier in the centers of the output variable (Table 3, case (b)), the LS estimates of the coefficient vectors a, r, s are strongly biased, and the estimates of b, d, g, h are also marginally affected. As a consequence, the goodness of fit is rather low. When we contaminate the data with outliers in the centers of both the input variables and the output variable (Table 3, case (e)), the results are similar.
If we contaminate the vector X (Table 3, case (c)) with a single outlier, LS produces biased estimates of a, r, s, while the estimates for the model of the spreads are unaffected. If there is an outlier in the left spreads of the output variable (Table 3, case (d)), the estimates of b, d are affected. Similar conclusions are drawn when we contaminate the vector of the right spreads of the output variable (Table 3, case (f)).
When we consider the more general cases (Table 3, cases (g) and (h)), the LS estimates are strongly biased with respect to the estimates obtained with the original dataset. As a consequence, the fit performance of the model is also quite poor.

Conclusions
The main problem investigated in this paper is how to deal with fuzzy data contaminated by outliers, whose fuzzy extent may differ. To this end, a fuzzy regression model with fuzzy output and fuzzy inputs has been proposed. Then, on the basis of the Least Median Squares-Weighted Least Squares estimation procedure, we introduce a robust version of the proposed model. In order to analyze the performance of our model, we also suggest a suitable goodness of fit index, and its adjusted version, which is useful for model selection. The proposed model was applied in two examples, the results of which show that it effectively reduces the influence of outliers compared with the fuzzy Least Squares method.

$\varepsilon$, $\varepsilon_L$ and $\varepsilon_U$ are the vectors of residuals, and $m^*$, $l^*$ and $u^*$ are the vectors of the estimated values of the centers and spreads of the response variable. These values are then reparametrized in terms of the regression model.

The LL-type fuzzy input variables are identified by two parameters and, similarly, the LL-type fuzzy output variable is identified by the two parameters m and l.

Proposition 3
The total weighted deviation of $Y = (m, l, u)_{LR}$, $SST_W$, is equal to the sum of the weighted regression sum of squares, $SSR_W$, and the weighted sum of squares of residuals, $SSE_W$: $SST_W = SSR_W + SSE_W$.
Proof
The expression for $SST_W$ can be developed by adding and subtracting terms within its squared norms.

Steps 1-4 are repeated until convergence is achieved. At the kth iteration, we obtain new estimators, and the median of the estimated squared residuals at the kth iteration is compared with the one obtained earlier in the LMS procedure.
$\hat{\sigma}$ is the robust estimate of the scale of the residuals [26].

Figure 1. Estimated model of the centers on the original dataset with LS (a) and LMS-WLS (b).

Figure 3. Estimated model of the centers with LS (a) and LMS-WLS (b) after contamination of $l_4 = 35$.

Figure 4. Estimated model of the centers with LS (a) and LMS-WLS (b) after contamination of $v_{72} = 50$.

Figure 5. Estimated model of the centers with LS (a) and LMS-WLS (b) after contamination of $s_{42} = 60$.

Table 1. Estimated coefficients, $R^2$ and $\bar{R}^2$ of the models with LS and LMS-WLS in the uncontaminated and contaminated cases.

Table 3. Estimated coefficients, $R^2$ and $\bar{R}^2$ of the models with LS and LMS-WLS in the uncontaminated and contaminated cases.