
The graphical representation method Robust CoPlot is a robust variant of the classical CoPlot method. CoPlot is an adaptation of multidimensional scaling (MDS) and is a practical tool for visual inspection and rich interpretation of multivariate data. CoPlot presents a multidimensional dataset in two dimensions in such a way that the relations among both variables and observations can be analyzed together. It has also been used in the literature as a supplemental tool to cluster analysis, data envelopment analysis (DEA), and outlier detection methods. However, the method is very sensitive to outliers. When a multidimensional dataset contains outliers, the result can be undesirable, such as an inaccurate representation of the variables. The motivation is to produce a Robust CoPlot that is not unduly affected by outliers. In this study, we present a new MATLAB package, RobCoP, for generating a robust graphical representation of a multidimensional dataset. The study serves researchers who implement the Robust CoPlot method by describing the software package RobCoP; it also offers some limited information on the Robust CoPlot analysis itself. The package is flexible enough to let a user select an MDS type and a vector correlation method to produce either classical or Robust CoPlot results.

CoPlot method, introduced by [

Among the wide spectrum of graphical techniques for the treatment of multidimensional datasets, the CoPlot method has attracted much attention in recent years in a wide range of areas and for various purposes. CoPlot is used for geometrical representation of multi-criteria decision problems [

Although it is increasingly popular for applications involving multidimensional datasets, the CoPlot method is sensitive to outliers. To obtain reliable results, a graphical representation is needed that accounts for the presence of outliers. If the dataset contains outliers, the representation of the variables in the CoPlot method may deviate strongly from that obtained from the clean data. The aim of the Robust CoPlot method is to reduce the impact of outliers and to fit the bulk of the data [

In this paper, we present the RobCoP package for MATLAB [

The paper is organized as follows: Section 2 briefly introduces the Robust CoPlot algorithm, and Section 3 gives details about RobCoP written as a set of MATLAB functions. In Section 4, two examples are provided for the application of the package.

The Robust CoPlot method mainly consists of three steps. In order to obtain Robust CoPlot graphs, an MDS embedding of the dataset should be generated. The first step in the algorithm is to obtain standardized data; otherwise, variables measured at different scales do not contribute equally to the analysis [

$$z_{ij} = \frac{x_{ij} - \operatorname{med}(x_j)}{\operatorname{MAD}(x_j)} \tag{1}$$

where $z_{ij}$ is the element in the $i$-th row and $j$-th column of the standardized matrix $Z_{n \times p}$, $x_j$ is the $j$-th column of the data matrix $X_{n \times p}$, $\operatorname{med}(\cdot)$ is the median function, and $\operatorname{MAD}(x_j) = 1.4826 \operatorname{med}(|x_j - \operatorname{med}(x_j)|)$ stands for the median absolute deviation.
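The robust standardization of Equation (1) can be sketched in a few lines of Python using only the standard library. This is an illustration, not code from RobCoP; the function names `mad` and `robust_standardize` and the sample matrix `X` are assumptions introduced here.

```python
from statistics import median

MAD_SCALE = 1.4826  # consistency constant from the MAD definition in the text


def mad(values):
    """Scaled median absolute deviation: 1.4826 * med(|x - med(x)|)."""
    center = median(values)
    return MAD_SCALE * median(abs(v - center) for v in values)


def robust_standardize(rows):
    """Standardize each column of a row-major matrix as in Equation (1).

    Raises ZeroDivisionError if a column has a MAD of zero.
    """
    columns = list(zip(*rows))
    centers = [median(col) for col in columns]
    scales = [mad(col) for col in columns]
    return [
        [(x - c) / s for x, c, s in zip(row, centers, scales)]
        for row in rows
    ]


# Hypothetical 5 x 2 data matrix; the last row holds an extreme value.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 50.0]]
Z = robust_standardize(X)
```

In this sample, the median and MAD of the first column are the same as they would be if the extreme value 100 were replaced by 5, so the scaling of the remaining observations is not distorted by it.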

In the second step, the $p$-dimensional dataset is mapped onto a two-dimensional space by taking account of the dissimilarity metric obtained from the standardized data matrix. To find a proper embedding of the dataset, metric (classic) or non-metric (ordinary) MDS is used in the literature. Although non-metric MDS (NMDS) can be considered in order to overcome the existence of outliers, Spence and Lewandowsky [

$$f(O, Y) = \sum_{i<j} \left[ \delta_{ij} - d_{ij}(Y) - o_{ij} \right]^2 + \lambda \sum_{i<j} |o_{ij}| \tag{2}$$

where $\delta_{ij}$ is the dissimilarity metric between the $i$-th and $j$-th rows of the standardized matrix $Z_{n \times p}$, $Y_{n \times 2}$ is the coordinate matrix for the two-dimensional space, $d_{ij}(Y)$ is the Euclidean distance between the $i$-th and $j$-th rows of the coordinate matrix $Y_{n \times 2}$, $\lambda > 0$ is the parameter that controls the assumed number of outliers, and the element in the $i$-th row and $j$-th column of the outlier matrix $O$ is $o_{ij} = \operatorname{sgn}(\delta_{ij} - d_{ij}(Y)) \max(0, |\delta_{ij} - d_{ij}(Y)| - \lambda/2)$, which represents the outlier variable.
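To make the roles of the outlier term and of the penalty parameter concrete, the following Python sketch evaluates the objective in Equation (2) for a fixed configuration, with each outlier term set by the soft-threshold rule stated above. It only evaluates the objective and does not perform the minimization; the names `soft_threshold` and `rmds_objective` and the toy inputs are assumptions made for illustration.

```python
import math


def soft_threshold(residual, lam):
    """Outlier term o_ij = sgn(r) * max(0, |r| - lam / 2) for residual r."""
    return math.copysign(max(0.0, abs(residual) - lam / 2.0), residual)


def rmds_objective(delta, Y, lam):
    """Evaluate Equation (2) for coordinates Y, with O from the rule above.

    Returns the objective value and a dict of the non-zero outlier terms.
    """
    n = len(Y)
    fit, penalty = 0.0, 0.0
    outliers = {}
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(Y[i], Y[j])           # Euclidean distance on the map
            o_ij = soft_threshold(delta[i][j] - d_ij, lam)
            fit += (delta[i][j] - d_ij - o_ij) ** 2
            penalty += abs(o_ij)
            if o_ij != 0.0:
                outliers[(i, j)] = o_ij
    return fit + lam * penalty, outliers


# Toy example: a 3-4-5 triangle whose last dissimilarity is contaminated (9 vs 5).
Y = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
delta = [[0.0, 3.0, 4.0],
         [3.0, 0.0, 9.0],
         [4.0, 9.0, 0.0]]
value, flagged = rmds_objective(delta, Y, lam=2.0)
```

Only the contaminated pair has a residual larger than half the penalty parameter, so it is the only pair that receives a non-zero outlier term; a larger penalty parameter flags fewer pairs.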

In the last step of the Robust CoPlot method, vectors representing the variables are located on the obtained robust MDS map. Robust CoPlot decides the direction and magnitude of a vector using the median absolute deviation correlation coefficient (MADCC), $\rho_{j,\mathrm{MADCC}}$, given by [

$$\rho_{j,\mathrm{MADCC}} = \frac{\operatorname{MAD}^2(u_j) - \operatorname{MAD}^2(k_j)}{\operatorname{MAD}^2(u_j) + \operatorname{MAD}^2(k_j)}. \tag{3}$$

Here, $u_j$ and $k_j$ are the robust principal variables given as follows:

$$\begin{aligned} u_j &= \frac{z_j - \operatorname{med}(z_j)}{\operatorname{MAD}(z_j)} + \frac{\nu_j - \operatorname{med}(\nu_j)}{\operatorname{MAD}(\nu_j)}, \\ k_j &= \frac{z_j - \operatorname{med}(z_j)}{\operatorname{MAD}(z_j)} - \frac{\nu_j - \operatorname{med}(\nu_j)}{\operatorname{MAD}(\nu_j)}. \end{aligned} \tag{4}$$

In (4), $z_j$ stands for the $j$-th column of the standardized data matrix $Z_{n \times p}$, and $\nu_j$ represents the projection values of all $n$ points in the MDS map on the $j$-th variable vector for a specific direction. For each degree from $0^\circ$ to $360^\circ$, the $\rho_{j,\mathrm{MADCC}}$ correlation between the actual values of variable $j$ and their projections on the vector, $\nu_j$, is calculated. The direction of the vector is chosen so that the calculated $\rho_{j,\mathrm{MADCC}}$ value attains its maximum.
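The direction search described here can be sketched in Python as follows. The sketch computes the MADCC of Equations (3) and (4) and scans the integer angles from 0 to 359 degrees; the function names, the one-degree grid as implemented, and the toy data are assumptions for illustration, not the RobCoP implementation.

```python
import math
from statistics import median


def mad(values):
    """Scaled median absolute deviation: 1.4826 * med(|x - med(x)|)."""
    center = median(values)
    return 1.4826 * median(abs(v - center) for v in values)


def madcc(z, v):
    """MAD correlation coefficient, Equation (3), via the variables in (4)."""
    mz, sz = median(z), mad(z)
    mv, sv = median(v), mad(v)
    zs = [(a - mz) / sz for a in z]
    vs = [(b - mv) / sv for b in v]
    u = [a + b for a, b in zip(zs, vs)]   # robust principal variable u_j
    k = [a - b for a, b in zip(zs, vs)]   # robust principal variable k_j
    mu2, mk2 = mad(u) ** 2, mad(k) ** 2
    return (mu2 - mk2) / (mu2 + mk2)


def best_direction(z, Y):
    """Return (angle in degrees, MADCC) for the best of the 360 directions."""
    best = None
    for deg in range(360):
        t = math.radians(deg)
        # Projection of every map point onto the unit vector at angle t.
        proj = [x * math.cos(t) + y * math.sin(t) for x, y in Y]
        if mad(proj) == 0:
            continue  # degenerate projection, coefficient undefined
        r = madcc(z, proj)
        if best is None or r > best[1]:
            best = (deg, r)
    return best


# Toy map in which the variable equals the horizontal coordinate of each point.
Y = [(1.0, 2.0), (2.0, -1.0), (3.0, 0.5), (4.0, 3.0), (5.0, -2.0)]
z = [1.0, 2.0, 3.0, 4.0, 5.0]
angle, rho = best_direction(z, Y)
```

Because the toy variable coincides with the horizontal coordinate, the search returns the 0 degree direction with a coefficient of 1.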

The RobCoP package contains one main function, RobustCoPlot(), and many auxiliary functions. RobustCoPlot() has one input argument, InStrct, and one output argument, OutStrct. Each argument is a MATLAB structure with different fields. The RobustCoPlot() function can perform NMDS or RMDS analysis with many options for the dissimilarity (distance) function, data standardization type, and MDS initialization method. In addition to MDS analysis, classical and Robust CoPlot analyses can also be performed. The desired analysis is determined by the field values of the input structure, InStrct.

To generate an input structure according to the desired type of analysis,

possible choices are “PCA” for principal component analysis and “Random” for randomly selected starting points. The choice between “NMDS” and “RMDS” for the MDS analysis is made with the InStrct.MDSMethod field. If “RMDS” is selected, the InStrct.OutlierRatio field should also be defined. This field takes values in the interval (0, 1) and represents the assumed outlier ratio for the RMDS analysis. InStrct.DrawGraph is an optional field that can take the values “Shepard”, “MDS”, “CoPlot”, and “ALL”. If this field is not defined, RobustCoPlot() performs the MDS analysis in silent mode and returns the coordinates of the obtained embedding. The “Shepard” option draws only the Shepard diagram, “MDS” draws the MDS graph, and “CoPlot” performs the CoPlot analysis. To see all of the graphs, the “ALL” value should be used. If the “CoPlot” option is selected for InStrct.DrawGraph, the vector correlation method for CoPlot should also be selected by using the InStrct.VecCorrMethod field. If “PCC” is selected, the vectors are represented using the Pearson correlation coefficient; if “MADCC” is selected, they are represented using the median absolute deviation correlation coefficient.

The fields of the output structure, OutStrct, vary with the MDS analysis type selected. Two fields, OutStrct.StressValue and OutStrct.Embedding, are returned regardless of the MDS method selected. The OutStrct.StressValue field returns the Kruskal stress value of the resulting MDS embedding. The Kruskal stress value measures the quality of the two-dimensional mapping of the multivariate data; a smaller value means a better representation. The OutStrct.Embedding field returns the coordinates of the data points found by the selected MDS method. If “RMDS” is selected as the InStrct.MDSMethod, OutStrct contains an additional field, OutStrct.Outlier, whose non-zero elements mark the distances deemed outliers during the RMDS analysis.
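The text does not spell out the stress formula. As a rough illustration, and assuming the standard Kruskal stress-1 definition, the value can be computed from the map distances and the disparities (the fitted values of the monotone regression in NMDS), both taken here as given inputs. Whether RobCoP computes exactly this variant is an assumption, and the function name `kruskal_stress1` is hypothetical.

```python
import math


def kruskal_stress1(distances, disparities):
    """Kruskal stress-1: sqrt(sum((d - dhat)^2) / sum(d^2)).

    `distances` are the pairwise distances on the map and `disparities`
    the corresponding fitted values, listed in the same pair order.
    """
    num = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)


# A perfect fit gives zero stress; a poor fit gives a larger value.
perfect = kruskal_stress1([3.0, 4.0, 5.0], [3.0, 4.0, 5.0])
poor = kruskal_stress1([3.0, 4.0], [3.0, 0.0])
```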

The Robust CoPlot method considers all the variables and the observations simultaneously to obtain a two-dimensional map. Correlations among the variables, relations among the observations, and the mutual relationships between the observations and the variables that measure them can be seen in a single graphical representation. In addition, possible outliers located far from the bulk of the data can easily be detected.

In this section, we present and illustrate the use of the RobCoP package on the dataset frequently used in the DEA analysis to show the economic performance of China’s cities [

The RobustCoPlot() function takes the CSV file as an input dataset. The first line of the input data file should contain the names of the variables, and the number of columns in the file should be equal to the number of variable names. In other words, the input file should not contain any unnamed columns. The first few lines of the CSV file used in the examples are given in

DMU(1) | ILF(2) | WF(3) | INV(4) | GIOV(5) | P&T(6) | RS(7) | COLOR(8) |
---|---|---|---|---|---|---|---|
1 | 110.22 | 794,509 | 724,255 | 2,374,342 | 680,119 | 12,790 | 1 |
2 | 31.34 | 183,319 | 101,556 | 473,369 | 118,062 | 3460 | 2 |
3 | 18.12 | 99,307 | 83,395 | 255,540 | 50,355 | 2652 | 3 |
4 | 46.86 | 304,726 | 173,655 | 734,613 | 150,853 | 4381 | 2 |
5 | 77.39 | 443,862 | 210,947 | 1,037,584 | 189,878 | 5233 | 2 |

reference. After adding the package to the MATLAB path, the following code is used to import the input data file.

Then, ChineseCities.csv, which has 36 rows (the variable names followed by the observations) and 8 columns (the variables and the color values), is ready for the analysis.

The RobCoP package supports non-metric MDS analysis, which is used in classic CoPlot analysis, and RMDS, which is used in Robust CoPlot analysis. The first column of the ChineseCities.csv file is excluded from the analysis because it contains the observation number. The last column, COLOR, is generated for coloring the resultant MDS embedding; its codes group the observations by the profit and taxes (P&T) values in the sixth column of the dataset. The color values are assigned according to the ranges defined in

To allow comparisons among variables on different scales, the RobCoP package standardizes the data. In this example, “Mean” is selected as the standardization type to generate the non-metric MDS embedding.

The MDS embedding of the dataset requires a set of distances between the observations. Although the given example uses the city-block distance, various distance metrics can be selected to create the distance matrix in the RobCoP package.
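For reference, the city-block (Manhattan) distance between two observations is the sum of the absolute differences of their coordinates. A minimal Python sketch of the resulting distance matrix, with the hypothetical function name `city_block_matrix` and made-up input rows, could look like this:

```python
def city_block_matrix(rows):
    """Symmetric matrix of city-block distances between the rows."""
    n = len(rows)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Sum of absolute coordinate differences between rows i and j.
            dist = sum(abs(a - b) for a, b in zip(rows[i], rows[j]))
            D[i][j] = D[j][i] = dist
    return D


# Three hypothetical standardized observations with two variables each.
D = city_block_matrix([[0.0, 0.0], [1.0, 2.0], [-1.0, 1.0]])
```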

For the starting point of the MDS embedding, “PCA” (Torgerson) is selected by using the InStrct.InitMethod field.

To produce non-metric MDS results, the following code snippet can be used. To obtain the NMDS map, the InStrct.DrawGraph field is set to “MDS”. Similarly, to obtain the Shepard diagram, it is set to “Shepard”.

After preparing the input structure, a single command is required to perform the analysis.

For the given example, the obtained non-metric MDS embedding of the dataset is shown in

The following code snippet can be used for robust MDS analysis of the same dataset. Only the InStrct.MDSMethod field of the input structure is changed to the “RMDS” value, and since robust MDS is selected, the InStrct.OutlierRatio value should be given. The outlier ratio for the example is assumed to be 10% [

increases contamination of the predicted proximities in NMDS solution increases.

P&T(6) Value Range | Assigned Color Code |
---|---|
>500,000 | 1 |
100,000 - 499,999 | 2 |
0 - 99,999 | 3 |
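The P&T value ranges used for the COLOR column can be expressed as a small function. The sketch below is in Python and is only an illustration: the function name `color_code` is hypothetical, and since the table does not cover the boundary value 500,000 itself, the sketch assigns it code 2 as an assumption.

```python
def color_code(pt_value):
    """Map a profit-and-taxes (P&T) value to the color code of its range."""
    if pt_value < 0:
        raise ValueError("the range table only covers non-negative values")
    if pt_value > 500_000:
        return 1
    if pt_value >= 100_000:
        return 2  # assumption: exactly 500,000 also falls here
    return 3


# P&T values of the first five observations listed in the data table.
codes = [color_code(v) for v in [680_119, 118_062, 50_355, 150_853, 189_878]]
```

The resulting codes match the COLOR values given for the first five observations in the data table.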

The maps generated so far are the NMDS and RMDS maps without variables. In this section, a second map, consisting of a vector for each variable, is superimposed on the first. The following code snippet provides classical CoPlot analysis. Note that the data matrix standardization type and the computation method of the vector correlation coefficients, InStrct.VecCorrMethod, should be chosen as “Mean” and “PCC”, respectively, to obtain classical analysis results (see

The following code snippet draws the Robust CoPlot. The data matrix standardization type and the computation method of the vector correlation coefficients have to be specified as “Median” and “MADCC”, respectively, to obtain robust analysis results (see

In this paper, we present the RobCoP package for MATLAB, which implements a graphical display method for multivariate data. Our main objective in developing this package was to provide a useful tool that helps researchers depict multivariate data in the presence of outliers. This paper makes an important contribution by presenting a new software package that provides Robust CoPlot analysis, as well as robust MDS and classical CoPlot analysis, with open source code. Until recently there was no package for the robust version of CoPlot analysis or for robust MDS. The package presented in this paper fills this gap. We believe that this package will be used in various areas, especially in applied statistics.

Atilgan, Y.K. and Atilgan, E.L. (2017) RobCoP: A Matlab Package for Robust CoPlot Analysis. Open Journal of Statistics, 7, 23-35. https://doi.org/10.4236/ojs.2017.71003