
This paper proposes a new method to reduce the dimensionality of input and output spaces in DEA models. The method is based on Yanai’s Generalized Coefficient of Determination and on the concept of pseudo-rank of a matrix. In addition, the paper suggests a rule to determine the cardinality of the subset of selected variables so as to gain maximal discriminatory power with minimal loss of information.

The DEA (Data Envelopment Analysis) model is a nonparametric method for estimating production frontiers. The DEA involves the solution of a set of linear programming (LP) problems to determine a production frontier against which the technical efficiency of the Decision Making Units (DMU) will be calculated. The basic DEA model was originally proposed by [

The basic CCR model aims to maximize the ratio between a weighted sum of outputs and a weighted sum of inputs. The weights of these sums are chosen according to the feasibility conditions and under the hypothesis of constant returns to scale. Charnes, Cooper and Rhodes transformed the fractional CCR model into a linear model whose dual is commonly referred to in the literature as the DEA (the details of this procedure can be found in [

The CCR model and its variants have been increasingly applied following the original description by [

One of the most frequent problems associated with the CCR model is the lack of discrimination among DMUs when the number of inputs and outputs is rather large in relation to the number of DMUs. A large number of variables relative to the number of observations may entail a large number of efficient DMUs in the sample, thus reducing the model’s ranking capability. This represents a characteristic of the CCR model: the lower the number of DMUs, the less active the restrictions imposed on the maximum efficiency multipliers. In fact, according to [

^{1}See [

Several alternatives have been proposed to increase the ranking capacity of the CCR model (see, e.g., [

cone-ratio and assurance-region approaches^{1}.

In the present work, we propose a simple and objective method to address a situation in which the original inputs and outputs have been correctly selected but the low number of observations has translated into low discrimination power. This approach is intended to reduce the dimensions of the input and output spaces and therefore does not require additional information. Additionally, this method does not require post-estimation procedures (as in the case of the super-efficiency and cross-efficiency approaches). Our proposed approach relies on multivariate statistical techniques, namely, Principal Components Analysis and the use of a correlation matrix. Again, it should be emphasized that this approach is not intended for the selection of inputs or outputs. Instead, it is a support tool used to increase the discriminatory power without a significant loss of information, i.e., the discriminatory power is increased regardless of knowledge of which variables are essential for the model. The method draws on linear algebra to measure the similarity between subspaces, exploiting the special structure of the cone of positive semidefinite matrices.

Following this introduction, the remainder of the work is structured as follows: Section 2 briefly reviews multivariate variable selection in the DEA models, Section 3 introduces the proposed methodology, Section 4 discusses applications of the proposed method to the CCR model, and finally, Section 5 presents conclusions and suggestions for future research.

One of the issues relating to the CCR model is that of the dimensions of inputs and outputs. A consequence of using a large number of variables relative to the number of observations is the loss of discrimination power due to the generation of a large number of efficient DMUs. There is no consensus on the optimal number of inputs and outputs to be used. However, [

One of the most common ways of selecting variables in DEA is the use of correlation matrices for inputs and outputs. When two variables (inputs or outputs) are highly correlated, one is discarded, usually on the basis of ad hoc criteria. However, eliminating one or the other variable could have a dramatic impact on

^{2}In [

^{3}In fact, the use of PCA to summarize a data set is broadly applied in several areas, ranging from image processing ( [

estimated efficiency^{2}.

In recent years, the application of multivariate statistical methods, especially Principal Components Analysis (PCA), has appeared as a satisfactory alternative for variable reduction^{3}. The formulation of a DEA model in which inputs and outputs are summarized as principal components is the focus of the work of [

There is also a practical difficulty. Even if the problem of negative PCs is resolved, the question remains of how to interpret the results in terms of the projection of inputs and outputs (that is, predicting quantities). The only satisfactory way of accomplishing this is back transformation to the original variables, which might require considerable computational effort. Some authors select variables based upon their contribution to PCs. Specifically, the variables with the largest absolute linear combination coefficient are selected. Because it is common for the first few components to explain most of the data variance, the result is a considerably reduced subset of variables.

In [

The method we propose is based on the work of [

The number of PCs is determined prior to the generation of the correlation matrix, estimating the pseudo-rank for the matrix, as proposed by [

In the following section, a brief discussion of the geometrical structure of the cone of positive semi-definite matrices and of PC analysis is presented and the GCD is subsequently defined.

This section briefly describes basic concepts described in detail in the work of [

Let $C_p$ be the cone of positive semidefinite matrices of dimension $p \times p$, equipped with the Frobenius inner product $\langle \cdot , \cdot \rangle_F : C_p \times C_p \to \mathbb{R}$ such that

$$\langle A, B \rangle_F = \operatorname{tr}(A'B) \quad \text{for any } A, B \in C_p \qquad (1)$$

The norm induced by (1) will be denoted by $\| \cdot \|_F$. For any matrix $V \in C_p$, the set

$$\mathrm{Ray}(V) = \{ A \in C_p \; ; \; A = \lambda V, \ \lambda \ge 0 \}$$

is called the ray associated with $V$. The ray associated with the identity matrix of dimension $p$ is called the central ray of $C_p$.

With the definitions above, it is possible to find the angle between the rays associated with any two matrices $A$ and $B$. This angle is the arc whose cosine is

$$\cos(A, B) = \frac{\langle A, B \rangle_F}{\|A\|_F \, \|B\|_F} \qquad (2)$$

In [

Based on this observation, the author argues that the region close to the central ray contains only matrices of full rank, or at least with rank p − 1 (which can occur at the boundary of such regions). The farther away from the central ray, the lower the rank of matrices.

However, matrices of full rank are also found outside of the core. Because they may have eigenvalues close to zero, they behave as low-rank matrices. The question then is how far (into the core of the cone) one has to move to avoid such matrices. The answer to this question lies in the concept of the pseudo-rank of a matrix. According to [

$$\cos(V, I_p) \le \sqrt{\frac{k^*}{p}} \qquad (3)$$

The author shows that this value is given by

$$k^* = \left\lceil \frac{\operatorname{tr}(V)^2}{\operatorname{tr}(V^2)} \right\rceil \qquad (4)$$

where $\lceil z \rceil$ is the smallest integer greater than or equal to $z$. Let $V$ be the covariance/correlation matrix of a data matrix $A$ with $p$ variables and $n$ observations, that is, $V = n^{-1} A'A$. In this case, the pseudo-rank of $V$ corresponds to the number of components to be used as representative of all accumulated variance associated with $A$.
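As a rough illustration of formula (4), the sketch below computes the pseudo-rank of a small positive semidefinite matrix with NumPy, together with the cosine of its angle to the central ray obtained from (2) with $B = I_p$. The function names, the round-off tolerance `tol` and the diagonal test matrix are assumptions made for this example only; they do not come from the paper.

```python
import math
import numpy as np

def pseudo_rank(V, tol=1e-12):
    """Pseudo-rank k* = ceil(tr(V)^2 / tr(V^2)) of a symmetric PSD matrix V."""
    V = np.asarray(V, dtype=float)
    ratio = np.trace(V) ** 2 / np.trace(V @ V)
    # tol (an assumption of this sketch) keeps ceil() from jumping to the
    # next integer when the ratio is an integer up to round-off.
    return math.ceil(ratio - tol)

def cos_to_central_ray(V):
    """Cosine (2) between Ray(V) and the central ray Ray(I_p)."""
    V = np.asarray(V, dtype=float)
    p = V.shape[0]
    # <V, I>_F = tr(V) and ||I_p||_F = sqrt(p)
    return np.trace(V) / (np.linalg.norm(V, "fro") * math.sqrt(p))

# Hypothetical matrix: full rank, but two eigenvalues are close to zero.
V = np.diag([5.0, 1.0, 0.1, 0.1])
k_star = pseudo_rank(V)
print(k_star, cos_to_central_ray(V))
```

Here the matrix has rank 4, yet the ratio $\operatorname{tr}(V)^2 / \operatorname{tr}(V^2) = 38.44 / 26.02$ is about 1.48, so the sketch reports a pseudo-rank of 2: the two small eigenvalues contribute almost nothing.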

Yanai’s Generalized Coefficient of Determination

Let A represent a data matrix with dimension (n × p), where p indicates the number of variables and n the number of observations for each variable. In this context, A may refer to a matrix of discretionary/nondiscretionary outputs/inputs. It is important to keep in mind that in DEA models, the number of observations equals the number of columns in the pertinent matrices. Thus, for the implementation of the proposed method, matrix A should be considered the transposed output/input matrix.

Given the covariance/correlation matrix of the data, $S = n^{-1} A'A$, let $\Lambda$ and $P$ be, respectively, the diagonal matrix of eigenvalues (arranged in decreasing order) and the matrix of normalized eigenvectors of $S$. The PCs are the columns of the $(n \times p)$ matrix $C = AP$. Using the spectral decomposition of $S$, it is easy to show that the covariance matrix of $C$ is exactly $\Lambda$ (indeed, $n^{-1} C'C = P'SP = \Lambda$), so that the variables in $C$ are uncorrelated. For this reason, the transformation $AP$ is sometimes called data “decorrelation”.
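The decorrelation property can be checked numerically. The following is a minimal sketch, assuming column-centred synthetic data generated only for illustration; the function name `principal_components` is hypothetical. It uses `numpy.linalg.eigh`, which returns eigenvalues in ascending order, so they are re-sorted to match the decreasing order used in the text.

```python
import numpy as np

def principal_components(A):
    """Return (eigenvalues, eigenvectors P, scores C = A P) for S = A'A / n."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    S = A.T @ A / n
    eigval, eigvec = np.linalg.eigh(S)      # ascending order for symmetric S
    order = np.argsort(eigval)[::-1]        # re-sort in decreasing order
    eigval, P = eigval[order], eigvec[:, order]
    return eigval, P, A @ P

rng = np.random.default_rng(0)              # synthetic data, illustration only
raw = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 4))
A = raw - raw.mean(axis=0)                  # centre columns so S is a covariance

eigval, P, C = principal_components(A)
cov_C = C.T @ C / A.shape[0]                # covariance matrix of the PCs
print(np.allclose(cov_C, np.diag(eigval)))  # compares cov(C) with Lambda
```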

Consider $K$ to be a subset of indices associated with $k \le p$ PCs arranged in decreasing order of eigenvalues (in general, the first $k$). Similarly, let $Q$ be a subset of indices associated with $q \le p$ original variables. With a slight abuse of notation, $K$ and $Q$ also denote the subspaces generated by the vectors with indices in $K$ and $Q$, respectively. The following matrices are then defined:

・ $A_K$ is the submatrix of $A$ in which columns with indices in $K$ are maintained;

・ $S_K = n^{-1} A_K' A_K$ is the covariance matrix associated with $A_K$;

・ $\Lambda_K$ is the matrix of the eigenvalues associated with $S_K$;

・ $P_K$ is the matrix of eigenvectors associated with the eigenvalues in $\Lambda_K$.

Let $P_K$ be the matrix of orthogonal projection onto the subspace $K$, given by

$$P_K = n^{-1} A S_K^{-1} A'$$

where $S_K^{-1}$ is the Moore-Penrose generalized inverse of $S_K$. Similarly, $P_Q$ is the matrix of orthogonal projection onto the subspace $Q$, defined as

$$P_Q = n^{-1} A I_Q S_Q^{-1} I_Q A'$$

where $I_Q$ is the submatrix of the identity matrix obtained by selecting the $q$ columns with indices in $Q$, and $S_Q = n^{-1} I_Q A'A I_Q$.

Given the definitions above, Yanai’s GCD between subspaces $Q$ and $K$ is defined as

$$\mathrm{GCD}(Q, K) = \frac{\langle P_Q, P_K \rangle_F}{\|P_Q\|_F \, \|P_K\|_F} \qquad (5)$$

Suppose that $K = \{1, 2, \dots, k^*\}$ remains fixed, where $k^*$ is the pseudo-rank of the covariance matrix. The variables exhibiting the greatest contribution to the principal components selected in $K$ are then those in the set of indices $\tilde{Q}$ such that

$$\tilde{Q} = \arg\max_{Q} \, \mathrm{GCD}(Q, K)$$
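The selection step can be sketched as an exhaustive search over all subsets of a given size. This is only one possible way to carry out the maximization, not necessarily the search strategy used by the author. The projectors are built as `B @ pinv(B)`, which is the orthogonal projector onto the column space of `B`; the helper names and the synthetic three-variable data set (in which the third column nearly duplicates the first) are assumptions of this example.

```python
from itertools import combinations
import numpy as np

def projector(B):
    """Orthogonal projector onto the column space of B (Moore-Penrose inverse)."""
    return B @ np.linalg.pinv(B)

def gcd(P1, P2):
    """Cosine between two projection matrices under the Frobenius inner product."""
    return float(np.sum(P1 * P2) / (np.linalg.norm(P1) * np.linalg.norm(P2)))

def best_subset(A, k, q):
    """Subset of q columns of A whose span is closest to the first k PCs."""
    n, p = A.shape
    eigval, eigvec = np.linalg.eigh(A.T @ A / n)
    top = np.argsort(eigval)[::-1][:k]          # indices of the k largest eigenvalues
    P_K = projector(A @ eigvec[:, top])         # projector onto the span of those PCs
    scores = {Q: gcd(projector(A[:, list(Q)]), P_K)
              for Q in combinations(range(p), q)}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Synthetic data (illustration only): column 2 is almost a copy of column 0.
rng = np.random.default_rng(1)
a, b = rng.normal(size=(2, 50))
raw = np.column_stack([a, b, a + 0.01 * rng.normal(size=50)])
A = raw - raw.mean(axis=0)

best, score = best_subset(A, k=2, q=2)
print(best, round(score, 4))
```

In this toy case the two near-duplicate columns span almost the same direction, so the best pair keeps column 1 together with one of the duplicates. Exhaustive enumeration is practical only for small $p$; with 12 candidate variables there are at most 924 subsets of any given size.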

^{4}The use of the CCR model is an example. The method can actually be applied to any DEA model.

A practical example of the proposed method is provided in this section. To that end, the CCR model is presented with the original variables (hereafter called the “general model” and denoted by CCRg) and with the subset of selected variables (the “reduced model,” denoted by CCRr). For the sake of simplicity, only output-oriented models are discussed below.

Let us suppose that there are $J$ DMUs under study, each using a vector $x \in \mathbb{R}_+^N$ of inputs to produce a vector $y \in \mathbb{R}_+^M$ of outputs with a technology defined by

$$T_{CCRg} = \{ (x, y) \; ; \; x \ge X\lambda, \ y \le Y\lambda, \ \lambda \ge 0 \}$$

where $X$ $(N \times J)$, $Y$ $(M \times J)$ and $\lambda$ $(J \times 1)$ are the input matrix, the output matrix and the vector of intensities, respectively.

Given a DMU $j$, its technical efficiency in the model CCRg, denoted by $ET_j^g$, is estimated by solving the following linear programming problem (the minimization of slacks is omitted for simplicity):

$$ET_j^g = \max_{\theta, \lambda} \ \theta \quad \text{subject to} \quad (x_j, \theta y_j) \in T_{CCRg} \qquad (6)$$

Let us now suppose that a set of variables was selected following the procedure presented in Section 3. For simplicity’s sake, let us assume that only outputs were selected. Let us denote by $y_j^q$ the output vector of DMU $j = 1, 2, \dots, J$ restricted to the $q < M$ selected outputs, and by $Y^q$ the respective reduced output matrix. The technology in the CCRr model is defined as

$$T_{CCRr} = \{ (x, y^q) \; ; \; x \ge X\lambda, \ y^q \le Y^q\lambda, \ \lambda \ge 0 \}.$$

Given a DMU $j$, its technical efficiency in the model CCRr, denoted by $ET_j^r$, is estimated by solving the following linear programming problem:

$$ET_j^r = \max_{\theta, \lambda} \ \theta \quad \text{subject to} \quad (x_j, \theta y_j^q) \in T_{CCRr} \qquad (7)$$
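Problems (6) and (7) are ordinary linear programs and can be sketched with `scipy.optimize.linprog`. The function name `ccr_output_theta` and the three-DMU data set below are hypothetical and serve only to show the mechanics; the reduced model is obtained simply by passing a subset of the rows of the output matrix. The sketch returns $\theta$ itself, so an efficient DMU has $\theta = 1$ and larger values indicate inefficiency; scores reported on a 0 to 1 scale would correspond to $1/\theta$, which is an assumption about the reporting convention.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_output_theta(X, Y, j):
    """Solve max theta s.t. X lam <= x_j, Y lam >= theta * y_j, lam >= 0.

    X is (N x J), Y is (M x J); decision vector is z = [theta, lam_1..lam_J].
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    N, J = X.shape
    M = Y.shape[0]
    c = np.zeros(1 + J)
    c[0] = -1.0                                   # linprog minimises, so use -theta
    A_in = np.hstack([np.zeros((N, 1)), X])       # X lam <= x_j
    A_out = np.hstack([Y[:, [j]], -Y])            # theta * y_j - Y lam <= 0
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([X[:, j], np.zeros(M)]),
                  bounds=[(0, None)] * (1 + J),
                  method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.x[0])

# Hypothetical data: one input, two outputs, three DMUs (columns).
X = np.array([[2.0, 4.0, 4.0]])
Y = np.array([[2.0, 2.0, 8.0],
              [3.0, 1.0, 1.0]])

theta_general = [ccr_output_theta(X, Y, j) for j in range(3)]
theta_reduced = [ccr_output_theta(X, Y[[0], :], j) for j in range(3)]  # keep output 1 only
print(theta_general, theta_reduced)
```

In this toy data set the first DMU is efficient in the general model only because of its second output; once that output is dropped, it becomes inefficient, which is the discrimination effect the reduced model is meant to produce.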

The reduced model is obtained in three stages, as summarized in the

One issue not covered by the procedure presented above is that of defining the cardinality of the subset of selected variables. One suggestion would be to combine the gain in discrimination with some measure that would reveal loss of information. The gain in discrimination would be obtained by the difference between the percentage of efficient DMUs in the general model and in the reduced model. The complexity of this issue relates to what measure of informational loss should be used. One possibility is to use the Kolmogorov-Smirnov statistic, which quantifies the difference between distributions. In this context, this statistic is given by

$$KS(q) = \sup_x \, | F(x) - F_q(x) |$$

where $F$ and $F_q$ are the empirical cumulative distribution functions of technical efficiency estimated by the general and reduced models (the latter with a subset of $q$ selected variables), respectively.

Let $K^*$ and $K_q^*$ be the number of efficient DMUs in the general and reduced models, respectively, and let $\delta_q = (K^* - K_q^*)/K$, where $K$ is the total number of DMUs, so that $\delta_q$ represents, in proportional terms, the gain in discrimination power of the reduced model in relation to the general model. Then the optimal cardinality would be given by $q^*$ such that

$$q^* \in \arg\max_q \left\{ \frac{\delta_q}{KS(q)} \; ; \; KS(q) \le 1.36\sqrt{\frac{2}{K}} \right\} \qquad (8)$$

in which the constraint $KS(q) \le 1.36\sqrt{2/K}$ is included so that the optimal cardinality depends on the null hypothesis of the Kolmogorov-Smirnov test not being rejected. The quantity $1.36\sqrt{2/K}$ is the critical value of the two-sample Kolmogorov-Smirnov test at a significance level of 0.05: if $KS(q) > 1.36\sqrt{2/K}$, the null hypothesis of equality between the distributions is rejected. It should be noted that $1.36\sqrt{2/K}$ is generally valid for $K \ge 8$; otherwise it is necessary to consult tabulated values.^{5}

^{5}See [

^{6}See site www2.datasus.gov.br.

If there are multiple solutions to (8), the lowest maximizer $q^*$ is selected, so that

$$q^* = \min_{\hat{q}} \left\{ \hat{q} \in \arg\max_q \left\{ \frac{\delta_q}{KS(q)} \; ; \; KS(q) \le 1.36\sqrt{\frac{2}{K}} \right\} \right\} \qquad (9)$$
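Rules (8) and (9) can be sketched in a few lines. The example below uses the efficiency scores reported in the results table of the application for the general model (GM) and for the reduced models with q = 1, 4 and 5, rounded to two decimals as published. Treating a rounded score of 1.00 as efficient, taking $K$ to be the number of DMUs (12), skipping candidates with $KS(q) = 0$ to avoid division by zero, and the function names are all assumptions of this sketch.

```python
import math
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |F_a(x) - F_b(x)|."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])               # the sup is attained at a data point
    Fa = np.searchsorted(a, grid, side="right") / a.size
    Fb = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(Fa - Fb)))

def cardinality_rule(general, reduced, eff_tol=1e-9):
    """reduced maps q -> scores. Returns (q*, {q: (delta_q, KS(q), ratio)})."""
    general = np.asarray(general, dtype=float)
    K = general.size
    crit = 1.36 * math.sqrt(2.0 / K)            # 5% critical value used in (8)
    n_eff = np.sum(general >= 1 - eff_tol)
    table = {}
    for q, scores in reduced.items():
        scores = np.asarray(scores, dtype=float)
        delta = float(n_eff - np.sum(scores >= 1 - eff_tol)) / K
        ks = ks_statistic(general, scores)
        if 0 < ks <= crit:                      # keep only admissible candidates
            table[q] = (delta, ks, delta / ks)
    if not table:
        raise ValueError("no cardinality satisfies the KS constraint")
    best = max(r for _, _, r in table.values())
    # Rule (9): among (numerically) tied maximisers, take the smallest q.
    q_star = min(q for q, (_, _, r) in table.items() if math.isclose(r, best))
    return q_star, table

GM  = [0.71, 0.51, 0.59, 0.53, 1.00, 0.61, 0.64, 1.00, 1.00, 1.00, 0.88, 1.00]
RM1 = [0.62, 0.42, 0.49, 0.40, 1.00, 0.61, 0.48, 0.71, 0.54, 0.84, 0.70, 0.76]
RM4 = [0.69, 0.49, 0.57, 0.48, 1.00, 0.61, 0.63, 0.87, 0.85, 1.00, 0.88, 1.00]
RM5 = [0.69, 0.49, 0.59, 0.53, 1.00, 0.61, 0.63, 0.87, 0.94, 1.00, 0.88, 1.00]

q_star, table = cardinality_rule(GM, {1: RM1, 4: RM4, 5: RM5})
print(q_star, {q: tuple(round(v, 2) for v in row) for q, row in table.items()})
```

Because the ratios for q = 4 and q = 5 coincide only up to floating-point round-off, the tie is detected with `math.isclose` rather than exact equality.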

An Application to Real Data

This example employs real-world data previously described by [_{i}, i = 1 , ⋯ , 12 ), representing health indicators available in the Ministry of Health’s Information System DATASUS.^{6} For the present example, to reduce the power of discrimination, only 12 of the 27 states studied by [

In this example, the Kolmogorov-Smirnov null hypothesis is not rejected at a significance level of 0.05 for values of $KS(q)$ up to 0.5552 (that is, $1.36\sqrt{2/12}$), such that the condition to select the cardinality of the subset of selected outputs becomes

State | x | y_{1} | y_{2} | y_{3} | y_{4} | y_{5} | y_{6} | y_{7} | y_{8} | y_{9} | y_{10} | y_{11} | y_{12} |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RO | 387.5 | 68.2 | 73.8 | 70.9 | 80 | 0.8 | 1.7 | 111.2 | 106.9 | 107.5 | 108.3 | 47.6 | 70 |
AC | 556.6 | 68.6 | 73.8 | 71.1 | 71 | 0.8 | 2 | 86.5 | 83.64 | 110.5 | 90.14 | 40.3 | 67.4 |
AM | 513.4 | 68.4 | 74.4 | 71.3 | 78 | 0.9 | 1.5 | 106.3 | 93.88 | 131.1 | 92.36 | 57.1 | 72.8 |
RR | 674.5 | 67.2 | 72.1 | 69.6 | 83 | 1.1 | 1.6 | 93.13 | 88.62 | 114.1 | 89.85 | 71.9 | 79.3 |
PA | 263.8 | 68.8 | 74.7 | 71.7 | 76 | 0.8 | 1.6 | 116.6 | 112.3 | 144.5 | 119.8 | 55 | 76.9 |
AP | 512.5 | 66.2 | 74.1 | 70.1 | 79 | 0.8 | 1.5 | 94.12 | 96.3 | 126.7 | 98.17 | 28.2 | 90.5 |
TO | 503.8 | 68.8 | 73.3 | 71 | 78 | 1.1 | 1.8 | 99.57 | 107.2 | 110.1 | 107.4 | 21 | 69.9 |
MA | 285.3 | 63.4 | 71.3 | 67.2 | 69 | 0.6 | 2.4 | 108.2 | 106.6 | 135.6 | 111.9 | 50.1 | 59 |
PI | 315.8 | 65.6 | 71.7 | 68.6 | 73 | 0.8 | 2.5 | 101.9 | 102.1 | 106 | 104.6 | 61.7 | 49.6 |
CE | 291.4 | 65.7 | 74.4 | 69.9 | 74 | 0.9 | 1.9 | 108.3 | 108.3 | 113.3 | 112.3 | 40.8 | 71.3 |
RN | 405 | 66.3 | 74.1 | 70.1 | 69 | 1.2 | 2.3 | 98.87 | 95.45 | 108.2 | 94.61 | 45.1 | 82.3 |
PB | 335.1 | 65.2 | 72.2 | 68.6 | 68 | 1.1 | 2.7 | 105.7 | 103.6 | 117.8 | 104 | 48.5 | 74.7 |
Max | 674.5 | 68.8 | 74.7 | 71.7 | 83 | 1.2 | 2.7 | 116.6 | 112.3 | 144.5 | 119.8 | 71.9 | 90.5 |
Min | 263.8 | 63.4 | 71.3 | 67.2 | 68 | 0.6 | 1.5 | 86.5 | 83.64 | 106 | 89.85 | 21 | 49.6 |
Mean | 420.4 | 66.9 | 73.3 | 70 | 75 | 0.9 | 2 | 102.5 | 100.4 | 118.8 | 102.8 | 47.3 | 72 |
SE | 130.1 | 1.73 | 1.18 | 1.33 | 4.8 | 0.2 | 0.4 | 8.529 | 8.772 | 12.64 | 9.732 | 13.9 | 10.6 |

Source: [

$$q^* = \min_{\hat{q}} \left\{ \hat{q} \in \arg\max_q \left\{ \frac{\delta_q}{KS(q)} \; ; \; KS(q) \le 0.5552 \right\} \right\}$$

Note that in this example, $\arg\max_q \{ \delta_q / KS(q) \; ; \; KS(q) \le 0.5552 \} = \{4, 5, 6, 7, 8, 9, 10\}$ and therefore $q^* = 4$. The last part of the example appears in

The results shown in

This paper proposes a method for reducing the dimension of input/output matrices used to estimate production frontiers through the CCR model (or its variants). The method is based on Yanai’s Generalized Coefficient of Determination (GCD) and on the concept of pseudo-rank of a matrix. Additionally, a rule

State | GM | RM1 | RM2 | RM3 | RM4 | RM5 | RM6 | RM7 | RM8 | RM9 | RM10 |
---|---|---|---|---|---|---|---|---|---|---|---|
RO | 0.71 | 0.62 | 0.65 | 0.67 | 0.69 | 0.69 | 0.71 | 0.71 | 0.71 | 0.71 | 0.71 |
AC | 0.51 | 0.42 | 0.42 | 0.47 | 0.49 | 0.49 | 0.49 | 0.49 | 0.49 | 0.49 | 0.49 |
AM | 0.59 | 0.49 | 0.49 | 0.51 | 0.57 | 0.59 | 0.59 | 0.59 | 0.59 | 0.59 | 0.59 |
RR | 0.53 | 0.40 | 0.40 | 0.40 | 0.48 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 | 0.53 |
PA | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
AP | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 |
TO | 0.64 | 0.48 | 0.48 | 0.52 | 0.63 | 0.63 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 |
MA | 1.00 | 0.71 | 0.86 | 0.87 | 0.87 | 0.87 | 0.88 | 0.88 | 0.87 | 0.87 | 0.87 |
PI | 1.00 | 0.54 | 0.73 | 0.80 | 0.85 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 |
CE | 1.00 | 0.84 | 0.84 | 0.88 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
RN | 0.88 | 0.70 | 0.70 | 0.70 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 |
PB | 1.00 | 0.76 | 0.76 | 0.76 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
δ_q | - | 0.33 | 0.33 | 0.33 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |
KS(q) | - | 0.42 | 0.42 | 0.42 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |
δ_q/KS(q) | - | 0.80 | 0.80 | 0.80 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

Source: Author’s estimates. ^{a}The initials GM and RM_{q} refer to general and reduced models with cardinality q, respectively.

is suggested to support the choice of cardinality of the subset of selected variables. This rule seeks to combine maximum gain in discriminatory power with minimal loss of information.

Through an example that employs real-world data, it was found that the pseudo-rank of the output correlation matrix indicates that the first four PCs should be maintained. The GCD for output subsets with cardinality from 1 to 10 was then calculated. The cardinality rule indicated the subset with four of the 12 original outputs. Finally, the estimation of densities of the general and reduced models suggested that the cardinality decision rule can support decisions concerning the number of variables required in the model to obtain maximum discrimination with minimal loss of information.

Further research is suggested on the cardinality decision rule taking into consideration various measures of loss of information. Also warranted are studies comparing the proposed method with other methods of summarization and selection.

Benegas, M. (2017) The Use of Yanai’s Generalized Coefficient of Determination to Reduce the Number of Variables in DEA Models. American Journal of Operations Research, 7, 187-200. https://doi.org/10.4236/ajor.2017.73013

Output | Description |
---|---|
y_{1} | Male life expectancy |
y_{2} | Female life expectancy |
y_{3} | Combined life expectancy |
y_{4} | Survival rate (%) |
y_{5} | Physicians per 1000 inhabitants |
y_{6} | Hospital beds per 1000 inhabitants |
y_{7} | MMR vaccine coverage (%) |
y_{8} | Tetravalent vaccine coverage (%) |
y_{9} | BCG vaccination coverage (%) |
y_{10} | Polio vaccine coverage (%) |
y_{11} | Sanitation coverage (%) |
y_{12} | Garbage collection coverage (%) |

Source: [