An Examination of Very Low Efficiency Scores in Data Envelopment Analysis in the Pension Funds Industry

Data Envelopment Analysis (DEA) is a powerful analytical tool and is considered one of the most useful techniques for measuring the efficiency of Decision Making Units (DMUs) in certain industry segments. However, its use to assess pension funds' performance is rarely reported because of the complexities of such funds. The few papers found in the literature do not consider the main characteristics of pension funds, such as variables that managers cannot control, regulations, and funds' status (fully funded/underfunded pension plans). Regulations affect such investment vehicles in many ways, including investment strategy, tax status and reporting requirements. Also, as a by-product of our past research in this field, we ran into some unexpected outcomes in which some funds achieved an extremely low efficiency score. This is highly unusual and invited additional research. There are very few papers in the literature on extremely low efficiency scores, and there is a paucity of cogent explanations of why they occur. Therefore, while evaluating pension funds' performance through DEA, we worked on this problem in some detail to uncover the reason(s) for such low minimum efficiency scores for pension funds. We found that very low efficiency scores are not uncommon in the pension funds industry, although they are uncommon in studies of other industries.


Introduction 1
DEA identifies the efficient DMUs, which can serve as targets for the inefficient ones (Charnes et al., 1978). Consequently, DEA is widely used in the literature to study banking corporations, branch networks, risk, fraud and many other aspects of financial institutions. However, only a few papers have used DEA to evaluate pension funds' performance. In 2005, Barrientos and Boussofiane studied the efficiency of pension fund managers in Chile using the CCR and BCC models in DEA for the period 1982-1999. The results indicated that the Chilean companies exhibited significant inefficiency. There were changes over time but no continuous trend toward improved efficiency (Barrientos & Boussofiane, 2005). In 2006, Barros and Garcia evaluated Portuguese pension funds' performance from 1994 to 2003 using different DEA models (CCR, BCC, Cross-Efficiency and Super-Efficiency) and compared the results. The results indicated that the majority of Portuguese firms displayed relatively high managerial skills, being VRS-efficient for the most part. However, some inefficient firms could have improved considerably. Their results also showed that private pension fund managers were more efficient than those of public institutions. The researchers also tested institutions involved in mergers and acquisitions during the period and found that they were more efficient than those that were not. The results supported the view that small pension fund management companies that did not merge performed less efficiently and that their size disadvantage negatively influenced their operations. Unfortunately, they studied only 12 pension funds, and the authors cautioned readers about this limitation (Barros & Garcia, 2006). DEA requires a sufficient number of observations to allow good separation and discrimination amongst DMUs. A small sample size can reduce the accuracy of the results. The rule of thumb is that the number of DMUs should be at least three times the total number of inputs plus outputs used in the model (Cooper et al., 2011).
In 2010, Garcia analyzed changes in the productivity of the same 12 Portuguese pension funds management institutions for a different period, 1994-2007. He used DEA and the Malmquist index. His results indicated that increasing the governance and transparency of the funds' management companies would increase their efficiency (Garcia, 2010). In 2011, Sathye estimated the production efficiency of pension funds in Australia for the years 2005 to 2009 using CCR and BCC models. The results indicated that Australian pension funds' efficiencies were very low in some cases. Regression analyses on the variables showed that certain fund characteristics, such as size and the proportion of funds invested in low-risk opportunities, had a positive association with performance. The study also found that diversification and the financial crises, as one would expect, had a negative effect on efficiency (Sathye, 2011).

1 Note that some of the following parts had appeared in book chapters by the authors of this paper: Badrizadeh, M., "Pension Funds Insights with DEA", book chapter in "Data Envelopment Analysis in the Financial Services Industry: A Guide for Practitioners and Analysts Working in Operations Research Using DEA", International Series in Operations Research & Management Science, Paradi, J. C., Sherman, H. D., Tam, F. K., Springer, 2018; and Badrizadeh, M., Paradi, J. C., "Mixed Datasets with Partially Deficient Variable Sets Embodied in Mixed Variable DEA (MV-DEA)", book chapter in "Data Science and Productivity Analytics", Charles, V., Aparicio, J., and Zhu, J., Springer, 2020.

Journal of Service Science and Management
In 2015, Galagedera and Watson assessed pension funds in Australia using DEA for the year 2012. In the study, the funds were classified into four categories: industry, public sector, corporate and retail. The results showed that retail funds were the best performers (Galagedera & Watson, 2015). However, each of these categories has its own specific characteristics, and it might not be proper to consider all of them in one model without specific treatment. In 2017, Badrizadeh explained the importance of considering the different characteristics of investment industries by comparing pension funds and mutual funds (Badrizadeh, 2017). In 2015, Zamuee evaluated Namibian pension funds using a CCR model for the years 2010 to 2014. The results showed that the majority of these pension funds had low efficiency scores and that urgent management intervention was required to address the issues raised (Zamuee, 2015).
The previous studies on pension funds using DEA predominantly focused on comparing different DEA models instead of following a clear methodology and framework. Moreover, as mentioned above, DEA requires a sufficient sample size to allow good separation and discrimination amongst DMUs. Most of these studies had very few DMUs relative to the number of inputs and outputs, which may distort their results, so the findings should be treated with caution. A major flaw of these studies is also that none of them considers the effect of government regulations on pension plans' performance; these regulations significantly restrict managers' control and differentiate pension plans from other investment vehicles that do not face such restrictions. For instance, contribution amounts and benefit payments are the main variables here. However, both are subject to government regulations and are not completely under the managers' control. Furthermore, one of the important issues in the pension funds industry is the plans' financial health, that is, whether they are able to meet their legal obligations (fully funded) or have deficits (underfunded). Accordingly, fully funded plans should be referenced only to their own category, while underfunded plans should be referenced to their own level as well as to the higher level of the hierarchy (fully funded plans). None of these studies considered these important characteristics of pension funds, which distinguish them from other investment vehicles. As Badrizadeh and Paradi cautioned, these factors should be treated appropriately (Charles et al., 2020).
This research investigates the performance of different private pension plans by considering the effects of regulations on asset allocation as well as managers' authority over contribution amounts and benefit payments, the two main variables in this industry, along with other factors. The issue in focus here is the very low efficiency/productivity results in the pension funds industry compared to other industries, such as:
• Agriculture: 0.2 to 0.4 (Blancard & Martin, 2013) and (Vlontzos et al., 2014)
• Airlines: 0.1 to 0.3 (Barros & Peypoch, 2009) and (Wanke & Barros, 2016)
• Banking: 0.4 to 0.6 (Asmild et al., 2004) and (Chiu et al., 2008)
• Mutual Funds: 0.1 to 0.3 (Basso & Funari, 2005) and (Premachandra et al., 2012)
• Pension Funds: 0.00 to 0.008 (Sathye, 2011) and (Zamuee, 2015)
The rest of the article is organized as follows: the methodology is explained in Section 2 and the results are discussed in Section 3. The conclusions are presented in Section 4.

Methodology
The problem of low efficiency for DMUs arose from work we did on efficiency comparisons between federally regulated pension funds and Canadian Mutual Funds (Badrizadeh, 2017). In this section, first the main characteristics of pension funds are investigated, calculated, and considered in the model. Then, the reason for low minimum efficiency scores for pension funds is investigated.

Evaluating Pension Funds' Performance
Canadian pension plans are mostly categorized into two groups: Defined Benefit (DB) plans and Defined Contribution (DC) plans. DB plans offer an employee the security of knowing what to expect at retirement based on their salary during their working years. In a DB plan, the investment risk of the guaranteed retirement income is taken on by the employer. DC plans specify the amount of employer and employee contributions. The amount available to provide a pension income in a DC plan is affected by how successfully contributions are invested. Therefore, the investment risk in a DC plan rests with the employee. Plan members who are not covered by DB or DC plans are covered by other types of plans, such as combination pension plans (Combo). Combo plans incorporate concepts from both defined benefit and defined contribution plans. This type of plan offers additional flexibility for plan members and employers by incorporating some positive elements from both. In a combination plan, a pension must be promised, and from time to time employers are able to use pension surplus to fund their defined benefit plan's current service costs. In this research, these three types of plans are studied.
One of the important issues in managing such funds is government regulations intended to protect the retirement income and maximize returns. Pension laws and regulations shape the unique legal investment environment in which pension funds operate. Managers consider the impact of regulations as a matter of course in their work. In general, regulations can be categorized into two types.
One type of regulation deals with the administration of the various types of pension plans. These rules are numerous and change from one situation to another based on actuarial age considerations, mortality, conditions on fund transfer to a spouse/common-law partner after death, etc. The second type of regulation deals with the allocation of assets. For instance, in Canada, according to the Financial Services Commission of Ontario, in compliance with the Federal Investment Regulations, a maximum of 5% of the plan's assets may be invested directly or indirectly in any Canadian resource property. Similarly, a maximum of 10% of the assets may be invested in or loaned to any one person/associated person (FSCO, 2004). Hence, the only part of the regulations over which managers have some control is how to allocate their assets to different investment vehicles more effectively while observing these restrictions. As a result, if the effects of regulations on asset allocation can be quantified, then the managers' performance in allocating their assets can be assessed. To achieve this goal, the standard deviation of returns is calculated based on each plan's asset allocation and added to the variables. As maximization of the benefits is an important objective, a Variable Returns to Scale (VRS) output-oriented model was chosen as the base model.
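How the standard deviation of returns was derived from asset allocation is not detailed here. A minimal sketch, assuming the usual portfolio-variance formula (the square root of wᵀΣw) applied to asset-class weights w and a covariance matrix Σ of asset-class returns, might look as follows. The function names, asset classes, weights, covariances and the cap dictionary are all hypothetical illustrations, not OSFI data or the authors' procedure.

```python
import math

def portfolio_std(weights, cov):
    """Standard deviation of portfolio return: sqrt(w' * Cov * w)."""
    n = len(weights)
    if len(cov) != n or any(len(row) != n for row in cov):
        raise ValueError("covariance matrix must be n x n")
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

def violates_caps(allocation, caps):
    """Return the asset classes whose weight exceeds its regulatory cap."""
    return [name for name, w in allocation.items() if w > caps.get(name, 1.0)]

# Hypothetical two-asset plan: 60% equities, 40% bonds, uncorrelated returns.
w = [0.6, 0.4]
cov = [[0.04, 0.0],
       [0.0, 0.01]]
sigma = portfolio_std(w, cov)

# Hypothetical allocation checked against a 5% cap on resource property.
alloc = {"equities": 0.55, "bonds": 0.38, "resource_property": 0.07}
caps = {"resource_property": 0.05}
breaches = violates_caps(alloc, caps)
```

Under these assumptions, a plan's risk figure depends only on quantities the manager can influence within the regulatory caps, which is why it can act as a proxy for allocation decisions in the DEA model.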
The VRS model was proposed by Banker, Charnes and Cooper in 1984 (Banker et al., 1984). The VRS frontier does not pass through the origin as the Charnes, Cooper and Rhodes (CCR) frontier does. The envelopment form of the non-discretionary VRS (Non-Dis-VRS) model is shown in Equation (2).
Moreover, in order to have a clear insight into how the pension funds industry is managed, one has to consider the funds' status, that is, whether the plans meet their financial and actuarial obligations and are therefore fully funded, or have deficits and are underfunded. Therefore, categorical variables are also considered in the Non-Dis-VRS model to signal whether an underfunded or fully funded condition exists. To achieve this goal, fully funded plans are referenced only to their own category, while underfunded plans are referenced to other, similar plans as well as to the fully funded ones. It should be noted that all pension plans in this research are active plans. Therefore, if an active plan is flagged as underfunded, it means that the plan will not meet its obligations by the deadlines set by the government tiers (generally within the next 5 or 10 years). During these years, if the managers of an underfunded active plan make up the financial deficit, the plan should move to the fully funded category. If not, the plan may be terminated after the deadline (5 or 10 years). Only a few underfunded active plans are found to be efficient.
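For orientation, a textbook output-oriented VRS envelopment model with non-discretionary outputs and category-restricted reference sets can be written as below. This is a sketch under assumptions and not necessarily the authors' Equation (2): the sets D and ND (discretionary and non-discretionary outputs) and the reference set R_o (fully funded plans only when DMU o is fully funded, all plans when it is underfunded) are notation introduced here for illustration.

```latex
\begin{aligned}
\max_{\phi,\,\lambda}\quad & \phi \\
\text{s.t.}\quad
& \sum_{j \in R_o} \lambda_j x_{ij} \le x_{io}, && i = 1,\dots,m, \\
& \sum_{j \in R_o} \lambda_j y_{rj} \ge \phi\, y_{ro}, && r \in D, \\
& \sum_{j \in R_o} \lambda_j y_{rj} \ge y_{ro}, && r \in ND, \\
& \sum_{j \in R_o} \lambda_j = 1, \qquad \lambda_j \ge 0, && j \in R_o .
\end{aligned}
```

In this form the efficiency score is θ = 1/φ*, so θ lies in (0, 1] and θ = 1 marks a DMU on the frontier of its reference set. Only the discretionary outputs are scaled by φ, which reflects the idea that managers should not be credited or penalized for variables they cannot control.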

Investigating Low Minimum Efficiency Scores
Only DB and Combo plans' managers must file benefit payments in their annual [...]
Among other studies using DEA, only two, Sathye (2011) and Zamuee (2015), mentioned the problem of very low minimum efficiency scores that we found. They used various DEA models and different years and found some efficiency scores in the range of 0.00 to 0.08. As is evident from these two articles as well as the results of this study, low minimums do occur in the pension funds industry.
Therefore, a hypothesis can be made as follows: Hypothesis A: Since both fully funded and underfunded active pension plans should be considered in the DEA model and most active pension plans are underfunded, very low minimum efficiency scores could be found in such studies.
In order to investigate this hypothesis, three different approaches were carried out. In the first approach, the efficient DMUs are removed from the analysis. The efficient DMUs were divided into fully funded and underfunded plans; the former can be referenced only to like plans, whereas the latter can be referenced to either or both categories. The efficient DMUs are "peeled off" from the frontier until no fully funded plans remain, and the average and minimum efficiency scores are examined at each step. In the second approach, the inefficient DMUs are counted to see how many of them are underfunded plans. In the third approach, the models are run for fully funded and underfunded plans separately to check the minimum efficiency scores.
A detailed explanation of the data and the results of the Non-Dis-VRS model is presented in Section 3.
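The first approach can be sketched in code. The block below is a minimal illustration, assuming a plain output-oriented VRS model solved with `scipy.optimize.linprog` and strictly positive outputs; it omits the non-discretionary treatment and is not the authors' implementation. The function names `vrs_output_scores` and `peel`, the tolerance, and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def vrs_output_scores(X, Y, funded=None):
    """Output-oriented VRS efficiency (theta = 1/phi) for each DMU.

    X: (n, m) inputs, Y: (n, s) outputs. If `funded` is given, fully funded
    DMUs are benchmarked only against fully funded DMUs, while underfunded
    DMUs are benchmarked against all DMUs.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n = X.shape[0]
    theta = np.empty(n)
    for o in range(n):
        ref = np.ones(n, bool)
        if funded is not None and funded[o]:
            ref = np.asarray(funded, bool)
        Xr, Yr = X[ref], Y[ref]
        k = Xr.shape[0]
        c = np.r_[-1.0, np.zeros(k)]               # maximise phi
        A_in = np.c_[np.zeros(X.shape[1]), Xr.T]   # sum(lambda * x) <= x_o
        A_out = np.c_[Y[o], -Yr.T]                 # phi * y_o <= sum(lambda * y)
        res = linprog(c,
                      A_ub=np.r_[A_in, A_out],
                      b_ub=np.r_[X[o], np.zeros(Y.shape[1])],
                      A_eq=np.r_[0.0, np.ones(k)][None, :],  # sum(lambda) = 1
                      b_eq=[1.0])
        theta[o] = 1.0 / res.x[0]
    return theta

def peel(X, Y, funded=None, tol=1e-6):
    """Yield (indices, theta) for each layer, dropping efficient DMUs each round."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    idx = np.arange(X.shape[0])
    f = None if funded is None else np.asarray(funded, bool)
    while idx.size:
        theta = vrs_output_scores(X[idx], Y[idx], None if f is None else f[idx])
        yield idx.copy(), theta
        idx = idx[theta < 1.0 - tol]

# Toy data (hypothetical): one input fixed at 1, one output.
X = [[1.0], [1.0], [1.0]]
Y = [[1.0], [2.0], [4.0]]
layers = list(peel(X, Y))
for indices, theta in layers:
    print(indices, theta.round(3), "min:", theta.min().round(3))
```

On the toy data the minimum score rises as each efficient layer is removed, which is the pattern the peeling approach looks for. To follow the stopping rule described above, a caller would stop iterating once no fully funded DMU is left among the remaining indices.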

Data Preparation
Data from all Canadian DB, DC and Combo plans that are federally supervised, were not terminated during 2010, and have at least 100 plan members were studied. Although the Office of the Superintendent of Financial Institutions (OSFI) is the best source for pension funds data, extensive work was performed to validate the data. Moreover, the standard deviation of returns, which is an important variable, was not provided by OSFI. Therefore, for each pension plan it was calculated from the available asset allocation data. Also, outliers, which would have unwanted effects on the results of a DEA model, were removed. The variables were thus selected carefully and validated through various techniques such as statistical tests, sensitivity analysis and outlier removal, which are explained below.

Number of DMUs
DEA requires a sufficient number of DMUs to allow good separation and discrimination amongst them. A small sample size can reduce the accuracy of the results. As explained earlier, the rule of thumb is that the number of DMUs should be at least three times the total number of inputs plus outputs used in the model. Another similar rule is $n \ge \max\{m \times s,\ 3(m + s)\}$, where n is the number of DMUs, m is the number of inputs and s is the number of outputs (Cooper et al., 2011). As presented later in this section, there are 4 inputs and 2 outputs for DB plans and Combo plans, and 3 inputs and 1 output for DC plans. There are 90 DB plans, 37 DC plans and 46 Combo plans. Therefore, the number of pension plans is sufficient for this research.
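The sample-size check can be worked through with the counts quoted above. The sketch below assumes the commonly stated form of the rule, n ≥ max(m × s, 3(m + s)); the helper name `min_dmus` is hypothetical.

```python
def min_dmus(m, s):
    """Smallest number of DMUs satisfying n >= max(m * s, 3 * (m + s))."""
    return max(m * s, 3 * (m + s))

# (number of plans, inputs, outputs) as reported in the text.
plans = {"DB": (90, 4, 2), "Combo": (46, 4, 2), "DC": (37, 3, 1)}

for name, (n, m, s) in plans.items():
    required = min_dmus(m, s)
    print(f"{name}: n = {n}, required >= {required}, sufficient = {n >= required}")
```

With 4 inputs and 2 outputs the threshold is 18 DMUs, and with 3 inputs and 1 output it is 12, so all three plan types exceed the minimum under this reading of the rule.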

Variable Selection

1) Efficiency Contribution Analysis
In this method, an input or output variable is included in or excluded from a DEA model in order to determine the influence of each variable on the DEA scores (Smith, 1997). First, each variable is removed in turn and the model is re-run with the remaining variables. Then, the difference in average scores between the original model and the re-run model is examined to determine whether the variable has a significant impact on the DEA scores.

2) Correlation Analysis
The correlation coefficient used here, which represents the relationship between two variables, is known as the Pearson correlation coefficient and can be calculated as shown in Equation (3):

$$\rho_{x,y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \qquad (3)$$

where x and y are the variables being compared; cov(x, y) is the covariance of x and y; and $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y. A high correlation between two variables indicates that they carry largely the same information. Therefore, one of the two variables may be enough and one of them can be removed. However, both management's perspective and the research's objectives should be considered in this regard.
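As an illustration of the correlation screen, the sketch below computes the Pearson coefficient from its definition (covariance divided by the product of standard deviations) and flags a pair of candidate variables as redundant above a cut-off. The variable names, data values and the 0.9 threshold are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel

# Hypothetical candidate inputs for five plans (millions of dollars).
total_assets = [120.0, 340.0, 95.0, 410.0, 220.0]
net_assets = [115.0, 330.0, 90.0, 400.0, 210.0]

r = pearson(total_assets, net_assets)
redundant = abs(r) > 0.9  # hypothetical cut-off for dropping one variable
```

A flagged pair is only a candidate for removal; as noted above, the final choice should also reflect management's perspective and the research objectives.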

Outlier Detection

1) Manual Cleaning
A simple and necessary approach for removing outliers is checking the data manually. Although this technique is very time consuming, it is essential for determining whether there are any outliers in the dataset (for instance, an investment expense of $2 for a pension fund). Manual data cleaning together with other methods leads to a more precise and valuable dataset. For the purpose of this section, the data were first examined manually using the histogram of each variable, and meaningless values were pinpointed. Then, the methods explained below were run. Their results confirmed that the manually identified data points were indeed outliers.

2) Stripping the Efficient Frontier Approach
Stripping the frontier is an alternative method to determine whether the efficient DMUs on the frontier are, in fact, outliers (Cooper et al., 2011). To apply this method, all of the efficient DMUs on the frontier are removed and the DEA model is rerun. Frontier stripping is a good way to distinguish the obvious outliers on the frontier. However, it is better to consider the results of this method alongside the results of other methods.

3) Super Efficiency Test
Another approach to sensitivity analysis in DEA is removing DMUs that are referenced by many units, as they might be super-efficient outliers that skew the frontier (Simar, 2003). Therefore, the super-efficiency model was used to determine the impact of DMUs that skew the frontier and affect the DEA scores.

Sensitivity Analysis

1) Wilcoxon Rank-Sum Test
The Wilcoxon Rank-Sum test is a nonparametric statistical test that examines whether the two groups belong to the same population or whether they differ significantly. It can be used when the population is not normally distributed.
Since the theoretical distribution of the efficiency score in DEA is usually unknown, the Wilcoxon Rank-Sum test is a suitable method to test DEA scores that are statistically independent (Cooper et al., 2007). For the purpose of this test, a statistic T is computed from the rank sum of one of the two groups. T has an approximately standard normal distribution, and by using T, the null hypothesis that the two groups have the same population can be examined at a level of significance of α. The hypothesis is rejected if $T \le -T_{\alpha/2}$ or $T \ge T_{\alpha/2}$, where $T_{\alpha/2}$ corresponds to the upper α/2 percentile of the standard normal distribution (Cooper et al., 2007).
The null hypothesis of the Wilcoxon Rank-Sum test is that the two groups have the same population. The result of the hypothesis test, h, is either 1 or 0. If h = 1, the null hypothesis is rejected; if h = 0, the test fails to reject the null hypothesis at a significance level of α = 0.05. The p-value of the test is a scalar between 0 and 1, and the null hypothesis is rejected when it falls below α.
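The test can be sketched with the standard normal approximation, in which the rank sum S of the first group is standardized by its mean n1(n1 + n2 + 1)/2 and variance n1·n2(n1 + n2 + 1)/12. The function name `rank_sum_test`, the group sizes n1 and n2, and the sample scores are assumptions for illustration; the sketch applies average ranks to ties but no tie correction to the variance, and it is not the routine used in the study.

```python
import math
from statistics import NormalDist

def rank_sum_test(a, b, alpha=0.05):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (T, p_value, h), where h = 1 means the null hypothesis that both
    groups come from the same population is rejected at level alpha.
    """
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < n1 + n2:
        j = i
        while j + 1 < n1 + n2 and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1      # average rank for tied values
        i = j + 1
    S = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of a
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    T = (S - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(T)))
    return T, p, int(p < alpha)

# Hypothetical efficiency scores for two groups of plans.
underfunded = [0.10, 0.20, 0.30, 0.15, 0.25]
fully_funded = [0.80, 0.90, 0.70, 0.95, 1.00]
T, p, h = rank_sum_test(underfunded, fully_funded)
```

For small groups, an exact test is usually preferable to the normal approximation; the approximation is shown here only because it matches the description of T given above.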
In this research, the datasets were carefully examined by using all the methods mentioned above. [...] can be considered in one group.
The inputs and outputs for DB and Combo plans are presented in Table 1.
The variables are measured in Canadian dollars.
For DC plans the inputs and outputs are shown in Table 2.

Results for Evaluating Pension Funds' Performance
The output-oriented Non-Dis-VRS model is used to evaluate the pension funds' performance. It accounts for the effects of regulations on pension funds through the standard deviation of returns, calculated from asset allocations, and includes both controllable and uncontrollable variables. Categorical variables are also considered in the Non-Dis-VRS model to indicate the funds' status (fully funded and underfunded) for DB and Combo plans. Since DC plan managers are not obliged to provide defined benefit payments for their retirees, the funds' status is not an issue for DC plans. The results of the Non-Dis-VRS model for DB, Combo and DC plans are presented in Table 3 and Table 4. The main characteristics of pension plans, such as the effects of regulations on asset allocation, managers' authority and the funds' status, are accounted for in the DEA models for the first time. As shown in Table 3 and Table 4, the average of the lowest quartile of efficiency scores for DB plans (0.17) and for DB and Combo plans (0.15) is lower than that for DC plans (0.33). Further investigation into private pension funds is carried out in Section 3.3 to examine the reasons for the very low minimum efficiency scores. This should provide a better understanding of the pension funds industry.

Results for Investigating Low Minimum Efficiency Scores
Of the 90 DB plans in the dataset, 29 are fully funded and the remaining 61 are underfunded. Of the 46 Combo plans, only 8 are fully funded and 38 are underfunded.
Figure 1 and Figure 2 present the distributions of the efficiency scores (θ), sorted in descending order.
As shown in Figure 1, approximately 30% of DMUs have θ between 0.1 and 0.3, and all these DMUs are underfunded plans.
In Figure 2, 54 DMUs achieved θ between 0.06 and 0.3. Most of these DMUs are underfunded plans.
The results for the three approaches that were mentioned in the methodology section are presented below.

Stripping Frontiers

1) DB Plans
There are 90 DB plans, of which 29 are fully funded and 61 are underfunded. As indicated in Table 3 in Section 3.2, the average and minimum efficiency scores for the output-oriented Non-Dis-VRS model considering funds' status for all 90 DB plans are 0.576 and 0.106, respectively. After removing the first layer of efficient DMUs, 57 DMUs remain; their average and minimum efficiency scores for the Non-Dis-VRS model with and without considering the categorical situation (footnote 5) are presented in Table 5.

4 It should be noted that all pension plans for this research are active plans. Therefore, if an active plan is flagged as underfunded, it means that the plan will not meet its obligations based on OSFI tiers' deadlines in the next 5 or 10 years. As a result, an underfunded active plan for the year of study for this research can be placed on the frontier as an efficient plan. However, only a few underfunded active plans become efficient.

5 Considering funds' status (fully funded/underfunded) provides a more realistic analysis of the results. With categorical DMUs, fully funded plans are referenced only to their own category, while the underfunded plans are referenced to their own level as well as to the higher level of the hierarchy (fully funded plans). Therefore, the main change between the Non-Dis-VRS model with and without categorical DMUs is in the DMUs' reference sets. The results of the DEA model with categorical DMUs change slightly, and one of the important aspects of pension funds is taken into consideration.

2) DB and Combo Plans
[...] and 12 DMUs are underfunded (10 DB and 2 Combo plans). As indicated in Table 3 in Section 3.2, the average and minimum efficiency scores for DB and Combo pension plans are 0.508 and 0.064, respectively. The results are shown in Table 7 after removing the first layer of efficient DMUs (39 [...] Table 9. As expected, this approach shows that fully funded plans perform better than underfunded plans. Also, as the number of fully funded plans decreases with each stripping step, the remaining plans are increasingly underfunded until all are underfunded, and the minimum efficiency scores increase significantly.

Stripping DMUs with Low θ
In this section, the aim is to show how many of the inefficient DMUs with low θ are underfunded plans. Starting from the bottom of the efficiency ranking, we count how many underfunded plans appear before the first inefficient fully funded plan is reached.

1) DB Plans
When the DB plans are examined from the bottom of the efficiency ranking, the first inefficient fully funded plan is reached after removing 29 inefficient underfunded DMUs. As expected, this is another indication that fully funded plans perform better than underfunded plans and that some low θ values arise because both fully funded and underfunded plans are considered in the DEA models.
The results after removing these 29 DMUs from the dataset (61 DMUs remain) are reported in Table 10. As anticipated, this approach demonstrates from another perspective that most underfunded plans have low efficiency scores.
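The bottom-up count described above can be expressed as a short routine. This is a minimal sketch under the assumption that each DMU carries an efficiency score and a fully funded flag; the function name and the sample values are hypothetical.

```python
def underfunded_run_from_bottom(scores, funded):
    """Count underfunded DMUs met when walking up from the lowest score,
    stopping at the first fully funded DMU."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    count = 0
    for i in order:
        if funded[i]:
            break
        count += 1
    return count

# Hypothetical scores and funding status for five plans.
scores = [0.9, 0.1, 0.2, 0.5, 1.0]
funded = [True, False, False, True, False]
run = underfunded_run_from_bottom(scores, funded)
```

For the DB plans discussed above, the corresponding count reported in the text is 29 underfunded DMUs before the first fully funded one.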

Investigating Fully Funded and Underfunded Pension Plans Separately

1) DB Plans
In this section, each funds' status is examined separately to investigate the average and minimum efficiency scores for each category of fully funded and underfunded DB plans. The results are presented in Table 12.

2) DB and Combo Plans
The results for each category of fully funded and underfunded DB and Combo plans are presented in Table 13. The results show that when fully funded and underfunded plans are tested separately, the average and minimum efficiency scores are higher than when both fully funded and underfunded plans are analyzed together.
In summary, all three approaches indicate that when fully funded and underfunded plans are examined together, the minimum efficiency score decreases significantly. Since most plans are underfunded and both fully funded and underfunded plans should be considered together, very low minimum efficiency scores are often found in this industry.

Conclusion
Government rules and restrictions are among the main issues in the pension funds industry, affecting asset allocation and managers' control. This study quantified the impact of regulations on asset allocation and incorporated the controllability of the variables from the managers' perspective. As such, this research has integrated the main characteristics of pension funds into the DEA model, leading to a more realistic assessment of the performance of this financial investment vehicle. Another issue in the pension funds industry is that most plans are underfunded. For a comprehensive study of this industry, both fully funded and underfunded plans should be considered. An analysis was carried out to assess the positions of fully funded and underfunded plans on the efficient frontier. The results indicated that most fully funded plans are efficient and that after two or at most three rounds of removing the efficient DMUs from the dataset, only underfunded plans remain. As expected, the minimum efficiency score increased significantly after the removal of each efficient frontier.
Also, most underfunded plans were located at the bottom of the efficiency scores' ranking. When fully funded and underfunded pension plans were evaluated separately, the minimum efficiency score increased compared to when they were considered together. Therefore, a low minimum efficiency score is often found for the pension funds industry since both fully funded and underfunded pension plans should be considered in the model.