A Study on the Balanced Assignment of Allocating Large Group with Multiple Attributes into Subgroups

In this paper, a balanced assignment methodology is proposed for entities with more than two attributes. The Mahalanobis distance, the basic idea of the MTS method, is a distance indicator from quality control that takes correlation into account. It is applied here to ensure a balanced assignment, and the Signal to Noise Ratio is used as a measure for classifying a large group of entities into many subgroups. To analyze the balanced assignment, a simulation is performed for a given example. Validation against the simulation data supports the effectiveness of this approach.


Introduction
Decision making with multiple attributes differs from well-known solution methodologies, and efforts to improve it continue because the optimization process is difficult. A mathematical model or application methodology that searches, under constraints, for an appropriate compromise among conflicting attributes is formulated as a multi-criteria function, and the need for such models is increasing. Among the methods for solving a multi-criteria function, the most commonly used are the weighting method and the goal programming method. In the optimization process, weights or specific numerical goals must be set appropriately for each function using mathematical programming approaches.

Y. Rhee
However, if a parameter is chosen poorly, there is a risk that a Pareto optimal solution will not be obtained.
Meanwhile, as a mathematical approach to multi-criteria problems, Barron and Schmidt [1] solved the constrained problem using distance and fuzzy measures. For more complicated models, Dyer and Sarin [2] and French [3] proposed survey-based approaches, but the mixture of alternatives and attributes gives rise to numerous solutions, which raised the question of how to track the solution. Since then, the well-known MTS (Mahalanobis Taguchi System) method has been introduced [4] [5]. The MTS method uses the distance between an entity and a special space of the data group, that is, the Mahalanobis distance, by analyzing the characteristics of the attributes inherent in each entity without requiring any parameter setting. In other words, the MTS method defines the Mahalanobis space as a unit space in a multi-dimensional space and calculates how far an entity is from this space.
Clustering or assignment is one way to classify a large group with many entities into many subgroups according to given criteria. Clustering is a grouping method based on the attributes of each entity and certain criteria, in which the similarity among entities plays an important role. For example, if a group of students is divided at a height of 180 centimeters, the height attribute classifies the group into a tall subgroup and a non-tall subgroup. Each of the two subgroups then acquires a new characteristic, such as a beanpole group and an ordinary group.
In this study, the properties inherent in an entity are defined as attributes, and the properties representing a group are defined as characteristics. Assignment, on the other hand, is a grouping method that considers each attribute according to the purpose of subdividing the entities into many subgroups. The balanced assignment is a grouping method that matches the mean or variance of the attributes so that the subgroups are similar to each other. As a result, the distinguishing characteristics of the subgroups disappear. The balanced assignment problem is not difficult to solve when an entity has two or fewer attributes. When an entity has three or more attributes, the problem is somewhat more complicated, but various solution approaches can be proposed. First, to reduce the interaction among conflicting attributes of an entity, similar attributes can be combined by taking into account the correlation between attributes; this is simple and can give surprisingly good results. Second, suitable weights can be applied to each attribute. Since determining the weights is the key issue, the opinions of experts in the field are required to estimate them.
In this study, the Mahalanobis distance is applied instead of the Euclidean distance to ensure the balanced assignment, and the SNR (Signal to Noise Ratio) is used as a measure for classifying a large group with many entities into subgroups. The Mahalanobis distance, the basic idea of the MTS method, is a distance indicator from quality control that takes correlation into account. In addition, the SNR indicates the influence of each entity with respect to the Mahalanobis space.
The paper is organized as follows. Related work is reviewed in Section 2. The concept of balanced assignment is introduced in Section 3. In Section 4, an example of allocating multiple entities into many subgroups is presented, and the result of the MTS method is checked against the simulation result. Finally, Section 5 gives concluding remarks.

Related Studies
The balanced assignment can be approached as a special case of the well-known clustering methodology. In this section, the MTS-related studies are introduced for the purpose of classifying a large group with multiple attributes into many subgroups.

Clustering
A clustering algorithm aggregates similar entities without prior knowledge of them. That is, after the degree of similarity among entities is measured, the classification forms clusters from the entities with the highest degree of similarity. When the classification by clustering is complete, the similarity of the entities within a group is maximized, and the similarity between groups is minimized. Classification by balanced assignment, on the other hand, minimizes the similarity of the entities within a group and maximizes the similarity between groups. The nearness between entities is usually measured by a specific distance measure such as the Mahalanobis distance or the Euclidean distance [6].
The clustering methodologies can be divided into hierarchical, density based, grid based, and model based methods. The hierarchical method classifies and clusters a given set of entities hierarchically using a tree structure, in either a bottom-up or a top-down manner. The density based method forms clusters based on density, which provides an efficient clustering method for objects with multidimensional and spatial attributes. The grid based method divides a set of entities into a finite number of cells to form a grid structure, and all clustering operations are performed on this structure. Since the performance of this method depends on the number of cells in each dimension, its time efficiency decreases as the dimensionality and the number of cells increase. Finally, the model based method finds the optimal fit between a special mathematical model and the given entities.

Mahalanobis Distance
The Euclidean distance shown in Figure 1, which is often used as a distance measure in space, is applied under the assumption that the properties of the attributes inherent in an object are consistent. The Euclidean distance is defined as the shortest distance connecting two points. For example, the distance between two points A $(x_1, y_1)$ and B $(x_2, y_2)$ in two dimensions is expressed as $d(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. Simply put, this is a basic distance measure in which the correlation between attributes is not considered.
On the other hand, the Mahalanobis distance is known as an effective way to compare groups with well-known characteristics against groups whose characteristics are unfamiliar [5]. Characteristically, the Mahalanobis distance takes the correlation of variables into account as a measure of their degree of dispersion. Because the Mahalanobis distance is very sensitive to standardized variables, it increases sharply even when a standardized variable differs only slightly from the reference group [3]. Applied to all attributes of an entity, the Mahalanobis distance can be adjusted by considering the correlation between attributes. The Mahalanobis distance is illustrated in Figure 2.
As seen in Figure 1, points at an equal Euclidean distance form a circle, since the correlation between attributes is not taken into account. The Mahalanobis distance, on the other hand, takes the form of an ellipse because of the correlation, and is expressed by (1).
In (1), $MD_{rs}$ denotes the Mahalanobis distance between entity r and entity s.
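The circle and ellipse picture can be made concrete with a small worked example. The forms below are the common textbook definitions, assumed here for illustration; the symbols $\mathbf{x}_r$, $\mathbf{x}_s$ (attribute vectors) and $\mathbf{C}$ (correlation matrix) are introduced only for this sketch, and the normalization may differ from that of (1).

```latex
% Assumed textbook forms of the two distances.
\begin{align*}
d_{E}(r,s) &= \sqrt{(\mathbf{x}_r-\mathbf{x}_s)^{\mathsf T}(\mathbf{x}_r-\mathbf{x}_s)}, \\
MD_{rs}    &= \sqrt{(\mathbf{x}_r-\mathbf{x}_s)^{\mathsf T}\,\mathbf{C}^{-1}\,(\mathbf{x}_r-\mathbf{x}_s)}.
\end{align*}
% Worked example: two standardized attributes with correlation 0.5.
\begin{align*}
\mathbf{C} &= \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}, \qquad
\mathbf{C}^{-1} = \frac{1}{0.75}\begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix}. \\
% Both points below lie at Euclidean distance sqrt(2) from the origin.
(1,\,1):  \quad MD &= \sqrt{\tfrac{1}{0.75}\,(1 - 0.5 - 0.5 + 1)} = \sqrt{4/3} \approx 1.155, \\
(1,\,-1): \quad MD &= \sqrt{\tfrac{1}{0.75}\,(1 + 0.5 + 0.5 + 1)} = \sqrt{4} = 2.
\end{align*}
```

A point lying along the direction of positive correlation is therefore closer in the Mahalanobis sense than a point at the same Euclidean distance lying against it.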

Signal to Noise Ratio
The SNR is a measurement used to describe how much desired sound is present in an audio recording, as opposed to unwanted sound (noise). The unwanted input can be anything from electronic static in the recording equipment to external sounds such as the rumble of traffic or the murmur of voices in the background. In quality engineering, the SNR is used with the loss function of the Taguchi method. The formulation of this loss function depends on the type of quality characteristic under consideration, that is, the smaller the better, the larger the better, or the nominal the better. Based on the selected type of quality characteristic, a performance measure is then defined. Performance measures such as the SNR are used to determine optimal settings of the controllable factors.
In the Taguchi method [5] [6], the characteristics of the loss function are classified into three categories: the nominal the better, the higher the better, and the lower the better. The loss function is defined in terms of m, the target value of the product characteristic, and y, the actual value.
The product has good quality when the value of this loss function is small. Noise factors, which cause variation in performance, can be divided into those whose cause can be controlled and those whose cause is difficult to find and not easy to control. The loss function of the nominal the better is determined by a specified target value, such as a length or a weight. The loss function of the higher the better improves as the value becomes longer or larger, regardless of a target value, as with lifetime or strength: the larger the value of an attribute such as quality, the better the product. Finally, the loss function of the smaller the better is the opposite of that of the higher the better: the smaller the characteristic value, such as a defective rate or vibration, the better. Generally, the loss function takes a quadratic form. The SNR for the smaller the better is sometimes used with a negative sign attached, so that a very small value is converted into a good value like that of the nominal the better or the higher the better.
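For orientation, the three cases are usually written in the following textbook forms. These are the standard Taguchi expressions, assumed here rather than taken from this paper; k is a cost coefficient, n the number of observations, and $\bar{y}$, $s$ the sample mean and standard deviation, none of which are defined in the text above.

```latex
% Standard Taguchi loss functions and SNRs (assumed textbook forms).
\begin{align*}
\text{nominal the better:} \quad & L(y) = k\,(y-m)^{2}, &
  \mathrm{SNR} &= 10\log_{10}\frac{\bar{y}^{2}}{s^{2}}, \\
\text{higher the better:} \quad  & L(y) = \frac{k}{y^{2}}, &
  \mathrm{SNR} &= -10\log_{10}\Bigl(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{y_i^{2}}\Bigr), \\
\text{smaller the better:} \quad & L(y) = k\,y^{2}, &
  \mathrm{SNR} &= -10\log_{10}\Bigl(\frac{1}{n}\sum_{i=1}^{n} y_i^{2}\Bigr).
\end{align*}
```

With these sign conventions a larger SNR is always preferable, which is the effect of the negative sign mentioned for the smaller the better case.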
In this section, the differences between the clustering methodology and the balanced assignment are explained in terms of the mathematical objective. Orthogonal array tables and the SNR are needed when applying the MTS method to the balanced assignment problem.

Basic Approach
To classify a large group with a single attribute into several subgroups while keeping them similar, a simple idea can be applied: sort the values in ascending or descending order and then distribute the entities sequentially. Although many characteristic indicators could be used to show the equivalence of subgroups, the assignment criterion considered here is the average of each subgroup and the variance, which represents a quality index. A better result can be obtained by removing outliers before assigning.
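The sort-then-distribute idea for a single attribute can be sketched as follows. This is a minimal illustration, assuming a plain round-robin deal over the sorted order; the function name `assign_sorted_round_robin` is hypothetical, and outlier removal is left out.

```python
def assign_sorted_round_robin(values, n_groups):
    """Deal entities to subgroups one by one after sorting by value.

    values   : list of numbers, one attribute value per entity
    n_groups : number of subgroups to form
    Returns a list of subgroups, each a list of entity indices.
    """
    # Entity indices ordered by ascending attribute value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [[] for _ in range(n_groups)]
    for rank, idx in enumerate(order):
        # Sequential distribution: rank 0 -> group 0, rank 1 -> group 1, ...
        groups[rank % n_groups].append(idx)
    return groups


if __name__ == "__main__":
    heights = [171, 185, 162, 178, 190, 166]
    print(assign_sorted_round_robin(heights, 3))
```

Each subgroup receives one entity from every consecutive block of the sorted list, so low, middle, and high values are spread across all subgroups.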
For entities with two attributes, the method of solving the balanced assignment problem is similar to the previous concept, but the approach is somewhat more complex in two dimensions. Eliminating outliers is again effective for achieving a better balanced assignment. The quartiles are used to remove outliers in the same way as for entities with a single attribute. The only difference is that, since an entity with two attributes is expressed as a point with coordinates on the X axis and the Y axis, a point that deviates from the quartiles on each axis is regarded as an outlier or anomaly. It is difficult to devise a balanced assignment method that distributes the outliers effectively, because such an oddity cannot be assimilated into any group.
After the outliers are removed, data conversion is performed on each group of attributes. Data conversion is a statistical process that provides a kind of reference. As seen in (2), $\bar{x}_i$ denotes the average of entity group i. The data of two groups may seem different before conversion, but after conversion their shapes are visibly similar, although not identical. In addition, the minimum value of each attribute is subtracted from all values of that attribute, so that the data start at the origin of the X axis and the Y axis respectively after conversion. As seen in Figure 3, the scatter plot for two attributes can be drawn in two dimensions, with attribute 1 on the x-axis and attribute 2 on the y-axis. To classify the many points in the scatter plot into many small bundles, lines are drawn at regular intervals as seen in Figure 3. All entities within a grid cell are regarded as homogeneous entities. The balanced assignment can be achieved by distributing the entities of each cell to the subgroups one by one. In this way, an effective balanced assignment can be performed by adjusting the grid spacing.
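The grid procedure for two attributes might look like the sketch below. It is an illustration under stated assumptions: only the minimum shift is applied (the conversion of (2) is not reproduced), cells are visited in sorted order, and the dealing continues across cells; `grid_balanced_assignment` and its parameters are hypothetical names.

```python
from collections import defaultdict


def grid_balanced_assignment(points, n_groups, spacing):
    """Assign 2-attribute entities to subgroups via a regular grid.

    points   : list of (attribute_1, attribute_2) pairs
    n_groups : number of subgroups to form
    spacing  : grid interval, identical on both axes (an assumption)
    Returns a list of subgroups, each a list of entity indices.
    """
    # Shift each attribute so that its minimum sits at the origin.
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)

    # Entities falling in the same grid cell are treated as homogeneous.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cell = (int((x - min_x) // spacing), int((y - min_y) // spacing))
        cells[cell].append(idx)

    # Deal the members of each cell to the subgroups one by one.
    groups = [[] for _ in range(n_groups)]
    turn = 0
    for cell in sorted(cells):
        for idx in cells[cell]:
            groups[turn % n_groups].append(idx)
            turn += 1
    return groups


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 5.1)]
    print(grid_balanced_assignment(pts, n_groups=2, spacing=1.0))
```

A smaller `spacing` makes the cells more homogeneous but leaves fewer entities per cell to spread over the subgroups, which is the trade-off behind adjusting the grid spacing.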

MTS Basics
One way to solve a complicated balanced assignment, such as one in which an entity has more than two attributes, is to apply the concepts of the MTS method. To obtain the Mahalanobis distance, some mathematical steps are required, namely attribute standardization, the correlation matrix, and the inverse correlation matrix, applying (1) and (2). Computing the Mahalanobis distance must be preceded by obtaining the inverse of the covariance matrix through correlation analysis. However, the correlation matrix can be obtained easily, since the data conversion has already been completed in order to obtain the Mahalanobis distance. The correlation coefficient between attribute i and attribute j is computed in the well-known way.
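The chain of standardization, correlation matrix, inverse matrix, and distance can be sketched with the standard library alone. This is a minimal sketch under assumptions: z-scores with the population standard deviation stand in for the conversion of (2), the distance uses the common form without extra normalization, and all function names are hypothetical.

```python
import math


def standardize(rows):
    """Column-wise z-scores; assumes no attribute is constant."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(k)]
    return [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(k)]
            for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def mahalanobis(z_r, z_s, inv_corr):
    """Distance between two standardized entities r and s."""
    d = [a - b for a, b in zip(z_r, z_s)]
    k = len(d)
    q = sum(d[i] * inv_corr[i][j] * d[j] for i in range(k) for j in range(k))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative round-off


if __name__ == "__main__":
    data = [[1, 2, 3], [2, 1, 1], [3, 4, 4], [4, 3, 5], [5, 5, 2]]
    z = standardize(data)
    inv = invert(correlation_matrix(z))
    print(mahalanobis(z[0], z[4], inv))
```

Two strongly correlated attributes make the correlation matrix nearly singular, which is one practical reason to combine such attributes before inverting.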

Mahalanobis Space Clustering
The Mahalanobis space is defined as a reference space for measuring the Mahalanobis distance. In this study, the assignment should ensure that the characteristics of the subgroups are similar, and that the attributes underlying those characteristics are also similar after assignment, on the assumption that the balanced assignment takes into account all attributes specified for the entity. All entities in the large group must be distributed so as to form the specified number of subgroups. The SNR is obtained to determine the influence between each experimental entity and the Mahalanobis space. The Mahalanobis distance between the Mahalanobis space and the remaining entities is used as the scale in this study. The quadratic loss function for the smaller the better is used, as seen in (3), since a smaller distance between the Mahalanobis space and an entity means that the entity is closer to it.
Orthogonal array tables are also used to determine which objects are closer to the designated Mahalanobis space before the SNR is computed. The size of the orthogonal array table depends on the number of entities given. In the orthogonal array table, 1 means that the entity is clustered and 0 means that the entity is not used. Selecting the lowest SNR value for the cluster means that the closest entities are clustered: the smaller the SNR, the better. By multiplying (3) by −1, it can be used as a higher-the-better measure.
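The row-by-row evaluation can be illustrated with a short sketch. It assumes the textbook smaller-the-better SNR, −10·log10 of the mean squared distance, which already carries the negative sign, so here the largest value marks the closest set of entities; negating it gives the lower-is-better reading used in the text. The function names and the 0/1 row layout are hypothetical.

```python
import math


def snr_smaller_the_better(distances):
    """Textbook smaller-the-better SNR of a set of distances (assumed form)."""
    if not distances:
        raise ValueError("at least one distance is required")
    mean_square = sum(d * d for d in distances) / len(distances)
    if mean_square == 0.0:
        return math.inf  # all selected entities coincide with the reference
    return -10.0 * math.log10(mean_square)


def closest_row(orthogonal_rows, distances):
    """Find the orthogonal-array row whose selected entities are closest.

    orthogonal_rows : list of 0/1 rows, one flag per entity
                      (1 = entity used in the cluster, 0 = not used)
    distances       : Mahalanobis distance of each entity to the reference
    Returns (row index, SNR of that row).
    """
    best_index, best_snr = None, -math.inf
    for index, row in enumerate(orthogonal_rows):
        selected = [d for flag, d in zip(row, distances) if flag == 1]
        if not selected:
            continue  # a row that uses no entity cannot be scored
        snr = snr_smaller_the_better(selected)
        if snr > best_snr:
            best_index, best_snr = index, snr
    return best_index, best_snr


if __name__ == "__main__":
    rows = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 1, 0]]
    dist = [1.0, 2.0, 3.0, 0.5]
    print(closest_row(rows, dist))
```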

Experimental Setup
In this section, an example of allocating multiple entities into several subgroups is presented to validate the balanced assignment. The example consists of 30 entities with 3 attributes each, as shown in Table 1. The goal is for the characteristics of the subgroups to be similar when all entities are assigned to 3 subgroups. The result of the MTS method is checked against the simulation result, followed by the appropriate analysis.

MTS Clustering
To apply the MTS method, it is necessary to define the Mahalanobis space that serves as the reference entity. In this study, the entities having the most extreme value of each attribute are set as the reference entities, and these form the Mahalanobis space. Table 2 shows that the 6 entities A, B, C, D, E, and F are defined as the Mahalanobis space. To compute the Mahalanobis distance, the entities must be converted attribute by attribute using (2), and then the inverse correlation matrix is computed. The Mahalanobis distance between the Mahalanobis space and each data point, computed using (1), is shown in Table 3. The L orthogonal array table is not reproduced in this study, since it is given as a basis of the experimental design. The SNR for the designated Mahalanobis space, obtained by applying (3), is presented in Table 4.
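Selecting the reference entities could be sketched as below. The sketch assumes that "most extreme" means the smallest and the largest value of each attribute, which is consistent with 6 reference entities for 3 attributes but is not stated explicitly; `extreme_reference_entities` is a hypothetical helper.

```python
def extreme_reference_entities(rows):
    """Pick the entities holding the minimum and maximum of each attribute.

    rows : list of entities, each a list of attribute values
    Returns entity indices in the order found, without repeats.
    """
    references = []
    for j in range(len(rows[0])):
        lowest = min(range(len(rows)), key=lambda i: rows[i][j])
        highest = max(range(len(rows)), key=lambda i: rows[i][j])
        for idx in (lowest, highest):
            # One entity may be extreme in several attributes.
            if idx not in references:
                references.append(idx)
    return references


if __name__ == "__main__":
    data = [[1, 5, 3], [9, 2, 4], [4, 8, 0], [5, 1, 9]]
    print(extreme_reference_entities(data))
```

When one entity is extreme in more than one attribute, fewer than two references per attribute result, so the size of the Mahalanobis space depends on the data.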
The shaded areas in Table 4 represent the most suitable clusters for each Mahalanobis space. Based on this, the clustered entities are found for each Mahalanobis space from the corresponding rows of the orthogonal array table. How to assign the entities clustered by the Mahalanobis space is a separate problem.
Here, the similarity among subgroups can be guaranteed simply by distributing the entities belonging to each Mahalanobis space in a sequential manner. The clustering result by the Mahalanobis space shows that many entities are duplicated across clusters. Duplicated entities can be regarded as similar to each other, whichever community corresponding to a Mahalanobis space they are assigned to. A balanced assignment should therefore assign the non-duplicated entities first, and then the duplicated entities sequentially.
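The two-pass order described above can be sketched as follows. The details are assumptions made for illustration: clusters are visited in sorted order of their reference label, entities are dealt round-robin, and each duplicated entity is assigned only once; `assign_from_clusters` is a hypothetical name.

```python
from collections import Counter


def assign_from_clusters(clusters, n_groups):
    """Assign clustered entities: non-duplicated first, then duplicated.

    clusters : dict mapping a reference label (e.g. "A") to the list of
               entity ids clustered around that reference
    n_groups : number of subgroups to form
    Returns a list of subgroups, each a list of entity ids.
    """
    # An entity is duplicated when it appears in more than one cluster.
    counts = Counter(e for members in clusters.values() for e in members)
    groups = [[] for _ in range(n_groups)]
    assigned = set()
    turn = 0
    for want_duplicated in (False, True):  # pass 1: unique, pass 2: duplicated
        for reference in sorted(clusters):
            for entity in clusters[reference]:
                is_duplicated = counts[entity] > 1
                if is_duplicated == want_duplicated and entity not in assigned:
                    groups[turn % n_groups].append(entity)
                    assigned.add(entity)
                    turn += 1
    return groups


if __name__ == "__main__":
    found = {"A": [1, 2, 3], "B": [3, 4, 5], "C": [5, 6]}
    print(assign_from_clusters(found, n_groups=2))
```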

Simulation and Comparisons
In this section, simulation results are shown for different assignment criteria and compared against the result of the MTS method. The results for each assignment criterion are shown in Table 5.
As can be seen in Table 5, the results of allocating into 3 subgroups with different assignment criteria are examined in terms of the mean and the variance. The assignment criteria listed in Table 5 are the MTS method, random simulation, and sequential assignment based on each attribute. The rightmost column of Table 5, the difference, is the distance between the maximum value and the minimum value after the corresponding assignment. Compared with the simulation results, the results of the MTS method are comparatively satisfactory. The correlation between attributes is also examined through simple statistical analysis. The correlation between attribute 1 and attribute 2 is found to be 0.91, indicating a strong positive correlation. This means that attribute 1 and attribute 2 can be simplified by combining them into a single attribute. The correlations between the other attributes are found to be negative.
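The comparison by mean, variance, and difference can be reproduced for any assignment with a small helper. This is a sketch under assumptions: the population variance is used (the text does not say which variance Table 5 reports), and `balance_report` with its dictionary keys is a hypothetical layout.

```python
from statistics import mean, pvariance


def balance_report(data, groups):
    """Summarize how balanced an assignment is, attribute by attribute.

    data   : list of entities, each a list of attribute values
    groups : list of subgroups, each a list of entity indices
    Returns one dict per attribute with the subgroup means and variances
    and the max-minus-min difference of each across subgroups.
    """
    report = []
    for j in range(len(data[0])):
        means = [mean([data[i][j] for i in g]) for g in groups]
        variances = [pvariance([data[i][j] for i in g]) for g in groups]
        report.append({
            "means": means,
            "variances": variances,
            "mean_difference": max(means) - min(means),
            "variance_difference": max(variances) - min(variances),
        })
    return report


if __name__ == "__main__":
    entities = [[1, 10], [2, 20], [3, 30], [4, 40]]
    for row in balance_report(entities, [[0, 3], [1, 2]]):
        print(row)
```

Smaller differences in both the mean and the variance indicate a more balanced assignment; in the small demonstration the means match while the variances do not, which shows why both are examined.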

Conclusions
In this paper, a balanced assignment methodology is proposed for entities with more than two attributes. The balanced assignment is a grouping method that matches the mean or variance of the attributes so that the subgroups are similar to each other.
The Mahalanobis distance is applied to ensure the balanced assignment, and the SNR (Signal to Noise Ratio) is used as a measure for classifying a large group with many entities into subgroups. The Mahalanobis distance, the basic idea of the MTS method, is a distance indicator from quality control that takes correlation into account.
Through the simulation, a statistical analysis of the balanced assignment is performed for the case study. Compared with the simulation results, the results of the MTS method are comparatively satisfactory. Validation against the simulation data supports the tightness of this approach. Considering that the characteristics of the group before allocation differ from those of the subgroups after assignment, the basic approach is to try to equalize the mean of each subgroup and also to evaluate the variance, which represents the quality index of the subgroup.
Finally, for the design of the balanced assignment problem, no standard criterion for determining the Mahalanobis space is specified; it is determined by the subjective judgment of the designer. Therefore, a systematic procedure for determining the Mahalanobis space is needed, in view of the importance of the design depth, since the designation of this space is crucial for finding a better solution.