A Heuristic Search Approach to Multidimensional Scaling

Abstract

This research effort presents an approach to accomplishing Multidimensional Scaling (MDS) via the heuristic of Simulated Annealing. Multidimensional scaling is an approach used to convert matrix-based similarity (or dissimilarity) data into spatial form, usually in two or three dimensions. Performing MDS has several important applications; Geographic Information Systems, DNA sequencing, and marketing research are just a few. Traditionally, classical MDS decomposes the similarity or dissimilarity matrix into its eigensystem and uses the eigensystem to calculate spatial coordinates. Here, a heuristic search-based approach is used to find coordinates from a dissimilarity matrix that minimize a cost function. The proposed methodology is applied to a variety of problems. Experimentation shows that the presented methodology consistently outperforms solutions obtained via the classical MDS approach and can be extended to other important applications.


McMullen, P. (2022) A Heuristic Search Approach to Multidimensional Scaling. American Journal of Operations Research, 12, 179-193. doi: 10.4236/ajor.2022.125010.

1. Introduction

It is often important to visualize data when the data is presented in a primitive form. The visual perspective enables a more intuitive understanding of the data and the interrelationships between data points. Consider several viral DNA sequences to be compared: it would be helpful to see the degree of similarity (or difference) between them all. Of course, some statistics might be available to assist, but placing the sequences into a spatial arrangement (to enable visualization) would be most helpful. To do this, however, some effort is required.

Consider another example where we are given a matrix of mileages between cities. This data might make more sense if it were converted into the form of a two-dimensional map, so that we could visualize the distances between all cities. Again, this would also take some work.

These example scenarios are good candidates for the use of Multidimensional Scaling. Multidimensional Scaling (MDS hereafter) is a technique used to convert similarity (or dissimilarity) data into spatial form, so that the data set can be visualized [1]. Visualizing data simplifies understanding. The mathematics required to perform multidimensional scaling is somewhat involved, as an eigendecomposition of the given matrix is required. Additionally, the results can be sub-optimal in terms of accuracy. As such, other alternatives are worth exploring.

One such general alternative is a search heuristic. A search heuristic usually starts with a feasible, randomly-generated solution with the intent of eventually finding an optimal (or near-optimal) solution to an optimization problem. The optimality of a solution is determined via an objective function measure. During the search process, feasible solutions are modified to a minor degree in the hope of ultimately approaching global optimality. While this search process is ongoing, newly obtained solutions replace incumbent solutions. Sometimes, these incumbent solutions are replaced by relatively inferior newly obtained solutions. This might seem odd, but research has shown that, when properly implemented, search heuristics that occasionally replace incumbent solutions with relatively inferior solutions can help avoid being trapped at local optima [2]. Tabu Search [3] [4], Simulated Annealing [5] and Genetic Algorithms [6] are three types of search heuristics that fit this description. When properly constructed and implemented, search heuristics can provide near-optimal results with less computational effort than other solution approaches.

For this research effort, a Simulated Annealing approach is used to address an MDS problem that takes a distance matrix as input and outputs a set of Cartesian coordinates. Specifically, the following sections describe classical multidimensional scaling; describe a simulated annealing-based alternative for the MDS problem; outline an experiment for implementation of the proposed methodology; detail experimental results; and draw conclusions.

2. Classical Approach to Multidimensional Scaling

Classical Multidimensional Scaling is a technique used to convert similarity (or dissimilarity) data into m dimensions. This conversion is done so that a spatial representation of the data can be pursued. The given data is typically provided in n × n matrix form. Each element in the matrix is some pairwise measure of the similarity or dissimilarity between data points. The conversion of the data from matrix to spatial form is done for a variety of reasons: market segmentation, graphing, matching, sequencing, etc. Any attempt to visualize observations when given similarity data can be thought of as an application of MDS.

In classical MDS, an n × n dissimilarity matrix D is provided. This matrix contains elements d_ij, each giving the Euclidean distance between points i and j. For further analysis, the matrix D^(2) is constructed, whose elements are the squared values of d_ij. A centering matrix C = I − (1/n)J_n is then used, where I is the identity matrix (values of "1" on the main diagonal, "0" elsewhere) and J_n is an n × n matrix of ones. These matrices are used to form the double-centered matrix B, which is constructed so that the means of all row and column vectors are zero [7]. This matrix is as follows:

$B = -\tfrac{1}{2}\, C D^{(2)} C$ (1)

The B matrix is then decomposed into its eigensystem. Specifically, the m largest eigenvalues and their eigenvectors are captured. A spatial representation of the data is provided by the n × m matrix X, which is determined as follows:

$X = E \Lambda^{1/2}$ (2)

where E is an n × m matrix whose columns are the eigenvectors associated with the m largest eigenvalues, and $\Lambda^{1/2}$ is the corresponding m × m diagonal matrix with the square roots of the m largest eigenvalues on the main diagonal and values of "0" elsewhere [8] [9].
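To make the computation above concrete, the following is a minimal sketch of classical MDS in Python with NumPy. The paper's own implementation uses NetLogo, so the language, the function name classical_mds, and the specific NumPy routines are illustrative assumptions rather than the author's code.

```python
import numpy as np

def classical_mds(D, m=2):
    """Sketch of classical MDS: double-center the squared distance
    matrix, then keep the m largest eigenpairs."""
    n = D.shape[0]
    D2 = D ** 2                                # element-wise squared distances
    C = np.eye(n) - np.ones((n, n)) / n        # centering matrix C = I - (1/n)J_n
    B = -0.5 * C @ D2 @ C                      # double-centered matrix, Equation (1)
    eigvals, eigvecs = np.linalg.eigh(B)       # B is symmetric
    top = np.argsort(eigvals)[::-1][:m]        # indices of the m largest eigenvalues
    E = eigvecs[:, top]                        # n x m eigenvector matrix E
    L_half = np.diag(np.sqrt(np.maximum(eigvals[top], 0.0)))  # Lambda^(1/2)
    return E @ L_half                          # Equation (2): X = E Lambda^(1/2)
```

Calling classical_mds(D) on the given n × n distance matrix would then return the n × 2 coordinate matrix X discussed below.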

This eigenpair approach exploits the dissimilarity matrix D to obtain the spatial coordinates X. The quality of any set of coordinates can be assessed by comparing the distances they imply with the given distances, via the following objective function:

$Z = \sum_{i=2}^{n} \sum_{j=1}^{i-1} \left( d_{ij} - \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2} \right)^2$ (3)

In Equation (3) above, the x_ij values are simply entries of the X matrix. The equation assumes that a two-dimensional spatial solution is desired. The value Z can be thought of as an objective function value. While this equation might appear intimidating, it is not: the objective is simply to find values of x_ij whose implied pairwise distances match the given distance matrix as closely as possible. When the x_ij values reproduce the d_ij distances exactly, the objective function attains its minimum value of "0".
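Since the search heuristic developed below minimizes this same quantity, a small sketch may help. The function name objective_z is a placeholder; a minimal Python version of Equation (3), assuming an n × 2 coordinate array X, might look as follows.

```python
import numpy as np

def objective_z(X, D):
    """Equation (3): sum of squared differences between the given
    distances d_ij and the distances implied by coordinates X (n x 2)."""
    n = D.shape[0]
    Z = 0.0
    for i in range(1, n):
        for j in range(i):
            implied = np.hypot(X[i, 0] - X[j, 0], X[i, 1] - X[j, 1])
            Z += (D[i, j] - implied) ** 2
    return Z
```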

Classical MDS decomposes an n × n matrix and outputs an n × 2 (or n × 3) matrix. This means that n − 2 eigenpairs are discarded for a two-dimensional spatial solution, or n − 3 eigenpairs for a three-dimensional spatial solution. These "lost" eigenpairs limit our ability to explain the variation in the distance data, as each eigenpair helps explain part of that variation. This, of course, is unfortunate. Additionally, decomposing a matrix into its eigenpairs is computationally expensive. As such, a viable alternative is worth exploring.

3. Heuristic Approach to Multidimensional Scaling

Regardless of the approach used to perform MDS, the objective is the same—minimization of the objective function shown in Equation (3). In order to present the methodology used to solve this problem, the following parameters are defined in Table 1.

As stated before, Simulated Annealing is the chosen search heuristic to address the MDS problem at hand. Simulated Annealing gets its name from "annealing," which is the heating of a solid (usually a metal) to a high temperature followed by a slow, subsequent cooling of the solid. During the cooling process, the particles arrange themselves from a chaotic state into an ordered state. This cooling process is intended to enhance some physical property of the solid. With "simulated" annealing, the objective function is analogous to the desired physical property of the solid: we start with a randomly-generated solution (analogous to a solid at a high temperature) and modify the solution until a desired condition is obtained. Early in the search, there is more liberality in replacing the incumbent solution as compared to later in the search; this is analogous to the "chaotic" state early in the annealing process compared to the more "ordered" state later in the annealing process.

Table 1. Search parameters.

The following subsections detail the Simulated Annealing heuristic search process used to find a solution to the problem at hand.

Step 1: Initialization

An initial solution is constructed by assigning random values to the xab values. Specifically, this is done as follows:

$x_{ab} = 250 + U(0, 500), \quad \forall\, a, b$ (4)

where U(0, 500) is a uniformly-distributed random variable on the (0, 500) interval. It should be noted that our solution boundary is confined to the (0, 500) interval for both the horizontal and vertical axes. The “test” solution and “best” solution values are set equal to this initial solution. Specifically, this is as follows:

$(x_{ab})_t = x_{ab}, \quad \forall\, a, b$ (5)

$(x_{ab})_b = x_{ab}, \quad \forall\, a, b$ (6)

The initial solution is also used to determine the objective function value, Z. This is done as follows:

$Z = \sum_{i=2}^{n} \sum_{j=1}^{i-1} \left( d_{ij} - \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2} \right)^2$ (7)

The objective function values for the “test” and “best” solutions are assigned this same value as follows:

$Z_t = Z$ (8)

$Z_b = Z$ (9)
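A brief sketch of Step 1 follows, assuming the objective_z helper from the earlier sketch and reading Equation (4) literally; all names are illustrative, not the paper's NetLogo code.

```python
import numpy as np

rng = np.random.default_rng()

def initialize(D):
    """Step 1: random starting coordinates with test/best copies."""
    n = D.shape[0]
    X = 250.0 + rng.uniform(0.0, 500.0, size=(n, 2))  # Equation (4), taken as written
    X_test, X_best = X.copy(), X.copy()               # Equations (5) and (6)
    Z = objective_z(X, D)                             # Equation (7)
    Z_test, Z_best = Z, Z                             # Equations (8) and (9)
    return X, X_test, X_best, Z, Z_test, Z_best
```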

Step 2: Modification

In order to move the current solution toward an optimal (or at least "near optimal") condition, it must be "modified". Specifically, the "test" solution is modified as the first step toward improving the current solution. This modification is applied to one of the n data points and is deliberately "minor"; large modifications are avoided so that a controlled improvement process is followed.

First, the cost vector is obtained via the following:

$c_i = \sum_{j=1}^{n} \left( d_{ij} - \sqrt{\left( (x_{i1})_t - (x_{j1})_t \right)^2 + \left( (x_{i2})_t - (x_{j2})_t \right)^2} \right)^2, \quad \forall\, i$ (10)

The cost vector shows the amount of “error” each of the n locations contributes to the objective function value. Higher cost values contribute more to sub-optimality. Each potential target data point (q) for the modification has the following probability of being selected:

$P(q) = c_q \Big/ \sum_{i=1}^{n} c_i$ (11)

Monte-Carlo simulation is used to select the target q. Cities with higher cost-vector values are more likely to be selected as targets for modification. The actual modification of the selected data point (q) is as follows:

$(x_{q1})_t = (x_{q1})_t + T \cdot 25\,(1 - U(0,1))$ (12)

$(x_{q2})_t = (x_{q2})_t + T \cdot 25\,(1 - U(0,1))$ (13)

As stated, this modification is done for all m dimensions of the targeted data point (m = 2 is used here). The current temperature value T appears in this calculation so that the modifications become less aggressive as the temperature falls.
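The following sketch illustrates Step 2 under stated assumptions: the garbled scaling term in Equations (12)-(13) is read literally as T · 25 · (1 − U(0, 1)), which only perturbs in the positive direction, so the exact form of the perturbation should be treated as an assumption; the function name modify is also illustrative.

```python
import numpy as np

rng = np.random.default_rng()

def modify(X_test, D, T):
    """Step 2: select a high-cost city by Monte-Carlo (roulette-wheel)
    sampling and apply a temperature-scaled perturbation."""
    n = D.shape[0]
    diffs = X_test[:, None, :] - X_test[None, :, :]
    implied = np.sqrt((diffs ** 2).sum(axis=2))      # current inter-point distances
    cost = ((D - implied) ** 2).sum(axis=1)          # Equation (10): cost vector c
    probs = cost / cost.sum()                        # Equation (11): selection probabilities
    q = rng.choice(n, p=probs)                       # Monte-Carlo selection of target q
    X_test[q, 0] += T * 25 * (1 - rng.uniform())     # Equation (12), one literal reading
    X_test[q, 1] += T * 25 * (1 - rng.uniform())     # Equation (13), one literal reading
    return X_test, q
```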

Step 3: Objective Function

The modified test solution is then used to determine the objective function value (Zt) as shown below:

$Z_t = \sum_{i=2}^{n} \sum_{j=1}^{i-1} \left( d_{ij} - \sqrt{\left( (x_{i1})_t - (x_{j1})_t \right)^2 + \left( (x_{i2})_t - (x_{j2})_t \right)^2} \right)^2$ (14)

Step 4: Solution Comparison

Test 1:

If the objective function value associated with the test solution is less than the objective function value associated with the current solution, the test solution replaces the current solution. Specifically, the following applies:

$x_{ij} = (x_{ij})_t$ (15)

$Z = Z_t$ (16)

Otherwise, the difference between the current solution and test solution is determined:

$\delta E = (Z_t - Z) / Z$ (17)

This difference is expressed in relative form and is represented by δE. This value is always positive, since Zt > Z in this case. It is used to determine whether or not the inferior test solution should replace the current solution, via a probabilistic condition. Specifically, the value PA represents the probability of replacing the current solution with the test solution. This probability is as follows:

$P_A = \exp\!\left( -\delta E / (k_B T) \right)$ (18)

The value kB is referred to as the Boltzmann constant [10], and is user-determined such that a certain degree of relative inferiority (δE) results in the test solution having a specific probability of replacing the current solution.

A uniformly-distributed random number on the (0, 1) interval is then generated. If this random number is less than PA, the test solution replaces the current solution, in accordance with Equations (15) and (16).
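Steps 3 and 4 combine into a Metropolis-style acceptance test. A minimal sketch follows, assuming the relative-difference form of Equations (17) and (18); the function name accept_test_solution is a placeholder.

```python
import numpy as np

rng = np.random.default_rng()

def accept_test_solution(Z, Z_test, T, k_B):
    """Step 4, Test 1: accept any improvement; otherwise accept an
    inferior test solution with probability P_A = exp(-dE / (k_B * T))."""
    if Z_test < Z:
        return True                          # Equations (15) and (16) then apply
    dE = (Z_test - Z) / Z                    # Equation (17): relative inferiority
    P_A = np.exp(-dE / (k_B * T))            # Equation (18)
    return rng.uniform() < P_A               # probabilistic replacement
```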

Test 2:

If the objective function value associated with the test solution is less than the objective function value associated with the best solution, the test solution replaces the best solution:

$(x_{ij})_b = (x_{ij})_t$ (19)

$Z_b = Z_t$ (20)

Otherwise, no action is taken.

Step 5: Incrementation

Steps 2, 3 and 4 are repeated Iter times. The value of T is then updated as follows:

$T = T \cdot CR$ (21)

This continues while T remains above the final temperature TF; once T ≤ TF, the simulation has concluded. When the simulation has concluded, the current solution is replaced by the best solution, and the value of T is reset to T1. Mathematically, this is done as follows:

$x_{ij} = (x_{ij})_b$ (22)

$Z = Z_b$ (23)

$T = T_1$ (24)

This is repeated Sim times, and the best solution found is reported.
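The outer loop below ties the steps together, mirroring the structure of the pseudocode in Figure 1. It assumes the helper sketches initialize, modify, objective_z, and accept_test_solution given earlier, treats the stopping condition as T falling to TF, and uses a placeholder k_B value; the paper instead calibrates k_B so that 1% relative inferiority is accepted with 10% probability.

```python
def simulated_annealing_mds(D, T1=25.0, TF=17.5, CR=0.99, Iter=75, Sim=5, k_B=1.0):
    """Sketch of the overall search (cf. Figure 1), built from the
    helper sketches defined earlier."""
    X, X_test, X_best, Z, Z_test, Z_best = initialize(D)
    for _ in range(Sim):                           # Sim simulations per solution
        T = T1
        while T > TF:                              # cooling schedule
            for _ in range(Iter):                  # Iter iterations per temperature
                X_test, _ = modify(X.copy(), D, T)          # Step 2
                Z_test = objective_z(X_test, D)             # Step 3, Equation (14)
                if accept_test_solution(Z, Z_test, T, k_B):
                    X, Z = X_test.copy(), Z_test            # Equations (15)-(16)
                if Z_test < Z_best:
                    X_best, Z_best = X_test.copy(), Z_test  # Equations (19)-(20)
            T *= CR                                # Equation (21)
        X, Z = X_best.copy(), Z_best               # Equations (22)-(23): restart from best
    return X_best, Z_best
```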

To gain a better understanding of the presented methodology, it is presented in pseudocode via Figure 1.

As a point of clarification, it should be noted that repeated simulations in an effort to obtain an optimal solution may present some confusing terminology. As such, Figure 2 below shows the hierarchy of terms used here.

Figure 1. Search heuristic in pseudocode.

Figure 2. Hierarchy of simulation elements.

A simulation is a subset of a specific solution. Specifically, Sim simulations are performed for each solution. This is one of the defenses against being "trapped" at local optima. At the conclusion of each of the Sim simulations during the search process, the "current" solution is replaced by the "best" solution. This provides the next simulation with the best possible starting point in the attempt to find the optimal solution. If this were not done, the search process would likely become trapped at a sub-optimal solution. A single instance of executing the presented methodology is classified as a "solution"; a "simulation" in this context is only a subset of the solution process.

4. Experimentation

An experiment is designed to evaluate the effectiveness of the presented methodology. A distance matrix was obtained for 36 metropolitan areas in the continental United States. These are "air distances", obtained via the website https://www.airmilescalculator.com/. These distances can be thought of as "links" in the parlance of networks. The number of links in a network of n locations is n(n − 1)/2. These links can also be thought of as pairwise distances.
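For example, the largest problem considered here has n = 36 cities, which gives 36(35)/2 = 630 pairwise distances.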

4.1. Implementation

The methodology presented is coded via the NetLogo software package (https://ccl.northwestern.edu). NetLogo is a Java-based programming environment, particularly suited to enable agent-based simulation [11] [12]. Agent-based simulation is desired here, as the n cities can be thought of as agents.

The distance matrix is used to address problems ranging from four cities (n = 4) to 36 cities (n = 36). All values of n between 4 and 36 are used for assessment. As such, 33 (1 + 36 − 4) unique problems are addressed. Each problem is solved five times so that reliable estimates of performance are available. An example solution is detailed in Figure 3.

Figure 3 shows a solution for a problem with n = 26 cities. The “best” solution found for this example has an objective function value of Zb = 24.13. This value quantifies the square root of the sum of squared differences between the actual (given) distance matrix and the distance matrix associated with the “best” solution ((xab)b). In this case, there is a total difference of 24.13 US miles between the two matrices. A value of Zb = 0 is considered optimal—which would imply no difference between the actual distance matrix and the distance matrix associated with (xab)b.

Note from Figure 3 that the solution resembles the map of the continental United States if the image is rotated about 45 degrees in the counter-clockwise direction. The search is not concerned with the proper "orientation" of the solution; it is only concerned with the proper location of the cities relative to one another.

Figure 3. Example problem solution.

4.2. Computational Experience

The program was run on a Windows machine with an Intel Core 8565U processor at 4.6 GHz. The user-defined search parameters were chosen via trial and error so that the "best" solutions could be found repeatedly. These settings are detailed in Table 2.

For all solutions, a cooling rate (CR) of 0.99 was used, along with 75 iterations per temperature level (Iter). The starting temperature (T1) was 25 and the final temperature (TF) was 17.5. The kB value was set such that a solution with 1% relative inferiority would replace the incumbent solution with a 10% probability.
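Solving Equation (18) for kB makes this calibration explicit. The temperature at which the condition is imposed is not stated, so the numeric value below assumes the starting temperature T1 = 25:

$k_B = \dfrac{-\delta E}{T \ln P_A} = \dfrac{0.01}{T \ln 10} \approx \dfrac{0.00434}{T}$, which gives $k_B \approx 1.74 \times 10^{-4}$ at $T = T_1 = 25$.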

4.3. Stagnation

During the development of the search process, it was noticed that the solution was getting trapped at local optima, despite efforts to prevent this. This local-optima trapping phenomenon is referred to as "stagnation" hereafter. Specifically, improvement was impeded by cities being out of place, that is, cities located in positions where minor modifications are not aggressive enough to improve the objective function value, causing solutions to become stagnant. There are two types of stagnation requiring treatment: single-city stagnation and two-city stagnation.

4.3.1. One-City Stagnation

Figure 4 shows another solution for n = 26 cities.

Table 2. Simulation settings.

Figure 4. Example of one-city stagnation.

Proper bearings might be realized if the solution is rotated about 150 degrees in the counter-clockwise direction. An unfortunate development then becomes apparent: Miami, Florida (MIA) is "out of place". MIA should be closer to the other Florida locations of Tampa and Orlando (TAM and ORL). The other cities seem relatively properly placed. Unfortunately, the modification process described above will not correct the problem, as it can only accomplish minor modifications; as such, MIA will not be moved anywhere helpful. Instead, the MIA location must be moved in a more aggressive fashion. The general procedure for this lies with the cost vector (c). The maximum value in the cost vector identifies the most "offensive" city in terms of contribution to the objective function value. The city associated with the highest value of the cost vector is selected as the target, referred to as "q". The modification to city q is as follows:

$(x_{q1})_t = (x_{q1})_t + 500\,(1 - U(0,1))$ (25)

$(x_{q2})_t = (x_{q2})_t + 500\,(1 - U(0,1))$ (26)

This modification is similar to the one performed under normal circumstances, with two key differences. First, the modification used to combat stagnation is not sensitive to the value of T. Second, it is much more aggressive than the normal modification, so that the "offending" city can be moved further than normal.
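A hedged sketch of the one-city remedy follows, reusing the cost-vector computation from the earlier modify sketch; the one-sided 500(1 − U(0, 1)) displacement follows Equations (25)-(26) as written, and the function name remedy_one_city is illustrative.

```python
import numpy as np

rng = np.random.default_rng()

def remedy_one_city(X_test, D):
    """Aggressively relocate the single city with the largest
    cost-vector entry (Equations (25) and (26))."""
    diffs = X_test[:, None, :] - X_test[None, :, :]
    implied = np.sqrt((diffs ** 2).sum(axis=2))
    cost = ((D - implied) ** 2).sum(axis=1)       # cost vector c, Equation (10)
    q = int(np.argmax(cost))                      # most "offensive" city
    X_test[q, 0] += 500 * (1 - rng.uniform())     # Equation (25), as written
    X_test[q, 1] += 500 * (1 - rng.uniform())     # Equation (26), as written
    return X_test, q
```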

4.3.2. Two-City Stagnation

Figure 5 shows yet another solution. If the image is rotated about 45 degrees in the counter-clockwise direction, proper perspective might be realized.

This solution, at first, might appear to be "reasonable". Upon further inspection, however, one might notice a "disconnect" between the eastern cities and the western cities. Specifically, the western cities seem to be inverted when compared to the eastern cities, and Seattle and Portland (SEA and POR) are shown very much out of place. Unfortunately, this condition cannot be remedied via the single-city approach just described. The two "offensive" cities are properly oriented with respect to each other and to other nearby cities. This prevents us from moving, for example, SEA to its proper general location, because doing so would still leave a sub-optimal condition: a large distance between SEA and POR would be harmful. Moving both offending cities simultaneously by the same amount is found to alleviate the problem. This two-city movement is done by re-examining the cost vector c.

Figure 5. Example of two-city stagnation.

Specifically, the following value Y is calculated:

$Y = \max_{i \neq j}\,(c_i + c_j)$ (27)

The row index i and column index j attaining the maximum value Y identify the two cities contributing most to sub-optimality; here "r" denotes the row index and "s" denotes the column index. These two targets are moved in an aggressive fashion. The following shows the movement process, where the dx and dy terms denote horizontal and vertical movements, respectively.

$d_x = 500\,(1 - U(0,1))$ (28)

$d_y = 500\,(1 - U(0,1))$ (29)

The following movements are then made for cities r and s.

$(x_{r1})_t = (x_{r1})_t + d_x$ (30)

$(x_{r2})_t = (x_{r2})_t + d_y$ (31)

$(x_{s1})_t = (x_{s1})_t + d_x$ (32)

$(x_{s2})_t = (x_{s2})_t + d_y$ (33)

Cities r and s are moved by exactly the same amount and in the same direction, so the distance between them does not change. It is hoped that their simultaneous movement will at some point induce a condition that ends the stagnation and moves the search toward an optimal solution.
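The two-city remedy can be sketched in the same style. Because the maximum of c_i + c_j over distinct i and j is attained by the two largest entries of the cost vector, the sketch simply takes those two cities; the function name remedy_two_cities and the displacement form are assumptions in line with Equations (27)-(33).

```python
import numpy as np

rng = np.random.default_rng()

def remedy_two_cities(X_test, D):
    """Translate the two highest-cost cities r and s by a common
    (dx, dy), per Equations (27)-(33)."""
    diffs = X_test[:, None, :] - X_test[None, :, :]
    implied = np.sqrt((diffs ** 2).sum(axis=2))
    cost = ((D - implied) ** 2).sum(axis=1)       # cost vector c
    order = np.argsort(cost)[::-1]
    r, s = int(order[0]), int(order[1])           # pair maximizing c_r + c_s, Equation (27)
    dx = 500 * (1 - rng.uniform())                # Equation (28), as written
    dy = 500 * (1 - rng.uniform())                # Equation (29), as written
    X_test[[r, s], 0] += dx                       # Equations (30) and (32)
    X_test[[r, s], 1] += dy                       # Equations (31) and (33)
    return X_test, r, s
```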

It should be noted that the search heuristic applies the modifications addressing both forms of stagnation on a periodic basis during the search.

4.4. Convergence

With stagnation addressed, attention is turned to how well the solutions approach their desired objective function values. It is, of course, desired that the objective function reach its minimum as efficiently as possible. Figure 6 below shows three types of convergence noticed during the development process. The horizontal axis represents time (the "age" of the search), while the vertical axis shows the "best" objective function value found at the associated time. At the start of the search, the objective function value is high (undesirable) because the randomly-generated initial solution takes no account of the objective function. Improvement then commences because the search process is driven to find minimal values of the objective function.

The blue line in Figure 6 shows the first type of convergence, classified here as "quick convergence". This is where the objective function achieves a near-optimal value without incident. The orange line shows what is classified as "stagnation remedied". This is where stagnation occurs during the search process but is remedied via the approaches described above, and a near-optimal condition is eventually obtained. Note the rapid improvement along the orange line once stagnation is remedied. The grey line is classified as "stagnation not remedied", where stagnation occurs and is never corrected. Fortunately, this research effort only experienced "stagnation not remedied" during the developmental phase; once this condition was observed during development, the search heuristic was modified such that stagnation was always remedied. The results section further details this.

Figure 6. Convergence types.

5. Experimental Results

Table 3 below shows the results for each value of n. For each value of n, a solution was obtained five times. The mean objective function value is reported along with its standard deviation. The ratio of standard deviation to mean is less than 1% for each value of n; this small variation provides a large degree of confidence in the estimates. Also reported is the objective function value obtained when the classical MDS approach ("Eigen") is used.

As one can see, the search approach provides vastly superior results as compared to the classical approach (“Eigen”). Figure 7 details these findings.

Figure 7. Number of cities (n) and objective function.

Table 3. Results of search heuristic.

The blue line shows the mean objective function value associated with the presented search approach, as a function of the number of cities (n). The orange line shows the objective function value via the classical MDS approach as a function of n.

6. Concluding Comments

A search heuristic has been presented to convert a distance matrix into spatial coordinates. Some potential pitfalls, such as "stagnation", were encountered and remedied. The final results show objective function values that are near-optimal. Additionally, these results are superior to those obtained via classical MDS, which is unable to compete with the direct search heuristic presented here.

As stated earlier, MDS can be used for a variety of applications. It so happens that the application used here was to convert distance data into spatial coordinates. The intent of this research is not just to find spatial coordinates, but to illustrate an approach for obtaining a good-quality solution to an MDS problem. If one simply wanted spatial coordinates for locations, they could be looked up on the internet. The example here is considered reasonable because the end-user should have a pre-conceived belief as to where the locations should be, which assists with understanding the search process. This search process could be used for most any situation where it is desired to convert similarity (or dissimilarity) data into spatial data.

There are, of course, opportunities for research beyond this effort. More locations would be an interesting pursuit, as would consideration of multiple geographic regions (i.e., continents). Of course, the search parameters could also be further explored. Other possible follow-up efforts could be related to other applications of MDS, such as consumer segmentation, information storage, and DNA sequence comparison.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Mboko Ibara, S. and Ossouna, D. (2021) Determinants of Multidimensional Poverty among the Under-Five: Illustration Based on Data from the Congo Multiple Indicator Cluster Survey. Theoretical Economics Letters, 11, 363-380.
https://doi.org/10.4236/tel.2021.112024
[2] Bianchi, L., Dorigo, M., Gambardella, L.M. and Gutjahr, W.J. (2009) A Survey on Metaheuristics for Stochastic Combinatorial Optimization. Natural Computing, 8, 239-287.
https://doi.org/10.1007/s11047-008-9098-4
[3] Glover, F. (1989) Tabu Search—Part 1. ORSA Journal on Computing, 1, 190-206.
https://doi.org/10.1287/ijoc.1.3.190
[4] Glover, F. (1990) Tabu Search—Part 2. ORSA Journal on Computing, 2, 4-32.
https://doi.org/10.1287/ijoc.2.1.4
[5] Kirkpatrick, S., Gelatt Jr., C.D. and Vecchi, M.P. (1983) Optimization by Simulated Annealing. Science, 220, 671-680.
https://doi.org/10.1126/science.220.4598.671
[6] Goldberg, D.E. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Kluwer Academic Publishers.
[7] Marden, J.I. (1995) Analyzing and Modeling Rank Data. Chapman & Hall, New York.
[8] Kruskal, J.B. (1964) Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis. Psychometrika, 29, 1-27.
https://doi.org/10.1007/BF02289565
[9] Johnson, R.A. and Wichern, D.W. (2018) Applied Multivariate Statistical Analysis. 6th Edition, Pearson.
[10] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953) Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21, 1087.
https://doi.org/10.1063/1.1699114
[11] Wilensky, U. and Rand, W. (2015) An Introduction to Agent-Based Modeling: Modeling Natural, Social and Engineered Complex Systems with NetLogo. MIT Press.
[12] McMullen, P. (2020) An Agent-Based Approach to the Newsvendor Problem with Price-Dependent Demand. American Journal of Operations Research, 10, 101-110.
https://doi.org/10.4236/ajor.2020.104006
