Correlation and Simultaneous Linear Regression

Abstract

Hirschfeld (1935) posed the question: is it always possible to introduce new variates for the rows and the columns of a contingency table such that both regressions are linear? In reply, he derived the formulas of dual scaling. This approach was later employed by Lingoes (1963, 1968), who was apparently unaware of Hirschfeld's study but noted that the approach would use the basic theory and equations worked out by Guttman (1941). We use a graph of the linear regressions to find the optimal weights, with the correlation as the criterion: the optimal weights adjust the spacing of the rows and columns so that the relation after quantification is linear, the condition under which the correlation attains its maximum. We present an example in which the data have a correlation of ρ = 0.65277 between x and y; as the weights approach the optimum, ρ increases to its maximum value and the relation becomes a straight line, which illustrates the maximum value of ρ.

Share and Cite:

Sheet, K.F. and Sadiq, K.M. (2022) Correlation and Simultaneous Linear Regression. Open Access Library Journal, 9, 1-7. doi: 10.4236/oalib.1108425.

1. Introduction

Let us see the data in Table 1. Suppose that the option weights and subject scores are simultaneously assigned to the responses (i.e., the 1's) in Table 2, resulting in the table of weighted responses. Can you figure out how this table is prepared?

In Table 2 you can see the weight of the option of each item chosen by each subject; these weights constitute the second term of each pair in Table 3. The left-hand side of each pair is nothing but the corresponding subject's score. As Guttman (1941) reasoned, the two unknowns in each pair in Table 4

Table 1. Data in terms of weights for options.

Table 2. Data in terms of scores for subjects.

Table 3. Simultaneously weighted data.

Table 4. Multiple-choice data (categorical data).

are assigned to the same response; that is, they are common descriptions of a single response, and therefore the two unknowns should be given values as similar as possible (Block & Jones, 1968) [1].

One of the most popular measures of the relationship between a pair of variables is the so-called product-moment correlation, or Pearson correlation. This measure indicates the degree of linear relationship: the tendency that as one variable increases, the other increases too. Let us denote this correlation by ρ. To simplify the expression for ρ, let us choose the units and the origins of the y's and x's as follows:

(the sum of squares of responses weighted by y_i) = (the sum of squares of responses weighted by x_j) = d, and (the sum of responses weighted by y_i) = (the sum of responses weighted by x_j) = 0.

Do not worry about these conditions on y_i and x_j: they do not alter the value of ρ or η². Now ρ can be expressed simply as:

ρ = (the sum of products of paired weights) / d = Σ_ij f_ij y_i x_j / d (1)

where fij = 1 or 0 as shown in Table 1.
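As a concrete check of Equation (1), the sketch below builds a small hypothetical 0/1 response table (standing in for Table 1, which is not reproduced here), normalizes arbitrary subject scores and option weights to satisfy the two conditions above, and evaluates ρ. The table, the starting weights, and the choice d = (total number of responses) are all illustrative assumptions:

```python
import numpy as np

# Hypothetical 4-subject x 3-option indicator table (f_ij = 1 or 0),
# standing in for Table 1; each subject picks exactly one option.
F = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

def normalize(w, margins):
    """Center and scale weights so that the weighted responses
    sum to 0 and their sum of squares equals d = total responses."""
    w = w - np.sum(margins * w) / margins.sum()              # weighted sum = 0
    w = w * np.sqrt(margins.sum() / np.sum(margins * w**2))  # weighted sum of squares = d
    return w

d = F.sum()                                                   # total number of responses
y = normalize(np.array([1.0, 2.0, 3.0, 4.0]), F.sum(axis=1))  # subject scores y_i
x = normalize(np.array([-1.0, 0.0, 1.0]), F.sum(axis=0))      # option weights x_j

# Equation (1): rho = sum_ij f_ij * y_i * x_j / d
rho = (F * np.outer(y, x)).sum() / d
print(round(rho, 4))
```

Because both sets of weights are centered and scaled to the same sum of squares d, the Cauchy-Schwarz inequality guarantees that this ρ always falls in [−1, 1].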

Dual scaling is also a technique to determine y_i and x_j in such a way that ρ is a maximum (Block & Jones, 1968) [1]. Note again that these subject scores y_i and option weights x_j are identical to those obtained by the methods discussed so far. In addition, you should note that:

ρ = η, that is, η² = ρ² (2)

In statistics, the squared product-moment correlation is generally not equal to the squared correlation ratio. The equality between them shown in Equation (2) is strictly a result of the duality of this scaling method (Fisher, 1940) [2].

Here we first illustrate the ordinary correlation and then turn to the results reported in other research.

It is important to look at this approach to dual scaling as applied to the contingency table, because it offers another opportunity to see the distinction between continuous data and categorical data in analysis (Guttman, 1946) [3]. Let us consider a contingency table, which is typically obtained by asking two multiple-choice questions. Consider the following questions:

Q1. How do you feel about taking sleeping pills?

( ) strongly for, ( ) for, ( ) neutral, ( ) against, ( ) strongly against.

Q2. Do you sleep well every night?

( ) never, ( ) rarely, ( ) some nights, ( ) usually, ( ) always.

Suppose you obtain the data from 140 subjects as shown in Table 5.

The important distinction between continuous and categorical data, referred to previously, can now be explained as follows. Suppose you assign weight y1 to

Table 5. Sleeping and sleeping pills.

the “strongly for” option of Q1.

This step illustrates the weights when there is more than one way to find the optimal solution; the same holds for y1, y2, y3, y4 and y5.

Consider the element in the first row (strongly for) and the first column (never) of the table, that is, 15. It is now given weight y1, so the weighted response is 15y1. So far, this is the same for both types of data. Once you consider the sum of squares of the weighted responses, however, you will recognize the difference in the meaning of the expression 15y1 between the two types. In continuous data, 15 is a single number, a quantity (Nishisato & Clavel, 2003) [4]. Then, by using the matrix containing the variables of the rows and columns, we find the other values for the remaining questions.

Therefore, the square of this weighted response is (15y1)² = 225y1². In contrast, 15y1 in dual scaling means that each of the 15 responses is given weight y1, hence the sum of squared weighted responses is y1² + y1² + ⋯ + y1² = 15y1².

Do you see this distinction? When you derive formulas for categorical data, this is of the utmost importance, because it is one of the main distinctions between the formulation of categorical data analysis and that of continuous data.
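The distinction can also be checked numerically. In this sketch the weight value y1 is an arbitrary assumption; the point is only that the two readings of 15y1 give different sums of squares:

```python
# The distinction for a cell frequency of 15 with weight y1:
y1 = 0.7  # any weight value; chosen arbitrarily for illustration

# Continuous view: 15 is one quantity, so the weighted response is a
# single number 15*y1, and its square is (15*y1)**2 = 225*y1**2.
continuous_ss = (15 * y1) ** 2

# Categorical view: 15 separate responses each receive weight y1, so the
# sum of squared weighted responses is y1**2 added 15 times = 15*y1**2.
categorical_ss = sum(y1 ** 2 for _ in range(15))

print(continuous_ss, categorical_ss)  # differ by a factor of 15
```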

Dual scaling of the data in Table 5 determines five weights y_i for the options of Q1 and five weights x_j for the options of Q2 in such a way that the statistic ρ is a maximum. In the formulas:

η y_i = Σ_j f_ij x_j / f_i. , η x_j = Σ_i f_ij y_i / f_.j (3)

f_ij is no longer 1 or 0 but the frequency in row i and column j of Table 5. You may wonder what the above operation of maximizing ρ really means. Let us start with a case of non-optimal weights (Nishisato, 2014) [5].
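A minimal sketch of the transition formulas in (3), using a hypothetical 3×3 contingency table (the full Table 5 is not reproduced in the text). The optimal column weights x are taken as the eigenvector belonging to the largest nontrivial eigenvalue η², and the two dual relations are then verified directly:

```python
import numpy as np

# Hypothetical 3x3 contingency table standing in for Table 5.
F = np.array([[10., 5., 1.],
              [ 4., 8., 4.],
              [ 1., 6., 9.]])
r = F.sum(axis=1)   # row totals f_i.
c = F.sum(axis=0)   # column totals f_.j

# Optimal column weights x: eigenvector of Dc^-1 F' Dr^-1 F for the
# largest nontrivial eigenvalue eta^2 (the trivial solution has eta = 1).
M = np.diag(1 / c) @ F.T @ np.diag(1 / r) @ F
vals, vecs = np.linalg.eig(M)
order = np.argsort(-vals.real)
eta2 = vals.real[order[1]]          # skip the trivial eigenvalue 1
x = vecs.real[:, order[1]]
eta = np.sqrt(eta2)

# First transition formula in (3): eta * y_i = sum_j f_ij x_j / f_i.
y = (F @ x) / r / eta
# Dual relation back: eta * x_j = sum_i f_ij y_i / f_.j
x_back = (F.T @ y) / c / eta
print(np.allclose(x_back, x))
```

Substituting one transition formula into the other shows why this works: x must satisfy M x = η² x, which is exactly the eigenequation solved above.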

2. Numerical Example

Suppose that you decide, as most people do, to use subjective, or common-sense, weights of −2, −1, 0, +1, +2 for the options "never", "rarely", "some nights", "usually", "always" of Q2, respectively. Using these weights, calculate the mean weighted response for "strongly for" of Q1. Thus, the mean is

m1 (strongly for, Q1) = [15 × (−2) + 8 × (−1) + 3 × (0) + 2 × (+1) + 0 × (+2)] / 28 = −36/28 ≈ −1.3

Proceeding in the same way for all the m's of Q1 and Q2, Table 6 and Table 7 show the resulting values.
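The calculation of m1 can be reproduced directly from the first row of Table 5 as quoted in the text (frequencies 15, 8, 3, 2, 0 over the five options of Q2):

```python
# Row "strongly for" of Table 5: frequencies over never ... always (28 responses).
freqs = [15, 8, 3, 2, 0]
weights = [-2, -1, 0, 1, 2]   # subjective common-sense weights for Q2

# Mean weighted response: sum of frequency * weight over total frequency.
m1 = sum(f * w for f, w in zip(freqs, weights)) / sum(freqs)
print(round(m1, 2))  # -1.29, i.e. about -1.3
```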

How good are these common-sense weights in explaining the data? One way to check is to construct a graph in which you plot these means, just calculated, against your subjective weights (Figure 1). Assume, as before, that you assign weight y_i to row i (Q1) and x_j to column j (Q2). Let us call the plot of the m_i against the subjective column weights the "regression of y on x", and the plot of the m_j against the row weights the "regression of x on y". This graph alone does not tell us much, so wait until you see the corresponding results when you use, instead of your subjective weights, the optimal weights obtained from dual scaling, that is, the weights that maximize ρ (Guttman, 1946) [3], as given in Nishisato (1980a, pp. 66-68). We show you only the graph obtained, without computation, using the dual scaling weights (Figure 2). Can you see that this is a remarkable plot? Both lines are straight and their slopes are identical! The optimal weights had the effect of adjusting the spacing of the rows and columns in such a way that the relation between rows and columns after quantification is linear, the condition under which ρ attains its maximum. This remarkable characteristic was termed simultaneous linear regression by Lingoes (1964); indeed, it served as the criterion in Hirschfeld's (1935) formulation of this quantification method. Hirschfeld posed the question: is it always possible to introduce new variates for the rows and the columns of a contingency table such that both regressions are linear?

You can see Hirschfeld's result in the expressions of the dual relations, or transition formulas. As you recall, the ρ in formula (4) is the same as the ρ in Equation (1):

ρ = (the sum of products of paired weights) / d = Σ_ij f_ij y_i x_j / d (4)

Table 6. Means for Q1.

Table 7. Means for Q2.

where ρ = η, that is, η² = ρ², which is the key quantity, called the parameter, in linear regression. There is one more approach, which is also obvious now that you know the dual relations (Nishisato, 2014 [5]; Nishisato and Sheu, 1984 [6]). This approach provides a simple method of calculating the optimal weights (Table 8).
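One classical way to carry out such a calculation, sketched here on a hypothetical contingency table (the full Table 5 is not reproduced in the text), is reciprocal averaging: alternate the two transition formulas until the weights stop changing, then read off η² (and hence ρ², by Equation (2)) from the Rayleigh quotient. The table and the starting weights are assumptions for illustration:

```python
import numpy as np

# Hypothetical 3x3 contingency table standing in for Table 5.
F = np.array([[10., 5., 1.],
              [ 4., 8., 4.],
              [ 1., 6., 9.]])
r, c = F.sum(axis=1), F.sum(axis=0)   # row totals f_i., column totals f_.j

# Reciprocal averaging: alternate the two transition formulas,
# re-standardizing the column weights on each pass.
x = np.array([1., 0., -1.])           # any non-constant starting weights
for _ in range(500):
    y = (F @ x) / r                   # row scores: weighted row means of x
    x_new = (F.T @ y) / c             # column weights: weighted column means of y
    x_new = x_new - (c @ x_new) / c.sum()               # weighted sum = 0
    x_new = x_new / np.sqrt((c @ x_new**2) / c.sum())   # weighted mean square = 1
    if np.allclose(x_new, x, atol=1e-12):
        x = x_new
        break
    x = x_new

# At convergence t = M x with M = Dc^-1 F' Dr^-1 F, so the Rayleigh
# quotient below recovers eta^2, the largest nontrivial eigenvalue.
t = (F.T @ ((F @ x) / r)) / c
eta2 = (c @ (x * t)) / (c @ (x * x))
print(round(float(eta2), 4))
```

Re-centering on every pass projects out the trivial solution (all weights equal, η = 1), so the iteration converges to the largest nontrivial eigenvalue, exactly the η² that dual scaling maximizes.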

Figure 1. Graph for subjective weights.

Figure 2. Optimal weights.

Table 8. Optimum weight values.

3. Conclusions

Instead of the iterative method, we use a graph of the linear regressions to find the optimal weights, and this approach gives good results.

Correlation and simultaneous linear regression provide an interesting procedure for finding the optimal solution to such problems, with the maximized correlation as the criterion, as illustrated in this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Block, R.D. and Jones, L.V. (1968) The Measurement and Prediction of Judgement and Choices. Holden-Day, San Francisco.
[2] Fisher, R.A. (1940) The Precision of Discriminant Functions. Annals of Eugenics, 10, 422-429. https://doi.org/10.1111/j.1469-1809.1940.tb02264.x
[3] Guttman, L. (1946) An Approach for Quantifying Paired Comparisons and Rank Order. The Annals of Mathematical Statistics, 17, 144-163. https://doi.org/10.1214/aoms/1177730977
[4] Nishisato, S. and Clavel, J.G. (2003) A Note on Between-Set Distances in Dual Scaling and Correspondence Analysis. Behaviormetrika, 30, 87-98. https://doi.org/10.2333/bhmk.30.87
[5] Nishisato, S. (2014) Elements of Dual Scaling: An Introduction to Practical Data Analysis. Psychology Press, New York. https://doi.org/10.4324/9781315806907
[6] Nishisato, S. and Sheu, W.J. (1984) A Note on Dual Scaling of Successive Categories Data. Psychometrika, 49, 493-500. https://doi.org/10.1007/BF02302587

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.