This paper presents the dual specification of the least-squares method. While the traditional (primal) formulation of the method minimizes the sum of squared residuals (noise), the dual specification maximizes a quadratic function that can be interpreted as the value of sample information. The two specifications are equivalent. Before developing the methodology that describes the dual of the least-squares method, the paper gives a historical perspective of its origin that sheds light on the thinking of Gauss, its inventor. The least-squares method was firmly established as a scientific approach by Gauss, Legendre and Laplace within the space of a decade, at the beginning of the nineteenth century. Legendre, in 1805, was the first author to name the approach “méthode des moindres carrés”, the “least-squares method”. Gauss, however, had used the method as early as 1795, when he was 18 years old. He adopted it again in 1801 to calculate the orbit of the newly discovered planet Ceres. Gauss published his way of looking at the least-squares approach in 1809 and gave several hints that the least-squares algorithm is a minimum-variance linear estimator and that it is derivable from maximum-likelihood considerations. Laplace wrote a very substantial chapter about the method in his fundamental treatise on probability theory published in 1812.

The least-squares method has primal and dual specifications. The primal specification is well known: given a regression function (either linear or nonlinear) and a sample of observations, the goal is to minimize the sum of the squared deviations between the data and the regression relation, as discussed in Section 3. The dual specification is not known because it was not sought out over the past two hundred years. This paper presents such a dual specification in Section 4. First, however, the reader is offered a historical and illuminating perspective of the least-squares method in the words of its inventor.

Carl Friedrich Gauss, at the age of 18, conceived the least-squares (LS) method. However, he did not publish it until 1809, [

Some idea occurred to me in the month of September of the year 1801, engaged at the time on a very different subject, which seemed to point to the solution of the great problem of which I have spoken. Under such circumstances we not infrequently, for fear of being too much led away by an attractive investigation, suffer the associations of ideas, which more attentively considered, might have proved most fruitful in results, to be lost from neglect. And the same fate might have befallen these conceptions, had they not happily occurred at the most propitious moment for their preservation and encouragement that could have been selected. For just about this time the report of the new planet, discovered on the first day of January of that year with the telescope at Palermo, was the subject of universal conversation; and soon afterwards the observations made by the distinguished astronomer Piazzi from the above date to the eleventh of February were published. Nowhere in the annals of astronomy do we meet with so great an opportunity, and a greater one could hardly be imagined, for showing most strikingly, the value of this problem, than in this crisis and urgent necessity, when all hope of discovering in the heavens this planetary atom, among innumerable small stars after the lapse of nearly a year, rested solely upon a sufficiently approximate knowledge of its orbit to be based upon these very few observations. Could I ever have found a more seasonable opportunity to test the practical value of my conceptions, than now in employing them for the determination of the orbit of the planet Ceres, which during the forty-one days had described a geocentric arc of only three degrees, and after the lapse of a year must be looked for in a region of the heavens very remote from that in which it was last seen? 
This first application of the method was made in the month of October, 1801, and the first clear night, when the planet was sought for (by de Zach, December 7, 1801) as directed by the numbers deduced from it, restored the fugitive to observation. Three other new planets, subsequently discovered, furnished new opportunities for examining and verifying the efficiency and generality of the method.

Several astronomers wished me to publish the methods employed in these calculations immediately after the second discovery of Ceres; but many things―other occupations, the desire of treating the subject more fully at some subsequent period, and, especially, the hope that a further prosecution of this investigation would raise various parts of the solution to a greater degree of generality, simplicity, and elegance―prevented my complying at the time with these friendly solicitations. I was not disappointed in this expectation, and I have no cause to regret the delay. For the methods first employed have undergone so many and such great changes that scarcely any trace of resemblance remains between the method in which the orbit of Ceres was first computed, and the form given in this work. Although it would be foreign to my purpose, to narrate in detail all the steps by which these investigations have been gradually perfected, still, in several instances, particularly when the problem was one of more importance than usual, I have thought that the earlier methods ought not to be wholly suppressed. But in this work, besides the solution of the principal problems, I have given many things which, during the long time I have been engaged upon the motions of the heavenly bodies in conic sections, struck me as worthy of attention, either on account of their analytical elegance, or more especially on account of their practical utility, see [

This lengthy quotation points to several aspects of discovery of which scientists were aware more than two hundred years ago: elegance as a crucial scientific criterion, serendipity, and the importance of long periods of reflection in order to better understand the properties of new methods. This last aspect perfectly fits the spirit of the present note that is devoted to the presentation of the dual specification of the least-squares method, a property that was neglected for over two hundred years.

Another striking feature of Gauss’ thinking process about measuring the orbit of heavenly bodies consists in his clearly stated desire to achieve the highest possible accuracy, see [

It can only be worth while to aim at the highest accuracy, when the final correction is to be given to the orbit to be determined. But as long as it appears probable that new observations will give rise to new corrections, it will be convenient to relax more or less, as the case may be, from extreme precision, if in this way the length of the computations can be considerably diminished. We will endeavor to meet both cases.

Here, Gauss seems to be totally aware of the problem connected to out-of-sample prediction and the necessity or, at least, convenience of a recursive algorithm to account for the information carried by new observations.

Gauss’ reading becomes even more exciting, see [

Gauss proceeds to state, analytically, the function that represents the probability of an event composed of many observations and to derive from such a statement the least-squares principle, see [

Gauss did not name his approach the least-squares method. This name was first suggested by Adrien Marie Legendre in 1805. In his preface, Legendre states, see [

There remains to mention Laplace. In 1812, he published a fundamental textbook about probability theory, see [

When all the properties and features of the LS method were thought to be well known, and when all the possible ways of obtaining the least-squares estimates of a linear system’s parameters were thought to have been discovered, there surfaced an intriguing question: What is the dual specification of the least-squares method? It is difficult or, better, impossible to conjecture whether such a question could have occurred to either Gauss, or Legendre, or Laplace. The Lagrangean method [

We abstract from any statistical distribution of the error terms and from hypothesis-testing considerations. The traditional (primal) LS approach consists of minimizing the squared deviations from an average relation of, say, a linear model that consists of three parts:

y = Xβ + u   (1)

where y is an (n × 1) vector of sample observations, X is an (n × k) matrix of predetermined values, β is a (k × 1) vector of unknown parameters, and u is an (n × 1) vector of deviations (noise).

In the terminology of information theory, relation (1) may be regarded as representing the decomposition of a message into signal and noise, that is

message = signal + noise   (2)

with the obvious correspondence: y = message, Xβ = signal, u = noise.

The least-squares methodology, then, minimizes the squared deviations (noise) subject to the model’s specification given in Equation (1). Symbolically,

Primal:

min SSD = u'u/2   (3)

subject to

y = Xβ + u   (4)

where SSD stands for sum of squared deviations. An intuitive interpretation of the objective function (3) is the minimization of a cost function of noise. We call model (3) and (4) the Primal LS model. The solution of model (3) and (4) by any appropriate mathematical programming routine gives the LS estimates of the parameters β and of the deviations u.
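As a numerical illustration (not part of the original paper), the primal model (3)-(4) can be solved by substituting constraint (4) into objective (3) and minimizing over β alone; the data and variable names below are hypothetical, and numpy's standard least-squares routine stands in for a general mathematical programming solver.

```python
import numpy as np

# Illustrative data (hypothetical): n = 5 observations, k = 2 regressors.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.0, 2.1, 2.9, 4.2, 4.8])

# Substituting constraint (4), u = y - X @ beta, into objective (3)
# turns the primal into the unconstrained problem: minimize SSD over beta.
def ssd(beta):
    u = y - X @ beta
    return 0.5 * (u @ u)

# np.linalg.lstsq minimizes the sum of squared deviations directly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat      # estimated deviations (noise)

print("beta_hat:", beta_hat)
print("minimal SSD:", ssd(beta_hat))
```

At the minimizer, the deviations are orthogonal to the columns of X, which is the property the dual specification exploits below.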

Traditionally, however, the LS method is presented as the minimization of the sum of squared deviations defined, after substituting constraint (4) into objective (3), as

min SSD = (y − Xβ)'(y − Xβ)/2

The Lagrange approach is eminently suitable for deriving the dual of the least-squares method. Hence, choosing the (n × 1) vector variable e to indicate the n Lagrange multipliers (or dual variables) of constraints (4), the relevant Lagrangean function is stated as:

L(u, β, e) = u'u/2 + e'(y − Xβ − u)   (5)

with first-order necessary conditions (FONC)

∂L/∂u = u − e = 0   (6)

∂L/∂β = −X'e = 0   (7)

A first remarkable insight is that, from FONC (6), the Lagrange multipliers (dual variables), e, of the LS method are identically equal to the deviations (primal variables, noise), u. Each observation in model (4), then, is associated with its specific Lagrange multiplier that turns out to be identically equal to the corresponding deviation. A Lagrange multiplier measures the amount of change in the objective function due to a change of one unit in the associated observation. If a Lagrange multiplier is too large, the corresponding observation may be an outlier. Secondly, FONC (6) and (7), combined into

X'u = 0   (8)

express the orthogonality between the deviations and the columns of X, while FONC (6) itself can be restated as the identity

e ≡ u   (9)

Therefore, the Lagrangean function can be streamlined as:

L = e'y − e'e/2   (10)

using relations (7) and (9).

The Dual of the LS model can now be assembled as:

Dual:

max NVSI = e'y − e'e/2   (11)

subject to

X'e = 0   (12)

Constraints (12) constitute the orthogonality conditions of the LS approach, already mentioned above. An intuitive interpretation of the dual objective function can be formulated within the context of information theory. Hence, the dual problem seeks to maximize the net value of the sample information (NVSI). Typically, dual variables (Lagrange multipliers) are regarded as marginal sacrifices or implicit (shadow) prices of the corresponding constraints. We have already seen that the dual variables e are identically equal to the primal variables u. Thus, in the LS specification, the variables u have a double role: as deviations in the primal model (noise) and as “implicit prices” in the dual model. The quantity e'y can then be interpreted as the gross value of the sample information, and e'e/2 as its cost, so that their difference is the net value of the sample information.
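As an illustrative check (not from the paper), the dual can be solved in closed form: maximizing the concave objective e'y − e'e/2 subject to X'e = 0 projects y onto the orthogonal complement of the column space of X, so the dual solution coincides with the primal residuals and the two optimal objective values agree. The data below are hypothetical.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.9, 2.0, 3.2, 3.9])

# Maximizing e'y - e'e/2 subject to X'e = 0 yields e = (I - H) y,
# where H is the "hat" (projection) matrix onto the column space of X.
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(len(y)) - H) @ y              # dual solution

nvsi = e @ y - 0.5 * (e @ e)              # dual objective value

# Primal solution for comparison.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta_hat
ssd = 0.5 * (u @ u)                       # primal objective value

print("dual feasible (X'e = 0):", np.allclose(X.T @ e, 0.0))
print("e equals primal residuals:", np.allclose(e, u))
print("max NVSI equals min SSD:", np.isclose(nvsi, ssd))
```

The equality of the two optimal objective values is the strong-duality property developed in the text.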

In the dual model, the vector of parameters β is recovered from the definition of the dual variables

e = y − Xβ   (13)

where e ≡ u, together with the orthogonality conditions

X'e = 0   (14)

Hence, from Equation (13) and Equation (14), we can write

X'(y − Xβ) = 0

that results (assuming the nonsingularity of the X'X matrix) in the familiar LS estimator

β̂ = (X'X)⁻¹X'y
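The estimator obtained from the normal equations can be checked against a standard library solver; a minimal sketch with hypothetical data:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0],
              [1.0, 7.0]])
y = np.array([3.9, 6.1, 9.8, 14.2])

# Normal-equations estimator, assuming X'X is nonsingular.
# np.linalg.solve is preferred over an explicit inverse for stability.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from a standard least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimates agree:", np.allclose(beta_normal, beta_lstsq))
```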

All the information of the traditional LS primal problem is contained in the LS dual model, and vice versa. Hence, the pair of dual problems―the primal [(3)-(4)] and the dual [(11)-(12)]―provides identical LS solutions for separating signal from noise.

At optimal solutions,

min SSD = max NVSI

It follows that

∂SSD/∂y = e = u

which demonstrates a previous assertion, namely that the change in the primal objective function corresponding to a marginal change in each sample observation is equal to its associated Lagrange multiplier, which is identically equal to the corresponding deviation. The two primal and dual objective functions can also be rewritten as:

SSD = u'u/2,   NVSI = e'y − e'e/2 = u'u − u'u/2 = u'u/2

using e ≡ u and e'y = u'(Xβ + u) = u'u, which follows from the orthogonality conditions (8). Hence, the quantity u'u/2 measures, at the optimum, both the minimal cost of noise (primal) and the maximal net value of the sample information (dual).
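The marginal interpretation of the multipliers can be illustrated numerically (this example is not from the paper): perturbing one observation y_i by a small amount changes the optimal SSD by approximately u_i times that amount, since the deviations are the Lagrange multipliers. The data and names are hypothetical.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.9, 2.0, 3.2, 3.9])

def min_ssd(y_vec):
    # Optimal value of the primal objective (3) for observations y_vec.
    beta, *_ = np.linalg.lstsq(X, y_vec, rcond=None)
    u = y_vec - X @ beta
    return 0.5 * (u @ u)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat      # deviations = Lagrange multipliers

# Finite-difference sensitivity of min SSD w.r.t. observation i
# should match the corresponding deviation u_hat[i].
i, h = 2, 1e-6
y_plus = y.copy()
y_plus[i] += h
sensitivity = (min_ssd(y_plus) - min_ssd(y)) / h

print(sensitivity, "vs", u_hat[i])
```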

An interpretation of the dual pair of LS problems, without reference to any empirical context, can be formulated using the Pythagorean theorem. With the knowledge that a solution to the LS problem requires the fulfillment of the orthogonality conditions

X'u = 0

and also

ŷ'u = 0,   where ŷ = Xβ̂

Therefore, using y = ŷ + u,

y'y = ŷ'ŷ + u'u   (15)

that can be restated as:

u'u/2 = y'y/2 − ŷ'ŷ/2   (16)

which corresponds to the two objective functions of the primal (3) and the dual (11): the left-hand side of Equation (16) is the primal objective function to be minimized and the right-hand side of the same equation is the dual objective function to be maximized. By the Pythagorean theorem (expressed by Equation (15)), for any given vector of observations y, the minimization of the noise function u'u/2 is equivalent to the maximization of the signal function ŷ'ŷ/2, since the total quantity y'y/2 is a constant fixed by the sample.
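The Pythagorean decomposition (15) is easy to verify numerically; a short sketch with hypothetical data:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 4.8])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat          # signal
u_hat = y - y_hat             # noise

# Orthogonality: y_hat'u_hat = 0, hence y'y = y_hat'y_hat + u'u.
print("orthogonal:", np.isclose(y_hat @ u_hat, 0.0))
print("Pythagoras holds:", np.isclose(y @ y, y_hat @ y_hat + u_hat @ u_hat))
```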

This paper has retraced the history of the least-squares method and has developed its dual specification, a novel way of looking at the LS approach. It has shown that the traditional minimization of the sum of squared deviations, which gives the algorithm its name, is equivalent to the maximization of the net value of sample information.

Quirino Paris (2015) The Dual of the Least-Squares Method. Open Journal of Statistics, 5, 658-664. doi: 10.4236/ojs.2015.57067