1. Introduction
When a line is fitted to bivariate data, it is common to generate many lines as the observations are altered in some way. For example, to determine a particular data point’s influence on the best-fit line, the point may be moved by changing its y-coordinate and the line refitted. Some diagnostic tests are based on this idea. The lines that are often used for examining influence in this way intersect at a single point, which is called the pivot point.
An example of a pivot point is presented in Section 2. In Section 3, we derive the coordinates of the pivot point and show that a pivot point can be created in two ways. One way is to augment an original set of bivariate observations with an additional point, which can have arbitrary multiplicity. The other way is to alter an existing observation’s y-coordinate, as described above. Section 4 shows how the pivot point can shorten calculations when a new observation is added.
2. Illustrative Example
Consider the data in Table 1 [1]. The predictor variable (x) is the age in months at which a child says their first word, and the response variable (y) is the child’s Gesell Adaptive Score from an aptitude test. These data have been analyzed many times for influential and outlying observations [2] - [7]. Using various criteria, Cases 2, 18, and 19 have been identified as significant. For illustrative purposes, we focus on Case 18.
When examining an individual observation’s influence on a bivariate least-squares linear regression, it is common to generate a sequence of regression lines. Each line is fitted to the same set of observations, except that the y-coordinate of the data point of interest is varied while its x-coordinate is held fixed. The influence of Case 18 on the least-squares regression line is examined by keeping its x-coordinate of 42 and giving its y-coordinate the values 57, 77, 97, 117, and 137. This produces the five regression lines in Figure 1. Clearly, Case 18 could have a large influence on the regression line. Some authors have illustrated and evaluated leverage in this way [8] [9] [10] [11]. All these regression lines pass through a common point, called the pivot point [12]. In Figure 1, the pivot point (12.3, 96.1) is shared by the five lines, and its location is indicated by the symbol D.
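The construction can be tried numerically. The following is a minimal sketch in Python that uses a small made-up data set (not the data of Table 1) and a hypothetical helper `fit_line`. It refits the least-squares line for several y-values of one added point and checks that all the lines meet at one point.

```python
def fit_line(points):
    """Least-squares line y = a + b*x for a list of (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical observations, chosen only for illustration.
others = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 5.0)]
u = 9.0  # fixed x-coordinate of the point whose y-coordinate is varied

# One regression line per trial y-value of the moved point.
lines = [fit_line(others + [(u, v)]) for v in (0.0, 4.0, 8.0, 12.0)]

# Intersect the first two lines, then test whether the others pass through that point.
(a1, b1), (a2, b2) = lines[0], lines[1]
px = (a2 - a1) / (b1 - b2)
py = a1 + b1 * px
common = all(abs(a + b * px - py) < 1e-9 for a, b in lines)

print(f"intersection: ({px:.4f}, {py:.4f}); shared by all lines: {common}")
```

The intersection is found from two of the lines only, so the final check on the remaining lines is a genuine test and not a restatement of the construction.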
3. Derivation of the Pivot Point
We derive the formula for the coordinates of the pivot point. The pivot point can be created by augmenting an original set of bivariate observations with an additional point, which can have arbitrary multiplicity. Such augmentation is another method of diagnosing influence on the line [5] [13] [14] [15]. We show that this formulation is equivalent to varying the location of a single point while keeping its first coordinate fixed, as is done in Figure 1.
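Consider the bivariate data set
$$S_0 = \{(x_i, y_i) : i = 1, 2, \ldots, n\}.$$
For simplicity, assume that coordinates are selected so that
$$\sum x_i = \sum y_i = 0.$$
Unindexed summations are over the elements of $S_0$. Define
$$V = \frac{1}{n}\sum x_i^2.$$
Introduce $m$ copies of the new point $R(u, v)$. If $R$ is a point in $S_0$, these are additional copies. The aggregate of $S_0$ and $m > 0$ copies of $R$ is denoted $S_m$.
For $m = 0$, the least-squares regression line of $S_0$ is
$$y = a_0 + b_0 x, \qquad a_0 = 0, \qquad b_0 = \frac{\sum x_i y_i}{nV}.$$
Table 1. Age at First Word (x) and Gesell Adaptive Score (y).
For any integer $m \ge 0$, the least-squares regression line of $S_m$ is
$$y = a_m + b_m x, \qquad a_m = \frac{mV(v - b_0 u)}{(n+m)V + mu^2}, \qquad b_m = \frac{(n+m)V b_0 + muv}{(n+m)V + mu^2}, \tag{1}$$
and the point of means is
$$\left(\frac{mu}{n+m}, \frac{mv}{n+m}\right), \tag{2}$$
which is on line (1) for $S_m$.
When $m > 0$ and $u \ne 0$, the pivot point
$$P\left(-\frac{V}{u}, -\frac{b_0 V}{u}\right) \tag{3}$$
is on the least-squares line for all sets $S_m$. This can be seen by substituting point (3) into the equation of the line (1), that is,
$$a_m + b_m\left(-\frac{V}{u}\right) = \frac{mV(v - b_0 u) - \frac{V}{u}\left[(n+m)V b_0 + muv\right]}{(n+m)V + mu^2} = -\frac{b_0 V}{u}\cdot\frac{(n+m)V + mu^2}{(n+m)V + mu^2} = -\frac{b_0 V}{u}.$$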
Point $P$ in (3) is called the pivot point of $R$ with respect to $S_0$, because $P$ is on all regression lines for $S_m$, which have different slopes. Because the y-coordinate $v$ of $R$ is absent from the coordinates of $P$, it is also called the pivot point of $u$ with respect to $S_0$. The set of regression lines that is created by adding copies of $R$ is called a pencil of lines or fan of lines through $P$.
When u = 0, the best-fit line (1) translates in the y-direction as m increases, and the pivot point is said to be at infinity. The pivot point is solely an artifact of the least-squares regression equations. Initially, it was found and explained in a linear-algebraic setting [12].
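A numerical check of the pivot point can be sketched as follows. The sketch assumes centered data ($\sum x_i = \sum y_i = 0$), writes $V = \frac{1}{n}\sum x_i^2$ and $b_0$ for the slope of the line for $S_0$, and takes the pivot point to be $(-V/u, -b_0 V/u)$. The data set, the point $R$, and the helper `fit_line` are hypothetical.

```python
def fit_line(points):
    """Least-squares line y = a + b*x for a list of (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical centered data set S0: both coordinates sum to zero.
s0 = [(-2, -3), (-1, 0), (0, -1), (1, 1), (2, 3)]
n = len(s0)
V = sum(x * x for x, _ in s0) / n
_, b0 = fit_line(s0)

u, v = 4.0, 1.0                 # hypothetical new point R(u, v), with u != 0
pivot = (-V / u, -b0 * V / u)   # does not involve v or m

# Vertical distance from the pivot point to the line for S_m, for several multiplicities m.
residuals = []
for m in (1, 2, 5, 50):
    a_m, b_m = fit_line(s0 + [(u, v)] * m)
    residuals.append(a_m + b_m * pivot[0] - pivot[1])

print(pivot, residuals)
```

Each line is refitted from scratch, so residuals near zero indicate that the lines for different multiplicities really do share the point computed from $S_0$ and $u$ alone.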
The regression lines in a fan that is formed by vertically moving one point in the data set intersect at the pivot point. In particular, the regression line formed by adding $m$ copies of the point $R(u, v)$ to $S_0$ is the same as the line formed by adding a single point $(u, v_m)$ with
$$v_m = b_0 u + \frac{m\left[(n+1)V + u^2\right]}{(n+m)V + mu^2}\,(v - b_0 u),$$
which can be seen algebraically by setting $m = 1$ and
$$v = v_m$$
in line (1), which yields line (1) for the original values of $m$ and $v$.
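The equivalence between $m$ copies of $R$ and one suitably placed point can be checked with a short sketch. It assumes centered data, $V = \frac{1}{n}\sum x_i^2$, slope $b_0$ for $S_0$, and one algebraic form of the equivalent y-coordinate, $v_m = b_0 u + m[(n+1)V + u^2](v - b_0 u)/[(n+m)V + mu^2]$, which is derived under those assumptions. The data and the helper `fit_line` are hypothetical.

```python
def fit_line(points):
    """Least-squares line y = a + b*x for a list of (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical centered data set S0.
s0 = [(-2, -3), (-1, 0), (0, -1), (1, 1), (2, 3)]
n = len(s0)
V = sum(x * x for x, _ in s0) / n
_, b0 = fit_line(s0)

u, v, m = 4.0, 1.0, 3  # hypothetical point R(u, v) added with multiplicity m

# y-coordinate of the single point that reproduces the line for S_m.
denom = (n + m) * V + m * u * u
v_m = b0 * u + m * ((n + 1) * V + u * u) / denom * (v - b0 * u)

line_copies = fit_line(s0 + [(u, v)] * m)   # m copies of R
line_single = fit_line(s0 + [(u, v_m)])     # one point at (u, v_m)

print(v_m, line_copies, line_single)
```

The residual of the single point from the line for $S_0$ is the residual of $R$ scaled by a factor that depends only on $n$, $m$, $V$, and $u$, which is why one point can stand in for any number of copies.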
Pivot points also occur when the data are not centered at the origin. All best-fit lines can be rigidly translated, so that the new center is
$$(\bar{x}, \bar{y}).$$
The slope of each line can be found from
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2},$$
which shows the dependence solely on the differences of each coordinate from its mean. The observations in Figure 1 are centered at the data set’s mean point.
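For data that are not centered, the translation argument suggests shifting the centered pivot point back by the means. The sketch below assumes the centered form $(-V/u, -b_0 V/u)$ with $V$ and $u$ measured from $\bar{x}$, so that the pivot point is $(\bar{x} - V/(u - \bar{x}),\ \bar{y} - b_0 V/(u - \bar{x}))$. The functions `fit_line` and `pivot_point` and the data are hypothetical.

```python
def fit_line(points):
    """Least-squares line y = a + b*x for a list of (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def pivot_point(points, u):
    """Pivot point of u with respect to `points`; the data need not be centered."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    d = u - mx
    if d == 0:
        raise ValueError("u equals the mean of x; the pivot point is at infinity")
    # With V = sxx/n and b0 = sxy/sxx: V/d = sxx/(n*d) and b0*V/d = sxy/(n*d).
    return mx - sxx / (n * d), my - sxy / (n * d)


# Hypothetical uncentered observations.
data = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 5.0)]
u = 9.0
px, py = pivot_point(data, u)

# Each refitted line should pass through (px, py), whatever y-value the added point has.
gaps = []
for v in (0.0, 6.0, 12.0):
    a, b = fit_line(data + [(u, v)])
    gaps.append(a + b * px - py)

print((px, py), gaps)
```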
4. Computational Shortcuts When Augmenting a Bivariate Set
The pivot point offers two shortcuts for computing equations for regression lines. This is analogous to adding the $(n+1)$st value $a$ to the data set
$$\{z_1, z_2, \ldots, z_n\},$$
whose mean is
$$\bar{z}_n.$$
The new mean can be calculated using
$$\bar{z}_{n+1} = \frac{n\bar{z}_n + a}{n+1},$$
which requires considerably less computation than not using
$\bar{z}_n$ [11].
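The mean update can be written in one line. This is a minimal sketch with a hypothetical function name `update_mean` and made-up values; it compares the update with recomputing the mean from all the values.

```python
def update_mean(old_mean, n, a):
    """Mean of n + 1 values, given the mean of the first n values and the new value a."""
    return (n * old_mean + a) / (n + 1)


values = [4.0, 7.0, 10.0]             # hypothetical data
mean_n = sum(values) / len(values)     # mean of the original n values
a = 3.0                                # the (n + 1)st value

updated = update_mean(mean_n, len(values), a)
recomputed = sum(values + [a]) / (len(values) + 1)

print(updated, recomputed)
```

The update uses only the stored mean, the count, and the new value, so its cost does not grow with the size of the data set.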
The first shortcut is that, given set $S_0$, the regression line for $S_m$ can be computed as the line containing the point of means (2) and the pivot point (3). Recall that in (3), $V$ and $b_0$ are based only on the unaugmented data set.
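The two-point construction can be sketched as follows, assuming centered data, $V = \frac{1}{n}\sum x_i^2$, point of means $(mu/(n+m), mv/(n+m))$, and pivot point $(-V/u, -b_0 V/u)$. The data, the point $R$, and the helper `fit_line` are hypothetical; the direct fit is included only for comparison.

```python
def fit_line(points):
    """Least-squares line y = a + b*x for a list of (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical centered data set S0; V and b0 are computed once.
s0 = [(-2, -3), (-1, 0), (0, -1), (1, 1), (2, 3)]
n = len(s0)
V = sum(x * x for x, _ in s0) / n
_, b0 = fit_line(s0)

u, v, m = 4.0, 1.0, 1  # hypothetical new point R(u, v) and its multiplicity

mean_pt = (m * u / (n + m), m * v / (n + m))   # point of means of S_m
pivot = (-V / u, -b0 * V / u)                  # pivot point of u with respect to S0

# Line through the two points. Their x-coordinates have opposite signs when u != 0.
slope = (mean_pt[1] - pivot[1]) / (mean_pt[0] - pivot[0])
intercept = pivot[1] - slope * pivot[0]

direct = fit_line(s0 + [(u, v)] * m)           # full refit, for comparison only

print((intercept, slope), direct)
```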
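The second shortcut involves the line obtained when the multiplicity $m$ becomes very large. Then the line (1) approaches the line
$$y = a_\infty + b_\infty x, \qquad a_\infty = \frac{V(v - b_0 u)}{V + u^2}, \qquad b_\infty = \frac{V b_0 + uv}{V + u^2}, \tag{4}$$
which contains the new point $R$ and the pivot point $P$. The coefficients in (4) provide the tool for rapid computation of the line (1) for any $m$, including $m = 1$ for a single additional point. In (1), $a_m$ is a weighted average of $a_0$ and
$a_\infty$, and $b_m$ is a weighted average of $b_0$ and
$b_\infty$ with the same weights, in particular,
$$a_m = (1 - w)\,a_0 + w\,a_\infty$$
and
$$b_m = (1 - w)\,b_0 + w\,b_\infty, \tag{5}$$
where
$$w = \frac{m(V + u^2)}{(n+m)V + mu^2}. \tag{6}$$
Equations (5) are seen by substituting $a_0$ and $b_0$ from (1),
$a_\infty$ and
$b_\infty$
from (4), and $w$ from (6) into the right-hand sides of (5), which yields $a_m$ and $b_m$ in (1).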
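The weighted-average shortcut can be sketched numerically. The sketch assumes centered data (so that the intercept for $S_0$ is zero), $V = \frac{1}{n}\sum x_i^2$, limiting coefficients $a_\infty = V(v - b_0 u)/(V + u^2)$ and $b_\infty = (V b_0 + uv)/(V + u^2)$, and weight $w = m(V + u^2)/[(n+m)V + mu^2]$. These forms are one reconstruction consistent with the least-squares equations; the names `a_inf`, `b_inf`, and `fit_line` and the data are hypothetical.

```python
def fit_line(points):
    """Least-squares line y = a + b*x for a list of (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical centered data set S0.
s0 = [(-2, -3), (-1, 0), (0, -1), (1, 1), (2, 3)]
n = len(s0)
V = sum(x * x for x, _ in s0) / n
a0, b0 = fit_line(s0)          # a0 is zero (up to rounding) for centered data

u, v = 4.0, 1.0                # hypothetical new point R(u, v)

# Limiting line as m grows: it passes through R and the pivot point.
a_inf = V * (v - b0 * u) / (V + u * u)
b_inf = (V * b0 + u * v) / (V + u * u)

errors = []
for m in (1, 2, 10):
    w = m * (V + u * u) / ((n + m) * V + m * u * u)
    a_m = (1 - w) * a0 + w * a_inf       # same weight for both coefficients
    b_m = (1 - w) * b0 + w * b_inf
    a_fit, b_fit = fit_line(s0 + [(u, v)] * m)   # full refit, for comparison only
    errors.append(max(abs(a_m - a_fit), abs(b_m - b_fit)))

print(errors)
```

Once `a_inf` and `b_inf` are known, changing the multiplicity only changes the scalar weight, so a family of lines for different `m` costs little more than one of them.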
5. Conclusion
Pivot points are omnipresent in applications of bivariate linear regression. In particular, they are points through which new lines pass when a data point is altered. One important purpose of altering a point is to determine its influence. We have displayed this phenomenon with the well-known data set of ages at first word versus Gesell scores, which has been analyzed by many authors from many points of view. A pivot point is a handy and efficient tool for shortening calculations when new data arise.
Acknowledgements
We are grateful to many of our colleagues who have frequently and freely shared their knowledge about regression and computational statistics.