Structural equation models (SEMs) are generally sensitive to the presence of outliers and controlling observations, which can have consequential effects on the parameter estimates and on model fit. Although previous research has studied outliers and controlling observations from various perspectives, including box plots and normal probability plots, among others, the uniform horizontal QQ plot is yet to be explored. This study therefore applies uniform QQ plots to identify outliers and possible controlling observations in SEM. The results showed that all three estimators are able to identify outliers and possible controlling observations in SEM. The QQ plot based on the Anderson-Rubin estimator gave a more efficient visual display for spotting outliers and possible controlling observations than the other estimators. This paper therefore provides an efficient way of identifying outliers, as it partitions the data set.

Issues associated with outliers are often discussed in textbooks, whilst in practice academics tend to hold divergent views on what outliers are and how they can properly be identified and managed, if at all [

Outliers are different from controlling observations as was established by [

In a standard SEM, even a small proportion of outliers and potential controlling observations can have a large impact on model fit and parameter estimates. For instance, [

According to [

Moreover, [

Seven different plots in residuals have been utilized for purposes of identifying outliers [

Therefore, the current method adopts a different approach which is the uniform horizontal QQ plot. Now, take a given linear regression equation:

$$y_i = x_i'\beta + \varepsilon_i$$

where $y_i$ represents the outcome for observation $i$, $x_i$ represents the $p \times 1$ predictor vector for observation $i$, $\beta$ represents a $p \times 1$ vector of unknown parameters, and $\varepsilon_i$ is the random error term with mean 0 and variance $\sigma^{2}$. The predicted values, $\hat{y}_i$, can then be displayed in a QQ residual plot, with the predicted values defined by

$$\hat{y}_i = x_i'\hat{\beta}$$

plotted on the x-axis against the residuals, $e_i$, defined as

$$e_i = y_i - \hat{y}_i = y_i - x_i'\hat{\beta}$$
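The fitted values and residuals defined above can be illustrated with a minimal sketch. It assumes ordinary least squares as the estimator of $\beta$ and uses simulated data; the function name `fitted_and_residuals`, the sample size, and the injected outlier are illustrative assumptions, not part of the study.

```python
import numpy as np

def fitted_and_residuals(X, y):
    """Return OLS fitted values y_hat = X beta_hat and residuals e = y - y_hat."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares estimate of beta (assumed estimator for this sketch)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat
    e = y - y_hat
    return y_hat, e

# Simulated example: intercept plus one predictor (all values hypothetical)
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[10] += 5.0  # inject one artificial outlier for illustration

y_hat, e = fitted_and_residuals(X, y)
# Plotting y_hat on the x-axis against e on the y-axis gives the residual plot.
```

An observation whose residual sits far from the bulk of the others in such a plot is a candidate for closer inspection.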

plotted on the y-axis. This could be extended to SEM by plotting the residuals $\hat{v}_i(\hat{\theta})$ and $\hat{\zeta}_i(\hat{\theta})$ against their predicted counterparts $\hat{z}_i(\hat{\theta})$ and $\hat{\eta}_i(\hat{\theta})$, respectively [^{th} observation, then

$$\hat{v}_i(\hat{\theta}) = (I - \hat{\Lambda}\hat{W})\,z_i \tag{1}$$

$$\hat{\zeta}_i(\hat{\theta}) = \hat{M}\hat{W}\,z_i \tag{2}$$

where $\hat{\theta} = \hat{\theta}_{MLE}$.

Again, we obtain the predicted observations $\hat{z}_i(\hat{\theta})$ and $\hat{\eta}_i(\hat{\theta})$, which are linked to

$$\hat{z}_i(\theta) = \Lambda L_i + v_i$$

$$\hat{\eta}_i(\theta) = B\eta_i + \Gamma\xi_i + \zeta_i$$

The factor scores then replace the unobserved $L_i$, giving $\hat{L}_i = W z_i$, which in turn provides predicted observations with estimators [

$$\hat{z}_i(\theta) = \Lambda W z_i \tag{3}$$

$$\hat{\eta}_i(\theta) = [B \;\; \Gamma]\, W z_i \tag{4}$$

For practical implementation, the elements of the vector $\theta$ are replaced with their sample counterparts $\hat{\theta}$, so the estimators of the predicted observations are

$$\hat{z}(\hat{\theta}) = \hat{\Lambda}\hat{W} z$$

$$\hat{\eta}(\hat{\theta}) = [\hat{B} \;\; \hat{\Gamma}]\, \hat{W} z$$

which, for the $i$th observation, give the predicted values

$$\hat{z}_i(\hat{\theta}) = \hat{\Lambda}\hat{W} z_i \tag{5}$$

$$\hat{\eta}_i(\hat{\theta}) = [\hat{B} \;\; \hat{\Gamma}]\, \hat{W} z_i \tag{6}$$

As with the residual plot for a linear regression equation, the predicted values in (5) and (6) were plotted against their counterpart residuals $\hat{v}(\hat{\theta}) = (I - \hat{\Lambda}\hat{W})z$ and $\hat{\zeta}(\hat{\theta}) = \hat{M}\hat{W}z$, respectively.
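A minimal sketch of the measurement-model quantities in (1) and (5) is given below. The weight-matrix formulas for the regression, Bartlett, and Anderson-Rubin methods are the commonly cited textbook forms and are assumptions here, since the source does not state them; the loadings, covariances, and function names are likewise hypothetical, and the latent-error residual in (2) is not covered.

```python
import numpy as np

def weight_matrix(Lam, Phi, Theta, method="regression"):
    """Factor-score weight matrix W (m x p) for the model z = Lam L + v.

    Lam: p x m loadings, Phi: m x m latent covariance, Theta: p x p error
    covariance. The formulas below are assumed textbook forms.
    """
    Sigma = Lam @ Phi @ Lam.T + Theta  # model-implied covariance of z
    Theta_inv = np.linalg.inv(Theta)
    if method == "regression":
        return Phi @ Lam.T @ np.linalg.inv(Sigma)
    if method == "bartlett":
        return np.linalg.inv(Lam.T @ Theta_inv @ Lam) @ Lam.T @ Theta_inv
    if method == "anderson-rubin":
        A2 = Lam.T @ Theta_inv @ Sigma @ Theta_inv @ Lam
        vals, vecs = np.linalg.eigh(A2)  # A2 is symmetric positive definite
        A_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T  # inverse symmetric root
        return A_inv @ Lam.T @ Theta_inv
    raise ValueError(f"unknown method: {method}")

def predicted_and_residual(Z, Lam, W):
    """Row-wise z_hat_i = Lam W z_i and v_hat_i = (I - Lam W) z_i for n x p Z."""
    P = Lam @ W          # p x p matrix Lam W
    Z_hat = Z @ P.T      # each row is (Lam W z_i)'
    V_hat = Z - Z_hat    # each row is ((I - Lam W) z_i)'
    return Z_hat, V_hat

# Hypothetical two-factor model with six indicators
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 1.0], [0.0, 0.9], [0.0, 0.6]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Theta = np.diag([0.5, 0.4, 0.6, 0.5, 0.3, 0.7])

rng = np.random.default_rng(1)
Sigma = Lam @ Phi @ Lam.T + Theta
Z = rng.multivariate_normal(np.zeros(6), Sigma, size=100)

W_reg = weight_matrix(Lam, Phi, Theta, "regression")
Z_hat, V_hat = predicted_and_residual(Z, Lam, W_reg)
```

In practice the matrices would be the maximum likelihood estimates $\hat{\Lambda}$ and $\hat{W}$ from a fitted model rather than fixed values as in this sketch.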

Now, take a general set of sample quantiles to be sorted as

$$\mu_{(1)} < \mu_{(2)} < \mu_{(3)} < \cdots < \mu_{(n-1)} < \mu_{(n)}.$$

The subscripts in parentheses indicate ordered data. The first ordered observation lies horizontally in the middle of $(0, 1/n)$, the next in the middle of $(1/n, 2/n)$, and the last in the middle of the interval $((n-1)/n, 1)$. Thus, we take as the theoretical quantile value

$$\xi_q = q = \frac{i - 0.5}{n} \tag{7}$$

where $q$ corresponds to the $i$th ordered sample value. The quantity 0.5 is subtracted so that each observation sits exactly in the middle of the interval $((i-1)/n,\; i/n)$.

Now the QQ plot can precisely be defined. First, we compute through simulation the n expected values of the data, which we pair with the n data points sorted in ascending order. For the uniform density, the QQ plot is composed of the n ordered pairs

$$\left(\frac{i - 0.5}{n},\; u_{(i)}\right), \quad \text{for } i = 1, 2, \ldots, n.$$

Deviations from the horizontal pattern allow possible outliers and/or controlling observations to be spotted.
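The construction of the ordered pairs can be sketched as follows. This assumes the plotting position $(i - 0.5)/n$ for the $i$th ordered value, as described above; the function name `uniform_qq_pairs` and the four input values are purely illustrative.

```python
def uniform_qq_pairs(values):
    """Pair each ordered value u_(i) with the plotting position (i - 0.5) / n."""
    u_sorted = sorted(values)
    n = len(u_sorted)
    return [((i - 0.5) / n, u) for i, u in enumerate(u_sorted, start=1)]

# Four hypothetical residuals
pairs = uniform_qq_pairs([0.9, -0.2, 0.4, 0.1])
print(pairs)
# [(0.125, -0.2), (0.375, 0.1), (0.625, 0.4), (0.875, 0.9)]
```

Each position falls at the midpoint of its interval, for example 0.125 is the midpoint of $(0, 1/4)$ when $n = 4$.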

Generally, it can be seen from the QQ plots below that, at any percentile, the observations lie within a uniform horizontal scale. Slight deviations from the horizontal scale show evidence of an outlier, whereas observations that depart farther from it indicate potential controlling observations. The QQ plots proposed in this study differ from other methods, and from one another, in how the residuals of the measurement errors are estimated (the Anderson-Rubin, Bartlett, or regression-based method) when detecting outliers and potential controlling observations in the SEM, using data simulated with the EM method.

It can be noticed from ^{th}). Again, in the second quartile (50^{th}) there was evidence of outliers, which are observations deemed to lie close (about 0.5 cm) to the horizontal plane, whereas the observation that lies farther away from the horizontal plane was identified as a controlling observation within the median. Also, there was evidence of both outliers and controlling observations in the third quartile (75^{th}). In the last quartile, some observations lie almost on the horizontal plane and others lie within the 1.5 cm distance; all of these were deemed to be outliers. A few observations lie outside the reference distance, farther away from the horizontal plane, and as such were deemed to be controlling observations.

From ^{th}). Meanwhile, the second quartile (50^{th}) showed evidence of outliers, which are observations deemed to lie close (about 1.5 cm) to the quantile horizontal plane, whereas the observations that lie farther away from the quantile horizontal plane represent the controlling observations within the median. Also, as can be seen from ^{th}). In the last quartile, some observations lie almost on the horizontal plane and others lie within the 0.5 cm distance; all of these were deemed to be outliers. A few observations lie outside the reference distance, farther away from the horizontal plane, and as such were deemed to be controlling observations.

It can be noticed from ^{th}). Also, the second quartile (50^{th}) showed evidence of two outliers which are observations deemed to lie close, about 0.5 cm, to the quantile horizontal plane whereas the two observations which lie farther away from the quantile horizontal plane were identified as controlling observations within the median. Also, there was evidence of about four outliers and two controlling observations in the third quartile (75^{th}).

The fitting indices, as indicated in

The paper applied a group of residual estimators to spot outliers and possible controlling observations via the uniform horizontal QQ plot. The study implemented the QQ plots in SEM using JMP software. The implementation experience supports [

Our results showed the presence of outliers and possible controlling observations, which affirms the assertion made by [

Also, the present study found the Anderson-Rubin technique to be the most efficient method of identifying outliers and possible controlling observations under SEM, which corroborates previous studies that utilized general techniques such as Mahalanobis and Cook’s distances [

Again, the present study provides a different perspective on spotting outliers and possible controlling observations through a uniform horizontal QQ plot approach, as was opined in earlier methodological works which provided accessible tools to identify outliers and possible controlling observations in SEM [

It is worth noting that, despite the significant contributions of the study, there were some limitations that call for further studies. To begin with, the current study focused only on situations where a small proportion of the data is partitioned into each quantile, given the moderate sample size used. Also, the data used in the QQ plot were found to be normal; further studies could therefore explore new ways of detecting outliers and controlling observations in non-normal data using the same or a similar concept of residual estimators. Corrections for non-normality, such as the Satorra-Bentler procedure, which relies on a sandwich estimator and higher-order moments of the sample data, could be adopted, as data used under the SEM concept often have skewness and kurtosis that deviate from those of a normal distribution [

It can be deduced from the results of the various simulations of QQ plots that all these methods demonstrate the ability to detect outliers and potential controlling observations in an SEM framework. It is worth noting that the Anderson-Rubin method of QQ plot provided a more efficient visual display for detecting outliers and potential controlling observations than the other classes of residual estimators. This, therefore, provides an efficient way of extending Cook’s method of detecting outliers and controlling observations with the QQ plot under the SEM framework.

The authors declare no conflicts of interest regarding the publication of this paper.

Abdul-Aziz, A.R., Luguterah, A. and Saeed, B.I.I (2020) Using Residual Estimators to Detect Outliers and Potential Controlling Observations in Structural Equation Modelling: QQ Plot Approach. Open Journal of Statistics, 10, 905-914. https://doi.org/10.4236/ojs.2020.105053