The Body Mass Index (BMI) and TV Viewing in a Co-Integration Framework

Many techniques have been used in the literature to investigate the effect of TV viewing hours on BMI. However, we have not found any empirical study that applies co-integration analysis as it is applied here. With this in mind, we present in this paper a methodology based on co-integration analysis for a detailed justification of the effect on BMI of TV viewing hours, together with some minor changes in the lifestyle of participants. Apart from finding and testing an acceptable co-integrating relation, we further formulated an error correction model to determine the coefficient of adjustment. All findings are fully justified and presented in detail in the relevant sections.


Introduction
BMI has received particular attention in many research works (see for instance Ahn et al., 2011; Bridger et al., 2011; Browning et al., 2011; Eek et al., 2009; Farkas et al., 2005; Harbin et al., 2006; Komlos et al., 2006; Lee et al., 2008; Manios et al., 2004; Stea et al., 2008; Stenhammar et al., 2010, among others). In most cases, the main objective was the effect of various factors on BMI. In addition, many papers (see for instance Danner et al., 2008; Davison et al., 2006; Henderson, 2007; Lazarou et al., 2009, among others) investigate the effect of TV viewing hours on BMI. In almost all cases, cross-sectional data are used and standard statistical methods are applied. Apart from presenting the results obtained from commercial computer packages (usually SPSS), no attempt is made to analyze the mathematics involved, even for a plain justification of the results. The computer output seems to play the role of the "follow me" car at airports, driving the authors in a specific direction without any explanation of whether, and why, this direction is the right one. Instead, many redundant descriptive details are cited, and no specific, well-formulated model is presented, although the term "model" is used extensively. Regardless of the robustness of these methods, we have not found in the relevant literature an empirical study that applies co-integration analysis as it is applied here. With this in mind, we attempted to jump to the other side of the fence, using time series data and applying co-integration analysis to investigate analytically the effect of TV viewing hours on BMI. This analysis requires that the series used be integrated of the same order, so that we can obtain a stationary linear combination of them (which can be a static regression); the existence of such a combination is the necessary and sufficient condition to conclude that the series are co-integrated and that we are not facing a spurious regression. Further, we formulated an error correction model to determine the coefficient of adjustment.

Data Used
Initially, 50 volunteers from the greater region of Thessaloniki, Greece, agreed to report regularly (twice per month) by e-mail their weight and height, together with any major remarks, if necessary. In the end, we obtained a continuous stream of such data from only 32 individuals (40% males and 60% females) for almost 3 years, that is, 72 reports (2 per month) from each participant¹. At the very beginning, the ages of the participants ranged from 17 to 21 years. As mentioned above, we received individual reports every 15th day (i.e. two reports per month). Each time, we calculated the BMI of each participant and then computed² the mean BMI over the 32 cases. The same calculations were made for the mean hours of TV viewing during each 15-day period. These 72 mean values form the series $\{BMI_i\}$ and $\{TVH_i\}$ ($i = 1, 2, \ldots, 72$), which are presented in the appendix. It should be noted, however, that most participants stated that from the 62nd observation (i.e. from the middle of the 31st month) onwards, they increased their sitting hours for various reasons (less physical exercise, working from home, etc.). To capture this structural change in the participants' lifestyle, we introduced a dummy (binary) variable $d_i$, which takes the value 1 for the last 11 observations (from 62 up to 72) and the value 0 elsewhere.

We denote by $Y$ the natural logs of BMI [i.e. $Y_i = \ln(BMI_i)$] and by $X$ the natural logs of TV hours [i.e. $X_i = \ln(TVH_i)$, $i = 1, 2, \ldots, 72$]. It should be mentioned at this point that an analogous formulation with group means can be found in Johnston (1984: p. 406). Further, according to Green (2008: p. 189), the loss of information that may occur through using group means, as we have done here, might be relatively small. Also, the estimators obtained in such cases are consistent if the number of periods (T) is large enough, which holds for our data.
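The construction of the series can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the raw reports are stored in hypothetical arrays of shape (72, 32), one row per 15-day report and one column per participant, and the helper names `bmi` and `build_series` as well as the simulated data are illustrative only.

```python
import numpy as np

def bmi(weight_kg, height_m):
    """BMI = weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2

def build_series(weights, heights, tv_hours, break_obs=62):
    """Return Y = ln(mean BMI), X = ln(mean TV hours) and the dummy d.

    All inputs have shape (T, N): T reports, N participants (hypothetical layout).
    d equals 1 from the (1-based) observation break_obs onwards, 0 before it.
    """
    mean_bmi = bmi(weights, heights).mean(axis=1)   # group mean over participants
    mean_tvh = tv_hours.mean(axis=1)
    i = np.arange(1, len(mean_bmi) + 1)             # observation index 1..T
    d = (i >= break_obs).astype(float)
    return np.log(mean_bmi), np.log(mean_tvh), d

# Illustration with simulated (not real) reports: 72 periods, 32 participants
rng = np.random.default_rng(0)
weights = rng.uniform(50, 90, size=(72, 32))
heights = rng.uniform(1.55, 1.90, size=(72, 32))
tv_hours = rng.uniform(10, 60, size=(72, 32))
Y, X, d = build_series(weights, heights, tv_hours)
print(Y.shape, int(d.sum()))   # (72,) 11
```

Averaging first and taking logs afterwards follows the order described in the text (mean BMI, then $Y_i = \ln(BMI_i)$).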

The Long-Run Relationship. The Concept of Co-Integration
Applying the Dickey-Fuller (DF/ADF) tests (Dickey & Fuller, 1979, 1981; Harris, 1995: pp. 28-47) to the series $\{Y_i\}$ and $\{X_i\}$, we found that neither series is stationary, but both are integrated of order 1, that is, I(1). This implies that their first differences are stationary, that is, I(0). Additionally, a fairly new and much simpler test proposed by Lazaridis (2008) has also been applied to detect stationarity, as can easily be verified from Figure 1.
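A Dickey-Fuller regression of the kind used here can be sketched with plain NumPy. This is a minimal sketch, not the software the authors used: `adf_tstat` is a hypothetical helper returning the t-ratio of ρ, which must be compared with Dickey-Fuller/MacKinnon critical values rather than Student-t ones (about −2.90 at the 5% level for a regression with a constant and T = 72). Setting `constant=False` gives the no-constant form appropriate for OLS residuals.

```python
import numpy as np

def adf_tstat(y, lags=0, constant=True):
    """t-ratio of rho in the (augmented) Dickey-Fuller regression
        dy_t = [a +] rho * y_{t-1} + sum_{j=1..lags} c_j * dy_{t-j} + e_t
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]                          # dy_t for which all lags exist
    cols = [y[lags:-1]]                         # lagged level y_{t-1}
    for j in range(1, lags + 1):
        cols.append(dy[lags - j:len(dy) - j])   # lagged difference dy_{t-j}
    if constant:
        cols.append(np.ones_like(target))
    Z = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ beta
    sigma2 = resid @ resid / (len(target) - Z.shape[1])
    se_rho = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[0, 0])
    return beta[0] / se_rho

# Simulated random walk (I(1)) and its first difference (I(0))
rng = np.random.default_rng(1)
level = np.cumsum(rng.normal(size=72))
print(adf_tstat(level), adf_tstat(np.diff(level)))
```

For a random walk the first statistic will typically stay above −2.90 (unit root not rejected), while the statistic for the differenced series falls far below it, which is the I(1) pattern described above.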
Recall that if two series are non-stationary but share common trends, so that there exists a linear combination of them that is stationary, they are integrated in a similar way and are therefore called co-integrated, in the sense that one or the other of these variables will tend to adjust so as to restore a long-run equilibrium. As we will see in what follows, this is the case for the two series $\{Y_i\}$ and $\{X_i\}$ considered here.
The next step of this procedure is to specify and estimate a long-run linear relationship for the I(1) series, which in this particular case has the form

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 d_i + u_i \tag{1}$$

which is a static linear model.
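The OLS (Ordinary Least Squares) estimates from fitting (1) to the data presented in the appendix are as follows:

$$\hat Y_i = 2.375191 + 0.222699\,X_i + 0.04106\,d_i \tag{1a}$$

with a standard error of estimate $s = 0.012$ and $F = 2853.8$.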
Hansen (1992) statistics reveal that the model coefficients, individually considered, are stable at all conventional levels of significance (α = 0.01, 0.05, 0.10). The low value of the condition number (CN) indicates that no multicollinearity problems exist (Lazaridis, 2007). Finally, the p-values show that all coefficients are significant. This implies that hours of TV viewing are (positively) associated with BMI, which is in line with the findings of Danner (2008: p. 1101). The same applies to the change in the participants' lifestyle from the 62nd observation onwards. Also, the joint effect of both variables ($X$ and $d$) is significant, according to the large value (2853.8) of the F statistic. It may be worth mentioning at this point that Danner (2008), using pooled data, estimates a model in which the only explanatory variable is a categorical one [time and (time)², p. 1103] taking arbitrary values, so one may argue that the correct specification of the model is questionable. Besides, no model testing results are presented and no explanation is provided of what the subscript i refers to. Next (same page), the author presents the estimated results of two models (fixed effects and random effects) without even a brief explanation of what a random effects model is, what estimation procedure is applied in this case, and how the random effects are computed, if they are needed. Above all, there is no proper testing (see Hausman, 1978) to justify whether the fixed effects or the random effects model is more suitable for the data used. In other cases (see for instance Lazarou et al., 2009), one reads the term "regression models" (p. 71), but no such model is explicitly presented in the typical form, such as the one seen in (1a). Henderson (2007) speaks about "modeling latent growth curves" (p. 547), without any sufficient analytical explanation of what this is and whether any conventional model tests have been applied. In other cases, misleading statistical measures are presented, as in Browning et al. (2011: p. 1383), where the plain arithmetic mean (30.6 kg) of a sample whose weights range from 2.5 to 117.8 kg is reported. In such cases with a large range, a weighted mean, with weights related in some way to age, or at least the median (even if classification of the data is necessary), may be a more representative measure of central tendency.
Finally, the small value (0.012) of the standard error of estimate (s) in (1a) suggests that uncertainty is rather limited when forecasting the values of the dependent variable (since $Y$ is in logs, this corresponds to roughly 1.2% of BMI). Note that in such a case a measure of uncertainty is the width of the relevant confidence interval.
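The statistics quoted for (1a) (coefficients, standard errors, s, F and the adjusted R²) follow from the standard OLS formulas. The sketch below uses a hypothetical `ols` helper and a simulated series built around the reported point estimates; it does not reproduce the appendix data, which are not shown here.

```python
import numpy as np

def ols(y, X):
    """OLS of y on a constant and the columns of X."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    dof = n - k - 1
    s = np.sqrt(resid @ resid / dof)                   # standard error of estimate
    se = s * np.sqrt(np.diag(np.linalg.inv(Z.T @ Z)))  # coefficient standard errors
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / dof
    F = (r2 / k) / ((1 - r2) / dof)                    # joint significance of X
    return beta, se, s, F, adj_r2, resid

# Hypothetical series: log TV hours, the lifestyle dummy and a small noise term
i = np.arange(1, 73)
d = (i >= 62).astype(float)
X = np.log(20 + 0.3 * i)
Y = 2.375191 + 0.222699 * X + 0.04106 * d + 0.01 * np.sin(i)
beta, se, s, F, adj_r2, u_hat = ols(Y, np.column_stack([X, d]))
print(beta, s, F)
```

Because the regression has an intercept, the residuals `u_hat` sum to zero, which is why the later residual regression is run without a constant.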
According to Granger and Newbold (1974), when a static regression between two independent random walks is estimated [as in (1a)], high t-values and a high $\bar R^2$ (the adjusted coefficient of determination), together with a low Durbin-Watson d statistic, indicate a spurious regression, with undesirable properties. However, a static regression between non-stationary variables is not spurious if the variables share common trends, for instance if they are integrated of order one and the OLS residuals are stationary. The series [here $\{Y_i\}$ and $\{X_i\}$] are then integrated in a similar way and are therefore called co-integrated. This implies that, although the series under consideration are non-stationary, there exists a stationary linear combination of them.
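The Granger-Newbold symptoms can be reproduced by simulation: regressing one random walk on another, independent one typically yields a sizable R² together with a very low Durbin-Watson d. The sketch is illustrative; `durbin_watson` and `spurious_demo` are hypothetical helper names.

```python
import numpy as np

def durbin_watson(resid):
    """DW d = sum (e_t - e_{t-1})^2 / sum e_t^2.
    Values near 2 suggest no first-order autocorrelation; values near 0
    indicate strong positive autocorrelation."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def spurious_demo(n=72, seed=0):
    """Regress one random walk on an independent one; return (R^2, DW)."""
    rng = np.random.default_rng(seed)
    a = np.cumsum(rng.normal(size=n))
    b = np.cumsum(rng.normal(size=n))
    Z = np.column_stack([np.ones(n), b])
    beta, *_ = np.linalg.lstsq(Z, a, rcond=None)
    resid = a - Z @ beta
    r2 = 1 - resid @ resid / np.sum((a - a.mean()) ** 2)
    return r2, durbin_watson(resid)

print(spurious_demo())
```

The residuals of such a regression are themselves non-stationary, so DW stays close to 0; for a co-integrating regression the residuals are stationary instead.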
To show that $\{Y_i\}$ and $\{X_i\}$ are co-integrated, we compute the OLS residuals $\{\hat u_i\}$ from:

$$\hat u_i = Y_i - 2.375191 - 0.222699\,X_i - 0.04106\,d_i \tag{2}$$

Then we run the following regression without a constant term, since the OLS residuals have zero mean ($\sum_i \hat u_i = 0$):

$$\Delta \hat u_i = \gamma\,\hat u_{i-1} + \sum_{j=1}^{q} \delta_j\,\Delta \hat u_{i-j} + \varepsilon_i \tag{3}$$

The value of $q$ is set so that, after excluding the terms with insignificant coefficients, the noises $\varepsilon_i$ are white, as explained in detail later on. With $q = 0$ the estimated equation is (3a); with $q = 1$ and $q = 2$ we obtain (3b) and (3c), respectively.
(Recall that a value of the Durbin-Watson d statistic close to 2 indicates that there is no first-order autocorrelation.)
Although none of these models shows any autocorrelation or heteroscedasticity problem, we kept model (3a), since in the other two models, and for higher values of $q$, the additional explanatory variables are insignificant, as verified by the corresponding p-values. Hence the t-statistic (−6.35) of the estimated coefficient of $\hat u_{i-1}$ is used next. Since the series $\{\hat u_i\}$ is the result of specific calculations (in the simplest case it consists of OLS residuals), it is not advisable (Harris, 1995: pp. 54-55) to apply the Dickey-Fuller (DF/ADF) test with its standard critical values, as we do with any variable in the initial data set. Instead, we have to compute the critical value $t_u$ from MacKinnon (1991: pp. 267-276) (see also Harris, 1995: Table A6, p. 158). This statistic, $t_u = \Phi_\infty + \Phi_1/T + \Phi_2/T^2$, where $T$ denotes the sample size (here $T = 72$), takes the following values for the static relationship in (1a):

α       0.01     0.05     0.10
t_u    −3.52    −2.90    −2.59

Since $t = -6.35 < t_u$ at all three levels, $\{\hat u_i\}$ is stationary for α = 0.01, 0.05 and 0.10. Recall that the null $H_0\!: \hat u_i \sim I(1)$ is rejected in favor of $H_1\!: \hat u_i \sim I(0)$ if $t < t_u$. Hence (1) is a co-integrating relationship and can be considered a long-run equilibrium relationship, and the estimation results in (1a) are quite satisfactory.
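The critical values in the table follow from MacKinnon's response surface $t_u = \Phi_\infty + \Phi_1/T + \Phi_2/T^2$. In the sketch below the coefficients are assumed to be MacKinnon's (1991) row for N = 1 with a constant and no trend; the basis for this assumption is that they reproduce exactly −3.52, −2.90 and −2.59 for T = 72. MacKinnon also tabulates rows for N > 1 (more variables in the co-integrating regression), which give more negative critical values; `critical_value` and `MACKINNON_N1_CONST` are hypothetical names, and a different row can be passed through the `table` argument.

```python
# MacKinnon (1991) response-surface coefficients (phi_inf, phi_1, phi_2).
# Assumed row: N = 1, constant, no trend (reproduces -3.52, -2.90, -2.59 at T = 72).
MACKINNON_N1_CONST = {
    0.01: (-3.4336, -5.999, -29.25),
    0.05: (-2.8621, -2.738, -8.36),
    0.10: (-2.5671, -1.438, -4.48),
}

def critical_value(alpha, T, table=MACKINNON_N1_CONST):
    """t_u = phi_inf + phi_1 / T + phi_2 / T**2."""
    phi_inf, phi_1, phi_2 = table[alpha]
    return phi_inf + phi_1 / T + phi_2 / T ** 2

T = 72
t_stat = -6.35   # t-ratio of the coefficient of u_hat_{i-1} in (3a)
for alpha in (0.01, 0.05, 0.10):
    t_u = critical_value(alpha, T)
    decision = "reject H0: u ~ I(1)" if t_stat < t_u else "do not reject H0"
    print(f"alpha={alpha:.2f}  t_u={t_u:.2f}  {decision}")
```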
A comparatively simple way (Lazaridis, 2008) to verify that the noises $\varepsilon_i$ in (3) with $q = 0$ show no higher-order autocorrelation is to compute the residuals $\hat e_i$ from (3a) and examine the corresponding Ljung-Box Q statistics, particularly their p-values, which should be greater than 0.1 to conclude that no autocorrelation is present. For this particular case, the Q statistics (column 4) and their p-values (column 5) are presented in Table 1. For all lags k (column 1) the p-values are greater than 0.1, so there is no need to increase $q$, as in Equations (3b) and (3c), to deal with autocorrelation. As far as heteroscedasticity is concerned, a practical way to detect it is to find the explanatory variable that yields the smallest p-value for the corresponding Spearman's correlation coefficient $r_s$ with the residuals $\hat e_i$, or the largest $Z^*$ statistic in absolute value. Since (3a) has only one explanatory variable, we found $r_s = 0.075$, $p = 0.48$ and $Z^* = 0.63$. From the p-value we accept the null of homoscedastic disturbances. For larger samples the $Z^*$ statistic is used, computed from

$$Z^* = r_s\sqrt{n-1}$$

which here, with $n = 71$ observations, gives $0.075\sqrt{70} \approx 0.63$. From the table of the standard normal distribution, the two-sided critical values are 1.645 (α = 0.10), 1.96 (α = 0.05) and 2.576 (α = 0.01). Since $|Z^*| = 0.63$ is below all of them, we accept the null at all conventional levels of significance.
The same results are obtained when models (3b) and (3c) are considered; as explained above, however, these models were dropped.
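Both residual diagnostics can be sketched with NumPy alone: the Ljung-Box statistic $Q(h) = n(n+2)\sum_{k=1}^{h}\hat\rho_k^2/(n-k)$, to be compared with a chi-square distribution with h degrees of freedom (its p-value would need a library such as SciPy), and the large-sample Spearman statistic $Z^* = r_s\sqrt{n-1}$. Two assumptions, labeled here and in the code: ties in the ranks are ignored, and $r_s$ is computed between the absolute residuals and the explanatory variable, as in the usual Spearman rank-correlation test for heteroscedasticity; the authors' exact variant is not stated. The helper names are hypothetical.

```python
import numpy as np

def ljung_box_q(e, h):
    """Ljung-Box Q(h) = n(n+2) * sum_{k=1..h} rho_k^2 / (n-k)."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    n = len(e)
    denom = e @ e
    q = 0.0
    for k in range(1, h + 1):
        rho_k = (e[k:] @ e[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

def ranks(v):
    """Ranks 1..n; assumes no ties (a simplification for this sketch)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r

def spearman_z(resid, x):
    """Spearman r_s between |resid| (assumed variant) and x, plus Z* = r_s*sqrt(n-1)."""
    a, b = ranks(np.abs(resid)), ranks(x)
    r_s = np.corrcoef(a, b)[0, 1]          # Pearson correlation of the ranks
    return r_s, r_s * np.sqrt(len(x) - 1)

# The reported values are consistent with Z* = r_s * sqrt(n - 1):
print(round(0.075 * np.sqrt(71 - 1), 2))   # 0.63, model (3a), n = 71
print(round(0.16 * np.sqrt(70 - 1), 3))    # 1.329, the ECM (4), n = 70
```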
Having in mind some confusing remarks on this point, we underline once more that (1), which is a static regression, is a co-integrating relation if and only if the residuals $\hat u_i$ computed from (1a), i.e. from (2), are stationary. From the estimated co-integrating regression (1a), we can tell that the dummy $d_i$, used as a proxy to capture the change in the lifestyle of the participants, had a significant effect on BMI. We also see that the elasticity of BMI with respect to TV viewing hours is 0.222699. It is worth mentioning here that the ratio of the sample standard deviations of $Y_i$ and $X_i$ is 0.107777/0.433222 ≈ 0.249, to which the estimated long-run response of 0.222699 in (1a) is reasonably close (for a regression with a single regressor, the OLS slope equals $r \cdot s_Y/s_X$, so with a correlation close to 1 the slope should be close to this ratio); this is a further indication that the co-integration analysis has resulted in an acceptable model of the relation between BMI and TV viewing hours. Thus a 10% increase in TV viewing hours may result in an increase of about 2.2% in BMI.
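The percentages in this paragraph follow from the log-log form of (1a): a 10% rise in TV hours multiplies BMI by $1.10^{0.222699}$. The short check below is plain arithmetic on the reported numbers (nothing is estimated); it shows the exact effect, the linear approximation behind "about 2.2%", and the standard-deviation ratio.

```python
beta_x = 0.222699                 # long-run elasticity of BMI w.r.t. TV hours, from (1a)
sd_y, sd_x = 0.107777, 0.433222   # sample standard deviations of Y and X

pct_exact = 100 * (1.10 ** beta_x - 1)   # exact % change in BMI for +10% TV hours
pct_approx = 10 * beta_x                 # first-order approximation: elasticity x 10%
ratio = sd_y / sd_x
print(round(pct_exact, 2), round(pct_approx, 2), round(ratio, 3))  # 2.15 2.23 0.249
```

Both figures round to roughly 2.2%, so the approximation used in the text is adequate at this scale.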

The Short-Run Relationship
The lagged values of the residuals computed from the static long-run equation (i.e. $\hat u_{i-1}$) serve as an error correction mechanism in a short-run dynamic relationship, in which the explanatory variables may appear in first differences and lagged first differences. All variables in this equation, also known as an error correction model (ECM), are stationary, so from the statistical point of view it is a standard single-equation model to which all the classical tests apply. The lag structure should be chosen so as to eliminate autocorrelation and, at the same time, to obtain a significant adjustment coefficient, i.e. the coefficient of $\hat u_{i-1}$. With this in mind, the short-run ECM corresponding to (1a) is estimated as Equation (4). According to Hansen statistics, all coefficients, individually considered, seem to be stable for α = 0.01 and 0.05. The condition number reveals no multicollinearity problem. The absence of higher-order autocorrelation can be verified from a table analogous to Table 1 for the residuals of Equation (4), where all the p-values of the Ljung-Box Q statistics are greater than 0.1. In models like (4), the term $\hat u_{i-1}$ usually produces the smallest p-value for the t-statistic of the corresponding Spearman's correlation coefficient $r_s$, or the largest $Z^*$ statistic in absolute value. In this case we have $r_s = 0.16$, $t = 1.336$, $p = 0.156$ and $Z^* = 1.329$, which means that there are no heteroscedasticity problems. Finally, the RESET test (Ramsey, 1969) shows no specification error.
Before interpreting the estimation results, a further comment should be made about the sign of the coefficient of adjustment (−0.2858). Recall that the residuals have been computed from (2), that is, $\hat u_i = Y_i - 2.375191 - 0.222699\,X_i - 0.04106\,d_i$. However, if we compute these residuals from

$$\hat u_i = 2.375191 + 0.222699\,X_i + 0.04106\,d_i - Y_i \tag{5}$$

then Equation (4) takes the form (6), in which the coefficient of adjustment becomes +0.2858 while all other coefficients are unchanged. In other words, the sign of the coefficient of adjustment depends on the relation, (2) or (5), used to compute the residuals entered in the ECM with a one-period lag. In either case, this coefficient is significant at α = 0.05. From Equations (4) and (6) we see that the adjustment coefficient (0.2858 in absolute value) implies a satisfactory speed of convergence of BMI towards long-run equilibrium: about 28.6% of the previous period's deviation from equilibrium is corrected in each 15-day period. Besides, the significance of the coefficient of the disequilibrium error (i.e. the coefficient of adjustment) indicates that in the long run there is a causality effect from TV viewing hours to BMI. Also, the coefficient of $\Delta X_i$ is significant, so we may conclude that changes in TV viewing hours influence BMI: in the short run, a 1% change in TV viewing hours results in an increase of BMI of about 0.124% from one period to the next.
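The sign argument can be checked numerically. The sketch below is hypothetical throughout: it generates noise-free data from an ECM that contains only $\Delta X_i$ and the lagged residual (the full specification of (4) is not reproduced here), uses the true long-run parameters rather than estimated ones, and then refits the ECM with residuals defined as in (2) and as in (5). `fit_ecm` is an illustrative helper name.

```python
import numpy as np

def fit_ecm(y, x, u):
    """OLS of dY_i on a constant, dX_i and u_{i-1}.

    Returns [constant, short-run coefficient of dX, adjustment coefficient].
    One lag of u and no lagged differences: a simplifying assumption.
    """
    dy, dx = np.diff(y), np.diff(x)
    Z = np.column_stack([np.ones_like(dy), dx, u[:-1]])
    beta, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    return beta

# Hypothetical noise-free data generated from an ECM with known coefficients
c, b, gamma, alpha = 2.375191, 0.222699, 0.124, -0.2858
rng = np.random.default_rng(2)
x = 3.0 + np.cumsum(rng.normal(0.0, 0.1, size=72))
y = np.empty(72)
y[0] = c + b * x[0]
for i in range(1, 72):
    u_prev = y[i - 1] - c - b * x[i - 1]            # residual as in (2)
    y[i] = y[i - 1] + gamma * (x[i] - x[i - 1]) + alpha * u_prev

u2 = y - c - b * x          # residuals defined as in (2)
u5 = -u2                    # residuals defined as in (5)
print(fit_ecm(y, x, u2))    # adjustment coefficient about -0.2858
print(fit_ecm(y, x, u5))    # same fit, adjustment coefficient about +0.2858
```

Flipping the sign of one regressor flips only its own coefficient, so the short-run coefficient of $\Delta X_i$ is identical in both fits.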
Another point that deserves attention is the condition number reported here, which is not reported in other studies (see for instance Nguyen, 1987; Ouyang et al., 2000, among others). In many applications, particularly when the variables are in logs, the CN of the corresponding statistical models is extremely high, in most cases suggesting a severe multicollinearity problem, and this seems to be the main reason this statistic is not reported in relevant applications. However, Lazaridis (2008) has shown that in such cases the multicollinearity is usually spurious.
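The value of a condition number depends on its definition and on the scaling of the columns. As an assumption (the definition used in Lazaridis, 2007, 2008, is not reproduced here), the sketch takes the ratio of the largest to the smallest singular value of the design matrix, optionally after scaling each column to unit length as in Belsley-style diagnostics. The toy check shows that a pure difference in column scale can inflate the raw value on its own; `condition_number` is a hypothetical helper.

```python
import numpy as np

def condition_number(X, scale=True):
    """Largest / smallest singular value of the design matrix X.

    scale=True first scales every column to unit Euclidean length (a
    Belsley-style convention, assumed here; other CN definitions exist).
    """
    X = np.asarray(X, dtype=float)
    if scale:
        X = X / np.linalg.norm(X, axis=0)
    sv = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    return sv[0] / sv[-1]

# Hypothetical design of (1): constant, log TV hours, lifestyle dummy
i = np.arange(1, 73)
design = np.column_stack([np.ones(72), np.log(20 + 0.3 * i), (i >= 62).astype(float)])
print(condition_number(design, scale=False), condition_number(design))
```

Two orthogonal columns that differ only in scale, such as diag(1, 100), give a raw CN of 100 but a scaled CN of 1.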

Conclusion
We applied a detailed co-integration analysis to investigate the effect of TV viewing hours on BMI. For this particular case, we found that the elasticity of BMI with respect to TV viewing hours is about 0.223, which implies that if these hours increase by 10%, then the expected increase in BMI will be about 2.2%. It has been verified that TV viewing hours influence BMI, together with the lifestyle change of the participants captured by the dummy variable $d_i$. It has also been verified that changes in TV viewing hours influence BMI in the short run: a 1% change in the hours of watching TV may result in a change of BMI of about 0.124% from one period to the next. This small short-run effect does not mean that TV viewing hours are ineffective in the long run, since the co-integration analysis shows that the adjustment coefficient is significant and about 0.28 in absolute value, indicating a satisfactory speed of BMI convergence towards long-run equilibrium. Ceteris paribus, BMI is expected to undergo significant long-run effects from TV viewing hours.
It should be emphasized, however, that this analysis is based on self-reported data. Also, the sample size (32 cases) used to compute the relevant means may be considered insufficient. This gives rise to further research, increasing the number of participants and, above all, using verified data. Additionally, if the number of participants could be considerably increased, separate estimation results could be obtained for males and females, to better trace possible similarities and differences. An effort should also be made to capture the change in the participants' lifestyle through more formal indices instead of the plain binary variable used here. Finally, if the number of periods (T) can be further increased, more detailed and robust results may be reached regarding the effect of TV viewing hours and changes of lifestyle on BMI, separately for males and females. In fact, in this research work we did not pay particular attention to data collection, since the main aim of the study was to present analytically a new approach to similar tasks, which has not been encountered in the relevant literature.
Apart from the formal test shown above, the stationarity of the residuals $\hat u_i$ can also be seen in Figure 2.
¹ The last report was received at the beginning of 2012.
² Recall that BMI is computed as BMI = weight (kg) / [height (m)]².