On the Covariance of Regression Coefficients

In many applications, such as multivariate meta-analysis or the construction of multivariate models from summary statistics, the covariance of regression coefficients needs to be calculated without access to individual patient data. In this work, we derive an alternative analytic expression for the covariance matrix of the regression coefficients in a multiple linear regression model. In contrast to the well-known expressions, which use the cross-product matrix and hence require access to individual data, we express the covariance matrix of the regression coefficients directly in terms of the covariance matrix of the explanatory variables. In particular, we show that the covariance matrix of the regression coefficients can be calculated using the matrix of partial correlation coefficients of the explanatory variables, which in turn can be calculated easily from their correlation matrix. This matters in practice because the covariance matrix of the explanatory variables can be easily obtained or imputed using data from the literature, without requiring access to individual data. Two important applications of the method are discussed: the multivariate meta-analysis of regression coefficients and the so-called synthesis analysis, the aim of which is to combine information from different variables in a single predictive model. The estimator proposed in this work can increase the usefulness of these methods and provide better results, as shown by an application to a publicly available dataset. Source code is provided in the Appendix and at http://www.compgen.org/tools/regression.
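The idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' code from the Appendix; it relies only on standard ordinary least squares identities, which may differ in form from the expressions derived in the paper. The function names `coef_cov_from_summary` and `partial_correlations`, the simulated data, and the assumption that the residual variance `sigma2` and sample size `n` are available from published reports are all illustrative choices. For a model with an intercept, the covariance of the slope estimates equals `sigma2 / (n - 1)` times the inverse of the sample covariance matrix of the explanatory variables, and the correlation between two slope estimates equals minus the partial correlation of the corresponding explanatory variables given the rest.

```python
import numpy as np


def coef_cov_from_summary(S_xx, sigma2, n):
    """Covariance of OLS slope estimates (model with intercept),
    using only the sample covariance matrix of the predictors (ddof=1),
    the residual variance and the sample size."""
    S_xx = np.asarray(S_xx, dtype=float)
    return sigma2 / (n - 1) * np.linalg.inv(S_xx)


def partial_correlations(R):
    """Partial correlation of each pair of variables given all the others,
    computed from the inverse of the correlation matrix R."""
    P = np.linalg.inv(np.asarray(R, dtype=float))
    d = np.sqrt(np.diag(P))
    pc = -P / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


# Simulated individual-level data, used only to check the summary-based result.
rng = np.random.default_rng(0)
n = 500
true_cov = [[1.0, 0.5, 0.2], [0.5, 2.0, 0.3], [0.2, 0.3, 1.5]]
X = rng.multivariate_normal([0.0, 0.0, 0.0], true_cov, size=n)
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.7, size=n)

# Classical route: needs the full design matrix (cross-product matrix).
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta
sigma2_hat = resid @ resid / (n - Xd.shape[1])
cov_slopes_direct = (sigma2_hat * np.linalg.inv(Xd.T @ Xd))[1:, 1:]

# Summary-statistics route: needs only S_xx, sigma2 and n.
S_xx = np.cov(X, rowvar=False)
cov_slopes_summary = coef_cov_from_summary(S_xx, sigma2_hat, n)

# Correlation of the slope estimates versus partial correlations of predictors.
sd = np.sqrt(np.diag(cov_slopes_summary))
corr_slopes = cov_slopes_summary / np.outer(sd, sd)
pcorr = partial_correlations(np.corrcoef(X, rowvar=False))
off_diag = ~np.eye(3, dtype=bool)

print(np.allclose(cov_slopes_direct, cov_slopes_summary))
print(np.allclose(corr_slopes[off_diag], -pcorr[off_diag]))
```

In a meta-analytic setting the simulated `X` would not be available; `S_xx` (or the correlation matrix together with the standard deviations) would instead be taken or imputed from the literature, which is the use case the abstract describes.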

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

Bagos, P. and Adam, M. (2015) On the Covariance of Regression Coefficients. Open Journal of Statistics, 5, 680-701. doi: 10.4236/ojs.2015.57069.
