Applying Multivariate Multilevel Models to Explore Arable Land Quality in Sub-Saharan Africa: A Case Study in Kenya

Controversy exists on the magnitude and variability of farm nutrient balances and quality of arable land in sub-Saharan Africa with Kenya among those affected negatively. This study investigates quality of arable land by fitting multivariate multilevel model to farm nutrient balance data collected from five agro-climatic zones of Kenya (arable lands). Objectives of the study were to investigate the magnitude and variability of Nitrogen, Phosphorus and Potassium (NPK) farm nutrient balances in arable lands of Kenya, study effects of agro-climatic zones on nutrient balances and to determine effects of household resource endowments on NPK nutrient balances. The study concludes that agro-climatic zones differ with respect to farm nutrient balances; that livestock resource endowments and hired labour have positive effects on the magnitude and direction of farm nutrient balances; and that household ownership of large capital resources do not guarantee a positive effect on farm nutrient balances. The study recommends integration of sound livestock practices and application of agro-climatic zone differentiated interventions in future strategies for addressing farm nutrient balances and arable land quality, and the use of large sample sizes and relevant factors/covariates in future analysis to shed additional insights on farm nutrient balances and on how arable land quality can be improved.


Introduction
Seminal studies conducted at national and regional levels in sub-Saharan Africa, How to cite this paper: Onduru, D.D. and Onyango, F. (2017) Applying Multivariate Multilevel Models to Explore Arable Land Quality in Sub-Saharan Africa: A Case using nutrient balance approach, have indicated declining arable land quality with severe net nutrient losses of the order of 10 kg Nitrogen, 4 kg phosphates and 10 kg potash per hectare annually [1] with Kenya being one of the countries with net nutrient losses [1].Empirical roots of nutrient balance studies are widely acknowledged [2].However, opinions are divided on the extent and intensity of nutrient mining and variability; whether farmers' achievements contradict nutrient depletion scenarios [3]; whether levels of nutrient mining differ by agroecological zones and land use systems; whether underlying factors exist to explain direction and magnitude of nutrient balances [4]; and how nutrient balances can be scaled-up.
One of the reasons why consensual accounts on nutrient balances remain intractable and illusive and at times anecdotal [5] is the limited use of rigorous statistical techniques to i) handle dependency in data and to reduce biases associated with variance estimates and inflation of Type I error [6] [7], ii) to quantify between study variability [8] and iii) to handle multiple outcomes/effect sizes simultaneously [9].Nutrient balance studies are inherently associated with myriad challenges: inadequate systematic replication in space or in time [10], dependencies in multiple outcomes, multicollinearity in independent variables, non-homogeneity in data, and missing values, and inadequate application of statistical techniques that can deal with nested or clustered data associated with such studies that use complex survey designs [11] [10].This study explores application of multivariate multilevel models in a meta-analysis of nutrient balances and thereby contributes to addressing the above challenges and controversies.Application of statistical procedures for meta-analysis was previously a domain of the health Sector but has recently been adopted in other disciplines [12].Meta-analysis statistical techniques have a potential to address challenges and controversies by pooling and analyzing multiple studies together thereby improving statistical power and reducing the likelihood of type II error (failure to determine a difference that truly exist); increasing precision of estimates [13]; relating outcome heterogeneity to explanatory covariates and factors, and identifying large scale-patterns even when obscured by local factors, thereby minimizing the danger of over-extrapolation from single context-based studies [14].
Approaches to model estimation in meta-analysis vary widely.Descriptive analysis and paired t-test have been used in meta-analysis of nutrient balances drawn from 57 studies in Africa and concluded that there were positive soil nitrogen and potassium balances in some spots in Africa [5] while there was nutrient mining in others.Descriptive analysis has been used to identify drivers of tropical deforestation from 152 previous studies [15], Ordinary Least Squares have been used in analysis of returns to agricultural research and development from 289 studies [16], a binomial test has been used in meta-analysis of the differences in environmental impacts between organic and conventional farming based on 59 previous studies [17] and vote counting has been used in meta-analysis of agroforestry adoption based on 32 studies from 32 countries [18].
Classical meta-analysis that estimates model parameters in addition to withinand-between study variability (random effects model) [19] [20] has also been used in a meta-analysis of the effects of woody and herbaceous legumes on maize yield in sub-Saharan Africa based on 94 studies from West, East and Southern Africa and concluded that inorganic fertilisers gave better maize grain yield response than legume trees and green manures, natural fallows and unfertilized maize in that order; and that "global maize yield response to legumes was significantly positive and higher than unfertilized maize and natural vegetation fallows" [21].
Current methods of meta-analysis, however, have several limitations [10] [11]: Inadequacies in modeling multiple outcomes simultaneously, in addressing dependencies in multiple outcomes (use incorrect standard errors), in dealing with non-linear correlations and non-homogeneity in data and in handling nested or clustered data [11], yet these challenges characterise nutrient balance studies where response variables (outcomes) are often multivariate and have dependencies.Furthermore, methods such as vote counting and sign tests have been deplored due to their low power and the fact that they ignore sample size and effect magnitude [22] while descriptive statistics do not provide a framework to explore the effects of multiple covariates and factors on the dependent variables.
Possible approaches to modeling multiple outcomes of nutrient balances, taking into account the above challenges, include: multivariate fixed and random effects models, structural equation models, and multilevel models for modeling primary data among others [23].Although the fore-mentioned methods offer a potential in meta-analysis, they have not been applied to nutrient balance studies.Further, the application of multivariate fixed and random effects models are constrained by limited availability of within-study correlation and variances for estimating the variance-covariance matrix required in the model when summary statistics are used in meta-analysis and when there are no individual participant data to estimate required variance-covariance matrix [23].A "workaround" that has been proposed to estimate "missing" correlations in such situations include the use of estimates from similar published work, conducting sensitivity analyses for possible ranges of correlations and the use of Bayesian hierarchical models with vague priors in a Markov Chain Monte Carlo (MCMC) framework among others [23].
In this study we demonstrate that multivariate multilevel models can be used in meta-analysis of farm nutrient balance data arising from complex surveys that involve multi-stage sampling, stratification and unequal sampling probabilities [24].A previous meta-analysis of 54 studies using multilevel models, and through application of simulation studies for comparison, has shown that dependencies and heterogeneity at different model hierarchy can be effectively accounted for and that multiple outcomes can be modeled simultaneously using multivariate multilevel models [6].Multilevel models also have a potential to estimate variances and standard errors correctly for nested data, model relation- The data comprised 349 observations (individual smallholder farm-households).
About 42% and 25% of the smallholder farm-households in the dataset were from semi-humid to semi-arid (ACZ4), and semi-arid areas (ACZ5) of Kenya respectively.Farm households from humid (ACZ1), sub-humid (ACZ2) and semi-humid (ACZ3) areas accounted for 12%, 15% and 7% of total households in the dataset respectively.The arid (ACZ 6) and very arid (ACZ 7), with very low potential for plant production, were not represented in the dataset (Table 1).
The 349 observations have 3 dependent variables: N full balance (kg ha −1 ); P full balance (kg ha −1 ); and K full balance (kg ha −1 ) and 18 selected independent variables (factors/covariates).The latter were measured at two levels: 1) at level of individual farmers (household resource endowments); and 2) at agro-climatic zone level (Table 2).

General Analysis Methods
The study used the following general analysis methods: 1) Determined whether a two-level multi-level model with multiple outcome variables (multivariate multi-level model) is required for the NUTMON dataset.

Determining Necessity of a Two-Level Multi-Level Model (Multivariate Multilevel Model)
The The study used Iterative Generalised Least Squares (IGLS) estimation algorithms in R2MLwiN package, to return estimates for random coefficients and their standard errors, estimates for deviance statistics and for variances and covariances for single and two level models (Table 3).
The study calculated Study Design Effect 1 for the two level model as follows: ( ) where:  The design effects for each nutrient balance were greater than 2.0 (Table 4).
Previous analysis has shown that a design effect greater than 2.0 indicates the need for multilevel modeling [28].Thus, the preliminary analysis indicates that multilevel modeling is appropriate for this nutrient balance dataset and a two-level multilevel model (multivariate multilevel) is suitable for this purpose; and is therefore applied in subsequent analyses in line with the objectives of this study.

Estimating Magnitude and Variability of Nutrient Balances
To estimate an aggregate magnitude and variability of nutrient balances across agro-climatic zones of Kenya, the study used Equation ( 1

Determining Whether Agro-Climatic Zones Differ from Each Other with Respect to NPK Nutrient Balances
To determine whether agro-climatic zones differ from each other with respect to nutrient balances, Equation (1) describing a variance component model was used.Parameter estimates were obtained in a similar way as in Section 2.3.The parameter estimates are presented in Table 3.
Further, in assessing whether agro-climatic zones differ from each other, the study determined whether the variance (  The p-value associated with the Likelihood ratio (LR) test statistic was determined from Chi Square distribution (with 1 degree of freedom).

Effects of Household Resource Endowments on Nutrient Balances
The study fitted a two-level multilevel model (multivariate multilevel model) to NUTMON dataset to determine the effects of household resource endowments on NPK nutrient balances.The household resource endowments in Table 2 were added to the model as explanatory variables (in the fixed part of the model) and the intercepts at level-1 and level-2 model hierachy allowed to vary resulting in arandom intercept model.Slopes for the explanatory variables were not allowed to vary since they were fitted to the fixed part of the model only.
The multilevel equations for this model was specified as follows: where at level-1: ij y = Individual response variable for i th farmer (level-1) in j th agro-climatic zone; oj β = Random intercept for j th agro-climatic zone (mean of all individual farmers in j th agroclimatic zone); each agro-climatic zone is assumed to have a different intercept coeffcient, oj β kij x = A vector of k predictor variables for the i th farmer in j th agro-climatic zone kj β = A vector of k regression coeffcients associated with the predictor va- riables in j th agro-climatic zone: Residual effect (variation) for i th farmer in j th agro-climatic zone where at level-2: oo γ = Random intercept for all five agro-climatic zones (grand mean of all j groups); The ( oj β )s' are considered to vary randomly around a grand mean of all j groups ( oo γ ) at level-2 onj Z = A vector of n predictor variables measured at agro-climatic zone level (level-2 or j-level) on γ = A vector of n regression coefficients associated with the predictor va- riables at agro-climatic zone level (non-random coefficients): Residual effect (variation) for j th agro-climatic zone; ie the deviation of the intercept of j th agro-climatic zone from overall intercept of all agro-climatic zones (all js) ko γ = A vector of k (fixed) regression coefficients indicating that the coefficients of Level-1 predictors ( kj β ) do not vary across agro-climatic zone level (non-random slopes at level-2): hj β : all j values of h β are fixed (do not vary across agro-climatic zone) and are estimated as a single coefficient  3).The mean nitrogen nutrient balance of −11.9 kg ha −1 (with 95% confidence interval: −47.0, 23.2) tended to corroborate results of aggregate seminal studies that have reported negative (direction) nitrogen balances at national level [1].This further confirms that arable land quality in Kenya is being degraded through declining farm nitrogen, though the observed figure was not statistically significant (confidence interval includes zero).The mean aggregate phosphorus (9.8 kg ha −1 ; p < 0.01; 95% Confidence Interval of 0.9, 18.8) and potassium balances (5.5 kg ha −1 ; 95% Confidence Interval of −6.9, 17.8) were however positive contrary to seminal aggregate studies that reported negative nutrient balances at national level [2].

Variability of Nutrient Balances
The Variance Partitioning coefficients (adjusted Intra-class correlation coeffeicients) for NPK nutrient balances at different levels of model hierarchy are summarised in Table 5 while absolute values of variances and covariances are shown in Table 6.
For farm nitrogen nutrient balance, 48% of the variation lies between agroclimatic zones (between agro-climatic zone variability) while 52% of variation lie between farms.For each of the nutrient balances studied, a high proportion of total variation was from between-farm variability, 52%, 79% and 89% for nitrogen, phosphorus and potassium respectively.
Based on residual variances of each nutrient balance and covariances at level 1 (farm level; Table 3), the study observed high positive correlations between Nitrogen and phosphorus nutrient balances (r = 0.8), Nitrogen and potassium nutrient balances (r = 0.82) and moderate correlations between phosphorus and potassium nutrient balances (r = 0.68).These results imply a high dependence between effect sizes at level 1 of the study, dependence that cannot be ignored during analysis.Similarly at level 2, the study observed high dependence between variables as measured by correlations: Nitrogen-phosphorus (r = 0.75), Nitrogen-potassium (r = 0.87) and Phosphorus-potassium (r = 0.92).

Agro-Climatic Zones and Nutrient Balances
The study assessed whether agro-climatic zones differ from each other, on average, with respect to farm nitrogen, phosphorus and potassium balances.This was explored preliminarily by looking at variance partitioning coefficient (VPC) and two tests i) assessing whether the variance of the random components of the intercept differ from zero and by ii) conducting likelihood ratio test.
Variance partitioning coefficient (VPC) measures the proportion of total variance which lies at the Agro-climatic zone level (level-2).Interpreted as VPC, 48%, 21%, and 11% of variation in nitrogen, phosphorus and potassium farm nutrient balances lie between agro-climatic zones respectively (Table 5; Table 6).This indicates that agro-climatic zones substantially differed with respect to NPK farm nutrient balances.
Suppose Agro-climatic zones were to differ only slightly or not at all, then the agro-climatic zone j values of oj  (Equation ( 1)) should differ little from each other and or exhibit low-to-no variance.However, a 95% confidence levels for random part of the model (Table 6) indicated that variances for farm nitrogen (95% CI: 1005.33,7919.74) and phosphorus (95% CI: 69.8, 2281.36) were significantly different from zero (Table 6).This indicates that there were significant differences between agro-climatic zones with respect to nutrient balances.
The study further used likelihood ratio test to tringulate the observation above as variances are known to have positively skewed sampling distributions while 95% confidence intervals assume assymptotic normal sampling distribution and may not be reliable.A likelihood ratio test done by comparing a model with agroclimatic zone effects (with oj  ) and one without oj  , to assess whether The p-value associated with the Likelihood (LR) test statistic (Chi Square value of 196.2) with 1 degree of freedom is 0.0001.Since the p-value is very small, we reject the null hypothesis (Ho: no agro-climatic zone variation or cluster effect exists and restricted or single model is "the true model") and conclude that a gro-climatic zone variation exists and is significant ( ) ( 2 1, 349 196.2,

p<
. This further confirms that there were significant differences between agro-climatic zones with respect to farm nutrient balances.

Household Resource Endowments and Nutrient Balances
A two-level multilevel model (multivariate multilevel model) fitted to the dataset (see Section 4) to test the hypothesis: All household resource endowments do not have an effect on the magnitude of full N, P and K nutrient balances returned results shown in Table 7.
The observation that more than one household resource endowment variable has an effect on NPK nutrient balances provides a strong evidence against the null hypothesis (e.g.Value of livestock (0.0005 kg N ha −1 , p < 0.001); cropping family labour (−0.05101kgK ha −1 , p < 0.01), Table 7.We thus reject the null hypothesis and conclude that at least one household resource endowment has an effect on the magnitude of full N, P and K farm nutrient balances (Table 7).
A negative relationship between family labour for cropping and NPK nutrient balances was observed (Table 7) with cropping family labour having a signify-Table 7. Effects of household resource endowments on NPK nutrient balances.Average slope of land, though a biophysical factor, was considered a proxy to land endowment resource quality.Farmers' management practices and prices farmers are willing to offer for a given piece of land tend to differ depending on slope percentage, perceived degradation and ease of management attributed to slope effect.The study observed a negative correlation between average slope (%) of land and NPK nutrient balances and that average slope was significantly and negatively correlated with nitrogen balances.
The study observed mixed results with regards to effects of household resource capital on nutrient balances.While the effects of "total capital owned" significantly lowered NPK nutrient balances, livestock-related capital (value of livestock) had significant positive effect (Table 7).Thus, the study has indicated that it is the type of capital (e.g.livestock) owned and not the volume and total value of household capital that may be important in determining nutrient balances, though previous studies indicate that resource-rich farmers have a high chance of returning positive nutrient balances in their farms should they employ nutrient adding, recycling and conserving technologies and practices [30].

Conclusions and Recommendations
Based on a two-level multilevel (multivariate multilevel) model fitted to the nutrient balance dataset, this study has shown that farm nitrogen mining is taking place and is putting the quality of arable land in Kenya at stake.The study concludes that livestock household resource endowments is an important determinant of nutrient balances at smallholder farm level, thus recommends improvement of livestock practices at farm level not only to improve on farm nutrient balances but also to increase farm-profitability.However, it is further noted that ownership of large volumes of capital (total value of capital) and family labour resources do not automatically translate into positive effects on farm nutrient balances, but rather it is the type of capital owned (e.g.livestock) and what use it has been put to that matters.
The study generated interesting results and demonstrated that multivariate multilevel models can be used to conduct meta-analysis of farm nutrient balances and to explore arable land quality despite the small sample size.Future studies with large sample sizes and a large pool of relevant factors and covariates are, however, required to further give higher order insights beyond this study.
This can be reinforced by meta-analysis that focuses on summary statistics and the use of simulation modeling to summarise inferences by random numbers rather than by point estimates and standard errors.

σ
study fitted a two-level multilevel model (multivariate multilevel model) without predictors (variance component model) to NUTMON dataset to determine whether multilevel modeling was needed at all for this dataset.Intra-class correlation Coefficient (ICC) and Design effect were calculated to aid in model output interpretation.The multilevel equations for the variance component model were specified as followsWritten in (mixed model) form by substitution of the level-2 equation into the level-1 equation, the model is: response variable for i th farmer (level-1) in j th agro-climatic zone; oj β = Random intercept for j th agro-climatic zone (mean of all individual farmers in j th agroclimatic zone) oo γ = Random intercept for all j agro-climatic zones (grand mean of all js) ij  = Residual effect (variation) for i th farmer around the mean of j th agroclimatic zone (random effect) oj  = Residual effect (variation) for j th agro-climatic zone around the grand mean (of all agro-climatic zones ie across all js) is the variance at individual farmer (level-1)

=
Total variance at level-2

σσ
), describing a variance component model.Iterative Generalised Least Squares (IGLS) in R2MLwiN package used to quantify the parameters of the model, returned parameter estimates shown in (Table 3), Section 2.3.Similarly, variability of nutrient balances (heterogeneity) at level-1 and at level-2 model hierarchy were estimated using Variance Partitioning Coeffcient (VPC)/Intra-class correlation coeffcient (ICC) using Equation (2) (see explanation in Section 2.3): is the variance at individual farmer (level-1) is the variance at agro-climatic zone (level-2)

 1 Likelihood 1 Deviance
a single level model ie without oj L = value for a two-level model ie with oj L =  0 Deviance statistics for a single level model-without oj D = statistic for a two-level model ie with oj D = 

3. 1 . 3 . 1 . 1 .
Magnitude and Variability of Nutrient Balances Magnitude and Direction of Nutrient Balances The two-level multilevel model (multivariate multilevel model) without predictors (variance component model) fitted to the dataset returned the mean NPK nutrient balances, see (Table of secondary production units (SPUNo: Number of livestock types) Tropical Livestock Units (TLUNo) Value of livestock (VALLVST: in Ksh) Total capital owned (CAPTOT: In Ksh) Cropping family labour (LABCROP: in days) Land rent received (VRENTOUT: in Ksh)Hired labour for RU-Cash (LABHIRUC: in Ksh) *p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05; *"p < 0.1 cant effect (negative) on potassium balance.A unit change in cropping family labour lowering potassium nutrient balance by 0.05101 kg K ha −1 (p < 0.01).Although smallholders in Kenya rely heavily on family labour to manage their farms[29] this labour input may be for multiple purposes and not necessarily for strategic farm nutrient management alone.Contrary to Cropping family labour, hired labour for redistribution units (LABHIRUC) had a positive effect on the direction of NPK nutrient balances.It significantly predicted phosphorus (P) balances with a unit change in LABHIRUC resulting in a change of 0.023 (p < 0.01) units in phosphorus balances.
However, and contrary to on-going narratives on blanket existence of widespread nutrient mining in Kenya, evidences from this study indicate that farm phosphorus and potassium balances are not always negative.Agro-climatic zones are characterised by different biophysical potentials that may influence farm nutrient balances to different degrees.The study draws the conclusion that farm nitrogen, phosphorus and potassium balances do differ between agro-climatic zones classified as arable land in Kenya.For example, variances for farm nitrogen and phosphorus were significantly different from zero across agro-climatic zones.The same was corroborated by likelihood ratio test.This serves to indicate the necessity of designating agro-climatic zone specific nutrient management interventions to address declining quality of arable land rather than the use of blanket intervention approaches.Household resource endowments and resource flow and allocation patterns D. D. Onduru, F. Onyango DOI: 10.4236/ojs.2017.76069985 Open Journal of Statistics have a potential to influence farm nutrient balances.This study explored the effects of household resource endowments on nutrient balances in arable land.

2. Methodology 2.1. Dataset
was considered to have "n" separate studies (where n = number of studies).Studies which did not use multi-stage sampling to identify study participants were excluded from the analysis (on-farm and on-station experiments excluded).

Table 1 .
Study areas in Kenya and number of observations (farm-households).

Table 2 .
Factors and covariates used as independent variables in this study.
(Semi-humid), ACZ4 (Semi-humid-to-Semi arid), ACZ5(Semi-arid) Total variables 18 2) Based on (1) above, applied a two-level multi-level model (multivariate multi-level model) to: Estimate an aggregate magnitude and variability of nutrient balances across agro-climatic zones that cover arable lands of Kenya; Determine whether agro-climatic zones differ from each other in terms of NPK nutrient balances; and to Identify the effects of household resource endowments on NPK nutrient balances.

Table 3 .
Two-level multilevel variance component model compared with a single level model (fixed part of the model).
1Quantifies the effects of violating the assumption of independence on standard error estimates; Multiplier to be applied to standard errors to correct for negative bias that results from nested data.c n = Average number of farmers per study; In this case (

Table 4 .
Design effect for NPK nutrient balances.
c n = Average no. of farmers per study; ICC = Intraclass correlation Coeffcient.

Table 6 .
Two-level multilevel variance component model compared with a single level model (random part of the model).