Analyzing Accuracy of the Power Functions for Modeling Aboveground Biomass Prediction in Congo Basin Tropical Forests
1. Introduction
Aboveground biomass constitutes the major portion of carbon pools in forest ecosystems (Vashum & Jayakumar, 2012). Considerable attention has been paid to the fact that changes in aboveground biomass may have considerable impacts on climate change or climate change mitigation (Lu et al., 2002). Its estimation is central to quantifying and monitoring the amount of carbon stored by trees. Different methods are used for estimating biomass and forest carbon. These include the average carbon stock, forest inventory, remote sensing techniques that correlate spectral indices with biomass or terrestrial forest carbon (e.g., Landsat, MODIS), aerial photography, 3D digital imaging, radar signals that measure the vertical structure of the forest (ALOS PALSAR, ERS-1, JERS-1, Envisat) and LiDAR. Each method has advantages and disadvantages (Gibbs et al., 2007); however, allometric equations are commonly used (van Breugel et al., 2011) in association with forest inventory and remote sensing.
Different studies (Ketterings et al., 2001; Chave et al., 2004; Molto et al., 2012) have revealed that, when assessing forest carbon stocks, the sources of uncertainty start with the inventory of trees. To improve the accuracy of forest biomass estimates, the different sources of error must be identified and prioritized, and action must be taken to minimize them (Picard et al., 2014). For allometric equations, four sources of uncertainty can be identified: 1) the error due to the choice of the allometric equation or model misspecification; 2) the prediction error (uncertainty on the model's coefficients and on the residual error); 3) the measurement error on the tree dimension variables; and 4) the sampling error.
The sampling error, which depends on the landscape heterogeneity and on the size, shape and number of the plots, can be minimized by the sampling design (Picard et al., 2014), whereas a great effort must be made to reduce the measurement error. The two other sources of error are model errors that depend on the allometric model. The choice of the allometric model appears to be the most important (Chave et al., 2004; van Breugel et al., 2011; Melson et al., 2011; Molto et al., 2012; Picard et al., 2016). Choosing an appropriate allometric model has become a major scientific concern for the accurate estimation of forest biomass (Rutishauser et al., 2013). The type of equation used (species-specific, site-specific, ecosystem-specific, pan-tropical, etc.) can also affect the errors (van Breugel et al., 2011). Chave et al. (2014) developed a single allometric equation for all ecosystems and concluded that the site effect can be negligible if the diameter, the height and the wood density are included. In recent studies, Djomo et al. (2016) and Djomo & Chimi (2017) recommended the use of existing site-specific or ecosystem-specific equations in preference to pan-tropical allometric equations in tropical moist forests.
As presented by Zianis and Mencuccini (2004) and Pilli et al. (2006), the mathematical model commonly used for modeling aboveground biomass is based on the power function. This rests on the principle that the growth of a plant is characterized by a relation of proportionality between its total biomass and its size (West et al., 1997; 1999). According to Parresol (1999), existing equations for modeling wood biomass can be classified into three types: the linear model with additive error (Equation (1)), the nonlinear model with additive error (Equation (2)) and the nonlinear model with multiplicative error (Equation (3)), written respectively as follows:
$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_j X_j + \varepsilon \qquad (1)$$
$$Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \cdots X_j^{\beta_j} + \varepsilon \qquad (2)$$
$$Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \cdots X_j^{\beta_j}\,\varepsilon \qquad (3)$$
where $Y$ denotes the aboveground biomass, $X_j$ the tree dimension variables (diameter at breast height, total height, age, crown length and their combinations), $\beta_j$ the coefficients of the equations and $\varepsilon$ the residual error. To estimate the fitted parameters (coefficients), the log-transformation is appropriate, indeed necessary, for allometric analysis (Kerkhoff et al., 2009). Linear regression can be applied to Equation (3), assuming that the error is normally distributed and additive on the logarithmic scale, as:
$$\ln Y = \ln \beta_0 + \beta_1 \ln X_1 + \dots + \beta_j \ln X_j + \ln \varepsilon \qquad (4)$$
which is not the case for Equation (2).
Thus, modeling of the biomass cannot be limited only to the quality of adjustment and the selection criteria. It is also essential to explore the adequacy of the model established with the biological process of the tree growth. The objectives of this research are: 1) to evaluate the sensitivity of the parameters and their combination included in the models and compare the additive effects to the multiplicative effects; 2) to analyze the uncertainty in the model prediction and 3) to evaluate a methodology to reduce the uncertainty of a selected model for biomass determination.
2. Material and Methods
2.1. Harvest Biomass and Forest Inventory Data
Two types of data were used in this study: destructive aboveground biomass data from different published works and forest inventory data from tropical African forests.
The tree harvest data comprised 362 sample trees with diameter and wood density (Table 1), of which 225 also had height measurements. These data were collected from the transition forest between the dense evergreen forest and semi-deciduous forest in the Democratic Republic of Congo (Ebuy et al., 2011) with 12 trees (diameter, height and wood density), in Cameroon (Fayolle et al., 2013) with 137 trees (diameter and wood density), in Gabon (Ngomanda et al., 2014) with 101 trees (diameter, height and wood density), in evergreen forest in Cameroon (Djomo et al., 2010) with 71 trees (diameter, height and wood density) and in the Boi Tano forest reserve in Ghana (Henry et al., 2010) with 41 trees (diameter, height and wood density). The mean diameter was 44.9 cm and the median 37.6 cm. About 25% of the trees (90 trees) had a diameter greater than or equal to 70 cm.
Table 1. Description of harvest sample trees by author: n = sample size; ne = number of species, DR = diameter range; transition = transition between evergreen forest and semi-deciduous forest.
(*): 42 species and 5 identified at the genus level.
The numbers of trees with diameter greater than 80 cm and 90 cm were respectively 67 and 47. The maximum diameter was 192.5 cm.
Inventory data from the permanent plots were from the Central African Regional Program for the Environment (CARPE), installed with the Smithsonian Institution's assistance. This work was conducted as part of the assessment of biodiversity in the forest reserves of Dzanga Sangha in Central African Republic with 5 plots (Balinga et al., 2006), of Monts de Crystal National Park with 5 plots (Sunderland et al., 2004) and of Waka National Park with 5 plots (Balinga, 2006) in Gabon, and of Nouabale Ndoki National Park with 4 plots (Sunderland & Balinga, 2005) in Congo Republic. In those four forest reserves, 19 permanent plots of 1 ha were set up. The trees over 10 cm dbh (diameter at breast height, 1.30 m above ground level) were measured and identified to species. The maximum diameter was 188 cm, with an average of 24.8 cm and a median of 17.8 cm.
The height allometric equation (Equation (5)), which is the best one for moist forests (Djomo et al., 2016), was used to estimate the unmeasured tree heights (H) in the inventory and harvest biomass data sets. The wood density values ($\rho$) of species were obtained from the international wood density database (Zanne et al., 2009). For species without wood density values, the average of the plot was assigned.
(5)
2.2. Modeling Aboveground Biomass
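The power function relating the aboveground biomass (AGB) to the predictor variables, the dbh (D), H and $\rho$, is presented by Equation (6), derived from Equation (3), where $\alpha$ is the allometric coefficient and $\beta_1$, $\beta_2$ and $\beta_3$ are the allometric exponents.
$$AGB = \alpha\, D^{\beta_1} H^{\beta_2} \rho^{\beta_3}\, \varepsilon \qquad (6)$$
When the natural logarithmic transformation is applied, Equation (6) is rewritten as:
$$\ln(AGB) = \ln \alpha + \beta_1 \ln D + \beta_2 \ln H + \beta_3 \ln \rho + \ln \varepsilon \qquad (7)$$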
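As a minimal sketch of the log-transformed fit in Equation (7), assuming ordinary least squares and purely synthetic data, the regression could be set up as follows. The coefficient values, the height-diameter relation and the function name `fit_log_power_model` are illustrative assumptions, not values or code from this study.

```python
import numpy as np

def fit_log_power_model(agb, d, h, rho):
    """OLS fit of ln(AGB) = ln(alpha) + b1*ln(D) + b2*ln(H) + b3*ln(rho)."""
    X = np.column_stack([np.ones(len(d)), np.log(d), np.log(h), np.log(rho)])
    y = np.log(agb)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Residual standard error on the log scale (n - p degrees of freedom)
    rse = float(np.sqrt(resid @ resid / (len(y) - X.shape[1])))
    return coef, rse

# Synthetic trees (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
n = 300
d = rng.uniform(10.0, 150.0, n)                       # diameter, cm
h = 3.0 * d ** 0.5 * np.exp(rng.normal(0.0, 0.3, n))  # height, m (assumed relation)
rho = rng.uniform(0.3, 0.9, n)                        # wood density, g/cm3
true_coef = np.array([np.log(0.2), 2.2, 0.3, 1.0])    # assumed ln(alpha), b1, b2, b3
sigma = 0.3                                           # assumed log-scale error SD
log_agb = (true_coef[0] + true_coef[1] * np.log(d) + true_coef[2] * np.log(h)
           + true_coef[3] * np.log(rho) + rng.normal(0.0, sigma, n))
agb = np.exp(log_agb)                                 # multiplicative error on original scale

coef, rse = fit_log_power_model(agb, d, h, rho)
print("ln(alpha), b1, b2, b3:", np.round(coef, 3))
print("residual standard error:", round(rse, 3))
```

Because the simulated error is multiplicative on the original scale, it becomes additive after the logarithmic transformation, which is the situation in which the linear fit is appropriate.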
Following the recent debate between Kerkhoff and Enquist (2009) and Packard (2009), Xiao et al. (2011) used Monte Carlo simulations to compare the different approaches, concluded that log-transformed linear regression produces more accurate estimates and also recommended applying both statistical and biological analyses. For this purpose, data analysis of the sampled trees was limited to graphical analysis (diameter distribution and scatter plots) to check the nature of the error (Figure 1). This led to the choice of log-transformed linear regression, because the error is multiplicative on the original scale.
Based on the values of the allometric exponents ($\beta_1$, $\beta_2$ and $\beta_3$), nine models were established and divided into two groups (Table 2). The first group was composed of four models with two predictor variables, the diameter and the wood density, so that the exponent of the height is zero ($\beta_2 = 0$). The predictor variable of Equation (a) was a compound variable obtained by combining D and wood density, while Equation (b) and Equation (c) were additive-effect models of these variables. Equation (d) characterized the effect of using the square of D instead of D in Equation (a). The second group, with five models, analyzed the effect of height as the third predictor variable. As with the first group, the product of the three predictor variables (Equation (e)), the square of the diameter, the height and the wood density, and their additive effects were examined in Equation (f) to Equation (i). The third group used seven other models from several studies, characterized by the power of the logarithm of the diameter as predictor variables (Equation (j) to Equation (p)).
Figure 1. (a) Diameter distribution, (b) scatter plot of aboveground biomass against diameter and (c) scatter plot on the natural logarithm scale, which supports the normality and the homogeneity of variance.
Table 2. The fitted allometric equation models with the allometric coefficients ($\alpha$) and allometric exponents ($\beta_1$, $\beta_2$ and $\beta_3$) on the natural scale after back-transformation from the logarithmic scale; $D$, the diameter (in cm); $H$, the height (in m); $\rho$, the wood density (in g/cm3); AGB, the aboveground biomass (in kg). n.s = coefficient not significant at 10%; NA = not applicable.
2.3. Selecting the Best Allometric Models
For each model, the following goodness-of-fit criteria were calculated: the adjusted $R^2$ ($R^2_{adj}$), the residual standard error (RSE) and the Akaike Information Criterion (AIC). Other comparison criteria were also computed: the relative mean absolute error (RMAE%), the residual mean square error (RMSE), the relative mean square error (RRMSE), the proportion of sample trees outside of the confidence interval (P1_alpha) and the predictive residual square error (PRESS). These criteria are computed from $\hat{y}_i$, the estimated biomass, $y_i$, the observed biomass of sample tree $i$, $n$, the sample size, and $\hat{y}_{(i)}$, the predicted value of sample tree $i$ when it is excluded from the model fit. The predicted values of aboveground biomass were calculated for each model with the plot inventory data. The sixteen models were compared and the range of prediction values was estimated by plot. For each inventory plot, these ranges express the uncertainty (error) due to the choice of the allometric equation.
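The criteria listed above can be illustrated with a short sketch. It uses commonly used definitions (adjusted R², RSE, Gaussian AIC, RMSE, leave-one-out PRESS and a relative mean absolute error in percent), which are assumptions and may differ in detail from the formulas used in this study; the function names and the demonstration data are hypothetical.

```python
import numpy as np

def fit_criteria(X, y):
    """Common fit criteria for an OLS model y = X b + e (y on the fitting scale)."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    rss = float(resid @ resid)
    tss = float(np.sum((y - y.mean()) ** 2))
    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    return {
        "r2_adj": 1.0 - (rss / (n - p)) / (tss / (n - 1)),
        "rse": float(np.sqrt(rss / (n - p))),
        "rmse": float(np.sqrt(rss / n)),
        # Gaussian AIC, counting the error variance as one extra parameter
        "aic": float(n * (np.log(2.0 * np.pi * rss / n) + 1.0) + 2.0 * (p + 1)),
        # PRESS: sum of squared leave-one-out residuals, without refitting
        "press": float(np.sum((resid / (1.0 - leverage)) ** 2)),
    }

def relative_mean_absolute_error(observed, predicted):
    """Assumed definition: 100 * mean(|predicted - observed| / observed)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs(predicted - observed) / observed))

# Hypothetical demonstration data
rng = np.random.default_rng(1)
x = rng.normal(size=50)
X_demo = np.column_stack([np.ones(50), x])
y_demo = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 50)
criteria = fit_criteria(X_demo, y_demo)
print({name: round(value, 3) for name, value in criteria.items()})
```

The PRESS shortcut relies on the identity that, for ordinary least squares, the leave-one-out residual of observation i equals its ordinary residual divided by one minus its leverage.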
2.4. Prediction Error Calculation
Model accuracy was analyzed by computing the prediction error propagation at tree level using the inventory plot data. The Monte Carlo simulation method was used. The residual error of each model was simulated by adding to the prediction a random normally distributed error $\varepsilon$ with mean zero and standard deviation equal to the residual standard error of the fitted model. The uncertainty on the fitted parameters was simulated by Monte Carlo iteration according to a multivariate normal distribution whose mean is the vector of estimated parameters and whose covariance is the variance-covariance matrix of the model's coefficients. For each Monte Carlo iterate k, random coefficients ($\alpha^{(k)}$, $\beta_1^{(k)}$, $\beta_2^{(k)}$ and $\beta_3^{(k)}$) and the kth random residual error ($\varepsilon^{(k)}$) were generated and the corresponding biomass computed. At the kth Monte Carlo iterate, the predicted biomass of the ith tree for Equation (7) was as follows:
$$\ln(AGB_i^{(k)}) = \ln \alpha^{(k)} + \beta_1^{(k)} \ln D_i + \beta_2^{(k)} \ln H_i + \beta_3^{(k)} \ln \rho_i + \varepsilon^{(k)} \qquad (8)$$
with k varying from 1 to 10,000. For each model, the 10,000 predictions of the aboveground biomass of each inventory tree by plot were computed to assess the level of uncertainty. The predictions were used to calculate, for each model and each plot, the lower and upper bounds of the Monte Carlo 95% confidence interval. An interquartile of confidence interval, IQ, was computed as the difference between these two bounds.
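The Monte Carlo procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the coefficient vector, its covariance matrix, the residual standard error and the plot data are hypothetical; the residual error is drawn independently for each tree; and the interval width is taken as the difference between the 97.5% and 2.5% quantiles of the simulated plot totals.

```python
import numpy as np

def monte_carlo_plot_agb(coef, cov, rse, d, h, rho, n_iter=10_000, seed=0):
    """Propagate coefficient and residual uncertainty to a plot-level AGB total.

    coef : fitted [ln(alpha), b1, b2, b3] of the log-log model
    cov  : 4x4 variance-covariance matrix of those coefficients
    rse  : residual standard error on the log scale
    Returns (lower, upper, width) of the 95% Monte Carlo interval, in tonnes.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(d)), np.log(d), np.log(h), np.log(rho)])
    # Parameter uncertainty: one multivariate normal draw per iterate
    betas = rng.multivariate_normal(coef, cov, size=n_iter)   # (n_iter, 4)
    # Residual uncertainty: one normal draw per iterate and per tree
    eps = rng.normal(0.0, rse, size=(n_iter, len(d)))         # (n_iter, n_trees)
    agb_kg = np.exp(betas @ X.T + eps)                        # back to kg per tree
    plot_total_t = agb_kg.sum(axis=1) / 1000.0                # tonnes per plot
    lower, upper = np.percentile(plot_total_t, [2.5, 97.5])
    return float(lower), float(upper), float(upper - lower)

# Hypothetical 1-ha plot with 100 trees (illustrative values only)
rng = np.random.default_rng(42)
d = rng.uniform(10.0, 120.0, 100)
h = 3.0 * d ** 0.5
rho = rng.uniform(0.3, 0.9, 100)
coef = np.array([np.log(0.2), 2.2, 0.3, 1.0])
cov = np.diag([1e-2, 1e-3, 1e-3, 1e-3])
lower, upper, width = monte_carlo_plot_agb(coef, cov, 0.3, d, h, rho, n_iter=2000)
print(f"95% interval: {lower:.1f} to {upper:.1f} t, width {width:.1f} t")
```

For a 1-ha plot the totals in tonnes can be read directly as t/ha. The draws are vectorized over iterates and trees, so memory use grows with the product of the two; for very large plots the iterates could be processed in batches.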
3. Results
3.1. Fitted Allometric Parameters
The allometric coefficients and allometric exponents of the power function were calculated (Table 2). The bias introduced by back-transforming from the logarithmic scale to the original scale was corrected by adjusting the coefficient $\alpha$. Back-transforming Equation (j) to Equation (p) does not allow them to be expressed as power functions of the predictor variables, so their allometric exponents are not applicable (NA). For those models, the values of $\alpha$ range from very high (4.8 to 131.3) to very low (0.003). The allometric exponents of Equation (a) to Equation (i) are between 1.85 and 2.42, 0.26 and 0.93, and 0.81 and 2.21 for the predictor variables $D$, $H$ and $\rho$, respectively.
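One widely used back-transformation correction multiplies the back-transformed coefficient by exp(RSE²/2), where RSE is the residual standard error on the logarithmic scale. Whether this exact factor was applied here is an assumption; the sketch below only illustrates why such a factor is needed, using a hypothetical helper `back_transform_coefficient` and simulated errors.

```python
import numpy as np

def back_transform_coefficient(ln_alpha, rse):
    """Back-transform ln(alpha) with the log-normal correction factor exp(rse^2 / 2).

    Assumed correction (often attributed to Baskerville); other estimators exist.
    """
    return float(np.exp(ln_alpha + rse ** 2 / 2.0))

# If eps ~ N(0, sigma^2), then E[exp(eps)] = exp(sigma^2 / 2) > 1, so a naive
# exp() of the log-scale prediction underestimates the mean biomass.
rng = np.random.default_rng(3)
sigma = 0.4
eps = rng.normal(0.0, sigma, 200_000)
empirical_factor = float(np.mean(np.exp(eps)))
theoretical_factor = float(np.exp(sigma ** 2 / 2.0))
print(round(empirical_factor, 3), round(theoretical_factor, 3))

# Hypothetical fitted values: ln(alpha) = ln(0.2), RSE = 0.3
alpha_corrected = back_transform_coefficient(np.log(0.2), 0.3)
print(round(alpha_corrected, 4))
```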
The poorest fit (RMSE = 0.426) is explained by the use of the product of the two predictor variables $D$ and $\rho$, whereas their additive effects in Equation (b) and Equation (c) improve the quality of adjustment. The allometric coefficients of the models of group 2 are about 0.11 for the multiplicative effect (Equations (e) to (g)) and 0.21 for the additive effect (Equations (h) and (i)). For this group, the allometric exponents $\beta_1$ and $\beta_2$ of $D$ and $H$ are about 1.85 and 0.93, respectively, for the multiplicative effects and about 2.27 and 0.26 for the additive effects.
3.2. Choosing the Best Allometric Models
All the comparison criteria (RMSE, R2aj, RMAE, RRMSE, AIC, PRESS, P1_alpha) are characterized by the same trend in the appreciation of the models goodness of fit (Figure 2). The adjusted coefficients of determination lie between 0.927 and 0.964 and the residual errors vary between 0.296 and 0.426. All the allometric coefficients and exponents are significant except the allometric coefficient of Equation (a) as presented in Table 2. In group 3 the regression coefficients of
and
are not significant in Equation (j) to Equation (o).
The patterns (Figure 2) allowed grouping the models into three or four groups. With two predictor variables ($D$ and $\rho$), the model based on their product (Equation (a)) is characterized by a poor adjustment compared to the other models. When an allometric exponent was assigned to each predictor variable (additive effect, Equation (c)), the fit improved by 57.7% and 48.5% in terms of AIC and PRESS, respectively. The improvement is the same when the allometric exponent of $\rho$ is set to 1 (Equation (b)); indeed, the test comparing this exponent with 1 does not reject the equality. Replacing $D$ by $D^2$ in the combination of $D$ and $\rho$ (Equation (d)) improved the quality of the fit, but the criteria remain slightly higher than those of Equation (b), by no more than 1.5% and 0.8% for AIC and PRESS, respectively. When $D$ is replaced by powers of its logarithm (Equations (j), (k) and (l)), the criteria values are higher than those of Equation (b) by about 50.3% to 133.5% for AIC and 27.6% to 91.6% for PRESS.
Figure 2. Trend of the goodness-of-fit criteria of the models and the associated selection criteria AIC, PRESS and P1_alpha. In the legend, letters a to p represent the models; for example, h corresponds to Equation (h).
When the height is integrated through the product of the three predictor variables (Equation (e)), the quality of adjustment is not improved compared to Equation (b): the values of AIC and PRESS increase by about 43.1% and 22.7%, respectively. When an allometric exponent is assigned to each predictor variable (Equations (h) and (i)), the fit improves, with AIC and PRESS lower than those of Equation (b) by up to 7.3% and 3.2%, respectively. The tests of significance of the allometric exponent of $\rho$ make it possible to accept that it is equal to 1. These results show that the best models are Equation (h) and Equation (i) for the three predictor variables $D$, $H$ and $\rho$, while Equation (b) and Equation (c) are the best for the two predictor variables $D$ and $\rho$.
3.3. Comparison of Model Predictions
For the sixteen models compared, the aboveground biomass was predicted on each of the nineteen permanent plots. The ranges of the estimations varied from 46.1 t/ha to 218.1 t/ha (Figure 3). The lowest range was obtained with plot 1 of the Waka forest reserve, while the highest range was obtained with plot 5 of the Monts de Crystal forest. These results highlight the need for an equation that is as reliable as possible.
Figure 3. The range (difference between the highest and the lowest values) of the estimated aboveground biomass on each inventory plot with the 16 model equations. The inventory plots are numbered from 1 to 4 or 5 within each forest reserve, with D = Dzanga Sangha, M = Monts de Crystal, N = Nouabale Ndoki, W = Waka.
The analysis of variance was used to compare the three groups and the sixteen models (Table 3). Significant differences are observed between groups of models (F value = 54.967) and between the 16 models (F value = 26.2507). The interactions between plots and groups of models are not significant (F value = 1.10). Therefore, the comparison of the models can be done independently of the plot. The Student-Newman-Keuls test of means showed that groups 1 and 3 formed a homogeneous group, with predictions of 438.7 t/ha and 442.4 t/ha respectively, significantly different from group 2, with predictions of 412 t/ha. When this analysis was made without Equation (a), however, the three groups of models were significantly different and the biomass prediction of group 1 dropped to 430.9 t/ha. Equation (b) to Equation (d) of group 1 do not differ from the additive-effect models of group 2 (Equation (h) and Equation (i)), which in turn do not differ from Equation (m) to Equation (p) of group 3 with
as one predictor variable. The models Equation (e), Equation (f) and Equation (g) of synthetized predictor variables of the group 2 are those of lowest predictions values (404.7 to 405.8 t/ha) while the models Equation (j), Equation (k) and Equation (l) are those with highest predictions values (448.8, 449.1 and 481.6 t/ha). Those six models are not the best models according to selection criteria.
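The per-plot range used above to express the model-choice uncertainty is simply the difference between the highest and the lowest prediction across models. A minimal sketch, with a hypothetical function name and made-up predictions rather than values from Table 3:

```python
import numpy as np

def prediction_range(pred):
    """Per-plot range (max - min) across models; rows = plots, columns = models."""
    pred = np.asarray(pred, dtype=float)
    return pred.max(axis=1) - pred.min(axis=1)

# Hypothetical predictions in t/ha for 2 plots and 3 models
pred_t_ha = np.array([[410.0, 432.5, 455.1],
                      [380.2, 395.0, 470.3]])
ranges = prediction_range(pred_t_ha)
print(np.round(ranges, 1))
```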
Table 3. Comparison of models: aboveground biomass means and interquartile uncertainty of the predictions of aboveground biomass (IQ-I) evaluated on 19 inventory plots, by model and by group of models; Student-Newman-Keuls mean comparison test, where means followed by the same letter (A to E) are not significantly different between models.
3.4. Propagation Error Analysis
The prediction error propagation was analyzed with the interquartile values IQ of the aboveground biomass Monte Carlo iterations. The analysis of variance showed a difference between group 3 and the two other groups, which do not differ from each other. Group 3 is characterized by the highest uncertainty value, 93.6 against 85.7 and 80.7 for groups 1 and 2, respectively (Table 3). The comparison of the 16 models shows that the additive-effect models of groups 1 and 2 form a homogeneous group with Equation (m) to Equation (p) of group 3 and differ from the other models, as presented in Table 3. The greatest uncertainty values, more than 100.0 t/ha, are obtained with four models, Equation (a) and Equation (j) to Equation (l), while the lowest one is obtained with Equation (h), with 75.4 t/ha, as shown in Figure 4.
In spite of the homogeneity of group 2, Equation (e), Equation (f) and Equation (g), which are characterized by non-additive effects of the predictor variables, show high uncertainty. Compared with the goodness-of-fit criteria, it appears that the best models (Equation (h) and Equation (i)) are characterized by low uncertainty, whereas the badly adjusted models (Equation (a) and Equation (l)) have the highest uncertainties.
4. Discussion and Conclusion
Mathematical functions that explain the growth of a plant (Niklas, 1994; Kaitaniemi, 2004; Pilli et al., 2006) are applied for modeling aboveground biomass.
Figure 4. Boxplot of aboveground biomass interquartile of the prediction uncertainty of the sixteen models with Monte Carlo iterations.
It is shown through this study that models of group 3 differ from the other models by the fact that the power of the logarithm of diameter was used as predictor variable.
However, the worst model is Equation (a), whose predictor variable is the product of diameter and wood density. When the predictor variables are used with additive effects, Equation (b) becomes the best model with two predictors and can be expressed as a power function. The estimated models show that the allometric exponent of the wood density as predictor variable is equal to 1. This value is consistent with the results of several studies (Fayolle et al., 2013; Ngomanda et al., 2014; Chave et al., 2005, 2014). Indeed, according to Franceschini et al. (2016), the allometric exponent can be interpreted in terms of relative growth rates, and this kind of growth rate cannot be applied to the wood density. Under these conditions, one can reasonably admit that the ideal model is summarized, for each tree i, as $AGB_i = \alpha\,\rho_i\,D_i^{\beta_1} H_i^{\beta_2}$. This result is consistent with that of Pilli et al. (2006), who expressed the allometric coefficient as the product of a constant value (scalar) and the wood density of the tree. This can explain the poor quality of Equation (a), with an exponent value of 2.205, and consequently its aboveground biomass prediction value is the highest one, with the highest uncertainty. Further research should better consider the exponent of the wood density variable in allometric equations. The exponent values of the diameter are between 1.85 and 2.42. Zianis and Mencuccini (2004), using a list of 279 biomass allometric equations, showed that this value should rather be close to 2.36.
Many studies have highlighted the importance of tree height as a predictor variable in the aboveground biomass equation (Chave et al., 2014; Djomo et al., 2016). This study confirms the adjustment and prediction qualities of models with height, but the additive effect of each predictor variable must be taken into account. Using the same allometric exponent for two or three predictor variables (Equation (e), Equation (f) and Equation (g)) is not appropriate for modeling allometric equations of aboveground biomass. The best equations were obtained with the additive effect of each predictor variable, as presented by Equation (h) and Equation (i). This study concludes that, for modeling allometric equations, the power function that characterizes the growth of a plant is the guide for choosing the models to be estimated. Using a single common exponent leads to poor modeling, so the additive effect of each predictor variable must be prioritized.
This study highlights the trend of the model choice error. The highest (481.6 t/ha) and the lowest (404.7 t/ha) aboveground biomass predictions are obtained with the worst models, while those of the best models are intermediate (428.3 to 431.4 t/ha). The predicted biomass of the "best" models is in agreement with the estimated aboveground biomass in the Congo Basin forests as reported by Lewis et al. (2013). Among the additive-effect models, those with two predictor variables predict on average 8 t/ha more aboveground biomass than those with three predictor variables. The models with the highest values of prediction error are those characterized by the worst adjustment. Therefore, the adjustment and model selection criteria are able to anticipate the prediction quality of the best model chosen.
Acknowledgements
This research has received funding from the Global Environment Funds under the World Bank’s grant No. TF010038, sub-component 2b of the COMIFAC Regional REDD+ Project “Establishment of allometric equations for the Congo Basin forests”, a sub-component implemented by the ONFi/TEREA/Nature + consortium. We appreciate the valuable contribution of Adeline Fayolle.