Predicting Cork Oak Suitability in Maâmora Forest Using Random Forest Algorithm ()
1. Introduction
Cork oak forests cover 350,000 ha in Morocco. This is about 15% of world’s cork oak area. However, it contributes only for 4% to 6% in the world’s cork production [1] . Many factors are involved for explaining this situation (social pressure, intensive harvesting, climate) and regressive trends of these forests have been described [2] .
Maâmora as example is considered one of the largest cork oak forests in the world; it covers 133,000 ha. Unfortunately, cork oak covers only 70,383 ha. Since 1940, Maâmora has been managed according to four management plans. These plans revealed a cover regression trend which has been estimated at 30% of the cork oak area [2] . Therefore, since 1950, overgrazing and acorns picking has been considered a major obstacles to cork oak regeneration [3] , since then regeneration problems still prevail and overgrazing rate is currently estimated at 81.57% [2] .
To deal with this alarming situation of non-regeneration, forest managers have undertaken huge efforts for studying and regenerating cork oak. Results have been qualified as random [4] . Consequently, administration maintained and continued rehabilitation of cork oak. Thus, 20,000 ha have been planted under the scope of 2005-2014 program.
Cork oak regeneration studies in Maâmora pointed out that water balance (climate and soil factors) associated with the high anthropogenic pressure are the main factors which determine the success of planted areas. Moreover, previous experiments showed that initial density, soil type and waterlogging have a strong influence on the ability of the strains to reject. Further, Maâmora’s soils consist on a sand layer resting above a clay layer. Thus, sand depth and slope of underground clay floor seems to be crucial for cork oak seedlings success [4] [5] . Besides, rains follow a decreasing gradient from west to east by the distance to ocean and the location of the stands.
Despite that soil factors have been well documented in the Maâmora forest, no soils clay layer depth/slope or a soil suitability to cork oak regeneration maps are available for all the Maâmora. This could explain same cases of regeneration failure.
Habitat suitability models aim predicting lands which fit well for a species according to specified requirements, preferences, or predictors [6] . As suitability analyzes the interactions among three types of factors: location, development actions, and environmental elements [6] , they constitute a good tools for supporting decision making which improve the understanding of species habitat relationship in space and time and predict their likelihood of occurrence and abundance or performance based on habitat attributes affecting survival, growth and reproduction of considered species [7] [8] .
Habitat suitability models can be classified according to the approach and to the classification function used [9] . Nowadays, predictive techniques have been improved with a direct consequence on models quality and credibility [10] . Further, machine learning (ML) methods become promising tools for assessing suitability.
ML is nonparametric techniques for regression or classification which present some advantages compared to other statistical methods. They are able to: 1) deal with complex relationships between predictors; 2) process nonlinear relationships between predictors; 3) deal with large amounts of data; 4) process complex and noisy data and 5) don’t need any assumptions related to variable distribution [11] .
Among machine learning methods, many methods have been used for suitability assessment [12] such as classification and regression trees [13] , artificial neural networks [14] , heuristic methods such as genetic and simulated annealing algorithms [15] [16] , and Random Forests [17] . Several studies showed that Random Forest (RF) models, often reach top predictive performances compared to other methodologies [10] [18] [19] .
As summarized by Cutler et al. (2007), RF applications include: 1) regression; 2) survival analysis; 3) imputation of missing values; 4) clustering, multidimensional scaling and detecting general multivariate structure through the unsupervised learning; and 5) classification.
In this work, we used RF as a classification method and to explore relationship between observed cork oak plantation’s success rate and some environmental factors judged important. Our paper is organized as follows: 1) part 1: materials and methods which consists on a brief description of the study area, factors considered and modelling framework including a general description of the modeling procedures, the random Forest model and predicted maps; 2) part 2: results will illustrate the application of RF to Maâmora forest to obtain estimation of potential plantation success rate and discuss the results.
2. Materiel and Methods
2.1. Study Area
Maâmora is located in the north of Morocco (Figure 1). The study area precipitations range from 400 mm in the
Figure 1. Localisation of the study area (a) in Morocco; (b) main tree species in the Maâmora forest.
East to 600 mm in the West 50 km far from the East. Bioclimatic conditions range from subhumid in the Western part to the semi-arid in the Central and Eastern part. Soils encountered consist of a sand layer resting on a clay layer; the transition between soil layers can be gross to net [5] . Further, sand thickness range between 30 cm and 6 m.
Maâmora offer a multitude of goods and services for administration and local communities. Thus, it produces per year 60,000 cubic meters of cork, 100,000 cubic meters of fuel wood from Eucalyptus, 21 million feed units, and incomes exceeding 70 million dirhams (1 USD = 10.4 dirhams). Also, many other services are offered to communities: recreation for 4 neighbor cities, hunting, soil protection and recharging the groundwater. Further, Maâmora’s non wood incomes are estimated to US$440/ha/year [20] .
2.2. Factors Involved
Variables used as predictors were climatic, topographic, soil and social. The topographic variable considered is altitude which was derived from digital elevation model SRTM DEM. Soil variables considered are kind, sand depth and slope of the clay layer. Maâmora’s soil map has been used [21] . Maps of sand thickness and clay layer have been used form a previous work [22] .
Climatic variables are: annual precipitation, minimum average temperature of the coldest month, maximum average temperature of the warmest month and Emberger quotient Q2 [23] . Climatic variables have been extracted from World climatic database [24] .
Social factor has been modeled as distance from rural area (douar) to the forest stands. Since, anthropogenic pressure is higher around local communities that abuse for uses rights.
As response variable, success rates grades have been considered. Thus, all planted perimeters since 1970 have been mapped (Figure 2).
2.3. Modelling Framework
2.3.1. Modelling Procedure
Land suitability modelling process consists of four phases: model selection, training, prediction and final map production [10] . In the scope of our work, modelling procedure is described in Figure 3. Further, model selection and validation are based on training RF on the available data (historical regeneration maps on Figure 2). Thus, 2500 sample cell within regeneration areas have been chosen randomly. This data set has been randomly divided to 2 data sets with a respective ratio of 4/5 and 1/5: first part is the training set (Train DS) and the second one is the evaluation set (validation DS). On each cell of the two data sets, environmental and social factors considered as predictors have been extracted with their corresponding observed regeneration success rate (five
Figure 2. Map of areas previously planted by cork oak. Numbers from 1 to 5 refers to observed success rates: 0% - 20%, 20% - 40%, 40% - 60%, 60% - 80%, and 80% - 100%.
classes ranging from failure to success have been identified and numbered from 1 to 5: 0% - 20%, 20% - 40%, 40% - 60%, 60% - 80%, and 80% - 100%). Then, the first data set was used to develop the RF model. The second data set was used to evaluate the quality of model and to compare predicted and observed regeneration success rate.
2.3.2. Used Software
The process was mainly implemented within the R open source software environment for data analysis and statistical computing [25] . Many R packages have been used: “rgdal” for reading GIS data files, “sp” for spatial data handling, “raster” for raster and terrain analysis, “gstat” for kriging, “randomForest” [26] which implement the RF algorithm.
2.3.3. Random Forest Algorithm
RF implements an automatic combination of tree predictors and the base model is achieved through training on different bootstrap replicates of the data [27] . Each tree is trained by selecting a random set of variables and a random sample from the training dataset. As consequence to this feature, the problem of correlated variables will alleviate because they may be extracted in turn [10] .
RF needs three parameters to be defined: “ntree” (number of bootstrap samples for the original data), “mtry” (number of different predictors tested at each node), “nodesize” (minimal size of the terminal nodes of the trees). According to many RF algorithm users, it is often convenient to improve the quality of the model by choosing an optimal value for “mtry”. Liaw and Wiener (2002) proposed a default value of one third of the predictors number.
Moreover, the “ranfomForest” package within R has a tuneRF function that is specifically for searching for the “optimal” value for “mtry”. Also, it produces two useful plots: variable importance (measured as the deterioration of the predictive ability of the model when each predictor is replaced by a random noise) and partial plots (measures the marginal effect of predictor in RF estimates of success rate). In addition, it computes the OOB (“out of bag”) error estimate, which is computed for each tree over the data remaining out of the corresponding bootstrap sample, and then averaged [27] .
2.3.4. Predictive Maps
Once the RF model has been calibrated and validated, it was applied to the entire Maâmora to appreciate the suitability to cork oak and to estimate the predicted success rate of cork oak plantation.
During the validation process, confusion matrix has been used to measure the quality of classification results obtained for the independent dataset (validation dataset). Further, the Kappa coefficient assessed the quality of classification results observed during previous plantation and those predicted by the RF by using a per-pixel approach.
3. Results
The out-of-bag estimates of the error rate (ERROOB) were used to select the optimum Random Forest parameters. For the calibration dataset (CD), the Random Forest was able to classify the samples with an out-of-bag estimate of error rate less than 10% (OOB error = 8.85%).
Figure 4 shows the ranking of predictors by their importance measured as the increased mean square error (%IncNodePurity) representing the deterioration of the predictive ability of the model when each predictor is replaced in turn by a random noise. Higher %IncNodePurity indicates greater variable importance. Only four of the predictors contributed noticeably to the estimation of suitability grade, namely precipitation (Prec), elevation (Alt), temperature (m & M) and Emberger’s quotient (Q2).
Moreover, Figure 5 shows partial plots which represent the marginal effect of a single factor included in the RF model on estimates of cork oak plantation suitability. In these partial plots of marginal effects, only the range of values can be compared between plots of different variables.
The RF model provided a good prediction of success rate grade for regeneration in the validation dataset. Per class OOB errors ranges from 7.3% for the fourth classes to 16% for the third class. Further, using a per-pixel approach for classification quality assessment (observed success rates from previous plantation vs. predicted ones) through a kappa index gives a value of 0.827 which mean an almost perfect agreement [28] .
The RF model application to the whole Maâmora is shown in Figure 6. The most suitable area (4th and 5th classes: with success rates greater than 60%) are located on the western part of the forest. While the eastern part represent the lowest predictions scores of suitability. This can be explained by continentally and the rainfall gradient.
The RF algorithm used does not require any assumptions related to normality of the variables and can deal with nonlinear relationships. These features may have a substantial utility within the context of forest management and could be used to model distribution shifts resulting from climate change. Moreover it constitutes a
Figure 4. Variable importance plot generated by the random forest algorithm included within the “randomForest” package for R software.
Figure 5. Partial plots generated by the RF included within the “randomForest” package for R software. It represents the marginal effect of single variables included in the model on estimates of suitability level while averaging out the effect of all the other variables.
Figure 6. Predicted cork oak plantation suitability map for Maâmora.
promoting approach [7] [10] . Results obtained with the random forest method for predicting suitability maps are very encouraging; they can predict the success rate grades of regeneration. Further, compared to other approaches: LepoutreAbac [5] or suitability using AHP [22] , RF presents a highest predictive accuracy.
The RF variable importance measure indicated precipitation and altitude as important factors in the modelling (Figure 5). Even if precipitation is expected because of the Mediterranean climate of the Maâmora Forest, it was not expected that the altitude is also important as other factors.
When comparing the observed success rates from previous plantations with the predicted ones by RF model, the kappa statistic used indicate a good accuracy of the RF model.
As stated by Benito Garzón et al. (2006), biological validation and interpretation of results obtained by modelling process are very important. Thus, our results are encouraging because they agree with the main bibliographic conclusions [4] [5] [22] .
As compared to the larger previous cork oak distribution in the Maâmora Forest, what can be observed now is the result of historical and social dynamic on which area have been reduced mainly by intense anthropic activity (overgrazing, shifting species, etc.). Furthermore, Plantation success rates are not only affected by climatic, topographic, soil and social factors but with other factors. Moreover, seedling production and installation (seeds origin/quality, seedling production process, soil preparation and improvement, etc…), anthropogenic pressure, overgrazing and global warming issues, among others, have probably determined the current observed success rates. Models results should therefore be interpreted with full knowledge of these inevitable limitations. Assuming these limitations, the modelling framework may offer a good tool for ensuring sustainability of Maâmora forest, hence giving to the managers possibilities for planning regeneration and rehabilitation activities.
4. Conclusions
Succeeding Quercus suber plantation is a challenging operation for Maâmora forest managers. Indeed, suitability map remains a valuable tool which is very helpful to forest managers for planning forest regeneration and ensuring sustainability. The main purpose of this article is to produce this map.
The use of random forest algorithm made it possible to achieve the overall objective of this work. The produced map enables managers to choose the most suitable area to be planted and to guarantee the success of their regenerations efforts. Furthermore, it ensures suitability of the forest. The resulting map helps managers to make more rational decisions related to cork oak regeneration and to ensure the sustainability of Maâmora forest.
Acknowledgements
The research for this paper was financially supported by the wallonie bruxelle international, within the scope of the project “Comportement des plantations du chêne liège de la forêt de Maâmora, Maroc”.
NOTES
*Using random forest algorithm for assessing cork oak suitability in Maâmora-Morocco.