Gap Filling of Net Ecosystem CO2 Exchange (NEE) above Rain-Fed Maize Using Artificial Neural Networks (ANNs)

The eddy covariance technique is an accurate and direct tool to measure the Net Ecosystem Exchange (NEE) of carbon dioxide. However, conditions are sometimes not amenable to measurement with this technique, so different methods have been developed for gap-filling and quality assessment of eddy covariance data sets. In this study, two Artificial Neural Network (ANN) approaches, the Multi-layer Perceptron (MLP) trained by the Back-Propagation (BP) algorithm and the Radial Basis Function (RBF), were first used to fill missing NEE data measured above rain-fed maize at the University of Nebraska-Lincoln Agricultural Research and Development Center near Mead, Nebraska. The gap-filled data were then compared, using different statistical indices, with gap-filled data obtained with the technique suggested by Suyker and Verma in 2005 and with Papale's ANN approach; throughout the comparison, the structure of the RBF and MLP (BP) networks was held constant. Data analysis indicated that Papale's approach gave better fits than the RBF and MLP (BP) methods. Thus, based on this work, Papale's approach is the best method to estimate the missing data, though the statistical indices used for model evaluation show little difference between Papale's approach and the RBF and MLP (BP) methods.


Introduction
The eddy covariance technique is one of the most accurate and direct tools for measuring the carbon exchange between the surface and the atmosphere in various ecosystems.
Eddy covariance data sets, however, often suffer from quality problems as a result of instrument malfunctions, power outages, extreme weather conditions, and insufficient turbulent mixing [1] [2] [3]. The annual average of missing values has been reported to range from 30% to 45% [2] and even as high as 65% [4].
Stable conditions and low turbulence are factors that result in under-estimation of daily NEE. Since these conditions often occur at night, the Net Ecosystem Exchange (NEE) values are removed when the measured friction velocity (u*) is below a threshold value [5].
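This screening step can be sketched as follows; the 0.25 m s^-1 threshold and the sample values here are hypothetical illustrations, not values from this study:

```python
# Illustrative u*-threshold screening: NEE records measured under weak
# turbulence (u* below a chosen threshold) are replaced with NaN "gaps".
import numpy as np

def screen_nee(nee, ustar, ustar_threshold=0.25):
    """Return a copy of nee with low-turbulence records set to NaN."""
    nee = np.asarray(nee, dtype=float).copy()
    nee[np.asarray(ustar, dtype=float) < ustar_threshold] = np.nan
    return nee

nee = [-2.1, -1.8, 0.9, 1.2]        # hypothetical hourly NEE values
ustar = [0.40, 0.10, 0.30, 0.05]    # corresponding friction velocities
screened = screen_nee(nee, ustar)   # gaps where u* < 0.25
```

The gaps produced this way are exactly the values a gap-filling model is later asked to estimate.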
To fill missing data, various methods are applied, such as non-linear regressions, the dual unscented Kalman filter, semi-parametric models, terrestrial biosphere models, and artificial neural networks [1]-[6]. The comparison of several approaches by [6] indicated that Artificial Neural Networks (ANNs) have some advantages, such as performing without the underlying assumptions required by mechanistic models and providing good estimates in different ecosystems.
Although intensive efforts have already been devoted to gap-filling methods, more research is needed to establish the accuracy and reliability of these techniques with respect to variability in plant type, climate, and, particularly, the quality and quantity of the data set.
In this study, we first investigated the performance of two different ANN methods, the Multi-layer Perceptron (MLP) trained by the Back-propagation (BP) algorithm and the Radial Basis Function (RBF), for filling missing NEE data for rain-fed maize. Then, the data gap-filled with these methods were compared with the technique suggested by Verma & Suyker (V&S) [7] and with the ANN method presented by [8]. The Verma & Suyker method has been accepted as the basic approach for the AmeriFlux sites at Mead, Nebraska, and Papale's approach has recently been utilized for filling data gaps by scientists in the AmeriFlux community.

1) Artificial neural networks:
Artificial Neural Networks (ANNs) are dynamic systems that can compute and identify the relationships among input and output data to obtain estimates with the best approximation [9]. ANNs are composed of simple interconnected processing elements (PEs), called neurons, which operate in parallel. They endeavor to model the neuro-synaptic structure of the brain [9]. An artificial neuron, like its biological counterpart, comprises input vectors (P), analogous to dendrites; weight vectors (w), similar to synaptic junctions; a bias vector (b), which represents the least stimulation threshold; an activation function (f), which mimics the soma; and an output vector (a), analogous to the axon (Figure 1).
Today, ANNs are powerful tools that can approximate many complex phenomena. They can be trained to perform pattern recognition, identification, classification, and control tasks. ANNs are particularly suited to filling data gaps for NEE flux using other sources of data [2] [6] [10].
2) Multi-layer perceptron network: Multi-layer Perceptron (MLP) networks are feed-forward neural networks containing an input layer, one or more hidden layers, and an output layer. An MLP is a fully connected network (Figure 2); in other words, each neuron in each layer is entirely connected to the neurons in the previous and next layers. The weight and bias parameters are adjusted by the activation function and the learning rule. The activation functions in the hidden layers are usually hyperbolic tangent or sigmoid, whereas in the output layer the application of a linear function is recommended. The learning rule is a procedure by which the network parameters are adjusted in order to obtain a specific function that maps the input vector to the output vector.
The best-established systematic method for training MLP networks is Steepest Descent Back-propagation, organized in two main paths, forward and backward (Safa et al., 2011). In the forward path, the input vector is applied to the MLP network and its effects are distributed, via the hidden layers, onto the output layer without any change in the network parameters. In the backward path, by contrast, all network parameters are adjusted by the error-correction rule. An error vector is formed in the output layer as the difference between the actual (observed) and estimated values, and the error values are then propagated backward through the entire network. This process continues until the error becomes as low as possible and the network parameters stabilize.
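A minimal numpy sketch of these two paths follows. The layer sizes, learning rate, epoch count, and toy data are illustrative assumptions, not the configuration used in this study:

```python
# Tiny MLP trained by steepest-descent back-propagation:
# one tanh hidden layer, linear output, full-batch gradient steps.
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, n_hidden=8, eta=0.05, epochs=500):
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, 1));    b2 = np.zeros(1)
    for _ in range(epochs):
        # forward path: signals flow through; parameters are unchanged
        h = np.tanh(X @ W1 + b1)           # hidden layer (tanh)
        out = h @ W2 + b2                  # output layer (linear)
        err = out - y                      # error vector at the output layer
        # backward path: distribute the error and adjust all parameters
        dW2 = h.T @ err / len(X)
        dh = (err @ W2.T) * (1 - h ** 2)   # tanh derivative
        dW1 = X.T @ dh / len(X)
        W2 -= eta * dW2; b2 -= eta * err.mean(0)
        W1 -= eta * dW1; b1 -= eta * dh.mean(0)
    return W1, b1, W2, b2

def predict(X, params):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

X = rng.uniform(-1, 1, (200, 2))
y = X[:, :1] * X[:, 1:] + 0.5              # toy nonlinear target
params = train_mlp(X, y)
mse = float(np.mean((predict(X, params) - y) ** 2))
```

After training, the fitted network should explain more of the target's variance than a constant (mean) predictor would.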
Equations (1) and (2), respectively, show the adjustment process in the output and hidden layers (Kumar et al., 2002):

w_qj(N+1) = w_qj(N) + η δ_q φ_j (1)

w_ji(N+1) = w_ji(N) + η δ_j χ_i (2)

where w is a weight, N the iteration number, χ an input value, η the learning rate, φ an output, and δ_q is defined as ∂ε_q²/∂I_q, with I the sum of the weighted inputs, q the neuron index of the output layer, and ε_q the error signal. The learning rate (η) determines the step size on the activation function at each repetition of the network-parameter optimization. If lower values of the learning rate are selected, the changes in the network parameters will be smaller after each repetition, which smooths the movement of the parameters toward their optimum values but slows down the learning process.
Conversely, when the learning rate is increased, the learning speed increases too, but large changes are made from one repetition to the next, which occasionally brings instability to the network, generally referred to as divergence of the network parameters [9].
The momentum parameter can help to avoid this problem. Momentum is defined as the amount of inertia added to each network parameter; it is utilized to improve the learning rate and prevent instability in the network [11]. To modify the Back-propagation algorithm with a momentum term, the change of a weight parameter is computed as follows [12]:

Δw(N+1) = η δ χ + µ Δw(N)

where µ is the momentum coefficient and Δw(N) is the previous weight change. Another network parameter is the epoch: one entire set of inputs presented to the network during each learning cycle is called an epoch.
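The momentum update can be sketched in a few lines; the learning rate, momentum coefficient, and constant gradient below are illustrative values only:

```python
# Momentum-modified weight update: the new change of weight combines the
# gradient step with a fraction (mu) of the previous change, adding inertia.
def momentum_step(grad, prev_delta, eta=0.1, mu=0.9):
    """Delta_w(N+1) = -eta * grad + mu * Delta_w(N)."""
    return -eta * grad + mu * prev_delta

delta = 0.0
for grad in [1.0, 1.0, 1.0]:   # a constant gradient: the steps accumulate
    delta = momentum_step(grad, delta)
```

With a constant gradient the step grows each iteration (inertia), while an oscillating gradient would be damped, which is why momentum smooths the parameter path.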

3) Radial basis function networks:
A Radial Basis Function (RBF) network is a fully interconnected feed-forward structure composed of three layers (input, hidden, and output). The activation function in the hidden neurons is a Gaussian exponential, whereas in the output layer it is a linear function. The Gaussian function has two main parameters, the center (c) and the width (σ). Each neuron in the hidden layer computes the Euclidean (radial) distance ‖x − c‖, where x is the input vector and c is the prototype vector representing the center of the RBF.
The output value of the hidden layer (Figure 3) is computed by:

R_j^p = exp(−‖x^p − c_j‖² / (2σ_j²))

and the output of the output layer by:

y_m^p = Σ_j θ_mj h_j^p

where N, L, and M are the numbers of units in the input, hidden, and output layers, respectively; P is the number of input patterns; R_j^p is the hidden-layer activation function; and h_j^p is its normalized form, h_j^p = R_j^p / Σ_k R_k^p.
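The forward computation of an RBF network can be sketched as follows; the centers, widths, and output weights are arbitrary illustrative values, not quantities from this study:

```python
# RBF forward pass: Gaussian hidden activations R_j from radial distances,
# normalized to h_j, then a linear output layer with weights theta.
import numpy as np

def rbf_forward(x, centers, widths, theta):
    d = np.linalg.norm(x - centers, axis=1)       # Euclidean distance ||x - c_j||
    R = np.exp(-(d ** 2) / (2 * widths ** 2))     # Gaussian activations
    h = R / R.sum()                               # normalized hidden outputs
    return float(h @ theta)                       # linear output layer

centers = np.array([[0.0, 0.0], [1.0, 1.0]])      # two hidden units
widths = np.array([0.5, 0.5])
theta = np.array([1.0, -1.0])
y_out = rbf_forward(np.array([0.0, 0.0]), centers, widths, theta)
```

An input sitting on a center activates that unit almost exclusively, so the output is dominated by that unit's weight.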
RBF networks are trained using a combination of unsupervised and supervised learning [13] [14] [15] [16] [17]. In the unsupervised phase, the centers and widths of the RBFs are determined. First, the centers c_j of the basis functions are found with the K-means algorithm: according to their similarity, the input data are clustered into a finite number of groups, with the first L data points selected as initial values for the cluster centers. Each sample x^p is then assigned to the cluster with the nearest center, each center c_j is recomputed from the M_j sample nodes in its subset, and the procedure is reiterated until a stable cluster formation is obtained. Second, the width σ_j of each basis function is determined using a P-nearest-neighbor method, expressed as the mean distance between the center of the basis function and the sample nodes in its subset. In the supervised phase, the weights θ_j between the hidden-layer neurons and the output-layer neurons are determined according to the least-squares principle.
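The unsupervised phase can be sketched as below. The toy points are illustrative, and the width rule shown (mean distance from each center to its P nearest neighboring centers) is one common P-nearest-neighbor variant, assumed here since the paper's exact formula is not given:

```python
# Unsupervised phase of RBF training: K-means places the centers c_j;
# widths sigma_j come from a P-nearest-neighbor distance rule.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # initial centers
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):                     # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def widths_p_nearest(centers, P=1):
    d = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    d.sort(axis=1)                      # column 0 is the zero self-distance
    return d[:, 1:P + 1].mean(axis=1)   # mean distance to P nearest centers

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # two clusters
centers = kmeans(X, 2)
sigmas = widths_p_nearest(centers, P=1)
```

With two well-separated clusters, the centers converge to the cluster means and each width equals the distance between the two centers.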
E_s = Σ_p Σ_k (y_k^p − ŷ_k^p)²

where E_s, a function of θ_j, is the error obtained from the least-squares principle, y_k^p is the observed output, and ŷ_k^p the estimated output.
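The supervised phase reduces to a linear least-squares problem once the hidden activations are fixed. The sketch below uses arbitrary illustrative numbers for the hidden-layer outputs and targets:

```python
# Supervised phase of RBF training: with the hidden activations H fixed
# (rows = patterns), solve for output weights theta by minimizing
# E_s = sum_p (y_p - H_p . theta)^2 via linear least squares.
import numpy as np

H = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.8, 0.5]])            # hidden-layer outputs for 3 patterns
y = np.array([1.0, -1.0, 0.4])        # observed outputs
theta, *_ = np.linalg.lstsq(H, y, rcond=None)
residual = float(np.sum((H @ theta - y) ** 2))
```

Because this step is a closed-form linear solve rather than iterative gradient descent, it is one reason RBF networks train faster than BP networks.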
RBF networks have some advantages and disadvantages in comparison with Back-propagation (BP) networks. For instance, they train faster than BP networks and are less sensitive to local minima [10] [18] [19].
One of the major disadvantages of RBF networks is the pre-selection of input data in the hidden neurons, due to the initial learning method during the unsupervised clustering phase, which leads to the loss of some information contributing to the output neurons [10].

Hourly data for twelve input elements, accompanied by hourly NEE as the output, were used in the data matrix architecture (Table 1).
In the matrix, the rows were the calendar years and the columns were the input elements plus the output values. After constructing the data matrix, one year of data (20%) was extracted to make a test file and the rest (80%) were assigned to the learning file. The training procedures with MLP (BP) and RBF, using the initial network parameters for the learning and test files, were then initiated and continued until the best network parameters were achieved based on the model evaluation methods. The values of the network parameters used are listed in Table 2. There is no remarkable difference among the network parameters except for the number of processing elements (PEs) in the hidden layer of the RBF network, which is much greater than in the MLP network; this is related to the fundamentals of RBF networks.
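The year-wise split described above can be sketched as follows; the year labels and the toy matrix are illustrative (the actual matrix holds hourly records of twelve input elements plus NEE):

```python
# Hold out one calendar year (~20%) of the data matrix as the test file;
# the remaining years form the learning file.
import numpy as np

years = np.array([2001, 2001, 2002, 2002, 2003,
                  2003, 2004, 2004, 2005, 2005])       # year label per row
data = np.arange(len(years) * 3, dtype=float).reshape(len(years), 3)

test_year = 2005
test_file = data[years == test_year]    # held-out year for evaluation
learn_file = data[years != test_year]   # remaining years for training
```

Holding out a whole year (rather than random rows) tests whether the network generalizes across growing seasons instead of merely interpolating within one.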
The quality control of the NEE values for the selected time period in 2005 using Verma & Suyker's approach identified 468 observed (screened) values that needed to be filled with new data; this dataset formed the validation file. Table 3 shows the model evaluation according to the statistical indices computed for the validation file. Although there is not a big difference among the index values, the minimum RMSE was found for the RBF network. The MAE and e_max (|maximum error value − minimum error value|) also take their smallest values for the RBF method. Schmidt [10] confirmed that RBF networks are able to reproduce missing NEE values better than MLP (BP) networks. In addition, the results show that the estimation of the observed (screened) values with Verma & Suyker's approach is more accurate than that of the MLP network. Thus, the RBF network is evaluated as the best model for gap-filling the NEE data.

Figures 4(a)-(c) show that there is no specific pattern between the estimated and observed (screened) NEE values; in other words, a significant difference is observed between the estimated and observed (screened) values. However, Figure 4(b) illustrates that the statistical dispersion of the NEE values for the RBF network is smaller than for the other models. Figures 5(a)-(c) depict the dispersion of the differences between the estimated and observed (screened) NEE (error) values. They indicate no remarkable difference among the three applied models, though e_max takes its lowest value for the RBF network (Table 3). Figure 7 shows that all of the gap-filled values belong to nighttime. Furthermore, a notable difference is seen between the estimated and observed (screened) values.
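The statistical indices used for this evaluation can be sketched as follows; the sample arrays are illustrative, and e_max is computed as |maximum error value − minimum error value|, following the definition given in the text:

```python
# Model-evaluation indices for gap-filled NEE: RMSE, MAE, and e_max
# (the range of the error values, |max error - min error|).
import numpy as np

def evaluate(obs, est):
    err = np.asarray(est, dtype=float) - np.asarray(obs, dtype=float)
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    e_max = float(abs(err.max() - err.min()))
    return rmse, mae, e_max

obs = [-3.0, -1.0, 0.5, 2.0]   # observed (screened) NEE, illustrative
est = [-2.5, -1.2, 0.0, 2.2]   # model estimates, illustrative
rmse, mae, e_max = evaluate(obs, est)
```

RMSE penalizes large individual errors more heavily than MAE, while e_max summarizes the spread of the error distribution, which is why the three indices can rank models slightly differently.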
One of the major advantages of the gap-filling procedures using RBF and MLP is that a single computation method is applied throughout the year, whereas Verma & Suyker's gap-filling procedures are classified into three time periods, each with a different computation method for the missing values. The study by [6] showed a low annual-sum bias for gap-filling with an ANN compared with other methods such as regression techniques.
Overall, this study suggests that the RBF method provides more accurate gap-filled values than Verma & Suyker's gap-filling procedures.
2) Results of the comparison of NEE flux estimation using the MLP (BP), RBF, and Papale methods: The sizes of the data matrices and the network parameters are listed in Table 4. Since the arrangement of the input parameters for both applied networks in sections (a & b) is similar, the results indicate that there is no remarkable difference between the sizes of the data matrices and the network parameters. In other words, the responses of the MLP (BP) and RBF methods to the two distinctive data matrices used in the two sections are almost the same.
The quality control of the NEE values for the selected time period in 2005 using Papale's approach identified 295 observed (screened) values that needed to be filled with new data; this dataset formed the validation file. The model evaluation results according to the error analysis for the validation file are shown in Table 5. The smallest values of RMSE, MAE, and e_max belong to Papale's approach. Table 5 also demonstrates that the evaluation indices indicate a better ability of the RBF network to gap-fill the data in comparison with MLP (BP). Thus, according to the evaluation indices, Papale's model appears to be the best approach, although the observed differences among the applied methods are very small. The comparison of fifteen gap-filling techniques for NEE by [6] showed that two ANNs, the feed-forward back-propagation neural network applied by [8] and the stochastic Bayesian network used by [22], had the best gap-filling performance in different ecosystems.
The relation between the observed (screened) and estimated data is shown in Figures 8(a)-(c). A stronger relationship, based on the pattern of NEE change, is observed in Figure 8 than in Figure 4. The reason must be sought in the quality and quantity of the data in the test and validation files.
The differences between the observed (screened) and estimated NEE are depicted in Figures 9(a)-(c). The lowest dispersion belongs to Papale's method in Figure 9(c). At the same time, e_max takes its smallest value for this method, although the e_max for the RBF network was very close to it. Figure 10(b) indicates that the estimated results using the RBF method are smoother than the others. The characteristics of the data in the test and validation files have resulted in distinct differences between Figure 10 and Figure 6.

There may be several reasons that favor Papale's approach over our methods, such as the input variables and the network characteristics. Regarding the network characteristics, dissimilarities in network architecture and parameters are influential factors in model performance, though no information about the network architecture and parameters is given in the references.

Conclusions
The eddy covariance method is one of the best approaches to measure net ecosystem exchange flux above plant canopies. However, data collection using this method is prone to gaps, and today several techniques are used to fill the missing data. In this study, the results confirmed that the RBF network achieved the best fit to the observed (screened) values based on the statistical tests. Moreover, using a single method for gap-filling over a year was an advantage of the application of Artificial Neural Networks.