Fermentation process modeling of exopolysaccharide using neural networks and fuzzy systems with entropy criterion

The prediction accuracy and generalization of fermentation process models for exopolysaccharide (EPS) production from Lactobacillus are often degraded by noise in the experimental data. To circumvent this problem, this study proposes a novel entropy-based criterion as the objective function of several commonly used modeling methods, i.e., the Multi-Layer Perceptron (MLP) network, the Radial Basis Function (RBF) neural network and the Takagi-Sugeno-Kang (TSK) fuzzy system. Unlike the traditional Mean Square Error (MSE) based criterion, the entropy-based criterion trains the parameters of the adopted modeling methods against the whole distribution structure of the training data set, which gives these methods a global approximation capability. Compared with the MSE-criterion, the advantage of the new criterion is that parameter learning can effectively avoid over-fitting, so the resulting modeling methods have much better generalization ability and robustness. Our experimental results confirm these virtues of the proposed entropy-criterion based modeling methods.


INTRODUCTION
Polysaccharides are produced by plants, algae and bacteria, and are used in the pharmaceutical, chemical, pesticide and oil exploitation industries. Some microorganisms, such as the lactic acid producers, are known to synthesize exopolysaccharides (EPS), which can be used commercially as food additives and have health-stimulating properties such as immunity stimulation, anti-ulcer activity and cholesterol reduction. However, the fermentation mechanism of EPS is very complex because it involves the growth and reproduction of microorganisms [1]. From the control point of view, the fermentation process is highly non-linear, strongly time-varying and uncertain. Meanwhile, the lack of biosensors and the interaction of coupled parameters also make fermentation process modeling difficult [2]. In the last decade, artificial neural networks (ANNs) have been shown to be able to model nonlinear systems and have been successfully applied in various chemical and biological models [3]. In particular, they have emerged as an attractive tool for predicting and approximating the parameters of fermentation processes [4], and have demonstrated their power in factorial design [5]. Further examples include an ANN-based model for amino acid composition and optimum pH in G/11 xylanase [6], and another ANN-based model for the optimization of fermentation media for exopolysaccharide production from Lactobacillus plantarum [7]. In recent years, researchers in fuzzy systems and/or fuzzy neural networks have paid particular attention to industrial fermentation process modeling [8]. For instance, a fuzzy neural network has been used for dissolved oxygen predictive control of a fermentation process [9], and a Takagi-Sugeno-Kang (TSK) fuzzy system has been used for biochemical variable estimation in a fermentation process [10]. In addition, fuzzy control has been applied to a citric acid fermentation process to maximize the biomass quantities [11].
However, when an MSE-criterion based objective function is used for model parameter learning, the above methods suffer from the so-called over-fitting drawback: MSE-criterion based modeling methods may over-fit each training sample, so that the whole distribution of the training set is erroneously estimated and the generalization ability cannot be assured.
In this study, to overcome the weaknesses mentioned above, a new criterion is proposed as the objective function for fermentation process modeling. This new criterion, called the entropy criterion, is based on probability density estimation for the whole training set and on relative entropy [12]. The proposed criterion is then used in classical Multi-Layer Perceptron (MLP) network modeling, Radial Basis Function (RBF) neural network modeling and Takagi-Sugeno-Kang (TSK) fuzzy system modeling of the EPS fermentation process.

MATERIALS AND METHODS
The data used in this study were taken from reference [7]. This project was conducted in 2004-2006 by Mumbai University of Food Engineering, in Mumbai, India.

Bacterial Strain
The Lactobacillus strain is isolated from the Indian fermented food ragi. This isolate is characterized as Lactobacillus plantarum using biochemical tests.

Fermentation Conditions
The batch fermentation is carried out in a 250 ml shake flask for 24 h at 150 rpm and 35℃. The pH of the fermentation medium is adjusted to 6.5 ± 0.3 by the addition of 1 N NaOH/1 N HCl. At the end of fermentation, the flasks are analyzed for EPS production.

Analysis
The cells are separated by centrifugation (10,000 rpm, 10℃, 15 min) and the crude EPS is precipitated from the broth at 4℃ by the addition of two volumes of cold ethanol (95%). The resulting precipitate is collected by centrifugation and re-dissolved in water. The crude EPS solution is dialyzed at 4℃ to estimate the yield.

MSE-Criterion Based Fermentation Process Modeling
In most current modeling methods, an MSE-criterion based objective function is used for training the model parameters. The MSE-criterion can be formulated as
$$E_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{di} - y_i\right)^2 \qquad (1)$$
where $y_i$ and $y_{di}$ are the predicted and desired outputs for the $i$th sample, respectively, and $N$ is the number of training samples. From Eq. 1, we can see that MSE-criterion based parameter learning is just a local approximation process and does not consider the whole distribution of the training set [13,14]. Thus the generalization and robustness of the model cannot be ensured and over-fitting often occurs, especially when there is noise in the training data.
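The sample-wise nature of the MSE-criterion can be illustrated with a minimal sketch. The function name `mse` and the toy numbers are illustrative assumptions, not data from this study; the sketch only shows how a single corrupted target can dominate the criterion.

```python
def mse(desired, predicted):
    """Mean square error over paired desired and predicted outputs."""
    if len(desired) != len(predicted):
        raise ValueError("desired and predicted must have the same length")
    return sum((d - p) ** 2 for d, p in zip(desired, predicted)) / len(desired)


# Hypothetical toy values: the same predictions scored against clean targets
# and against targets in which one sample is corrupted by noise.
predictions = [1.1, 1.9, 3.0]
clean_error = mse([1.0, 2.0, 3.0], predictions)
noisy_error = mse([1.0, 2.0, 6.0], predictions)
```

Because every squared residual enters the sum directly, a learner that minimizes this quantity is pulled toward the corrupted sample.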

Relative Entropy and Jeffreys-Divergence Entropy
Entropy is a measure of uncertainty in information theory and is a function of the probability density distribution. The concept of relative entropy can be introduced to measure the difference between a certain probability density distribution $f_1(x)$ and a given probability density distribution $f_2(x)$, which may be written as follows [12,15],
$$D(f_1, f_2) = -\int f_1(x)\,\ln\frac{f_1(x)}{f_2(x)}\,dx \qquad (2)$$
where the smaller the value of the relative entropy is, the larger the difference between the two density distributions is. When the two distributions are equal, the relative entropy reaches its maximum (equal to zero). It is well known that relative entropy is additive and non-symmetrical. To obtain a symmetrical measure, the Jeffreys-divergence entropy (J-divergence entropy) can be used. It is also called symmetrical relative entropy and measures the difference between two densities $f_1(x)$ and $f_2(x)$ as
$$D_J(f_1, f_2) = D(f_1, f_2) + D(f_2, f_1) \qquad (3)$$
Based on the above J-divergence entropy, a novel objective function based on the entropy-criterion is presented in the next subsection.
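As a small numerical illustration of these two measures, the sketch below evaluates them for discrete distributions. It assumes the sign convention described in the text (the relative entropy is non-positive and reaches zero when the two distributions coincide); the function names and the example distributions are hypothetical.

```python
import math


def relative_entropy(f1, f2):
    """Relative entropy with the sign convention of the text (maximum is zero).

    Assumes f2 is strictly positive wherever f1 is positive.
    """
    return -sum(p * math.log(p / q) for p, q in zip(f1, f2) if p > 0)


def j_divergence(f1, f2):
    """Symmetrical relative entropy: the sum of both directed measures."""
    return relative_entropy(f1, f2) + relative_entropy(f2, f1)


# Hypothetical two-point distributions.
uniform = [0.5, 0.5]
skewed = [0.9, 0.1]
d_forward = relative_entropy(uniform, skewed)
d_backward = relative_entropy(skewed, uniform)
d_symmetric = j_divergence(uniform, skewed)
```

The two directed values differ, which is the non-symmetry mentioned above, while `j_divergence` returns the same value for either argument order.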

Relative Entropy Based Objective Function
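For a given training sample set $\{(x_i, y_{di}) \mid i = 1, 2, \ldots, N\}$, we can construct two new sets: one contains the sample inputs and the sample outputs, $S_1 = \{z'_i = (x_i, y_{di})\}$, and the other contains the sample inputs and the model predicted outputs, $S_2 = \{z''_i = (x_i, y_i)\}$. For a sample set $\{z_i \mid i = 1, 2, \ldots, N\}$, the probability density can be estimated with the following Parzen window density estimator,
$$f(z) = \frac{1}{N}\sum_{i=1}^{N} G(z - z_i, \sigma^2), \qquad G(u, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{d/2}}\exp\left(-\frac{\lVert u\rVert^2}{2\sigma^2}\right) \qquad (4)$$
where $d$ is the dimension of $z$ and $\sigma$ represents the window width. For a given data set, $\sigma$ is a constant and can be used to effectively estimate the corresponding density distribution. Here it is determined with maximum likelihood estimation (MLE) through the cross-validation (CV) method: the value that yields the largest likelihood is chosen [16].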
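The width selection described above can be sketched as follows. This is a minimal illustration, assuming an isotropic Gaussian window, leave-one-out cross-validation and a hand-picked candidate grid; none of these choices, nor the function names, are confirmed by the text.

```python
import math


def gaussian_kernel(u, sigma):
    """Isotropic Gaussian window evaluated at the difference vector u."""
    d = len(u)
    sq_norm = sum(c * c for c in u)
    norm = (2.0 * math.pi * sigma ** 2) ** (d / 2.0)
    return math.exp(-sq_norm / (2.0 * sigma ** 2)) / norm


def parzen_density(z, samples, sigma):
    """Parzen window estimate of the density at point z."""
    total = sum(
        gaussian_kernel([a - b for a, b in zip(z, s)], sigma) for s in samples
    )
    return total / len(samples)


def loo_log_likelihood(samples, sigma):
    """Leave-one-out log-likelihood of the samples for a given width."""
    total = 0.0
    for i, z in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        # Guard against log(0) when the density underflows.
        total += math.log(max(parzen_density(z, rest, sigma), 1e-300))
    return total


def select_width(samples, candidates):
    """Pick the candidate width with the largest leave-one-out likelihood."""
    return max(candidates, key=lambda s: loo_log_likelihood(samples, s))


# Hypothetical one-dimensional samples forming two tight clusters.
toy_samples = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
best_sigma = select_width(toy_samples, [0.01, 0.05, 0.1, 0.3, 1.0, 5.0])
```

Very small widths are penalized because each held-out point falls far from every remaining window, and very large widths are penalized because the estimate becomes too flat.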
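For the above two data sets $S_1$ and $S_2$, the probability density distribution functions $f_1(z)$ and $f_2(z)$ are estimated with the Parzen window estimator from the samples $z'_i = (x_i, y_{di})$ and $z''_i = (x_i, y_i)$, respectively,
$$f_1(z) = \frac{1}{N}\sum_{i=1}^{N} G(z - z'_i, \sigma^2) \qquad (5)$$
$$f_2(z) = \frac{1}{N}\sum_{i=1}^{N} G(z - z''_i, \sigma^2) \qquad (6)$$
where $G(\cdot, \sigma^2)$ is the Gaussian window with width $\sigma$. By the properties of relative entropy, the bigger the value of the relative entropy is, the smaller the difference between the two probability densities is, as aforementioned. When the relative entropy reaches its maximal value, the two density functions are identical, i.e., $f_1(z) = f_2(z)$. In other words, in this case the predicted output $y_i$ of the model approximates the sample output $y_{di}$ in the training set well. Consequently the novel objective function may be defined as the J-divergence entropy between $f_1$ and $f_2$, which is to be maximized,
$$E_{\mathrm{ent}} = -\int f_1(z)\ln\frac{f_1(z)}{f_2(z)}\,dz - \int f_2(z)\ln\frac{f_2(z)}{f_1(z)}\,dz \qquad (9)$$
From Eq. 4-Eq. 6, we can see that $f(z)$ is obtained by the Parzen window estimator, thus its value ranges from 0 to 1. According to the properties of Taylor's expansion, when $f(z)$ is small, we can keep just the linear part of $\ln f(z)$,
$$\ln f(z) \approx f(z) - 1 \qquad (10)$$
Therefore, substituting Eq. 10 into Eq. 9, we get
$$E_{\mathrm{ent}} \approx -\int \left(f_1(z) - f_2(z)\right)^2 dz = -\int f_1^2(z)\,dz + 2\int f_1(z) f_2(z)\,dz - \int f_2^2(z)\,dz \qquad (11)$$
Note that Erhan and Jose [17] have rigorously derived the following formulas,
$$\int f_1^2(z)\,dz = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G(z'_i - z'_j, 2\sigma^2) \qquad (12)$$
$$\int f_1(z) f_2(z)\,dz = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G(z'_i - z''_j, 2\sigma^2) \qquad (13)$$
$$\int f_2^2(z)\,dz = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G(z''_i - z''_j, 2\sigma^2) \qquad (14)$$
Thus, substituting Eqs. 12, 13 and 14 into Eq. 11, we immediately derive the novel objective function as follows,
$$E_{\mathrm{ent}} = -\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left[G(z'_i - z'_j, 2\sigma^2) - 2\,G(z'_i - z''_j, 2\sigma^2) + G(z''_i - z''_j, 2\sigma^2)\right] \qquad (15)$$
Since Eq. 15 originates from the Parzen window density estimator and the relative entropy of the sample set, and is rooted in the whole distribution of the training sample set, this novel objective function has the following virtue: because the new criterion is based on the probability density and not on local data points, the corresponding parameter learning can effectively avoid the over-fitting drawback and is less sensitive to noise in a noisy environment. Our experimental results in this study confirm these virtues.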
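One reading of the derivation above, assumed here rather than confirmed by the text, is that the objective reduces to the negative integrated squared difference between the two Parzen estimates, which can be evaluated in closed form with pairwise Gaussian terms of variance $2\sigma^2$. The sketch below follows that assumption; all function names and the toy data are hypothetical.

```python
import math


def gauss(u, var):
    """Isotropic Gaussian with variance var evaluated at vector u."""
    d = len(u)
    sq_norm = sum(c * c for c in u)
    return math.exp(-sq_norm / (2.0 * var)) / (2.0 * math.pi * var) ** (d / 2.0)


def cross_term(a, b, sigma):
    """Integral of the product of the Parzen estimates built on sets a and b."""
    var = 2.0 * sigma ** 2
    total = sum(
        gauss([p - q for p, q in zip(x, y)], var) for x in a for y in b
    )
    return total / (len(a) * len(b))


def entropy_objective(inputs, desired, predicted, sigma):
    """Assumed entropy-criterion objective; zero is its maximum."""
    s1 = [list(x) + [d] for x, d in zip(inputs, desired)]    # inputs + sample outputs
    s2 = [list(x) + [y] for x, y in zip(inputs, predicted)]  # inputs + model outputs
    return -(
        cross_term(s1, s1, sigma)
        - 2.0 * cross_term(s1, s2, sigma)
        + cross_term(s2, s2, sigma)
    )


# Hypothetical toy data: one input variable, three samples.
toy_inputs = [[0.0], [1.0], [2.0]]
toy_desired = [0.0, 1.0, 2.0]
perfect = entropy_objective(toy_inputs, toy_desired, toy_desired, 0.5)
slightly_off = entropy_objective(toy_inputs, toy_desired, [0.1, 1.1, 2.1], 0.5)
far_off = entropy_objective(toy_inputs, toy_desired, [0.5, 1.5, 2.5], 0.5)
```

Under this reading the objective compares the two sample clouds as a whole instead of summing per-sample residuals, which is the property the text relies on when arguing for robustness to noise.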

Entropy-Criterion Based Parameter Learning
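For a given model, with the commonly used gradient descent procedure [18], we can easily obtain the following learning rule for the model parameters,
$$p(t+1) = p(t) + r\,\frac{\partial E_{\mathrm{ent}}}{\partial p}\bigg|_{p = p(t)}$$
where $E_{\mathrm{ent}}$ denotes the entropy-criterion based objective function, $p$ denotes a model parameter, $t$ denotes the iteration number and $r$ is the learning rate. The step follows the gradient because the objective is maximized, which is equivalent to gradient descent on its negative.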
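The iterative update can be sketched generically as below. The sketch assumes that the objective is maximized (so the step follows the gradient) and uses central finite differences in place of analytic derivatives; the toy objective and every name are illustrative assumptions.

```python
def numeric_gradient(objective, params, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    grad = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += eps
        down[k] -= eps
        grad.append((objective(up) - objective(down)) / (2.0 * eps))
    return grad


def train(objective, params, rate, n_iter):
    """Repeat p(t+1) = p(t) + r * dE/dp to maximize the objective."""
    p = list(params)
    for _ in range(n_iter):
        g = numeric_gradient(objective, p)
        p = [pk + rate * gk for pk, gk in zip(p, g)]
    return p


def toy_objective(p):
    """Hypothetical concave objective with its maximum at (1, -2)."""
    return -((p[0] - 1.0) ** 2) - (p[1] + 2.0) ** 2


fitted = train(toy_objective, [0.0, 0.0], rate=0.1, n_iter=200)
```

In practice the derivatives of the chosen model (MLP, RBF or TSK) would be computed analytically, for instance by back-propagation; the finite-difference version only keeps the example short.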

RESULTS
In this section, we illustrate the performance of the proposed entropy-criterion based fermentation process modeling on EPS production from Lactobacillus.

Performance Index
To compare the performance of the different modeling methods under the MSE-criterion and the entropy-criterion, we adopt the performance index $J$ of [19,20], which is computed over the testing set. Here $N$ denotes the number of testing samples, $y_{dl}$ is the $l$th desired output in the testing set and $y_l$ is the corresponding output predicted by the model. The smaller the value of $J$ is, the better the performance of the trained model is.
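A minimal sketch of such an index is given below. It assumes that $J$ takes the form of a root-mean-square error normalized by the spread of the desired outputs; this specific form is an assumption made for illustration, and the expression used in [19,20] may differ.

```python
import math


def performance_index(desired, predicted):
    """Assumed index: prediction error relative to the spread of the targets.

    Requires the desired outputs not to be all identical.
    """
    n = len(desired)
    mean_desired = sum(desired) / n
    error = sum((p - d) ** 2 for d, p in zip(desired, predicted))
    spread = sum((d - mean_desired) ** 2 for d in desired)
    return math.sqrt(error / spread)


# Hypothetical testing outputs.
toy_desired = [1.0, 2.0, 3.0]
j_perfect = performance_index(toy_desired, [1.0, 2.0, 3.0])
j_mean_only = performance_index(toy_desired, [2.0, 2.0, 2.0])
```

With this form a perfect model scores 0, and a model that always predicts the mean of the desired outputs scores 1, so smaller values indicate better predictions.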

Results
In our experiments, we consider three modeling methods: the MLP network model, the RBF network model and the TSK fuzzy system model. All three models have four input nodes representing the four influential process variables (concentrations of lactose, casein hydrolysate and triammonium citrate, and inoculum size) and one output node representing the EPS yield (g/l) at the end of the batch. The process data for modeling are generated by carrying out a number of fermentation runs under various input conditions. We collect 54 samples, as shown in Table 1; each sample is a pair of model inputs (fermentation conditions) and a single output (EPS concentration). For all three models, these 54 samples are partitioned into a training set (45 samples) and a testing set (9 samples) [7]. The training set is used to adjust the parameters of the models and the testing set is used to evaluate the prediction accuracy. The EPS yields of the samples in the testing set and the yields predicted by the MSE-criterion based models and the entropy-criterion based models are compared in Figures 2-4. In Table 1, L, T, C and I denote lactose (g/l), triammonium citrate (g/l), casein hydrolysate (g/l) and inoculum size (vol%), respectively.
In fact, owing to the extreme complexity of the fermentation mechanism and the limitations of the experimental conditions, experimental data inevitably contain noise. Hence, enhancing the robustness of fermentation process modeling is very important. To compare the robustness of the MSE-criterion based models and the entropy-criterion based models, we add Gaussian white noise $G(0, \sigma_1)$ to the training sample set, where $\sigma_1 \in (0, 0.20)$ [8]. Tables 2-4 list the corresponding performance indices on the testing set for 11 different Gaussian white noise levels.
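The noise injection can be sketched as follows. The sketch assumes that $\sigma_1$ is a standard deviation, that the 11 levels are evenly spaced in steps of 0.02, and that every component of every training sample is perturbed; these are assumptions, as are the function name and the toy samples.

```python
import random


def add_gaussian_noise(samples, sigma, rng):
    """Return a copy of the samples with zero-mean Gaussian noise added."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in samples]


# Assumed grid of 11 noise levels: 0.00, 0.02, ..., 0.20.
noise_levels = [round(0.02 * k, 2) for k in range(11)]

# Hypothetical scaled training samples (four inputs followed by one output).
rng = random.Random(0)
toy_training = [[0.2, 0.5, 0.1, 0.8, 0.6], [0.4, 0.3, 0.7, 0.2, 0.9]]
noisy_sets = {s: add_gaussian_noise(toy_training, s, rng) for s in noise_levels}
```

Each noisy copy would then be used to train both the MSE-criterion based and the entropy-criterion based models, while the testing set is left untouched.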

MLP Network Modeling
The Multi-Layer Perceptron (MLP) network [21] is one of the most widely used paradigms in fermentation process modeling, because it is simple, general and mature. In the network training procedure, the tangent sigmoid activation function and a linear combination function are used to compute the outputs of the hidden and output nodes, respectively. When developing an appropriate MLP model, we must carefully select the number of hidden nodes and then use the back-propagation (BP) procedure [22] to adjust the model parameters. Here the MLP network model contains 15 hidden nodes, and its architecture is illustrated in Figure 1(a). The experimental results on the EPS fermentation data from Lactobacillus are given in Table 2.
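The forward pass of such a network can be sketched as below, with four inputs, 15 tangent sigmoid hidden nodes and one linear output node. The initialization range, the parameter layout and the input values are illustrative assumptions, and training is not shown.

```python
import math
import random


def init_mlp(n_inputs, n_hidden, rng):
    """Randomly initialize a one-hidden-layer MLP with a single output."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def mlp_forward(params, x):
    """Tangent sigmoid hidden layer followed by a linear output node."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]


mlp = init_mlp(n_inputs=4, n_hidden=15, rng=random.Random(0))
# Hypothetical scaled values of lactose, casein hydrolysate,
# triammonium citrate and inoculum size.
y_hat = mlp_forward(mlp, [0.2, 0.5, 0.1, 0.8])
```

Either criterion can be plugged in as the training objective; only the quantity whose gradient is back-propagated changes.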

RBF Network Modeling
Another widely used modeling method is the Radial Basis Function (RBF) neural network [23]. Like the MLP network, the RBF network is essentially a feed-forward network. However, the RBF network uses radial basis functions as the activation functions of its hidden layer. In our experiments, the number of hidden nodes is fixed at 13, and the architecture of the RBF network can be seen in Figure 1(b). The experimental results on the EPS fermentation data are given in Table 3.
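A forward pass with 13 hidden nodes can be sketched as follows. Gaussian basis functions, the placement of the centers and the weight values are assumptions chosen only to make the example easy to check.

```python
import math


def rbf_forward(centers, widths, weights, bias, x):
    """Gaussian hidden units followed by a linear output node."""
    out = bias
    for c, s, w in zip(centers, widths, weights):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out += w * math.exp(-sq_dist / (2.0 * s ** 2))
    return out


# Hypothetical network: 13 centers spread along the diagonal of the
# four-dimensional unit cube, narrow widths, and weights 0, 1, ..., 12.
n_hidden = 13
centers = [[k / (n_hidden - 1)] * 4 for k in range(n_hidden)]
widths = [0.02] * n_hidden
weights = [float(k) for k in range(n_hidden)]
y_at_center = rbf_forward(centers, widths, weights, 0.0, centers[6])
```

Because the widths are narrow, an input placed exactly on a center activates essentially only that hidden node, so the output is close to that node's weight.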

TSK Fuzzy System Modeling
The Takagi-Sugeno-Kang (TSK) fuzzy system [24] has been widely applied because of its strong capability in learning, universal approximation and handling natural linguistics with fuzzy rules acquired from skilled workers and/or experts. In our experiments, the number of fuzzy rules is fixed at 8, and the architecture of the TSK fuzzy system can be seen in Figure 1(c). The experimental results on the EPS fermentation data are given in Table 4.
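The inference step of a TSK fuzzy system with 8 rules can be sketched as below. Gaussian membership functions, product firing strengths and first-order linear consequents are assumed here; the rule parameters are hypothetical and chosen so that the result is easy to verify.

```python
import math


def tsk_forward(rules, x):
    """Weight each rule's linear consequent by its normalized firing strength."""
    strengths, outputs = [], []
    for rule in rules:
        # Product of Gaussian memberships gives the rule firing strength.
        mu = 1.0
        for xi, c, s in zip(x, rule["centers"], rule["widths"]):
            mu *= math.exp(-((xi - c) ** 2) / (2.0 * s ** 2))
        strengths.append(mu)
        # Linear consequent: p0 + p1*x1 + ... + pn*xn.
        coeffs = rule["coeffs"]
        outputs.append(coeffs[0] + sum(p * xi for p, xi in zip(coeffs[1:], x)))
    total = sum(strengths)
    return sum(m * o for m, o in zip(strengths, outputs)) / total


# Hypothetical rule base: 8 rules over four inputs, each with a constant
# consequent equal to the rule number.
rules = [
    {"centers": [k / 7.0] * 4, "widths": [0.3] * 4,
     "coeffs": [float(k), 0.0, 0.0, 0.0, 0.0]}
    for k in range(8)
]
y_fuzzy = tsk_forward(rules, [0.2, 0.5, 0.1, 0.8])
```

The output is a convex combination of the rule consequents, so with these constant consequents it always lies between 0 and 7.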
As can be seen from Tables 2, 3 and 4, the prediction accuracies of the three modeling methods with the proposed entropy-criterion based objective function are clearly higher than those of the same methods with the MSE-criterion based objective function. This means that the proposed objective function is very suitable for EPS fermentation process modeling. Figures 5-7 are generated from Tables 2-4. In Figures 5-7, the X-axis denotes the added noise (see the second column in Tables 2-4), and the Y-axis denotes the testing performance index. Dotted lines correspond to the testing performance indices of the modeling methods with the MSE-based criterion (see the third column in Tables 2-4), while solid lines correspond to the testing performance indices of the modeling methods with the entropy-based criterion (see the fourth column in Tables 2-4). From Figures 5-7, it is easy to observe that the curves of the three modeling methods with the MSE-criterion based objective function always lie above the corresponding curves with the entropy-criterion based objective function. In addition, as the noise increases, the curves of the MSE-criterion based modeling methods in Figures 5-7 change dramatically, which means that their prediction accuracy deteriorates greatly with increasing noise, while the curves of the entropy-criterion based modeling methods are very smooth. Therefore the experimental results demonstrate that the entropy-criterion based modeling methods have better generalization and robustness than the MSE-criterion based modeling methods in EPS fermentation process modeling.

Statistical Results for the Obtained Performance Indices
Table 1 reports the mean and standard deviation of the EPS production used as the output of the training samples, and the standard deviation is not small. It is therefore necessary to examine the performance of the above three modeling methods from a statistical viewpoint.
In this experiment, we keep the same inputs in the training set as above, but add noise to the corresponding outputs. The added noise has zero mean and the same standard deviation as that derived from the experimental data. To keep the experimental results fair, we repeat the experiment 50 times and then take the means and standard deviations of the performance index J for each modeling method. Table 5 lists the obtained results. We can clearly see from Table 5 that both the means and the standard deviations of the performance indices of the three modeling methods with the entropy-criterion are always lower than those with the MSE-criterion. This again confirms our claim that the proposed entropy-criterion based modeling methods possess favorable approximation, generalization and robustness capabilities.
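The repetition protocol can be sketched as follows. The function `toy_run` is a hypothetical stand-in for one train-and-test cycle that returns a performance index, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics


def repeated_index(run_once, n_runs, seed):
    """Collect the performance index over independent noisy repetitions."""
    rng = random.Random(seed)
    values = [run_once(rng) for _ in range(n_runs)]
    return statistics.mean(values), statistics.stdev(values)


def toy_run(rng, noise_std=0.1):
    """Hypothetical stand-in for one train-and-test cycle returning J."""
    return abs(0.3 + rng.gauss(0.0, noise_std))


mean_j, std_j = repeated_index(toy_run, n_runs=50, seed=1)
```

A lower mean indicates better average accuracy, and a lower standard deviation indicates that the method reacts less to the particular noise realization.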

DISCUSSION
When studying fermentation process modeling of EPS from Lactobacillus, we must consider two factors. One is that the collected data are corrupted by noise, owing to the shortage of apparatus and the limitations of the experimental conditions. The other is the comparatively weak generalization and robustness of current MSE-criterion based modeling methods. In this work, EPS fermentation process modeling methods with an entropy-criterion based objective function are addressed. When the criterion is used in MLP network modeling, RBF modeling and TSK fuzzy system modeling of EPS fermentation from Lactobacillus, our experimental results demonstrate that the three modeling methods with the entropy-criterion are less sensitive to noise and have better generalization ability and robustness than the same methods with the MSE-criterion. This is because the proposed objective function is derived from the Parzen window density estimator and relative entropy and, unlike previous studies, considers the whole distribution structure of the training set in the parameter learning process. The results obtained in this study are very useful for modeling the EPS fermentation process, and the entropy-criterion based modeling methods can also be efficiently applied to other fermentation processes.

Figures 5-7. Figures 5-7 are generated from Tables 2-4. In Figures 5-7, the X-axis denotes the added noise (see the second column in Tables 2-4), and the Y-axis denotes the testing performance index. Dotted lines correspond to the testing performance indices of the modeling methods with the MSE-based criterion (see the third column in Tables 2-4), while solid lines correspond to the testing performance indices of the modeling methods with the entropy-based criterion (see the fourth column in Tables 2-4).

Figure 1. (a) Architecture of MLP network; (b) Architecture of RBF neural network; (c) Architecture of TSK fuzzy system.

Figure 2. Comparison of EPS yield prediction using MLP network model.

Figure 3. Comparison of EPS yield prediction using RBF model.

Figure 4. Comparison of EPS yield prediction using fuzzy system model.

Figure 5. Comparison of testing performance indices of MSE-criterion and entropy-criterion based MLP network modeling method.

Figure 6. Comparison of testing performance indices of MSE-criterion and entropy-criterion based RBF modeling method.

Figure 7. Comparison of testing performance indices of MSE-criterion and entropy-criterion based fuzzy system modeling method.

Table 2. The results of MLP network modeling with MSE-criterion and entropy-criterion.

Table 3. The results of RBF network modeling with MSE-criterion and entropy-criterion.

Table 4. The results of fuzzy system modeling with MSE-criterion and entropy-criterion.

Table 5. Statistical results of the performance index J of three modeling methods.