There is still no effective means of analyzing in depth and exploiting the large volume of domestic test data on agricultural product quality and safety in China. In this paper, the neural network algorithm, the classification and regression tree algorithm and the Bayesian network algorithm were selected according to the principles for choosing the components of a combination model. Each algorithm was used to build a model separately, and the models were then combined into a combination model with relatively high precision, strong robustness and good interpretability for predicting the results of monitoring the deterioration (metamorphism) of perishable food during transportation. In this way a relatively optimal prediction model for the perishable food transportation deterioration monitoring system can be obtained. By predicting the occurrence of unqualified food, the model can guide food quality and safety sampling toward typical and effective test samples. This improves the efficiency and effectiveness of sampling, keeps deteriorated perishable food off the market, and helps ensure the quality and safety of perishable food in transit, thereby protecting the health of consumers.

Exploring the intrinsic correlations and rules hidden in massive data and then forming a model from them is the process we call data mining [

Through consulting the domestic and international literature relevant to data mining [

A combination model is a structure that combines multiple classification models, each of a different type, whose individual predictions carry useful information from different angles. It is a current research focus of the international machine learning community. Its principle is to exploit the complementary advantages of different types of single models: the single models are not mutually exclusive but interrelated, so that, once connected in series, they jointly solve a problem while compensating for one another's shortcomings. In other words, the purpose of the combination model is to improve the robustness of the composite system by reducing the variability of its prediction error, turning the rate of change of the system's prediction error into a combination of the rates of change of the single models' prediction errors. Although combination models have been widely used in many fields (e.g. speech recognition, face recognition and medical diagnosis), research on and application to the monitoring of perishable food during transportation are rare.

In this paper, the neural network model, the Bayesian network model and the classification and regression tree model are selected for combination. The single models that constitute the combination model are selected on the basis of three principles [

Serial structure, parallel structure and hybrid structure are the three structures of the combination model [

In a serial structure, precision increases with the number of models connected in series (the final result of a serial combination model outperforms the predictions of its single base classifiers), and the structure is simple. Because accuracy is the key requirement for a prediction model that monitors the deterioration of perishable food in transit, the serial structure is selected to combine the neural network model, the classification and regression tree model and the Bayesian network model. In a serial structure, the learning result of each base classifier feeds the next one, and so on until the last base classifier has been trained. The neural network model, which has high prediction accuracy but relatively volatile stability and poor interpretability, is therefore placed first; the classification and regression tree model, which has good interpretability but weak robustness, is placed second; and the Bayesian network model, which has strong robustness and good interpretability but lower accuracy than the neural network, is placed third. The result is a prediction model with higher accuracy, better interpretability and stronger robustness than any of the single models for predicting the deterioration of perishable food through tracking and monitoring. The results of the example in this paper show that the combination model, once verified and assessed, does achieve the intended purpose.
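The serial idea, in which each base classifier passes its learning result to the next, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `lookup_learner` is a toy majority-vote learner standing in for the neural network, CART and Bayesian network models, the choice to pass the previous prediction on as an extra input column is one possible reading of "feeds the next base classifier", and the six sample rows are invented.

```python
from collections import Counter, defaultdict

def lookup_learner(columns):
    """Toy stand-in for a real model: majority label per value of the chosen columns."""
    def fit(rows, labels):
        votes = defaultdict(Counter)
        for row, label in zip(rows, labels):
            votes[tuple(row[c] for c in columns)][label] += 1
        table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
        default = Counter(labels).most_common(1)[0][0]  # fallback for unseen keys

        def predict(row):
            return table.get(tuple(row[c] for c in columns), default)
        return predict
    return fit

def fit_serial(learners, rows, labels):
    """Train learners in series; each stage sees the previous stage's prediction."""
    stages = []
    augmented = [tuple(row) for row in rows]
    for learner in learners:
        predict = learner(augmented, labels)
        stages.append(predict)
        # Append this stage's output as an extra column for the next stage.
        augmented = [row + (predict(row),) for row in augmented]

    def predict_serial(row):
        row = tuple(row)
        output = None
        for stage in stages:
            output = stage(row)
            row = row + (output,)
        return output  # prediction of the last stage
    return predict_serial

# Invented rows: (temperature code, humidity code, damaged count); label 1 = deteriorated.
rows = [(1, 1, 0), (2, 1, 0), (3, 2, 0), (4, 3, 1), (4, 2, 0), (1, 1, 1)]
labels = [0, 0, 1, 1, 1, 1]

model = fit_serial(
    [lookup_learner((0,)),      # stage 1: temperature only
     lookup_learner((2, -1)),   # stage 2: damage + stage-1 output
     lookup_learner((1, -1))],  # stage 3: humidity + stage-2 output
    rows, labels,
)
print([model(row) for row in rows])
```

In this toy data the first stage alone misclassifies the cold but damaged sample `(1, 1, 1)`, and the second stage corrects it because it also sees the damage count. This only illustrates the mechanism and says nothing about the accuracy of the real models.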

The complexity of the parallel and hybrid structures not only makes the model harder to build but also reduces its interpretability. In addition, neither improves precision as much as the serial structure. The parallel and hybrid structures are therefore not suitable for building the model that forecasts the deterioration of perishable food through tracking and monitoring.

In recent years, a succession of food safety incidents has drawn particular attention to food safety and to the low-temperature preservation of perishable food. Temperature-sensitive foods require temperature control not only in production, storage and marketing but also during transportation and distribution. If temperature is not controlled effectively during transport and distribution, the resulting food quality problems not only cause economic losses but may also threaten human life. Therefore, to meet people’s requirements for food quality and safety, the temperature, humidity and mechanical damage of perishable food can be monitored during transportation. In this paper, monitoring data on the transportation and deterioration of strawberries are taken as an example to establish and evaluate the model. The data mining experiment design [

In this paper, the main data mining task takes strawberry transport deterioration monitoring as an example. The time, temperature, humidity, mechanical damage and other data collected during monitoring serve as the sample data; the attributes that affect deterioration are the inputs, and the prediction result is the output. Following the principle of complementary combination, the neural network algorithm, the classification and regression tree algorithm and the Bayesian network algorithm were each modeled on the training set and test set with the data mining software Clementine 12.0. The three resulting models were then combined in series, and the combination improved the prediction accuracy, yielding the best prediction model for the perishable food transport deterioration monitoring system.

In this paper, three deterioration factors of strawberries in transit (temperature, humidity and mechanical damage) were used as the independent variables for modeling, and deterioration was used as the target variable. As the models require, the independent variables were given the set (categorical) type and the target was given the flag (binary) type.

There are three steps for data preparation: the first step is data selection, the second step is data preprocessing, and the third step is data transformation.

1) The first step is data selection. Measurements were taken every fifteen minutes from 6:00 to 24:00 while 1000 strawberries were monitored in transit, giving 73 groups of temperature, humidity and mechanical damage data as the experimental data, with the spoilage status (the number of spoilt strawberries) as the target data. Simplifying the initial data for mining in this way is the final goal of data selection.

2) The second step is data preprocessing. The quality of preprocessing directly affects the results of data mining, so it is an indispensable step. Preprocessing usually covers the elimination of duplicate records and data type transformations (for example, continuous data are discretized to facilitate symbolic induction, and discrete data are converted to continuous values to facilitate neural network induction). Continuous fields are harder for a decision tree to predict, and Bayesian algorithms require discrete fields; so that every model in the combination receives a data type it can handle, the continuous fields are discretized during preprocessing. In this paper the temperature and humidity are divided into intervals, and each interval is mapped to a discrete value. The temperature, from −0.7˚C to 26˚C, was divided into four levels: 1 (low temperature: −0.9˚C - 0.9˚C), 2 (medium-low temperature: 1˚C - 10˚C), 3 (medium-high temperature: 11˚C - 20˚C) and 4 (high temperature: 21˚C - 30˚C). The humidity, from 50% RH to 95% RH, was divided into six levels: 1 (90% RH - 95% RH), 2 (76% RH - 89% RH), 3 (70% RH - 75% RH), 4 (65% RH - 69% RH), 5 (56% RH - 64% RH) and 6 (50% RH - 55% RH). A deteriorated strawberry was coded 1 and a strawberry that had not deteriorated was coded 0. Mechanical damage was recorded as the number of strawberries with mechanical damage.

To avoid incomplete parameters during preprocessing, the parameters must be selected to suit the object of study. In the strawberry transport deterioration monitoring taken as the example here, the fundamental cause of deterioration is pathogenic bacteria. These microbial pathogens either develop in response to changes in temperature, humidity, illumination and mechanical damage, or are carried directly into the fruit and vegetables from the field (infected leaves and soil). Almost all the major factors that affect fruit and vegetables in storage, such as respiration, transpiration, ripening, ethylene production and growth, are related to temperature, so temperature is one of the most important factors and the primary environmental condition to consider. Water content is one of the basic characteristics of perishable food, which contains free water, bound water and hydrate water; the availability of this water is called “water activity”. Because the reproduction of microbes and fungi causes perishable food to deteriorate, and that reproduction depends on water activity, humidity must also be considered as a deterioration factor. Mechanical damage creates wounds through which pathogenic bacteria and the microorganisms on the food’s surface cause deterioration, so it is also a main factor to consider. Although light plays a key role through photosynthesis while strawberries are being grown, it is not needed during transport and storage. This paper therefore chooses temperature, humidity and mechanical damage as the parameters for monitoring perishable food in transit. With this choice no parameter is missing, and any parameter redundancy can be resolved by dimensionality reduction in the data transformation step.

3) The third step is data transformation. To reduce the dimensionality of the transport monitoring data, the genuinely useful features must be identified among the initial features so that fewer features and variables have to be considered in data mining. The time and sequence-number fields are filtered out, leaving only the three parameters temperature, humidity and mechanical damage together with their corresponding discrete values.
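The discretization described in step 2) can be sketched in Python as follows. The function names are hypothetical. The interval codes come from the text, but the listed intervals leave small gaps (for example between 0.9˚C and 1˚C), so this sketch assumes that each code begins at the lower bound of its interval and extends up to the next lower bound; that boundary rule is an assumption, not something the paper states.

```python
from bisect import bisect_right

# Lower bounds of temperature codes 2, 3 and 4 (degrees Celsius).
TEMP_EDGES = [1.0, 11.0, 21.0]
# Lower bounds of humidity codes 5, 4, 3, 2 and 1 (% RH); code 6 is the driest band.
HUMIDITY_EDGES = [56.0, 65.0, 70.0, 76.0, 90.0]

def temperature_code(temp_c: float) -> int:
    """Map a temperature to codes 1 (low) .. 4 (high)."""
    return bisect_right(TEMP_EDGES, temp_c) + 1

def humidity_code(rh_percent: float) -> int:
    """Map relative humidity to codes 1 (90-95% RH) .. 6 (50-55% RH)."""
    return 6 - bisect_right(HUMIDITY_EDGES, rh_percent)

def discretize_record(temp_c: float, rh_percent: float, damaged_count: int):
    """Return the three model inputs; the damage count is kept as recorded."""
    return temperature_code(temp_c), humidity_code(rh_percent), damaged_count

# Invented reading: 12.5 degrees Celsius, 82% RH, 3 damaged strawberries.
print(discretize_record(12.5, 82.0, 3))
```

Using lower bounds as cut points keeps the mapping total, so a reading such as 10.5˚C still receives a code instead of falling between two intervals.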

A standard practice in model development is to validate the model on data that were not used to build it. This checks the robustness of the model: if the model performs well not only on the modeling data set but equally well on a similar data set, it is suitable not just for the modeling data but for similar data in general. The last task before building the model was therefore data partitioning, that is, splitting the final sample into a training set for training the model and a test set for testing it. A data partition node was set up in the model to divide the data randomly into two parts, one used to build the model and the other used to evaluate its accuracy. Partitioning the data helps guard against over-fitting and supports the validity and reliability of the model. In this paper, 70% of the sample data were used as training data and 30% as test data.
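A random 70/30 partition of this kind can be sketched in Python as follows. The function name `split_samples`, the fixed seed and the exact-size cut are assumptions chosen for reproducibility of the sketch; the partitioning in Clementine may assign records differently.

```python
import random

def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 73 sample indices stand in for the 73 groups of monitoring data.
train, test = split_samples(range(73))
print(len(train), len(test))
```

Every sample lands in exactly one of the two parts, so the test part is never seen while the model is being built.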

In this paper, the data mining tool Clementine 12.0 was used to divide the sample data into 70% training data and 30% test data. Bayesian network models with different parameters, a classification and regression tree model and neural network models with different parameters were built separately, and the best prediction model of each of the three kinds was selected to build the serial combination according to the three principles of model combination. The specific modeling and model evaluation are covered in detail in Chapters 4 and 5.

The data mining software Clementine 12.0 provides a combination of three general-purpose evaluation methods [

There are three numerical evaluation indices: the overall accuracy rate, the precision rate (hit rate) and the recall rate (coverage). The overall accuracy rate can be read directly from the correct rate shown in the partition table of the analysis report. The recall rate is calculated from the percentages in the coincidence matrix of the analysis report. In general, when judging the final performance of a model, the overall accuracy must be ensured first, then the hit rate, and on that basis the coverage should be made as high as possible.

The overall accuracy rate is the proportion of the entire sample that is predicted correctly. Let A denote the overall accuracy rate, P the number of correct forecasts (positive and negative together) and T the total number of samples. The overall accuracy rate is calculated as:

$$A = \frac{P}{T} \times 100\% \qquad (1)$$

Hit rate: the percentage of the perishable food predicted by the model to be deteriorated that has actually deteriorated. It is an indicator of the precision of the model. The formula is:

$$V = \frac{p}{t} \times 100\% \qquad (2)$$

where V is the hit rate, p is the number of correct positive forecasts, and t is the total number of positive forecasts.

Recall rate (coverage): the percentage of the actually deteriorated samples in the sample set that the model predicts correctly. It indicates how completely the model finds the deteriorated food. The formula is:

$$S = \frac{p}{a} \times 100\% \qquad (3)$$

where S is the coverage rate, p is the number of positive instances that are correctly predicted, and a is the actual number of positive instances.
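The three numerical indices defined above can be computed from the four cells of a coincidence (confusion) matrix, as in the following Python sketch. The function name `evaluate` and the counts in the example are hypothetical; the counts are not taken from the paper's data and serve only to show the arithmetic.

```python
def evaluate(tp: int, fp: int, fn: int, tn: int):
    """Return (overall accuracy, hit rate, coverage) in percent.

    tp: deteriorated and predicted deteriorated
    fp: sound but predicted deteriorated
    fn: deteriorated but predicted sound
    tn: sound and predicted sound
    Assumes at least one positive forecast and one actual positive.
    """
    total = tp + fp + fn + tn
    accuracy = 100 * (tp + tn) / total  # correct forecasts / all samples
    hit_rate = 100 * tp / (tp + fp)     # correct positives / positive forecasts
    coverage = 100 * tp / (tp + fn)     # correct positives / actual positives
    return accuracy, hit_rate, coverage

# Hypothetical test partition of 20 samples.
accuracy, hit_rate, coverage = evaluate(tp=8, fp=1, fn=2, tn=9)
print(f"{accuracy:.2f}% {hit_rate:.2f}% {coverage:.2f}%")
```

The hit rate and the coverage share the same numerator and differ only in the denominator, which is why a model can score highly on one and poorly on the other.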

In a cumulative chart, a better model has a line that is higher toward the left side; in a non-cumulative chart, a better model has a line that is higher on the left side and lower on the right side. Gains charts, response charts, lift charts, profit charts and ROI charts are the most common general-purpose evaluation charts. This paper uses the gains chart, the response chart and the lift chart as the evaluation criteria for the three selected models.

Gains charts: a model's gains chart is judged by whether its line rises steeply toward 100% and then gradually levels off. If it does, the model is a good model; if not, it is not. If the line rises along the diagonal from the lower left to the upper right, the model provides no information. Let R denote the gain, m the number of hits in the quantile and M the total number of hits. The gain is calculated as:

$$R = \frac{m}{M} \times 100\% \qquad (4)$$

Response charts: for a good model, the line in the response chart starts at or near 100%. If the whole curve hovers around the overall response rate, the model provides no information. The response is calculated as:

$$N = \frac{s}{r} \times 100\% \qquad (5)$$

where N is the response, s is the number of successes in the quantile, and r is the number of records in the quantile.

Lift charts: for a good model, the line in the lift chart starts above 1.0 on the left and stays on a high plateau as it moves to the right. If the whole curve hovers around 1.0, the model provides no information. The lift is calculated as:

$$E = \frac{s/r}{T/C} \qquad (6)$$

where E is the lift, s is the number of successes in the quantile, r is the number of records in the quantile, T is the total number of successes, and C is the total number of records.
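The gain, response and lift described above can be computed point by point as in this Python sketch. It assumes cumulative quantiles (each point covers all records up to that quantile) and records already sorted from the most to the least confident prediction of deterioration; the function name `chart_points` and the ten example labels are hypothetical.

```python
import math

def chart_points(sorted_labels, n_quantiles):
    """Return one (gain %, response %, lift) tuple per cumulative quantile.

    sorted_labels: 1 for an actual success (deteriorated), 0 otherwise,
    ordered by descending model confidence.
    """
    total_records = len(sorted_labels)
    total_hits = sum(sorted_labels)
    points = []
    for q in range(1, n_quantiles + 1):
        r = max(1, round(total_records * q / n_quantiles))  # records so far
        s = sum(sorted_labels[:r])                          # successes so far
        gain = 100 * s / total_hits
        response = 100 * s / r
        lift = (s / r) / (total_hits / total_records)
        points.append((gain, response, lift))
    return points

# Hypothetical ranking of 10 records containing 4 actual successes.
points = chart_points([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], n_quantiles=5)
for gain, response, lift in points:
    print(f"gain={gain:.0f}% response={response:.0f}% lift={lift:.2f}")
```

At the last quantile every record is included, so the gain reaches 100%, the response equals the overall response rate and the lift equals 1.0, which matches the right-hand ends of the charts described above.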

A mathematical model for classification that simulates the behavior of human or animal neural networks and performs distributed parallel information processing is called an artificial neural network (ANN) [

1) Numerical evaluation of the neural network models with different parameters on the 70% training data

The numerical evaluation of the two neural network variants uses the formulas for overall accuracy, hit ratio and coverage. The resulting values are shown below:

From

2) The evaluation charts of the Quick neural network and the RBFN neural network are shown in the figure below:

On the test set, the analysis of

Comparing the numerical evaluation in 1) with the graphical evaluation in 2) for the two neural network variants shows that the Quick neural network achieved better predictions on the test set than the RBFN neural network.

Metric | Quick neural network (70% training data) | RBFN neural network (70% training data) |
---|---|---|
Overall accuracy | 90.00% | 90.00% |
Hit ratio | 90.00% | 83.30% |
Coverage | 90.00% | 100.00% |

Classification regression tree technique [

structures, important patterns and relationships of the highly complex data that are automatically detected can construct accurate and reliable forecasting models.

The classification and regression tree model is evaluated with the formulas for overall accuracy, hit ratio and coverage. The resulting values are shown below.

It is shown in

Metric | Classification and regression tree CART (70% training data) |
---|---|
Overall accuracy | 85.00% |
Hit ratio | 88.89% |
Coverage | 80.00% |

The advantage of Bayesian network [

the most effective theoretical models of reasoning and uncertainty) represents the dependencies between variables as a directed acyclic graph. It is made up of nodes and the arcs that connect them. Each node corresponds to a random variable and has its own conditional probability table. The graph represents the dependencies between nodes qualitatively, and the conditional probability tables quantify them. Bayesian network theory rests on a solid foundation, expresses knowledge structure in a natural way, and has strong reasoning ability, so it can explain practical problems simply and intuitively.
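How nodes, arcs and conditional probability tables work together can be illustrated with a very small Python sketch. The structure (temperature and damage as parents of deterioration) and every probability in it are hypothetical values chosen for illustration; they are not the network or the parameters learned in this paper.

```python
from itertools import product

# Hypothetical prior tables for the two parent nodes.
P_TEMP = {"low": 0.5, "high": 0.5}
P_DAMAGE = {False: 0.8, True: 0.2}

# Hypothetical conditional probability table:
# P(deteriorated = True | temperature, damage).
CPT = {
    ("low", False): 0.1,
    ("low", True): 0.4,
    ("high", False): 0.5,
    ("high", True): 0.9,
}

def joint(temp, damage, deteriorated):
    """Joint probability, factorised along the arcs of the graph."""
    p = CPT[(temp, damage)]
    return P_TEMP[temp] * P_DAMAGE[damage] * (p if deteriorated else 1 - p)

def prob_deteriorated():
    """Marginal P(deteriorated) obtained by summing out both parents."""
    return sum(joint(t, d, True) for t, d in product(P_TEMP, P_DAMAGE))

def posterior_temp(temp):
    """P(temperature | deteriorated) by Bayes' rule."""
    return sum(joint(temp, d, True) for d in P_DAMAGE) / prob_deteriorated()

print(round(prob_deteriorated(), 4), round(posterior_temp("high"), 4))
```

The same tables answer both a predictive question (how likely is deterioration) and a diagnostic one (given deterioration, how likely was a high temperature), which reflects the reasoning ability the text attributes to Bayesian networks.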

1) Numerical evaluation of the two Bayesian network variants

Based on the formulas for overall accuracy, hit ratio and coverage, the TAN Bayes and Markov Blanket Bayes models were evaluated on the test set in

The data comparison analysis of

2) Graphical evaluation of the two Bayesian network variants

The evaluation charts are shown in the figure below:

On the test set, the analysis of

Metric | TAN Bayesian network | Markov Blanket Bayesian network |
---|---|---|
Overall accuracy | 84.00% | 86.00% |
Hit ratio | 90.48% | 87.59% |
Coverage | 82.61% | 91.30% |

Comparing the numerical evaluation in 1) with the graphical evaluation in 2) for the two Bayesian network variants shows that the Markov Blanket Bayesian network predicted better on the test set than the TAN Bayesian network.

The optimal serial combination was determined from the numerical and graphical evaluation results for the neural networks in Section 5.1, the classification and regression tree (CART) in Section 5.2 and the Bayesian networks in Section 5.3. With the sample data split into 70% training data and 30% test data, the Quick neural network model, the CART model and the Markov Blanket Bayesian network model were built separately and then combined in series in order to improve prediction accuracy.

1) Numerical evaluation

Based on the numerical evaluation formulas, the overall accuracy, hit ratio and coverage of the combination optimization model were calculated (

Metric | Combination optimization model | Quick neural network model | Classification and regression tree model | Markov Blanket Bayesian network model |
---|---|---|---|---|
Overall accuracy | 95.00% | 90.00% | 85.00% | 85.00% |
Hit ratio | 88.89% | 90.00% | 88.89% | 87.59% |
Coverage | 100.00% | 90.00% | 80.00% | 91.30% |

The table above shows that, on the test set, the combined model reaches an overall accuracy of 95.00%, a hit ratio of 88.89% and a coverage of 100.00%. The combined model therefore outperforms every single model in overall accuracy and coverage, although its hit ratio is slightly below that of the Quick neural network model (90.00%).

2) Graphical evaluation

From

Comparing the numerical evaluation in 1) with the graphical evaluation in 2) shows that the combination forecasting model predicted better on the test set than the single models. The accuracy of prediction was enhanced by exploiting the advantages of the individual models.

Data mining systems have been widely developed and applied in business, economics, finance and management, but they are seldom seen in the field of monitoring the deterioration of perishable food in transit, and it is rarer still to build a deterioration prediction model that guides rapid inspection at the transport destination toward effective and typical samples. This article makes two innovations. First, data mining is applied to build a prediction model for monitoring the deterioration of perishable food. Second, following the three principles of model combination, the neural network, classification and regression tree and Bayesian network models are selected as three complementary components and combined into a model with relatively high precision, strong stability and good interpretability for predicting the deterioration of perishable food during transport monitoring. Using the combination forecasting model to guide rapid inspection at the transport destination toward effective and typical test samples improves the efficiency and effectiveness of sampling inspection, keeps spoiled perishable food off the market, and helps ensure the quality and safety of perishable food in transit.

This paper makes a preliminary exploration of using data mining for monitoring the deterioration of perishable food in transit and has obtained some results, but limitations remain in both theory and operational practice, and further study is needed. In particular, only 1000 strawberries were selected for the study, and the number of samples available for data mining still needs to be increased.

This work was supported by the funding project for the Youth Talent Cultivation Plan of Beijing City University (grant number CIT&TCD201504051), the Beijing outstanding talent training project (2014000020124G093) and the Beijing Intelligent Logistics System Collaborative Innovation Centre.

Liu, T.J. and Hu, A.Q. (2017) Model of Combined Transport of Perishable Foodstuffs and Safety Inspection Based on Data Mining. Food and Nutrition Sciences, 8, 760-777. https://doi.org/10.4236/fns.2017.87054