Fruit and Vegetable Nutrition Value Assessment and Replacement Based on the Principal Component Analysis and Cluster Analysis

Using principal component analysis (PCA) and cluster analysis, a multi-index evaluation system for fruit and vegetable nutrition is standardized, reduced in dimension and de-correlated. Principal component factors are assigned on the basis of a cluster analysis of the loading matrix, combined with the actual meaning and evaluation direction of the index categories. The nutritional richness of each fruit and vegetable is evaluated from its nutrition score, and equivalent replacement suggestions for vegetables and fruits in different seasons are then given according to the clustering result. The study shows that the principal component cluster method can not only classify multivariate data reasonably and effectively, but also evaluate the sample objects reasonably, providing a sound basis for the nutritional evaluation of fruits and vegetables.

Improvements in people's nutrition depend on economic development and, in turn, promote the development of the social economy. Yet even with economic growth, rising incomes and better living standards, some people, especially children, still eat an unbalanced diet and neglect their physical health. This is especially true now that meat consumption is rising and more and more elders spoil their children. Indulging children's picky eating (monophagia) leads to overnutrition or to deficiencies of particular nutrients. Yi-yong Cheng [1] pointed out in "Chinese residents of nutritional status and related health problems" that good nutritional status promotes growth, improves brain development and enhances immune function, thus creating better human resources to promote economic growth. Therefore, the body's nutritional status is especially important.
A lack of iron, zinc, vitamin A and other micronutrients is a main problem among residents of our country [1]. The minerals that the human body needs can be divided into common elements such as calcium, phosphorus and sodium, and trace elements such as iron, zinc and iodine. As these minerals cannot be synthesized in the human body, all of them must be obtained through the diet. Fruits and vegetables are the body's main sources of nutrients such as vitamin C, calcium and sodium, and they contain organic acids that help to increase appetite and promote digestion.
At the table, children regard vegetables as their enemy and do not yield no matter what their elders say, whereas they treat the junk food in shops as treasure. At the same time, as people prefer certain vegetables and fruits, the production of anti-seasonal [2] fruits and vegetables is constantly increasing to meet market demand. So what characteristics do anti-seasonal vegetables and fruits have? Zhi-hong Fan [3] pointed out that the nutrition of anti-seasonal vegetables and fruits is discounted; Tan Duimin [4] pointed out in his study of "counter-season fruits and vegetables' problems" that the pesticide levels of most anti-seasonal vegetables exceed the standard and may cause chronic and acute poisoning; Lv Bin's [5] study of how to eat safely revealed that anti-seasonal vegetables and fruits are grown under artificial intervention, which ultimately leads to a loss of nutrients despite their high price; Yang Bo [6] pointed out that counter-seasonal fruits and vegetables are not only expensive but also have a poorer taste and a lower nutrient content than seasonal ones. He also pointed out that children should not eat anti-seasonal vegetables and fruits over the long term, since this may harm their growth and development and become a potential danger to their health. Accordingly, anti-seasonal vegetables and fruits are less tasty and contain much less nutrition than seasonal fruits and vegetables. Therefore, research on the nutritional content of fruits and vegetables and the similarity between them is particularly important for ensuring balanced human nutrition.
Combined PCA and cluster analysis [7] is a very effective method for studying the similarity between samples, and scholars have used it successfully in many fields. For example, Hong-jian Chen [8] successfully applied the method to the optimization of mining, showing that PCA and cluster analysis can not only classify multivariable data reasonably but also assess the performance of the various types comprehensively, so as to fully reflect the actual situation of the mine. Xin-hua Gao used principal component analysis and cluster analysis to comprehensively assess smart grid construction, and the method assessed the general development and construction level of the smart grid reasonably and accurately. There are also many successful examples in logistics planning [9], the evaluation of the ecological safety of land use [10], the assessment of students' grades [11], etc.
Therefore, in order to study the similarity of nutrient content between fruits and vegetables and to ensure balanced nutrition in the human body, this paper uses principal component analysis to recombine the indicators of fruits and vegetables into a few uncorrelated composite indicator variables, and then uses cluster analysis to group the fruits and vegetables according to these new comprehensive indexes and the degree of difference between them. Finally, replacement proposals for fruits and vegetables in different seasons are given, so that people can meet the body's demand for all kinds of nutrients at a more affordable price while still choosing among the fruits and vegetables they like.

Principal Component Analysis and Cluster Analysis

Basic Thought of Principal Component Analysis and Cluster Analysis
Principal Component Analysis (PCA) [12] is a dimension reduction method: the original indexes are recombined into a few uncorrelated comprehensive indexes while the main information in the original data is retained, which simplifies a complex problem. The usual mathematical treatment is to form new comprehensive indexes as linear combinations of the original p indexes. If these linear combinations are not restricted, however, arbitrarily many new composite indicators can be constructed. Therefore, to ensure that the new comprehensive indexes reflect as much of the information in the original indexes as possible, an extraction principle must be followed: only the leading principal components, up to the point where their cumulative contribution rate reaches 85%, are retained. This captures the principal contradiction at the cost of losing only a little information. Reducing the number of variables in this way helps us analyze and handle the problem, and it makes the results of the subsequent cluster analysis more exact.
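The 85% extraction principle can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_components` and the eigenvalues are hypothetical and are not taken from the paper's Table 2.

```python
def select_components(eigenvalues, threshold=0.85):
    """Return (m, rates, cumulative): m is the smallest number of leading
    components whose cumulative variance contribution rate reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    # Contribution rate of each component: its eigenvalue over the eigenvalue sum.
    rates = [lam / total for lam in ordered]
    cumulative = []
    running = 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    m = len(ordered)  # fall back to all components if the threshold is never reached
    for k, value in enumerate(cumulative, start=1):
        if value >= threshold:
            m = k
            break
    return m, rates, cumulative


# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
eigenvalues = [2.5, 1.25, 0.75, 0.25, 0.25]
m, rates, cumulative = select_components(eigenvalues)
print(m, cumulative)
```

With these made-up eigenvalues the first two components explain 75% of the variance, so a third is needed to pass the 85% threshold.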
Cluster analysis [13] refers to the process of grouping a set of physical or abstract objects into categories so that objects with similar features fall into the same class. Objects in the same category have many similarities, while objects in different clusters differ greatly. It is a kind of exploratory analysis: no classification standard has to be given in advance, because cluster analysis starts from the sample data and classifies automatically. Different clustering methods are likely to lead to different conclusions, so the clustering method should be chosen according to the needs of the research.
Principal component analysis and cluster analysis [11]-[14] is a new comprehensive evaluation method that combines principal component analysis with cluster analysis. The method first carries out principal component analysis on the samples and then uses several extracted principal components as the variables for cluster analysis. The specific steps are as follows: 1) select m principal components according to the cumulative variance contribution rate (usually greater than 85%) and calculate each sample's score on each principal component; 2) take the m principal components as m variables of the samples and carry out cluster analysis; 3) evaluate the clustering results and give the corresponding suggestions.

Modeling for Principal Component Analysis and Cluster Analysis
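Although there are many types of vegetables and fruits, they contain similar nutrients. In order to digitize the samples, we make the following definitions.

Definition 1. Let $n$ be the number of kinds of fruits and vegetables, each containing $p$ nutrient elements. Each kind of fruit or vegetable can be seen as a point in $p$-dimensional space, so the $i$-th fruit or vegetable can be defined as

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}), \quad i = 1, 2, \ldots, n.$$

Then the $n$ kinds of fruits and vegetables can be expressed as a matrix

$$X = (x_{ij})_{n \times p}.$$

The standardized matrix $X^* = (x^*_{ij})_{n \times p}$ is obtained by normal (z-score) standardization of the original data:

$$x^*_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}.$$

The correlation coefficient $\rho_{ij}$ between indexes $i$ and $j$ is calculated from the standardized matrix:

$$\rho_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} x^*_{ki}\, x^*_{kj}, \quad i, j = 1, 2, \ldots, p.$$

Then we get the correlation coefficient matrix $R = (\rho_{ij})_{p \times p}$. The matrix $R$ is symmetric, so it can be transformed into a diagonal matrix:

$$U^{T} R U = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p), \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,$$

where $U = (u_1, u_2, \ldots, u_p)$ is an orthogonal matrix whose columns are the eigenvectors corresponding to the eigenvalues $\lambda_j$ $(j = 1, 2, \ldots, p)$. The variance contribution ratio $a_j$ of the $j$-th principal component and the cumulative variance contribution ratio of the first $m$ principal components are calculated as follows:

$$a_j = \frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k}, \qquad \sum_{j=1}^{m} a_j = \frac{\sum_{j=1}^{m} \lambda_j}{\sum_{k=1}^{p} \lambda_k},$$

where $m < p$. To lose as little information from the original variables as possible while analyzing the data with fewer comprehensive index variables, we extract principal components according to the cumulative variance contribution rate (usually greater than 85%). These principal components are regarded as the new comprehensive variables.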
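The standardization and correlation steps of Definition 1 can be illustrated with a small pure-Python sketch. The data matrix below is made up, the function names are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption; in practice the eigenvalues and eigenvectors of R would be obtained from a numerical library rather than by hand.

```python
import math


def standardize(X):
    """Column-wise z-score: subtract the column mean and divide by the
    sample standard deviation (divisor n - 1, an assumption of this sketch)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
        for j in range(p)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def correlation_matrix(Z):
    """p x p correlation matrix of the columns of a standardized matrix Z."""
    n, p = len(Z), len(Z[0])
    return [
        [sum(Z[k][i] * Z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


# Hypothetical data: 4 samples (rows) measured on 3 indexes (columns).
X = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
]
Z = standardize(X)
R = correlation_matrix(Z)
for row in R:
    print([round(value, 3) for value in row])
```

In this toy matrix the second column is exactly twice the first, so their correlation is 1; this is the kind of redundancy that PCA removes by replacing correlated indexes with uncorrelated components.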
Taking each sample's scores on the first $m$ principal components as $m$ variables, we can carry out a cluster analysis.
Definition 2. The distance between samples. The distance between the samples is defined as follows: [equation missing]

The distance between categories. The distance between the categories is defined as follows: [equation missing]

Merging category $G_p$ and category $G_q$ gives a new category $G_r$; the distance between $G_r$ and any other category is calculated as follows: [equation missing]
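As an illustration of how such distances drive an agglomerative (systematic) clustering, the sketch below merges the two closest categories repeatedly. It assumes Euclidean distance between samples and the shortest-distance (single linkage) rule between categories; both choices are assumptions made for this example and are not confirmed as the paper's own definitions. The scores are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two samples given as score tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, n_clusters):
    """Merge the two closest categories until n_clusters remain.
    Category distance is the shortest distance between members (single
    linkage, an assumption of this sketch)."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of samples")
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        # Replace the two merged categories with the new one.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical principal component scores of five samples (two components each).
scores = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0)]
result = sorted(agglomerate(scores, 3))
print(result)
```

Other category distances (for example average linkage or Ward's method) fit the same loop by changing only the `linkage` function, which is why different clustering methods can lead to different groupings of the same samples.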

Examples of Application
The experiment selected the 30 kinds of fruits and vegetables most often eaten by Chinese residents, including 14 kinds of vegetables and 16 kinds of fruits. With reference to the 2014 China food composition table [15], we select 11 nutrient indexes to analyze the nutritional situation of the fruits and vegetables. For convenience, we make the following definitions: $x_1$ refers to dietary fiber; $x_2$ refers to carbohydrates; $x_3$ refers to vitamin A; $x_4$ refers to vitamin B; $x_5$ refers to vitamin C; $x_6$ refers to vitamin E; $x_7$ refers to sodium (Na); $x_8$ refers to calcium (Ca); $x_9$ refers to iron (Fe); $x_{10}$ refers to zinc (Zn); $x_{11}$ refers to selenium (Se). The specific data are given in Table 1.

Principal Component Extraction
After the 11 indexes of vegetables and fruits are standardized, the amount of information that each principal component provides is measured, according to the principle of principal component analysis, by its variance (eigenvalue $\lambda_i$). Principal component analysis was carried out on the data in Table 1, and the eigenvalues, the variance contribution rates of the principal components and the cumulative variance contribution rates are given in Table 2. The eigenvalue distribution is shown in Figure 1. From Figure 1, we can see an obvious inflection point at $\lambda_8$, so the number of principal components should be less than 8. Combined with Table 2, the cumulative variance of the first 5 principal components is 89.303%, which is greater than 85%, so we choose the first 5 principal components as variables. Table 3 reveals the relations between the principal components, as linear combinations, and the original variables, and the relationship can be described with specific function expressions as follows: [equations missing]

[Figure labels (the 30 samples): eggplant, apricot, apple, peach, wax gourd, cucumber, pumpkin, towel gourd, watermelon, tomato, grape, orange, pineapple, hami melon, pear, romaine lettuce, potato, litchi, longan, banana, radish, cabbage, rape, lemon, cauliflower, balsam pear, strawberry, carrot, mango, coconut]

Finally, we can calculate the comprehensive scores of the vegetables and fruits and rank them, as shown in Table 4.
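One common way to obtain a comprehensive score is to weight each sample's principal component scores by the variance contribution rates of the retained components. The sketch below assumes that weighting, which is not stated explicitly in the text, and uses hypothetical sample names, scores and rates rather than the values of Tables 2 to 4.

```python
def composite_scores(pc_scores, contribution_rates):
    """Weighted sum of principal component scores. Weights are the variance
    contribution rates of the retained components, rescaled to sum to 1
    (an assumed weighting scheme)."""
    total = sum(contribution_rates)
    weights = [rate / total for rate in contribution_rates]
    return {
        name: sum(w * s for w, s in zip(weights, scores))
        for name, scores in pc_scores.items()
    }


def rank(scores):
    """Sample names ordered from the highest to the lowest composite score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores of three samples on two retained components.
pc_scores = {
    "sample_a": (2.0, -1.0),
    "sample_b": (0.0, 1.0),
    "sample_c": (-1.0, 0.0),
}
contribution_rates = [0.5, 0.3]  # hypothetical variance contribution rates

scores = composite_scores(pc_scores, contribution_rates)
ranking = rank(scores)
print(ranking)
```

Because the first component carries the larger weight, a sample with a high first-component score can rank first even when its second-component score is negative.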

Cluster Analysis
Principal component analysis was carried out on the fruits and vegetables, and six principal components were extracted. Applying the systematic clustering method to these six indexes, we obtain the final result shown in Figure 2.
From Figure 2, we know that the 30 kinds of vegetables and fruits can be divided into 7 clusters. Table 4 shows the scores of all kinds of fruits and vegetables on each principal component, their composite scores and their comprehensive rankings. Among the 14 common vegetables and 16 common fruits, coconut has the highest score on the first principal component, which represents dietary fiber, carbohydrate, iron, zinc and selenium, and lettuce has the lowest. On the second principal component, which represents sodium and calcium, lettuce has the highest score and coconut the lowest. In terms of composite score and comprehensive ranking, coconut, banana and potato take the top three places, which means their comprehensive nutritional value is very high; cucumber, wax gourd and lettuce take the last three places, which means their comprehensive nutritional value is low. Among the top 15 fruits and vegetables there are 13 fruits and just 2 vegetables, so on this measure the nutritional value of fruits is slightly higher than that of vegetables. This shows that the nutrition content of fruits and vegetables should be evaluated comprehensively and systematically rather than on a subset of nutrient elements.
Among the 30 kinds of vegetables and fruits, coconut and mango each form a category of their own, and the rest are divided into five categories, as can be seen clearly in Table 5.
From the classification of vegetables and fruits in Table 5, we know that vegetables and fruits have many similarities in nutritional content. For example, in the third cluster, grape, wax gourd, watermelon, pineapple and other 10 kinds of fruits and vegetables have similar nutritional content. Nutrition of this type can therefore be obtained either from the fruits or from the vegetables. This gives consumers a wider choice, so that they can choose according to the season: for the same amount of nutrition, they can choose the tastier and safer seasonal vegetables and fruits instead of anti-seasonal ones.
According to the list of seasonal fruits and vegetables in Zhao Peng's [16] "anti-season vegetable and fruit, harmful or harmless", we can tabulate the seasons of the 30 kinds of vegetables and fruits. The nutritional value and taste of anti-seasonal vegetables and fruits are inferior to those of seasonal ones, because anti-seasonal cultivation is a form of artificial intervention in their growth: under the controlled growth conditions, such as temperature and moisture, the accumulation of nutrients cannot keep up with growth. Anti-seasonal vegetables are thus not only more expensive on the market than seasonal vegetables but also poorer in taste and nutrition. Therefore, to guarantee good nutritional status, we should know which vegetables are in season and try to obtain vegetables and fruits that are more nutritious, safer and healthier at a more affordable price. Provided that similar nutrition is taken in, seasonal vegetables and fruits can replace one another in different seasons. As shown in Table 6, the nutrient contents of the seasonal vegetables and fruits in the second category are similar, so they are replaceable with each other. In summer, these nutrients can be obtained from white gourd, cucumber, pumpkin, sponge gourd, tomato and pineapple instead of the anti-season wax gourd, cantaloupe, grape and orange, which are expensive and inferior in nutritional value and taste.
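The replacement rule described above amounts to a lookup: for a given item, suggest the members of the same cluster that are in season. A minimal sketch follows. The function name is hypothetical, and the cluster numbers and seasons in the dictionaries are illustrative placeholders loosely based on the summer example in the text; they are not the contents of Table 5 or Table 6.

```python
def seasonal_substitutes(item, season, cluster_of, in_season):
    """Items in the same cluster as `item` that are in season in `season`."""
    target = cluster_of[item]
    return sorted(
        other
        for other, cluster in cluster_of.items()
        if cluster == target
        and other != item
        and season in in_season.get(other, set())
    )


# Illustrative cluster labels (placeholders, not Table 5).
cluster_of = {
    "cucumber": 2, "tomato": 2, "pineapple": 2,
    "grape": 2, "orange": 2, "banana": 1,
}
# Illustrative seasons (placeholders, not Table 6).
in_season = {
    "cucumber": {"summer"}, "tomato": {"summer"}, "pineapple": {"summer"},
    "grape": {"autumn"}, "orange": {"winter"}, "banana": {"summer"},
}

suggestions = seasonal_substitutes("orange", "summer", cluster_of, in_season)
print(suggestions)
```

Banana is in season in this toy data but belongs to a different cluster, so it is not offered as a substitute for orange; only in-season members of the same cluster are returned.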

Conclusions and Recommendations
Based on principal component analysis, the original data for 11 evaluation indexes (dietary fiber, carbohydrates, vitamin A, vitamin B, vitamin C, vitamin E, sodium, calcium, iron, zinc and selenium) can be recombined into 6 uncorrelated comprehensive indexes, which makes the result of the cluster analysis more reasonable and realistic. Combined with information on which fruits and vegetables are in season, this allows consumers to purchase fruits and vegetables more reasonably according to their needs.
Fruits and vegetables are closely related to the healthy development of our bodies, and they are not high-fat, high-sugar or high-calorie foods. Eating fruits and vegetables frequently provides dietary fiber, which can improve digestive health and help prevent cardiovascular diseases, vitamin C, which can lower the incidence of cataracts and enhance immunity, and other healthy nutrients. But with the development of storage technology in China and the wide use of plant growth hormones and ripening agents, anti-seasonal vegetables and fruits have become a larger part of daily life. Anti-seasonal fruits and vegetables are not only more expensive but also lower in nutritional value. Therefore, we should understand correctly which fruits and vegetables are seasonal and which are anti-seasonal, and then make replacements between them in different seasons. A common social phenomenon today is that, although vegetables contain nutrients that contribute to human health, many children and adults do not like to eat green vegetables. What is more, most children treat green vegetables as their enemies: when their parents force them to eat vegetables, they think their parents do not love them. In addition, some families are so protective of their children that they allow them not to eat vegetables. Such picky eating can lead to malnutrition, diabetes, hypertension, hyperlipidemia and so on. Children and adults who do not like vegetables can eat fewer vegetables but more of the fruits that have similar nutritional content, which helps to ensure good nutrition.

Figure 2. Dendrogram of 30 kinds of vegetables and fruits.

Table 1. Data of vegetables and fruits.

Table 2. Eigenvalue and cumulative variance contribution.

Table 3. Component score coefficient matrix.

Table 6. 30 kinds of vegetables and fruits distributed in seasons.