
Raohe honey (honey from Raohe County) is the only honey product that has obtained China's national geographical indication mark; however, because of its excellent quality, it is frequently counterfeited. In this research, the geographical origin of Raohe honey was traced: 166 Raohe honey samples and 31 non-Raohe honey samples were analyzed by inductively coupled plasma mass spectrometry (ICP-MS). Principal component analysis (PCA) reduced the dimensionality of the data by transforming 13 isotope abundance ratios into 4 principal components, which together explained 91.17% of the total variance. Five models (Decision Tree, Naive Bayes, Neural Network, Partial Least Squares Discriminant Analysis and Support Vector Machine) were built on these four principal components with the Agilent MPP software. The models were validated with 11 Raohe honey samples and 5 non-Raohe honey samples selected at random. The accuracies of the Decision Tree and Support Vector Machine models were both 93.75%, those of the Naive Bayes and Neural Network models were both 87.5%, and that of the Partial Least Squares Discriminant Analysis model was only 75%. It was concluded that the Decision Tree and Support Vector Machine models can be used to identify Raohe honey, the Naive Bayes and Neural Network models can serve as references, and the Partial Least Squares Discriminant Analysis model is not suitable for identifying Raohe honey.

Raohe linden honey is collected from the flowers of chaff linden and purple linden by northeastern black bees, which were introduced from east of the Wusuli River in the early 20th century. Since 1997, Raohe County has maintained a national nature reserve for the northeastern black bee, which is kept as a pure strain. Chaff linden honey and purple linden honey have high Baumé degrees. Northeastern black bees have strong disease resistance and seldom require antibiotics; as a result, the problem of veterinary drug residues is essentially avoided.

In recent years, Raohe honey has won numerous prizes at domestic and international expos for its quality and has consequently become a consumer favorite. At the same time, however, some traders and producers counterfeit Raohe honey by packing non-Raohe honey in Raohe honey packaging; these counterfeit products not only deceive consumers but also seriously harm the local apiculture industry. Technologies for tracing and verifying the geographical origin of Raohe honey are therefore urgently needed.

The first review of research on honey origin tracing was a paper on methods for determining the geographical and botanical origin of honey, published by Anklam et al. in 1998 [

Sr is located in period 5, group IIA of the periodic table and has four naturally occurring isotopes: ^{84}Sr, ^{86}Sr, ^{87}Sr and ^{88}Sr. Although Sr isotope ratios, like those of S, C, H, O and N, may change during absorption and metabolism in plants and animals, the ^{87}Sr content can still serve as a marker for geographical origin. The ^{87}Sr/^{86}Sr ratio in plants and animals is affected by the Sr that organisms absorb from the soil [δ^{18}O and δD are equal in living bodies, which means that climatic variation is insignificant, Sr isotope ratios give satisfactory identification results [

B isotopes undergo fractionation during geochemical processes, so the ^{11}B/^{10}B ratio varies among rocks, marine sediments and natural waters. Another important mechanism of natural B isotope fractionation is isotope exchange between boric acid and borate ions, which depends on pH and concentrates ^{11}B in boric acid. These natural processes can give rise to δ^{11}B variations of about 9%. Besides natural factors, the application of boron-containing fertilizers may also affect the ^{11}B/^{10}B ratio, resulting in large differences in ^{11}B/^{10}B among different soils [
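Boron isotope variations are conventionally reported in delta notation relative to the NIST SRM 951 boric acid standard (the B reference material used in this study): δ^{11}B = (R_sample / R_NIST951 - 1) × 1000, in per mil, with R = ^{11}B/^{10}B. A value expressed as 9% therefore corresponds to 90‰. The minimal Python sketch below computes this quantity; the function name is illustrative and the ratios in the usage line are hypothetical, not measurements from this study. Note that the models here use the inverse ratio ^{10}B/^{11}B, which would have to be inverted before applying the formula.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Delta notation: per-mil deviation of a sample isotope ratio
    (e.g. 11B/10B) from the ratio measured for a reference standard
    (e.g. NIST SRM 951)."""
    return (r_sample / r_standard - 1.0) * 1000.0


if __name__ == "__main__":
    # Hypothetical 11B/10B ratios, for illustration only.
    print(delta_permil(4.10, 4.04))
```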

Pb has four natural isotopes: ^{204}Pb, ^{206}Pb, ^{207}Pb and ^{208}Pb. ^{204}Pb has a half-life of 1.4 × 10^{17} years, long enough for it to be regarded as a stable reference isotope, whereas ^{206}Pb, ^{207}Pb and ^{208}Pb, whose natural abundances vary, are the end products of the radioactive decay of ^{238}U, ^{235}U and ^{232}Th, respectively. Because materials differ in age and in their contents of U, Th and initial Pb, different natural materials have different Pb isotope compositions, and these compositions are characteristic signatures that remain constant through chemical and physical changes. The abundance ratios of Pb isotopes can therefore be regarded as “fingerprints” for identifying the origin of Pb: they can reveal its likely source and transport pathway and require only a small amount of sample. Moreover, Pb isotope abundance ratios differ between areas because of differences in geological structure, geological age, mineral content and precipitation. Metal elements in living organisms come mainly from soil and groundwater; therefore, Pb isotope abundance ratios can indicate the area of origin [
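Whether ^{204}Pb can be treated as stable can be checked with the decay law N(t)/N_0 = exp(-λt), where λ = ln 2 / t_{1/2}. The Python sketch below uses the half-life quoted above and assumes an elapsed time of 4.5 × 10^{9} years, roughly the age of the Earth (a figure not taken from this study).

```python
import math


def fraction_decayed(half_life_years: float, elapsed_years: float) -> float:
    """Fraction of a radioactive nuclide that has decayed after a given time."""
    decay_constant = math.log(2) / half_life_years
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return -math.expm1(-decay_constant * elapsed_years)


if __name__ == "__main__":
    # 204Pb half-life from the text; 4.5e9 years is an assumed elapsed time.
    print(f"{fraction_decayed(1.4e17, 4.5e9):.2e}")
```

The result is about 2.2 × 10^{-8}: only around two parts in 10^{8} of the ^{204}Pb would have decayed over that time, which is negligible next to the radiogenic variations in ^{206}Pb, ^{207}Pb and ^{208}Pb.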

Considering the relevance of the isotopes of these three elements to geographical origin, this research measured Sr, B and Pb isotopes in honey, then performed principal component analysis on the isotope abundance ratios with the data analysis software Agilent Mass Profiler Professional (MPP) and subsequently built prediction models. MPP is a chemometric platform for exploring large mass spectrometry data sets; it can be applied to any differential analysis based on mass spectrometry to determine the relationships between two or more sample groups and variables. The software also provides automatic sample classification and prediction functions, which support the qualitative analysis of unknown samples in many mass spectrometry applications. MPP was designed for mass spectrometry specialists and statisticians, with workflows ranging from introductory to advanced. It provides a broad range of statistical tools, including ANOVA, PCA, volcano plots, hierarchical clustering, self-organizing maps (SOM), QT clustering and five classification/prediction methods. Latorre et al. [

Inductively coupled plasma mass spectrometer (ICP-MS, Agilent, USA) with a glass concentric nebulizer and a nickel sampler cone, using Ar as the plasma gas and He as the collision gas; MARS Xpress microwave digestion system (CEM, USA); Milli-Q water purification system.

Sr isotope reference material: NIST987; Pb isotope reference material: NIST981; B isotope reference material: NIST951.

166 Raohe honey samples were collected from 7 producers: Heifengyuan (HFY), Raofeng (RF), Dadingshan (DDS), Dajiahe (DJH), Laoyinggou (LYG), Yongxing (YX), and Hongqiling (HQL); 31 non-Raohe honey samples were purchased from different supermarkets.

Sample preparation: an appropriate amount of honey was weighed into a centrifuge tube and sonicated in a 50 °C water bath for at least 30 min until the honey became a clear liquid.

Microwave digestion: for each pretreated sample, 0.1 g of the clear liquid was weighed, to an accuracy of 0.0001 g, into a PTFE digestion vessel that had been cleaned by boiling in acid, and 5 mL of concentrated nitric acid and 3 mL of hydrogen peroxide were added. The sample was then digested following a predefined heating program. After cooling to ambient temperature, the vessel was opened, its inside and cap were rinsed 3 to 4 times with small volumes of ultrapure water, and the rinses were collected in a 50 mL volumetric flask. Finally, the solution was made up to volume with water and mixed.
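Because 0.1 g of honey is made up to 50 mL, the digest is a 500-fold dilution (0.050 L / 0.0001 kg = 500 L/kg). A minimal Python sketch of converting a digest concentration back to a mass fraction in honey follows; the function and parameter names are hypothetical, and the optional reagent-blank subtraction is an assumption, since a procedural blank is not described above.

```python
def honey_mass_fraction(c_digest_ug_per_l: float,
                        final_volume_ml: float = 50.0,
                        sample_mass_g: float = 0.1,
                        c_blank_ug_per_l: float = 0.0) -> float:
    """Convert an analyte concentration in the digest (ug/L) into a mass
    fraction in the original honey (ug/kg)."""
    volume_l = final_volume_ml / 1000.0
    mass_kg = sample_mass_g / 1000.0
    # Mass fraction = (C_digest - C_blank) * V / m
    return (c_digest_ug_per_l - c_blank_ug_per_l) * volume_l / mass_kg


if __name__ == "__main__":
    # Hypothetical reading: 2.0 ug/L in the digest of a 0.1000 g sample.
    print(honey_mass_fraction(2.0))
```

With the default volume and mass, every 1 µg/L in the digest corresponds to about 500 µg/kg in the honey.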

ICP-MS was used to measure the concentrations of ^{84}Sr, ^{86}Sr, ^{87}Sr, ^{88}Sr, ^{10}B, ^{11}B, ^{204}Pb, ^{206}Pb, ^{207}Pb and ^{208}Pb in the samples, with He as the collision gas to reduce polyatomic (isobaric) interferences, and with mass bias calibrated against the reference materials [
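The text does not specify how the reference materials were used for mass calibration. One common approach, shown here only as an assumption, is standard-sample bracketing: a reference material is measured before and after each sample, and the ratio of its certified value to its mean measured value gives a correction factor K that is applied to the sample ratio. The certified value must be taken from the certificate of the respective NIST material; the numbers in the usage line are hypothetical.

```python
def bracketing_correction(r_sample_measured: float,
                          r_std_before: float,
                          r_std_after: float,
                          r_std_certified: float) -> float:
    """Correct a measured isotope ratio for instrumental mass bias using
    standard-sample bracketing with a linear correction factor K."""
    # Averaging the two bracketing standards compensates for linear drift.
    r_std_mean = (r_std_before + r_std_after) / 2.0
    k_factor = r_std_certified / r_std_mean
    return r_sample_measured * k_factor


if __name__ == "__main__":
    # Hypothetical 87Sr/86Sr readings; replace 0.710 with the certificate value.
    print(bracketing_correction(0.7150, 0.7120, 0.7124, 0.710))
```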

Thirteen isotope abundance ratios were compared between Raohe and nonlocal (non-Raohe) honey: ^{84/86}Sr, ^{84/87}Sr, ^{84/88}Sr, ^{86/87}Sr, ^{86/88}Sr, ^{87/88}Sr, ^{10/11}B, ^{204/206}Pb, ^{204/207}Pb, ^{204/208}Pb, ^{206/207}Pb, ^{206/208}Pb and ^{207/208}Pb. Because of the large amount of data, principal component analysis was used for dimensionality reduction: the original variables were transformed into new variables (principal components) that are linear combinations of the original variables and retain as much of their information as possible, thereby eliminating the redundancy among correlated variables. The number of principal components retained also affects the quality of the models built afterwards: too few components may exclude useful information, which is called “under-fitting”, whereas retaining more components leaves less residual variance unexplained but too many may cause “over-fitting”. In this study, four principal components proved to be the optimal choice.
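The feature construction and dimensionality reduction described above can be sketched in Python with NumPy. This is an illustration under assumptions, not the MPP implementation: each ratio is standardized to zero mean and unit variance before PCA (the preprocessing used in MPP is not stated), PCA is computed by singular value decomposition, and the input signals are assumed to be already mass-bias-corrected. The helper `n_components_for` applies a hypothetical cumulative-variance cutoff; the study itself simply selected four components.

```python
from itertools import combinations

import numpy as np

SR = ["84Sr", "86Sr", "87Sr", "88Sr"]
PB = ["204Pb", "206Pb", "207Pb", "208Pb"]
# The 13 ratios listed in the text: all Sr pairs, 10B/11B and all Pb pairs.
RATIO_PAIRS = list(combinations(SR, 2)) + [("10B", "11B")] + list(combinations(PB, 2))


def ratio_features(samples):
    """samples: list of dicts mapping isotope label -> signal; returns n x 13 array."""
    return np.array([[s[a] / s[b] for a, b in RATIO_PAIRS] for s in samples])


def pca(features, n_components):
    """Return PC scores, loadings and cumulative explained-variance fractions."""
    # Standardize each ratio column (assumed preprocessing).
    z = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], np.cumsum(explained)


def n_components_for(cumulative, target=0.90):
    """Smallest number of components whose cumulative fraction reaches target."""
    return int(np.searchsorted(cumulative, target)) + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = SR + PB + ["10B", "11B"]
    # Synthetic signals, for illustration only (not data from this study).
    samples = [dict(zip(labels, rng.uniform(1.0, 100.0, len(labels)))) for _ in range(40)]
    scores, loadings, cumulative = pca(ratio_features(samples), n_components=4)
    print(scores.shape, np.round(cumulative[:4], 3))
```

Standardizing first keeps ratios with large numerical ranges (such as ^{84/88}Sr versus ^{206/207}Pb) from dominating the components purely because of their scale.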

To make the data of each sample easier to interpret, the results of the principal component analysis were displayed as interaction diagrams, which directly show where the samples are distributed in the 2D plot of each pair of principal components. The pairwise plots of the principal components are displayed below:

As

As seen in

As

Principal component | PC1 | PC2 | PC3 | PC4 |
---|---|---|---|---|
Cumulative variance explained (%) | 49.52 | 70.49 | 83.43 | 91.17 |
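The contribution of each individual component follows from the cumulative values by successive differences. The short Python calculation below uses only the numbers in the table; it shows that PC3 and PC4 each add less than 13% of the variance, and that about 8.83% of the variance is not captured by the four retained components.

```python
cumulative = [49.52, 70.49, 83.43, 91.17]  # cumulative % for PC1-PC4, from the table

# Individual contribution of each PC = its cumulative % minus the previous one.
individual = [round(c - p, 2) for p, c in zip([0.0] + cumulative[:-1], cumulative)]
unexplained = round(100.0 - cumulative[-1], 2)

print(individual, unexplained)  # [49.52, 20.97, 12.94, 7.74] 8.83
```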

Five models (Decision Tree, Naive Bayes, Neural Network, Partial Least Squares Discriminant Analysis and Support Vector Machine) were built with the Agilent MPP software, using the four principal components obtained by PCA as new variables. The five models were validated with 11 Raohe honey samples and 5 nonlocal honey samples selected at random, and the prediction results are shown in
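A stratified random hold-out of the kind described (11 Raohe and 5 nonlocal validation samples) might be drawn as in the Python sketch below. The sample IDs and the seed are hypothetical, and the sketch assumes that the validation samples were excluded from model training, which the text does not state explicitly.

```python
import random


def stratified_holdout(labels, n_per_class, seed=None):
    """Randomly draw a fixed number of validation samples from each class.

    labels: dict mapping sample ID -> class label
    n_per_class: dict mapping class label -> number of validation samples
    """
    rng = random.Random(seed)
    validation = []
    for cls, n in n_per_class.items():
        # Sorting makes the draw reproducible for a given seed.
        members = sorted(sid for sid, lab in labels.items() if lab == cls)
        validation.extend(rng.sample(members, n))
    held_out = set(validation)
    training = [sid for sid in labels if sid not in held_out]
    return training, validation


if __name__ == "__main__":
    # Hypothetical IDs matching the sample counts in this study.
    labels = {f"R{i}": "Raohe" for i in range(1, 167)}
    labels.update({f"Nonlocal {i}": "Nonlocal" for i in range(1, 32)})
    train, val = stratified_holdout(labels, {"Raohe": 11, "Nonlocal": 5}, seed=42)
    print(len(train), len(val))
```

Drawing a fixed number from each class keeps the validation set's class balance independent of chance, which matters here because nonlocal samples make up only about 16% of the data.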

As

As

Sample | Predicted origin | Confidence measure |
---|---|---|
HFY19 | Raohe | 1.0 |
RF11 | Raohe | 1.0 |
Nonlocal 17 | Nonlocal | 1.0 |
Nonlocal 1 | Raohe | 1.0 |
RF15 | Raohe | 1.0 |
RF1 | Raohe | 1.0 |
LYG5 | Raohe | 1.0 |
HFY4 | Raohe | 1.0 |
HFY2 | Raohe | 1.0 |
Nonlocal 30 | Nonlocal | 1.0 |
Nonlocal 5 | Nonlocal | 1.0 |
Nonlocal 31 | Nonlocal | 1.0 |
HFY20 | Raohe | 1.0 |
HFY21 | Raohe | 1.0 |
YX15 | Raohe | 1.0 |
HFY3 | Raohe | 1.0 |

Sample | Predicted origin | Confidence measure |
---|---|---|
RF11 | Nonlocal | 0.93 |
HFY19 | Raohe | 0.99 |
Nonlocal 17 | Nonlocal | 0.99 |
Nonlocal 1 | Nonlocal | 0.74 |
RF15 | Raohe | 0.99 |
RF1 | Raohe | 0.99 |
LYG5 | Raohe | 0.99 |
HFY2 | Raohe | 0.99 |
Nonlocal 30 | Nonlocal | 1.0 |
Nonlocal 5 | Nonlocal | 0.69 |
Nonlocal 31 | Nonlocal | 1.0 |
HFY20 | Raohe | 0.99 |
HFY21 | Raohe | 0.99 |
YX15 | Raohe | 1.0 |
HFY3 | Raohe | 0.99 |
HFY4 | Nonlocal | 0.99 |

As presented in

As

Sample | Predicted origin | Confidence measure |
---|---|---|
RF11 | Nonlocal | 0.93 |
HFY19 | Raohe | 0.99 |
Nonlocal 17 | Nonlocal | 0.99 |
Nonlocal 1 | Raohe | 0.99 |
RF15 | Raohe | 0.99 |
RF1 | Raohe | 0.99 |
LYG5 | Raohe | 0.99 |
HFY2 | Raohe | 0.99 |
Nonlocal 30 | Nonlocal | 0.99 |
Nonlocal 5 | Nonlocal | 0.91 |
Nonlocal 31 | Nonlocal | 0.99 |
HFY20 | Raohe | 0.95 |
HFY21 | Raohe | 0.98 |
YX15 | Raohe | 0.99 |
HFY3 | Raohe | 0.99 |
HFY4 | Raohe | 0.97 |

Sample | Predicted origin | Confidence measure |
---|---|---|
RF11 | Raohe | 1.0 |
HFY19 | Raohe | 1.0 |
Nonlocal 17 | Nonlocal | 1.0 |
Nonlocal 1 | Raohe | 1.0 |
RF15 | Raohe | 1.0 |
RF1 | Raohe | 1.0 |
LYG5 | Raohe | 1.0 |
HFY2 | Nonlocal | 1.0 |
Nonlocal 30 | Nonlocal | 1.0 |
Nonlocal 5 | Nonlocal | 1.0 |
Nonlocal 31 | Nonlocal | 1.0 |
HFY20 | Nonlocal | 1.0 |
HFY21 | Nonlocal | 1.0 |
YX15 | Raohe | 1.0 |
HFY3 | Raohe | 1.0 |
HFY4 | Raohe | 1.0 |

analysis, the confidence measure of the model reached 1.0. However, because the model performed only a preliminary multivariate regression on the principal components, it made four misjudgments (Nonlocal 1, HFY2, HFY20 and HFY21), and its overall prediction accuracy of 75% was lower than those of the other models.
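The accuracies reported for the validation set follow directly from counting misjudgments. The Python sketch below recomputes the 75% figure from the predictions in the table above, inferring each sample's true origin from its ID (non-Raohe samples are labelled “Nonlocal”). The optional `overrides` argument is a hypothetical addition that lets a sample's assumed true origin be changed, for example to treat Nonlocal 1 as Raohe honey as discussed below.

```python
def true_origin(sample_id, overrides=None):
    """True class of a validation sample, inferred from its ID."""
    if overrides and sample_id in overrides:
        return overrides[sample_id]
    return "Nonlocal" if sample_id.startswith("Nonlocal") else "Raohe"


def evaluate(predictions, overrides=None):
    """Return (accuracy, misjudged sample IDs) for a dict of ID -> prediction."""
    misjudged = [sid for sid, pred in predictions.items()
                 if pred != true_origin(sid, overrides)]
    return 1 - len(misjudged) / len(predictions), misjudged


# Predictions of the Partial Least Squares Discriminant Analysis model (table above).
PLSDA = {
    "RF11": "Raohe", "HFY19": "Raohe", "Nonlocal 17": "Nonlocal",
    "Nonlocal 1": "Raohe", "RF15": "Raohe", "RF1": "Raohe", "LYG5": "Raohe",
    "HFY2": "Nonlocal", "Nonlocal 30": "Nonlocal", "Nonlocal 5": "Nonlocal",
    "Nonlocal 31": "Nonlocal", "HFY20": "Nonlocal", "HFY21": "Nonlocal",
    "YX15": "Raohe", "HFY3": "Raohe", "HFY4": "Raohe",
}

if __name__ == "__main__":
    accuracy, wrong = evaluate(PLSDA)
    print(f"{accuracy:.2%}", wrong)
```

Applied in the same way to the Decision Tree or Support Vector Machine predictions, where Nonlocal 1 is the only misjudgment, the function gives 15/16 = 93.75%, or 100% with `overrides={"Nonlocal 1": "Raohe"}`.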

As

Sample | Predicted origin | Confidence measure |
---|---|---|
RF11 | Raohe | 0.97 |
HFY19 | Raohe | 0.44 |
Nonlocal 17 | Nonlocal | 0.37 |
Nonlocal 1 | Raohe | 0.73 |
RF15 | Raohe | 0.83 |
RF1 | Raohe | 0.84 |
LYG5 | Raohe | 0.57 |
HFY2 | Raohe | 0.09 |
Nonlocal 30 | Nonlocal | 0.49 |
Nonlocal 5 | Nonlocal | 0.05 |
Nonlocal 31 | Nonlocal | 0.86 |
HFY20 | Raohe | 0.09 |
HFY21 | Raohe | 0.09 |
YX15 | Raohe | 0.88 |
HFY3 | Raohe | 1.0 |
HFY4 | Raohe | 0.01 |

Comparing the prediction results of the five models (Tables 2-6 and Figures 5-10), the Decision Tree, Neural Network, Partial Least Squares Discriminant Analysis and Support Vector Machine models all misjudged nonlocal sample 1. According to its label, this sample was linden honey from Guangxi. However, since Guangxi linden honey is mainly sourced from northeastern China, the sample might actually be Raohe honey. If so, the prediction accuracies of the Decision Tree and Support Vector Machine models would reach 100%. As the interaction diagram between Component 1 and Component 2 in the Naive Bayes model of HFY4 (

Because Component 3 and Component 4 had lower contribution rates and did not clearly separate the samples in the interaction diagrams, the interaction diagrams involving Component 3 or Component 4 were omitted.

Geographical origin tracing of Raohe honey was feasible using ICP-MS combined with PCA and prediction models. The Decision Tree and Support Vector Machine models could accurately distinguish non-Raohe honey from Raohe honey. For the Naive Bayes model, prediction accuracy could be improved by building separate models on the Sr, Pb or B isotopes. The prediction accuracy of the Neural Network model could be improved by changing the number of principal components and the parameter ranges. The Partial Least Squares Discriminant Analysis model could be used for preliminary prediction of samples.

My deepest gratitude goes first and foremost to Mr. MA Zhanfeng, the project leader, for his patient and careful guidance. I would also like to express my heartfelt gratitude to Mr. XU Yao for his great efforts.