Prediction of Chemical Composition of Ancient Glass Relics before Weathering

Ancient glass relics are easily weathered under the influence of the burial environment: their internal elements exchange extensively with elements in the environment, which changes their composition ratios. Archaeological research can usually measure the component contents of glass relics after weathering, but the corresponding contents before weathering are difficult to obtain. Predicting the chemical composition of glass relics before weathering is necessary in order to identify their type accurately and to repair them. To solve this problem, this paper proposes a distribution matching strategy and studies the influence of weathering on the composition of glass through compositional correlation analysis and linear regression, so as to build a prediction model of the composition of glass relics before weathering. The results show that the prediction model constructed by the distribution matching strategy has good predictive ability and is consistent with the trend of composition changes found by the linear regression analysis. Moreover, the model is simple and easy to operate, which makes it convenient to popularize and apply, and it provides a theoretical basis and reference for further research on the composition and accurate classification of glass cultural relics.


Introduction
How to cite this paper: Sun, J.H., Chen, H.Z., Liu, Y., Lin, H.Q., Zheng, H.W. and Qiu, Y.Z. (2023) Prediction of Chemical Composition of Ancient Glass Relics before Weathering.
As a kind of precious vessel, glass is regarded as precious material evidence of the early trade between China and the West in ancient times [1]. The main raw material of glass is quartz sand, whose main chemical component is SiO2. Differences in the flux added when making glass lead to differences in the main chemical composition of the final glass. In ancient China, there were mainly two types of glass: lead barium glass and high potassium glass. The content of PbO and BaO in lead-barium glass is increased by adding lead ore as a flux in the firing process. High potassium glass is made with plant ash or other substances with high potassium content as a flux [2].
Ancient glass products have been buried for a long time and are highly susceptible to the influence of the surrounding environment. When the internal elements of glass exchange in large quantities with chemical elements in the surrounding environment, weathering occurs, which changes the chemical composition of the glass [3]. In archaeological research, unearthed glass relics are all in their weathered state, so it is difficult to judge accurately their chemical composition before weathering, which seriously affects the type identification of glass relics. In order to identify the type of glass relics accurately and then carry out restoration work, it is necessary to predict the chemical composition of glass relics before weathering.
Identifying the type of glass cultural relics by quantitative analysis of chemical composition is a hot topic [4] [5]. In the early period, scholars studied methods of determining the chemical composition of glass relics. Gan Fuxi et al. [6] analyzed the chemical composition of glass beads by proton excited X-ray fluorescence and inductively coupled plasma emission spectroscopy; Li Qinghui et al. [7] [8] proposed a method of proton induced X-ray emission (PIXE) for the determination of the chemical composition of ancient glass.
In recent years, with the development of information technology, neural networks, multi-layer perceptrons, decision trees, random forests, feature selection and other machine learning methods have also been applied to research on the type identification of glass relics. Cao Yuxuan et al. [9] used a GA-BP neural network to study the relationship between the internal chemical composition of different types of ancient glass and their degree of weathering, and established models for predicting the types of ancient glass and measuring the degree of weathering; Shi Baoming et al. [10] established a multi-layer perceptron network model to predict the categories of ancient glass products; Lv Fei et al. [11] established a decision tree model and a random forest model to identify the types of ancient glass products. However, there are few quantitative studies on the prediction of the chemical components of glass cultural relics before weathering. Zou Ying [12] established a multivariate time series model to predict the contents of various chemical components of glass cultural relics before weathering.
Different from the methods of other scholars, this paper combines statistical correlation analysis and linear regression to propose a data distribution matching strategy. Assuming that the glass data before and after weathering form two normally distributed populations, the means and standard deviations of these two populations are used to build a prediction model for the contents of the chemical components of glass relics before weathering.
In order to better demonstrate the research idea of this paper, Figure 1 is used to show the general framework of this paper.
At the same time, the symbolic descriptions are shown in Table 1.

Data Source and Processing
Using chemical composition analysis and other detection methods, archaeologists collected data on a number of ancient Chinese glass relics (2022 National College Students Mathematical Competition in Modeling, Problem C), including the classification information of the relics and the proportions of their main components. However, due to the detection methods and other reasons, the data contain missing values and zero values, and the component proportions of most samples do not sum exactly to 100%. Therefore, we first carry out the necessary pre-processing of the data.

Missing Value and Zero Value Data Correction
Missing values arise mainly because the content of one or several elements in the sample is very low and does not reach the detection limit; omissions during manual data entry cannot be excluded either. There are three commonly used methods for handling missing values: 1) directly delete the sample or variable containing the missing value; 2) assign an arbitrary value below the detection limit to the missing value; 3) estimate the missing value based on the component analysis of associated samples (e.g., likelihood estimation). Because the original data are compositional data and the missing values are numerous and scattered, it is not suitable to delete them directly or to assign arbitrary values below the detection limit. Therefore, we estimate the missing values based on the component analysis of associated samples. Specifically, the number of missing components is counted for each sample, and the missing data are filled by the expectation-maximization (EM) likelihood estimation method combined with the ideal condition that the component proportions sum to 100%.
The absence of an element in the sample, or a content that does not reach the detection limit, may result in a zero value in the recorded data. Considering that the zero values in the original data are few and scattered, we replace the zero values recorded in the original data with a small positive number (0.001), which is convenient for subsequent data processing and modeling.

Correction of Composition Data
Compositional data are non-negative, and the contents of the individual elements in a sample sum to 1 [13], that is, $\sum_{j=1}^{D} x_{ij} = 1$ with $x_{ij} \ge 0$, where $x_{ij}$ is the content of component $j$ in sample $i$ and $D$ is the number of components. Therefore, if the cumulative sum of the component data is biased, the covariance and correlation matrices between components are affected, and the results of the correlation analysis between elements will be biased [14].
The data studied in this paper are the content proportions of the chemical components, that is, compositional data, which should theoretically satisfy the constraint that the cumulative sum is 1. However, because of the detection methods, the proportions of the chemical components of a sample do not sum exactly to 100%, and this deviation must be corrected. If the cumulative component content of a sample in the original data is between 85% and 105%, the fixed-sum deviation is considered small, and the sample is valid and can be corrected. If the cumulative component content falls outside the range of 85% to 105%, the sample is considered invalid and cannot be corrected. For valid samples, the sum is fixed according to formula (1):

$$x'_{ij} = \frac{x_{ij}}{\sum_{k=1}^{D} x_{ik}} \quad (1)$$

where $x_{ij}$ represents the original data of indicator $j$ of sample $i$, $D$ is the number of indicators, and $x'_{ij}$ represents the converted data. After conversion, the component proportions of each sample sum to 1 (100%).
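The fixed-sum correction described above can be sketched as follows. This is only an illustrative sketch: the function name `close_composition`, the sample values, and the choice to rescale to 100 rather than to 1 are assumptions made here, not taken from the paper.

```python
def close_composition(sample, low=85.0, high=105.0):
    """Rescale one sample (component percentages) so that it sums to 100.

    Returns None when the raw total falls outside [low, high], which
    this sketch treats as an invalid sample that cannot be corrected.
    """
    total = sum(sample)
    if not (low <= total <= high):
        return None
    return [100.0 * value / total for value in sample]


# Hypothetical sample whose percentages add up to 98 instead of 100.
raw = [68.6, 9.8, 14.7, 4.9]
closed = close_composition(raw)

# Hypothetical sample whose total (71) lies outside the valid range.
invalid = close_composition([40.0, 20.0, 11.0])
```

After the call, `closed` holds the rescaled percentages and `invalid` is `None`, mirroring the valid/invalid split at 85% and 105%.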
The descriptive statistics of the corrected data are shown in Table 2. It can be seen from Table 2 that the skewness coefficients of the component indicators in the sample are all greater than 0, indicating right-skewed distributions with a long tail to the right of the mean. The skewness coefficient of SiO2 is close to 0, indicating that the SiO2 data are close to a symmetric distribution. The kurtosis coefficients of two component indicators are less than 0, indicating that their distributions are flatter at the top or thinner in the tails than the normal distribution. The kurtosis coefficients of the remaining components are greater than 0, indicating that their distributions are sharper at the top or thicker in the tails than the normal distribution. In general, the contents of most components in the data do not conform to the normal distribution.

Ratio Transformation of Component Data
Since the ratios of compositional variables are not restricted by the "constant sum" constraint, and the logarithm of a ratio usually follows a normal distribution, this paper considers the additive log-ratio transformation (ALR) and the central log-ratio transformation (CLR) [15], so that the compositional data are approximately multivariate normal after transformation. Considering that the components of the vector after the additive log-ratio transformation (ALR) cannot correspond to the components of the original vector one-to-one, the central log-ratio transformation [16] is adopted. For a composition $x = (x_1, x_2, \dots, x_D)$ with $x_j > 0$ and $\sum_{j=1}^{D} x_j = 1$:

$$z_j = \ln \frac{x_j}{\left(\prod_{k=1}^{D} x_k\right)^{1/D}}, \quad j = 1, 2, \dots, D \quad (2)$$

This transformation is called the central log-ratio transformation (CLR).
The corresponding inverse transformation is:

$$x_j = \frac{e^{z_j}}{\sum_{k=1}^{D} e^{z_k}}, \quad j = 1, 2, \dots, D \quad (3)$$

The transformed components correspond to the original components one by one, which further enhances the interpretability of the variables. In addition, the central log-ratio transformation can make the data more stable, reduce the influence of outliers, and reflect the real situation of the data more accurately.
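As a minimal sketch of the central log-ratio transformation and its inverse, the following uses only the Python standard library. The function names `clr` and `clr_inverse` and the four-part composition are hypothetical and chosen for illustration only.

```python
import math


def clr(composition):
    """Central log-ratio transform of strictly positive parts."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


def clr_inverse(clr_values):
    """Map CLR coordinates back to proportions that sum to 1."""
    exps = [math.exp(value) for value in clr_values]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical four-part composition (proportions sum to 1).
x = [0.70, 0.10, 0.15, 0.05]
z = clr(x)
x_back = clr_inverse(z)
```

The CLR coordinates in `z` sum to zero, and `x_back` recovers the original proportions, which is the one-to-one correspondence the text relies on. Zero values must be replaced beforehand (for example by 0.001, as above), because the logarithm of zero is undefined.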
For the sample data $x_{ij} > 0$ (sample $i$, component $j = 1, 2, \dots, D$), the data transformed by CLR are:

$$x^{\mathrm{CLR}}_{ij} = \ln \frac{x_{ij}}{\left(\prod_{k=1}^{D} x_{ik}\right)^{1/D}} \quad (4)$$

According to formula (4), the CLR-transformed content data of high potassium and lead barium glass can be calculated. To check whether the hypothesis of normally distributed data is reasonable, it is necessary to test whether the data distribution conforms to the normal distribution. There are many tests for normality, including the Shapiro-Wilk test, the Kolmogorov-Smirnov test, and the skewness and kurtosis tests [17]. In this paper, the normality of the data is tested by the skewness and kurtosis tests. In statistics, skewness describes the symmetry of a distribution [18], and kurtosis characterizes the peakedness of the probability density curve at the mean. The skewness and kurtosis of the data can be calculated with statistical software. Limited by space, only the skewness and kurtosis of the lead-barium glass data are shown (Table 3). The closer the skewness and kurtosis values are to 0, the closer the data are to a normal distribution. In order to show intuitively the influence of the central log-ratio transformation on the normality of the data, frequency histograms of the original data and of the data after the central log-ratio transformation were made (Figure 2 and Figure 3). Limited by space, only some frequency histograms of lead-barium glass are shown.
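The skewness and kurtosis check can be illustrated with a short sketch. It assumes population (biased) moment estimators and excess kurtosis (normal distribution equals 0), which may differ from the estimators of the software used in the paper. The data are hypothetical, and a plain logarithm stands in for the full CLR transformation to keep the example to a single component.

```python
import math


def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical right-skewed contents of one component across samples.
raw = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.9, 1.8, 4.5]
logged = [math.log(v) for v in raw]

skew_raw, kurt_raw = skewness_and_kurtosis(raw)
skew_log, kurt_log = skewness_and_kurtosis(logged)
```

For these values the skewness after taking logarithms is smaller in magnitude than before, which is the kind of improvement the histograms are meant to show.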
As can be seen from Table 3, Figure 2 and Figure 3, the skewness and kurtosis of the composition content data of lead-barium glass are high, so most of the component contents in the original data do not conform to the normal distribution. After the central log-ratio transformation, the skewness and kurtosis of most component content data are reduced, and the processed data are closer to a normal distribution.
Open Journal of Applied Sciences

Model Design
It is assumed that the glass relics were weathered under natural conditions and that their chemical composition was measured accurately. First, according to the characteristics of the chemical component data of glass relics, the Pearson correlation coefficient is used to study the correlation between chemical components, and the degree of correlation and the differences between glass types are discussed. Secondly, linear regression is used to discuss the influence of the weathering state on the contents of the chemical components of glass relics. Finally, the prediction model of the chemical composition of glass relics before weathering is constructed using the idea of distribution matching.

Correlation Analysis of Chemical Components of Glass Relics
The "closure effect" caused by the constant sum biases the results of correlation analysis between variables. Therefore, the Pearson correlation coefficients between the variables of high potassium and lead barium glass are calculated using the data after the central log-ratio transformation. The calculation formula is as follows:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \quad (5)$$

where $x_i$ and $y_i$ respectively represent two chemical composition variables of glass relics, $\bar{x}$ and $\bar{y}$ respectively represent the average values of the two variables, and $n$ is the sample size. The closer the result is to 1, the stronger the positive correlation between the two variables, and the closer the result is to -1, the stronger the negative correlation. According to the results of equation (5), we obtained the heat maps of the correlation coefficients of high-potassium and lead-barium glasses, shown in Figure 4 and Figure 5. The results show that there is a high correlation between the glass type and the proportions of some chemical components, and that the correlations among chemical components differ between glass types. For example, the correlations of SiO2 with the other components differ between high potassium glass and lead barium glass.

Analysis of the Relationship between Weathering State and the Content of Chemical Components of Glass Relics
Since the weathering state takes two values, weathered and unweathered, it is a categorical variable, while the chemical component contents of glass relics are continuous variables; linear regression analysis can therefore be carried out by setting a dummy variable. With the unweathered state as the reference category and the weathering state as a dummy variable, linear regression models of the weathering state on the content of each chemical component were established for high-potassium glass and lead-barium glass using the data after CLR transformation. The results are shown in Table 4.
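A dummy-variable regression of this kind can be sketched as below. The function name `fit_dummy_regression` and the CLR values are hypothetical; the sketch fits $y = b_0 + b_1 d$ by ordinary least squares, with $d = 0$ for unweathered and $d = 1$ for weathered samples, and does not compute the p values reported in Table 4.

```python
def fit_dummy_regression(unweathered, weathered):
    """OLS fit of y = b0 + b1 * d, d = 0 (unweathered) or 1 (weathered)."""
    d = [0.0] * len(unweathered) + [1.0] * len(weathered)
    y = list(unweathered) + list(weathered)
    n = len(y)
    mean_d = sum(d) / n
    mean_y = sum(y) / n
    sxy = sum((di - mean_d) * (yi - mean_y) for di, yi in zip(d, y))
    sxx = sum((di - mean_d) ** 2 for di in d)
    b1 = sxy / sxx              # effect of weathering on the component
    b0 = mean_y - b1 * mean_d   # mean of the reference (unweathered) group
    return b0, b1


# Hypothetical CLR-transformed contents of one component.
unweathered_values = [1.0, 1.2, 1.1]
weathered_values = [1.8, 2.0, 1.9, 2.1]
b0, b1 = fit_dummy_regression(unweathered_values, weathered_values)
```

With a single binary dummy, the intercept equals the mean of the unweathered group and the slope equals the difference between the two group means, so a positive `b1` means the component proportion increases with weathering.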
The results show that weathering has different effects on the contents of the chemical components of different glass types. 1) Weathering causes significant changes in the proportions of SiO2 and Na2O in high potassium glass, with a p value of 0.000 (<0.05), and changes the proportion of CuO to a certain extent. According to the first-order coefficients of the linear models, surface weathering causes the proportions of SiO2 and CuO to increase and the proportion of Na2O to decrease significantly. 2) Weathering causes significant changes in the proportions of SiO2, Na2O, K2O, CaO, Al2O3, PbO and P2O5 in lead-barium glass, with p values less than 0.05. According to the first-order coefficients of the linear models, surface weathering causes the proportions of SiO2, Na2O, K2O and Al2O3 to decrease, while the proportions of CaO, PbO and P2O5 increase.

Construction of a Prediction Model for the Contents of Each Chemical Component of Glass Relics before Weathering
Because detection data for the same relic before and after weathering are lacking, predicting the chemical composition of glass before weathering from the detection data of weathered points and unweathered relics requires analyzing the statistical laws of the weathered and unweathered populations and making predictions through the relationship between the two populations. Therefore, the idea of distribution matching can be used for prediction. Since the ratios of compositional variables are not restricted by the "fixed sum" constraint and the logarithm of a ratio usually follows a normal distribution, it is assumed that the data after the central log-ratio transformation follow the normal distribution. The predicted contents are shown in Table 5 and Table 6.
In order to show the trend of the predicted data intuitively, the calculation results are visualized in Figure 6 and Figure 7. As can be seen from Figure 6, weathering causes the proportions of SiO2, CuO and P2O5 in high-potassium glass to increase and the proportions of Na2O, CaO and K2O to decrease.

Conclusion
In this paper, a distribution matching prediction model is proposed. Firstly, the correlation between the chemical components of the glass relics is analyzed using the Pearson correlation coefficient, and the differences in correlation between the chemical components of different types of glass relics are compared.
The correlations between the variables differ considerably between glass types.

Figure 2. Histogram of the frequency of Al2O3, CuO and BaO of the unweathered lead barium population.
For high potassium glass, SiO2 is positively correlated with K2O, CaO, MgO, Al2O3, CuO, P2O5 and SnO2, and negatively correlated with Na2O and SrO; it has a very significant positive correlation with Al2O3 and a very significant negative correlation with Na2O. For lead-barium glass, SiO2 has a very significant positive correlation with Na2O and Al2O3, and a very significant negative correlation with P2O5 and SrO. It can also be concluded that for high potassium glass, SiO2 has a positive correlation with P2O5 and a negative correlation with Na2O, while for lead barium glass the opposite holds. Therefore, the correlations between variables differ considerably between glass types.
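The Pearson correlation coefficient behind these heat maps can be sketched in a few lines. The function name `pearson_r` and the CLR values are hypothetical and only illustrate a strongly positive and a strongly negative pair; they are not data from this paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical CLR-transformed contents of three components in five samples.
sio2 = [1.9, 2.1, 2.4, 2.6, 3.0]
al2o3 = [0.2, 0.5, 0.6, 0.9, 1.1]
na2o = [-0.5, -0.9, -1.0, -1.6, -1.8]

r_positive = pearson_r(sio2, al2o3)
r_negative = pearson_r(sio2, na2o)
```

Here `r_positive` is close to 1 and `r_negative` is close to -1, matching the reading of red and blue circles in the heat maps.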

Figure 4. Heat map of the correlation coefficients of the chemical components of high potassium glass. Note: In Figure 4, red represents positive correlation, blue represents negative correlation, the size of a circle represents the absolute value of the correlation coefficient, and asterisks on a circle indicate significance: one asterisk means significant, and two asterisks mean very significant.

Figure 5. Heat map of the correlation coefficients of the chemical components of lead barium glass. Note: In Figure 5, red represents positive correlation, blue represents negative correlation, the size of a circle represents the absolute value of the correlation coefficient, and asterisks on a circle indicate significance: one asterisk means significant, and two asterisks mean very significant.
The data after the central log-ratio transformation are assumed to follow the normal distribution. Considering the two populations, namely the weathered glass population and the unweathered glass population, each described by its mean $\mu$ and standard deviation $\sigma$, the data after the central log-ratio transformation were classified into four groups: high potassium weathered (6), high potassium unweathered (14), lead barium weathered (26) and lead barium unweathered (23). Here, data detected at an unweathered point of a weathered sample are regarded as unweathered; for example, the unweathered point of weathered lead barium sample 49 is assigned to the unweathered category. The mean and standard deviation of each chemical component content were calculated separately for weathered and unweathered glass relics. According to the weathered data $X_{\mathrm{CLR}}$, the pre-weathering data of high potassium and lead barium glass were calculated from the means and standard deviations of the two populations. The inverse of the central log-ratio transformation, formula (3), was then used to restore the unweathered component data (the result is multiplied by 100). The predicted contents of each chemical component of high-potassium glass and lead-barium glass before weathering are shown in Table 5 and Table 6.
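One way to read the distribution matching step is sketched below. It assumes that a weathered CLR value is mapped to the unweathered value with the same standardized score, i.e. $\mu_u + \sigma_u (x - \mu_w) / \sigma_w$; this mapping is an assumption made here for illustration, and the exact formula used in the paper may differ. All names and numbers are hypothetical.

```python
import math
import statistics


def match_to_unweathered(value, weathered_group, unweathered_group):
    """Map a weathered CLR value to the unweathered distribution
    by keeping its standardized score (assumed matching rule)."""
    mu_w = statistics.mean(weathered_group)
    sd_w = statistics.stdev(weathered_group)
    mu_u = statistics.mean(unweathered_group)
    sd_u = statistics.stdev(unweathered_group)
    z = (value - mu_w) / sd_w
    return mu_u + sd_u * z


def clr_to_percent(clr_values):
    """Inverse CLR, scaled so that the parts sum to 100."""
    exps = [math.exp(v) for v in clr_values]
    total = sum(exps)
    return [100.0 * v / total for v in exps]


# Hypothetical CLR data for three components (a real sample has more).
weathered = {"SiO2": [3.0, 3.2, 3.4], "K2O": [-1.0, -0.8, -0.6],
             "CaO": [-2.4, -2.2, -2.0]}
unweathered = {"SiO2": [2.4, 2.6, 2.8], "K2O": [0.2, 0.6, 1.0],
               "CaO": [-3.4, -3.0, -2.6]}
sample = {"SiO2": 3.4, "K2O": -0.6, "CaO": -2.0}  # one weathered sample

predicted_clr = {
    name: match_to_unweathered(sample[name], weathered[name], unweathered[name])
    for name in sample
}
predicted_percent = clr_to_percent(list(predicted_clr.values()))
```

Each component is matched independently, and the inverse transformation renormalizes the result so that the predicted contents again sum to 100%.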

Figure 6. Visualization of composition prediction of high-potassium glass before weathering (part).

Figure 7. Visualization of composition prediction of lead-barium glass before weathering (part).
$\mu_x$: Mean of population X

Table 2. Descriptive statistics of data after modification.

Table 3. Skewness and kurtosis test of lead-barium glass.

Table 4. Linear regression results of high potassium and lead barium glasses.

Table 5. Prediction data of high-potassium glass before weathering.
For high-potassium glass, the proportions of Na2O, CaO and K2O decrease with weathering. As can be seen from Figure 7, weathering causes the proportions of SiO2 and Al2O3 in lead-barium glass to decrease and the proportions of CuO, CaO and PbO to increase, while BaO does not change significantly before and after weathering. These trends are highly consistent with the above linear regression results on weathering state and chemical composition, indicating that the constructed prediction model has a good prediction effect.

Table 6. Prediction data of lead-barium glass before weathering.
For high potassium glass, SiO2 is positively correlated with K2O, CaO, MgO, Al2O3, CuO, P2O5 and SnO2, and negatively correlated with Na2O and SrO. For lead-barium glass, SiO2 is positively correlated with Na2O and Al2O3, and negatively correlated with P2O5 and SrO. It can also be concluded that for high potassium glass, SiO2 has a positive correlation with P2O5 and a negative correlation with Na2O, while for lead barium glass the opposite holds. Secondly, the influence of the weathering state on the contents of the chemical components of different glass types was investigated by linear regression analysis. The degree to which weathering influences the contents of chemical components differs between glass types. For high-potassium glass, weathering increases the proportions of SiO2, CuO and P2O5 and decreases the proportions of Na2O, CaO and K2O. For lead-barium glass, weathering causes the proportions of SiO2 and Al2O3 to decrease, while the