Estimate of Heavy Metals in Soil with Non-Soil Removed

Quantifying and mapping heavy metal concentrations in soil is important for monitoring and managing heavy metal pollution in mining areas. However, the cover on the soil acts as a barrier when retrieving information from the soil. To retrieve heavy metal pollution precisely and quickly from hyperspectral images, this study presents a new NDVI-based method for removing non-soil information from hyperspectral and multispectral images. The method assumes that the mixed objects in each pixel of a remote sensing image are composed only of soil and vegetation-based non-soil end-generational endmembers; the soil information of each pixel can then be compensated after the non-soil information has been removed based on its NDVI. Thus, the soil DN value can be corrected to retrieve soil information more precisely. The method was applied to the Hyperion image of June 8, 2002 and the Gaofen-2 (GF-2) image of February 14, 2016 to retrieve the heavy metal contents in the Bai-ma and De-sheng mining areas, Miyi County, Sichuan Province. From the non-soil information removed images, the R² and RMSE of the models estimating Cr, Ag, Cu and Ba in soil are 0.68, 0.724, 0.71, 0.695 and 75.96, 0.03, 52.88, 284.70, respectively. From the original images, the R² and RMSE of the models estimating Cr, Ag, Cu and Ba in soil are 0.67, 0.385, 0.425, 0.406 and 80.11, 0.18, 53.43, 396.49, respectively. The retrieval results show that the non-soil information removed images are superior to the original images for retrieving soil heavy metal contents. This indicates that the method is feasible and can be used in soil information retrieval.


How to cite this paper: Jian, J., Fang, Y., Li, W.-L., Chen, Q.-Y., Tian, H.-Y. and You, S.-L. (2017) Estimate of Heavy Metals

Introduction
Heavy metal pollution in soil has attracted extensive attention in mining areas. The main reason is that soil contaminated by heavy metals strongly affects local residents and seriously threatens the ecological system [1] [2] [3]. However, the traditional method of estimating heavy metals, which collects samples at field points and analyzes them in the laboratory, is time-consuming and expensive. Therefore, new methods are needed for monitoring and managing heavy metal pollution in mining areas. With the improvement of remote sensing technology, establishing inversion models between heavy metal concentrations and feature spectra from hyperspectral and high spatial resolution images has become a popular way to monitor heavy metals in soil [4] [5] [6] [7]. Many studies have shown that the visible and near-infrared spectra of soil obtained from remote sensing images can be used effectively to retrieve real-time heavy metal concentrations in large mining areas [3] [8] [9].
However, spectral mixing within pixels is present in both high spatial resolution images and hyperspectral images, and it strongly influences the results [10] [11]. There are two main reasons: 1) a pixel usually contains a variety of ground information because of the limited spatial resolution of the sensor; and 2) pixels are complex in composition, organization and structure. Even when the pixel size is smaller than the objects under study, the spectral features can still be mixed with those of adjacent objects when the pixels lie on the edges of the objects [12] [13]. Therefore, spectral unmixing methods have been widely studied in the past decade. Broadly, spectral unmixing methods can be categorized into the linear mixture model (LMM) and the nonlinear mixture model (NLMM) according to their mathematical formulation. The NLMM is not used as widely as the LMM because of its complexity and the difficulty of obtaining its parameters [10] [14]. The LMM is often used to map urban impervious surfaces and target materials with high spatial resolution images, to identify exposed soil, and to map heavy metal contents from remote sensing images [15] [16] [17] [18] [19]. Most of the existing unmixing algorithms are based on the LMM, which assumes that the spectrum of a pixel is a linear combination of the pure spectra of all elements within the pixel and that no significant multiple scattering exists among different elements [20]. Examples include non-local sparse unmixing (NLSU), the blind spectral unmixing method based on sparse component analysis (BSUSCA), and Structured Sparse regularized Nonnegative Matrix Factorization (SS-NMF) [21] [22] [23]. However, these studies achieve low accuracy, primarily because the non-soil information is not removed effectively: the vegetation and the soil are strongly correlated during processing.
Journal of Data Analysis and Information Processing
In this paper, a new method for removing non-soil information based on NDVI is proposed. The method obtains a non-soil information removed image from the original image by compensating the missed soil information. The compensated soil information can then be used to retrieve and map heavy metal concentrations more precisely. The highlights of this paper are as follows:
1) The non-soil information removal method supposes that there are only two endmembers in the original images: soil and non-soil. This avoids the influence of the correlation between the non-soil components and lowers the influence of non-soil information on the extraction of soil information.
2) The non-soil information can be removed to the maximum extent by calculating the maximum value of the product of the DN and the non-soil abundance.
3) The soil information missed because of atmospheric effects and NDVI can be compensated to improve the precision of the soil information.

Study Area and Data
The mining areas of Bai-ma and De-sheng, located in the northeast of Miyi County, are selected as the study area (Figure 1). Miyi is located in the north of Panzhihua City, in the southwest of Sichuan Province. Bai-ma and De-sheng are the most important mining areas in Miyi County. The study area is rich in different kinds of mineral resources, such as vanadium-titanium magnetite ore, coal, limestone, dolomite and refractory clay. The waste water from the mining areas, which contains heavy metals such as Ag and Cu, can pollute the local soil.
The spectra of 16 soil samples, taken at about 5 cm below the soil surface, were collected in June 2015; the locations of the sample points are shown in Figure 1. The contents of Cr, Ag, Ba and Cu in the 16 soil samples were chemically analyzed by conventional digestion methods using an Inductively Coupled Plasma Mass Spectrometer (ICP-MS) (Table 1). The ICP is the most popular ion source in analytical chemistry for elemental mass spectrometry. In ICP-MS, a mass spectrometer is coupled to an ICP torch by an interface that includes sampler and skimmer cones, so that representative samples of the plasma can be transmitted through their orifices to the mass analyzer. This effectively eliminates atomic ion interference and lowers the detection limit of susceptible elements [24].
The spectra of the soil samples were obtained with a high spectral resolution ASD FieldSpec III spectroradiometer. Before any observation was taken, the spectroradiometer was calibrated against a white reference in order to minimize the effect of changes in sun illumination. The spectral range of this instrument is from 350 nm to 2500 nm. In the range of 350-1000 nm, the spectral resolution is 3 nm with a sampling interval of 1.4 nm, and in the range of 1000-2500 nm, the spectral resolution is 10 nm with a sampling interval of 2 nm. Each sample was measured three times, and the average value was then calculated for the selection of feature spectra.
The Hyperion image of June 8, 2002 and the GF-2 image of February 14, 2016 are selected as the remote sensing data sources. The spectral range of the Hyperion image is from 400 nm to 2500 nm, which is divided into 242 bands. The spectral range of the GF-2 image is from 450 nm to 890 nm, which is divided into 4 bands. The image spectrum of each soil sample was obtained by using the position of the sample's row and column in the images.

Data Pre-Processing
In general, most heavy metals would not be expected to produce spectral absorption in the visible and near-infrared regions. However, heavy metal ions can be adsorbed by inorganic and organic matter; therefore, heavy metal contents can be predicted indirectly from visible and near-infrared spectra [3] [25].
The field soil spectral measurements were pre-processed using ViewSpec Pro to enhance the spectral features in the mining area. The pre-processing of the field spectra includes smoothing and averaging.
A digital remotely sensed image may contain noise or error introduced by the sensor system (e.g., electronic noise) or the environment (e.g., atmospheric scattering of light into the sensor's field of view) [26]. This can result in radiometric and geometric distortion of the image [27]. Therefore, image pre-processing is required to correct the geometric and radiometric distortion in the original image of the study area. In this paper, five main pre-processing steps are applied to the Hyperion image: bad line removal, removal of uncalibrated and water vapor absorption bands, streak removal, smile effect correction and atmospheric correction. Two main pre-processing steps are applied to the GF-2 image: radiometric correction and atmospheric correction.

Non-Soil Information Removal Method Based on NDVI
NDVI is an effective index for detecting above-ground vegetation conditions, because seasonal and inter-annual changes in vegetation growth and activity can be monitored, and the ratio reduces many forms of multiplicative noise (sun illumination differences, cloud shadows, some atmospheric attenuation, some topographic variations) present in multiple bands of multiple-date imagery [26] [28]. NDVI is defined as:

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$

where $NIR$ and $R$ are the DN values of the near-infrared and red regions, respectively. In this paper, band53 and band30 are selected as the near-infrared and red bands to calculate the NDVI of the Hyperion image, and band4 and band3 are selected for the GF-2 image.
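As a minimal sketch of the per-pixel NDVI calculation, using the standard normalized-difference form (NIR − R)/(NIR + R), the following Python snippet may help. The function name `ndvi`, the sample DN values, and the choice to return 0.0 when both bands are zero are assumptions made for illustration, not part of the original method.

```python
def ndvi(nir, red):
    """Normalized difference of near-infrared and red DN values.

    Returns 0.0 when nir + red == 0 (an assumption made here to avoid
    division by zero; the source does not specify this case).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical (NIR, red) DN pairs for three pixels.
pixels = [(120.0, 40.0), (60.0, 60.0), (20.0, 80.0)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

For non-negative DN values the result always lies between -1 and 1, which is what allows its absolute value to be read as an abundance fraction in the steps that follow.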
Generally, the greater the amount of healthy green vegetation, the greater the NDVI value. However, the NDVI is negative where the land is covered with clouds, snow or water, while vegetation and soil are the main elements of the pixels. Thus, it is assumed that the mixed objects in each pixel are composed only of soil and vegetation-based non-soil end-generational endmembers (e.g. vegetation, water, cloud and snow); the influence of the terrain and the soil texture is not taken into account. The absolute value of NDVI, which can be regarded as the non-soil abundance, can then be used to calculate the non-soil "pure" pixel radiation and the soil abundance quickly and efficiently, and the non-soil DN value can be calculated. Next, the soil DN value can be calculated by removing the non-soil information according to the principle of physics, which reduces the influence caused by the strong correlation between vegetation and soil. Finally, the soil DN value is corrected by using the probability principle to compensate the soil information missed during non-soil removal.
Thus, the total abundance of a pixel is considered to be 1, and the abundance of a pixel is calculated as:

$$f_{\text{non-soil}} + f_{\text{soil}} = 1 \qquad (2)$$

where $f_{\text{non-soil}}$ and $f_{\text{soil}}$ are the abundances of non-soil and soil, respectively, and the absolute value of the NDVI of the pixel is taken as the value of $f_{\text{non-soil}}$.
Next, the value of the non-soil "pure" pixel in a band ($P_i$) can be calculated as in Equation (3):

$$P_i = \max\left(D_i \times f_{\text{non-soil}}\right) \qquad (3)$$

where $D_i$ and $f_{\text{non-soil}}$ are the DN value and the non-soil abundance of a pixel at band $i$, and $P_i$ is the maximum value of the product of $f_{\text{non-soil}}$ and $D_i$, which is taken as the value of the non-soil "pure" pixel for band $i$ (testing showed that this value can be used to maximize the removal of non-soil information).
Then, the missed soil information can be compensated with $P_i$ as in Equation (4), where $n$ is the number of feature bands, $D_i$ is the DN value of a pixel at band $i$, and $D_{\text{soil}}$ represents the total soil information of the pixel from all feature bands.
The flowchart of the method is shown in Figure 2.
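The abundance and "pure" pixel steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and sample values are hypothetical, and the maximum in the "pure" pixel step is assumed to be taken over all pixels of one band, which matches the single value per feature band reported in Table 2. The compensation step of Equation (4) is not sketched.

```python
def non_soil_abundance(ndvi_value):
    """Non-soil abundance of a pixel, taken as |NDVI|."""
    return abs(ndvi_value)


def soil_abundance(ndvi_value):
    """Soil abundance, from the constraint that the two abundances sum to 1."""
    return 1.0 - non_soil_abundance(ndvi_value)


def pure_non_soil_value(band_dn, ndvi_values):
    """Non-soil "pure" pixel value of one band.

    Computed as the maximum of DN * f_non-soil, assumed here to be taken
    over all pixels of the band.
    """
    return max(dn * non_soil_abundance(v) for dn, v in zip(band_dn, ndvi_values))


# Hypothetical DN values of one band and the NDVI of the same three pixels.
band_dn = [100.0, 80.0, 50.0]
ndvi_values = [0.5, -0.25, 0.1]
p_i = pure_non_soil_value(band_dn, ndvi_values)  # products are 50, 20 and 5
```

Taking the absolute value means that negative-NDVI covers such as water, cloud and snow contribute to the non-soil fraction in the same way as vegetation.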

Accuracy Assessment
The original images and the non-soil information removed images obtained by the above method are used to establish heavy metal retrieval regression models with the heavy metal contents in soil. The feasibility of the non-soil information removal method can thus be verified by comparing the accuracy of the models from the original and the non-soil information removed images. Generally, the coefficient of determination (R²) is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variable; the root mean square error (RMSE) is usually used to measure the deviation between the observed and predicted values; and the model significance index is used to check whether a linear statistical relationship exists between the response variable and the predictors. Thus, according to relevant references, the R², the RMSE and the significance index are usually selected to verify the model accuracy and the feasibility of the method [3] [5]. In this paper, two methods are used to validate the performance of the non-soil information removal based on NDVI:
1) The R², the RMSE and the significance index of the models from the original and the non-soil information removed images are compared. The greater the R², the higher the correlation between the soil information and the heavy metal concentration. For significant models, the larger the R² or the smaller the RMSE, the higher the accuracy of the model. The RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}$$

where $P_i$ and $O_i$ are the predicted and observed values of the heavy metal for the $i$-th soil sample, and $n$ is the number of soil samples.
2) The distribution of the retrieved heavy metal concentrations from the non-soil information removed images is compared with the field data. The closer the retrieved distribution trend is to that of the field data, the higher the accuracy of the models and of the extracted soil information.
The overall flowchart for assessing the accuracy is shown in Figure 3.
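The two accuracy measures can be illustrated with a short Python sketch. The RMSE is the root of the mean squared difference between predicted and observed values. For R², the common form 1 − SS_res/SS_tot is assumed here; the source does not state which variant its regression software reports. The function names and the sample values are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical predicted and observed concentrations (mg/kg) for three samples.
predicted = [2.0, 4.0, 6.0]
observed = [1.0, 4.0, 7.0]
model_rmse = rmse(predicted, observed)
model_r2 = r_squared(predicted, observed)
```

Because the RMSE carries the units of the concentration, values for different metals (for example Ag and Ba) are not directly comparable, whereas R² is unitless.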

Non-Soil Information Removed Results
After a detailed examination of the crests and troughs of the spectral curves of the samples for retrieving heavy metals, band11, band13, band44, band120, band198, band202, band203, band205, band208, band209, band210, band217, band218, band222 and band223 are selected as the feature bands for Hyperion, and band1, band2 and band4 are selected as the feature bands for GF-2. The non-soil "pure" pixel DN value ($P_i$) of each feature band is calculated with Equations (1), (2) and (3), as listed in Table 2.
Figure 3. Verification of non-soil information removal model.
Then, with the non-soil information removal method presented above, the soil information of the Hyperion and GF-2 images can be calculated from the corresponding soil and non-soil abundances with Equation (4), where $D_{11}$, $D_{13}$, $D_{44}$ and so on are the DN values of a pixel at bands 11, 13, 44 and so on, respectively.
The pseudo-color images from the results and the original data of Hyperion and GF-2 are shown in Figure 4 and Figure 5, respectively. From Figure 4

Geochemical Analysis
Eleven soil samples were randomly selected to establish the inversion models, and the remaining samples were used to verify these models. The heavy metal concentrations (Cr, Ag, Cu and Ba) of the 11 soil samples are presented in Table 1.
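A random split of this kind can be sketched in Python as below. The sample identifiers 1 to 16, the fixed seed and the function name `split_samples` are assumptions for illustration; the source does not report how the random selection was performed.

```python
import random


def split_samples(sample_ids, n_calibration, seed=0):
    """Randomly split sample IDs into calibration and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    calibration = sorted(rng.sample(sample_ids, n_calibration))
    validation = [s for s in sample_ids if s not in calibration]
    return calibration, validation


# Hypothetical IDs for the 16 soil samples; 11 are used to build the models.
sample_ids = list(range(1, 17))
calibration, validation = split_samples(sample_ids, 11)
```

With only 16 samples, the fitted model can depend noticeably on which 11 are drawn, so fixing and reporting the seed makes the split repeatable.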
From Table 1, we can see that the standard deviations (SD) of Cr, Cu and Ba in the soil are relatively high (135.36, 70.55 and 473.26, respectively), whereas that of Ag is not. This indicates that the concentrations of Cr, Cu and Ba vary considerably across the study area.

Heavy Metal Retrieval from Hyperion Images
The method of model building was described in another article. The parameters of the inversion models of Cr and Ag from Hyperion are presented in Table 3. The $D_{\text{Hyper-soil}}$ and RMSE are calculated with Equations (5) and (6). Figure 6 and Figure 7 are the retrieved concentration gradient maps of Cr and Ag. Then, the other soil samples are used to verify the models by an independent-samples t-test. From Table 4, we can see that the P values are all less than 0.05. This indicates that all the models of Cr and Ag can be used to estimate the heavy metal concentrations in the mining area.
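For readers unfamiliar with the independent-samples t-test, the statistic can be sketched in plain Python as follows. This assumes the equal-variance (pooled) form of the test; the source does not say which variant was used or exactly which two groups were compared, so the function name and the inputs are hypothetical. The P value would then be read from a t distribution with the returned degrees of freedom, for example with a statistics package.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent samples.

    Assumes equal variances in the two groups (pooled variance estimate).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical values for two small groups.
t_value, dof = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```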

Heavy Metal Retrieval from GF-2 Images
The parameters of the inversion models of Cu and Ba from GF-2 are presented in Table 5. The $D_{\text{GF-2soil}}$ is calculated with Equation (7). Figure 8 and Figure 9 are the gradient maps of Cu and Ba from the calculated results.
Then, the other soil samples are used to verify the models by an independent-samples t-test. From Table 6, we can see that the P values are all less than

Validation of Non-Soil Information Removal Results
From Figure 4 and Figure 5, we can see that the original images contain more vegetation-based non-soil information than the non-soil information removed images. This indicates that most of the vegetation-based non-soil information has been removed by the calculation of the non-soil "pure" pixel and abundance.
From Table 3 and Table 5, we can see that the R² values (Cr: 0.68, Ag: 0.724, Cu: 0.71 and Ba: 0.695) of the inversion models (Cr, Ag, Cu and Ba) from the non-soil information removed images are higher than those from the original images (Cr: 0.667, Ag: 0.385, Cu: 0.425). The RMSE (Cr: 75.96, Ag: 0.03, Cu: 52.88 and Ba: 284.70)

Discussions
Many spectral unmixing methods have been applied to estimate heavy metal contents from hyperspectral and high spatial resolution images. However, these methods require complex endmember selection and abundance calculation, which are time-consuming, and they do not consider the effect of vegetation on soil.
This paper presented a fast and convenient NDVI-based method to remove the vegetation-based non-soil information from the soil surface in the Hyperion and GF-2 images of the mining area.
A comparison of the model parameters and the retrieved heavy metal concentrations from the non-soil information removed images and the original images shows that, for the non-soil information removed images, the relationship between the soil information and the heavy metal concentrations is stronger and the deviation between the predicted and observed values of the heavy metals is smaller than for the original images. Thus, compared with other unmixing methods, this method reduces the multiple linear correlation among non-soil components by means of the NDVI, and the non-soil information can be removed to the maximum extent by $P_i$. The missed soil information can then be compensated by probability theory. This step makes the obtained soil information more accurate and the relationship between the soil information and the heavy metal concentrations stronger.
However, the heavy metals in soil retrieved with the NDVI-based non-soil removal method may be influenced by additive noise effects such as atmospheric path radiance, by the scaling problems with saturated signals often encountered in high-biomass conditions, and by canopy background variations. The method therefore has some shortcomings that need to be addressed, such as the impact of cloud and snow and the selection of the optimal NDVI value and non-soil "pure" DN value. As a result, the non-soil information cannot be removed completely. We will try to address these shortcomings so that the non-soil information removal method can be used widely.

Conclusion
A non-soil information removal method based on NDVI for hyperspectral and multispectral images was developed for retrieving heavy metals in mining soil. The proposed approach assumes that the mixed objects in each pixel of remote sensing images are composed only of soil and vegetation-based non-soil end-generational endmembers; then, the soil information of each pixel

Figure 2. Non-soil information removal model based on NDVI.
Figure 5(a) and Figure 5(b) have a similar pattern. That is to say, most of the non-soil information in the images was removed by this process.

Figure 4. Hyperion images ((a) original image, (b) non-soil information removed image; band53, band30 and band21 correspond to the red, green and blue channels, respectively).

Figure 6. Cr concentrations in the study area ((a) retrieved from the original image, (b) retrieved from the non-soil information removed image).

Figure 6 and Figure 7 are the retrieved concentration gradient maps of Cr and Ag. From Table 3, we can see that the significance values of Cr and Ag are less than 0.05. This indicates that the models of Cr and Ag established from the original and the non-soil information removed Hyperion images are significant.

Figure 7. Ag concentrations in the study area ((a) retrieved from the original image, (b) retrieved from the non-soil information removed image).

Figure 8. Cu concentrations in the study area ((a) retrieved from the original image, (b) retrieved from the non-soil information removed image).

Figure 9. Ba concentrations in the study area ((a) retrieved from the original image, (b) retrieved from the non-soil information removed image).

Table 1. Chemical analysis results of Cr, Ag, Cu and Ba (unit: mg/kg).
and column at the images.

Table 2. The non-soil "pure" pixel DN value ($P_i$) of each feature band.

Table 3. Regression models of Cr and Ag.

Table 4. The reference of the t-test of Cr and Ag.

Table 5. Regression models of Cu and Ba.

Table 6. The reference of the t-test of Cu and Ba.
models are not significant. In contrast, the significance values of the inversion models for Cr, Ag, Cu and Ba from the non-soil information removed images are less than 0.032, and the models are significant. From Figures 6-9, we can see that the retrieved heavy metal concentrations (Cr, Ag, Cu and Ba) from the non-soil information removed images are closer to the field measurements than those from the original images. These results indicate that the soil information obtained by NDVI-based non-soil removal and missing soil compensation is more accurate and has a stronger correlation with the heavy metals of the soil in the mining area. Thus, the method of non-soil information removal based on NDVI for obtaining soil information is feasible.