Lithological Mapping Using Landsat 8 OLI in the Meso-Cenozoic Tarfaya Laayoune Basin (South of Morocco): Comparison between ANN and SID Classification

In the Saharian domain, the Tarfaya-Laayoune coastal basin developed in a stable passive margin, where asymmetrical sedimentation increase from East to West and reach a sediment stack of about 14 kilometers. However, the morphology of the studied area corresponds to a vast plateau (hamada) presenting occasional major reliefs. For this purpose, remote sensing approach has been applied to find the best approaches for truthful lithological mapping. The two supervised classification methods by machine learning (Artificial Neural Network and Spectral Information Divergence) have been evaluated for a most accurate classification to be used for our lithofacies mapping. The latest geological maps and RGB images were used for pseudo-color groups to identify important areas and collect the ROIs that will serve as facilities samples for the classifications. The results obtained showed a clear distinction between the various formation units, and very close results to the field reality in the ANN classification of the studied area. Thus, the ANN method is more accurate with an overall accuracy of 92.56% and a Kappa coefficient is 0.9143.


Introduction
Orbital technology is continuously improving advanced levels which are useful for lithologic mapping as well as mineral exploration [1]- [13].
In another sense, this work aims to classify lithofacies by processing Landsat 8 multispectral data. Artificial Neural Network (ANN) and Spectral Information Divergence (SID) were used for classification while the spectra of the regions of interest (ROI) of the image were used as end members. For better precision, The ANN and SID classifiers were compared, and a choice was made for the classifier to be used as a support for geological mapping. Finally, we are guided to make a comparison between the two classification methods to choose the most efficient for a good presentation of the lithological cartography of the area.

Geological Settings
In Morocco, the Saharan domain presents morphological, climatic and hydrological peculiarities. This area is characterized by arid conditions, precipitation less than 54 mm/year, an annual average temperature that varies between 17˚C on the coast and 25˚C elsewhere and 3239. 81   The Meso-Cenozoic Tarfaya-Laayoune-Dakhla basin, a relatively tectonically stable platform since the Jurassic, is located to the NW of the Archean-Proterozoic ridge of Réguibate "NW of the West African craton" (Figure 1). Its sedimentary filling is asymmetrical, increasing from east to west, reaching a stack of about 14 kilometers [38]. The studied basin is a consequence of the opening of the central Atlantic [39]. This basin summarizes the geological history from the Trias to the Neogene at the northern borders of the West African Craton.
This basin was initially studied during oil exploration [40]. In general, the thickness of the Mesozoic series is greater than in the basins located north of the High Atlas. In the Laayoune basin, they reach over 10,000 m. The structural framework of the Layoune-Tarfaya basins, sketched by Heyman (1988), is a set of hemigrabens developed from NNW dipping listric faults, connected at a depth of about 16 km on a detachment fault.
To the west, the Paleozoic series are folded and form the Appalachian reliefs of the Zemmur, a little more exposed to the winds and rains of the Atlantic. In the north of the country Mechem, the Hamada Tindouf (or Draa) from a vast flat entablature, consisting mainly of limestones, which dominates the Paleozoic lands. The few wadis are directed to the center of the basin of Tindouf. To the west of the Reguibate and Zemmour extends the Atlantic plain of Tarfaya, Laayoune, Boujdour and Dakhla, whose substratum, mainly carbonated, is rich in phosphate deposits.
On the structural level, the Tarfaya-Laayoune basin boundary with the Anti-Atlas is rectilinear in a North Northeast-South Southouest direction [41]. In the South, this limit is aligned with the major accident North Northeast-South Southwest of Zemmour and two major structural directorates control the Tarfaya-Laayoune basin and its hinterland [42]: 1) East Northeast-West Southwest to North East-South West: the Atlas direction limits the Precambrian-Paleozoic to the hinterland of the Tarfaya-Laayoune Basin; this direction is materialized by the axis of the Tindouf Paleozoic basin, the Anti-Atlas chain, the Reguibat Precambrian Massive Middle Axis, the South-Atlas Fault (East Northeast-West Southwest) and the Agadir Fault (N45˚) [38] [43].
2) North Northeast-South Southwest: the Meso-Cenozoic Atlantic direction coincides with the general elongation of the Tarfaya-Laayoune basin and the direction of the Zemmour fault, inherited from the Hercynian cycle [44].
3) A third N90˚ direction, intersecting with the coast, is apparent from the analysis of the bathymetric and gravity maps of the northwestern African margin. This family of transverse accidents segments the margin in low zones and high zones which control, from the Triassic, the distribution of the evaporite deposits [45].

Data Characteristics
The Landsat 8 satellite rotates in a sun-synchronous, quasi-polar orbit at an alti-

Methodology
In our study, we used as data sources the geological maps of Laayoune area (1:100.000) ( Figure 2) and the correspondent Landsat 8 satellite image.
As shown in the flowchart (Figure 3), we begin the study using data sources (geological map and satellite image) and a software package for atmospheric and radiometric correction, pre-processing and processing.
Then, the pre-processing step requires the stacking of the data sources necessary for the stacking of the layers, the atmospheric correction and the radiometric correction of the corrected satellite image. It is an important step in remote sensing which aims to obtain a physical parameter independent of lighting conditions and even atmospheric conditions, which allows us to use images from different eras to detect changes.
A certain number of "radiometric noises" can be present in the image due either to deficiencies of the sensors, or to problems of data transmission, or finally of interpretation (coding/decoding). Basically, these radiometric corrections are carried out directly on the image reception by reassigning codes corresponding to neighboring pixels or to defective points.   On the other hand, the sensors installed on board Earth observation satellites operating in the spectral range of solar emission (wavelengths of 0.24 µm) are radiometers which measure the luminance reflected by the earth + atmosphere assembly lit by the sun. In a non-cloudy atmosphere, the radiometric signal depends on the reflectance of the earth's surface but also on the effects of the atmosphere that occur during the two paths (descending, from the Sun to the surface, and ascending, from the surface to the sensor) effected by solar radiation through the atmosphere. The simple calibration of the sensor data, in luminance (absolute values measured in) or in reflectance's (relative values), therefore does not provide information on the surface but a composite signal. the objective of this correction is to extract this signal provides information independent of the Open Journal of Geology effects of the atmosphere, which vary in time and space, and only concerning the terrestrial surface, which is the object to be studied.

Optimal Index Factor (OIF)
The Optimal Index Factor (OIF) is a statistical value that can be used to select the optimal combination of three bands in a satellite image with which you want to create a color composite developed by Chavez et al. (1984). It ranks the 20 three-band combinations that can be made from six TM data bands (not counting the thermal infrared band) and the optimal band combination out of all the possible 3-band combinations is the one with the greatest amount of d. '"Information" (highest sum of standard deviations), with the least duplication (lowest correlation between pairs of bands). This technique is valid for any multispectral remote sensing dataset. It is based on the amount of total variance and correlation within and between various combinations of bands [48]. The algorithm used to figure the OIF for any subset of three bands is: where:

Minimum Noise Fraction (MNF)
Minimum noise fraction (MNF) is a powerful technique for denoising remote sensing data. This is a preparatory transformation which condenses most of the components into a few spectral bands and to classify these bands in decreasing orders of interest [49] [50]. Thus, the lower band numbers contain more spectral information and present great interest in lithological mapping [49] [51] [52] [53]. Particularly, the MNF uses a noise covariance matrix to decorrelate and resize the noise in the data and then run a pca to convert the noise-bleached data [54]. In this way, the noise is evenly distributed between the bands, which allows the method to achieve good discrimination of spectral characteristics [55].

End Members Extraction
The identification of lithological units and the classification of images using spatial remote sensing appear problematic because few images pixels display "pure" spectra [64]. Thus, the previous analyzes relating to PCA and MNF have a preponderant function for the determination of endmembers (Training Samples).
Indeed, the use of Endmembers is mainly a critical step to classify the input data in the classification algorithm used [65] [66].
This approach requires a set of input endmembers that represent "pure" spectra of representative lithologies in the studied zone, where these endmembers can be extracted directly from imagery, so that all relevant materials are included that represent true surfaces and are under the same viewing and lighting conditions as all spectra in the scene. However, to limit the spectral similarity between endmembers, we used the separability of the ROI pairs to select only the most spectrally distinct. In this work, the representative endmember image spectra were derived for regions of interest (ROI) defined mainly from the existing geological map [34].

Artificial Neural Network (ANN)
ANNs are computer systems inspired by the neural networks that make up the human brain. Its concept is that they "learn" to do tasks from examples, generally without being programmed with task-specific rules [67] because it manages to find a solution in a non-algorithmic and unstructured way based on the adjustment of the weights connected to the network neurons [68].
For each neuron, the input value is calculated as follows [69] [70]: In each neuron, the value calculated from the Equation (3) is transferred by an activation function. The common function for this purpose is the sigmoid function, and is given by: The ANN algorithm has been widely used for the classification of several sets of remote sensor data and has produced very powerful results (Jensen, 2015) compared to traditional classification methods [71] [72] [73] [74] [75]. Thus, the ANN is able to perform calculations in order to acquire, represent and calculate a map of a multivariate information space from an initial data set [76].
Image classification using ANN is performed by extracting texture features and then applying the back propagation algorithm. The typical back propagation network is characterized by input and output layers and at least one hidden one.
Theoretically, there is no limit on the number of hidden layers, but there is usually only one or two. Neural Network technique uses standard back propagation for supervised learning which activates by adjusting weights in the node to minimize the difference between the activation of the exit node and the exit [77].

Spectral Information Divergence (SID)
SID is a spectral classification method that considers each pixel as a random variable and uses its spectral histogram to define a probability distribution [70]. ( ) D x y  is called the relative entropy of y with respect to x which is also known as Kullack-Leibler information function, directed divergence or cross entropy. The SID defined by Equation (5) can be used to measure the spectral similarity between two pixel vectors x, and x 2 .
The spectral similarity between two pixels is then measured by the difference in probabilistic behaviors between their spectra; the smaller the divergence, the more likely the pixels are to be similar [78]. Regarding methodology, the Endmember spectra used by SID can come from ASCII files or spectral libraries, or you can extract them directly from an image (as average ROI spectra) [79]; this algorithm improves the precision of the estimation of Endmember used in the optimal classification [80].

Results
The morphology of the studied area corresponds to a vast plateau (hamada) not presenting major reliefs, except for depressions (known as sebkhas) which does not rarely show most Meso-Cenozoic formations (mainly along oueds and edges of sebkhas).
Obviously, the use of remote sensing data is very valuable for the study of geological features especially in large areas with little or no in-situ data. In order to obtain better results, the Landsat 8 OLI data has been radiometrically and geometrically corrected. This pre-processing step is essential to obtain spatial and radiometric corrected images to delineate the geological units in the study area. Then, the image used requires more enhancement through the following methods (OIF, MNF and PCA).

Enhancement Images
First, the Optimal Index Factor (OIF) is a statistical value which help to find the optimal combination of three bands in a satellite image in order to create a color composite [10] [81]. Thus, Figure 4 shows that the band composition 6-7-8 of Landsat 8 data, represent the most optimal RGB color combination image that will be chosen for geological mapping in the study area according to the results of the OIF. Indeed, this RGB color combination represents the combination of the most information (as measured by variance) with the least duplication (as measured by correlation) and thus exhibits a noticeable color variation corresponding to large geological formations.
The MNF technique allows to segregate noise, feature extraction, spectral data reduction but also to better discriminate formation still confused at this stage.    The PCA transform was applied to Landsat 8 OLI ( Figure 6) data in order to produce uncorrelated output bands, segregate noise components, and reduce the dimensionality of the studied data. Table 3 summarizes the most influential bands in the PCA transformation, measured by eigenvectors.
The obtained results expect that the first PC represents the highest percentage of variance or eigenvalue and the last PC represents lower variance; this is computed by the presence of noise in the original spectral bands of the Landsat 8 data.
Thus, more colorful color composite images are produced using PCA instead of the spectral color composite images because the spectral bands are not correlated ( Figure 6 shows PCA bands 3, 1 and 2 in RGB for the study area). Thus, the structural information is well presented in the Landsat 8 image and is ready for lithological classification.
Before starting the lithological classification step, the extraction of endmembers helps to be near of the surface reality as much as possible. In this work, we chose first to study several typical cross-sections (along Oued Saquia Hamra and Sebkha Oum Debaa) and the geological map at a scale 1/100,000 [34]; at the same level, a visual interpretation on the enhanced images supported by the spectral profile of the pixels from the best OIF image.
On the other hand, to attenuate the spectral similarity between endmembers, we used the separability of the ROI pairs to select only the most spectrally distinct. In this study, and after several attempts to collect the ROIs covering the same facies on all the enhanced images (PCA and MNF), we opted for a separability threshold of 2 so that the ROI pairs are acceptable.

SID
Due to the lithological diversity down to the pixel, this type of classification, based on outcrop spectra, was not very satisfactory. Figure 7 shows the resulting image. Indeed, examination of the image reveals large unclassified areas and a large disparity with the geological map.    Quantitatively, the validation test confirms these remarks and findings and during which the accuracy assessment is 49.6144% and the Kappa coefficient does not exceed 0.4398. Probably, the high lithological diversity was not very adequate because of the pixel resolution of the used image.

ANN
Unlike the SID classification, the supervised classification ANN has presented very adequate results ( Figure 8) and an almost perfect correlation with the geological map of Laayoune where the formation was effortless recognized.
Indeed, the dune formation is well identified by this classification as well as the bituminous formations of Upper Cretaceous age which constitute minor thickness and have been plotted along the Oued Saquia Hamra and the edges of Sebkha Oum Debaa. Also, the Cenozoic formations show good similitude with the geological map: the hammada slab (q5), the Miocene sandstone formations (Mla1, Mla2…). Also, it should be noted that this classification differentiates the water zones from the alluvial surfaces.
The performance of these results is tested by the accuracy assessment. This assessment shows that the classification displays a precision equivalent to 92.56% and a Kappa coefficient of 0.9143.

Discussion
In order to specify which lithological classification is the most precise between the ANN and SID classifications, we used the landsat 8 OLI image which has been subjected to atmospheric and radiometric corrections. After the computation of the OIF, we obtain the highest combination of RGB bands (b6, b7, b8) which will perform a good presentation of the image data. Later, we calculated the PCA which is used to reduce the dimensionality of the satellite image and the MNF to separate the noise in the data, and to reduce the computational requirements for further processing.
The creation of ROIs is done, from the geological map of Laayoune, to supervise over the different facies in order to use it in the two classifications, then we calculate the parability which has values close to 2 that it is principal that we are correctly on the best way in our study.

Conclusions
The present study was carried around the Meso-Cenozoic Tarfaya-Laayoune basin (Saharian domain, Morocco). The studied area represents an arid region which corresponds to a vast plateau (hamada) not presenting major reliefs, except for depressions (known as sebkhas) which does not rarely show most Meso-Cenozoic formations (mainly along oueds and edges of sebkhas).
In this work, we have examined two methods of classification (ANN and SID) and compared them with the current map using an entire image processing process starting with atmospheric and radiometric correction, the georeferencing of the image and the geological map then performing the OIF, PCA, MNF calculations and creating the ROIs on which the separability calculation was performed.
After the creation of classifications input, the results were confirmed with the calculation of the confusion matrix and the Kappa index, which inform that the ANN classification is the most accurate method with an overall accuracy percentage of 92.56% and kappa index of 0.9143.
To conclude, the remote sensing processing on Landsat 8 OLI data can allow better precision in the lithofacies classification, and it gives geologists a very good opportunity to improve their investigations in areas of difficult access. This will be a major contribution to the spectral analysis based on the study. Indeed, with a good improvement of these images by the multiple methods currently available and an adapted classification (ANN for our case) one can easily draw up a very precise lithofacies map.