Classification of Hourly Clearness Index of Solar Radiation in the District of Yamoussoukro

Systems that use solar energy are subject to fluctuations caused by the brief passage of clouds. The amplitude, persistence and frequency of these fluctuations should be analyzed with appropriate tools, rather than focusing on their position in time. Such an analysis should rely on the instantaneous clearness index, whose distribution is, to a first approximation, independent of both the season and the site. Evaluating the solar energy potential of a region is important, since it helps decision-makers in planning agricultural or photovoltaic solar projects. This study was therefore conducted for a predictive purpose. The method used in our work combines a factorial method, the principal component analysis, with the hierarchical ascending classification and a partitioning method, the K-means method. The partitioning method yields a number of situations, fixed in advance, that are representative of the day. The study was based on data recorded during the year 2017 by a weather-climate station in the district of Yamoussoukro, located in the central region of Côte d’Ivoire. Using the clearness index, the study classified the solar radiation in the region. It showed that only 346 of the 365 days of 2017 (95%) were classified. We identified three clusters of days: cloudy sky (29%), partly cloudy sky (32%) and clear sky (39%). The statistical tests used to characterize these clusters will be detailed in a future study.


Introduction
A previous study on the climatology of the solar radiation carried out in 2011 [1] in Côte d'Ivoire showed the importance of knowing the solar radiation arriving on the ground to better understand the performance of solar energy systems.
However, the solar radiation measured on the ground fluctuates because of the brief passage of clouds [2]. This intermittency is one of the major drawbacks for the large-scale application of solar photovoltaic (PV) and other forms of solar energy. Short fluctuations in solar irradiance can lead to unpredictable variations in voltage and power when the electricity produced is injected into the electrical network [3]. In addition, in order to reinforce the network adequately over time while avoiding overly cautious and costly measures, network operators need tools for a realistic estimation of these disturbances [4].
In solar energy applications, the analysis of these fluctuations should focus on the instantaneous or hourly clearness index [5]. The probability distribution of the clearness index, for a given average value, is to a first approximation independent of both the season and the site. Therefore, for the effective sizing of energy conversion systems and for a predictive purpose, it is important to characterize the solar energy resource. This characterization is justified by the fact that hourly and daily solar irradiation data do not take into account the fluctuations of local weather conditions. It is therefore necessary to classify the different states of the sky on the basis of the clearness index. The clearness index has been used as a classification criterion in previous studies [6] [7], namely through the fractal dimension and a mixture of Dirichlet distributions. The method used in our work combines several methods: the factorial method (PCA), which has an exploratory purpose and reduces the data, the unsupervised hierarchical classification method (HCA) and the partitioning method (K-means). Generally, authors have used a single classification method [8] [9]. In our work, combining several methods made it possible to classify the different states of the sky with high accuracy. This article presents the results of the classification of the hourly clearness index of solar radiation obtained from measurements made in 2017. The data come from one of the stations of the weather-climate observation and solar monitoring network in Côte d'Ivoire (ROSSCI), located in Yamoussoukro.

Constitution of Database
The data of the weather-climate station were recorded at one-minute intervals, 24 hours a day, during the 365 days [1] of the year 2017. Eight (08) climatic parameters are measured, among them temperature, relative humidity, barometric pressure and rainfall. The quantities used in the calculation are the following:
• j is the day number in the year, from 1 to 365 in a calendar year;
• the solar height h [13], which depends on the latitude, the declination and the hour angle;
• the latitude θ = 6.8692°;
• the declination δ, expressed in degrees:
$$\delta = 23.45 \sin\left[\frac{360}{365}\,(284 + j)\right]$$
• the hour angle ω, expressed in degrees, which is obtained from LST, the local solar time, and TE, the time equation.
LST is given by [14] as a function of φ, the longitude of the place (φ = −5.2396°), and of UT, the universal time expressed in hours.
TE is given by [15] in terms of the angle $\frac{360}{365}\,(j - 81)$, expressed in degrees.
The calculation is implemented as an algorithm in MATLAB, and the results are arranged as arrays.
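The solar geometry quantities listed above can be sketched as follows. This is a minimal Python illustration, not the MATLAB algorithm used in the study. The declination follows the expression recoverable from the text; the time equation, the local solar time, the hour angle and the solar height use common textbook approximations that are assumed here, because the exact expressions of [13] [14] [15] are not reproduced in this section. All function names are illustrative.

```python
import math

LATITUDE_DEG = 6.8692    # theta, value given in the text
LONGITUDE_DEG = -5.2396  # phi, value given in the text (negative = west)


def declination_deg(j):
    """Declination in degrees for day number j (1..365)."""
    return 23.45 * math.sin(math.radians(360.0 / 365.0 * (284 + j)))


def time_equation_min(j):
    """Time equation TE in minutes (assumed common approximation)."""
    b = math.radians(360.0 / 365.0 * (j - 81))
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)


def local_solar_time_h(ut_hours, j, longitude_deg=LONGITUDE_DEG):
    """LST in hours (assumed form): UT shifted by 1 h per 15 degrees
    of longitude, then corrected by TE."""
    return ut_hours + longitude_deg / 15.0 + time_equation_min(j) / 60.0


def hour_angle_deg(lst_hours):
    """Hour angle in degrees (assumed form): 15 degrees per hour,
    zero at solar noon."""
    return 15.0 * (lst_hours - 12.0)


def solar_height_deg(j, ut_hours,
                     latitude_deg=LATITUDE_DEG, longitude_deg=LONGITUDE_DEG):
    """Solar height h in degrees from latitude, declination and hour angle."""
    lat = math.radians(latitude_deg)
    dec = math.radians(declination_deg(j))
    omega = math.radians(
        hour_angle_deg(local_solar_time_h(ut_hours, j, longitude_deg)))
    sin_h = (math.sin(lat) * math.sin(dec)
             + math.cos(lat) * math.cos(dec) * math.cos(omega))
    # Clamp against rounding before taking the arcsine.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_h))))
```

With these assumptions, the solar height is negative at night and positive during the day, so it can also be used to select the daylight hours retained for the analysis.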
Knowing the hourly global solar radiation (IGH, in Wh/m²) and the hourly global extraterrestrial radiation (IGH_0, in Wh/m²) makes it possible to determine the clearness index (K_t), defined as the ratio between the IGH arriving at the earth's surface and IGH_0, according to the following formula [16] [17]:
$$K_t = \frac{\mathrm{IGH}}{\mathrm{IGH}_0}$$
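As a minimal sketch, the clearness index can be computed hour by hour as the ratio defined above. The handling of hours with no extraterrestrial radiation (returning NaN) is an assumption made for the illustration, and the numerical values are hypothetical, not station data.

```python
import math


def clearness_index(igh, igh0):
    """Return K_t = IGH / IGH_0, or NaN when IGH_0 is not positive
    (assumed convention for night hours)."""
    if igh0 <= 0:
        return math.nan
    return igh / igh0


def hourly_clearness_profile(igh_by_hour, igh0_by_hour):
    """Clearness index for each hour of one day."""
    return [clearness_index(g, g0) for g, g0 in zip(igh_by_hour, igh0_by_hour)]


# Hypothetical hourly values in Wh/m2 for three hours of one day.
profile = hourly_clearness_profile([0.0, 300.0, 600.0], [0.0, 600.0, 1000.0])
```

Each day is then described by one such profile, which gives one row of the table analyzed in the following sections.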

Methods
Since the measurements are made with fine precision and the number n of observations is large (365 × 12 = 4380), the number of distinct observed values may be relatively high. The resulting observed univariate distribution (DO1) then has:
• a large number of rows in the frequency table;
• many low counts.
This situation does not make it easy to identify the essential characteristics of the observed distribution. One solution to this problem is to adopt a more global approach to the data by grouping them, that is to say by bringing together [...] 3) applying a hierarchical clustering using Euclidean distances and 4) consolidating the clusters by the k-means method to obtain a better partition. The k-means method requires prior knowledge of the number of classes to be determined [18].
To determine this number, we first used the PCA to reduce the number of variables and to visualize, as far as possible on a plane, the observations described by several variables. As the PCA does not distinguish the number of classes clearly enough, we applied the ascending hierarchical classification using Ward's method. This procedure, presented in Figure 2, is implemented in the FactoMineR package.

Figure 2. Classification process.

Principal Component Analysis
1) Eigenvalues and percentage of variance
The PCA helped to highlight the affinities between the different hours (variables) and to deduce the distributions of the clearness index profiles during the year. The first three components express 81.51% of the total variance, with 45.23% for the first factor, 30.00% for the second factor and 6.27% for the third factor. The analysis is restricted to the first two factors, whose eigenvalues are greater than unity and which together account for more than 75% of the initial variance. Table 1 shows the eigenvalues, the percentage of variance explained and the cumulative percentage of variance for each axis.
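The way the percentages of variance and the retention rule relate to the eigenvalues can be sketched as below. The eigenvalues used in the example are hypothetical (they are not those of Table 1), and the function names are illustrative.

```python
def explained_variance(eigenvalues):
    """Return (percentages, cumulative percentages) of variance per axis."""
    total = sum(eigenvalues)
    percents = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative


def components_above_unity(eigenvalues):
    """Indices of the axes whose eigenvalue is greater than 1, the rule
    used in the text to retain the first two factors."""
    return [i for i, ev in enumerate(eigenvalues) if ev > 1.0]


# Hypothetical eigenvalues of a standardized PCA on 8 variables.
eigs = [4.0, 3.0, 0.5, 0.5]
percents, cumulative = explained_variance(eigs)
kept = components_above_unity(eigs)
```

In a standardized PCA the eigenvalues sum to the number of variables, so an eigenvalue greater than unity means that the axis carries more variance than a single original variable.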

2) Correlation between variables and principal components
The correlation matrix (Table 2) shows that axis I is very well correlated, positively, with the clearness index profiles from 7 am to 12 pm (Kt7h to Kt12h) and, negatively, with the profiles from 3 pm to 5 pm (Kt15h to Kt17h). Axis II, on the other hand, has a very good positive correlation with the clearness index profiles from 11 am to 5 pm (Kt11h to Kt17h).

3) Results of variables and days graphs (PCA)
The correlation circle shows that axis I carries positively the profiles from 7 am to 1 pm and negatively the profiles from 2 pm to 5 pm. This reflects a contrast between these two sets of profiles: the averages of each profile show that the profiles from 7 am to 1 pm have relatively higher averages than those from 2 pm to 5 pm. Axis II, however, carries positively all the clearness index profiles except the 7 am profile (Figure 3(a)).
The factorial plane of the statistical units (days) shows the distribution of the individuals, which appears to favor three (3) groups (Figure 3(b)).

Results of Hierarchical Clustering Analysis (HCA)
The hierarchical clustering analysis (HCA), based on the Euclidean distances between individuals, provides a dendrogram accompanied by a graph of inertia gains (Figure 4). At the cut level proposed by the FactoMineR package, this dendrogram yields three clusters of individuals that are close to each other (Figure 4(a)):
• class (or cluster) 1, in black;
• class (or cluster) 2, in red;
• class (or cluster) 3, in green.
Thus, class 1 is more distant from class 2, which is in turn more distant from class 3.
This dendrogram is then projected in 3D on the factorial axes of the PCA for a better visualization. In this graph (Figure 4(b)), individuals (days) are colored according to their membership in each class.
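The principle of the ascending hierarchical classification with Ward's criterion can be illustrated by the following sketch. It is a plain Python illustration, not the FactoMineR implementation used in the study: at each step it merges the two clusters whose fusion increases the within-cluster inertia the least, using squared Euclidean distances between centroids, and it stops when the requested number of clusters is reached. The coordinates are hypothetical and stand in for the coordinates of the days on the factorial axes.

```python
def ward_clusters(points, n_clusters):
    """Agglomerate points until n_clusters remain (Ward's criterion)."""
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                d2 = sum((x - y) ** 2
                         for x, y in zip(centroids[a], centroids[b]))
                # Increase of within-cluster inertia if a and b are merged.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # Centroid of the merged cluster is the weighted mean of the two.
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] += clusters[b]
        del clusters[b], centroids[b]
    return clusters


# Hypothetical coordinates of six days on two factorial axes.
days = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.2)]
groups = ward_clusters(days, 3)
```

This naive version compares all pairs at every step, which is acceptable for a few hundred days but is only meant to show the criterion.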

Consolidation of Clusters by K-Means
This procedure produced a map of the days based on their membership in each cluster. This map is shown on the factorial axes (Figure 5). On this map we distinguish three groups, formed according to the similarities between individuals. The individuals in cluster 1, in black, have sufficiently similar characteristics to be treated as a single group. The same holds for cluster 2, in red, and cluster 3, in green.
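The consolidation step can be illustrated as a k-means run started from the centroids of the partition given by the hierarchical clustering. This initialization is an assumption made for the sketch, since the text does not detail how the consolidation is carried out. The points and labels are hypothetical, with two clusters instead of three to keep the example short.

```python
def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def centroid(members):
    return [sum(col) / len(members) for col in zip(*members)]


def kmeans_consolidate(points, initial_labels, max_iter=10):
    """Reassign points to the nearest centroid, starting from the
    centroids of an existing partition (labels 0..k-1, none empty)."""
    k = max(initial_labels) + 1
    labels = list(initial_labels)
    centroids = [centroid([p for p, l in zip(points, labels) if l == c])
                 for c in range(k)]
    for _ in range(max_iter):
        new_labels = [min(range(k),
                          key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = centroid(members)
    return labels


# Hypothetical days; the third one starts in the wrong cluster.
pts = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0), (4.9, 5.2)]
consolidated = kmeans_consolidate(pts, [0, 0, 1, 1, 1, 1])
```

In this toy case the misplaced day is moved to the cluster whose centroid is nearest, which is the kind of correction expected from the consolidation.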

Discussion
The results of the calculations showed asymmetric values of the clearness index. This asymmetry indicates that the data may not be normally distributed. Also, outside the atmosphere, the distribution of extraterrestrial radiation is not homogeneous, because the earth orbits around the sun and the inclination of the equatorial plane relative to the orbital plane varies with latitude and season [12]. The PCA results showed three groups of days but do not distinguish accurately the intrinsic characteristics of the different groupings. Indeed, the importance of this step (PCA) is to rigorously perform and display projections in orthogonal planes [21].
The hierarchical clustering provided three clusters at the cut level proposed for the dendrogram. The dendrogram resulting from this classification makes it possible to examine the successive aggregations of all individuals and to visualize the connections between them [21]. Looking closely at the 3D projection of the dendrogram on the dimensions of the PCA, we see that clusters 2 and 3 contain tiny parts of cluster 1. This classification noise is due to the hierarchy between individuals.
The projection of the clusters on the dimensions of the PCA allows us to accurately distinguish the membership of individuals (days) in each class. In fact, the K-means method disregards the aggregation level of individuals in order to better visualize and emphasize the similarity between them [12].

Conclusions
This study has classified solar radiation in the district of Yamoussoukro in 2017 with the hourly clearness index. In this study, this index was established as the