Accurate Imputation for Relative Humidity over Pakistan Gathered from AQUA Satellite
1. Introduction
Meteorological data collected from satellites often contain gaps. Such gaps occur because of inefficient sampling by satellite subsystems [1] [2]. Using data with missing values may produce misleading research outcomes [3]. To make the best use of relative humidity data, the gaps in satellite-captured data must be estimated as accurately as possible with imputation techniques [1] [4] [5] [6]. Saleem [6] noted that filling such gaps requires a sound understanding of the spatial and temporal variations of climatic variables. Robeson [7]; Junninen, Niska [1]; Norazian [4]; Yozgatligil, Aslan [8]; Saleem [5] filled gaps in meteorological variables with average values; however, this method compromises the integrity and quality of the data set. Besides mean value imputation, other imputation methods have been developed for filling gaps in meteorological data sets [1] [9] [10].
According to Sun and Oort [11]; Cho, Newell [12]; McCarthy and Toumi [13]; Gettelman, Weinstock [14]; Dessler and Sherwood [15], water vapor is a major contributor to cloud formation and the greenhouse effect in the Earth's atmosphere, and its variation in the upper troposphere plays an important role in the Earth's daily radiation budget [11] [15] [16]. Valuable work on relative humidity in the troposphere was carried out by Lindzen [17]; Shine and Sinha [18]; Del Genio, Kovari [19]; Sun and Oort [11]; Peixoto and Oort [20]; Harries [21]; Gettelman, Collins [22]. In the past, relative humidity was measured with radiosondes, which was not an accurate method [23] [24]. Over time, technological development introduced artificial satellites as platforms for observing water vapor in the atmosphere. The first meteorological satellite was the Mariner 2 Venus probe, tasked with determining the water content of the planet Venus [25]. After this successful experiment, two further satellites, Cosmos 243 and 384, were launched to measure the relative humidity of the Earth [26]. Relative humidity data are now captured by a number of remote sensing satellites with high accuracy and precision [22] [27].
Relative humidity is defined as the amount of water vapor in the atmosphere expressed as a percentage of the amount required for saturation at the same temperature. It varies quantitatively and qualitatively throughout the atmosphere. Relative humidity can change either through a change in the amount of water vapor or through a variation of temperature in the atmosphere [20].
Pakistan extends latitudinally from the Arabian Sea in the south to the Himalayan Mountains in the north, and longitudinally between Afghanistan in the west and India in the east (Figure 1 and Figure 2). Pakistan is located in the subtropics and partly in the temperate region, and is home to about 200 million people. A large part of the country has been facing climate change for many decades. Pakistan is an arid to semi-arid territory with changing meteorological variables such as temperature and humidity [28]. Rainfall patterns vary widely across the country, with an average annual rainfall of 10 inches [5]. The monsoon rain is the only dominant hydro-meteorological resource, contributing 59% of the annual rainfall [29]. Most of the Himalayan regions receive precipitation in the form of snow and ice in winter. The coastal climate is confined to a narrow belt along the coast, and a rise in temperature of 0.60˚C to 1.00˚C has occurred since 1900 [30]. The coastline of Pakistan faced four major cyclones during 1999-2010 [31]. The hottest months are May and June with an average temperature of 51˚C, while winter peaks in February with an average temperature of 60˚C [30].
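As a rough illustration of this definition, the following Python sketch expresses relative humidity as the ratio of actual to saturation vapor pressure. The Magnus approximation for saturation vapor pressure, the function names, and the 10 hPa example value are assumptions introduced here for illustration; they are not taken from the study.

```python
import math

def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over liquid water in hPa.

    Uses the Magnus approximation (an assumption of this sketch).
    """
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))

def relative_humidity_percent(vapor_pressure_hpa, temp_c):
    """Actual vapor pressure as a percentage of the saturation value."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)

# Same water vapor content (hypothetical 10 hPa) at two temperatures.
rh_cool = relative_humidity_percent(10.0, 15.0)
rh_warm = relative_humidity_percent(10.0, 25.0)
print(round(rh_cool, 1), round(rh_warm, 1))
```

With the vapor content held fixed, the warmer case yields the lower relative humidity, which is the temperature effect described above.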
Figure 1. Location map of Pakistan and its surrounding regions [5].
Figure 2. Altitude map of Pakistan with elevations in meters [5].
The main thrust of this research was to devise a workable methodology for scientific observation of upper atmospheric meteorology over Pakistan despite the lack of modern equipment and technological resources. Upper-air meteorological observation and monitoring have also not been available in Pakistan. However, Saleem [5]; Saleem [6] and Wazir [32] are among the few notable efforts on upper-level atmospheric observations.
2. Material and Methods
2.1. Data Used
The AQUA satellite has been capturing atmospheric water content from September 2002 to the present, and AIRS [Atmospheric Infrared Sounder] is a sensor mounted on it [22] [33]. AIRS operates in the IR [infrared] and MW [microwave] regions and has nearly 2400 bands in the thermal and visible regions. AIRS can also operate at a cloud fraction of 70% [22] [34]. The ground resolution of the data is 45 km and the grid size is 1˚ by 1˚ in latitude and longitude [35].
The current study uses the AIRS Version 6 Level 3 product of monthly average relative humidity over the 1000 to 100 hPa pressure levels. Validation studies of the AIRS data set have been carried out using balloons, radiosondes, and aircraft observations. Divakarla, Barnet [36] and Tobin, Revercomb [37] strongly recommended checking AIRS data in the lower troposphere.
2.2. Imputations of Missing Dataset in Relative Humidity
To produce the best estimates of the missing data, 30% of the relative humidity samples were treated as missing and interpolated from the remaining 70% of known samples. Mean Absolute Error [MAE], Root Mean Square Error [RMSE], Coefficient of Determination [R2] and Correlation Coefficient [Corr] were used as performance indicators in this research.
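The 70/30 validation design can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the samples are withheld at random (the text does not say how the 30% were chosen), and the function name, the tuple layout, and the synthetic grid values are hypothetical.

```python
import random

def split_known_samples(samples, withheld_fraction=0.30, seed=0):
    """Randomly withhold a fraction of the known samples for validation.

    `samples` is a list of (lat, lon, rh) tuples (hypothetical layout).
    Returns (kept, withheld); random selection is an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_withheld = round(withheld_fraction * len(samples))
    withheld_idx = set(indices[:n_withheld])
    kept = [s for i, s in enumerate(samples) if i not in withheld_idx]
    withheld = [s for i, s in enumerate(samples) if i in withheld_idx]
    return kept, withheld

# Synthetic 1-degree grid; the values are invented for illustration only.
grid = [(lat, lon, 40.0 + 0.5 * lat - 0.2 * lon)
        for lat in range(24, 34) for lon in range(61, 71)]
kept, withheld = split_known_samples(grid)
print(len(kept), len(withheld))
```

Each interpolation method would then estimate the withheld values from the kept ones, and the estimates would be compared with the withheld originals using the performance indicators.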
1) Inverse Distance Weighting Interpolation (IDW)
IDW is a deterministic spatial interpolation method based on Tobler's first law of geography [7]. Ferrari and Ozaki [38] used Equation (1) for inverse distance weighting:
$$RH_m = \frac{\sum_{i=1}^{t} w_i \, RH_i}{\sum_{i=1}^{t} w_i}, \qquad w_i = \frac{1}{d_i^{r}}$$ (1)
where $RH_m$ represents a missing sample of relative humidity, $w_i$ is the weight factor for the known $RH_i$ samples, $d_i$ is the distance between the missing sample and the $i$-th known sample, t is the total number of relative humidity samples and r is the degree of the weighting factor. The IDW algorithm developed by Langella [39] was used in the present work.
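The weighting idea behind Equation (1) can be sketched in Python as below. This is not the Langella [39] implementation; it is a minimal illustration that assumes weights of the form 1/distance^r, plain Euclidean distance in grid coordinates, and hypothetical names and sample values.

```python
import math

def idw_estimate(target, known, power=2.0):
    """Inverse distance weighted estimate at `target` = (x, y).

    `known` is a list of (x, y, value) tuples; each weight is
    1 / distance**power (assumed form of the weight factor).
    """
    if not known:
        raise ValueError("at least one known sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a known sample
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Four hypothetical relative humidity samples at the corners of a grid cell.
corners = [(0.0, 0.0, 40.0), (1.0, 0.0, 50.0), (0.0, 1.0, 60.0), (1.0, 1.0, 70.0)]
center_estimate = idw_estimate((0.5, 0.5), corners)
near_origin_estimate = idw_estimate((0.1, 0.1), corners)
print(center_estimate, near_origin_estimate)
```

At the cell center all four samples are equally distant, so the estimate reduces to their plain average; closer to one corner, that corner dominates the estimate.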
2) Nearest Neighbor Interpolation (NI)
Nearest neighbor interpolation replaces each gap in the data set with the value of the nearest sample [38] [40].
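A minimal Python sketch of this rule is given below. Euclidean distance in grid coordinates and the names and sample values used are assumptions made for illustration.

```python
import math

def nearest_neighbor_estimate(target, known):
    """Return the value of the known sample closest to `target` = (x, y).

    `known` is a list of (x, y, value) tuples (hypothetical layout).
    """
    nearest = min(
        known,
        key=lambda s: math.hypot(s[0] - target[0], s[1] - target[1]),
    )
    return nearest[2]

# Four hypothetical relative humidity samples at the corners of a grid cell.
corners = [(0.0, 0.0, 40.0), (1.0, 0.0, 50.0), (0.0, 1.0, 60.0), (1.0, 1.0, 70.0)]
print(nearest_neighbor_estimate((0.9, 0.2), corners))
```

Because the estimate is copied from a single sample, the filled field is piecewise constant and changes abruptly where the nearest sample changes.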
3) Bilinear Interpolation (BI)
BI refills the gaps in the data set using the best-fit straight line through the neighboring samples. Junninen, Niska [1] used Equation (2) for linear interpolation:
$$RH = RH_1 + m \, (x - x_1), \qquad m = \frac{RH_2 - RH_1}{x_2 - x_1}$$ (2)
Equation (2) is the simple straight-line equation, with $(x_1, RH_1)$ and $(x_2, RH_2)$ as the sample points and m as their gradient.
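The straight-line relation of Equation (2), and one common way of extending it to two dimensions, can be sketched in Python as follows. The bilinear step (interpolating along longitude first, then along latitude) is an assumption about how the linear equation is applied on a grid; all names and values are hypothetical.

```python
def linear_interpolate(x, x1, y1, x2, y2):
    """Value at x on the straight line through (x1, y1) and (x2, y2)."""
    m = (y2 - y1) / (x2 - x1)  # gradient between the two sample points
    return y1 + m * (x - x1)

def bilinear_interpolate(lon, lat, lon1, lon2, lat1, lat2, q11, q21, q12, q22):
    """Bilinear estimate inside one grid cell.

    q11 is the value at (lon1, lat1), q21 at (lon2, lat1),
    q12 at (lon1, lat2) and q22 at (lon2, lat2).
    """
    low = linear_interpolate(lon, lon1, q11, lon2, q21)   # along lat1
    high = linear_interpolate(lon, lon1, q12, lon2, q22)  # along lat2
    return linear_interpolate(lat, lat1, low, lat2, high)

# Hypothetical relative humidity values at the corners of a 1-degree cell.
midpoint_1d = linear_interpolate(1.5, 1.0, 40.0, 2.0, 50.0)
cell_center = bilinear_interpolate(0.5, 0.5, 0.0, 1.0, 0.0, 1.0,
                                   40.0, 50.0, 60.0, 70.0)
print(midpoint_1d, cell_center)
```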
4) Natural Neighbor Interpolation (NNI)
In this method, the missing sample takes its value from its natural neighbors, and Delaunay triangulation is used to select the natural neighbor samples around the missing value [41].
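The neighbor selection step can be illustrated with a brute-force Python sketch. It relies on a standard property of Delaunay triangulations: the natural neighbors of a query point inside the convex hull are the vertices of the Delaunay triangles whose circumcircles contain the query point. The sketch only selects neighbors and does not compute the area-based weights of a full natural neighbor interpolation; the function names and sample coordinates are hypothetical, and the exhaustive search is only practical for small sample sets.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def strictly_inside(point, circle, eps=1e-9):
    """True if `point` lies strictly inside the given circumcircle."""
    ux, uy, r2 = circle
    return (point[0] - ux) ** 2 + (point[1] - uy) ** 2 < r2 - eps

def natural_neighbors(query, points):
    """Indices of the samples that are natural neighbors of `query`.

    Assumes `query` lies inside the convex hull of `points`.
    """
    neighbors = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        # A triangle is Delaunay if no other sample is inside its circumcircle.
        others = (p for m, p in enumerate(points) if m not in (i, j, k))
        if any(strictly_inside(p, circle) for p in others):
            continue
        if strictly_inside(query, circle):
            neighbors.update((i, j, k))
    return sorted(neighbors)

# Hypothetical sample locations: four around the query and one far away.
sample_points = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (10.0, 2.0)]
print(natural_neighbors((1.0, 1.0), sample_points))
```

In this example the distant fifth sample is not selected, whereas an inverse distance scheme would still give it a small weight.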
2.3. Performance Indicators for Each Interpolation
Robeson [7]; Price, McKenney [42]; Junninen, Niska [1]; Perry and Hollis [43]; Norazian [4]; Hofstra, Haylock [44]; Rahman and Islam [2]; Ferrari and Ozaki [38]; Saleem and Ahmed [34] have frequently used Mean Absolute Error [MAE], Root Mean Square Error [RMSE], Coefficient of Determination [R2] and Correlation Coefficient [Corr] as performance indicators for these interpolations. The present study follows the same standard procedure.
1) Root Mean Square Error (RMSE)
Norazian [4] used Equation (3) for RMSE:
$$RMSE = \sqrt{\frac{1}{t} \sum_{i=1}^{t} \left( RH_{pi} - RH_{oi} \right)^2}$$ (3)
In Equation (3), t is the total number of samples [1], $RH_{pi}$ is the imputed sample and $RH_{oi}$ is the observed sample. RMSE measures the difference between the original and imputed relative humidity samples, and a low value indicates accurate refilling of relative humidity [41].
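2) Mean Absolute Error (MAE)
Junninen, Niska [1]; Norazian [4] wrote Equation (4) for MAE as follows:
$$MAE = \frac{1}{t} \sum_{i=1}^{t} \left| RH_{pi} - RH_{oi} \right|$$ (4)
where $RH_{pi}$ and $RH_{oi}$ are the imputed and observed samples and t is the total number of samples. An MAE value close to 0 indicates precise refilling of the data set.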
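The two error measures can be computed with a short Python sketch such as the one below. It uses the standard definitions of RMSE and MAE; the function names and the example values are hypothetical and do not come from the study's tables.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between imputed and observed samples."""
    t = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / t)

def mae(predicted, observed):
    """Mean absolute error between imputed and observed samples."""
    t = len(observed)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / t

# Invented relative humidity values (percent) for illustration only.
observed_rh = [40.0, 50.0, 60.0, 70.0]
imputed_rh = [41.0, 49.0, 62.0, 70.0]
print(rmse(imputed_rh, observed_rh), mae(imputed_rh, observed_rh))
```

Both measures are zero only when every imputed value matches its observed counterpart; RMSE penalizes large individual errors more strongly than MAE.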
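3) Correlation Coefficient (Corr)
A Corr value of +1 indicates a very good correlation between imputed and observed samples and hence a good replacement of missing data, whereas a value near 0 indicates a very poor imputation. Fisher [45]; Kendall [46] used the following formula for Corr:
$$Corr = \frac{\operatorname{cov}(RH_{pi}, RH_{oi})}{\sigma_{RH_p} \, \sigma_{RH_o}}$$ (5)
where $\operatorname{cov}(RH_{pi}, RH_{oi})$ represents the covariance of $RH_{pi}$ and $RH_{oi}$, while $\sigma_{RH_p} \, \sigma_{RH_o}$ is the product of their standard deviations.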
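4) Coefficient of Determination (R2)
It describes the degree of correlation in the data set [2]. A value close to 1 indicates a near-perfect fit to the surface. Norazian [4] used Equation (6) for R2:
$$R^2 = \left[ \frac{1}{t} \, \frac{\sum_{i=1}^{t} \left( RH_{pi} - \overline{RH}_p \right) \left( RH_{oi} - \overline{RH}_o \right)}{\sigma_{RH_p} \, \sigma_{RH_o}} \right]^2$$ (6)
where $\overline{RH}_p$ is the average value of the imputed samples and $\overline{RH}_o$ is the mean of the observed samples.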
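The correlation-based indicators can be sketched in Python as follows. The sketch assumes population (divide by t) covariance and standard deviations and treats R2 as the square of the correlation between imputed and observed samples; the function names and example values are hypothetical.

```python
import math

def correlation(predicted, observed):
    """Covariance of the two series divided by the product of their
    standard deviations (population form, an assumption of this sketch)."""
    t = len(observed)
    mean_p = sum(predicted) / t
    mean_o = sum(observed) / t
    cov = sum((p - mean_p) * (o - mean_o)
              for p, o in zip(predicted, observed)) / t
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / t)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / t)
    return cov / (std_p * std_o)

def r_squared(predicted, observed):
    """Coefficient of determination taken as the squared correlation."""
    return correlation(predicted, observed) ** 2

# Invented relative humidity values (percent) for illustration only.
observed_rh = [40.0, 50.0, 60.0, 70.0]
imputed_rh = [41.0, 49.0, 62.0, 70.0]
print(correlation(imputed_rh, observed_rh), r_squared(imputed_rh, observed_rh))
```

Note that both indicators are undefined when either series is constant, because the corresponding standard deviation is zero.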
3. Results
The imputation over each pressure level was determined and the results are presented below.
3.1. Inverse Distance Weighting (IDW)
This interpolation technique showed good performance indicators for refilling relative humidity at the 200, 250, 300, 400, and 500 hPa levels (Table 1).
3.2. Bilinear Interpolation (BI)
The performance parameters reveal that refilling of relative humidity at 100, 150, 200, 250, 300, 400 and 500 hPa was accurate with BI. For the remaining pressure levels, 600, 700, 850, and 925 hPa, the results were also very accurate. A strong correlation [0.995] and R2 close to 1 indicate very good imputation of relative humidity for these pressure levels in the atmosphere (Table 2).
3.3. Natural Neighbor Interpolation (NNI)
This interpolation technique is well suited to refilling relative humidity at 100, 150, 200, 250, 300, and 400 hPa, with RMSE values below 0.5. The refilling of relative humidity at the other pressure levels, 500, 600, 700, 850, and 925 hPa, also shows very good results, i.e., RMSE values remain close to 1 with an MAE of 0.339 and a very strong correlation [0.985]. This interpolation technique shows poor refilling of the relative humidity data set at the 1000 hPa level (Table 3).
3.4. Nearest Neighbor Interpolation (NI)
This interpolation technique showed accurate refilling of the data set for the 150, 200, 250, 300 and 400 hPa levels. It proved not to be very accurate for the remaining pressure levels: 100, 500, 600, 700, 850, 925 and 1000 hPa (Table 4).
Table 1. Inverse Distance Weighting interpolation outcome and its performance indicators.
Table 2. Bilinear Interpolation outcome and its performance indicators.
Table 3. Natural Neighbor Interpolation outcome and its performance indicators.
Table 4. Nearest Neighbor Interpolation outcome and its performance indicators.
Figure 3. (a) Natural Neighbors Interpolation for imputation of relative humidity [100 hPa to 400 hPa]; (b) Natural Neighbors Interpolation for imputation of relative humidity [500 hPa to 1000 hPa].
Figure 4. (a) Bilinear Interpolation for imputation of relative humidity [100 hPa to 400 hPa]; (b) Bilinear Interpolation for imputation of relative humidity [500 hPa to 1000 hPa].
4. Discussions
Scatter plots were used to identify the more accurate of the Natural Neighbor and Bilinear interpolations. Good results for refilling of relative humidity were found for 100, 150, 200, 250, 300 and 400 hPa with NNI (Figure 3(a)).
However, NNI was not able to accurately refill the missing relative humidity data over the 600, 700, 850, 925 and 1000 hPa pressure levels (Figure 3(b)).
Filling of gaps in the data with BI appears good for the 100, 150, 200, 250, 300 and 400 hPa levels in every month of the years studied (Figure 4(a)).
In addition, BI is best suited for refilling at the remaining pressure levels: 500, 600, 700, 850, 925 and 1000 hPa (Figure 4(b)).
5. Conclusion
Based on a critical evaluation of the interpolations and their outputs, it is concluded that Bilinear Interpolation was the best and most accurate method for all pressure levels, while Natural Neighbor Interpolation proved to be the second-best interpolation for substituting missing relative humidity from 100 to 1000 hPa.
Acknowledgements
We are very grateful to the AQUA-AIRS team for their assistance in interpolating the AIRS data set. The authors also wish to thank Mr. Alessio Martion, University of Rome La Sapienza, Italy, for his important recommendations to enhance this research.