Accurate Imputation for Relative Humidity over Pakistan Gathered from AQUA Satellite

The atmospheric relative humidity data captured by the AQUA satellite contain missing values. To fill these gaps, four popular imputation techniques were tested: Bilinear, Inverse Distance Weighting, Natural Neighbor and Nearest interpolation. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R²) and Correlation Coefficient (Corr) were used to check the accuracy of these interpolations. Inverse Distance Weighting and Nearest interpolation proved unsuitable. Natural Neighbor interpolation gave more accurate results than these two. Missing values of relative humidity were refilled most accurately with Bilinear Interpolation, which produced an RMSE of ±0.543 for relative humidity over the 100, 150, 200, 250, 300, 400 and 500 hPa levels, while for 600, 700, 850 and 925 hPa the RMSE remained near 1. A perfect fit to the surface and a very strong correlation (value near 0.99) were found between actual and imputed relative humidity data with Bilinear Interpolation. It was therefore concluded that Bilinear Interpolation is the most accurate imputation for missing values of relative humidity from the 100 to 1000 hPa levels.


Introduction
Meteorological data collected from satellites mostly contain gaps. Such gaps occur due to inefficient sampling by the satellite subsystem [1] [2]. Using data with missing values may produce misleading research outcomes [3].
For the best use of relative humidity data, it is necessary to obtain the best estimate of the gaps in satellite-captured data with imputation techniques [1] [4] [5] [6]. Saleem [6] mentioned that filling such data requires a sound understanding of the spatial and temporal variations of climatic variables. Robeson [7]; Junninen, Niska [1]; Norazian [4]; Yozgatligil, Aslan [8]; and Saleem [5] used average (mean-value) refilling of meteorological variables; however, this method lacks integrity and quality in the resulting data set. Besides mean-value imputation, other imputation methods have also been developed for filling gaps in meteorological datasets [1] [9] [10].
How to cite this paper: Saleem, U., Akram, M.S., Ullah, M.F. and Rehman, F.
According to Sun and Oort [11]; Cho, Newell [12]; McCarthy and Toumi [13]; Gettelman, Weinstock [14]; and Dessler and Sherwood [15], water vapor is a major contributor to cloud formation and the greenhouse effect in the Earth's atmosphere, and its variation in the upper troposphere plays an important role in the daily radiation budget of the Earth [11] [15] [16]. Valuable work on relative humidity in the troposphere was carried out by Lindzen [17]; Shine and Sinha [18]; Del Genio, Kovari [19]; Sun and Oort [11]; Peixoto and Oort [20]; Harries [21]; and Gettelman, Collins [22]. In the past, relative humidity was collected with radiosondes, which was not an accurate method [23] [24]. Over time, technological development introduced artificial satellites as a platform for observing water vapor in the atmosphere. The first meteorological satellite was the Mariner 2 Venus probe, tasked with determining the water content of the planet Venus [25]. After this successful experiment, the next two satellites, Cosmos 243 and 384, were launched to measure the relative humidity of the Earth [26]. Relative humidity data are now captured by a number of remote sensing satellites with high accuracy and precision [22] [27].
Relative humidity is defined as the amount of water vapor in the atmosphere expressed as a percentage of the amount required for saturation at the same temperature. It varies quantitatively and qualitatively throughout the atmosphere. Relative humidity can change either through a change in the amount of water vapor or through a variation of temperature in the atmosphere [20].
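The definition above can be illustrated with a minimal sketch. It assumes that saturation is expressed through the saturation vapor pressure over liquid water and uses a Magnus-type approximation for it; the coefficients and the function names are illustrative assumptions and do not come from the paper.

```python
import math


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Saturation vapor pressure over liquid water in hPa.

    Assumption: Magnus-type approximation with commonly quoted coefficients.
    """
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity_percent(vapor_pressure_hpa: float, temp_c: float) -> float:
    """Relative humidity as a percentage of saturation at the same temperature."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)


if __name__ == "__main__":
    # Same water vapor content at two temperatures: warming lowers RH.
    for temp_c in (20.0, 30.0):
        print(temp_c, round(relative_humidity_percent(10.0, temp_c), 1))
```

The usage lines show the second mechanism mentioned in the text: with the vapor pressure held fixed, a rise in temperature alone reduces the relative humidity.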
Pakistan spreads latitudinally from the Arabian Sea in the south to the Himalayan Mountains in the north, with a longitudinal extent between Afghanistan in the west and India in the east (Figure 1 and Figure 2). Pakistan is located in the subtropical, partially temperate region and is home to about 200 million people. A large portion of the country has been facing climate change for many decades. Pakistan is an arid to semi-arid territory with changes in meteorological variables such as temperature and humidity [28]. The rainfall pattern varies widely across the country, with an average annual rainfall of 10 inches [5]. The monsoon rain is the only dominant hydro-meteorological resource, contributing 59% of the annual rainfall [29]. Most of the Himalayan regions receive precipitation in the form of snow and ice in winter. The coastal climate is confined to a narrow belt along the coast, and a rise in temperature of 0.60˚C to 1.00˚C has occurred since 1900 [30]. The coastline of Pakistan faced four major cyclones during 1999-2010 [31]. Hottest months are May and June with average temperature of 51˚C, while in February winter is on peak with average temperature of 60˚C [30].
The actual thrust of this research work was to devise a workable methodology for carrying out scientific observation of upper atmospheric meteorology over Pakistan in spite of the lack of modern equipment and technological resources. Upper meteorological observation and monitoring were also not available in Pakistan. However, Saleem [5], Saleem [6] and Wazir [32] are among the few notable efforts on upper-level atmospheric observations.  [34]. The ground resolution of the data is 45 km² and the grid size is 10 by 10 degree latitude and longitude [35].

Material and Methods
The current study was carried out using AIRS Level 3, Version 6 monthly average relative humidity over the 1000 to 100 hPa pressure levels. Studies of the data set it captures have been carried out using balloons, radio sounding and aircraft observations. Divakarla, Barnet [36] and Tobin, Revercomb [37] highly recommended checking AIRS data in the lower troposphere.

Imputations of Missing Dataset in Relative Humidity
To produce the best estimates for the missing data, 30% of the relative humidity values were interpolated from the remaining 70% of already known values.
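One way to set up such a 30/70 split is sketched below. It assumes that the 30% of values are withheld at random from the known cells of a gridded field; the paper does not state how the split was drawn, and the function name `withhold_fraction` is hypothetical.

```python
import numpy as np


def withhold_fraction(field, fraction=0.30, seed=0):
    """Hide a random fraction of the known cells of a 2-D field.

    Returns the training field (hidden cells set to NaN) and a boolean mask
    marking the hidden cells, so imputed values can be compared to the truth.
    Assumption: cells are withheld uniformly at random.
    """
    rng = np.random.default_rng(seed)
    field = np.asarray(field, dtype=float)
    known = np.flatnonzero(~np.isnan(field))       # flat indices of known cells
    n_hide = int(round(fraction * known.size))
    hidden = rng.choice(known, size=n_hide, replace=False)

    training = field.copy().ravel()
    training[hidden] = np.nan                      # these cells must be imputed
    mask = np.zeros(field.size, dtype=bool)
    mask[hidden] = True
    return training.reshape(field.shape), mask.reshape(field.shape)


if __name__ == "__main__":
    rh = np.linspace(20.0, 80.0, 100).reshape(10, 10)  # synthetic RH field (%)
    training, hidden = withhold_fraction(rh)
    print(int(hidden.sum()), "of", rh.size, "cells withheld")
```

Keeping the mask makes it possible to score each interpolation only on the withheld cells, where the true value is known.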

1) Inverse Distance Weighting Interpolation (IDW)
It is a deterministic spatial interpolation based on Tobler's law of geography [7]. Ferrari and Ozaki [38] used Equation (1) for inverse distance weighting:

$$RH(x) = \frac{\sum_{i=1}^{t} S_{ij}^{-r}\, RH(x_i)}{\sum_{i=1}^{t} S_{ij}^{-r}} \qquad (1)$$

where $x$ represents a missing sample of relative humidity, $S_{ij}^{-r}$ is the weight factor for the $RH(x_i)$ samples, $t$ is the total number of relative humidity samples and $r$ is the degree of the weighting factor. The IDW algorithm developed by Langella [39] was used in the present research work.
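The inverse distance weighting idea described above can be sketched as follows. This is not the Langella [39] algorithm used in the study: it assumes that distances are measured in grid-index units, that every known cell contributes to every missing cell, and that the power defaults to 2. The function name `idw_fill` is hypothetical.

```python
import numpy as np


def idw_fill(field, power=2.0):
    """Fill NaN cells of a 2-D field with an inverse-distance-weighted mean.

    Assumptions: distances are in grid-index units and all known cells
    are used as neighbors for each missing cell.
    """
    field = np.asarray(field, dtype=float)
    filled = field.copy()
    known = ~np.isnan(field)
    if not known.any():
        return filled                              # nothing to interpolate from

    known_rc = np.argwhere(known).astype(float)    # (row, col) of known cells
    known_vals = field[known]                      # same order as known_rc
    for r, c in np.argwhere(~known):
        dist = np.hypot(known_rc[:, 0] - r, known_rc[:, 1] - c)
        weights = dist ** (-power)                 # closer samples weigh more
        filled[r, c] = np.sum(weights * known_vals) / np.sum(weights)
    return filled


if __name__ == "__main__":
    rh = np.array([[40.0, 42.0, 44.0],
                   [41.0, np.nan, 45.0],
                   [42.0, 44.0, 46.0]])
    print(idw_fill(rh))
```

Because every estimate is a weighted average of known values, it always lies between the smallest and largest known value, which is one reason IDW tends to smooth out sharp gradients.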

3) Bilinear Interpolation (BI)
BI refills the gaps in a dataset with respect to the best-fit straight line through the data.  The present study was also carried out in line with the same standard procedure.
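For illustration, the textbook form of bilinear interpolation on a single rectangular cell is sketched below: a linear interpolation along one axis followed by a linear interpolation along the other. How the study selected the four surrounding known values for each gap is not stated, so the corner values here are simply assumed to be available; the function name and argument order are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinear interpolation inside the rectangle [x0, x1] x [y0, y1].

    f00, f10, f01, f11 are the known values at (x0, y0), (x1, y0),
    (x0, y1) and (x1, y1) respectively.
    """
    tx = (x - x0) / (x1 - x0)                 # fractional position along x
    ty = (y - y0) / (y1 - y0)                 # fractional position along y
    bottom = (1.0 - tx) * f00 + tx * f10      # interpolate along x at y0
    top = (1.0 - tx) * f01 + tx * f11         # interpolate along x at y1
    return (1.0 - ty) * bottom + ty * top     # interpolate along y


if __name__ == "__main__":
    # Missing value at the center of a cell with four known corner values.
    print(bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 10.0, 20.0, 30.0, 40.0))
```

At the cell center all four corners receive equal weight, so the estimate is their plain average; at a corner the function returns that corner's value unchanged.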

1) Root Mean Square Error (RMSE)
Norazian [4] used Equation (3) for RMSE, in which t is the total number of samples [1]. RMSE gives the difference between the original and the imputed relative humidity samples, and a low value indicates accurate refilling of relative humidity [41].

2) Mean Absolute Error (MAE)
Junninen, Niska [1] and Norazian [4] gave Equation (4) for MAE. An MAE value near 0 indicates precise refilling of the dataset.
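As a hedged sketch, RMSE and MAE can be computed from paired original and imputed samples using their standard definitions: the square root of the mean squared difference, and the mean absolute difference. These definitions are assumed to match Equations (3) and (4), which are not reproduced in this text; the function names are illustrative.

```python
import numpy as np


def rmse(actual, imputed):
    """Root mean square error between original and imputed samples."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean((imputed - actual) ** 2)))


def mae(actual, imputed):
    """Mean absolute error between original and imputed samples."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.mean(np.abs(imputed - actual)))


if __name__ == "__main__":
    actual = [50.0, 60.0, 70.0]      # withheld relative humidity values (%)
    imputed = [52.0, 58.0, 70.0]     # values returned by an interpolation
    print(rmse(actual, imputed), mae(actual, imputed))
```

RMSE penalizes large individual errors more strongly than MAE, so reporting both helps distinguish a few large misses from many small ones.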

3) Correlation Coefficient (Corr)
A value of +1 shows a very good correlation and a good replacement of the missing data, while a value near 0 indicates a very poor imputation. Corr is the covariance of the original and imputed data divided by the product of their standard deviations.

4) Coefficient of Determination (R²)
It describes the degree of correlation in the dataset [2]. A value close to 1 indicates a perfect fit to the surface. Norazian [4] used Equation (5) for R².
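A minimal sketch of the two correlation-based measures follows. It assumes that Corr is the Pearson correlation coefficient and that R² is its square, a convention common in the imputation literature; the paper's Equation (5) is not reproduced in this text, so this may differ from the authors' exact form. The function names are illustrative.

```python
import numpy as np


def corr(actual, imputed):
    """Pearson correlation: covariance over the product of standard deviations."""
    a = np.asarray(actual, dtype=float)
    b = np.asarray(imputed, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))   # population covariance
    return float(cov / (a.std() * b.std()))          # population std (ddof=0)


def r_squared(actual, imputed):
    """Coefficient of determination, assumed here to be the squared correlation."""
    return corr(actual, imputed) ** 2


if __name__ == "__main__":
    actual = [30.0, 45.0, 60.0, 75.0]
    imputed = [31.0, 44.0, 62.0, 74.0]
    print(corr(actual, imputed), r_squared(actual, imputed))
```

Note that a high correlation alone does not rule out a systematic offset between original and imputed values, which is why it is read alongside RMSE and MAE.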

Results
The imputation over each pressure level was determined and the results are presented below.

Bilinear Interpolation (BI)
The performance parameters reveal that refilling of relative humidity at 100, 150,

Natural Neighbor Interpolation (NNI)
This interpolation technique is best suited for refilling of relative humidity for 100,

Discussions
Scatter plots were used to identify the more accurate interpolation between Natural Neighbor and Bilinear interpolation. Good results for refilling of relative humidity were found for 100, 150, 200, 250, 300 and 400 hPa through NNI (Figure 3(a)). However, NNI was not able to accurately refill the missing relative humidity data over the 600, 700, 850, 925 and 1000 hPa pressure levels (Figure 3(b)).
Filling of gaps in the data with BI appears good for the 100, 150, 200, 250, 300 and 400 hPa levels in every month of the year (Figure 4(a)).

Conclusion
Based on a critical check and evaluation of the interpolations and their results, it was concluded that Bilinear Interpolation was the best and most accurate for all pressure levels, while Natural Neighbor Interpolation proved to be the second-best interpolation for substituting missing relative humidity from 100 to 1000 hPa.