
The relative humidity data captured by the AQUA satellite contain missing matrices. To fill these missing values, four popular imputation techniques were tested: Bilinear, Inverse Distance Weighting, Natural Neighbor, and Nearest Neighbor Interpolation. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R^{2}), and Correlation Coefficient (Corr) were used to check the accuracy of these interpolations. Inverse Distance Weighting and Nearest Neighbor Interpolation proved unsuitable. Natural Neighbor Interpolation gave more accurate results than these two interpolations, while Bilinear Interpolation refilled the missing relative humidity values most accurately. It produced an RMSE of 0.543 for relative humidity at 100, 150, 200, 250, 300, 400, and 500 hPa, while at 600, 700, 850, and 925 hPa the RMSE remained near 1. A near-perfect fit and a very strong correlation (about 0.99) were found between actual and imputed relative humidity with Bilinear Interpolation. It was therefore concluded that Bilinear Interpolation is the most accurate imputation method for missing relative humidity values from 100 to 1000 hPa.

Meteorological data collected from satellites often contain gaps. Such gaps occur because of less efficient sampling by the satellite subsystem [

According to Sun and Oort [

Relative humidity is defined as the amount of water vapor in the atmosphere, expressed as a percentage of the amount required for saturation at the same temperature. It varies widely throughout the atmosphere. Relative humidity changes either through a change in the amount of water vapor or through a change in temperature [
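The dependence of relative humidity on both water vapor and temperature can be illustrated with a small calculation. The sketch below assumes a Magnus-type approximation for the saturation vapor pressure over water (coefficients 6.1094, 17.625, and 243.04, from Alduchov and Eskridge, 1996); this formula and the function names are illustrative assumptions and are not part of the study's method.

```python
import math

def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus-type saturation vapor pressure over water, in hPa (assumed formula)."""
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))

def relative_humidity(vapor_pressure_hpa: float, temp_c: float) -> float:
    """Relative humidity (%) = actual vapor pressure / saturation vapor pressure * 100."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)

# Same amount of water vapor (hypothetical 12 hPa) at two temperatures:
# warming raises the saturation vapor pressure, so relative humidity falls.
print(round(relative_humidity(12.0, 15.0), 1))
print(round(relative_humidity(12.0, 25.0), 1))
```

Holding the vapor pressure fixed and changing only the temperature isolates the second mechanism named in the text.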

Pakistan extends latitudinally from the Arabian Sea in the south to the Himalayan Mountains in the north, and longitudinally from Afghanistan in the west to India in the east (

The main aim of this research was to devise a workable methodology for scientific observation of upper-atmospheric meteorology over Pakistan despite the lack of modern equipment and technological resources. Upper-air meteorological observation and monitoring were also not available in Pakistan. However, Saleem [

The AQUA satellite has been capturing atmospheric water content from September 2002 to the present, and AIRS (Atmospheric Infrared Sounder) is a sensor mounted on it [^{2} and the grid size is 10 by 10 degrees of latitude and longitude [

The current study uses AIRS Level 3, Version 6 data for the monthly average relative humidity at pressure levels from 1000 to 100 hPa. Previous studies have assessed this data set using balloon, radiosonde, and aircraft observations. Divakarla, Barnet [

To produce the best estimates for the missing data, 30% of the relative humidity samples were treated as missing and interpolated from the remaining 70% of known samples. Mean Absolute Error [MAE], Root Mean Square Error [RMSE], Coefficient of Determination [R^{2}], and Correlation Coefficient [Corr] were used as performance indicators in this research.
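The 70/30 evaluation design can be sketched as a random hold-out split. This is a minimal illustration that assumes samples are selected uniformly at random; the study does not state how the 30% were chosen, and the function name `holdout_split` and the fixed `seed` are hypothetical.

```python
import random

def holdout_split(samples, missing_fraction=0.3, seed=0):
    """Randomly mark `missing_fraction` of the samples as artificially missing.

    Returns (known, withheld) lists of (index, value) pairs. The withheld
    values are imputed from the known ones and then used to score
    MAE, RMSE, Corr and R^2 against their true values.
    """
    rng = random.Random(seed)              # hypothetical fixed seed for repeatability
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_missing = round(missing_fraction * len(samples))
    withheld_idx = set(indices[:n_missing])
    known = [(i, v) for i, v in enumerate(samples) if i not in withheld_idx]
    withheld = [(i, v) for i, v in enumerate(samples) if i in withheld_idx]
    return known, withheld
```

Because the withheld values are real observations, any imputation method can be scored against them with the performance indicators listed above.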

1) Inverse Distance Weighting Interpolation (IDW)

It is a deterministic spatial interpolation method based on Tobler's First Law of Geography [

$$RH(x_j) = \frac{\sum_{i=1}^{t} RH(x_i)\,S_{ij}^{-r}}{\sum_{i=1}^{t} S_{ij}^{-r}} \qquad (1)$$

where $RH(x_j)$ is the missing relative humidity sample, $S_{ij}$ is the distance between the known sample at $x_i$ and the missing point $x_j$, so that $S_{ij}^{-r}$ is the weight assigned to the known sample $RH(x_i)$, $t$ is the total number of relative humidity samples, and $r$ is the power of the weighting factor. The IDW algorithm developed by Langella [
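Equation (1) can be sketched directly in plain Python. This assumes Euclidean distance on (x, y) grid coordinates and a default power of 2; the study does not report the value of r, so `power=2.0` and the function name `idw` are illustrative choices.

```python
import math

def idw(known_points, target, power=2.0):
    """Inverse distance weighting, Equation (1).

    known_points: list of ((x, y), rh) pairs; target: (x, y).
    Each known sample is weighted by S_ij ** -power, where S_ij is the
    Euclidean distance from that sample to the target point.
    """
    num = 0.0
    den = 0.0
    for (x, y), rh in known_points:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return rh  # target coincides with a known sample
        weight = dist ** -power
        num += rh * weight
        den += weight
    return num / den
```

For example, with samples of 10% at distance 1 and 50% at distance 3 and `power=1`, the weights are 1 and 1/3, giving an estimate of 20%. Larger powers make the estimate lean more heavily on the closest samples.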

2) Nearest Neighbor Interpolation (NNI)

NNI replaces each gap in the data set with the value of the nearest known sample [
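Nearest neighbor filling amounts to a minimum-distance lookup. The sketch below assumes Euclidean distance on (x, y) grid coordinates; the function name `nearest_neighbor` is hypothetical.

```python
import math

def nearest_neighbor(known_points, target):
    """Fill a gap with the value of the closest known sample (NNI).

    known_points: list of ((x, y), rh) pairs; target: (x, y).
    On ties, min() keeps the first sample in list order.
    """
    _, rh = min(
        known_points,
        key=lambda item: math.hypot(item[0][0] - target[0], item[0][1] - target[1]),
    )
    return rh
```

Because the filled value is copied rather than blended, NNI produces a piecewise-constant field, which helps explain why it tends to score worse than smoother methods.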

3) Bilinear Interpolation (BI)

BI refills the gaps in the data set by fitting straight lines between the neighboring known samples. Junninen, Niska [

$$RH = RH_{y_1} + m\,(RH_{x} - RH_{x_1}) \qquad (2)$$

$$m = \frac{RH_{y_2} - RH_{y_1}}{RH_{x_2} - RH_{x_1}}$$

$$x_1 < x < x_2 \quad \text{and} \quad y_1 < y < y_2$$

Equation (2) is the equation of a straight line through the sample points $(RH_{x_1}, RH_{y_1})$ and $(RH_{x_2}, RH_{y_2})$, with $m$ as its gradient.
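The straight-line step of Equation (2) becomes bilinear interpolation when it is applied along both grid directions inside one grid cell. The sketch below assumes the four corner values of a regular latitude-longitude cell are known; the function and argument names are hypothetical.

```python
def lerp(x, x1, x2, v1, v2):
    """Straight-line interpolation between (x1, v1) and (x2, v2), as in Eq. (2)."""
    m = (v2 - v1) / (x2 - x1)       # gradient of the line
    return v1 + m * (x - x1)

def bilinear(x, y, x1, x2, y1, y2, q11, q21, q12, q22):
    """Bilinear interpolation inside the grid cell [x1, x2] x [y1, y2].

    q11 = RH at (x1, y1), q21 = RH at (x2, y1),
    q12 = RH at (x1, y2), q22 = RH at (x2, y2).
    """
    r1 = lerp(x, x1, x2, q11, q21)  # along x on the lower edge (y = y1)
    r2 = lerp(x, x1, x2, q12, q22)  # along x on the upper edge (y = y2)
    return lerp(y, y1, y2, r1, r2)  # then along y between the two edges
```

At the cell center the result is the mean of the four corners. For corners of 10, 20, 30, and 40%, for example, the center value is 25%.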

4) Natural Neighbor Interpolation (NI)

In this method, a missing sample takes its value from its natural neighbors, which are selected by Delaunay triangulation around the missing point [
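The Delaunay selection step can be illustrated with a brute-force sketch in plain Python. It covers only the choice of natural neighbors, that is, the vertices of Delaunay triangles whose circumcircles contain the missing point. The Sibson area-based weights that complete natural neighbor interpolation are deliberately omitted. The function names are hypothetical. The search is O(n^4), so it suits only small illustrative point sets, and points are assumed to be in general position (no four cocircular).

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared) of triangle abc, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def inside(circle, p):
    """True if point p lies strictly inside the circle."""
    ux, uy, r2 = circle
    return (p[0] - ux) ** 2 + (p[1] - uy) ** 2 < r2

def natural_neighbors(points, target):
    """Indices of the natural neighbors of `target` among `points`.

    A triangle is Delaunay if no other point lies strictly inside its
    circumcircle. The natural neighbors of the target are the vertices
    of Delaunay triangles whose circumcircle contains the target.
    """
    neighbors = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if any(inside(circle, q) for q in others):
            continue  # not a Delaunay triangle
        if inside(circle, target):
            neighbors.update((i, j, k))
    return neighbors
```

For points (0, 0), (2, 0), (1, 2), and a distant (10, 10), a missing point at (1, 0.8) has only the first three as natural neighbors. The distant sample therefore does not influence the estimate, unlike in IDW.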

Robeson [^{2}] and Correlation Coefficient [Corr] as performance indicators for these interpolations. The present study was also carried out following the same standard procedure.

1) Root Mean Square Error (RMSE)

Norazian [

$$RMSE = \left\{\frac{1}{t}\sum_{i=1}^{t}\left[RH_{o_i} - RH_{p_i}\right]^2\right\}^{1/2} \qquad (3)$$

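In Equation (3), $t$ is the total number of samples, and $RH_{o_i}$ and $RH_{p_i}$ are the observed and imputed values of the $i$-th relative humidity sample [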
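Equation (3) translates almost term by term into code. This is a minimal sketch; the function name `rmse` is hypothetical, and `observed` and `imputed` are assumed to be equal-length sequences of relative humidity values.

```python
import math

def rmse(observed, imputed):
    """Root Mean Square Error, Equation (3)."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    t = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, imputed)) / t)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than they raise MAE.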

2) Mean Absolute Error (MAE)

Junninen, Niska [

$$MAE = \frac{1}{t}\sum_{i=1}^{t}\left|RH_{o_i} - RH_{p_i}\right| \qquad (4)$$

An MAE close to 0 indicates precise refilling of the data set.
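Equation (4) can be sketched the same way. The function name `mae` is hypothetical, and equal-length inputs are assumed.

```python
def mae(observed, imputed):
    """Mean Absolute Error, Equation (4); 0 means a perfect refill."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, imputed)) / len(observed)
```

For example, observed values 1, 2, 3 imputed as 2, 2, 1 give absolute errors of 1, 0, and 2, so the MAE is 1.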

3) Correlation Coefficient (Corr)

A value of +1 indicates a very strong correlation and hence a good replacement of the missing data, whereas a value near 0 indicates a very poor imputation. Fisher [

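$$\mathrm{Corr} = \frac{\operatorname{cov}(RH_{p_i}, RH_{o_i})}{\sigma_{RH_{p_i}}\,\sigma_{RH_{o_i}}} \qquad (5)$$

where $\operatorname{cov}(RH_{p_i}, RH_{o_i})$ is the covariance of the imputed and observed samples, and $\sigma_{RH_{p_i}}\,\sigma_{RH_{o_i}}$ is the product of their standard deviations.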
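Equation (5) is Pearson's correlation coefficient. The sketch below uses population (1/t) moments; the 1/t factors cancel between numerator and denominator, so sample moments would give the same value. The function name is hypothetical, and neither series may be constant.

```python
import math

def pearson_corr(observed, imputed):
    """Correlation coefficient, Equation (5): cov(p, o) / (sigma_p * sigma_o)."""
    t = len(observed)
    mo = sum(observed) / t
    mp = sum(imputed) / t
    cov = sum((p - mp) * (o - mo) for o, p in zip(observed, imputed)) / t
    so = math.sqrt(sum((o - mo) ** 2 for o in observed) / t)
    sp = math.sqrt(sum((p - mp) ** 2 for p in imputed) / t)
    return cov / (sp * so)
```

Note that Corr measures only linear agreement. An imputation that is consistently offset or scaled can still reach a Corr near 1, which is why it is reported alongside RMSE and MAE.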

4) Coefficient of Determination (R^{2})

It indicates the degree of correlation in the data set [^{2} as given below:

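$$R^2 = \left[\frac{1}{t}\sum_{i=1}^{t}\frac{(RH_{p_i} - \overline{RH}_{p})(RH_{o_i} - \overline{RH}_{o})}{\sigma_{p}\,\sigma_{o}}\right]^2 \qquad (6)$$

where $\overline{RH}_{p}$ is the mean of the imputed samples, $\overline{RH}_{o}$ is the mean of the observed samples, and $\sigma_{p}$ and $\sigma_{o}$ are their respective standard deviations.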
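A sketch of Equation (6) follows, assuming that the bracketed term is squared (the squared correlation form of R^{2}) and that population standard deviations are used. The tabulated R^{2} values are slightly lower than the squares of the tabulated Corr values, so the study's exact computation may differ from this sketch. The function name is hypothetical.

```python
import math

def r_squared(observed, imputed):
    """Coefficient of determination as in Eq. (6): the squared correlation term."""
    t = len(observed)
    mo = sum(observed) / t
    mp = sum(imputed) / t
    so = math.sqrt(sum((o - mo) ** 2 for o in observed) / t)
    sp = math.sqrt(sum((p - mp) ** 2 for p in imputed) / t)
    inner = sum((p - mp) * (o - mo) for o, p in zip(observed, imputed)) / (t * sp * so)
    return inner ** 2
```

For observed values 1, 2, 3, 4 imputed as 1, 3, 2, 4, the bracketed correlation term is 0.8, so R^{2} is 0.64.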

The imputation accuracy at each pressure level was evaluated, and the results are presented below.

This interpolation technique showed good performance in refilling relative humidity at the 200, 250, 300, 400, and 500 hPa levels (

The performance parameters reveal that BI refilled relative humidity accurately at 100, 150, 200, 250, 300, 400, and 500 hPa. For the remaining pressure levels (600, 700, 850, and 925 hPa), the results were also highly accurate: a strong correlation (0.995) and R^{2} close to 1 indicate very good imputation of relative humidity at these levels in the atmosphere (

This interpolation technique is best suited to refilling relative humidity at 100, 150, 200, 250, 300, and 400 hPa, with RMSE values below 0.5. Refilling at the other pressure levels (500, 600, 700, 850, and 925 hPa) also shows very good results: RMSE values remain close to 1, with an MAE of 0.339 and a very strong correlation (0.985). However, this technique refills the relative humidity data set poorly at the 1000 hPa level (

This interpolation technique refilled the data set accurately at the 150, 200, 250, 300, and 400 hPa levels, but proved less accurate at the remaining pressure levels: 100, 500, 600, 700, 850, 925, and 1000 hPa (

Pressure Level | RMSE | MAE | Corr | R^{2} |
---|---|---|---|---|
100 hPa | 2.99499 | 1.311164 | 0.990843 | 0.976053 |
150 hPa | 1.381706 | 0.626196 | 0.990864 | 0.976079 |
200 hPa | 0.986867 | 0.442843 | 0.983107 | 0.96093 |
250 hPa | 1.430705 | 0.614239 | 0.984604 | 0.963803 |
300 hPa | 1.614868 | 0.684446 | 0.987918 | 0.970294 |
400 hPa | 1.968786 | 0.810043 | 0.986889 | 0.968299 |
500 hPa | 2.572736 | 0.976825 | 0.982391 | 0.959484 |
600 hPa | 2.544889 | 0.991209 | 0.977163 | 0.949285 |
700 hPa | 2.666085 | 1.062507 | 0.969852 | 0.935243 |
850 hPa | 2.635646 | 1.140266 | 0.966787 | 0.929377 |
925 hPa | 3.187562 | 1.328095 | 0.955893 | 0.908431 |
1000 hPa | 2.387433 | 0.946757 | 0.946079 | 0.890205 |

Pressure Level | RMSE | MAE | Corr | R^{2} |
---|---|---|---|---|
100 hPa | 0.436691 | 0.152175 | 0.999766 | 0.993695 |
150 hPa | 0.194324 | 0.074077 | 0.999762 | 0.993687 |
200 hPa | 0.173128 | 0.072259 | 0.998547 | 0.991276 |
250 hPa | 0.233962 | 0.088414 | 0.999077 | 0.992328 |
300 hPa | 0.317036 | 0.116434 | 0.999382 | 0.992932 |
400 hPa | 0.549317 | 0.174408 | 0.998675 | 0.991528 |
500 hPa | 1.094282 | 0.319881 | 0.996549 | 0.987313 |
600 hPa | 1.291553 | 0.361526 | 0.993658 | 0.981599 |
700 hPa | 1.057029 | 0.353584 | 0.995588 | 0.98541 |
850 hPa | 0.870191 | 0.316185 | 0.996085 | 0.986395 |
925 hPa | 1.137214 | 0.405838 | 0.994118 | 0.982515 |
1000 hPa | 2.98034 | 0.735997 | 0.985798 | 0.966279 |

Pressure Level | RMSE | MAE | Corr | R^{2} |
---|---|---|---|---|
100 hPa | 0.399491 | 0.146958 | 0.999806 | 0.993775 |
150 hPa | 0.187512 | 0.072191 | 0.999776 | 0.993715 |
200 hPa | 0.1797 | 0.075983 | 0.998366 | 0.990918 |
250 hPa | 0.287835 | 0.101589 | 0.998262 | 0.990718 |
300 hPa | 0.30227 | 0.114388 | 0.99941 | 0.992988 |
400 hPa | 0.461043 | 0.161557 | 0.999141 | 0.992454 |
500 hPa | 0.980983 | 0.28666 | 0.997148 | 0.988499 |
600 hPa | 1.140132 | 0.362032 | 0.995251 | 0.984744 |
700 hPa | 1.025949 | 0.355288 | 0.995727 | 0.985687 |
850 hPa | 0.946461 | 0.339289 | 0.995654 | 0.985543 |
925 hPa | 1.154599 | 0.413291 | 0.993726 | 0.981737 |
1000 hPa | 3.034445 | 0.798869 | 0.985332 | 0.965328 |

Pressure Level | RMSE | MAE | Corr | R^{2} |
---|---|---|---|---|
100 hPa | 2.023075 | 0.836265 | 0.99583 | 0.985886 |
150 hPa | 0.90006 | 0.376398 | 0.995761 | 0.985752 |
200 hPa | 0.705101 | 0.307897 | 0.988958 | 0.972372 |
250 hPa | 0.934865 | 0.389087 | 0.991558 | 0.977475 |
300 hPa | 1.095927 | 0.478559 | 0.993971 | 0.982213 |
400 hPa | 1.407733 | 0.579422 | 0.992967 | 0.980229 |
500 hPa | 2.057877 | 0.723148 | 0.988169 | 0.970786 |
600 hPa | 2.133231 | 0.711181 | 0.982746 | 0.960181 |
700 hPa | 2.166127 | 0.734433 | 0.981541 | 0.957859 |
850 hPa | 2.227724 | 0.735897 | 0.97875 | 0.952426 |
925 hPa | 2.445183 | 0.820366 | 0.976137 | 0.947359 |
1000 hPa | 2.333847 | 0.668129 | 0.972757 | 0.940868 |

Scatter plots were used to identify the more accurate of the Natural Neighbor and Bilinear interpolations. NI gave good results for refilling relative humidity at 100, 150, 200, 250, 300, and 400 hPa (

However, NI was not able to accurately refill the missing relative humidity data at the 600, 700, 850, 925, and 1000 hPa pressure levels (

Gap filling with BI appears good at the 100, 150, 200, 250, 300, and 400 hPa levels in every month of the years studied (

For the remaining pressure levels (500, 600, 700, 850, 925, and 1000 hPa), BI is also the best suited for refilling (

Based on a critical evaluation of the interpolation outputs, it was concluded that Bilinear Interpolation was the most accurate for all pressure levels, while Natural Neighbor Interpolation proved to be the second-best method for substituting missing relative humidity from 100 to 1000 hPa.

We are very grateful to the AQUA-AIRS team for their assistance in interpolating the AIRS data set. The authors also wish to acknowledge Mr. Alessio Martion, Sapienza University of Rome, Italy, for his valuable recommendations to improve this research.

The authors declare no conflicts of interest regarding the publication of this paper.

Saleem, U., Akram, M.S., Ullah, M.F. and Rehman, F. (2018) Accurate Imputation for Relative Humidity over Pakistan Gathered from AQUA Satellite. Open Journal of Geology, 8, 987-1001. https://doi.org/10.4236/ojg.2018.810059