AQUA Satellite Data and Imputation of Geopotential Height: A Case Study for Pakistan

In the current study, an attempt is made to fill missing geopotential height data over Pakistan and to identify the optimum interpolation method for doing so. Over the last thirteen years, geopotential height values over Pakistan were missing, and these gaps were filled using interpolation techniques: Bilinear Interpolation [BI], Nearest Neighbor [NN], Natural Neighbor [NI] and Inverse Distance Weighting [IDW]. The imputations were judged on the basis of performance parameters: Root Mean Square Error [RMSE], Mean Absolute Error [MAE], Correlation Coefficient [Corr] and Coefficient of Determination [R²]. The NN and IDW imputations were neither precise nor accurate, whereas the Natural Neighbor and Bilinear interpolations fitted the data set closely. Natural Neighbor imputations showed good correlation and fitted the geopotential height surface well; their maximum and minimum root mean square errors were ±5.10 m and ±2.28 m respectively, while the mean absolute error was close to 1. Validation of the imputations revealed that NI produced more accurate results than BI. It can be concluded that Natural Neighbor interpolation is the best-suited technique for filling missing geopotential height data sets from the AQUA satellite.


Introduction
Missing data is a frequent problem in environmental research [4]. Many causes, such as routine maintenance, sampling errors in satellite sensors, sensor failures during observations, meteorological abnormalities and human error, are responsible for discontinuities in data sets [3] [4]. Geopotential height is the height of a pressure surface in the atmosphere above mean sea level [MSL]. The geopotential height data gathered from the AQUA satellite contain incomplete data matrices at 24 standard pressure levels [5]. Research can become inaccurate if data sets with missing values are used [4] [6]. Geopotential height is a function of air temperature, pressure, winds and the topography of the area, so its imputation requires a careful method. One of the oldest and most commonly suggested methods for filling such missing information is replacing missing values with the mean of neighboring samples [1] [2] [3].
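The neighbor-mean replacement mentioned above can be sketched as follows. This is a minimal illustration, not the exact procedure of [1] [2] [3]: it assumes the data sit on a regular 2-D grid stored as a NumPy array with `NaN` marking missing cells, and that "neighboring samples" means the valid cells of the surrounding 3 × 3 window. The function name `fill_neighbor_mean` and the sample heights are hypothetical.

```python
import numpy as np


def fill_neighbor_mean(grid):
    """Fill NaN cells with the mean of the valid cells in their 3 x 3 window.

    Single pass: means are taken from the original grid, so cells filled
    in this call do not feed into neighbouring estimates.
    """
    grid = np.asarray(grid, dtype=float)
    out = grid.copy()
    for r, c in zip(*np.nonzero(np.isnan(grid))):
        # Clip the window at the grid edges.
        window = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        if np.any(~np.isnan(window)):
            out[r, c] = np.nanmean(window)
        # Cells with no valid neighbour are left as NaN.
    return out


# Hypothetical 500 hPa heights (m) with one missing grid cell.
heights = np.array([[5810.0, 5812.0, 5814.0],
                    [5816.0, np.nan, 5820.0],
                    [5822.0, 5824.0, 5826.0]])
print(fill_neighbor_mean(heights)[1, 1])  # mean of the eight neighbours: 5818.0
```

Because every missing cell is estimated from the unfilled grid, the result does not depend on the order in which gaps are visited; large contiguous gaps may need several passes.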
Many different interpolation techniques have been developed [2] [6] [7] [8] [9]. The best method depends on the spatial and temporal variations of geopotential height in the atmosphere. Shen, Reiter [10] applied different interpolations to geopotential height, taking its variations in the atmosphere into account.
Pakistan is a country in South Asia bordered by India to the east, China to the north, the Arabian Sea to the south and Afghanistan to the west (Figure 1 and Figure 2). It is an arid to semi-arid country, except for the northern areas, which receive 760 mm to 2000 mm of rainfall annually. Pakistan has four provinces, of which Baluchistan is the driest, a desert area receiving 210 mm of rain on average [16]. Three quarters of the country receives no more than 250 mm of rain annually. In summer, relative humidity remains between 20% and 50%. In winter, the average temperature varies from 4˚C to 20˚C in most areas, while an increase in temperature of 0.6˚C to 1.0˚C is found along the coastal areas [17].
The main aim of this research is to devise a workable methodology for scientific observation of upper-atmosphere meteorology in Pakistan despite the lack of modern equipment and technological resources in the relevant departments. Little published literature on this subject is available for Pakistan; Saleem and Ahmed [18], Saleem [19] and Saleem [20] are among the few initiatives on upper-level atmospheric observations.

Data Used
In this research, the monthly mean geopotential height [in meters] for the past 13 years, obtained from the Atmospheric Infrared Sounder [AIRS] Level 3 product, was used. AIRS is an instrument on the AQUA satellite, which was launched in May 2002. The instrument has very high spectral resolution: it captures climate data in nearly 2382 bands of the electromagnetic spectrum, and its geopotential height product has a high resolution of 0.5˚ × 0.5˚ grid cells. Version 6 of the product contains fewer biases in geopotential height [5]. Besides providing good-quality climate data, GES DISC¹ provides geopotential height data for the entire globe.

1) INVERSE DISTANCE WEIGHTING [IDW]
This imputation follows Tobler's first law of geography: the weight of each known sample is determined by its distance from the sample being imputed [Robeson, 1994], so the farther a neighbor is from the predicted sample, the smaller its weight in the interpolation. Ferrari and Ozaki [9] used Equation (1):
$$z_{ij} = \frac{\sum_{a=1}^{n} d_{aj}^{-r}\, z_{oa}}{\sum_{a=1}^{n} d_{aj}^{-r}} \tag{1}$$
where $d_{aj}^{-r}$ is the distance weighting factor between the $a$th original neighbor sample $z_{oa}$ and the $j$th point to be estimated $z_{ij}$, $n$ is the total number of samples used, and $r$ is the weighting (power) factor. The IDW formula of Langella [23] was used for the missing data imputations.
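The weighting in Equation (1) can be sketched in Python as below. This is an illustration under stated assumptions, not the implementation used in the study: sample positions are taken as planar x/y coordinates (for example grid indices), and the power r = 2 is a common default that the study does not specify. The function name `idw` is hypothetical.

```python
import numpy as np


def idw(xs, ys, values, qx, qy, r=2.0):
    """Inverse distance weighting estimate at (qx, qy).

    Each of the n known samples gets the weight d**(-r), where d is its
    distance to the point being estimated and r is the power parameter
    (r = 2 here is an assumed default).
    """
    xs, ys, values = (np.asarray(a, dtype=float) for a in (xs, ys, values))
    d = np.hypot(xs - qx, ys - qy)
    if np.any(d == 0):
        # The target coincides with a known sample: return it unchanged.
        return float(values[np.argmin(d)])
    w = d ** (-r)
    return float(np.sum(w * values) / np.sum(w))


# Two hypothetical samples; the midpoint gets their plain average.
print(idw([0, 2], [0, 0], [0.0, 10.0], 1.0, 0.0))  # 5.0
```

Raising r makes the estimate more local (closer to nearest-neighbor behavior); lowering it smooths the surface toward the overall mean of the samples.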

2) NEAREST NEIGHBOR INTERPOLATION [NN]
In this interpolation technique, each missing value was imputed directly with the value of the nearest available neighbor of the missing sample [24] [25].
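Nearest-neighbor filling on a regular grid can be sketched as follows, assuming missing cells are marked with `NaN` and distance is counted in grid steps (which ignores the shrinking of longitude spacing with latitude). The function name `fill_nearest` and the tie-breaking rule are illustrative choices, not taken from [24] [25].

```python
import numpy as np


def fill_nearest(grid):
    """Replace each NaN cell with the value of the nearest valid cell.

    Distance is measured in grid-index units; ties are broken by taking the
    first valid cell in row-major order.
    """
    grid = np.asarray(grid, dtype=float)
    filled = grid.copy()
    vr, vc = np.nonzero(~np.isnan(grid))    # rows/cols of valid cells
    if vr.size == 0:
        return filled                        # nothing to copy from
    for r, c in zip(*np.nonzero(np.isnan(grid))):
        k = np.argmin((vr - r) ** 2 + (vc - c) ** 2)
        filled[r, c] = grid[vr[k], vc[k]]
    return filled


print(fill_nearest([[1.0, 2.0, np.nan]]))  # the gap copies its neighbour, 2.0
```

Because the imputed value is copied rather than blended, the filled field keeps sharp steps at the boundaries between neighbors, which is consistent with the poor surface fit reported for NN in the results.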

3) BILINEAR INTERPOLATION [BI]
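Junninen et al. [2004] used Equations (2) and (3) for bilinear interpolation:
$$z = z_o + m\,(x - x_o) \tag{2}$$
$$m = \frac{z_i - z_o}{x_i - x_o} \tag{3}$$
Equation (2) is a linear equation through the sample values $z_o$ and $z_i$ located at positions $x_o$ and $x_i$, with $m$ being the gradient of this line and $z$ the value estimated at position $x$. In bilinear interpolation, this linear step is applied along one grid axis and then along the other.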
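Applying the linear step of Equations (2) and (3) twice gives bilinear interpolation inside one grid cell. The sketch below assumes the four surrounding grid values are known and the target lies inside the cell; the helper names `linear` and `bilinear` are illustrative, not part of the cited method.

```python
def linear(z_o, z_i, x_o, x_i, x):
    """Linear interpolation between (x_o, z_o) and (x_i, z_i)."""
    m = (z_i - z_o) / (x_i - x_o)   # gradient of the line
    return z_o + m * (x - x_o)


def bilinear(x, y, x0, x1, y0, y1, z00, z10, z01, z11):
    """Bilinear interpolation inside the cell [x0, x1] x [y0, y1].

    z00 = z(x0, y0), z10 = z(x1, y0), z01 = z(x0, y1), z11 = z(x1, y1).
    """
    z_bottom = linear(z00, z10, x0, x1, x)      # along x at y0
    z_top = linear(z01, z11, x0, x1, x)         # along x at y1
    return linear(z_bottom, z_top, y0, y1, y)   # then along y


# Corner values taken from z = 2x + 3y + xy on the unit cell.
print(bilinear(0.5, 0.5, 0, 1, 0, 1, 0.0, 2.0, 3.0, 6.0))  # 2.75
```

Bilinear interpolation reproduces any surface of the form a + bx + cy + dxy exactly, which is why the example recovers the true value 2.75 at the cell centre.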

4) NATURAL NEIGHBOR INTERPOLATION [NI]
¹ GES DISC stands for Goddard Earth Sciences Data and Information Services Center.
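This spatial interpolation estimates the missing geopotential height as a weighted average of the values of its natural neighbors, that is, the samples whose Voronoi (Thiessen) cells would lose area if the missing point were inserted; each neighbor's weight is proportional to the area it loses (Sibson's method). Boissonnat and Cazals [25] explain how such natural neighbors are selected for scattered data using the Delaunay triangulation.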
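Exact natural neighbor weights require computing Voronoi cell overlaps. The sketch below swaps in a simpler raster ("discrete") approximation of Sibson's weights: the domain is divided into fine cells, and each sample is weighted by the number of cells the missing point would take from it. The function name, the `bounds` argument and the resolution n = 200 are illustrative assumptions, not the procedure or software used in this study; memory grows as n² times the number of samples, so it suits small neighborhoods rather than a full grid.

```python
import numpy as np


def natural_neighbor_discrete(xs, ys, values, qx, qy, bounds, n=200):
    """Approximate Sibson natural-neighbour interpolation at (qx, qy).

    bounds = (xmin, xmax, ymin, ymax) is rasterised into n x n cells.
    A cell is "stolen" by the query point if it lies closer to the query
    than to its current nearest sample; each sample's weight is the number
    of cells it loses, a discrete stand-in for the area it loses.
    """
    xs, ys, values = (np.asarray(a, dtype=float) for a in (xs, ys, values))
    d2_query = (xs - qx) ** 2 + (ys - qy) ** 2
    if np.any(d2_query == 0):
        return float(values[np.argmin(d2_query)])   # query is a known sample

    xmin, xmax, ymin, ymax = bounds
    gx = xmin + (np.arange(n) + 0.5) * (xmax - xmin) / n   # cell centres
    gy = ymin + (np.arange(n) + 0.5) * (ymax - ymin) / n
    px, py = (a.ravel() for a in np.meshgrid(gx, gy))

    # Squared distance from every cell centre to every sample: (cells, samples).
    d2 = (px[:, None] - xs[None, :]) ** 2 + (py[:, None] - ys[None, :]) ** 2
    owner = np.argmin(d2, axis=1)                 # Voronoi owner of each cell
    d2_owner = d2[np.arange(owner.size), owner]

    stolen = (px - qx) ** 2 + (py - qy) ** 2 < d2_owner
    weights = np.bincount(owner[stolen], minlength=values.size).astype(float)
    if weights.sum() == 0:
        # No cell stolen (query outside the raster or finer than its
        # resolution): fall back to the nearest sample.
        return float(values[np.argmin(d2_query)])
    return float(np.dot(weights, values) / weights.sum())


# Four hypothetical samples at the corners of a unit cell.
xs, ys, vals = [0, 1, 0, 1], [0, 0, 1, 1], [0.0, 1.0, 2.0, 3.0]
print(natural_neighbor_discrete(xs, ys, vals, 0.5, 0.5, (0, 1, 0, 1)))  # about 1.5
```

At the cell centre the four corners lose equal areas, so the estimate is close to their plain average; finer rasters bring the weights closer to the exact Sibson areas.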

1) ROOT MEAN SQUARE ERROR [RMSE]
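Root mean square error was calculated by dividing the sum of the squared differences between imputed and actual geopotential heights by the total number of samples, and then taking the square root of this term [4].
Smaller values indicate a more accurate estimation of the missing data set. Equation (4) is the formula used in this research:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^2} \tag{4}$$
where $P_i$ and $O_i$ are the imputed and actual geopotential heights of the $i$th sample and $N$ is the total number of samples.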
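A direct translation of the RMSE definition into NumPy is sketched below; the function name `rmse` and the sample heights are hypothetical.

```python
import numpy as np


def rmse(actual, imputed):
    """Root mean square error: square root of the mean squared difference."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean((imputed - actual) ** 2)))


# Hypothetical actual vs imputed geopotential heights (m), each off by 1 m.
print(rmse([5810.0, 5820.0, 5830.0, 5840.0], [5811.0, 5819.0, 5831.0, 5839.0]))  # 1.0
```

Squaring before averaging means RMSE is in the same units as the data (meters here) but penalizes a few large errors more heavily than many small ones.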
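
2) MEAN ABSOLUTE ERROR [MAE]
This parameter calculates the average absolute difference [±] between original and interpolated geopotential height:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_i - O_i\right| \tag{5}$$
where $P_i$ and $O_i$ are the interpolated and original values of the $i$th sample and $N$ is the number of samples. MAE ranges from 0 to ∞; values closer to 0 indicate a more accurate imputation of the missing data set.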
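The MAE definition can be sketched the same way; the function name `mae` and the sample values are hypothetical.

```python
import numpy as np


def mae(actual, imputed):
    """Mean absolute error: average of |imputed - actual|; 0 means a perfect match."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.mean(np.abs(imputed - actual)))


print(mae([5810.0, 5820.0, 5830.0], [5811.0, 5820.0, 5828.0]))  # (1 + 0 + 2) / 3 = 1.0
```

For the same set of errors, RMSE is never smaller than MAE; a large gap between the two signals that a few large errors dominate the imputation.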

3) CORRELATION COEFFICIENT [Corr]
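A value of +1 indicates a perfect positive correlation, while a value near 0 indicates little or no linear correlation between actual and predicted geopotential height. Equation (6) was used for the correlation coefficient in this research:
$$\mathrm{Corr} = \frac{\sum_{i=1}^{N}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{N}(O_i-\bar{O})^2}\,\sqrt{\sum_{i=1}^{N}(P_i-\bar{P})^2}} \tag{6}$$
where $O_i$ and $P_i$ are the actual and predicted values of the $i$th sample, $\bar{O}$ and $\bar{P}$ are their means, and $N$ is the number of samples. In Equation (6), the numerator represents the covariance and the denominator the product of the standard deviations of the two data sets (the common normalization factor $N$ cancels).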
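The correlation coefficient can be computed directly from the deviations about the means, as sketched below; the function name `correlation` is hypothetical, and the result should agree with NumPy's `np.corrcoef(actual, predicted)[0, 1]`.

```python
import numpy as np


def correlation(actual, predicted):
    """Pearson correlation: covariance over the product of standard deviations."""
    o = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    do, dp = o - o.mean(), p - p.mean()   # deviations from the means
    return float(np.sum(do * dp) / np.sqrt(np.sum(do ** 2) * np.sum(dp ** 2)))


print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0
```

Corr measures only how well the two series move together: predictions that are uniformly too high, or scaled by a constant factor, still score +1, so it complements rather than replaces RMSE and MAE.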

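4) COEFFICIENT OF DETERMINATION [R²]
This parameter gives the degree of correlation between actual and predicted geopotential height [1] and varies between 0 and 1. Noor, Abdullah [4] suggested that values closer to 1 indicate a better fit to the data set.
Rahman and Islam [2011] used the following formula for R²:
$$R^2 = \left[\frac{\sum_{i=1}^{N}(P_i-\bar{P})(O_i-\bar{O})}{\sqrt{\sum_{i=1}^{N}(P_i-\bar{P})^2}\,\sqrt{\sum_{i=1}^{N}(O_i-\bar{O})^2}}\right]^2 \tag{7}$$
In Equation (7), $P_i$ and $O_i$ are the predicted and actual values of the $i$th sample, $N$ is the number of samples, $\bar{P}$ is the average of the predicted samples and $\bar{O}$ is the average of the sample values before prediction.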
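The sketch below assumes Equation (7) is the squared-correlation form of R², which is consistent with its use of both the predicted and actual averages; the function name `r_squared` is hypothetical.

```python
import numpy as np


def r_squared(actual, predicted):
    """R^2 as the squared Pearson correlation between actual and predicted values."""
    o = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    do, dp = o - o.mean(), p - p.mean()
    r = np.sum(dp * do) / np.sqrt(np.sum(dp ** 2) * np.sum(do ** 2))
    return float(r ** 2)


# Predictions that are systematically 2x too large still give R^2 = 1.
print(r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # 1.0
```

In this form R² rewards any linear relationship, biased or not, so it is most informative when read alongside RMSE and MAE. The alternative definition 1 − SS_res/SS_tot would give such biased predictions a much lower score.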

Results
The performance parameters obtained for each interpolation technique are presented below.

Performance Parameters from IDW
IDW produced strongly biased results at all pressure levels. Its highest RMSE was ±14.45 m at 1 hPa and its lowest was ±3.66 m at 925 hPa. The correlation coefficient between actual and predicted values was very low, indicating poor-quality interpolation of missing geopotential height values with IDW (Table 1).

Performance Parameters from Nearest Neighbor Interpolation
With Nearest Neighbor interpolation, RMSE remained between ±4.925 m and ±11.369 m. Such large RMSE values, together with poor correlation and a poor fit to the surface, indicate that this technique refilled the data badly (Table 2).

Performance Parameters from Bilinear Interpolation
Bilinear Interpolation performed better than the two interpolations described above. RMSE ranged from ±2.461 m to ±5.241 m when refilling gaps in the data at the pressure levels up to 1000 hPa. MAE remained below 1, and a strong correlation (0.98) was found for the imputation of geopotential height. The coefficient of determination was close to 0.98 for imputation at 1, 1.5, 2, 3, 5, 7, 10, 15, 70, 100, 150, 200, 250 and 300 hPa (Table 3).

Performance Parameters from Natural Neighbor Interpolation
Reasonably low RMSE values were obtained when refilling geopotential height at 2, 3, 5, 7, 30, 50, 70, 200, 250, 400, 500 and 600 hPa. The largest RMSE was ±5.10 m at 10 hPa, and the lowest RMSE, ±2.2 m, occurred when refilling data gaps at 850, 925 and 1000 hPa. A good correlation coefficient (close to 0.99) was obtained for the refilled geopotential height, and R² was close to 1, indicating a good fit between the actual and predicted data sets (Table 4).

Discussions
Refilling of geopotential height at the 24 pressure levels was good with the Bilinear and Natural Neighbor imputations (Tables 1-4). To select the better of the two, scatter plots of original and estimated geopotential heights were examined. Poor refilling was found in February and March (Figure 3(a)).

Conclusion
AQUA satellite data were interpolated to fill missing geopotential height values.
Based on a critical evaluation of the interpolation products, it was concluded that the NN and IDW interpolations were not suitable for filling missing geopotential height data (Table 1 and Table 2). BI and NI both gave good results. However, examination of the monthly scatter plots showed that NI was more accurate and reliable for filling missing geopotential height data at the 24 pressure levels.

Figure 1. Location map of Pakistan with its neighboring regions [20].

Figure 2. Altitude map of Pakistan showing elevation [in meters] in different color scales [20].

Figure 3. (a) Imputation of geopotential height (1 hPa to 15 hPa) with Bilinear Interpolation; (b) imputation of geopotential height (20 hPa to 250 hPa) with Bilinear Interpolation; (c) imputation of geopotential height (300 hPa to 1000 hPa) with Bilinear Interpolation.

Figure 4. (a) Imputation of geopotential height (1 hPa to 15 hPa) with Natural Neighbor Interpolation; (b) imputation of geopotential height (20 hPa to 250 hPa) with Natural Neighbor Interpolation; (c) imputation of geopotential height (300 hPa to 1000 hPa) with Natural Neighbor Interpolation.

Table 1. Results indicating poor performance parameters with Inverse Distance Weighting Interpolation.

Table 2. Results indicating poor performance indicators from Nearest Neighbor Interpolation.

Table 3. Results indicating good performance parameters for refilling of gaps in data with Bilinear Interpolation.

Table 4. Good results of performance indicators with Natural Neighbor Interpolation.