Homogeneity of Monthly Mean Air Temperature of the United Republic of Tanzania with HOMER

Abstract

Long-term climate datasets are widely used in a variety of climate analyses. These datasets, however, are often affected by inhomogeneities caused by, for example, relocation of meteorological stations, changes in land use and land cover around the stations, substitution of stations, changes of shelters, changes of instrumentation following failure or damage, and changes of observation hours. If these inhomogeneities are not detected and adjusted properly, the results of climate analyses based on these data can be erroneous. In this paper, monthly mean air temperatures of the United Republic of Tanzania are homogenized for the first time using the HOMER software package. HOMER is one of the most recent homogenization packages and exhibited the best results in the comparative analysis performed within the COST Action ES0601 (HOME). Monthly mean minimum (TN) and maximum (TX) air temperatures from 1974 to 2012, obtained from the Tanzania Meteorological Agency (TMA), were used in the analysis. The analysis reveals a larger number of artificial break points in the TX time series (12 breaks) than in the TN time series (5 breaks). The homogenization was assessed by comparing the results of correlation analysis and principal component analysis (PCA) for the homogenized and non-homogenized datasets. The non-parametric Mann-Kendall test was used to estimate the existence, magnitude and statistical significance of potential trends in the homogenized and non-homogenized time series. Correlation analysis reveals stronger correlation in homogenized TX than in homogenized TN, relative to the non-homogenized time series. The PCA results suggest that the variances explained by the principal components are higher for homogenized TX than for homogenized TN, relative to the non-homogenized time series. The Mann-Kendall test reveals that, relative to the non-homogenized datasets, the number of statistically significant trends increases more for homogenized TX (96%) than for homogenized TN (67%).

Share and Cite:

P. Luhunga, E. Mutayoba and H. Ng’ongolo, "Homogeneity of Monthly Mean Air Temperature of the United Republic of Tanzania with HOMER," Atmospheric and Climate Sciences, Vol. 4 No. 1, 2014, pp. 70-77. doi: 10.4236/acs.2014.41010.

1. Introduction

The study of climate change and variability in the United Republic of Tanzania (URT) depends on existing long-term observational climate datasets. The value of these datasets, however, depends strongly on their homogeneity [1]. A homogeneous climate time series is one whose variability is caused only by changes in weather and climate [2]. Unfortunately, long instrumental records are rarely, if ever, homogeneous. Inhomogeneities in these datasets arise, for example, from relocation of meteorological stations, changes in land use and land cover around the stations, substitution of stations, changes of shelters, changes of instrumentation following failure or damage, and changes of observation hours [2-4]. Most of these changes cause sudden shifts (change points) in the local climate series, while others (particularly urban development) introduce gradually increasing biases relative to the real macroclimatic characteristics [5,6]. All of these inhomogeneities can bias a time series and lead to misinterpretation of the climate under study [5].

Several techniques have been developed for detecting and adjusting non-climatic inhomogeneities [2,4,7-13]. More recently, new procedures were developed to detect and correct multiple change points using reference series [5,13,14]. Review papers and comparison studies of different homogenization techniques have been published regularly in different parts of the globe [1,5,15-17].

A comprehensive assessment of different homogenization techniques for climate series was also included in the scientific programme of the COST Action ES0601 “Advances in Homogenization Methods of Climate Series: an Integrated Approach” (HOME). The objective of HOME was to develop a general method for homogenizing climate and environmental datasets. The work started in 2007 and was completed in 2011 with the release of two new software packages: HOMER (for monthly data) and HOM/SPLIDHOM (for daily data) [18]. The aim of this paper is to use the HOMER software package to homogenize the monthly mean minimum (TN) and maximum (TX) air temperature datasets of the URT, as part of constructing reliable long-term datasets from the original climate observations.

2. Area of Study

The study domain is the URT, which is located in East Africa between longitudes 29˚E and 41˚E and latitudes 1˚S and 12˚S. The country covers an area of 945,000 km², of which 884,000 km² is land and 61,000 km² is lakes, rivers and seashore. The URT has several physical features that contribute to high local variability in its climate. These include topography ranging from sea level to 1600 m in the west, Mount Kilimanjaro at 5895 m altitude in the north-eastern highlands, Lake Victoria in the north, Lake Nyasa and the Ruvuma River in the south, and Lake Tanganyika in the west. Much of the country lies above 1000 m altitude, with many areas in the centre and north above 1500 m. It also has a complex seasonality associated with the Indian Ocean [19-22]. The URT is relatively sparsely covered with weather stations, which are unevenly distributed across low- and high-altitude areas. Most of the meteorological station networks, which mainly comprise classical weather stations that have collected data since the 1900s, are managed by the Tanzania Meteorological Agency (TMA).

3. Data Description

Monthly mean minimum (TN) and maximum (TX) air temperatures from 1974 to 2012 were used in the analysis. These datasets were obtained from TMA. Table 1 lists the geographic information of the meteorological stations used in this study.

Table 1. Geographic information of meteorological stations used in this study.

4. Methodological Procedures

HOMER was used to detect and correct the inhomogeneities in the TN and TX datasets. It is one of the most recent homogenization packages and exhibited the best results in the comparative analysis performed within the COST Action ES0601 (HOME) [19]. HOMER also provides functions for fast quality control of the data, including functions from the CLIMATOL R package that allow the user to examine station density, correlograms, histograms, box plots and cluster analysis [2]. To detect inhomogeneities, HOMER combines three detection algorithms (pairwise univariate detection, joint detection and ACMANT bivariate detection) and corrects the datasets using ANOVA [1]. ACMANT is used to detect the most likely month of a change point (break). If the precise month of change is not known, the default is to validate the break at the end of the year, since detection is mainly performed on annual indices [6].

4.1. Missing Data Correction and Outlier Detection

The models used in HOMER for imputing missing data and correcting outliers are presented in [6]. In these models, missing data are filled in using ANOVA, and outliers are detected by pairwise comparison of the candidate time series with its best neighbour time series. This is done by visual inspection of plots of the difference between the candidate and best neighbour series (Figures 1-3). After a correction step, ACMANT bivariate detection confirms the selected changes in the climate data series.
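As a rough illustration of the pairwise comparison described above, the sketch below forms the difference series between a candidate and a neighbour and flags values that stand out from it. The automatic flagging rule (a departure of more than `k` standard deviations from the median difference) is an assumption introduced here for illustration only; in this study outliers were identified by visual inspection of the HOMER plots. The function names and all temperature values are hypothetical.

```python
from statistics import median, stdev

def difference_series(candidate, neighbour):
    """Candidate minus neighbour; None where either value is missing."""
    return [
        None if c is None or n is None else c - n
        for c, n in zip(candidate, neighbour)
    ]

def flag_outliers(diff, k=3.0):
    """Indices whose difference departs from the median by more than k standard deviations.

    The k-sigma rule is an illustrative assumption, not HOMER's procedure.
    """
    valid = [d for d in diff if d is not None]
    centre, spread = median(valid), stdev(valid)
    return [
        i for i, d in enumerate(diff)
        if d is not None and abs(d - centre) > k * spread
    ]

# Hypothetical monthly temperatures (degrees C); index 3 is missing and
# index 5 holds a suspicious value in the candidate series.
candidate = [30.1, 30.4, 29.8, None, 30.0, 36.5, 30.2, 29.9, 30.3, 30.1, 29.7, 30.0]
neighbour = [29.6, 29.9, 29.4, 29.5, 29.5, 29.8, 29.7, 29.3, 29.9, 29.6, 29.3, 29.5]

diff = difference_series(candidate, neighbour)
suspects = flag_outliers(diff)
```

Working on the difference series rather than on the raw candidate removes most of the shared climate signal, so a station-specific error stands out more clearly.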

4.2. Development of Reference Time Series and Homogeneity Test

A reference time series must carry the same climatic signal as the candidate series, and several techniques exist for developing one. For example, [22] developed a time-invariant reference series for a 19-station network by taking, for each candidate, the arithmetic mean of the other 18 stations. After running the homogeneity test on all stations, he created a new reference series in the same way but excluding the stations found to be inhomogeneous. Similarly, [23] runs homogeneity tests and then uses the homogenized data to develop a reference time series, which is used to rerun the homogeneity tests. Another technique is described in [5], where reference series are created based on correlation coefficients between stations. In this study, the reference time series was created as a weighted average of the non-homogenized datasets of all 24 stations in the network. The homogeneity test was then run to assess the quality of the homogenization by comparing both the non-homogenized data (hereafter NH) and the homogenized data with the reference series using the following methods:

4.2.1. Correlation Analysis

Correlation analysis was applied to the annual time series to obtain 1) the correlation matrix between the time series of the non-homogenized and homogenized datasets and 2) the Spearman Correlation Coefficient (SCC) between the non-homogenized and homogenized time series. Correlation analysis was also performed between the non-homogenized datasets and the reference series, and between the homogenized datasets and the reference series, in order to assess the quality of the corrected dataset and any improvement in the similarity between the time series brought about by homogenization.
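The comparison against a reference series can be sketched as follows. This is a minimal standard-library illustration, not the procedure used to produce the results in this paper: the station names, temperature values and weights are hypothetical, and the weighting scheme of the actual reference series is not specified here. The Spearman coefficient is computed as the Pearson correlation of the ranks, with tied values given their average rank.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def weighted_reference(series_by_station, weights):
    """Weighted average across stations for each time step."""
    total = sum(weights[s] for s in series_by_station)
    n_steps = len(next(iter(series_by_station.values())))
    return [
        sum(weights[s] * series_by_station[s][t] for s in series_by_station) / total
        for t in range(n_steps)
    ]

# Hypothetical annual mean TX (degrees C) for three stations over six years.
stations = {
    "A": [29.8, 30.1, 29.9, 30.4, 30.6, 30.5],
    "B": [27.2, 27.6, 27.3, 27.9, 28.0, 28.1],
    "C": [31.0, 31.2, 31.1, 31.3, 31.9, 31.6],
}
weights = {"A": 1.0, "B": 1.0, "C": 2.0}  # illustrative weights only

reference = weighted_reference(stations, weights)
scc = {name: spearman(series, reference) for name, series in stations.items()}
```

Running the same calculation once on the NH series and once on the homogenized series gives the pair of SCC values per station that is compared in the results.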

4.2.2. Non-Parametric Mann-Kendall Test for Trend

Mann [24] first suggested using the test for significance of Kendall’s tau, with time as the independent variable, as a test for trend. The Mann-Kendall test can be stated most generally as a test for whether the values of Y (the dependent variable) tend to increase or decrease with time (monotonic change). In this study, the non-parametric Mann-Kendall test is used to estimate the existence, magnitude and statistical significance of potential trends in the NH and homogenized (HH) time series, in order to assess the impact of homogenization. This test is suggested for trend analysis by the WMO [25] and has been used in many published works on climate change and climate variability [26].

Figure 1. Screen capture of HOMER outputs: Mtwara series compared to its neighbours. Pairwise comparisons are sorted by increasing noise standard deviation (upper left corner of each plot). The neighbours are sorted based on their cross-correlation with Mtwara. The top panel is the difference time series of Mtwara and Dar es Salaam, which has a standard deviation of 0.14˚C. The second panel is the difference between Mtwara and Kibaha (0.20˚C). The third panel is the difference between Mtwara and Songea (0.27˚C).

Figure 2. Screen capture of HOMER outputs: Raw data series of Mwanza with outlier and missing values.

Figure 3. Screen capture of HOMER outputs: Corrected data series of Mwanza.
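The Mann-Kendall test of Section 4.2.2 can be sketched with the standard library alone. The version below is a minimal illustration that assumes a complete series (no missing values) and serially independent observations; it uses the usual normal approximation with continuity correction and the standard tie correction to the variance. The function name and the sample values are hypothetical and are not taken from the TMA data.

```python
from collections import Counter
from math import erfc, sqrt

def mann_kendall(series):
    """Return (S, Z, p) for the two-sided Mann-Kendall trend test.

    S is the sum of signs of all later-minus-earlier differences,
    Z the continuity-corrected normal score, p the two-sided p-value.
    """
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S, reduced for groups of tied values (groups of size 1 add 0).
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p

# Hypothetical annual mean TX values (degrees C) with a gradual warming.
tx = [29.6, 29.8, 29.7, 30.0, 30.1, 30.0, 30.3, 30.4, 30.6, 30.5]
s, z, p = mann_kendall(tx)
significant = p < 0.05  # 5% level chosen here for illustration
```

Applying the function to each station's NH and HH series and counting how many p-values fall below the chosen level gives the kind of before-and-after comparison of significant trends reported in the results.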

4.2.3. Principal Component Analysis (PCA)

Principal component analysis is an efficient way of compressing geophysical data in both space and time, as well as of separating noise from meaningful data. It allows fields of highly correlated data to be represented adequately by a small number of orthogonal functions and the corresponding orthogonal time coefficients, which account for much of the variance in their spatial and temporal variability. PCA techniques are used to extract from a covariance matrix robust structures that explain the largest variance of the original matrix and are at the same time uncorrelated. The original data are split into orthogonal spatial patterns (eigenvectors) and corresponding time series coefficients (principal components). An eigenvector pattern that accounts for a large fraction of the variance (eigenvalue) is considered to be physically meaningful. Von Storch and Zwiers [27] provide a lucid outline of the mathematical procedure necessary to define the functions and their time coefficients. The PCA method extracts the principal components (PCs) of patterns in a time series, each of which is orthogonal to the others. The first PC (PC1) is the most dominant pattern and explains most of the variance; PC2 is the second most dominant, followed by PC3, and so on. This characteristic of PCA was used in this study to assess the homogenization results. The Kaiser criterion of retaining factors with eigenvalues greater than or equal to one was used to determine the number of significant PCs [28].
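The eigenvalues, explained variance and accumulated variance used to assess the homogenization, together with the Kaiser criterion, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the PCA is run on the correlation matrix of the station series (so that the eigenvalues sum to the number of stations and the threshold of one is meaningful), and the input is synthetic data generated from a shared signal plus station noise. The function names, array shape and random data are hypothetical.

```python
import numpy as np

def pca_explained_variance(data):
    """PCA on the correlation matrix of `data` (rows: years, columns: stations).

    Returns the eigenvalues in descending order and the fraction of the
    total variance explained by each principal component.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    explained = eigenvalues / eigenvalues.sum()
    return eigenvalues, explained

def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components retained under the Kaiser criterion."""
    return int(np.sum(eigenvalues >= threshold))

# Synthetic example: 39 years, 6 stations sharing one regional signal.
rng = np.random.default_rng(42)
n_years, n_stations = 39, 6
shared = rng.normal(size=(n_years, 1))
data = 25.0 + 0.5 * shared + 0.3 * rng.normal(size=(n_years, n_stations))

eigenvalues, explained = pca_explained_variance(data)
cumulative = np.cumsum(explained)
n_retained = kaiser_count(eigenvalues)
```

Comparing `explained` for the NH and homogenized versions of the same network shows whether homogenization concentrates more variance in the leading components.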

5. Results and Discussion

The PCA results for the non-homogenized and homogenized datasets suggest the following: 1) the variance explained by the principal components is higher for the homogenized than for the non-homogenized datasets, for both maximum and minimum temperature; 2) the variance explained by the principal components of the homogenized datasets is higher for TX (62%) than for TN (53%); 3) the variance explained by principal components 2 to 5 of the non-homogenized datasets tends to be higher for TN than for TX (Tables 2-5).

The temporal location and size of the breaks detected in TN and TX are given in Table 6. More breaks were detected in TX (12 breaks) than in TN (5 breaks).

Correlation matrices were computed between the non-homogenized time series of TX and TN, and between the homogenized time series of TX and TN. The Spearman Correlation Coefficient (SCC) values obtained for the homogenized time series are higher than those obtained for the non-homogenized time series. In general, the SCC values between the homogenized TX and TN time series are higher than those between the non-homogenized time series.

The SCC values between the reference series and the non-homogenized and homogenized time series (Figures 4 and 5) reveal: 1) higher SCC between the annual reference series and the homogenized time series than between the annual reference series and the non-homogenized time series, at most stations and for both maximum and minimum temperature; 2) for TX, higher SCC values between the reference series and the homogenized time series, except at 2 weather stations (Tabora and Mlingano) where the values are lower; 3) for TN, higher SCC values between the reference series and the homogenized time series, except at 4 weather stations (Songea, Moshi, Kibaha and Igeri) where the values are lower.

Results from the non-parametric Mann-Kendall test for trend reveal that the number of significant trends is larger in the homogenized than in the non-homogenized datasets. This is more evident for TX than for TN: the share of significant trends in the homogenized series is 96% for maximum and 67% for minimum air temperature.

Table 2. Eigenvalues, variance and accumulated variance extracted by non-homogenized maximum air temperature datasets.

Table 3. Eigenvalues, variance and accumulated variance extracted by homogenized maximum air temperature datasets.

Table 4. Eigenvalues, variance and accumulated variance extracted by non-homogenized minimum air temperature datasets.

Table 5. Eigenvalues, variance and accumulated variance extracted by homogenized minimum air temperature datasets.

Table 6. Number of breaks in maximum and minimum air temperature.

Figure 4. Spearman correlation coefficient (SCC) between annual reference series and time series of non-homogenized (NH) and homogenized with HOMER (HH) of maximum air temperature for the period 1974-2012.

Figure 5. Spearman correlation coefficient (SCC) between annual reference series and time series of non-homogenized (NH) and homogenized with HOMER (HH) of minimum air temperature for the period 1974-2012.

6. Conclusion and Recommendations

The TMA meteorological station network analysed here includes stations located on the islands of Zanzibar and Pemba, stations near the coastline, and inland stations at high altitudes such as the Kilimanjaro airport station. Furthermore, these weather stations are located in different climatic zones, which may reduce the quality of the homogenization. The results indicate that the number of significant trends is larger in the homogenized than in the non-homogenized datasets. More breaks were identified in TX than in TN.

Correlation, trend and principal component analyses were used to assess the performance of the homogenization by comparing the results obtained with the non-homogenized and homogenized datasets. The SCC values between the reference series and the homogenized time series are higher than those between the reference series and the non-homogenized time series. Trend analysis of the TX and TN time series reveals an increase in the number of statistically significant trends in homogenized TX (96%) and TN (67%) relative to the non-homogenized time series.

The PCA results reveal that homogenization increases the similarity in the spatial and temporal variability of TX and TN. This behaviour is more evident for TX than for TN. In this study, the variance explained by the PCA is higher for the homogenized than for the non-homogenized datasets, independently of the different climates in the region. To the best of our knowledge, this study is the first effort to homogenize climate datasets in the URT using HOMER. Finally, the results show that the homogenized datasets are more reliable long-term datasets than the non-homogenized ones.

Acknowledgements

The authors would like to acknowledge the World Meteorological Organization (WMO) and Meteo France for funding support for the training course on climatology foundation for climate services, where training on the homogenization of climate datasets using the HOMER software package was given. The authors are grateful to the Tanzania Meteorological Agency for providing the data used in this study. The authors are also grateful to Luis Pedro Mendes Freitas of the Centre for Research and Technology of Agro-Environment and Biological Sciences (CITAB), University of Trás-os-Montes and Alto Douro, Apartado 1013, 5001-801 Vila Real, Portugal, for technical support on the development of the reference series, and to Miss Brigitte Dubuisson of Meteo-France, Direction de la Production, for technical support on homogenization procedures.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] V. Venema, O. Mestre, E. Aguilar, I. Auer, J.A. Guijarro, P. Domonkos, G. Vertacnik, T. Szentimrey, P. Stepanek, P. Zahradnicek, J. Viarre, G. Müller-Westermeier, M. Lakatos, C. N. Williams, M. J. Menne, R. Lindau, D. Rasol, E. Rustemeier, K. Kolokythas, T. Marinova, L. Andresen, F. Acquaotta, S. Fratianni, S. Cheval, M. Klancar, M. Brunetti, Ch. Gruber, M. Prohom Duran, T. Likso, P. Esteban and Th. Brandsma, “Benchmarking Homogenization Algorithms for Monthly Data,” Climate of the Past, Vol. 8, 2012, pp. 89-115. http://dx.doi.org/10.5194/cp-8-89-2012
[2] L. Freitas, M. G. Pereira, L. Caramelo, M. Mendes and L. F. Nunes, “Homogeneity of Monthly Air Temperature in Portugal with HOMER and MASH,” Quarterly Journal of the Hungarian Meteorological Service, Vol. 117, No. 1, 2012, pp. 69-90.
[3] H. Tuomenvirta, “Homogeneity Testing and Adjustment of Climatic Time Series in Finland,” Geophysica, Vol. 38, No. 1-2, 2002, pp. 15-41.
[4] L.-J. Cao and Z.-W. Yan, “Progress in Research on Homogenization of Climate Data,” Advances in Climate Change Research, Vol. 3, No. 2, 2012, pp. 59-67. http://dx.doi.org/10.3724/SP.J.1248.2012.00059
[5] T. C. Peterson, D. R. Easterling, T. R. Karl, P. Groisman, N. Nicholls, N. Plummer, S. Torok, I. Auer, R. Boehm, D. Gullett, L. Vincent, R. Heino, H. Tuomenvirta, O. Mestre, T. Szentimrey, J. Salinger, E. J. Forland, I. Hanssen-Bauer, H. Alexandersson, P. Jones and D. Parker, “Homogeneity Adjustments of in Situ Atmospheric Climate Data: A Review,” International Journal of Climatology, Vol. 18, No. 13, 1998, pp. 1493-1517. http://dx.doi.org/10.1002/(SICI)1097-0088(19981115)18:13<1493::AID-JOC329>3.0.CO;2-T
[6] O. Mestre, P. Domonkos, F. Picard, I. Auer, S. Robin, E. Lebarbier, R. Bohm, E. Aguilar, J. Guijarro, G. Vertachnik, M. Klancar, B. Dubuisson and P. Stepanek, “HOMER: A Homogenization Software—Methods and Applications,” Idojaras, Quarterly journal of the Hungarian Meteorological Service, Vol. 117, No. 1, 2013, pp. 47-67.
[7] H. Alexandersson, “A Homogeneity Test Applied to Precipitation Data,” International Journal of Climatology, Vol. 6, No. 6, 1986, pp. 661-675.
[8] P. D. Jones, S. C. B. Raper, R. S. Bradley, H. F. Diaz, P. M. Kelly and T. M. L. Wigley, “Northern Hemisphere Surface Air Temperature Variations: 1851-1984,” Journal of Climate and Applied Meteorology, Vol. 25, No. 2, 1986, pp. 161-179. http://dx.doi.org/10.1175/1520-0450(1986)025<0161:NHSATV>2.0.CO;2
[9] D. W. Gullet, L. Vincent and P. J. F. Sajecki, “Testing for Homogeneity in Temperature Time Series at Canadian Climate Stations,” CCC Report No. 90-4, Atmospheric Environment Service, Downsview, 1990.
[10] D. R. Easterling and T. C. Peterson, “A New Method for Detecting and Adjusting for Undocumented Discontinuities in Climatological Time Series,” International Journal of Climatology, Vol. 15, No. 4, 1995, pp. 369-377. http://dx.doi.org/10.1002/joc.3370150403
[11] L. Vincent, “A Technique for the Identification of Inhomogeneities in Canadian Temperature Series,” Journal of Climate, Vol. 11, No. 5, 1998, pp. 1094-1104.
[12] L. Perreault, J. Bernier, B. Bobée and E. Parent, “Bayesian Change-Point Analysis in Hydrometeorological Time Series. Part. 1. The Normal Model Revisited,” Journal of Hydrology, Vol. 235, No. 3-4, 2000, pp. 221-241. http://dx.doi.org/10.1016/S0022-1694(00)00270-5
[13] A. Toreti, F. G. Kuglitsch, E. Xoplaki and J. Luterbacher, “A Novel Approach for the Detection of Inhomogeneities Affecting Climate Time Series,” Journal of Applied Meteorology and Climatology, Vol. 51, No. 2, 2012, pp. 317-326. http://dx.doi.org/10.1175/JAMC-D-10-05033.1
[14] M. J. Menne and C. N. Williams Jr., “Detection of Undocumented Changepoints Using Multiple Test Statistics and Composite Reference Series,” Journal of Climate, Vol. 18, No. 20, 2005, pp. 4271-4286. http://dx.doi.org/10.1175/JCLI3524.1
[15] J.-F. Ducre-Robitaille, L. A. Vincent and G. Boulet, “Comparison of Techniques for Detection of Discontinuities in Temperature Series,” International Journal of Climatology, Vol. 23, No. 9, 2003, pp. 1087-1101. http://dx.doi.org/10.1002/joc.924
[16] J. Reeves, J. Chen, X. L. Wang, R. Lund and Q. Lu, “A Review and Comparison of Changepoint Detection Techniques for Climate Data,” Journal of Applied Meteorology and Climatology, Vol. 46, No. 6, 2007, pp. 900-915. http://dx.doi.org/10.1175/JAM2493.1
[17] A. C. Costa and A. Soares, “Homogenization of Climate Data: Review and New Perspectives Using Geostatistics,” Mathematical Geosciences, Vol. 41, No. 3, 2008, pp. 291-305. http://dx.doi.org/10.1007/s11004-008-9203-3
[18] HOME, “Homepage of the COST Action ES0601—Advances in Homogenisation Methods of Climate Series: An Integrated Approach (HOME),” 2011. http://www.homogenisation.org
[19] E. Black, J. Slingo and K. R. Sperber, “An Observational Study of the Relationship between Excessively Strong Short Rains in Coastal East Africa and Indian Ocean SST,” Monthly Weather Review, Vol. 131, No. 1, 2003, pp. 74-94. http://dx.doi.org/10.1175/1520-0493(2003)131<0074:AOSOTR>2.0.CO;2
[20] E. Black, “The Relationship between Indian Ocean SeaTemperature and East African Rainfall,” Philosophical Transactions of the Royal Society B, Vol. 1826, No. 1, 2005, pp. 43-47.
[21] R. O. Anyah and F. H. M. Semazzi, “Variability of East African Rainfall Based on Multiyear RegCM3 Simulations,” International Journal of Climatology, Vol. 27, No. 3, 2007, pp. 357-371. http://dx.doi.org/10.1002/joc.1401
[22] K. W. Potter, “Illustration of a New Test for Detecting a Shift in Mean in Precipitation Series,” Monthly Weather Review, Vol. 109, No. 9, 1981, pp. 2040-2045. http://dx.doi.org/10.1175/1520-0493(1981)109<2040:IOANTF>2.0.CO;2
[23] H. Alexandersson, “A Homogeneity Test Applied to Precipitation Data,” Journal of Climatology, Vol. 6, No. 6, 1986, pp. 661-675. http://dx.doi.org/10.1002/joc.3370060607
[24] H. B. Mann, “Non-Parametric Test against Trend,” Econometrica, Vol. 13, No. 3, 1945, pp. 245-259. http://dx.doi.org/10.2307/1907187
[25] R. Sneyers, “On the Statistical Analysis of Series of Observations,” WMO Tech. Note 143, 145, Geneva, 1990.
[26] F. S. Rodrigo and R. M. Trigo, “Trends in Daily Rainfall in the Iberian Peninsula from 1951 to 2002,” International Journal of Climatology, Vol. 27, No. 4, 2007, pp. 513-529. http://dx.doi.org/10.1002/joc.1409
[27] H. von Storch and F. Zwiers, “Statistical Analysis in Climate Research,” Cambridge University Press, 1999, 484 p.
[28] H. F. Kaiser, “Computer Program for Varimax Rotation in Factor Analysis,” Educational and Psychological Measurement, Vol. 19, No. 3, 1959, pp. 413-420. http://dx.doi.org/10.1177/001316445901900314

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.