Predicting Precipitation Events Using Gaussian Mixture Model

H. T. Ling, K. P. Zhu
Journal of Data Analysis and Information Processing. DOI: 10.4236/jdaip.2017.54010

In this paper, a Gaussian mixture model (GMM) based classifier is described that predicts, from historical meteorological data, whether a precipitation event will happen at a certain time on a certain day. The classifier deals with a two-class classification problem in which one class represents precipitation events and the other represents non-precipitation events. The concept of ambiguity is introduced to represent cases where weather conditions between the two classes, such as drizzle, intermittent precipitation or overcast skies, are more likely. Six groups of experiments are carried out to evaluate the performance of the classifier under different configurations, using the observation data released by the Shanghai Baoshan weather station. Typically, a classification performance of about 75% accuracy, 30% precision and 80% recall is achieved for prediction tasks with a time span of 12 hours.


Introduction
Predicting precipitation events, as a part of weather prediction, is often done by numerical weather prediction, which forecasts future weather conditions with the help of partial differential equations. Various attempts have been made to apply machine learning methods to weather prediction, but usually with methods other than the Gaussian mixture model. The earliest attempts to apply machine learning to precipitation prediction used perceptrons; more recent research is often based on artificial neural networks [1] and support vector machines [2]. A detailed review is available in [2]. The Gaussian mixture model is a simple but effective model for classification and clustering compared with other classification models, and it has many successful applications in areas such as computer vision and digital signal processing. For example, some recent research uses Gaussian mixture models for object tracking and segmentation [3] [4]. In this paper, we attempt to predict precipitation events using a Gaussian mixture model (GMM). Six groups of experiments are carried out to evaluate the performance of the classifier based on the observation data. Furthermore, instead of predicting precipitation amounts, we consider only a single class of precipitation events regardless of how much precipitation is observed. These are the main ways in which this paper differs from other research. The rest of this paper is organized as follows. In Section 2, we briefly describe the Gaussian mixture model, the expectation-maximization (EM) algorithm and our classifier. In Section 3, details of implementing the model are discussed.
In the last section, the experimental results are given and an analysis of the results is also presented.

Gaussian Mixture Model
Given an n-dimensional vector x, a Gaussian mixture probability density function can be written as follows,

    p(x | λ) = Σ_{i=1}^{m} w_i g(x | μ_i, Σ_i),

where m represents the number of mixture components and the mixture weights w_i satisfy Σ_{i=1}^{m} w_i = 1. Each g(x | μ_i, Σ_i) is the density function of a Gaussian distribution parameterized by an n × 1 mean vector μ_i and an n × n covariance matrix Σ_i. The component densities can be written as follows,

    g(x | μ_i, Σ_i) = (2π)^{-n/2} |Σ_i|^{-1/2} exp(-(1/2) (x - μ_i)^T Σ_i^{-1} (x - μ_i)).

Given the value of m, the values of the 3m parameters λ = {w_i, μ_i, Σ_i}, i = 1, …, m, can then be determined; the EM algorithm is used to estimate these parameters. For a classifier with K classes, a GMM is trained for each class.
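As a concrete illustration, the mixture density above can be evaluated directly with NumPy. This is a minimal sketch, not the authors' implementation; the two-component parameters below are made-up values chosen only for demonstration:

```python
import numpy as np

def gaussian_density(x, mu, cov):
    """Component density g(x | mu, Sigma) of an n-variate Gaussian."""
    n = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def gmm_density(x, weights, means, covs):
    """Mixture density p(x | lambda) = sum_i w_i g(x | mu_i, Sigma_i)."""
    return sum(w * gaussian_density(x, mu, cov)
               for w, mu, cov in zip(weights, means, covs))

# Toy 2-component model in 2 dimensions (illustrative parameters only).
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]

p = gmm_density(np.array([0.0, 0.0]), weights, means, covs)
```

At a point near the first component's mean, the density is dominated by that component's weighted term.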

EM Algorithm
In this paper, the parameters of the GMMs are estimated using the expectation-maximization (EM) algorithm, an algorithm to find maximum likelihood estimates of unknown parameters. For a data set with g feature vectors {x_1, …, x_g}, the likelihood function of a GMM can be written as follows,

    L(λ) = Π_{t=1}^{g} p(x_t | λ).

A detailed description of the EM algorithm can be found in [5].
Since EM typically converges to a local optimum and involves random initialization, the estimated parameters may sometimes result in poor model performance.To solve this problem, a workaround is proposed as described later.
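The local-optimum issue can be seen in a small EM implementation for a one-dimensional mixture. This sketch is our own illustration, not the authors' code; it shows one common mitigation, running EM from several random initializations and keeping the run with the highest likelihood:

```python
import numpy as np

def em_gmm_1d(x, m, n_iter=50, seed=0):
    """One EM run for a 1-D Gaussian mixture with m components."""
    rng = np.random.default_rng(seed)
    w = np.full(m, 1.0 / m)                    # uniform initial weights
    mu = rng.choice(x, size=m, replace=False)  # random initial means
    var = np.full(m, np.var(x))                # broad initial variances
    for _ in range(n_iter):
        # E-step: responsibilities r[t, i] proportional to w_i N(x_t | mu_i, var_i)
        dens = (w / np.sqrt(2 * np.pi * var)
                * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    loglik = np.log(dens.sum(axis=1)).sum()    # log-likelihood at the last E-step
    return loglik, w, mu, var

# Synthetic data: two well-separated clusters around 0 and 5.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])

# Mitigation for local optima: several restarts, keep the best likelihood.
best_loglik, best_w, best_mu, best_var = max(
    (em_gmm_1d(x, m=2, seed=s) for s in range(5)), key=lambda t: t[0])
```

With well-separated clusters, the best of the five restarts recovers means close to the true cluster centers.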

Precipitation Events Classifier
For a classifier with K classes λ_k, k = 1, 2, …, K, a feature vector x is assigned to the class with the greatest posterior probability. That is, assign x to class

    k̂ = argmax_{1 ≤ k ≤ K} p(λ_k | x).

Using Bayes' theorem, this can also be written as

    k̂ = argmax_{1 ≤ k ≤ K} p(x | λ_k) p(λ_k),

where p(λ_k) stands for the prior probability of class λ_k. Here λ_1 is used to denote the class of precipitation events, and λ_2 is used to denote the class of non-precipitation events. The precipitation events classifier deals with a two-class classification problem. In this paper, we let p(λ_1) = p(λ_2); thus a vector x is assigned to the class with the greater Gaussian mixture density value. That is, the classifier reports precipitation events if

    p(x | λ_1) > p(x | λ_2)

and reports non-precipitation events otherwise. In practice, these values are computed and compared in their log form, so the above inequality is evaluated as follows,

    log p(x | λ_1) − log p(x | λ_2) > 0.

For the precipitation events prediction problem, feature vectors of different classes can appear very close to each other in terms of distance. In such cases, the prediction results are often inaccurate. For this reason, the prediction results are flagged as ambiguous if

    | log p(x | λ_1) − log p(x | λ_2) | < log 2,

which is the same as

    1/2 < p(x | λ_1) / p(x | λ_2) < 2.

When classified as ambiguous, the prediction results are considered close to cases where the weather conditions lie between the two classes, such as drizzle, intermittent precipitation or overcast skies. However, the authenticity of this claim is not tested, since doing so would turn the task into a multiclass classification problem. When evaluating the classification performance, data points flagged as ambiguous are not involved in the evaluation process.
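The decision rule with the ambiguity flag can be sketched as follows, assuming the log densities log p(x | λ_1) and log p(x | λ_2) have already been computed from the two trained GMMs (the function and variable names are our own):

```python
import math

LOG_AMBIGUITY_THRESHOLD = math.log(2)  # the paper's ambiguity threshold, log 2

def classify(log_p_rain, log_p_dry):
    """Two-class decision with an ambiguity flag.

    log_p_rain and log_p_dry are the log GMM densities log p(x | lambda_1)
    and log p(x | lambda_2) for the precipitation and non-precipitation
    models respectively.
    """
    if abs(log_p_rain - log_p_dry) < LOG_AMBIGUITY_THRESHOLD:
        return "ambiguous"   # 1/2 < p1/p2 < 2: too close to call
    return "precipitation" if log_p_rain > log_p_dry else "no precipitation"
```

For instance, `classify(-10.0, -10.5)` is flagged ambiguous because the two densities differ by less than a factor of 2.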

Data Acquisition and Feature Extraction
In this paper, the meteorological data of Shanghai, China is used for the experiments. The data are obtained from the Shanghai Baoshan weather station, station ID 58362 (historical data obtained from http://www.meteomanz.com/). The station issues observation data 8 times each day, at a fixed interval of 3 hours.
We have chosen temperature, relative humidity, sea level pressure, wind direction, wind speed, total cloud cover and precipitation as features. Thus, a set of 7 × 1 feature vectors can be obtained after feature extraction. Some fields of the observation data are omitted; this is done to avoid having to cope with too much missing data. Specifically, when the wind speed is equal to 0, we let the wind direction be 0. When converting the original data to feature vectors, a normalization process is applied to ensure that all components of the feature vectors have a lower bound of 0 and an upper bound of 100. This is done simply by linear transformations. All the features used by our model are listed in Table 1.
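The normalization step can be sketched as a simple linear (min-max) transformation. The raw bounds below are illustrative assumptions, since the paper does not state the bounds used for each feature:

```python
def normalize_column(values, lo, hi):
    """Linearly map raw values from the range [lo, hi] to [0, 100]."""
    return [100.0 * (v - lo) / (hi - lo) for v in values]

# Hypothetical raw bounds for temperature in degrees Celsius.
temps_c = [-5.0, 10.0, 35.0]
scaled = normalize_column(temps_c, lo=-20.0, hi=40.0)
```

Each feature column would be scaled with its own bounds, so that every component of a feature vector lies in [0, 100].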

Preprocessing
Since the observation data are given in the SYNOP format (FM-12), all possible weather conditions in the observation data are known (see http://weather.unisys.com/wxp/Appendices/Formats/SYNOP.html for details). These weather conditions are divided into the two classes, and the corresponding feature vectors are labeled accordingly for training. Specifically, fog, mist, haze and overcast are considered non-precipitation events, while intermittent precipitation, drizzle and snow are considered precipitation events.
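The class assignment can be sketched as a lookup table. The condition names mirror those listed above; note that real SYNOP reports encode present weather as numeric "ww" codes, which would be decoded to condition names first:

```python
# Hypothetical mapping from decoded weather conditions to the two classes.
NON_PRECIPITATION = {"fog", "mist", "haze", "overcast"}
PRECIPITATION = {"intermittent precipitation", "drizzle", "snow"}

def label(condition):
    """Return 1 for precipitation events, 0 for non-precipitation, None otherwise."""
    if condition in PRECIPITATION:
        return 1
    if condition in NON_PRECIPITATION:
        return 0
    return None  # unknown condition: leave unlabeled
```

Unlabeled rows would simply be excluded from the training and test sets.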
Even though the fields of the observation data that contain too much missing data are omitted, there are still cases where data are absent due to difficulty of observation and similar causes. In such cases, the affected data rows are simply removed, since removing them has no adverse effect on training or testing the classifier. This step can cause a data loss of about 60%.
When training the GMMs, diagonal covariance matrices are used instead of full covariance matrices. This is done because it has been found that doing so not only makes the model perform better in practice but also significantly reduces the computation needed, since matrix inversion is computationally intensive [6].
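With a diagonal covariance, the Gaussian log density reduces to elementwise operations, which is where the computational saving comes from. The sketch below (our own illustration, not the authors' code) checks the O(n) diagonal formula against a reference implementation that uses an explicit matrix inverse:

```python
import numpy as np

def diag_gaussian_logpdf(x, mu, var):
    """Log density of N(mu, diag(var)) using only elementwise operations,
    avoiding the matrix inversion a full covariance would need."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def full_gaussian_logpdf(x, mu, cov):
    """Reference implementation with an explicit matrix inverse."""
    n = len(mu)
    diff = x - mu
    return -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(cov))
                   + diff @ np.linalg.inv(cov) @ diff)

x = np.array([1.0, 2.0])
mu = np.zeros(2)
var = np.array([1.0, 4.0])
# Both give the same value when the covariance matrix is diagonal.
```

The diagonal version also stores only n parameters per component instead of n(n+1)/2.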

Performance Evaluation
For our classification model, we refer to the class of precipitation events as the positive class and the class of non-precipitation events as the negative class. Accordingly, actual positive data points classified as positive are counted as true positives (TP), and as false negatives (FN) if classified as negative. Similar definitions hold for true negatives (TN) and false positives (FP). To evaluate the performance of the classifier, a definition of classification accuracy is introduced. Instead of defining classification accuracy as the ratio of correctly classified samples to all samples in the data set, we define classification accuracy as follows,

    accuracy = (1/2) (TP / (TP + FN) + TN / (TN + FP)).

Classification accuracy is defined this way because precipitation events happen less often than non-precipitation events; precipitation events typically make up only about 10% of all data, which would cause FN and TP to have little effect on the usual definition of classification accuracy. Precision and recall are also used as key metrics of classification performance, defined as follows,

    precision = TP / (TP + FP),    recall = TP / (TP + FN).

Since EM typically converges to a local optimum and involves random initialization, a single test is not enough to assess model performance. Thus, the classification accuracy, precision and recall are averaged over 10 trials, and the averages are used as the metrics for performance evaluation.
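The evaluation procedure can be sketched as follows. The confusion counts are made-up numbers for illustration, and the accuracy computed here is the average of the two per-class rates, consistent with the paper's motivation of reducing class-imbalance effects (an assumption on our part about the exact formula):

```python
def metrics(tp, fn, tn, fp):
    """Class-balanced accuracy, precision and recall from confusion counts."""
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    accuracy = 0.5 * (recall + specificity)
    precision = tp / (tp + fp)
    return accuracy, precision, recall

# Average the metrics over several trials (the paper uses 10), since EM's
# random initialization makes any single run unrepresentative.
trials = [(80, 20, 700, 200), (75, 25, 720, 180)]  # made-up (TP, FN, TN, FP)
avgs = [sum(vals) / len(vals) for vals in zip(*(metrics(*t) for t in trials))]
```

With a 10%-positive data set, the balanced accuracy weights the rare positive class equally with the dominant negative class.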

Experimental Results
In this section, the experimental results obtained from six groups of experiments carried out to evaluate the performance of the model with different configurations are described. To illustrate the effect of the amount of data, two data sets are used: one contains 3 years of historical data and the other contains 11 years. Years 2015 and 2016 are chosen as the source of the 3-year data set, and years 2006 to 2016 as the source of the 11-year data set. A subset of the whole data set is chosen as the training set and the remainder as the test set, with the numbers of data points in a ratio of about 2:1 between training set and test set. The 11-year data set is used for most of our experiments, apart from experiment 2, where the model performances with different data sets are compared. Similarly, GMMs with different numbers of mixture components are used, namely 16, 32, 64 and 128; 64 mixtures are used except in experiment 1, where the effect of the number of mixtures is assessed. A time span of 12 hours is used for predictions except in experiment 5.

In the first experiment, we compared GMMs with different numbers of mixture components and found, from the results shown in Figure 1, that these models have similar performance regardless of their number of mixtures. This could mean that a number of mixtures as small as 16 may already be enough to represent the distribution of the feature vectors when there is enough training data. We can also tell that the GMMs generalize well to the observation data, from the fact that there is little performance loss on the test data compared with the training data.

In the second experiment, the 3-year data set is used to train the GMMs and test their performance. A significant decrease in both accuracy and recall is observed in the experimental results shown in Table 2. This could be a clue that 2 years of training data may not be enough, as opposed to the roughly 7 years of training data drawn from the 11-year data set.

In the last experiment, we tried adding more information to the original feature vector by appending the feature vector of the observation data from 12 hours before the prediction is made. Doing so forms a new 14 × 1 feature vector that is effectively two combined 7 × 1 feature vectors. The test results, shown in Table 4, indicate a slightly negative effect on classification performance compared with the first experiment. The increase in the complexity of the feature vectors may have made it harder for the GMMs to model the relationship between features and classes.

In this paper, the same prior probabilities are chosen for both classes, and log 2 is chosen as the threshold for determining ambiguity.

Figure 1. Comparison of classification performance for different numbers of mixture components.

Table 1. All features used in feature vectors.

Table 2. Experimental results for experiment 2, comparison of data sets.

Concretely, a set of GMMs is trained solely on daytime observation data and another set on nighttime observation data in experiment 3. In experiment 4, a set of GMMs is trained solely on spring observation data and another set on summer observation data. In either case, model performance is tested against the corresponding test data. The test results suggest that separating the training data in this way has no obvious effect on model performance. In experiment 5, we measured how fast the predictive power of the model decreases with increasing prediction time span. The results are illustrated in Figure 2. Little predictive power is observed, particularly for the 48-hour time span model, which shows an extremely low precision and a classification accuracy of about 50%. This suggests that the GMM-based classifier may have little application value beyond a time span of 24 hours.

Table 3. Experimental results for experiments 3 and 4, separating day/night and spring/summer data.