
Considering recent developments in deep learning, it has become increasingly important to verify which methods are valid for the prediction of multivariate time-series data. In this study, we propose a novel method of time-series prediction that employs multiple deep learners combined with a Bayesian network, where the training data are divided into clusters using K-means clustering. The optimal number of clusters for K-means is determined with the Bayesian information criterion. One deep learner is trained for each cluster. We used three types of deep learners: deep neural network (DNN), recurrent neural network (RNN), and long short-term memory (LSTM). A naive Bayes classifier is used to determine which deep learner is in charge of predicting a particular time-series. The proposed method is applied to a set of financial time-series data, the Nikkei Stock Average, to assess the accuracy of the predictions made. Compared with the conventional method of training a single deep learner on all the data, we demonstrate that the proposed method improves the F-value and accuracy.

Deep learning has been developed to compensate for the shortcomings of previous neural networks [

However, although deep learning is considered to underpin artificial intelligence, the brain’s information processing mechanism is not fully understood, and it therefore remains possible to develop new learners by imitating what is known about that mechanism. One way to develop new learners is to use a Bayesian network [

In this research, we develop a new learner using multiple deep learners in combination with Bayesian networks as the selection method to choose the most suitable type of learner for each set of test data.

In time-series data prediction with deep learning, overly long calculation times are required for training. Moreover, a deep learner often fails to converge because of the randomness of the time-series data. How to employ a Bayesian network in this setting is a further issue. In this paper, we try to reduce the computation time and improve convergence by dividing the training data into clusters using the K-means method and training one deep learner on each cluster. We then resolve the ambiguity of which learner to use by employing a Bayesian network to select a suitable deep learner for each prediction task.

To demonstrate our model, we use a real-life application: predicting the Nikkei Average Stock price by taking into consideration the influence of multiple stock markets. Specifically, we estimate the Nikkei Stock Average of the current term based on the Nikkei Stock Average of the previous term as well as overseas major stock price indicators such as NY Dow and FTSE 100. We evaluate the validity of our proposed method based on the accuracy of the estimation results.

In this section, we introduce related work on multiple learners.

In ensemble learning, outputs from each learner are integrated by weighted averaging or a voting method [

Our proposed method, which will be described later, is based on the same notion as the bagging method used in ensemble learning, where training data are divided and independently learned. The differences between our proposed method and bagging lie in the division method and in the way multiple learners are integrated (the method of selecting a suitable learner for each set of test data) to improve the learners’ accuracies. Therefore, similar to Takahashi and Asada [

As we mentioned, because the information processing mechanism of the brain is not fully understood, it is possible to develop new learners by imitating the information processing mechanism of the brain. In this research, we hypothesize that the brain forms multiple learners in the initial stage of learning and improves the performance of each learner in subsequent learning while selecting a suitable learner.

To design learners based on this hypothesis, it is necessary to find ways of constructing multiple learners, selecting a suitable learner, and improving the accuracy of each learner by using feedback from a particular selected learner. Hence, we assume that multiple learners have the same structure. The learners are constructed by the clustering of input data. Selection of a suitable learner is conducted with a naive Bayes classifier that forms the simplest Bayesian network. Furthermore, after fixing learners, we construct a Bayesian network and predict outcomes without changing the Bayesian network’s construction. However, it is preferable to improve each learner’s performance and the Bayesian network by using feedback gained from the selected learners. This will form one of our future research topics.

In the next section, we propose a method of constructing a single, unified learner by using multiple deep learners. Moreover, in Section 3.2, we propose a method of selecting a suitable learner with a naive Bayes classifier.

In the analysis of time-series data with a deep learner, the prediction accuracy is uneven because the loss function does not converge for certain time-series data. It is commonly assumed that the learning of the weight parameters fails due to the non-stationary nature of the data. This problem often occurs when multiple time-series are used as training data. In addition, the long computational time required is also an issue.

To solve these problems, we think it is effective to apply a clustering method, such as K-means, SOM, or SVM, to the training data, create clusters, and construct a learner for each cluster. Dividing the training data into clusters and constructing a learner for each cluster enables us to extract better patterns and improves the convergence of the loss function compared with constructing a single learner from all the training data. This approach also reduces the computational time required. Moreover, a classifier for selecting a suitable learner is constructed from the clustering results of the training data. This classifier performs the task of associating test data with a suitable learner.
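This per-cluster construction can be sketched with scikit-learn; here logistic regression stands in for the deep learners, and all data, names, and sizes are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy training data: 500 days x 6 return features, binary labels.
X_train = rng.normal(size=(500, 6))
y_train = (X_train.sum(axis=1) > 0).astype(int)

k = 4  # number of clusters, assumed given here (chosen by X-means in the paper)
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)

# One learner per cluster, trained only on that cluster's training data.
learners = {}
for c in range(k):
    mask = km.labels_ == c
    # Guard: fall back to a constant predictor if a cluster holds one class only.
    clf = (LogisticRegression() if len(set(y_train[mask])) > 1
           else DummyClassifier(strategy="most_frequent"))
    learners[c] = clf.fit(X_train[mask], y_train[mask])

print(sorted(learners))  # one fitted learner per cluster id
```

At prediction time, a separate classifier (Section 3.2) decides which of these fitted learners handles a given test point.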

We divided the training data into k classes (C_{1}, ∙∙∙, C_{k}) and constructed k deep learners, one for each class.

The number of clusters must be decided in advance when we employ K-means. We determined the optimal number of clusters using the X-means algorithm, which selects the number of clusters best for K-means according to the Bayesian information criterion. The X-means algorithm was presented in Pelleg and Moore’s work [
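X-means itself is not part of scikit-learn; as a rough stand-in for its criterion, the Bayesian information criterion of a Gaussian mixture can be scanned over candidate cluster counts. A sketch under that assumption, on toy data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: three well-separated groups in 2-D.
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(100, 2))
               for m in (0.0, 3.0, 6.0)])

# Score each candidate number of clusters by the BIC; lower is better.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 8)}
best_k = min(bic, key=bic.get)
print(best_k)  # 3 for this toy data
```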

Next, we use three types of deep learners, namely, deep neural network (DNN), recurrent neural network (RNN), and long short-term memory (LSTM). They are all well-established deep learning methods. To identify which deep learner is most suitable for each test dataset, we use a naive Bayes classifier (the simplest type of Bayesian network), constructed from the clustering results of the training data.
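A minimal sketch of the three learner types in PyTorch; the layer sizes, window length, and two-class output are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

N_FEAT, HIDDEN, SEQ_LEN, N_CLASS = 6, 32, 10, 2  # sizes are illustrative

# DNN: feed-forward network on a single day's feature vector.
dnn = nn.Sequential(
    nn.Linear(N_FEAT, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, N_CLASS),
)

class SeqClassifier(nn.Module):
    """Classify a sequence from the recurrent cell's last hidden state."""
    def __init__(self, cell):
        super().__init__()
        self.cell = cell
        self.head = nn.Linear(HIDDEN, N_CLASS)

    def forward(self, x):            # x: (batch, seq_len, n_features)
        out, _ = self.cell(x)        # out: (batch, seq_len, hidden)
        return self.head(out[:, -1, :])

rnn = SeqClassifier(nn.RNN(N_FEAT, HIDDEN, batch_first=True))
lstm = SeqClassifier(nn.LSTM(N_FEAT, HIDDEN, batch_first=True))

x_day = torch.randn(4, N_FEAT)
x_seq = torch.randn(4, SEQ_LEN, N_FEAT)
print(dnn(x_day).shape, rnn(x_seq).shape, lstm(x_seq).shape)
```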

In this paper, we use a naive Bayes classifier to select a suitable deep learner for each set of test data. This method solves the classification problem using Bayes’ theorem. The method hypothesizes conditional independence between feature values and is the simplest type of Bayesian network.

Let $\boldsymbol{x} = (x_1, \cdots, x_d)$ be the feature vector of a data point and $y$ its class. By Bayes’ theorem, the posterior probability of class $y$ is

$$P(y \mid x_1, \cdots, x_d) = \frac{P(y)\, P(x_1, \cdots, x_d \mid y)}{P(x_1, \cdots, x_d)}$$

Furthermore, conditional probability $P(x_1, \cdots, x_d \mid y)$ is simplified by the naive hypothesis of conditional independence between the feature values:

$$P(x_1, \cdots, x_d \mid y) = \prod_{i=1}^{d} P(x_i \mid y)$$

Because $P(x_1, \cdots, x_d)$ does not depend on $y$, the posterior is proportional to $P(y) \prod_{i=1}^{d} P(x_i \mid y)$. In the case of prediction, the predicted class $\hat{y}$ is the class that maximizes this quantity:

$$\hat{y} = \mathop{\mathrm{arg\,max}}_{y} P(y) \prod_{i=1}^{d} P(x_i \mid y)$$

Let $X = \{\boldsymbol{x}^{(1)}, \cdots, \boldsymbol{x}^{(N)}\}$ be the training data and $Y = \{y^{(1)}, \cdots, y^{(N)}\}$ the corresponding correct classes, where the superscript $(n)$ denotes the $n$-th training example. We hypothesize that each training dataset is generated independently. The parameters of the classifier are then estimated by maximizing the likelihood

$$L = \prod_{n=1}^{N} P(y^{(n)}) \prod_{i=1}^{d} P\!\left(x_i^{(n)} \mid y^{(n)}\right)$$

Moreover, maximizing $L$ yields the estimate of the prior probability of each class,

$$\hat{P}(y = C_j) = \frac{N_j}{N}$$

where $N_j$ is the number of training examples belonging to class $C_j$. Let us assume that each class-conditional distribution is Gaussian, with mean $\mu_{ij}$ and variance $\sigma_{ij}^2$ estimated from the training examples of class $C_j$:

$$P(x_i \mid y = C_j) = \frac{1}{\sqrt{2\pi \sigma_{ij}^{2}}} \exp\!\left( -\frac{(x_i - \mu_{ij})^2}{2 \sigma_{ij}^{2}} \right)$$

The number of classes is defined as $k$, the number of clusters, so the class labels are the clusters $C_1, \cdots, C_k$ obtained by K-means.

Therefore, in the learning of a naive Bayes classifier, the class priors and the class-conditional distributions are estimated from the training data X and the correct data Y. In selecting a suitable deep learner for test data, we use the naive Bayes classifier that has already been trained: the predicted class y is the class with the largest posterior probability for the test data. The naive Bayes classifier thereby associates each test dataset with one of the k deep learners.
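The routing step can be sketched with scikit-learn's GaussianNB, trained on the K-means cluster labels (toy data; the deep learners themselves are omitted):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X_train = rng.normal(size=(400, 6))
X_test = rng.normal(size=(50, 6))

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)

# Gaussian naive Bayes learns the mapping: feature vector -> cluster label.
nb = GaussianNB().fit(X_train, km.labels_)

# Each test point is routed to the deep learner of its predicted cluster.
routes = nb.predict(X_test)
print(routes[:10])
```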

As a case study, we predicted the future return of the Nikkei Stock Average by applying six economic time-series datasets to our proposed method. From the previous day’s data, we predicted whether the next day’s return of the Nikkei Stock Average would be larger than its average return or not.

We predicted the financial time-series using the method proposed in the previous section. The time-series data used in this case study were the daily closing prices of the Nikkei Stock Average, New York Dow, NASDAQ, S&P 500, FTSE 100, and DAX from January 1, 2000, to December 31, 2014. The New York Dow, NASDAQ, and S&P 500 are U.S. stock indicators; the FTSE 100 is a U.K. stock indicator; and the DAX is a German stock indicator. These data were sourced from Yahoo Finance [

However, some dates do not have all six stock prices because holidays differ between countries. In such cases, we assumed that markets closed for a holiday remained unchanged and adopted the previous day’s stock prices. We defined the data from 2000 to 2013 as training data and the data from 2014 as test data. Because time-series data typically have strong non-stationary tendencies, they are difficult to handle in raw form. Thus, we transformed the stock prices into returns.

Let time-series data be $p_t$, the closing price on day $t$.

Return $r_t$ is defined as the relative change of the closing price:

$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}}$$

We conducted the Dickey-Fuller test to check the stationarity of the return series $r_t$.

We now present the experimental results of the prediction of financial time-series data. From today’s data, we predicted whether the next day’s return of the Nikkei Stock Average would be larger than the average return or not.
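The prediction target can be sketched as follows (toy returns; column 0 plays the role of the Nikkei Stock Average):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy daily returns of the six indicators (Nikkei in column 0).
returns = rng.normal(0.0003, 0.01, size=(1000, 6))

mean_nikkei = returns[:, 0].mean()
# Features: today's six returns.  Label: does TOMORROW's Nikkei return
# exceed the average Nikkei return?
X = returns[:-1]
y = (returns[1:, 0] > mean_nikkei).astype(int)
print(X.shape, y.shape)
```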

The computational times of the conventional single DNN, RNN, and LSTM were 3.235 × 10^{2} [s], 3.796 × 10^{3} [s], and 1.531 × 10^{3} [s], respectively.

Table 1. F-value and accuracy of DNN.

| | F-value | Accuracy |
|---|---|---|
| Conventional method | 0.6140 ± 0.00293 | 0.6577 ± 0.0148 |
| Proposed method | 0.5854 ± 0.01131 | 0.6877 ± 0.00411 |


Next, we present the results of our proposed method. In this experiment, we constructed multiple deep learners in accordance with our method. The construction of the multiple learners and the production of the predictions are as follows.

After we applied X-means to the training data and determined the optimum division number k, we constructed k clusters with K-means and k deep learners. We applied the test data to a naive Bayes classifier learned from the clustering results of the training data. With this naive Bayes classifier, we associated each test dataset with a suitable deep learner and predicted whether the following day’s return of the Nikkei Stock Average would be above the average or not. Five experiments were conducted in order to measure the F-value, accuracy, and computational time.

The F-values of multiple DNN, RNN, and LSTM were 58.54%, 72.40%, and 81.42%, respectively. The accuracies of multiple DNN, RNN, and LSTM were 68.77%, 72.62%, and 69.08%, respectively. In addition, the computational times of multiple DNN, RNN, and LSTM were 2.064 × 10^{2} [s], 2.077 × 10^{3} [s], and 1.533 × 10^{3} [s], respectively.

The results of each experiment are summarized in Tables 1-6. The top row of each table shows the result of the conventional method, and the bottom row shows that of the proposed method.

Table 2. Computational time of DNN.

| | Computational time [s] |
|---|---|
| Conventional method | 3.235 × 10^{2} ± 2.578 |
| Proposed method | 2.064 × 10^{2} ± 6.642 |

Table 3. F-value and accuracy of RNN.

| | F-value | Accuracy |
|---|---|---|
| Conventional method | 0.7155 ± 0.02712 | 0.7169 ± 0.02657 |
| Proposed method | 0.7240 ± 0.01284 | 0.7262 ± 0.01389 |

Table 4. Computational time of RNN.

| | Computational time [s] |
|---|---|
| Conventional method | 3.796 × 10^{3} ± 1.259 × 10^{1} |
| Proposed method | 2.077 × 10^{3} ± 3.591 × 10^{1} |

Table 5. F-value and accuracy of LSTM.

| | F-value | Accuracy |
|---|---|---|
| Conventional method | 0.6973 ± 0.02192 | 0.5369 ± 0.02462 |
| Proposed method | 0.8142 ± 0.000615 | 0.6908 ± 0.002107 |

Table 6. Computational time of LSTM.

| | Computational time [s] |
|---|---|
| Conventional method | 1.532 × 10^{3} ± 6.041 |
| Proposed method | 1.533 × 10^{3} ± 6.797 |

Moreover, we show the change in the error functions when our method was applied. The optimum division number derived from X-means varies depending on how the initial clusters in the X-means algorithm are chosen, although the behavior of the error functions is similar across runs.

As an example, we present graphs illustrating how the loss functions change. With the X-means algorithm, the optimum division number N was determined and the training data were divided into N classes, from C_{1} to C_{N}. The number of training and test data in each cluster for the three deep learners is as follows.

In our research, we hypothesized that the brain forms multiple learners at the initial stage of learning and improves the performance of each learner while selecting the most suitable learner in subsequent learning tasks. In this paper, we proposed a method of constructing multiple learners and a method of selecting a suitable learner for each dataset.

Our proposed method is as follows:

1) The optimum division number of clustering is determined using X-means.

2) Training data is divided using K-means and multiple learners for each cluster constructed with DNN, RNN, and LSTM.

3) A naive Bayes classifier is constructed by the clustering result of training data.

4) A suitable deep learner for each test dataset is selected with the constructed naive Bayes classifier.

5) Prediction is conducted by the selected learner.
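The five steps above can be sketched end to end, with logistic regression standing in for the deep learners and a Gaussian-mixture BIC scan standing in for X-means (all data and sizes are toy assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 6))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(80, 6))

# 1) optimum division number via a BIC scan (stand-in for X-means)
k = min(range(1, 8),
        key=lambda n: GaussianMixture(n_components=n,
                                      random_state=0).fit(X_train).bic(X_train))

# 2) K-means clusters and one (stand-in) learner per cluster
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
learners = {}
for c in range(k):
    m = km.labels_ == c
    clf = (LogisticRegression() if len(set(y_train[m])) > 1
           else DummyClassifier(strategy="most_frequent"))
    learners[c] = clf.fit(X_train[m], y_train[m])

# 3) naive Bayes classifier from the clustering result
nb = GaussianNB().fit(X_train, km.labels_)

# 4) + 5) route each test point to its learner and predict
y_pred = np.array([learners[c].predict(x.reshape(1, -1))[0]
                   for c, x in zip(nb.predict(X_test), X_test)])
print(y_pred.shape)
```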

Predictive experiments on financial time-series data of six stock indicators were performed using the proposed method. Our experiments suggest that when multiple learners are used, most loss functions decrease compared with the case where all the data are learned by a single learner.

| Cluster | Training | Test |
|---|---|---|
| C_{1} | 37 | 2 |
| C_{2} | 2069 | 184 |
| C_{3} | 74 | 2 |
| C_{4} | 71 | 2 |
| C_{5} | 131 | 7 |
| C_{6} | 11 | 0 |
| C_{7} | 234 | 3 |
| C_{8} | 427 | 20 |
| C_{9} | 598 | 40 |

| Cluster | Training | Test |
|---|---|---|
| C_{1} | 28 | 0 |
| C_{2} | 2129 | 188 |
| C_{3} | 14 | 0 |
| C_{4} | 138 | 8 |
| C_{5} | 44 | 0 |
| C_{6} | 8 | 0 |
| C_{7} | 96 | 2 |
| C_{8} | 403 | 18 |
| C_{9} | 191 | 4 |
| C_{10} | 568 | 38 |
| C_{11} | 4 | 2 |
| C_{12} | 5 | 0 |
| C_{13} | 6 | 0 |
| C_{14} | 3 | 0 |
| C_{15} | 15 | 0 |

| Cluster | Training | Test |
|---|---|---|
| C_{1} | 2129 | 187 |
| C_{2} | 37 | 2 |
| C_{3} | 84 | 2 |
| C_{4} | 74 | 2 |
| C_{5} | 133 | 7 |
| C_{6} | 568 | 39 |
| C_{7} | 224 | 3 |
| C_{8} | 403 | 18 |

In the case of using multiple LSTM, the F-value improved greatly compared with using multiple DNN and RNN. Conversely, the accuracy in the case of using multiple LSTM was a little higher than that of multiple DNN but a little lower than that of multiple RNN. Furthermore, when LSTM was used as the multiple learners, the computational time was shorter than when RNN was used.

These results indicate that our proposed method enables us to deal with the non-stationary nature of time-series data and extract more accurate patterns.

We suppose that LSTM is especially effective in the prediction of time-series data that have remarkable features. In our proposed method, the division of time-series data by K-means clustering corresponds to extracting the remarkable features of such data. We believe that it is possible to improve the proposed method further by determining more suitable parameters for deep learners according to each cluster.

We proposed a new method of constructing multiple deep learners and of determining, with a naive Bayes classifier, which deep learner is in charge of each test dataset. Experiments suggested that when multiple learners were used, the loss functions showed a decreasing trend compared with the case where all the data were learned by a single learner. As a result, the F-values and accuracy of our method are better than those of the conventional method. Moreover, our proposed method also shortens the computational time required.

Concerning this research topic, the future issues under consideration are as follows:

First, the validity of the method of assigning test data will be considered. In this paper, we used a naive Bayes classifier to assign test data to a suitable learner. However, it is also possible to use the K-means method or SVM instead of the naive Bayes classifier for this assignment. It is necessary to compare the experimental results of our method with those obtained using K-means or SVM.
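Swapping the router is mechanical in scikit-learn; a sketch comparing GaussianNB with an SVM (SVC) on the same toy clustering task:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 6))
X_test = rng.normal(size=(40, 6))

# Cluster labels play the role of "which learner" each point belongs to.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train).labels_

nb_routes = GaussianNB().fit(X_train, labels).predict(X_test)
svm_routes = SVC().fit(X_train, labels).predict(X_test)

agreement = float((nb_routes == svm_routes).mean())
print(agreement)  # fraction of test points both routers assign identically
```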

Second, improving each learner and the Bayesian network itself by using feedback from the selected learner will be considered. In this paper, after fixing the multiple learners, we constructed a Bayesian network and performed predictive experiments without changing its construction. However, considering the information processing mechanism of the human brain, it is preferable to feed the prediction results back to the learners and to the Bayesian network.

Third, case studies will be conducted using the proposed method with different data. In this paper, we applied financial time-series data to our method. Depending on the data, the most suitable deep learner and the best method of assigning test data to the multiple learners may change. In our experiments, we decided the learner type and the assignment method in advance. However, a future development would be to construct a framework that mechanically determines which model gives the best predictions for the data provided.

Kobayashi, S. and Shirayama, S. (2017) Time Series Forecasting with Multiple Deep Learners: Selection from a Bayesian Network. Journal of Data Analysis and Information Processing, 5, 115-130. https://doi.org/10.4236/jdaip.2017.53009