Possibility of Short-Term Forecasting of Japanese Stock Returns by Randomly Distributed Embedding Theory

In this work, we use a model-free framework called randomly distributed embedding (RDE) to predict future returns of Japanese stocks. RDE repeatedly selects random subsets of variables from many variables observed at the same time and uses them to estimate the state of the attractor at that time. We show that its prediction accuracy is higher than that of conventional methods such as simple linear regression and least absolute shrinkage and selection operator (LASSO) regression. We also discuss important points to consider when applying RDE to financial markets and present possible practical applications.


Introduction
In portfolio management, accurately predicting the returns of the stocks to be traded is an important issue. However, prediction is difficult because financial data have a very low signal-to-noise ratio, the relationships among the data are complicated and intertwined, and it is hard to obtain a sufficient number of samples along the time axis.
On the other hand, compared with other financial assets, the stock market has the characteristic that the number of stocks is very large and all of them can be observed simultaneously, even though the amount of data along the time axis is limited. For this reason, the randomly distributed embedding method (RDE) [1] is expected to be well suited to return prediction in the stock market. RDE, proposed in October 2018, is a mathematical framework for predicting future changes in important target variables with high accuracy from short time series consisting of simultaneous measurements of many variables.
In this work, we evaluate the effectiveness of the randomly distributed embedding method by comparing it with simple linear regression and least absolute shrinkage and selection operator (LASSO) regression.

Reconstruction of Attractors
We review attractor reconstruction theory following [1].
Analysis of irregular time series signals observed in nature has been studied as "chaotic time series analysis". To analyze irregular time series data from the viewpoint of deterministic dynamical systems, the attractors must be reconstructed [2], [3].
The most common approach to attractor reconstruction uses delay coordinates, which yields the delay attractor.
The delay attractor is the reconstructed attractor of a dynamical system obtained with the delay coordinate system $(x_k(t), x_k(t+\tau), \ldots, x_k(t+(m-1)\tau))$ for a single variable $x_k(t)$, where $t$ is time and $\tau$ is the delay interval.
If the dimension of the delay coordinate system is sufficiently large (larger than twice the dimension of the original attractor), Takens' embedding theorem [3] and the generalized embedding theorem [2] guarantee an embedding Φ from the original attractor of the dynamical system into the reconstructed attractor M.
On the other hand, the non-delay attractor is the reconstructed attractor of the dynamical system obtained by randomly selecting $m$ variables $x_{l_1}(t), \ldots, x_{l_m}(t)$ from the $n$ observed variables $x_1(t), \ldots, x_n(t)$ ($m$ equals the dimension of the delay coordinate system) and using the coordinate system they form. There is also an embedding Γ from the original attractor to the reconstructed attractor N [2], [4], [5].
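As a minimal sketch of the two coordinate systems, assuming the simultaneous observations are stored in a NumPy array of shape (T, n) with one column per variable, the delay vectors of one series and a random non-delay tuple can be built as follows. Both function names are hypothetical helpers, not code from [1].

```python
import numpy as np

def delay_embed(x, m, tau):
    """Stack delay vectors (x(t), x(t+tau), ..., x(t+(m-1)tau)) as rows."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau  # number of complete delay vectors
    if n_vectors <= 0:
        raise ValueError("series too short for the requested m and tau")
    # Column i holds x shifted by i * tau, so row t is one delay vector
    return np.stack([x[i * tau : i * tau + n_vectors] for i in range(m)], axis=1)

def random_non_delay_coords(X, m, rng):
    """Randomly pick m distinct variables; return their indices and values.

    X has shape (T, n): T time points of n simultaneously observed variables.
    Row t of the returned matrix is (x_{l_1}(t), ..., x_{l_m}(t)).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    if m > n:
        raise ValueError("cannot select more variables than are observed")
    idx = np.sort(rng.choice(n, size=m, replace=False))  # random tuple
    return idx, X[:, idx]

if __name__ == "__main__":
    print(delay_embed(np.arange(10.0), m=3, tau=2)[0])  # [0. 2. 4.]
    rng = np.random.default_rng(0)
    idx, Z = random_non_delay_coords(rng.normal(size=(50, 20)), m=3, rng=rng)
    print(idx, Z.shape)  # three sorted column indices, (50, 3)
```

The delay coordinates use one variable at many times, while the non-delay coordinates use many variables at one time, which is why the latter suits data that are short in time but wide across variables.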

Randomly Distributed Embedding Method
The randomly distributed embedding method was proposed by Aihara et al. [1] in October 2018 for accurately predicting high-dimensional, short-term time-series data.
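First, we reconstruct the delay attractor $M$ and the non-delay attractor $N$ from the observed data. According to embedding theory, there is a diffeomorphism $\Psi = \Phi \circ \Gamma^{-1}: N \to M$. Hence the future value of the target variable $x_k$ can be written as a function $\psi$ of randomly selected variables at the current time, $x_k(t+\tau) = \psi(x_{l_1}(t), \ldots, x_{l_m}(t))$, and RDE estimates $\psi$ for many random choices of the tuple $(l_1, \ldots, l_m)$.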
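A simplified sketch of this prediction step, under stated assumptions: for each random tuple, the map ψ from $(x_{l_1}(t), \ldots, x_{l_m}(t))$ to $x_k(t+1)$ (one step, τ = 1) is fitted on the history and evaluated at the latest observation, giving one forecast per tuple. The document uses Gaussian process regression for this map; ordinary least squares is swapped in here only to keep the example short, and the function name is hypothetical.

```python
import numpy as np

def rde_one_step_predictions(X, target, m, n_tuples, rng):
    """Return one forecast of the next value of column `target` per tuple.

    X has shape (T, n). For each random tuple (l_1, ..., l_m) the map
    x_target(t + 1) ~ psi(x_{l_1}(t), ..., x_{l_m}(t)) is fitted on the
    history and evaluated at the latest observation X[T - 1].
    psi is ordinary least squares with an intercept (a stand-in, not GP).
    """
    X = np.asarray(X, dtype=float)
    T, n = X.shape
    preds = np.empty(n_tuples)
    for j in range(n_tuples):
        idx = rng.choice(n, size=m, replace=False)               # random tuple
        design = np.column_stack([np.ones(T - 1), X[:-1, idx]])  # inputs at t
        outputs = X[1:, target]                                  # target at t + 1
        coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
        latest = np.concatenate([[1.0], X[-1, idx]])             # inputs at T - 1
        preds[j] = latest @ coef                                 # forecast for T
    return preds

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 12))
    preds = rde_one_step_predictions(X, target=0, m=3, n_tuples=100, rng=rng)
    print(preds.shape, preds.mean())
```

The per-tuple forecasts form a distribution rather than a single number; summarizing that distribution is the role of the kernel density step in the verification procedure.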

Application to Japanese Stocks
Next, we consider how to apply the randomly distributed embedding method to return prediction for Japanese stocks. The key requirement of this method is that all variables in the observed data be generated by the same dynamical system.
Risk factors commonly used in return prediction are unlikely to originate from a single dynamical system. On the other hand, the returns of individual stocks in the same industry are plausibly driven by the same dynamical system.
Therefore, in this work, we predict the return of a specific stock from the returns of the individual stocks in the same industry.
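A minimal sketch of how the same-industry variables might be assembled, assuming a return matrix with one column per stock in the universe and a parallel list of industry labels. Both layouts and the helper name are hypothetical.

```python
import numpy as np

def same_industry_panel(returns, industries, target):
    """Return the returns of all stocks in the target's industry, plus the
    column position of the target inside that panel.

    returns:    (T, N) array, one column per stock in the universe.
    industries: length-N sequence of industry labels (e.g. TSE 33 sectors).
    target:     column index of the stock whose return is to be predicted.
    """
    labels = np.asarray(industries)
    members = np.flatnonzero(labels == labels[target])  # same-industry columns
    position = int(np.flatnonzero(members == target)[0])
    return np.asarray(returns, dtype=float)[:, members], position
```

Keeping the target inside the panel leaves open whether its own past return is among the candidate variables; dropping that column is a one-line change.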

Gaussian Process Regression
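Gaussian process regression is a nonparametric regression model [4]. Let us assume training pairs $(\mathbf{u}_i, y_i)$, $i = 1, \ldots, T$, with $y_i = f(\mathbf{u}_i) + \varepsilon_i$, independent noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, and a zero-mean Gaussian process prior on $f$ with covariance (kernel) function $k$. Then, the optimal estimate at a new input $\mathbf{u}_*$ is given by the posterior mean
$$\hat{f}(\mathbf{u}_*) = \mathbf{k}_*^\top (K + \sigma^2 I)^{-1} \mathbf{y},$$
where $K_{ij} = k(\mathbf{u}_i, \mathbf{u}_j)$, $(\mathbf{k}_*)_i = k(\mathbf{u}_i, \mathbf{u}_*)$, and $\mathbf{y} = (y_1, \ldots, y_T)^\top$.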
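The optimal Gaussian process estimate, the posterior mean $\mathbf{k}_*^\top (K + \sigma^2 I)^{-1} \mathbf{y}$ (with $K$ the kernel matrix of the training inputs, $\mathbf{k}_*$ the kernel values between training inputs and the new input, and $\sigma^2$ the noise variance), can be sketched in NumPy as below. The squared-exponential kernel, the zero prior mean, and the default noise variance and length scale are assumptions of this sketch, since the kernel used in the study is not stated.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2))."""
    sq_dist = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))

def gp_posterior_mean(U_train, y_train, U_test, noise_var=1e-2, length_scale=1.0):
    """Posterior mean k_*^T (K + sigma^2 I)^{-1} y of a zero-mean GP.

    U_train: (T, d) inputs, y_train: (T,) outputs, U_test: (n_test, d) inputs.
    """
    K = rbf_kernel(U_train, U_train, length_scale)
    K_star = rbf_kernel(U_train, U_test, length_scale)  # shape (T, n_test)
    # Solve via a Cholesky factor instead of forming an explicit inverse
    L = np.linalg.cholesky(K + noise_var * np.eye(len(U_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star.T @ alpha
```

In an RDE setting, `U_train` would hold the randomly selected variables at times $t$ and `y_train` the target at $t+1$; far from the training inputs the estimate falls back to the prior mean of zero.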

Verification Procedure
The universe consists of the TOPIX500 constituents, that is, the 500 TOPIX stocks with the highest market capitalization and liquidity. We apply the randomly distributed embedding method within industries defined by the TSE 33-industry classification. In this work, seven industries are targeted for forecasting: construction, chemicals, food, machinery, electronics, pharmaceuticals, and transportation. They were chosen because each contains a reasonably large number of stocks and because they allow us to compare industries driven by domestic demand with those driven by external demand. For the randomly distributed embedding method, prediction follows the procedure of [1].
The given data are the observations from time 1 to the current time $t$. We then estimate the probability density function $p(x)$ by kernel density estimation from the set of one-step-ahead estimates obtained for the randomly selected variable tuples. Next, we calculate the skewness $\gamma$ of $p(x)$; if $\gamma \le 0.5$, the estimate is adopted as the prediction. Otherwise, we correct the estimate as follows.
We calculate the in-sample error of the estimates. In this work, the estimation period is 2018, and the estimation is performed with $L = 10$ and $s = 3$. The data are the intraday returns of the stocks in each industry. We predict the intraday return of each stock one period ahead at a time and, as the index of prediction accuracy, use the mean squared error (MSE) between the predicted and actual intraday returns over the entire prediction period, averaged over all stocks in the industry.
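The kernel density aggregation and skewness test of this procedure can be sketched with SciPy's `gaussian_kde`, assuming the one-step estimates are collected in a 1-D array. The sketch computes the skewness of the estimated density numerically on a grid and applies the test literally as γ ≤ 0.5. Taking the mean of $p(x)$ as the adopted estimate is an assumption, the in-sample-error correction is not implemented (`None` signals that it would be needed), and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def aggregate_predictions(preds, skew_threshold=0.5, grid_size=2001):
    """Fuse one-step estimates via a KDE; return (estimate or None, skewness)."""
    preds = np.asarray(preds, dtype=float)
    kde = gaussian_kde(preds)                      # p(x) with Scott's bandwidth
    bw = np.sqrt(kde.covariance[0, 0])             # kernel standard deviation
    grid = np.linspace(preds.min() - 6 * bw, preds.max() + 6 * bw, grid_size)
    dx = grid[1] - grid[0]
    p = kde.evaluate(grid)
    p /= p.sum() * dx                              # renormalize on the grid
    mean = np.sum(grid * p) * dx
    var = np.sum((grid - mean) ** 2 * p) * dx
    gamma = np.sum((grid - mean) ** 3 * p) * dx / var**1.5  # skewness of p(x)
    # Assumed rule: accept the mean of p(x) when the skewness test passes
    estimate = float(mean) if gamma <= skew_threshold else None
    return estimate, float(gamma)
```

A symmetric set of estimates passes the test and yields their center; a set dominated by one large outlier produces a right-skewed density and fails it.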
As baselines, we predict each stock by simple linear regression and by LASSO regression on the returns of the other stocks in the same industry, with $L = 10$ and without the randomly distributed embedding method, and compute the industry-averaged MSE against the actual returns in the same way.
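A sketch of the two baselines and the accuracy index, using scikit-learn's `LinearRegression` and `Lasso`. It assumes, hypothetically, that $L$ is the length of the training window (the last $L$ pairs of other stocks' returns at $t$ and the target's return at $t+1$), that the LASSO penalty is `alpha=1e-4`, and that the intraday return is close/open − 1; none of these details is specified in the text, and all function names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def intraday_return(open_price, close_price):
    """Assumed definition: close / open - 1 within the same trading day."""
    return np.asarray(close_price, dtype=float) / np.asarray(open_price, dtype=float) - 1.0

def baseline_forecasts(panel, target, L=10, alpha=1e-4):
    """Next-period forecasts of panel[:, target] from the other stocks' returns."""
    panel = np.asarray(panel, dtype=float)
    others = np.delete(panel, target, axis=1)
    X_train = others[-L - 1:-1]   # other stocks' returns at t = T-L-1 .. T-2
    y_train = panel[-L:, target]  # target returns at t + 1
    x_now = others[-1:]           # latest returns, used to forecast period T
    forecasts = {}
    for name, model in [("linear", LinearRegression()), ("lasso", Lasso(alpha=alpha))]:
        model.fit(X_train, y_train)
        forecasts[name] = float(model.predict(x_now)[0])
    return forecasts

def industry_average_mse(predicted, actual):
    """Mean over stocks of each stock's MSE over the prediction period.

    predicted, actual: (n_periods, n_stocks) arrays of intraday returns.
    """
    err = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.mean(err**2, axis=0).mean())
```

With only $L = 10$ samples and possibly more regressors than samples, ordinary least squares is underdetermined (scikit-learn then returns a minimum-norm fit), which is one reason the LASSO penalty is a natural second baseline.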

Verification Result
Table 1 shows the results of the experiment. The randomly distributed embedding method was the most accurate method in all industries, and its improvement over the conventional methods was larger in food and electronics than in the other industries. The randomly distributed embedding method presupposes that the analyzed variables lie on the same attractor. In that sense, the results suggest that, compared with the other industries, the stocks in the food and electronics industries lie on the same attractor, that is, the relationships among these stocks are relatively close.

Conclusions
In this work, we showed that the randomly distributed embedding method, which randomly selects variables from many variables observed at the same time and estimates the attractor state at that time, predicts future returns of Japanese stocks more accurately than simple linear regression or LASSO regression. In addition, the results suggest that the size of the improvement differs by industry, depending on the nature of the stocks in each industry and the degree to which these stocks lie on the same attractor.
As future work, higher forecasting accuracy may be achievable by applying the randomly distributed embedding method to financial instruments that are likely to lie on the same attractor, such as multiple volatility indices. Prediction accuracy might also be improved by using an algorithm such as LSTM as the regression method within the randomly distributed embedding method. Furthermore, the improvement in prediction accuracy of the randomly distributed embedding method over conventional methods could serve as a measure of how closely stocks are related, for example to filter candidate stocks in investment strategies where that closeness matters, such as pairs trading.