
In this paper, we provide insights into the prediction of asset returns via novel machine learning methodologies. We propose and compare clustering-enhanced machine learning classification and regression techniques for predicting future asset return movements. Numerical experiments demonstrate the applicability of the methodologies, and backtesting reveals superior results in the China A-share market.

Predicting asset returns is of central importance for empirical, theoretical and practical considerations. Essentially, prediction is the computation of the orthogonal projection of future asset returns, i.e., asset price movements, often modeled as stochastic processes, onto the information structure that we observe today. In mathematical language, prediction amounts to computing conditional expected asset returns.

From an empirical point of view, [

In this paper, we propose new methodologies to compute the conditional expected asset returns under the risk-decomposition framework of [

The contribution of this paper is threefold. First, methodologically, we combine unsupervised and supervised learning techniques to enhance computational efficiency for regression and classification problems. Second, under our framework, machine learning techniques and classical function approximation methods jointly deliver high-performance algorithms. This helps because, under limited computational resources, we can use as much past data as possible to train the model while still enjoying fast computation. Third, in terms of prediction, we propose to estimate the expected absolute returns and the signs of future price movements separately to increase forecasting accuracy. Moreover, new stock selection criteria are proposed. This methodology enables us to use a larger number of factors and more past data than the artificial neural network approach, while achieving faster computation. Empirical studies in the China A-share market reveal superior results.

The organization of this paper is as follows. Section 2 describes the proposed techniques. Section 3 discusses backtesting methodologies and the results and Section 4 concludes. All the theoretical justifications can be found in the Appendix.

^{1}That is, when the number of clusters is 1, our method degenerates to the brute-force machine learning regression and classification algorithms.

In this section, we introduce the main methodologies of this paper. We show that both the currently used machine learning regression and classification problems can be embedded into our theoretical framework as special cases^{1}. Then, we show that, instead of predicting asset returns in a brute-force manner, we can enhance the prediction precision by separating the prediction of the magnitude of asset price changes from that of their directions.

Let us first assume that the conditional expected asset returns can be expressed as continuous functions of the risk factor values. The case of discontinuous functions can be handled analogously with mollifiers. By the definition of a continuous function, if its argument values are sufficiently close, then the corresponding function values are also close. If we further assume that the target function is once differentiable, it can be shown that, in a small region of its domain, we can approximate it well with linear functions. This observation inspires us to use a clustering-based approach to enhance classification or regression prediction for asset returns.

Suppose that the risk factors are denoted by an $r$-dimensional vector $X_t$. The target function is $\varphi$. We are trying to compute $\varphi(t, h, X_t) = E_t[R_{t+h}]$. Suppose that at time $t$, the state space of the risk factor $X_t$ is $D_t$. In what follows, we seek a partition of this state space, denoted by $\{U_t^k\}_{k=1}^K$, such that in each subspace $U_t^k$ we use a linear function $\varphi_k(t, h, X_t) = a(t, h, k) + b(t, h, k) X_t$ to approximate $\varphi$. The rigorous mathematical justifications of this approach are given in the Appendix.

The steps are as follows. Given $m$ assets, whose rate-of-return processes are denoted by $\{R_t^i\}_{i=1}^m$, suppose that we want to consider $T$ periods of data. Therefore, there are $m \times T$ observations in total for the $r$-dimensional factors. Partition, using the MiniBatchKMeans function in Python, the $m \times T$ observations into $K$ clusters. In each cluster, use a simple neural network or just a linear regression model to fit the data via the equation $E_t[R_{t+h}^i \mathbf{1}_{X_t \in U_t^k}] = a(t, h, i, k) + b(t, h, i, k) X_t$. Then, for each new observation $X_{t+h}$, we first use the predict function in Python to decide which cluster it belongs to, and then use the equation $E_{t+h}[R_{t+2h}^i \mathbf{1}_{X_{t+h} \in U_t^k}] = a(t, h, i, k) + b(t, h, i, k) X_{t+h}$ to compute the expected return.
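The steps above can be sketched in a few lines. This is a minimal illustration on synthetic data (the factor matrix, return vector, and cluster count below are hypothetical, not the paper's actual configuration): cluster the pooled observations with scikit-learn's MiniBatchKMeans, fit a linear model within each cluster by least squares, and route each new observation to its cluster's model via predict.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Synthetic panel: n_obs pooled observations of r = 5 factors, with next-period returns.
n_obs, r = 1000, 5
X = rng.normal(size=(n_obs, r))
y = np.where(X[:, 0] > 0, 0.5, -0.5) * X[:, 1] + 0.01 * rng.normal(size=n_obs)

# Step 1: partition the factor state space into K clusters.
K = 8
km = MiniBatchKMeans(n_clusters=K, random_state=0, n_init=10).fit(X)
labels = km.labels_

# Step 2: fit a linear model a + b'X within each cluster (ordinary least squares).
coefs = {}
for k in range(K):
    mask = labels == k
    A = np.column_stack([np.ones(mask.sum()), X[mask]])
    coefs[k], *_ = np.linalg.lstsq(A, y[mask], rcond=None)

# Step 3: for a new observation, find its cluster, then apply that cluster's model.
def predict_return(x_new):
    k = int(km.predict(x_new.reshape(1, -1))[0])
    return coefs[k] @ np.concatenate([[1.0], x_new])

print(predict_return(rng.normal(size=r)))
```

In practice the per-cluster model could equally be a small neural network, as the text notes; linear least squares is used here only because it is the fastest special case.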

Taking a two-category logistic regression-based classification as an example, we know that a classification problem is essentially a regression one. Therefore, we can use the clustering-based method introduced in Section Clustering-Based Regression to run the regression and conduct the classification. Multi-category classification problems are analogous.
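The two-category case can be sketched analogously: cluster the factor observations, then fit one logistic model per cluster. The data below are synthetic placeholders, and the guard for degenerate single-label clusters is an implementation detail added here, not part of the paper's description.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Synthetic factor observations with binary labels (1 = next-period return positive).
X = rng.normal(size=(1000, 5))
y = (X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=1000) > 0).astype(int)

K = 8
km = MiniBatchKMeans(n_clusters=K, random_state=0, n_init=10).fit(X)

def fit_cluster(k):
    mask = km.labels_ == k
    if len(np.unique(y[mask])) < 2:      # degenerate cluster: constant label
        return float(y[mask].mean())
    return LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

models = {k: fit_cluster(k) for k in range(K)}

def prob_up(x_new):
    """Probability of label 1, using the model of the cluster x_new falls in."""
    k = int(km.predict(x_new.reshape(1, -1))[0])
    m = models[k]
    return m if isinstance(m, float) else m.predict_proba(x_new.reshape(1, -1))[0, 1]

p = prob_up(rng.normal(size=5))
print(p)
```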

In this empirical study, we use two sets of machine learning architectures, which will be documented below.

Previous methods try to build regression models to forecast future asset returns. Assume that the confidence interval and point estimate are $(c_l, c_u)$ and $c_p$. If $c_u - c_l$ is large, the point estimate is of little use, since it indicates that the forecasting errors might be large and the realized values can deviate substantially from the point estimate $c_p$. We therefore propose a method to narrow the confidence interval and make the point estimate more reliable. The key is to separate the estimation of the magnitude and the sign of future asset returns. Denote by $R_t$ the asset return at time $t$. We proceed in two steps. The first step is to compute $E_t[|R_{t+h}|]$ and $E_t[R_{t+h}^2]$, and therefore $\mathrm{Var}_t[|R_{t+h}|]$. The second step is to use a two-category classification algorithm with label 0 if $R_{t+h} \le 0$ and label 1 otherwise. If the probability associated with label 0 is larger than a threshold $\alpha$, we predict that the future return will be negative. On the other hand, if the probability of label 1 is larger than $\alpha$, we predict that the future return will be positive. After we determine the sign of the future expected return, we use the estimate of $E_t[|R_{t+h}|]$ from the first step as the magnitude of the expected return. If $R_{t+h}$ is estimated to be positive and $E_t[|R_{t+h}|] - q_\alpha \mathrm{Var}_t[|R_{t+h}|] > \theta_t$, or $R_{t+h}$ is estimated to be negative and $-E_t[|R_{t+h}|] + q_\alpha \mathrm{Var}_t[|R_{t+h}|] < -\theta_t$, then we go long or short the asset accordingly, where $q_\alpha$ is an appropriate quantile and $\theta_t$ is the return deduction due to transaction costs. The regression and classification can, of course, be done via deep learning techniques. To reduce the computational resource requirement, both can instead be carried out with the clustering method described above. We will mainly test this methodology on China A-shares.
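The two-step decision rule can be sketched as follows. For brevity, the moment and sign-probability forecasts below are simple historical stand-ins for the clustering-based regression and classification outputs, and the parameter values $\alpha$, $q_\alpha$, $\theta_t$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
r_hist = 0.01 * rng.standard_t(df=5, size=500)   # past returns of one asset

# Step 1: magnitude estimates (historical moments as stand-ins for the forecasts).
e_abs = np.mean(np.abs(r_hist))       # E_t[|R_{t+h}|]
e_sq = np.mean(r_hist ** 2)           # E_t[R_{t+h}^2]
var_abs = e_sq - e_abs ** 2           # Var_t[|R_{t+h}|]

# Step 2: sign classification (stand-in: historical frequency of positive returns).
p_up = np.mean(r_hist > 0)

alpha, q_alpha, theta = 0.6, 1.0, 0.0015   # prob. threshold, quantile, cost hurdle

def signal():
    # Long only if the sign is confidently positive AND the magnitude clears costs.
    if p_up > alpha and e_abs - q_alpha * var_abs > theta:
        return +1
    # Short only if the sign is confidently negative AND the magnitude clears costs.
    if (1.0 - p_up) > alpha and -e_abs + q_alpha * var_abs < -theta:
        return -1
    return 0   # otherwise, stay out

print(signal())
```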

This method is a direct application of the clustering-based regression method introduced in Section 2.1. For each asset, we forecast $E_t[R_{t+h}]$, $E_t[R_{t+h}^2]$ and $\mathrm{Var}_t[|R_{t+h}|]$. Decide on a percentage $\alpha$ and compute the information ratio $E_t[R_{t+h}] / \mathrm{Var}_t[|R_{t+h}|]$. Rank the information ratios in the cross-section of the asset universe, long the top $\alpha$ percent and short the bottom $\alpha$ percent. For this methodology, however, we try to predict not only the forward one-period return for each asset, but the entire forward $n$-period curve as well. That is, at each moment in time $t$, we predict $\{E_t[R_{t+ih}]\}_{i=1}^n$; we then go long at the trough and sell at the peak of the curve. Of course, we can assess the forecasting accuracy by looking at the confidence intervals: whenever the accuracy exceeds some predetermined threshold, we view the forecasted values as valid. We will test this method using China A-shares.
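The cross-sectional ranking step can be sketched as below, assuming the per-asset forecasts have already been produced by the clustering-based regression (the forecast values here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 20                                   # number of assets in the universe
e_ret = rng.normal(0.001, 0.005, m)      # forecast E_t[R_{t+h}] per asset
var_abs = rng.uniform(1e-4, 5e-4, m)     # forecast Var_t[|R_{t+h}|] per asset

info_ratio = e_ret / var_abs             # cross-sectional ranking score
alpha = 0.2                              # trade the top/bottom 20 percent
n_trade = int(alpha * m)

order = np.argsort(info_ratio)
short_leg = order[:n_trade]              # bottom alpha percent: short
long_leg = order[-n_trade:]              # top alpha percent: long
print(sorted(long_leg), sorted(short_leg))
```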

Because we are forecasting short-term asset returns, we use five technical factors as our predictive features, namely the volatility, skewness and kurtosis of asset returns, and the past 1-period and $T$-period moving averages of asset returns.
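For concreteness, the five factors can be computed from a rolling window of past returns as follows (the window length matches the 250-day window used in the backtests; the return series is synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 0.01 * rng.normal(size=300)    # daily returns of one asset (synthetic)
W = 250                            # rolling window length, as in the backtests

win = r[-W:]
mu, sd = win.mean(), win.std()
factors = {
    "volatility": sd,
    "skewness": np.mean(((win - mu) / sd) ** 3),
    "kurtosis": np.mean(((win - mu) / sd) ** 4),
    "ma_1": r[-1],                 # past 1-period moving average = last return
    "ma_T": win.mean(),            # past T-period moving average
}
print(factors)
```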

The China A-share market data, covering stocks traded on the Shanghai and Shenzhen stock exchanges, are downloaded from the Wind terminal. The sample period ranges from 2008-01-02 to 2019-05-23.

For Strategy 1, we use a rolling window of 250 days to compute the factor values. To train the clustering and regression models, we use panel data from the past rolling 100 days. For the regression model, we use a clustering-based approach with 100 clusters in total, and we choose the five clusters with the top performance to trade. This strategy is long-only. Transaction cost and slippage are assumed to be a unilateral 0.15%.

For Strategy 2, we use a rolling window of 250 days to compute the factor values. To train the regression model, we use panel data from the past rolling 100 days. For the regression model, we use a clustering-based approach with 100 clusters in total, and we choose the five clusters with the top performance to trade. This strategy is long-short.

The NAV plot of a long-only strategy based on the methodology in Section 3.1.2 is shown in

Annual Ret | Annual Vol | Info Ratio | Hit Rate | Calmar | MDD
---|---|---|---|---|---
203.56% | 36.94% | 5.49 | 72.47% | 69.47 | 2.93%

Annual Ret | Annual Vol | Info Ratio | Hit Rate | Calmar | MDD
---|---|---|---|---|---
283.79% | 46.87% | 6.05 | 73.21% | 82.98 | 3.42%

In this paper, we propose a clustering-based methodology to compute the expected asset returns and create trading strategies based on it. Numerical results show superior performance in China A-Share markets. Future research includes applying the proposed approach in LSTM or reinforcement learning contexts.

The author declares no conflicts of interest regarding the publication of this paper.

Zhang, L.L. (2019) Asset Return Prediction via Machine Learning. Journal of Mathematical Finance, 9, 691-697. https://doi.org/10.4236/jmf.2019.94035

We only consider the case where the functions are continuous on a compact domain of $\mathbb{R}^r$. The extension to general functions defined on $\mathbb{R}^r$ is straightforward with mollifiers and the assumption that the distributions of asset returns have exponentially decaying tails. We first need the following assumption.

Assumption A.1 (On Function Representation). For any asset return $R$, we have $\varphi(t, X_t) = E_t[R_{t+h}]$, i.e., the conditional expected asset returns can be expressed as functions of the state variables.

Lemma A.2 (On Lead-Lag Regression). Suppose that Φ is an appropriate function space. Then, we have

$$\arg\min_{\varphi \in \Phi} E\big[|\psi(X_T) - \varphi(t, X_t)|^2\big] = \arg\min_{\varphi \in \Phi} E\big[|\varphi^*(t, X_t) - \varphi(t, X_t)|^2\big]$$

where $\varphi^*(t, X_t) = E_t[\psi(X_T)]$.

Proof of Lemma A.2. The proof of this lemma follows from Theorem 8 of [

Theorem A.3 (On Polynomial Regression). Assume that $\psi$ is a continuous function defined on a compact domain $U$, that $\{U_t^k\}_{k=1}^K$ is a partition of the domain $U$, and that

$$\hat{\varphi}_{k,J}(t, X_t) = \arg\min_{p_J \in \mathcal{P}_J(U_t^k)} E\big[|\psi(X_T) - p_J(X_t)|^2\big]$$

where $\mathcal{P}_J(U_t^k)$ is the space of all polynomials on $U_t^k$, whose coefficients depend on the times $t$ and $T$, with degree less than or equal to $J$. Then, we have

$$\varphi(t, X_t) = \lim_{\max_{1 \le k \le K} d(U_t^k) \to 0} \; \sum_{k=1}^K \hat{\varphi}_{k,J}(t, X_t) \, \mathbf{1}_{X_t \in U_t^k},$$

where the diameter $d(U) = \sup_{x, y \in U} |x - y|$.

Proof of Theorem A.3. The proof of this theorem follows from Lemma A.2, Theorem 23 of [

Under Assumption A.1 and Theorem A.3, increasing the computational budget will ensure that we will obtain the true solution asymptotically.
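The convergence asserted in Theorem A.3 can be illustrated numerically in the linear case $J = 1$: fit least-squares linear pieces to a continuous function on progressively finer partitions and watch the sup-norm error shrink. The target function below is an arbitrary smooth example, not one from the paper.

```python
import numpy as np

# Approximate psi(x) = sin(3x) on [0, 1] with piecewise-linear least squares
# on K equal subintervals; the sup-norm error should shrink as K grows.
def pw_linear_error(K, n=2000):
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(3 * x)
    err = 0.0
    edges = np.linspace(0.0, 1.0, K + 1)
    for k in range(K):
        mask = (x >= edges[k]) & (x <= edges[k + 1])
        A = np.column_stack([np.ones(mask.sum()), x[mask]])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        err = max(err, np.max(np.abs(A @ coef - y[mask])))
    return err

print(pw_linear_error(2), pw_linear_error(16))  # error drops with a finer partition
```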