Optimal Kelly Portfolio under Risk Constraints

Abstract

The Kelly strategy is renowned for its theoretically optimal long-term growth; however, its practical application in financial markets is constrained by several limitations, including high risk exposure and the absence of clearly defined profit-loss ratios. These challenges make it difficult to adopt the Kelly strategy widely, especially in markets characterized by high volatility. To address these issues, this paper integrates shrinkage estimation and ridge regression techniques into the Kelly framework. By quantifying portfolio unit risk and incorporating it as a penalty term in the optimization model, we refine the asset allocation process. Additionally, machine learning methods are employed to enhance portfolio construction: clustering is used for asset selection, and neural networks are applied to predict return performance. Empirical analysis using data from the A-share stock market demonstrates that the proposed approach not only preserves the high return potential of the Kelly strategy but also effectively mitigates the risks associated with market volatility, delivering superior performance in medium- to long-term investments.


1. Introduction

To diversify risk and enhance returns, constructing a financial investment portfolio is of considerable theoretical and practical significance. Determining the optimal asset allocation ratio is a crucial step in this process. Markowitz [1] proposed the mean-variance framework, whose core principle is to balance the risk and return of assets. Because the mean-variance portfolio fails to fully account for the risks associated with extreme market events and assumes constant returns, a portfolio constructed solely on mean-variance theory is likely to go bankrupt [2]. In contrast, the Kelly portfolio proposed by Kelly [3] and Latané [4] avoids bankruptcy when implemented repeatedly, assuming asset returns are independent and identically distributed [5]. In other words, the Kelly model enhances and extends the mean-variance framework by focusing on long-term asset management. Moreover, due to its objective of maximizing logarithmic return, the Kelly portfolio outperforms other portfolios over the long term with probability one [6] [7].

However, since the Kelly formula aims to maximize the expected geometric mean return of the portfolio without considering risk factors, the strategy is very aggressive [8]. Moreover, maximizing the geometric mean rate of return requires a precise estimate of the expected rate of return. This sensitivity to errors in parameter estimation leads to various problems, many of which are also encountered in mean-variance models, including sensitivity to input data [9] [10], inadequate diversification [11] [12], and poor out-of-sample performance [13] [14]. Furthermore, some assumptions of the Kelly formula are often impractical in reality. For instance, investors are required not only to accurately estimate the returns and risks of each asset, but also to precisely define the probability of obtaining profits and the associated odds. Computational evidence indicates that it would take at least 4700 years for the Kelly portfolio to consistently outperform other portfolios at a 95% confidence level. In summary, in practical applications the traditional Kelly model struggles to achieve the dual objectives of maximizing long-term wealth while minimizing short- to medium-term risk.

In view of these issues with parameter sensitivity and practicability, many scholars suggest using shrinkage estimation methods to improve the robustness and stability of the Kelly model in practical portfolio construction. The fundamental principle of shrinkage estimation is to adjust sample estimates toward target estimates [15] [16], aiming to balance the low bias of sample estimates against the low variance of target estimates, thereby reducing estimation errors. Empirical studies demonstrate that incorporating shrinkage estimation into mean-variance optimization makes it more robust and effective in real portfolio selection, particularly in balancing risk and return across different market conditions [17] [18].

This paper first addresses the high-risk nature of the Kelly formula by introducing a risk term through shrinkage estimation. Traditional shrinkage estimation balances sample estimates and target estimates through weighting. To achieve the objective of minimizing risk while maximizing returns under the Kelly criterion, we incorporate the concept of ridge regression, adding risk as a penalty function to the original formula. Specifically, we introduce a term representing the distance between sample estimates and risk estimates, wherein a greater risk imposes a higher penalty on the sample estimates. To determine the shrinkage intensity parameter, we construct and solve a bi-level optimization problem. By incorporating risk as a penalty term and utilizing shrinkage estimation, the proposed portfolio aims to maximize asset growth while minimizing risk.

After obtaining the optimization model, we use machine learning methods to construct the investment portfolio. First, for asset selection, Markowitz's mean-variance theory implies that, to diversify risk, assets in the same portfolio should have low correlations. We use an unsupervised clustering algorithm to categorize A-share assets based on the correlations between their return sequences. Through this data-driven approach, we aim to achieve high intra-cluster correlation and low inter-cluster correlation, and one stock is then selected from each cluster to form the investment portfolio. Second, to predict the return sequences required in the objective function, we choose the Long Short-Term Memory (LSTM) neural network, achieving roughly 70% similarity between predicted and actual returns over the ensuing 180 days. Finally, to solve the optimization objective, we select the simulated annealing algorithm, which is effective for complex optimization problems: by simulating the gradual lowering of temperature, it searches for the global optimum and avoids the pitfalls of local optima [19].

This paper makes several significant contributions to the existing literature. First, to address the high-risk issue inherent in Kelly portfolios, we deviate from shrinkage estimation methods that simply set target weights. Instead, we incorporate a ridge-regression-style penalty into the objective function to pursue the dual goals of maximizing returns and minimizing risk at the level of the model itself. Second, regarding the selection of the shrinkage coefficient, we formulate a bi-level optimization problem to determine its optimal value, rather than merely discussing its impact on portfolio weights. Third, we employ clustering algorithms for asset selection, setting ourselves apart from traditional industry-based classifications by utilizing actual data correlations. Additionally, we integrate machine learning algorithms at various stages, leveraging data-driven methods to avoid the parameter definition issues and unrealistic assumptions inherent in the full Kelly formula. This approach ensures that the constructed investment portfolio is more aligned with real-world conditions and suitable for practical investment. Therefore, this study is of both theoretical and practical significance.

This paper proceeds as follows. Section 2 introduces the methodology, including a review of the Kelly model, a brief introduction to the shrinkage estimation method, and the proposal of an improved Kelly portfolio objective function. Section 3 presents an empirical analysis based on market data, highlighting the machine learning algorithms employed at each step, supported by real data. Finally, Section 4 concludes the paper.

2. Model Setup

2.1. Kelly Portfolio Optimization Theory

The Kelly criterion determines the optimal fraction of wealth to bet in each period of a repeated gamble or investment with positive expected return [3]. Unlike Markowitz's mean-variance model, which weighs risk against return, the Kelly strategy emphasizes the long-term growth of portfolio wealth. If the gambler had perfectly accurate information, the optimal choice would be to bet all available funds; the indicator $G$ measures the geometric growth rate of the funds:

$$G = \left( \frac{V_T}{V_0} \right)^{1/T}, \qquad \log G = \frac{1}{T} \log \frac{V_T}{V_0}, \tag{1}$$

where $V_0$ is the investor's initial wealth, $V_T$ is the investor's final wealth, and $t = 1, 2, \ldots, T$ indexes the investment periods.

Subsequent scholars extended Kelly’s ideas to fields similar to gambling, including stock investment, futures, and other financial areas [20] [21], because investors in the financial industry also have the need to maximize returns. For the asset allocation of a single position, each position adjustment can be regarded as a bet, and determining the optimal asset allocation ratio for each adjustment can be seen as the optimal bet ratio in a gamble.

In the context of a multi-asset investment portfolio, assuming there are $n$ assets available for investment in the market, let $r_t = (r_{t1}, r_{t2}, \ldots, r_{tn})^\top$ represent the vector of asset returns during investment period $t$, and denote by $w_t = (w_{t1}, w_{t2}, \ldots, w_{tn})^\top$ the asset weight vector. After period $T$, the final wealth $V_T$ is given by:

$$V_T = \prod_{t=1}^{T} \left( 1 + w_t^\top r_t \right) V_0.$$

Substituting this into the previous Equation (1), we get:

$$G = \left\{ \prod_{t=1}^{T} \left( 1 + w_t^\top r_t \right) \right\}^{1/T}. \tag{2}$$

We take the natural logarithm of Equation (2), which is a monotonic transformation, and denote the resulting wealth growth rate by $G_r$:

$$G_r = \log \left\{ \prod_{t=1}^{T} \left( 1 + w_t^\top r_t \right) \right\}^{1/T} = \frac{1}{T} \sum_{t=1}^{T} \log \left( 1 + w_t^\top r_t \right).$$

Assuming that the asset returns are independent and identically distributed over the entire investment period and that a fixed weight vector $w$ is applied in every period, the law of large numbers gives:

$$\lim_{T \to \infty} G_r = E\left[ \log \left( 1 + w^\top r \right) \right].$$

According to Samuelson [22], asset prices exhibit minimal fluctuations in the short term; consequently, the associated returns are relatively low, close to zero. For cases where short-term returns are near zero, the lower-order terms of the Taylor expansion provide a good approximation without the need to compute higher-order complex terms. In practical applications, the error from using the Taylor expansion for approximation is negligible. Subsequent empirical studies have also confirmed this conclusion [23]. Therefore, we can obtain:

$$E\left[ \log \left( 1 + w^\top r \right) \right] \approx E\left[ w^\top r \right] - \frac{1}{2} E\left[ \left( w^\top r \right)^2 \right] = w^\top E(r) - \frac{1}{2} w^\top \left[ \mathrm{Var}(r) + E(r) E(r)^\top \right] w. \tag{3}$$

Based on the above, the Kelly portfolio aiming to maximize logarithmic asset returns can be formulated as the following optimization problem:

$$\max_{w} \; w^\top E(r) - \frac{1}{2} w^\top \left[ \mathrm{Var}(r) + E(r) E(r)^\top \right] w \quad \text{s.t.} \; \sum_{i=1}^{n} w_i = 1. \tag{4}$$
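For illustration, problem (4) can be solved numerically with off-the-shelf tools. The following is a minimal Python sketch, not the paper's implementation; the routine name and the input `returns` (a $T \times n$ matrix of per-period returns) are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def kelly_weights(returns: np.ndarray) -> np.ndarray:
    """Solve problem (4): max w'E(r) - 0.5 w'[Var(r) + E(r)E(r)']w  s.t. sum(w) = 1."""
    mu = returns.mean(axis=0)                             # E(r)
    M = np.cov(returns, rowvar=False) + np.outer(mu, mu)  # Var(r) + E(r)E(r)'
    n = len(mu)

    def neg_growth(w):
        # negative of the approximate log-growth objective in (3)
        return -(w @ mu - 0.5 * w @ M @ w)

    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(neg_growth, np.full(n, 1.0 / n), constraints=cons)
    return res.x

# toy usage with simulated daily returns for 4 assets
rng = np.random.default_rng(0)
w = kelly_weights(rng.normal(0.0005, 0.01, size=(500, 4)))
print(w.round(3), w.sum())
```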

2.2. Shrinkage Estimates

The Kelly formula is sensitive to its input parameters (such as expected returns), which can make the resulting weights unstable, so enhancing its robustness and stability has been a key focus of research. Shrinkage estimation methods have been employed to adjust the Kelly formula because they can improve estimation stability by accounting for the uncertainty in parameter estimates [24]. In portfolio selection, Bayesian shrinkage is a commonly used approach. By incorporating prior knowledge into the model, it improves the reliability of estimates and the generalization ability of the model, especially in situations with limited data or high noise [25]. The basic form of shrinkage estimation can be expressed as:

$$w^* = (1 - \phi)\, w^*_{\text{sample}} + \phi\, w^*_{\text{target}},$$

where $w^*_{\text{sample}}$ denotes the sample-based weights, $w^*_{\text{target}}$ represents the target weights toward which the sample values are shrunk, and $\phi$ is the shrinkage intensity. Shrinkage can be applied to three kinds of parameters: asset return vectors, covariance matrices, and portfolio weights [18]. If $\phi$ is accurately computed, the out-of-sample performance of the portfolio improves significantly.

However, adjusting the optimal weights derived from the modified Kelly formula in the form of Equation (4) by combining them with weights of another portfolio contradicts the Kelly criterion’s objective of maximizing return. Therefore, we need to reconsider how to integrate risk control with shrinkage estimators. Considering that solving portfolio weights is essentially a bi-objective optimization problem—maximizing return and minimizing risk—we can incorporate a penalty term into the objective function inspired by ridge regression. This approach helps enhance shrinkage estimation and contributes to controlling portfolio volatility.

Ridge regression introduces a penalty term to address multicollinearity and improve estimation stability. Given a standard regression model $Y = X\beta + \epsilon$, ridge regression minimizes the following objective function:

$$\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - X_i \beta \right)^2 + \lambda \|\beta\|^2,$$

where $\lambda$ controls the shrinkage intensity. This formulation is equivalent to imposing a Gaussian prior on $\beta$, giving it a Bayesian interpretation. Similarly, in portfolio optimization, introducing a penalty term can mitigate estimation error and improve robustness, especially when return estimates are unstable.
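As a quick illustration (a sketch of standard ridge regression, not code from this paper), the estimator has the closed form $\hat\beta = (X^\top X + \lambda I)^{-1} X^\top y$:

```python
import numpy as np

def ridge_estimate(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge solution: beta = (X'X + lam*I)^{-1} X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
```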

Specifically, we extend the optimization problem in Equation (4) as follows:

$$\max_{w} \; (1-\phi) \left\{ w^\top E(r) - \frac{1}{2} w^\top \left[ \mathrm{Var}(r) + E(r) E(r)^\top \right] w \right\} - \phi \left\| w - w^*_{\text{target}} \right\|^2 \quad \text{s.t.} \; \sum_{i=1}^{n} w_i = 1.$$

For the target term $w^*_{\text{target}}$, we aim to assign smaller weights to assets with higher unit risk. We therefore select the Sharpe ratio $\frac{E[r_i] - r_f}{\sigma_i}$ as the standard for measuring return per unit of risk. A higher Sharpe ratio indicates greater return per unit of risk, so the corresponding target weight should be larger; conversely, assets with lower Sharpe ratios receive smaller target weights. Normalizing the Sharpe ratios across assets yields $w^*_{\text{target}}$. The penalty term then represents the distance between the model weights and the shrinkage target.

The choice of the Sharpe ratio as the shrinkage target is motivated by its ability to adjust returns based on risk. However, alternative risk-adjusted return measures, such as the Sortino ratio (which penalizes downside volatility) or the Calmar ratio (which accounts for maximum drawdown), could also be considered. Empirical studies [2] [13] have shown that Sharpe ratio-based portfolio adjustments enhance stability under normal market conditions, but sensitivity to tail risk remains a concern.

Consequently, the final form of the optimization model is:

$$\begin{aligned} \max_{w} \quad & (1-\phi) \left\{ w^\top E(r) - \frac{1}{2} w^\top \left[ \mathrm{Var}(r) + E(r) E(r)^\top \right] w \right\} - \phi \sum_{i=1}^{n} \left[ w_i - \frac{(\bar r_i - r_f)/\sigma_i}{\sum_{j=1}^{n} (\bar r_j - r_f)/\sigma_j} \right]^2 \\ \text{s.t.} \quad & \sum_{i=1}^{n} w_i = 1. \end{aligned} \tag{5}$$

This adjustment retains the robustness benefits of shrinkage estimation while fundamentally changing the model's objective. By the definition of the Euclidean norm, the penalty term quantifies the distance between the full Kelly solution and the shrinkage target, reflecting how strongly risk perturbs the Kelly portfolio weights. In this way, the approach differs from traditional shrinkage estimation methods by genuinely incorporating risk minimization into the objective.
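The penalized objective of Equation (5) is straightforward to express in code. The sketch below is illustrative; `mu`, `Sigma`, `sigma`, and `rf` stand for the estimated mean vector, covariance matrix, per-asset volatilities, and risk-free rate, all assumed to be computed elsewhere.

```python
import numpy as np

def shrunk_kelly_objective(w, mu, Sigma, sigma, rf, phi):
    """Objective of Equation (5): (1-phi) * Kelly growth term - phi * ||w - w_target||^2."""
    growth = w @ mu - 0.5 * w @ (Sigma + np.outer(mu, mu)) @ w
    sharpe = (mu - rf) / sigma            # per-asset Sharpe ratios
    w_target = sharpe / sharpe.sum()      # normalized Sharpe ratios as shrinkage target
    penalty = np.sum((w - w_target) ** 2)
    return (1.0 - phi) * growth - phi * penalty
```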

2.3. Parameter Adjustments

In traditional shrinkage estimation problems, there are three principal criteria for choosing the shrinkage intensity $\phi$:

1) Minimization of Expected Quadratic Loss: This criterion minimizes the expected squared Frobenius norm between the estimated covariance matrix $\Sigma_{\text{shrink}}$ and the true covariance matrix $\Sigma$:

$$\phi^* = \arg\min_{\phi} \; E\left[ \left\| \Sigma_{\text{shrink}} - \Sigma \right\|_F^2 \right],$$

where $\Sigma_{\text{shrink}}$ is the shrunk covariance matrix and $\Sigma$ is the true covariance matrix.

2) Minimization of Portfolio Variance: This approach selects the shrinkage intensity $\phi$ that minimizes the variance of a given portfolio under the estimated covariance matrix $\Sigma_{\text{shrink}}$:

$$\phi^* = \arg\min_{\phi} \; w^\top \Sigma_{\text{shrink}}\, w.$$

3) Maximization of Portfolio Sharpe Ratio: This criterion maximizes the Sharpe ratio of the portfolio:

$$\phi^* = \arg\max_{\phi} \; \frac{E\left[ R_p(\phi) \right] - R_f}{\sqrt{V\left[ R_p(\phi) \right]}},$$

where $E[R_p(\phi)]$ is the expected portfolio return, $R_f$ is the risk-free rate, and $V[R_p(\phi)]$ is the portfolio return variance.

The Sharpe ratio measures the excess return per unit of risk and helps find a balance between risk control and return maximization. By considering both expected returns and volatility, this method effectively addresses market fluctuations and uncertainties, enabling investors to achieve optimal portfolio allocations that balance risk and reward. This approach is crucial in practical investment decisions for enhancing portfolio returns under controlled risk conditions.

Building on Equation (5), we adopt the criterion of maximizing the portfolio Sharpe ratio and formulate the following bi-level optimization problem:

$$\begin{aligned} \max_{\phi} \quad & \frac{\mu_0^\top w}{\sqrt{w^\top \bar\Sigma\, w}} \\ \text{s.t.} \quad & w = \arg\max_{\sum_i w_i = 1} \; (1-\phi) \left\{ w^\top E(r) - \frac{1}{2} w^\top \left[ \mathrm{Var}(r) + E(r) E(r)^\top \right] w \right\} - \phi \sum_{i=1}^{n} \left[ w_i - \frac{(\bar r_i - r_f)/\sigma_i}{\sum_{j=1}^{n} (\bar r_j - r_f)/\sigma_j} \right]^2, \end{aligned} \tag{6}$$

where $\bar\Sigma$ denotes the sample covariance matrix and $\mu_0$ the vector of expected asset returns.

As discussed above, the shrinkage intensity $\phi$ also determines how far the solution is perturbed away from the full Kelly weights. By employing the Sharpe ratio maximization criterion, which selects the scenario with the best risk-return trade-off among numerous worst-case scenarios, the optimization problem in Equation (6) provides a principled way to calibrate the conservatism of the penalized problem in Equation (5).

Since the objective function is a ratio and non-convex, the overall model is also non-convex, and numerical methods are required to find the optimal solution. Standard optimization techniques, such as gradient descent or quadratic programming, cannot guarantee a global optimum and are prone to getting stuck in local minima. In this paper, we therefore use the simulated annealing algorithm, which does not depend on convexity or gradient information of the objective function, making it effective for non-convex problems such as the one in this model. Equation (6) is a bi-level optimization model: after solving the inner problem for a candidate $\phi$, the corresponding outer objective value serves as the acceptance criterion for the simulated annealing algorithm.

Specifically, at each iteration of the simulated annealing algorithm, we first generate a new solution $\phi_{\text{new}}$ from the current $\phi_k$:

$$\phi_{\text{new}} = \phi_k + \epsilon, \quad \epsilon \sim U(-\delta, \delta),$$

where $\delta$ is a parameter that controls the magnitude of the perturbation, and the random perturbation $\epsilon$ is uniformly sampled from the interval $[-\delta, \delta]$. Next, we solve the inner optimization problem to obtain the new weight vector $w^*_{\text{new}}$, i.e.

$$w^*_{\text{new}} = \arg\max_{\sum_i w_i = 1} \; (1-\phi_{\text{new}}) \left\{ w^\top E(r) - \frac{1}{2} w^\top \left[ \mathrm{Var}(r) + E(r) E(r)^\top \right] w \right\} - \phi_{\text{new}} \sum_{i=1}^{n} \left[ w_i - \frac{(\bar r_i - r_f)/\sigma_i}{\sum_{j=1}^{n} (\bar r_j - r_f)/\sigma_j} \right]^2.$$

Then, we calculate the new objective function value $f(\phi_{\text{new}})$:

$$f(\phi_{\text{new}}) = \frac{\mu_0^\top w^*_{\text{new}}}{\sqrt{\left( w^*_{\text{new}} \right)^\top \bar\Sigma\, w^*_{\text{new}}}}.$$

If $f(\phi_{\text{new}}) > f(\phi_k)$, the new solution is accepted directly. If $f(\phi_{\text{new}}) \le f(\phi_k)$, the solution is accepted with a certain probability, given by:

$$P = \exp\left( \frac{f(\phi_{\text{new}}) - f(\phi_k)}{T_k} \right),$$

where $T_k$ is the current temperature, which gradually decreases. The temperature is updated as follows:

$$T_{k+1} = \alpha\, T_k,$$

where $\alpha = 0.9$ is the cooling rate, which adjusts the convergence speed and is typically chosen based on experience. Through this process, the simulated annealing algorithm effectively performs a global search within the solution space, avoiding local optima and thus finding the optimal or near-optimal solution.
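Putting the pieces together, the following sketch illustrates the bi-level procedure: an annealing loop over $\phi$, where each candidate is scored by solving the inner problem of Equation (5) and evaluating the outer Sharpe ratio of Equation (6). It is a schematic reconstruction under our own naming, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def anneal_phi(mu, Sigma, sigma, rf, *, T0=1.0, alpha=0.9, delta=0.1,
               n_iter=200, seed=0):
    """Simulated annealing over the shrinkage intensity phi in [0, 1] (Equation (6))."""
    rng = np.random.default_rng(seed)
    n = len(mu)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)

    def solve_inner(phi):
        # inner problem: maximize the penalized Kelly objective of Equation (5)
        sharpe = (mu - rf) / sigma
        w_target = sharpe / sharpe.sum()
        def neg_obj(w):
            growth = w @ mu - 0.5 * w @ (Sigma + np.outer(mu, mu)) @ w
            return -((1 - phi) * growth - phi * np.sum((w - w_target) ** 2))
        return minimize(neg_obj, np.full(n, 1.0 / n), constraints=cons).x

    def outer(w):
        # outer criterion: portfolio Sharpe ratio under the sample covariance
        return (mu @ w) / np.sqrt(w @ Sigma @ w)

    phi = 0.5
    f = outer(solve_inner(phi))
    T = T0
    for _ in range(n_iter):
        phi_new = np.clip(phi + rng.uniform(-delta, delta), 0.0, 1.0)
        f_new = outer(solve_inner(phi_new))
        # accept improvements always, worse moves with probability exp(df / T)
        if f_new > f or rng.random() < np.exp((f_new - f) / T):
            phi, f = phi_new, f_new
        T *= alpha  # geometric cooling schedule
    return phi, solve_inner(phi)
```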

3. Empirical Analysis

This section validates the performance of the proposed model using empirical data, detailing the steps and results of constructing an investment portfolio based on the improved Kelly criterion. First, it outlines the preliminary asset selection process and the data preparation required for applying Equation (6). Next, three classic portfolio models are introduced as benchmarks to comprehensively evaluate the weighting method derived from the improved Kelly criterion. Finally, the portfolio performance under different rebalancing frequencies is compared to analyze the strengths and weaknesses of this approach in both short-term and long-term investments.

3.1. Asset Selection

We utilized Akshare to obtain historical stock data from China’s major securities exchanges, covering the period from June 2014 to June 2024. The data includes daily opening and closing prices, highest and lowest prices, trading volume and amount, outstanding shares, and turnover ratio.

Initially, from the obtained set of 5312 stocks, we excluded ST stocks, stocks with less than five years of historical data, and those with more than 5% missing data over the past year. This screening yielded 2553 eligible stocks. Remaining missing values were filled by linear interpolation between the preceding and following prices.
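A pandas sketch of this screening step might look as follows; the column names (`code`, `name`, `date`, `close`) are assumptions, since the AKShare schema is not reproduced here.

```python
import pandas as pd

def screen_stocks(panel: pd.DataFrame) -> pd.DataFrame:
    """Filter the stock universe and interpolate missing prices.

    `panel` is assumed to hold daily rows with columns: code, name, date, close.
    """
    panel = panel[~panel["name"].str.contains("ST")]          # drop ST stocks

    history = panel.groupby("code")["date"].agg(["min", "max"])
    long_enough = history[(history["max"] - history["min"]).dt.days >= 5 * 365].index
    panel = panel[panel["code"].isin(long_enough)]            # >= 5 years of history

    last_year = panel[panel["date"] >= panel["date"].max() - pd.Timedelta(days=365)]
    missing = last_year.groupby("code")["close"].apply(lambda s: s.isna().mean())
    panel = panel[panel["code"].isin(missing[missing <= 0.05].index)]  # <= 5% missing

    # linear interpolation of remaining gaps using neighboring prices
    panel = panel.sort_values(["code", "date"])
    panel["close"] = panel.groupby("code")["close"].transform(
        lambda s: s.interpolate(method="linear"))
    return panel
```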

After data preparation and preprocessing, we need to cluster the stocks based on their correlations and select a smaller subset to construct the investment portfolio. First, we apply clustering based on the correlation between price series to differentiate A-share assets. According to Markowitz [1], investors should aim for cross-industry diversification when selecting stocks, as the covariance between companies in different industries tends to be lower. Our chosen clustering method focuses on selecting assets with minimal correlation, aligning with the principle of risk diversification in modern portfolio theory. Therefore, we classify the assets such that the correlation between different categories is minimized, while the correlation within the same category is maximized.

The current authoritative classification standard is the China Securities Market Industry Classification Standard (CSRC), established by the China Securities Regulatory Commission. It categorizes listed companies into several major sectors: Financials, Real Estate, Manufacturing, Information Technology, Energy, Consumer Goods, Transportation, Utilities, and Construction. However, as many companies now invest across multiple industries, these boundaries have become increasingly blurred. As a result, asset selection based solely on industry classification may lead to biases, lag effects, and certain limitations.

In contrast, the categories derived from clustering are based on the correlation of price series, treating correlation as the distance function. Assets with high correlation are grouped into the same cluster, while those with low correlation are separated into different clusters. This approach is entirely data-driven and not constrained by industry definitions, making the classification more reflective of the actual business scope of companies.

The following are the specific steps for the clustering calculation: Firstly, construct the correlation matrix:

$$\rho_{ij} = \frac{\mathrm{cov}(R_i, R_j)}{\sigma_i\, \sigma_j},$$

where $\mathrm{cov}(R_i, R_j)$ is the covariance of the return series of assets $i$ and $j$, and $\sigma_i$ and $\sigma_j$ are the standard deviations of the return series of assets $i$ and $j$, respectively.

Next, we adopt a hierarchical clustering method, which provides strong interpretability, to analyze the correlation patterns among assets. A distance matrix derived from the correlation matrix is used to construct a dendrogram. According to Reilly and Brown [26], around 90% of the maximum diversification benefit is achieved with portfolios of 12 to 18 stocks. Therefore, we set a cutoff criterion of 15 clusters for our analysis.
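A sketch of this clustering step with scipy is shown below; the distance transform $d_{ij} = 1 - \rho_{ij}$ and the average-linkage rule are common choices and our own assumptions, as the paper does not specify the exact transform or linkage.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_assets(returns: pd.DataFrame, n_clusters: int = 15) -> pd.Series:
    """Hierarchically cluster assets on return correlations into n_clusters groups."""
    corr = returns.corr()                      # rho_ij = cov(R_i, R_j) / (sigma_i * sigma_j)
    dist = 1.0 - corr                          # correlation-based distance (one common choice)
    condensed = squareform(dist.values, checks=False)  # condensed distance vector
    Z = linkage(condensed, method="average")   # build the dendrogram
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=returns.columns, name="cluster")
```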

Figure 1. Cluster compositions on 2024-05-01 compared with CSRC sectors.

To examine the composition of the clusters, we compare them with sectors defined by the China Securities Market Industry Classification Standard (CSRC). Each cluster comprises assets with similar correlations, while the correlations between different clusters remain minimal, as illustrated in Figure 1. This correlation-based clustering approach diverges from traditional industry classifications.

We observe that several clusters significantly overlap with CSRC sectors, such as Group 6 (Wholesale and Retail), Group 8 (Information Transmission, Software, and Information Technology Services), and Group 13 (Water Conservancy, Environment, and Public Facilities Management). However, the manufacturing sector classified by CSRC is notably scattered across various clusters. This dispersion is primarily due to over 70% of the more than 5000 assets in the A-share market being classified under manufacturing, highlighting a limitation in classification standards based solely on business scope, which fail to precisely define enterprise operations.

Moreover, there is considerable internal variation within clusters. For instance, Group 4 predominantly consists of real estate companies but also includes businesses from related sectors, such as cement manufacturing, which is an upstream industry for construction. Similarly, Group 11 includes pharmaceutical firms like Jiangzhong Pharmaceutical, biotech companies like Nanhua Biotech, enterprises in health and social work like Aier Eye Hospital, and wholesale retail companies like China National Pharmaceutical. These classifications are logically coherent, as the fluctuations in construction-related assets are highly correlated with building material prices, and pharmaceutical sales are closely linked to the volatility of drug manufacturing prices.

These examples clearly demonstrate that the CSRC industry classification standards cannot accurately categorize assets based on their relevant business activities. In contrast, data-driven clustering algorithms can effectively identify stock groups based on correlation and similarity, which addresses this shortcoming. This characteristic not only allows for more accurate classification of companies but also provides the advantage of discovering “less-known potential stocks” whose price movements resemble those of major firms. Such alternatives can replace “popular” stocks (like Moutai), which are often overheld and, consequently, overpriced. Identifying substitutes can be highly beneficial in actual investment scenarios.

After obtaining the clustering results, we need to select representative assets from each cluster to construct the portfolio. Common criteria for selecting assets include volatility and the Sharpe ratio. Ideally, if returns were perfectly aligned with the clustering results, we would select the stock with the lowest volatility. However, real-world data often deviates from this ideal, making the selection criteria crucial. We use the Sharpe ratio instead of volatility alone for two main reasons:

1) The Sharpe ratio measures return relative to risk, providing a clearer view of excess return per unit of risk, which better aligns with our optimization goals.

2) Focusing solely on volatility without considering returns may not achieve the optimal risk-return balance. The Sharpe ratio offers a simple way to compare risk-adjusted returns, making it more practical for investment decisions.

For these reasons, we select from each cluster the asset with the highest Sharpe ratio, yielding a portfolio of 15 assets.
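Continuing the sketch above, the per-cluster selection can be expressed as follows (names are ours):

```python
import pandas as pd

def select_representatives(returns: pd.DataFrame, clusters: pd.Series,
                           rf_daily: float = 0.0) -> list[str]:
    """Pick the asset with the highest Sharpe ratio from each cluster."""
    sharpe = (returns.mean() - rf_daily) / returns.std()
    picks = sharpe.groupby(clusters).idxmax()   # best asset per cluster label
    return sorted(picks)
```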

3.2. Return Forecasting

According to our optimization model Equation (6), accurate forecasting of future asset returns is required to estimate the expected return inputs $E(r)$ and $\bar r_i$. This study employs the Long Short-Term Memory (LSTM) neural network architecture. Compared to traditional Recurrent Neural Networks (RNNs), LSTM networks utilize gating mechanisms that better capture long-term dependencies and effectively mitigate the vanishing gradient problem. LSTM networks introduce memory cells, replacing the artificial neurons in traditional neural network hidden layers. These memory cells enable the network to retain information efficiently and handle long-range inputs, demonstrating strong predictive capabilities in dynamically evolving data structures.

We divided the dataset into 70% for training and 30% for testing. Using the LSTM network, we predicted future return trends for 90 days, 180 days, and 360 days. The results are as follows.
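A minimal Keras sketch of such a forecasting setup is given below; the window length, layer sizes, and training settings are illustrative assumptions, as the paper does not report its hyperparameters.

```python
import numpy as np
from tensorflow import keras

def make_windows(series: np.ndarray, lookback: int = 60):
    """Turn a 1-D series into (samples, lookback, 1) windows and next-step targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., None], y

def build_lstm(lookback: int = 60) -> keras.Model:
    model = keras.Sequential([
        keras.layers.Input(shape=(lookback, 1)),
        keras.layers.LSTM(64),     # memory cells capture long-range dependencies
        keras.layers.Dense(1),     # next-step forecast
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# 70/30 train-test split, as in the paper (series is a placeholder input):
# X, y = make_windows(series)
# split = int(0.7 * len(X))
# model = build_lstm()
# model.fit(X[:split], y[:split], epochs=50, batch_size=32,
#           validation_data=(X[split:], y[split:]))
```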

Figure 2. Forecast 180-day stock return volatility.

Figure 3. Comparison of LSTM-based forecasts and actual price trends.

Figure 2 shows a 180-day price prediction simulation for one of the assets. Figure 3 compares the actual values with the predicted values, where we can observe that under the LSTM network, the predicted trend closely aligns with the actual trend. The fluctuations are similar, with both series oscillating within the same range. Based on numerical experiment results, over the 360-day prediction interval the algorithm achieves a match rate of 70% or higher between predicted and actual closing prices. The accuracy is even higher for the 180-day and 90-day predictions. This is logical, as shorter prediction intervals tend to be more accurate given the same training set size. Since portfolios often require periodic monitoring and adjustment, a one-year prediction horizon is suitable for most portfolio construction strategies.

Figure 4. Training cycle loss function.

Figure 4 shows how the loss function for one asset changes with training epochs. As the number of epochs increases, the loss gradually decreases and stabilizes around 0.21. The training loss for the other assets also remains below 0.3, indicating that the model effectively learns the sequence patterns without overfitting.

3.3. Portfolio Backtesting

We construct a portfolio using the selected 15 assets and calculate the corresponding weights based on the proposed optimized Kelly criterion Equation (6). For benchmarking, the results are compared against three classic weighting methods: the mean-variance approach without short-selling, the full Kelly criterion, and the risk parity strategy, which is one of the most popular approaches among portfolio managers.

Table 1 summarizes the performance and risk characteristics of the four investment portfolios constructed using different methodologies over a five-year period. Notably, the optimized Kelly criterion achieved an impressive annualized return of 22.86%, even in a market environment where the other three strategies delivered negative returns. However, it is important to note that this reported return of 22.86% may be subject to biases such as data mining bias and overfitting, especially in the context of backtesting. To mitigate these concerns, we performed out-of-sample testing using a rolling-window approach, recalibrating portfolio weights every 90, 180, and 360 days. This method helps ensure that the reported returns are not overly reliant on past performance and reflect the robustness of the strategy across different market conditions.

Table 1. Portfolio performance indicators.

| Statistical Indicators | Optimized Kelly | Full Kelly | Risk Parity | Mean-Variance |
| --- | --- | --- | --- | --- |
| Sharpe Ratio | 1.15 | 0.97 | 1.31 | 1.35 |
| Annualized Volatility | 6.35% | 4.50% | 2.54% | 2.86% |
| Annualized Return | 22.86% | −21.44% | −11.13% | −10.83% |
| Wealth Growth Multiple | 1.23 | 0.79 | 0.89 | 0.89 |
| Maximum Drawdown | 125.82% | 129.98% | 44.96% | 67.28% |
| Average Gain/Average Loss | 1.07 | 0.96 | 1.13 | 1.11 |
| Standard Deviation of Returns | 0.03 | 0.11 | 0.03 | 0.03 |
| Skewness | 15.82 | −2.15 | 13.11 | −5.80 |
| Kurtosis | 375.24 | 530.07 | 280.51 | 352.69 |
| Maximum Consecutive Wins | 7 | 13 | 7 | 10 |
| Maximum Consecutive Losses | 12 | 26 | 10 | 12 |

Despite its strong theoretical foundation, the Full Kelly strategy can incur significant losses during volatile market conditions, limiting its practical applicability. This is evident in its annualized return of −21.44%, its maximum drawdown of 129.98%, and its streak of 26 consecutive losses. On the other hand, the risk-parity and mean-variance approaches offer a more balanced risk-return framework, focusing on minimizing risk while aiming for stable portfolio returns. However, their annualized returns of −11.13% and −10.83%, respectively, highlight a trade-off between stability and the potential for profitability.

The price trends of the portfolios under the four strategies, as shown in Figure 5, further illustrate that the risk-parity strategy is relatively the most stable. In contrast, the Full Kelly strategy experienced the steepest decline, with a much sharper slope than the other strategies, especially during the significant market downturn from August 2020 to June 2021. Although the optimized Kelly strategy also exhibited slightly higher volatility and drawdown, it demonstrated a substantial improvement in risk control compared to the Full Kelly approach.

Moreover, the optimized Kelly strategy consistently seized market opportunities during periods of growth, outperforming the other strategies in terms of price acceleration. For instance, during the market rallies in April 2020, July 2021, and May 2022, the optimized Kelly portfolio experienced rapid upward movement, even surpassing the Full Kelly strategy in terms of speed. Overall, these results indicate that the optimized Kelly strategy effectively balances risk mitigation and return generation, closely aligning with the objectives of our model optimization.

Figure 5. Portfolio price trends under four strategies from 2019 to 2023.

To further demonstrate that the advantages of our portfolio construction method extend beyond simple weight calculations, we selected the CSI 500 and CSI 300 ETFs (stock codes 510500 and 510330) as benchmarks for comparison. These two ETFs are among the most representative in China, designed to track the CSI 500 and CSI 300 stock market indices, both of which are diversified across industries. In addition, we perform robustness checks by comparing our strategy's performance to the benchmarks over different market periods, namely the pre-pandemic, during-pandemic, and post-pandemic phases. This helps ensure the stability and generalizability of the proposed approach. Figure 6 illustrates the changes in return values of these portfolios.

Figure 6. Comparison of the optimized Kelly strategy returns with the CSI 500 and CSI 300 ETFs.

Focusing on the onset of the pandemic (December 31, 2019), we observe that in the initial phase around April 2020, panic induced by lockdown measures led to a rapid market decline. However, unprecedented stimulus measures from governments and central banks, coupled with strong performance in technology stocks, drove the market into a sustained upward trend. During this period, the price of the optimized Kelly strategy surged from a low of approximately 10 to a peak of 18. After August 2020, market adjustments due to policy changes and concerns about overheating and bubble risks resulted in price declines. While both market indices remained relatively stable during this period, the optimized Kelly strategy experienced a rapid drop, leading to cumulative returns that fell below those of the market. This underscores that the foundational goal of our proposed strategy is to pursue high returns. Although it mitigates risk, it cannot completely avoid the impact of downturns. However, as vaccinations rolled out and the market began to recover, the optimized Kelly strategy again exhibited rapid growth.

Performance data from the onset of the pandemic to the present reveals that, although the trends of increases and decreases are generally similar across all strategies, the optimized Kelly strategy stands out in its ability to seize opportunities, leading to a more pronounced upward trajectory. Additionally, it effectively mitigates risk during downturns, further enhancing its long-term performance.

The above results compare the overall performance of the portfolio over the entire period. To address potential overfitting concerns, we further conducted out-of-sample tests using different market periods and recalibrated asset weights every 90, 180, and 360 days. The results indicate that the optimized Kelly strategy maintains its competitive edge over the long term, even when applied to different time windows.

To facilitate a more comprehensive comparison, we employ a rolling window method for analysis. Specifically, we recalibrate asset weights every 90 days, 180 days, and 360 days based on historical performance data. To emphasize the performance of portfolio construction methods over different time lengths, our comparison does not involve reselecting assets but rather adjusting their proportions. We focus on a representative and volatile period from December 31, 2019, to December 31, 2021, which encompasses the onset and conclusion of the pandemic.
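The rolling recalibration can be sketched as follows; `daily_returns` (a $T \times n$ return table) and `compute_weights` (a routine solving Equation (6), e.g., the annealing sketch in Section 2.3) are assumed inputs.

```python
import pandas as pd

def rolling_backtest(daily_returns: pd.DataFrame, window: int,
                     compute_weights) -> pd.Series:
    """Recalibrate weights every `window` days from trailing data, then hold them."""
    chunks = []
    for start in range(window, len(daily_returns), window):
        trailing = daily_returns.iloc[start - window:start]   # calibration data
        w = compute_weights(trailing)                         # e.g., solve Equation (6)
        holding = daily_returns.iloc[start:start + window]    # out-of-sample slice
        chunks.append(holding @ w)
    return pd.concat(chunks)

# usage: cumulative wealth under 90-, 180-, and 360-day recalibration
# for window in (90, 180, 360):
#     pnl = rolling_backtest(daily_returns, window, compute_weights)
#     wealth = (1 + pnl).cumprod()
```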

Figure 7. Portfolio price trends of three strategies from 2019 to 2021.

Figure 7 illustrates the cumulative returns of the strategy under the three rebalancing windows over this two-year period. The three variants show relatively similar performance in the early stages and in their final cumulative values, but differ significantly during the intermediate period. This is primarily because each recalibration of weights takes into account the performance of the previous window. Specifically, the rolling returns for the 90-day window decreased significantly between March and June 2020, largely due to the high volatility observed from January to March, which increased the predictive uncertainty for subsequent periods and led to a greater penalty in the model, i.e., tighter risk constraints. Similarly, the 180-day window showed significantly lower return volatility from June to December 2020 compared to the previous period. In contrast, the performance of the 360-day rolling strategy remained unaffected by such constraints.

Another noteworthy period is the rapid growth phase from July to September 2021, where the data indicate that shorter windows exhibit higher growth rates. This is logical: short-window adjustment strategies are more sensitive to large fluctuations, and because volatility was stable before July, the risk constraints for the new period were looser, enabling quicker gains when opportunities for rapid growth arose, even surpassing the 360-day adjustment strategy.

However, during the growth phase from March to August 2020, the short-window strategy did not demonstrate the advantage of rapid growth due to its previous high volatility. Regardless of how frequently the portfolio weights are adjusted, a long-term hold of the optimized Kelly portfolio consistently yields returns above the market. In stable market conditions, frequent rebalancing can increase both the potential for profits and losses, as risk constraints are lower; conversely, during significant market fluctuations, frequent adjustments can hasten the return to stability, but while increasing risk constraints to minimize losses, they may also limit potential gains.

Overall, these additional tests demonstrate that the optimized Kelly strategy is robust across various time windows and market conditions, and its performance holds up under different sample periods. The out-of-sample tests help mitigate the concerns of overfitting and provide stronger evidence for its real-world applicability.

4. Conclusions

This paper aims to control the risk associated with constructing investment portfolios based on the Kelly criterion. We achieve this by incorporating a penalty term through the concept of shrinkage estimation and formulating the resulting model as a bi-level optimization problem. Additionally, during the asset selection phase, we cluster assets based on the correlations among their return series. We provide a rigorous analysis of the construction methodology and offer practical guidance derived from theoretical results. Numerical experiments demonstrate that the optimized Kelly portfolio performs well over the long term compared to market benchmarks and other portfolio strategies. Furthermore, under our asset selection approach, portfolios calculated using three different weighting criteria exhibit a significant advantage over market indices. This indicates that our portfolio construction method excels in both asset selection and weight calculation.

Our work can be extended in several directions. One avenue is to further adjust the penalty term by considering other factors in the investment process, such as turnover rate and transaction costs. This could involve integrating multiple factors, for example:

$$\text{Penalty} = \phi_1 \left\| w - w_{\text{target}} \right\|_2^2 + \phi_2\, \text{TransactionCost}(w) + \phi_3\, \text{TurnoverRate}(w).$$

Alternatively, we could employ adaptive algorithms from machine learning to adjust the constraints. This approach would allow the model to comprehensively consider various factors in investment, rather than just price volatility. In situations involving short-term holdings or frequent trading, this could reduce the lagging effects of price fluctuations and better leverage the advantages of the Kelly criterion.

Another direction is to refine the price prediction methods. Although the accuracy of predictions is relatively high, the precision of daily price forecasts is lacking. Improving daily price prediction accuracy may require using point-in-time data instead of just daily data. From a modeling perspective, we could change the utilization of predicted values by opting not to rely solely on individual predictions, but instead focusing on performance metrics over a period, such as maximum drawdown or volatility.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Markowitz, H. (1952) Portfolio Selection. The Journal of Finance, 7, 77-91.
https://doi.org/10.1111/j.1540-6261.1952.tb01525.x
[2] Roll, R. (1973) Evidence on the “Growth-Optimum” Model. The Journal of Finance, 28, 551-566.
[3] Kelly, J.L. (1956) A New Interpretation of Information Rate. Bell System Technical Journal, 35, 917-926.
https://doi.org/10.1002/j.1538-7305.1956.tb03809.x
[4] Latané, H.A. (1959) Criteria for Choice among Risky Ventures. Journal of Political Economy, 67, 144-155.
[5] Hakansson, N.H. and Miller, B.L. (1975) Compound-Return Mean-Variance Efficient Portfolios Never Risk Ruin. Management Science, 22, 391-400.
[6] Breiman, L. (1961) Optimal Gambling Systems for Favorable Games. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 65-78.
[7] Algoet, P.H. and Cover, T.M. (1988) Asymptotic Optimality and Asymptotic Equipartition Properties of Log-Optimum Investment. The Annals of Probability, 16, 876-898.
[8] Ziemba, W.T. and Hausch, D.B. (1986) The Effect of Risk Aversion on Portfolio Performance. The Journal of Portfolio Management, 12, 25-30.
[9] Michaud, R.O. (1989) The Markowitz Optimization Enigma: Is ‘Optimized’ Optimal? Financial Analysts Journal, 45, 31-42.
https://doi.org/10.2469/faj.v45.n1.31
[10] Xidonas, P., Kourentzes, N. and Psarakis, S. (2017) Forecasting Stock Market Indices Using Support Vector Regression and Ensemble Learning. Expert Systems with Applications, 88, 233-247.
[11] Green, R.C. and Hollifield, B. (1992) The Effect of Market Frictions on Portfolio Optimization. Journal of Financial and Quantitative Analysis, 27, 397-420.
[12] Shen, Y., Harris, N.C., Skirlo, S., Prabhu, M., Baehr-Jones, T., Hochberg, M., et al. (2017) Deep Learning with Coherent Nanophotonic Circuits. Nature Photonics, 11, 441-446.
https://doi.org/10.1038/nphoton.2017.93
[13] DeMiguel, V., Garlappi, L. and Uppal, R. (2009) Optimal versus Naive Diversification: How Inefficient Is the 1/N Portfolio Strategy? The Review of Financial Studies, 22, 1915-1953.
https://doi.org/10.1093/rfs/hhm075
[14] Kritzman, M. (2010) Risk Parity and Risk Budgeting: How to Create a Better Portfolio. The Journal of Portfolio Management, 36, 37-49.
[15] Ledoit, O. and Wolf, M. (2004) Honey, I Shrunk the Sample Covariance Matrix. The Journal of Portfolio Management, 30, 110-119.
[16] Ledoit, O. and Wolf, M. (2004) A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices. Journal of Multivariate Analysis, 88, 365-411.
https://doi.org/10.1016/s0047-259x(03)00096-4
[17] Kan, R. and Zhou, G. (2007) Optimal Portfolio Choice with Parameter Uncertainty. Journal of Financial and Quantitative Analysis, 42, 621-656.
https://doi.org/10.1017/s0022109000004129
[18] DeMiguel, V., Martín-Utrera, A. and Nogales, F.J. (2013) Size Matters: Optimal Calibration of Shrinkage Estimators for Portfolio Selection. Journal of Banking & Finance, 37, 3018-3034.
[19] Kirkpatrick, S., Gelatt, C.D. and Vecchi, M.P. (1983) Optimization by Simulated Annealing. Science, 220, 671-680.
https://doi.org/10.1126/science.220.4598.671
[20] Thorp, E.O. (1992) The Invention of the Options Market. The Journal of Portfolio Management, 18, 14-20.
[21] MacLean, L.C. (2004) The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market. The Journal of Portfolio Management, 30, 13-24.
[22] Samuelson, P.A. (1970) The Fundamental Approximation Theorem of Portfolio Analysis in Terms of Means, Variances and Higher Moments. The Review of Economic Studies, 37, 537-542.
[23] Andersen, T.G. and Benzoni, L. (2010) The Intertemporal CAPM and the Term Structure of Interest Rates. The Review of Financial Studies, 23, 719-752.
[24] Browne, M.W. (1996) A Survey of Factor Analytic Methods. Springer.
[25] Frost, P.A. and Savarino, J.S. (1986) An Empirical Bayes Approach to the Optimal Portfolio Problem. The Journal of Portfolio Management, 12, 27-33.
[26] Reilly, F.K. and Brown, K.C. (2012) Investment Analysis and Portfolio Management. 10th Edition, Cengage Learning, Mason.
