Design of Cross-Product Arbitrage Strategy in Forward Market

All investors are speculators. They profit by buying an asset and selling it at a higher price, or by shorting an asset and buying it back at a lower price. This is the fundamental concept of arbitrage. Although it sounds simple, arbitrage does not always work. Researchers have therefore developed systematic statistical arbitrage approaches. In this article, we form pair-trading portfolios using cointegration analysis. The objects of study are egg, corn, and soybean meal futures in China's futures market. In the formation stage of the strategy, we establish a cointegration relationship for each of the three pairs, namely the egg-corn pair, the egg-soybean meal pair, and the corn-soybean meal pair. In the back-test, both the egg-corn pair and the egg-soybean meal pair are profitable.


Introduction
A futures contract allows traders to lock in the price of an underlying commodity or asset. Futures contracts have lower transaction costs and a more flexible operating system, and they are typically highly leveraged, which magnifies both potential profits and risks and attracts many speculators and institutional investors. Three major exchanges in China were established in the 1990s: Zhengzhou Commodity Exchange (ZCE) in 1990, Dalian Commodity Exchange (DCE) in 1993, and Shanghai Futures Exchange (SHFE) in 1999. Specifically, SHFE mostly trades metals and energy, while ZCE and DCE mainly trade agricultural commodities. One feature of China's commodity futures market is that a large proportion of trade occurs domestically, even though China is one of the largest commodity importers in the world. In addition, to curb excessive speculation and prevent distortions in the spot market, the Chinese government implements stringent price limits and position limits, according to Ao, J. & Chen, J. (2020). As a result of frequent government intervention, the development of the futures market has been sluggish. In our research, we found that the egg has great market potential, as demand in China is large and the egg market is not overly saturated. There are roughly 3.95 billion laying hens in the world, and China accounts for 35% of them, according to Wu, Y.F. & Fu, Q. (2018). The growing number of specialized, large-scale hen breeders points to a promising future for the egg market.
There were approximately 67,500 firms related to egg production in China in 2022. Meanwhile, egg prices fluctuate markedly, since eggs are easily affected by climate, pandemics, and similar shocks. Wu, Y.F. & Fu, Q. (2018) used descriptive statistics and BP models to analyze the price fluctuation of eggs in both the long run and the short run. This high volatility makes breeders' profits uncertain, and given the increasing number of participants in the egg industry, we expect the number of traders in egg futures contracts to grow as they seek to avoid risk. Against this background of the egg spot and futures markets, we use corn and soybean meal, two highly related products, to form arbitrage pairs. In egg production, chicken fodder alone accounts for around 60%-70% of the cost. Most importantly, corn and soybean meal are the essential ingredients: corn makes up about 64% of the fodder and soybean meal around 26%, according to Kaiqi, Z., Ronghua, J. & Zhinan, L. (2019).
The figure below shows the total futures turnover of three agricultural futures, namely egg, soybean meal, and corn (Figure 1). The turnover data cover 2013 to 2022 and were collected from CSMAR. Soybean meal had the greatest turnover and the most volatile trend. It peaked around 2016 with turnover of around 8.0e. In contrast, egg and corn futures were relatively unpopular markets, where the turnover for corn fluctuated between 2.0e. Moreover, since egg futures were listed in 2013, their turnover has been moving between 1.0e. Nevertheless, the turnover of all three futures hit bottom in the first two quarters of 2022.
Following the pair-trading work of Gatev et al. (2006), whose approach is frequently implemented in financial markets, we look for pairs that exhibit mean reversion. In other words, the spread between two prices that used to move together will eventually return to its average over the entire data set. On the premise of this property, we adopt cointegration and time series analysis to form arbitrage strategies and empirical models. However, economic time series are often non-stationary stochastic processes, which seems to be the nature of economics. Nevertheless, the cointegration method lets us test whether a combination of such series is stationary. In short, there are two steps. First, construct a linear function for each pair and run a regression on the time series. Second, use the ADF test to test the stationarity of the estimated residuals. The fundamental trick is to construct a stationary linear combination from two non-stationary time series. Cointegration, which allows the linear combination of two non-stationary time series to be traded as one asset, was first introduced by Engle and Granger (1987). Cointegration is applied in many areas of study; for instance, Yang, J., Li, Z. & Wang, T. (2021) use cointegration to study the price discovery function of the futures market. In this article, we extracted data at two trading frequencies from the Wind database: five-minute high-frequency data and daily data. The results indicate that three pairs, namely the egg-corn pair, the egg-soybean meal pair, and the corn-soybean meal pair, have a cointegration relationship. Once the cointegrated pairs are found, the important task is to determine the estimated hedge ratio and the parameters for entering and exiting positions. Using different indexes and JoinQuant, a quantitative back-test platform, we back-tested the data and obtained different profitable results at the two frequencies.
The contribution of this paper is as follows. Despite its prosperity, the commodity futures market in China is still unsophisticated compared with Western countries. Only a handful of studies have investigated these three products through cointegration and time series analysis with five-minute high-frequency data, which gives us different results from daily data in the same period.
The rest of the article is arranged as follows.
Section 2 describes the design of the arbitrage strategy step by step, from the examination of the cointegration relationship to the settings of the back-test.
Section 3 presents the statistical results and the most important indexes of the back-test. Section 4 summarizes the back-test results and evaluates the main contribution and innovation of this paper.

Methodology
In this paper, we investigate the two-by-two arbitrage combinations of egg, soybean meal, and corn, as soybean meal and corn are the two most important ingredients in chicken fodder and account for the major costs of egg production. Stübinger, J. & Bredthauer, J. (2017) construct statistical arbitrage strategies based on different approaches; our core strategy, however, applies statistical arbitrage pair trading to data of different frequencies. In statistical arbitrage, the arbitrage opportunity arises from market inefficiency. According to Gatev et al. (2006), pair trading is a quantitative arbitrage strategy based on two steps. First, find futures whose prices have historically moved together. Then, when the prices diverge, go long the undervalued future and short the overvalued one to form a hedged arbitrage portfolio. Shen, L., Shen, K., Yi, C., & Chen, Y. (2020) explain and demonstrate this thoroughly. We collected Dalian Commodity Exchange data from the Wind database. Specifically, we selected the five-minute closing prices of the three commodities from 12/31/2021 9:05 to 5/31/2022 14:59 as the first test sample and their daily closing prices from 12/31/2021 to 5/31/2022 as the second test sample. The data are used to test the cointegration relationship between two commodities; if the relationship exists, we use JoinQuant, a quantitative trading platform, to back-test the pairs over the same period.

Unit Root Test
Before examining the cointegration relationship, we assume that both series in each pair are non-stationary and integrated of order 1. Suppose that two time series $X_t$ and $Y_t$ are integrated of order 1, denoted I(1). A cointegration relationship exists only if the two series can be combined into a stationary linear function $z_t = Y_t - \alpha X_t$. Hence, we need to examine whether the time series in each pair are I(1).
To find the cointegration relationship, we first need to form an arbitrage combination.
The variables used in the combination are listed as follows (Table 1).
In this paper, we first analyze the correlation of the trading assets using STATA. Then, for each variable, we apply the Augmented Dickey-Fuller (ADF) test. If the null hypothesis is rejected, there is no unit root, and the time series is stationary.
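As a rough illustration of the unit-root check described above, the sketch below computes a simplified Dickey-Fuller t-statistic (with a constant but without augmentation lags, so it is not the full ADF test run in STATA) on a simulated random walk and on its first difference. The function names and the simulated series are assumptions made for illustration only. A strongly negative statistic on the differenced series, but not on the levels, is the pattern expected of an I(1) series.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic of rho in: diff_t = a + rho * level_{t-1} + error_t."""
    x = series[:-1]                                        # lagged levels
    y = [b - a for a, b in zip(series[:-1], series[1:])]   # first differences
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    rho = sxy / sxx
    intercept = mean_y - rho * mean_x
    rss = sum((yi - intercept - rho * xi) ** 2 for xi, yi in zip(x, y))
    se_rho = math.sqrt(rss / (n - 2) / sxx)
    # Compare with Dickey-Fuller critical values, not Student-t ones.
    return rho / se_rho


def difference(series):
    """First difference of a series."""
    return [b - a for a, b in zip(series[:-1], series[1:])]


# Simulated random walk (hypothetical data, I(1) by construction).
random.seed(0)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + random.gauss(0, 1))

t_level = dickey_fuller_t(walk)
t_diff = dickey_fuller_t(difference(walk))
print(round(t_level, 2), round(t_diff, 2))
```

In practice a full ADF implementation with lag selection would be used; the sketch only shows which regression the test statistic comes from.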
In this paper, our cointegration models are as follows:

Cointegration Test
The cointegration relationship is the prerequisite for us to find the ratio of pairs trading and the spread series of two commodities.
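The two-step logic behind this prerequisite (regress one price on the other, then test the residuals for stationarity) can be sketched as follows. This is a minimal illustration on simulated prices, not the paper's models or data: the helper names `ols` and `residual_df_t` are hypothetical, and the residual test is a plain Dickey-Fuller regression without lags rather than the ADF test used in the paper.

```python
import math
import random


def ols(x, y):
    """Return intercept and slope of the regression y = c + beta * x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def residual_df_t(e):
    """t-statistic of rho in: diff(e)_t = rho * e_{t-1} + u_t (no constant, no lags)."""
    lag = e[:-1]
    diff = [b - a for a, b in zip(e[:-1], e[1:])]
    s_ll = sum(v * v for v in lag)
    rho = sum(l * d for l, d in zip(lag, diff)) / s_ll
    rss = sum((d - rho * l) ** 2 for l, d in zip(lag, diff))
    return rho / math.sqrt(rss / (len(lag) - 1) / s_ll)


# Simulated (hypothetical) prices: x is a random walk, y is tied to x plus noise.
random.seed(1)
x = [100.0]
for _ in range(499):
    x.append(x[-1] + random.gauss(0, 1))
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

c, beta = ols(x, y)                      # step 1: hedge ratio from OLS
residuals = [yi - c - beta * xi for xi, yi in zip(x, y)]
t_resid = residual_df_t(residuals)       # step 2: unit-root statistic on residuals
print(round(beta, 3), round(t_resid, 2))
```

The slope `beta` plays the role of the trading ratio, and a strongly negative `t_resid` is what a stationary residual series would produce; residual-based tests need their own critical values, which are not reproduced here.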
The mean spread series is defined as:

JoinQuant
To back-test our cointegrated futures pairs, we need to find the trading ratio. We did this on the basis of OLS regression, which provides the constant value of the β term that indicates the ratio. Based on the actual ratio of fodder to egg in egg-laying chicken breeding enterprises, we found that 2.1 kg of fodder produces approximately 1 kg of egg. Thus, given the composition of fodder mentioned previously, we concluded that for each 500 kg of egg, a breeding enterprise needs 315 kg of soybean meal and 735 kg of corn. Given that information, we established a parameter to adjust the trading ratio so that it corresponds to the actual laying hen industry. Eventually, our trading ratios are 50:24 for egg and soybean meal, and 16:40 for egg and corn. We do not investigate the corn-soybean meal pair, as our priority is to form a strategy based on egg.
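The fodder quantities quoted above can be checked with a few lines of arithmetic. The 2.1 kg conversion rate and the 500 kg egg base come from the text; the 70:30 corn-to-soybean-meal split is an inference made here because it reproduces the 735 kg and 315 kg figures (the 64% and 26% shares quoted earlier rescale to roughly 71:29, which would give about 747 kg and 303 kg). The text does not say how the final 50:24 and 16:40 trading ratios follow from these quantities, so that step is not reproduced.

```python
FEED_PER_KG_EGG = 2.1   # kg of fodder per kg of egg (from the text)
EGG_KG = 500            # egg quantity used in the text

# Assumption: a rounded 70:30 corn / soybean meal split of the fodder.
CORN_SHARE = 0.70
MEAL_SHARE = 0.30

fodder_kg = FEED_PER_KG_EGG * EGG_KG
corn_kg = fodder_kg * CORN_SHARE
meal_kg = fodder_kg * MEAL_SHARE

print(round(fodder_kg), round(corn_kg), round(meal_kg))
```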
On JoinQuant, we need several basic indexes to set up a back-test, including the commission charge for buying, selling, and closing, the initial margin, and the initial capital (Table 2).
In addition to the basic indexes, we need to set up a series of signals for opening a position, closing a position, and stopping loss. We multiplied the Z-value of the normal distribution (denoted as Q) by the standard deviation σ of Mspread to obtain the threshold for trading signals. In a previous study, Vidyamurthy (2004) suggested that the optimal threshold is 0.75σ for opening a position, 2σ for closing a position, and 0 for stop-loss. Gatev et al. (2006), in turn, set a fixed threshold of two unconditional standard deviations of the sample spread. However, we noticed that in our work most historical spreads either do not reach 2 standard deviations or show low profitability at Q = 0.75. Therefore, we made hundreds of adjustments to settle the range of the Q value and the close-position and stop-loss signals for each pair. The Q value ranges between 0.45 and 0.75, and the close-position and stop-loss signals for each pair are listed in Table 3 and Table 4.
(Table 5). (Table 6).
The long-run trend of the three markets is similar despite the different levels of fluctuation. The graph shows that BC fluctuates the most, followed by EC, and CC fluctuates the least. Corn accounts for the greatest proportion of cost in egg production, and therefore EC and CC are the most strongly correlated, at 0.908. CC and BC, two highly related agricultural products, share a relatively strong correlation of 0.768. Finally, EC and BC have a correlation of 0.606.
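The pairwise correlations discussed above are Pearson coefficients between closing-price series. A minimal sketch of how such a correlation matrix can be computed is given below; the labels EC, BC, and CC follow the text, but the price values are hypothetical placeholders and not the paper's data, so the printed numbers will not match the reported ones.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical closing prices, NOT the paper's data.
prices = {
    "EC": [4100, 4150, 4230, 4180, 4300, 4350],
    "BC": [3200, 3350, 3300, 3500, 3450, 3600],
    "CC": [2700, 2720, 2760, 2750, 2800, 2830],
}
names = list(prices)
matrix = {(p, q): pearson(prices[p], prices[q]) for p in names for q in names}
for p in names:
    print(p, [round(matrix[(p, q)], 3) for q in names])
```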

2) Unit root test
After establishing the possibility of mean reversion and arbitrage, we use the ADF test to check the stationarity of our time series and avoid spurious regression.
The ADF test (Table 7) indicates that the three variables are insignificant and fail to reject the null hypothesis. Therefore, a unit root exists in the original series and the time series are non-stationary. We then took first differences to test whether the variables are first-order integrated (Figures 4-6).

3) Cointegration test
The constant and residual terms are as follows (Table 9).
If the ADF test (Table 10) indicates that the corresponding residuals are stationary, then the corresponding variables have a cointegration relationship that allows us to build arbitrage strategies.
Thereafter, we showed all three combinations have cointegration relationships and proved the existence of mean reversion and arbitrage opportunity.
Next, we compute the Mspread to determine the trading signals (Figures 7-9).
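A threshold rule of the kind described in the methodology (open when the Mspread moves Q standard deviations away from zero, then close or stop out) can be sketched as below. Several points are assumptions made for illustration: Mspread is taken to be the spread minus its sample mean, the function names are hypothetical, and the default values of `q_close` and `q_stop` are placeholders, since the per-pair close and stop-loss levels are given in tables that are not reproduced here. Only the opening level of 0.5 lies in the 0.45 to 0.75 range stated in the text.

```python
from statistics import mean, pstdev


def mspread(spread):
    """Demeaned spread. Assumption: Mspread = spread - sample mean of spread."""
    m = mean(spread)
    return [s - m for s in spread]


def next_position(m, sigma, position, q_open=0.5, q_close=0.0, q_stop=2.0):
    """Return the new position: 1 = long spread, -1 = short spread, 0 = flat."""
    if position == 0:
        # Open only inside the band between the open and stop-loss levels.
        if q_open * sigma <= abs(m) < q_stop * sigma:
            return -1 if m > 0 else 1
        return 0
    if abs(m) >= q_stop * sigma:
        return 0                      # stop-loss: divergence kept growing
    if position == -1 and m <= q_close * sigma:
        return 0                      # short spread reverted, take profit
    if position == 1 and m >= -q_close * sigma:
        return 0                      # long spread reverted, take profit
    return position


def run(series, sigma, **thresholds):
    """Apply the rule bar by bar and return the position after each bar."""
    position, path = 0, []
    for m in series:
        position = next_position(m, sigma, position, **thresholds)
        path.append(position)
    return path


m_series = mspread([101, 99, 104, 96])   # toy spread values
sigma_toy = pstdev(m_series)

demo = [0, 6, 3, -1, -7, -25, 2]         # toy Mspread path, sigma fixed at 10
positions = run(demo, sigma=10)
print(positions)
```

With the toy path the rule opens a short-spread position at 6, closes it when the Mspread falls to -1, opens a long-spread position at -7, and stops out at -25.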
Based on the graph, EC, BC, and CC are relatively strongly correlated, and the two-by-two combinations have relatively stable price differences. Considering the share of egg production cost spent on corn (64%), it is reasonable to expect a relatively strong correlation between EC and CC. The correlation results for the data in the graph are shown in Table 12.
According to the correlation matrix (Table 12), among the three two-by-two combinations, EC and CC have the strongest correlation coefficient (0.909), followed by BC and CC with 0.770, and EC and BC with 0.604. Using both graphical and correlation analyses, we have established the possibility of mean reversion and arbitrage opportunity.

2) Unit root test
Running the ADF test in Stata (Table 13) indicates that BC and CC are significant and reject the null hypothesis, whereas the P-value for EC is insignificant and fails to reject the null hypothesis. Therefore, BC and CC are stationary time series, while EC is a non-stationary time series with a unit root.
The next step is to examine whether EC is integrated of order 1.
The graph (Figure 11) and the ADF test (Table 14) show that the first difference of EC rejects the null hypothesis and has no unit root, so EC is first-order integrated. EC is a stationary time series after differencing. This allows us to examine the cointegration relationship.

3) Cointegration test
The constant and residual terms are as follows (Table 15).
To test for the cointegration relationship of the three combinations, we ran an ADF test on the residual term (Table 16).
Table 12. Correlation-5-min.

It can be inferred that $e_1$ is a non-stationary time series in its original form, since Egg is a first-order integrated time series. Hence, we applied the first-order difference to $e_1$ (Table 17).
Thereafter, we showed all three combinations have cointegration relationships and proved the existence of mean reversion and arbitrage opportunity.
To find the trading signals, we need to calculate the Mspread (Figures 12-14).
Descriptive statistics
Based on Table 18, each Mspread has 4395 observations and a mean of approximately zero. Mspread 2 has the smallest standard deviation of 117.148, followed by 224.121 (Mspread 1) and 226.745 (Mspread 3). The prices of the three combinations fluctuate around the mean value of 0. This further satisfies the condition for mean reversion and arbitrage. Using in-sample data to back-test our strategy with the thresholds from the adjustment phase, and setting the corn dominant contract as the benchmark and the frequency as one 5-min bar, we back-tested our three combinations as follows (Table 19).

Simulation Result
To interpret our results, the table shows that Q = 0.5 for the EB pair and Q = 0.45 for the EC pair maximized the profit. The result for EB stands out, with a 14.69% rate of return and a 42.38% annual rate of return, outperforming all other pairs.
Daily frequency
Compared with the 5-min high frequency, the daily frequency showed less sensitivity. Changes in the Q value affected the results insignificantly because there are fewer data points. Nevertheless, the results of the two pairs in the same range shared some similarities, as shown below (Table 20).
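The relation between the period return and the annual rate of return quoted for the EB pair at the 5-min frequency can be illustrated with a compounding formula. Both inputs below are assumptions rather than figures from the paper: a 250-trading-day year, and roughly 97 trading days between 12/31/2021 and 5/31/2022. Under these assumptions the 14.69% period return annualizes to about 42%, close to the reported 42.38%.

```python
def annualized_return(total_return, trading_days, days_per_year=250):
    """Compound a period return up to a full year of trading days."""
    return (1 + total_return) ** (days_per_year / trading_days) - 1


# Assumptions: 97 trading days in the sample, 250 trading days per year.
annual = annualized_return(0.1469, 97)
print(f"{annual:.2%}")
```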

Conclusion
This paper discusses a statistical arbitrage strategy designed for three commodity futures contracts. We used the cointegration method to confirm that these three futures contracts are mean-reverting, and used them to form arbitrage pairs. We first set the trading ratio to 50:24 for egg-soybean meal and 16:40 for egg-corn. We then focused on threshold trading on the back-test platform. We found that when Q (t-value) is 0.5, the egg-soybean meal pair at the 5-min frequency performs best, yielding 14.69%. At the daily frequency, the egg-soybean meal pair is optimal at Q = 0.45, with a yield of 16.61%. Meanwhile, the egg-corn pair yields 4.01% at the 5-min frequency and 4.33% at the daily frequency, performing best when Q (t-value) is 0.45. Although the daily-frequency results outperformed the 5-min results, this may be because our trading signals are not optimal, since the 5-min back-test offers better flexibility.
This paper shows that despite China's prosperity, the commodity futures To analyze the effect of hedging, they used OLS, B-VAR, and ECM models. In contrast, we focus on arbitrage. This article also has the advantage of combining daily data with five-minute high-frequency data, which yields a variety of results from data acquired in the same period.
Nonetheless, this paper still has flaws. The threshold is a fixed trading trigger, which is rather inefficient: as with a cut-off on a normal distribution, a certain percentage of the population is neglected during the back-test.
Hence, in future work we will use machine learning to compare different approaches for selecting thresholds. For instance, the artificial neural network thresholds used in Roa, A. A. (2018) may make our strategy more precise and accurate. Furthermore, the research of Zhao, Z., Zhou, R., & Palomar, D. P. (2019) on a unified optimization framework is also intriguing.
In conclusion, we illustrated the feasibility of pair trading for egg, corn, and soybean meal futures contracts. Investors and arbitrageurs can use this article as a reference and apply it to their own situations.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.