Application of Rescaled Range Analysis Method to Ship Flow in Container Ports: Time Series Structure and Long-Range Dependence Analysis

Based on the time series of the number of ships calling at container ports, this paper takes Shanghai Port and Singapore Port as examples to analyze the statistical characteristics of ship flow time series in container ports. The rescaled range analysis (R/S) method is then used to study the long-range dependence (LRD) of ship flow series in maritime container ports, and the influence of the trend component and periodic component on the Hurst exponent is further analyzed. The results indicate that the ship flow time series of both Shanghai Port and Singapore Port were nonlinear and nonstationary from 2013 to 2017, and that the data of both ports changed abruptly in August 2016. Both series show a degree of long-range dependence, but the long-range dependence of the Singapore Port series is relatively weak. In addition, the trend component and periodic component have a significant impact on the Hurst exponent of the time series, so they should be removed in advance in long-range dependence analysis.


Introduction
Maritime ship flow is the traffic flow formed by a large number of ships sailing along shipping routes, and it is the main research object of maritime transportation management. Analyzing the trends, sudden changes, periodicity, and long-range dependence (LRD) of the time series of the number of ships calling at container ports is a basic way to understand the dynamic evolution of ship flow in the global container port system. Establishing LRD is a prerequisite for identifying fractal behavior in traffic flow, and short-term prediction of traffic flow is possible only when such fractal behavior exists. As a consequence, LRD analysis is widely applicable to the modeling, prediction, and complexity analysis of maritime transportation.
Current research on the container shipping industry mainly focuses on port competition and cooperation (Kou & Luo, 2016; Lam & Yap, 2011; Liu, Wilson, & Luo, 2016; Yap, Lam, & Notteboom, 2006), the spatial evolution of ports (Cullinane & Khanna, 2000; Hayut, 1981; Hayuth, 1988; Le & Ieda, 2010; Li, Luo, & Yang, 2012; Notteboom, 2010), route planning, and shipping network design (Gelareh, Nickel, & Pisinger, 2010; Jin, Li, & Chen, 2012; Zhao et al., 2016). Using ship AIS data, the characteristics of maritime traffic flow can be mined directly from actual operational data, which reflect the inherent regularities of maritime traffic flow precisely. However, few previous studies have mined measured data in this way, and the studies that do use AIS data mostly address ship collision avoidance and maritime traffic accidents (Felski & Jaskólski, 2012; Wu et al., 2016; Zhang et al., 2016; Zhang et al., 2015) rather than treating the port as the research object.
The global container shipping system is a complex system in which economic, trade, political, and natural factors are coupled, and the ship flow within it is highly nonlinear. Although nonlinear theory has been widely used to analyze road traffic and air traffic flow systems (Wang & He, 2003; He et al., 2016; He & Feng, 2004), it has seen little application in shipping. Rescaled range analysis (R/S) and detrended fluctuation analysis (DFA) are widely used to study the LRD behavior of nonlinear time series in hydrology (Guo, Liu, & Zhang, 2013; Wang, Jiang, & Chen, 2006), finance (Wang & He, 2018; Yu & Wu, 2015), and air traffic flow (Wang, 2019).
Among these studies, He et al. (2016) analyzed the traffic flow time series of urban expressway ramps using DFA and found that weekday data exhibit more significant LRD behavior than weekend data. Wang (2019) used the surrogate data method to test the nonlinearity of air traffic flow time series and then, taking the series at a 5-minute scale as an example, analyzed the data with the R/S method. The results showed that air traffic flow time series exhibit strong self-similarity and LRD. For ship flow, the key question is whether maritime ship flow data have nonlinear and LRD characteristics, as highway and air traffic flow do.
This study analyzes the statistical characteristics, stationarity, trends, periods, and sudden changes of the time series of ship flow in container ports. It then studies the LRD of these series using the Hurst exponent estimated by the R/S method and further examines the specific effects of the trend component and periodic component on the Hurst exponent. The purposes of this paper are to reveal the regularities of ships calling at container ports and their influencing factors, to verify the applicability of the R/S method to ship flow time series, and to provide a research basis for the modeling and prediction of maritime transportation.

R/S Method
The R/S method was first proposed by the British scientist Hurst in 1951 while he was studying hydrological time series. Since then, it has been widely used to study long-range dependence in nonlinear time series (Bassingthwaighte & Raymond, 1994). Long-range dependence, also known as long memory, describes the self-similarity between parts of a data series and the series as a whole from the perspective of long memory or persistence.
Correlation coefficients or power spectral density analysis are generally used in discussions of long-range dependence. However, such traditional methods can hardly reveal the internal characteristics of nonstationary time series comprehensively. As a method for analyzing nonlinear time series, R/S is widely used because it is nonparametric and does not require the underlying distribution to be Gaussian (He & Feng, 2004; Wang, Jiang, & Chen, 2006; Zhao & Wu, 2014; Jiang & Deng, 2004). Studies have shown that many hydrological and financial data sets exhibit LRD behavior, especially time series from China's stock market. The calculation process of R/S is as follows:

$$\{\xi(t)\}, \quad t = 1, 2, 3, \ldots \tag{1}$$

$$\langle \xi \rangle_\tau = \frac{1}{\tau}\sum_{t=1}^{\tau}\xi(t) \tag{2}$$

$$X(t,\tau) = \sum_{u=1}^{t}\left[\xi(u) - \langle \xi \rangle_\tau\right], \quad 1 \le t \le \tau \tag{3}$$

$$R(\tau) = \max_{1 \le t \le \tau} X(t,\tau) - \min_{1 \le t \le \tau} X(t,\tau) \tag{4}$$

$$S(\tau) = \left\{\frac{1}{\tau}\sum_{t=1}^{\tau}\left[\xi(t) - \langle \xi \rangle_\tau\right]^2\right\}^{1/2} \tag{5}$$

$$\frac{R(\tau)}{S(\tau)} = c\,\tau^{H} \tag{6}$$

$$\ln\frac{R(\tau)}{S(\tau)} = H\ln\tau + \ln c \tag{7}$$

where $\{\xi(t)\}$ is a time series, $\tau$ is any positive integer, $\langle \xi \rangle_\tau$ is the mean of the original series over $\tau$ observations, $X(t,\tau)$ is the cumulative deviation, $R(\tau)$ is the range of the cumulative deviation sequence, $S(\tau)$ is the standard deviation, and $c$ is a constant. The Hurst exponent $H$ can be estimated by least squares as the slope of the plot of $(\ln\tau, \ln(R/S))$, according to (7).
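A minimal sketch of the R/S procedure in Python with NumPy follows. It assumes the common variant in which the series is split into non-overlapping segments of length τ and the R/S values of those segments are averaged before taking logarithms. The function names, the log-spaced grid of window lengths, and the minimum window of 8 are illustrative assumptions, not necessarily the settings used in this paper. With only 60 monthly observations the usable range of τ is narrow, and R/S estimates are known to be biased upward for short series, so an H obtained this way should be read with caution.

```python
import numpy as np


def rescaled_range(segment: np.ndarray) -> float:
    """R/S statistic of one segment of length tau."""
    mean = segment.mean()                    # mean over the segment
    deviations = np.cumsum(segment - mean)   # cumulative deviation X(t, tau)
    r = deviations.max() - deviations.min()  # range R(tau)
    s = segment.std()                        # standard deviation S(tau), 1/tau normalization
    return r / s


def hurst_rs(series, min_window: int = 8, num_windows: int = 20) -> float:
    """Estimate H as the least-squares slope of ln(R/S) versus ln(tau).

    For each window length tau, the series is cut into non-overlapping
    segments of length tau and their R/S values are averaged.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Log-spaced window lengths from min_window up to half the series length
    taus = np.unique(
        np.round(np.logspace(np.log10(min_window), np.log10(n // 2), num_windows)).astype(int)
    )
    log_tau, log_rs = [], []
    for tau in taus:
        n_seg = n // tau
        segments = x[: n_seg * tau].reshape(n_seg, tau)
        # Skip constant segments, for which S(tau) = 0
        rs_values = [rescaled_range(seg) for seg in segments if seg.std() > 0]
        if rs_values:
            log_tau.append(np.log(tau))
            log_rs.append(np.log(np.mean(rs_values)))
    if len(log_tau) < 2:
        raise ValueError("series too short or constant for R/S estimation")
    slope, _intercept = np.polyfit(log_tau, log_rs, 1)
    return float(slope)


# Usage: white noise versus a random walk built from the same noise
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
print(f"white noise: H = {hurst_rs(noise):.3f}")
print(f"random walk: H = {hurst_rs(np.cumsum(noise)):.3f}")
```

For white noise the estimate should land near 0.5 (usually somewhat above it because of the small-sample bias), while for the random walk built from the same noise it is much closer to 1.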

Hurst Exponent and Long-Range Dependence (LRD)
The Hurst exponent is a dimensionless scaling parameter and one of the important indicators of scale invariance in dynamical systems. Its value is greater than 0 and generally lies between 0 and 1; when H ≥ 1, the process has infinite variance and is not a stationary series. The value of H indicates whether a time series is completely random or contains a trend, and whether that trend is persistent or anti-persistent. The relationship between the Hurst exponent and LRD is as follows:
If 0 < H < 0.5, the time series has long-range dependence, but the overall future trend is opposite to the past one: an increasing trend in the past implies a decrease in the future, and vice versa. This phenomenon is called anti-persistence. The closer H is to 0, the stronger the anti-persistence.
If H = 0.5, it indicates that there is no long-range dependence between the past and the future, and the time series is a completely independent process.
If 0.5 < H < 1, the series has long-range dependence, and future changes follow the same trend as past changes: an overall increasing trend in the past implies an overall increase in the future, and vice versa. This phenomenon is called persistence. The closer H is to 1, the stronger the persistence.
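The three cases above amount to a simple decision rule, sketched below in Python. The optional tolerance band `tol` around 0.5 is a hypothetical addition for judging values that are numerically close to 0.5; it is not part of the criteria stated in the text, which uses the exact value H = 0.5.

```python
def lrd_regime(h: float, tol: float = 0.0) -> str:
    """Map a Hurst exponent to the LRD regimes listed above.

    `tol` is a hypothetical tolerance band around 0.5 (default 0, i.e.
    the exact criterion used in the text).
    """
    if h <= 0:
        raise ValueError("the Hurst exponent is expected to be positive")
    if h >= 1:
        return "outside the usual 0 < H < 1 range (nonstationary series)"
    if abs(h - 0.5) <= tol:
        return "no long-range dependence (independent process)"
    if h < 0.5:
        return "anti-persistent long-range dependence"
    return "persistent long-range dependence"


# Usage with values reported in this paper
print(lrd_regime(0.637))           # persistent long-range dependence
print(lrd_regime(0.522))           # persistent long-range dependence
print(lrd_regime(0.522, tol=0.05)) # no long-range dependence (independent process)
```

With the default `tol=0`, both 0.637 and 0.522 fall into the persistent regime; a nonzero tolerance only makes explicit the kind of judgment the paper applies later when it calls a value very close to 0.5 weak LRD.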

Data Collection
This paper constructs time series of the number of container ships calling at Shanghai Port and Singapore Port from January 1, 2013, to December 31, 2017, to analyze the time series structure and LRD of ship flow in container ports. The data originate from major shipping companies and were collected by the shipping big data platform built by our research group. The group uses Python to capture public data, including freight rates and routes, from shipping companies and shipping websites, and established the platform for shipping market analysis after integrating and analyzing these data. The data are processed into ship flow time series at a monthly time scale, as shown in Figure 1. As Figure 1 shows, the numbers of ships calling at Shanghai Port and Singapore Port differ significantly over the observation period. The number of ships calling at Shanghai Port grew year by year from 2013 to 2016. Shanghai Port is the most important port in China, especially for containers, and also one of the most important transit ports in Northeast Asia. The number of ships calling at Singapore Port fluctuates, but it is larger than that at Shanghai Port, because Singapore Port is closer to the world's main shipping channel and is one of the most important transit ports in the world. Both series fluctuate up and down over time, showing nonlinear and nonstationary characteristics. However, the fluctuation is generally not chaotic; it follows seasonal and monthly cycles. Moreover, both series plummeted from August to December 2016, which may be related to the bankruptcy of Hanjin Shipping, the seventh largest shipping company in the world and the largest in South Korea, at the end of August 2016.
In addition, the number of ships calling at Shanghai Port showed an overall upward trend before August 2016, whereas the ship flow at Singapore Port fluctuated greatly during this period. Table 1 shows the basic statistics of the number of ships calling at the two major container ports, Shanghai Port and Singapore Port. According to the mean value, Singapore Port was still the busiest maritime transit terminal in the world during the observation period, with an average of 1,019 container ship calls per month, much higher than that of Shanghai Port even though Shanghai Port ranks first in the world in container throughput. The main reason is that Singapore Port lies on the shipping route between the Pacific Ocean and the Indian Ocean and has a unique geographical location. The standard deviation of the Singapore Port series is smaller than that of the Shanghai Port series, indicating that the number of ships calling at Singapore Port is relatively stable, whereas the Shanghai Port series fluctuates over a wider range.

Time Series Structure Analysis of Ship Flow in Container Ports
Skewness reflects the symmetry of a distribution. As shown in Table 1, the skewness of the monthly ship flow series of both ports is greater than 0, indicating right-skewed distributions. The frequency and probability density distribution graphs of the two ports' monthly ship numbers in Figure 2 further illustrate this: both series are right-skewed, which verifies the skewness results in Table 1.
Kurtosis reflects the sharpness of the peak of a distribution: the greater the kurtosis, the sharper the peak. It can be used to measure how concentrated the data are around the center. In this paper, kurtosis is calculated as excess kurtosis, E_k: the kurtosis of the normal distribution (3) is subtracted from the original kurtosis so that the reference value is 0. E_k > 0 indicates a sharper peak than the normal distribution, E_k = 0 a normal peak, and E_k < 0 a flatter peak. As shown in Table 1, the E_k values of Shanghai Port and Singapore Port are both less than 0, so both distributions are flat-peaked. This indicates that the observed port ship flow data are not strongly concentrated around the center and have thinner tails than the normal distribution, similar to a rectangular (uniform) distribution. Consequently, the monthly ship flow series of the two ports are not normally distributed.
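The statistics reported in Table 1 (mean, standard deviation, skewness, and excess kurtosis) can be computed from their moment definitions. The sketch below is an assumption about how such values might be obtained: it uses the sample standard deviation and the moment-based (biased) estimators of skewness and excess kurtosis, which correspond to the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`. The paper does not state which estimators it used, and the function name `describe` is hypothetical.

```python
import numpy as np


def describe(series) -> dict:
    """Mean, sample std, moment skewness, and excess kurtosis of a series."""
    x = np.asarray(series, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)                    # sample standard deviation
    centered = x - mean
    m2 = np.mean(centered ** 2)            # second central moment
    m3 = np.mean(centered ** 3)            # third central moment
    m4 = np.mean(centered ** 4)            # fourth central moment
    skewness = m3 / m2 ** 1.5              # > 0 means right-skewed
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # subtract 3, the kurtosis of a normal distribution
    return {"mean": mean, "std": std, "skewness": skewness,
            "excess_kurtosis": excess_kurtosis}


# Usage: an evenly spread (uniform-like) series is symmetric and flat-peaked
stats = describe(np.arange(1, 10001))
print(stats["skewness"])         # essentially 0
print(stats["excess_kurtosis"])  # close to -1.2, the uniform-distribution value
```

Negative excess kurtosis, as found for both ports, is the signature of such flatter-than-normal distributions.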
The ADF test and the KPSS test are complementary in testing the stationarity of time series, so applying them together can improve the reliability of the results (Li, Liu, & Yang, 2017). In this paper, the two tests are used jointly on the ship flow time series. The ADF test results show that the p-values of both series are greater than 0.05, so the null hypothesis of the ADF test (the series has a unit root, i.e., is nonstationary) cannot be rejected. In contrast, the KPSS test results show p-values less than 0.05, so the null hypothesis of the KPSS test (the series is stationary or trend-stationary) is rejected. Therefore, the ship flow time series of Shanghai Port and Singapore Port are nonstationary.
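The joint reading of the two tests can be written as a small decision function. It takes the two p-values as inputs; in Python they could be obtained, for example, from `adfuller` and `kpss` in `statsmodels.tsa.stattools`, whose second return value is the p-value. The function name, the verdict wording, and the p-values in the usage line are hypothetical; the two conflicting cases are deliberately left unresolved, since they call for further checks such as detrending or differencing.

```python
def joint_stationarity_verdict(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values into one stationarity verdict.

    ADF  H0: the series has a unit root (nonstationary).
    KPSS H0: the series is stationary (level- or trend-stationary).
    """
    adf_says_stationary = adf_p < alpha        # ADF null rejected
    kpss_says_nonstationary = kpss_p < alpha   # KPSS null rejected
    if kpss_says_nonstationary and not adf_says_stationary:
        return "nonstationary"
    if adf_says_stationary and not kpss_says_nonstationary:
        return "stationary"
    if adf_says_stationary and kpss_says_nonstationary:
        return "conflicting: ADF suggests stationary, KPSS suggests nonstationary"
    return "conflicting: ADF suggests nonstationary, KPSS suggests stationary"


# The pattern reported in this paper (hypothetical p-values): ADF p > 0.05, KPSS p < 0.05
print(joint_stationarity_verdict(adf_p=0.40, kpss_p=0.01))  # nonstationary
```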
Seasonal-Trend decomposition using Loess (STL) is adopted to analyze the stationarity, trends, and seasonality of the number of container ships calling at each port. STL is a common time series decomposition algorithm that uses locally weighted regression (LOESS) to decompose a time series $X_t$ into a trend component $T_t$, a seasonal component $S_t$, and a remainder component $R_t$, as shown in (8):

$$X_t = T_t + S_t + R_t \tag{8}$$

STL estimates the trend and seasonal components robustly, without being distorted by abnormal observations in the data. The period of the seasonal component can be set to any integer multiple of the sampling interval, and STL can also decompose time series with missing values. A time series may consist of a random component plus some or all of the trend, seasonal, and cyclical components. Hence STL is used to decompose the time series of the number of container ships calling at Shanghai Port and Singapore Port from January 1, 2013, to December 31, 2017. The results are shown in Figure 3, where the four panels in each subgraph show, from top to bottom, the original series, the seasonal component, the trend component, and the remainder component; the lower three components together explain the composition of the raw data in the top panel. The trend terms show upward trends in the ship flow of both ports from 2013 to 2016, more pronounced at Shanghai Port, while the Singapore Port series fluctuates more. Both ports trend downward in 2016-2017 and begin to rise again toward the end of the observation period. The seasonal terms show a period of one year. To examine this seasonal variation further, seasonal subseries plots of the ship numbers in the two ports are drawn, as shown in Figure 4.
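STL itself is available in standard libraries (for example `statsmodels.tsa.seasonal.STL`). To show the idea of an additive decomposition $X_t = T_t + S_t + R_t$ without extra dependencies, the sketch below swaps in the simpler classical decomposition instead of STL: a centered 2×12 moving average for the trend and month-by-month averages of the detrended series for the seasonal component. Unlike STL, this stand-in is not robust to outliers and leaves the first and last six trend values undefined (NaN), so it only illustrates the structure of the decomposition, not the procedure used in this paper. The function name `classical_decompose` is hypothetical.

```python
import numpy as np


def classical_decompose(x, period: int = 12):
    """Additive decomposition X_t = T_t + S_t + R_t (classical method, not STL)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2 * period:
        raise ValueError("need at least two full periods")
    # Trend: centered moving average (2 x period MA when the period is even)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)                  # end points stay undefined
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    # Seasonal: average detrended value at each position in the cycle
    detrended = x - trend
    seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_means -= seasonal_means.mean()     # seasonal effects sum to zero over a cycle
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]
    remainder = x - trend - seasonal
    return trend, seasonal, remainder


# Usage on a synthetic 60-month series: linear trend plus a yearly cycle
t = np.arange(60)
x = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, remainder = classical_decompose(x)
```

Subtracting `seasonal` from the series gives the seasonally adjusted series, subtracting `trend` gives the detrended series, and `remainder` is the series with both removed; these correspond to the three preprocessing variants whose Hurst exponents are compared in this paper.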
As Figure 4 shows, the container ship flow series of Shanghai Port and Singapore Port follow a basically consistent monthly cycle. Ship flow at both ports increases slowly from January to March and remains roughly unchanged from April to June. In the second half of the year, ship flow at both ports increases significantly in July, begins to decline in August and September, decreases again in October, and rises again in November and December. Because shipping demand derives from trade demand, the periodic variation of maritime container ship flow is closely related to the global economy and trade. The increase in ship flow in November and December is mainly attributed to higher consumer demand ahead of Christmas and the Spring Festival.

Long-Range Dependence Analysis of Ship Flow in Container Ports
LRD is widespread in natural systems. It refers to significant autocorrelation between observations at different times, so that past states affect present and future states; typically, the autocorrelation function of such a series decays as a power law. Jiang and Deng (2004) and Taqqu and Teverovsky (1997) pointed out that periodic and trend terms strongly affect the estimation of the Hurst exponent. Consequently, a time series should be preprocessed to eliminate the influence of period and trend before analysis. This paper therefore applies the R/S method to the ship flow series of the two ports after removing the trend and periodic components identified by the STL decomposition; the estimated Hurst exponents are shown in Figure 5. According to Figure 5, the Hurst exponents of the container ship flow series of both Shanghai Port and Singapore Port are greater than 0.5 and less than 1. This indicates that both series have a degree of long-range dependence and fractal characteristics at the monthly time scale. Whether they also have fractal characteristics at other time scales requires further study.
However, the strength of LRD differs between the two ports. The Hurst exponent of the container ship flow series of Shanghai Port is 0.637, indicating that the series is not completely random but has a certain positive LRD and self-similarity. Accordingly, the experimental results can serve as a basis for the predictability of container ship flow at Shanghai Port, at least at the monthly time scale. The Hurst exponent of Singapore Port is only 0.522, very close to 0.5, so its LRD is weaker than that of Shanghai Port and not significant. In other words, the number of ships calling at Singapore Port reflects historical ship flow information only to a limited extent, and more information is needed to predict future ship flow from current data. The specific factors to consider will be determined in future research through a multifractal analysis of the ship flow series based on scaling theory.
Although Jiang and Deng (2004) pointed out that trend and periodic components affect the calculation of the Hurst exponent, scholars have not specified in detail how these components affect it. Hence this paper further studies these effects. First, the Hurst exponent is calculated by the R/S method after the seasonal component of the original series is removed; the result is shown in Figure 6. The Hurst exponents of both ports increase after the seasonal component is removed; the values remain between 0.5 and 1 and are close to 1, indicating significant long-range dependence. Because the time series structural analysis revealed an apparent trend component in the container ship flow series of both ports, the long-range dependence behavior appears more significant when only the seasonal component is removed. Figure 7 shows the Hurst exponents calculated by the R/S method after the trend component is removed from the original series. With only the trend component removed, the Hurst exponents of both ports fall to about 0.3, below 0.5. This indicates that the series have long-range dependence but are anti-persistent: once the trend component is removed, the port ship flow series show a certain mean-reverting behavior. In other words, a decreasing trend in the past may lead to an increasing trend in the future, that is, a negative correlation.
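The direction of each component's effect on the R/S estimate can be illustrated with synthetic data. The sketch below is a hypothetical experiment, not a reproduction of Figures 6 and 7: it adds either a linear trend or a 12-step sinusoid to Gaussian white noise and compares the resulting Hurst exponents. A compact vectorized R/S estimator (non-overlapping segments, averaged R/S, least-squares slope in log-log space) is defined inside the block so that it stands alone; the amplitudes, series length, and random seed are arbitrary assumptions.

```python
import numpy as np


def hurst_rs(x, min_window: int = 8, num_windows: int = 15) -> float:
    """Compact R/S estimator: slope of ln(mean R/S) against ln(tau)."""
    x = np.asarray(x, dtype=float)
    taus = np.unique(
        np.round(np.logspace(np.log10(min_window), np.log10(len(x) // 2), num_windows)).astype(int)
    )
    log_rs = []
    for tau in taus:
        # Non-overlapping segments of length tau, one per row
        segs = x[: (len(x) // tau) * tau].reshape(-1, tau)
        # Cumulative deviation from each segment's own mean
        dev = np.cumsum(segs - segs.mean(axis=1, keepdims=True), axis=1)
        # Range over standard deviation per segment (assumes no constant segment)
        rs = (dev.max(axis=1) - dev.min(axis=1)) / segs.std(axis=1)
        log_rs.append(np.log(rs.mean()))
    slope, _ = np.polyfit(np.log(taus), log_rs, 1)
    return float(slope)


# Hypothetical synthetic components (arbitrary amplitudes, 600 steps)
rng = np.random.default_rng(42)
t = np.arange(600)
noise = rng.standard_normal(t.size)
trend = 0.02 * t
seasonal = 3.0 * np.sin(2 * np.pi * t / 12)

h_noise = hurst_rs(noise)
h_with_trend = hurst_rs(noise + trend)
h_with_season = hurst_rs(noise + seasonal)

print(f"white noise:            H = {h_noise:.3f}")
print(f"noise + linear trend:   H = {h_with_trend:.3f}")
print(f"noise + 12-step season: H = {h_with_season:.3f}")
```

In this setup the trend pushes H above the white-noise value, because the cumulative deviation of a trend grows faster than that of noise, while the seasonal component pulls H below it, because its cumulative deviation stops growing once τ exceeds one period. This matches, in direction, the effects described above.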

Conclusion
Taking Shanghai Port and Singapore Port as examples, this paper analyzes the structural characteristics, such as trend, periodicity, and stationarity, of the monthly time series of container ships calling at ports. The LRD of the monthly container ship flow series of Shanghai Port and Singapore Port from 2013 to 2017 is studied empirically using the R/S method, and the specific effects of the trend component and periodic component on the LRD of the series are further discussed. The following conclusions are drawn:
1) The monthly numbers of container ships calling at Shanghai Port and Singapore Port differed significantly from 2013 to 2017, and both series fluctuate up and down over time, showing nonlinear and nonstationary characteristics. Moreover, the monthly ship flow time series of Shanghai Port and Singapore Port are not normally distributed.
2) The bankruptcy of Hanjin Shipping, the seventh largest shipping company in the world, at the end of August 2016 had a significant impact on ship flow in major ports around the world. The number of container ships calling at Shanghai Port and Singapore Port plummeted during this period.
3) Singapore Port was still the busiest maritime transit terminal in the world during the observation period, with an average of 1,019 container ship calls per month, much higher than Shanghai Port even though Shanghai Port ranks first in the world in container throughput.
4) The monthly numbers of container ships calling at Shanghai Port and Singapore Port show a degree of long-range dependence, which can support short-term prediction to some extent. However, the Hurst exponent of Singapore Port is only 0.522, so its LRD is weak, and more information is needed to predict future ship flow from current data. The specific factors to consider will be determined in future research through a multifractal analysis of the ship flow series based on scaling theory.
5) The trend component and periodic component have significant impacts on the Hurst exponent of a time series, so they must be removed in advance when analyzing long-range dependence. A trend component makes the calculated Hurst exponent larger than its true value, which can lead to the wrong conclusion that data without long-range dependence are long-range dependent; similarly, a significant periodic component makes the Hurst exponent smaller, leading to the conclusion that the data are anti-persistent over a certain time range.
In this paper, the R/S method is used to examine the structural characteristics and long-range dependence of container ship flow time series. The results show that the R/S method is applicable to the study of maritime ship flow and can make a practical contribution to transportation modeling, forecasting, and shipping market analysis. Admittedly, the AIS-based data on container ships calling at ports used in this paper still have some quality problems. Because the AIS transponders of some ships are deliberately switched off and the coverage of AIS base stations in some ports is limited, the data reflect relative structural patterns rather than true absolute numbers. Since this research mainly considers the structural characteristics of the data, the existing data are sufficient to support its conclusion, namely the applicability of the R/S method to container shipping time series. However, the missing data may affect the accuracy of the method's application to some extent.