To satisfy consumers' demand for variety, new retail enterprises increasingly produce many kinds of products in small quantities. This makes sales data more complex and varied and inventory management more difficult, so an accurate demand prediction model at the sub-category level is necessary. In this paper, we first consider the effect of external macro factors on sales and establish a multiple linear regression model to forecast the sales of the target products. We then consider the regularity and trend of historical sales, compare the goodness of fit of ARIMA models with different parameters, and select the ARIMA (2, 2, 1) model, which gives the best prediction. Finally, the two models are assigned different weights according to their goodness of fit, and a predictive model that combines multiple linear regression and ARIMA (2, 2, 1) is established. The results show that the combined model predicts better than either single model and can accurately predict demand for new retail goods, thereby reducing the difficulty of inventory management and improving corporate competitiveness.

With the rapid growth of the Chinese commodity economy and the widespread adoption of Internet technology, new retail enterprises that combine Internet technology, big data technology and logistics technology have emerged. Physical retail businesses, which take commodities as their core and focus only on the inventory management of commodities, cannot adapt to the digital era or fully satisfy the demands of consumers. New retail enterprises differ from traditional retail enterprises in two respects. On the one hand, they combine e-commerce platforms with in-store consumption. They bring together consumers from multiple sales channels, such as online e-commerce and physical stores, and use online platforms and immersive offline shopping scenes to provide consumers with full service and a better shopping experience. On the other hand, they are more people-oriented and more focused on service to consumers, so the core of the business shifts from the commodity alone to the commodity plus service. As incomes rise and goods become abundant, residents' willingness to consume and level of consumption also increase, and consumer demands become more varied. New retail enterprises use big data mining, combined with user characteristics such as consumers' hobbies, behaviors and habits, to continuously improve the production model, further subdivide the product hierarchy, and produce more diverse, attractive and fashionable target products that satisfy the diverse, fashionable and personalized demands of consumers. Although this production mode serves consumers better, predicting consumer demand is difficult when the sales data are complex, which creates challenges such as production plans that are difficult to formulate and inventory that is hard to manage.
Therefore, considering both the effect of external macro factors and the regularity and trend of historical sales data, this article builds a model based on multiple linear regression and ARIMA (2, 2, 1) to provide more accurate demand analysis and sales forecasts at the regional level, the sub-category level and even the store SKC level, and thereby to simplify inventory management and enhance the profitability and competitiveness of new retail enterprises.

Gong and Huang (2017) combined grey theory and the exponential smoothing method to establish a model for predicting product demand. However, grey theory performs poorly for long-term prediction and is only suitable for small samples. Miao, Tang, and Luo (2020) used the ARIMA model to forecast the sales of new energy vehicles, taking into account the seasonal factors of historical sales data. Dong, Dong, Zhang, and Cui (2020) used redesigned traditional data as the input of the exponentially weighted average method, which improved the accuracy of corporate sales forecasts. (Rong & Guo, 2019

The data in this paper come from the 2020 Mathorcup College Mathematical Modeling Challenge. We use Excel to select the data required for the corresponding questions. First, we identify the top ten target sub-categories by sales from June 1 to October 1, 2019. We then process the 2019 data of these 10 target sub-categories and aggregate the daily data into weekly data. In total, 520 sets of data are collected: each target sub-category has 52 weeks of data, including sales volume, inventory, actual price, label price, discount, and so on. While sorting the data, we also find that some sales data or influencing-factor data are missing for the target sub-categories, and we fill these gaps with four methods. First, if the index value is smooth, we use the previous value. Second, if the values before and after the gap are available, we use their average. Third, if two groups are similar, we replace the missing value in one group with the corresponding value in the other. Fourth, we use interpolation to fit the data. Once the data are complete, the data for the first nine months of 2019 are used to fit a model that forecasts the sales of the top 10 target sub-categories in each of the three months after October 1, 2019, in order to determine a model that can accurately predict the demand for new target products.
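The first two filling rules and the daily-to-weekly aggregation can be sketched as follows. This is a minimal illustration rather than the procedure actually run in Excel: the function names `fill_missing` and `to_weekly` and the sample values are hypothetical, and the third and fourth rules (borrowing from a similar group, interpolation fitting) are not covered.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries from the nearest observed neighbours.

    If observed values exist on both sides, use their average (second rule);
    if only an earlier value exists, carry it forward (first rule).
    """
    filled = list(values)
    n = len(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Look for the nearest observed values in the original series.
        prev = next((values[j] for j in range(i - 1, -1, -1)
                     if values[j] is not None), None)
        nxt = next((values[j] for j in range(i + 1, n)
                    if values[j] is not None), None)
        if prev is not None and nxt is not None:
            filled[i] = (prev + nxt) / 2.0
        elif prev is not None:
            filled[i] = prev
        elif nxt is not None:
            # Assumption: with no earlier value, fall back to the next one.
            filled[i] = nxt
        else:
            raise ValueError("series has no observed values")
    return filled


def to_weekly(daily: List[float], days_per_week: int = 7) -> List[float]:
    """Sum consecutive blocks of daily values (the last block may be short)."""
    return [sum(daily[i:i + days_per_week])
            for i in range(0, len(daily), days_per_week)]


# Hypothetical two weeks of daily sales with gaps (not the paper's data).
daily_sales = [10, 12, None, 14, 13, None, None, 20, 18, 17, 19, None, 21, 22]
clean = fill_missing(daily_sales)
weekly = to_weekly(clean)
print(clean)
print(weekly)
```

Summing is appropriate for sales volume; for quantities such as price or discount a weekly average would be the more natural aggregate.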

Multiple linear regression is a prediction method that establishes a regression function describing the influence of several independent variables on a dependent variable. Many factors influence the sales volume of target products, and combining the relevant external factors appropriately helps to forecast the future trend of sales more accurately. Therefore, using the selected data on sales, inventory, actual price, discount, holidays and other factors, we forecast the sales volume of the target sub-category over the last three months (13 weeks) of 2019 from external macro factors.

Before performing multiple linear regression, we first make scatter plots of sales volume, price and inventory, and observe the correlation between influencing factors and sales volume.

From

We take the sales volume of the target sub-category as the dependent variable and the actual price, inventory and holidays as the independent variables. Holidays significantly influence the sales volume of target goods: around New Year's Day, National Day, Double 11 and Double 12, the sales volume of retail enterprises generally rises well above normal levels. We therefore set this factor as a dummy variable, which takes the value 1 if the week contains a holiday and 0 otherwise. With $x_{i1}$ the actual price (entering in logarithmic form), $x_{i2}$ the inventory and $x_{i3}$ the holiday dummy for week $i$, we establish the following multiple linear regression equation.

$$
\begin{cases}
y_i = \beta_0 + \beta_1 \ln x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i \\
\varepsilon_i \sim N(0, \delta^2), \quad i = 1, \cdots, n
\end{cases}
\tag{1}
$$

The parameters are estimated by the least squares method, which minimizes the sum of squared errors.

$$
Q = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 \ln x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3} \right)^2 \tag{2}
$$

$$
\frac{\partial Q}{\partial \beta_j} = 0, \quad j = 0, 1, 2, 3 \tag{3}
$$

Setting these partial derivatives to zero gives the normal equations, whose solution is

$$
[\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3]^{T} = (X^{T} X)^{-1} X^{T} Y \tag{4}
$$
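As a rough illustration of equations (1) to (4), the sketch below builds the design matrix (constant, log price, inventory, holiday dummy) and solves the normal equations directly. It is not the software procedure used in the paper: the helper names `design_row`, `solve_linear_system` and `ols` are hypothetical, and the data are synthetic and noise-free so that the known coefficients can be recovered.

```python
import math
from typing import List, Sequence


def design_row(price: float, inventory: float, holiday_week: bool) -> List[float]:
    """One row of X: [1, ln(price), inventory, holiday dummy]."""
    return [1.0, math.log(price), float(inventory), 1.0 if holiday_week else 0.0]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares estimate: solve (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic observations: (price, inventory, holiday week). Not the paper's data.
observations = [(50, 10, False), (60, 12, False), (55, 15, True),
                (70, 9, False), (65, 20, True), (80, 14, False)]
true_beta = [100.0, -20.0, 3.0, 40.0]  # hypothetical coefficients
X = [design_row(p, inv, h) for p, inv, h in observations]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in X]

beta_hat = ols(X, y)
print([round(b, 4) for b in beta_hat])
```

Solving the normal equations explicitly mirrors equation (4); numerical libraries usually prefer a QR or SVD factorization because forming $X^{T}X$ squares the condition number.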

We use the collected and filtered data for the first 39 weeks of 2019 to solve the multiple linear regression model. In

From

$$
\hat{\beta}_0 = 1588.977, \quad \hat{\beta}_1 = -923.520, \quad \hat{\beta}_2 = 72.576, \quad \hat{\beta}_3 = 773.444
$$

| F | Prob > F | R-squared | Adj R-squared |
|---|---|---|---|
| 35.16 | 0.0000 | 0.7617 | 0.7401 |

| y | Coef. | Std. Err. | t | P > \|t\| |
|---|---|---|---|---|
| lnprice | −923.520 | 502.731 | −1.87 | 0.035 |
| inventory | 72.576 | 8.776 | 8.27 | 0.000 |
| Holidays | 773.444 | 181.074 | 4.27 | 0.000 |
| constant | 1588.977 | 2283.289 | 0.70 | 0.491 |

Then the multiple linear regression model is

$$
\hat{y}_i = 1588.977 - 923.520 \ln x_{i1} + 72.576\, x_{i2} + 773.444\, x_{i3} \tag{5}
$$
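Equation (5) can be evaluated directly once a week's price, inventory and holiday status are known. The sketch below assumes the logarithm is the natural logarithm and uses hypothetical input values, since the units of price and inventory are not stated here; the function name `predict_sales` is also illustrative.

```python
import math

# Coefficients of equation (5): constant, ln(price), inventory, holiday dummy.
BETA = (1588.977, -923.520, 72.576, 773.444)


def predict_sales(price: float, inventory: float, holiday_week: bool) -> float:
    """Fitted weekly sales from equation (5); assumes a natural logarithm."""
    if price <= 0:
        raise ValueError("price must be positive to take its logarithm")
    b0, b1, b2, b3 = BETA
    holiday = 1.0 if holiday_week else 0.0
    return b0 + b1 * math.log(price) + b2 * inventory + b3 * holiday


# Hypothetical inputs, chosen only to show how the terms combine.
normal_week = predict_sales(price=100.0, inventory=50.0, holiday_week=False)
holiday_week = predict_sales(price=100.0, inventory=50.0, holiday_week=True)
print(round(holiday_week - normal_week, 3))  # the holiday coefficient
```

Because the holiday variable is a 0/1 dummy, the difference between a holiday week and an otherwise identical normal week is exactly the holiday coefficient, and doubling the price shifts the fitted value by $-923.520 \ln 2$.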

The hypothesis test of the model is as follows:

$$
H_0: \beta_1 = \beta_2 = \beta_3 = 0; \quad H_1: \beta_1, \beta_2, \beta_3 \text{ are not all } 0
$$

$$
f = \frac{R^2 / df_e}{(1 - R^2) / df_r} \sim F(df_e, df_r)
$$
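The F statistic above can be computed from the coefficient of determination alone. In the sketch below, $R^2 = 0.7617$ is taken from the regression output, while the degrees of freedom $df_e = 3$ and $df_r = 33$ are assumptions (they are not stated in the text); they are used because they give a value close to the reported F of 35.16 and an adjusted $R^2$ close to the reported 0.7401.

```python
def f_statistic(r_squared: float, df_e: int, df_r: int) -> float:
    """F = (R^2 / df_e) / ((1 - R^2) / df_r)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return (r_squared / df_e) / ((1.0 - r_squared) / df_r)


def adjusted_r_squared(r_squared: float, df_e: int, df_r: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / df_r, with n - 1 = df_e + df_r."""
    return 1.0 - (1.0 - r_squared) * (df_e + df_r) / df_r


# R^2 from the regression output; df_e and df_r are ASSUMED, not given in the text.
r2 = 0.7617
f_value = f_statistic(r2, df_e=3, df_r=33)
adj_r2 = adjusted_r_squared(r2, df_e=3, df_r=33)
print(round(f_value, 2), round(adj_r2, 4))
```

A large F value relative to the $F(df_e, df_r)$ distribution leads to rejecting $H_0$, that is, to concluding that the regressors are jointly significant.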

We can see from

Hypothesis testing of regression coefficients is as follows:

$$
H_0: \beta_j = 0; \quad H_1: \beta_j \neq 0, \quad j = 1, 2, 3
$$

$$
T = \frac{\hat{\beta}_j - \beta_j}{Se(\hat{\beta}_j)} \sim t(df)
$$

It can be shown from

$$
R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}, \quad 0 \le R^2 \le 1
$$

For a model, a large coefficient of determination usually corresponds to a high fitting degree. From

Among time series models, the ARIMA model is one of the most commonly used. It relies only on the historical values of the series itself and does not need exogenous variables. The model is denoted ARIMA (p, d, q), where p is the autoregressive order, d is the number of differences required to transform the original non-stationary series into a stationary series, and q is the moving average order. Its main modeling steps are shown in

Since the ARIMA model requires a stationary series, we use Eviews to plot the sales of the target sub-category over the first 39 weeks of 2019 and inspect the plot for stationarity. The result is shown in the following figure.

It can be seen from

The ADF test is also widely used to examine stationarity. It tests for the existence of a unit root: if no unit root exists, the series is judged to be stationary; otherwise it is non-stationary. When a unit root exists, a regression on the series is a spurious regression, that is, the error of the residual series does not decrease as the sample size increases. Therefore, in addition to the time series plot, we use the ADF test to further judge the stationarity of the series. The results are given in the following table.

As can be shown from

Only a stationary time series can meet the modeling requirements of the ARIMA model, so we need to perform a difference transformation on the non-stationary series.

The ARIMA model is

$$
y'_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i y'_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \beta_i \varepsilon_{t-i} \tag{6}
$$

$$
y'_t = \Delta^d y_t = (1 - L)^d y_t
$$

The ARIMA difference model is

$$
\left( 1 - \sum_{i=1}^{p} \alpha_i L^i \right) (1 - L)^d y_t = \alpha_0 + \left( 1 + \sum_{i=1}^{q} \beta_i L^i \right) \varepsilon_t \tag{7}
$$
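The difference operator $(1 - L)^d$ is simple to apply by hand. The sketch below is an illustration only (the paper performs this step in Eviews): the function names are hypothetical and the sample series is made up. It also shows how forecasts of the second difference are turned back into levels, using $(1 - L)^2 y_t = y_t - 2 y_{t-1} + y_{t-2}$.

```python
from typing import List, Sequence, Tuple


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply (1 - L)^d: difference the series d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference_twice(last_two: Tuple[float, float],
                       second_diffs: Sequence[float]) -> List[float]:
    """Rebuild levels from second differences w_t via y_t = w_t + 2*y_{t-1} - y_{t-2}."""
    y_prev2, y_prev1 = last_two
    levels = []
    for w in second_diffs:
        y = w + 2.0 * y_prev1 - y_prev2
        levels.append(y)
        y_prev2, y_prev1 = y_prev1, y
    return levels


# Made-up series with a quadratic trend: its second difference is constant.
sales = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
first = difference(sales, d=1)
second = difference(sales, d=2)
rebuilt = undifference_twice((sales[0], sales[1]), second)
print(first)
print(second)
print(rebuilt)
```

Each round of differencing shortens the series by one observation, which is why fewer observations are available for the test on the differenced series than on the original one.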

We use Eviews to take the first-order difference of the original series and find that the transformed series is still not stationary, so we then take the second-order difference. From

| | | t-Statistic | Prob.* |
|---|---|---|---|
| Augmented Dickey-Fuller test statistic | | −2.414323 | 0.1448 |
| Test critical values | 1% level | −3.621023 | |
| | 5% level | −2.943427 | |
| | 10% level | −2.610263 | |

| | | t-Statistic | Prob.* |
|---|---|---|---|
| Augmented Dickey-Fuller test statistic | | −5.746084 | 0.0000 |
| Test critical values | 1% level | −3.646342 | |
| | 5% level | −2.954021 | |
| | 10% level | −2.615817 | |

The stationary sales series data processed by the second-order difference has reached the modeling requirements of the ARIMA model. Then, we use Eviews software to make the autocorrelation graph ACF and partial autocorrelation graph PACF of the sales series and determine the value of parameters p and q by the correlation characteristics of the graphs.

We can observe from

Because the autocorrelation and partial autocorrelation graphs leave some uncertainty about the model parameters and cannot always determine them exactly, we compare ARIMA (2, 2, 1) with ARIMA (1, 2, 1) and ARIMA (1, 2, 2) to determine the optimal order and establish the model with the best fit.

We use Spss to analyze the fitting degree of the three models, and we can get the consequences from

From

| Model | Stationary R-squared | Normalized BIC | Significance |
|---|---|---|---|
| ARIMA (2, 2, 1) | 0.449 | 13.910 | 0.357 |
| ARIMA (1, 2, 1) | 0.402 | 13.959 | 0.316 |
| ARIMA (1, 2, 2) | 0.439 | 13.977 | 0.288 |

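Of the three candidates, ARIMA (2, 2, 1) has the largest R-squared value (0.449) and the smallest BIC (13.910), so we set p = 2, d = 2, q = 1 and establish an ARIMA (2, 2, 1) model.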
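The order selection can be written as a small rule over the fit statistics in the comparison table. This is only a sketch of the comparison, not the SPSS procedure itself; the dictionary layout and key names are illustrative, while the numbers are those reported in the table.

```python
# Fit statistics from the model comparison table, keyed by (p, d, q).
candidates = {
    (2, 2, 1): {"r_squared": 0.449, "bic": 13.910},
    (1, 2, 1): {"r_squared": 0.402, "bic": 13.959},
    (1, 2, 2): {"r_squared": 0.439, "bic": 13.977},
}

# A lower BIC indicates a better trade-off between fit and model complexity.
best_by_bic = min(candidates, key=lambda order: candidates[order]["bic"])

# A higher R-squared indicates that more of the variation is explained.
best_by_r2 = max(candidates, key=lambda order: candidates[order]["r_squared"])

print(best_by_bic, best_by_r2)
```

Here both criteria point to the same order, which makes the choice straightforward; when they disagree, BIC is usually preferred because it penalizes additional parameters.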

By observing the autocorrelation and partial autocorrelation graphs of the residuals, we can judge whether the residual series is white noise and therefore whether the ARIMA model captures the sales volume data well.

We can see from

We compare the actual sales volume in the first 9 months of 2019 with the sales volume fitted by the ARIMA (2, 2, 1) model. The trends of the actual and fitted sales data are roughly the same, which indicates that the ARIMA (2, 2, 1) model fits the data well.

When predicting the sales volume of the target sub-category, we first establish a multiple linear regression model that accounts for macroscopic influencing factors. Second, because a time series model predicts from the regularity of the series itself and previous sales influence current sales, we build ARIMA models with different parameters, compare them, and establish the optimal ARIMA (2, 2, 1) model. Comparing the two approaches, time series analysis can capture trends and seasonal factors, such as holiday effects, and makes full use of the internal regularity of the data, but it does not consider macroscopic factors. Multiple linear regression considers macroscopic factors, but it cannot use the trend and seasonal characteristics of the data and copes poorly when these change. Using either prediction method alone is therefore relatively one-sided, whereas a combined model that exploits the advantages of both should make the prediction results more accurate and more robust.

Building on the above, we integrate the multiple linear regression model with the ARIMA (2, 2, 1) model by assigning weights to the predicted values of the two single models according to their goodness of fit. Let $\hat{y}_1$ be the predicted value of the multiple linear regression model and $\hat{y}_2$ the predicted value of the ARIMA (2, 2, 1) model. In view of their respective goodness of fit, a weight of 0.4 is assigned to $\hat{y}_1$ and a weight of 0.6 to $\hat{y}_2$, which gives the combined model.
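Assuming the combination is a plain weighted sum, $\hat{y} = 0.4\,\hat{y}_1 + 0.6\,\hat{y}_2$ (the text gives the weights but not the explicit formula), it can be sketched as follows. The function name `combine_forecasts` and the sample forecasts are hypothetical.

```python
from typing import List, Sequence


def combine_forecasts(y_hat_1: Sequence[float], y_hat_2: Sequence[float],
                      w1: float = 0.4, w2: float = 0.6) -> List[float]:
    """Weighted sum of two forecast series; the weights must add up to 1."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if len(y_hat_1) != len(y_hat_2):
        raise ValueError("forecast series must have the same length")
    return [w1 * a + w2 * b for a, b in zip(y_hat_1, y_hat_2)]


# Hypothetical weekly forecasts (not the paper's results).
regression_forecast = [1000.0, 1200.0]   # y_hat_1: multiple linear regression
arima_forecast = [1100.0, 1150.0]        # y_hat_2: ARIMA (2, 2, 1)
combined = combine_forecasts(regression_forecast, arima_forecast)
print(combined)
```

Fixed weights keep the combination transparent; they could also be chosen to minimize the error on a validation period, which is a different technique from the fit-based weighting described here.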

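Prediction accuracy is measured by the mean absolute percentage error (MAPE):

$$
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{100\, |y_i - \hat{y}_i|}{y_i} \tag{8}
$$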

Finally, using formula (8), we calculate the average MAPE values of the multiple linear regression model, the ARIMA (2, 2, 1) model and the combined model as 13.462, 11.826 and 9.437, respectively. The combined model, which considers both the internal historical data and external factors, has the lowest MAPE, so it further improves forecast accuracy and predicts the demand for the target goods of new retail enterprises more accurately.
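Formula (8) translates into a few lines of code. The sketch below uses made-up actual and predicted values rather than the paper's data, and the function name `mape` is illustrative. It divides by the absolute value of the actual figure, which coincides with formula (8) whenever sales are positive.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, as in formula (8)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up values: both forecasts are off by 10% of the actual figure.
actual_sales = [100.0, 200.0]
forecast_sales = [110.0, 180.0]
error = mape(actual_sales, forecast_sales)
print(error)
```

Because the error is expressed relative to the actual value, MAPE allows sub-categories with very different sales volumes to be compared on one scale, at the cost of being unstable when actual sales are close to zero.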

In the new retail era, consumer experience and consumer needs are the most important concerns for an enterprise. To satisfy the decentralized and differentiated needs of consumers, enterprises need to offer more kinds of goods, which in turn requires them to manage inventory well and to formulate reasonable and effective production plans, so that the goods provided meet consumer needs without causing inventory accumulation and waste of resources. Precise demand forecasts for retail goods with complex hierarchies and many varieties are a prerequisite for reasonable decisions. Therefore, on the one hand, this article builds a multiple linear regression model to study the effect of actual price, inventory and holidays on sales. On the other hand, we use the trend in the historical data of the target goods and, through parameter estimation and comparison of goodness of fit, establish the optimal ARIMA (2, 2, 1) model. Since a single model has difficulty predicting target products with complex hierarchies accurately, we finally combine the advantages of the two models to establish a combined prediction model based on multiple linear regression and ARIMA (2, 2, 1). The MAPE value of the combined model is 9.437, lower than that of either single model, so its predictions are better. It can provide precise forecasts of various target goods at different levels and help business leaders make scientific and effective management decisions, thereby reducing the difficulty of inventory management, reducing capital occupation, increasing economic benefits, meeting consumer demand, enhancing the brand influence and competitiveness of enterprises, and promoting the further development of new retail enterprises.

The combined forecasting model based on multiple linear regression and ARIMA established in this paper only studies the linear characteristics of the target product sales data. In the future, a new combination forecasting model can be established on this basis to further study the nonlinear characteristics of the sales data to obtain more accurate prediction results.

The authors declare no conflicts of interest regarding the publication of this paper.

Jiang, J. L., Yao, W. W., & Li, X. Y. (2021). Precise Demand Forecast Analysis of New Retail Target Products Based on Combination Model. Open Journal of Business and Management, 9, 1312-1324. https://doi.org/10.4236/ojbm.2021.93071