Logistics Demand Forecast of Fresh Food E-Commerce Based on Bi-LSTM Model

Fresh products are perishable and are ordered in small batches at high frequency. Therefore, for fresh food e-commerce enterprises, market demand forecasting is particularly important. This paper takes the sales data of a fresh food e-commerce enterprise as the logistics demand, analyzes the influence of time and meteorological factors on the demand, extracts the characteristic factors with greater influence, and proposes a logistics demand forecast scheme for fresh food e-commerce based on the Bi-LSTM model. The scheme is compared with schemes based on the BP neural network and LSTM neural network models. The experimental results show that the Bi-LSTM model performs well on the logistics demand prediction problem. This facilitates further research on supply chain issues such as business decision-making, inventory control, and logistics capacity planning.

the hoarding of goods in logistics warehousing.
Traditional demand forecasting models include linear regression models, machine learning models, and so on. Jishan Zhu [1] established a multiple linear regression model with POS data as a predictor to forecast the demand of supply chain terminal business. This kind of simple demand forecasting model is built mainly on historical sales data; it suits short-term forecasting when the predicted object is in a stable state, and it gives little consideration to changes in community distribution and weather factors. Xiuzhu Zhang constructed a Bayesian network model for uncertainty demand forecasting [2]. Slimani I et al. focused on the realization and configuration of neural networks for daily demand forecasting based on historical sales information [3]. Hongwei Ding et al. used the Adam optimization algorithm in place of the stochastic gradient descent algorithm to reduce the prediction error of the BP neural network model [4]. Hao Xu et al. forecast supermarket sales based on univariate time series and the LSTM model [5]. Yuanming Wang introduced multivariate influencing factors into the LSTM model for sales forecasting in the e-commerce industry, which has great reference significance for forecasting fresh food e-commerce logistics demand [6]. Ying Zhao et al. constructed an LSTM-Prophet composite prediction model; their experiments show that the LSTM model has good portability and broad application scenarios [7]. Lama A et al. proposed an LSTM freight forecasting framework based on a k-means clustering method, which supports short- or long-term freight demand forecasting and allows forecasts to closely follow recent market trends and fluctuations [8]. The LSTM model can solve the problem of long-term dependence and use forward information to predict future demand, but it ignores the impact of backward information on current data, and its prediction accuracy and generality still need to be improved [9].
In the field of supply chain demand forecasting, most research offers suggestions from the perspective of macro-control without considering the real-time impact of complex factors on demand changes. As can be seen from the above, some studies have used regression models, machine learning, and other techniques to solve demand forecasting problems in traditional industries. However, the fresh food e-commerce industry is an emerging industry with extremely high timeliness requirements and variable influencing factors, and its logistics demand forecasting has received little research attention. To forecast fresh food e-commerce logistics demand effectively, enterprises must take into account not only the overall trend and fluctuation of demand but also the different degrees of influence that various factors have on demand. Therefore, this paper proposes a fresh food e-commerce logistics demand forecasting scheme based on the Bi-LSTM model, which takes the historical sales data of fresh food e-commerce as the logistics demand and analyzes the influence of various factors on it. Factors that have a greater impact are retained. Then, the demand data set and the feature data set are combined to form the input of the Bi-LSTM forecasting model.

Logistics Demand Forecasting Process
The logistics demand forecasting process includes data acquisition and storage, data preprocessing (outlier and missing-value handling, data standardization, and analysis and screening of influencing factors), model training, model prediction, and model evaluation (see Figure 1). Specifically, it is necessary to build a feasible

LSTM Unit Structure
The LSTM network addresses the RNN's inability to learn long-term dependencies and alleviates the gradient vanishing and explosion problems of the RNN [11]. The LSTM unit structure is shown in Figure 2. The LSTM memory unit mainly includes four structures: the cell state ($C_t$), the forget gate, the input gate, and the output gate. The cell state holds the memory information at time $t$ and runs directly along the entire chain, so that information can be transmitted unchanged. When information passes through a gate structure, the gate selectively removes information from or adds information to the cell state [12]. The overall framework of the LSTM network is given by Equations (1) to (5), in which tanh is the hyperbolic tangent activation function; $W_f$ and $W_i$ are the weight matrices corresponding to the forget gate and the input gate, respectively; $W_c$ and $W_o$ are the weight matrices corresponding to the generated candidate information and the output gate, respectively; and $b_f$, $b_i$, $b_c$, and $b_o$ are the bias terms of each layer.
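For reference, one common formulation of the LSTM unit is written out below. It is a sketch under stated assumptions rather than a reproduction of the paper's Equations (1) to (5): the sigmoid function $\sigma$, the hidden state $h_t$, the input $x_t$, the candidate state $\tilde{C}_t$, and the bias symbols $b_f$, $b_i$, $b_c$, and $b_o$ follow the standard notation and are assumed here, and the numbering and exact grouping of the paper's equations may differ.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```

Here $\odot$ denotes element-wise multiplication. The forget gate $f_t$ scales the previous cell state, the input gate $i_t$ scales the candidate information, and the output gate $o_t$ controls how much of the cell state is exposed as the hidden state.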

The Principle of the Bi-LSTM Network
The one-way LSTM network can only process information from front to back. Building on the one-way LSTM network, Schuster proposed a Bi-LSTM network that considers both earlier and later information [13]. The structure diagram of the Bi-LSTM network is shown in Figure 3.
The Bi-LSTM network performs forward computation and backward computation synchronously. In Figure 3, the horizontal direction represents the bidirectional flow of information along the time series, and the vertical direction represents the unidirectional flow of information from the input layer to the hidden layer and then to the output layer [14]. The hidden layer comprises two LSTM models, namely the forward transfer layer and the backward transfer layer [15]. After the data is input, the model generates new latent vectors $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ from the former latent vector. In Equation (8), $W_{\overrightarrow{h}s}$ and $W_{\overleftarrow{h}s}$ are weight matrices, and $b_y$ is the bias term.
The Bi-LSTM network performs forward calculation and backward calculation synchronously, which allows it to memorize the overall trend and fluctuation of demand. The factors that affect logistics demand are also input into the model, so that the model can learn how strongly changes in each factor influence demand.
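The forward and backward passes described in this section are commonly written as follows. This is a sketch in standard notation, not a reproduction of the paper's Equations (6) to (8): the input $x_t$, the output $y_t$, and the shorthand $\mathrm{LSTM}(\cdot)$ for one LSTM step are assumptions, while the two weight matrices and the bias term correspond to those named in the text.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t &= \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right) \\
y_t &= W_{\overrightarrow{h}s}\,\overrightarrow{h}_t + W_{\overleftarrow{h}s}\,\overleftarrow{h}_t + b_y
\end{aligned}
```

The forward layer reads the sequence from the first step to the last and the backward layer reads it in reverse, so the output at step $t$ depends on both earlier and later inputs within the window.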

Experimental Data Set and Data Analysis
In the experiment, data visualization was used to analyze the spatial and temporal demand distribution characteristics of the historical data, and the degree of correlation of each meteorological influencing factor was analyzed with a heat map. The meteorological influencing factors whose correlation value is lower than the threshold are then removed, and the remaining influencing factors are used to construct the model feature set, which together with the historical sales data constitutes the model input. Mining the implicit correlation information in the data can provide decision support for enterprises in logistics capacity and logistics warehousing planning.
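The screening step can be sketched as below, assuming the correlation value is the absolute Pearson correlation between each factor and sales. The threshold of 0.3, the function name `select_features`, the factor name `WIND`, and the synthetic data are all hypothetical and serve only as an illustration; the paper's actual threshold is not stated here.

```python
import numpy as np

def select_features(features, target, names, threshold=0.3):
    """Keep the factors whose absolute Pearson correlation with the
    target reaches the threshold (threshold value is an assumption)."""
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(features[:, j], target)[0, 1]
        if abs(r) >= threshold:
            kept.append((name, float(r)))
    return kept

# Synthetic illustration, not the paper's data.
max_temp = np.linspace(20.0, 35.0, 50)   # stands in for MAXT
wind = np.tile([1.0, -1.0], 25)          # hypothetical factor unrelated to sales
sales = 50.0 + 2.0 * max_temp            # sales driven by temperature only
features = np.column_stack([max_temp, wind])

selected = select_features(features, sales, ["MAXT", "WIND"])
print([name for name, _ in selected])
```

In this toy data the temperature factor is retained and the unrelated factor is dropped, which mirrors the role of the heat map in the screening described above.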

Data Collection
The meteorological data set used in the experiment comes from the Shenzhen Municipal Government Data Open Platform and the Shenzhen Meteorological Bureau, and the sales data set comes from the historical sales data of pork products at one outlet of a fresh food e-commerce company in Shenzhen. The time series of the dataset ranges from January 1, 2020, to March 31, 2022, with a total of 821 records. There are a few missing values in the data set; each is replaced by the average of the values before and after it so that the subsequent experimental work can be carried out smoothly. The data items of the dataset are shown in Table 1 and Table 2. Figure 4 is a time series analysis of the company's fresh food product sales data from January 2020 to March 2022. It can be seen that market demand for fresh products is time-sensitive and changes with obvious regularity.
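The missing-value replacement can be sketched as follows. The function name `fill_with_neighbor_mean` and the sample values are hypothetical, and the handling of consecutive gaps (using the nearest valid value on each side) and of gaps at the ends of the series are assumptions, since the text only describes the single-gap case.

```python
import numpy as np

def fill_with_neighbor_mean(values):
    """Replace each NaN with the mean of the nearest valid values
    before and after it; at the ends, use the one available side."""
    x = np.asarray(values, dtype=float)
    filled = x.copy()
    valid = np.flatnonzero(~np.isnan(x))
    for i in np.flatnonzero(np.isnan(x)):
        before = valid[valid < i]
        after = valid[valid > i]
        neighbors = []
        if before.size:
            neighbors.append(x[before[-1]])
        if after.size:
            neighbors.append(x[after[0]])
        if neighbors:  # left as NaN only if the whole series is missing
            filled[i] = np.mean(neighbors)
    return filled

# Illustrative daily sales with one single gap and one double gap.
sales = [120.0, np.nan, 140.0, 150.0, np.nan, np.nan, 180.0]
filled_sales = fill_with_neighbor_mean(sales)
print(filled_sales)
```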

Time Factors
In general, within a year, the sales curve of fresh products first rises and then falls. Occasionally, there is a surge in the sales curve which, judging from its timing, is clearly related to the typhoon season in Shenzhen. Every year from March, the sales volume of fresh products soars. Figure 5 shows the average and the median (MED) of sales. It can be seen from the figure that the sales volume of fresh products from Monday to Thursday is stable. Approaching the weekend, the sales volume rises sharply, and the sales volume on holidays is also significantly higher than that on non-holidays. In terms of months and seasons, June to August is the peak period of the year, and December to February is the low period.

Meteorological Factors
The demand for fresh products is significantly affected by meteorological factors. As shown in Figure 6, the sales volume of fresh products varies with the type of weather. The sales volume on rainy days is significantly higher than that on sunny days, and typhoon weather has the most significant impact on sales volume.

Preliminary Work
The preliminary work is mainly to build the experimental environment, preprocess the data set, and establish the evaluation indicators for the model prediction results.

1) Experimental Configuration
The experiment was carried out on Windows 10, using Anaconda Navigator (Jupyter Notebook) and Python 3.9 as the experimental platform, and using neural network models such as LSTM provided by the TensorFlow framework for experimental simulation.

2) Data Set Construction
According to the above analysis, seven influencing factors are selected as the feature dataset: WEEKDAY, MONTH, HOLIDAY, SEASON, WEATHER, MAXT, and MINT. Meanwhile, SALE is used as the label dataset.
The training set ratio is 0.8 and the test set ratio is 0.2.

3) Data Standardization
To compare and weight indicators of different units or magnitudes, the continuous values in the data set must be standardized and converted into dimensionless values. The standard deviation standardization method was used in the experiment, and the formula is as follows:
$$X_{mn} = \frac{x_{mn} - \mu}{\sigma} \tag{9}$$
In Equation (9), $x_{mn}$ is the original data, $X_{mn}$ is the normalized data, $\mu$ is the mean value of the feature data, and $\sigma$ is the standard deviation of the feature data.
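The standardization, together with the inverse transform used later to restore predictions to the original scale, can be sketched as below. The function names and the sample values are hypothetical, the population standard deviation is assumed, and which subset of the data supplies $\mu$ and $\sigma$ is left to the caller.

```python
import numpy as np

def fit_standardizer(data):
    """Return the per-column mean and standard deviation."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)                    # population standard deviation (assumption)
    sigma = np.where(sigma == 0.0, 1.0, sigma)  # guard against constant columns
    return mu, sigma

def standardize(x, mu, sigma):
    """Apply X = (x - mu) / sigma column by column."""
    return (x - mu) / sigma

def inverse_standardize(z, mu, sigma):
    """Restore standardized values to their original scale."""
    return z * sigma + mu

# Illustrative columns with different magnitudes, e.g. temperature and sales.
data = np.array([[10.0, 100.0], [20.0, 200.0], [30.0, 300.0]])
mu, sigma = fit_standardizer(data)
z = standardize(data, mu, sigma)
restored = inverse_standardize(z, mu, sigma)
print(z)
```

After the transform each column has zero mean and unit standard deviation, so features measured in different units contribute on a comparable scale.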

4) Establishment of Evaluation Index
In the experiment, the coefficient of determination ($R^2$) and the mean square error (MSE) were selected as evaluation indicators to evaluate the prediction performance of the model [16].
The coefficient of determination measures the goodness of fit between the predicted values and the actual values. The closer $R^2$ is to 1, the higher the goodness of fit. The formula is as follows:
$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2} \tag{10}$$
In Equation (10), $y_i$ is the true observation, $\bar{y}$ is the mean of the true observations, and $\hat{y}_i$ is the predicted value. When $R^2 = 1$, the model's predicted values equal the actual values, there is no error, and the prediction accuracy is high; when $R^2 = 0$, the model's predicted values equal the mean value and the prediction accuracy is low; when
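The two evaluation indicators can be sketched as follows, using the standard definitions of MSE and $R^2$. The function names and the sample values are hypothetical and are not the paper's data.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]     # predictions equal to the actual values
mean_only = [6.0, 6.0, 6.0, 6.0]   # predictions equal to the mean of y_true

print(r2_score(y_true, perfect), mse(y_true, perfect))
print(r2_score(y_true, mean_only), mse(y_true, mean_only))
```

The two cases reproduce the boundary values discussed above: a perfect prediction gives $R^2 = 1$ with zero MSE, and predicting the mean gives $R^2 = 0$.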

1) Feature Dataset and Batch Data Construction
The experiment sets the time step of the Bi-LSTM model to 7 and then uses the normalized original dataset to construct the training and testing datasets (with ratios of 0.8 and 0.2, respectively). The buffer size is set to 1000 and the batch size to 128, and the training and testing datasets are then used to construct training and testing batch datasets. A fixed-size buffer shuffles the data randomly, and a buffer larger than the number of data items ensures that the data is completely shuffled.
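The window and batch construction can be sketched in plain NumPy as below. Several points are assumptions: each sample is taken to be the 7 preceding days of features with the next day's sales as the label, the split is taken to be chronological, and a full random permutation stands in for a shuffle buffer that is larger than the data set. The function names and the placeholder arrays are hypothetical.

```python
import numpy as np

def make_windows(features, target, time_step=7):
    """X[k] holds features for steps k..k+time_step-1; y[k] is the target
    at step k+time_step (the windowing scheme is an assumption)."""
    X, y = [], []
    for start in range(len(features) - time_step):
        X.append(features[start:start + time_step])
        y.append(target[start + time_step])
    return np.array(X), np.array(y)

def chronological_split(X, y, train_ratio=0.8):
    """First 80% of the windows for training, the rest for testing."""
    cut = int(len(X) * train_ratio)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

def shuffled_batches(X, y, batch_size=128, seed=0):
    """Yield batches after a full permutation, which is what a shuffle
    buffer larger than the data set amounts to."""
    order = np.random.default_rng(seed).permutation(len(X))
    for s in range(0, len(X), batch_size):
        idx = order[s:s + batch_size]
        yield X[idx], y[idx]

# Placeholder arrays with the paper's dimensions: 821 days, 7 features.
features = np.zeros((821, 7))
target = np.arange(821, dtype=float)

X, y = make_windows(features, target, time_step=7)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
batches = list(shuffled_batches(train_X, train_y, batch_size=128))
print(X.shape, len(train_X), len(test_X), len(batches))
```

With 821 records and a time step of 7, this scheme yields 814 windows, so a buffer of 1000 exceeds the number of training samples and the shuffle is complete.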

2) Model Building and Compilation
In

Training and Comparative Analysis of Logistics Demand Forecasting Model
The

1) Logistics Demand Prediction
The experiment uses the three trained models to predict the logistics demand of the test batch data set. Through inverse normalization, the model predictions and the actual sales data are restored to their original scale. The comparison between the predicted and actual values of the Bi-LSTM model after inverse normalization is shown in Figure 9. The figure shows that the Bi-LSTM model predicts the data set relatively accurately: it tracks the changing trend of the data well and has a small prediction error.

2) Model Comparison and Evaluation Analysis
To observe the prediction errors of the three models more intuitively, the experiment subtracts each model's predicted values from the actual values to obtain its prediction errors. The box plot of the prediction errors is shown in Figure 10. Compared with the BP and LSTM models, the prediction errors of the Bi-LSTM model are more concentrated and closer to 0, indicating that its prediction accuracy is higher.
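The quantities that a box plot of prediction errors displays can be computed as in the sketch below. The function name `error_summary` and the two sets of predictions are hypothetical and do not correspond to the paper's models or results.

```python
import numpy as np

def error_summary(y_true, y_pred):
    """Prediction error (actual minus predicted) and its box-plot statistics."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    q1, median, q3 = np.percentile(err, [25, 50, 75])
    return {
        "min": float(err.min()),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "max": float(err.max()),
        "iqr": float(q3 - q1),   # width of the box
    }

actual = [10.0, 12.0, 14.0, 16.0, 18.0]
pred_a = [9.0, 12.0, 15.0, 16.0, 19.0]    # hypothetical model with small errors
pred_b = [6.0, 16.0, 10.0, 20.0, 14.0]    # hypothetical model with large errors

summary_a = error_summary(actual, pred_a)
summary_b = error_summary(actual, pred_b)
print(summary_a["iqr"], summary_b["iqr"])
```

A narrower interquartile range with a median near 0 corresponds to the "more concentrated and closer to 0" pattern that the comparison above relies on.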
After the three models complete the demand forecast, the obtained MSE and $R^2$ are shown in Table 3. According to the evaluation index results of each model in Table 3, among the three models, the Bi-LSTM model has the lowest MSE

Conclusions
Demand forecasting has a significant impact on fresh food e-commerce logistics management. Therefore, this paper proposes a fresh food e-commerce logistics demand forecasting scheme based on the Bi-LSTM model. After experiments, the following conclusions are drawn.
The main factors affecting the demand for fresh products include weekends, holidays, months, seasons, and temperatures. Fluctuations in minimum temperature affect customer demand for fresh produce. At the same time, due to the influence of the Spring Festival, the period from December to February is the low period of demand for fresh products in a year.
According to the model prediction results, the MSE of the Bi-LSTM model is 0.063, indicating a small prediction error, and its $R^2$ is 0.858, indicating relatively accurate predictions. This shows that the fitting effect of the Bi-LSTM model is relatively good.
Experiments show that the logistics demand forecasting scheme based on the Bi-LSTM model can take into account the overall change trend of demand and the different influences of various factors on demand. Therefore, it can meet the timeliness requirements of fresh food e-commerce companies, and is suitable for