Multipath Selection Algorithm Based on Dynamic Flow Prediction ()
1. Introduction
With the development of cloud computing, 5G and other technologies, under the trend of explosive traffic growth, increasing scale and complex topology in Data Center Network (DCN) [1], the traditional network management architecture gradually exposes some deficiencies. Therefore, designing more effective and intelligent load balancing algorithms is of great research significance to improve the network performance in cloud data centers.
The proposal of SDN [2] provides strong technical support for the effective detection and management of networks. SDN separates the control plane from the data plane, so that the data plane is only responsible for routing and forwarding, and the control plane implements forwarding decisions, realizing universal forwarding and efficient manipulation of network data streams, thus increasing the flexibility of the network.
We first use machine learning to dynamically predict the network flow and design a multipath forwarding policy based on the current and predicted network load characteristics to dynamically adjust the network load situation. The effectiveness of this algorithm is verified by comparing it with other load balancing algorithms through simulation experiments.
2. Related Work
In recent years, SDN-based network load balancing technology has received widespread attention because of its high flexibility and adjustability, and researchers at home and abroad have conducted a series of studies on it and proposed a variety of solutions.
Fu Wang [3] and others proposed a Dynamic Distributed Multi-path (DDMP) load balancing algorithm, which quickly responds and adjusts the traffic according to the inverse of the buffer occupancy when certain paths in the network are loaded to prevent severe congestion. However, as the scale of the data center network expands, its energy consumption will become an important issue. M. A. Saifullah [4] et al. proposed EHLBOF (Extended Health monitoring for Load Balancing in OpenFlow based network) algorithm, which not only takes into account the current load of the servers, but also their health status, thus improving service availability and response speed. In addition, Zhang Chaohui [5] et al. proposed an energy efficient routing algorithm based on bandwidth matching, which is able to reserve sufficient space for the upcoming data streams, preventing link congestion and saving network energy consumption.
A. Montazerolghaem [6] proposed a modular energy and load control system in IoMT that both reduces the number of loMT active servers and switches and balances the load distribution. Literature [7] proposed a network control mechanism based on SDN and AI, and proposed three operator network optimization algorithms, which provide effective solutions and theoretical support for telecom operators to implement intelligent network control and traffic optimization in SDN.
The MTDLR [8] algorithm integrates path-level metrics to select forwarding paths and adapts to different topologies by adjusting the weights of network influences. However, MTDLR determines the weighting parameter to select the optimal path only through the average flow bandwidth utilization, which may cause the algorithm to fall into a local optimal solution and is not flexible enough for the dynamically changing network load problem.
3. Load Balancing Algorithm Based on Dynamic Flow Prediction and Multi-Constraint Routing
3.1. Dynamic Flow Forecasting
3.1.1. Long Short-Term Memory
The LSTM model [9] was proposed by Hochreiter and Schmidhuber in 1997, which effectively solves the problem of gradient vanishing and gradient explosion encountered when dealing with long sequential data. The structure of LSTM is shown in Figure 1.
Figure 1. LSTM structure.
The LSTM model adds three parts to the RNN (Recurrent Neural Network), namely, forgetting gate, input gate and output gate, and each gating unit consists of
and tanh activation function.
1) Forgetting gate: determines which old information is forgotten.
indicates the importance of the old information, the higher the value, the more important the information.
(1)
2) Input gate:
decides which new information is stored. is the output of the input gate, indicating the importance of the new information.
indicates new information can be stored to the current cell state.
(2)
(3)
At this point, the cell state is updated and the updated state is
.
3) Output Gate: Determines what information will be output.
(4)
(5)
In Equations (1) to (4), W is the network weight coefficient,
is the output of the previous time step,
is the output of the current time, and b is the bias of the activation value.
3.1.2. Model Evaluation Indicators
The average absolute error (MAE), root mean square error (RMSE), average absolute percentage error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE) are usually used to evaluate the prediction effect of the model, and the lower the value of the indexes, the more stable the model is and the more accurate the prediction results are. The formulas are as follows:
(6)
(7)
(8)
(9)
where m,
, and
are the total number of samples, measured values, and predicted values, respectively. MAE measures the average size of the absolute value of the difference between the predicted and actual values; RMSE is the square root of the mean of the squares of the difference between the predicted and actual values; MAPE is the average percentage of the absolute value of the ratio of the prediction error to the actual value; SMAPE reduces the problem of numerical inflation that may occur with MAPE by taking the absolute value of the predicted value and the actual value and averaging them.
3.2. Quantum Annealing Algorithms
Metropolis et al. [10] proposed the idea of Simulated Annealing (SA) in 1953, which has since been widely used in combinatorial optimization [11] problems. SA, as a heuristic global optimization algorithm, simulates the physical process of atoms rearranging to reach thermodynamic equilibrium during annealing of solid materials. Although SA can handle complex optimization problems, its convergence speed and and solution quality depend on the parameter settings, and the temperature of the object is directly proportional to the energy, at lower temperatures, according to the Metropolis criterion, it only accepts new solutions that are smaller than the current solution, or when the temperature decreases at too fast a rate, it may skip many potential globally optimal solutions, and thus easily causes the system to stay at the local optimal solution.
With the development of quantum technology, Quantum Annealing (QA) algorithm [12] based on quantum tunneling effect is proposed. The so-called quantum tunneling effect, i.e., the quantum leap gives the quantum the ability to penetrate the potential barrier higher than its own energy. Simulated annealing algorithm and quantum annealing algorithm over the potential barrier are shown in Figure 2, the particle reaches the local optimum at point A, the simulated annealing algorithm needs to go over the potential barrier in order to achieve the global optimum, and the quantum annealing algorithm only needs to reach the global optimum point B directly through the quantum tunneling effect.
Figure 2. Comparison of SA and QA.
The process of realizing quantum annealing requires 6 steps:
Step 1: Construct the evaluation function
of the quantum system, where
is the potential energy to represent the objective function of the optimization problem and
is the kinetic energy;
Step 2: Set the initial temperature T, the transverse magnetic field strength Γ, and the maximum number of iterations Steps. Select an initial state x randomly at temperature T and calculate
;
Step 3: Generate a new state x' by randomly perturbing state x and calculate its
;
Step 4: Calculate
. If
, or
and
, the new state x' is accepted, otherwise, return to step 3;
Step 5: Perform a de-tempering and magnetic field attenuation operation;
Step 6: Determine whether the maximum number of iterations or the termination condition is reached, if so terminate, otherwise return to step 3.
According to the above steps, the flowchart of quantum annealing algorithm is shown in Figure 3:
3.3. Multi-Constraint Based Path Forwarding Policy
After a data stream is generated, there are multiple paths between its source and destination nodes in the data center, and choosing the appropriate path for forwarding can effectively maintain network load balancing. In this paper, we design the optimal forwarding path policy that satisfies the constraints through the current and predicted load balancing degree.
Figure 3. Flowchart of QA.
Construct the data center network as a directed graph
, with S being the set of all nodes in the network and L being the set of all links in the network. Denote the path between the data flow source node s and the destination node d as Ps,d.
3.3.1. Path Available Bandwidth
In a data center network, choosing a path each time to forward with the maximum available bandwidth between source and destination nodes can effectively alleviate and prevent load imbalance problems. Get the switch port statistics to get the number of bytes sent and received by the port and calculate the transmission rate
of the port based on the last recorded number of bytes transmitted:
(10)
In Equation (10), i and j respectively denote the source and destination switches, stats denotes the number of bytes transmitted on the port, and pd denotes the time interval between two recordings.
Then the bandwidth
of link l is denoted as the minimum of the two ports connected, and the available bandwidth
is the difference between the link capacity
and the used bandwidth, and the minimum available bandwidth of all the links in the path is denoted as the available bandwidth of the path:
(11)
(12)
(13)
3.3.2. Link Load Balancing Degree
A link is often part of multiple paths, so the available bandwidth of different links in a path varies. To avoid overloading certain links and improve transmission efficiency, the load balancing degree of a link is considered when selecting a forwarding path. In this paper, the bandwidth equalization degree of a path is expressed as the difference between the minimum and maximum values of the bandwidth between all nodes in the path:
(14)
3.3.3. Link Forwarding Policy
Combining the available bandwidth and bandwidth equalization of the above paths with the LSTM prediction of the available traffic and bandwidth equalization, the transmission cost of the paths is used as a weight for path selection. In addition, in order to guarantee the transmission of the data flow, one path is selected at a time and only one path is selected, and there are no loops in the selected paths, so that the objective function definition is obtained:
(15)
In Equation (15), n is the number of all forwarding paths calculated from the source destination address.
3.4. Design of LSTM-QALB Path Forwarding Algorithm
The optimal path is solved by quantum annealing algorithm and the solution process is shown in Algorithm 1:
Algorithm 1. Optimal path selection based on LSTM and QA.
Input: topological directed graph G Output: optimal forwarding path |
1 |
Get the set of forwarding paths all_paths based on the source directory address. |
2 |
for path in all_paths do |
3 |
Calculate the available bandwidth for all links on the path, save to total_weights. |
4 |
Predict the available bandwidth of all links on the path, saved to total_forecast_weights. |
5 |
for weights in total_weights do |
6 |
Calculate the available bandwidth of the path and save it to free_bws. |
7 |
Calculate load balancing of paths, save to balances. |
8 |
for forecast_weights in total_forecast_weights do |
9 |
Calculate available bandwidth for path prediction, save to forecast_free_weights. |
10 |
Calculate the load balancing degree of the path prediction and save it to forecast_balances. |
11 |
Define the Hamiltonian quantity. |
12 |
Calculation of optimal forwarding paths based on quantum annealing algorithm. |
4. Experimental Analyses
In this paper, we simulate the changes in the distribution of network traffic in the data center through python and choose a Fat-tree with k = 4 as the experimental topology, as shown in Figure 4, which contains 4 core switches, 8 aggregation switches, 8 edge switches and 16 servers.
Figure 4. Fat-tree (k = 4) topology.
The link bandwidths are all set to 100 Mbps, the link delay is 1 ms, and the traffic data of 64 links are collected from the simulation environment. After a number of learning training, set the number of iterations to 250, using Adam’s algorithm, the learning rate is 0.001. Then the traffic is predicted, and four evaluation index values are obtained: MAE = 4.99, RMSE = 7.28, MAPE = 5.65%, SMAPE = 2.76%, the model prediction is more accurate.
The Poisson model [13] is used, where the source and destination hosts are randomly selected each time, so that the number n of packets arriving in time sequence t satisfies the Poisson distribution with parameter
, i.e.,
, and its corresponding sequence of packet arrivals at the time interval T is exponentially distributed, i.e.,
. The parameters were adjusted from 0.1 to 0.9 to simulate different network loads, and each set of experiments was repeated 10 times to improve the accuracy of experimental data.
In this paper, the LSTM-QALB algorithm is compared with MTDLR and EERA algorithms for throughput and packet loss. The experimental results in Figure 5 and Figure 6 show that as the network load increases, the system throughput and packet loss rate both increase; when the parameter is greater than 0.5, namely, when the load is high, the system throughput increases slowly and the packet loss rate rises sharply. When the load is low, the performance of the three algorithms is similar, but when the load is high, the LSTM-QALB algorithm outperforms the other two.
Figure 5. Comparison of system throughput of different algorithms.
Figure 6. Comparison of average packet loss rate of different algorithms.
5. Conclusion
In this paper, we propose a load balancing algorithm LSTM-QALB based on LSTM dynamic traffic prediction and quantum annealing, which selects the optimal forwarding path by calculating the available bandwidth of the current and predicted paths with the link bandwidth equalization. Compared with MTDLR and EERA algorithms, LSTM-QALB has obvious advantages in terms of system throughput and average packet loss rate, which achieves the purpose of load balancing and improving the network quality of service.