Load Balancing in IP/MPLS Networks: A Survey

The Internet and the applications it supports have grown tremendously. Internet Service Providers (ISPs) are under enormous pressure to provide adequate service for traffic such as VoIP and video on demand. Since resources such as computing power and bandwidth are limited, traffic must be engineered to exploit them properly. These limitations gave rise to concepts such as Traffic Engineering and Quality of Service (QoS). Traffic Engineering broadly includes techniques such as multipath routing and traffic splitting to balance the load among different paths. In this paper, we survey various load balancing techniques proposed for the Internet. We do not try to be exhaustive; instead we analyze the important techniques in the literature. We hope this survey helps give a new direction to research in this area.


Introduction
Traffic engineering (TE) broadly refers to optimizing the operational performance of the network [1]. This optimization is done by diverting traffic to lightly loaded paths in order to balance the load among the paths according to various computed metrics. The TE methodologies proposed so far can be divided into state-dependent and time-dependent ones. Time-dependent methods engineer the traffic over long time scales. State-dependent methods, on the other hand, alter the traffic over short time scales depending on metrics of the present traffic that are computed online or offline. The aim of both is, of course, to balance the traffic so as to avoid congestion. Present-day IP networks rely on best-effort service, but with the considerable growth of applications that depend on Internet services, there is strong competition among ISPs to provide Quality of Service (QoS). QoS refers to transporting traffic through the network according to the agreement between the user and the ISP, known as the Service Level Agreement (SLA).
The Internet Engineering Task Force (IETF) proposed Multiprotocol Label Switching (MPLS) for providing QoS in the Internet. MPLS is a highly scalable, protocol-independent, data-carrying mechanism. In MPLS, forwarding decisions are made solely on the contents of the assigned labels, without the need to examine the network-layer header of the packet itself. One or more virtual paths, similar to those of Asynchronous Transfer Mode (ATM) or Frame Relay, are set up; each is known as a Label Switched Path (LSP). Forwarding packets on the basis of labels facilitates source routing and QoS [1]. In this paper, we survey work on load balancing by various authors who had the following motivations:
1) Reducing congestion in the network.
2) Reducing packet loss and packet delay.
3) Providing QoS parameters like fault tolerance.
4) Increasing the overall efficiency of the network.
The rest of the paper is organized as follows. In Section 2, we discuss load balancing fundamentals. In Section 3, we discuss the contributions of various authors to load balancing, and we conclude in Section 4.

Load Balancing Fundamentals
Load balancing is a central aspect of Traffic Engineering. The main idea is to move part of the traffic from heavily loaded paths to lightly loaded paths, in order to avoid congestion on the shortest-path route and to increase network utilization and throughput. Approaches to load balancing can be broadly classified into the following types [1]:
1) Round-robin forwarding.
2) Time-dependent approach: balancing traffic over a long time span based on past experience of the traffic.
3) Hashing-based approaches.
4) Routing traffic according to metrics calculated from the traffic.
Per-packet round-robin scheduling is advantageous only when all the paths have equal cost. Otherwise packets arrive out of order, which can be interpreted as a false congestion signal. This unnecessarily degrades network throughput, leaving some links underutilized while others are overutilized [2].
Time-dependent approaches vary the traffic allocation on the basis of traffic variations observed over a long time span. Such approaches are insensitive to dynamic traffic variations.
Hashing-based approaches are stateless: they apply a hash function to a subset of the five-tuple (source address, destination address, source port, destination port and protocol id). This type of traffic splitting is fairly easy to compute. Although it preserves flow-based splitting, it cannot distribute traffic unevenly. Moreover, because it maintains no state, dynamic traffic engineering cannot be applied to these approaches.
Various authors have proposed traffic engineering schemes that dynamically compute metrics such as packet delay and/or packet loss and use them to split the traffic. This method is highly advantageous if flow integrity is maintained and the overhead of computing the metrics is not significant.

Load Balancing Proposals
In [2] the authors analyze MPLS load balancing algorithms such as MATE, which distributes flows on the basis of packet loss and packet delay, load distribution in MPLS (LDM), and load balancing over widest disjoint paths (LBWDP). The authors also introduce the Periodic Multi-Step (PEMS) algorithm, which adapts the offered quality to the class of the routed traffic. PEMS has three phases. The first phase performs offline path selection among all the paths between each ingress-egress pair; the second phase performs path allocation; and the third phase dynamically adapts the parameters of the splitting-ratio equation to the network state.
Traffic splitting can be done with direct hashing, in which a traffic splitter applies a hash function to any combination of the five-tuple fields mentioned above and uses the hash value to select the outgoing path. It is very simple, as no state needs to be maintained. The authors of [3] present the first performance study of direct-hashing-based schemes using real packets from two trunks of backbone networks. After the hash function is applied, the resulting split ratio, which can take only discrete values, is applied to multiple source-destination pairs. They conclude that direct hashing applied to the source and destination addresses leads to highly imbalanced load. On the other hand, computationally more complex hashing based on a 16-bit CRC (Cyclic Redundancy Check) applied to the five-tuple gives excellent load balancing results. They also suggest that adding some adaptation to hashing would improve load balancing significantly.
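The direct hashing scheme described above can be sketched as follows. This is a minimal illustration, not the implementation evaluated in [3]: the function names, the IPv4-only key layout and the example flow are assumptions, and Python's binascii.crc_hqx (a CRC-CCITT variant) stands in for whichever 16-bit CRC the study used.

```python
import binascii
import socket
import struct


def five_tuple_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Pack an IPv4 five-tuple into a fixed 13-byte key (hypothetical layout)."""
    return (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HHB", src_port, dst_port, proto)
    )


def select_path(flow, num_paths):
    """Pick an outgoing path index from a 16-bit CRC of the five-tuple."""
    # crc_hqx returns a value in 0..65535 (CRC-CCITT polynomial).
    crc16 = binascii.crc_hqx(five_tuple_key(*flow), 0)
    return crc16 % num_paths


# Hypothetical TCP flow: every packet of this flow maps to the same path.
flow = ("10.0.0.1", "192.0.2.7", 40000, 443, 6)
print(select_path(flow, 4))
```

Because the path index depends only on the packet header, no per-flow state is kept and packets of one flow stay in order; the price is that the split across paths is fixed by the hash and cannot be tuned.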
In [4] the authors use MPLS to set up multiple virtual paths, called Label Switched Paths (LSPs), between source and destination. These paths are similar to virtual paths in ATM. Their load balancing mechanism comprises two functions: a splitting function and an allocation function. The splitting function first splits the incoming traffic into bins, and the allocation function then allocates the binned traffic to an appropriate LSP. Paper [4] analyzes three algorithms: the Topology-based Static Load-Balancing Algorithm (TSLB), the Resource-based Static Load-Balancing Algorithm (RSLB) and the Dynamic Load-Balancing algorithm (DLB). The authors conclude that DLB balances load better than the two static algorithms. This is expected, since load balancing that depends on the state of the traffic gives better link utilization and efficiency. Dynamic routing decisions are classified into two groups [5]: connection-based and packet-based. In connection-based routing, a metric of the connection is computed that affects the whole flow, whereas in packet-based routing the decision is made separately for each incoming packet and is therefore simple. In [5] the authors propose that the load on a path is inversely proportional to the delay between the source and destination Label Switched Routers (LSRs), known as the ingress and egress LSRs respectively.
Hash-based approaches have the drawback that they can cause packet disorder when packets from a single flow are moved to different links. In [6] the authors propose a mapping between flows and physical paths that is driven by monitoring the queue length of flows. Moreover, they take the size of a flow into account when reassigning flows for load balancing. This is the main idea of their model, dynamic hashing with flow volume (DHFV). Through simulation they show that hashing using a 16-bit CRC over the five-tuple gives excellent load balancing performance.
Congestion management schemes can be classified according to three criteria [7]:
1) Response time scale: the time the solution takes to resolve the congestion.
2) Preventive vs. reactive: preventive policies try to prevent congestion using estimates of future traffic, whereas reactive policies try to reduce congestion once it is sensed.
3) Supply side vs. demand side: supply-side policies alter the supply of resources to minimize congestion, whereas demand-side policies alter the admission of traffic to minimize congestion.
The same authors propose a Dynamic Load Balancing Algorithm (DYLBA), which detects congestion either when the load on some network link is dangerously close to the link capacity or when a new LSP demand cannot be satisfied. It then reroutes traffic, on a per-flow basis, to the most promising link.
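The two congestion triggers described for DYLBA can be illustrated with a short sketch. It is not the authors' algorithm: the 0.9 utilization threshold, the dictionary-based link model and all function names are assumptions chosen for illustration, and the rerouting step itself is omitted.

```python
def congested_links(load, capacity, threshold=0.9):
    """Return links whose utilization is at or above the (assumed) threshold."""
    return [link for link in load if load[link] / capacity[link] >= threshold]


def can_admit(path, demand, load, capacity):
    """True if every link on the path has enough residual capacity for the demand."""
    return all(capacity[link] - load[link] >= demand for link in path)


def congestion_detected(load, capacity, path, demand, threshold=0.9):
    """Trigger if a link is close to capacity or a new LSP demand does not fit."""
    near_capacity = bool(congested_links(load, capacity, threshold))
    return near_capacity or not can_admit(path, demand, load, capacity)


# Hypothetical link loads and capacities in Mbit/s.
load = {"a": 95, "b": 10}
capacity = {"a": 100, "b": 100}
print(congested_links(load, capacity))          # ['a']
print(can_admit(["b"], 50, load, capacity))     # True
```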
In [8] the authors model each link as an M/G/1 processor-sharing queue. They distribute traffic on a per-flow basis using the average delay on each link. Per-flow distribution prevents packet disorder.
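For reference, the standard mean-delay result for an M/G/1 processor-sharing queue is given below. The symbols are assumptions introduced here (flow arrival rate $\lambda$, mean flow size $E[X]$, link capacity $C$), and [8] may use a different formulation.

```latex
\rho = \frac{\lambda\, E[X]}{C} < 1, \qquad
E[T] = \frac{E[X]/C}{1-\rho}, \qquad
E[T \mid X = x] = \frac{x}{C\,(1-\rho)}
```

The mean delay depends on the flow-size distribution only through its mean, so a per-link average delay can be estimated from the link load alone.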
Paper [9] balances traffic by mapping each flow to a path according to the metrics of both the path and the traffic to be forwarded. The authors present a model called the Queue Turing Algorithm (QTA), which first divides the overall traffic into two parts: best-effort traffic, which is general IP traffic that does not need QoS, and MPLS traffic, which must be forwarded taking various link parameters into account. This algorithm retains the advantages of a plain load balancing algorithm while also serving QoS traffic well. Paper [10] extends this class-based service policy by proposing an algorithm that maximizes revenue by carrying as much traffic as possible through MPLS load balancing. The revenue generated is directly proportional to the number of busy connections in an LSP.
In [11] the authors define a quantity called Distributed Traffic (DT), which is inversely proportional to the delay and to the square root of the packet loss on a path. Their algorithm runs in parallel on all paths between a source and a destination, calculating the delay and packet loss on each. The load is then distributed on the basis of the calculated DT. This work is extended in [12], which proposes an adaptive load balancing mechanism based on real-time measurement that preserves per-flow path integrity while minimizing congestion. The authors define a quantity called Traffic Conductance (TC), which is calculated similarly to DT with slight variations and is used to balance the traffic in real time.
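A DT-style split can be sketched as follows, reading the description above as DT = 1 / (delay * sqrt(loss)) with the proportionality constant set to 1. That reading, the small floor on the loss value (to avoid division by zero on loss-free paths) and the function names are assumptions; the exact formula and normalization in [11] may differ.

```python
import math


def distributed_traffic(delay, loss, eps=1e-6):
    """DT under the assumed form 1 / (delay * sqrt(loss)); eps floors the loss."""
    return 1.0 / (delay * math.sqrt(max(loss, eps)))


def split_ratios(paths):
    """Share of traffic per path, proportional to each path's DT value."""
    dt = {name: distributed_traffic(d, p) for name, (d, p) in paths.items()}
    total = sum(dt.values())
    return {name: value / total for name, value in dt.items()}


# Hypothetical measurements: (delay in ms, packet loss ratio) per path.
paths = {"A": (10.0, 0.01), "B": (20.0, 0.01)}
ratios = split_ratios(paths)
print(ratios)  # path A gets about two thirds, path B about one third
```

With equal loss, the path with half the delay receives twice the traffic, which matches the inverse proportionality to delay.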
Paper [13] proposes the Parallel-Path-based Balance Scheme (PPBS), which first calculates node-disjoint LSPs in the network and then assigns a suitable LSP to the traffic by comparing the bandwidth of that LSP with the bandwidth of the flow aggregate. Through simulation the authors show that if accurate network information is available, more traffic can be sent over lightly loaded links, relieving the heavily loaded ones.
Paper [14] proposes a model called DLSP, which is constructed by dividing the original LSP into a number of node-disjoint LSPs and distributing fractions of the traffic over them. At the egress node, the packets are reassembled in order to prevent disordering. Through simulation the paper shows that this model leads to significant performance gains.
Paper [15] proposes a Distributed Explicit Partial Rerouting (DEPR) scheme for rerouting traffic away from congested parts of the network. Since MPLS takes considerable time to reroute traffic, this algorithm works in a distributed manner, with each node monitoring congestion on its own outgoing links. If a link is found to be congested, the algorithm selects an appropriate alternate link by comparing candidate links against a threshold, so that the traffic is not moved onto another congested link.
Paper [16] proposes a dynamic multipath traffic engineering algorithm called LDM (Load Distribution over Multipath). This algorithm improves network utilization as well as the network performance experienced by users. The improvement is gained by adaptively splitting the traffic load among multiple paths. Through simulation, the authors confirm that LDM performs better than both hop-count-based and traffic-load-based routing mechanisms. However, they do not provide a theoretical analysis of the benefits of multipath routing, and the algorithm can suffer from instability because of repeated oscillation.
This oscillation problem can be solved by employing two thresholds. In [17] the authors propose a new version of LDM that corrects the instability of the original model. Another disadvantage of LDM is that it does not take into account the leftover capacity of a path before assigning new traffic to it. This is also addressed in [17].
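How two thresholds damp oscillation can be shown with a generic hysteresis sketch. It is not the mechanism of [17]: the class name, the 0.6 and 0.8 threshold values and the use of link utilization as the monitored quantity are assumptions for illustration.

```python
class PathState:
    """Marks a path congested above `high` and clears it only below `low`."""

    def __init__(self, low=0.6, high=0.8):
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.low = low
        self.high = high
        self.congested = False

    def update(self, utilization):
        """Feed one utilization sample and return the current congested flag."""
        if not self.congested and utilization > self.high:
            self.congested = True
        elif self.congested and utilization < self.low:
            self.congested = False
        return self.congested


state = PathState()
samples = [0.7, 0.85, 0.75, 0.65, 0.55, 0.7]
flags = [state.update(u) for u in samples]
print(flags)  # [False, True, True, True, False, False]
```

With a single threshold at 0.8, the samples 0.85 and 0.75 would already flip the state back and forth; the gap between the two thresholds keeps the path excluded until its load has clearly dropped.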
In [18] the authors propose a load balancing mechanism that splits traffic on a per-packet basis and then reorders the packets to prevent disorder. They propose using the experimental bits in the MPLS header to carry a splitting id and a sequence number, so that the egress node can recognize which packets are to be reordered.
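Egress-side reordering by sequence number can be sketched as below. This is a simplified illustration, not the scheme of [18]: the class and method names are hypothetical, and the sketch ignores the limited width of the header field, sequence-number wraparound and timeouts for lost packets.

```python
class ReorderBuffer:
    """Holds out-of-order packets and releases them in sequence order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number to deliver
        self.pending = {}           # seq -> payload, waiting for earlier packets

    def receive(self, seq, payload):
        """Store one packet; return the payloads that can now be delivered."""
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


buf = ReorderBuffer()
print(buf.receive(1, "b"))  # [] because packet 0 has not arrived yet
print(buf.receive(0, "a"))  # ['a', 'b']
print(buf.receive(2, "c"))  # ['c']
```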
Oscillations are a side effect of traffic splitting when the granularity of the adjustment steps is coarse, so splitting is done with finer-grained steps. But this increases the number of iterations, and the optimal solution may not be reached. Paper [19] proposes an adaptive granularity solution that dynamically adjusts the granularity based on traffic conditions. The main idea is to choose the splitting ratio after each traffic measurement period so that the traffic distribution converges as quickly as possible, as if the ratios had been chosen statically.
Paper [20] presents a comparison and simulation of some popular MPLS load balancing algorithms such as the Minimum Interference Routing Algorithm (MIRA), the Dynamic Online Routing Algorithm (DORA) and Profile Based Routing (PBR). Readers are encouraged to consult the paper for details.
As noted by many authors, minimum-cost constrained multipath routing with MPLS is NP-hard, i.e. computationally expensive. Thus, even for a few tens of nodes, an exact method takes a long time to solve it. It is therefore much more appealing to develop specialized or heuristic algorithms for this problem. In [21] the authors investigate applying one of the most successful state-of-the-art multi-objective evolutionary algorithms to the traffic engineering optimization problem. Their approach aggregates the multiple objectives into a single objective using the weighted sum method.
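In its generic form, the weighted sum method mentioned above combines $m$ objectives as follows. The objective functions $f_i$, the weights $w_i$ and the feasible set $\mathcal{X}$ are generic symbols introduced here, not the specific objectives or weights used in [21].

```latex
\min_{x \in \mathcal{X}} \; F(x) = \sum_{i=1}^{m} w_i\, f_i(x),
\qquad w_i \ge 0, \qquad \sum_{i=1}^{m} w_i = 1
```

Each choice of weights yields one compromise solution, so the weights express the relative importance of the objectives.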
One way to balance the load among the service provider's network links in the hose model is to minimize the maximum bandwidth reservation over the set of links. However, such an objective spreads the bandwidth reservation as widely (i.e. over many links) and as evenly as possible, which often results in a large increase in the total bandwidth requirement and excessive computation time. The solution proposed in [22] is a Multi-Objective Multi-Path (MOMP) optimization approach: while keeping the min-max objective, it also tries to minimize the total bandwidth requirement and, by assigning proper weights to the two objectives, achieves a balance between them.
A possible way to integrate streaming and elastic flows is to use a Cross-protect router. A Cross-protect router consists of two traffic control components: a Priority Fair Queuing (PFQ) scheduler, a simple adaptation of a fair queuing scheduler that implicitly differentiates between streaming and elastic flows, and an admission control mechanism that guarantees a minimum QoS to accepted (or protected) flows and keeps the scheduler scalable by limiting the number of flows it must handle at any given time [23]. Paper [23] proposes a flow-aware TE approach, based on Cross-protect, for carrier-class Ethernet networks providing services like those defined by the Metro Ethernet Forum. Packets entering the router pass through implicit classification, which decides whether they are served by the priority queue or the fair queue. The authors then extend the ingress TE scheme with a simple flow-aware load balancing algorithm, providing greater resilience (enforced fairness, overload control) and potentially better resource utilization.
Backbone networks are heavily over-provisioned to achieve fault tolerance. Valiant Load Balancing (VLB) is known to provide high fault tolerance in backbone networks with only slight over-provisioning. It does so by having every node split its traffic among its neighbors, so that every node receives a fraction of the total traffic. Moreover, the routing paths are known in advance, so the network is highly resilient to node failures because traffic can be carried by the remaining nodes [24]. In [24] the authors use VLB for load balancing and show that if there are N paths between any pair of nodes and some paths fail, the source node only needs to send more traffic on the paths that are still available. To tolerate k arbitrary failures, the network needs to increase its link capacities by a fraction of approximately k/N.
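The k/N figure can be checked with a small arithmetic sketch. It rests on a simplifying assumption that is not taken from [24]: traffic between a node pair is spread evenly over N paths, and after k of them fail it is spread evenly over the remaining N - k. The function name is hypothetical.

```python
def overprovision_fraction(n_paths, k_failures):
    """Extra capacity needed per surviving path under an even-split assumption."""
    if not 0 <= k_failures < n_paths:
        raise ValueError("need 0 <= k < N")
    share_before = 1.0 / n_paths                 # load per path with no failures
    share_after = 1.0 / (n_paths - k_failures)   # load per surviving path
    return share_after / share_before - 1.0      # equals k / (N - k)


extra = overprovision_fraction(100, 5)
print(round(extra, 4))  # 0.0526, close to k/N = 0.05
```

Under this assumption the exact increase is k/(N - k), which is approximately k/N when k is small compared with N.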
Load-balanced routing increases the efficiency of network resource utilization. Paper [25] proposes load-balanced routing on top of shortest-path routing by using two-phase routing over shortest paths. Two-Phase Routing (TPR) performs load balancing by routing each flow, according to the OSPF protocol, in two stages across intermediate nodes. When the network has many nodes, the number of possible routes is high, which reduces network congestion. However, the protocol requires IP tunnels, such as IP-in-IP and Generic Routing Encapsulation (GRE) tunnels, to be configured between all edge nodes and intermediate nodes in the network. In MPLS-TE networks the number of tunnels grows on the order of N^2. From an operational point of view, this mechanism is therefore not scalable.
In [26] the authors propose an iterative algorithm that balances low-class traffic with a specific probability, instead of using traditional algorithms such as equal-cost multipath, which may balance traffic poorly and in turn lead to network congestion and less effective network performance.
There are many ways to balance traffic in a network. In [27], traffic sharing between multiple service paths is adopted in a hierarchical routing network. Connections with similar attributes are assigned to several different service paths so that network resources are used more efficiently. A variable weight is also used to adjust the traffic distribution. Traffic sharing and variable weights are two different methods of meeting load balancing requirements. These methods not only resolve the inefficient resource use caused by topology aggregation and the SPF algorithm, but also reduce the blocking probability and enhance network survivability. Based on this principle, a novel route selection algorithm, VWTB, is proposed in [27] and is shown to yield good routing performance.
If the traffic vector elements are set to the values of the node weights (which are, for example, proportional to the number of users attached to the node, as in [28]), the nodes that are expected to serve more customers can be guaranteed proportionally higher traffic loads. The guaranteed node traffic is determined for shortest-path routing (SPR) so that it can be compared with the guaranteed node traffic for the proposed load-balanced routing (LBR). In the case of SPR, the link loads depend not only on the node traffic loads but also on the traffic-matrix elements. The worst-case traffic patterns must be found for all links, and they determine the guaranteed node traffic loads.
Directed acyclic graphs (DAGs) are needed for the simulation of network algorithms. Papers [29,30] present an algorithm for constructing Independent Directed Acyclic Graphs (IDAGs), which are link-independent DAGs. These IDAGs are shown to achieve multipath routing and load balancing with an overhead of only 1 bit per packet.
In [31] the authors propose a reactive congestion control and load balancing algorithm that uses IP fast routing. The main idea is to forward packets onto detour paths only when congestion occurs in the network. These packets may in turn congest the detour path. To cope with this problem, rerouted packets are given lower priority than the packets that originally use those paths.
Paper [32] builds on methods previously proposed by various authors for fault tolerance in MPLS networks. It recommends carrying the traffic of a failed LSP (Label Switched Path) over one or more failure-free LSPs. To this end, the following issues and their solutions are considered:
1) How to distribute the affected traffic to the failure-free working LSPs?
Solution: The paper uses a minimum-cost flow formulation for this problem by constructing a simple graph.
2) How to redirect the affected traffic to the failure-free working LSPs?
Solution: Change the routing tables of the IP access network in front of the MPLS network to redirect the traffic to the new LSPs.
3) How to forward the affected traffic along the route of a failure-free working LSP?
Solution: Use an IP tunneling mechanism.
4) How to solve packet loss and disorder?
Solution: Transfer the sequence number of the unsent packet to the source; thereafter, all packets starting from that number are transmitted over the working LSPs.

Conclusion
Load balancing benefits the network in many ways: it relieves congestion, minimizes packet delay and packet loss, and increases network reliability and efficiency. In this paper we surveyed various load balancing mechanisms for IP/MPLS networks. The main idea of load balancing is to find the best paths over which to balance the load by computing various traffic metrics. These mechanisms can be deployed in MPLS traffic engineering to support different classes of service according to the service level agreement.