Load Balancing for Hex-Cell Interconnection Network

The hex-cell is one of the interconnection networks used for parallel systems. The network is built from hexagonal cells, each of which has six nodes. Many factors affect the performance of the network, and one of them is load balancing. At the time of writing, no load balancing algorithm exists for this network. The proposed algorithm for dynamic load balancing on the hex-cell is based on the Tree Walking Algorithm (TWA) for load balancing on tree interconnection networks and on all-to-all broadcast on a ring.


Introduction
Hex-cell is a newly proposed interconnection network, introduced in 2008 [1]. Research evaluating the performance of the hex-cell is still limited, and the network deserves more attention because of its potential for parallel systems. Since no load balancing algorithm exists for the hex-cell topology (at the time of writing), the aim of this paper is to propose a dynamic load balancing algorithm and evaluate it.
The proposed algorithm is based on the Tree Walking Algorithm (TWA) for load balancing on tree interconnection networks and on all-to-all broadcast on a ring, and it uses the addressing scheme proposed with SBHCR (Section Based Hex-Cell Routing Algorithm) [2], which divides the hex-cell into six sections.
The rest of the paper is organized as follows. Section 2 presents the related work on which we build: the hex-cell network topology, the Tree Walking Algorithm (TWA) and all-to-all broadcast on a ring. Section 3 presents the proposed load balancing algorithm, with an example that illustrates how it works. Section 4 presents the simulation of the algorithm. Finally, Section 5 concludes the paper and outlines future work.

Hex-Cell Network Topology
Hexagonal cells make up the hex-cell topology; each cell has six nodes. The depth d of the network, written HC(d), is the number of levels counted outward from the innermost cell: the innermost cell alone has depth one, the six cells around it form level two, the next twelve cells form level three, and so on [1], as shown in Figure 1.
There are three addressing schemes for the hex-cell topology. The first depends on the number of the line on which the node lies, counted from top to bottom, and the position of the node in that line, counted from left to right [1]. It is denoted by the pair (X, Y), where X is the line number and Y is the node's position in that line. For example, a node on the third line at position ten has the label (3, 10), as shown in Figure 2.
The second addressing scheme divides the network into six sections, labeled one to six from left to right (clockwise). Each node takes a label of three numbers (S, L, X), where S is the section number, L is the level number and X is the node number within that level of the section, with X ≤ 2L − 1 [2]. For example, a node in section three, on level two, with number three in that level has the label (3, 2, 3), as shown in Figure 3.
The third addressing scheme uses the level of the node and the node number within that level. It is denoted by the pair (X, Y), where X is the level number and Y is the node number, with Y ≤ 6 × (2X − 1) [3]. For example, the node on the third level with number thirty has the label (3, 30), as shown in Figure 4.
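The sizes implied by these definitions can be checked with a short sketch. It assumes, as the second and third addressing schemes imply, that level L holds 2L − 1 nodes in each of the six sections (6 × (2L − 1) nodes in total) and that both X and Y are numbered from 1; the function names are our own.

```python
def nodes_in_level(level: int) -> int:
    """Nodes on one level of a hex-cell; each of the six sections holds 2L - 1 of them."""
    if level < 1:
        raise ValueError("levels are numbered from 1 (the innermost cell)")
    return 6 * (2 * level - 1)


def total_nodes(depth: int) -> int:
    """Nodes in HC(depth): the sum of 6(2L - 1) over levels 1..depth, i.e. 6 * depth**2."""
    return sum(nodes_in_level(level) for level in range(1, depth + 1))


def nodes_per_section(depth: int) -> int:
    """Nodes in one section (one LBHC tree): 1 + 3 + ... + (2d - 1) = d**2."""
    return sum(2 * level - 1 for level in range(1, depth + 1))


def valid_section_label(s: int, level: int, x: int, depth: int) -> bool:
    """Check a section-based label (S, L, X) against HC(depth)."""
    return 1 <= s <= 6 and 1 <= level <= depth and 1 <= x <= 2 * level - 1


def valid_level_label(level: int, y: int, depth: int) -> bool:
    """Check a level-based label (X, Y), assuming Y is numbered from 1."""
    return 1 <= level <= depth and 1 <= y <= nodes_in_level(level)


if __name__ == "__main__":
    print(total_nodes(1), total_nodes(10))         # 6 600
    print(nodes_per_section(3))                    # 9
    print(valid_section_label(3, 2, 3, depth=3))   # True
    print(valid_level_label(3, 30, depth=3))       # True
```

Level ten gives the 600-node network used in the simulation, and a depth-3 network splits into six trees of nine nodes, which matches the divisor 9 used in the LBHC example.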

Tree Walking Algorithm (TWA)
TWA uses global information collection to learn the accumulated load and the number of nodes in each subtree. The root node then calculates the average load and broadcasts it. Each node then calculates the quota for the subtree of which it is the root. Finally, tasks are exchanged between the nodes [4].
To explain how TWA works, Figure 5 shows a tree with seven nodes, each holding a number of tasks. After global information collection, each node knows its number of tasks (W_i) and the number of nodes in the subtree rooted at it (N_i). Each node then calculates the total number of tasks in its subtree (SW_i). The root node (node 0) calculates the average load and the remaining tasks (R) that cannot be divided evenly among the nodes, and broadcasts both to all nodes. Next, each node calculates its own quota (Q_i) and the quota of the subtree rooted at it (SQ_i). Table 1 shows the values for each node.
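The TWA bookkeeping described above can be sketched as follows. The sketch assumes the tree is given as a parent array in which every parent index is smaller than its children's indices (so node 0 is the root and a reverse scan visits children before parents), and that the node order used in the quota rule is simply the node index. Figure 5's loads are not reproduced here, so the example input below is hypothetical.

```python
from typing import List, Tuple


def twa_quotas(parent: List[int], load: List[int]) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Return (N, SW, Q, SQ) for every node of the tree.

    parent[i] is the parent of node i (parent[0] == -1 for the root), and
    parent[i] < i for every other node, so a reverse scan is bottom-up.
    """
    n = len(load)
    size = [1] * n            # N_i: nodes in the subtree rooted at i
    sw = list(load)           # SW_i: tasks in the subtree rooted at i
    for i in range(n - 1, 0, -1):   # global information collection, leaves first
        size[parent[i]] += size[i]
        sw[parent[i]] += sw[i]

    # The root computes the average load and the remainder R, then broadcasts them.
    avg, r = divmod(sw[0], size[0])

    # Node quota: one extra task for each node whose order is below R.
    q = [avg + 1 if i < r else avg for i in range(n)]

    # Subtree quota: sum of the node quotas in the subtree, accumulated bottom-up.
    sq = list(q)
    for i in range(n - 1, 0, -1):
        sq[parent[i]] += sq[i]
    return size, sw, q, sq
```

For instance, a hypothetical seven-node tree with parent array [-1, 0, 0, 1, 1, 2, 2] and loads [5, 9, 2, 8, 1, 0, 4] holds 29 tasks, giving an average of 4 with R = 1: node 0 gets a quota of 5, every other node gets 4, and the subtree quotas of nodes 1 and 2 are both 12. A different traversal order would move the extra task to a different node but would not change the tree total.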
The flow of steps is presented in Figure 6.

Ring All to All Broadcast
In all-to-all broadcast, each node sends the same message to all other nodes. On a ring topology, this kind of broadcast needs (n − 1) communication steps, where n is the number of nodes.
To explain how all-to-all broadcast on a ring topology works, Figure 7 shows a ring topology consisting of six nodes.
Since there are six nodes, all-to-all broadcast needs five communication steps (6 − 1 = 5). In the first step, each node sends its own message to the next node clockwise. In each of the following four steps, each node forwards the message it most recently received from the previous node [5], as shown in Figure 8.
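The ring broadcast can be simulated in a few lines to confirm that n − 1 steps are enough. In this sketch each node's message is represented by its own index and "clockwise" is taken to mean increasing index; both are assumptions made only for illustration.

```python
from typing import List, Optional


def ring_all_to_all(n: int, steps: Optional[int] = None) -> List[List[int]]:
    """Simulate all-to-all broadcast on an n-node ring.

    Each node's message is its own index. In the first step every node sends
    its own message to the next node clockwise (index + 1); in each later
    step it forwards the message it received in the previous step.
    Returns the sorted list of messages each node holds at the end.
    """
    if steps is None:
        steps = n - 1                      # the n - 1 steps the broadcast needs
    known = [{i} for i in range(n)]        # messages each node holds
    outgoing = list(range(n))              # what each node sends in the current step
    for _ in range(steps):
        # node i receives what its counter-clockwise neighbour (i - 1) sent
        incoming = [outgoing[(i - 1) % n] for i in range(n)]
        for i, msg in enumerate(incoming):
            known[i].add(msg)
        outgoing = incoming                # forward the most recently received message
    return [sorted(msgs) for msgs in known]


if __name__ == "__main__":
    print(ring_all_to_all(6)[0])                # [0, 1, 2, 3, 4, 5]
    print(len(ring_all_to_all(6, steps=4)[0]))  # 5
```

With six nodes, five steps give every node all six messages, while stopping after four leaves each node one message short.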

LBHC Proposed Algorithm
The proposed load balancing algorithm builds on TWA on trees and on all-to-all broadcast on a ring. Following [2], the hex-cell interconnection network is divided into six sections. We treat each section as a tree topology, and the root nodes of the six trees form a ring topology of six nodes, as shown in Figure 9.

Phase Three: Broadcast the Average Load
Broadcast the average load tasks and the remaining tasks to each node in the tree section. Each node calculates its quota:
1) Node quota = average load tasks + 1 IF node order < remaining tasks.
2) Node quota = average load tasks IF node order ≥ remaining tasks.
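Each node also calculates the quota of the subtree it roots: Subtree quota = (average load tasks × number of nodes in the rooted subtree) + remaining tasks for the rooted subtree, where the remaining tasks for the rooted subtree is the number of nodes in that subtree whose order is less than the remaining tasks.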
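Written out, the subtree quota is the sum of the node quotas in the subtree, which is where the "remaining tasks for the rooted subtree" term comes from. Here A denotes the average load tasks, R the remaining tasks, N_i the number of nodes in the subtree rooted at node i and T_i the set of those nodes; these symbols are our own notation.

```latex
Q_j = \begin{cases} A + 1, & j < R \\ A, & j \ge R \end{cases}
\qquad
SQ_i = \sum_{j \in T_i} Q_j = A \cdot N_i + \left|\{\, j \in T_i : j < R \,\}\right|
```

For the root, T_0 contains every node of the tree, so SQ_0 = A · n + R for a tree of n nodes, which is the tree's whole quota.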

Phase Four: TWA Balancing
Each tree applies TWA according to its task load:
1) Trees with extra tasks apply TWA so that the extra tasks go to the root node.
2) Trees whose task load exactly equals their quota simply apply TWA.
3) Trees with fewer tasks than their quota apply TWA to fill their nodes from bottom to top with the right number of tasks (the node quota).
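The core of the TWA exchange in this phase compares each subtree's task total SW with its subtree quota SQ, which fixes how many tasks cross each tree edge. The sketch below computes only those net per-edge amounts, assuming a parent array with parent[0] = −1 for the root and every other parent index smaller than its child's. It does not model the step-by-step schedule: in an underloaded tree the root's shortfall simply shows up as a balance below its quota, whereas in the algorithm the root waits for tasks from the ring.

```python
from typing import List, Tuple


def twa_transfers(parent: List[int], load: List[int], quota: List[int]) -> List[Tuple[int, int, int]]:
    """Net task movement on every tree edge as (sender, receiver, amount).

    parent[0] == -1 marks the root and parent[i] < i otherwise. For each
    non-root node i, SW_i - SQ_i tasks cross the edge to its parent: a
    positive value means i's subtree sends the surplus up, a negative value
    means the parent must send the shortfall down. The root keeps whatever
    the tree as a whole has above (or below) its quota.
    """
    n = len(load)
    sw, sq = list(load), list(quota)
    for i in range(n - 1, 0, -1):          # bottom-up subtree sums
        sw[parent[i]] += sw[i]
        sq[parent[i]] += sq[i]
    moves: List[Tuple[int, int, int]] = []
    for i in range(1, n):
        diff = sw[i] - sq[i]
        if diff > 0:
            moves.append((i, parent[i], diff))
        elif diff < 0:
            moves.append((parent[i], i, -diff))
    return moves
```

For a hypothetical four-node tree with parent array [-1, 0, 0, 1], loads [3, 12, 2, 1] and node quotas of 4 each, node 1's subtree sends 5 tasks up, the root sends 2 to node 2 and node 1 sends 3 to node 3. The root ends with 6 tasks, its quota plus the tree's 2 extra tasks, which is what Phase Five then passes along the ring.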

Phase Five: Ring Balancing
If the root node has extra tasks, it sends them to the next root node; otherwise, it sends zero tasks to the next root node.

Phase Six: Final Balancing
Apply TWA again to balance the received tasks.

LBHC Example
Here is an example of the proposed LBHC algorithm. Figure 10 shows a hex-cell network in which each node holds a number of tasks.

Phase One: Global Information Collection
Each tree performs global information collection and computes the total number of tasks and the number of nodes in each subtree. Tables 2-7 show this information.

Phase Two: All to All Ring Broadcast
Each root node broadcasts the total tasks and average load of its tree, so that every root node can determine the maximum and minimum average load. The root nodes then evaluate the task loads: global balancing is efficient here, since the maximum average load is 18 and the minimum is 13 (we chose a difference of 5 tasks between the highest and lowest average task load as the criterion for whether global balancing between the section trees is efficient). The root nodes therefore calculate the new tree quotas. Table 8 shows the information broadcast to all root nodes.
The global average load = total tasks / 6 = 856 / 6 = 142, and the remaining tasks TR = 856 Mod 6 = 4. Tree quota = global average load + 1 IF section ≤ TR; tree quota = global average load IF section > TR. Therefore the quota for the trees in Sections 1 to 4 is 143, while for the trees in Sections 5 and 6 it is 142. Each tree then calculates the quota for each node and subtree using the new average load (integer division), as follows: for a tree quota of 143, the average load = 143 / 9 = 15 and the remaining (R) = 8; for a tree quota of 142, the average load = 142 / 9 = 15 and the remaining (R) = 7.
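The arithmetic of this step can be checked directly: the tree quotas add back up to the 856 tasks, and with nine nodes per tree the node quotas add up to each tree quota.

```latex
\begin{align*}
856 &= 6 \times 142 + 4 && \text{global average load } 142,\ TR = 4\\
4 \times 143 + 2 \times 142 &= 572 + 284 = 856 && \text{tree quotas of Sections 1--4 and 5--6}\\
143 &= 9 \times 15 + 8 && \text{average load } 15,\ R = 8\\
142 &= 9 \times 15 + 7 && \text{average load } 15,\ R = 7\\
8 \times 16 + 1 \times 15 &= 143 && \text{node quotas, tree quota } 143\\
7 \times 16 + 2 \times 15 &= 142 && \text{node quotas, tree quota } 142
\end{align*}
```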

Phase Three: Broadcast the Average Load
Broadcast the new average load tasks and remaining tasks to the nodes in each tree section, and calculate the new quotas.
Q_i = (average load + 1) IF i < R; Q_i = (average load) IF i ≥ R. Tables 9-14 show the new values for each node and subtree.

Phase Four: TWA Balancing
Trees 2, 3 and 6 have extra tasks, so we apply TWA to them. As an example, we take the tree of Section 2; Figure 11 shows the Section 2 tree with its task load. Tasks are then exchanged between the nodes as follows: each node i waits to receive tasks from its parent if SW_i < SQ_i, or from a child j if SW_j > SQ_j; otherwise, node i sends tasks to its parent if SW_i > SQ_i, or to a child j if SW_j < SQ_j.
Seven communication steps are needed to perform the balancing: 1) Send 13 tasks from node 4 to node 3.
Trees 1, 4 and 5 have fewer tasks than their quotas, so we apply TWA to fill their nodes from bottom to top with the right number of tasks (the node quota) and then wait for more tasks. As an example, we take the tree of Section 4; Figure 13 shows the Section 4 tree, which has fewer tasks than its quota. Tasks are exchanged between the nodes by the same rule as for Section 2.
Six communication steps are needed to perform the partial balancing: 1) Send 8 tasks from node 0 to node 1. Send 1 task from node 4 to node 3.
Send 13 tasks from node 8 to node 7.
These steps are shown in Figure 14.
At this point, the hex-cell network is partially balanced, as shown in Figure 15.

Phase Five: Ring Balancing
Here, the root nodes start sending extra tasks among themselves clockwise, starting from the root node of the Section 1 tree.
1) The root node of the Section 1 tree sends 0 tasks, since it is waiting for tasks.
2) The root node of the Section 2 tree receives no tasks and sends its extra tasks (14 tasks) to the root node of the Section 3 tree.
3) The root node of the Section 3 tree receives 14 tasks; since it has extra tasks (11 tasks), it adds them to the received tasks (25 extra tasks in total) and sends them to the root node of the Section 4 tree.
4) The root node of the Section 4 tree receives 25 tasks; since it has fewer tasks than its quota, it takes the number of tasks it needs (23 tasks) and sends the rest (2 tasks) to the root node of the Section 5 tree.
5) The root node of the Section 5 tree receives 2 tasks; since it has fewer tasks than its quota, it takes the number of tasks it needs (1 task) and sends the rest (1 task) to the root node of the Section 6 tree.
6) The root node of the Section 6 tree receives 1 task; since it has extra tasks (9 tasks), it adds them to the received task (10 extra tasks in total) and sends them to the root node of the Section 1 tree.
7) The root node of the Section 1 tree receives 10 tasks, and its quota is now complete, as shown in Figure 16.
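The ring balancing rule can be simulated to reproduce the walk-through above. The sketch assumes that each root's surplus is its tree's task total minus its tree quota (what Phase Four leaves at the root), that the surpluses add up to zero because the tree quotas add up to the total, and that a root with a shortfall keeps incoming tasks until its quota is met. When the carried tasks are not used up after one round, as in the example, they continue around the ring; the function name is hypothetical.

```python
from typing import List, Tuple


def ring_balance(surplus: List[int]) -> List[Tuple[int, int, int]]:
    """Simulate Phase Five on the ring of root nodes.

    surplus[k] is how many tasks the root of Section k + 1 holds beyond its
    tree's quota after Phase Four (negative: tasks it is still waiting for).
    Starting at Section 1 and moving clockwise, each root adds its extra
    tasks to what it received, or keeps what it needs, and passes the rest on.
    Returns the messages as (from section, to section, tasks).
    """
    if sum(surplus) != 0:
        raise ValueError("tree quotas must add up to the total number of tasks")
    n = len(surplus)
    need = [max(0, -s) for s in surplus]   # tasks each root still waits for
    carry, k = 0, 0                        # tasks in transit, step counter
    sends: List[Tuple[int, int, int]] = []
    while True:
        node = k % n
        if k < n and surplus[node] > 0:
            carry += surplus[node]         # extra tasks join the carried tasks once
        take = min(carry, need[node])      # keep what is needed to reach the quota
        need[node] -= take
        carry -= take
        if k >= n and carry == 0:          # every root has sent once and nothing is left
            break
        sends.append((node + 1, (node + 1) % n + 1, carry))
        k += 1
    return sends
```

With the surpluses from the example (−10, 14, 11, −23, −1 and 9 for Sections 1 to 6), it produces the messages listed above: 0, 14, 25, 2, 1 and 10 tasks, after which the Section 1 root's quota is complete.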

Phase Six: Final Balancing
After the root node of the Section 4 tree receives the tasks that complete its quota, as shown in Figure 17, the tree applies TWA again to balance the received tasks.
One communication step is needed to perform this balancing: 1) Send 7 tasks from node 0 to node 1, as shown in Figure 18.
Finally, the task load of the hex-cell network is balanced, as shown in Figure 19.

Simulation
The simulation was implemented in the Java programming language (version 1.8.0_51, 64-bit), using multi-threading to simulate each node. The hardware specification for our simulation is: 1) Processor: Intel(R) Core(TM) i5-2450M CPU @ 2.50 GHz.
For the simulation, we chose a difference of 5 tasks between the highest and lowest average task load as the criterion for whether global balancing between the section trees is efficient.
The major factor in evaluating a load balancing algorithm is accuracy. Simulations with different levels and different task loads for each node showed LBHC to be effective: in all runs, the difference between the highest and the lowest task load is one, except when global balancing is not efficient, in which case the difference depends on the chosen factor of 5.
Execution time is another important factor in evaluating an algorithm's performance. Figure 20 shows the average execution time of LBHC for various levels, from level one (a hex-cell with 6 nodes) to level ten (a hex-cell with 600 nodes).
We also studied the number of messages in the network while the LBHC algorithm runs. Figure 21 shows the average number of messages for levels one (a hex-cell with 6 nodes) to ten (a hex-cell with 600 nodes).

Conclusion and Future Work
In this paper, we have proposed a dynamic load balancing algorithm for the hex-cell interconnection network, based on the Tree Walking Algorithm (TWA) and all-to-all broadcast on a ring, using the SBHCR addressing scheme that divides the hex-cell into six sections. As the simulation shows, the algorithm performs well in execution time and number of messages relative to the number of nodes in the network. Since there are no other dynamic load balancing algorithms for the hex-cell, we could not compare it with any other algorithm. For future

Phase Two: All to All Ring Broadcast
Each root node in the ring topology broadcasts its tree's total tasks and average load to all other root nodes in the ring. The root nodes then compare the average loads of the trees to check whether global load balancing is efficient. If it is, the new global average load tasks and the remaining tasks for each tree are calculated as follows:
1) The new average is calculated by all root nodes: a) new global average load tasks = accumulated tasks / 6; b) new global remaining tasks = accumulated tasks Mod 6.
2) The tree task quota is calculated by all root nodes: a) tree task quota = global average load + 1 IF section ≤ global remaining tasks; b) tree task quota = global average load IF section > global remaining tasks.
3) The average load tasks and remaining tasks for each node are calculated: a) new average load tasks = tree task quota / tree nodes; b) new remaining tasks = tree task quota Mod tree nodes.
If global balancing is not efficient, the average load and remaining tasks of each tree are left as they are.
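The decision and quota computation that every root node performs after the ring broadcast can be sketched as follows. The sketch assumes the per-tree averages are the integer averages computed in Phase One and that the new per-node values are derived from each tree's quota, as in the worked example (143/9 and 142/9); the function name and parameters are hypothetical.

```python
from typing import List, Tuple


def phase_two(tree_tasks: List[int], tree_nodes: List[int], threshold: int = 5) -> List[Tuple[int, int]]:
    """Return the (average load, remaining tasks) pair each tree uses in Phase Three.

    tree_tasks[k] and tree_nodes[k] are the total tasks and node count of the
    tree in Section k + 1, as received over the ring broadcast.
    """
    averages = [t // n for t, n in zip(tree_tasks, tree_nodes)]
    # The worked example balances at a spread of exactly 5 (18 - 13), so this
    # sketch balances when spread >= threshold; the algorithm listing writes "> 5".
    if max(averages) - min(averages) < threshold:
        # Not efficient: every tree keeps its own average and remainder.
        return [divmod(t, n) for t, n in zip(tree_tasks, tree_nodes)]
    global_avg, global_rem = divmod(sum(tree_tasks), len(tree_tasks))
    result = []
    for section, nodes in enumerate(tree_nodes, start=1):
        # The first global_rem sections take one extra task.
        tree_quota = global_avg + 1 if section <= global_rem else global_avg
        result.append(divmod(tree_quota, nodes))
    return result
```

For instance, hypothetical per-tree totals of 162, 117, 150, 140, 145 and 142 tasks over nine-node trees (856 tasks in total, averages from 18 down to 13) yield (15, 8) for Sections 1 to 4 and (15, 7) for Sections 5 and 6.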

Figure 8. All-to-all broadcast on ring topology.

Figure 9. TWA trees and ring topology in the hex-cell.

Figure 12. TWA balancing on tree Section 2 with extra tasks.

Figure 13. Tree Section 4 with fewer tasks than its quota.

Figure 16. Root nodes of the sections sending tasks.

Figure 17. Partially balanced tree Section 4 after receiving more tasks.

Figure 18. Tree Section 4 applies TWA again to balance the received tasks.

Figure 20. Average execution time in nanoseconds for different levels.

Figure 21. Average number of messages in the network.

Table 1. Calculated values for each node.

Phase One: Global Information Collection
    For each child in the node children { Receive global information from the child }
    If node has parent { Send global information to the parent }
    Else {
        Calculate average load (accumulate task / accumulate nodes)
        Calculate remaining load (accumulate task Mod accumulate nodes)
    }

Phase Two: All to All Ring Broadcast
    If node is root node {
        For six loops {
            Send node global information to the next root node
            Receive global information from the previous root node
        }
        // Evaluate tasks load
        If (maxAverage − minAverage > 5) {
            Calculate new global average load tasks (total task load / 6)
            Calculate new global remaining tasks (total task load Mod 6)
            Calculate new average load tasks (global average load / tree nodes)
            Calculate new remaining tasks (global average load Mod tree nodes)
        }
    }

Phase Three: Broadcast the Average Load
    ...
    If (Actual accumulated tasks < Quota) { Receive tasks from the parent }
    For each child in the node children {
        If (Actual accumulated tasks for the child > Quota) {
            ...
            Send (extra tasks or Zero tasks) to the next root node
            Receive extra tasks from the previous root node
        }
    }

Phase Six: Final Balancing
    If node has parent {
        If (Actual accumulated tasks < Quota) { Receive tasks from the parent }
    }
    For each child in the node children {
        If (Actual accumulated tasks for the child < Quota) { Send tasks to the child }
    }

LBHC Phases

Phase One: Global Information Collection
Local information is collected in each tree topology separately, as in TWA.