Study of Modeling for Scalable and Monitorable Network on Chip

The performance of a multiprocessor system based on a Network on Chip (NoC) is limited by the communication efficiency of the network. Because the test cases are complex, it is difficult to optimize the routing and arbitration algorithms and to assess performance early in the design. This paper constructs a scalable and monitorable system-level model in SystemC for an NoC that uses the Packet Connected Circuit (PCC) protocol. Running the model allows the overall performance and the transfer details to be evaluated, and provides a statistical basis for optimizing the NoC design.


Introduction
Network on Chip (NoC) has become an effective interconnect among multiple processors because of its excellent expandability. In a multiprocessor structure, system performance depends on software parallelism, memory access efficiency and communication efficiency. The degree of software parallelism depends on the specific application. In an NoC structure, however, storage resources and processors act as resource nodes connected to the communication nodes of the NoC, so their memory access and communication efficiency is limited by the transmission efficiency of the NoC.
The main factors that affect NoC transmission are the topology, the arbitration algorithm, the routing algorithm and the network protocol. In engineering applications, the topologies usually chosen are those that are easy to implement in the manufacturing process and whose latency is evenly distributed. Arbitration algorithms handle the requests arriving on the input channels of a routing node according to specific rules. Routing algorithms read the information in the request that has won arbitration, calculate the position of the local routing node relative to the destination node, and select one output channel according to the request information and the current status of the output channels. Network protocols, usually tied to specific applications, are the standards that NoC transmissions must observe; they cover the format of data packets, the coding rules for control information and the handshake mechanism between routing nodes. When an interconnect solution is designed, the topology, as the hardware factor of the network, is usually decided first. The network protocol is also decided early because it is closely tied to the specific application. The choice of arbitration and routing algorithms is a trade-off between resource consumption and flexibility.
The variety of available topologies, network protocols, and arbitration and routing algorithms gives NoC strong extensibility, but it also complicates the choice of a network for a specific engineering application. For a more precise comparison, quantitative indicators of network communication performance are an effective way to assess the pros and cons of an NoC. Because these indicators are statistics gathered from a large number of transmissions, they require a fast and flexible simulation platform. A hardware implementation of an NoC requires transmissions to be distributed as evenly as possible, so that local overheating of the chip is avoided and efficiency is improved. In practical applications, however, it is hard to distribute transmissions evenly among processors, and the partitioning of communication tasks relies only on rough estimation. This estimation is especially difficult for a network that mixes arbitration and routing algorithms. This paper presents a system model of an NoC written in SystemC [1]. The model achieves scalability of the network topology through an automatic interconnection algorithm, and integrates a variety of arbitration algorithms, routing algorithms and monitors that observe real-time communication on the NoC. Running the model allows the overall performance to be evaluated and transmission hotspots to be identified by monitoring real-time transfers at the routing nodes. Tracing the transmission paths supplies key data for designing routers that mix arbitration and routing algorithms.

Scalability Modeling
Mesh is a topology with strictly orthogonal coordinates and is well suited to implementation in a planar integrated circuit process. The Packet Connected Circuit (PCC) protocol suits communications, media, digital signal processing and network processing applications, which demand high-bandwidth, low-latency connections [2,3]. This paper studies the scalability of NoC by establishing a system model with a mesh topology and the PCC protocol.

Automatic Interconnection Algorithm
Figure 1 shows the positions and input-output relationships of all nodes in a 2D-mesh structure, which is called the "standard" interconnection. Its nine nodes are labeled R00, R10, ..., R22. Every routing node in an M × N (M, N ≥ 2) 2D mesh corresponds to a node in the "standard" interconnection. The modules in an M × N 2D mesh can be linked automatically as follows. First, the type of each node in the real network, that is, the "standard" node it corresponds to, is determined from the network's scale and the node's actual coordinates. Then, for each node in the real network, the number of channels and their codes (L, N, S, W and E) are initialized according to the node type. Ports coded W and E are bound to the ports of adjacent nodes along the X axis, and ports coded N and S are bound likewise along the Y axis. For every routing node, the port coded L is bound to the port of the Local System. Because only the X and Y directions are involved, automatic interconnection can be realized with two doubly nested for loops. The algorithm is thus a position mapping based on the node type in the real network. The same method can be applied to other network topologies, and the automatic interconnection has been extended to a 3D-mesh network.
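The position mapping and port binding described above can be sketched in plain C++ without any SystemC dependency. This is only an illustration under stated assumptions: the names `standard_index`, `has_port`, `connect_mesh` and the `Link` record are hypothetical, N is assumed to point toward increasing y, and the binding of each L port to its Local System is omitted.

```cpp
#include <cassert>
#include <vector>

// Port codes used in the text: Local, North, South, West, East.
enum Port { L = 0, N, S, W, E };

// Position of a node inside the 3 x 3 "standard" mesh (each index is 0, 1 or 2).
struct NodeType { int sx; int sy; };

// One directed channel: output port `from_port` of node `from` drives
// input port `to_port` of node `to`. Nodes are indexed as y * cols + x.
struct Link { int from; Port from_port; int to; Port to_port; };

// Map an actual coordinate onto the "standard" mesh: first, last or interior.
int standard_index(int pos, int size) {
    assert(size >= 2 && pos >= 0 && pos < size);
    if (pos == 0) return 0;
    if (pos == size - 1) return 2;
    return 1;
}

// Which ports a node of the given type owns (assumption: N means y + 1).
bool has_port(NodeType t, Port p) {
    switch (p) {
        case L: return true;
        case W: return t.sx != 0;
        case E: return t.sx != 2;
        case S: return t.sy != 0;
        case N: return t.sy != 2;
    }
    return false;
}

// Bind neighbouring ports with two doubly nested loops, one per axis.
std::vector<Link> connect_mesh(int cols, int rows) {
    std::vector<Link> links;
    for (int y = 0; y < rows; ++y) {          // X axis: E <-> W
        for (int x = 0; x < cols; ++x) {
            NodeType t{standard_index(x, cols), standard_index(y, rows)};
            if (!has_port(t, E)) continue;
            int a = y * cols + x;
            links.push_back({a, E, a + 1, W});
            links.push_back({a + 1, W, a, E});
        }
    }
    for (int x = 0; x < cols; ++x) {          // Y axis: N <-> S
        for (int y = 0; y < rows; ++y) {
            NodeType t{standard_index(x, cols), standard_index(y, rows)};
            if (!has_port(t, N)) continue;
            int a = y * cols + x;
            links.push_back({a, N, a + cols, S});
            links.push_back({a + cols, S, a, N});
        }
    }
    return links;
}
```

In a SystemC model, each `Link` would presumably correspond to one port-to-channel binding between two router module instances; the sketch only records which ports would be bound.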

Arbitration Algorithm Design
In the system model, three arbitration algorithms, distinguished by their priority policies, are adopted for the routing nodes: the fixed priority algorithm, the rotating algorithm and the random priority algorithm.
The fixed priority algorithm assigns a fixed priority number to each input channel, in descending order of priority: direction L, direction N, direction S, direction W and direction E.
The rotating algorithm assigns priority numbers to the input channels in a rotating order, and every input channel has an initial priority. When arbitration requests are received, the routing node selects, among the input channels that have received a request packet, the one with the highest priority, and allows it to send a routing request to the target output channel. The priority of the selected input channel then drops to the minimum, while the relative order of the other input channels' priorities remains unchanged.
The random priority algorithm assigns a random priority number to each input channel. When arbitration requests are received, the routing node selects, among the input channels that have received a request packet, the one with the highest priority, and allows it to send a routing request to the target output channel. The priorities of all input channels are then reallocated in a random order.
In the system model, the three arbitration algorithms are implemented as member functions of the routing node module. Additional arbitration algorithms can therefore be added by writing functions that conform to the same function call interface.
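The three priority policies can be sketched as one grant function plus three update rules. This is a sketch only: the `Arbiter` type and the `update_*` functions are hypothetical names, written here as free functions rather than the module member functions used in the model, and the rotating rule follows the reading that the granted channel drops to the lowest priority.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <random>

constexpr int kChannels = 5;  // input channels in the order L, N, S, W, E

// order[0] holds the channel with the highest priority.
struct Arbiter {
    std::array<int, kChannels> order{{0, 1, 2, 3, 4}};

    // Return the highest-priority requesting channel, or -1 if none requests.
    int grant(const std::array<bool, kChannels>& request) const {
        for (int ch : order)
            if (request[ch]) return ch;
        return -1;
    }
};

// Fixed priority: the order never changes after a grant.
inline void update_fixed(Arbiter&, int /*winner*/) {}

// Rotating priority: the winner drops to the lowest priority and the
// relative order of all other channels is preserved.
inline void update_rotating(Arbiter& a, int winner) {
    auto it = std::find(a.order.begin(), a.order.end(), winner);
    assert(it != a.order.end());
    std::rotate(it, it + 1, a.order.end());
}

// Random priority: all priorities are reshuffled after each grant.
inline void update_random(Arbiter& a, int /*winner*/, std::mt19937& rng) {
    std::shuffle(a.order.begin(), a.order.end(), rng);
}
```

Keeping the grant step identical and varying only the update rule mirrors the shared function call interface mentioned above, so a further policy only needs a new update function.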

Routing Algorithm Design
To limit logic resource consumption, three routing algorithms that are easy to implement in hardware are adopted: the X-Y routing algorithm, the distributed routing algorithm and the adaptive routing algorithm.
In the X-Y routing algorithm, the mesh topology is divided into two mutually orthogonal dimensions, X and Y, and each routing node is identified by its coordinate in each dimension. When routing starts, the routing node calculates the offsets from the source node to the target node in each dimension, and these offsets decrease as routing proceeds. The X-Y routing algorithm follows the rule that the X offset is reduced to zero first, and then the Y offset is reduced to zero [4].
The distributed routing algorithm selects the transmission path according to the status of the routing nodes' output channels. The transmission path from the source node to the target node is not unique, and the paths differ in length. To reduce the complexity of the distributed routing algorithm while ensuring the shortest transmission path, the distributed algorithm in this design is restricted to be convergent: the distance from each node along the path to the target node decreases as the transmission proceeds. To alleviate the network congestion that arises when X-Y routing is adopted, the algorithm selects the Y dimension first when two output channels are available.
The adaptive routing algorithm selects the transmission path according to the utilization of the idle output channels. It selects the less-used output channel to balance the load as far as possible. The adaptive routing algorithm also obeys the convergence restriction.
As with the arbitration algorithms, the routing algorithms are implemented as member functions of the routing node module. Additional routing algorithms can therefore be added by writing functions that conform to the same function call interface.
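As an illustration of the X-first rule, the following minimal sketch routes one hop at a time and counts the hops. The names `route_xy`, `step` and `hops_xy` are hypothetical, and the orientation (x grows eastward, y grows northward) is an assumption, since the text does not fix it.

```cpp
#include <cassert>

enum class Dir { Local, North, South, West, East };

struct Coord { int x; int y; };

// Dimension-ordered routing: reduce the X offset to zero first, then Y.
Dir route_xy(Coord here, Coord target) {
    if (target.x > here.x) return Dir::East;
    if (target.x < here.x) return Dir::West;
    if (target.y > here.y) return Dir::North;
    if (target.y < here.y) return Dir::South;
    return Dir::Local;  // both offsets are zero: deliver to the local node
}

// Move one hop in the given direction.
Coord step(Coord c, Dir d) {
    assert(d != Dir::Local);
    switch (d) {
        case Dir::East:  ++c.x; break;
        case Dir::West:  --c.x; break;
        case Dir::North: ++c.y; break;
        case Dir::South: --c.y; break;
        case Dir::Local: break;
    }
    return c;
}

// Follow the X-Y route from source to target and count the hops.
int hops_xy(Coord from, Coord to) {
    int hops = 0;
    for (Dir d = route_xy(from, to); d != Dir::Local; d = route_xy(from, to)) {
        from = step(from, d);
        ++hops;
    }
    return hops;
}
```

Because each hop removes one unit of offset, the hop count equals the sum of the absolute X and Y offsets between source and target.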
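The convergent selection used by the distributed and adaptive algorithms can be sketched as follows. All names are hypothetical, the orientation (y grows northward, x grows eastward) is assumed, and returning a sentinel when every distance-reducing channel is busy is an assumption about behaviour the text does not specify.

```cpp
#include <array>
#include <cassert>
#include <vector>

enum Dir { Local = 0, North, South, West, East, kDirs };

struct Coord { int x; int y; };

// Output channels that reduce the distance to the target (the convergence
// restriction). Y candidates come first so that ties prefer the Y dimension.
std::vector<Dir> productive(Coord here, Coord target) {
    std::vector<Dir> out;
    if (target.y > here.y) out.push_back(North);
    if (target.y < here.y) out.push_back(South);
    if (target.x > here.x) out.push_back(East);
    if (target.x < here.x) out.push_back(West);
    return out;
}

// Distributed routing: take the first idle productive channel, Y first.
// Returns Local at the target and kDirs if the request has to wait.
Dir route_distributed(Coord here, Coord target,
                      const std::array<bool, kDirs>& busy) {
    if (here.x == target.x && here.y == target.y) return Local;
    for (Dir d : productive(here, target))
        if (!busy[d]) return d;
    return kDirs;
}

// Adaptive routing: among idle productive channels, take the least used one.
Dir route_adaptive(Coord here, Coord target,
                   const std::array<bool, kDirs>& busy,
                   const std::array<unsigned, kDirs>& use_count) {
    if (here.x == target.x && here.y == target.y) return Local;
    Dir best = kDirs;
    for (Dir d : productive(here, target)) {
        if (busy[d]) continue;
        if (best == kDirs || use_count[d] < use_count[best]) best = d;
    }
    assert(best == kDirs || !busy[best]);  // never select a busy channel
    return best;
}
```

Both functions consider only channels that shorten the remaining distance, so every path they produce has minimal length; they differ only in how they break ties between the two candidates.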

Monitorable Modeling
Monitoring of the NoC covers the overall performance, the network transmissions and specific transmission paths. In this design, the average throughput and the average packet delay can be monitored.

Monitoring of Performance
Throughput measures the data transmission rate of computer and communication systems. Usually expressed as the number of data bits or groups handled per second, it is an overall assessment of the ability of a system and its components to handle data transfer requests. In the model, throughput is defined as the ratio of the time spent transferring information packets to the total time, which comprises the transfer time and the waiting time incurred when transmission conflicts occur. To evaluate the overall performance of the network, average throughput is defined as the ratio of the total number of communication packets to the maximum time over all transmissions [5], in units of bits/s.
Packet delay reflects the transmission speed of the network from another angle. It is defined as the ratio of the time lag to the total number of packets, where the time lag runs from the moment the source node sends the request packet to the moment the target node receives the end packet, and includes the transmission time and the resend time incurred when transmission conflicts occur. Average packet delay is defined as the ratio of the total packet delay to the total number of packets, in units of cycles/flit.
The SystemC data type sc_time supports the description of simulation time. When the network is tested, each Local System is replaced by a communication transmitter and receiver. The simulator records the start time when the transmitter sends its request to the network, and records the end time when the receiver receives the end packet. To test the average performance when the network runs stably under full load, the following communication mode is designed. For a routing node with coordinate P(i, j) in an N × N 2D mesh, if i is not equal to j, transmissions are conducted between P(i, j) and P(j, i); if i is equal to j, transmissions are conducted from P(i, j) to P(i + 1, j + 1). When i and j both equal N - 1, transmissions are conducted from P(N - 1, N - 1) to P(0, 0). In this mode, N × N source-target pairs transmit simultaneously and the network is heavily congested. Transmission times and packet counts are recorded ten times to calculate the average throughput and the average packet delay.
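A minimal sketch of how the two averages could be computed from recorded transfers is shown below. The `Transfer` and `Metrics` types and the `summarize` function are hypothetical, and "maximum time" is read here as the longest time lag among the recorded transfers. Times are kept in whatever unit the records use; converting to bits/s or cycles/flit would need the clock period and flit width, which the text does not give.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One completed transfer as seen by a transmitter/receiver pair.
struct Transfer {
    double start;           // time the request packet left the source
    double end;             // time the end packet reached the target
    unsigned long packets;  // number of packets carried by the transfer
};

struct Metrics {
    double avg_throughput;  // total packets / longest time lag
    double avg_delay;       // total time lag / total packets
};

Metrics summarize(const std::vector<Transfer>& records) {
    assert(!records.empty());
    double max_time = 0.0;
    double total_delay = 0.0;
    unsigned long total_packets = 0;
    for (const Transfer& t : records) {
        assert(t.end >= t.start);
        double lag = t.end - t.start;  // includes waiting and resend time
        max_time = std::max(max_time, lag);
        total_delay += lag;
        total_packets += t.packets;
    }
    assert(total_packets > 0 && max_time > 0.0);
    return {total_packets / max_time, total_delay / total_packets};
}
```

For example, two transfers of 5 packets each that take 10 and 20 time units give an average throughput of 10 / 20 = 0.5 packets per time unit and an average delay of 30 / 10 = 3 time units per packet.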
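The communication mode above maps every source node to exactly one target. A short sketch, with the hypothetical names `destination`, `Flow` and `traffic_pattern`, makes the rule explicit; the wrap-around from P(N - 1, N - 1) to P(0, 0) is written as a modulo.

```cpp
#include <cassert>
#include <vector>

struct Coord { int i; int j; };

// Target of the node at P(i, j) in an n x n mesh under the test pattern.
Coord destination(Coord src, int n) {
    assert(n >= 2 && src.i >= 0 && src.i < n && src.j >= 0 && src.j < n);
    if (src.i != src.j) return {src.j, src.i};  // off-diagonal: transpose
    int next = (src.i + 1) % n;                 // diagonal: next diagonal node,
    return {next, next};                        // wrapping to P(0, 0)
}

struct Flow { Coord src; Coord dst; };

// All n * n source-target pairs that transmit simultaneously.
std::vector<Flow> traffic_pattern(int n) {
    std::vector<Flow> flows;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            flows.push_back({{i, j}, destination({i, j}, n)});
    return flows;
}
```

Since the transpose and the diagonal shift are both one-to-one, every node is also the target of exactly one transfer, which keeps all transmitters and receivers busy at once.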

Network Communication Real-Time Recording
Real-time recording of network communication covers arbitration information, routing information and transmission information. Arbitration information comprises the arbitration time, the arbitration algorithm and the arbitration result; it is generated when the arbitration module receives arbitration requests from the node's input channels. Routing information comprises the routing time, the routing algorithm and the routing result; it is generated when the routing module receives routing requests from the output of the arbitration module. Transmission information comprises the setup time and the closing time recorded by the output channels, together with the feedback from the target nodes. In the model, each routing node records this information as it occurs, and the records are saved as running log files to be processed later.

Tracing of Transmission Path
A transmission path is the set of routing node coordinates, namely those of the starting node, the passing nodes and the target node, traversed during a transmission from source to target. In the model, transmission paths are traced by processing the log files with a filtering program. From a transmission path, the arbitration algorithms, the routing algorithms and the congestion of the passing nodes can easily be obtained, and this information can be used to optimize the distribution of arbitration and routing algorithms across the passing nodes.
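The filtering step can be sketched as follows. The log format is not given in the text, so the `LogEntry` fields (a transfer identifier, a timestamp and the coordinates of the recording node) are purely hypothetical, as is the function name `trace_path`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical log record written by a routing node for one transfer.
struct LogEntry { int transfer_id; double time; int x; int y; };

struct Coord { int x; int y; };

// Rebuild the path of one transfer: keep its entries, order them by time,
// and collect the node coordinates from source to target.
std::vector<Coord> trace_path(std::vector<LogEntry> entries, int transfer_id) {
    entries.erase(
        std::remove_if(entries.begin(), entries.end(),
                       [transfer_id](const LogEntry& e) {
                           return e.transfer_id != transfer_id;
                       }),
        entries.end());
    std::stable_sort(entries.begin(), entries.end(),
                     [](const LogEntry& a, const LogEntry& b) {
                         return a.time < b.time;
                     });
    std::vector<Coord> path;
    for (const LogEntry& e : entries) {
        if (!path.empty()) {
            int dist = std::abs(e.x - path.back().x) +
                       std::abs(e.y - path.back().y);
            if (dist == 0) continue;  // same node logged more than once
            assert(dist == 1);        // mesh hops connect neighbouring nodes
        }
        path.push_back({e.x, e.y});
    }
    return path;
}
```

With the path in hand, the per-node arbitration and routing records for the same transfer could be looked up to see where waiting time accumulated.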

Experiments
To quantify the relationship between network performance and network parameters, the network performance is tested, under the test mode described in Section 3.1, for specific combinations of network parameters such as arbitration algorithms and routing algorithms. To simulate a real network more accurately, the injection rate is introduced to evaluate whether the network is saturated. The injection rate is defined as the ratio of the packet length to the sum of the packet length and the delay between transmissions.
For a 6 × 6 network with an injection rate of 50%, the average throughput and the average packet delay are measured as functions of packet length; the results are shown in Figures 2 and 3. Comparing the two figures shows that some combinations of arbitration and routing algorithms outperform the others, with higher average throughput and lower average packet delay. These combinations are Fixed-Distributed (short for the fixed priority algorithm with the distributed routing algorithm; the other names are formed in the same way), Rotating-Distributed, Random-Adapt and Random-Distributed.
For a 6 × 6 network with a packet length of 4000 flits, the average throughput and the average packet delay are measured as functions of the injection rate; the results are shown in Figures 4 and 5. Comparing the two figures shows that the combinations of arbitration and routing algorithms listed above can cope with high-density communication. The best injection rate lies between 40% and 50%: beyond this range the average throughput no longer increases noticeably with the injection rate, whereas the average packet delay increases significantly.
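Written as a formula, with the symbols r (injection rate), L (packet length) and D (delay between transmissions) introduced here only for illustration, the definition and its inverse are given below. The worked values assume that L and D are expressed in the same unit, for example one flit per cycle.

```latex
r = \frac{L}{L + D}
\qquad\Longleftrightarrow\qquad
D = \frac{L\,(1 - r)}{r}

% Worked example, L = 4000:
%   r = 0.5  =>  D = 4000 \cdot 0.5 / 0.5 = 4000
%   r = 0.4  =>  D = 4000 \cdot 0.6 / 0.4 = 6000
```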

Conclusion
This paper presents a system model of an NoC in SystemC and describes the scalability of its topology, arbitration algorithms and routing algorithms. Network communication is monitored as it occurs, and the data transmission paths are obtained by processing the running log files. Using the average throughput and the average packet delay, the overall performance and the detailed transmissions can be evaluated and optimized, which provides statistics for NoC design.