Design of Efficient Router with Low Power and Low Latency for Network on Chip

The NoC consists of processing elements (PEs), network interfaces (NIs) and routers. This paper proposes a hybrid scheme for Network on Chip (NoC) that aims at low latency and low power consumption by combining wired and wireless links between routers. The main objective is to reduce the latency and power consumption of the NoC architecture using wireless links between routers. Power consumption is reduced by designing a low-power router, and latency is reduced by implementing on-chip wireless communication as express links for transferring data from the routers of one subnet to the routers of another subnet. The average packet latency and normalized power consumption of the proposed hybrid NoC router are analyzed for synthetic traffic loads: shuffle, bitcomp, transpose and bitrev traffic. The proposed hybrid NoC router reduces the normalized power over the wired NoC by 12.18% in consumer traffic, 12.80% in AutoIndust traffic and 12.5% in MPEG2 traffic. The performance is also analyzed in real-time traffic environments using the Network Simulator 2 tool.


Introduction
A router is a device that forwards data packets across computer networks. Routers perform the data "traffic direction" functions on the Internet. A router is a microprocessor-controlled device that is connected to two or more data lines from different networks. When a data packet comes in on one of the lines, the router reads the address information in the packet to determine its ultimate destination. Then, using the information in its routing table, it directs the packet to the next network on its journey.
NoC is commonly considered a scalable solution for on-chip communication. A typical NoC consists of computational PEs, NIs, and routers [1]. The latter two comprise the communication architecture. The NI packetizes data before it traverses the NoC over the router backbone. Each PE is attached to an NI, which connects the PE to a local router. When a packet is sent from a source PE to a destination PE, it is forwarded hop by hop through the network according to the decision made by each router. At each router, the packet is first received and stored in an input buffer. The control logic in the router then makes the routing decision and performs channel arbitration. Finally, the granted packet traverses a crossbar to the next router, and the process repeats until the packet arrives at its destination.
There are several topology architectures, especially those based on mesh and torus [2]. The NoC routers for the new general-purpose and massive multi-cluster chips also need to be programmable and adaptable in order to fulfill these architecture requirements. In addition, these routers must manage a cluster of cores, and they must support parallel communication patterns on demand to increase data throughput.
A router is considered the most important component in the design of the communication backbone of a NoC system [3]. In a packet-switched network, the router forwards an incoming packet to the destination resource if that resource is directly connected to it, and otherwise forwards the packet to another router connected to it. The design of a NoC router should be as simple as possible, because implementation cost increases with the design complexity of the router. The main aim of this paper is to reduce power consumption and latency using a combined wired and wireless architecture.

Related Works
The existing NoC routers developed by various researchers remain complex when supporting communication among the modules of a system with heavily varying workloads and changing constraints. The existing designs report low-power schemes as well as real-time observability in different runtime environments.
The NoC router in [4] is based on a store-and-forward technique and a loopback mechanism. The NoC proposed there is based on new error detection mechanisms suitable for dynamic NoCs, where the number and position of processor elements or faulty blocks vary during runtime. The presented mechanism distinguishes permanent from transient errors and accurately localizes the position of the faulty blocks in the NoC routers, while preserving the throughput, the network load, and the data packet latency [5]. The major drawback of these architectures arises when the local port has a permanent error and the IP connected to it is lost or needs to be dynamically moved in the chip because of dynamic partial reconfiguration. The simulation results showed routing error localization close to 96% for routing errors on an adaptive algorithm based on XY in a 6 × 6 NoC.
Ranjitha et al. [6] proposed the design of an 8-port router for NoC using Verilog HDL. The buffering method used is store and forward. Control logic makes the arbitration decisions, and communication is thus established between input and output ports. Data registers latch the data from the data input based on state and status control signals, and this latched data is sent to the First In First Out (FIFO) buffer for storage. In addition, data is latched into the parity registers for parity calculation, and the result is compared with the parity byte of the packet. An error signal is generated if the packet parity is not equal to the calculated parity. The design was tested using SystemVerilog, and the code coverage and functional coverage of the router were observed using coverpoints, crosses and different test cases such as constrained, weighted and directed test cases.
A low-latency wormhole router for packet-switched NoC designs on Field Programmable Gate Arrays (FPGAs) is presented in [7]. It has been designed to be scalable at system level in order to fully exploit the characteristics and constraints of FPGA-based systems, rather than custom ASIC technology. It achieves a low packet propagation latency of only two cycles per hop, including both router pipeline delay and link traversal delay. It can also be configured in various network topologies, including 1-D, 2-D, and 3-D. The proposed architecture can be easily migrated across many FPGA families to provide flexible, robust and cost-effective NoC solutions suitable for implementing high-performance FPGA computing systems. Two key attributes have allowed this to be achieved: 1) high scalability, mainly in router radix in order to accommodate highly connected topologies, and 2) an optimized pipeline organization in order to reduce hop delay.
Guoyue Jiang et al. [8] designed on-chip networks using a hybrid scheme. The architecture proposed in that paper supports low latency and low power for the NoC router. The authors used a virtual circuit switching technique that intermingles with circuit switching and packet switching. Multiple virtual circuit-switched (VCS) connections are allowed to share a common physical channel. Applied to traffic workloads, their algorithm achieved a 20.3% latency reduction and a 33.2% power saving compared with the baseline NoC.
Dean Michael Ancajas et al. [9] proposed an aging-aware adaptive routing algorithm for NoC designs. The proposed algorithm reduced the power-performance overheads caused by aging degradation and also minimized the stress experienced by heavily utilized routers and links. The authors achieved 13% and 12.17% average overhead reduction in network latency and energy-delay product per flit, a 10.4% improvement in performance, and a 60% improvement in mean time to failure using their aging-aware routing algorithm.
Ou He et al. [10] used a unified scheduling and mapping algorithm for the design of the NoC router. The authors developed a graph model to account for network irregularity and to estimate communication energy and latency. A heuristic technique was integrated with the proposed architecture to speed up the NoC. The authors achieved improvements of 27.3% and 14.5% in execution time with 24.3% and 18.5% lower energy.
Lee et al. [11] designed a NoC in which transferring data through virtual channels avoids data loss and deadlock. A Smart Power-Saving (SPS) architecture was developed for low power consumption and low area in the virtual channels. The proposed method reduced power consumption by 37.31%, 45.79%, and 19.26% and reduced area by 49.4%, 25.5% and 14.4%, respectively. Rajesh Nema et al. [12] designed a NoC as the communication subsystem between intellectual property (IP) cores in a system on chip (SoC). The authors used a wormhole router supporting multicast for low latency and low power consumption. This network flow control mechanism decomposes a packet into smaller flits and delivers the flits in a pipelined fashion. The authors achieved 0.02 mW of power consumption and 0.06 ns of delay.
Minakshi M. Wanjari et al. [13] proposed a NoC router architecture as the communication backbone of the NoC. The virtual channel buffers used in this router provide better channel utilization, and the router has low latency and requires less area compared to other routers. However, a fixed-priority arbiter is suitable only for a few requesters, and there is no limit on how long a lower-priority request must wait before it receives a grant, which can affect network performance.
Soteriou et al. [14] worked on a router for NoC to increase the throughput of the network. They achieved up to 94% throughput, but power consumption increased by a factor of 1.28. Lin et al. [15] addressed bufferless flow control for lightly loaded networks. However, bufferless flow control has a great effect on communication latency. Each channel controller had two additional tasks: dynamically configuring the channel direction and allocating the channel to one of the routers sharing it. A 40% area overhead over the typical NoC router architecture resulted from the double crossbar design and control logic.
The conventional methodologies described in this section are based on wired links between the routers in the NoC, which increase the power consumption and average packet latency. The hybrid NoC proposed in this paper supports both wired and wireless links, which reduce the power consumption and average latency of packet transmission between the routers in the NoC. The remainder of this paper is organized as follows. The proposed NoC router is presented in Section 3. The simulation results and analysis of the proposed work are presented in Section 4. Finally, the conclusion drawn from this work is discussed in Section 5.

Proposed Method
Table 1 shows the list of abbreviations used throughout this paper. SoC architectures require high throughput and performance to transfer data in a multicore SoC. The NoC (network-on-chip) has been proposed to meet this requirement, but it introduces new problems such as power consumption and latency [1]. The generic NoC architecture is shown in Figure 1. It consists of routers (R), PEs and NIs [1]. The PE transfers packets to the NI in the NoC. The main function of the NI is to convert the packets into a number of flits, which are then transferred to the routers. There are three types of router in the NoC: corner router (CR), edge router (ER) and middle router (MR). The CR has 3 I/O ports, the ER has 4 I/O ports and the MR has 5 I/O ports to transfer and receive flits. Designing the MR is the most demanding task because it has more I/O ports than the CR and ER. The CR and ER follow the same design procedure as the MR. The internal functions of all three routers are the same except for the number of I/O ports. Each I/O port has n virtual channels (VCs). The VCs in the router are used to reduce packet loss and deadlock in the NoC. The main limitation of the VCs is that they increase the power and area of the NoC. This paper focuses on reducing the power and area requirements of the VCs in the NoC.
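The three router types can be illustrated with a small sketch that classifies a router by its position in a 2-D mesh. The function names, the coordinate convention and the assumption of a mesh with at least two rows and two columns are illustrative choices, not part of the proposed design; the port counts are those stated above.

```python
# Port counts as stated in the text: mesh links plus one local port.
PORTS = {"CR": 3, "ER": 4, "MR": 5}


def router_type(x: int, y: int, cols: int, rows: int) -> str:
    """Classify the router at (x, y) in a cols x rows mesh (assumes cols, rows >= 2)."""
    on_x_border = x in (0, cols - 1)
    on_y_border = y in (0, rows - 1)
    if on_x_border and on_y_border:
        return "CR"  # corner router: two neighbours
    if on_x_border or on_y_border:
        return "ER"  # edge router: three neighbours
    return "MR"      # middle router: four neighbours


def port_count(x: int, y: int, cols: int, rows: int) -> int:
    """Number of I/O ports of the router at (x, y)."""
    return PORTS[router_type(x, y, cols, rows)]
```

For example, in a 4 × 4 mesh this classification yields four corner routers, eight edge routers and four middle routers.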
The router includes a transmission channel, routing computation (RC), a virtual channel arbiter (VA), a switch arbiter (SA), and a crossbar (XBAR). The flits of a packet comprise header, body, and tail flits; the header flit carries the PE priority, source address, destination address, and so forth. The RC uses the header flit and the routing algorithm to find the transmission path. The VA uses two-stage arbitration to select the highest-priority packet for transmission and then registers (signs up) a transmission channel. The SA uses two-stage arbitration to select the highest-priority flits to pass into the XBAR for transmission. The VA operates when a packet arrives, and the SA operates when a flit arrives. The tail flit marks the last flit of a packet, after which the router unregisters the transmission channel.
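The header/body/tail structure can be sketched as follows. This is a minimal illustration, not the NI implemented in this work: the `Flit` fields, the `packetize` helper and the choice that the header flit carries no payload are assumptions, and the 8-byte flit size corresponds to a 64-bit flit.

```python
from dataclasses import dataclass


@dataclass
class Flit:
    kind: str            # "header", "body", or "tail"
    payload: bytes = b""
    priority: int = 0    # header only: PE priority
    src: tuple = ()      # header only: source address
    dst: tuple = ()      # header only: destination address


def packetize(data: bytes, src: tuple, dst: tuple, priority: int,
              flit_bytes: int = 8) -> list:
    """Split data into one header flit, zero or more body flits, and one tail flit."""
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    flits = [Flit("header", b"", priority, src, dst)]
    flits += [Flit("body", chunk) for chunk in chunks[:-1]]
    flits.append(Flit("tail", chunks[-1]))  # tail marks the end of the packet
    return flits
```

With these assumptions, 24 bytes of data produce a four-flit packet: one header, two body flits and one tail.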
Each router consists of an input arbitration unit, a crossbar unit and an output arbitration unit, as illustrated in Figure 2. Each arbitration unit consists of N VCs, a virtual channel arbiter (VCA), a switch arbiter (SA) and routing computation (RC). In this paper, N is determined by the number of I/O ports.
Each router consists of n arbitration units. A corner router has 3 arbitration units at its input and output ports, an edge router has 4, and a middle router has 5. The arbitration unit is responsible for selecting the highest-priority packets. It includes routing computation (RC), a VC arbiter (VA), and a switch arbiter (SA). The RC computes the routing path of the flits from the header flit and transfers the flits based on the priorities in the header flit. The VA has two functions. The first is to select packets from the input port VCs based on priority and transmit them to the crossbar. The second is to transfer the packets from the crossbar to the output port VCs.
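The priority-based selection performed by the arbitration unit can be sketched as a two-stage arbiter: a local stage per input port followed by a global stage per output port. This is a behavioural sketch under stated assumptions, not the hardware arbiter of this work: the `Request` fields, the rule that a larger number means higher priority, and the tie-break on the lowest port and VC index are all hypothetical.

```python
from typing import NamedTuple


class Request(NamedTuple):
    in_port: int
    vc: int
    out_port: int
    priority: int  # assumption: larger value = higher priority


def _best(requests):
    # Highest priority wins; ties go to the lowest (in_port, vc) pair.
    return min(requests, key=lambda r: (-r.priority, r.in_port, r.vc))


def two_stage_arbitrate(requests):
    """Return {out_port: granted Request} after local and global arbitration."""
    # Stage 1 (local): one winner among the VCs of each input port.
    by_input = {}
    for r in requests:
        by_input.setdefault(r.in_port, []).append(r)
    local_winners = [_best(rs) for rs in by_input.values()]

    # Stage 2 (global): one winner among the local winners for each output port.
    by_output = {}
    for r in local_winners:
        by_output.setdefault(r.out_port, []).append(r)
    return {out: _best(rs) for out, rs in by_output.items()}
```

In this sketch a request that loses either stage is simply not granted in the current cycle and would be retried later.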
The VA contains a number of two-stage arbiters that select packets and register (sign up) VCs. The first stage selects the local highest-priority packet from the input VCs to the crossbar and registers VCs. The second stage selects the global highest-priority packet from the crossbar input to the output VCs and registers VCs. The SA likewise contains a number of two-stage arbiters that select flits for transmission. The first stage selects the local highest-priority flits from the input VCs to the crossbar. The second stage selects the global highest-priority flits from the crossbar input to the output VCs. A router in a mesh topology follows the XY-routing algorithm to transmit the flits. In this paper, the XY-routing algorithm is designed for both 4-port and 8-port routers. The XY-routing algorithm for the four-port router is shown in Figure 4. The destination address of the flit consists of the destination latitude and longitude. When the destination latitude (d_lat) equals the current latitude (r_lat) of the router and the destination longitude (d_long) equals the current longitude (r_long) of the router, the flits have arrived at the destination router. Otherwise, the XY-routing algorithm uses two stages. In the first stage, if the destination latitude is greater than the current latitude of the router, the packet is forwarded to the West port of the router; otherwise it is forwarded to the East port. In the second stage, if the destination longitude is greater than the current longitude of the router, the flits are transmitted to the South port of the router; otherwise they are forwarded to the North port. The XY-routing algorithm for the eight-port router is illustrated in Table 2.
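The four-port XY decision described above can be written as a short function. The sketch keeps the port directions exactly as stated in the text; the assumption, not spelled out in the text, is that the second stage is entered only once the latitudes match, as in standard dimension-order routing. The function name and the "LOCAL" label for delivery to the attached PE are illustrative.

```python
def xy_route(r_lat: int, r_long: int, d_lat: int, d_long: int) -> str:
    """Return the output port for a flit at router (r_lat, r_long)."""
    if d_lat == r_lat and d_long == r_long:
        return "LOCAL"  # flit has arrived at the destination router
    if d_lat != r_lat:
        # Stage 1: resolve the latitude first.
        return "WEST" if d_lat > r_lat else "EAST"
    # Stage 2: latitudes match, resolve the longitude.
    return "SOUTH" if d_long > r_long else "NORTH"
```

Because the latitude is always resolved before the longitude, every flit between a given source and destination follows the same deterministic path.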

Relation of Topology and Router
Table 2 shows the XY-routing algorithm for the 8-port router in the NoC architecture. This router has 8 ports: North, East, South, West, North East, North West, South East and South West. The packets are routed from one port to another based on the conditions stated in Table 3.

Hybrid Wireless on Chip Interface
The main drawback of the existing NoC architecture is its high latency and power consumption due to multi-hop long-distance communication among PEs. This limitation is overcome by implementing a wireless on-chip communication interface between the PEs, as shown in Figure 5.
The conventional methods for on-chip wireless communication are optical interconnections [3] and RF Interconnect (RF-I) transmission lines [4]. The main limitations of on-chip optical interconnections are that they require separate transmitter and receiver components and the integration of on-chip photonic components, which increases the power consumption. RF-I requires additional, physically overlaid transmission lines that serve as waveguides to enable data communication, which increases the latency of the transmission. In this paper, a hybrid wireless on-chip communication interface is proposed to reduce both latency and power consumption. Wireless links are inserted between subnets to form express communication links by replacing baseline wired routers with routers that have wireless communication capabilities. We applied algorithm 1 described in Table 2 to find optimal locations for wireless routers (WRs) so that the average traversal distance is minimized.
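The placement search can be sketched as a simulated-annealing loop that uses the acceptance rule of the algorithm listing (accept a candidate s when f(s) − f(s0) < 0, otherwise with probability exp(−(f(s) − f(s0))/t)). Everything beyond that rule is an assumption made for illustration: the cost function `avg_distance` (mean Manhattan distance to the wireless router within one subnet), the neighbour move, the geometric cooling factor `alpha`, the default temperatures, and tracking the best coordinate seen.

```python
import math
import random


def avg_distance(wr, cols, rows):
    """Hypothetical cost f: mean Manhattan distance from every router of a
    cols x rows subnet to the wireless router at coordinate wr."""
    total = sum(abs(x - wr[0]) + abs(y - wr[1])
                for x in range(cols) for y in range(rows))
    return total / (cols * rows)


def place_wireless_router(cols, rows, s0=(0, 0), t0=10.0, tt=0.01,
                          alpha=0.95, seed=0):
    """Search for a wireless-router coordinate with low average distance."""
    rng = random.Random(seed)
    t, best = t0, s0
    while t > tt:
        # Candidate s: a random neighbouring coordinate, clamped to the subnet.
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        s = (min(max(s0[0] + dx, 0), cols - 1),
             min(max(s0[1] + dy, 0), rows - 1))
        delta = avg_distance(s, cols, rows) - avg_distance(s0, cols, rows)
        # Accept improvements always, worse moves with probability exp(-delta/t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            s0 = s
        if avg_distance(s0, cols, rows) < avg_distance(best, cols, rows):
            best = s0
        t *= alpha  # assumed geometric cooling schedule
    return best
```

Accepting some worse candidates at high temperature is what lets the search leave a poor starting coordinate; as t approaches tt, the loop behaves like a greedy descent.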
In Table 4, s0 denotes the initial WLR coordinate, t0 the initial temperature, tt the terminating temperature, f the cost function and s the current coordinates. The proposed NoC architecture supports both wired and wireless connections for transferring packets from the source router to the destination router. If the source and destination routers are in the same subnet, the packets are routed over the wired connection. If the source and destination are in different subnets, the source router routes the packets over the wired connection to the wireless router in subnet 1, which routes them over the wireless connection to the wireless router in subnet 2. The wireless router in subnet 2 then routes the packets over the wired connection to the destination router in subnet 2. The proposed hybrid routing algorithm, which supports both wired and wireless connections, is described in Figure 6.
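The wired/wireless decision can be sketched as a function that returns the sequence of link segments a packet uses. The 4 × 4 subnet size, the `subnet_of` mapping and the wireless-router coordinates in `WIRELESS_ROUTER` are hypothetical values chosen only to make the example runnable.

```python
SUBNET_SIZE = 4  # assumption: subnets are 4 x 4 blocks of the mesh

# Hypothetical wireless-router coordinate for each subnet.
WIRELESS_ROUTER = {(0, 0): (2, 2), (1, 0): (6, 2)}


def subnet_of(node):
    """Map a router coordinate to the index of its subnet."""
    return (node[0] // SUBNET_SIZE, node[1] // SUBNET_SIZE)


def hybrid_path(src, dst):
    """Return the list of (link_type, from, to) segments from src to dst."""
    if subnet_of(src) == subnet_of(dst):
        return [("wired", src, dst)]  # same subnet: wired links only
    wr_src = WIRELESS_ROUTER[subnet_of(src)]
    wr_dst = WIRELESS_ROUTER[subnet_of(dst)]
    segments = []
    if src != wr_src:
        segments.append(("wired", src, wr_src))    # reach the local wireless router
    segments.append(("wireless", wr_src, wr_dst))  # express link between subnets
    if wr_dst != dst:
        segments.append(("wired", wr_dst, dst))    # final wired hops in the remote subnet
    return segments
```

Each wired segment would itself be routed hop by hop with the XY-routing algorithm; the sketch only shows which kind of link is chosen at each stage.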

Results and Discussion
The width of the wired links is assumed to be 64 bits, which is the size of a flit. Both synthetic traffic and application traffic are applied to evaluate the performance of the proposed hybrid router. For synthetic (artificial) traffic, the traffic pattern is uniformly random, and all packets are four flits long [16]. For application traffic, we applied the 3-tuple traffic generation technique. All simulations are performed in a mesh network with single-cycle channels.

Experiments Based on Synthetic Traffic
To analyze the performance under synthetic traffic, experiments are carried out with transpose, bitrev, bitcomp and shuffle traffic [17]. In transpose traffic mode, each node sends messages only to the destination whose address has the upper and lower halves of the sender's address transposed. In bitrev traffic mode, each node sends only to the node whose address is the bit reversal of the sender's address. In bitcomp traffic mode, each node sends messages only to the node whose address is the one's complement of its own address, and in shuffle traffic mode each node sends messages to other nodes with equal probability. Figure 7 shows the average power consumption and average latencies for the synthetic traffic. The power and latency parameters are measured for the NoC in hybrid (wired and wireless) and wired modes. The conventional NoC router uses wired links for data transmission and reception, which increases the power consumption and latency. The proposed hybrid scheme supports both wired and wireless links for data transmission, which reduces the power and latency relative to the conventional scheme. Compared with the conventional wired NoC router, the proposed hybrid scheme reduces the average power consumption by 11.1% in shuffle traffic mode, 12% in bitcomp traffic mode, 10.7% in transpose traffic mode and 10.52% in bitrev traffic mode. The improvement of the proposed hybrid scheme thus depends on the traffic pattern. The proposed scheme reduces the latency over the conventional wired NoC router by 30% in shuffle traffic mode, 11.25% in bitcomp traffic mode, 12.85% in transpose traffic mode and 13.3% in bitrev traffic mode. The performance of the proposed hybrid system is analyzed for the shuffle, bitcomp, transpose and bitrev traffic modes, as in Guoyue Jiang et al. [8]. The performance is analyzed for both wired and wireless connection modes. Although the proposed hybrid scheme delivers lower power than the conventional design, the gaps between the conventional and the proposed hybrid scheme are small under shuffle, transpose, and bitrev traffic at low injection rates. However, power consumption is high under bitrev traffic. Figure 8 shows the average latency of the proposed hybrid router with respect to various injection rates. The performance is also analyzed with different traffic modes.
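The destination rules of the synthetic patterns can be sketched as functions on a node address of `bits` bits. The function names and the requirement of an even bit width for the transpose pattern are assumptions; `equal_probability` follows the description given above for the shuffle mode, in which every other node is equally likely.

```python
import random


def transpose(addr: int, bits: int) -> int:
    """Swap the upper and lower halves of the address (bits must be even)."""
    half = bits // 2
    mask = (1 << half) - 1
    return ((addr & mask) << half) | (addr >> half)


def bitrev(addr: int, bits: int) -> int:
    """Reverse the order of the address bits."""
    return int(format(addr, f"0{bits}b")[::-1], 2)


def bitcomp(addr: int, bits: int) -> int:
    """One's complement of the address, limited to the given bit width."""
    return ~addr & ((1 << bits) - 1)


def equal_probability(addr: int, bits: int, rng=random) -> int:
    """Pick any node other than the sender with equal probability."""
    dst = rng.randrange((1 << bits) - 1)
    return dst if dst < addr else dst + 1  # skip the sender's own address
```

For a 16-node network (4-bit addresses), node 0110 sends to 1001 under transpose, node 0001 sends to 1000 under bit reversal, and node 0101 sends to 1010 under bit complement.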

Experiments Based on Real Time Traffic
The performance of the proposed hybrid scheme is further analyzed on real-time traffic. Experiments are conducted on three real-time applications: Consumer, AutoIndust and MPEG2. Consumer and AutoIndust are from the E3S benchmark suite [16], while MPEG2 is a stream program. Each packet consists of eight flits, and each flit is 64 bits long. The average packet latency and normalized power consumption for the real-time traffic are given in Table 5 and Table 6. Compared with the conventional wired NoC router, the proposed hybrid scheme reduces the average packet latency by 12.05% in consumer traffic mode, 13.1% in AutoIndust traffic mode, and 10.93% in MPEG2 traffic mode. It also reduces the average packet latency over the baseline router by 13.54% in consumer traffic mode, 21.05% in AutoIndust traffic mode and 16.5% in MPEG2 traffic mode. The proposed hybrid NoC router reduces the normalized power over the wired NoC by 12.18% in consumer traffic, 12.80% in AutoIndust traffic and 12.5% in MPEG2 traffic. It also reduces the normalized power over the baseline NoC router by 14.06% in consumer traffic, 14.91% in AutoIndust traffic and 14.61% in MPEG2 traffic.

Conclusion
The hybrid NoC router proposed in this paper supports both wired and wireless configurations. The proposed hybrid scheme is tested on various traffic modes in synthetic and real-time environments. The performance of the proposed method is analyzed using normalized power consumption and average packet latency. The proposed hybrid NoC router reduces the normalized power over the wired NoC by 12.18% in consumer traffic, 12.80% in AutoIndust traffic and 12.5% in MPEG2 traffic. It also reduces the normalized power over the baseline NoC router by 14.06% in consumer traffic, 14.91% in AutoIndust traffic and 14.61% in MPEG2 traffic. The proposed scheme reduces the latency over the conventional wired NoC router by 30% in shuffle traffic mode, 11.25% in bitcomp traffic mode, 12.85% in transpose traffic mode and 13.3% in bitrev traffic mode. By incorporating wired and wireless subnets in the NoC, the hybrid NoC router outperforms the conventional baseline and wired NoC routers.

Figure 3. NoC topology: (a) ring topology, (b) mesh topology, (c) star topology.
Figure 3 shows different NoC topologies. Routers use three different topologies to transmit the flits: mesh, ring and star. Figure 3(a) shows the ring topology, where all the routers are connected in a ring. Figure 3(b) shows the mesh topology, where all routers are connected via a predefined pattern. Figure 3(c) shows the star topology, where all the routers are connected to the central router in a bidirectional manner.

Input: network and subnet size
Output: location coordinates of WLR
If t (current temperature) > tt then
    Assign variable £ = f(s) − f(s0)
    If £ < 0 then
        Set s0 = s
    Else
        Select a random number r between 0 and 1
        Determine ε = exp(−£/t)
        If r < ε then set s0 = s
Return (s0)

Figure 7. Power consumption analysis of the proposed NoC router.

Figure 8. Latency analysis of the proposed NoC router.

Table 1. List of abbreviations used in this paper.

Table 5. Average packet latency of real time traffic.

Table 6. Normalized power consumption of real time traffic.