Area Efficient Sparse Modulo $2^n - 3$ Adder

This paper presents an area-efficient architecture for a modulo $2^n - 3$ adder. The modulo adder is one of the main components in the implementation of residue number system (RNS) based applications. The proposed modulo $2^n - 3$ adder combines the parallel-prefix and sparse concepts. With the sparse approach, the carries of only some bit positions are calculated, in $\log_2 n$ prefix levels. The scheme relies on the idempotency property of the parallel-prefix carry operator and its consistency. The parallel-prefix structure contributes to fast carry computation, while the sparse approach reduces area as well as routing complexity. The presented adder has a double representation of the residues {0, 1, 2}. The proposed adder offers a significant reduction in area as the number of bits increases.


Introduction
The residue number system (RNS) is a classical, non-weighted number system [1]. RNS divides a given number into a collection of small numbers, which significantly improves the speed of operation; the result is obtained by reverse conversion [2]. RNS has many applications in different fields, e.g., digital signal processing (DSP) for filters, convolution, and FFT [3]-[7], cryptography [8], image processing for wavelet transforms [9]-[11], error detection and error correction [12], fault-tolerant signal processing [13], and communication [14].
An RNS is specified by a set of moduli $\{m_1, m_2, \ldots, m_k\}$ that are relatively prime to each other. An integer $A$ is converted into RNS as the set of residues $a_i = |A|_{m_i}$, i.e., the least non-negative remainder of the division of $A$ by $m_i$. The dynamic range, denoted by $M$, is defined as the product of the moduli [1]. The residue number system also has many applications in arithmetic operations such as addition, subtraction, and multiplication [15]. The most widely used moduli set is $\{2^n - 1, 2^n, 2^n + 1\}$ [16]. To increase the dynamic range of the RNS, the moduli set has been extended further to $\{2^n \pm 1, 2^n \pm 3\}$ [17].
L. Kalampoukas [18] proposed a design that modularizes the generate and propagate terms in place of the conventional end-around carry (EAC) scheme. This adder has a parallel-prefix carry computation structure that reduces the number of stages, optimizing speed and area for modulo $2^n - 1$ addition. H. T. Vergos et al. [19] proposed an architecture that eliminates the double parallel-prefix computation problem and is customized for modulo $2^n + 1$ addition. The design reduces cell area, wiring complexity, and power consumption while keeping a high speed of operation, using the concept of a sparse modulo $2^n + 1$ adder based on an extension of the idempotency property of the prefix operator. A latency-compatible parallel-prefix modulo $2^n - 3$ adder is presented in [20] to include the extra modulus. There, the design technique of [18] is extended and modified to handle the difficulties that arise in deriving the generate and propagate signal formulas with variable-weight end-around carries.
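The forward and reverse conversions described above can be illustrated with a minimal Python sketch. It assumes the moduli set $\{2^n - 1, 2^n, 2^n + 1\}$ with $n = 4$ and uses the Chinese Remainder Theorem for reverse conversion; the function names `to_rns`, `from_rns`, and `rns_add` are hypothetical and chosen only for illustration.

```python
from math import prod

def to_rns(a, moduli):
    """Forward conversion: least non-negative remainder of a for each modulus."""
    return tuple(a % m for m in moduli)

def from_rns(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem (assumed method)."""
    M = prod(moduli)                      # dynamic range
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    return total % M

def rns_add(x, y, moduli):
    """Channel-wise modular addition; no carries cross between channels."""
    return tuple((a + b) % m for a, b, m in zip(x, y, moduli))

n = 4
moduli = (2**n - 1, 2**n, 2**n + 1)       # (15, 16, 17), dynamic range 4080
x, y = to_rns(1000, moduli), to_rns(2345, moduli)
print(from_rns(rns_add(x, y, moduli), moduli))  # 3345
```

Each channel works on a short word independently, which is why a fast adder for every individual modulus matters.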

Main Contribution
The double representation for modulo $2^n - 3$, i.e., of the residues {0, 1, 2}, is explained in [21], where a ripple carry addition strategy is used. In this paper we propose a modulo $2^n - 3$ adder that uses the concept of a parallel-prefix sparse adder. The parallel-prefix approach has better compatibility with modulo $2^n - 1$. The sparse parallel-prefix adder is suited to large word-length addition and reduces wiring and area without affecting the delay.
The proposed adder occupies less area than the existing modulo $2^n - 3$ adder [20].
This paper is organized as follows: Section 2 describes the basics of parallel-prefix addition. In Section 3, the modulo $2^n - 3$ adder is discussed. Section 4 explains the sparse concept for the modulo $2^n - 3$ adder. Finally, unit gate area and unit gate delay are calculated in Section 5.

Basics of Parallel Prefix Adder
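Parallel-prefix adders (PPAs) perform parallel addition, which plays a key role in microprocessors, DSP, mobile devices, and other high-speed applications. The parallel-prefix structure reduces logic complexity and delay, thereby improving performance in terms of area and power dissipation. Let the two $n$-bit inputs be $A = A_{n-1} A_{n-2} \ldots A_0$ and $B = B_{n-1} B_{n-2} \ldots B_0$. Addition in a PPA is computed in three stages. The first stage computes the carry generate ($G_i$), propagate ($P_i$), and half-sum ($H_i$) bits:
$$G_i = A_i \cdot B_i, \qquad P_i = A_i + B_i, \qquad H_i = A_i \oplus B_i$$
where the symbols $\cdot$, $+$, and $\oplus$ represent the logical AND, OR, and XOR operations. The second stage is the carry computation unit, which uses two different types of prefix operators [22]. The standard associative prefix operator, written here as $\circ$, combines two generate/propagate pairs:
$$(G_i, P_i) \circ (G_j, P_j) = (G_i + P_i \cdot G_j,\; P_i \cdot P_j)$$
The group terms used to build the carry network [23] are
$$(G_{i:j}, P_{i:j}) = (G_i, P_i) \circ (G_{i-1}, P_{i-1}) \circ \cdots \circ (G_j, P_j), \qquad i \ge j,$$
and the carry out of bit position $i$ is $C_i = G_{i:0}$. The third stage is an XOR of the half-sum bits and the previous carry to obtain the final sum, $S_i = H_i \oplus C_{i-1}$. For the design of large word-length adders, the sparse concept is used [24]. Instead of generating a carry for every bit, a sparse PPA generates the carry for every $k$-th bit; it is therefore called a sparse-$k$ parallel-prefix adder (Figure 2).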
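The three stages can be sketched in Python at the bit level. The sketch below assumes a Kogge-Stone arrangement of the prefix network (one of the two structures named for Figure 1) and the standard prefix operator; `prefix_op` and `kogge_stone_add` are hypothetical names, and the code models behaviour only, not gate-level timing.

```python
def prefix_op(left, right):
    """Standard prefix operator: (G_i, P_i) o (G_j, P_j)."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)

def kogge_stone_add(a, b, n):
    """Add two n-bit integers; return (sum modulo 2**n, carry out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    # Stage 1: generate, propagate, and half-sum bits.
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x | y for x, y in zip(a_bits, b_bits)]
    h = [x ^ y for x, y in zip(a_bits, b_bits)]
    # Stage 2: prefix network with ceil(log2(n)) levels.
    gp = list(zip(g, p))
    dist = 1
    while dist < n:
        gp = [prefix_op(gp[i], gp[i - dist]) if i >= dist else gp[i]
              for i in range(n)]
        dist *= 2
    carries = [pair[0] for pair in gp]    # carries[i] = G_{i:0}
    # Stage 3: sum bit = half sum XOR previous carry.
    s_bits = [h[0]] + [h[i] ^ carries[i - 1] for i in range(1, n)]
    total = sum(bit << i for i, bit in enumerate(s_bits))
    return total, carries[n - 1]

print(kogge_stone_add(200, 100, 8))  # (44, 1)
```

A sparse-$k$ variant would keep only every $k$-th entry of `carries` and recover the remaining sum bits locally.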

Modulo $2^n - 3$ Adder
In Figure 4, $u_i$ is the half-adder sum output of $A_i$ and $B_i$, and $G'_{n-1:0}$ represents the end-around carry for the next stage. An alternative approach to the modulo adder using PPA structures is presented in [20]. It gives the $i$-th carry expression for the modulo $2^n - 3$ adder (Equation (8)) together with a separate sum expression for bit position one; from these expressions, the carries can be calculated from the propagate and generate bits. Figure 5(a) shows the modulo $2^8 - 3$ regular parallel prefix (RPP) adder structure [20].
The RPP structure differs from the modulo $2^8 - 1$ adder in having a half carry-save stage for preprocessing, one bit in position zero before enforcing the EAC, and two carries entering position one after EAC enforcement. Figure 5(b) shows the modulo $2^8 - 3$ total parallel prefix (TPP) adder structure [20]. TPP is the same as RPP; the only difference is that one carry has one more gate delay than the other carries. The sum $S_1$ is implemented with a multiplexer; for the remaining bits, the sum is calculated using an exclusive-OR gate.
The delay of the RPP adder structure is greater than that of the TPP adder structure because of the extra prefix level. The TPP structure suffers from routing complexity and excessive area as the bit length of the adder increases.
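The end-around carry behaviour discussed in this section can be checked with a small behavioural model. The sketch below is not the RPP or TPP gate-level structure; it only relies on the identity $2^n \equiv 3 \pmod{2^n - 3}$, so a carry out of the most significant bit re-enters with weight 3, i.e., at positions zero and one. The function name `add_mod_2n_minus_3` is hypothetical.

```python
def add_mod_2n_minus_3(a, b, n):
    """Behavioural model: n-bit result congruent to a + b modulo 2**n - 3.

    The result is not fully reduced: residues 0, 1, 2 may appear as
    2**n - 3, 2**n - 2, 2**n - 1 (the double representation).
    """
    mask = (1 << n) - 1
    s = a + b
    while s >> n:                          # carry out of bit n-1
        # 2**n is congruent to 3 modulo 2**n - 3, so the carry re-enters as 0b11.
        s = (s & mask) + 3 * (s >> n)
    return s

print(add_mod_2n_minus_3(250, 10, 8))   # 7, since 260 mod 253 = 7
print(add_mod_2n_minus_3(200, 53, 8))   # 253, the second representation of 0
```

The loop can run twice for a few operand pairs, which hints at why the carry entering the low positions needs special treatment in hardware.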

Sparse Modulo $2^n - 3$ Adder
In this section, we propose a modulo $2^n - 3$ adder based on the integer sparse-4 PPA, in which the same carry select adder is used to implement the sparse modulo $2^n - 3$ adder. In sparse-4, a carry is generated for every 4th bit. Because a carry select adder is used for the modulo operation, we must show that the remaining carries can be derived from the available ones.
Consider the general carry expression given in Equation (8) and let $n = 32$. The carry $C_{14}$ can be derived from the available carry $C_{12}$: the expression for $C_{14}$ is rewritten, expanded, and simplified by rearranging the redundant terms using the formula given in [23], which finally yields $C_{14}$ in terms of $C_{12}$. The resulting relation is quite similar to that of the integer adder.
Therefore, we can directly use the carry select block of Figure 2(b).
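The integer-adder form of this relation, which the text says the modulo case closely resembles, can be checked numerically. The sketch below covers plain integer addition only and does not reproduce the modulo-specific terms of Equation (8). It assumes that $C_i$ denotes the carry out of bit $i$, and the helper names `group_gp` and `check_carry_relation` are hypothetical.

```python
import random

def group_gp(g, p, hi, lo):
    """Return (G_{hi:lo}, P_{hi:lo}) by serially applying the prefix operator."""
    G, P = g[hi], p[hi]
    for k in range(hi - 1, lo - 1, -1):
        G, P = G | (P & g[k]), P & p[k]
    return G, P

def check_carry_relation(a, b, n=32):
    """Check C_14 = G_{14:13} + P_{14:13} . C_12 for integer addition."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]
    c12 = group_gp(g, p, 12, 0)[0]        # carry assumed available from the sparse tree
    c14_direct = group_gp(g, p, 14, 0)[0]  # carry computed by a full prefix tree
    G, P = group_gp(g, p, 14, 13)
    return c14_direct == (G | (P & c12))

random.seed(0)
pairs = [(random.getrandbits(32), random.getrandbits(32)) for _ in range(1000)]
print(all(check_carry_relation(a, b) for a, b in pairs))  # True
```

The identity holds because the prefix operator is associative, so a carry that the sparse tree skips can always be rebuilt from the nearest available one.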

Performance Analysis and Comparison
The theoretical area and delay analysis is expressed in terms of the area (∆a) and delay (∆g) of basic 2-input gates. In the unit gate model, basic 2-input AND, OR, NAND, and NOR gates are each counted as a single unit gate (∆a, ∆g), whereas exclusive-OR and exclusive-NOR gates are counted as two unit gates (2∆a, 2∆g) [15]. The area and delay of inverters and buffers are not taken into account in the unit gate model.
The delay of the proposed sparse modulo $2^n - 3$ adder is the same as that of the adder in [20]. Table 1 shows the estimated gate delay and gate area of the proposed adder as a function of the bit length $n$.
Table 2 shows the unit gate delays and unit gate areas of the proposed adder for different values of $n$, along with the percentage reduction in area in comparison with [20].
The percentage reduction in area increases as the bit length increases. We have also described the proposed work in HDL code written on Xilinx 14.7 and verified its correctness using simulation tests. The lookup table (LUT) count, which measures the area utilization of the proposed adder, is given in Table 3 for $n = 8$.
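As a small worked example of the unit gate model, the sketch below tallies the area and delay of a single full adder. The gate list assumes the common realization with two XOR, two AND, and one OR gate; this netlist is an illustrative assumption and is not taken from Table 1.

```python
# Unit-gate costs: simple 2-input gates count 1, XOR/XNOR count 2.
UNIT_AREA = {"AND": 1, "OR": 1, "NAND": 1, "NOR": 1, "XOR": 2, "XNOR": 2}
UNIT_DELAY = dict(UNIT_AREA)  # the model uses the same weights for delay

def area(gates):
    """Total area in units of delta-a for a list of gate types."""
    return sum(UNIT_AREA[g] for g in gates)

def delay(path):
    """Delay in units of delta-g along one path, given as gate types in order."""
    return sum(UNIT_DELAY[g] for g in path)

# Assumed full-adder netlist: sum = (a ^ b) ^ c, carry = (a & b) | ((a ^ b) & c)
full_adder_gates = ["XOR", "XOR", "AND", "AND", "OR"]

print(area(full_adder_gates))        # 7
print(delay(["XOR", "XOR"]))         # 4, sum path
print(delay(["XOR", "AND", "OR"]))   # 4, carry path
```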

Conclusion
In this paper, we have proposed an area-efficient sparse modulo $2^n - 3$ adder, which plays an important role in a variety of computer applications. The area efficiency of the proposed adder is evaluated using the unit gate model. For $n$ = 8, 16, 32, and 64, the area reduction is 2.3%, 13.2%, 21%, and 27.54%, respectively, with the same delay. Simulation results show that the area of the proposed adder is reduced by 34% in terms of LUT count for $n = 8$. The presented modulo adder therefore requires less area for additions on larger word-length inputs and also reduces the routing complexity in comparison with the previously reported adder.
The moduli of an RNS are relatively prime to each other, and an integer $A$ is converted into RNS as its set of residues. Addition in a PPA is computed in three steps; the first stage computes the carry generate ($G_i$), propagate ($P_i$), and half-sum ($H_i$) bits.

Figure 1(a) and Figure 1(b) show the 8-bit Ladner-Fischer and Kogge-Stone PPA structures, respectively. Figure 1(c) shows the basic cells used in the construction of a PPA.
Figure 2(a) shows a simple 16-bit sparse-4 PPA.

Figure 2(b) shows the carry select adder block used in the sparse-4 PPA. It computes two sets of sums, one assuming a carry of one and one assuming a carry of zero, and selects the resulting sum based on the carry that comes from the prefix network. Using the carry select adder in the sparse PPA reduces the routing problem and decreases the area.
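The carry select block can be sketched behaviourally as follows. The model covers the integer sparse-4 adder only; the block carry-in, which in hardware comes from the sparse prefix network, is computed arithmetically here for brevity. The names `carry_select_block` and `sparse4_add` are hypothetical.

```python
def carry_select_block(a4, b4, carry_in):
    """4-bit block: precompute both candidate sums, then select by carry_in."""
    sum0 = (a4 + b4) & 0xF        # sum assuming carry-in = 0
    sum1 = (a4 + b4 + 1) & 0xF    # sum assuming carry-in = 1
    return sum1 if carry_in else sum0

def sparse4_add(a, b, n=16):
    """Integer addition modulo 2**n built from 4-bit carry select blocks."""
    result = 0
    for blk in range(0, n, 4):
        low_mask = (1 << blk) - 1
        # Carry into this block; stands in for the every-4th-bit carry
        # that the sparse prefix network would deliver.
        cin = ((a & low_mask) + (b & low_mask)) >> blk
        a4, b4 = (a >> blk) & 0xF, (b >> blk) & 0xF
        result |= carry_select_block(a4, b4, cin) << blk
    return result

print(hex(sparse4_add(0x0FFF, 0x0001)))  # 0x1000
```

Because both candidate sums are ready before the block carry arrives, only a final selection remains on the critical path after the prefix network.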

Figure 3 shows that the carry generated in position zero enters the next bit, position one, which already receives the EAC. In the worst case, the carry propagates from position two to the next position. This problem can be eliminated by using a carry-save preprocessing stage [20], as shown in Figure 4.

The carry select block of Figure 2(b) of the sparse integer adder can thus be used to perform the modulo operation. The main problem, however, is that the carry expression given in Equation (8) is defined only for $2 \le i \le n - 1$. The carry equation for $C_1$ is quite different, so a modified carry select block is needed for the first four bits of the modulo $2^n - 3$ adder; it is based on the carry $C_1$ given in Equation (9).

Figure 6 is similar to the carry select block of Figure 2(b) except at sum position $S_1$. The block of Figure 6 is used only for the first four bits of the sparse-4 modulo $2^n - 3$ adder. The remaining bits use the carry select block of Figure 2(b) to implement the sparse modulo $2^n - 3$ adder. This sparse-4 modulo $2^n - 3$ adder has a double representation: the residues {0, 1, 2} are also represented by $2^n - 3$, $2^n - 2$, and $2^n - 1$, respectively, so there are six pairs of combinations, two of which tend to produce a wrong addition result. The solution to this problem is explained in [20] and [21]; these explanations still hold for the proposed adder.

Figure 7 shows the proposed 32-bit sparse modulo $2^n - 3$ adder, which occupies less area than the previously reported modulo adder.
Figure 6. Carry select block for the modulo $2^n - 3$ adder, used only for the first 4 bits.

Table 1. Unit gate area and delay estimations of the adders.

Table 2. Delay and area for different bit lengths.