Implementation of N-Bit Binary Multiplication Using N − 1 Bit Multiplication Based on Nikhilam Sutra and Karatsuba Principles Using Complement Method

This paper introduces a new hybrid Vedic algorithm to increase the speed of the multiplier. The work combines the principles of the Nikhilam sutra and the Karatsuba algorithm. Vedic Mathematics is a mathematical system for solving complex computations in an easier manner, and it contains specific sutras for multiplication. The Nikhilam sutra is one of them, but it has some limitations. To overcome these limitations, the sutra is combined with the Karatsuba algorithm. High speed applications require compact, high speed devices, and multipliers normally consume considerable power for their computation. In this paper, a new algorithm for the multiplication of binary numbers is proposed based on Vedic Mathematics. The novel portion of the algorithm is the calculation of the remainder using the complement method. The size of the remainder is always N − 1 bits for any combination of inputs. The multiplier structure is designed based on the Karatsuba algorithm, so an N × N bit multiplication is done using an (N − 1) bit multiplication. Numerical strength reduction is achieved through the Karatsuba algorithm. The results show that the reduction in hardware leads to a reduction in delay.


Introduction
Researchers are trying to design devices which require minimum space and power while operating at high speed. Multipliers are an important unit in many high speed applications, but they need many components and consume considerable power. Among the conventional multipliers, Baugh-Wooley consumes less power but its bit length is restricted to 16 bits. For high speed devices, Wallace with Booth encoding produces good results, but Wallace occupies more space because it uses more components [1]. To overcome these issues, multipliers based on Vedic Mathematics are designed. In [1], the Urdhva Tiryakbhyam Vedic multiplier is designed with adiabatic logic. The logic cells used in the half adder, full adder and AND gate are replaced with 2P-2N logic. The focus of that work is power reduction.
In [2], the vertical and crosswise algorithm is implemented using various compressor adders such as 5-3, 10-4, 20-4 and 20-5 adders. The percentage improvement of these compressor adders over the traditional adders is small. The compressor adders are also used in the conventional multipliers to validate the proposed work, and the results show a significant improvement. The Vedic multiplier using McCMOS (Multi Channel CMOS) with 65 nm and 45 nm technology is proposed in [3] [4]. The power delay product is reduced by 48% to 70% using this technology.
The MAS (Multiplier Adder Subtractor) unit is incorporated in the design of a conventional ALU using Vedic Mathematics [5]. The conventional ALU consists of an arithmetic unit, a logic unit and a shifter module. The MAS unit comprises all the arithmetic modules necessary to build the arithmetic unit. In [6], a 16 × 16 bit multiplier block is built, its functionality is verified on an XC3STQ144 Xilinx kit, and the delay is compared with conventional multipliers. The final GDSII format is derived using the Cadence tool.
The problem solving techniques of Vedic Mathematics not only reduce computational time but also support effective learning. In [7], the arithmetic operations of addition, subtraction, multiplication and division are performed using the Nikhilam sutra. The same work explains vinculum operations and the method for finding the 10's complement when the number contains n zeros on the right side. In [8], Ekanyunena Purvena is explained and its architecture for binary numbers is given. This is a sub-sutra of Nikhilam. The important condition is that one multiplicand should consist of a string of 9s (i.e. 9 or 99 or 9999…). The multiplication is done through subtraction here. In [9], various conventional multipliers are compared with the Vedic multiplier in terms of area, speed and power.
In [10], the Dadda multiplier is designed with pipelining. The structure of the D flip-flop used for pipelining is modified, and an sp-D3Lsum-based HA is used for the tree reduction of the Dadda algorithm. The design is implemented using the 1P-9M Low-K UMC 90 nm CMOS process technology in Cadence Virtuoso. DRC and LVS checks for the proposed design are done with Cadence Assura. In [11], linear convolution and circular convolution are implemented using the Vedic multiplier. In [12], the Vedic multiplier is designed using the Nikhilam sutra and the Karatsuba algorithm; there, the remainder is calculated through a subtraction operation. The multiplier structure is modified in [13], where the remainder is calculated by computing the 2's complement of the input numbers. However, in [13] [14], the inputs are swapped if the multiplier is greater than the multiplicand. In this work, the multiplication is done without swapping, through the calculation of the remainder using the 2's complement method.

Proposed Architectures
The architecture is designed based on the combination of the Karatsuba algorithm and the Nikhilam sutra. In the conventional Karatsuba algorithm, the remainder is determined by taking the least significant half of the number without alteration. In the proposed work, the remainder is computed using the Nikhilam sutra. The detailed algorithm is given in [13] [14]. In this section, the proposed algorithms are presented and their architectures for three different modes are given. The results are proven theoretically in this section. The three modes are discussed in detail below.
Step 5 (Mode I): Adding all the components to derive the final product
$P = (A \times 2^{N-1}) + (B_r \times 2^{N-1}) + (A_r \times B_r)$ (1)
In [13], the architecture of the algorithm is derived based on the remainder. The remainder is derived by taking the difference between the highest weight and the number A or B.
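The product expressions for the three modes can be checked by writing each input relative to the base $2^{N-1}$. The derivation below is a sketch added for illustration rather than a reproduction of the original equations; it assumes $A_r$ and $B_r$ denote the non-negative remainder magnitudes, with the sign carried by the mode.

```latex
\begin{align*}
\text{Mode I:}\quad   & A = 2^{N-1} + A_r,\quad B = 2^{N-1} + B_r \\
                      & AB = 2^{N-1}\,(2^{N-1} + A_r + B_r) + A_r B_r
                           = 2^{N-1}(A + B_r) + A_r B_r \\[4pt]
\text{Mode II:}\quad  & A = 2^{N-1} - A_r,\quad B = 2^{N-1} - B_r \\
                      & AB = 2^{N-1}\,(2^{N-1} - A_r - B_r) + A_r B_r
                           = 2^{N-1}(A - B_r) + A_r B_r \\[4pt]
\text{Mode III:}\quad & A = 2^{N-1} \pm A_r,\quad B = 2^{N-1} \mp B_r \\
                      & AB = 2^{N-1}(A \mp B_r) - A_r B_r
\end{align*}
```

In every case the only true multiplication is $A_r B_r$, whose operands fit in N − 1 bits; the remaining terms are shifts and additions or subtractions. For example, with N = 4, A = 13 and B = 11 (Mode I), the remainders are 5 and 3, and $8 \times (13 + 3) + 15 = 143$.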

Algorithm for Mode II
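Input: A, B Output: P
Step 1: Considering A and B are having negative remainders (i.e. both are less than $2^{N-1}$). The remainders are computed by taking the 2's complement of A and B over N − 1 bits (consider $A_r$ and $B_r$).
Step 2: Multiplying the remainders $A_r$ and $B_r$, i.e. $m_1 = r_1 \times r_2$ (with $r_1 = A_r$ and $r_2 = B_r$).
Step 3: Shifting the input A left by N − 1 bits ($m_2 = A \times 2^{N-1}$).
Step 4: Shifting the remainder of B, $B_r$, left by N − 1 bits ($m_3 = B_r \times 2^{N-1}$).
Step 5: Adding all the components to derive the final product
$P = m_2 - m_3 + m_1 = (A \times 2^{N-1}) - (B_r \times 2^{N-1}) + (A_r \times B_r)$ (2)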
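The Mode II steps can be traced in software as a minimal sketch, assuming Python integers stand in for the N-bit hardware signals. The function name `mode2_multiply` and the variable names `m1`, `m2`, `m3` (remainder product, shifted A, shifted remainder of B) are illustrative choices, and the sketch assumes non-zero inputs because the (N − 1)-bit 2's complement of zero wraps back to zero.

```python
def mode2_multiply(a: int, b: int, n: int) -> int:
    """Sketch of Mode II: both n-bit inputs are below 2**(n-1)."""
    half = 1 << (n - 1)   # base 2^(N-1)
    mask = half - 1       # selects the low N-1 bits
    if not (0 < a < half and 0 < b < half):
        raise ValueError("Mode II sketch expects 0 < a, b < 2**(n-1)")

    # Step 1: negative remainders via the (N-1)-bit 2's complement.
    a_r = (~a + 1) & mask
    b_r = (~b + 1) & mask

    # Step 2: the only real multiplication, on (N-1)-bit operands.
    m1 = a_r * b_r
    # Step 3: shift the input A left by N-1 bits.
    m2 = a << (n - 1)
    # Step 4: shift the remainder of B left by N-1 bits.
    m3 = b_r << (n - 1)

    # Step 5: the shifted remainder is subtracted, the product is added.
    return m2 - m3 + m1
```

For N = 4, A = 5 and B = 6, the remainders are 3 and 2, so the sketch evaluates 40 − 16 + 6 = 30.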

Algorithm for Mode III
Input: A, B Output: P
Step 1: Considering A and B are having mixed remainders (i.e. one has a positive remainder and the other has a negative remainder). The positive remainder is derived as per Mode I and the negative remainder is calculated as per Mode II (consider $A_r$ and $B_r$).
Step 2: Multiplying the remainders $A_r$ and $B_r$, i.e. $m_1 = r_1 \times r_2$ (with $r_1 = A_r$ and $r_2 = B_r$). The product term is negative due to the mixed remainders.
Step 3: Shifting the input A left by N − 1 bits ($m_2 = A \times 2^{N-1}$).
Step 4: Shifting the remainder of B, $B_r$, left by N − 1 bits ($m_3 = B_r \times 2^{N-1}$). The sign of $m_3$ depends on the type of remainder $B_r$.
Step 5: Adding all the components to derive the product $P = m_2 - m_1 \pm m_3$.
The architecture for mixed remainders is shown in Figure 3. The product of the numbers is calculated as follows:
$P = (A \times 2^{N-1}) \pm (B_r \times 2^{N-1}) - (A_r \times B_r)$
The input multiplexer is used here to derive the remainder. Based on the MSB values of A and B, the remainder is calculated. For a negative remainder, the complement of the number is taken. For a positive remainder, the number is taken directly considering N − 1 bits. The multiplier unit is used to multiply the remainder terms $A_r$ and $B_r$. $m_2$ is derived by shifting the value of A by N − 1 bits (i.e. $A \times 2^{N-1}$). The adder/subtractor is designed to select the operation based on the type of remainder ($r_2$). If $B_{N-1} = 0$, the remainder is negative and the adder/subtractor performs the subtraction operation. If $B_{N-1} = 1$, it performs the addition operation.
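The mixed-remainder case can be sketched in the same way, assuming Python integers in place of the hardware signals. The name `mode3_multiply` and the helper `remainder` are hypothetical, the labels `m1`, `m2`, `m3` follow the step order (remainder product, shifted A, shifted remainder of B), and the sketch takes an MSB of 1 to mean a positive remainder, as in the Mode I definition.

```python
def mode3_multiply(a: int, b: int, n: int) -> int:
    """Sketch of Mode III: exactly one n-bit input has its MSB set."""
    half = 1 << (n - 1)   # base 2^(N-1)
    mask = half - 1       # selects the low N-1 bits
    if not (0 < a < 2 * half and 0 < b < 2 * half):
        raise ValueError("sketch expects non-zero n-bit inputs")
    a_msb = (a >> (n - 1)) & 1
    b_msb = (b >> (n - 1)) & 1
    if a_msb == b_msb:
        raise ValueError("Mode III needs one positive and one negative remainder")

    def remainder(x: int, msb: int) -> int:
        # Input multiplexer: drop the MSB for a positive remainder,
        # take the (N-1)-bit 2's complement for a negative one.
        return x & mask if msb else (~x + 1) & mask

    a_r = remainder(a, a_msb)
    b_r = remainder(b, b_msb)

    m1 = a_r * b_r          # Step 2: (N-1)-bit multiplication
    m2 = a << (n - 1)       # Step 3: shifted input A
    m3 = b_r << (n - 1)     # Step 4: shifted remainder of B

    # Step 5: m3 is added when B has a positive remainder, subtracted
    # otherwise; the remainder product is always subtracted in this mode.
    return (m2 + m3 - m1) if b_msb else (m2 - m3 - m1)
```

With N = 4, both orderings of 13 and 6 give 78: for A = 13, B = 6 the sketch evaluates 104 − 16 − 10, and for A = 6, B = 13 it evaluates 48 + 40 − 10.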
The combined structure is shown in Figure 4. Unlike [13] [14], there is no need to swap the inputs when one number is larger than the other. Using the combined structure, the product can be calculated for numbers in any mode. This structure is similar to the structure shown in Figure 3. A simple Ex-OR gate is used as the control signal to select the addition/subtraction operation.
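A behavioural sketch of the combined structure is given below, assuming Python integers in place of the hardware signals. The function name `combined_multiply` is hypothetical, and the control assignment is an assumption inferred from the three mode equations: the Ex-OR of the two MSBs selects whether the remainder product is subtracted, and the MSB of B selects whether the shifted remainder of B is added.

```python
def combined_multiply(a: int, b: int, n: int) -> int:
    """Sketch of the combined structure for any non-zero n-bit inputs."""
    half = 1 << (n - 1)   # base 2^(N-1)
    mask = half - 1       # selects the low N-1 bits
    if not (0 < a < 2 * half and 0 < b < 2 * half):
        raise ValueError("sketch expects non-zero n-bit inputs")

    a_msb = (a >> (n - 1)) & 1
    b_msb = (b >> (n - 1)) & 1

    # Input multiplexers: MSB = 1 keeps the low bits (positive remainder),
    # MSB = 0 takes the (N-1)-bit 2's complement (negative remainder).
    a_r = a & mask if a_msb else (~a + 1) & mask
    b_r = b & mask if b_msb else (~b + 1) & mask

    m1 = a_r * b_r          # (N-1)-bit multiplication
    m2 = a << (n - 1)       # shifted input A
    m3 = b_r << (n - 1)     # shifted remainder of B

    # Adder/subtractor controlled by the MSB of B.
    partial = m2 + m3 if b_msb else m2 - m3
    # Ex-OR control: mixed remainders make the product term negative.
    return partial - m1 if (a_msb ^ b_msb) else partial + m1
```

No input swapping is needed in this sketch: `combined_multiply(6, 13, 4)` and `combined_multiply(13, 6, 4)` take different add/subtract paths but both return 78.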

Results
Various conventional multipliers are considered and compared with the proposed multiplier. The computational delay for the various multipliers is listed in Table 1. The simulation result is shown in Figure 5. In the delay comparison, the Vedic multiplier has the minimum delay among all methods, and hence the combination of the proposed algorithm with the conventional Vedic multiplier is used for high speed applications. In the area comparison, Wallace occupies more space because it requires a larger number of components. The power comparison is shown in Table 2. From both results, the proposed Vedic multiplier is efficient in area and speed. Therefore, instead of using other methods within the proposed algorithm, the proposed algorithm is called in a successive manner. The result for successive approximation of the proposed algorithm is shown in Table 3.
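One possible reading of calling the proposed algorithm in a successive manner is that the (N − 1)-bit remainder product is itself computed by the same scheme with N reduced by one, down to a 1-bit AND. The sketch below illustrates that reading only; the function name `successive_multiply`, the zero short-circuit, and the 1-bit base case are assumptions and are not taken from the source.

```python
def successive_multiply(a: int, b: int, n: int) -> int:
    """Sketch: apply the N-bit scheme recursively on (N-1)-bit remainders."""
    if a == 0 or b == 0:
        return 0            # assumed short-circuit; avoids the zero-complement wrap
    if n == 1:
        return a & b        # assumed base case: 1-bit multiply is an AND gate

    half = 1 << (n - 1)     # base 2^(N-1)
    mask = half - 1         # selects the low N-1 bits
    a_msb = (a >> (n - 1)) & 1
    b_msb = (b >> (n - 1)) & 1

    # Remainders always fit in N-1 bits, so they are valid inputs
    # for the next, narrower level.
    a_r = a & mask if a_msb else (~a + 1) & mask
    b_r = b & mask if b_msb else (~b + 1) & mask

    m1 = successive_multiply(a_r, b_r, n - 1)   # recursive remainder product
    m2 = a << (n - 1)                           # shifted input A
    m3 = b_r << (n - 1)                         # shifted remainder of B

    partial = m2 + m3 if b_msb else m2 - m3
    return partial - m1 if (a_msb ^ b_msb) else partial + m1
```

Under these assumptions each level contributes only shifts, one complement per negative remainder, and an add/subtract stage, so the recursion depth is at most N − 1.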

Conclusion and Future Work
In this paper, a new multiplication algorithm using the Nikhilam sutra is presented. The binary Vedic multiplier of the previous papers is modified here. In the calculation of the remainder, a single bit is saved, and hence the usage of components is reduced. Therefore, the interconnection delay and computation time are reduced. The speed and the area are optimized using this modified Vedic multiplier. The performance of the modified multiplier varies with the type of multiplier used. Finally, successive approximation of the proposed algorithm is also done here. Compared with conventional methods, the combination of the proposed multiplier with the Wallace multiplier gives reduced stage delay, but this combination consumes more power. Normally, the Vedic multiplier is used to perform the operation with minimum delay. Therefore, in combination with the conventional Vedic multiplier, the proposed method gives a better result. For high speed applications, the proposed method with the Wallace multiplier can be used. For low power and low area applications, the proposed multiplier with the Vedic (Urdhva) or Braun multiplier can be used. From the results it is clear that the proposed algorithm is best suited for high speed

Mode I: Positive Remainders
Mode II: Negative Remainders
Mode III: Mixed Remainders

Algorithm for Mode I
Input: A, B Output: P
Step 1: Considering A and B are N bit numbers having positive remainders (i.e. both are greater than or equal to $2^{N-1}$). The positive remainders are derived by considering the numbers without the MSB, having N − 1 bits (consider $A_r$ and $B_r$).
Step 2: Multiplying the remainders $A_r$ and $B_r$, i.e. $m_1 = r_1 \times r_2$ (with $r_1 = A_r$ and $r_2 = B_r$).
Step 3: Shifting the input A left by N − 1 bits ($m_2 = A \times 2^{N-1}$).
Step 4: Shifting the remainder of B, $B_r$, left by N − 1 bits ($m_3 = B_r \times 2^{N-1}$).
The proposed architecture for positive remainders is shown in Figure 1. As per the algorithm, the product is derived as follows:
$P = (A \times 2^{N-1}) + (B_r \times 2^{N-1}) + (A_r \times B_r)$
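The Mode I steps can be traced with a minimal sketch, assuming Python integers stand in for the N-bit hardware signals. The function name `mode1_multiply` and the variable names `m1`, `m2`, `m3` (remainder product, shifted A, shifted remainder of B) are illustrative choices.

```python
def mode1_multiply(a: int, b: int, n: int) -> int:
    """Sketch of Mode I: both n-bit inputs have their MSB set."""
    half = 1 << (n - 1)   # base 2^(N-1)
    mask = half - 1       # selects the low N-1 bits
    if not (half <= a < 2 * half and half <= b < 2 * half):
        raise ValueError("Mode I sketch expects 2**(n-1) <= a, b < 2**n")

    # Step 1: positive remainders are the inputs without the MSB.
    a_r = a & mask
    b_r = b & mask

    # Step 2: the only real multiplication, on (N-1)-bit operands.
    m1 = a_r * b_r
    # Step 3: shift the input A left by N-1 bits.
    m2 = a << (n - 1)
    # Step 4: shift the remainder of B left by N-1 bits.
    m3 = b_r << (n - 1)

    # Step 5: all three components are added in this mode.
    return m2 + m3 + m1
```

For N = 4, A = 13 and B = 11, the remainders are 5 and 3, so the sketch evaluates 104 + 24 + 15 = 143.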

The architecture for negative remainders is shown in Figure 2. The product is determined as follows:
$P = (A \times 2^{N-1}) - (B_r \times 2^{N-1}) + (A_r \times B_r)$

Figure 5. Simulation waveform for the multiplier with bit size 32.

Table 1. Comparison of delay with various methods.

Table 2. Comparison of power with various methods.

Table 3. Comparison of delay using successive approximation method.