Efficient Realization of Vinculum Vedic BCD Multipliers for High Speed Applications

Decimal multipliers play an important role in day-to-day commercial, financial and tax applications. In every processor, the multiplier is a basic building block that determines the performance of the processor, and research on high-performance, low-latency BCD multiplier architectures is ongoing. This paper proposes a new approach to BCD multiplication using the vinculum number system. The key feature of the proposed architecture is an entirely new one-digit ROM-based BCD multiplier that takes vinculum numbers as operands. Using this one-digit BCD multiplier, an N-digit BCD multiplier is built with the Vedic vertical cross wire method (Urdhav Triyagbhyam). We also use our previously proposed multi-operand VBCD adder (Vinculum BCD adder) [26] to add the partial products. We show that this approach is a promising alternative to conventional BCD multiplication and to other decimal multiplication methods that use alternative decimal representations such as 5211, 4221 and XS-3.


Introduction
The design of hardware units for decimal arithmetic is of growing interest among researchers, who aim for better latency and throughput in the complex, accurate and fast computation required by business and commercial applications. The binary number system can be used for decimal arithmetic operations, but it requires conversions at both ends. These conversions take a significant amount of processing time, which increases delay. Both binary and decimal number systems support integer and fractional parts, and a system that handles fractional numbers may lose accuracy, which in turn has a strongly negative impact on commercial, financial and tax applications. To solve these problems, interest in the hardware design of decimal arithmetic is growing. This has led to the incorporation of specifications for decimal arithmetic in the IEEE 754-2008 standard for floating-point arithmetic [1]. With high performance and low resource usage, such hardware is expected to facilitate the implementation of business applications [2] [3].
In decimal floating-point (DFP) formats, multipliers play an important role in the multiplication of mantissas. Among all arithmetic operations, multiplication is a complex one. To speed it up, different methods have been described in the literature: Mixed Binary and BCD Approach [4], Multiplication via Carry Save Addition [5], Efficient Partial Product Generation [6], Radix-10 multipliers [7] [8] [9], Parallel Decimal Multipliers [10] [11] [12] [13], Compressor Trees for partial product reduction [14], Multi-operand Decimal Adders [15] [16], Redundant BCD and Signed Digit Adders [17] [18], High performance Vedic decimal multiplier using binary to BCD converter [19], and Vinculum BCD Multiplier (VBCD multiplier) [20] [21]. In our earlier approach [20], we proposed a vinculum BCD multiplier based on the ten's complement method. There, too, the vertical cross wire method was used to generate partial products, but each generated partial product was checked for its sign; if it was negative, it was passed through a ten's complement circuit before being given to the adder circuit.
In this paper, a Vedic BCD multiplier based on the vinculum number system is proposed. It uses the same vertical and cross wire method as used in [21] for the generation of partial products. However, the number system used is different. The proposed system uses the digit set {0, 1, 2, 3, 4, 5, 4̄, 3̄, 2̄, 1̄}, where a bar marks a negative digit, and negative digits are represented in two's complement form [24]. The advantage of this code is that the same binary architectures can be used in the design. VBCD adders are used to add the partial products. The correction logic is included in the adder block itself, so the output of the VBCD adder is a valid vinculum output. This design is referred to as signed because it uses both positive and negative digits.
The proposed VBCD multiplier differs from [20] in the following aspects. 1) The vinculum number system is represented with 4 bits per digit. 2) Parallel VBCD adders and multi-operand VBCD adders are used to add the partial products. Our simulation results indicate that this approach is viable and efficient. The synthesis results show an improvement in speed.
The paper is organized as follows. In Section 2, decimal multiplication and the vertical and cross wire method used for the generation of partial products are reviewed. The proposed methods of partial product addition using parallel adders and multi-operand VBCD adders are discussed in Section 3. In Section 4, area and delay are compared with other implementations found in the literature, and the conclusion and future scope are given in Section 5.

Review of BCD Representations and Decimal Multiplication
Multiplication mainly consists of three stages: the generation of partial products, the fast addition (reduction) of partial products and the final carry propagate addition.
Vazquez and Antelo implemented a BCD multiplier using a recoding technique [7]. Signed-Digit (SD) Radix-5 recoding was applied to one of the input operands of the multiplier for the generation of the partial products.
6-input LUTs and fast carry chains in Xilinx FPGAs were used to build the basic blocks and the decimal adders. Another SD-based decimal multiplier approach was proposed in [18], in which the recoding was based on SD Radix-10.
BCD-4221, 5211 and 5421 converters were used for the partial product generation, and BCD-4221-based compressors and adders were utilized in this approach.
Although the BCD-4221-based operations are similar to binary operations, the recoding and the different code conversions still cost delay and resources.
1) SD Radix-10 recoding and decimal multiplier based on BCD-4221/5211: In this method, the authors assume both the multiplier and the multiplicand to be unsigned BCD integers of n digits each. The product P = x × y is in non-redundant BCD format with 2n digits. BCD-4221/5211 codes are used in the recoding [13].
2) Redundant BCD representation and decimal partial product generation and reduction: In this method, the authors use the same BCD-4221 and BCD-5211 encodings for partial product reduction [12] [13]. The partial products pass through a pre-computed correction, a binary CSA tree structure, decimal sum correction blocks and 3:2 compressors to obtain the final corrected BCD-8421 sum with 2d digits [14] [15].
3) Decimal multiplier using hybrid BCD codes: This design uses various BCD codes such as 4221, 5211, ODDS, XS-3 and XS-6 [13], in which the binary partial product reduction trees are of non-fixed size.
The above methods use weighted codes, which require conversions from one code to another.

Vertical and Cross Wire Method (Urdhav Triyagbhyam)
This method is very simple and is suitable for both binary and decimal number systems. It follows the divide and conquer principle, in which a large module is divided into small modules of regular structure [20] [21]. This regularity is an advantage when designing VLSI architectures, and the method is very efficient for high speed applications [22]. Figure 1 shows an example of two digit multiplication using the Urdhav Triyagbhyam method.
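The vertical and crosswise steps can be sketched in software as follows. This is an illustrative Python model, not the hardware design of the paper; the function names `urdhva_columns` and `urdhva_multiply` are hypothetical, and plain decimal digits 0 to 9 are assumed rather than vinculum digits.

```python
def urdhva_columns(a, b, n):
    """Return the column sums of the vertical and crosswise products.

    a and b are non-negative integers of at most n decimal digits.
    Column k collects every digit product a_i * b_j with i + j == k.
    """
    a_digits = [(a // 10**i) % 10 for i in range(n)]  # least significant first
    b_digits = [(b // 10**i) % 10 for i in range(n)]
    columns = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            columns[i + j] += a_digits[i] * b_digits[j]
    return columns


def urdhva_multiply(a, b, n=2):
    """Combine the column sums according to their decimal weights."""
    return sum(c * 10**k for k, c in enumerate(urdhva_columns(a, b, n)))


# Two digit example: 23 x 41
# vertical (units): 3*1, crosswise: 2*1 + 3*4, vertical (tens): 2*4
print(urdhva_columns(23, 41, 2))   # [3, 14, 8]
print(urdhva_multiply(23, 41))     # 943
```

For two digits the three columns are the right vertical product, the crosswise sum and the left vertical product; weighting them by 1, 10 and 100 resolves the carries.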

Proposed Vinculum Number Representation
Vinculum is a Vedic mathematics method of representing numbers. It allows only the digits from 0 to 5, in either positive or negative form. The higher digits from 6 to 9 must be converted into their equivalent vinculum digits (for example, 8 = 10 − 2 is written as 1 2̄). In our method, the two's complement representation is used to denote negative digits. Instead of 6, 7, 8 and 9, the equivalent less complex digits 4̄, 3̄, 2̄ and 1̄ are included in the set of vinculum digits. The resulting vinculum digit set is therefore {0, 1, 2, 3, 4, 5, 4̄, 3̄, 2̄, 1̄}, where a bar marks a negative digit. Each digit is represented in binary using 4 bits [24].
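A minimal sketch of this representation is given below, assuming that a digit from 6 to 9 is replaced by its value minus 10 with a carry of 1 into the next higher digit, and that each digit is stored as a 4 bit two's complement value. The helper names `to_vinculum`, `from_vinculum` and `encode4` are illustrative and do not come from the paper.

```python
def to_vinculum(n):
    """Convert a non-negative integer to vinculum digits in the range -4..5.

    Digits are returned least significant first.
    """
    digits = []
    carry = 0
    while n > 0 or carry:
        d = n % 10 + carry
        n //= 10
        if d > 5:          # 6..10 become -4..0 with a carry into the next digit
            d -= 10
            carry = 1
        else:
            carry = 0
        digits.append(d)
    return digits or [0]


def from_vinculum(digits):
    """Recover the integer value from vinculum digits (least significant first)."""
    return sum(d * 10**i for i, d in enumerate(digits))


def encode4(d):
    """4 bit two's complement encoding of one vinculum digit."""
    return format(d & 0xF, "04b")


digits = to_vinculum(96)
print(digits)                        # [-4, 0, 1], i.e. 100 - 4
print([encode4(d) for d in digits])  # ['1100', '0000', '0001']
print(from_vinculum(digits))         # 96
```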

Generation of Partial Products
A single digit VBCD multiplier is built using a LUT in which all partial products are stored in memory, as shown in Figure 2. The maximum value of a partial product generated by a single digit pair is +25 (5 × 5), whereas in BCD the maximum value generated is 81 (9 × 9). Fewer combinations arise in the proposed number system, which makes the method simple and faster. This single digit multiplier forms the basic block of all higher-order multipliers.
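The LUT idea can be modelled as a table indexed by two vinculum digits. The sketch below is an assumption about how such a table could be organized: each entry stores the product as a (high, low) pair of vinculum digits, which is one plausible output format and is not confirmed by the paper. The names `DIGITS`, `two_vinculum_digits` and `LUT` are hypothetical.

```python
DIGITS = range(-4, 6)  # vinculum digit values -4..5


def two_vinculum_digits(p):
    """Split a product p into (high, low) vinculum digits with p == 10*high + low."""
    low = (p + 4) % 10 - 4      # force the low digit into -4..5
    high = (p - low) // 10      # exact division, since p - low is a multiple of 10
    return high, low


# Precompute every single digit product once, as a ROM or LUT would hold it.
LUT = {(a, b): two_vinculum_digits(a * b) for a in DIGITS for b in DIGITS}

print(LUT[(5, 5)])     # (2, 5)   ->  25
print(LUT[(-4, -4)])   # (2, -4)  ->  16 = 20 - 4
print(LUT[(-4, 5)])    # (-2, 0)  -> -20
```

Since the products lie between −20 and +25, the high digit never leaves the range −2 to 2 in this model.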

Two Digit VBCD Multiplier
Figure 3 shows an example of a 2 digit Vedic multiplier using the vertical cross wire method, which always generates exactly four partial products; these are added to obtain the final product.
Figure 4 shows a pictorial representation of the addition of the partial products, with their intermediate sum and carry bits at various levels, to obtain the final product using VBCD parallel adders [26].

Two Digit VBCD Multiplier Architecture
Figure 5 shows the basic 2 × 2 digit Vedic multiplier. The multiplier and the multiplicand are the two inputs to the system, which produces four partial products.
These four partial products are passed through the parallel VBCD adder [26] for addition. The output of the adder is the final result. The addition process is explained in the next paragraph.

Example for 4 Digit Vedic VBCD Multiplier
The figure shows an example of 4 digit BCD multiplication using the vertical cross wire method with the divide and conquer approach. Each operand is subdivided into 2 digit groups and the multiplication is performed as shown in the figure. It was observed that as the number of digits increases, the number of BCD adders increases, but the number of levels or stages remains the same because only four partial products are ever generated.

Four Digit VBCD Multiplier Architecture
Figure 8 illustrates 4 × 4 multiplication using the 2 × 2 digit Vedic multiplier (divide and conquer approach). With this approach there are only four partial product rows at any time, so the addition becomes simple and faster. Four digit VBCD adders are used to add the partial products, and the output of the adder structure is the final product of 8 digits (32 bits).
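The divide and conquer step can be illustrated with a short behavioural model. It is a sketch under stated assumptions: `mul2x2` is a hypothetical stand-in for the 2 × 2 digit block, operands are ordinary non-negative integers rather than vinculum encoded digits, and the final sum stands in for the VBCD adder stage.

```python
def mul2x2(a, b):
    """Stand-in for the 2 x 2 digit multiplier block (operands 0..99)."""
    assert 0 <= a < 100 and 0 <= b < 100
    return a * b


def mul4x4(a, b):
    """Multiply two 4 digit numbers using four 2 x 2 digit multiplications."""
    a_hi, a_lo = divmod(a, 100)   # split each operand into two 2 digit halves
    b_hi, b_lo = divmod(b, 100)
    # Exactly four partial products, each paired with its decimal shift.
    partials = [
        (mul2x2(a_lo, b_lo), 0),
        (mul2x2(a_hi, b_lo), 2),
        (mul2x2(a_lo, b_hi), 2),
        (mul2x2(a_hi, b_hi), 4),
    ]
    # Weighted sum of the four rows; in hardware this is the adder stage.
    return sum(p * 10**shift for p, shift in partials)


print(mul4x4(1234, 5678))  # 7006652
```

The same split can be applied again to build 8 × 8 and 16 × 16 digit multipliers from the next smaller block, and each level still produces four partial product rows.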

Adder Structures for Adding Partial Products
Efficient adders were designed to add the partial products with high performance and low delay. The literature offers various adder structures, from simple ripple carry adders, CLAs and carry save adders to complex prefix adder structures such as Kogge-Stone and Brent-Kung adders, as well as compressor logic (3:2 to 7:2 compressors), parallel adders and multi-operand adders. Our proposed design uses three different methods to add the partial products.

First Method (VBCD Parallel Adder)
In this method, signed parallel adders are used to add the partial products. The inputs to the adder may be positive or negative numbers, and the adder produces a valid vinculum sum.
The advantage of signed digit adders is that the carry for the i-th stage depends only on the (i − 1)-th stage, as shown in Figure 9, so the carry delay is limited to one stage. This concept is explained in more detail in [25] [26]. Figure 10 shows an N-digit parallel adder in which stage i + 1 depends only on the output of stage i.

Second Method (Multi Operand VBCD Adders)
This method uses multi-operand signed digit adders to add the partial products. The minimum depth of the adder is two, that is, two operands are added (the parallel adder case), and we went up to a maximum of 8 operands, as shown in Figure 11 and Figure 12, with 4, 8, 16 and 32 bit operands. The 4 × 4 digit multiplier uses 4 rows with 7 columns. The maximum depth for the 4 digit multiplier is 5, including the previous carry bit, so we used a 5:2 multi-operand adder with 5 inputs and two outputs, sum and carry. We observed that the delay is reduced compared to the first method. Figure 13 explains the addition of partial products using the multi-operand adder concept for the example shown in Figure 6.

Third Method (Rearrangement of Partial Products)
In this method (refer to Figure 15), instead of using the partial products in their conventional arrangement (as shown in Figure 14), we rearranged the partial products without changing their positional values, so that the hardware is used efficiently.
In this paper we designed, implemented, simulated and synthesized 2 × 2, 4 × 4, 8 × 8 and 16 × 16 digit multipliers and compared them with conventional multipliers. It is observed that the proposed method significantly reduces delay, with very little overhead in other parameters such as area and power, so it can be used in high speed applications.

Results: Simulation Results for Multipliers
Using the proposed adder structures in the PPA block, multipliers from 1 digit to 16 digits are implemented and evaluated in this section. The result for the 2 digit multiplier is compared with a few designs from the technical literature, as shown in Table 1.
The decimal multiplier designs are described at gate level in Verilog HDL.
Table 3 shows the two digit multiplier built with the three different methods explained in the section above. It is observed that the multi-operand VBCD adder (method 2) is faster and that method 3 occupies less area, as shown in the table.

Figure 15. Rearranged partial products for addition.

Conclusion and Future Scope
In this paper, we designed an efficient vinculum Vedic BCD multiplier for fast operation. The performance of the multipliers has been investigated and compared with that of other multipliers. These multipliers can be used in floating point multipliers for the multiplication of mantissas. The proposed multiplier has very low delay and can hence be used for high speed applications.

Figure 1. Two digit multiplication using vertical cross wire method.

Figure 7 shows a pictorial representation of the addition of partial products, with their intermediate sum and carry bits at various levels, to obtain the final product.

Figure 4. Addition of partial products using VBCD parallel adders.

Figure 7. Addition of partial products using parallel adder.

Figure 13. Addition of partial products using Multi Operand Adders.
With the third method (rearranged partial products), the number of LUTs utilized in hardware is much lower than with the first two methods.

Table 1. Comparison with different multipliers of size 2 digit.

Table 2. Synthesis report for various size multipliers.

Table 3. Synthesis report for 2 digit multiplier using different methods.
The Xilinx 14.2i simulator tool was used for these designs. Table 2 shows the proposed Vedic VBCD multipliers for various sizes, and a comparison table is shown for 8 bit multipliers.