Circuits and Systems, 2011, 2, 225236 doi:10.4236/cs.2011.23032 Published Online July 2011 (http://www.SciRP.org/journal/cs) Copyright © 2011 SciRes. CS Fast SignedDigit Multioperand Decimal Adders Jeff Rebacz, Erdal Oruklu, Jafar Saniie Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, USA Email: erdal@ece.iit.edu Received April 14, 2011; revised May 22, 2011; accepted May 29, 2011 Abstract Decimal arithmetic is desirable for high precision requirements of many financial, industrial and scientific applications. Furthermore, hardware support for decimal arithmetic has gained momentum with IEEE 754 2008, which standardized decimal floatingpoint. This paper presents a new architecture for two operand and multioperand signeddigit decimal addition. Signeddigit architectures are advantageous because there are no carrypropagate chains. The proposed signeddigit adder reduces the critical path delay by parallelizing the correction stage inherent to decimal addition. For performance evaluation, we synthesize and compare multiple unsigned and signeddigit multioperand decimal adder architectures on 0.18 μm CMOS VLSI technology. Synthesis results for 2, 4, 8, and 16 operands with 8 decimal digits provide critical data in de termining each adder’s performance and scalability. Keywords: Computer Arithmetic, Decimal Arithmetic, SignedDigit, Multioperand Adder, BCD 1. Introduction Translating a decimal fraction into a finite floatingpoint representation is prone to losing precision as a result of rounding errors. In almost all financial settings, decimal arithmetic is desired to guarantee balances are calculated correctly and lawfully. Some industrial and scientific applications require highprecision decimal arithmetic as well. Software packages have been available for most programming languages so that decimal numbers could be evaluated with decimal arithmetic to avoid error [1,2]. IBM recently departed from this software solution by incorporating a decimal floatingpoint arithmetic unit in the Power6 [3] and z10 processors [4]. A compelling reason to do such is a report [5] showing that 55% of the numbers stored in the databases of 51 major organiza tions are decimal. One study shows that for a set of five benchmarks, a 1.3 to 12.8 speedup factor was obtained by simulating a processor using virtual decimal arithme tic hardware against software routines [6]. Applications that spend a large proportion of the time consuming decimal calculation stand to benefit from hardwarebased decimal operations. Therefore, research into decimal arithmetic has gained momentum. Decimal renditions of binary carrysave [7,8] and carrylookahead adders [911] have been proposed. Decimal floatingpoint addition is treated in [1214]. New decimal multipliers [1518] and dividers [1921] have also been proposed. In [17], new decimal encodings improve the latency and area for de cimal partial product generation and reduction for multi plication. Similarly, in [22], a new redundant digit set is used with special encodings called twovalued digits (twits), resulting in a faster implementation of both addi tion and subtraction. The main motivation behind this paper is to introduce a new signeddigit architecture and objectively compare it with signed and unsigned digit adders. Signeddigit decimal adders have the benefit of carryfree addition although a carrypropagate adder must be used to trans form the signeddigit sum into an unsigned sum. In the next two sections, we will present the theory of decimal encodings and signeddigit decimal numbers. In the sub sequent sections, brief descriptions of other adders are given: the nonspeculative multioperand adder [7], mixed binary and BCD adder [23], reduced delay BCD adder [10], dynamic decimal CLA [11], Svoboda adder [24], speculative signeddigit adder [25], decimal carryfree adder [26] and Redundant Binary Coded Decimal (RBCD) adder [27,28]. Then, the proposed method for signeddigit addition is discussed. A constant addition technique will be applied to both the correction step in signeddigit decimal addition and conversion to binary coded decimal (BCD). Multioperand decimal addition based on signeddigit addition will follow. Finally, the synthesis results will be discussed. The notation used throughout this paper is as follows:
226 J. REBACZ ET AL. Variables are represented with a name and a subscript. Take xi for example, x is the label and i indicates the digit position. If no subscript is present, x will refer to the wordwide variable across all digit positions. Numbers enclosed in square brackets index the bit position. For example, the second bit of sum at the first digit position is sum0 [1]. 2. Decimal Encodings A discussion of decimal encodings is related to signed digit arithmetic because the primary motivation for both are similar, i.e. they are concerned with achieving better performance by leveraging a more convenient number representation for certain tasks. For example, Binary Coded Decimal (BCD) is the representation usually used for decimal arithmetic. A BCD digit is the binary repre sentation of a decimal number [0,9] with 4 bits. BCD is convenient for arithmetic, but is not optimal for storage because 4 bits for 10 decimal digits wastes 6 encodings. In IEEE 7542008, storage of decimal floatingpoint numbers is specified to be in Densely Packed Decimal (DPD) form. In DPD, 3 decimal digits are encoded with 10 bits, which is much more efficient for storage. For multiplication, several signed and unsigned repre sentations have been developed to increase performance. In [15], it is shown how a signeddigit representation can reduce the number of partial products in a decimal mul tiplier. These partial products can be efficiently accumu lated with a signed digit adder. In [15], the classic Svo boda adder [24] is used for partial product reduction, yet, it will be shown that the proposed signed digit represen tation can yield better performance. On the other hand, unsigned representations in multiplications are also gaining track. In [17], traditional BCD or BCD8421 is recoded to BCD5421, BCD4221 and BCD5211 to efficiently generate partial products and reduce the par tial product reduction tree. One of the advantages of this scheme is that multiplying by two from one encoding to another may simply require a left shift. Another advan tage of the encoding is that it allows the reuse of a binary radix4 multiplier, sharing components to save area. These new encodings also have advantages over BCD 8421 in division, as demonstrated in [20]. Unsigned decimal adders usually work with BCD numbers, but that is not required. The partial product adder in [29] uses the Overloaded Decimal Representa tion (ODR), a redundant unsigned 4bit encoding to store and add intermediate partial products in a carrysave format. In ODR, a decimal digit may take on values 0 through 15. In iterative operations, the decimal correc tion vector of +6, which is usually required for unsigned decimal addition, is easier to detect and less costly to add. The correction is only needed when a digit exceeds 16, which is easily detected by inspecting the digit’s carry out. When the carry out is detected, adding 6 occurs in the next iteration, which shortens the critical path. Another adder that leverages encodings other than BCD is the mixed binary and BCD multioperand adder. The mixed binary and BCD multioperand adder [23], which will later be described in more detail and imple mented, uses both binary and BCD representations and operations. Binary addition occurs for a column of BCD digits. The binary result is then converted back to BCD number. The process will repeat until there are two rows of BCD numbers to be inputted into a 2operand BCD adder. In signeddigit adders, there appear to be three popular representations: the Svoboda code [24], the two’s com plement representation (used in the proposed method), and the positive/negative component representation. The Svoboda code uses 5bits to represent numbers in the set [−6,6]. A positive number xi is represented in Svoboda code as 3 ii x 31 3 i while a negative number is represented as i x . For example, positive decimal numbers 1 and 6 are represented in Svoboda code as 110 = 00011 and 610 = 10010 respectively; nega tive decimal numbers −1 and −6 are represented in Svo boda code as −110 = 11100 and −610 = 01101 respectively. There are two representations for zero (00000 and 11111). The Svoboda code allows for quick addition, but the encoding may be difficult to work with or costly to convert to and from. The two’s complement representation for signeddigit numbers is used in the RBCD adder [27,28] and in this work. In RBCD, the digits are 4bits wide and represent numbers between 7 and −7 inclusive. RBCD’s number range requires BCD numbers to go through a conversion step (since 8 and 9 must be recoded). Our proposed adder does not require a conversion step because we use 5bits to represent a signed decimal digit in the range of [−9,9]. This convenience is a key feature of our design. In [25,26], a signeddigit is represented as the sum of one positive 4bit binary vector x+ and one negative 4bit binary vector x−. This positive/negative component rep resentation may store −2 in various ways: as x+ = 0001 and x− = 0011 or as x+ = 0010 and x− = 0100. With this representation, numbers can be easily inverted by swap ping x+ and x−. Additionally, no conversion from BCD is needed. However, there are 8 bits to a digit. Though this fact does not automatically imply more area consumption, the results do suggest it. 3. Decimal SignedDigit Theory The decimal signeddigit number system used here has Copyright © 2011 SciRes. CS
J. REBACZ ET AL. 227 all the properties and limitations set forth in [30]. A de cimal signeddigit set D is a special case of general signeddigit sets when r = 10 (r is the radix or base). 3 13 5 2 ,,1,0,D (1) where α, β and r are related by (2). 1r (2) Usually, symmetric signeddigit sets are used (α = β). The signeddigit set is said to be redundant (any number has multiple representations) if (3) holds. 1 2 r (3) The decimal signeddigit set used in this work is [−9,9] or −9 to 9 inclusive (r = 10, α = 9). With this decimal signeddigit set, to add two decimal signeddigits xi and yi, three quantities must be added together: the interme diate sum ui (i.e., interim sum), the carry ci (i.e., transfer) which can take values in {−1,0,1}, and the correction . The intermediate sum ui is xi + yi and ranges from −18 to 18 inclusive with the digit set [−9,9]. The carry ci is generated using a rule set such that −8 ≤ ui − ≤ 8 holds. Determining ci is done by comparing ui to comparison constants. For the decimal signeddigit set used, one comparison constant is in [−8,−1], the other in [1,8], and are usually chosen to minimize hardware complexity [31]. If −1 and 1 were the comparison con stants, then ci is −1 when ui is less than −1; ci is 1 when ui is greater than 1; ci is zero when ui is −1, 0 or 1. The correction is and adds to the intermediate sum ui along with the previous digit’s carry to obtain the sum. Note that i is between −8 and 8 inclusive, so an input carry of −1 or 1 will never cause a carry to propagate to the next decimal digit. The above is ex pressed in (4)(6). 10 i c 10 i c 10 i c 10 i uc iii uxy (4) 1 10 ii ii ucc (5) 1if1 1if 1 0otherwise i ii u cu (6) An example demonstrates that the signeddigit tech nique still produces carries, but never propagates them. In other words, a digit’s carry out is not a function of its carry in. In this example, the operands are absent of neg ative digits only to facilitate demonstration. The com parison constants are −1 and 1. For the least significant digit column, the intermediate sum is 8 + 8 = 16. Since 16 > 1, c0 = 1 is added to the next column and –10 is the correction for the least significant column. 8 + 3 4 7 8 3 16 9 9 16 u vector −10 0 −10 −10 −10 −10 correction vector 1 0 11 1 1 c vector 1 −72−30 0 6 s vector The general algorithm is as follows: 1) For each digit, add the two operands to get the in termediate sum, ui. The range for ui is −18 to 18 inclu sive. 2) If ui > 1, set correctioni = −10 and ci =1. If ui < −1, set correctioni = 10 and ci = −1. In the example, the c vector is shifted one digit position to the left so that addi tion occurs within each column. 3) Find the sum of the three vectors, u, correction and c. This operation, as those before, will not propagate carries because the method guarantees: −9 ≤ si ≤ 9. 4. Prior Work in UnsignedDigit Decimal Addition 4.1. Nonspeculative Multioperand Adder In [7], a nonspeculative multioperand adder is presented that can sum up to 16 operands. For each digit column, M operands are added using a linear array of M2 binary carrysave adders. Carries outside the 4bit range for each digit are transferred to the next column, and saved for later use. A 4bit CPA finds a 5bit uncorrected in termediate sum after the carrysave adders. This uncor rected intermediate sum together with the saved carries are inputted into combinational logic to find the carry out and a 4bit correction vector. Another 4bit CPA adds the correction vector to the uncorrected intermediate sum to yield the corrected intermediate sum. Finally, the carry outs and the corrected intermediate sum must be added with a wordwide carrypropagation adder. In this work, multioperand adders requiring decimal carrypropagation addition use the dynamic decimal CLA [10] because it gives the best delay and area usage. 4.2. Mixed Binary and BCD Multioperand Adder This adder, proposed in [23], presents a scalable scheme to realize any Noperand addition. First, for each digit column, the N operands are added with N:2 reduction and a CPA. Each column sum is then converted into a decimal number with a network of binary to decimal converter cells. Depending on the number of decimal digits this conversion process produces, another stage of binary addition and binary to decimal conversion may ensue. When the digits of the converted decimal sum is two, a fast decimal carrypropagation adder (in this work, Copyright © 2011 SciRes. CS
228 J. REBACZ ET AL. the dynamic decimal CLA adder [10]) is used to obtain the final decimal sum. For 3 to 11 operands, only one stage is needed because maximum number of digits for the sum of a digit column with 11 operands is . Thus, 2operand ad dition follows to calculate the final sum. For 12 to 111 operands, two stages are needed because the maximum decimal sum for a digit column is 999, which is three digits. Adding three digits across each digit column is 3operand addition, and necessitates another stage. 2:9 1199 4.3. The Reduced Delay BCD Adder In [10], the 2operand decimal adder is composed of a 4bit binary adder, an analyzer circuit, a carry network, and another 4bit binary adder to add the correction vec tor. Using the intermediate sum from the first 4bit bi nary adder, the analyzer circuit finds the generate and propagate signals for that digit. The signals from all dig its are passed to a KoggeStone carry network to find the carries. Appropriately wiring the generated carries to the final 4bit adder will add the correction vector (0, 6, 1 or 7), which depends on the carry in and the carry out for that digit. 4.4. The Dynamic Decimal Adder Using Carry Lookahead Like the reduced delay BCD adder, the dynamic decimal CLA adder [11] is a 2operand adder that finds digit propagate and generate signals to be used in a carry loo kahead scheme for fast carry propagation. These signals are generated per digit using combinational logic on the bit propagate and generate signals of the two input BCD digits. The sum bits are calculated as a function of the bit generate and propagate signals, the carry in, and the car ry out. A speculation technique is used for the upper two bits of each digit to speed up the addition time. With dynamic logic, this technique can yield impressive speed. 5. Prior Work in SignedDigit Decimal Addition 5.1. Svoboda Adder The Svoboda adder [24] was an early 2operand design that added digits from −6 to 6 inclusive. The Svoboda adder uses the Svoboda code described in Section 2. This code helps simplify decimal addition. On the other hand, converting BCD operands into the Svoboda code and back requires overhead. BCD to Svoboda code conversion begins by trans forming each input BCD digit to a 5bit vector corre sponding to the Svoboda code of that digit minus 4. Then, a Svoboda adder adds 4 to the Svoboda code of all digits. To convert back to BCD, a similar, but reversed proc ess is used. Constants are iteratively added with a Svoboda adder until all the digits are between –4 and 5 inclusive. The worst case (i.e., the number of times the loop is executed) depends on the number of digits. We have implemented the Svoboda adder with the suggested BCD to Svoboda code conversion scheme but without the suggested Svoboda code to BCD conversion scheme. Instead, we have replaced it with a faster carrylooka head conversion scheme. First, the Svoboda code is transformed into a two’s complement number ranging from −6 to 6 inclusive. Second, generate and propagate signals are detected and used in a KoggeStone prefix network to generate carries so that the proper corrections can be applied to each digit. Also, the Svoboda adder was originally designed with two stages of chained full adders with endaroundcarries. In the experiment, we have replaced the full adder chains with prefix adders in order to improve speed performance. 5.2. Speculative SD Adder This architecture [25] is rather complex as it implements a clever speculation technique that facilitates the addition of input carries. The input operands are in the positive/ negative component representation. For example, −3 would be represented like this (let the two vectors be expressed as (x+, x−): 25 3 0010,0101 Internally, the speculative SD adder uses compressors to add the input operands. These compressors resemble carrysave addition, but are modified to handle negative bits. The two comparison constants are 1 and −1. A sign detection unit is necessary to determine the carry out. Once the carry out is found, a correction is added if nec essary. No input conversion is necessary since BCD is within the speculative SD’s digit set. The adder is speculative because it prepares two pairs of sums, one which can be easily incremented by one and another that can be easily decremented by one. The last step of this adder involves adding a correction vector according to the digit posi tion’s carry out. For conversion back to BCD, the negative component is subtracted from the positive component to yield a 5bit, two’scomplement number between −9 and 9. This num ber is inputted into an analyzer circuit to find generate and propagate signals to use in a carrylookahead scheme. Once the carries are known, the right constants can be Copyright © 2011 SciRes. CS
J. REBACZ ET AL. 229 added to the SD number to get an unsigned BCD result. 5.3. Decimal CarryFree Adder The Decimal CarryFree Adder (DCFA) from [26] uses a digit set from −9 to 9. DCFA represents numbers in the same way [25] does, with two 4bit vectors (positive/ negative component representation). This design is dif ferent than [25] because it speculates the addition of sev eral corrections (not just the addition of the transfer in). Two sign detection circuits determine the transfer out (like in [25]). The positive and negative transfer out and the negative transfer in signals select the sum in three levels of multiplexors. The positive transfer in signal is wired into the first bit of the selected sum to effectively add it. 5.4. RBCD Adder The 2operand Redundant Binary Coded Decimal adder (RBCD) [27,28] adds 4bit wide signeddigits between 7 and −7 inclusive represented in two’s complement. The reduced digit set dramatically simplifies carry detection at the expense of some initial overhead required to con form BCD operands to the adder’s digit range. The adder is implemented with two 4bit adders, a carry generation block and a correction vector generation block. It is very similar to the proposed 2operand adder, but differs in the digit set, the carry detection circuit, and especially correction method. Also, RBCD, as well as the other SD adders, do not discuss multioperand addition. Additional circuits are described to perform BCD to RBCD and RBCD to BCD conversion. The former case requires less logic. On detection of a 7, 8 or 9, a carry will be sent to the next digit and 6 will be added to the current digit. In the RBCD to BCD conversion, generate and propagate signals are found and used in a carry lookahead circuit. Once the carries are known, a 4bit adder corrects the sum in each digit. 6. Proposed SignedDigit Decimal Adder 6.1. Methodology In the proposed signeddigit architecture, the digit set used is −9 to 9 inclusive and is represented using a con ventional 5bit, two’s complement vector [32]. For the digit at position i, xi and yi are added to yield a 6bit wide intermediate sum, ui (the carrypropagate adder (CPA) chosen in these designs uses a prefix tree). Then, two levels of simple logic determine the carry ci, which uses positive and negative magnitude components to represent {−1, 0, 1}: and . i c i c ii i cc c (7) The proposed rule set for ci selects the two comparison constants to be −8 and 7 for reduced hardware complex ity. As opposed to the rule set in (6), this rule set can be implemented as a boolean function of 3 variables with 4 minterms as opposed to a boolean function of 6 variables with 9 minterms. The proposed rule set is defined in (8). After an exhaustive design space exploration, it has been found that this rule set requires the minimum logic use. All other valid pairs of comparison constants will have a higher logic complexity and the evidence of this is visi ble by looking at the upper 3 bits of the two’s comple ment boolean numbers in the set [−18,18]. 1if8 1if7 0otherwise i ii u cu (8) Each digit’s positive and negative carry signals are easily calculated with (9)(10). Figure 1(a) expresses these equations. u i [4] i c i c u i [3] U i [5] (a) i y i CPA 3 6 5 i c i c 1i c 1i c CPA u i s i 5 correction vecto generation carry generation (b) Figure 1. The circuit in (a) elaborates the carry generation block. Once ui is obtained from the carry propagate adder, i c and i c are calculated with a few gates. In (b), a highlevel diagram depicts the 2operand SD adder. The up per 3 bits of the intermediate sum are used for carry genera tion while the lower 5 bits are inputted to the second CPA. Copyright © 2011 SciRes. CS
230 J. REBACZ ET AL. 543 ii ii cu uu (9) 543 iii i cuuu (10) After ui and ci are found, the correction vector must be found and added. The correction vector corresponds to from (5). However, since the incoming carry from the previous digit, ci1, must eventually be added to ui as well, it is convenient to think of the correction vec tor as 1. The correction vector (which is simply the constant term to be added to ui) is tabulated for every possible input combination in Table 1. Addi tion between ui and this correction vector will result in a 5bit sum, si. Note that because of the rule set chosen, the final sum for each digit will fall in [−9, 8] meaning that the carryout of the last CPA will always be 0. 10 i c 10 ii cc The proposed SD decimal adder is shown in Figure 1 (b). The carry generation block of the proposed adder is shown in Figure 1(a). It is apparent in this figure that carries only propagate to the next digit. There is no ripple effect since carry generation does not depend on the car ry in. This design is a necessary precursor to the im proved version in the next section. 6.2. Parallel Addition of the Correction Vector The previously proposed adder’s correction generation block and correction vector CPA can be replaced by a parallel speculative structure to reduce delay since the bits of ui are available before the carries. In these parallel adders, speculative addition of the correction vector and incoming carry take place in one or two stages of logi cally minimized constant addition. For one stage, the addition of the constants −11, −10, −9, −1, 0, 1, 9, 10, 11 are precomputed and selected with ci and ci1 in a multi plexor. Table 1. Correction vector generation. i c i c 1i c 1i c Correction Vector 0 0 0 0 000002 (010) 0 0 0 1 111112 (–110) 0 0 1 0 000012 (110) 0 1 0 0 010102 (1010) 0 1 0 1 010012 (910) 0 1 1 0 010112 (1110) 1 0 0 0 101102 (–1010) 1 0 0 1 101012 (–1110) 1 0 1 0 101112 (–910) For two stages, the addition of −10, 0 and 10 are pre computed and selected by ci in one stage; −1, 0 and 1 are precomputed and selected by ci1 in the other stage [32]. Depending on the method of constant addition chosen, certain optimizations can be made. For example, the proposed parallel adder uses a method of constant addi tion such that the addition of x + c, where x is the 5bit input and c is the constant, produces a vector f that indi cates which bits in x need to be inverted to obtain the sum. So, five XOR gates are needed to invert r for any constant addition. Hardware can be reduced by placing the XOR gates after the multiplexor instead of having groups of XOR gates for each constant before the multi plexor. The terminology and concept of the constant addition used were derived from the flag inversion cell (fic) se quences in [33]. The method in [33] describes adding two variables and a constant. For this architecture, only one variable and a constant are added. A detailed de scription of the constant addition mechanism can be found in [33]. In the twostage parallel adder, the intermediate sum, ui, is fed to two constant addition blocks for adding and subtracting 10. A 3to1 multiplexer will select f for adding −10, 0 or 10 according to the selects: i c and i c . The multiplexer’s output is XORed with ui to invert the flagged bits. This inversion yields . 10 ii uc The remaining step is to add the incoming carry. Two more constant addition blocks are used to add or subtract 1 from i u10 i c . Another 3to1 multiplexer is used with XOR gates after it to invert another flagged set of bits and yield the sum. Breaking the addition up into two stages seems to sacrifice speed. However, the two levels involve a very small amount of logic. Both circuits (one stage and twostage implementations) have a better area delay product with parallel constant addition than with out. The onestage parallel adder is shown in Figure 2. This design takes advantage of the fact that ui is calcu lated before any of the carries. In fact, the correction vector speculative addition starts as soon as ui[0] arrives. Therefore, it can be seen that this type of speculation can improve performance. Most other decimal signeddigit adders do not speculate the constant addition. 7. SD and BCD Conversion Any SD number can be converted to BCD, and vice versa [30]. The proposed SD adders must extend an un signed BCD number by one bit for BCD to SD conver sion. On the other hand, a carrypropagation operation must be performed for SD to BCD conversion. A Kogge Stone prefix network is used to accelerate the carry propagation. The propagate signal pi is set when the SD Copyright © 2011 SciRes. CS
J. REBACZ ET AL. 231 carry generation i y i CP u i f i s i speculative constant addition multi ple xo bitwise XOR 3 6 5 i c i c 1i c 1i c Figure 2. The proposed onestage signeddigit unit for one digit is shown. This figure shows that the carry generation and constant addition speculation occur in paralle l, which is desirable for speed. digit is 0. The generate signal gi is set when the SD digit is negative. It should be apparent that −1 and 0 are the two possible values for the carry. Carries are generated using a KoggeStone prefix network, and are used to determine the correction vectors for each sdi to obtain bcdi. If there is a carry in to a digit that is greater than 0, −1 is added. If there is a carry in to a negative or 0 digit, 9 is added. If there is no carry in but the digit is negative, 10 is added. Equation (11) expresses these conditions in terms of gi, pi, and the carry in cin. if 1 if 9 10 if if inii in i inii in i i iinii in iinii in cgpc sd cgpc sd bcd sd cgpc sd cgpc (11) To perform these corrections, constant addition is used again. The improvement in speed and areadelay inside the conversion circuit is around 20% when using a con stant addition correction scheme over conventional addi tion. Figure 3(a) shows the conversion block for 1 digit, Figure 3(b) for 8 digits. In [34], a hybrid SD number system shows that un signed and signed numbers have a continuum of number representations between them. For example, every third or every fourth digit can be signed. This in turn would limit the carrypropagation chains to those places. How ever, experiments with hybrid SD representations have shown that the added logic necessary to incorporate the two schemes together costs too much area and delay. 8. Multioperand Addition Twooperand signeddigit decimal adders can be used to add multiple numbers if arranged in a tree. In this way, corrections are made after every addition. However, it is apparent that immediate correction is not necessary. The correction step can be postponed in a similar manner as shown in [7] with the addition of new constraints to the problem (see Table 2). sd i [0] constant addi t i on sd i [3]sd i [2]sd i [1] sd i [4] constant addition co n stant addition 0000 sd i [1] sd i [3] sd i [2] sd i [0] +10 +9 −1 4to1 bcd i [1] bcd i [2] bcd i [3] bcd i [0] p i c in g i (a) sd 7 sd 1 sd 0 555 SD to BCD converter pgc in bcd 7 pg c in bcd 1 SD to BCD converter SD to BCD converter p gc in bcd 0 KoggeStone Prefix tree (b) Figure 3. In (a), SD to BCD conversion for one digit is shown. This circuit finds g and p so that a carrylookahead generator can find cin for all digits. The conversion scheme for 8 decimal digits is shown in (b). Table 2. Ranges for multioperand addition. Number of Operands Range of ui Bounds of 10 ii uc 2 operands [–18,18] [–8,8] 3 operands [–27,27] [–7,7] 4 operands [–36,36] [–6,6] 5 operands [–45,45] [–5,5] Copyright © 2011 SciRes. CS
J. REBACZ ET AL. 232 c 7 n The operations for noperand addition for a digit in the ith position can be summarized in (12)(14) (ai,n repre sents the nth input for the ith digit position). These show that an intermediate sum can be calculated from multiple operands and a similar correction procedure can be used to obtain the sum. Equation (14) must be satisfied for a ci in order for carryfree decimal signeddigit addition to work. For example, for three operand addition, –7≥ i ≥ 7, for all possible ui in [−27,27], a ci value can be found that makes true. Note that (14) finds the maximum bounds, but tighter bounds are possible so long as the bounds can represent all decimal digits. 10 i u 710 ii uc ,1 ,2,ii ii ua aa (12) 1 10 ii ii ucc (13) :1010 10 iii cnuc n (14) with the maximally redundant digit set of [−9,9], 6 or more operands cannot be added without an intermediate correction since adding 5 carries restricts to [−4, 4]. This condition cannot hold for such ui as −5, 5, 15. 10 ii uc Therefore, only 2, 3, 4 and 5operand signeddigit ad ders are possible with the present scheme (see Figure 4). Furthermore, only the 2operand adder can make use of the parallel addition technique; delay cannot be improved since the greater number of parallel additions increases the multiplexer size and the load. CPA correction vector generation i c i−1 3 : 2 reduction carry generation u i CPA (3, 4 or 5) : 2 reduction Figure 4. The proposed signeddigit multioperand addition unit for one digit is shown. This scheme is limited for addi tion of 3, 4 or 5 operands. The combinational logic for carry generation becomes more constrained and more complex as n increases. For 2operand addition, there is a lot of flexibility in choos ing the rule set as evidenced by the logic minimization obtained in (8). For 5 operands, fewer rule sets are available to choose from. Since adding 5 SD numbers may result in 45 or −45 (maximum carry in of 4 or −4), the rule set must ensure that . The 2operand correction logic requires the 3 upper bits from ui whereas the 5operand correction logic requires all bits from ui and many more minterms. 510 ii uc5 For multioperand addition requiring multiple modules (i.e. n > 5), combining reduction where possible can yield better performance [35]. The lower part of the SD adder that reduces and adds ui, ci−1 and can be combined with the earlier part of the next level’s SD adder that reduces and adds n operands. The operation for these parts is addition, so the property of associativity can be exploited. 10 i c As an example of multioperand addition as well as this optimization technique, a 16operand adder is shown in Figure 5(a). Three 5operand adders add the first 15 input operands. The 16th input operand is added on the second level with the three sums from the first level us ing a 4operand adder. The circuit can be improved by considering the three parallel 5operand adders together, and summing their internal ui, c i1, and terms together instead of separately. Basically, instead of per forming three separate 3:2 reductions followed by a CPA for each 5operand adder, one large 9:2 reduction fol lowed by a CPA is performed. Adding the 16th input at this stage only involves growing the reduction circuit to 10:2. Furthermore, only one CPA is needed to produce ui for the 4operand circuit instead of four (one at the end of each 5operand adder and one near the beginning of the 4operand adder). This optimized 16operand adder is shown in Figure 5(b). The reduction circuit should be organized to reduce the terms that become available first. First, the intermediate sums and leftover operands should be reduced. When the carries become available, their reduction should be merged with the intermediate sums. Likewise, reduction of the correction vector should be merged when it becomes available. 10 i c 9. Results All designs were written in Verilog HDL. Synthesis re sults for area and timing were obtained from Synopsys Design Compiler using MOSIS TSMC 0.18μm standard cell technology. Each design was compiled as an 8digit (decimal) adder. Several of the designs are 2operand only, but are arranged in a parallel tree to obtain multi operand addition. Copyright © 2011 SciRes. CS
J. REBACZ ET AL. Copyright © 2011 SciRes. CS 233 (a) ui ci ci–1 10*ci ui ci–1 ci (b) Figure 5. Multioperand addition. ((a) 16operand adder is constructed with 4operand and 5operand adders from Figure 4; (b) An improved 16operand adder combines the reduction steps in the last stage of the three 5operand adders with the first stage of the 4operand adder.) The designs in Table 3 have been synthesized with BCD conversion (if necessary) before the SD operation. Thus, the adders are outputting different number repre sentations, but are adding BCD numbers. The Svoboda adder uses a 5bit Svoboda code to facilitate addition. We see from Table 3 that even after accelerating the adders with a prefix tree, the Svoboda adder is still too costly in area. Furthermore, the special Svoboda code requires much more data conversion overhead to and from BCD (compare Table 3 to the 2operand rows in Table 4 which shows 2operand addition with conver sion from and to BCD). The speculative SD adder uses the positive/negative component representation. This representation is advantageous for inverting an operand.
234 J. REBACZ ET AL. Additionally, there is no overheard required for convert ing to the adder’s representation. However, the area and delay requirements are too much compared to other de signs. The DCFA [26] achieves a better areadelay product than the speculative SD while using the same number representation. The RBCD adder is strong in terms of delay and area, but the conversion from BCD puts RBCD behind many of the adders. The results in Table 3 for our proposed adders dem onstrate two points: i) The adder architecture is efficient in terms of area and delay and ii) The constant addition modification can improve delay or area (see last two rows). The architecture referred to by Table 3 as the proposed SD is shown in Figure 1. The proposed onestage SD is shown in Figure 2. The advantage of the proposed SD's number representation is that it requires no conversion from BCD. Additionally, a digit's sign information is found in the digit's most significant bit. For the representation used in the speculative adder and the DCFA, a sign detector must be used on each digit to determine the digit’s sign. Therefore, we believe the proposed designs offer superior performance with the two’s complement representation. For the multioperand results, every adder inputs de cimal digits in the BCD representation and outputs an unsigned BCD sum vector. An additional conversion step is necessary after all SD adders and before the RBCD and Svoboda adders. Since the operand size is 8 digits, the carryfree advantage of the SD adders is not being leveraged. Table 3. Area and delay comparison for 2operand SD ad ders. Decimal Adder Delay (ns) Area (mm2) AreaDelay (ns*mm2) AreaDelay compared to Proposed SD RBCD [27,28] 1.66 0.03840.0637 × 1.26 Svoboda SD [24] 2.18 0.08980.1960 × 3.87 Speculative SD [25] 1.40 0.05820.0815 × 1.61 DCFA [26] 1.48 0.04980.0737 × 1.45 Proposed SD 1.36 0.03730.0507 × 1.00 Proposed 1stage SD 1.27 0.03850.0489 × 0.96 Proposed 2stage SD 1.36 0.03460.0471 × 0.93 Table 4 shows that the multioperand SD adder pro posed in this work outperforms the existing SD designs at every corner (except against the RBCD adder’s area for two operands). The RBCD adder occupies the least area for 2 and 4 operands, but falls second to the pro posed adder for 8 and 16 operands. For hardware designs that can benefit from the two’s complement SD repre sentation, the proposed architecture should be considered. Among the unsigned adders in Table 5, the dynamic decimal adder using CLA yields the best delay. It also consumes the least area for 2 and 4 operands. For 8 and 16 operands, the mixed binary and BCD adder architec ture yields the best areadelay product. It may be specu lated that the mixed binary and BCD approach will scale best, since the main growing component is the fast tree of binary adders. Table 4. Area and delay comparison for signed multiope rand adders. Operands Delay (ns) Area (mm2) AreaDelay (ns*mm2) AreaDelay compared to Proposed SD Svoboda Adder [24] 2 3.39 0.110 0.373 × 3.42 4 4.93 0.242 1.19 × 5.36 8 6.50 0.477 3.10 × 4.58 16 8.15 0.912 7.43 × 4.17 Speculative SD Adder [25] 2 2.51 0.0713 0.179 × 1.64 4 3.89 0.129 0.502 × 2.26 8 5.22 0.288 1.50 × 2.22 16 6.83 0.545 3.72 × 2.09 DCFA Adder [26] 2 2.79 0.0675 0.188 × 1.72 4 4.15 0.172 0.714 × 3.22 8 5.90 0.323 1.91 × 2.82 16 7.40 0.649 4.80 × 2.70 RBCD Adder [27,28] 2 2.67 0.0486 0.130 × 1.19 4 3.73 0.0794 0.295 × 1.33 8 5.01 0.154 0.772 × 1.14 16 6.21 0.306 1.90 × 1.07 Proposed SD Adder 2 2.08 0.0539 0.109 × 1.00 4 3.18 0.0698 0.222 × 1.00 8 4.37 0.155 0.677 × 1.00 16 5.49 0.324 1.78 × 1.00 Copyright © 2011 SciRes. CS
J. REBACZ ET AL. 235 Table 5. Area and Delay Comparison for unsigned multi operand adders. Operands Delay (ns) Area (mm2) AreaDelay (ns*mm2) Nonspeculative Adder [7] 2 1.29 .0246 .0317 4 3.15 .0614 .193 8 4.03 .147 .592 16 6.78 .280 1.90 Mixed Binary and BCD Adder [23] 2 1.29 .0246 .0317 4 2.71 .0535 .145 8 3.41 .101 .344 16 4.45 .187 .832 Reduced Delay BCD Adder [10] 2 1.34 .0284 .0381 4 2.57 .0644 .166 8 3.68 .143 .526 16 4.84 .278 1.34 Dynamic Decimal using CLA Adder [11] 2 1.29 .0246 .0317 4 2.38 .0598 .142 8 3.01 .123 .370 16 3.82 .236 .902 Proposed SD multioperand addition with conversion to BCD does not outperform the best unsigned multi operand scheme. However, the SD adders’ larger func tional domain must not be overlooked. That is, the ease at which subtraction can be performed with an SD adder over an unsigned adder strengthens the SD adder’s posi tion. 10. Applications The ideal target applications for the proposed signed digit schemes would leverage signeddigit advantages. One benefit is the elimination of carry propagation addi tion with long words (64 or 128 bits) operated on itera tively. If the application does not require iterative com putations, then immediate conversion back to an un signed representation will negate the carryfree addition performance. However, if the application requires know ledge of sign at each iterative step, which signeddigit addition easily provides, then signeddigit addition is promising. Moreover, applications that can conveniently use the signeddigit representation for other operations as well as addition/subtraction stand to benefit. The itera tive multiplier in [15] uses a signeddigit adder because an earlier step (partial product generation) found that the signeddigit representation can increase performance. In SRT division, division is executed iteratively and the sign of the dividend is necessary at each iteration. Therefore, the proposed 2operand adders can potentially serve a core role in division. For the proposed multioperand scheme, adding multiple partial products to reduce cycles in iterative multiplication appears promising. Finally, advanced financial algorithms may stand to benefit. 11. Conclusions In this study, new signeddigit two operand and multi operand decimal adders are proposed. Performance of re cent decimal adder architectures have been investigated and compared. The proposed SD adder excels in speed and area usage among previously proposed SD adders. The use of constant addition for speculation and the merg ing of adjacent modules with sharable operations enable efficient implementation of two operand and multiop erand decimal addition. 12. References [1] BigDecimal, 2008. http://java.sun.com/products. [2] DecNumber, AlphaWorks, 2008. http://www.alphaworks.ibm.com/tech/decnumber. [3] L. Eisen, et al., “IBM POWER6 Accelerators: VMX and DFU,” IBM Journal of Research and Development, Vol. 51, No. 6, 2007, pp. 663683. doi:10.1147/rd.516.0663 [4] C. Webb, “IBM z10: The NextGeneration Mainframe Microprocessor,” IEEE Micro, Vol. 28, No. 2, 2008, pp. 1929. doi:10.1109/MM.2008.26 [5] A. Tsang and M. Olschanowsky, “A Study of Database 2 Customer Queries,” Technical Report TR03.413, IBM Santa Teresa Laboratory, San Jose, USA, 1991. [6] L.K. Wang, C. Tsen, M. Schulte and D. Jhalani, “Benchmarks and Performance Analysis of Decimal FloatingPoint Applications,” 5th International Confer ence on Computer Design, Lake Tahoe, 710 October 2007, pp. 164170. doi:10.1109/ICCD.2007.4601896 [7] R. Kenney and M. Schulte, “HighSpeed Multioperand Decimal Adders,” IEEE Transactions on Computers, Vol. 54, No. 8, 2005, pp. 953963. doi:10.1109/TC.2005.129 [8] I. D. Castellanos and J. E. Stine, “Compressor Trees for Decimal Partial Product Reduction,” GLSVLSI’08: Pro ceedings of the 18th ACM Great Lakes Symposium on VLSI, ACM, Orlando, 46 May 2008, pp. 107110. [9] I. S. Hwang, “High Speed Binary and Decimal Arithme tic Unit,” USA Patent No. 4,866,656. [10] A. Bayrakci and A. Akkas, “Reduced Delay BCD Ad der,” IEEE International Conference on Application Spe cific Systems, Architectures and Processors, Montreal, 911 July 2007, pp. 266271. [11] Y. You, Y. D. Kim and J. H. Choi, “Dynamic Decimal Adder Circuit Design by Using the Carry Lookahead,” Copyright © 2011 SciRes. CS
J. REBACZ ET AL. Copyright © 2011 SciRes. CS 236 IEEE Design and Diagnostics of Electronic Circuits and Systems, Prague, 1821 April 2006, pp. 242244. doi:10.1109/DDECS.2006.1649627 [12] L.K. Wang and M. Schulte, “A Decimal FloatingPoint Adder with Decoded Operands and a Decimal Lead ingZero Anticipator,” 19th IEEE Symposium on Com puter Arithmetic, Portland, 810 June 2009, pp. 125134. [13] A. Vazquez and E. Antelo, “A HighPerformance Sig nificand BCD Adder with IEEE 7542008 Decimal Rounding,” 19th IEEE Symposium on Computer Arith metic, Portland, 810 June 2009, pp. 135144. [14] L.K. Wang and M. Schulte, “Decimal FloatingPoint Adder and Multifunction Unit with InjectionBased Rounding,” 18th IEEE Symposium on Computer Arith metic, Montpellier, 2527 June 2007, pp. 5668. [15] M. Erle, E. Schwarz and M. Schulte, “Decimal Multipli cation with Efficient Partial Product Generation,” 17th IEEE Symposium on Computer Arithmetic, Cape Cod, 2729 June 2005, pp. 2128. [16] M. Erle, M. Schulte and B. Hickmann, “Decimal Float ingPoint Multiplication via CarrySave Addition,” 18th IEEE Symposium on Computer Arithmetic, Montpellier, 2527 June 2007, pp. 4655. [17] A. Vazquez, E. Antelo and P. Montuschi, “A New Family of HighPerformance Parallel Decimal Multipliers,” 18th IEEE Symposium on Computer Arithmetic, Montpellier, 2527 June 2007, pp. 195204. [18] G. Jaberipur and A. Kaivani, “Improving the Speed of Parallel Decimal Multipliers,” IEEE Transactions on Computers, Vol. 58, No. 11, 2009, pp. 15391552. doi:10.1109/TC.2009.110 [19] T. Lang and A. Nannarelli, “A Radix10 DigitRecurrence Division Unit: Algorithm and Architecture,” IEEE Trans actions on Computers, Vol. 56, No. 6, 2007, pp. 727739. doi:10.1109/TC.2007.1038 [20] A. Vazquez, E. Antelo and P. Montuschi, “A Radix10 SRT Divider Based on Alternative BCD Codings,” 25th International Conference on Computer Design, Lake Tahoe, 710 October 2007, pp. 280287. [21] H. Nikmehr, B. Phillips and C. Lim, “Fast Decimal FloatingPoint Division,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14, No. 9, 2006, pp. 951961. [22] A. Kaivani and G. Jaberipur, “Fully Redundant Decimal Addition and Subtraction Using StoredUnibit Encod ing,” Integration, the VLSI journal, Vol. 43, No. 1, 2010, pp. 3441. [23] L. Dadda, “Multioperand Parallel Decimal Adder: A Mixed Binary and BCD Approach,” IEEE Transactions on Computers, Vol. 56, No. 10, 2007, pp. 13201328. doi:10.1109/TC.2007.1067 [24] A. Svoboda, “Decimal Adder with Signed Digit Arithme tic,” IEEE Transactions on Computers, Vol. C18, No. 3, 1969, pp. 212215. doi:10.1109/TC.1969.222633 [25] J. Moskal, E. Oruklu and J. Saniie, “Design and Synthesis of a CarryFree SignedDigit Decimal Adder,” IEEE In ternational Symposium on Circuits and Systems, New Orleans, 2730 May 2007, pp. 10891092. [26] H. Nikmehr, B. Phillips and C. Lim, “A Decimal Car ryFree Adder,” Smart Structures, Devices, and Sys temsII, Sydney, 13 December 2005, pp. 786797. [27] B. Shirazi, D. Yun and C. Zhang, “RBCD: Redundant Binary Coded Decimal Adder,” IEE Proceedings on Computers and Digital Techniques, Vol. 136, No. 2, 1989, pp. 156160. doi:10.1049/ipe.1989.0021 [28] B. Shirazi, D. Yun and C. Zhang, “VLSI Designs for redundant BinaryCoded Decimal Addition,” 7th Annual International Phoenix Conference on Computers and Communications, Scottsdale, 1618 March 1988, pp. 5256. doi:10.1109/PCCC.1988.10042 [29] R. Kenney, M. Schulte and M. Erle, “A HighFrequency Decimal Multiplier,” IEEE International Conference on Computer Design: VLSI in Computers and Processors, San Jose, 1113 October 2004, pp. 2629. doi:10.1109/ICCD.2004.1347893 [30] A. Avizienis, “Signed Digit Number Representations for Fast Parallel Arithmetic,” IRE Transactions on Electronic Computers, Vol. EC10, No. 3, 1961, pp. 389400. doi:10.1109/TEC.1961.5219227 [31] B. Parhami, “Generalized SignedDigit Number Systems: A Unifying Framework for Redundant Number Repre sentations,” IEEE Transactions on Computers, Vol. 39, No. 1, 1990, pp. 8998. doi:10.1109/12.46283 [32] J. Rebacz, E. Oruklu and J. Saniie, “High Performance SignedDigit Decimal Adders,” IEEE International Con ference on Electro/Information Technology, Windsor, 79 June 2009, pp. 251255. doi:10.1109/EIT.2009.5189621 [33] V. Dave, E. Oruklu and J. Saniie, “Design and Synthesis of a Three Input Flagged Prefix Adder,” IEEE Interna tional Symposium on Circuits and Systems, New Orleans, 2730 May 2007, pp. 10811084. doi:10.1109/ISCAS.2007.378197 [34] D. Phatak and I. Koren, “Hybrid SignedDigit Number Systems: A United Framework for Redundant Number Representations with Bounded Carry Propagation Chains,” IEEE Transactions on Computers, Vol. 43, No. 8, 1994, pp. 880891. doi:10.1109/12.295850 [35] J. Rebacz, E. Oruklu and J. Saniie, “Performance Evalua tion of MultiOperand Fast Decimal Adders,” 52nd IEEE International Midwest Symposium on Circuits and Sys tems, Cancun, 25 August 2009, pp. 535538. doi:10.1109/MWSCAS.2009.5236036
