NP-Domino, Ultra-Low-Voltage, High-Speed, Dual-Rail, CMOS NOR Gates

In this paper, novel ultra-low-voltage (ULV) dual-rail NOR gates are presented which use the semi-floating-gate (SFG) structure to speed up the logic circuit. Higher speed at low supply voltages and robustness against input-signal delay variations are the main advantages of the proposed gates over previously reported domino dual-rail NOR gates. Simulation results in a typical TSMC 90 nm CMOS technology show that the proposed NOR gate is more than 20 times faster than the conventional dual-rail NOR gate.


Introduction
Modern electronic technology faces trade-offs between power budget and performance. Traditionally, design considerations for high-performance systems assume a sufficient and stable energy supply that maintains constant performance throughout system operation [1]. For decades, the supply voltage of such systems has been set above the transistor threshold voltage (Vth), which is called above-threshold (or super-threshold) operation. However, in modern low-power (LP) and ultra-low-voltage (ULV) portable applications, the energy supply is strictly limited, and the overall system benefits from innovative techniques for active-energy minimization and standby-power reduction [1]-[5]. Examples of such power-saving techniques include supply-voltage scaling, multi-threshold logic, transistor stacking, and power gating [1]-[5]. ULV systems are extensively used in modern applications such as low-cost IoT devices, wearable electronics, intelligent remote sensors, implantable/wearable medical devices, and energy-harvesting systems. In these ULV applications, innovative techniques are often used to reduce the overall energy consumption. Sub-threshold design is often considered a very suitable and energy-efficient solution for these emerging energy-constrained applications [1]-[5].
The downscaling of CMOS technology (for higher transistor density and computing capacity) and the reduction of supply voltage degrade the speed of logic circuits because the gate-source voltage of the transistors is reduced [1]-[5]. Furthermore, substantial leakage current in modern CMOS nodes prevents aggressive scaling of Vth [1]. On the one hand, the growing market for low-cost portable devices demands low-power blocks that enable long-lasting battery-powered systems. On the other hand, the general trend toward higher operating frequencies and circuit complexity in modern high-performance processing applications requires innovative high-speed circuits [2]-[19]. Current digital design techniques do not offer reliable, high-speed logic circuits that can operate at deep sub-threshold voltages. Hence, operability is the main goal in implementing ULV systems in these high-speed applications [13]-[19]. Domino logic is known as a high-performance circuit configuration which normally uses a clocking scheme and is embedded in a static-logic environment [1]. Domino CMOS has become a popular logic family for high-performance, high-speed applications and is extensively used to implement high-speed processors [6]-[10], since it provides advantages over static CMOS logic, including fast operation and a lower transistor count (smaller silicon area) [11]. The SFG technique has been proposed for ULV NP-domino logic structures [13]-[19]. ULVSFG logic implemented in a modern CMOS process requires frequent initialization (pre-charge) to minimize leakage. By applying the input signals through input capacitors (Cin) to the gates of the evaluation transistors (EN), these nodes (SFG nodes) can reach a voltage higher than the supply voltage (VDD) [13]-[19].
The main aim is to increase the current of the evaluation transistor (EN) to achieve higher speed in the evaluation phase. In this paper, we use the SFG concept to speed up conventional dual-rail logic. We compare the performance of the designed SFG dual-rail NOR gate with that of the conventional dual-rail NOR gate; simulation results show significant speed improvements. Furthermore, we discuss the inclusion of a new keeper transistor which improves the stability of the gate by holding the voltage of the floating-gate node and the output node for a delayed input signal, especially when these gates are used in a chain of gates in a large system. This paper is organized as follows: Section 2 gives a short introduction to conventional dynamic single-rail and dual-rail domino logic and discusses the ULVSFG inverter and ULVSFG NOR gates. Section 3 presents the proposed ULVSFG dual-rail domino NOR gates and studies the delay and stability of the new logic gate. Section 4 gives and compares the simulation results for the different NOR gates; finally, Section 5 concludes the paper.
Figure 1 shows the conventional dynamic, pre-charge to 1, single-rail and dual-rail NOR gates [1]. This type of domino logic is widely used in high-speed applications such as high-speed processors (e.g. the 1-GHz 0.75 W ARM Cortex A8 designed by INTRINSITY) and is studied in detail in many papers and books (e.g. [1]). Domino logic gates operate in two different phases, "pre-charge" and "evaluate". Compared to a simple static CMOS NOR, dynamic domino logic achieves higher speed at the cost of higher power consumption [1]. However, a major limitation of single-rail domino logic is that only non-inverting logic can be implemented [1] [2]. This requirement has limited the widespread use of the pure domino logic style. This limitation is overcome by using the "true" and "complemented" logic outputs of dual-rail domino logic, at the cost of approximately doubling the power/energy consumption and the silicon area used [1] [2].

Semi-Floating-Gate Domino Logic
Figure 2(a) shows the dynamic, pre-charge to 1, single-rail inverter gate using the SFG technique [14]. This kind of ULVSFG NP-domino logic is introduced and discussed in detail in many papers [13]-[19]. The main purpose of the ULVSFG domino style is to increase the current of the transistors at low supply voltages without increasing the transistor widths. Current and speed can be increased relative to conventional domino and static CMOS by applying different pre-charge voltages to the gates and by using capacitively coupled inputs [14], similar to the neuron MOS in [12]. In these topologies the Voffset+ pins are connected to VDD and the Voffset- pins are connected to GND. The high-speed N-type ULVSFG domino NOR (pre-charge to 1) is shown in Figure 2(b). The clock signals are used both as control signals for the recharge transistors RP and RN and as reference signals for the NMOS evaluation transistors EN. Both the inverter and the NOR gate shown in Figure 2 operate in two different phases, called "pre-charge" and "evaluate", and follow the sequence of the normal NP-domino logic style. The virtual ground signal (Virt.GND) is synchronized with the clock signal and is driven by transistors large enough to supply the current needed through the EN transistors during the evaluation phase.
When CK is low (0) and Virt.GND is high (1), both the inverter and the NOR gate are in the precharge phase. During this phase, the RP transistors turn on and recharge the gates of the EN transistors to VDD. Meanwhile, the RN transistors turn on and recharge the gates of the EP transistors to 0. Thus the EP transistors turn on and precharge the output nodes to VDD. The keeper transistors, KN and KP, are inactive during this phase because the output node is precharged to VDD and the input signals are at the low (0) level, since the gates follow the NP-domino logic.
In the evaluation phase of both the inverter and the NOR gate shown in Figure 2, when the clock signal CK switches from 0 to 1 and Virt.GND = 0, both recharge transistors RP and RN switch off, which leaves the charge on the SFG nodes (VP and VN) semi-floating. The output nodes remain at the high level until an input transition occurs. The input signals (IN) must be monotonically rising to ensure correct operation of the N-type domino logic. This is satisfied only if the input signal is low at the beginning of the evaluation phase and IN makes at most a single transition from 0 to 1 during the evaluation phase. When this transition happens (IN rises from 0 to 1), the voltage of the semi-floating gate (VN) rises well above VDD through capacitive coupling from the input node to the SFG node; this increases the current of the EN devices and speeds up the evaluation. In this case the keeper transistors (KN) are turned off and the voltage VN is stable. Also, the keeper transistors (KP) are turned on and raise the gate voltage of the EP transistors (VP), eventually turning the EP transistors off. This helps to reduce the static current, which directly affects the noise margin and the power consumption of the logic. In the second scenario (the input does not rise), the output voltage remains high. In this case, the keeper transistors (KN) discharge the VN nodes and thereby turn the EN transistors off, while the keeper transistors (KP) remain off and the VP nodes float. The keeper transistors KN are needed to turn the evaluation transistors (EN) off, which minimizes the current dissipation during the evaluation phase when there is no rising input edge (the input signals remain 0). This reduces the static power consumption significantly, as discussed in [14].
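The capacitive boost of the SFG node can be estimated with a first-order charge-conservation model. The sketch below is only an illustration: it assumes an ideal capacitive divider between the input capacitor Cin and a lumped parasitic capacitance at the SFG node, and it ignores leakage and keeper currents. The function names and the parasitic value C_PAR are hypothetical and are not taken from the simulated circuits; VDD = 300 mV and Cin = 2 fF are the values used in the simulations of this paper.

```python
def coupling_ratio(c_in, c_par):
    """Fraction of the input swing that reaches the floating node (ideal divider)."""
    return c_in / (c_in + c_par)


def sfg_peak_voltage(v_precharge, v_in_swing, c_in, c_par):
    """First-order estimate of the SFG node voltage after a rising input edge.

    Assumes the node is floating and was recharged to v_precharge; leakage and
    keeper currents are ignored.
    """
    return v_precharge + coupling_ratio(c_in, c_par) * v_in_swing


VDD = 0.300        # supply voltage in volts (value used in the paper's simulations)
C_IN = 2e-15       # input capacitor in farads (value used in the paper's simulations)
C_PAR = 0.8e-15    # ASSUMED lumped parasitic capacitance at the SFG node (hypothetical)

peak = sfg_peak_voltage(VDD, VDD, C_IN, C_PAR)
print(f"coupling ratio: {coupling_ratio(C_IN, C_PAR):.2f}")
print(f"estimated SFG peak: {peak * 1e3:.0f} mV")
```

In this simple model the boost above VDD scales with Cin / (Cin + Cpar), so a smaller parasitic load on the SFG node gives a larger gate overdrive for the EN transistor.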
The ULVSFG logic demonstrates significant speed improvements in comparison to conventional static CMOS logic [14]- [19]. However, as mentioned before, a major limitation in the single-rail domino logic is that only non-inverting logic can be implemented [1] [2]. As a solution, we propose a dual-rail version of the ULVSFG NOR gate in the next section.

Proposed Dual-Rail NOR Gates
As mentioned before, the NOR gate shown in Figure 2(b) is single-rail logic; although it is considerably faster than static CMOS logic, it cannot implement inverting logic. Figure 3 shows the proposed ULVSFG, precharge to 1, dual-rail, NP-domino NOR gate. Its operation is similar to that of the single-rail version shown in Figure 2(b). The circuit takes both the true (A, B) and the complementary (Ā, B̄) versions of the input signals, produces both the true (Q) and the complementary (Q̄) output signals, and follows the sequence of the normal NP-domino logic (precharge and evaluate phases). The SFG technique is used to boost the current of the EN transistors in the evaluate phase, as in the single-rail version. In the precharge phase, as discussed before, the output nodes are charged to VDD by turning the RN and EP devices on, and the gates of the EN devices are charged to VDD by turning the RP devices on. As in the single-rail version, since the gate is NP-domino logic, all input signals (including the complementary versions) come from previous-stage dual-rail domino gates (which are precharged to 0); they are low during the precharge phase and make a conditional 0 to 1 transition during evaluation. In the evaluate phase, each input signal either remains at 0 or rises to the high logic level (VDD). When a true input signal (A, B) remains at the low (0) level, its complementary version goes high, and when the true signal goes from 0 to 1, the complementary version remains at the low (0) level. As mentioned before, both Q and Q̄ are precharged to 1 during the precharge phase, and each output either remains at 1 or goes low (0), depending on the input signals in the evaluation phase. The functionality of the gate is quite similar to that of conventional dual-rail gates.
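The logical behaviour described above can be summarized in a small behavioural model. This is a sketch at the logic level only, not a circuit simulation: it assumes the usual NOR topology, with parallel pull-down devices on the true output and stacked (series) devices on the complementary output. The function name dual_rail_nor and the variable names are illustrative; q_bar stands for the complementary output.

```python
def dual_rail_nor(a, b):
    """Return (q, q_bar) at the end of the evaluate phase for true inputs a, b (0 or 1)."""
    # Dual-rail inputs: the complementary rail rises exactly when the true rail stays low.
    a_bar, b_bar = 1 - a, 1 - b

    # Precharge phase: both outputs are charged to 1.
    q, q_bar = 1, 1

    # Evaluate phase: each output either stays at 1 or makes a single 1 -> 0 transition.
    if a or b:              # parallel pull-down on the true side (assumed topology)
        q = 0
    if a_bar and b_bar:     # stacked pull-down on the complementary side (assumed topology)
        q_bar = 0
    return q, q_bar


for a in (0, 1):
    for b in (0, 1):
        q, q_bar = dual_rail_nor(a, b)
        print(f"A={a} B={b} -> Q={q} Q_bar={q_bar}")
```

For every input combination exactly one of the two outputs falls during evaluation, which is what allows the next dual-rail stage to detect a valid result on either rail.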
In this topology the role of the keeper transistors (KN and KP) is the same as in the single-rail version discussed in Section 2. The KN transistors remain off when there is a rising input edge. They turn on when there is no rising input edge (the input signals remain 0) and discharge the floating nodes (VN); this turns the evaluation transistors (EN) off and significantly reduces the current dissipation, and hence the static power consumption, during the evaluation phase [14]. However, it should be noted that structures using the SFG technique (both single-rail and dual-rail versions) are sensitive to the delay of the input signals. If the input signals rise late enough, the SFG nodes are discharged by the ON currents of the keeper transistors (KN) and by the leakage currents of the devices connected to the SFG nodes. In this case the structures lose the benefit of having a voltage higher than VDD on the SFG nodes, which causes a significant speed reduction in the evaluation phase and even functional failure of the gate for a delayed input signal. This condition occurs when these structures are used in logic circuits with a large logic depth.
Therefore, the timing of the rising input signals is important for the proper operation of structures that use the floating-gate technique. If the voltages of the SFG nodes are lower than VDD, the EN transistors are not able to pull the output nodes down to GND. This timing issue imposes a valid time window for a delayed input. The evaluation speed of the SFG gates is affected by the delay of the input edge, which in turn affects the next gate and all following gates in a chain, thus limiting the number of cascaded gates. Simulation results show that the evaluation delay of the proposed SFG dual-rail NOR gate is approximately 50 ps at a 300 mV supply when the input edge delay is less than 1 ns. For a 1.7 ns delay of the input signal, the evaluation delay becomes more than 80 ps. Furthermore, the simulated data show that the swing is then not sufficient for high-speed operation.

Modified Dual-Rail SFG Domino NOR Gate
In order to make the proposed ULVSFG NOR gate more robust and less dependent on the delay of the input edge, a new keeper structure is applied. In the new keeper structure, the keeper devices should be off before the input signals rise and should turn on conditionally, depending on the transitions at the output nodes. One way to achieve this is to apply to the drain terminals of the keeper devices (KN), instead of the CK signal, signals that are linked to the timing of the input signals. These signals should be precharged to 1 when the output of the ULV gate is precharged to 1, and should switch to 0 if both input signals (A and B) remain 0 for the entire evaluation phase. Figure 4 shows the modified version of the dual-rail SFG NOR gate, which is tolerant to the delay of the input signal. In this topology the drain terminals of the keeper devices (KN) are connected to the output nodes of the complementary side, instead of to the CK signal as in the NOR gate shown in Figure 3. In this modified version of the ULVSFG dual-rail NOR gate, the KN1 and KN2 keeper devices are ON if both A and B remain low, so that both Ā and B̄ become high (1) and consequently a high-to-low transition happens at Q̄. In this condition, both KN3 and KN4 are OFF and Q remains at the high level. On the other hand, if there is at least one rising transition on the input signals (A, B), a high-to-low transition happens at the output of the NOR gate (Q), and both KN1 and KN2 are OFF. In this case both KN3 and KN4 are ON and discharge the SFG nodes (VN3 and VN4). Thanks to the improved robustness of the modified NOR gate (shown in Figure 4) in holding the voltage of the SFG nodes until an input edge arrives, larger logic depths become feasible with the SFG technique.
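The conditional behaviour of the modified keepers can be written down as a small truth-table model. This is a logic-level abstraction of the description above, not a transistor-level model: it only encodes which keeper pair conducts for a given pair of output levels. The function name keeper_states and the dictionary layout are illustrative.

```python
def keeper_states(q, q_bar):
    """Return which KN keepers conduct for the given output levels (1 = high, 0 = low)."""
    # KN1/KN2 conduct only after the complementary output has fallen while Q stays high,
    # that is, when both A and B stayed low during evaluation.
    kn12_on = (q == 1 and q_bar == 0)
    # KN3/KN4 conduct only after Q has fallen while the complementary output stays high,
    # that is, when at least one of A, B has risen.
    kn34_on = (q == 0 and q_bar == 1)
    return {"KN1": kn12_on, "KN2": kn12_on, "KN3": kn34_on, "KN4": kn34_on}


# Before any input edge arrives, both outputs are still precharged high:
print(keeper_states(1, 1))   # all keepers off, so the SFG nodes hold their charge
print(keeper_states(1, 0))   # A = B = 0 case
print(keeper_states(0, 1))   # at least one of A, B has risen
```

The first case is the one that matters for delayed inputs: as long as neither output has switched, no keeper conducts, so the waiting SFG nodes are not discharged by keeper ON current.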

Simulation Results
The designed logic circuits are simulated using Cadence software (version 6.1.6) in a typical 90 nm TSMC CMOS technology. Low-threshold-voltage devices are chosen to speed up the circuits. To verify the effect of the SFG technique on the performance of the ULV dual-rail NOR gate, the NOR gates shown in Figure 1(b), Figure 3 and Figure 4 are designed with the same device sizes, supply voltage (300 mV) and load capacitors (CL = 2 fF), and their characteristics are then studied. In all designed circuits, a 2 fF capacitor is chosen for the input capacitors (Cin). Simulation results show that this is the optimum capacitor size for maximum speed when minimum-size devices are used for the precharge devices. In the evaluation phase, simulation results for the ULVSFG NOR gates shown in Figure 3 and Figure 4 show that a falling transition (1 to 0) at the output (Q) takes less than 50 ps, while the same transition in the conventional dual-rail NOR gate, shown in Figure 1(b), takes 1.7 ns. However, in both circuits, the falling transition (1 to 0) on the Q̄ side is slower than on the Q side. This happens because both circuits use stacked (cascode) transistors in the Q̄ path. For the proposed NOR gate the falling transition at Q̄ takes 140 ps, while for the conventional NOR gate shown in Figure 1(b) it takes 2.4 ns to switch from the high to the low level. Figure 5 shows the transient simulation results for both the simple ULVSFG NOR gate shown in Figure 3 and the modified version shown in Figure 4 when the input signal arrives with different delays relative to the CK signal; in each case, there is at least one rising edge on the (true-side) input signals of the NOR gates, which pulls the output voltage down to Virt.GND. In Figure 5(a), there is no significant delay of the input signal relative to the CK signal, and both NOR circuits shown in Figure 3 and Figure 4 manage to pull the output voltage down to Virt.GND in almost the same evaluation time.
In this case, for both circuits, the voltage of the SFG node (VN1) is larger than 500 mV, which is well above VDD = 300 mV. Note that the Virt.GND signal has settled to 0 before the input signal arrives. In Figure 5(b), there is a 3 ns delay of the input signal relative to the CK signal, and both NOR circuits shown in Figure 3 and Figure 4 manage to pull the output voltage down to Virt.GND. However, the modified version is faster than the ULVSFG NOR gate shown in Figure 3, since it holds the voltage of the SFG node longer and has a larger voltage at that node. In this case the voltage of the SFG node (VN1) in the modified version is larger than 500 mV, while it is reduced to 460 mV in the simple ULVSFG NOR gate. In Figure 5(c), there is a 4.5 ns delay of the input signal relative to the CK signal. Comparing the NOR circuits shown in Figure 3 and Figure 4, the modified version again performs better than the ULVSFG NOR gate shown in Figure 3, since it holds the voltage of the SFG node longer and has a larger voltage at that node when the input signal rises. In this case the voltage of the SFG node (VN1) in the modified version is larger than 500 mV, while it is reduced to less than 400 mV in the simple ULVSFG NOR gate. The evaluation delay of the simple ULVSFG NOR gate is significantly increased by the delay of the input signal, as shown in Figures 5-7. The simulated responses of the ULVSFG NOR gates for an input-signal edge delayed by 7 ns relative to the clock signal (CK) are shown in Figure 6. As expected, the voltage of the SFG node in the ULVSFG NOR gate shown in Figure 3 is reduced well below VDD by the leakage currents and the ON currents of the KN transistors. The EN transistors are then not able to pull the output down to 0 (Virt.GND), given the voltage swing of the capacitively coupled input signal. Clearly, the responses of the ULVSFG gates are significantly affected by the input delay, as depicted in Figure 7.
For the modified version shown in Figure 4 the case is different: the functionality of the structure is not affected by the delayed input edge, and the voltages of the SFG nodes of the gate remain stable at VDD. With the feature sizes given in Table 1 for the ULVSFG NOR gates, the longest input-edge delay for which the gates still respond logically correctly is approximately 4.7 ns. The timing of the input signals significantly degrades the performance of the simple ULVSFG NOR gates, as shown in Figures 5-7. This is because the initial charge of the SFG node in the ULVSFG NOR gate is reduced to below VDD/2 and the current provided by the EN transistor is reduced significantly. The input swing gives the same capacitive transfer to the SFG node, but because the voltage of the SFG node is VDD/2, the maximum peak on this node would be approximately 300 mV. Another crucial aspect is the output voltage swing. In a chained structure, the succeeding gate would receive an input signal with a lower swing than expected and hence would respond more slowly. Hence the simple ULVSFG NOR gate has a smaller noise margin than the modified version. In summary, the ULVSFG NOR gate suffers from both increased output delay and degraded voltage swing (smaller noise margin). For smaller input delays, e.g. less than 3 ns, the response of the simple ULVSFG NOR gate is affected only by reduced speed in the evaluation phase compared to the modified version shown in Figure 4, which is not sensitive to the delay of the input signals and has better output swing and noise margin. As the delay of the input signal increases, the evaluation delay of the simple ULVSFG NOR gate grows, as shown in Figure 7. For input delays above 1.5 ns the delay of the ULVSFG NOR gate increases almost exponentially, whereas the delay of the modified ULVSFG NOR gate is stable at approximately 50 ps. The evaluation delay of the ULVSFG NOR gate is compared in detail with that of the modified version in Figure 7.
The improvement shown by the data is a factor of 25 at 4.8 ns. The delay of the proposed NOR gates is less than 5% of the delay of the conventional dual-rail NOR gate for the same device size and supply voltage. Simulation results show that the proposed circuit operates properly with supply voltages down to 100 mV. At such low supply voltages the speed drops significantly, the structures become more sensitive to process variations, and the overall performance degrades. However, ULVSFG structures are faster and more robust than conventional static CMOS and dual-rail domino logic at these ultra-low supply voltages, as mentioned in [13]-[19].
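The speed-up ratios implied by the falling-transition delays reported in this section can be checked with a few lines of arithmetic. The sketch below uses only the quoted delays; it assumes that the 140 ps and 2.4 ns figures refer to the complementary output (the side with the stacked pull-down path) and treats "less than 50 ps" as an upper bound of 50 ps. The dictionary and function names are illustrative.

```python
# Falling-transition delays quoted in the text, in picoseconds.
# The proposed-gate Q delay is quoted as "less than 50 ps", so 50 is an upper bound
# and the resulting speed-up for Q is a lower bound.
DELAYS_PS = {
    "Q": {"conventional": 1700.0, "proposed": 50.0},
    "Q_bar": {"conventional": 2400.0, "proposed": 140.0},  # assumed to be the complementary output
}


def speedup(conventional_ps, proposed_ps):
    """How many times faster the proposed gate is than the conventional one."""
    return conventional_ps / proposed_ps


def delay_fraction_percent(conventional_ps, proposed_ps):
    """Proposed delay as a percentage of the conventional delay."""
    return 100.0 * proposed_ps / conventional_ps


speedups = {name: speedup(d["conventional"], d["proposed"]) for name, d in DELAYS_PS.items()}

for name, d in DELAYS_PS.items():
    frac = delay_fraction_percent(d["conventional"], d["proposed"])
    print(f"{name}: {speedups[name]:.1f}x faster, {frac:.1f}% of the conventional delay")
```

With these inputs the true output Q gives a ratio of at least 34, in line with the stated "more than 20 times" and "less than 5%"; the same arithmetic applied to the complementary-output figures gives a ratio of roughly 17.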

Conclusion
In this paper, a new NOR gate based on the ULVSFG dual-rail domino logic structure is presented. By applying the floating-gate technique to the conventional dual-rail NOR gate, the speed of the circuit is increased significantly at the cost of increased structural complexity. Using the proposed method, the delay of the ULVSFG domino dual-rail NOR gate in the evaluation phase is reduced by more than 20 times and the structure becomes significantly more robust. The delay of the proposed NOR gates is less than 5% of the delay of the conventional dual-rail NOR gate for the same device size and supply voltage. A new keeper structure is also introduced which makes the SFG technique more robust against the delay of the input signal. With the new keeper structure, logic with a large depth becomes feasible with the SFG technique. Simulation results using 90 nm TSMC CMOS process parameters and Cadence software confirm the predicted improvements.