### IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308

# A HIGH SPEED DYNAMIC RIPPLE CARRY ADDER

# Y. Anil Kumar<sup>1</sup>, M. Satyanarayana<sup>2</sup>

<sup>1</sup>Student, Department of ECE, MVGR College of Engineering, India.
<sup>2</sup>Associate Professor, Department of ECE, MVGR College of Engineering, India.

# **Abstract**

The adder is one of the basic building blocks of a processor and directly affects its performance. There are many adder architectures, each with its own advantages. The Ripple Carry Adder (RCA) occupies the smallest area among them and dissipates less power. However, the RCA suffers from a long delay because carry propagation lies in its critical path, and it also suffers from glitches. Constant Delay (CD) logic addresses both the delay and the glitch problems. Owing to its pre-evaluated characteristic, CD logic delivers high speed, but because it is bulky it is used only in the critical path. In this paper, two new techniques are presented that modify the conventional timing block of CD logic, which requires ten transistors, yielding two new timing blocks: one with eight transistors and the other with nine. CD logic with the new timing blocks is used in the critical path of an RCA to achieve higher speed with less area than conventional CD logic. CD logic with the 9-transistor timing block achieves 70% and 39% delay reduction compared to static and domino logic, respectively. It also achieves 21% and 5% reduction in power dissipation and delay, respectively. The 8-transistor version reduces delay by 65% and 29% compared to static and dynamic logic, respectively. Each version has its own advantage: the 9-transistor version provides higher speed, while the 8-transistor version dissipates less power. Simulations are carried out in a 130 nm technology with a 1 V power supply using Mentor Graphics tools.

Key Words: Critical Path, Feed Through Logic, Constant Delay logic, Pre-evaluated logic, and Timing block.

\*\*\*

#### 1. INTRODUCTION

Addition is fundamental to arithmetic operations. It is used extensively in VLSI designs and is by far the most frequently used operation in general-purpose systems and application-specific processors. Because subtraction, multiplication, division and address calculation usually rely on addition, the adder is often seen as an indispensable part of the arithmetic unit. It is dubbed the heart of any microprocessor, DSP architecture, and data processing system.

The carry propagation from each bit to the next higher position results in a substantial delay, so an adder that lies in the critical delay path effectively determines the overall speed of the system. An efficient adder builds an efficient system, which contributes to the increasing popularity of smaller and more durable mobile computing and communication systems. There are many adder architectures, namely the Ripple Carry Adder (RCA), the Carry Look-Ahead Adder (CLA), the Carry Skip Adder (CSK), the Carry Select Adder (CSL), the Carry Save Adder (CSA) and the Conditional Sum Adder (COS). Each architecture has its own advantages.

Among all the adder architectures, the RCA occupies the smallest area and offers good performance for random input data. Its delay, however, depends on the length of the carry propagation path and increases linearly with the number of input bits. For an n-bit RCA the delay is nT, where T is the delay of one full adder block. The overall performance of an RCA therefore depends on the design of the full adder block.
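The linear delay relation can be illustrated with a small behavioural model. The following is a minimal sketch, not the circuit simulated in this paper: the function names are hypothetical, the full adder uses the standard sum and carry equations, and the full adder delay `T_FA_PS` is an assumed placeholder rather than a measured value. It adds two numbers stage by stage and reports how many consecutive stages a single carry travels through, which is what sets the delay for a given input pair.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, n_bits: int = 8) -> tuple[int, int, int]:
    """Add two n-bit numbers one stage at a time, with carry-in 0.

    Returns (sum, carry_out, longest_chain), where longest_chain is the
    largest number of consecutive stages a single carry travels through.
    """
    carry = 0
    total = 0
    chain = 0
    longest = 0
    for i in range(n_bits):
        a = (x >> i) & 1
        b = (y >> i) & 1
        if a & b:
            chain = 1        # a new carry is generated at this stage
        elif (a ^ b) and carry:
            chain += 1       # the incoming carry is propagated onward
        else:
            chain = 0        # the carry chain ends here
        s, carry = full_adder(a, b, carry)
        total |= s << i
        longest = max(longest, chain)
    return total, carry, longest


def worst_case_delay(n_bits: int, t_fa: float) -> float:
    """Worst-case RCA delay n*T for a full adder delay T."""
    return n_bits * t_fa


T_FA_PS = 100.0  # hypothetical full adder delay in ps, for illustration only

print(ripple_carry_add(0xFF, 0x01))   # carry ripples through all 8 stages
print(ripple_carry_add(0x0F, 0x01))   # carry ripples through 4 stages
print(worst_case_delay(8, T_FA_PS))   # n*T for n = 8
```

For inputs such as 0xFF + 0x01 the carry crosses every stage, which is the nT worst case; most input pairs produce shorter chains, consistent with the good performance noted for random data.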

#### 2. FULL ADDER BLOCK

There are many full adder (FA) architectures. The conventional CMOS adder uses 32 transistors, the highest count among them, while the smallest full adder designs need only six transistors. CMOS logic and dynamic logic dissipate less power, but dynamic logic suffers from the cascading problem. Domino logic overcomes the cascading problem with an extra inverter [3]. NORA and ZIPPER logic avoid both the cascading problem and the extra inverter overhead, but NORA logic suffers from charge leakage and ZIPPER logic needs non-overlapping clocks, which creates area overhead [1-2].

Other full adder versions include the complementary pass-transistor logic full adder, the transmission gate full adder, and the 17-transistor, 14-transistor and 10-transistor full adders [5].

In this paper a high speed dynamic logic is proposed, derived from Constant Delay (CD) logic [8]. Before discussing CD logic, Feed Through Logic (FTL) [4] should be understood. FTL, shown in Fig. 1, overcomes the area overhead of domino logic and the cascading problem of dynamic logic. By removing the footer transistor and placing a pre-discharge transistor in parallel with the output node, the cascading problem is solved without an extra inverter. The drawback of FTL is higher power dissipation than dynamic and domino logic, caused by the short circuit path from VDD to ground when M1 and the NMOS pull-down network conduct simultaneously. CD logic overcomes this short circuit problem with the help of an additional timing block, which prevents the pull-up transistors and the NMOS pull-down logic from conducting simultaneously. The inverter at the output eliminates noise. CD logic can implement both inverting and non-inverting logic. Fig. 2 shows a buffer implemented using constant delay logic.

In this paper six types of full adder configurations are compared, where A, B and CIN are the inputs and COUT and SUM are the outputs. Fig. 3(a) and 3(b) show the sum generation units; the unit in 3(b) consumes less power but has more delay than the CMOS sum generation unit. Fig. 4(a)-(d) show the carry generation units of conventional CMOS logic, domino logic, the 10-transistor full adder and CD logic, respectively. Of these four, the 10-transistor FA requires the fewest transistors to generate the carry. It generates the inverted carry, so to use it in an RCA the COUT signal must be passed through an inverter. This inverter also removes the noise in the COUT signal; to remove the noise in the SUM signal, two inverters or a buffer must be added at the SUM output. These added inverters, however, increase the size and power dissipation. When the 10-T FA is used in an RCA, connecting the sum output of the last stage to a buffer reduces the noise at the output. The CMOS and domino carry generation units consume the least power, while CD logic has the lowest delay of all the carry generation units because of its pre-evaluation concept.

In this paper the CD logic technique with an optimized timing block is used to design an 8-bit RCA. The importance of CD logic and methods to overcome its drawbacks are briefly explained in the next section.



Fig -1 feed through logic (FTL)



Fig -2 Constant delay logic buffer



Fig -3 Sum generation units



Fig -4 Carry generation units

#### 3. PROPOSED TIMING BLOCKS OF CD LOGIC

CD logic in D-Q mode shows high speed performance due to its pre-evaluation nature. However, CD logic requires 11 more transistors than domino logic (without keeper) and 13 more than dynamic logic. The extra transistors are mainly due to the timing block (TB), so optimizing the timing block reduces the area overhead and power consumption. The timing block should be optimized in such a way that the delay does not increase.

Two optimized timing block designs are proposed in this paper; both perform the same logical function as the original timing block.

##### 8-T Timing Block

In the first design the number of transistors in the timing block is reduced by two. Fig. 5 shows the original timing block, from which transistors M1 and M3 are removed while the same operation is still performed. Fig. 6 shows this first modification, the 8-T timing block. Using this timing block increases the delay, because in the original block transistor M3 has the clock as its input, i.e., when CLK='1', TOUT is to be pulled to zero.



Fig -5 Timing Block of CD logic

As M3 is absent in this circuit, TOUT is pulled to zero through M2 and M1 using the delayed inverted clock signal. TOUT is therefore an inverted version of the clock, i.e., when CLK='1', TOUT='0', but with some delay. This delay slows the pre-charging, which increases the overall delay. Reducing the delay would require increasing the W/L ratio of the pre-charge transistors. To avoid this, the 9-T timing block is developed.



Fig -6 8-T timing block

##### 9-T Timing Block

In the second design the number of transistors is reduced by one, as shown in Fig. 7. The 8-T timing block has some limitations for low leakage paths, whereas the 9-T timing block works well with low leakage paths but dissipates more power at high leakage paths. The drawback of the 8-T timing block is overcome by adding transistor M3, so charging of the TOUT node is faster than in the 8-T timing block. The disadvantage is higher power dissipation due to the extra transistor.



Fig -7 9-T timing block
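The area effect of the two timing blocks on the 8-bit RCA can be checked with simple arithmetic. The sketch below assumes that each of the eight full adder stages contains exactly one timing block and that only the timing block changes between versions; both are assumptions made for illustration, and the function name is hypothetical. The starting count of 300 transistors is the CD logic entry of Table 1.

```python
ORIGINAL_TB_TRANSISTORS = 10  # conventional CD logic timing block
CD_RCA_TRANSISTORS = 300      # 8-bit CD logic RCA, taken from Table 1
STAGES = 8                    # assumption: one timing block per full adder stage


def rca_transistors(tb_transistors: int) -> int:
    """Transistor count of the 8-bit RCA for a given timing block size."""
    saved_per_stage = ORIGINAL_TB_TRANSISTORS - tb_transistors
    return CD_RCA_TRANSISTORS - STAGES * saved_per_stage


for tb in (10, 9, 8):
    print(f"{tb}-T timing block: {rca_transistors(tb)} transistors")
```

Under these assumptions the 9-T and 8-T versions come to 292 and 284 transistors, which agrees with the counts listed in Table 1.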

#### 4. IMPLEMENTATION OF 8-BIT RIPPLE CARRY ADDER

In an RCA the carry chain constitutes the critical path, so speeding up the carry improves the speed of the RCA. The ripple carry adder is also subject to glitching, which is caused by the delay of the carry signals from previous stages. In CD logic the delay from the previous stage must be considered carefully, because the evaluation time of this logic type is very short. The evaluation time is called the window width, and it is equal to three inverter delays. In dynamic logic the evaluation time is generally the whole evaluation period, but in CD logic it is only a part of that period, so the delay must be less than the window width to prevent false logic evaluation. If glitches from the previous stage last for the window width period, false evaluation takes place. To eliminate this problem the clock signal is delayed so that the window is shifted and the glitches are avoided. Fig. 8 shows the arrangement of inverters that generates the appropriate clock signal for each stage. CLK1, CLK2 and CLK3 are the delayed clock signals for different stages of the RCA. This arrangement applies only to dynamic logic, as static logic does not operate on clock signals.
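The idea of shifting the evaluation window with a delayed clock can be sketched numerically. The example below is only an illustration under stated assumptions and does not reproduce the inverter arrangement of Fig. 8: the inverter delay `T_INV_PS` and the per-stage carry delay `T_CARRY_PS` are hypothetical values, every stage is given its own clock, and each clock is assumed to be delayed by an even number of inverters so that its polarity is preserved. For each stage it picks the smallest such delay that opens the window no earlier than the carry from the previous stages has arrived.

```python
import math


def window_width(t_inv_ps: float) -> float:
    """Evaluation window of CD logic: three inverter delays."""
    return 3 * t_inv_ps


def stage_clock_delays(n_stages: int, t_carry_ps: float, t_inv_ps: float) -> list[float]:
    """Clock delay per stage, in ps, built from pairs of inverters.

    Assumption: the carry-in of stage k settles after k carry delays, and
    the stage clock must not open the window before that moment.
    """
    delays = []
    for stage in range(n_stages):
        carry_arrival = stage * t_carry_ps
        inverter_pairs = math.ceil(carry_arrival / (2 * t_inv_ps))
        delays.append(2 * inverter_pairs * t_inv_ps)
    return delays


T_INV_PS = 20.0    # hypothetical inverter delay
T_CARRY_PS = 70.0  # hypothetical carry delay of one stage

print("window width (ps):", window_width(T_INV_PS))
for stage, delay in enumerate(stage_clock_delays(4, T_CARRY_PS, T_INV_PS), start=1):
    print(f"stage {stage}: clock delayed by {delay} ps")
```

Rounding up to whole inverter pairs adds a little slack at each stage; in a real design the number of distinct clocks and their delays would be chosen from simulated carry arrival times rather than from a fixed per-stage figure.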

In this paper an 8-bit RCA is simulated using six different full adder blocks. The sum generation unit is the same for all adder blocks. The simulation results in Table 1 are for the RCA circuit that uses the CMOS sum generation unit. Even though the 12-T adder uses fewer transistors and provides a noise-free output, it has more delay, which is due to the inverters INV1 and INV2 in Fig. 3(b). Because of the extra delay the clock arrangement must also be changed. It requires three extra delayed clock signals as shown in Fig. 9: CLK for stage1, CLK1 for stage2, CLK2 for stage3, CLK3 for stage4, CLK4 for stage5, CLK5 for stage6, CLK6 for stage6 and stage7. A full adder (FA) with 28 transistors, sized strongly in favour of the Cout computation [6], can also be used, as can a more energy-efficient pass-transistor FA design [7].


Among the compared RCA architectures, the 10-transistor FA RCA requires the fewest transistors, yet its delay is almost equal to that of the CMOS 32-transistor FA RCA and its power dissipation is very high. Considering low leakage conditions, the domino logic carry generation circuit is simulated without a keeper transistor, which saves 8 transistors. If the keeper transistor were included, the power dissipation would equal that of the CMOS 32-T FA RCA. CD logic occupies more area than CMOS logic but has a better power-delay product. 8-T TB CD logic, although slower than CD logic, has the second lowest power-delay product. 9-T TB CD logic is the fastest among the compared logics and has the third lowest power-delay product. Another logic style, Current Comparison Domino (CCD) logic [9], consumes very little power but is not faster than CD logic; it is not discussed in this paper.

The ripple carry adder occupies the least area among the adder architectures, but its delay grows as the number of input bits increases. Using the modified CD logic in the critical path of the RCA makes it suitable for 8-bit applications. Instead of moving to a CLA, it is better to opt for an RCA, which occupies less area while offering increased speed.



Fig -8 8-bit Ripple carry adder



Fig -9 Clock signals arrangement for CD logic RCA with 12-T sum unit

Table -1 Comparison of RCA architectures with different logics

| Logic type           | S7 delay   | COUT delay | Number of transistors | Power dissipation | Power-delay product (10<sup>-18</sup> J) |
|----------------------|------------|------------|-----------------------|-------------------|------------------------------------------|
| Conventional 32-T FA | 2.113 ns   | 2.07 ns    | 276                   | 20.5962 nW        | 43.519                                   |
| Domino logic         | 1.04 ns    | 587.718 ps | 160                   | 14.238 nW         | 14.807                                   |
| CD logic             | 677.780 ps | 550.134 ps | 300                   | 51.191 nW         | 34.656                                   |
| 10-T FA              | 2.512 ns   | 2.527 ns   | 96                    | 59.889 µW         | 150441                                   |
| 8-T TB CD logic      | 734.65 ps  | 612.456 ps | 284                   | 32.8427 nW        | 24.131                                   |
| 9-T TB CD logic      | 643.247 ps | 575.767 ps | 292                   | 42.3167 nW        | 27.220                                   |
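The power-delay product column of Table 1 is consistent with the product of the power dissipation and the S7 delay, since 1 nW × 1 ns = 10<sup>-18</sup> J. The short sketch below recomputes the column from the other two and ranks the designs; the variable and function names are hypothetical, and the numbers are copied from the table with delays converted to ns and power to nW.

```python
# (S7 delay in ns, power in nW, reported PDP in 1e-18 J), copied from Table 1
TABLE_1 = {
    "Conventional 32-T FA": (2.113, 20.5962, 43.519),
    "Domino logic": (1.04, 14.238, 14.807),
    "CD logic": (0.677780, 51.191, 34.656),
    "10-T FA": (2.512, 59889.0, 150441.0),  # 59.889 uW expressed in nW
    "8-T TB CD logic": (0.73465, 32.8427, 24.131),
    "9-T TB CD logic": (0.643247, 42.3167, 27.220),
}


def power_delay_product(delay_ns: float, power_nw: float) -> float:
    """PDP in units of 1e-18 J, since 1 ns * 1 nW = 1e-18 J."""
    return delay_ns * power_nw


computed = {
    name: power_delay_product(delay, power)
    for name, (delay, power, _) in TABLE_1.items()
}

# Designs ordered from lowest to highest power-delay product
ranking = sorted(computed, key=computed.get)

for name in ranking:
    reported = TABLE_1[name][2]
    print(f"{name:22s} computed {computed[name]:12.3f}   reported {reported:12.3f}")
```

The recomputed values agree with the reported column to within about 0.2%, and the resulting order places domino logic first, followed by the 8-T TB and 9-T TB CD logic versions.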



Fig -10 Power and Delay characteristics of different types of 8-bit RCA architectures

#### 5. CONCLUSION

In this paper different RCA architectures are analyzed. Although the 10-transistor FA occupies less area, it fails to reduce power and delay. CMOS logic, despite its bulky nature, dissipates less power, but its delay is about 200% higher than that of CD logic. Two new techniques are employed in CD logic to reduce power dissipation and delay. CD logic with either the 8-T or the 9-T timing block reduces power dissipation compared to its predecessor, but the 8-T TB CD logic does not reduce the delay. It nevertheless has less delay than CMOS logic, domino logic and the 10-T FA RCA. Fig. 10 shows the power-delay product and delay characteristics. 8-T TB CD logic has the best power-delay product among the CD logic versions. It reduces delay by 65% and 29% compared to static and dynamic logic, respectively, but has 14% more delay than the 9-T TB version. Even though the 8-T TB version dissipates less power than the 9-T TB version, the 9-T TB version is preferred because the main aim is to reduce the critical path delay. Domino logic has a lower power-delay product, but its delay is 53.4% more than that of CD logic and 61.7% more than that of 9-T TB CD logic. Thus domino logic dissipates less power while CD logic provides more speed. As the VLSI domain mainly opts for high speed and low power designs, 9-T TB CD logic is suitable for critical path circuits, and the remaining circuitry can be designed with domino logic or another low power logic.
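The delay percentages quoted in this section can be recomputed from the S7 delays of Table 1. The following is a minimal sketch with hypothetical function names; "reduction" is taken relative to the slower reference design and "increase" relative to the faster one, which is an assumption about how the figures were derived.

```python
# S7 delays in ps, copied from Table 1
S7_DELAY_PS = {
    "static": 2113.0,   # conventional 32-T FA
    "domino": 1040.0,
    "cd": 677.780,
    "cd_8t": 734.65,
    "cd_9t": 643.247,
}


def reduction_percent(new_ps: float, ref_ps: float) -> float:
    """How much shorter new_ps is than ref_ps, in percent of ref_ps."""
    return 100.0 * (ref_ps - new_ps) / ref_ps


def increase_percent(new_ps: float, ref_ps: float) -> float:
    """How much longer new_ps is than ref_ps, in percent of ref_ps."""
    return 100.0 * (new_ps - ref_ps) / ref_ps


d = S7_DELAY_PS
print(f"8-T TB vs static : {reduction_percent(d['cd_8t'], d['static']):.1f}% less delay")
print(f"8-T TB vs domino : {reduction_percent(d['cd_8t'], d['domino']):.1f}% less delay")
print(f"8-T TB vs 9-T TB : {increase_percent(d['cd_8t'], d['cd_9t']):.1f}% more delay")
print(f"domino vs CD     : {increase_percent(d['domino'], d['cd']):.1f}% more delay")
print(f"domino vs 9-T TB : {increase_percent(d['domino'], d['cd_9t']):.1f}% more delay")
```

Under this reading the table reproduces the 65%, 29%, 14%, 53.4% and 61.7% figures after rounding.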

#### REFERENCES

- [1]. N. Goncalves and H. De Man, "NORA: A racefree dynamic CMOS technique for pipelined logic structures", IEEE J. Solid-State Circuits, vol. 18, no. 3, pp. 261–266, Jun. 1983.
- [2]. C. Lee and E. Szeto, "Zipper CMOS", IEEE Circuits Syst. Mag., vol. 2, no. 3, pp. 10–16, May 1986.
- [3]. Sung-mo (Steve) Kang and Yusuf Leblebici. "CMOS Digital Integrated Circuits Analysis And Design", 3<sup>rd</sup> edition, WCB McGraw Hill, 2003.

- [4]. V. Navarro-Botello, J. A. Montiel-Nelson, and S. Nooshabadi, "Analysis of high-performance fast feed through logic families in CMOS", IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 6, pp. 489–493, Jun. 2007.
- [5]. Kiat-Seng Yeo and Kaushik Roy "Low-Voltage, Low-Power VLSI Subsystems", Tata McGraw-Hill Edition 2009.
- [6]. N. Weste and D. Harris, "CMOS VLSI Design: A Circuits and Systems Perspective", 4th ed. Reading, MA: Addison Wesley, Mar. 2010.
- [7]. M. Aguirre-Hernandez and M. Linares-Aranda, "CMOS full-adders for energy-efficient arithmetic applications", IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 99, pp. 1–5, Apr. 2010.
- [8]. Pierce Chuang, David Li and Manoj Sachdev "Constant Delay Logic Style" IEEE Transactions On VLSI Systems, Vol. 21, No. 3, pp. 554-565, March 2013.
- [9]. Ali Peiravi, Mohammad Asyaei, "Current-Comparison-Based Domino: New Low-Leakage High-Speed Domino Circuit for Wide Fan-In Gates", IEEE Transactions VLSI Systems, Vol. 21, No. 5, pp. 934-943, May 2013.

## **BIOGRAPHIES**



**Yerninti Anil Kumar** received the B.Tech degree in Electronics and Communication Engineering from LENDI College of Engg. and Tech. He is pursuing an M.Tech (VLSI) at MVGR College of Engineering.



**Dr. Moturi Satyanarayana** received the B.Tech in Electronics and Communication Engineering from Nagarjuna University, the M.Tech in Radar and Microwave Engineering from Andhra University, and the PhD in Antenna Arrays from Andhra University. He is a member of IETE, SEMCE, ISTE and ISOI.