# Design of Robust, Energy-Efficient Full Adders for Deep-Submicrometer Design Using Hybrid-CMOS Logic Style

Sumeer Goel, Student Member, IEEE, Ashok Kumar, Senior Member, IEEE, and Magdy A. Bayoumi, Fellow, IEEE

*Abstract*—We present a new design for a 1-b full adder featuring hybrid-CMOS design style. The quest to achieve good drivability, noise robustness, and low-energy operation in deep-submicrometer technologies guided our research to explore hybrid-CMOS style design. Hybrid-CMOS design style utilizes various CMOS logic style circuits to build new full adders with desired performance. This provides the designer a higher degree of design freedom to target a wide range of applications, thus significantly reducing design efforts. We also classify hybrid-CMOS full adders into three broad categories based upon their structure. Using this categorization, many full-adder designs can be conceived. We present a new full-adder design belonging to one of the proposed categories.

The new full adder is based on a novel XOR-XNOR circuit that generates XOR and XNOR full-swing outputs simultaneously. This circuit outperforms its counterparts showing 5%-37% improvement in the power-delay product (PDP). A novel hybrid-CMOS output stage that exploits the simultaneous XOR-XNOR signals is also proposed. This output stage provides good driving capability enabling cascading of adders without the need of buffer insertion between cascaded stages. There is approximately a 40% reduction in PDP when compared to its best counterpart.

During our experiments, we found that many of the previously reported adders suffered from low swing and high noise when operated at low supply voltages. The proposed full adder is energy efficient and outperforms several standard full adders without trading off driving capability and reliability. The new full-adder circuit successfully operates at low voltages with excellent signal integrity and driving capability. To evaluate the performance of the new full adder in a real circuit, we embedded it in a 4- and 8-b, 4-operand carry-save array adder with final carry-propagate adder. The new adder displayed better performance as compared to the standard full adders.

*Index Terms*—Adders, deep-submicrometer design, hybrid-CMOS design style, low-power, noise.

# I. INTRODUCTION

THE DEMAND and popularity of portable electronics is driving designers to strive for smaller silicon area, higher speeds, longer battery life, and more reliability. Power is one of the premium resources a designer tries to save when designing a system. Full adders are fundamental units in various circuits, especially in circuits used for performing arithmetic operations such as compressors, comparators, parity checkers, and so on [1]. Full adders are often in the critical paths of complex arithmetic circuits for multiplication and division. These in turn form the core of any system and thereby influence the overall performance of the entire system. Enhancing the performance of the full adder can significantly affect the system performance. Fig. 1 shows the power consumption breakdown in a modern day high-performance microprocessor [2]. The datapath consumes roughly 30% of the total power of the system. Adders are an extensively used component in datapaths and, therefore, careful design and analysis is required for these units to obtain optimum performance.

The authors are with the Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504 USA (e-mail: sxg4999@cacs. louisiana.ed; ak@cacs.louisiana.ed; mab@cacs.louisiana.ed).

Digital Object Identifier 10.1109/TVLSI.2006.887807

Fig. 1. Power breakdown in high-performance microprocessors [2].

At the circuit level, an optimized design is desired to avoid any degradation in the output voltage, consume less power, have less delay in the critical path, and be reliable even at low supply voltage as we scale towards deep submicrometer. Good driving capability under different load conditions and balanced outputs to avoid glitches are also important. Since the full-adder cells are duplicated in large numbers, layout regularity and interconnect complexity are also important.

Several logic styles have been used in the past to design full-adder cells. Each design style has its own merits and demerits. Classical designs of full adders normally use only one logic style for the whole full-adder design. One example of such design is the standard static CMOS full adder [3]. This full adder is based on regular CMOS structure with conventional pull-up and pull-down transistors providing full-swing output and good driving capabilities. The main drawback of static CMOS circuits is the existence of the pMOS block, because of the lower carrier mobility of pMOS devices compared to nMOS devices. Therefore, the pMOS devices need to be sized up to attain the desired performance. The input capacitance of a static CMOS gate is large because each input is connected to the gate of at least one pMOS and one nMOS device. This is another reason for speed degradation of static CMOS gates.

Manuscript received May 12, 2005; revised February 10, 2006 and June 1, 2006. This work was supported in part by the U.S. Department of Energy and by the Governor's Information Technology Initiative.

Another conventional adder is the complementary pass-transistor logic (CPL) [3]. It provides high-speed, full-swing operation and good driving capability due to the output static inverters and the fast differential stage of cross-coupled pMOS transistors. But due to the presence of a lot of internal nodes and static inverters, there is large power dissipation. The layout of a CPL cell is also not as straightforward as a static CMOS cell due to its irregular transistor arrangement.

The dynamic CMOS logic style provides a high speed of operation because the logic is constructed with only high mobility nMOS transistors. Also, due to the absence of the pMOS transistors, the input capacitance is also low, thus enhancing the speed of operation. However, it has several inherent problems such as charge sharing and high clock load. It has higher switching activity and lower noise immunity. It consumes a large portion of the power in driving the clock lines. Moreover, dynamic logic style is more susceptible to leakage. Due to these reasons, we do not include dynamic logic style in our discussions in this paper.

Some other full-adder designs include transmission-function full adder (TFA) [4] and transmission-gate full adder (TGA) [5]. These designs are based on transmission-function theory and transmission gates, respectively. These adders are inherently low power consuming. These logic styles are good for designing XOR or XNOR gates. The main disadvantage of these logic styles is that they lack driving capability. This is attributed to the fact that the inputs are coupled to the outputs. When TGA or TFA are cascaded, their performance degrades significantly.

The remaining adder designs use more than one logic style for their implementation. We call this the hybrid-CMOS logic design style. Examples of adders built with this design style are DB cell [6], NEW14-T adder [7], and hybrid pass logic with static CMOS output drive full adder [8] (we will use HPSC as an abbreviation) and new-HPSC [9] adder. Some of these full adders are shown in Fig. 2. These designs exploit the features of different logic styles to improve upon the performance of the designs using single logic style. All hybrid designs use the best available modules implemented using different logic styles or enhance the available modules in an attempt to build a low power full-adder cell. Generally, the main focus in such attempts is to reduce the numbers of transistors in the adder cell and, consequently, reduce the number of power dissipating nodes. This is achieved by utilizing intrinsically low power consuming logic styles like TFA or TGA or simply pass transistors. In doing so, the designers often trade off other vital requirements such as driving capability, noise immunity, and layout complexity. Most of these adders lack driving capabilities as the inputs are coupled to the outputs. Their performance as a single unit or in small chains is good but when large adders are built by cascading these 1-b full-adder cells, the performance degrades drastically. The performance degradation can be handled by inserting buffers in between stages to enhance the delay characteristics. However, this leads to an extra overhead and the initial advantage of having a lesser number of transistors is lost.

A hybrid-CMOS full adder can be broken down into three modules [6]. Module I comprises either an XOR or XNOR circuit or both. This module produces intermediate signals that are passed onto Module II and Module III that generate *Sum* and $C_{\text{out}}$ outputs, respectively. There are several circuits available in [1] and [6] for each module and several studies have been conducted in the past using different combinations to obtain many adders [1], [6], [10].

Fig. 2. Standard existing full-adder cells.

In this paper, we broadly categorize hybrid-CMOS adder designs in three categories based on their structure. We then present a new improved circuit for the simultaneous generation of the XOR and XNOR outputs to be used in module I. The new circuit is based on CPL style. We also propose a new output unit. This unit is used in module III. Using the new circuits in module I and III, we build a new hybrid-CMOS full-adder cell. The new adder is optimized for low PDP and is compared with the classical static-CMOS, CPL, TFA, TGA, NEW14T, HPSC, and NEW-HPSC full-adder cells. The proposed full-adder design exhibits low PDP, full-swing operation, and excellent driving capabilities without trading off area and reliability. To evaluate the performance of the new full adder in a real circuit, we embedded it in a 4- and 8-b, 4-operand carry-save array adder with final carry-propagate adder. The new adder displayed better performance as compared to the standard full adders and can be cascaded without the need of buffer insertion.

Fig. 3. General form of XOR-XOR-based full adder.

# II. FULL-ADDER CATEGORIZATION

We broadly categorize hybrid-CMOS full-adder cells [11] into three categories depending upon their structure and the logical expression of the Sum output. The Sum and Carry ($C_{out}$) outputs of a 1-b full adder generated from the binary inputs A, B, and $C_{in}$ can be generally expressed as

$$Sum = A \oplus B \oplus C_{in} \tag{1}$$

$$C_{\text{out}} = A \cdot B + C_{\text{in}} \cdot (A \oplus B). \tag{2}$$

These outputs can be expressed in many different logic expressions and, thereby, determine the structure of the circuit. Based upon these different logic expressions, many full-adder cells can be conceived. Moreover, the availability of different modules, as discussed earlier, provides the designer with more choices for a 1-b adder implementation. We classify the different possible structures for full adders into three broad categories. These are as follows.
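As a quick sanity check on (1) and (2), the following minimal Python sketch enumerates all eight input combinations and compares both expressions against ordinary integer addition. The function name `full_adder` is illustrative and does not come from the paper.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (Sum, Cout) computed from equations (1) and (2)."""
    h = a ^ b                    # H = A xor B
    s = h ^ cin                  # equation (1)
    cout = (a & b) | (cin & h)   # equation (2)
    return s, cout

# Exhaustive check: the two output bits must encode a + b + cin.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```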

# A. XOR-XOR-Based Full Adder

In this category, the Sum and Carry outputs are generated by the following expression, where H is  $A \oplus B$  and H' is the complement of H. The general form of this category is shown in Fig. 3

$$Sum = A \oplus B \oplus C_{in} = H \oplus C_{in} \tag{3}$$

$$C_{\rm out} = A \cdot H' + C_{\rm in} \cdot H. \tag{4}$$

The Sum output is generated by two consecutive two-input XOR gates and the $C_{out}$ output is the output of a 2-to-1 multiplexer with the select lines coming from the output of the first module. The first module can be either an XOR-XNOR circuit or just an XOR gate. In the first case, the *H* output of the XOR-XNOR circuit is XORed with the carry from the previous stage ($C_{in}$) in module II. The H and H' outputs are used as multiplexer select lines in module III. Adders belonging to this category are presented in [12], [13].

Fig. 4. General form of XNOR-XNOR-based full adder.

Fig. 5. General form of centralized full adder.

#### B. XNOR-XNOR-Based Full Adder

In this category, the Sum and Carry outputs are generated by the following expression. The general form of this category is shown in Fig. 4.

$$\operatorname{Sum} = \overline{\overline{(A \oplus B)} \oplus C_{\operatorname{in}}} = \overline{H' \oplus C_{\operatorname{in}}} \tag{5}$$

$$C_{\rm out} = A \cdot H' + C_{\rm in} \cdot H. \tag{6}$$

In this category, module I and module II are XNOR gates and module III is a 2-to-1 multiplexer. If the first module uses an XOR-XNOR circuit, then the H' output is XNORed with the $C_{in}$ input to produce the Sum output. The static energy recovery full adder (SERF) [14] belongs to this category and uses an XNOR gate for modules I and II and a pass-transistor multiplexer for module III.

# C. Centralized Full Adder

In this category, the Sum and Carry outputs are generated by the following expression. The general form of this category is shown in Fig. 5

$$Sum = H \oplus C_{in} = H \cdot C'_{in} + H' \cdot C_{in} \tag{7}$$

$$C_{\rm out} = A \cdot H' + C_{\rm in} \cdot H. \tag{8}$$

Module I is an XOR-XNOR circuit producing the H and H' signals, and modules II and III are 2-to-1 multiplexers with H and H' as select lines. The adder in [8] is an example of this category. It utilizes the XOR-XNOR circuit presented in [7] and proposes a new circuit for output module III. The simultaneous generation of the H and H' signals is critical in these types of adders as they drive the select lines of the multiplexers in the output stage. If H and H' do not arrive simultaneously, glitches and unnecessary power dissipation may occur. The final outputs cannot be generated until these intermediate signals are available from module I.

Fig. 6. (a) Circuit in [7]. (b) Circuit in [9]. (c) Proposed circuit.
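The three categories of this section can be compared at the logic level with a short Python sketch. It models each category as a composition of XOR/XNOR operations and a 2-to-1 multiplexer and checks all of them against integer addition. The helper names (`mux2`, `xor_xor`, `xnor_xnor`, `centralized`, `reference`) are illustrative assumptions, and the model ignores all electrical behavior such as swing, delay, and glitches.

```python
from itertools import product

def mux2(sel, d0, d1):
    """2-to-1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def xor_xor(a, b, cin):
    h = a ^ b
    return h ^ cin, mux2(h, a, cin)       # equations (3) and (4)

def xnor_xnor(a, b, cin):
    h_n = 1 - (a ^ b)                     # H' = A xnor B
    s = 1 - (h_n ^ cin)                   # equation (5): H' xnor Cin
    return s, mux2(1 - h_n, a, cin)       # equation (6)

def centralized(a, b, cin):
    h = a ^ b
    s = mux2(h, cin, 1 - cin)             # equation (7): H.Cin' + H'.Cin
    return s, mux2(h, a, cin)             # equation (8)

def reference(a, b, cin):
    total = a + b + cin
    return total & 1, total >> 1          # (Sum, Cout) from integer addition

for bits in product((0, 1), repeat=3):
    assert xor_xor(*bits) == xnor_xnor(*bits) == centralized(*bits) == reference(*bits)
```

Because the multiplexer select input is H in every case, the sketch also makes visible why a late or skewed H/H' pair delays both outputs in the centralized form.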

#### III. PROPOSED CIRCUITS FOR MODULE I AND III

Hybrid-CMOS full adders can be divided into three modules [6]. Module I consists of either an XOR or XNOR circuit or both. Module II and module III generate Sum and $C_{out}$, respectively. We present two new circuits, one each for module I and module III.

# A. Module I

In recent times, simultaneous generation of XOR and XNOR has been widely used for module I [7], [8]. This feature is highly desirable as nonskewed outputs are generated that are used for driving the select lines of the multiplexer inside the full adder. Fig. 6(a) shows the six-transistor configuration presented in [7]. This circuit has been widely used to build full-adder cells [7], [8]. The circuit has a feedback connection between the XOR and XNOR functions, eliminating the non-full-swing operation. The existence of $V_{dd}$ and Gnd connections gives good driving capability to the circuit, and the elimination of direct connections between them avoids the short-circuit current component. However, when there is an input transition that leads to the input vector AB:XX-11 or AB:XX-00, there is a delay in switching the feedback transistors. This occurs because one of the feedback transistors is switched ON by a weak signal and the other signal is at high impedance state. Consider the case when the input vector AB changes from "01" to "00." Initially, when the input AB is in state "01," logic "0" is passed through the nMOS transistor N1 and logic "1" is passed through the pMOS transistor P2. However, when the next input state "00" arrives, a weak "0" is passed through two pMOS transistors P1 and P2 to XOR output, and XNOR output is at high impedance state since N1 and N2 are OFF. This weak "0" switches ON the feedback transistor $FB_P$, present in the feedback loop, so that the XNOR output is pulled up to logic "1," which switches ON the feedback transistor $FB_N$ to discharge the output node XOR to logic "0." This causes the increase in delay. As the supply voltage is scaled down, this delay tends to increase tremendously. It also causes the short-circuit current to rise, which increases the short-circuit power dissipation and eventually the power-delay product. To reduce this problem, careful transistor sizing is needed so that the feedback transistors switch quickly.

TABLE I OPERATION OF THE PROPOSED XOR-XNOR CIRCUIT

| A | B | H      | H'     | Function                        |
|---|---|--------|--------|---------------------------------|
| 0 | 0 | Good 0 | NOP    | P2 turns ON and pulls H' to VDD |
| 0 | 1 | Bad 1  | Good 0 | P1 turns ON and pulls H to VDD  |
| 1 | 0 | NOP    | Good 0 | P1 turns ON and pulls H to VDD  |
| 1 | 1 | Good 0 | Bad 1  | P2 turns ON and pulls H' to VDD |

An improved version of the circuit in [8] is presented in [9]. Two pull-up pMOS transistors and two pull-down nMOS transistors are added to the circuit to restore full-swing operation [see Fig. 6(b)]. The improved circuit performs successfully at low supply voltages but this comes at the expense of increased area and number of transistors. Another disadvantage of this improved circuit is that each of the inputs drives four gates instead of two, doubling the input load. This will cause slow response when this circuit is cascaded.

We propose a novel XOR–XNOR circuit using eight transistors that generates XOR and XNOR outputs simultaneously. This circuit provides a full-swing operation and can operate at low voltages. The proposed XOR–XNOR circuit is based on complementary pass-transistor logic using only one static inverter instead of two static inverters as in the regular CPL style XOR circuit. The proposed circuit is shown in Fig. 6(c). The first half of the circuit utilizes only nMOS pass transistors for the generation of the outputs. The cross-coupled pMOS transistors guarantee full-swing operation for all possible input combinations and reduce short-circuit power dissipation. The circuit is inherently fast due to the high mobility nMOS transistors and the fast differential stage of cross-coupled pMOS transistors.

Table I indicates the functioning of the proposed circuit more clearly. For any input vector, the pMOS transistors are switched ON by a good 0 and, therefore, avoid any static power dissipation. Owing to the lower *Vthn* and high electron mobility of the nMOS transistors, the circuit has a faster response as compared to the previous circuit.

TABLE II SIMULATION RESULTS FOR PROPOSED XOR–XNOR CIRCUIT IN 0.18-$\mu$m TECHNOLOGY AT 50-MHz FREQUENCY AND 1.8-V V<sub>DD</sub>

|                    | Circuit in [7] | Circuit in [9] | Proposed   |
|--------------------|----------------|----------------|------------|
| No. of Tr.         | 6              | 10             | 8          |
| Power (µW)         | 7.524          | 8.431          | 8.750      |
| Delay (ns)         | 0.305          | 0.210          | 0.191      |
| PDP (e-15)         | 2.294          | 1.770          | 1.671      |
| Area $(\lambda^2)$ | 4455(1)        | 6384(1.43)     | 4550(1.02) |
| Improvement (PDP)  | 37.5%          | 5%             |            |

Fig. 7. Input test pattern for XOR-XNOR circuits (traces: inputs A and B, outputs XOR and XNOR).

The proposed XOR-XNOR circuit was compared to circuits in Fig. 6(a) and (b). The simulation results at 1.8-V $V_{\rm DD}$ and TSMC 0.18-$\mu$m technology are shown in Table II. Fig. 7 shows the input pattern used. We consider all the possible input transitions with an output transition at every input transition. The results indicate that the performance of the proposed circuit is better than the performance of the compared circuits. The proposed circuit is $1.6 \times$ faster than the circuit in [7] but consumes more power. The higher power is due to the CPL structure, which is inherently power hungry, although the consumption is expected to be lower than that of the regular CPL XOR circuit owing to the reduced number of transistors. The circuit in [9] and the proposed circuit consume almost the same power but our circuit is 10% faster. Owing to the higher speed of our circuit, there is a 5%-37.5% saving in PDP in this module. The proposed circuit uses only 2% more area when compared to the circuit in [7] but the circuit in [9] uses 43% more area. The increase in area of the circuit in [9] is due to the addition of four more transistors, particularly the two pMOS pull-up transistors. Table II shows the area and the normalized values in parentheses. For simplicity, we will use the normalized value of area for comparison from here on. We use the proposed circuit in module I of the new hybrid-CMOS adder design.

### B. Module III

The output of module III can be expressed as

$$C_{\text{out}} = A \cdot H' + C_{\text{in}} \cdot H. \tag{9}$$

This expression can be realized with a 2-to-1 multiplexer with H and H' as the select lines. The most common implementation of the previous expression is using transmission gates (TG). Fig. 8(a) shows the circuit for a 2-to-1 multiplexer. The problem with the TG multiplexer is that it cannot provide the required driving capability to drive cascaded adder stages. One solution to this problem is to have an output buffer as shown in Fig. 8(a). This would incur extra delay and an overhead of four transistors.



Fig. 8. Circuits for module III.

TABLE III Simulation Results for the Proposed Module III Circuit in 0.18- $\mu$ m Technology at 50-MHz Frequency and 1.8-V V<sub>DD</sub>

|                    | Circuit in [8] | Proposed |
|--------------------|----------------|----------|
| No. of Tr.         | 10             | 10       |
| Power (µW)         | 1.337          | 1.437    |
| Delay (ns)         | 0.1829         | 0.1224   |
| PDP (e-15)         | 0.245          | 0.175    |
| Area $(\lambda^2)$ | 4100(1.12)     | 3648(1)  |
| Improvement (PDP)  | 40%            |          |

Another possibility is to use the complement of the expression, i.e.,

$$\overline{C}_{\text{out}} = \overline{A} \cdot H' + \overline{C}_{\text{in}} \cdot H. \tag{10}$$

In this case, two inverters will be required to invert the A and  $C_{in}$  inputs and one inverter at the output. This will result in unbalanced Sum and  $C_{out}$  output switching times and extra delay.

A circuit based on the static-CMOS logic style is presented in [8]. This circuit overcomes the problems of the TG multiplexer design. It uses ten transistors and is shown in Fig. 8(b). This circuit possesses all the features of static CMOS logic style such as robustness to voltage scaling and good noise margins.

We propose a hybrid design for module III. We use the inherently low power consuming TG logic style and the robust static-CMOS logic style to create a new hybrid-CMOS circuit. The proposed circuit is shown in Fig. 8(c). The new circuit also utilizes ten transistors and possesses the properties of both static-CMOS and TG logic styles. The carry is evaluated using the following logic expression:

$$C_{\text{out}} = \overline{(A \oplus B)\overline{C}_{\text{in}} + \overline{A} \cdot \overline{B}}. \tag{11}$$

A transmission gate preceded by a static inverter is used to implement  $(A \oplus B)\overline{C}_{in}$ . H and H' are the complementary gate signals to this TG. This unit propagates the  $\overline{C}_{in}$  signal to the output when H is at logic "1" and H' is at logic "0." Two pMOS pull-up transistors in series with two nMOS pull-down transistors are used to generate  $\overline{A} \cdot \overline{B}$ . Complementary A and B signals are not required. When A and B are at logic "0" they switch ON both the pMOS transistors to generate  $\overline{C}_{out}$  and assign it logic "1." When A and B are at logic "1" they switch ON both the nMOS transistors to generate  $\overline{C}_{out}$  and assign logic "0." At all other times, this section remains OFF. The static inverter at the output produces the desired  $C_{out}$  output. Table III shows the results of proposed circuit when compared to the circuit in [8].
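The description above can be checked with a switch-level sketch in Python. It models the three sections that can drive the $\overline{C}_{out}$ node (the transmission gate, the series pMOS pair, and the series nMOS pair) as ideal switches and confirms that exactly one of them is active for every input vector, so the node is never floating and never driven from two sides. The function names are illustrative, and the ideal-switch model is an assumption that ignores transistor sizing, delay, and leakage.

```python
from itertools import product

def cout_bar_drivers(a, b, cin):
    """Return the values driven onto the Cout' node by each active section."""
    h = a ^ b
    drivers = []
    if h == 1:                   # TG conducts when H = 1 and H' = 0
        drivers.append(1 - cin)  # it passes the inverted carry-in
    if a == 0 and b == 0:        # both series pMOS transistors are ON
        drivers.append(1)
    if a == 1 and b == 1:        # both series nMOS transistors are ON
        drivers.append(0)
    return drivers

def carry_out(a, b, cin):
    drivers = cout_bar_drivers(a, b, cin)
    assert len(drivers) == 1     # exactly one section drives the node
    return 1 - drivers[0]        # static inverter at the output

# Compare against the carry of integer addition for all eight input vectors.
for a, b, cin in product((0, 1), repeat=3):
    assert carry_out(a, b, cin) == (a + b + cin) >> 1
```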



Fig. 9. Circuits for module II.

Due to the additional inverter in the proposed design, it consumes slightly more power as compared to the circuit in [8]. There is redundant switching at the input since the complement of $C_{in}$ is generated even if it is not propagated to the output. This can be avoided by placing the inverter after the TG, but then charge can leak through the closed TG and cause a reversal of voltage level at the output. We accept this tradeoff because the chosen arrangement guarantees excellent signal integrity without any glitches. The proposed circuit is $1.5 \times$ faster than the compared one and requires slightly less area for the layout despite the same number of transistors. Transistor sizes for both the circuits are shown in Fig. 8. There is an overall 40% reduction in the PDP achieved by the proposed circuit. We use the proposed circuit for module III in the proposed hybrid-CMOS full-adder design.

#### IV. PROPOSED HYBRID-CMOS FULL ADDER

In this section, we present the new full-adder design based on the new circuits proposed in Sections I–III. Before we present the new adder, we review some of the existing and most frequently used circuits that can be used in the different modules of the full adder. By using these circuits, many different hybrid-CMOS full adders can be conceived. We discuss the advantages and disadvantages of each circuit and cite some examples of the existing adders that employ them. Since we have already discussed different circuits for module I and module III, we restrict our discussion to circuits for module II here. The circuits that can be used in module II are shown in Fig. 9. These circuits are required to generate the Sum output given inputs H, H', and $C_{in}$. They essentially perform the XOR or the XNOR function and can be used in the first module of the adder too.

Fig. 9(a) and (b) [15] use four transistors each and generate the Sum output using $C_{in}$ and H or H' inputs. These circuits have been used in many adders for both module I [6], [14] and module II [14]. These circuits, however, have some threshold-loss problems. Circuit (a), for an input vector "00," produces a bad logic "0" as the pMOS pass transistors pass a poor logic "0" through them. Similarly, for circuit (b), for an input vector "11," the output is at a poor logic "1." The output signal, when given to the following gate, can cause a functional failure or a long delay in switching a transistor. This may lead to redundant power consumption and possible glitches.

Fig. 9(c) and (d) are powerless/groundless XOR and XNOR circuits [16], respectively, each requiring four transistors. These circuits use the $C_{in}$ and H or H' inputs to generate the Sum output. They are used to build the 41 ten-transistor adders presented in [1]. Again, these circuits suffer from threshold-loss problems. For example, in circuit (c), for the input vectors "00" and "01," the circuit produces poor voltage levels. Circuits with this problem have very poor driving capabilities and cause significant glitches.

Fig. 9(e) and (f) [17] require five transistors each and provide full-swing outputs. These circuits use all the three inputs (i.e.,  $C_{in}$ , H, and H') to generate the Sum output signal. They do not suffer from threshold loss and are good candidates for module II. Several authors have previously used this circuit in their adders [6], [10]. We classify this circuit as a hybrid-CMOS design style circuit due to the presence of a TG, pass-transistor, and static pull-down network. All these modules have either ground or power supply connections thereby eliminating direct path between ground and power supply.

Fig. 9(g) has transmission-function implementation of XOR and XNOR functions. This circuit does not have supply rails thereby eliminating short circuit current. Fig. 9(h) is essentially the complement and has an inverter to produce Sum. This provides good driving capability due to the presence of the static inverter. This circuit is one of the best performers among all the circuits mentioned before in terms of signal integrity and average power-delay product [6]. Both the circuits avoid the problem of threshold loss and have been widely used in adder implementation [8]. We employ this circuit for our full-adder design.

#### A. Proposed Full Adder

As mentioned earlier in Section II, in the centralized full adders, both XOR and XNOR circuits are present (both in module I) and generate the intermediate signals H and H'. These signals are passed on to modules II and III along with the carry from the previous stage and the other inputs A and B to produce the Sum and $C_{\text{out}}$. For the new adder, we use our two proposed circuits and one existing circuit in the three modules. The proposed adder is shown in Fig. 10.

Fig. 10. Proposed hybrid-CMOS full adder.

In module I, the proposed XOR–XNOR circuit produces balanced full-swing outputs. It has high-speed operation due to the cross-coupled pMOS pull-up transistors providing the intermediate signals quickly. Since the other two modules rely heavily on the intermediate signals H and H' to produce the final outputs, the delay response of module I is critical. The proposed XOR–XNOR circuit is up to $1.6 \times$ faster than the compared XOR–XNOR circuits (Table II). For this reason, the new adder is faster than the compared adders.

Module II is a transmission-function implementation of XNOR function to generate the Sum' followed by an inverter to generate Sum. This provides good driving capability to the circuit. Due to the absence of supply rails there are no short circuit currents. The circuit is free from the problem of threshold loss and has the lowest PDP amongst all circuits that are used for module II [6].

Module III employs the proposed hybrid-CMOS output stage with a static inverter at the output. This circuit has a lower PDP as compared to the other existing designs. The static inverter provides good driving capabilities as the inputs are decoupled from the output. The structure of the circuit is very symmetric and, therefore, the layout is regular. Due to the low PDP of module II and module III, the new adder is expected to have low power consumption. In Sections V and VI, the simulation environment and experimental results will be discussed.

#### V. SIMULATION ENVIRONMENT

#### A. Simulation Setup

All the circuits are implemented using MAGIC 7.1 layout editor and extracted using TSMC 0.18-$\mu$m technology. Simulations are carried out using HSPICE. Parasitic capacitances are also considered in the results. To simulate a real environment, input buffers for all inputs of the test circuit are used. The transistor sizes of these buffers are chosen such that there is sufficient signal distortion as expected in an actual circuit. A minimum output load of fan-out of four inverters (FO4) is used for power and delay measurements, the value of which amounts to 5.6 fF (1.4 fF for each inverter in 0.18-$\mu$m technology). The generic simulation test bench used is shown in Fig. 11 along with the transistor sizes of each buffer.

Fig. 11. Simulation test bench.

The circuit performance of the test circuits is evaluated in terms of worst-case delay, power consumption, and power-delay product for a range of supply voltages (0.8–1.8 V) at 50-MHz frequency. The delay is calculated from 50% of the voltage level of the input to 50% of the voltage level of the resulting output for all the rise and fall output transitions. For the calculation of the power-delay product, the worst-case delay is chosen to be the larger delay amongst the two outputs. Different loading conditions (5.6–200 fF) are also considered to evaluate the performance of the test circuits.
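The 50%-to-50% delay measurement and the worst-case PDP can be sketched as follows. This is only an illustration on sampled waveforms with linear interpolation between samples; the function names and the sample data are hypothetical, and the actual measurements in this work are taken from HSPICE simulations.

```python
def crossing_time(t, v, level, start=0.0):
    """First time >= start at which the sampled waveform v(t) crosses `level`."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        if v0 == v1 or (v0 - level) * (v1 - level) > 0:
            continue  # this segment does not cross the level
        # Linear interpolation inside the segment
        tc = t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
        if tc >= start:
            return tc
    raise ValueError("waveform never crosses the requested level")

def propagation_delay(t, v_in, v_out, vdd):
    """Delay from the 50% point of the input to the 50% point of the output."""
    t_in = crossing_time(t, v_in, vdd / 2)
    t_out = crossing_time(t, v_out, vdd / 2, start=t_in)
    return t_out - t_in

def worst_case_pdp(avg_power, delays):
    """PDP using the larger of the output delays (e.g. Sum and Cout)."""
    return avg_power * max(delays)

# Hypothetical waveforms: a rising input and a falling output at VDD = 1.8 V
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 1.8, 1.8, 1.8, 1.8]
v_out = [1.8, 1.8, 1.8, 0.0, 0.0]
print(propagation_delay(t, v_in, v_out, vdd=1.8))
```

Taking the maximum over both outputs keeps the PDP figure pessimistic, which matches the worst-case convention stated above.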

An input transition may or may not result in a change at the output node. Even if there is no switching at the output node, some internal node may be switching. This switching activity results in some power dissipation. For an accurate result, all the possible input combinations are considered for all the test circuits. Fig. 12 shows the input stimulus used for the full-adder circuits. In Sections V-B–V-D, the remaining parts of the simulation environment are discussed.
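To make the point about input transitions concrete, the sketch below enumerates every ordered pair of distinct input vectors for a 1-b full adder and separates those that change an output from those that do not. It is a purely logical illustration with hypothetical names; it says nothing about which internal nodes switch in a particular circuit.

```python
from itertools import product

def full_adder(a, b, cin):
    """Logical reference model: returns (Sum, Cout)."""
    total = a + b + cin
    return total & 1, total >> 1

vectors = list(product((0, 1), repeat=3))  # all (A, B, Cin) combinations
transitions = [(p, q) for p in vectors for q in vectors if p != q]

# Transitions that leave both Sum and Cout unchanged can still toggle
# internal nodes, so they must be simulated to capture all dissipation.
silent = [(p, q) for p, q in transitions if full_adder(*p) == full_adder(*q)]
switching = [(p, q) for p, q in transitions if full_adder(*p) != full_adder(*q)]

print(len(transitions), len(switching), len(silent))
```

Of the 56 possible transitions, 12 leave both outputs unchanged, which is why a stimulus that only exercises output changes would underestimate the power.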

#### B. Noise Immunity Test Setup

Digital circuits are inherently low-pass in nature and, thus, can filter out noise pulses with high amplitude provided that noise width is sufficiently narrow. A noise pulse with sufficient width may cause unnecessary switching in a digital circuit leading to malfunction. Noise immunity curves (NIC) [18] are used to measure the noise-tolerance of circuits to input noise pulses. The NIC of a digital gate is a locus of points  $(T_{\text{noise}}, V_{\text{noise}})$ , where  $T_{\text{noise}}$  is the noise pulsewidth and  $V_{\text{noise}}$  corresponds to the noise pulse amplitude, for which the gate just makes a logic error.

Each point on the noise immunity curve indicates that if the amplitude of the noise pulse lies above that point for that particular width, the circuit will produce an erroneous output. All points below the NIC fall in a safe zone. Hence, the higher the NIC of a gate, the less susceptible the gate is to noise.

For quantitative evaluation of noise immunity, a metric called average noise threshold energy (ANTE) [19], derived from the NIC, is used. It is a measure that can be employed to compare the noise signature of various gates. This metric is given by ANTE = $E(V_{\text{noise}}^2 T_{\text{noise}})$, where $E(\cdot)$ denotes the expectation operator. A higher ANTE value for a circuit indicates that the circuit has higher noise immunity. These noise metrics are also used in [20] and [21].
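As a minimal sketch of how ANTE could be evaluated from a measured NIC, the function below averages $V_{\text{noise}}^2 T_{\text{noise}}$ over a list of sampled curve points. Treating the expectation as a uniform average over the sampled points is an assumption made here for illustration, and the sample values are hypothetical.

```python
def ante(nic_points):
    """Average noise threshold energy from NIC samples.

    nic_points: iterable of (t_noise, v_noise) pairs lying on the curve.
    The expectation E(V^2 * T) is approximated by a uniform average
    over the supplied points (an assumption of this sketch).
    """
    energies = [v * v * t for t, v in nic_points]
    if not energies:
        raise ValueError("at least one NIC point is required")
    return sum(energies) / len(energies)

# Hypothetical NIC samples: (pulse width in ns, pulse amplitude in V)
gate_a = [(0.2, 1.6), (0.4, 1.2), (0.8, 1.0)]
gate_b = [(0.2, 1.8), (0.4, 1.5), (0.8, 1.2)]

# The gate whose curve lies higher yields the larger ANTE.
print(ante(gate_a) < ante(gate_b))
```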



Fig. 12. Input stimulus for a full-adder cell. The first three are inputs A, B, and  $C_{in}$  and the remaining two are outputs. Frequency of the inputs is 50 MHz with a supply voltage of 1.8 V.



Fig. 13. Noise injection circuit [22].

For simulation purposes, we generate a noise pulse using a noise injection circuit as shown in Fig. 13 [22]. The circuit is essentially a tunable delay line that provides a noise pulse of the desired width and amplitude so as to produce a glitch in the output. The amplitude and the width of the noise pulse are controlled by changing $V_{dd,n}$ and $V_c$, respectively. The amplitude of the noise pulse is constant with varying width. This noise pulse causes a progressively bigger glitch at the output of the test circuit. The NIC is plotted by finding the width of the noise pulse, for a given noise pulse amplitude, that is sufficient to cause a significant glitch in the output. This glitch must be sufficient to cause malfunction in the circuit following the circuit being tested.
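The search for one NIC point can be sketched as a bisection over the pulse width at a fixed amplitude. The gate model below is a deliberately simple stand-in: a first-order low-pass response with a hypothetical time constant `tau` and failure threshold `v_fail`. It is not the noise injection circuit of Fig. 13, where the glitch would be obtained from circuit simulation.

```python
import math

def glitch_peak(v_noise, t_noise, tau):
    """Peak output glitch of a first-order low-pass stage (toy model)
    driven by a rectangular pulse of amplitude v_noise and width t_noise."""
    return v_noise * (1.0 - math.exp(-t_noise / tau))

def critical_width(v_noise, v_fail, tau, t_max=1e-6, tol=1e-15):
    """Smallest pulse width at which the glitch reaches v_fail.

    Returns None if even a pulse of width t_max is filtered out.
    """
    if glitch_peak(v_noise, t_max, tau) < v_fail:
        return None
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if glitch_peak(v_noise, mid, tau) >= v_fail:
            hi = mid  # still failing: try a narrower pulse
        else:
            lo = mid  # safe: the pulse must be wider
    return hi

# Trace a few NIC points (T_noise, V_noise) for the toy gate
tau, v_fail = 100e-12, 0.9
nic = [(critical_width(v, v_fail, tau), v) for v in (1.0, 1.2, 1.5, 1.8)]
```

In this toy model, lower amplitudes need wider pulses to cause a failure, and amplitudes below the threshold never do, which reproduces the low-pass behaviour described in the text.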

#### C. Transistor Sizing

In order to have a fair comparison, all the simulated circuits are prototyped at optimum transistor sizing. The transistor sizes of all the simulated circuits have been included in the figures. In the circuits, the numbers depict the width (W) of the transistors with the minimum feature size as $2\lambda$. All the circuits have been sized to achieve the best PDP. The performance of the test circuits was first measured with minimum transistor sizes, followed by transistor sizing as suggested in [6].

#### D. Real Environment Test Setup

To evaluate the performance of the full adders in a real circuit, each adder is embedded in a 4- and 8-b 4-operand carry–save array adder (CSA) with final carry–propagate adder. This adder architecture is adder intensive and forms a good test bench to test the performance of the proposed adder. The architecture for the 4-operand n-b CSA is shown in Fig. 14. Since it is impractical to generate all the possible input vectors for this setup, 64 random sequences are used as the input vectors. The simulations are carried out for different supply voltages ranging from 1.8 to 0.9 V. This gives a fair indication of the performance of the full adders.
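A behavioural sketch of a 4-operand carry-save adder with a final carry-propagate stage is given below, exercised with 64 random input vectors per operand length. The arrangement used here (two carry-save rows followed by a ripple-carry adder, with words extended by two bits) is an assumption chosen for illustration; the exact array organization of Fig. 14 may differ, and all names are hypothetical.

```python
import random

def full_adder(a, b, c):
    """Logical full adder: returns (sum, carry)."""
    total = a + b + c
    return total & 1, total >> 1

def to_bits(x, width):
    return [(x >> i) & 1 for i in range(width)]  # LSB first

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def carry_save_row(x, y, z):
    """Reduce three words to a sum word and a carry word shifted left by one."""
    sums, carries = [], [0]
    for a, b, c in zip(x, y, z):
        s, cout = full_adder(a, b, c)
        sums.append(s)
        carries.append(cout)
    assert carries[-1] == 0, "word width too small"
    return sums, carries[:-1]

def ripple_carry_add(x, y):
    """Final carry-propagate adder built from the same full-adder cell."""
    result, carry = [], 0
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result

def csa4(a, b, c, d, n):
    """Add four n-bit operands; n + 2 bits hold the largest possible sum."""
    width = n + 2
    wa, wb, wc, wd = (to_bits(v, width) for v in (a, b, c, d))
    s1, c1 = carry_save_row(wa, wb, wc)
    s2, c2 = carry_save_row(s1, c1, wd)
    return from_bits(ripple_carry_add(s2, c2))

random.seed(0)
for n in (4, 8):
    for _ in range(64):  # 64 random vectors per operand length
        ops = [random.randrange(2 ** n) for _ in range(4)]
        assert csa4(*ops, n) == sum(ops)
```

Every row instantiates one full adder per bit, which is what makes this architecture adder intensive.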

#### VI. SIMULATION RESULTS AND DISCUSSION

By optimizing the transistor sizes of the full adders considered, it is possible to reduce the delay of all adders without significantly increasing the power consumption, and transistor sizes can be set to achieve minimum PDP. All adders were designed with minimum transistor sizes initially and then simulated. To achieve minimum PDP, an iterative process of redesigning and transistor sizing after post-layout simulations was carried out.

Comparison of full adders designed to achieve minimum PDP is discussed below. In particular, three subsections refer to delay, power, and PDP, respectively. In each subsection, effects of varying supply voltage and load are considered.

#### A. Delay Comparison

The values of delay obtained for the considered values of $V_{\rm DD}$ (0.8–1.8 V) and load (5.6–200 fF) for all the full adders are shown in Figs. 15(b) and 16(b), respectively. To make the comparison easier, Table III shows the delay values at 1.8 V $V_{\rm DD}$. It is apparent that amongst the existing conventional full adders, the adders without driving capability (TGA and TFA) have the smallest delays. This can also be observed from Table III as TGA and TFA are amongst the faster adders. TFA has slightly lower delay than TGA at higher supply voltages but the trend reverses at lower supply voltages. The static-CMOS full adder and CPL full adder follow the TGA and TFA adders, with CMOS steadily remaining ahead of the CPL adder at each supply voltage. For varying load conditions, TGA and TFA have low delay at small loads, but their speed degrades significantly at higher loads. Among the existing full adders, CMOS shows the least speed degradation followed by the CPL full adder. This shows that under heavy load conditions, adders with driving capability perform better than those without it (TGA and TFA). For these reasons, we will compare the hybrid-CMOS adders to the conventional CMOS adder.

Fig. 14. 4-operand CSA with final carry-propagate adder.

Among the nonconventional or hybrid-CMOS full adders, the proposed hybrid-CMOS full adder shows minimum delay at all supply voltages when compared to the CMOS, HPSC, NEW14T, and NEW-HPSC full adders. At 1.8 V $V_{\rm DD}$, the proposed adder is 30%, 55%, 88%, and 29% faster than CMOS, HPSC, NEW14T, and NEW-HPSC full adders, respectively. At lower supply voltages, the proposed full adder is the fastest. The delay of the proposed hybrid-CMOS adder is slightly higher than TGA and TFA but with increasing load, it displays minimum speed degradation. Overall, when compared to all adders, the proposed adder has minimum speed degradation with varying load. HPSC and NEW14T adders perform poorly at low voltages [8], and this can also be seen in the results. They both share the same circuit for XOR-XNOR functions and, therefore, have the same problem of delayed response for certain input vector transitions, which was described earlier in Section III. The final outputs are heavily dependent on the performance of module I as it drives the remaining modules and, hence, suffer a great deal due to its slow response. This shows that the XOR-XNOR cells used in these designs do not scale well with reducing supply voltage. The signal integrity of these circuits is severely degraded below 1.0 V, rendering them unusable. The speed degradation for both the full adders is the worst among all adders under varying voltages. The NEW-HPSC adder performs well at lower supply voltages; however, the proposed adder outperforms it at all supply voltages, being 30%-60% faster. When the output load is increased, the proposed adder, NEW-HPSC, and CMOS adder have the least speed degradation, the proposed adder having the best performance. TGA and TFA have the most degradation due to lack of driving capability. Overall, the proposed full adder has good speed response at different voltages owing to the proposed module I. When output load is increased, the proposed full adder has the best performance and this is owing to the proposed module III that has good driving capability.

#### B. Power Comparison

The average power dissipation is evaluated under different supply voltages and different load conditions and is summarized in Figs. 15(a) and 16(a), respectively. Table III tabulates the values at 1.8 V  $V_{\rm DD}$ . Among the conventional existing full adders, clearly CPL has the highest power dissipation. The CPL adder dissipates the most power because of its dual-rail structure and high number of internal nodes in its design. Therefore, the CPL topology should not be used if the primary target is low power dissipation.

The adders without driving capability (TGA and TFA) always dissipate less power than the others, as can be seen in the graph. Between the two, TGA dissipates less power than TFA, and the trend continues at low voltages. The degradation in performance of the TFA is higher than that of the TGA as the supply voltage is scaled down. Closely following the two comes the static-CMOS full adder. Under varying output load conditions, the adders without driving capability (TGA and TFA) show more degradation than the ones with driving capability (CMOS and CPL). This is as expected since the speed degradation of these designs is the highest.

The static-CMOS full adder shows the best performance amongst the conventional full adders under varying load. Among the nonconventional or hybrid-CMOS full adders, the proposed full adder and NEW-HPSC adder have the least power dissipation. The proposed full adder consumes 2% less power than the NEW-HPSC adder at 1.8 V $V_{\rm DD}$, but when the supply voltage is scaled down, the NEW-HPSC adder consumes slightly less power. The performance of these adders remains consistently good under varying voltages and, therefore, they are both suitable for low-voltage operation. The power dissipation of the proposed adder is roughly 25% less than the next lowest power consuming adder (TGA). With increasing output load, the power dissipation of these adders remains the least as compared to all the considered full adders. The remaining two hybrid-CMOS adders (HPSC and NEW14T) show high degradation in performance with decreasing supply voltages. Even though they perform well with varying load, they are unsuitable for operation at low voltages.

Fig. 15. Power, delay, and PDP results for different supply voltages.

Fig. 16. Power, delay, and PDP results under different load conditions.

#### C. PDP Comparison

The PDP is a quantitative measure of the efficiency of the tradeoff between power dissipation and speed, and is particularly important when low-power operation is needed. The values of PDP evaluated under different supply voltages and different output loads are summarized in Figs. 15(c) and 16(c), respectively. The PDP curve in Fig. 15(c) has a flat minimum for intermediate values of $V_{\rm DD}$, but the PDP curve for varying output load follows the same trend as the delay and power dissipation curves for varying load.

Among conventional adders, the adders without driving capability (TGA and TFA) have the lowest PDP. The PDP of TFA is lower than that of TGA at higher voltages, but the trend reverses at lower voltages, suggesting that from an energy point of view TGA is a better choice. Among adders with driving capability, the CPL adder, as expected, has the highest PDP amongst the conventional adders. The static-CMOS adder has roughly 60% lower PDP than the CPL adder for all values of supply voltage. Under variable output load conditions, the static-CMOS adder shows the least PDP and the least degradation with increasing load, rendering it the best amongst the conventional full adders. Despite having the lowest PDPs, the performance of TGA and TFA degrades drastically with increasing output load. This shows that although these adders perform well as standalone units, they may not deliver the required performance if cascaded or used in a high fan-out situation. This is because the input is not decoupled from the output. In a real circuit, buffers have to be inserted between two adders to assure good signal strength. This will increase the delay as well as the power consumption of the whole design.

In the case of nonconventional or hybrid-CMOS adders, the proposed hybrid-CMOS full adder displays the best PDP characteristics for both varying supply voltages and output loads. Inverters at the output of module II and module III ensure sufficient drive strength. Moreover, the power dissipation result of the proposed full adder includes the power consumed by the two output inverters, and it is still less than that of the conventional designs. When compared to the NEW-HPSC adder, the proposed adder consumes 30% less energy. The number of transistors in the design is higher than in most of the compared adders, but it still provides better results. This shows that reducing the number of transistors is not the only way to reduce power consumption in a circuit. There is roughly a 26% reduction in PDP compared to the TFA, which has the lowest PDP among the other compared designs. The performance of the new full-adder cell is attributed to the proposed XOR-XNOR circuit and the novel hybrid-CMOS output module. As shown earlier, these circuits show good improvement in PDP and are faster compared to their counterparts. The remaining two hybrid-CMOS adders (HPSC and NEW14T) have poor PDP characteristics. For low supply voltages, their PDP rises drastically, making them unsuitable for low-voltage operation.

#### D. Area and Layout Complexity

Table IV tabulates the area occupied by each adder considered at 1.8 V $V_{\rm DD}$ in 0.18-$\mu$m technology and sized to give best PDP. The area is measured in units of $\lambda^2$ and has been normalized with respect to the adder with the smallest area. Among the conventional full adders, the CPL has the highest area. This is attributed to the highly irregular arrangement of the CPL adder and its high number of transistors, leading to an irregular layout and increased layout complexity. The static-CMOS full adder has fewer transistors than the CPL adder and occupies much less area because of its highly regular structure. But it still has a higher number of transistors than the remaining full adders and, therefore, takes more area than the remaining adders. The TGA has 20 transistors and has slightly less area than the static-CMOS full adder. Although it has fewer transistors, it occupies almost the same area as the static-CMOS full adder. This is because the TGA adder has larger transistor sizes to achieve minimum PDP. TFA has the least area as it has the fewest transistors amongst all the conventional full adders.

TABLE IV Simulation Results for the Proposed Full Adder in 0.18-$\mu$m Technology at 50-MHz Frequency and 1.8 V $V_{\rm DD}$

| Adder    | No. of Tr. | Power (nW) | Delay (ps) | PDP ($\times 10^{-17}$ J) | Area ($\lambda^2$, normalized) |
|----------|------------|------------|------------|---------------------------|--------------------------------|
| CMOS     | 28         | 60.27      | 456.8      | 2.7531                    | 1.53                           |
| CPL      | 32         | 82.56      | 554.2      | 4.5754                    | 4.34                           |
| TGA      | 20         | 48.83      | 326.1      | 1.5923                    | 1.51                           |
| TFA      | 16         | 44.99      | 353.7      | 1.5912                    | 1.23                           |
| HPSC     | 22         | 42.06      | 664.6      | 2.7198                    | 1.30                           |
| NEW14T   | 14         | 49.02      | 548.9      | 2.6907                    | 1.0                            |
| NEW-HPSC | 26         | 36.20      | 455.7      | 1.650                     | 1.58                           |
| Proposed | 24         | 35.82      | 352.3      | 1.2619                    | 1.30                           |
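The relation between the power, delay, and PDP columns of Table IV can be checked with a few lines of arithmetic. The dictionary below simply transcribes the power and delay columns of the table; the helper names are illustrative. Because the product is recomputed from rounded entries, individual rows may differ somewhat from the tabulated PDP column.

```python
# (power in nW, worst-case delay in ps), transcribed from Table IV at 1.8 V
TABLE_IV = {
    "CMOS": (60.27, 456.8),
    "CPL": (82.56, 554.2),
    "TGA": (48.83, 326.1),
    "TFA": (44.99, 353.7),
    "HPSC": (42.06, 664.6),
    "NEW14T": (49.02, 548.9),
    "NEW-HPSC": (36.20, 455.7),
    "Proposed": (35.82, 352.3),
}

def pdp_e17(power_nw, delay_ps):
    """PDP in units of 1e-17 J (1 nW x 1 ps = 1e-21 J)."""
    return power_nw * delay_ps * 1e-4

def percent_slower(name, ref="Proposed"):
    """How much larger the delay of `name` is than that of `ref`, in percent."""
    return 100.0 * (TABLE_IV[name][1] / TABLE_IV[ref][1] - 1.0)

pdp = {name: pdp_e17(p, d) for name, (p, d) in TABLE_IV.items()}
for name in sorted(pdp, key=pdp.get):
    print(f"{name:9s} PDP = {pdp[name]:.4f}e-17 J")
```

Sorting by the recomputed PDP places the proposed adder first, consistent with the discussion of the PDP results.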

Among the nonconventional or hybrid-CMOS full adders, the NEW14T has the smallest area since it has the lowest number of transistors. The layout of this adder is somewhat symmetric, but the feedback transistors need special attention during sizing, which increases the layout complexity. The same problem is encountered in the HPSC, but its layout is more regular due to its new output module, which is based on the static-CMOS design style. Although it has eight more transistors than the NEW14T adder, it occupies only 30% more area. This is due to the regular structure of the new output stage. The proposed adder occupies roughly the same area as the HPSC adder even though it has two more transistors. This is because module I does not need any special consideration while sizing. The layout is very regular owing to the new module III, which is based on the TG logic style and the static-CMOS logic style. The proposed adder still has smaller area requirements than the conventional adders and has similar area requirements to the other hybrid-CMOS full adders. The performance improvement in the proposed adder comes at the cost of a negligible increase in area. This tradeoff can be made if area is not of the utmost importance. The NEW-HPSC adder takes the most area due to the highest number of transistors; however, its layout is very regular.

#### E. Simulation Results for Four-Operand CSA

All the considered full adders are embedded in a four-operand carry-save adder with operand length 4 and 8 b. Power dissipation for all considered adders is measured for each arrangement at 1.8 V  $V_{\text{DD}}$  and is shown in Fig. 17(a).

A general observation made from the graph is that as the number of bits of the operands increases, the power dissipated in the CSA increases proportionally. Among the conventional full adders, the static-CMOS full adder displays the best performance. The adders without driving capability (TGA and TFA) show the poorest performance and, hence, prove that they are not suitable for cascading to form larger adders. For both 4 and 8 b, the NEW-HPSC adder has the best performance closely followed by the proposed adder. However, the energy consumption of the proposed adder is lower than that of the NEW-HPSC adder. We simulated at different supply voltages (1.8 to 0.9 V) for a realistic comparison. Fig. 17(b) and (c) show the PDP for 4- and 8-b CSA architectures, respectively. The proposed full adder consumes the least energy at voltages other than 1.8 V for both configurations. It is closely followed by the NEW-HPSC adder, which has the best results at 1.8 V.

Fig. 17. (a) Power consumption results in the CSA architecture at 1.8 V $V_{\rm DD}$. (b) PDP results for 4-b CSA at different supply voltages. (c) PDP results for 8-b CSA.

#### F. Noise Immunity Curves

The noise immunity curves and the ANTE results for the adders are shown in Fig. 18. All the hybrid designs show good noise immunity as compared to the standard designs. The HPSC adder has the highest ANTE followed by the NEW-HPSC adder. This shows that more reliable circuits can be designed using hybrid-CMOS logic design style. Among the nonconventional full adders, the proposed full adder has the lowest ANTE but it is still higher than the conventional full adders.



Fig. 18. (a) Noise immunity curves for the full adders. (b) ANTE results.

#### VII. CONCLUSION

Hybrid-CMOS design style gives more freedom to the designer to select different modules in a circuit depending upon the application. Using the adder categorization and hybrid-CMOS design style, many full adders can be conceived. As an example, a novel full adder designed using hybrid-CMOS design style is presented in this paper that targets low PDP. The proposed hybrid-CMOS full adder has better performance than most of the standard full-adder cells owing to the novel design modules proposed in this paper. It performs well with supply voltage scaling and under different load conditions. When embedded in a four-operand CSA, it outperforms all the other adders, making it suitable for larger adders. The proposed adder has better noise immunity than standard adders such as static CMOS, making it suitable for deep-submicrometer operation. We recommend the use of hybrid-CMOS design style for the design of high-performance circuits.

#### ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their constructive critique from which this paper greatly benefited.

#### REFERENCES

- [1] H. T. Bui, Y. Wang, and Y. Jiang, "Design and analysis of low-power 10-transistor full adders using XOR-XNOR gates," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 49, no. 1, pp. 25–30, Jan. 2002.
- [2] V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez, "Reducing power in high-performance microprocessors," in *Proc. Conf. Des. Autom.*, 1998, pp. 732–737.
- [3] R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," *IEEE J. Solid-State Circuits*, vol. 32, no. 7, pp. 1079–1090, Jul. 1997.
- [4] N. Zhuang and H. Wu, "A new design of the CMOS full adder," *IEEE J. Solid-State Circuits*, vol. 27, no. 5, pp. 840–844, May 1992.
- [5] N. Weste and K. Eshraghian, *Principles of CMOS VLSI Design: A Systems Perspective*. Reading, MA: Addison-Wesley, 1993.
- [6] A. M. Shams, T. K. Darwish, and M. A. Bayoumi, "Performance analysis of low-power 1-bit CMOS full adder cells," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 10, no. 1, pp. 20–29, Feb. 2002.
- [7] D. Radhakrishnan, "Low-voltage low-power CMOS full adder," *IEE Proc. Circuits Devices Syst.*, vol. 148, no. 1, pp. 19–24, Feb. 2001.
- [8] M. Zhang, J. Gu, and C. H. Chang, "A novel hybrid pass logic with static CMOS output drive full-adder cell," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2003, pp. 317–320.
- [9] C.-H. Chang, J. Gu, and M. Zhang, "A review of 0.18-µm full adder performances for tree structured arithmetic circuits," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 13, no. 6, pp. 686–695, Jun. 2005.
- [10] M. Sayed and W. Badawy, "Performance analysis of single-bit full adder cells using 0.18, 0.25, and 0.35 μm CMOS technologies," in *Proc. Int. Symp. Circuits Syst.*, 2002, pp. III-559–III-562.
- [11] S. Goel, S. Gollamudi, A. Kumar, and M. Bayoumi, "On the design of low-energy hybrid CMOS 1-bit full-adder cells," in *Proc. Midwest Symp. Circuits Syst.*, 2004, pp. II-209–212.
- [12] H. A. Mahmoud and M. Bayoumi, "A 10-transistor low-power high-speed full adder cell," in *Proc. Int. Symp. Circuits Syst.*, 1999, pp. I-43–46.
- [13] A. Fayed and M. A. Bayoumi, "A low-power 10 transistor full adder cell for embedded architectures," in *Proc. IEEE Int. Symp. Circuits Syst.*, 2001, pp. IV-226–229.
- [14] R. Shalem, L. K. John, and E. John, "A novel low power energy recovery full adder cell," in *Proc. Great Lake Symp. VLSI*, 1999, pp. 380–380.
- [15] J. Wang, S. Fang, and W. Feng, "New efficient designs for XOR and XNOR functions on the transistor level," *IEEE J. Solid-State Circuits*, vol. 29, no. 7, pp. 780–786, Jul. 1994.
- [16] H. T. Bui, A. K. Al-Sheraidah, and Y. Wang, "New 4-transistor XOR and XNOR designs," in *Proc. 2nd IEEE Asia Pacific Conf. ASICs*, 2000, pp. 25–28.
- [17] H. Lee and G. E. Sobelman, "New low-voltage circuits for XOR and XNOR," *IEEE Proc. Southeastcon.*, pp. 225–229, 1997.
- [18] G. A. Katopis, "Delta-I noise specification for a high-performance computing machine," *Proc. IEEE*, vol. 73, no. 9, pp. 1405–1415, Sep. 1985.
- [19] L. Wang and N. R. Shanbhag, "Noise-tolerant dynamic circuit design," in *Proc. IEEE Int. Symp. Circuits Syst.*, 1999, pp. 549–552.
- [20] S. Goel, M. Elgamel, M. A. Bayoumi, and Y. Hanafy, "Design methodologies for high-performance noise-tolerant XOR–XNOR circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 53, no. 4, pp. 867–878, Apr. 2006.
- [21] S. Goel, T. K. Darwish, and M. A. Bayoumi, "A novel technique for noise-tolerance in dynamic circuits," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI*, 2003, pp. 203–206.
- [22] G. Balamurugan and N. R. Shanbhag, "The twin-transistor noise-tolerant dynamic circuit technique," *IEEE J. Solid-State Circuits*, vol. 36, no. 2, pp. 273–280, Feb. 2001.



**Sumeer Goel** (S'01) received the B.Tech. degree in electronics and communications engineering from Punjab Technical University, Jalandhar, India, in 2001, and the M.S. degree in computer engineering from The Center for Advanced Computer Studies (CACS), University of Louisiana at Lafayette, Lafayette, in 2003, where he is currently pursuing the Ph.D. degree.

His research interests include high-performance digital circuit design, algorithms, and VLSI architectures for video processing with a focus on motion estimation algorithms. At the circuit level, he has also devised methods for noise reduction, which appeared in the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS (TCAS-I) in April 2006. He is currently the President of the IEEE Computer Society student chapter at the University of Louisiana at Lafayette.

Mr. Goel is the recipient of two Best Student Paper Awards and was also nominated in the Ten Best Papers at the IEEE-MWSCAS in 2005.



Ashok Kumar (S'98–M'99–SM'06) received the B.Tech. degree in computer science and engineering from the Institute of Technology, BHU Varanasi, India, in 1990, and the M.S. and Ph.D. degrees from the Center for Advanced Computer Studies (CACS) at the University of Louisiana at Lafayette, Lafayette, in 1997 and 1999, respectively.

Currently, he is a member of the Research Faculty at CACS. From 1990 to 1994, he worked as an Engineer and a Senior Engineer with Uptron India Limited, Lucknow, India. After getting his Ph.D. degree, he worked with Cadence Design Systems, Lowell/Chelmsford, MA, for nearly four years as a Senior Member of Technical Staff for Research and Development operations. His research interests include power and noise efficient circuit design, development of algorithms for computer-aided design (CAD) tools, digital signal processing, energy-efficient wireless sensor networks, and their deployment. He has served on the program committees of eight conferences as well as on the proposal-evaluation committees of the National Science Foundation.



Magdy A. Bayoumi (S'80–M'84–SM'87–F'99) received the B.Sc. and M.Sc. degrees in electrical engineering from Cairo University, Cairo, Egypt, the M.Sc. degree in computer engineering from Washington University, St. Louis, and the Ph.D. degree in electrical engineering from the University of Windsor, Windsor, ON, Canada.

He has been a faculty member of the University of Louisiana at Lafayette since 1985, where he is the Director of the Center for Advanced Computer Studies (CACS), Department Head of the Computer Science Department, the Edmiston Professor of Computer Engineering, and the Lamson Professor of Computer Science at the Center for Advanced Computer Studies. His research interests include VLSI design methods and architectures, low power circuits and systems, digital signal processing architectures, parallel algorithm design, computer arithmetic, image and video signal processing, neural networks, and wideband network architectures. He was the Vice President for the technical activities of the IEEE Circuits and Systems Society. He is the chairman of the Technical Committee on Circuits and Systems for Communication and the TC on Signal Processing Design and Implementation. He was a founding member and the chairman of the VLSI Systems and Applications Technical Committee. He is a member of the Neural Network and the Multimedia Technology Technical Committees. He has been on the Technical Program Committee for ISCAS for several years, and he was the Publication Chair for ISCAS'99. He was the General Chairman of the 1994 MWSCAS and he is a member of the Steering Committee of this symposium. He was an Associate Editor of the IEEE Circuits and Devices Magazine, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, the IEEE TRANSACTIONS ON NEURAL NETWORKS, and the Transaction on Circuits and Systems II. He was the co-chairman of the Workshop on Computer Architecture for Machine Perception in 1993 and is a member of the Steering Committee of this workshop. He was the General Chairman for the 8th Great Lake Symposium on VLSI in 1998. He was also the General Chairman of the 2000 Workshop on Signal Processing Design and Implementation. Dr. Bayoumi is an Associate Editor of Integration, the VLSI Journal, and the Journal of VLSI Signal Processing Systems. He is a regional Editor for the VLSI Design Journal and on the Advisory Board of the Journal on Microelectronics Systems Integration.
He has edited and co-edited three books in the area of VLSI signal processing. He served on the Distinguished Visitors Program for the IEEE Computer Society from 1991 to 1994 and he is on the Distinguished Lecture program of the Circuits and Systems society. He is the faculty advisor for the IEEE Computer student chapter at UL Lafayette.

Dr. Bayoumi is the recipient of the UL Lafayette 1988 Researcher of the Year Award and the 1993 Distinguished Professor Award at UL Lafayette.