# **HIGH-LEVEL CIRCUIT MODELING FOR POWER ESTIMATION**

Christian V. Schimpfle, Sven Simon and Josef A. Nossek

Institute of Network Theory & Signal Processing, Munich University of Technology, Arcisstr. 21, 80333 Munich, Germany. E-mail: Christian.Schimpfle@ei.tum.de

## ABSTRACT

In this paper, a method for accurate modeling of the timing behavior and power consumption of digital circuits is presented. The model is based on parameters extracted from circuit-level simulations of the basic cells. This information is then embedded in a high-level description (VHDL) of every basic cell. The circuit simulations for each cell are executed only once in order to create a cell library that includes the VHDL models. Owing to the detailed timing model in the high-level descriptions, the switching activity in a larger circuit can be determined quite accurately. Besides timing, the power consumption for every possible input transition is also included in the cell descriptions. Thus, quite accurate power estimation of large circuits is possible by a simple event-driven simulation with a conventional VHDL simulator. The computation time of this method is about four orders of magnitude shorter than that of power estimation with SPICE. The presented model, in combination with the VHDL simulator, is applied to compare different multiplier architectures in terms of power consumption.

## 1 INTRODUCTION

In recent years, low-power digital design has been of great interest to many VLSI engineers. Reducing the power consumption is still one of the most important goals in the design process, as the mobility and complexity of digital systems are rapidly increasing. Besides the development of suitable design methods, accurate estimation of the power consumption of a system at all levels of the design process is necessary. Probabilistic approaches for estimating the switching activity have been considered in [3] and more recently in [5]. Fast simulation-based methods have also been developed, e.g. in [9].

However, the accurate estimation of the switching activities inside a circuit is only one part of a quantitative power estimation method. The power consumption per switching event at the corresponding nodes also has to be determined properly. In [8] partitions of the circuit are simulated at circuit level, and the total power is then determined by summing the switching activities weighted with the predetermined power consumption of the partitions. For this approach, pseudo-random stimulus vectors are used for the circuit-level simulations, which allows only an average power consumption per switching event to be predicted. Another simulation-based approach is presented in [4], where the power consumption of each cell is determined by circuit simulation for each input/output transition. The total power is then estimated by summing the power consumptions of the corresponding input/output transitions after a logic simulation based on a simple unit delay model.

In this paper, a method for power estimation in VLSI circuits is presented that is based on accurate timing and power modeling of the basic cells. This modeling is called device level based cell modeling (DCM) [7]. All possible input transitions of a basic cell, e.g. a NAND-gate or an adder, are considered separately and are characterized by a specific delay and a specific power consumption. All characteristic parameters are extracted from simulation at circuit level, e.g. with SPICE, and enter the basic cell model directly. This model is then formulated in the hardware description language VHDL. A VHDL description for each basic cell is stored in a library. For power estimation of the whole circuit, a conventional VHDL simulator can be used.

The paper is organized as follows. In Section 2 the delay model is presented, and it is shown how glitching activity can be taken into account. Section 3 describes the modeling of the basic cell power consumption. Section 4 gives an introduction to the VHDL formulation of the model. The final high-level power estimation method is presented in Section 5. Section 6 presents experimental results and compares different multiplier architectures in terms of power consumption. A conclusion is given in Section 7.

## 2 DELAY MODEL

The delay model is essential for the accuracy of switching activity estimation in a digital circuit. A simple zero delay model ignores effects like glitching completely, which may cause considerable underestimation of the switching activity. In combinational circuits, glitches can account for the major part of the power consumption and cannot be neglected.

In general, each distinct transition at the inputs of a cell affects the output after a different delay time. For example, the shortest delay of an AND-gate in $0.25\,\mu m$ technology loaded with an adder is 0.4 ns for the input transition $11 \longrightarrow 00$ and the longest delay is 0.75 ns for the transition $00 \longrightarrow 11$. In order to have a model that reflects the timing behavior of a basic cell more realistically, every possible input transition of the cell has to be considered separately. For each transition i a specific delay $t_{d,i}$ can be extracted from circuit simulation, e.g. with SPICE. If n is the number of inputs, $2^{2n} - 2^n$ input transitions are possible. For cells with a large number of inputs, this may result in a very complex model. However, for some transitions the delay times differ only by a very small amount, so that these transitions can be collected in a group of transitions with one representative delay. This strategy reduces the complexity considerably.
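The transition count and the grouping idea can be illustrated with a short Python sketch. The paper does not state how transitions are grouped, so the criterion used here (a fixed tolerance relative to the smallest delay in a group, with the mean as representative) is an assumption, as are the function names and the 0.42 ns value; only 0.40 ns and 0.75 ns are taken from the AND-gate example above.

```python
from itertools import product


def input_transitions(n):
    """Return all ordered pairs (before, after) of distinct n-bit input vectors."""
    vectors = list(product((0, 1), repeat=n))
    return [(a, b) for a in vectors for b in vectors if a != b]


def group_by_delay(delays, tolerance):
    """Group transitions whose delay exceeds the smallest delay of the
    current group by at most `tolerance` (assumed criterion, not from the paper).

    Returns a list of (representative_delay, [transitions]) tuples, where the
    representative delay is the mean delay of the group.
    """
    groups = []
    for transition, delay in sorted(delays.items(), key=lambda item: item[1]):
        if groups and delay - groups[-1][0][0] <= tolerance:
            groups[-1].append((delay, transition))
        else:
            groups.append([(delay, transition)])
    return [
        (sum(d for d, _ in group) / len(group), [t for _, t in group])
        for group in groups
    ]


# For n = 3 inputs there are 2**(2*3) - 2**3 = 56 transitions.
print(len(input_transitions(3)))

# Delays in ns; "10->00" is a hypothetical value added for illustration.
example_delays_ns = {"11->00": 0.40, "10->00": 0.42, "00->11": 0.75}
for representative, members in group_by_delay(example_delays_ns, tolerance=0.05):
    print(round(representative, 3), members)
```

With a tolerance of 0.05 ns the two fast transitions share one representative delay, while the slow transition keeps its own entry.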

The delay of a circuit also depends on the load capacitance. To keep the model general, different load situations have to be taken into account. Without loss of generality, it can be assumed that the load capacitances of gates in CMOS circuits differ only by discrete values, e.g. by multiples of minimum-sized inverter input capacitances or gate capacitances. In Fig. 1 the delay of a CMOS adder loaded with another k adders $(k = 0, \dots, 4)$ of the same type is plotted for each of the $2^{2 \cdot 3} - 2^3 = 56$ (n = 3) possible input transitions. The dots are connected by lines to make the differences more visible. Delays equal to zero occur when the input transition causes no output transition. Fig. 1 shows that equidistant loads cause equidistant delays, which means that the contribution of the load capacitance to the gate delay is additive. This can also be derived from the RC-delay model of a CMOS gate. Let $C_{d/s}$ be the drain or source capacitances of the transistors, $R$ the channel resistances and $C_L$ the load capacitance (for simplicity, all $C_{d/s}$ and all $R$ are assumed to be equal). For a two-input CMOS NAND-gate the *RC*-delay $\tau$ is then given by:

$$\tau = 8RC_{d/s} + 2RC_L$$

The delay for transition i now can be formulated as follows:

$$t_{d,i} = t_{0,i} + K \cdot t_{incr} \quad , \tag{1}$$

where $t_{0,i}$ is the delay without load, K is the number of load gates, and $t_{incr}$ is the time increment per gate. This model is used in the VHDL description of the basic cells. Besides the delays of all transitions, the glitching behavior of the cells is also included in the model. Due to different signal path lengths inside a cell, glitches can be "generated" by the cell itself even when all input signals arrive at the same time. The length of the glitches depends on the output load and can be calculated with (1). As with the delay times, the glitching behavior of all basic cells is derived from SPICE simulations and is then included in the VHDL descriptions.
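As a minimal sketch of (1), the following Python snippet evaluates the load-dependent delay and recovers the two parameters from two simulated load cases. The function names and the numeric values (400 ps unloaded delay, 50 ps per load gate) are hypothetical and serve only to show the equidistant steps seen in Fig. 1.

```python
def transition_delay(t0, t_incr, k):
    """Delay of one input transition according to (1): t_d = t_0 + K * t_incr."""
    if k < 0:
        raise ValueError("number of load gates K must be non-negative")
    return t0 + k * t_incr


def fit_delay_parameters(k_a, t_a, k_b, t_b):
    """Recover (t0, t_incr) from two delays measured at two different loads."""
    t_incr = (t_b - t_a) / (k_b - k_a)
    return t_a - k_a * t_incr, t_incr


# Hypothetical values in picoseconds (integers avoid rounding noise).
T0_PS = 400
T_INCR_PS = 50

delays_ps = [transition_delay(T0_PS, T_INCR_PS, k) for k in range(5)]
steps_ps = [b - a for a, b in zip(delays_ps, delays_ps[1:])]
print(delays_ps)  # one delay per load case K = 0..4
print(steps_ps)   # constant step between neighbouring load cases

# Two load cases are enough to determine both parameters of the linear model.
print(fit_delay_parameters(1, delays_ps[1], 4, delays_ps[4]))
```

Because the model is linear in K, only two circuit simulations per transition are strictly needed to fix $t_{0,i}$ and $t_{incr}$; additional load cases can be used to check the linearity assumption.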



Figure 1: Delays for all possible input transitions of an adder with different loads. The values are derived from SPICE simulations of extracted layouts.

## 3 POWER MODEL

In the previous section, the data dependency of the delay has been discussed. The power consumption of a digital CMOS circuit depends on the input data as well. In previous works, e.g. [8], it is assumed that every transition causes the same power dissipation in a cell, so that only an average value of the power dissipation is determined for each cell. This leads to usable results only if all input data are equally distributed. The power model presented here considers the power consumption of all transitions separately. All necessary parameters are extracted from circuit-level simulations of the basic cells. Once the parameters have been extracted, they are included in the model descriptions of the cells alongside the parameters of the delay model. As for the delay model, all possible transitions at the cell inputs are considered in the circuit simulation. Thus, a specific power consumption value can be determined for every transition.

The power consumption $P_i$ for a transition *i* of a cell can be split into two parts. The first is the power $P_{0,i} = f C_{0,i} V_{dd}^2$ consumed for charging the internal nodes, where $C_{0,i}$ is the internal capacitance that has to be charged for transition *i*. The second is the power $P_{L,i} = f C_{L,i} V_{dd}^2$ consumed for charging the load capacitance $C_{L,i}$ at transition *i*. Both $C_{0,i}$ and $C_{L,i}$ can be zero when no internal or load capacitances are charged at transition *i*. Together this gives:

$$P_i = P_{0,i} + f C_{L,i} V_{dd}^2 \quad , \tag{2}$$

where $V_{dd}$ is the supply voltage and f is the clock frequency. As with the delay, the contribution of the load to the power consumption is additive. The load capacitance in a CMOS circuit can be assumed to take only discrete values (multiples of minimum-sized gate capacitances). Then (2) can be formulated as follows:

$$P_i = P_{0,i} + K \cdot P_{incr} \quad , \tag{3}$$

where  $P_{incr}$  is the power increment per gate and K is the number of gates to be charged. This power model is included in the VHDL description of the basic cells.
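The relation between (2) and (3) can be sketched in Python as follows. The function names, the `output_switches` flag and all numeric values (100 MHz, 2 fF per load gate, 2.5 V, 1 µW internal power) are assumptions chosen for illustration and are not taken from the paper.

```python
def load_power_increment(f_hz, c_gate_f, vdd_v):
    """Power increment per load gate: P_incr = f * C_gate * Vdd^2 (in watts)."""
    return f_hz * c_gate_f * vdd_v ** 2


def transition_power(p0_w, p_incr_w, k, output_switches=True):
    """Power of one transition according to (3): P_i = P_0 + K * P_incr.

    If the transition does not charge the load (C_L,i = 0 in (2)),
    only the internal part P_0 remains.
    """
    load_part = k * p_incr_w if output_switches else 0.0
    return p0_w + load_part


# Hypothetical operating point: 100 MHz, 2 fF per load gate, 2.5 V supply.
P_INCR_W = load_power_increment(100e6, 2e-15, 2.5)
P0_W = 1.0e-6  # hypothetical internal power of one transition

for k in range(5):
    print(k, transition_power(P0_W, P_INCR_W, k))

# A transition without an output change is independent of the load.
print(transition_power(P0_W, P_INCR_W, 4, output_switches=False))
```

Keeping the load-dependent part separate from $P_{0,i}$ means that one set of extracted parameters covers every fanout situation in the circuit.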

Fig. 2 shows the power consumption for all possible input transitions of a CMOS adder loaded with another k adders, $k = 0, \ldots, 4$. The dots for the power values at a transition are connected by lines for each of the different load cases. Input transitions where the power consumption is equal for all load cases do not lead to a transition at the output. Therefore, they cause only internal power consumption, which is independent of the load.

Figure 2: Power consumption for all possible input transitions of an adder with different loads. The values are derived from SPICE simulations of extracted layouts.

## 4 MODEL DESCRIPTION IN VHDL

The models derived in Sections 2 and 3 can now be formulated in VHDL. The resulting cell library contains each basic cell in the form of a VHDL file. For power estimation of a circuit built from these basic cells, the VHDL descriptions from this library are used. The decisive advantage of this method is that any conventional VHDL simulator can be used for power estimation; the high accuracy comes solely from the special VHDL models. With this method, parameters extracted at the lowest architectural level by circuit simulation are used for high-level power estimation. The accuracy therefore depends on the circuit simulator used for parameter extraction.

In each cell description the different transitions are distinguished by *IF* - *ELSIF* constructs. For every condition, the corresponding delay is calculated according to (1) and then used in a "TRANSPORT ... AFTER" signal assignment. A special construct is used to avoid the propagation of glitches that are shorter than a certain threshold. Thus, the effect that very short glitches vanish after a chain of a certain number of gates is taken into account. The power value for each transition is calculated according to (3) and added to the total power consumption of the cell. The general form of an architecture block in the VHDL description of a basic cell is given in the following:

    ARCHITECTURE cell_arc OF cell IS
    BEGIN
      PROCESS
      BEGIN
        IF transition(i = j) THEN
          Y <= TRANSPORT val AFTER (t_{0,i} + t_{incr} * K);
          P_i := P_{0,i} + P_{incr} * K;
        ELSIF transition(i = j + 1) THEN
          ...
        END IF;
        P_cell := P_cell + P_i;
      END PROCESS;
    END cell_arc;

Y is the output of the cell and *val* is the output value after transition *i*. To determine the load at the output, every cell description includes an additional output constant that measures the input capacitance of the cell (the load that the cell presents to the preceding cell) and an additional vector-valued input variable. The input loads (i.e. the corresponding numbers of gates) of the cells in the fanout are written into this vector-valued input variable, one per vector element. The value of K is then calculated as the sum of all vector elements.
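The structure of such a cell description can be mimicked in Python as a table lookup, which may help to see how the fanout vector, K and the per-transition parameters interact. This is only an illustrative stand-in for the VHDL models, not the authors' code: the class and field names are invented, the glitch filter is omitted, and the two NAND entries use hypothetical numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionEntry:
    output: int            # output value after the transition ("val")
    t0_ps: int             # unloaded delay t_{0,i}; unused if output is stable
    p0_uw: float           # internal power P_{0,i}
    output_switches: bool  # False: no load-dependent delay or power


class CellModel:
    """Table-driven stand-in for one VHDL cell description (glitch filter omitted)."""

    def __init__(self, table, t_incr_ps, p_incr_uw, input_load=1):
        self.table = table            # maps (before, after) to TransitionEntry
        self.t_incr_ps = t_incr_ps
        self.p_incr_uw = p_incr_uw
        self.input_load = input_load  # load this cell presents to its driver
        self.p_cell_uw = 0.0          # accumulated power, like P_cell

    def apply(self, before, after, fanout_loads):
        """Return (output, delay_ps) and add the transition power to P_cell."""
        k = sum(fanout_loads)  # K is the sum of all fanout vector elements
        entry = self.table[(before, after)]
        if entry.output_switches:
            delay = entry.t0_ps + k * self.t_incr_ps
            power = entry.p0_uw + k * self.p_incr_uw
        else:
            delay = 0
            power = entry.p0_uw
        self.p_cell_uw += power
        return entry.output, delay


# Two entries of a two-input NAND table with hypothetical parameters.
nand_table = {
    ((1, 1), (0, 0)): TransitionEntry(1, 300, 1.0, True),
    ((0, 0), (0, 1)): TransitionEntry(1, 0, 0.2, False),
}
nand = CellModel(nand_table, t_incr_ps=50, p_incr_uw=0.5)

fanout = [1, 1, 2]  # input loads of the three driven cells, so K = 4
first = nand.apply((1, 1), (0, 0), fanout)
second = nand.apply((0, 0), (0, 1), fanout)
print(first, second, nand.p_cell_uw)
```

A dictionary lookup plays the role of the IF/ELSIF chain here; in both cases one entry per transition (or per group of transitions) holds the extracted parameters.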

## 5 HIGH-LEVEL POWER ESTIMATION

After the cell library is constructed with the model descriptions of all necessary basic cells in VHDL, the first phase of the DCM (device level based cell modeling) power estimation is completed. This phase has to be executed only once. The final power estimation of circuits built up from the basic cells in the library is carried out in a second phase. The circuits are also described in VHDL. The basic step is an event driven VHDL-simulation. The total power consumption using the DCM method is then calculated by:

$$P_{DCM} = \frac{1}{N} \sum_{j=1}^{M} \sum_{i=1}^{S_j} P_{j,i} \quad , \tag{4}$$

where N is the number of clock periods, M is the number of cells in the circuit,  $S_j$  is the number of transitions at the input of cell j during simulation, and  $P_{j,i}$  is the power consumption of cell j for transition i. Fig. 3 gives an overview of the proposed high-level power estimation method.
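Equation (4) amounts to a double sum followed by a division, as the following Python sketch shows. The function name, the cell names and the power values are hypothetical; in the actual flow the $P_{j,i}$ values would be accumulated by the VHDL simulation.

```python
def dcm_power(transition_powers_per_cell, n_clock_periods):
    """Evaluate (4): sum P_{j,i} over all cells j and transitions i, divide by N."""
    if n_clock_periods <= 0:
        raise ValueError("number of clock periods N must be positive")
    total = sum(sum(powers) for powers in transition_powers_per_cell)
    return total / n_clock_periods


# Hypothetical per-transition powers recorded for two cells during simulation.
recorded_uw = {
    "nand_1": [3.0, 0.2],
    "adder_1": [4.5, 4.5, 1.0],
}
p_dcm_uw = dcm_power(recorded_uw.values(), n_clock_periods=4)
print(p_dcm_uw)
```

Since each cell only has to keep a running total, the summation adds almost no cost to the event-driven simulation itself.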



Figure 3: The DCM power estimation method.

## 6 APPLICATIONS AND RESULTS

The new DCM power estimation method allows a fast comparison between different circuit realizations in terms of power consumption. For large circuits, simulations with SPICE are in most cases too slow for comparing numerous design alternatives in acceptable time. Three multiplier architectures have been chosen for comparison with the new method: a Braun multiplier array, a Wallace tree and a Booth multiplier, each for $16 \times 16$-multiplications. Each multiplier architecture has been realized based on three different full adder types: a symmetric CMOS adder (CMOS), shown in [6], a transmission-gate-logic adder (TG), shown in [2], and a 14-transistor low-power adder cell (T14) presented in [1]. The VHDL description library for the basic cells includes the three full adder types, NAND-, NOR-, XOR-gates, a Booth-encoder, multiplexers, inverters and transmission gates. Table 1 shows the power consumptions and worst-case delays of the different full adders, all with the same output load. A $0.25\,\mu m$ process has been used for all examples.

| Adder | $P$ ($\mu W$) | Delay (ns) |
|-------|---------------|------------|
| CMOS  | 3.39          | 0.89       |
| TG    | 5.23          | 0.84       |
| T14   | 4.07          | 0.78       |

Table 1: Power consumption and delay of the considered full adders determined with SPICE.

Table 2 shows the performance of the DCM method compared to SPICE for $5 \times 5$ bit Braun multipliers, realized with the three different adder types (all computation times in this section are measured on a SUN Ultra Sparc 10). For larger circuits the speed advantage becomes even more significant, as the computational effort for SPICE grows quadratically with the number of nodes.

| $5 \times 5$ Multiplier | $P_{SPICE}$ (mW) | $P_{DCM}$ (mW) | $t_{SPICE}$ (sec) | $t_{DCM}$ (sec) |
|-------------------------|------------------|----------------|-------------------|-----------------|
| 5x5 (CMOS)              | 0.081            | 0.089          | 38398             | 5               |
| 5x5 (TG)                | 0.144            | 0.150          | 25354             | 5               |
| 5x5 (T14)               | 0.141            | 0.145          | 16855             | 5               |

Table 2: Comparison between high-level power estimation and SPICE.

The results for the different multiplier architectures are shown in Table 3. Using the symmetric CMOS adder leads to the lowest power consumption for all multipliers. It is interesting that the "T14" adder gives the worst results for the Booth multiplier and Wallace tree, although the single adder cell consumes less power than the transmission gate adder (see Table 1). The reason for this effect is that the "T14" adder causes more glitching in these multipliers. This example shows that using basic cells with lower power consumption does not necessarily lead to the best solution for a complete circuit. It also depends on the application.

| $16 \times 16$ Multiplier | P (mW) | CPU time (sec) |
|---------------------------|--------|----------------|
| Booth(CMOS)               | 4.12 | 46       |
| Booth(TG)                 | 4.84 | 50       |
| Booth(T14)                | 4.91 | 57       |
| Braun(CMOS)               | 1.99 | 94       |
| Braun(TG)                 | 3.94 | 114      |
| Braun(T14)                | 3.37 | 99       |
| Wallace(CMOS)             | 1.47 | 65       |
| Wallace(TG)               | 2.64 | 71       |
| Wallace(T14)              | 3.45 | 78       |

Table 3: Power consumption of multipliers based on different adders.
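The figures in Tables 2 and 3 can be cross-checked with a few lines of Python. The numbers below are copied from the tables; the helper names are invented for this sketch. Because the table entries are rounded, the speedups and deviations computed this way are only approximate.

```python
# Table 2: adder type -> (P_SPICE in mW, P_DCM in mW, t_SPICE in s, t_DCM in s)
TABLE_2 = {
    "CMOS": (0.081, 0.089, 38398, 5),
    "TG": (0.144, 0.150, 25354, 5),
    "T14": (0.141, 0.145, 16855, 5),
}

# Table 3: architecture -> adder type -> power in mW
TABLE_3 = {
    "Booth": {"CMOS": 4.12, "TG": 4.84, "T14": 4.91},
    "Braun": {"CMOS": 1.99, "TG": 3.94, "T14": 3.37},
    "Wallace": {"CMOS": 1.47, "TG": 2.64, "T14": 3.45},
}


def speedup(row):
    """Ratio of SPICE run time to DCM run time."""
    return row[2] / row[3]


def relative_deviation(row):
    """Deviation of the DCM power estimate relative to the SPICE value."""
    return abs(row[1] - row[0]) / row[0]


def best_adder(powers):
    """Adder type with the lowest power for one multiplier architecture."""
    return min(powers, key=powers.get)


for adder, row in TABLE_2.items():
    print(f"{adder}: speedup {speedup(row):.0f}x, "
          f"deviation {100 * relative_deviation(row):.1f}%")

for architecture, powers in TABLE_3.items():
    print(f"{architecture}: lowest power with {best_adder(powers)}")
```

For the $5 \times 5$ multipliers the speedups computed from Table 2 lie between three and four orders of magnitude, and the symmetric CMOS adder gives the lowest power for every architecture in Table 3.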

## 7 CONCLUSION

In this paper, a high-level modeling method for fast power estimation in digital circuits has been presented. The method is based on VHDL descriptions of the basic cells that accurately model the cells' power and timing behavior. The examples show good accuracy, within 9% of SPICE. The method is well suited for comparing different circuit implementations in terms of power consumption and allows fast decisions in the design optimization process.

## References

- [1] E. Abu-Shama and M. A. Bayoumi. A New Cell For Low Power Adders. Proc. of the Int. Midwest Symposium on Circuits and Systems, 1995.
- [2] M. S. Elrabaa, I. S. Abu-Kather, and M. I. Elmasry. Advanced Low-Power Digital Circuit Techniques. *Kluwer Academic Publishers*, 1997.
- [3] A. Ghosh, S. Devadas, K. Keutzer, and J. White. Estimation of Average Switching Activity in Combinational and Sequential Circuits. *Proceedings of the* 29<sup>th</sup> Design Automation Conference, pages 253–259, June 1992.
- [4] L. E. Lucke, J. Lee, and B. Vinnakota. Power Estimation Using Input/Output Transition Analysis (IOTA). Proceedings Int. Symposium on Circuits and Systems, 6:49–52, June 1998.
- [5] R. Marculescu, D. Marculescu, and M. Pedram. Probabilistic Modeling of Dependencies During Switching Activity Analysis. *IEEE Trans. on Computer-Aided Design*, 17(2):73–83, February 1998.
- [6] T. G. Noll. Carry-Save Architectures for High-Speed Digital Signal Processing. *Journal of VLSI Signal Processing*, 3:121–140, 1991.
- [7] C.V. Schimpfle, S. Simon, and J. A. Nossek. Device Level Based Cell Modeling for Fast Power Estimation. Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS'99, Orlando, May 1999.
- [8] A. E. Schlegel and T. G. Noll. Switching Activity Optimization in Digital CMOS Circuits. Proceedings of the Int. Symposium on Low Power Electronics and Design, 1996.
- [9] P. H. Schneider. PAPSAS: A Fast Switching Activity Simulator. *Proceedings of PATMOS'95 Workshop*, pages 350–360, October 1995.