3. Choosing the best stage effort


Calculating the best stage effort

According to the theory of logical effort, a path is fastest when every stage bears an equal effort. That is, the product gh (logical effort g times gain h) in the stage delay d = τ × (p + gh) has the same value in every stage. The best stage effort is the value of gh that minimises the delay, and it is used to determine the best number of stages in a logic network.
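As a rough illustration of how an equal stage effort and a stage count follow from a path, the sketch below assumes the standard logical effort relations: the path effort F is the product of the stage efforts, so N equal stages each bear F^(1/N), and the best stage count is approximately ln(F) / ln(ρ) for a best stage effort ρ. The function names and the example numbers are hypothetical, not taken from the patent or the library.

```python
import math

def equal_stage_effort(path_effort, n_stages):
    """Effort borne by each stage when the path effort is split equally."""
    return path_effort ** (1.0 / n_stages)

def best_stage_count(path_effort, rho):
    """Nearest whole number of stages for a given best stage effort rho.

    Uses N ~ ln(F) / ln(rho); at least one stage is always returned.
    """
    return max(1, round(math.log(path_effort) / math.log(rho)))

# Illustrative path effort only (an assumption, not a library value)
F = 64.0
n = best_stage_count(F, rho=4.0)
print(n, equal_stage_effort(F, n))
```

In a real network the stage count is further constrained by the logic function and by whether the path must invert, so the rounded value is only a starting point.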

We label the best stage effort ρ. It occurs when the parasitic delay of an inverter, pinv, equals ρ × (ln(ρ) - 1), or equivalently
pinv + ρ × (1 - ln(ρ)) = 0

If the inverter parasitic delay is 0, i.e. an ideal inverter, then the best stage effort is ρ = e ≈ 2.7. If the inverter parasitic delay is 1 (as it is in the text on logical effort referenced by the patent), then the best stage effort is ρ = 3.6. For a deep sub-micron (DSM) technology such as 0.13um, the inverter parasitic delay is much higher. For the vsclib characterised in 0.13um, the inverter parasitic delay is 3.6, which gives a best stage effort of ρ = 5.4.
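The equation for ρ has no closed-form solution, so it is normally solved numerically. The sketch below is one way to do that, using plain bisection; the function name and tolerance are assumptions made for illustration. It returns e for pinv = 0, about 3.59 for pinv = 1, and roughly 5.3 for pinv = 3.6, which is close to the 5.4 quoted above (the quoted pinv is itself a rounded value).

```python
import math

def best_stage_effort(p_inv, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho >= e by bisection."""
    if p_inv < 0:
        raise ValueError("p_inv must be non-negative")

    def f(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    # f(e) = p_inv >= 0 and f decreases for rho > 1, so the root lies at or above e.
    lo, hi = math.e, 2.0 * math.e
    while f(hi) > 0:          # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p_inv in (0.0, 1.0, 3.6):
    print(p_inv, round(best_stage_effort(p_inv), 2))
```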

Using the best stage effort to calculate fixed gate timing

In the method of the patent, an initial fixed delay is given to each function. This delay is the stage delay at the best stage effort:
d = τ × (p + ρ)
where
τ is the technology time constant, 9.7ps for the 0.13um vsclib;
p is the parasitic delay of each function, obtained by averaging the values over all cells implementing the function (in practice, the different drive strengths);
ρ is the best stage effort, which should be derived from a library analysis, but in the method of the patent is simply set to a fixed value of 3.6. For the 0.13um vsclib, ρ = 5.4. We will see whether using the wrong value of ρ makes any difference.
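As a worked check of the fixed delay formula, the short sketch below evaluates d = τ × (p + ρ) for the inverter iv1, using τ = 9.7ps and p = 3.6 from this section. The function and constant names are hypothetical. Because the tabulated p values are rounded, other cells may differ from their tabulated delays by about 1ps.

```python
TAU_PS = 9.7  # technology time constant for the 0.13um vsclib, from the text

def fixed_delay_ps(p, rho, tau_ps=TAU_PS):
    """Fixed stage delay d = tau * (p + rho), in picoseconds."""
    return tau_ps * (p + rho)

p_iv1 = 3.6  # parasitic delay of the inverter iv1, pin a, from the table

print(round(fixed_delay_ps(p_iv1, 3.6)))  # 70, with the patent's fixed rho
print(round(fixed_delay_ps(p_iv1, 5.4)))  # 87, with rho derived for this library
```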

The parasitic delay, logical effort and the two fixed delays are shown in the table on the right. The fixed delay used in the method described in the patent is in the column with ρ=3.6.

Errors in applying logical effort to non-inverting gates

An advantage of the logical effort theory is the simplification of timing information so that each function has one set of timing numbers. If the standard cell library is sufficiently rich, then the predicted timing will be met by choosing cells which are close to the desired drive strength.

This really only works for single-stage inverting cells. For 2-stage non-inverting cells, the approximation is poor once the gain h, or COUT/CIN, exceeds a value of about 4. The graph on the right shows a good clustering of parasitic delay and logical effort around a mean for the inverting NAND and NOR gates and inverters, but a worse clustering for the non-inverting AND and OR gates and buffers. For these gates, the weak drive strengths have less delay when lightly loaded, shown by a smaller value of parasitic delay, p. But when heavily loaded, the weak drive strengths have a larger delay, as shown by the larger value of logical effort, g.

The nd2ab, or2v0 and or2v4 are all 2-OR gates implemented in different ways. As a result, their values of parasitic delay and logical effort are different. The patent does not teach how to handle this situation. The starting netlist of the 4-bit adder does not have any 2-OR gates, so the problem would only arise if a 2-NOR gate needs to be inverted.
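To make the difference between the three 2-OR implementations concrete, the sketch below evaluates the linear model d = τ × (p + gh) for pin a of each cell, using the p and g values from the table, and finds the gain at which two implementations have equal delay. The function names are hypothetical, and the linear model is itself only approximate for these 2-stage cells at higher gains, as noted above.

```python
TAU_PS = 9.7  # technology time constant for the 0.13um vsclib, from the text

# (p, g) for pin a of each 2-OR implementation, taken from the table
OR2_VARIANTS = {
    "nd2ab": (8.6, 1.1),
    "or2v0": (10.3, 1.0),
    "or2v4": (11.1, 0.7),
}

def delay_ps(p, g, h, tau_ps=TAU_PS):
    """Stage delay d = tau * (p + g*h) in picoseconds for gain h."""
    return tau_ps * (p + g * h)

def crossover_gain(cell_a, cell_b):
    """Gain h at which both cells have equal delay, or None if g is equal."""
    (pa, ga), (pb, gb) = OR2_VARIANTS[cell_a], OR2_VARIANTS[cell_b]
    if ga == gb:
        return None
    return (pb - pa) / (ga - gb)

for h in (1, 4, 8):
    delays = {name: round(delay_ps(p, g, h), 1)
              for name, (p, g) in OR2_VARIANTS.items()}
    print(h, delays)

print(crossover_gain("or2v0", "or2v4"))
```

Under this model the ranking of the three cells changes with the gain, which is why a single averaged (p, g) pair cannot represent all of them well.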

There are also three different types of 2-XOR gate implementation. Here the average values are taken over all the variants, which leads to more substantial errors between the estimated and actual values. For example, there is a 15% difference between the estimated and actual parasitic delay for the worst-case cell, the xor2v1x05. For the 2-XNOR gate, where the different drive strengths all have a similar implementation, the maximum difference between the estimated and actual parasitic delays is 10%.

Logical effort characteristics of 4-bit adder cells

Function  Pin  Parasitic  Logical   Fixed delay  Fixed delay
               delay p    effort g  ρ=3.6, ps    ρ=5.4, ps
an2       a     9.2       0.8       123          141
an2       b     8.7       0.8       119          136
aoi21     a     6.5       2.0        98          115
aoi21     b     6.2       1.9        94          112
aoi21     c     4.6       1.6        79           97
bf1       a     7.8       0.6       110          128
cgi2      a     7.1       3.4       103          120
cgi2      b     6.7       3.6       100          117
cgi2      c     5.6       2.0        89          107
iv1       a     3.6       1.0        70           87
nd2       a     4.3       1.3        76           94
nd2       b     4.0       1.3        74           91
nd2ab     a     8.6       1.1       118          135
nd2ab     b     8.1       1.1       114          131
nr2       a     5.2       1.7        85          103
nr2       b     4.3       1.7        77           94
oai21     a     6.3       1.9        96          114
oai21     b     5.4       1.9        87          104
oai21     c     4.6       1.3        79           96
or2v0     a    10.3       1.0       135          152
or2v0     b     9.3       1.0       124          142
or2v4     a    11.1       0.7       142          160
or2v4     b    10.4       0.7       135          153
xnr2      a     9.5       1.7       126          144
xnr2      b     8.3       2.7       115          133
xor2      a     9.3       1.7       125          142
xor2      b     7.6       2.1       109          126
[Graph: values of p and g]