| UP PREV NEXT |
| Initial circuit | Timing overview | Best stage effort | Library mapping | Gate retiming | Input buffering | Better accuracy | Prior art | Summary | Conclusions | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Prior art from Logical Effort bookThe book Logical Effort, Designing Fast CMOS Circuits by Sutherland, Sproull and Harris published in 1999 splits a gate's delay into two parts: a fixed delay p, called parasitic or Prop delay; and a load dependent delay f called the effort delay. These delays are made independent of the technology by finding the technology time constant τ. Then the delay d of a gate is given byd = τ ( p+f ) For a path with more than one gate, the most efficient implementation occurs when the effort delay f is the same for each gate. The critical path is the one where this value of f is the smallest. Actually this is only true when there is no branching. When there is branching the stage effort should obey the relationship
g1h1 =
b2g2h2 ;
g2h2 =
b3g3h3 ; …
gi-1hi-1 =
bigihi ; from which To find the critical path then,
All the gates on the critical path then take this value of stage effort f. Other paths which also include these gates have their effort delay DF reduced by the value of f. 4-bit adder critical paths exampleThe table below shows the calculation for the critical paths. The path from a(1) to s(3) has the highest total parasitic delay P, the smallest effort f per stage and is the most critical. An effort f=3.1 is applied to all four instances along the path, which are highlighted.
These four instances now have their stage efforts fixed. The next path from b(1) to s(3) has only one instance, b1n whose stage effort is not fixed. The effort delay DF has the three known stage efforts subtracted to give a value of 3.4 which is applied to the instance b1n. The third path from b(1) to s(2) is the next most critical. The stage effort of instance b1n is fixed and subtracted to give the effort delay DF of 11.2. This is divided equally over the three highlighted instances c1, c1n and s2 as a stage effort of 3.7. The algorithm continues until the least critical path from a(3) to s(4) where a stage effort of 23.5 is applied to instance nd3. The stage efforts of all the cells is shown in the top schematic on the right Fig 8a. The critical path is the one with the lowest stage effort. Converting stage efforts to delayThe stage effort is f = gh where g is the logical effort and h is the electrical effort. The logical effort comes from the library analysis (eg see Figure 5.11 in the Logical Effort book), and the electrical effort h = COUT /CIN is used to calculate the input pin capacitance of each gate.The cell with the next highest pin capacitance is chosen instead of the nearest cell. This leads to an overbuffered circuit, so to compensate the load capacitance is reduced by 20% to 28fF. This isn't very nice, as this kind of arbitrary allocation of margin can work for one circuit but not for another, and generally needs to be calibrated. But here we are just looking for a mapping that works as a demonstration of prior art, rather than the best mapping which is an exercise in its own right. Take instance s4 as an example. The output load is 28fF and the input pin capacitances are calculated from the table below. The library cell with the next largest pin capacitances is then chosen, in this case an oai21v0x3. The CIN values are the ones adjusted for variations in logical effort between the cells as described in the section "Library analysis to derive the typical load value CS" on the Mapping to library cells using the patent algorithm page.
Working backwards from the outputs, each cell's load and ideal input capacitance can be found and used to select the right drive strength. The only exception, as before is to limit the gain or electrical effort h on non-inverting gates to 5. The result is shown in the second schematic on the right, Fig 8b. There are four differences to the schematic produced by the patent algorithm (Fig 5c).
The prior art gives a better netlist. This might just be the result for this circuit, but in principle this should be the case because the cell selection is done directly using the input pin capacitance instead of indirectly through area estimates. Buffered adder exampleThe stage efforts for the 4-bit adder with input buffers are shown in the third schematic on the right Fig 8c. These are used to derive input pin capacitances, then the nearest library cell and then the delay using the cell timing (Fig 8d). The differences between the three algorithms are shown in the table below.
Similarities and differences to the patent algorithmThe theory of logical effort separates a path delay D into parasitic delay P and effort delay DF (see for example Table 1.4 from the Logical Effort book). The parasitic delay is a function of each gate type and is fixed. The effort delay is a function of the ratio COUT /CIN for each gate and is a minimum when the effort f=gh is the same for each gate. If the path has N stages, then f=DF /N and DF=D-P.The patent algorithm also allocates the effort equally to each logic stage, but without calculating the path parasitic delay P or effort delay DF. Instead an arbitrary effort of 3.6 is assigned to each logic stage and used to calculate each critical path. If a path is too slow or too fast, then the negative or positive slack is divided equally amongst the gates along the path so that each of them still bears an equal effort. The patent directly links the cell area to its delay by deriving an area coefficient CS. This can allow delay constraints to be met while minimising area, albeit at the risk of a mismatch when selecting the library cell. There is a better match when two area coefficients CS0 and CS1 are used, but this is not considered in the patent. The prior art simply maps the calculated input capacitances for each cell to the nearest one in the library and hence if desired its area. What is new in the patent then is
|
Fig 8a. Adder stage efforts.![]() | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fig 8b. Adder delays with
(i) all instances in same critical path given same stage effort;
(ii) wireload of 6fF per fanout;
(iii) input pin capacitances found by working backwards from the
output capacitance using each cell's stage effort;
(iv) gain limit of 5 used for non-inverting gates;
(v) library cell with next pin capacitance chosen;
(vi) timing from vsclib cells.![]() | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fig 8c. Buffered adder stage efforts.![]() | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fig 8d. Buffered adder delays with
(i) all instances in same critical path given same stage effort;
(ii) wireload of 6fF per fanout;
(iii) input pin capacitances found by working backwards from the
output capacitance using each cell's stage effort;
(iv) gain limit of 5 used for non-inverting gates;
(v) input buffer drive strengths tweaked;
(vi) library cell with next pin capacitance chosen;
(vii) timing from vsclib cells.![]() | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| UP PREV NEXT |
| 6-AUG-05 |