UP PREV NEXT |
Initial circuit | Timing overview | Best stage effort | Library mapping | Gate retiming | Input buffering | Better accuracy | Prior art | Summary | Conclusions |
This page gives a short technical summary of the report. Problem statementThe problem is to size the gates of a 4-bit adder shown on the right to meet a certain timing, input capacitance and output load specification. U.S. Patent number 6,725,438 describes a first method for the gate sizing using a single area coefficient. The second method uses two area coefficients. The third method using the prior art sizes the gates using the input pin capacitance.The methods are tested using the vsclib characterised using a generic 0.13um technology. Single area coefficient algorithmThe patent describes a method for analysing the cells in a standard cell library and deriving a coefficient C_{S} which allows the delay of a cell to be expressed as a function of its load and area.The stage effort of each gate is set to 3.6 to produce an initial circuit timing. This is then adjusted for each critical path in turn. Gates on paths with positive slack have their stage effort reduced by an equal amount. Gates on paths with negative slack have their stage effort increased by an equal amount until all paths have zero slack. Starting from the output, and given each gate's delay, their input capacitance can be calculated using expressions from logical effort theory. The input capacitance plus any wireload capacitance is the output capacitance of previous gates. The area of each cell can be estimated from its delay using the coefficient C_{S} and its load C_{OUT}. This is used to estimate the total circuit area and to select the cell with the nearest area from the library. |
Fig 9a. Initial 4-bit adder circuit
to meet timing of 350ps driving 35fF with input capacitance of 35fF. stage effort f = gC_{OUT} /C_{IN} gate delay d = τ ( p+f ) input capacitance C_{IN} = gC_{OUT} /( d/τ - p ) Definitions are on the Overview of CMOS timing models page. cell area A = C_{OUT} /(C_{S} × ( d/τ - p)) | ||||||||
Two area coefficients algorithmTwo area coefficients C_{S0} and C_{S1} to estimate the area from the delay give a better estimate than a single C_{S} one. |
cell area A = {C_{OUT} /( d/τ -p)-C_{S0} }/C_{S1} | ||||||||
Prior artThe book Logical Effort, Designing Fast CMOS Circuits by Ivan Sutherland, Bob Sproull and David Harris published by Morgan Kaufmann Publishers in 1999 gives the basic logical effort theory and equations.Each path is analysed and its total effort delay found by subtracting the parasitic delay from the path delay. Then the stage effort of each gate is the total effort delay divided by the number of stages. The critical path is the one with the lowest stage effort. The critical paths are evaluated in turn and used to assign stage efforts to each gate. Then working backwards from the outputs, each gate's input capacitance is calculated using the stage effort. The standard cell with the next highest input capacitance is chosen from the library. |
effort delay D_{F} = D-P stage effort = D_{F} /N input capacitance C_{IN} = gC_{OUT} /f | ||||||||
Initial schematic for the patent algorithm and the prior artThe schematic Fig 9b on the left below shows the fixed timing from a stage effort of 3.6 for the 4-bit adder. The schematic Fig 9c on the right below shows the stage efforts of each gate calculated according to the logical effort theory. | |||||||||
Fig 9b. 4-bit adder fixed delays. |
Fig 9c. 4-bit adder stage efforts. | ||||||||
Comprison of patent algorithm to prior artThe patent algorithm actually does the same operations to the circuit as the prior art. As a result the final netlist is nearly the same.What is new in the patent is
| |||||||||
Design considerationsThe method described in the patent does not by itself lead to the right solution for the 4-bit adder optimisation exercise.
| |||||||||
Fig 9d. Buffered 4-bit adder using patent
algorithm with a single area coefficient C_{S}. 86.7 gates |
Fig 9e. Buffered 4-bit adder using prior art. 89.3 gates | ||||||||
The schematic produced by the patent algorithm fails to meet the timing of 350ps. The timing fails also when using two area coefficients for mapping the library cells (Fig 7c). The problem is the undersized instances b0n and s0. The schematic produced by the prior art algorithm meets timing. |
UP PREV NEXT |
6-AUG-05 |