An unoptimised 4-bit adder circuit will be used to test the gate sizing methods of the patent and the prior art.
4-bit adder before gate size optimisation
The initial 4-bit adder circuit is shown in Fig 1a on the right. The structure already has features that should make for a fast circuit, such as the carry generator cells c1/c1a and c2/c2a being placed in parallel. The drive strengths of the cells have not been optimised, however: each cell has the largest drive strength that uses unfolded transistors. The outputs are loaded with 35fF. The targets are a maximum input pin capacitance of 35fF and a critical path of 350ps. In the initial circuit, the largest input capacitance is 45fF on pin b(2), and the critical path is 545ps from b(1) to s(4).
Patents describing the optimisation algorithm
This report uses the methods described in the van Ginneken patents, U.S. Patent Nos. 6,453,446 and 6,725,438, to size the gates and produce the fastest adder circuit. A more accurate alternative is also considered, along with the prior art and a manual optimisation. The prior art is based on the book Logical Effort: Designing Fast CMOS Circuits by Ivan Sutherland, Bob Sproull and David Harris, published by Morgan Kaufmann Publishers in 1999.
Standard cell libraries used for the experiment
The cells in the circuit come from the vsclib, characterised under typical conditions with a generic 0.13um technology.
Summary of initial circuit characteristics
Fig 1a on the right shows the path delays for each gate, calculated using a simple Prop-Ramp model with a 6fF wireload. The Prop delay is the average of the fixed rise and fall delays at zero load. The Ramp delay is the product of the drive strength and the load capacitance, averaged over the rise and fall transitions. The load capacitance is the sum of the pin capacitances of the driven gates and the wireload, which here is 6fF for each fanout.
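The Prop-Ramp calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the patent or the library: the `Arc` class, the function names and all numeric values are hypothetical, and it assumes the "drive strength" is expressed as a delay slope in ps/fF so that multiplying by the load gives a delay.

```python
from dataclasses import dataclass

# From the text: each fanout adds 6fF of wire capacitance.
WIRELOAD_PER_FANOUT_FF = 6.0


@dataclass
class Arc:
    """Hypothetical timing data for one gate (assumed units: ps and ps/fF)."""
    prop_rise_ps: float
    prop_fall_ps: float
    ramp_rise_ps_per_ff: float
    ramp_fall_ps_per_ff: float


def load_capacitance(pin_caps_ff):
    """Sum of the driven pin capacitances plus 6fF of wire per fanout."""
    return sum(pin_caps_ff) + WIRELOAD_PER_FANOUT_FF * len(pin_caps_ff)


def gate_delay(arc, load_ff):
    """Prop-Ramp delay: average fixed delay plus average load-dependent delay."""
    prop = (arc.prop_rise_ps + arc.prop_fall_ps) / 2
    ramp = (arc.ramp_rise_ps_per_ff + arc.ramp_fall_ps_per_ff) / 2 * load_ff
    return prop + ramp


# Made-up example values, not taken from vsclib.
example_arc = Arc(prop_rise_ps=40.0, prop_fall_ps=60.0,
                  ramp_rise_ps_per_ff=2.0, ramp_fall_ps_per_ff=1.0)
example_load = load_capacitance([10.0, 14.0])  # two driven pins
example_delay = gate_delay(example_arc, example_load)
```

With these made-up values the load is 10 + 14 + 2 × 6 = 36fF, so the gate delay is 50ps of Prop plus 54ps of Ramp.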
The full library delays are modelled with a lookup table (LUT) which varies the Prop and Ramp numbers as a function of the input transition time and output load for both rising and falling transitions. Here we use a simpler timing model as an aid to understanding the underlying concepts. In a full computer program as described in the patent, the detail from the full library timing would be used.
The average of the rise and fall delays is used, except for the XOR and XNOR gates, where the average of the non-inverting rise and fall delays is used. The critical path is highlighted in blue.
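Once each gate has a single delay number, the critical path is the input-to-output path with the largest total delay. The sketch below shows one generic way to find it, a memoised longest-path search over an acyclic netlist. It is not the procedure from the patent; the function name, the netlist and the delays are all hypothetical.

```python
def critical_path(fanout, delay_ps, sources):
    """Return (total_delay_ps, path) for the slowest path from any source.

    fanout:   dict mapping a node to the list of nodes it drives
    delay_ps: dict mapping a gate to its delay; missing nodes count as 0
    sources:  primary inputs from which paths start
    Assumes the netlist is acyclic (purely combinational).
    """
    best = {}

    def longest_from(node):
        if node not in best:
            tails = [longest_from(n) for n in fanout.get(node, [])]
            tail_delay, tail_path = max(tails, default=(0.0, []))
            best[node] = (delay_ps.get(node, 0.0) + tail_delay,
                          [node] + tail_path)
        return best[node]

    return max(longest_from(s) for s in sources)


# Hypothetical three-gate netlist with made-up delays in ps.
example_fanout = {"a": ["g1"], "b": ["g1", "g2"],
                  "g1": ["g3"], "g2": ["g3"], "g3": ["s"]}
example_delays = {"g1": 100.0, "g2": 150.0, "g3": 80.0}
total_ps, path = critical_path(example_fanout, example_delays, ["a", "b"])
```

In this toy netlist the slowest path runs from `b` through `g2` and `g3` to `s`, for a total of 230ps.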
Fig 1a. Initial 4-bit adder circuit with vsclib timing