1. Optimising a 4-bit adder using the method described in the patent


An unoptimised 4-bit adder circuit will be used to test the gate sizing methods of the patent and of the prior art.

4-bit adder before gate size optimisation

The initial 4-bit adder circuit is shown in Fig 1a. The structure already has features that should allow for a fast circuit, such as the carry generator cells c1/c1a and c2/c2a being placed in parallel. However, the drive strengths of the cells have not been optimised: each cell has the largest drive strength that uses unfolded transistors. The outputs are loaded with 35fF, and the targets are a maximum input pin capacitance of 35fF and a critical path of 350ps. In the initial circuit, the largest input capacitance is 45fF on pin b(2), and the critical path is 545ps from b(1) to s(4).

Patents describing the optimisation algorithm

This report will use the methods described in the van Ginneken patents, U.S. patents nos. 6,453,446 and 6,725,438, to size the gates and produce the fastest adder circuit. A more accurate alternative will also be considered, along with the prior art and a manual optimisation. The prior art will be based on the book Logical Effort: Designing Fast CMOS Circuits by Ivan Sutherland, Bob Sproull and David Harris, published by Morgan Kaufmann Publishers in 1999.

Standard cell libraries used for the experiment

The cells in the circuit come from the vsclib library, characterised under typical conditions with a generic 0.13um technology.

Summary of initial circuit characteristics

Fig 1a shows the path delays for each gate, calculated using a simple Prop-Ramp model with a 6fF wireload. The Prop delay is the average of the fixed rise and fall delays with a zero load. The Ramp delay is the average, over the rising and falling transitions, of the product of the drive strength and the load capacitance. The load capacitance is the sum of the pin capacitances of the driven gates and the wireload, which here is 6fF for each fanout.
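The Prop-Ramp calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Ramp coefficient is assumed to be expressed in ps per fF, and the numeric values are placeholders rather than vsclib data.

```python
WIRELOAD_FF = 6.0  # wireload per fanout, as stated in the text


def load_capacitance(pin_caps_ff, wireload_ff=WIRELOAD_FF):
    """Sum of the driven pin capacitances plus one wireload per fanout (fF)."""
    return sum(pin_caps_ff) + wireload_ff * len(pin_caps_ff)


def gate_delay(prop_rise, prop_fall, ramp_rise, ramp_fall, load_ff):
    """Prop-Ramp delay in ps.

    prop_*: zero-load delays in ps (assumed units).
    ramp_*: load-dependent coefficients in ps/fF (assumed units).
    """
    prop = (prop_rise + prop_fall) / 2.0
    ramp = (ramp_rise * load_ff + ramp_fall * load_ff) / 2.0
    return prop + ramp


# Placeholder numbers, NOT taken from the vsclib library:
load = load_capacitance([10.0, 12.0])           # two fanouts
delay = gate_delay(40.0, 50.0, 2.0, 3.0, load)  # average rise/fall delay
```

With these placeholder values the load is 10 + 12 + 2 x 6 = 34fF and the delay is 45 + 85 = 130ps.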

The full library delays are modelled with a lookup table (LUT) which varies the Prop and Ramp numbers as a function of the input transition time and output load for both rising and falling transitions. Here we use a simpler timing model as an aid to understanding the underlying concepts. In a full computer program as described in the patent, the detail from the full library timing would be used.
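For comparison, a lookup-table delay can be read roughly as follows. This is a minimal sketch that assumes bilinear interpolation between table points, which is a common way of reading such tables but is not stated in the source; the function names, axes and table values are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i such that axis[i], axis[i+1] bracket x, clamped to the table."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lut_delay(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew_index][load_index].

    Values outside the axes are extrapolated linearly from the nearest cell.
    """
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    d0 = table[i][j] + tl * (table[i][j + 1] - table[i][j])
    d1 = table[i + 1][j] + tl * (table[i + 1][j + 1] - table[i + 1][j])
    return d0 + ts * (d1 - d0)


# Placeholder 2x2 table (ps), NOT real library data:
SLEWS = [0.0, 100.0]   # input transition time, ps
LOADS = [0.0, 40.0]    # output load, fF
TABLE = [[50.0, 130.0],
         [70.0, 150.0]]

mid = lut_delay(SLEWS, LOADS, TABLE, 50.0, 20.0)
```

A full library would hold one such table per timing arc, for rising and falling transitions separately.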

The average of the rise and fall delays is used except for the XOR and XNOR gates where the average of the non-inverting rise and fall delays is used. The critical path is highlighted in blue.
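Once each gate has a single delay number, the critical path is the longest path through the circuit. The sketch below shows one way to find it, assuming the circuit is acyclic. The netlist, gate names and delays are hypothetical and do not correspond to the adder in Fig 1a.

```python
from functools import lru_cache

# Hypothetical netlist: gate -> (delay in ps, list of driving gates).
NETLIST = {
    "g1": (100.0, []),
    "g2": (80.0, ["g1"]),
    "g3": (120.0, ["g1"]),
    "g4": (90.0, ["g2", "g3"]),
}


@lru_cache(maxsize=None)
def arrival(gate):
    """Latest arrival time at the output of `gate`, in ps."""
    delay, fanin = NETLIST[gate]
    return delay + max((arrival(g) for g in fanin), default=0.0)


def critical_path(end):
    """Trace back from `end` through the latest-arriving driver at each step."""
    path = [end]
    while NETLIST[path[-1]][1]:
        path.append(max(NETLIST[path[-1]][1], key=arrival))
    return list(reversed(path))


worst = arrival("g4")
path = critical_path("g4")
```

Here the path through g3 is slower than the path through g2, so the trace-back selects g1, g3, g4.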

Fig 1a. Initial 4-bit adder circuit with vsclib timing

4-bit adder delays and netlist drive strengths
critical path 545ps
gate count 40.3
c0 x2
c0a x2
s0n x1
s0 x2
a1n x2
b1n x2
xn1 x1
s1 x1
c1 x1
c1a x1
c1n x2
xn2 x1
s2 x1
c2 x1
c2a x1
xn3 x1
s3 x1
nd3 x2
nr3 x1
s4 x1