
Buffering the Inputs

  
gate count                     1758
number of cells                 563
number of library cells         188
number of used cells             55
max fanin                         4
max input capacitance            94 fF
max internal fanout              34
critical path (0 fF)           2142 ps
critical path (6 fF)           2441 ps

Buffering the inputs properly is quite difficult with BOOG and LOON. The flow used here is described at the bottom of this page.

When the inputs are buffered, there is a speed improvement of 2.2% and an area increase of 8.0% compared to the netlist without any buffers. A benefit of the buffers is the reduction in maximum fanin from 17 down to 4, and a consequent reduction in maximum input capacitance from 214fF down to 94fF.

Four of the nine available buffers are used.

TOTAL  bf1v5x05  bf1v0x1  bf1v4x1  bf1v0x2  bf1v0x3  bf1v0x4  bf1v0x6  bf1v0x8  bf1v0x12
   19         0        0        1        0        0        6        0        1        11

The 18 high-drive buffers (bf1v0x4, bf1v0x8 and bf1v0x12) buffer the inputs. The single bf1v4x1 is inserted during the decoupling-buffer step.

Synthesis 9 was the first to use a reasonably complete library of 70 cells with three drive strengths, together with a synthesis flow that started with the weakest drive strengths and buffered up where required. With the library now extended to 188 cells, and to seven drive strengths for some functions, the result is an 18% speed improvement (critical path down from 2960 ps to 2441 ps) at the cost of a 19% area increase (gate count up from 1476 to 1758).
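The two percentages can be checked against the results table with a few lines of shell. This is a minimal sketch: the helper name pct_change is hypothetical and not part of the synthesis flow, and awk is used only because plain shell arithmetic is integer-only.

```shell
# pct_change OLD NEW: print (NEW - OLD) / OLD as a percentage, one decimal place.
# pct_change is a hypothetical helper, not part of the synthesis flow.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# Synthesis 9 against synthesis 21, values taken from the results table.
pct_change 2960 2441   # critical path (ps): negative means faster
pct_change 1476 1758   # gate count: positive means larger
```

The first call prints -17.5 and the second 19.1, which round to the 18% and 19% quoted above. The 18% is the reduction in critical-path delay; expressed as an increase in maximum clock frequency (2960/2441) the gain would be about 21%.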

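The critical path is shown below. The columns are the stage number, the cell, its fanout, the timing arc, the arrival time and the stage delay, both times in ps. The path starts at input x 1 and ends at output r 15.

 #  cell     fanout  arc   arrival delay
    x 1           3                   61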
 1  bf1v0x12     15  a->z      208   147
 2  nd4v0x3       1  d->z      311   103
 3  oai21v0x8     4  b->z      392    81
 4  xor2v0x4      1  b->z      473    81
 5  cgi2v0x3      3  c->z      578   105
 6  iv1v0x6       1  a->z      627    49
 7  cgi2v0x3      3  c->z      730   103
 8  iv1v0x6       1  a->z      779    49
 9  cgi2v0x3      3  c->z      889   110
10  iv1v0x6       1  a->z      938    49
11  cgi2v0x3      3  c->z     1055   117
12  iv1v0x6       1  a->z     1104    49
13  cgi2v0x3      3  c->z     1209   105
14  iv1v0x6       1  a->z     1258    49
15  cgi2v0x3      3  c->z     1361   103
16  iv1v0x6       1  a->z     1410    49
17  cgi2v0x3      4  c->z     1529   119
18  xnr2v0x3      1  a->z     1633   104
19  xor2v0x4      1  b->z     1722    89
20  cgi2v0x3      2  a->z     1817    95
21  iv1v0x4       1  a->z     1871    54
22  cgi2v0x3      2  c->z     1960    89
23  iv1v0x6       1  a->z     2009    49
24  cgi2v0x3      2  c->z     2090    81
25  an2v0x8       2  b->z     2194   104
26  an2v0x8       2  b->z     2304   110
27  xaon21v0x3    0  a2->z    2441   137
    r 15
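In the listing, each arrival time is the previous arrival time plus that stage's delay, starting from the 61 ps at input x 1. The sketch below checks this relation for all 27 stages. The function names check_path and path_rows are hypothetical, and the column order (stage, cell, fanout, arc, arrival, delay) is read from the listing.

```shell
# check_path START: read "stage cell fanout arc arrival delay" rows on stdin and
# verify that each arrival equals the previous arrival plus that stage's delay.
# Prints "ok" (exit 0), or one line per mismatching stage (exit 1).
check_path() {
  awk -v start="$1" '
    BEGIN { prev = start }
    {
      expected = prev + $6
      if ($5 != expected) {
        printf "stage %s: arrival %s, expected %s\n", $1, $5, expected
        bad = 1
      }
      prev = $5
    }
    END { if (bad) exit 1; print "ok" }'
}

# path_rows: the 27 stages of the critical path listed above.
path_rows() {
  cat <<'EOF'
 1  bf1v0x12     15  a->z      208   147
 2  nd4v0x3       1  d->z      311   103
 3  oai21v0x8     4  b->z      392    81
 4  xor2v0x4      1  b->z      473    81
 5  cgi2v0x3      3  c->z      578   105
 6  iv1v0x6       1  a->z      627    49
 7  cgi2v0x3      3  c->z      730   103
 8  iv1v0x6       1  a->z      779    49
 9  cgi2v0x3      3  c->z      889   110
10  iv1v0x6       1  a->z      938    49
11  cgi2v0x3      3  c->z     1055   117
12  iv1v0x6       1  a->z     1104    49
13  cgi2v0x3      3  c->z     1209   105
14  iv1v0x6       1  a->z     1258    49
15  cgi2v0x3      3  c->z     1361   103
16  iv1v0x6       1  a->z     1410    49
17  cgi2v0x3      4  c->z     1529   119
18  xnr2v0x3      1  a->z     1633   104
19  xor2v0x4      1  b->z     1722    89
20  cgi2v0x3      2  a->z     1817    95
21  iv1v0x4       1  a->z     1871    54
22  cgi2v0x3      2  c->z     1960    89
23  iv1v0x6       1  a->z     2009    49
24  cgi2v0x3      2  c->z     2090    81
25  an2v0x8       2  b->z     2194   104
26  an2v0x8       2  b->z     2304   110
27  xaon21v0x3    0  a2->z    2441   137
EOF
}

path_rows | check_path 61
```

With the values above every stage matches, so the last line prints ok and the final arrival time is the 2441 ps critical path.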

The next experiment will check the library results using a "standard" Alliance synthesis flow.

Table of synthesis results

                critical   gate   cell            library   used
               path (ps)  count  count  porosity    cells  cells  change
synthesis 1         4279   1561    923       43%        9      8  basic inverters, NAND & NOR gates
synthesis 2         4236   1472    792       45%       15     12  AND & OR gates
synthesis 3         4157   1357    696       46%       19     16  AOI & OAI gates, 2/1 and 2/2
synthesis 4         4157   1357    696       46%       20     16  mxi2 2-way inverting mux
synthesis 5         3983   1343    668       48%       21     16  cgi2 carry generator inverting
synthesis 6         3948   1352    668       48%       28     18  inverters with multiple drive strengths
synthesis 7         3061   1433    666       51%       70     27  x2 drive strengths for all functions
synthesis 8         3056   1456    666       52%       70     30  BOOG with x1 drive strengths
synthesis 9         2960   1476    666       53%       70     32  BOOG with x05 drive strengths
synthesis 10        2963   1480    666       53%       76     34  nd2a and nr2a cells
synthesis 11        2963   1480    666       53%       79     34  nd2ab type of 2-OR
CyHP library        3778   1539    832       46%       18     17  Minimum size library
synthesis 12        2908   1362    553       54%       91     38  AND/OR into XOR/XNOR
synthesis 13        2893   1378    551       55%      103     39  aoi211, aoi31, oai211 & oai31
synthesis 14        2931   1400    562       55%      104     38  3-XOR gate, 1/2 stage delays
synthesis 15        2886   1390    536       56%      109     40  3-XOR/XNOR gates as 2×2-I/P gates
synthesis 16        2665   1514    538       60%      136     46  x3 drive strength cells
synthesis 17        2567   1571    540       61%      155     49  x4 drive strength cells
synthesis 18        2523   1611    540       62%      167     49  x6 drive strength cells
synthesis 19        2497   1625    538       62%      179     54  x8 drive strength cells
synthesis 20        2493   1628    541       62%      188     55  buffers to decouple non-critical paths
synthesis 21        2441   1758    563       64%      188     55  input buffers

BOOG will not insert any input buffers, so they must be added by LOON. However, LOON only inserts an input buffer if the input is on the critical path. Each input is therefore forced in turn to be the critical-path input by giving it a very high input resistance while all the other inputs have zero input resistance.

This, however, creates a problem: the gates between the forced input (which might not be on the real critical path) and the slowest output are also on the critical path, and LOON sizes them up as needed to speed that path up. Once they have been sized up, they are not down-sized later, even when that would improve the real critical path, so the real critical path can be slowed down by over-sized gates driving non-critical outputs.

To get around this problem:

  1. The input buffers are added immediately after BOOG, using the BOOG library. This library has only one drive strength per gate, so gates cannot be wrongly up-sized.
  2. A special buffer called tempbf1 is added to the BOOG library. This buffer has a poor drive strength and a high input capacitance, so it is never selected to decouple a critical path. Its timing is made symmetrical (identical rise and fall values) so that its average delay is easy to calculate:

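    ENTITY tempbf1 IS
    GENERIC (
      CONSTANT area          : NATURAL := 4608;
      CONSTANT cin_a         : NATURAL := 30;     -- input capacitance, fF
      CONSTANT rdown_a_z     : NATURAL := 10000;  -- output resistance, ohms
      CONSTANT rup_a_z       : NATURAL := 10000;  -- equal to rdown: symmetrical
      CONSTANT tpll_a_z      : NATURAL := 80;     -- intrinsic delay, ps
      CONSTANT tphh_a_z      : NATURAL := 80;     -- equal to tpll: symmetrical
      CONSTANT transistors   : NATURAL := 4
    );
    -- PORT clause and END tempbf1; not shown in this excerpt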

  3. The input resistance is calculated for each input from that input's pin capacitance and from the pin capacitance and drive strength of the tempbf1 buffer. Adding an input buffer becomes faster than not having one beyond the point where the delay through the input resistance alone equals the delay with the buffer inserted:
    input_resistance×pin_capacitance = input_resistance×buffer_pin_capacitance + Prop + Ramp×pin_capacitance
    where
    Prop = 80 ps (the buffer's tpll/tphh delay)
    Ramp = 10 kΩ (the buffer's rup/rdown output resistance)
    buffer_pin_capacitance = 30 fF (the buffer's cin_a)
    Solving for the input resistance gives, as an example for an input pin capacitance of 200 fF (with kΩ×fF = ps),
    input_resistance = (Prop + Ramp×pin_capacitance)/(pin_capacitance - buffer_pin_capacitance)
    input_resistance = (80 + 10×200)/(200 - 30)
    input_resistance = 12.235 kΩ
  4. At this input resistance, inserting tempbf1 gives exactly the same delay as not inserting it, so LOON sometimes inserts the buffer and sometimes does not. To make sure it always does, the Ramp value used in the equation above is increased from 10 kΩ to 15 kΩ, which raises the input resistance to
    input_resistance = (80 + 15×200)/(200 - 30) = 18.118 kΩ
  5. Once all the inputs have been buffered, cell tempbf1 (which does not exist in the real library) is replaced by a real buffer, in this experiment the bf1v0x4. Later LOON synthesis steps increase the size of this buffer if it is on the critical path; otherwise it stays at x4 drive strength. The basic steps are shown in the script excerpt below. The script make_lax calculates the input resistance for each input using a Prop delay of 80 ps, a Ramp resistance of 15 kΩ (passed as 15000), a buffer pin capacitance of 30 fF, and the input pin capacitance taken from the multi8_a.xsc file.

    # file CATAL in directory ../vsclib013_6 contains cell tempbf1
    # input file to boog is multi8_a.vbe
    export MBK_CATA_LIB=../vsclib013_6
    export MBK_TARGET_LIB=../vsclib013_b
    # map with BOOG, then run a first LOON pass using loon0.lax
    boog -x 0 -m 3 multi8_a
    x2y vst vst multi8_a multi8_b
    loon -x 0 -l loon0 multi8_b multi8_a
    # make each of the 16 inputs x(7..0) and y(7..0) the critical input in turn
    for bit in 7 6 5 4 3 2 1 0
    do
      for operand in x y
      do
        # make_lax writes loon4xy.lax, giving only this input an input resistance
        ./make_lax multi8_a "${operand}(${bit})" 80 15000 30 loon4xy 2 0
        loon -x 0 -l loon4xy multi8_a multi8_0
        # copy the result back to multi8_a for the next input
        x2y vst vst multi8_0 multi8_a
      done
    done
    # file CATAL in directory ../vsclib013_0 does not contain tempbf1
    export MBK_CATA_LIB=../vsclib013_0
    loon -x 0 -m 1 multi8_a multi8

    For pin y(0), the file loon4xy.lax looks like this:
    #M{2}
    #I{
    y(0):24298;
    }

    where the #M line sets the LOON priority, which must be 2, 3 or 4 for any buffer insertion to take place, and the #I block gives pin y(0) an input resistance of 24298 Ω (24.298 kΩ) while the other inputs have none.
    This LAX file is created in turn for each input: make_lax extracts the input pin capacitance from the xsc file created by Alliance and uses it to calculate the input resistance.
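The resistance calculation in steps 3 to 5 can be sketched in shell as below. This is not the make_lax script used here, whose source is not shown: the function names break_even_kohm and write_lax are hypothetical, the pin capacitance is passed in directly instead of being read from multi8_a.xsc, and Ramp is given in kΩ rather than the 15000 Ω used on the make_lax command line. Capacitances are in fF, Prop in ps and Ramp in kΩ, so that kΩ×fF = ps.

```shell
# break_even_kohm CPIN PROP RAMP CBUF
# Solve R*CPIN = R*CBUF + PROP + RAMP*CPIN for the input resistance R (kOhm),
# with CPIN/CBUF in fF, PROP in ps and RAMP in kOhm (kOhm x fF = ps).
break_even_kohm() {
  awk -v c="$1" -v prop="$2" -v ramp="$3" -v cbuf="$4" 'BEGIN {
    if (c <= cbuf) {
      print "pin capacitance must exceed the buffer pin capacitance" > "/dev/stderr"
      exit 1
    }
    printf "%.3f\n", (prop + ramp * c) / (c - cbuf)
  }'
}

# write_lax PIN CPIN PROP RAMP CBUF MODE
# Hypothetical helper: print a LAX file with MODE on the #M line and an input
# resistance in ohms for PIN only, so every other input has none.
write_lax() {
  kohm=$(break_even_kohm "$2" "$3" "$4" "$5") || return 1
  ohms=$(awk -v k="$kohm" 'BEGIN { printf "%d\n", k * 1000 + 0.5 }')
  printf '#M{%s}\n#I{\n%s:%s;\n}\n' "$6" "$1" "$ohms"
}

break_even_kohm 200 80 10 30     # break-even value from step 3
break_even_kohm 200 80 15 30     # with the larger Ramp from step 4
# 200 fF is the worked-example value, not the real capacitance of y(0)
write_lax 'y(0)' 200 80 15 30 2
```

The two break_even_kohm calls print 12.235 and 18.118, the values worked out in steps 3 and 4, and the write_lax call produces a file of the same shape as the loon4xy.lax excerpt, with y(0):18118; in the #I block.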

Doing the buffer insertion with LOON straight after BOOG, with only tempbf1 available, improves the critical path delay from 2493 ps to 2441 ps. If instead the input buffers are inserted at the end of the synthesis flow, gates not on the real critical path are sized up and the critical path delay is only 2500 ps, slightly worse than the 2493 ps achieved with no input buffer insertion.
