| gate count | 1758 |
| number of cells | 563 |
| number of library cells | 188 |
| number of used cells | 55 |
| max fanin | 4 |
| max input capacitance (fF) | 94 |
| max internal fanout | 34 |
| critical path 0fF (ps) | 2142 |
| critical path 6fF (ps) | 2441 |
Buffering the inputs properly is quite difficult with BOOG and LOON. The flow used here is described at the bottom of this page.
When the inputs are buffered, there is a speed improvement of 2.2% and area increase of 8.0% compared to the netlist without any buffers. A benefit of the buffers is the reduction in fanin, from 17 down to 4, and consequent input capacitance reduction from 214fF down to 94fF.
Four out of nine of the available buffers are used.
| TOTAL | bf1v5x05 | bf1v0x1 | bf1v4x1 | bf1v0x2 | bf1v0x3 | bf1v0x4 | bf1v0x6 | bf1v0x8 | bf1v0x12 |
| 19 | 0 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 11 |
The 18 high-drive buffers (bf1v0x4, bf1v0x8 and bf1v0x12) buffer the inputs. The single bf1v4x1 is inserted during the decoupling buffer step.
Synthesis #9 was the first to use a reasonably complete library of 70 cells with three drive strengths and a synthesis flow which started with the weakest drive strengths and buffered up where required. The result now, with a library extended up to 188 cells and for some functions seven drive strengths, has been an 18% speed improvement at the cost of a 19% area increase.
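The two percentages can be reproduced from the table of synthesis results. The sketch below assumes that "speed improvement" means the reduction in critical path delay and that gate count is the area measure; both readings are assumptions, chosen because they match the quoted 18% and 19%. The helper name pct_change is made up for this example.

```shell
#!/bin/sh
# Percentage change from an old value ($1) to a new value ($2), one decimal place.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# Figures from the table of synthesis results: synthesis 9 versus synthesis 21.
delay_change=$(pct_change 2960 2441)   # critical path in ps
area_change=$(pct_change 1476 1758)    # gate count, assumed here to stand for area

echo "critical path delay: ${delay_change}%"   # prints -17.5%
echo "gate count: ${area_change}%"             # prints 19.1%
```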
The critical path is shown below.
x 1 3 61
1 bf1v0x12 15 a->z 208 147
2 nd4v0x3 1 d->z 311 103
3 oai21v0x8 4 b->z 392 81
4 xor2v0x4 1 b->z 473 81
5 cgi2v0x3 3 c->z 578 105
6 iv1v0x6 1 a->z 627 49
7 cgi2v0x3 3 c->z 730 103
8 iv1v0x6 1 a->z 779 49
9 cgi2v0x3 3 c->z 889 110
10 iv1v0x6 1 a->z 938 49
11 cgi2v0x3 3 c->z 1055 117
12 iv1v0x6 1 a->z 1104 49
13 cgi2v0x3 3 c->z 1209 105
14 iv1v0x6 1 a->z 1258 49
15 cgi2v0x3 3 c->z 1361 103
16 iv1v0x6 1 a->z 1410 49
17 cgi2v0x3 4 c->z 1529 119
18 xnr2v0x3 1 a->z 1633 104
19 xor2v0x4 1 b->z 1722 89
20 cgi2v0x3 2 a->z 1817 95
21 iv1v0x4 1 a->z 1871 54
22 cgi2v0x3 2 c->z 1960 89
23 iv1v0x6 1 a->z 2009 49
24 cgi2v0x3 2 c->z 2090 81
25 an2v0x8 2 b->z 2194 104
26 an2v0x8 2 b->z 2304 110
27 xaon21v0x3 0 a2->z 2441 137
r 15
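The listing above can be checked for internal consistency with a short script. This is only a sketch, and it assumes a column meaning that the listing does not state: stage number, cell name, a load count, timing arc, cumulative delay in ps and stage delay in ps, with the input x arriving at 61 ps. Under that reading, each cumulative figure should equal the previous one plus the stage delay. The function names path_rows and check_path are made up for this example.

```shell
#!/bin/sh
# The 27 cell rows of the critical path listing, copied verbatim.
path_rows() {
cat <<'EOF'
1 bf1v0x12 15 a->z 208 147
2 nd4v0x3 1 d->z 311 103
3 oai21v0x8 4 b->z 392 81
4 xor2v0x4 1 b->z 473 81
5 cgi2v0x3 3 c->z 578 105
6 iv1v0x6 1 a->z 627 49
7 cgi2v0x3 3 c->z 730 103
8 iv1v0x6 1 a->z 779 49
9 cgi2v0x3 3 c->z 889 110
10 iv1v0x6 1 a->z 938 49
11 cgi2v0x3 3 c->z 1055 117
12 iv1v0x6 1 a->z 1104 49
13 cgi2v0x3 3 c->z 1209 105
14 iv1v0x6 1 a->z 1258 49
15 cgi2v0x3 3 c->z 1361 103
16 iv1v0x6 1 a->z 1410 49
17 cgi2v0x3 4 c->z 1529 119
18 xnr2v0x3 1 a->z 1633 104
19 xor2v0x4 1 b->z 1722 89
20 cgi2v0x3 2 a->z 1817 95
21 iv1v0x4 1 a->z 1871 54
22 cgi2v0x3 2 c->z 1960 89
23 iv1v0x6 1 a->z 2009 49
24 cgi2v0x3 2 c->z 2090 81
25 an2v0x8 2 b->z 2194 104
26 an2v0x8 2 b->z 2304 110
27 xaon21v0x3 0 a2->z 2441 137
EOF
}

# Assumption: column 5 is the cumulative delay and column 6 the stage delay (ps).
# $1 is the arrival time at the input of the first stage.
check_path() {
    awk -v start="$1" '
        {
            expected = (NR == 1 ? start : prev) + $6
            if (expected != $5) bad++
            prev = $5
        }
        END { printf "%d stages, %d ps, %d mismatches\n", NR, prev, bad }
    '
}

result=$(path_rows | check_path 61)
echo "$result"   # prints: 27 stages, 2441 ps, 0 mismatches
```

The final figure agrees with the 2441 ps critical path reported for this synthesis run.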
The next experiment will check the library results using a "standard" Alliance synthesis flow.
Table of synthesis results
| | critical path (ps) | gate count | cell count | porosity | library cells | used cells | |
| synthesis 1 | 4279 | 1561 | 923 | 43% | 9 | 8 | basic inverters, NAND & NOR gates |
| synthesis 2 | 4236 | 1472 | 792 | 45% | 15 | 12 | AND & OR gates |
| synthesis 3 | 4157 | 1357 | 696 | 46% | 19 | 16 | AOI & OAI gates, 2/1 and 2/2 |
| synthesis 4 | 4157 | 1357 | 696 | 46% | 20 | 16 | mxi2 2-way inverting mux |
| synthesis 5 | 3983 | 1343 | 668 | 48% | 21 | 16 | cgi2 carry generator inverting |
| synthesis 6 | 3948 | 1352 | 668 | 48% | 28 | 18 | inverters with multiple drive strengths |
| synthesis 7 | 3061 | 1433 | 666 | 51% | 70 | 27 | x2 drive strengths for all functions |
| synthesis 8 | 3056 | 1456 | 666 | 52% | 70 | 30 | BOOG with x1 drive strengths |
| synthesis 9 | 2960 | 1476 | 666 | 53% | 70 | 32 | BOOG with x05 drive strengths |
| synthesis 10 | 2963 | 1480 | 666 | 53% | 76 | 34 | nd2a and nr2a cells |
| synthesis 11 | 2963 | 1480 | 666 | 53% | 79 | 34 | nd2ab type of 2-OR |
| CyHP library | 3778 | 1539 | 832 | 46% | 18 | 17 | Minimum size library |
| synthesis 12 | 2908 | 1362 | 553 | 54% | 91 | 38 | AND/OR into XOR/XNOR |
| synthesis 13 | 2893 | 1378 | 551 | 55% | 103 | 39 | aoi211, aoi31, oai211 & oai31 |
| synthesis 14 | 2931 | 1400 | 562 | 55% | 104 | 38 | 3-XOR gate, 1/2 stage delays |
| synthesis 15 | 2886 | 1390 | 536 | 56% | 109 | 40 | 3-XOR/XNOR gates as 2×2-I/P gates |
| synthesis 16 | 2665 | 1514 | 538 | 60% | 136 | 46 | x3 drive strength cells |
| synthesis 17 | 2567 | 1571 | 540 | 61% | 155 | 49 | x4 drive strength cells |
| synthesis 18 | 2523 | 1611 | 540 | 62% | 167 | 49 | x6 drive strength cells |
| synthesis 19 | 2497 | 1625 | 538 | 62% | 179 | 54 | x8 drive strength cells |
| synthesis 20 | 2493 | 1628 | 541 | 62% | 188 | 55 | buffers to decouple non-critical paths |
| synthesis 21 | 2441 | 1758 | 563 | 64% | 188 | 55 | input buffers |
BOOG will not insert any input buffers, so they must be added by LOON. LOON will only insert input buffers if the input is on a critical path. Each input is forced to be the critical path input by setting its input resistance very high while the other inputs have a zero input resistance.
This, though, creates a problem: the gates between that input (which might not be a real critical path input) and the slowest output are also treated as being on the critical path, and are sized up as needed to speed it up. Once they have been sized up, they are not down-sized later, even where that is needed to improve the real critical path. This means the real critical path can be slowed down by over-sized gates driving non-critical outputs.
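To get around this problem, the buffer insertion is done straight after BOOG, with only a temporary buffer cell, tempbf1, available: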
ENTITY tempbf1 IS
GENERIC (
  CONSTANT area : NATURAL := 4608;
  CONSTANT cin_a : NATURAL := 30;
  CONSTANT rdown_a_z : NATURAL := 10000;
  CONSTANT rup_a_z : NATURAL := 10000;
  CONSTANT tpll_a_z : NATURAL := 80;
  CONSTANT tphh_a_z : NATURAL := 80;
  CONSTANT transistors : NATURAL := 4
);
# file CATAL in directory ../vsclib013_6 contains cell tempbf1
# input file to boog is multi8_a.vbe
export MBK_CATA_LIB=../vsclib013_6
export MBK_TARGET_LIB=../vsclib013_b
boog -x 0 -m 3 multi8_a
x2y vst vst multi8_a multi8_b
loon -x 0 -l loon0 multi8_b multi8_a
for bit in 7 6 5 4 3 2 1 0
do
  for operand in x y
  do
    # build the LAX file for this input pin, then let LOON buffer it
    ./make_lax multi8_a "${operand}(${bit})" 80 15000 30 loon4xy 2 0
    loon -x 0 -l loon4xy multi8_a multi8_0
    x2y vst vst multi8_0 multi8_a
  done
done
# file CATAL in directory ../vsclib013_0 does not contain tempbf1
export MBK_CATA_LIB=../vsclib013_0
loon -x 0 -m 1 multi8_a multi8
The file loon4xy.lax looks like this (for pin y(0)):
#M{2}
#I{
y(0):24298;
}
where the line #M sets the LOON priority, which has to be 2, 3 or 4 for any buffer insertion to take place, and pin y(0) has an input resistance of 24.298 kΩ while the other inputs have none. This LAX file is created in turn for each input, and the script make_lax extracts the input pin capacitance from the xsc file created by Alliance, using it to calculate the input pin resistance.
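As an illustration of the file format only, the sketch below prints a LAX file like the one shown for a given pin, resistance and priority. It is not the make_lax script itself, which is not reproduced on this page: the function name lax_text and its argument order are hypothetical, and the resistance is passed in directly because the formula make_lax uses to derive it from the pin capacitance is not given here.

```shell
#!/bin/sh
# lax_text is a hypothetical helper, not the author's make_lax script.
# usage: lax_text <pin> <resistance in ohms> <priority>
lax_text() {
    # Buffer insertion is only described as taking place for priorities 2, 3 or 4.
    case "$3" in
        2|3|4) ;;
        *) echo "warning: priority $3 will not trigger buffer insertion" >&2 ;;
    esac
    # Only the selected pin is listed; the other inputs are left without a resistance.
    printf '#M{%s}\n#I{\n%s:%s;\n}\n' "$3" "$1" "$2"
}

lax_out=$(lax_text 'y(0)' 24298 2)
echo "$lax_out"
```

Redirecting the output, for example `lax_text 'y(0)' 24298 2 > loon4xy.lax`, would produce a file with the same four lines as the example above.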
Doing the buffer insertion with LOON straight after BOOG, with only tempbf1 available, improves the critical path delay from 2493 ps to 2441 ps. If the input buffer insertion is instead done at the end of the synthesis flow, gates not on the real critical path are sized up and the critical path delay is 2500 ps, slightly worse than the 2493 ps achieved with no input buffer insertion.