Looping LOON
The synthesis flow considered so far involves essentially one BOOG and one LOON synthesis, taking the fastest resulting netlist. Now we replace the script buf_loon with the script loop_buf_loon, which loops LOON until the fastest netlist has been found.
The LOON loops are nested three deep inside an outer loop that continues until there is no further improvement in the delay. BOOG provides the starting netlist, and each subsequent pass starts from the first-level netlist which gave the fastest result at the end of the three loops. Each synthesis pass then requires 84 LOON commands (4 + 16 + 64) instead of 1, as shown by the pseudo-code below.
    min_delay=100000; old_min_delay=100001
    while [ "$old_min_delay" -gt "$min_delay" ]; do
      old_min_delay=$min_delay
      for opt1 in 0 1 2 4; do
        ./buf_loon cct0 cct1                          # 4 times
        for opt2 in 0 1 2 4; do
          ./buf_loon cct1 cct2                        # 16 times
          for opt3 in 0 1 2 4; do
            ./buf_loon cct2 cct3 > buf_loon.log       # 64 times
            delay=$(grep '^Critical ' buf_loon.log | cut)   # pick out the delay value
            if [ "$min_delay" -gt "$delay" ]; then
              min_delay=$delay
              cct0=cct1                               # restart from the best first-level netlist
            fi
          done
        done
      done
    done
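The figure of 84 follows from the nesting in the pseudo-code above: 4 runs at the first level, 16 at the second and 64 at the third. A minimal sketch that checks the count, with a hypothetical `count` variable incremented wherever `./buf_loon` would be called (no synthesis is run):

```shell
# Count the buf_loon invocations in one pass of the three-deep loop.
count=0
for opt1 in 0 1 2 4; do
  count=$((count + 1))          # level 1: 4 calls in total
  for opt2 in 0 1 2 4; do
    count=$((count + 1))        # level 2: 16 calls in total
    for opt3 in 0 1 2 4; do
      count=$((count + 1))      # level 3: 64 calls in total
    done
  done
done
echo "$count"                   # prints 84
```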
Running loop_buf_loon on each BOOG netlist from the four min directories (x1, x2, x4, x8) and four opt levels (0, 1, 2, 4), 16 netlists in all, takes nearly 4 hours on a Pentium M at 1.5GHz. On a Core Duo T2600 at 2.16GHz it takes just over 2½ hours. (This ratio of job times implies that the T2600 at 2.16GHz matches the performance of a Pentium M at 2.33GHz: not much of an increase over the four-year gap between the two machines.)
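The equivalence can be checked from the quoted figures. The exact run times are not given, so the sketch below takes 2.5 hours as an approximation of "just over 2½ hours" and works from the two clock speeds; the variable names are illustrative only.

```shell
# Speed ratio implied by a 2.33GHz-equivalent machine against a 1.5GHz Pentium M.
ratio=$(awk 'BEGIN { printf "%.2f", 2.33 / 1.5 }')
# Pentium M run time predicted from about 2.5 hours on the T2600.
hours=$(awk 'BEGIN { printf "%.2f", 2.5 * 2.33 / 1.5 }')
echo "ratio=$ratio hours=$hours"   # prints ratio=1.55 hours=3.88
```

A predicted 3.88 hours is consistent with the "nearly 4 hours" measured on the Pentium M.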
A script which securely loops LOON, handling crashes, lock-ups and occasional netlist corruption, gives the fastest netlist, although with a long run time.
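One way such a guard could be written is sketched below. This is not the code of loop_buf_loon: the function name `run_guarded` and the time limits are hypothetical, and it assumes GNU coreutils `timeout`, which exits with status 124 when the limit expires. Netlist corruption is not covered here, since detecting it would need a check on the output file.

```shell
# run_guarded LIMIT CMD [ARGS...]
# Runs CMD under a time limit and prints one of: ok, locked, crashed.
run_guarded() {
  limit=$1; shift
  timeout "$limit" "$@" >/dev/null 2>&1
  status=$?
  if [ "$status" -eq 0 ]; then
    echo ok
  elif [ "$status" -eq 124 ]; then
    echo locked                 # time limit expired: treat as a lock-up
  else
    echo crashed                # any other non-zero exit status
  fi
}

run_guarded 5 true              # prints ok
run_guarded 5 false             # prints crashed
run_guarded 1 sleep 3           # prints locked
```

In a real loop the command would be the loon call itself, and a `crashed` or `locked` result would simply skip that netlist, as in the job output where one sequence is reported as crashed.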
The use of a 45fF wireload gives more realistic timing, which better matches industry practice. However, this netlist is not necessarily the fastest under the 0fF wireload timing, which doesn't include the effect of wire capacitance.
A table summarising the delays, gate count and fanin for the different options which have been tested is shown below.
| Option | 0fF delay | 0fF gates | 0fF max fanin | 45fF delay | 45fF gates | 45fF max fanin |
|---|---|---|---|---|---|---|
| Original Alliance lib and synthesis | 25042 | 1674 | 8 | 31429 | | |
| Original Alliance lib and best synthesis | 22789 | 1961 | 26 | 30226 | | |
| Library corrections and carry cell | 20014 | 2036 | 26 | 27036 | | |
| Min drive strength BOOG synthesis | 19754 | 2021 | 63 | 28360 | | |
| BOOG macros | 18131 | 2259 | 20 | 23967 | | |
| LOON macros. wireload BOOG 0fF, LOON 45fF | 17984 | | | 22522 | 2301 | 19 |
| find_rin, buf_loon scripts BOOG 0fF, LOON 45fF | 17909 | | | 22759 | 2282 | 4 |
| loop_buf_loon script BOOG 0fF, LOON 45fF | 18327 | | | 22406 | 2319 | 3 |
The script loop_buf_loon, like buf_loon, is structured like loon, so that the synthesis job is similar to the previous one. In addition, it starts with the script ./find_fanin to check whether fanin reduction is requested, and if so it runs the script ./find_rin. In the job listed below, the script is looped over the four starting libraries and four opt levels for boog.
Events 1 and 2 set up the BOOG looping. Event 3 defines the BOOG library path and event 4 runs boog.
Event 5 defines the library path for LOON synthesis: the full library with macros and 45fF wireload timing. Event 6 runs loop_buf_loon starting with the netlist from boog, writing the fastest netlist to multi8.vst by looping loon until it has been reached. The opt level defined in the LAX file is not used since all opt levels are looped inside loop_buf_loon in order to choose the fastest netlist. What is needed from the LAX file are the input resistances and output loads.
The fastest netlist comes when lib=x4 and opt1=4. The job output for this condition is shown at the end of this section. It corresponds to the boog and loon sequence below (ignoring the ./find_rin script at the start of loop_buf_loon, which is used to buffer the inputs).
    1 $ boog -l loon_0000_300_4 multi8 multi8_4
    2 $ loon -l loon_1500_300_4 multi8_4 multi8_4_4
    3 $ loon -l loon_1500_300_2 multi8_4_4 multi8_4_42
    4 $ loon -l loon_1500_300_2 multi8_4_42 multi8_4_422
    5 $ loon -l loon_1500_300_0 multi8_4_422 multi8_4_4220
Event 5 with opt level 0 flattens the netlist in case the previous synthesis used any macros. The netlist from the initial BOOG synthesis has a fanin of 69, which is reduced to 3 with the first LOON synthesis.
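The netlist names in the sequence above encode the opt levels applied so far. The sketch below rebuilds the five commands from a list of opt levels and only prints them. The naming rule (each LOON pass appends its opt level to the name) is inferred from the listing rather than taken from the script itself, and the variable names are illustrative.

```shell
# Rebuild the boog/loon command sequence from a list of opt levels.
# Assumption: each LOON pass appends its opt level to the netlist name.
boog_opt=4
loon_opts="4 2 2 0"

in="multi8_${boog_opt}"
echo "boog -l loon_0000_300_${boog_opt} multi8 ${in}"

suffix=""
for opt in $loon_opts; do
  suffix="${suffix}${opt}"
  out="multi8_${boog_opt}_${suffix}"
  echo "loon -l loon_1500_300_${opt} ${in} ${out}"
  in=$out                       # the output of one pass feeds the next
done
```

The last line printed is `loon -l loon_1500_300_0 multi8_4_422 multi8_4_4220`, matching event 5.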
In the job output, note the sequence which gives a crash (multi8_4_2). Some synthesis flows have a lot of crashes!
    1 $ for lib in x1 x2 x4 x8; do
    2 $   for opt1 in 0 1 2 4; do
    3 $     MBK_TARGET_LIB=$ALLIANCE_MOS/vbe/sclib100_0_min_${lib}
    4 $     boog -l loon_0000_300_${opt1} multi8 multi8_${opt1} 2>/dev/null >/dev/null
    5 $     MBK_TARGET_LIB=$ALLIANCE_MOS/vbe/sclib100_45
    6 $     ./loop_buf_loon -l loon_1500_300_4 multi8_${opt1} multi8
    7 $   done
    8 $ done
    Tue Jul 8 21:58:47 BST 2008
    #multi8_4_0... Delay 29572, Area 1665972, Gates 2204
    #multi8_4_1... Delay 29335, Area 1667988, Gates 2206
    #multi8_4_2... crashed
    #multi8_4_4... Delay 23074, Area 1741824, Gates 2304
    #multi8_4_12.. Delay 23038, Area 1728468, Gates 2286
    #multi8_4_42.. Delay 22466, Area 1750896, Gates 2316
    #multi8_4_422. Delay 22406, Area 1753164, Gates 2319
    # elapsed time 366s
    #multi8_4_40... Delay 23074, Area 1741824, Gates 2304
    #multi8_4_42... Delay 22466, Area 1750896, Gates 2316
    #multi8_4_422.. Delay 22406, Area 1753164, Gates 2319
    # elapsed time 210s
    #multi8_4_4220.. Delay 22406, Area 1753164, Gates 2319
    Tue Jul 8 22:08:40 BST 2008
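The fastest netlist can be picked out of such a job output mechanically. The sketch below is one possible way to do it with awk, run on a few lines copied from the output above. It assumes the line format shown there (name, then `Delay`, then the value followed by a comma) and skips lines reporting a crash.

```shell
# Find the fastest netlist in a loop_buf_loon job log.
log='#multi8_4_0... Delay 29572, Area 1665972, Gates 2204
#multi8_4_2... crashed
#multi8_4_4... Delay 23074, Area 1741824, Gates 2304
#multi8_4_42.. Delay 22466, Area 1750896, Gates 2316
#multi8_4_422. Delay 22406, Area 1753164, Gates 2319'

best=$(printf '%s\n' "$log" | awk '
  $2 == "Delay" {
    name = $1
    gsub(/^#|\.+$/, "", name)            # drop the leading hash and trailing dots
    delay = $3 + 0                       # numeric value, trailing comma ignored
    if (min == "" || delay < min) { min = delay; best = name }
  }
  END { print best, min }')
echo "$best"                             # prints multi8_4_422 22406
```

Because the comparison is strict, a later sequence with the same delay (such as multi8_4_4220) would not replace the first one found.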