Looping LOON

The synthesis flow considered so far involves essentially one BOOG synthesis and one LOON synthesis, keeping the fastest resulting netlist. We now replace the script buf_loon with the script loop_buf_loon, which loops LOON until the fastest netlist has been found.

The looping is three deep, inside an outer loop that continues until there is no further improvement in the delay. BOOG provides the starting netlist. Each subsequent starting netlist is taken from the first-level synthesis which gives the fastest result at the end of the three loops. Each pass of the outer loop therefore requires 84 LOON commands (4 + 16 + 64) instead of 1, as shown by the pseudo-code below.

min_delay=100000; old_min_delay=100001
while [ "$old_min_delay" -gt "$min_delay" ]; do
 old_min_delay=$min_delay
 for opt1 in 0 1 2 4; do
  ./buf_loon cct0 cct1                   #4 times
  for opt2 in 0 1 2 4; do
   ./buf_loon cct1 cct2                  #16 times
   for opt3 in 0 1 2 4; do
    ./buf_loon cct2 cct3 > buf_loon.log  #64 times
    delay=$(grep '^Critical ' buf_loon.log | cut)
    if [ "$min_delay" -gt "$delay" ]; then
      min_delay=$delay
      cct0=cct1
    fi
   done
  done
 done
done
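
The figure of 84 follows directly from the loop nesting: 4 runs at the first level, 16 at the second and 64 at the third. A minimal dry-run sketch, which replaces each ./buf_loon call with a counter increment (the variable name count is for illustration only), confirms the total for one pass of the outer while loop:

```shell
#!/bin/sh
# Dry run of the three nested opt loops: count calls instead of running buf_loon.
count=0
for opt1 in 0 1 2 4; do
  count=$((count + 1))            # first-level call, 4 in total
  for opt2 in 0 1 2 4; do
    count=$((count + 1))          # second-level call, 16 in total
    for opt3 in 0 1 2 4; do
      count=$((count + 1))        # third-level call, 64 in total
    done
  done
done
echo "LOON runs per pass: $count"
```

Since the outer loop repeats until the delay stops improving, the total number of LOON runs is a multiple of this per-pass count.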

Running loop_buf_loon on each BOOG netlist from the four min directories (x1, x2, x4, x8) at four opt levels (0, 1, 2, 4), 16 netlists in all, takes nearly 4 hours on a Pentium M at 1.5GHz. On a Core Duo T2600 at 2.16GHz it takes just over 2½ hours. (This ratio of job times implies that the T2600 at 2.16GHz matches the performance of a Pentium M at 2.33GHz: not much of an increase over the four-year gap between the two machines.)

A script which robustly loops LOON, handling crashes, lock-ups and occasional netlist corruption, gives the fastest netlist, although with a long run time.
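
How loop_buf_loon guards each run is not shown here. As a rough sketch of one possible approach, the function below wraps a command with the GNU coreutils timeout utility and classifies the outcome. The function name run_guarded, its return codes, and the "empty or missing netlist file" test for corruption are all assumptions made for illustration, not the author's method.

```shell
#!/bin/sh
# Hypothetical guard around one synthesis run (not the author's script).
# Usage: run_guarded LIMIT_SECONDS NETLIST_FILE command [args...]
# Returns 0 = ok, 1 = crashed, 2 = locked up, 3 = netlist missing or empty.
run_guarded() {
  limit=$1; netlist=$2; shift 2
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    return 2    # GNU timeout exits with 124 when the time limit expires
  elif [ "$status" -ne 0 ]; then
    return 1    # any other non-zero exit status is treated as a crash
  elif [ ! -s "$netlist" ]; then
    return 3    # the command exited cleanly but left no usable netlist
  fi
  return 0
}
```

A caller could then skip any run whose return code is non-zero and carry on with the next opt level. A real corruption check would need to inspect the netlist contents, which this sketch does not attempt.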

The use of a 45fF wireload gives more realistic timing, which better matches industry practice. However, this netlist is not necessarily the fastest under the 0fF wireload timing, which does not include the effect of wire capacitance.

A table summarising the delays, gate counts and maximum fanin for the different options which have been tested is shown below.

                                                wireload 0fF              wireload 45fF
                                                delay  gates  max fanin   delay  gates  max fanin
Original Alliance lib and synthesis             25042  1674   8           31429
Original Alliance lib and best synthesis        22789  1961   26          30226
Library corrections and carry cell              20014  2036   26          27036
Min drive strength BOOG synthesis               19754  2021   63          28360
BOOG macros                                     18131  2259   20          23967
LOON macros. wireload BOOG 0fF, LOON 45fF       17984                     22522  2301   19
find_rin, buf_loon scripts BOOG 0fF, LOON 45fF  17909                     22759  2282   4
loop_buf_loon script BOOG 0fF, LOON 45fF        18327                     22406  2319   3

The script loop_buf_loon, like buf_loon, is structured like loon, so the synthesis job is similar to the previous one. In addition, it starts by running the script ./find_fanin to check whether fanin reduction is requested and, if so, runs the script ./find_rin. In the job listed below (the one beginning with the two for loops), the script is looped over the four starting libraries and the four opt levels for boog.

Events 1 and 2 set up the BOOG looping. Event 3 defines the BOOG library path and event 4 runs boog.

Event 5 defines the library path for LOON synthesis: the full library with macros and 45fF wireload timing. Event 6 runs loop_buf_loon starting with the netlist from boog, looping loon until the fastest netlist has been reached and writing it to multi8.vst. The opt level defined in the LAX file is not used, since all opt levels are looped inside loop_buf_loon in order to choose the fastest netlist. What is needed from the LAX file are the input resistances and output loads.

The fastest netlist comes when lib=x4 and opt1=4. The job output for this condition is shown below, after the job listing. It corresponds to the following boog and loon sequence (ignoring the ./find_rin script at the start of loop_buf_loon, which is used to buffer the inputs).

1 $ boog -l loon_0000_300_4 multi8 multi8_4
2 $ loon -l loon_1500_300_4 multi8_4 multi8_4_4
3 $ loon -l loon_1500_300_2 multi8_4_4 multi8_4_42
4 $ loon -l loon_1500_300_2 multi8_4_42 multi8_4_422
5 $ loon -l loon_1500_300_0 multi8_4_422 multi8_4_4220

Event 5 with opt level 0 flattens the netlist in case the previous synthesis used any macros. The netlist from the initial BOOG synthesis has a fanin of 69, which is reduced to 3 with the first LOON synthesis.
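
The netlist names appear to record the opt levels used: multi8_4_4220 is the BOOG netlist at opt level 4 followed by LOON runs at opt levels 4, 2, 2 and 0. As a sketch under that assumption, the command sequence can be regenerated from the name components. The function name print_sequence is hypothetical; the LAX file names are copied from the sequence above.

```shell
#!/bin/sh
# Print the boog/loon command sequence encoded in a netlist name.
# Assumes the naming <design>_<boog opt>_<loon opt digits> seen in the sequence.
# Usage: print_sequence DESIGN BOOG_OPT LOON_OPT_DIGITS
print_sequence() {
  design=$1; boog_opt=$2; loon_opts=$3
  src="${design}_${boog_opt}"
  echo "boog -l loon_0000_300_${boog_opt} ${design} ${src}"
  dst="${src}_"
  while [ -n "$loon_opts" ]; do
    rest=${loon_opts#?}            # the digits after the first one
    opt=${loon_opts%"$rest"}       # the first digit
    dst="${dst}${opt}"
    echo "loon -l loon_1500_300_${opt} ${src} ${dst}"
    src=$dst                       # the next run starts from this netlist
    loon_opts=$rest
  done
}

print_sequence multi8 4 4220
```

This only prints the commands. It does not run them, and it leaves out the ./find_rin step.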

Note the sequence in the job output which gives a crash (multi8_4_2). Some synthesis flows have a lot of crashes!

1 $ for lib in x1 x2 x4 x8; do
2 $ for opt1 in 0 1 2 4; do
3 $ MBK_TARGET_LIB=$ALLIANCE_MOS/vbe/sclib100_0_min_${lib}
4 $ boog -l loon_0000_300_${opt1} multi8 multi8_${opt1} 2>/dev/null >/dev/null
5 $ MBK_TARGET_LIB=$ALLIANCE_MOS/vbe/sclib100_45
6 $ ./loop_buf_loon -l loon_1500_300_4 multi8_${opt1} multi8
7 $ done
8 $ done

Tue Jul  8 21:58:47 BST 2008
#multi8_4_0... Delay 29572, Area 1665972, Gates 2204
#multi8_4_1... Delay 29335, Area 1667988, Gates 2206
#multi8_4_2... crashed
#multi8_4_4... Delay 23074, Area 1741824, Gates 2304
#multi8_4_12.. Delay 23038, Area 1728468, Gates 2286
#multi8_4_42.. Delay 22466, Area 1750896, Gates 2316
#multi8_4_422. Delay 22406, Area 1753164, Gates 2319
# elapsed time 366s
#multi8_4_40... Delay 23074, Area 1741824, Gates 2304
#multi8_4_42... Delay 22466, Area 1750896, Gates 2316
#multi8_4_422.. Delay 22406, Area 1753164, Gates 2319
# elapsed time 210s
#multi8_4_4220.. Delay 22406, Area 1753164, Gates 2319
Tue Jul  8 22:08:40 BST 2008
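
The fastest netlist can be picked out of such a log mechanically. The sketch below assumes result lines of the form "#name... Delay N, Area N, Gates N" as in the output above. The function name fastest is hypothetical and this is not part of loop_buf_loon.

```shell
#!/bin/sh
# Report the first netlist with the smallest delay in a job log.
# Lines without " Delay " (crashes, timestamps, elapsed times) are ignored.
# Usage: fastest LOGFILE
fastest() {
  awk '/ Delay / {
         name = $1
         gsub(/^#|\.+$/, "", name)     # strip the leading "#" and trailing dots
         delay = $3 + 0                # "29572," converts to the number 29572
         if (best == "" || delay < min) { min = delay; best = name }
       }
       END { if (best != "") print best, min }' "$@"
}
```

Applied to the output above, this reports multi8_4_422 with a delay of 22406. The later multi8_4_4220 run has the same delay, and the strict comparison keeps the first netlist found.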