
ISSC 2012, NUI Maynooth, June 28-29

Dealing with the over-pessimism in ASIC physical design flow


Rashid Iqbal
Intel Shannon Ireland email: rashid.iqbal@intel.com

_______________________________________________________________________________
Abstract A multi-million instance ASIC design with clock frequencies up to 1 GHz, employing automatic synthesis and layout methods, benefits from every accuracy improvement in optimization and timing verification. This paper introduces two techniques to improve this accuracy. The first technique is the delay derate method, in which the implementation tool is given a global multiplier that reduces the delay of all interconnect timing arcs in the design. This technique improves the design timing by reducing the number of buffers/inverters in the design. Additional advantages are reductions in run time, area and power. The second technique improves the accuracy of timing verification by employing a signoff STA (Static Timing Analysis) tool in the early stages of the physical design. The method provides an average timing improvement of 30% per cell stage, a significant performance boost for the critical timing paths.
Keywords ASIC, physical design, pessimism, synthesis, place and route

_______________________________________________________________________________

I INTRODUCTION

The physical design phase of a high performance multi-million cell chip is extremely critical [1]. It starts with synthesis, which converts the RTL design into an equivalent optimized gate level netlist. The netlist is then taken to the layout tools, which perform placement and routing (PnR) optimizations using internal timing engines. The interconnect RC (resistance/capacitance) during synthesis and placement is an estimate. The most accurate timing picture is obtained when the interconnect RC of a routed design is extracted by a signoff extraction tool and timing is analyzed with a signoff STA (static timing analysis) tool [2]. Prior to signoff timing and extraction, the different stages of the implementation flow have various levels of pessimism in floorplan, layout, interconnect parasitics and delay calculation.

The synthesis stage converts RTL into a technology independent netlist and then compiles the design to generate a technology dependent gate level netlist. The compile stage performs architectural, timing and DRC (Design Rule Constraints) optimizations to generate a fine tuned netlist that meets both timing and DRC constraints [2]. A modern approach is the use of topographical synthesis (topo-synthesis), which performs layout aware gate level synthesis to give better correlation with the post-layout timing. Topo-synthesis performs a coarse placement, creates a virtual layout and is typically provided with the macro/IO placement positions by the designer. It uses TLU+ [3] based RC estimation and an internal timing engine to perform the optimization and verification. After synthesis, when the placer runs, the placement of standard cells is final but detail routing has not been performed.

The timing/DRC optimizations during the critical synthesis and placement steps are based on less accurate (mostly pessimistic) delay and interconnect information. Based on this inaccurate information, the tools try to optimize paths, add buffers/inverters and select cells with particular drive strengths to fix timing and DRC violations. For a design running at 1.067 GHz with a large macro count and high floorplan complexity, the over-fixing of timing and DRC based on large pessimism has a severe impact on schedule and performance [4]. Any unnecessary optimization in the early stages caused by pessimistic data reduces the timing margins for the later stages. Designers need early and accurate timing numbers because even a sub-block exceeding 600K cells has long run times for the placement and routing steps. A single iteration from synthesis to routing on such a design (with the PnR tool's high optimization options turned on) may take up to a week to complete. Hierarchical methodologies may reduce run time, but the development of accurate constraints and floorplan, followed by physical integration and timing closure, becomes a complex task.

The delay derate technique described in this paper helps to counter the effects of the extra pessimism and therefore the extra optimization performed by the tools. In this technique, the PnR tool is provided with a global multiplier that reduces the delay of all interconnect timing arcs in the design. This reduction in delay is called delay derate. The derate is not applied to interconnect delays that are part of the clock tree. This derating helps to reduce the number of buffers/inverters in the design without sacrificing performance. Additional advantages are reductions in run time, area and power. Section II of this paper provides data to demonstrate the pessimism and section III discusses the derate technique and its effects. The applied derate exists only in the synthesis/PnR tools and is removed during STA.

Traditionally, delay derating is applied only to cell instances to guard-band against ODV (On Die Variations). Scaling of interconnect RC is another common approach, but the delay derate method has several advantages over typical RC scaling (discussed in section III).

The traditional approach is to use a signoff STA tool only on signoff databases and to rely on the internal timing engines of the synthesis and PnR tools during implementation. The timing engines of the implementation (synthesis and PnR) tools are less accurate than a signoff STA tool. A thorough timing analysis of the synthesis and PnR databases is critical for selecting the options to modify in the follow-up iteration of an implementation step, or for deciding whether to proceed to the next stage. For high performance designs, small timing differences can make up a large percentage of the slack or clock period, and thus become very important in any feasibility decision at the synthesis stage. Section IV of this paper explains how the pessimism in timing verification is reduced by employing a signoff STA tool in the early stages of the physical design. Section IV also explores different methods of feeding design information to the STA tool. Section V concludes the paper.
II INACCURACIES AFFECTING IMPLEMENTATION

In order to emphasize the point on pessimism, we have collected data on the following:
1) The number of inverters/buffers added in synthesis, placement and routing
2) The comparison of the interconnect R and C values calculated by the implementation tools versus the values extracted by the signoff extraction tool.

The excessive inverters/buffers represent the pessimistic R/C, floorplan and cell/interconnect delay calculation. The difference in cell count between the synthesis and placement databases shows the difference in the floorplan and cell placement between the two stages. It also shows that the placement tool does not completely remove the unnecessary cells coming from the synthesis stage. The placement step (with options and constraints selected for the best correlation with synthesis [5]) increases the cell count and does not replace the weaker drive strength cells with stronger ones.

The ratio of topo-synthesis buffers/inverters to non-topo buffers/inverters shows the extra inverters/buffers added purely because of the floorplan and interconnect parasitics. We carried the same design, with and without topo-synthesis, through placement and routing and counted the buffers and inverters. Since non-topo synthesis did not use wire load models (no wire R and C), we relied on placement and routing to insert the necessary buffers/inverters to drive the interconnects. The results in Figure 1 show that topo-synthesis added a higher percentage of unnecessary buffers/inverters. This creates timing problems because the standard cell positions change in the placement stage (moving from non-signoff (NSOF) to signoff (SOF) information), resulting in the addition of many new inverters/buffers. However, the tool does not completely remove the unnecessary buffers and inverters added by the previous synthesis stage (where cell placement is not accurate). Non-topo synthesis (without Wire Load Models [3]) does not use interconnect RC at all, so no buffers/inverters are added due to interconnect effects. When the non-topo netlist is taken to the placement step, the number of inverters/buffers added by placement is much lower than in the topo scenario. The non-topo (without WLM) flow can be used for comparison purposes.

Figure 1 (a) total inverter and buffer count in the design (b) percentage increase in inverters and buffers

Figures 2 through 5 compare the resistances and capacitances of the same nets in the design. The comparison is performed between values taken from the implementation stages (synthesis and placement) and the values from signoff extraction. For example, Figure 2 shows how the estimated capacitances in synthesis correlate with the signoff capacitances. In Figure 2(a), the signoff capacitances are plotted against themselves to generate the red 45 degree line. The green dots are synthesis capacitances, plotted on the y-axis against the corresponding signoff capacitance values on the x-axis. Synthesis capacitances above the 45 degree line are pessimistic, those under the line are optimistic and those on the line match the signoff capacitances. Figure 2(b) shows the percentage deviation of the synthesis capacitances from the signoff values. The x-axis is the percentage deviation and the y-axis is the number of data points. Figure 3 gives a similar comparison for the synthesis resistances. Figures 4 and 5 show the comparison for the placement capacitances and resistances. We can see from these charts that a large percentage of the estimated parasitics are pessimistic and that the picture becomes more and more accurate (closer to the 45 degree line) as we move from synthesis to routing.
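The percentage deviation used in the (b) panels can be sketched in a few lines of Python. This is a minimal illustration and not the script used for the paper: the function names, the tolerance band and the sample values are assumptions introduced here.

```python
from collections import Counter


def percent_deviation(estimated, signoff):
    """Signed deviation of an estimated parasitic from its signoff value, in percent."""
    if signoff <= 0:
        raise ValueError("signoff value must be positive")
    return 100.0 * (estimated - signoff) / signoff


def classify(estimated, signoff, tolerance_pct=5.0):
    """Label one net relative to the 45 degree line.

    tolerance_pct is a hypothetical band for calling two values 'matched'.
    """
    deviation = percent_deviation(estimated, signoff)
    if deviation > tolerance_pct:
        return "pessimistic"   # point lies above the 45 degree line
    if deviation < -tolerance_pct:
        return "optimistic"    # point lies below the 45 degree line
    return "matched"


def summarize(pairs, tolerance_pct=5.0):
    """Count nets per class for (estimated, signoff) pairs of the same nets."""
    return Counter(classify(est, sig, tolerance_pct) for est, sig in pairs)


# Made-up capacitance pairs (estimated, signoff), for illustration only.
sample = [(1.5, 1.0), (0.5, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(summarize(sample))
```

Binning the output of `percent_deviation` over all nets would give a chart of the same kind as the percentage deviation panels.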

Figure 3 Comparison of synthesis wire resistances with the extracted signoff resistances (a) 45 degree plot (b) percentage deviation

Figure 2 Comparison of synthesis wire capacitances with the extracted signoff capacitances (a) 45 degree plot (b) percentage deviation

Figure 4 Comparison of placement vs signoff extracted capacitances (a) 45 degree plot (b) percentage deviation graph

Figure 5 Comparison of placement estimated resistances with the signoff extracted resistances

III INTERCONNECT DELAY DERATE METHOD

We struggled with excessive inverters/buffers during the synthesis and placement stages. Even with a proper floorplan (for macros) and aggressive timing effort options and constraints, it was not possible to generate a netlist with a smaller cell count and correctly sized drivers. This caused both timing and area problems. Our strategy was to create a less pessimistic picture within the implementation tools, which would restrain them from doing unnecessary optimizations. An interconnect timing derate in synthesis and placement generates good results. Here, derate means a reduction in the delay value of each interconnect timing arc (the timing arc from the driver cell output pin to the receiver cell input pin). This timing derate was applied to model:
o The delay differences between the implementation tools and signoff timing caused by the inaccurate (mostly pessimistic) parasitics, floorplan and layout.
o The delay differences between the implementation tools and signoff timing due to timing engine differences.

The net (interconnect) timing derate generated a much reduced cell count. The approach was to vary the amount of derate and analyze the database for its cell count, cell drive strengths and timing. We selected the derate value that gave us the best combination of these parameters.

A typical alternative is to scale R and C. The benefit of the derate technique over R and C scaling is the shorter time needed to study the effect of varying the derate, as only one value (the delay) has to be changed rather than two (the interconnect R and C). Since the interconnect length and the nature of the routing (metals, driver and receiver cells) change from synthesis to routing, it is not easy to scale the R and C differences correctly. It is also clear from Figures 2-5 that R and C for different nets can increase or decrease by different amounts (in comparison with the signoff parasitics). The interconnect delay derate models the product of R and C. When R and C are scaled in synthesis and placement, the tools select weaker cells on critical paths because of the scaled down capacitance values.

Figure 6 shows the total number of weaker and stronger buffers (buffs) and inverters (invs) added by synthesis and placement. The x50, x70 and x110 cells (shown in blue) are invs/buffs with higher drive strengths, while the others (shown in red) have lower drive strengths. The comparison showed that although the floorplan picture was different (more accurate) in placement, the low drive strength cells added by the synthesis stage were not completely removed. Therefore, while it is important for the synthesis step to reduce the inv/buff count, it is also important to have stronger cells in synthesis, as one cannot rely on later stages to replace all the weaker cells with stronger ones. We are not saying that the design should have no weaker cells at all, but on critical paths that are struggling to meet setup timing we need stronger drivers (especially if the routes are long).

Figure 6 (a) buffer/inverter count for topo-synthesis (b) buffer/inverter strengths for placement on topo-synthesis

With our method, the critical paths were optimized for delay and the cell count was considerably reduced at the synthesis and placement stages. While we applied the interconnect derate in the synthesis and placement stages, no such derate was applied at the CTS (Clock Tree Synthesis) and routing stages.
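The effect of the global multiplier on interconnect timing arcs can be modelled in a few lines of Python. This is a sketch of the idea only, not the implementation tool's behaviour or command set: the `NetArc` type, the function name and the derate value of 0.5 are hypothetical, and the paper does not state the derate value that was finally selected.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NetArc:
    """One interconnect timing arc (driver output pin to receiver input pin)."""
    name: str
    delay_ps: float         # delay as estimated by the implementation tool
    is_clock: bool = False  # clock tree nets are left untouched


def apply_net_derate(arcs, derate):
    """Return new arcs with every non-clock net delay multiplied by `derate`."""
    if not 0.0 < derate <= 1.0:
        raise ValueError("derate must be in (0, 1] so that it reduces delay")
    return [
        arc if arc.is_clock else replace(arc, delay_ps=arc.delay_ps * derate)
        for arc in arcs
    ]


# Made-up arcs; 0.5 is a placeholder multiplier, not a recommended value.
arcs = [NetArc("data_net", 100.0), NetArc("clk_net", 40.0, is_clock=True)]
for arc in apply_net_derate(arcs, 0.5):
    print(arc.name, arc.delay_ps)
```

Because a single number is swept, each trial run differs from the previous one in only one parameter, which is the practical advantage over scaling R and C separately.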

Figure 7 shows the design cell count with and without interconnect derate. The reduction in cell count was much higher on critical paths. Figure 9 shows the timing analysis of the synthesis database with a proper interconnect derate. Figure 8 shows the signoff STA results of the same design after it is routed and the interconnect R and C are extracted by the signoff extraction tool. With the described derate, not only is the cell count reduced, but the synthesis timing also correlates better with signoff, which brings more accuracy to the early stages. The lower cell count reduces power and area and leaves more room for adding new buffers during hold fixing. With more accurate and well correlated synthesis and placement timing, a good prediction of signoff quality can be made much earlier.

Figure 7 Inverter/Buffer count comparison of the database with different delay derates

Figure 8 Timing results of the design after routing; STA with signoff STA and extraction with signoff extraction tool

Figure 9 Timing results of the same design at synthesis stage after applying proper delay derates

IV ADDRESSING INACCURACIES IN TIMING VERIFICATION

The main motivation for using a signoff STA tool on non-signoff databases was to get better accuracy. Figures 10 to 12 show how much accuracy is obtained with this approach. This section also explains three methods of bringing the design into the STA tool and how the accuracy varies between them. The logical netlist of the design is in Verilog or ddc format, while the layout parasitics (R and C) are stored in SPEF format. The cell and interconnect delays can be represented in SDF (Standard Delay Format). A detailed explanation of the ddc, SPEF and SDF formats can be found in [3]. Signoff STA reads the post-route netlist (logical connection of cells and wires) and the signoff interconnect parasitic file (R and C of wires). The following are the three ways of providing this information to the signoff STA tool:
(A) Generate an SDF (Standard Delay Format) file from the implementation tools
    (x) In STA, read this SDF for both interconnect and cell timing arcs
    (y) In STA, read this SDF for interconnect timing arcs only
(B) Read a DDC [3] (binary design database) file
(C) Read a SPEF (estimated R and C) file and a netlist file

The ddc database contains the design netlist, delays and timing constraints. Since ddc contains all the design elements that the implementation tool has (clocks, constraints, exceptions, path groups, derate), this approach has pros and cons depending on whether we want to remove any of the stored information. A comparison of similar paths between the two tools shows, on average, 11% lower cell delays and 30% lower interconnect delays with the signoff STA tool. The disadvantage of this approach appears on large designs with complex constraints, where the ddc read can take very long. On a block with 1.3M standard cells, this method can take up to 15 hours to read the database.

For approaches A-x and A-y, the SDF file contains both cell and interconnect delays. Unlike the ddc approach, this method requires all the clocks and timing constraints to be applied in the STA tool. Approach A-x shows exactly the same timing picture in the STA tool as seen in the synthesis/PnR tools, so it can be used to reproduce those results in the STA tool (where analysis is much easier and faster). Approach A-y uses the STA tool to calculate the cell delays (an average improvement of 11% per cell), but the interconnect timing is the same as that used in the implementation tool. The advantage is that this method is very quick for large designs.

The synthesis and PnR tools have estimated interconnect models (also called TLU+). In approach C, the implementation tools dump all the estimated R and C values into a SPEF file, which is then read into the STA tool. This is a typical signoff STA flow (except that the SPEF is not of signoff quality). This method allows the STA tool to calculate both cell and interconnect delays based on the R and C data obtained from the implementation tool. From the timing results in Figures 10-12 it is clear that this approach gives the most accurate results when compared with signoff STA using signoff parasitics. The run time with this method is also very short.

Figure 10 shows a timing histogram (number of timing paths vs slack) of the synthesis database analyzed with the various methods. The synthesis tool generates DDC, SDF (cell and interconnect) and SPEF files, which are analyzed in the signoff STA tool.
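A timing histogram of this kind, and a rough measure of how closely two analysis methods agree, can be sketched in Python as follows. The bin width, the mismatch metric (sum of absolute per-bin count differences) and the slack values are assumptions made for illustration. They are not taken from the paper or from any STA tool.

```python
import math
from collections import Counter


def slack_histogram(slacks_ps, bin_width_ps=50.0):
    """Number of timing paths per slack bin; keys are the lower edge of each bin."""
    counts = Counter(
        math.floor(slack / bin_width_ps) * bin_width_ps for slack in slacks_ps
    )
    return dict(sorted(counts.items()))


def histogram_mismatch(hist_a, hist_b):
    """Sum of absolute per-bin count differences; 0 means identical histograms."""
    bins = set(hist_a) | set(hist_b)
    return sum(abs(hist_a.get(b, 0) - hist_b.get(b, 0)) for b in bins)


# Made-up path slacks in picoseconds for a reference run and one early-stage method.
reference = slack_histogram([-120.0, -10.0, 5.0, 30.0])
candidate = slack_histogram([-60.0, -10.0, 5.0, 80.0])
print(reference)
print(histogram_mismatch(reference, candidate))
```

Computing the mismatch of each method against the signoff histogram gives one number per method, which is one simple way to rank them. Comparing slacks path by path would be a stricter check.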

Figure 10 Comparison of timing analysis on post-synthesis database with signoff timing tool

Similarly, Figure 11 compares the data generated from the placement stage, analyzed with the signoff STA tool. The SDF analysis was done using only the interconnect portion of the SDF.

Figure 11 Comparison of timing analysis on post-placement database with different techniques

Figure 12 compares the different methods on the routed database. It also shows the timing data generated by signoff STA based on the parasitics from signoff extraction. We see that the estimated SPEF generated by the implementation tool gives the best match to the signoff STA results.

Figure 12 Timing run on routed database using different methods (also includes the signoff analysis)

V CONCLUSIONS

To reduce the effects of inaccurate information during implementation, the interconnect delay is derated (reduced) in the synthesis and placement stages. The timing derate for interconnects helps to remove unnecessary buffers and inverters. Selecting an appropriate derate value requires a study phase in which the derate is varied to find the optimum performance. Using a signoff STA tool for timing verification of synthesis, placed and routed databases (with no signoff extraction) gives a cell and interconnect delay improvement of up to 30%. The fastest and most accurate method is to generate a SPEF file (with the estimated resistances and capacitances of the wires) and let the signoff timing tool calculate both cell and interconnect delays in the design.

REFERENCES

[1] Joseph Williams et al., The Implementation of Two Multiprocessor DSPs: A Design Methodology Case Study, IEEE International Solid-State Circuits, 2001.
[2] Xiu, L., VLSI Circuit Design Methodology Demystified: A Conceptual Taxonomy, Chapter 4, Wiley-IEEE Press, 2008.
[3] Synopsys physical design tools (www.solvnet.synopsys.com)
[4] Rashid Iqbal, Christopher Guerrier, Physical Design Practices for a 1GHz SoC block on 32nm, SNUG UK 2010.
[5] Achieving Tight Correlation Between Synthesis and Layout Results, Synopsys SolvNet Doc Id 026691, April 29, 2009.
