Guidelines for early power analysis


Original article: http://www.cnblogs.com/bluefish/archive/2013/06/09/3129429.html


Siddharth Guha & Kiran Vittal - Atrenta

2/11/2013 7:00 AM EST

While design sizes and complexities are increasing steadily, the power budget for electronic devices is aggressively decreasing. This growing demand for low-power design is driven by several factors. First, wireless devices cannot afford high power consumption because of limited battery capacity. Second, even wired devices cannot afford high power consumption, because cooling costs are significant. Additionally, in the last few years, government bodies such as the European Union have recognized the need for energy-efficient devices and have set strict regulations. Together, these forces are compelling the market to produce power-efficient electronic devices.

It is very important for system-on-a-chip (SoC) designers to understand power consumption early in the design cycle in order to meet the desired power budget. The difficulty is that, in the initial stages of SoC design, little information is available to estimate power accurately. As the design progresses, power consumption becomes clearer with the availability of simulation vectors, technology libraries, and synthesis and routing decisions. Yet the best time to optimize power is in the early stages of the design: the later in the design flow, the harder it is to make changes that reduce power. One of the biggest challenges for the designer is to have a set of tools and flows that work from the very early stages of the design through the later stages of the flow. This article discusses some of the challenges of setting up such a flow and shares five guidelines for early and accurate power analysis at the register transfer level (RTL), the abstraction at which an SoC is described during the early design stages.

Guideline 1: Leverage design activity information

Any power analysis tool requires the toggle, or activity, information of the design. Simulation output files, such as VCD and FSDB, contain detailed information on the switching activity of each net in the design, and estimating power from them is known as vector-based power estimation. This approach is very accurate but time-consuming.
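As a minimal sketch of how activity data turns into a power number, the Python below converts per-net toggle counts from a simulation window into switching power, using the textbook relation that each transition of a net with load capacitance C at supply voltage V dissipates ½CV² on average. The net names, capacitances, voltage, and toggle counts are hypothetical, and a real tool also adds cell internal power and leakage on top of this.

```python
from dataclasses import dataclass


@dataclass
class NetActivity:
    name: str
    load_cap_f: float  # total load capacitance on the net, in farads
    toggles: int       # 0->1 plus 1->0 transitions seen in the simulation window


def switching_power_w(nets: list[NetActivity], sim_time_s: float, vdd_v: float) -> float:
    """Sum 0.5 * C * V^2 * toggle_rate over all nets, in watts."""
    total = 0.0
    for net in nets:
        toggle_rate = net.toggles / sim_time_s  # transitions per second
        total += 0.5 * net.load_cap_f * vdd_v ** 2 * toggle_rate
    return total


# Hypothetical activity over a 1 us window at 0.9 V.
nets = [
    NetActivity("clk", 50e-15, 2000),  # a 1 GHz clock toggles twice per period
    NetActivity("data[0]", 5e-15, 300),
]
print(f"{switching_power_w(nets, 1e-6, 0.9) * 1e6:.2f} uW")
```

The toggle count is the only simulation-derived input here, which is why the quality of the activity data largely determines the quality of the switching-power estimate.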

Vector-less power estimation, on the other hand, estimates power from probabilistic toggling information. This approach is much faster but can also be less accurate. Several case studies show that probabilistic power estimation can be inaccurate, primarily because the spatial and temporal correlations between signals are lost. However, signal correlation is not the only source of inaccuracy.
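The loss of spatial correlation can be seen on a deliberately extreme, hypothetical two-gate circuit, y = a AND (NOT a). The sketch propagates static probabilities (the probability that a signal is 1) gate by gate under the independence assumption that probabilistic propagation relies on, and compares the result with exact enumeration over the primary input. Temporal correlation is not modeled here.

```python
def p_not(p: float) -> float:
    """Static probability of NOT x, given P(x = 1) = p."""
    return 1.0 - p


def p_and(pa: float, pb: float) -> float:
    """Static probability of a AND b; only valid if a and b are independent."""
    return pa * pb


def propagated_prob(p_a: float) -> float:
    """Vector-less estimate of P(y = 1) for y = a & ~a, propagated gate by gate."""
    return p_and(p_a, p_not(p_a))  # wrongly treats a and ~a as independent


def exact_prob(p_a: float) -> float:
    """Exact P(y = 1) by enumerating the single primary input a."""
    total = 0.0
    for a in (0, 1):
        weight = p_a if a else 1.0 - p_a
        y = a & (1 - a)  # always 0
        total += weight * y
    return total


print(propagated_prob(0.5), exact_prob(0.5))  # prints 0.25 0.0
```

The output is always 0, but the gate-by-gate estimate cannot see that both AND inputs derive from the same signal, so it reports a 25% probability of being 1.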

Suppose you are estimating the power of a memory and have the activity and duty cycle of each net connected to it. In the technology library, the power table for the memory is described as follows:

/* DISABLED POWER */
internal_power() {
  related_pg_pin : "VDD" ;
  when : "(!BISTEA & !MEA & !DFTMASK) & !LS" ;
  rise_power(INPUT_BY_TRANS) {
    values ("0.342393, 0.342393, 0.342393, 0.342393, 0.342393");
  }
  …
}
/* WRITE_SLOW POWER */
internal_power() {
  related_pg_pin : "VDD" ;
  when : "(!BISTEA & MEA & WEA & !DFTMASK & RMEA & RMA[0] & !RMA[1] & !RMA[2] & !RMA[3] & !LS)" ;
  rise_power(INPUT_BY_TRANS) {
    values ("5.791451, 5.791451, 5.791451, 5.791451, 5.791451");
  }
  …
}
/* READ POWER */
internal_power() {
  related_pg_pin : "VDD" ;
  when : "(!BISTEA & MEA & !WEA) & !DFTMASK & !RMEA & !LS" ;
  rise_power(INPUT_BY_TRANS) {
    values ("5.067451, 5.067451, 5.067451, 5.067451, 5.067451");
  }
}

The power of the memory varies significantly with the different “when” conditions in the library model. Even with an accurate toggle rate and duty cycle for every net in the design, no simulation output provides the duty cycle of these “when” conditions, because they are not present as nets in the design. So even with a very detailed VCD file, accurate power calculation requires the power analysis tool to perform an internal cycle-based simulation that evaluates these conditions.
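The sketch below illustrates the point. It encodes the three “when” conditions from the excerpt as required pin values (RMA[0..3] renamed RMA0..RMA3), takes one energy value per condition from its flat values() table in the library's units, and assumes one event per cycle. It then compares a per-cycle evaluation with an estimate built only from per-pin duty cycles that treats the pins as independent. The eight-cycle trace is hypothetical, and RMEA is assumed to be asserted only on write cycles.

```python
from math import prod

Pins = dict[str, int]

# "when" conditions from the library excerpt, as required pin values.
# All three are pure conjunctions; RMA[0..3] are renamed RMA0..RMA3.
WHEN: dict[str, Pins] = {
    "DISABLED": {"BISTEA": 0, "MEA": 0, "DFTMASK": 0, "LS": 0},
    "WRITE_SLOW": {"BISTEA": 0, "MEA": 1, "WEA": 1, "DFTMASK": 0, "RMEA": 1,
                   "RMA0": 1, "RMA1": 0, "RMA2": 0, "RMA3": 0, "LS": 0},
    "READ": {"BISTEA": 0, "MEA": 1, "WEA": 0, "DFTMASK": 0, "RMEA": 0, "LS": 0},
}
# Energy per event from each (flat) values() table, in the library's units.
ENERGY = {"DISABLED": 0.342393, "WRITE_SLOW": 5.791451, "READ": 5.067451}


def matches(cond: Pins, pins: Pins) -> bool:
    return all(pins[name] == value for name, value in cond.items())


def cycle_based_energy(trace: list[Pins]) -> float:
    """Evaluate every condition on every cycle; return average energy per cycle."""
    total = sum(ENERGY[state] for pins in trace
                for state, cond in WHEN.items() if matches(cond, pins))
    return total / len(trace)


def independent_energy(trace: list[Pins]) -> float:
    """Use per-pin duty cycles only, assuming the pins are independent."""
    duty = {pin: sum(c[pin] for c in trace) / len(trace) for pin in trace[0]}
    return sum(ENERGY[state] * prod(duty[p] if v else 1.0 - duty[p] for p, v in cond.items())
               for state, cond in WHEN.items())


def cycle(mea: int, wea: int, rmea: int) -> Pins:
    # Hypothetical fixed settings: BISTEA, DFTMASK and LS held low; RMA = 4'b0001.
    return {"BISTEA": 0, "DFTMASK": 0, "LS": 0, "RMA0": 1, "RMA1": 0,
            "RMA2": 0, "RMA3": 0, "MEA": mea, "WEA": wea, "RMEA": rmea}


# 2 idle cycles, 3 writes, 3 reads; RMEA assumed high only on write cycles.
trace = [cycle(0, 0, 0)] * 2 + [cycle(1, 1, 1)] * 3 + [cycle(1, 0, 0)] * 3
print(f"cycle-based: {cycle_based_energy(trace):.4f} per cycle")
print(f"independent: {independent_energy(trace):.4f} per cycle")
```

With this trace, the per-cycle evaluation gives about 4.16 units per cycle, while the duty-cycle-only estimate gives about 2.18, because per-pin duty cycles cannot capture that MEA, WEA, and RMEA switch together.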

Adopting a hybrid approach

Clearly, a cycle-based evaluation of every condition in the power table of every cell in a design does not scale to large SoCs. So instead of a purely probabilistic approach or a fully cycle-based approach, power analysis flows can take a hybrid approach that depends on the following factors:

Stage of the design, including the availability of the RTL or netlist, or of libraries for hard macros
Availability of simulation data
Design-specific data, such as memories, datapaths, analog cells, black boxes, etc.

Figure 1: A typical early stage IP sub-system block

Suppose we have a design at an early stage of RTL coding. As shown in Figure 1, there are 4 blocks:

Block A: This is an RTL block for which simulation data is available.

Block B: This is an RTL block for which no simulation data is available so far.

Block C: This is a black box for which the RTL is still not available, but the designer is aware of some characteristics of this block.

Block D: This block consists primarily of memories, and a simulation output file is available for it.

As we can see, the blocks vary considerably in their progress and characteristics, and each is at a different stage with respect to the availability of simulation data. The early power analysis flow should therefore be able to use the best information available for each block.
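The per-block reasoning can be summarized as a small dispatch. This is only an illustration of the decision logic, not how any particular tool is organized. The BlockInfo fields and Mode names are hypothetical, and Block D is assumed to have RTL in addition to its simulation file.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    VECTOR_BASED = "toggle counts and duty cycles from VCD/FSDB"
    CYCLE_BASED = "per-cycle evaluation of library 'when' conditions"
    VECTORLESS = "user-specified clocks and toggle rates, propagated probabilistically"
    BLACK_BOX = "coarse gate/register counts with an average activity"


@dataclass
class BlockInfo:
    name: str
    has_rtl: bool
    has_simulation: bool
    memory_dominated: bool


def choose_mode(block: BlockInfo) -> Mode:
    """Pick the most detailed activity source the block's data supports."""
    if not block.has_rtl:
        return Mode.BLACK_BOX
    if not block.has_simulation:
        return Mode.VECTORLESS
    # With simulation data, spend cycle-based effort only where "when" conditions matter.
    return Mode.CYCLE_BASED if block.memory_dominated else Mode.VECTOR_BASED


blocks = [
    BlockInfo("A", has_rtl=True, has_simulation=True, memory_dominated=False),
    BlockInfo("B", has_rtl=True, has_simulation=False, memory_dominated=False),
    BlockInfo("C", has_rtl=False, has_simulation=False, memory_dominated=False),
    BlockInfo("D", has_rtl=True, has_simulation=True, memory_dominated=True),
]
for b in blocks:
    print(b.name, choose_mode(b).name)
```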

Block A has RTL and simulation data, so the power analysis tool should be able to accept a simulation file at the block level. Since this block is mostly standard-cell logic, the power analysis tool can consume the VCD or FSDB data and convert it into toggle counts and duty cycles for each net, which makes power estimation much faster than a cycle-based approach. For this kind of design, the error introduced by the loss of spatial and temporal correlation does not significantly affect the accuracy of the results.
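A minimal sketch of the VCD-to-activity conversion follows. It assumes a plain-text VCD that uses only the constructs in the sample: 1-bit $var declarations, #time stamps, and scalar value changes. X and Z values break the toggle chain, vector and real changes are skipped, and FSDB (a binary format) is not handled. Production tools cover far more of the VCD format.

```python
def vcd_activity(text: str) -> dict[str, tuple[int, float]]:
    """Return {signal: (toggle_count, duty_cycle)} for the 1-bit signals in a VCD."""
    tokens = text.split()
    names: dict[str, str] = {}  # VCD identifier code -> signal name
    i = 0
    while tokens[i] != "$enddefinitions":  # header: collect 1-bit $var declarations
        if tokens[i] == "$var" and tokens[i + 2] == "1":
            names[tokens[i + 3]] = tokens[i + 4]
        i += 1
    value = {code: None for code in names}  # "0", "1", or None for x/z/not yet seen
    since = {code: 0 for code in names}     # time of the last value change
    toggles = {code: 0 for code in names}
    high_time = {code: 0 for code in names}
    now = 0
    stream = iter(tokens[i + 2:])  # skip "$enddefinitions $end"
    for tok in stream:
        if tok[0] == "#":
            now = int(tok[1:])
        elif tok[0] in "bBrR":
            next(stream, None)  # vector/real change: skip its identifier token
        elif tok[0] in "01xXzZ" and tok[1:] in names:
            code, new = tok[1:], (tok[0] if tok[0] in "01" else None)
            old = value[code]
            if old == "1":
                high_time[code] += now - since[code]
            if old is not None and new is not None and old != new:
                toggles[code] += 1
            value[code], since[code] = new, now
        # Keywords such as $dumpvars and $end are ignored.
    for code in names:  # close the final interval at the last timestamp
        if value[code] == "1":
            high_time[code] += now - since[code]
    return {names[c]: (toggles[c], high_time[c] / now if now else 0.0) for c in names}


SAMPLE = """
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % en $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0%
$end
#5
1!
#10
0!
#15
1!
1%
#20
0!
"""
print(vcd_activity(SAMPLE))
```

Only the per-net summary (toggle count and fraction of time at 1) is kept, which is what makes this approach fast and also what discards the cycle-by-cycle correlation between nets.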

Block B is also an early-stage RTL design, but its simulation data is not yet available. At this stage, however, the designer has some information about the critical signals, typically clocks and control signals.

Here, we can specify the clock period and the activity information, or toggle rate, of the critical signals.

It is often hard to specify the toggle rate of a signal internal to the design, yet capturing this information is important even for vector-less power estimation. The flow should therefore allow toggle information to be specified on internal signals. Clock gating enables for blocks or registers are one example of such signals.
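To see why a clock gating enable is worth annotating, here is a minimal sketch that combines a clock period with a user-supplied enable duty cycle. It assumes the gated clock toggles only in cycles where the enable is high and uses ½CV² per toggle; the register count, clock-pin capacitance, voltage, and period are hypothetical, and the power of the clock gating cell itself is ignored.

```python
def clock_pin_power_w(n_regs: int, clk_pin_cap_f: float, vdd_v: float,
                      clk_period_s: float, enable_duty: float = 1.0) -> float:
    """Switching power on the clock pins of n_regs registers behind one clock gate.

    A free-running clock toggles twice per period. Behind a clock gate, the
    sketch assumes the gated clock toggles only in cycles where the enable is
    high, approximated by the enable's duty cycle (1.0 means never gated off).
    """
    toggle_rate = 2.0 / clk_period_s * enable_duty
    return n_regs * 0.5 * clk_pin_cap_f * vdd_v ** 2 * toggle_rate


# Hypothetical 32-bit register bank, 2 fF per clock pin, 0.9 V, 2 ns clock period.
always_on = clock_pin_power_w(32, 2e-15, 0.9, 2e-9)
gated = clock_pin_power_w(32, 2e-15, 0.9, 2e-9, enable_duty=0.2)
print(f"{always_on * 1e6:.2f} uW without enable data, {gated * 1e6:.2f} uW with a 20% enable")
```

A single annotated enable scales the clock-pin power of the whole register bank behind it, which is why such internal signals matter even when nothing else is known.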

Block C is a black box with no RTL available. For such a case, the flow should be able to capture coarse design information in an early power analysis tool such as Atrenta’s SpyGlass® Power, as shown below:

blackbox_power -instname block_c -equiv_nand2_count 3000 \
-register_count 100 -activity 0.3 -clocks a1 a2 -clock_percentage 0.5 0.5

The above command specifies that the black box will contain the equivalent of 3,000 two-input NAND (NAND2) gates and 100 registers, and that the average activity of this module will be 0.3. With this information and the technology libraries, the flow can estimate the power of the black box.
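How SpyGlass turns these options into a number is not described here. The sketch below is one plausible, simplified reading, with loudly stated assumptions: activity is taken as average toggles per net per clock cycle, clock_percentage as the fraction of the block clocked by each clock, and the per-toggle and per-cycle energies and clock frequencies are hypothetical library and design values. Leakage is omitted.

```python
from dataclasses import dataclass


@dataclass
class BlackBox:
    equiv_nand2_count: int
    register_count: int
    activity: float                # assumed: average toggles per net per clock cycle
    clock_share: dict[str, float]  # assumed: fraction of the block on each clock


# Hypothetical energies for the target library, in joules.
E_NAND2_TOGGLE = 1.0e-15  # per NAND2-equivalent output toggle
E_REG_TOGGLE = 4.0e-15    # per register data toggle
E_REG_CLOCK = 2.0e-15     # clock-pin energy per register per clock cycle


def blackbox_dynamic_power_w(bb: BlackBox, clock_freq_hz: dict[str, float]) -> float:
    """Dynamic power of a black box from coarse counts, split across its clocks."""
    total = 0.0
    for clk, share in bb.clock_share.items():
        f = clock_freq_hz[clk]
        logic = share * bb.equiv_nand2_count * E_NAND2_TOGGLE * bb.activity * f
        regs = share * bb.register_count * (E_REG_TOGGLE * bb.activity + E_REG_CLOCK) * f
        total += logic + regs
    return total


block_c = BlackBox(3000, 100, 0.3, {"a1": 0.5, "a2": 0.5})
p = blackbox_dynamic_power_w(block_c, {"a1": 500e6, "a2": 250e6})  # hypothetical clocks
print(f"{p * 1e6:.1f} uW")
```

The point of the sketch is that a handful of coarse numbers plus library data is enough for a first-order estimate; the exact model a tool uses will differ.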

Block D contains many memories. Earlier in this section, we saw that the dynamic power of a memory varies widely with the access operation, such as “read” or “write”, so this block needs very accurate power estimation. A robust power estimation flow should be able to distinguish such cells from the rest of the logic in the design and then trace each “when” condition of those cells accurately. This is time-consuming, so the key is to identify the critical set of cells that will benefit most from such detailed cycle-based evaluation.
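One hypothetical way to pick the cells that deserve cycle-based evaluation is to rank them by how much their energy changes across “when” states and take the smallest set that covers most of the design's total state-dependent spread. This heuristic and its 90% default coverage are assumptions for illustration, not a documented tool algorithm; the memory energies come from the library excerpt, and the standard-cell values are made up.

```python
def select_for_cycle_eval(cells: dict[str, list[float]], coverage: float = 0.9) -> list[str]:
    """Fewest cells whose state-dependent energy spread reaches `coverage` of the total.

    cells maps an instance name to the energies of its "when" states. Taking
    cells in decreasing order of spread gives the smallest such set.
    """
    spread = {name: max(e) - min(e) for name, e in cells.items() if e}
    target = coverage * sum(spread.values())
    chosen: list[str] = []
    covered = 0.0
    for name in sorted(spread, key=spread.get, reverse=True):
        if covered >= target:
            break
        chosen.append(name)
        covered += spread[name]
    return chosen


MEM = [0.342393, 5.791451, 5.067451]  # per-state energies from the library excerpt
cells = {
    "u_mem0": MEM,
    "u_mem1": MEM,
    "u_ff0": [0.02, 0.03],     # hypothetical standard cells with mild state dependence
    "u_and0": [0.010, 0.012],
}
print(select_for_cycle_eval(cells))
```

Everything not selected can stay on the faster toggle-rate path, which keeps the cycle-based effort proportional to where it actually changes the answer.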

The power analysis flow should be able to consume these different types of activity information and apply them based on design knowledge to estimate the power at an early stage in the design.

Guideline 2: Learn from an existing netlist design and apply it to the new RTL

A key benefit of RTL power estimation is getting power numbers early in the cycle, without going through the complete back-end steps. However, a good power analysis flow should be able to capture the intent of back-end implementation and apply it to the RTL. Scavenging data from an existing prototype design netlist can give RTL analysis tools good information for accurate power estimation, as shown in Figure 2.

Many designs these days are derivative designs that use the same technology node and libraries. In these cases, parts of the design have already gone through back-end place and route. So when we create a new design using existing blocks, the early power analysis flow should be able to capture characteristics such as capacitance, cell distribution, VT-mix, and clock tree buffers. It is important to support a completely automated flow for scavenging the key attributes from the netlist and applying them in RTL power estimation. At the same time, the flow should give advanced users the flexibility to fine-tune the scavenged data.
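One way to carry capacitance from an existing implementation into RTL estimation is a design-specific table of wire capacitance versus fanout, similar in spirit to a wire load model. The sketch assumes you can export (fanout, extracted wire capacitance) pairs from the existing placed-and-routed netlist; the numbers below are made up. It averages per fanout and then interpolates, or extrapolates from the end segments, for new RTL nets.

```python
from collections import defaultdict
from statistics import mean


def fit_cap_by_fanout(extracted: list[tuple[int, float]]) -> dict[int, float]:
    """Average extracted wire capacitance (fF) for each fanout in the existing netlist."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for fanout, cap_ff in extracted:
        buckets[fanout].append(cap_ff)
    return {fo: mean(caps) for fo, caps in sorted(buckets.items())}


def estimate_cap_ff(table: dict[int, float], fanout: int) -> float:
    """Piecewise-linear lookup; extrapolates from the end segments outside the table."""
    fos = sorted(table)
    if len(fos) == 1:
        return table[fos[0]]
    lo, hi = fos[0], fos[1]
    for a, b in zip(fos, fos[1:]):
        lo, hi = a, b
        if fanout <= b:
            break
    slope = (table[hi] - table[lo]) / (hi - lo)
    return table[lo] + slope * (fanout - lo)


# Hypothetical (fanout, extracted wire cap in fF) pairs from the existing design.
table = fit_cap_by_fanout([(1, 2.0), (1, 3.0), (2, 4.0), (4, 8.5), (4, 7.5)])
print(table, estimate_cap_ff(table, 3), estimate_cap_ff(table, 6))
```

An advanced user could then fine-tune individual table entries, which matches the flexibility the flow should offer for scavenged data.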

Figure 2: Scavenging existing technology netlist for accurate RTL power analysis

The following factors affect the components of power in the early analysis flow. When a new design uses the same technology node and libraries as an existing netlist, relevant data from that netlist can be brought into its RTL power estimation:

The synthesis engine should be fast, yet accurate enough to match the area characteristics of the actual implementation tools. Synthesis has to use scan cells, because the final power correlation is done against the scan-inserted netlist.
In general, power analysis tools use minimum-area cell mapping, which may pick cells with very low drive strengths and therefore cause power discrepancies. To work around this problem, apply dont_use or dont_touch synthesis constraints to cells with low drive strengths.
The power analysis tool needs to account for the clock buffers added to clock trees and the other buffers added to high-fanout nets.
In some cases, libraries may have multiple power rails, or blocks in the design may be in switched-off power domains. Different libraries may also operate at different voltages.
Clock power depends on how clock gating is done in the design. By default, an early power analysis tool does not insert clock gating, so the flow needs to infer the clock gating threshold used in the existing design.
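A minimal sketch of inferring that threshold, assuming "threshold" means the minimum number of enable-sharing register bits that the implementation flow chose to clock-gate (often called a minimum bitwidth). The bank data is hypothetical. The heuristic ignores banks that were left ungated for other reasons, such as timing.

```python
def infer_gating_threshold(existing_banks: list[tuple[int, bool]]) -> int:
    """Smallest register-bank width that the existing implementation clock-gated."""
    gated_widths = [width for width, gated in existing_banks if gated]
    if not gated_widths:
        raise ValueError("no clock-gated register banks in the existing netlist")
    return min(gated_widths)


def banks_to_gate(rtl_banks: dict[str, int], threshold: int) -> list[str]:
    """RTL register banks with a shared enable that would receive a clock gate."""
    return [name for name, width in rtl_banks.items() if width >= threshold]


# Hypothetical (bank width, was it gated?) pairs scavenged from the existing netlist.
existing = [(1, False), (2, False), (3, True), (8, True), (32, True), (2, False)]
threshold = infer_gating_threshold(existing)
print(threshold, banks_to_gate({"ctrl_state": 2, "addr_q": 16, "cfg": 4}, threshold))
```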

Guideline 3: Do early physically-aware power estimation for timing-sensitive designs

In advanced technology nodes, the overall power estimated at RTL commonly correlates well with the final netlist power. However, the individual components, such as leakage power, internal power, or combinational power, do not match those of the final design. This is an inherent drawback of area-based synthesis for early power estimation, and it calls for a solution that considers physical and timing constraints early, at RTL, to get more accurate power results.

It is also important for the power analysis tool to read in the timing constraints in Synopsys Design Constraints (SDC) format to improve the power estimation results. The tool should also be able to read physical libraries, perform timing optimization, and make the changes needed to fix design rule violations, including slew calculation. The flow should also support different versions of the libraries for timing optimization and power computation (for example, nominal for power and worst case for timing). Further, at smaller geometries, interconnect capacitance is becoming more significant, and many libraries do not have wire load models. In the absence of wire load models, a flow with a quick prototyping placement and floorplanning module can extract fairly accurate wiring capacitances.

Timing and physical optimization steps are time-consuming, so the tool needs to trade off fast run times at RTL against accuracy in power estimation.
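As a sketch of the kind of quick estimate a prototyping placement enables, the code below uses half-perimeter wirelength (HPWL), a standard fast approximation of a net's routed length, and multiplies it by a capacitance per unit length. The pin coordinates and the 0.2 fF/um value are hypothetical. HPWL tends to underestimate the length of nets with many pins.

```python
def hpwl_um(pins: list[tuple[float, float]]) -> float:
    """Half-perimeter wirelength: width plus height of the pins' bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def wire_cap_f(pins: list[tuple[float, float]], cap_per_um_f: float) -> float:
    """Wiring capacitance estimate: HPWL times an assumed capacitance per micrometre."""
    return hpwl_um(pins) * cap_per_um_f


# Hypothetical driver and two sinks placed by a quick prototype placer (um).
net = [(0.0, 0.0), (10.0, 5.0), (4.0, 20.0)]
print(hpwl_um(net), wire_cap_f(net, 0.2e-15))
```

The computation is linear in the number of pins, which is the kind of run-time versus accuracy tradeoff that suits RTL-stage estimation.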

Figure 3: Early physically-aware power analysis

Guideline 4: Perform early RTL scan power estimation

SoC designs have multiple scan chains, each of which may have several thousand flops. If all the chains shift at the same time, power dissipation can be too high, so scan power is a key factor in deciding chip packaging. The power grid is designed for a certain maximum power, based on the normal operational mode. If the power during test mode is significantly higher, it may be necessary to slow down the scan patterns or to test only certain blocks at a time. Both of these methods add cost because they increase test time.

So estimating scan power early in the design phase is very important. If the estimated power is over budget, one needs to explore options to reduce test mode power. Here are a couple of options:

Run the tests at lower speed.
Some SoCs are designed with groups of scan chains. Running one scan group at a time reduces power at the cost of increased test time, so we need to find the minimum number and arrangement of scan groups that meet the power limit.

For early scan power analysis, the activity of the signals during scan operation must be specified. Automatic test pattern generation (ATPG) patterns can be viewed as statistically random, so a typical activity value for an ATPG pattern is 0.5, or 50%. Where the designer uses low-power ATPG to generate patterns, a lower activity value such as 0.3 or 0.1 can be used for RTL scan power analysis.
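The scan-grouping question above is a bin-packing problem, so a simple heuristic is a reasonable first cut. The sketch assumes each flop toggles `activity` times per shift cycle at a hypothetical energy per toggle, then uses first-fit decreasing to form groups under a hypothetical power limit. First-fit decreasing is a heuristic and does not guarantee the minimum number of groups. Shift power in this model is linear in shift frequency, which is why running the tests at lower speed is the other option.

```python
def chain_shift_power_w(flops: int, activity: float, e_toggle_j: float, shift_hz: float) -> float:
    """Rough shift power: each flop toggles `activity` times per shift cycle (assumed)."""
    return flops * activity * e_toggle_j * shift_hz


def group_chains(powers: dict[str, float], limit_w: float) -> list[list[str]]:
    """First-fit decreasing: a heuristic for few scan groups under a power limit."""
    groups: list[list[str]] = []
    loads: list[float] = []
    for name in sorted(powers, key=powers.get, reverse=True):
        p = powers[name]
        if p > limit_w:
            raise ValueError(f"chain {name} alone exceeds the power limit")
        for i, load in enumerate(loads):
            if load + p <= limit_w:
                groups[i].append(name)
                loads[i] += p
                break
        else:  # no existing group has room: open a new one
            groups.append([name])
            loads.append(p)
    return groups


FLOPS = {"c0": 4000, "c1": 3000, "c2": 3000, "c3": 2000}  # hypothetical chains
LIMIT_W = 1.9e-3                                          # hypothetical power limit
for activity in (0.5, 0.3):  # typical ATPG vs. low-power ATPG activity
    powers = {c: chain_shift_power_w(n, activity, 10e-15, 50e6) for c, n in FLOPS.items()}
    print(activity, group_chains(powers, LIMIT_W))
```

With these made-up numbers, activity 0.5 needs two scan groups, while 0.3 lets all four chains shift together.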

We ran experiments comparing the power numbers generated by SpyGlass with a vector-less approach against those of a scan-inserted netlist simulated with an ATPG pattern VCD. The power numbers at RTL correlated to within 10% of the final netlist. In Figure 4, the blue line shows the average power predicted by SpyGlass at RTL, before either the netlist or the ATPG patterns existed. Except for the chain test, which is guaranteed to have the highest activity, the average RTL prediction is very close to the actual value. This means that the RTL prediction can be used to make tradeoffs about scan chain grouping and to ensure that excessive test mode power does not come as a surprise.

Figure 4: Test mode power estimated at RTL with a vector-less approach

Guideline 5: Leverage formal (LEC) tool’s match points for netlist power estimation

As the design progresses from RTL to a netlist, the flow should be able to switch to the netlist for power analysis. However, gate level simulation data becomes available much later, and gate level VCD or FSDB files are huge. A suitable flow is therefore one that can estimate gate level power using RTL simulation files. This flow has its own challenges, because various name changes take place when the RTL is synthesized to a gate level design: module hierarchies may be flattened, vectors may be bit-blasted, and design constraints may change signal names. This makes it hard for a tool to map the RTL simulation information onto the gate level design automatically. The flow should be able to consume the match points report of a logical equivalence check (LEC) tool to map the RTL register names to the gate level register names, as shown in Figure 5.
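A minimal sketch of using match points to carry RTL activity onto gate-level names follows. Real LEC tools each have their own report formats; the two-column "RTL name, gate name" format, the gate naming style, and the activity values here are all hypothetical. Match points typically cover state elements and ports, so activity on the combinational logic between them still has to be propagated by the power tool.

```python
Activity = tuple[int, float]  # (toggle count, duty cycle)


def parse_match_points(report: str) -> dict[str, str]:
    """Parse lines of the hypothetical form '<rtl_register> <gate_register>'."""
    mapping: dict[str, str] = {}
    for line in report.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rtl_name, gate_name = line.split()[:2]
        mapping[rtl_name] = gate_name
    return mapping


def transfer_activity(rtl_activity: dict[str, Activity],
                      mapping: dict[str, str]) -> tuple[dict[str, Activity], list[str]]:
    """Re-key RTL register activity onto gate-level names; list what could not be mapped."""
    gate_activity: dict[str, Activity] = {}
    unmapped: list[str] = []
    for rtl_name, act in rtl_activity.items():
        if rtl_name in mapping:
            gate_activity[mapping[rtl_name]] = act
        else:
            unmapped.append(rtl_name)
    return gate_activity, unmapped


REPORT = """
# rtl register            gate register
top.u_ctl.state_q[0]      top/u_ctl_state_q_reg_0_
top.u_ctl.state_q[1]      top/u_ctl_state_q_reg_1_
top.u_dp.acc_q            top/u_dp/acc_q_reg
"""
rtl_activity = {
    "top.u_ctl.state_q[0]": (12, 0.4),
    "top.u_ctl.state_q[1]": (7, 0.2),
    "top.u_dp.acc_q": (50, 0.5),
    "top.u_dp.tmp_q": (3, 0.1),  # e.g. optimized away during synthesis
}
print(transfer_activity(rtl_activity, parse_match_points(REPORT)))
```

Reporting the unmapped registers explicitly is a design choice: silently dropping them would hide exactly the name changes the match points are meant to resolve.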

Figure 5: Logical equivalency check (LEC) tool provides mapping information for gate level power analysis based on RTL simulation data

Conclusion

This article shares five guidelines to perform an early power analysis with relevant and available data at each stage to avoid last-minute surprises in the SoC design process. This set of guidelines is applicable to any mobile or wired application and should be easy to adopt in any design implementation flow.

About the authors

Siddharth Guha is a Senior Engineering Manager at Atrenta India. Siddharth holds a bachelor’s degree in engineering from Netaji Subhas Institute of Technology (NSIT), Delhi. Siddharth is primarily responsible for SpyGlass Power Estimation, Reduction and SEC products. You can reach him at sid@atrenta.com.

Kiran Vittal is a Senior Director of Product Marketing at Atrenta, with over 23 years of experience in EDA and semiconductor design. Prior to joining Atrenta, he held engineering, field applications and product marketing positions at Synopsys Inc, ViewLogic Inc and Mentor Graphics Inc. Vittal holds an MBA from Santa Clara University and a bachelor's degree in electronics engineering from India. You can reach him at kvittal@atrenta.com.
