With today’s increasingly large and complex digital IC and system-on-chip (SoC) designs, power closure and circuit power integrity have become major engineering challenges, directly impacting a device’s time-to-market.
The sheer amount of power consumed by some devices can cause significant design problems. For example, a recently announced CPU consumes 100 amps at 1.3 volts, which equates to 130 watts! This class of device requires expensive packaging and heat sinks. The heat gradient across the chip can cause mechanical stress leading to early breakdown, and the act of physically delivering all of this power into the chip is non-trivial. Thus, even for devices intended for use in non-portable equipment where ample power is readily available, power-aware design can offer competitive advantages with respect to considerations such as the size and cost of the power supply and cooling systems.
The majority of power considerations are exacerbated in the case of low-power designs. The increasing use of battery-powered portable (often wireless) electronic systems is driving the demand for IC and SoC devices that consume the smallest possible amounts of power.
Whenever the industry moves from one technology node to another, existing power constraints are tightened and new constraints emerge. Power-related constraints are now being imposed throughout the entire design flow in order to maximize the performance and reliability of devices. In the case of today’s extremely large and complex designs, implementing a reliable power network and minimizing power dissipation have become major challenges for design teams.
Creating optimal low-power designs involves making tradeoffs such as timing-versus-power and area-versus-power at different stages of the design flow. Successful power-sensitive designs require engineers to have the ability to accurately and efficiently perform these tradeoffs. In order to achieve this, engineers require access to appropriate low-power analysis and optimization engines, which need to be integrated with — and applied throughout — the entire RTL-to-GDSII flow.
Furthermore, in order to handle the complex interrelationships between diverse effects, it is necessary to use an integrated design environment in which all of the power tools are fully integrated with each other, and also with other analysis and implementation engines in the flow. For example, in order to fully account for the impact of voltage drop effects, it is important to have an environment that can derate for timing — on a cell-by-cell basis — based on actual voltage drops.
The timing analysis engine should then make use of this derated timing data to identify potential changes to the critical paths. In turn, the optimization engine should make appropriate modifications to address potential setup or hold problems that appear as a result of the timing changes.
This paper first describes the most significant power dissipation and distribution considerations. The requirements for a true low-power design environment that addresses these power considerations throughout the entire RTL-to-GDSII design flow are then introduced.
Power Dissipation Considerations
Dynamic power dissipation
These discussions assume the use of complementary metal oxide semiconductor (CMOS) devices, because this is currently the most prevalent digital IC implementation technology. Dynamic power dissipation occurs in logic gates that are in the process of switching from one state to another. During the act of switching, any internal capacitance associated with the gate’s transistors has to be charged, thereby consuming power. More significantly, the gate also has to charge any external or load capacitances, which comprise parasitic wire capacitances and the input capacitances of any downstream logic gates.
Consider a simple inverter gate, in which only one of transistors T1 and T2 is usually on at any particular time (Figure 1). When the gate is in the process of switching from one state to another, however, both T1 and T2 will actually be on simultaneously for a brief instant. This causes a momentary short circuit between the VDD (logic 1, power) and VSS (logic 0, ground) rails, and the ensuing crowbar current results in a transitory power surge.
Figure 1 — When a gate is switching, both of its transistors may be active simultaneously.
The amount of time the two transistors are simultaneously active is a function of their input switching thresholds and the slew (slope) of the input signal driving the gate.
One of the factors controlling the slew of the signal being presented to the inverter’s input is the size of the transistors forming the logic gate driving this signal. These need to be sufficiently large such that the signal transitions fast enough to keep the amount of time the inverter’s transistors are both active to a reasonable level (Figure 1b).
Now consider what happens if the driving gate’s transistors are too large and the driving gate is overpowered. In this case, the power savings achieved by minimizing the time where the inverter’s transistors are both on (Figure 1a) will be negated by the driving gate having to charge the increased capacitance associated with its over-sized transistors, thereby consuming excessive amounts of power. Furthermore, the extreme speed of the signal’s transitions will also cause signal integrity problems in the form of noise, overshoot, undershoot, and crosstalk.
By comparison, if the driving gate’s transistors are too small and the driving gate is underpowered, the inverter’s transistors will both be on for a significant amount of time (Figure 1c), thereby causing the inverter to consume unwarranted amounts of power (the under-driven input signal will also be susceptible to noise and crosstalk coupling effects from other signals).
Addressing dynamic power dissipation
For the purposes of this introductory paper, the amount of dynamic power dissipation may be represented using the following equation:
Dynamic Power = af × C × V²

where:

af = amount of activity as a function of the clock frequency (f)
C = amount of capacitance being driven/switched
V² = the square of the supply voltage
This equation shows that the dynamic power dissipation may be reduced by minimizing the circuit activity and/or reducing the capacitance being driven and/or reducing the supply voltage.
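By way of illustration, the following minimal Python sketch evaluates the equation for hypothetical values; the activity, capacitance, and voltage figures are invented purely to show the scaling behavior and do not represent any particular process:

# Illustrative evaluation of: Dynamic Power = af x C x V^2
# All values below are hypothetical, chosen only to show the scaling.

def dynamic_power(activity_hz, cap_farads, vdd_volts):
    """Dynamic power in watts for a single switched capacitance."""
    return activity_hz * cap_farads * vdd_volts ** 2

# A net with 100 MHz effective switching activity, driving 50 fF at 1.2 V:
p_base = dynamic_power(100e6, 50e-15, 1.2)

# Halving the supply voltage cuts dynamic power by a factor of four:
p_low_v = dynamic_power(100e6, 50e-15, 0.6)

print(f"baseline: {p_base * 1e6:.2f} uW, half-voltage: {p_low_v * 1e6:.2f} uW")

The quadratic voltage term is why, as discussed below, lowering the supply voltage is such an effective lever.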
One way to reduce the amount of switching activity is to reduce the frequency of the system clock. However, this will have a corresponding impact on the performance of the device. Another technique is to employ clock gating, which restricts the distribution of the clock to only those portions of the device that are actually performing useful tasks at that time. It is also possible to minimize local data activity (glitches and hazards) by applying appropriate delay balancing.
There are a number of ways in which the amount of capacitance may be reduced. One approach is to downsize the gates driving over-driven wires, thereby lowering the capacitances associated with these gates. Another technique is to use a power-aware placement algorithm to minimize the length of critical wires, which therefore reduces the size of their associated parasitic capacitances.
This power-aware placement should ideally be based on (or weighted by) the amount of switching activity associated with each wire. Yet another alternative is to exploit technology options such as using low-k dielectric (insulating) materials and low resistance/capacitance copper (Cu) tracks.
Lowering the supply voltage dramatically reduces a logic gate’s power consumption, but this also significantly reduces the switching speed of the gate. One solution is to use multiple voltage domains, which means having different areas of the chip running at different voltages. In this case, any performance-critical functions would be located in a higher voltage domain, while non-critical functions would be allocated to a lower voltage domain.
There are also interesting trade-offs that can be made between functional parallelism and frequency and/or voltage during the algorithmic and architectural stages of the design flow. For example, replacing one block of logic running at frequency ‘f’ and voltage ‘V’ with two copies of that block, each of which performs half of the task, and each of which is running at a lower frequency and/or a lower voltage. In this case, the total power consumption of this function may be reduced while maintaining performance at the expense of using more silicon real estate.
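A rough back-of-envelope version of this trade-off can be worked through with the dynamic power equation from above; the 1.2 V and 0.9 V operating points in this sketch are assumptions chosen for illustration:

# Parallelism trade-off: one block at frequency f and voltage V versus
# two half-rate copies at a reduced supply. All values are hypothetical.

def dynamic_power(activity_hz, cap_farads, vdd_volts):
    return activity_hz * cap_farads * vdd_volts ** 2

f, c, v = 200e6, 1e-12, 1.2
single = dynamic_power(f, c, v)

# Two copies, each doing half the work at f/2; assume the relaxed timing
# budget allows the supply to be dropped to 0.9 V.
parallel = 2 * dynamic_power(f / 2, c, 0.9)

print(f"single: {single * 1e3:.3f} mW, parallel: {parallel * 1e3:.3f} mW")
# parallel/single = (0.9 / 1.2)^2 = 0.5625, i.e. ~44% less dynamic power
# for the same throughput, at the cost of roughly twice the area.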
Static power dissipation
Static power dissipation is associated with logic gates when they are inactive — that is, not currently switching from one state to another. In this case, these gates should theoretically not be consuming any power at all. In reality, however, there is always some amount of leakage current passing through the transistors, which means they do consume a certain amount of power.
Even though the static power consumption associated with an individual logic gate is extremely small, the total effect becomes significant when we come to consider today’s ICs, which can contain tens of millions of gates. Furthermore, as transistors shrink in size when the industry moves from one technology node to another, the level of doping has to be increased, thereby causing leakage currents to become relatively larger.
The end result is that, even if a large portion of the device is totally inactive, it may still be consuming a significant amount of power. In fact, static power dissipation is expected to exceed dynamic power dissipation for many devices in the near future.
Addressing static power dissipation
There are two key equations that need to be considered when it comes to addressing static power dissipation. The first describes the leakage associated with the transistors:
Leakage ∝ exp(–qVt / kT)

where q is the electron charge, k is Boltzmann’s constant, Vt is the transistor’s switching threshold voltage, and T is the absolute temperature.
One important point about this equation is that it shows that static power dissipation has an exponential dependence on temperature (T). This means that as the chip heats up, its static power dissipation increases exponentially.
Another important point is that static power dissipation has an exponential dependence on the switching threshold of the transistors (Vt). In order to address low-power designs, IC foundries offer multiple Vt libraries. This means that each type of logic gate is available in two (or more) forms: with low-threshold transistors that switch quickly but have higher leakage and consume more power, or with high-threshold transistors that have lower leakage and consume less power but switch more slowly.
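Plugging illustrative numbers into the leakage relation shows just how steep these exponential dependencies are. The threshold voltages and temperatures below are assumptions, and the simplified relation omits process-specific factors, so the ratios should be read as indicative only:

import math

# Relative leakage from the simplified relation above: exp(-q*Vt / (k*T)).

Q = 1.602e-19   # electron charge (coulombs)
K = 1.381e-23   # Boltzmann constant (joules per kelvin)

def relative_leakage(vt_volts, temp_kelvin):
    return math.exp(-Q * vt_volts / (K * temp_kelvin))

# Hypothetical low-Vt (0.25 V) versus high-Vt (0.35 V) cell at 300 K:
ratio_vt = relative_leakage(0.25, 300) / relative_leakage(0.35, 300)
print(f"low-Vt cell leaks ~{ratio_vt:.0f}x more than the high-Vt cell")

# The same high-Vt cell running hot (375 K) versus cool (300 K):
ratio_t = relative_leakage(0.35, 375) / relative_leakage(0.35, 300)
print(f"heating from 300 K to 375 K raises leakage ~{ratio_t:.0f}x")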
The second equation describes how the delay (switching time) associated with a transistor is affected by the switching threshold of that transistor (Vt) and the supply voltage to that transistor (VDD):
Delay ∝ VDD / (VDD – Vt)^α

where α is a process-dependent exponent.
This means that engineers have to perform a complicated balancing act, because lowering the supply voltage reduces the amount of heat being generated, which in turn lowers the static power dissipation. However, lowering the supply voltage also increases gate delays. By comparison, lowering the transistors’ switching thresholds speeds them up, but this exponentially increases their leakage and therefore their static power dissipation.
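The following sketch puts the two relations together to quantify the trade; the exponent α, the threshold voltages, and the supply voltage are all assumptions made for illustration:

import math

# The delay/leakage balancing act, using the two simplified relations
# above. ALPHA is an assumed process exponent; voltages are illustrative.

ALPHA = 1.5
Q_OVER_K = 1.602e-19 / 1.381e-23   # q/k, in kelvin per volt

def rel_delay(vdd, vt):
    return vdd / (vdd - vt) ** ALPHA

def rel_leakage(vt, temp=300.0):
    return math.exp(-Q_OVER_K * vt / temp)

base_delay, base_leak = rel_delay(1.2, 0.35), rel_leakage(0.35)

# Lowering Vt by 100 mV makes the gate ~15% faster in this model,
# but at an enormous leakage penalty:
fast_delay, fast_leak = rel_delay(1.2, 0.25), rel_leakage(0.25)
print(f"delay x{fast_delay / base_delay:.2f}, leakage x{fast_leak / base_leak:.0f}")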
One solution is to use multiple voltage domains as was introduced in the discussions on dynamic power dissipation above. Another option is to use low Vt transistors only on timing-critical paths, and to use high Vt transistors on non-critical paths. These two solutions may of course be used in conjunction.
Yet another technique is to selectively power-down leaking blocks using non-leaking transistors whenever those portions of the device are not required; for example, when those portions are placed in a “stand-by” mode. However, switching entire blocks on and off can cause dramatic current surges, which may require the use of additional circuitry to provide a “soft” (staged) power on/off for these blocks.
Figure 2 — Power distribution considerations include total power consumption, voltage drop, and electromigration effects.
Power Distribution Considerations
Packaging considerations
When it comes to power distribution, the first problem is to get the power from the outside world, through the device’s package, to the silicon chip itself. The wires used to distribute power throughout the chip have resistances associated with them — the longer the wires the larger the resistance, and the larger the resistance the greater the associated voltage drops. This means that traditional packaging technologies based on peripheral power pads are no longer an acceptable option in the case of today’s extremely large and complex designs.
The solution is to use a flip-chip packaging technology, in which pads located across the face of the die are used to deliver power from the external power supply directly to the internal areas of the chip. In addition to being able to support many more power and ground pads, this minimizes the distance the power has to travel to reach the internal logic. Furthermore, the inductance of the solder bumps used in flip-chip packages is significantly lower than that of the bonding wires used with traditional packaging techniques.
Temperature and performance considerations
Power consumption — both static and dynamic — increases a device’s operating temperature. In turn, this may require engineers to employ expensive device packaging and external cooling technology.
In order to accommodate variations in operating temperature and supply voltage, designers have traditionally been obliged to pad device characteristics with generous design margins. However, creating a device’s power network using excessively conservative design practices consumes valuable silicon real estate, increases congestion, and results in performance that is significantly below the silicon’s full potential. This is simply not an option in today’s highly competitive marketplace.
Yet another consideration is that the on-chip temperature gradient (the difference in temperatures at different portions of the device caused by unbalanced power consumption) can produce mechanical stress, which may degrade the device’s reliability.
Voltage drop effects
Deep submicron (DSM) and ultra-deep submicron (UDSM) devices are prone to voltage drop effects, which are caused by the resistance associated with the network of wires used to distribute power and ground from the external pins to the internal circuitry (in the case of DC related voltage drops, these are also often referred to as IR drop effects). Purely for the purposes of providing a simple example, consider a chain of inverter gates connected to the same power and ground tracks (Figure 3).
Figure 3 — A chain of inverters connected to the same power and ground tracks.
Every power and ground track segment has a small amount of resistance associated with it. This means that the logic gate closest to the IC’s primary power or ground pins (gate G1 in this example) is presented with the optimal supply. The next gate in the chain (G2 in this example) will be presented with a slightly degraded supply, and so on down the chain.
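The cumulative nature of this degradation can be modeled in a few lines; the segment resistance, per-gate current, and supply values in this sketch are purely illustrative:

# Cumulative IR drop along the inverter chain of Figure 3.
# All electrical values are hypothetical.

SEG_RESISTANCE = 0.5    # ohms per power-rail segment
GATE_CURRENT   = 0.005  # average amps drawn per gate
VDD            = 1.2    # volts at the power pad
N_GATES        = 10

# The current through segment i is the total drawn by all gates
# downstream of it, so each successive gate sees a lower supply.
supply = VDD
for i in range(1, N_GATES + 1):
    downstream_current = (N_GATES - i + 1) * GATE_CURRENT
    supply -= SEG_RESISTANCE * downstream_current
    print(f"G{i}: {supply:.3f} V")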
The problem is exacerbated in the case of transient or AC voltage drop effects. These occur when gates are switching from one value to another or — even worse — when entire blocks are switched on and off. This causes transitory power surges, which momentarily reduce the voltage supply to gates farther down the power supply chain.
The simple example circuit shown in Figure 3 consists only of inverter gates, but a real design typically contains tens of thousands of register (storage) elements triggered by a clock signal. The clock can cause large numbers of register elements to switch simultaneously, resulting in significant “glitches” in the power supply. In order to analyze and address these effects, it is necessary to take resistive, inductive, and capacitive effects into account.
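A back-of-envelope estimate of such an inductive “glitch” follows from V = L × di/dt; the inductance and current figures below are hypothetical, and the calculation deliberately ignores the on-chip decoupling discussed later:

# Inductive supply glitch estimate: V = L * di/dt. Values are hypothetical.

L_SUPPLY = 1e-9      # 1 nH of package/bond inductance
DELTA_I  = 0.5       # amps of extra current when many registers clock at once
DELTA_T  = 0.5e-9    # current ramp time in seconds

v_glitch = L_SUPPLY * DELTA_I / DELTA_T
print(f"supply glitch ~{v_glitch:.1f} V")   # ~1 V -- catastrophic on a 1.2 V rail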
The reason voltage drop effects are so important is that the input-to-output delays across a logic gate increase as the voltage supplied to that gate is reduced, which can cause the gate to miss its timing specifications. There is also an increase in the interconnect delays associated with wires driven by underpowered gates. Furthermore, a gate’s input switching thresholds are modified when its supply is reduced, which causes that gate to become more susceptible to noise.
Voltage drop effects are becoming increasingly significant, because the resistance of the power and ground tracks rises as feature sizes (track widths) decrease. These effects can be minimized by increasing the width of power and ground tracks, but this consumes valuable real estate on the silicon, which typically causes routing congestion problems.
In order to solve these problems, the logic functions have to be spaced farther apart, which increases delays (and power consumption) due to longer signal tracks. Thus, implementing an optimal power network requires the balancing of many diverse factors.
Electromigration effects
Electromigration occurs when the current density (current per cross-sectional area) in tracks is too high. In the case of power and ground tracks, electromigration effects are DC-based. The so-called “electron wind” induced by the current flowing through a track causes metal ions in the track to migrate. This migration creates “voids” in the “upwind” direction, while metal ions can accumulate “downwind” to form features called “hillocks” and “whiskers.”
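At its simplest, an electromigration check amounts to comparing current density against a foundry limit and widening any track that violates it; the sketch below uses an invented 1 mA-per-micron rule purely for illustration:

# Hypothetical electromigration check: widen any track whose current
# density would exceed the (invented) foundry limit.

J_MAX_MA_PER_UM = 1.0   # allowed current per micron of track width

def min_track_width_um(current_ma):
    """Narrowest track width that keeps current density within the limit."""
    return current_ma / J_MAX_MA_PER_UM

for current_ma in (0.5, 5.0, 50.0):
    print(f"{current_ma:6.1f} mA -> track width >= {min_track_width_um(current_ma):5.1f} um")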
Electromigration in power and ground tracks causes timing problems, because the increased track resistance associated with a void can result in a corresponding voltage drop. This will, in turn, cause increased delays and noise susceptibility in affected logic gates as discussed above.
Power and ground electromigration can also cause major functional errors to occur, because the voids may eventually lead to open circuits while the hillocks and whiskers may cause short circuits to neighboring wires.
Requirements for a true low-power design environment
RTL-to-GDSII
The majority of today’s design environments concentrate on analyzing and addressing power considerations toward the back end of the physical design process. This makes it almost impossible to fix problems caused by poor decisions made during the early stages of the design.
A key requirement for a true low-power design environment is to provide early analysis of effects like voltage drop using whatever data is available at the time, and to then successively refine the analysis as more accurate data becomes available. This allows potential problems to be identified and resolved as soon as possible.
Creating optimal low-power designs involves making tradeoffs such as timing-versus-power and area-versus-power at different stages of the design flow. In order to enable designers to accurately and efficiently perform these tradeoffs, it is necessary for low-power optimization techniques to be integrated with, and applied throughout, the entire RTL-to-GDSII flow.
Power-aware design optimization techniques
There are a wide variety of power-aware design optimization techniques that can be brought into play. During the early (pre-synthesis) stages of the design, the RTL can be modified to employ architectural optimizations, such as replacing a single instantiation of a high-powered logic function with multiple instantiations of low-powered equivalents. The design may also be partitioned for implementation in multiple voltage (VDD) domains, and power-aware clock gating techniques can be automatically applied.
Following synthesis, power-aware mapping techniques may be used to optimize the netlist. These techniques include mapping highly active nodes into specific cells and mapping highly active input signals onto low capacitance input pins. When partitioning the design into multiple voltage (VDD) domains, appropriate level shifter elements need to be inserted into the netlist to connect logic elements across multiple domains. Furthermore, signals to and from domains that may be switched on and off require special attention so as to avoid any “floating net” problems.
A key element in the power-aware design process is to perform appropriate timing optimizations. This means that it is necessary to perform domain-based timing and power analysis throughout the flow. Furthermore, such analysis needs to account for delay increases caused by cell-specific voltage drops in the power rails. Potential optimizations include optimal sizing of gates using a gain-based synthesis flow and automatic selection of low and high thresholds when multi-Vt libraries are available.
Advanced techniques also enable optimization for power during floorplanning and placement. In order to correctly implement multiple voltage domains, it is necessary to separate the different power meshes for each domain. Power-aware cell placement based on weighting nets according to their activity can be used to minimize dynamic power consumption. The results from early voltage drop analysis can be used to determine better locations for any buffers that are to be inserted. Finally, advanced clustering techniques can be applied to clock trees to reduce power consumption.
Appropriate on-chip decoupling capacitors should be added to minimize the inductive voltage drop effects caused by off-chip current variations over time. In order to lower the current-per-pad and bond-wire inductance, many pads are allocated for power and ground, thereby making the analysis of pad placement a non-trivial task. Flip-chip packaging technologies can be used to increase the number of pads connected to the power and ground supplies, thereby lowering the current-per-pad and also lowering the inductance.
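A first-order decoupling-capacitance estimate follows from charge balance, C = I × Δt / ΔV; the transient current, duration, and droop budget below are assumptions carried over from the earlier di/dt example:

# First-order on-chip decap sizing: C = I * dt / dV. Values are hypothetical.

I_TRANSIENT = 0.5      # amps drawn during a switching burst
DURATION    = 0.5e-9   # seconds the burst lasts
MAX_DROOP   = 0.05     # tolerate 50 mV of supply droop

c_decap = I_TRANSIENT * DURATION / MAX_DROOP
print(f"required on-chip decap ~{c_decap * 1e9:.1f} nF")   # ~5 nF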
The process of designing the power distribution network should be based on the results of early rail analysis performed when the power grids are still incomplete. Correct distribution of dissipating elements across the chip can avoid hot-spots and local voltage drop problems, and special wire-widening algorithms can be used to address voltage drop and electromigration issues.
Integrated tool suite
There are a number of very sophisticated power analysis tools available to designers. However, these tools are typically provided as third-party point-solutions that are not tightly integrated into the main design environment. These tools either require the use of multiple databases or they combine disparate data models into one database. This means that design environments based on these tools have to perform internal or external data translations and file transfers, making data management cumbersome, time-consuming, and prone to error.
Correlating results from different point-tools can be difficult, which means that problems may be discovered late in the design cycle or may never be detected at all. Perhaps the most significant problem with existing design environments, however, is that power, timing, and signal integrity effects are strongly interrelated in the nanometer domain, but conventional point-solution design tools do not have the capability to consider all of these effects and their interrelationships concurrently.
In order to fully account for the impact of voltage drop effects, for example, it is important to have an environment that can derate for timing — on a cell-by-cell basis — based on actual voltage drops. The timing analysis engine should then make use of this derated timing data to identify potential changes to the critical paths.
In turn, the optimization engine should make appropriate modifications to address potential setup or hold problems that appear as a result of the timing changes. This requires a design environment in which the power analysis, voltage drop analysis, derating calculations, timing analysis, and optimization engines all work seamlessly together. In the absence of such an integrated environment, one would have to transfer huge amounts of data (such as SDF files) between the different point tools and iterate between them in order to address the timing problems caused by voltage-drop-induced delays.
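A minimal sketch of what such cell-by-cell derating might look like, reusing the simplified delay relation from earlier; the per-cell voltages, threshold, and exponent are all assumptions for illustration rather than the behavior of any particular tool:

# Cell-by-cell delay derating driven by rail-analysis results, using the
# simplified alpha-power delay relation. All values are hypothetical.

ALPHA = 1.5
VT = 0.35           # threshold voltage (volts)
NOMINAL_VDD = 1.2   # supply voltage assumed during library characterization

def derate_factor(actual_vdd):
    """Ratio of delay at the dropped supply to delay at nominal supply."""
    nominal = NOMINAL_VDD / (NOMINAL_VDD - VT) ** ALPHA
    actual = actual_vdd / (actual_vdd - VT) ** ALPHA
    return actual / nominal

# Per-cell supply voltages as reported by a (hypothetical) rail analysis:
for cell, vdd in {"U1": 1.20, "U2": 1.15, "U3": 1.08}.items():
    print(f"{cell}: delay x{derate_factor(vdd):.3f}")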
The lack of integration between power analysis tools and the rest of the environment can result in a tremendous amount of “false errors,” such as minor voltage drops in portions of the design that won’t affect the performance or functionality of the device. Engineers often overcompensate for these false errors and modify the power grid unnecessarily. In turn, this can cause these portions of the design to fail to meet their area or timing constraints and to become congested, and compensating for this can cause ripple effects throughout the rest of the design.
Even worse, the lack of integration between power analysis tools and the rest of the environment means that, when the results from the power analysis are used to locate and isolate timing and/or signal integrity problems, the act of fixing these problems may introduce new problems into the power network. This can result in numerous, time-consuming design iterations.
Ultimately, using point-solution power analysis tools can result in non-convergent solutions that prevent designs from meeting their time-to-market windows (or from being realized at all). Thus, a true low-power design environment should have all of the power analysis tools operating concurrently with the implementation tools, including synthesis, place-and-route, clock-tree, extraction, timing, and signal integrity analysis. Furthermore, all of the tools in the environment should operate on a common data model so as to provide them with concurrent access to analysis data and enable “on-the-fly” changes to the design.