Design Planning

Efficient design implementation of any ASIC requires an appropriate planning approach, or style, that shortens the implementation cycle and allows design goals such as area and performance to be met. There are two implementation styles. For small to medium ASICs, flattening the design is most suited; for very large and/or concurrent ASIC designs, partitioning the design into subdesigns, or the hierarchical style, is preferred.

The flat implementation style provides better area usage and requires less effort during physical design and timing closure compared to the hierarchical style.

The area advantage is mainly due to there being no need to reserve extra space around each subdesign partition for power, ground, and resources for the routing.

Timing analysis efficiencies arise from the fact that the entire design can be analyzed at once rather than analyzing each sub-circuit separately and then analyzing the assembled design later.

The disadvantage of this method is that it requires a large memory space for data and run time increases rapidly with design size.

The hierarchical implementation style is mostly used for very large and/or concurrent ASIC designs, at the cost of some performance degradation. The degradation is mainly because the components forming the critical path may reside in different partitions within the design. A design can be partitioned logically or physically.

Logical partitioning takes place in the early stages of ASIC design (i.e. RTL coding). The design is partitioned according to its logical functions, as well as physical constraints, such as interconnectivity to other partitions or subcircuits within the design. In logical partitioning, each partition is place-and-routed separately and is placed as a macro, or block, at the ASIC top level.

Physical partitioning is performed during the physical design activity. Once the entire ASIC design is imported into physical design tools, partitions can be created which combine several subcircuits, or a large circuit can be partitioned into several subcircuits. Most often, these partitions are formed based on the connectivity and physical constraints of the subcircuits.

Reporting information on vias in Innovus/EDI

Problem

How can I report information on vias in Innovus/EDI?

Solution

There are several commands and scripts you can use to report data on vias in the design, depending on your needs. This article describes the most common methods:

Index:

  • Report Number of Vias Per Layer
  • Report Number of Vias Per Layer including multi-cut breakdown
  • Report Number of Occurrences of each Via Type
  • Report Number of Vias for specific net, area and/or type
  • Report Information Based on Via Cell

Report Number of Vias Per Layer:

Use reportRoute to report the number of vias on each layer:

innovus> reportRoute

At the end of the report is a via summary:

Total length: 1.746e+06um, number of vias: 597459
M1(V) length: 3.090e+04um, number of vias: 193642
M2(H) length: 3.053e+05um, number of vias: 223966
M3(V) length: 5.161e+05um, number of vias: 101703
M4(H) length: 3.809e+05um, number of vias: 35161
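If you need these per-layer counts in machine-readable form, the summary can be parsed with a short script. The following Python sketch assumes the exact report format shown above (layer name and direction, length, then the via count); it is an illustration, not part of the tool:

```python
import re

# Parse the per-layer via counts from a reportRoute summary.
# The report text below is the example output shown above.
report = """Total length: 1.746e+06um, number of vias: 597459
M1(V) length: 3.090e+04um, number of vias: 193642
M2(H) length: 3.053e+05um, number of vias: 223966
M3(V) length: 5.161e+05um, number of vias: 101703
M4(H) length: 3.809e+05um, number of vias: 35161"""

via_counts = {}
for line in report.splitlines():
    # Match lines like "M1(V) length: 3.090e+04um, number of vias: 193642"
    m = re.match(r"(\w+)\(([HV])\) length: \S+, number of vias: (\d+)", line)
    if m:
        via_counts[m.group(1)] = int(m.group(3))

print(via_counts)  # per-layer via counts keyed by layer name
```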

Report Number of Vias Per Layer including multi-cut breakdown:

Run the following command to report the single versus multi-cut via percentage for each layer. Use the -selected_net_only option to only report via information for selected nets.

innovus> pdi report_design -wire

Example output:

#Up-Via Summary (total 52663):
#                   single-cut          multi-cut      Total
#-----------------------------------------------------------
#  Metal 1       20725 ( 98.6%)       285 (  1.4%)      21010
#  Metal 2       21858 (100.0%)         0 (  0.0%)      21858
#  Metal 3        7258 (100.0%)         0 (  0.0%)       7258
#  Metal 4        2475 ( 99.1%)        23 (  0.9%)       2498
#  Metal 5           0 (  0.0%)        39 (100.0%)         39
#-----------------------------------------------------------
#                52316 ( 99.3%)       347 (  0.7%)      52663
#
#
#Vias used for rule ‘DEFAULT’
# VIA12_1cut_V              18770
# VIA12_1cut                 1940
# VIA12_2cut_N                148
# VIA12_2cut_S                101
# VIA12_2cut_E                 28
# VIA12_1cut_FAT_V             13
# VIA12_2cut_W                  6
# VIA12_2cut_HS                 2
# VIA23_1cut                13916
# VIA23_1stack_N             6496
# VIA23_1stack_S             1432
# VIA23_1cut_V                 12
# VIA23_1cut_FAT_C              2
# VIA34_1cut                 5285
# VIA34_1stack_E             1669
# VIA34_1stack_W              304
# VIA45_1cut                 1543
# VIA45_1stack_N              792
# VIA45_1stack_S              137
# VIA45_2cut_N                 16
# VIA45_2cut_S                  7
# VIA45_1cut_V                  3
# VIA56_2cut_E                 25
# VIA56_2cut_W                 14

Report Number of Occurrences of each Via Type:

Use the TCL script userRptViaStats.tcl provided with the Innovus software to report the number of times each via is used.

innovus> source <innovus_install>/share/fe/gift/scripts/tcl/userRptViaStats.tcl
innovus> userRptViaStats

The report will be as follows:

VIA03_N :               18198
VIA01_WW_r90 :          909
VIA03_BNP :             1
VIA01_BAR_N :           6973
VIA07_N :               3638
VIA05_BNP :             21
VIA01_LRG :             1653
VIA03_S :               16112
VIA01_3_13 :            1233
VIA02_3_13_BNP :        572
VIA07_BNP :             7
VIA03_LRG :             12
VIA01_BAR_S :           5604

Report Number of Vias for specific net, area and/or type:

Lastly, you can select vias based on net, area and/or type and report the number selected. Following is an example:

1. First, select the desired vias using the editSelectVia command. editSelectVia has several options for selecting vias based on different criteria, such as area, net, net type, cut layers, shapes, and status. The following example selects vias in a given area on netA that are on signal nets:

editSelectVia -area 0 0 1500 1500 -nets netA -type signal

2. Next, store the selected vias in a variable using the dbGet command:

set via_list [dbGet selected]

3. Lastly, use TCL commands to report the number of vias in the list:

set via_count [llength $via_list]
puts $via_count

If you want the details of each via, you can write a report of the selected objects:

getReport {reportSelect} > via.rpt

Alternatively, you can also use 'dbQuery -area <area coordinates> -objType viaInst' to get the pointers of vias in a given area. For example:

foreach viaPtr [dbQuery -objType viaInst -area 0 0 500 500] {
    puts "NET: [dbGet $viaPtr.net.name] VIA: [dbGet $viaPtr.via.name]"
}

Report Information Based on Via Cell:

To report information on a specific via cell, use the following dbGet commands:

For vias on special nets:

dbGet -p2 top.nets.sVias.via.name myViaName

For vias on regular nets:

dbGet -p2 top.nets.vias.via.name myViaName

These commands will return a pointer to each instance of the specified via type. You can assign this to a variable, query on its properties, and develop more elaborate scripts to meet your needs.

FinFET impact on dynamic power

FinFET transistors are now in production at the major foundries, having gone from drawing board to products on the shelf in record time. FinFET adoption has been growing steadily because they deliver better power, performance, and area compared to their planar counterparts. This makes them very compelling for smartphones, tablets, and other products that require long battery life and snappy performance. Figure 1 shows the advantages in speed, power usage, and density of TSMC’s 16nm FinFET process over two other processes.

Figure 1. FinFET performance, power, and area advantages (Source: TSMC. Presented at Open Innovation Platform 2014)

When Intel first used FinFETs at the 22nm node, they claimed 37% better performance (at the same total power) or 50% power reduction (at the same speed) than bulk, PDSOI, or FDSOI. These numbers are compelling, and continue to improve even down to 14nm, and presumably, beyond.

In terms of power usage, controlling leakage has been a huge challenge for planar devices, especially at smaller nodes. By raising the channel and wrapping the gate around it, FinFETs create a fully depleted channel that overcomes the leakage problems of planar transistors. The better channel control of FinFETs allows lower threshold and supply voltages.

While leakage is under control in FinFETs, dynamic power consumption accounts for a significant chunk of the total power. FinFETs have higher pin capacitances compared to planar transistors, which results in higher dynamic power. According to Cavium Networks, “FinFETs bring a 66 percent increase in gate capacitance per micron compared to 28 nm process, and at the same level of the 130-nm planar node.” Figure 2 charts the gate capacitances of planar and FinFET devices.

 

Figure 2. FinFET gate capacitance compared to planar processes (Source: Cavium Networks)

So what does this mean to the design engineer and how does it change the design flow from an implementation perspective? Dynamic (aka switching) power needs to become a cost function during optimization and has to be considered at all the stages of the flow.

FinFETs add to the complexity of physical design flow. Tighter design rules and FinFET process requirements, such as voltage threshold-aware spacing, implant layer rules, etc., impose restrictions on synthesis, placement, floorplanning, and optimization engines that directly impact design metrics. And because FinFETs are being implemented at 16/14 nm, multi-patterning automatically becomes a part of any design using FinFETs, which adds yet another layer of complexity.

Design automation technologies need to be FinFET-aware to reduce switching power and offer capabilities such as power-aware RTL synthesis, activity-driven placement and optimization, CTS (clock tree synthesis) power reduction, and concurrent optimization of both dynamic and leakage power. Power optimization needs to start early in the design flow, and the architecture selection needs to be power-friendly to ensure the lowest power when the design is realized.

The digital implementation process starts with RTL synthesis. Since FinFETs are used in the newest, largest designs, the RTL synthesis engine must have the capacity to handle 100+ million gates with reasonable runtimes. Of course, it must also deliver high-quality results, which can be achieved by running RTL synthesis at the full-chip level when all aspects of the chip can be taken into account. It also helps to be able to run multiple synthesis jobs with different design constraints to explore design alternatives. Having visibility on how the design metrics affect one another lets you make smart trade-offs to meet power, performance, and area metrics.

In order to meet power goals, your implementation flow needs to employ a variety of power-reduction strategies, starting from synthesis and continuing through the physical design flow. The most common strategies include multi-threshold libraries, clock gating, multi-corner/multi-mode (MCMM) power optimization, pin swapping, register clumping, remapping, and power-density-driven placement. RTL-level power analysis is essential to analyze and fix power problems early in the design flow. The ability to cross-probe between RTL and layout helps identify and debug problems early in the design flow and minimizes last-minute surprises. As mentioned earlier, power optimization needs to be done at all stages of the design flow and should be done concurrently with other design metrics, such as performance and area. The optimization engine should include dynamic power in its costing and employ transforms such as sizing cells, deleting cells, or moving cells to reduce switching wire capacitance.

Design implementation tools for advanced nodes that utilize FinFETs must be enhanced and updated with close cooperation from the various foundries. A lot of engineering partnership goes on between the foundries, EDA companies, and mutual customers so that chip designers can take full advantage of each new process node.

FinFETs are already in production and have delivered on the promise of scalability, performance, and leakage power, but they have added a lot more complexity to the design implementation flow. FinFET-aware design implementation and effective dynamic power control throughout the flow are critical to unleash the full potential of these 3D devices.

Floorplanning: concept, challenges, and closure

In today’s world, there is an ever-increasing demand for SOC speed, performance, and features. To cater to all those needs, the industry is moving toward lower technology nodes. The current market has become more and more demanding, in turn forcing complex architectures and reduced time to market. The complex integrations and shorter design cycles emphasize the importance of floorplanning, the first step in the netlist-to-GDSII design flow. Floorplanning not only captures the designer’s intent, but also presents the challenges and opportunities that affect the entire design flow, from design to implementation and chip assembly.

A typical SOC can include many hard- and soft-IP macros, memories, analog blocks, and multiple power domains. Because of the increases in gate count, power domains, power modes, and special architectural requirements, most SOCs these days are hierarchical designs. The SOC interacts with the outside world through sensors, antennas, displays, and other elements, which introduce a lot of analog components into the chip. All of these factors directly result in various challenges in floorplanning.

Floorplanning includes macro/block placement, design partitioning, pin placement, power planning, and power grid design. What makes the job even more important is that the decisions taken for macro/block placement, partitioning, I/O-pad placement, and power planning directly or indirectly impact the overall implementation cycle.

Lots of iterations happen to get an optimum floorplan. The designer takes care of the design parameters, such as power, area, timing, and performance during floorplanning. These estimations are repeatedly reviewed, based on the feedback of other stakeholders such as the implementation team, IP owners, and RTL designers. The outcome of floorplanning is a proper arrangement of macros/blocks, power grid, pin placement, and partitioned blocks that can be implemented in parallel.

In hierarchical designs, the quality of the floorplan is analyzed after the blocks are integrated at the top level. That can result in unnecessary iterative work, wasted resource hours, and longer cycle times, which could mean missed market opportunities. This underscores the importance of floorplanning.
In this paper, we will discuss some of the good practices, techniques, and complex cases that arise while floorplanning in an SOC.

The first rule of thumb for floorplanning is to arrange the hard macros and memories in such a manner that the core area (to be used for SOG placement) ends up square in shape. This is not always possible, however, because of the large number of analog-IP blocks, memories, and various other requirements in the design.

Before going into the details of floorplanning, here are few general terms that the designer has to understand:

  1. Track: A track is a virtual guideline/path along which signal routing happens in an SOC design. Tracks are defined for each metal layer in both preferred and non-preferred directions and are used by the router. The router routes signals assuming the track to be at the center of the metal piece.
  2. Row: This is the area defined for standard-cell placement in the design. A row height is based on the height of the standard cells used in design. There can be rows of various sites/heights in the design based on the type of standard cells used.
  3. Guide: A module guide is the guided placement of a logical module structure in the design. The guide is a soft constraint. Some of the module guide logic can get placed outside the guide, and other logical module logic can be placed in the guide region.
  4. Region: The region is a hard constraint in the design, and the design for the module is self-contained inside the physical boundary of region. However, it is possible for outside modules to have some logic placed inside the region boundary.
  5. Fence: This is a hard constraint specifying that only the design module can be placed inside the physical boundary of fence. No outside module logic can be placed inside the fence boundary.
  6. Halo: The halo/obstruction is the placement blockage defined for the standard cells across the boundary of macros.
  7. Routing blockage: Routing blockage is the obstruction for metal routing over the defined area.
  8. Partial blockage: This is the porous obstruction guideline for standard-cell placement. It is very helpful in keeping a check on placement density to avoid congestion issues at later stages of design. For example, if the designer has put a partial placement blockage of 40% over an area, then the placement density is restricted to a maximum value of 60% in the area.
  9. Buffer blockage/soft blockage: This is a type of placement obstruction in which only buffer cells can be placed during optimization or legalization. No other standard-cell placement is allowed in the specified area during initial placement, but some cells may be moved into this region during legalization and optimization.

Let us now explore the various considerations and special scenarios in an SOC design one by one.
Plan your partitions depending upon architectural requirements and different power modes in an SOC.
Today, the design approach is shifting toward hierarchical closure. The hierarchical approach is also driven by architectural requirements such as safety, tool limitations due to higher gate counts, late IP deliverables, and different power modes in an SOC. The hierarchically partitioned blocks are implemented independently in terms of placement, routing, timing, and noise closure.

When partitions are merged at the top level, many top-level nets can detour across the boundary of these partitions, resulting in unnecessary timing violations, wasted routing resources due to large net lengths, and buffer insertion to avoid DRVs (design-rule violations) on these nets. All these, in turn, result in increased power consumption and increased placement density. The detoured nets may also include critical nets, such as clock nets, that need to meet particular latency targets. These issues often result in reopening partitions that were already closed for timing and routing, ultimately costing design-cycle time and design resources.

So the floorplanner has to plan from the very beginning to provide metal channels for top-level routes. The routing resources used inside a partition block can’t be used for routing at the top level. One has to define certain routing resource chunks inside the partition block that are not used inside the block and hence can be used for top-level routing. This early decision to provide routing chunks avoids iterations at later stages of design. Figure 1 explains the scenarios.

Clock Tree Synthesis

Clock Tree Synthesis

  • Clock Tree Synthesis is a process that makes sure the clock is distributed evenly to all sequential elements in a design.
  • The goal of CTS is to minimize skew and latency.
  • The placement data is given as input for CTS, along with the clock tree constraints.
  • The clock tree constraints include latency, skew, maximum transition, maximum capacitance, maximum fan-out, the list of buffers and inverters, etc.
  • Clock tree synthesis consists of clock tree building and clock tree balancing.
  • The clock tree can be built with clock tree inverters so as to maintain the exact transition (duty cycle), and clock tree balancing is done with clock tree buffers (CTBs) to meet the skew and latency requirements.
  • As few clock tree inverters and buffers as possible should be used to meet the area and power constraints.
  • There can be several structures for the clock tree:
  • H-Tree
  • X-Tree
  • Multi-level clock tree
  • Fish bone
  • Once CTS is done, the timing has to be checked again.
  • The outputs of clock tree synthesis are the Design Exchange Format (DEF), Standard Parasitic Exchange Format (SPEF), netlist, etc.
NOTES:
  • Normal inverters and buffers are not used for building and balancing because clock buffers provide better slew and better drive capability than normal buffers, and clock inverters provide better-balanced rise and fall times, maintaining the 50% duty cycle.
  • Effects of CTS: Many clock buffers are added, congestion may increase, crosstalk noise, crosstalk delay etc.
  • Clock tree optimizations: It is achieved by buffer sizing, gate sizing, HFN synthesis, Buffer relocation.
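As a minimal illustration of the latency and skew terms above, here is a Python sketch; the per-flop insertion delays are invented for the example, not taken from any real design:

```python
# Illustrative sketch: computing clock latency and skew from
# per-flop clock insertion delays (the values below are made up).
insertion_delays = {
    "ff_a": 1.20,  # ns from the clock root to each flop's clock pin
    "ff_b": 1.35,
    "ff_c": 1.28,
}

latency = max(insertion_delays.values())         # longest root-to-sink delay
skew = latency - min(insertion_delays.values())  # max minus min arrival time

print(f"latency = {latency:.2f} ns, skew = {skew:.2f} ns")
```

CTS tries to drive the skew term toward zero while keeping the latency (and the buffer/inverter count that produces it) as small as possible.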
Set Up Fixing:
  1. Upsize the cells (increase the drive strength) in the data path.
  2. Pull in the launch clock (reduce the launch clock latency).
  3. Push out the capture clock (increase the capture clock latency).
  4. Remove buffers from the data path.
  5. Replace buffers with two inverters whose placement can be adjusted to tune the delay.
  6. Reduce any larger-than-normal capacitance on a cell output pin.
  7. Upsize the cells to decrease the delay through the cell.
  8. Swap in LVT (low-threshold-voltage) cells, which are faster.
Hold Fixing:
It is well understood that hold slack improves when the data path has more delay. So, to fix hold violations, we add delay to the data path.
  1. Downsize the cells (decrease the drive strength) in the data path.
  2. Pull in the capture clock (reduce the capture clock latency).
  3. Push out the launch clock (increase the launch clock latency).
  4. Add buffers, inverter pairs, or delay cells to the data path.
  5. Decrease the size of certain cells in the data path; it is better to downsize cells closer to the capture flip-flop, because there is less chance of affecting other paths and causing new errors.
  6. Increasing the wire load model can also fix hold violations.
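The launch/capture adjustments in the setup and hold lists above can be illustrated with a simplified slack model. This Python sketch ignores clock uncertainty and on-chip variation, and all numbers are made up:

```python
# Simplified setup/hold slack model (no uncertainty, no derating).
def setup_slack(period, launch_lat, capture_lat, data_delay, t_setup):
    # Data launched at launch_lat must arrive before the NEXT capture
    # edge (period + capture_lat) minus the setup requirement.
    return (period + capture_lat - t_setup) - (launch_lat + data_delay)

def hold_slack(launch_lat, capture_lat, data_delay, t_hold):
    # Data must not arrive before the SAME capture edge plus hold.
    return (launch_lat + data_delay) - (capture_lat + t_hold)

# Example numbers (made up): 10 ns clock, balanced clock tree.
print(setup_slack(10.0, 1.0, 1.0, 8.5, 0.2))  # positive -> setup met
print(hold_slack(1.0, 1.0, 0.1, 0.15))        # negative -> hold violation
```

Note how the signs work out: pushing the capture clock (larger capture_lat) increases setup slack, while pulling it in or adding data-path delay increases hold slack, which is exactly what the two fix lists exploit.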
Transition violation
If a signal takes too long to transition from one logic level to another, a transition violation occurs. Transition violations are caused by the resistance and capacitance of the node.
  1. Upsize the driver cell.
  2. Decrease the net length by moving cells closer together or rerouting long nets.
  3. Add buffers.
  4. Increase the width of the route at the violating instance pin. This decreases the resistance of the route and fixes the transition violation.
Cap violation
The capacitance on a node is a combination of the fan-out of the output pin and capacitance of the net. This check ensures that the device does not drive more capacitance than the device is characterized for.
  1. Increase the drive strength of the driving cell.
  2. Buffer some of the fan-out paths to reduce the capacitance seen by the output pin.
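The max-capacitance check described above can be sketched numerically. All capacitance values below are illustrative, not taken from any real library:

```python
# Simplified max-capacitance check for a driver pin: the total load is
# the sum of the sink pin capacitances plus the net's wire capacitance.
def total_load(pin_caps, wire_cap):
    return sum(pin_caps) + wire_cap

max_cap_limit = 0.050                      # pF, from the driver's characterization
sink_pins = [0.004, 0.006, 0.005, 0.003]   # pF, fan-out input pin caps
wire_cap = 0.040                           # pF, parasitic wire capacitance

load = total_load(sink_pins, wire_cap)
if load > max_cap_limit:
    # Fixes: upsize the driver, or buffer part of the fan-out so the
    # driver sees fewer sinks and less wire.
    print(f"max_cap violation: {load:.3f} pF > {max_cap_limit:.3f} pF")
```

Buffering splits the fan-out: the new buffer absorbs part of the sink list and wire, so the original driver's load drops back under its characterized limit.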

Removing Shorts Over Macros

Question:
After detail routing, I see a lot of shorts in my design. All of them are over macros. I tried to fix these shorts by running the route_opt -incremental command, but this did not work. How can I fix these shorts?
Answer:
The router uses an internal cost function to determine whether to detour around macros or route over them. If there are many DRC violations over macros, you should review the factors that can cause DRC violations, such as the floorplan of the design, the cell density and congestion near the macros, and the FRAM completeness of the macro. If there are many shorts over macros, you should not use router options to fix them. This can lead to further DRC divergence around the macros and longer runtime.
To fix the shorts, use the following settings:
Force the maximum number of routing iterations during detail routing by setting the -force_max_number_iterations option of the set_route_zrt_detail_options command to true:
icc_shell> set_route_zrt_detail_options -force_max_number_iterations true
By default, Zroute might perform fewer than the maximum number of iterations if the DRC violations do not converge.
Set the effort level for fixing shorts over macros to high by setting the -repair_shorts_over_macros_effort_level option of the set_route_zrt_detail_options command to high. When the effort level is high, any shorts over macros that remain after detail routing can trigger deletion and full ECO rerouting of the involved nets:
icc_shell> set_route_zrt_detail_options -repair_shorts_over_macros_effort_level high
The route_opt -incremental command runs route optimization and ECO routing with signal integrity; it is not intended to fix shorts over macros.

Low-power design techniques span RTL-to-GDSII flow

With today’s increasingly large and complex digital IC and system-on-chip (SoC) designs, design power closure and circuit power integrity are becoming some of the main engineering challenges, thereby impacting the device’s total time-to-market.

The sheer amount of power consumed by some devices can cause significant design problems. For example, a recently announced CPU consumes 100 amps at 1.3 volts, which equates to 130 Watts! This class of device requires expensive packaging and heat sinks. The heat gradient across the chip can cause mechanical stress leading to early breakdown, and the act of physically delivering all of this power into the chip is non-trivial. Thus, even in the case of devices intended for use in non-portable equipment where ample power is readily available, power-aware designs can offer competitive advantages with respect to such considerations as the size and cost of the power supply and cooling systems.

The majority of power considerations are exacerbated in the case of low-power designs. The increasing use of battery-powered portable (often wireless) electronic systems is driving the demand for IC and SoC devices that consume the smallest possible amounts of power.

Whenever the industry moves from one technology node to another, existing power constraints are tightened and new constraints emerge. Power-related constraints are now being imposed throughout the entire design flow in order to maximize the performance and reliability of devices. In the case of today’s extremely large and complex designs, implementing a reliable power network and minimizing power dissipation have become major challenges for design teams.

Creating optimal low-power designs involves making tradeoffs such as timing-versus-power and area-versus-power at different stages of the design flow. Successful power-sensitive designs require engineers to have the ability to accurately and efficiently perform these tradeoffs. In order to achieve this, engineers require access to appropriate low-power analysis and optimization engines, which need to be integrated with — and applied throughout — the entire RTL-to-GDSII flow.

Furthermore, in order to handle the complex interrelationships between diverse effects, it is necessary to use an integrated design environment in which all of the power tools are fully integrated with each other, and also with other analysis and implementation engines in the flow. For example, in order to fully account for the impact of voltage drop effects, it is important to have an environment that can derate for timing — on a cell-by-cell basis — based on actual voltage drops.

The timing analysis engine should then make use of this derated timing data to identify potential changes to the critical paths. In turn, the optimization engine should make appropriate modifications to address potential setup or hold problems that appear as a result of the timing changes.

This paper first describes the most significant power dissipation and distribution considerations. The requirements for a true low-power design environment that addresses these power considerations throughout the entire RTL-to-GDSII design flow are then introduced.

Power Dissipation Considerations

Dynamic power dissipation

These discussions assume the use of complementary metal oxide semiconductor (CMOS) devices, because this is currently the most prevalent digital IC implementation technology. Dynamic power dissipation occurs in logic gates that are in the process of switching from one state to another. During the act of switching, any internal capacitance associated with the gate’s transistors has to be charged, thereby consuming power. Of more significance, the gate also has to charge any external or load capacitances, which are comprised of parasitic wire capacitances and the input capacitances associated with any downstream logic gates.

Consider a simple inverter gate, in which only one of transistors T1 and T2 is usually on at any particular time (Figure 1). When the gate is in the process of switching from one state to another, however, both T1 and T2 will actually be on simultaneously for a fraction of a second. This causes a momentary short circuit between the VDD (logic 1, power) and VSS (logic 0, ground) rails, and the ensuing crowbar current results in a transitory power surge.


Figure 1 — When gate is switching, both transistors may be active simultaneously.

The amount of time the two transistors are simultaneously active is a function of their input switching thresholds and the slew (slope) of the input signal driving the gate.

One of the factors controlling the slew of the signal being presented to the inverter’s input is the size of the transistors forming the logic gate driving this signal. These need to be sufficiently large such that the signal transitions fast enough to keep the amount of time the inverter’s transistors are both active to a reasonable level (Figure 1b).

Now consider what happens if the driving gate’s transistors are too large and the driving gate is overpowered. In this case, the power savings achieved by minimizing the time where the inverter’s transistors are both on (Figure 1a) will be negated by the driving gate having to charge the increased capacitance associated with its over-sized transistors, thereby consuming excessive amounts of power. Furthermore, the extreme speed of the signal’s transitions will also cause signal integrity problems in the form of noise, overshoot, undershoot, and crosstalk.

By comparison, if the driving gate’s transistors are too small and the driving gate is underpowered, the inverter’s transistors will both be on for a significant amount of time (Figure 1c), thereby causing the inverter to consume unwarranted amounts of power (the under-driven input signal will also be susceptible to noise and crosstalk coupling effects from other signals).

Addressing dynamic power dissipation

For the purposes of this introductory paper, the amount of dynamic power dissipation may be represented using the following equation:

Dynamic Power = af × C × V²

      Where:
      af = Amount of activity as a function of the clock frequency (f)
      C = Amount of capacitance being driven/switched
      V² = The square of the supply voltage


This equation shows that the dynamic power dissipation may be reduced by minimizing the circuit activity and/or reducing the capacitance being driven and/or reducing the supply voltage.
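A quick numeric illustration of the equation (the activity, frequency, capacitance, and voltage values below are arbitrary): because voltage enters squared, lowering the supply from 1.0 V to 0.8 V cuts dynamic power by a factor of 0.8², about 36%.

```python
# Dynamic power per the equation above: P = af * C * V^2.
# Here the activity factor and clock frequency are passed separately
# and multiplied together; all values are illustrative.
def dynamic_power(activity, freq_hz, cap_farads, vdd):
    return activity * freq_hz * cap_farads * vdd ** 2

p_nominal = dynamic_power(0.2, 1e9, 1e-9, 1.0)  # 20% activity, 1 GHz, 1 nF, 1.0 V
p_low_vdd = dynamic_power(0.2, 1e9, 1e-9, 0.8)  # same circuit at 0.8 V

print(p_nominal, p_low_vdd)  # lowering Vdd 20% cuts dynamic power ~36%
```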

One way to reduce the amount of switching activity is to reduce the frequency of the system clock. However, this will have a corresponding impact on the performance of the device. Another technique is to employ clock gating, which restricts the distribution of the clock to only those portions of the device that are actually performing useful tasks at that time. It is also possible to minimize local data activity (glitches and hazards) by applying appropriate delay balancing.

There are a number of ways in which the amount of capacitance may be reduced. One approach is to downsize the gates driving over-driven wires, thereby lowering the capacitances associated with these gates. Another technique is to use a power-aware placement algorithm to minimize the length of critical wires, which therefore reduces the size of their associated parasitic capacitances.

This power-aware placement should ideally be based on (or weighted by) the amount of switching activity associated with each wire. Yet another alternative is to exploit technology options such as using low-k dielectric (insulating) materials and low resistance/capacitance copper (Cu) tracks.

Lowering the supply voltage dramatically reduces a logic gate’s power consumption, but this also significantly reduces the switching speed of the gate. One solution is to use multiple voltage domains, which means having different areas of the chip running at different voltages. In this case, any performance-critical functions would be located in a higher voltage domain, while non-critical functions would be allocated to a lower voltage domain.

There are also interesting trade-offs that can be made between functional parallelism and frequency and/or voltage during the algorithmic and architectural stages of the design flow. For example, one block of logic running at frequency ‘f’ and voltage ‘V’ may be replaced with two copies of that block, each of which performs half of the task and runs at a lower frequency and/or a lower voltage. In this case, the total power consumption of the function may be reduced while maintaining performance, at the expense of using more silicon real estate.
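This parallelism trade-off can be sketched numerically using the same af × C × V² relation; the frequencies, capacitances, and voltages below are purely hypothetical:

```python
def dyn_power(act_freq_hz, cap_f, vdd):
    # Dynamic power = af x C x V^2 (af folded into act_freq_hz here)
    return act_freq_hz * cap_f * vdd ** 2

# Hypothetical single block: 200 MHz effective activity, 2 nF, 1.2 V.
single = dyn_power(200e6, 2e-9, 1.2)

# Two copies, each doing half the work at half the frequency; the relaxed
# timing also allows a lower (hypothetical) 0.9 V supply per copy.
parallel = 2 * dyn_power(100e6, 2e-9, 0.9)

print(single, parallel)  # same throughput, lower total power, more area
assert parallel < single
```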

Static power dissipation

Static power dissipation is associated with logic gates when they are inactive — that is, not currently switching from one state to another. In this case, these gates should theoretically not be consuming any power at all. In reality, however, there is always some amount of leakage current passing through the transistors, which means they do consume a certain amount of power.

Even though the static power consumption associated with an individual logic gate is extremely small, the total effect becomes significant when we come to consider today’s ICs, which can contain tens of millions of gates. Furthermore, as transistors shrink in size when the industry moves from one technology node to another, the level of doping has to be increased, thereby causing leakage currents to become relatively larger.

The end result is that, even if a large portion of the device is totally inactive, it may still be consuming a significant amount of power. In fact, static power dissipation is expected to exceed dynamic power dissipation for many devices in the near future.

Addressing static power dissipation

There are two key equations that need to be considered when it comes to addressing static power dissipation. The first describes the leakage associated with the transistors:

Leakage ∝ exp(−qVt / kT)

One important point about this equation is that it shows that static power dissipation has an exponential dependence on temperature (T). This means that as the chip heats up, its static power dissipation increases exponentially.

Another important point is that static power dissipation has an exponential dependence on the switching threshold of the transistors (Vt). In order to address low-power designs, IC foundries offer multiple Vt libraries. This means that each type of logic gate is available in two (or more) forms: with low-threshold transistors that switch quickly but have higher leakage and consume more power, or with high-threshold transistors that have lower leakage and consume less power but switch more slowly.

The second equation describes how the delay (switching time) associated with a transistor is affected by the switching threshold of that transistor (Vt) and the supply voltage to that transistor (VDD):

Delay ∝ VDD × (VDD − Vt)^(−a)

This means that engineers have to perform a complicated balancing act, because lowering the supply voltage reduces the amount of heat being generated, which in turn lowers the static power dissipation. However, lowering the supply voltage also increases gate delays. By comparison, lowering the transistors’ switching thresholds speeds them up, but this exponentially increases their leakage and therefore their static power dissipation.
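The balancing act can be illustrated with a small sketch built directly from the two relations above; the exponent a and the voltage values are hypothetical:

```python
import math

Q = 1.602e-19  # electron charge (C)
K = 1.381e-23  # Boltzmann constant (J/K)

def rel_leakage(vt, temp_k):
    # Leakage ~ exp(-q*Vt / (k*T)); a relative figure, not amps
    return math.exp(-Q * vt / (K * temp_k))

def rel_delay(vdd, vt, a=1.3):
    # Delay ~ VDD * (VDD - Vt)^(-a); 'a' is a hypothetical fitting exponent
    return vdd * (vdd - vt) ** -a

T = 300.0  # room temperature, Kelvin

# Lowering Vt from 0.45 V to 0.30 V makes the gate faster...
assert rel_delay(1.0, 0.30) < rel_delay(1.0, 0.45)

# ...but multiplies its leakage a few hundred times over.
ratio = rel_leakage(0.30, T) / rel_leakage(0.45, T)
print(ratio)
```

This is the numerical intuition behind offering each cell in both low-Vt and high-Vt flavors.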

One solution is to use multiple voltage domains as was introduced in the discussions on dynamic power dissipation above. Another option is to use low Vt transistors only on timing-critical paths, and to use high Vt transistors on non-critical paths. These two solutions may of course be used in conjunction.

Yet another technique is to selectively power-down leaking blocks using non-leaking transistors whenever those portions of the device are not required; for example, when those portions are placed in a “stand-by” mode. However, switching entire blocks on and off can cause dramatic current surges, which may require the use of additional circuitry to provide a “soft” (staged) power on/off for these blocks.


Figure 2 — Power distribution considerations include total power consumption, voltage drop, and electromigration effects.

Power Distribution Considerations

Packaging considerations

When it comes to power distribution, the first problem is to get the power from the outside world, through the device’s package, to the silicon chip itself. The wires used to distribute power throughout the chip have resistances associated with them — the longer the wires the larger the resistance, and the larger the resistance the greater the associated voltage drops. This means that traditional packaging technologies based on peripheral power pads are no longer an acceptable option in the case of today’s extremely large and complex designs.

The solution is to use a flip-chip packaging technology, in which pads located across the face of the die are used to deliver power from the external power supply directly to the internal areas of the chip. In addition to being able to support many more power and ground pads, this minimizes the distance the power has to travel to reach the internal logic. Furthermore, the inductance of the solder bumps used in flip-chip packages is significantly lower than that of the bonding wires used with traditional packaging techniques.

Temperature and performance considerations

Power consumption — both static and dynamic — increases a device’s operating temperature. In turn, this may require engineers to employ expensive device packaging and external cooling technology.

In order to accommodate variations in operating temperature and supply voltage, designers have traditionally been obliged to pad device characteristics and design margins. However, creating a device’s power network using excessively conservative design practices consumes valuable silicon real estate, increases congestion, and results in performance that is significantly below the silicon’s full potential. This is simply not an option in today’s highly competitive marketplace.

Yet another consideration is that the on-chip temperature gradient (the difference in temperatures at different portions of the device caused by unbalanced power consumption) can produce mechanical stress, which may degrade the device’s reliability.

Voltage drop effects

Deep submicron (DSM) and ultra-deep submicron (UDSM) devices are prone to voltage drop effects, which are caused by the resistance associated with the network of wires used to distribute power and ground from the external pins to the internal circuitry (in the case of DC related voltage drops, these are also often referred to as IR drop effects). Purely for the purposes of providing a simple example, consider a chain of inverter gates connected to the same power and ground tracks (Figure 3).


Figure 3 — A chain of inverters connected to the same power and ground tracks.

Every power and ground track segment has a small amount of resistance associated with it. This means that the logic gate closest to the IC’s primary power or ground pins (gate G1 in this example) is presented with the optimal supply. The next gate in the chain (G2 in this example) will be presented with a slightly degraded supply, and so on down the chain.
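This cumulative degradation can be sketched numerically; the segment resistance and per-gate current below are hypothetical:

```python
# Each rail segment carries the current of every gate downstream of it,
# so the supply degrades cumulatively along the chain.
R_SEG = 0.05    # ohms per power-track segment (hypothetical)
I_GATE = 0.002  # amps drawn per gate (hypothetical)
VDD = 1.2

def rail_voltage(n_gates):
    """Supply voltage seen by G1..Gn along a shared power track."""
    volts = []
    v = VDD
    for downstream in range(n_gates, 0, -1):
        v -= R_SEG * I_GATE * downstream  # drop across this segment
        volts.append(v)
    return volts

v = rail_voltage(5)
print(v)  # G1 sees the best supply, G5 the worst
assert all(a > b for a, b in zip(v, v[1:]))
```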

The problem is exacerbated in the case of transient or AC voltage drop effects. These occur when gates are switching from one value to another or — even worse — when entire blocks are switched on and off. This causes transitory power surges, which momentarily reduce the voltage supply to gates farther down the power supply chain.

The simple example circuit shown in Figure 3 consists only of inverter gates, but a real design typically contains tens of thousands of register (storage) elements triggered by a clock signal. The clock can cause large numbers of register elements to switch simultaneously, resulting in significant “glitches” in the power supply. In order to analyze and address these effects, it is necessary to take resistive, inductive, and capacitive effects into account.

The reason voltage drop effects are so important is that the input-to-output delays across a logic gate increase as the voltage supplied to that gate is reduced, which can cause the gate to miss its timing specifications. There is also an increase in the interconnect delays associated with wires driven by underpowered gates. Furthermore, a gate’s input switching thresholds are modified when its supply is reduced, which causes that gate to become more susceptible to noise.

Voltage drop effects are becoming increasingly significant, because the resistivity of the power and ground tracks rises as a function of decreasing feature sizes (track widths). These effects can be minimized by increasing the width of power and ground tracks, but this consumes valuable real estate on the silicon, which typically causes routing congestion problems.

In order to solve these problems, the logic functions have to be spaced farther apart, which increases delays (and power consumption) due to longer signal tracks. Thus, implementing an optimal power network requires the balancing of many diverse factors.

Electromigration effects

Electromigration occurs when the current density (current per cross-sectional area) in tracks is too high. In the case of power and ground tracks, electromigration effects are DC-based. The so-called “electron wind” induced by the current flowing through a track causes metal ions in the track to migrate. This migration creates “voids” in the “upwind” direction, while metal ions can accumulate “downwind” to form features called “hillocks” and “whiskers.”

Electromigration in power and ground tracks causes timing problems, because the increased track resistance associated with a void can result in a corresponding voltage drop. This will, in turn, cause increased delays and noise susceptibility in affected logic gates as discussed above.

Power and ground electromigration can also cause major functional errors to occur, because the voids may eventually lead to open circuits while the hillocks and whiskers may cause short circuits to neighboring wires.
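Since electromigration risk is driven by current density (current per cross-sectional area), a simple sketch of the check a rail-analysis tool performs might look like this; the track dimensions and the density limit are hypothetical:

```python
def current_density(i_amps, width_m, thickness_m):
    # J = I / (W * t): current per cross-sectional area, in A/m^2
    return i_amps / (width_m * thickness_m)

# Hypothetical: 10 mA through a 0.5 um wide, 0.2 um thick power track.
j = current_density(10e-3, 0.5e-6, 0.2e-6)

J_LIMIT = 2e10  # hypothetical foundry electromigration limit, A/m^2

if j > J_LIMIT:
    # Widening the track (at fixed thickness) brings J back under the limit.
    min_width_m = 10e-3 / (J_LIMIT * 0.2e-6)
    print("widen track to at least", min_width_m, "m")
```

Widening is exactly the wire-widening remedy discussed later, with the real-estate cost noted above.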

Requirements for a true low-power design environment

RTL-to-GDSII

The majority of today’s design environments concentrate on analyzing and addressing power considerations towards the back end of the physical portion of the design process. This makes it almost impossible to fix any problems caused by poor decisions made during the early stages of the design.

A key requirement for a true low-power design environment is to provide early analysis of effects like voltage drop using whatever data is available at the time, and to then successively refine the analysis as more accurate data becomes available. This allows potential problems to be identified and resolved as soon as possible.

Creating optimal low-power designs involves making tradeoffs such as timing-versus-power and area-versus-power at different stages of the design flow. In order to enable designers to accurately and efficiently perform these tradeoffs, it is necessary for low-power optimization techniques to be integrated with, and applied throughout, the entire RTL-to-GDSII flow.

Power-aware design optimization techniques

There are a wide variety of power-aware design optimization techniques that can be brought into play. During the early (pre-synthesis) stages of the design, the RTL can be modified to employ architectural optimizations, such as replacing a single instantiation of a high-powered logic function with multiple instantiations of low-powered equivalents. The design may also be partitioned for implementation in multiple voltage (VDD) domains, and power-aware clock gating techniques can be automatically applied.

Following synthesis, power-aware mapping techniques may be used to optimize the netlist. These techniques include mapping highly active nodes into specific cells and mapping highly active input signals onto low capacitance input pins. When partitioning the design into multiple voltage (VDD) domains, appropriate level shifter elements need to be inserted into the netlist to connect logic elements across multiple domains. Furthermore, signals to and from domains that may be switched on and off require special attention so as to avoid any “floating net” problems.

A key element in the power-aware design process is to perform appropriate timing optimizations. This means that it is necessary to perform domain-based timing and power analysis throughout the flow. Furthermore, such analysis needs to account for delay increases caused by cell-specific voltage drops in the power rails. Potential optimizations include optimal sizing of gates using a gain-based synthesis flow and automatic selection of low and high thresholds when multi-Vt libraries are available.

Advanced techniques also enable optimization for power during floorplanning and placement. In order to correctly implement multiple voltage domains, it is necessary to separate the different power meshes for each domain. Power-aware cell placement based on weighting nets according to their activity can be used to minimize dynamic power consumption. The results from early voltage drop analysis can be used to determine better locations for any buffers that are to be inserted. And advanced clustering techniques can be applied to clock trees to reduce power consumption.

Appropriate on-chip decoupling capacitors should be added to minimize the inductive voltage drop effects caused by off-chip current variations over time. In order to lower the current-per-pad and bond-wire inductance, many pads are allocated for power and ground, thereby making the analysis of pad placement a non-trivial task. Flip-chip packaging technologies can be used to increase the number of pads connected to the power and ground supplies, thereby lowering the current-per-pad and also lowering the inductance.

The process of designing the power distribution network should be based on the results of early rail analysis performed when the power grids are still incomplete. Correct distribution of dissipating elements across the chip can avoid hot-spots and local voltage drop problems, and special wire-widening algorithms can be used to address voltage drop and electromigration issues.

Integrated tool suite

There are a number of very sophisticated power analysis tools available to designers. However, these tools are typically provided as third-party point-solutions that are not tightly integrated into the main design environment. These tools either require the use of multiple databases or they combine disparate data models into one database. This means that design environments based on these tools have to perform internal or external data translations and file transfers, making data management cumbersome, time-consuming, and prone to error.

Correlating results from different point-tools can be difficult, which means that problems may be discovered late in the design cycle or may never be detected at all. Perhaps the most significant problem with existing design environments, however, is that power, timing, and signal integrity effects are strongly interrelated in the nanometer domain, but conventional point-solution design tools do not have the capability to consider all of these effects and their interrelationships concurrently.

In order to fully account for the impact of voltage drop effects, for example, it is important to have an environment that can derate for timing — on a cell-by-cell basis — based on actual voltage drops. The timing analysis engine should then make use of this derated timing data to identify potential changes to the critical paths.

In turn, the optimization engine should make appropriate modifications to address potential setup or hold problems that appear as a result of the timing changes. This requires a design environment in which the power analysis, voltage drop analysis, derating calculations, timing analysis, and optimization engines all work seamlessly together. In the absence of such an integrated environment, one would have to transfer huge amounts of data (such as SDF files) between the different point tools and iterate between them in order to address the timing problems caused by voltage-drop-induced delays.

The lack of integration between power analysis tools and the rest of the environment can result in a tremendous amount of “false errors,” such as minor voltage drops in portions of the design that won’t affect the performance or functionality of the device. Engineers often overcompensate for these false errors and modify the power grid unnecessarily. In turn, this can cause these portions of the design to fail to meet their area or timing constraints and to become congested, and compensating for this can cause ripple effects throughout the rest of the design.

Even worse, the lack of integration between power analysis tools and the rest of the environment means that, when the results from the power analysis are used to locate and isolate timing and/or signal integrity problems, the act of fixing these problems may introduce new problems into the power network. This can result in numerous, time-consuming design iterations.

Ultimately, using point-solution power analysis tools can result in non-convergent solutions that prevent designs from meeting their time-to-market windows (or from being realized at all). Thus, a true low-power design environment should have all of the power analysis tools operating concurrently with the implementation tools, including synthesis, place-and-route, clock-tree, extraction, timing, and signal integrity analysis. Furthermore, all of the tools in the environment should operate on a common data model so as to provide them with concurrent access to analysis data and enable “on-the-fly” changes to the design.

Aspects of IC power dissipation

Sometimes, those involved in IC design can get a very narrow view of their particular specialty area. This article, while covering some basics, aims to provide a global overview for everyone on the team, with a focus on power use (power reduction will be covered in a forthcoming article). As MOS devices have shrunk, the world of chip manufacturing has become susceptible to quantum effects that can play havoc with power consumption.

Power consumption in a chip

The power consumption in a chip can be divided into three major categories: Dynamic Power, Short Circuit Dissipation, and Leakage Power Dissipation. Each of these categories and its components is discussed below in detail. Please note that unless otherwise mentioned, the descriptions below refer to NMOS only; similar explanations can be derived for PMOS as well. “MOS” is used to refer to MOSFET and CMOS in general.

1. Leakage Power Dissipation: This component of power dissipation is receiving the most attention these days. At quarter-micron and larger nodes, many of the leakage components either did not exist or did not dominate, so leakage contributed a negligible portion of the overall power consumption. However, as MOS devices have shrunk with advancing technology, quantum mechanical effects have come into the picture, giving rise to many of these leakage current components. Leakage is the component of energy dissipation that chiefly affects chip operation in standby, since the other components cease to play a role during that period. Therefore, to achieve a low-power target in a chip, one has to examine the various sources of leakage that might come into play. The major sources of leakage consumption are as follows:

1.1  Weak Inversion Current/Sub-threshold Current: The sub-threshold region of a MOS transistor is the region of operation where VGS is at or slightly below VT and VDS > 0 (in the context of an nMOS). In this region, the voltages are not sufficient to build a complete surface channel for the MOS to start conducting. However, some electrons may gain enough energy to cross from source to drain; this current is called sub-threshold current. Its approximate value can be understood from the equation below:

iSUB = α · (W/L) · Cox · (kT/q)² · exp(q(VGS − VT) / (n·k·T))

where,
iSUB = sub-threshold current
α = a process- and technology-dependent constant
T = temperature in Kelvin
Cox = capacitance due to the oxide
n = another process-dependent constant
VGS = gate-source voltage
k = Boltzmann constant
VT = threshold voltage
W = channel width
L = channel length
q = charge on an electron

As we can see from the above equation, the sub-threshold current increases as L decreases, increases exponentially as VT drops, and increases with temperature. As the CMOS process shrinks, L decreases and VT has to be decreased for better functioning of the MOS logic (higher-VT devices require more time to switch their state, which reduces the maximum operating speed of the device). Therefore, this current grows at smaller technology nodes and becomes substantial at deep-submicron geometries. When the circuit operates properly in the saturation or off region, this mode of operation does not come into play. However, during low-power operation, when the voltage is reduced, one may reach a point that satisfies the voltage conditions conducive to sub-threshold operation, and this component becomes substantial. One should also note that analog circuits make abundant use of this region of operation in order to exploit its high gain.
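Treating the symbols listed above as a standard weak-inversion model, the trends described (shorter L, lower VT, and higher T all increase iSUB) can be checked numerically; α, Cox, and n below are hypothetical process constants:

```python
import math

Q = 1.602e-19  # electron charge (C)
K = 1.381e-23  # Boltzmann constant (J/K)

def i_sub(vgs, vt, temp_k, w, l, alpha=1e-6, cox=0.01, n=1.4):
    """Weak-inversion current sketch built from the listed symbols;
    alpha, cox, and n are hypothetical process constants."""
    vth = K * temp_k / Q  # thermal voltage kT/q
    return alpha * (w / l) * cox * vth ** 2 * math.exp((vgs - vt) / (n * vth))

base = i_sub(vgs=0.0, vt=0.45, temp_k=300, w=1e-6, l=0.1e-6)

# The trends from the text: shorter channel, lower VT, and higher
# temperature all increase the sub-threshold current.
assert i_sub(0.0, 0.45, 300, 1e-6, 0.05e-6) > base  # smaller L
assert i_sub(0.0, 0.30, 300, 1e-6, 0.10e-6) > base  # lower VT
assert i_sub(0.0, 0.45, 350, 1e-6, 0.10e-6) > base  # hotter chip
```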

Figure 1: Various Leakage Currents

1.2  Junction Reverse Bias Current: Parasitic diodes are formed between the diffusion regions and the substrate boundaries. These parasitic diodes drift some minority-carrier current from the drain to the substrate. Also, some of the electron-hole pairs generated in the depletion region contribute to the current flow toward the substrate. This net current flow is known as junction reverse-bias current. It is directly related to the doping densities and tends to increase with doping.

Figure 2: Reverse Bias PN-Junction Current

1.3  Drain Induced Barrier Lowering (DIBL): As the drain voltage is increased, it influences the depletion region around the drain, where a local build-up of potential takes place. This widens the depletion region and raises the surface potential around the drain. In a long-channel MOS, where the source sits at a distance from the drain, the source region is not much influenced, so the potential between the source and channel does not change. However, as technology nodes shrink, the distance between the drain and source decreases. As a result, the source region also starts being affected by the drain voltage: the depletion width and the surface potential near the source side of the channel increase. The potential barrier is thereby lowered, and more electrons move from source to drain for a given gate potential. This is called Drain Induced Barrier Lowering. It increases the off-current because of the increased carrier availability.

1.4  Punch-through Current: Punch-through current is an extreme form of DIBL. When the drain voltage rises beyond a certain level, the depletion region extends deep into the well. As a result, the gate voltage loses control over the current through the MOS, and a large amount of current starts flowing through it. This current varies quadratically with VDS (the drain-source voltage). Punch-through is one of the factors that determine how the usable voltage range changes as MOS size and oxide thickness are reduced. As the MOS shrinks, the distance between the source and drain nodes decreases, so the same VDS now produces a greater electric field between them. This high field can induce punch-through current. Therefore, it becomes essential to decrease the supply voltage as the MOS size shrinks.

1.5  Gate Induced Drain Leakage (GIDL): Suppose the drain is connected to the supply and the gate is connected either to ground or to a negative supply. This creates an electric field in the drain region under the gate, which in turn creates a depletion region in the drain. The result is field crowding near the drain, and high-field effects such as avalanche multiplication and band-to-band tunneling start taking place. As a result, minority carriers are emitted in the drain underneath the gate. Because the substrate is at a lower potential, the minority carriers accumulated near the drain depletion region are swept into the substrate. This current is known as gate induced drain leakage current. It is strongly affected by the applied voltage and the gate-oxide thickness.

1.6  Gate Tunneling Currents: As the technology is scaled down to deep-submicron levels, the thickness of the oxide under the gate also shrinks; in present-day technologies it is in the range of 1-2 nm. A heavily doped channel and an ultrathin oxide layer give rise to a very high electric field in the oxide region, of the order of MV/cm. Because of this, current carriers can tunnel through the oxide, giving rise to a gate current. The larger the applied voltage, the greater the chance that carriers tunnel through the oxide layer. This current not only amounts to a leakage current from the gate terminal, but can also decrease the current flow through the drain, which can hamper the performance of the device. To counter this current, high-k gate dielectric materials are used, since they provide the required gate capacitance with a physically thicker, and therefore less leaky, insulating layer.

2. Short Circuit Power Dissipation: This is another component of the power consumed in the device. When there is a logical change at the input of a circuit, its output state may change. During this transition, some of the MOS transistors go from OFF to saturation while others follow the opposite path. Because the input takes a finite time to switch between the two logical states, there is a small period during the transition when both the NMOS and PMOS are conducting and neither is OFF. The current that flows through them during this time is known as short-circuit current. It does not charge any internal capacitance (junction, interconnect, or diffusion capacitances), and therefore represents a pure loss of power.

Consider a low-to-high transition of input A. The nMOS starts conducting when the input level reaches VTn. At this point the pMOS is still ON, and it remains conducting until the input reaches (VDD − |VTp|), where VTn and VTp are the threshold voltages of the nMOS and pMOS respectively. When either the nMOS or the pMOS goes into cut-off, conduction stops and the short-circuit current path breaks. A similar sequence occurs for a falling input transition, where the pMOS switches on while the nMOS is still conducting. This current becomes considerable if the rise and fall times of the inputs are long or if the load capacitances are low. To counter such losses, the input rise and fall times are reduced and the capacitance at the output is increased.
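The conduction window described above, from VTn to (VDD − |VTp|), can be sketched directly; the threshold voltages and ramp times below are hypothetical:

```python
def conduction_window(vdd, vtn, vtp_mag):
    """Input-voltage range over which both nMOS and pMOS conduct during
    a rising input ramp: from VTn up to (VDD - |VTp|)."""
    return max(0.0, (vdd - vtp_mag) - vtn)

# Hypothetical 1.2 V process with |VT| = 0.4 V for both devices.
window = conduction_window(1.2, 0.4, 0.4)
print(window)  # 0.4 V of the input swing has both devices on

# A slower ramp spends proportionally more time inside this window, so
# the short-circuit charge grows with the input rise time.
for rise_time_s in (0.1e-9, 1.0e-9):
    time_both_on_s = rise_time_s * window / 1.2
    print(rise_time_s, time_both_on_s)
```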

Figure 3: Short Circuit current in a CMOS inverter.

3. Dynamic Energy Consumption: Dynamic energy consumption is the consumption due to the toggling of cells in response to toggling inputs; for this reason it is also known as switching energy. When a cell changes state from logical high to logical low or vice versa, the various internal capacitances (junction, interconnect, and diffusion capacitances) charge or discharge accordingly. The energy drawn from the supply to charge these capacitances is known as dynamic power. This used to be the dominant form of consumption in technologies down to the quarter-micron node (250 nm), where leakage current was insignificant. With the shrinking of technology, however, the relative share of the functional (switching) current has fallen while the leakage component has increased many-fold. Nevertheless, every effort is made to minimize the switching power consumption in order to reduce the overall energy consumption of the application.

Figure 4: Switching current flow in a CMOS circuit.

If all the parasitic capacitances in a CMOS cell are lumped into a load capacitance C, then when the output charges from ground to VDD, a total energy of C·VDD² is drawn from the supply. Half of this energy is stored in the load capacitor C and the other half is dissipated in the pull-up network. When the output discharges back to ground, the stored half is dissipated in the pull-down network. This switching energy is therefore directly related to VDD and to the switching frequency, so reducing the supply voltage is one way of reducing dynamic consumption. However, reducing VDD makes cells slower, effectively reducing the maximum operating frequency, and a reduction in frequency makes the same operation take more time. The average switching power consumption is:

Pav = f·C·VDD²

where, f is the frequency of operation. This power consumption is totally independent of the rise and fall time of the input and output signals.
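The C·VDD² bookkeeping described above can be sketched as follows, with hypothetical values for C, VDD, and f:

```python
def switching_energy(c_f, vdd):
    drawn = c_f * vdd ** 2         # pulled from the supply per 0->1 output
    stored = 0.5 * c_f * vdd ** 2  # held on the load capacitor
    dissipated = drawn - stored    # burned in the pull-up network
    return drawn, stored, dissipated

C, VDD, F = 10e-15, 1.2, 1e9  # 10 fF load, 1.2 V, 1 GHz toggle rate
drawn, stored, dissipated = switching_energy(C, VDD)

# Pav = f * C * VDD^2: each charging event draws C*VDD^2 from the supply.
p_av = F * C * VDD ** 2
print(drawn, stored, p_av)
```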

The other component of switching energy consumption is loss due to dynamic hazards and glitches. Glitches may arise in a circuit due to unbalanced delays in the paths of various inputs coming in or in the path internal to the circuit. Consider the circuit as shown below.

Figure 5 : Glitch generation, circuit and timing diagram.

Consider the case where two of the inputs are at logical one (shown as VDD), and signals A and B transition with some delay, as shown in the adjacent timing diagram. Due to the unbalanced delays between the arrivals of A and B, the output signal Z is asserted to 1 for a short duration. Such transitions are known as glitches or hazards. Had A dropped before the assertion of B, there would have been no glitch at the output, because one input of the output AND gate would have toggled to zero before the other input was asserted. Therefore, timing is managed so that such glitches are either removed or minimized. In some cases, however, this behavior may be intentional, to prevent race conditions in a circuit; for this purpose, not all of the inputs are toggled at the same time. Where such glitches cannot be removed altogether, logic may be placed at the output to absorb them and arrest their propagation to the following logic, e.g. adding buffers in the path to absorb the glitches and also balance the timing of the path.
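The glitch scenario can be sketched as a tiny discrete-time simulation of the output AND gate; the waveforms below are hypothetical:

```python
def glitch_steps(a_wave, b_wave):
    """Time steps where Z = A AND B is 1, i.e. where a late-falling A
    overlaps an early-rising B."""
    return [t for t, (a, b) in enumerate(zip(a_wave, b_wave)) if a and b]

# A falls late while B rises early: a brief overlap produces a glitch on Z.
a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [0, 0, 1, 1, 1, 1, 0, 0]
print(glitch_steps(a, b))   # [2, 3]

# Had A dropped before B asserted, there would be no overlap and no glitch.
a_early = [1, 1, 0, 0, 0, 0, 0, 0]
print(glitch_steps(a_early, b))  # []
```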

Looking at the various sources of power dissipation and their causes, attaining a low power design is becoming more challenging as we progress with MOS scaling. As we move across the technology nodes, new actors start playing critical roles in this arena, giving a new twist to this story of low power. Our next article will talk about various steps and methods employed to attain a low power design. Not all those methods might be employable for a particular design; however, one may have to walk a tightrope to balance power vs. performance.

SoC PDN challenges and solutions

The Power Delivery Network (PDN) is one of the most important components in an SoC, as it supplies power to all the components in the design. With the increasing complexity of designs, the partitioning approach is gaining popularity, and power gating helps reduce rising consumption. These approaches make designs more efficient, but they introduce additional issues and challenges with respect to the design of the PDN.

In this paper we will introduce the usual flow taken in the design of the power grid. This will be followed by the challenges and issues faced while designing the PDN in multi-partition and low-power designs.

Power grid design

The metals used in the power grid depend mostly on the power requirement of the design and the metal options available in the technology node. Using more metal layers costs more, but it yields a more robust design than one built with fewer layers. Metal usage in the power grid (width, spacing, and metal stack) is defined by the power requirement: the higher the power requirement, the wider the metal stripes the grid must use.

Metal width should be chosen such that no routing track is wasted. Sometimes DRC rules also play a role in deciding the power-grid metal width. Let's look at the sample DRC spacing table given below.

The DRC spacing rule depends on the metal width and also on the parallel run length of the metal. The spacing table below shows how the required spacing varies with the metal width used. If we take an M4 power stripe of width w2 µm, then the spacing from the next M4 signal route of width w1 µm (the minimum metal width of M4) must be s3. The M4 width w2 is chosen to take care of the wide-metal rule so that we do not waste the nearby routing track. We assume the routing grid is x µm for a particular technology node; this may vary from technology to technology.
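To make the table concrete, here is a minimal lookup sketch; the width thresholds and spacing values are invented placeholders standing in for w1/w2 and s1–s3, not real foundry numbers.

```python
# Width-dependent spacing lookup mirroring the kind of DRC table discussed
# above. All numbers are illustrative placeholders, NOT real foundry rules.
DRC_SPACING_TABLE = [
    # (minimum stripe width in um, required spacing in um)
    (0.00, 0.10),  # narrow ("w1"-class) metal -> s1
    (0.50, 0.20),  # first wide-metal threshold -> s2
    (1.50, 0.40),  # "w2"-class wide metal -> s3
]

def required_spacing(width_um):
    """Return the minimum spacing a stripe of this width must keep from
    neighboring same-layer metal under the illustrative wide-metal rules."""
    spacing = DRC_SPACING_TABLE[0][1]
    for min_width, rule_spacing in DRC_SPACING_TABLE:
        if width_um >= min_width:
            spacing = rule_spacing  # widest threshold met so far wins
    return spacing
```

Checking a candidate stripe width against such a table before building the grid is what prevents the wide-metal rule from silently eating the adjacent routing track.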

 

Challenges of Power Grid Design and Analog Integration

1. Power Gating in Partitioned Designs

Partitioning breaks the design into smaller hierarchies that can be handled more effectively on their own, and power gating some of these modules leads to a significant reduction in the total power of the design. However, power gating a module breaks the continuity of the power grid, and thus one of the biggest disadvantages of power-gated partitions is the IR drop seen by the core grid. This discussion applies specifically to wire-bond based packages (QFP, QFN, BGA, etc.).

One way to minimize the drop is to make the grid stronger. This must be done judiciously, since strengthening the grid has its own limitations: congestion and a significant increase in die area.

The second approach can be by providing feed-through paths for core power over the power gated module so that the core grid remains continuous.

FIG 1: Traditional power grid in power gated partitioned design

FIG 2: Introducing feed-through paths over partitions to maintain core grid continuity

 

 

2. Grid Alignment between partition and Top (NON power gated partitioning)

Alignment of the core grid between the partitions and the top can be a tricky and iterative business, especially when a large number of blocks are involved. It is always advisable to use a common set-to-set distance, width, and spacing for the metal stripes in all the blocks and at the top. If this is followed, alignment can eventually be met at the cost of moving each block a few microns while placing it at the chip top.

Often there is metal sharing between chip top and block level. Alignment in these cases can be achieved by replicating the grid inside blocks in block LEF so that the grid at top can easily connect to these block pins with a movement of a few microns here and there. It is always advisable to follow a clean approach in this respect right from the beginning.

Fig 3: Block power pins reflected on top.

Fig 4: Top power grid getting connected to Block power pins

3. Power Gated Design

Low-power modes are a feature introduced to lower the total power dissipation of a chip by switching off the power supply to portions of the logic when they are not in use. Power switches are generally used to gate the supply to power-gated modules. Even though power switches add extra complexity to integration, including a hit on die area, they remain the preferred way to gate power while preserving design performance.

One of the issues related to power switches is their placement with respect to the power sources and the power-gated module. We discuss a few examples of the use of power switches in a design below.

CASE 1: When power gating is done by controlling a power switch inside the die. The source on the board is always ON.

Fig 5: Use of power switch where source ballast is always ON

In Fig 5 above we show how the switchable P1 domain is power gated using a power switch at point A. The voltage at point A therefore controls the voltage drop throughout the chip, so switch A needs to be placed as close to the power source as possible. The entire grid needs to be established from point A, keeping the drop between the pad and point A as close to negligible as possible. This approach is, however, prone to voltage drop across the switches themselves, so appropriate switches need to be chosen based on the current requirement, the resistance offered by the switches, and their size.
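The placement argument can be made concrete with a back-of-the-envelope Ohm's-law estimate; the function and its resistance parameters are illustrative assumptions, not tool output or real silicon data.

```python
def domain_voltage(vdd, i_load, r_grid, r_switch, n_switches):
    """Ohm's-law estimate of the voltage seen by the power-gated domain:
    the load current flows through the pad-to-switch grid resistance and
    then through n identical power switches in parallel. A rough sketch
    only; sign-off uses a full IR-drop analysis."""
    r_total = r_grid + r_switch / n_switches  # parallel switches share current
    return vdd - i_load * r_total
```

The sketch shows both levers named above: shrinking the pad-to-switch resistance (place the switch close to the source) and adding parallel switches (size the switch network for the current requirement) each reduce the drop.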

 

CASE 2: When power gating is done by switching off power at the board level.

Fig 6: Use of power switch where ballast is used for power gating

In the figure above, the ballast on the board is switched off to power gate power domain P1. The low-power domain P2 (which is always ON) is connected to a power switch very close to the power pad to create a parallel power grid.

 

 

CASE 3: When only the internal ballast is in use

Fig 7: Use of power switch for internal ballast only

 

A switch is placed as close as possible to the internal regulator to power gate the switchable domain P1.

 

4. Memory Orientation

This is an extremely rare scenario, in which memory orientation affects IR drop and chip functionality, but it can be extremely critical if not taken care of.

Often we have different orientations for memories in an NPI, with memory power pins running both horizontally and vertically depending on the allowable orientations of a memory in a given technology.

While dropping vias from the top metal to the memory pins, the addStripe command drops vias only onto memory pins orthogonal to the stripe. So when the top metal grid and a memory pin run in the same direction, no vias are added to that pin. This can be extremely hazardous if not caught.

Fig 8: Memory power pin is orthogonal to power grid

Fig 9: Memory power pin in same direction as power grid
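A simple pre-route sanity check along these lines can be sketched as follows; the (name, direction) pin list is a hypothetical data structure, not a real tool query.

```python
def flag_unviaed_pins(top_grid_direction, memory_pins):
    """Flag memory power pins that run in the SAME direction as the
    top-metal power grid -- the case where a via-dropping step that only
    connects orthogonal geometry would silently skip them."""
    return [name for name, direction in memory_pins
            if direction == top_grid_direction]
```

Running such a check on every memory instance before power-grid signoff catches the "same direction, no vias" hazard early instead of at IR-drop analysis.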

Reducing IC power consumption: Low-power design techniques

Designers always look for ways to reduce unwanted components of power consumption, either by architecting the design in a fashion that includes low-power techniques, or by adopting a process which can reduce the consumption. However, some of these solutions come at the expense of performance, reliability, chip area, or several of these, so eventually one has to reach a compromise between power, performance, and cost. This article discusses some of those techniques, divided into Architectural Techniques and Process Based Techniques.

  1. Architectural Power Reduction Techniques: At the RTL level, one can take several steps to reduce the overall power consumption of the device. Typically, RTL-based techniques minimize the dynamic power consumption of the device; however, using techniques like power gating, one can also reduce the leakage power of a part of the chip. Various popularly employed techniques are:

 

1.1. Clock Gating: This is a very popular dynamic-power reduction technique. Dynamic power is the sum of transient power consumption (Ptransient) and capacitive-load power consumption (Pcap). Ptransient represents the power consumed when the device changes logic state, i.e. from “0” to “1” or vice versa. Capacitive-load power consumption, as its name suggests, represents the power used to charge the load capacitance. The total dynamic power is as follows:

Pdynamic = Pcap + Ptransient = (CL + C) × Vdd² × f × N

where CL is the load capacitance, C is the internal capacitance of the chip, f is the frequency of operation, and N is the number of bits that are switching. As dynamic power consumption is directly linked to the toggling of the MOS cells, gating the clock when it is not required helps reduce the dynamic current. This technique preserves the state of the design while limiting the transient currents. Designers frequently use AND/NOR gates to gate a clock; however, latch-based clock gating is the most favored technique, as it also saves the design from hazards which can otherwise introduce additional dynamic power consumption.
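As a quick numeric sanity check of the dynamic power expression above, a minimal sketch (all values illustrative, chosen here only for the example):

```python
def dynamic_power(c_load, c_internal, vdd, freq_hz, n_switching):
    """P_dynamic = (C_L + C) * Vdd^2 * f * N, per the expression above.
    Consistent units are the caller's responsibility (F, V, Hz -> W)."""
    return (c_load + c_internal) * vdd ** 2 * freq_hz * n_switching

# Gating the clock to an idle block drives its switching count N toward
# zero, which is why clock gating cuts dynamic power proportionally.
```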

1.2. Variable Frequency/Frequency Islands: In a big chip, not all blocks need to be clocked at the highest possible frequency to achieve the desired level of performance. A few blocks inherently work slowly (e.g., slow communication blocks like I2C, UART, etc.) and can therefore be clocked with a slower clock than blocks like the core/processor, which require a high-frequency clock for maximum throughput. By providing different clock frequencies to different blocks, one can reduce localized dynamic consumption.

Figure 1: Frequency Islands

1.3. Power Gating: There can be applications where certain blocks of the chip are not required to function in some of the low-power modes (sleep, deep-sleep, standby, etc.) and only a part of the device needs to operate. In such cases, it makes sense to power off the non-functional blocks so that the device does not have to power unused logic. This not only reduces the dynamic consumption but also saves the leakage power of the power-gated block. However, when using this technique, the designer has to make sure that signals coming from power-gated blocks do not affect the functioning blocks while operating in low power. For this purpose, isolation cells are placed in the path so that functional corruption does not take place, as can be seen in figure 2. Please note that isolation is not required on signals going out of the always-ON domain to other power domains, as those signals never go non-deterministic.

Figure 2: Power Gating

 

  2. Process Based Power Reduction Techniques: There are many components of power consumption, and not all can be targeted using architectural techniques alone. Power consumption due to effects like Drain Induced Barrier Lowering, Gate Induced Drain Leakage, sub-threshold leakage, etc. can be controlled most effectively using process-based techniques. Below are some of the most commonly employed process-based techniques:

2.1 Multi Threshold Voltage CMOS Cells: Many MOS characteristics are governed by the threshold voltage of the cell. Sub-threshold current is the current between source and drain when the gate voltage is below the threshold voltage. A mathematical expression for the approximate value of this current is:
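The expression itself is not reproduced above; a commonly cited approximation for the sub-threshold current, restated here for reference, is:

```latex
I_{\mathrm{sub}} \;\approx\; I_0 \, e^{\frac{V_{GS}-V_T}{n\,V_{\mathrm{th}}}} \left(1 - e^{-\frac{V_{DS}}{V_{\mathrm{th}}}}\right), \qquad V_{\mathrm{th}} = \frac{kT}{q}
```

The exponential dependence on (VGS − VT) is what makes this current fall rapidly as VT is raised.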

As one can see, this current reduces when the threshold voltage VT is increased. Therefore, higher-VT cells can be placed to decrease this component. However, as the propagation delay expression shows, increasing VT has a negative impact on the frequency of operation. Designers therefore have to adopt a strategy of mixing lower-VT and higher-VT cells so as to reduce the leakage current while maintaining the desired frequency of operation. In one implementation of this strategy, high-VT cells are used as sleep transistors which gate the supply to the low-VT design downstream when the block is supposed to be in standby mode. When the device is in active mode, these sleep transistors are turned ON, and the low-VT blocks downstream get power and work as usual. This reduces the current in standby mode. Alternatively, the various data paths are categorized as timing-critical versus non-timing-critical. Timing-critical paths can be implemented with lower-VT cells (known as LVT cells) so that the same operation is achieved in less time than in a path implemented with high-VT cells (abbreviated HVT cells). This mixed-usage approach balances the leakage current even when the chip is in run mode.

Another solution is to dynamically change the VT of the cells as the application requires. This can be achieved by varying the well/body biasing voltage using a control circuit, and it requires a more complex MOS fabrication flow (a twin-well or triple-well technique). This is commonly known as Variable Threshold CMOS (VTCMOS). However, one should note that lowering VT also compromises the reliability of the chip, as smaller voltage swings can then cause the logic to start functioning incorrectly. These voltage swings can arise from various process or environmental variations. Therefore, one has to be very cautious while decreasing the VT of the cells so as not to compromise the sanctity of the final application.
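The body-bias dependence that VTCMOS exploits follows the standard body-effect relation, restated here for reference:

```latex
V_T \;=\; V_{T0} \;+\; \gamma\left(\sqrt{\left|2\phi_F\right| + V_{SB}} \;-\; \sqrt{\left|2\phi_F\right|}\right)
```

where VT0 is the zero-bias threshold voltage, γ the body-effect coefficient, φF the Fermi potential, and VSB the source-to-body bias applied by the control circuit; raising VSB raises VT and cuts leakage.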

2.2 Multi VDD Technique: As we can see from the dynamic power equation above, there is a quadratic relationship between the supply voltage VDD and dynamic power consumption. Therefore, one can reduce the dynamic power substantially by reducing the supply voltage. However, voltage reduction has its downside as well. The propagation delay of a cell is as below:
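A commonly used form for this delay is the alpha-power law, restated here for reference since the exact expression the author used is not shown:

```latex
t_{pd} \;\propto\; \frac{C_L \, V_{DD}}{\left(V_{DD} - V_T\right)^{\alpha}}, \qquad 1 < \alpha \le 2
```

Lowering VDD shrinks the denominator faster than the numerator, which is why the delay grows as the supply is reduced.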

As one can see from the equation above, a reduction in VDD increases the delay of the cell. As a result, the operating frequency reduces when one reduces the supply voltage. Therefore, one has to maintain a balance between the supply voltage and the associated performance.

Figure 3: Voltage Islands – Multi VDD Operation

 

A solution to this challenge is to create voltage islands in the design, where low-performance slow peripherals are powered from a lower supply voltage and performance-critical blocks from a higher voltage. However, the designer has to make sure that appropriate voltage level shifters are placed on the signals that cross voltage domains.

This technique also reduces Gate Induced Drain Leakage effect and associated power consumption in a device.

 

2.3 Dynamic Voltage and Frequency Scaling: The voltage-island technique, also known as static voltage scaling, imposes constraints on operating the device: it is not adaptive to the application's needs, and the voltage supplied to a block cannot be changed once the chip is designed. Dynamic voltage scaling frees the designer and the customer from these limitations. It makes use of a regulator which can be programmed to deliver voltage levels as required, so various blocks can get a configurable voltage and the customer/user can change the voltage settings per the application. This can save power dynamically. Solutions also exist where the design relieves the software of voltage-scaling decisions: the design itself senses the current-load requirement in the device and makes the voltage adjustments accordingly. This makes the power reduction more adaptive.

The same voltage scaling can also be clubbed with dynamic frequency scaling where the frequency of a block can be changed by the software as needed. Therefore, a block running on lower VDD can be clocked by a slower clock while maintaining the performance and functional requirements. This technique helps reduce dynamic as well as leakage power consumption in the device.

Figure 4: Differential Voltage and Frequency Scaling
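The frequency/voltage pairing described above can be sketched as a simple operating-point selection; the table values are illustrative assumptions, not characterized silicon data.

```python
# Hypothetical characterized (frequency in MHz, voltage in V) operating
# points, sorted by frequency ascending; values are illustrative only.
OPERATING_POINTS = [
    (100, 0.80),
    (200, 0.90),
    (400, 1.00),
    (800, 1.20),
]

def pick_operating_point(required_mhz):
    """Pick the slowest point that still meets the requested throughput;
    running at the lowest sufficient frequency lets the regulator supply
    the lowest sufficient voltage, where the quadratic Vdd savings are."""
    for freq, volt in OPERATING_POINTS:
        if freq >= required_mhz:
            return freq, volt
    return OPERATING_POINTS[-1]  # demand exceeds table: cap at maximum
```

A DVFS governor, whether in software or sensing hardware, is essentially this selection run continuously against the measured load.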

2.4 Fully Depleted Silicon on Insulator (FDSOI): This is another technique which helps reduce the various components of leakage current that become more of a menace at lower technology nodes. Leakage components like GIDL, reverse-bias current, and gate tunneling currents can be controlled very effectively using this technique. In this technique, the MOS sits on an ultra-thin film of oxide which insulates the cell from the rest of the body. On top of this oxide film, a very thin layer of silicon is deposited, which acts as the channel. Because this layer is so thin, the channel can be established in it without any additional doping; for this reason, it is known as fully depleted SOI.

Figure 5: FDSOI Cell (Left) and Various Leakage Currents in  CMOS Cell (Right)

In another technique, a small neutral region is left in the depletion region under the gate; here the channel need not be as thin as FDSOI requires. This is known as partially depleted SOI (PDSOI). However, PDSOI tends to have a higher VT (and therefore slower operation) and larger gate effects than FDSOI (hence larger leakage currents). Therefore, owing to better control over VT and drastically reduced leakage currents, FDSOI is the preferred choice at small process nodes (usually below 90 nm).

This article discussed various solutions which can be adopted to achieve target current requirements. However, all of them have pros and cons, so designers often mix several solutions to reach an optimum that meets not only the current requirements but also the performance and cost targets that make the product sellable.

Reducing Congestion with IC Compiler

Congestion is a problem you face when designing chips: more routing resources are needed than are actually available. Congestion can occur locally, in a portion of a block, or globally, across the whole block, and designs are often congested at different locations with different severity. This document describes how to solve congestion issues.

Divide-and-Conquer Approach

The divide-and-conquer approach resolves congestion step by step rather than attempting to solve all congestion problems at once. This approach can appear more time-consuming, but it often takes less time overall. You might need to solve a few problems at the same time, for example a high-utilization issue, a datapath-structure issue, or a port-location issue. Some problems are interdependent, so solving one might improve some results while degrading others. Save your results after each successful experimental stage.

You should not optimize (using place_opt or psynopt) and explore the congestion issues of a design at the same time. With large designs, it can be useful to segment out the problematic hierarchy (by using the grouping commands in Design Compiler or Design Compiler topographical mode), generate the expected floorplan (by using minimum physical constraint options), and then work on it stand-alone until you get the best results.

Avoid Very High Utilization

The first step of design optimization in IC Compiler is done with a thin netlist, that is, a netlist that does not yet contain elements added later in the flow, such as clock tree buffers and hold-fixing cells (buffers). Area must be preserved so that these elements can be added later. The amount of area to reserve depends on the design, but it is around 10 percent; timing issues and wire distances can raise the amount needed, and in difficult signal-integrity cases 20 percent should be sufficient. It is therefore recommended that you avoid very high utilization. If you have a design with very high utilization, take the following actions to eliminate potential problems resulting from routing congestion and/or insufficient placement area:

1. Check the design constraints (both timing and design rule checking). An optimized design that contains many large cells and buffers could indicate bad constraints. Commands that might help are:

check_timing

report_timing_requirements -ignored

2. Perform netlist reduction by using either Design Compiler or IC Compiler.

Design Compiler: compile -area_effort high [-inc]

IC Compiler: psynopt -area_recovery -area_effort high [-only_area_recovery]

IC Compiler: place_opt -area_recovery

Setting physopt_ultra_high_area_effort to true adversely impacts the design area when used with place_opt -area_recovery -effort medium or place_opt -area_recovery -effort high.

By reducing the number of cells in the design, or even specifying a few subblocks in which to reduce the cell count, you use fewer routing resources. To avoid repeating this step at each stage in the flow (which might lengthen runtime), save the resulting reduced netlist under a new name. The saved netlist becomes your starting point.
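The utilization headroom discussed above (around 10 percent, up to about 20 percent for difficult signal-integrity designs) can be sanity-checked with a small sketch; the function names and the default margin here are mine, not tool settings.

```python
def utilization(cell_area, core_area):
    """Placement utilization as a fraction of the available core area."""
    return cell_area / core_area

def has_headroom(cell_area, core_area, margin=0.10):
    """Check that the thin-netlist utilization leaves enough headroom for
    clock tree buffers, hold-fixing cells, and other late-flow additions."""
    return utilization(cell_area, core_area) <= 1.0 - margin
```

If the check fails, either recover area (as in the netlist-reduction step above) or grow the floorplan before continuing.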

Congestion Hot Spots and Block Timing

Some block inspections reveal timing issues that force cell placement. A subblock can have very tight timing and design rule checking (DRC) constraints, especially on a path with many logic levels between flip-flops or latches. In many cases, the tight timing is caused by timing and DRC constraints propagated from the top level, so the top-level constraints should be checked carefully. If you find unjustified constraints, apply a different DRC or timing constraint to the block. You can also try other techniques, such as using case analysis, setting false-path or multicycle-path constraints, adding input delay relative to the associated clock, or constraining the ports.

Automatic Congestion Handling

If you want IC Compiler to resolve congestion automatically, follow the steps below.

Set the Congestion Options

For best results, you should provide realistic numbers for the routing availability of the metal layers. For example, metal1 is mostly used for cell building and power and often has limited availability for routing.

To specify the routing availability for a layer, use the following command:

set_congestion_options -layer <layer> -availability <percentage> \

-coordinate [get_placement_area]

These settings affect congestion optimization and reporting.

Check the Results

Some locations in the design are expected to be congested if they are not well planned for channel and port areas. The treatment for these areas is described in the “Floorplan-Driven Problems” section that follows.

View the ASCII congestion report generated by the report_congestion command. You can also view congestion information in the GUI.

Use the -congestion Option

IC Compiler provides powerful algorithms to resolve congestion. Unlike the default placer behavior, which minimizes wire length or optimizes path locations to meet timing, the goal of these algorithms is to reduce congestion. They are invoked by using the -congestion option with the psynopt, place_opt, create_placement, or refine_placement command during the placement stage.

Two primary concerns when using the congestion-removal algorithms are:

1. Runtime is increased, so congestion-removal algorithms should be invoked only when needed.

2. Congestion-removal algorithms can result in a design that is less optimized for timing.

When using the create_placement and refine_placement commands, you can control the congestion effort. The -congestion_effort option determines how much effort IC Compiler uses to resolve congestion. A medium effort is the default, as a tradeoff between quality and runtime. For more difficult designs, setting the