

# A NOVEL APPROACH TO HARDWARE CONTROLLED POWER AWARE VERIFICATION WITH OPTIMISED POWER CONSUMPTION TECHNIQUES AT SOC

Eldin Ben Jacob, Staff Engineer, Samsung Semiconductors India Research (SSIR), Bangalore, India (eldin.jacob@samsung.com)

Harshal Kothari, Staff Engineer, Samsung Semiconductors India Research, Bangalore, India (harshal.k1@samsung.com)

Sriram Kazhiyur Soundarrajan, Associate Director, Samsung Semiconductors India Research, Bangalore, India (sriram.k.s@samsung.com)

Somasunder Kattepura Sreenath, Director, Samsung Semiconductors India Research, Bangalore, India (soma.ks@samsung.com)

Abstract: The complexity of System-on-Chip (SOC) designs is increasing exponentially with advancements in artificial intelligence, machine learning and IoT technologies. In keeping with the latest trends in the ASIC domain, advanced low-power design is a major requirement in high performance SOCs. The increase in power rails and power ports, combined with the requirement for faster power up and power down, has resulted in the adoption of more hardware controls relative to software controls. The increase in hardware control calls for identification, verification and analysis of all functional, use case and corner case scenarios to ascertain the optimal power settings when each IP is functional. Power related verification attributes at subsystem level drive power estimation at standalone block level and SOC level for all practical product applications. This paper details ways to mitigate the costly loss of performance and power, the early development of real use cases, and how power transition tests across multiple blocks led to better performance and power saving. The power estimation techniques used achieved an accuracy of almost +/-0.5%. The fine tuning techniques deployed based on the new verification approach saved up to 30% power consumption on silicon. The fully configured and powered on module consumed ~41mW against the planned 50mW (18% less), idle power with the power rail on dropped from 8.44mW to 5.86mW (30.6% less), and the power gated block with minimal functionality consumed only 0.58mW against the planned 0.65mW (11% less).

Keywords: Power estimation from front-end test, Power Aware, Low Power simulation, Clock and Power gating, System On Chip

## I. INTRODUCTION

A systematic approach was adopted to achieve an efficient and smooth verification flow for low-power simulations of large SOCs. The power aware verification flow (Figure 1) consists of the following steps:

- 1. Analysis of power/voltage/clock domains
- 2. Defining verification scope
- 3. One-time compile: loading the power information of the design
- 4. PMIC controller: mimicking its behavior in the test bench
- 5. Power operations: isolation, level shifters, retention
- 6. Checkers to reinforce DV

Our SOC consisted of 75 power domains, 79 clock domains and 37 voltage domains, resulting in a number of power domain switching combinations that is too large to handle and qualify exhaustively. Hardware description languages cannot express the power intent of a design, which led to the use of the IEEE 1801 standard Unified Power Format (UPF) at RTL level, followed by UPF based synthesis. The power intent of the SOC is defined using UPF 2.0/3.0 and liberty files for standard library cells and hard macros. The UPF consists of low power constructs such as power domain descriptions, voltage supply ports/nets, isolation cell logic, retention rules and level shifters, which are used to define the power state of a block [2]. A liberty file is an ASCII file that provides a standard method for defining library cells, including timing, noise, power, and related input/output characteristics, for a design independent of the IEEE 1801 file [2]. The simulator can then apply low power semantics to the instances of these cell models at simulation time. The SOC consisted of 13 power manager IPs which deployed various low power techniques:

Variable voltage level for logic/memories/hard macros - Operating from 0.6V to 3.3V.

Switchable power supplies - Power gating using standard library power switch cells.

Clock shut off - Clock gating when the power domain (PD) is idle.

Isolation cells - Avoid leakage to nearby PDs.

Level shifters - Sampling signals that cross voltage levels.

Retention - Data saved during power shut off and restored during power on.



Figure 1. PA Verification Flow





Figure 2. Testbench structure [1]

The testbench was developed by compiling the power information for the design: UPF files containing the design power information and liberty files containing the power information for library cells were loaded. The elaboration tool requires the appropriate IEEE power defines and tool dependent power defines. Discrepancies in the way tools from different EDA vendors interpret the IEEE standard resulted in unexpected issues that consumed significant manual effort during environment bring up. To avoid this in future, a utility to resolve the discrepancies has been developed. The testbench was architected to mimic the behavior of the PMIC (Figure 2). SOC power up/down was achieved by simulating the power balls at the DUT top using a SystemVerilog model. The power ports are driven with a combination of \$supply\_on/off system tasks and CPU firmware masters, which control the power domains of the IPs through the power manager IP based on the use case. To reinforce thorough verification, additional checkers were developed for isolation strategies, level shifter strategies, retention strategies, power switches of the power domains and the power state table. This ensured that the power intent desired by the power architecture was intact and that the DUT exercised the right power intent.

## II. SOC LEVEL POWER REDUCTION TECHNIQUES

Active power is directly proportional to IP activity and can be optimized by analyzing that activity. Grouping such IPs into design blocks, along with techniques like power scaling, clock gating and power gating, can effectively bring down the active power consumption.

Typically, the transition latency increases as the aggressiveness of the power saving technique increases. The transition latency directly affects the duration required to recover from the power down state. Power shut off saves the most power, but it comes at the expense of higher latency. Based on the requirements for power saving and the IP functional requirements, a three step approach was implemented to reach the lowest power state.

Low Power Entry - The first step involves gating the IP based on idle time, which restricts the number of toggles at the IP boundary and significantly reduces power consumption. The IPs in that power domain also quiesce and perform a low power handshake with the NOC and the power controller via the Q-channel or C-channel interface.

Clock Shut Off - A two step approach was identified to gate the clocks efficiently. The low power entry triggers the clock shut off FSM, which cuts off all the clocks at the IP boundary. Assertion based checkers were developed to monitor all free running clocks across the SOC based on the clock tree architecture diagram. Analysis of the checker data identified more than 100 free running clocks, which were then brought under the clock shut off logic, resulting in a substantial power saving.

Power Gating - The third step in our approach involves turning off the virtual power rail to the power island using the power switch. During power gating, the signals are driven to X for simulation purposes and the outputs are isolated to 0/1.
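The free running clock check described under Clock Shut Off can be illustrated with a minimal Python sketch. The checkers in this work were assertion based; the model below only shows the underlying idea, and the clock names, the sampled-value format and the function name are hypothetical.

```python
from typing import Dict, List


def find_free_running_clocks(samples: Dict[str, List[int]], shutoff_index: int) -> List[str]:
    """Return the clocks that still toggle after clock shut off should be in effect.

    samples maps a clock name to its sampled 0/1 values over time;
    shutoff_index is the first sample at which the clock is expected to be gated.
    """
    free_running = []
    for name, values in samples.items():
        window = values[shutoff_index:]
        # A gated clock holds one level, so any level change in the window is a toggle.
        toggles = sum(1 for a, b in zip(window, window[1:]) if a != b)
        if toggles > 0:
            free_running.append(name)
    return sorted(free_running)


# Hypothetical sampled clocks: clk_ip_a is gated, clk_ip_b keeps running.
samples = {
    "clk_ip_a": [0, 1, 0, 1, 0, 0, 0, 0],
    "clk_ip_b": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(find_free_running_clocks(samples, shutoff_index=4))  # ['clk_ip_b']
```

Any clock reported by such a check is a candidate for being brought under the clock shut off logic.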



The current implementation reaches the lowest power state through a sequential transition: entry to low power state $\rightarrow$ clock shutoff $\rightarrow$ isolation enable $\rightarrow$ power shut off $\rightarrow$ reset assertion. The SOC also supports retention of a few memories/flops through controllable knobs.
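The ordering of this power down sequence lends itself to a simple order check. The following Python sketch, under the assumption that each step is logged as a named event, flags the first out-of-order step; the event names and the function are hypothetical and do not come from the design.

```python
# Order taken from the sequence described in the text; names are illustrative.
EXPECTED_ORDER = [
    "low_power_entry",
    "clock_shutoff",
    "isolation_enable",
    "power_shutoff",
    "reset_assertion",
]


def check_power_down_order(events):
    """Return a list of violations; an empty list means the order was respected."""
    relevant = [e for e in events if e in EXPECTED_ORDER]
    violations = []
    for index, (seen, expected) in enumerate(zip(relevant, EXPECTED_ORDER)):
        if seen != expected:
            violations.append(f"step {index}: saw {seen}, expected {expected}")
            return violations
    if len(relevant) < len(EXPECTED_ORDER):
        violations.append(f"sequence stopped before {EXPECTED_ORDER[len(relevant)]}")
    elif len(relevant) > len(EXPECTED_ORDER):
        violations.append(f"unexpected extra event {relevant[len(EXPECTED_ORDER)]}")
    return violations


good = list(EXPECTED_ORDER)
bad = ["low_power_entry", "clock_shutoff", "power_shutoff", "isolation_enable", "reset_assertion"]
print(check_power_down_order(good))  # []
print(check_power_down_order(bad))   # ['step 2: saw power_shutoff, expected isolation_enable']
```

The second case models power being shut off before isolation is enabled, which is the kind of ordering error that would let X values leave the domain.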



Figure 3. Handshake Protocol

With the increase in complexity, the additional logic and the requirement to keep power consumption low, partitioning the complete design across multiple controllable power domains becomes essential. Based on analysis of the architecture and to support the power requirements, a total of >70 power domains were identified across the 13 blocks of the design (Figure 4). The complexity of the design and the stringent use case scenarios make the power rail off and on process critical to the power saving technique [1].



Figure 4. Master and Slave configuration to control the different Power domain

The master-slave configuration and handshake are controllable by both hardware (HW) and software (SW) triggers. In addition, the FSM in the design allows the controls to be configured at different phases of power down (Figure 3). The operational stage (OPR), clock shut off (CSO), retention (RET) and power shut off (PSO) form the different phases, and these phase transitions are controllable by software as well (Figure 5).





Figure 5. Power Off Process in IP
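The phase progression can be sketched as a small state machine. This Python model assumes a linear order OPR, CSO, RET, PSO for power down and the reverse for power up, which is an assumption made for illustration rather than a description of the actual FSM; the class and method names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    OPR = 0  # operational stage
    CSO = 1  # clock shut off
    RET = 2  # retention
    PSO = 3  # power shut off


class PowerPhaseFsm:
    """Toy model: each HW or SW trigger moves the block by one phase."""

    def __init__(self):
        self.phase = Phase.OPR
        self.history = []  # (trigger source, from phase, to phase)

    def power_down_step(self, source):
        return self._move(source, +1)

    def power_up_step(self, source):
        return self._move(source, -1)

    def _move(self, source, delta):
        if source not in ("HW", "SW"):
            raise ValueError("trigger source must be 'HW' or 'SW'")
        target = self.phase.value + delta
        if not Phase.OPR.value <= target <= Phase.PSO.value:
            return self.phase  # already at the first or last phase
        new_phase = Phase(target)
        self.history.append((source, self.phase.name, new_phase.name))
        self.phase = new_phase
        return self.phase


fsm = PowerPhaseFsm()
fsm.power_down_step("HW")  # OPR -> CSO
fsm.power_down_step("SW")  # CSO -> RET
print(fsm.phase.name, fsm.history)
```

Recording the trigger source alongside each transition mirrors the point that the same phase change can be driven by either hardware or software.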

To achieve maximum power reduction and to enable ease of control, the design was equipped with both hardware and software control mechanisms. The hardware controlled process is based on an event trigger mechanism. The trigger is generated from the core/security subsystem or from registers in the Always-On domain of the targeted subsystem. Based on the event trigger programming, the power down and power up sequences are initiated. To reinforce the verification, assertions were developed to monitor the change of intermediate status during the different power stages. The hardware controls are also supplemented with software controls that take charge of the events and the power transitions via the APB interface. The software triggers include register writes to the clock controller to gate clocks at the boundary of the PLL, individual power domain controls like retention and clock shut off (from the power controller), and block logic resets.

The power controller in the SOC also supports clock frequency gearing, using a programmable AXI interface that controls the clock manager, as part of achieving the low power target. The block level low power state transitions are controlled by a hardware trigger, with each power state coupled to a combination of low power techniques in different power domains (Figure 6).



Figure 6. Power State Diagram for Individual Blocks

The sheer number of instances of the IP, combined with very high complexity, made this a very challenging task for the verification engineers. Multiple power domains/rails controlling vastly different and independent systems with a myriad of use cases resulted in a huge volume of tests to be run at SOC level, where simulation times are very high. A few of the most challenging tasks were identifying the x-propagation resulting from incorrect port drivers, aligning the power sequence guide with the design architecture implementation, and getting the power consumption to match expectations and requirements.

To reinforce the verification and gain confidence, all the tests developed for the RTL simulations were also run with native low power simulations. This approach opened up a new possibility: power estimation analysis could be done on the real use case scenarios that will be run on the actual chip. To shift left and mimic the real power gating scenarios with PG netlists, the retention and isolation strategies were added to the UPF 3.0 files that are taken for final synthesis. To avoid x-propagation, the isolation cells are used in conjunction with clamp cells, which drive the outputs to a known value. Assertion checks were added for the clamping value. For the retention cells placed in the power domains, verification is accomplished for the memories/flops in power rail off and power gating scenarios. To complete the verification and gain better confidence, code coverage (branch, expression, toggle, FSM) and functional coverage (including low-power) at SOC level were taken to closure within stringent timelines.
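The clamp value check mentioned above can be modelled in a few lines. This is a minimal Python sketch and not the assertion code used in the project; signal names, the `'0'`/`'1'`/`'x'` value encoding and the function name are hypothetical.

```python
def check_isolation_clamps(domain_on, iso_enable, outputs, clamp_values):
    """Check isolated outputs of a powered-down domain against their clamp values.

    outputs maps a signal name to '0', '1' or 'x' as seen outside the domain;
    clamp_values maps a signal name to its expected clamp value, '0' or '1'.
    Returns a list of error strings; an empty list means the check passed.
    """
    errors = []
    if domain_on:
        return errors  # clamps are only checked while the domain is shut off
    if not iso_enable:
        errors.append("isolation not enabled while the domain is off")
        return errors
    for name, expected in sorted(clamp_values.items()):
        observed = outputs.get(name, "x")
        if observed != expected:
            errors.append(f"{name}: expected clamp {expected}, observed {observed}")
    return errors


# Hypothetical domain outputs: 'ack' leaks an X instead of its clamp value.
clamps = {"req": "0", "ack": "1"}
observed = {"req": "0", "ack": "x"}
print(check_isolation_clamps(False, True, observed, clamps))
# ['ack: expected clamp 1, observed x']
```

An X observed on an isolated output points at either a missing isolation strategy or a wrong clamp value in the power intent.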



## III. RESULTS

| Power Gating (PG)/Clock Gating (CG) from testcase analysis at SOC Level |                  |                     |                     |           |       |               |
|-------------------------------------------------------------------------|------------------|---------------------|---------------------|-----------|-------|---------------|
|                                                                         |                  | Hardware Controlled | Software Controlled |           |       |               |
| Scenario                                                                | Low Power Stages | Master PG           | Block PG            | Master CG | IP CG | % Power Saved |
| A                                                                       | Level0           |                     |                     |           |       | 0%            |
| B                                                                       | Level1           |                     |                     |           | Yes   | 42%           |
| C                                                                       | Level2           |                     | Yes                 |           |       | 68%           |
| D                                                                       | Level3           |                     |                     | Yes       |       | 52%           |
| E                                                                       | Level4           | Yes                 | Yes                 |           |       | 83%           |

Table 1. SOC Level Power Consumption

The power reduction techniques were tried on verification scenarios in combination with the hardware supported methods to achieve the required power consumption figures. The initial figures revealed that power consumption was 2.1x - 3.4x of the specified values in each of the scenarios. The SOC supports software programming of registers for the clock controllers, the power controller and the power pin controllers. Based on analysis of the power pin controls and the power consumption from the functional scenarios created, the design architecture was tweaked, resulting in up to 83% power reduction (Table 1). As shown in Table 1, the lowest power consuming state is reached only when the chip stays idle for a longer duration, and the whole chip wakes up from standby once an external trigger is received, e.g. a GPIO trigger. If a block is in the idle state, the whole block / power domain goes to the clock gated stage so that wake up is faster and functionality is not affected. Based on the idle time duration, the chip goes through multiple intermediate low power stages, from IP level clock gating/gearing to chip level standby.
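To put the initial 2.1x - 3.4x overshoot in perspective, the reduction needed to get back to the specified value follows directly from the ratio: a design consuming k times its specification has to shed a fraction 1 - 1/k of its power. The short Python sketch below is only arithmetic on the quoted ratios; it does not say which scenario each ratio belongs to, and the function name is hypothetical.

```python
def required_reduction_percent(overshoot_factor):
    """Percent reduction needed to bring power from overshoot_factor x spec down to spec."""
    if overshoot_factor <= 0:
        raise ValueError("overshoot factor must be positive")
    return (1.0 - 1.0 / overshoot_factor) * 100.0


for factor in (2.1, 3.4):
    needed = required_reduction_percent(factor)
    print(f"{factor}x of spec -> {needed:.1f}% reduction needed")
# 2.1x of spec -> 52.4% reduction needed
# 3.4x of spec -> 70.6% reduction needed
```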

|          | Block Level Analysis |                         |                         |                         |                         |                         |              |
|----------|----------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|--------------|
| Scenario | System Register, NOC | MCU                     | Camera controller       | Image Processor         | Display controller      | DMA                     | Power Saving |
| A        | FC, NOP (100 MHz)    | FC, NOP (800 MHz)       | FC, NOP (800 MHz)       | FC, NOP (800 MHz)       | FC, NOP (800 MHz)       | FC, NOP (800 MHz)       | 0%           |
| B        | FC, NOP (100 MHz)    | FC, NOP (800 MHz)       | Clock Gearing (100 MHz) | Clock Gearing (100 MHz) | Clock Gearing (100 MHz) | Clock Gearing (100 MHz) | 10%          |
| C        | FC, NOP (100 MHz)    | FC, NOP (800 MHz)       | FC, NOP (800 MHz)       | FC, NOP (800 MHz)       | Clock Gating            | FC, NOP (800 MHz)       | 25%          |
| D        | FC, NOP (100 MHz)    | FC, NOP (800 MHz)       | Clock Gating            | Clock Gating            | Clock Gating            | Clock Gating            | 40%          |
| E        | FC, NOP (100 MHz)    | Clock Gearing (100 MHz) | Clock Gating            | Clock Gating            | Power Gating            | Clock Gating            | 58%          |
| F        | FC, NOP (100 MHz)    | FC, NOP (800 MHz)       | Power Gating            | Power Gating            | Power Gating            | Power Gating            | 68%          |
| G        | FC, NOP (100 MHz)    | Clock Gated             | Power Gating            | Power Gating            | Power Gating            | Power Gating            | 79%          |
| H        | Clock Gated          | Power Gated             | Power Gating            | Power Gating            | Power Gating            | Power Gating            | 91%          |

Table 2. Block level power analysis for scenarios

The block level power reduction techniques include clock gating (CG), clock gearing, power gating (PG), power gearing, etc. Block level clock gearing is achieved through software programming of the register block, which gears the clock down to a lower frequency. Clock gating can be achieved from the power state or by programming a register of the clock controller. The different power domains in different blocks are connected to the power rails with UPF. The different power rails and the internal power wires are powered on and off by internal logic that is controlled by SW. The results of the trials on various low power states in a block are given in Table 2. The power consumption is further reduced by turning the power rails off from the master controller, but that makes the whole block functionally off.
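Table 2 can be read as a lookup: for a block that must stay fully configured, which scenario gives the largest saving? The Python sketch below encodes the table with short state codes (`FC` for "FC, NOP", `GEAR` for clock gearing, `CG` for clock gated/gating, `PG` for power gated/gating). The codes, the shortened block names and the function name are illustrative choices, and the partially truncated scenario B cells are taken to be clock gearing.

```python
BLOCKS = ("SysReg/NOC", "MCU", "Camera", "Image", "Display", "DMA")

# scenario -> (state of each block in BLOCKS order, power saving in percent)
TABLE2 = {
    "A": (("FC", "FC", "FC", "FC", "FC", "FC"), 0),
    "B": (("FC", "FC", "GEAR", "GEAR", "GEAR", "GEAR"), 10),
    "C": (("FC", "FC", "FC", "FC", "CG", "FC"), 25),
    "D": (("FC", "FC", "CG", "CG", "CG", "CG"), 40),
    "E": (("FC", "GEAR", "CG", "CG", "PG", "CG"), 58),
    "F": (("FC", "FC", "PG", "PG", "PG", "PG"), 68),
    "G": (("FC", "CG", "PG", "PG", "PG", "PG"), 79),
    "H": (("CG", "PG", "PG", "PG", "PG", "PG"), 91),
}


def best_scenario_keeping(block):
    """Return (scenario, saving %) with the highest saving where `block` stays at FC."""
    column = BLOCKS.index(block)
    candidates = [
        (saving, name)
        for name, (states, saving) in TABLE2.items()
        if states[column] == "FC"
    ]
    saving, name = max(candidates)
    return name, saving


for block in ("MCU", "Camera", "SysReg/NOC"):
    print(block, best_scenario_keeping(block))
# MCU ('F', 68)
# Camera ('C', 25)
# SysReg/NOC ('G', 79)
```

Read this way, the table shows how much saving is available without touching a given block, which is the trade-off the use case driven analysis explores.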

Another major finding concerns the performance of software triggered versus hardware triggered power transitions. The analysis showed that a hardware triggered transition requires only 40 clock cycles, whereas the same transition triggered by software requires 200 cycles (since this involves the slow speed APB clock). The major setback of the hardware trigger only approach is that the power figures achieved do not meet the required values in particular scenarios. This is the reason for combining the software and hardware approaches to achieve the rated power figures.
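The cycle counts translate into a fixed latency ratio. The Python sketch below works this out and, purely for illustration, converts cycles to time under the assumption that both counts are measured against the same reference clock; the 100 MHz figure is a hypothetical value chosen for the example, not a number from the design.

```python
HW_TRIGGER_CYCLES = 40   # hardware triggered transition (from the analysis above)
SW_TRIGGER_CYCLES = 200  # software triggered transition (from the analysis above)


def transition_time_us(cycles, clock_mhz):
    """Transition time in microseconds for a cycle count at a clock given in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    return cycles / clock_mhz


ratio = SW_TRIGGER_CYCLES / HW_TRIGGER_CYCLES
print(f"software path takes {ratio:.0f}x the cycles of the hardware path")  # 5x

ASSUMED_CLOCK_MHZ = 100  # hypothetical reference clock, for illustration only
print(transition_time_us(HW_TRIGGER_CYCLES, ASSUMED_CLOCK_MHZ), "us (HW)")  # 0.4 us
print(transition_time_us(SW_TRIGGER_CYCLES, ASSUMED_CLOCK_MHZ), "us (SW)")  # 2.0 us
```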



## IV. CONCLUSION

The current verification scope includes features like clock gating, clock gearing, isolation, power gating, power rail off and retention tests. In the test sequences, power down and power up are controlled not only by hardware parameters but also by software triggers. From analysis of the test results, it was concluded that the combination of software and hardware triggered events helps achieve the power targets with the required performance. The SOC level standby test and the corresponding power estimation report showed that the master-slave power management methodology used in the SOC saves substantial power. For an SOC with 1 global master power controller and 14 individual blocks with subsidiary slave controllers, it helped reduce the power consumption during the low power states by 1/10<sup>th</sup> of the total power at top. The power transition sequences formulated in our low power verification include states in which only one of the blocks is active and all the others are in the power down state.

Thorough use case testing of the power controller blocks, coupled with the use cases that arose from analyzing the power estimation results from PTPX, reduced the SOC power consumption to a great extent. The iterative verification led to multiple low power SOC states with combinations of power shut off, clock shut off, retention and rail supply off. In the chip standby low power state, the block with the master power controller is the only one that is active; all the other blocks are powered down. As the transition to this state has high latency, multiple intermediate power saving strategies involving both hardware and software controls are being formulated, which are expected to provide further power saving in the range of 10-75% when the functional datapath is not required to run. These sequences are now being used as is by software teams to develop the final product drivers, and the early silicon power values correlate strongly with the power estimation values obtained from the verification scenarios.

## REFERENCES

- [1] Harshal Kothari, Eldin Ben Jacob, Sriram Kazhiyur Soundarrajan, Somasunder Kattepura Sreenath, "Challenges In Power-Aware Verification with Hardware Power Controller and Novel Approach to Harness Xcelium Low-Power Functional Coverage for Complex SoC", CadenceLive 2021
- [2] Low-Power Simulation Guide (IEEE 1801), Cadence learning and support, 2022