# Design and Verification of the PLL Using the New DCO and Its Applications to Built-In Speed Grading of Arithmetic Circuits

Yi-Sheng Wang Hsiang-Kai Teng Shi-Yu Huang

Electrical Engineering Department, National Tsing Hua University, Taiwan

Abstract—The Digitally Controlled Oscillator (DCO) is a key component in a cell-based Phase-Locked Loop (PLL) for on-chip high-speed clock generation. In this work, we design a cell-based DCO that achieves a more linear DCO period profile with a consistently small resolution of about 1ps. This DCO is embedded into an in-house PLL to evaluate its impact on the output clock period jitter. For verification, a mixed-level simulation method is applied to speed up the process, in which the data-path of the PLL is modeled at the transistor level while the controller is modeled at the RTL. Post-layout simulation results reveal that our DCO achieves a resolution in the range of [0.62ps, 1.25ps] across all 5 process corners over an extreme temperature range from -40°C to 150°C. At the same time, the peak-to-peak jitter of our PLL over 2000 clock cycles after locking is reduced from the original 10.86ps to 6.07ps, a reduction ratio of (10.86-6.07)/10.86 = 44%. Last but not least, we show the results of using our PLL to perform online speed grading for arithmetic circuits.

*Index Terms*—cell-based design, PLL, DCO, Varactor, step resolution, Built-In Speed Grading

# I. INTRODUCTION

Built-In Speed Grading (BISG) has been applied to find the maximum operating speed $(F_{max})$ of the circuit under test. All-digital cell-based Phase-Locked Loops (PLLs) have been used for on-chip clock generation in the BISG process [1]-[12]. Due to its regular architecture, process migration of this type of PLL is often much easier than that of its analog/mixed-signal counterparts. The basic architecture of an all-digital cell-based PLL is shown in Fig. 1. It consists of a Phase Detector, a Controller, a Digitally Controlled Oscillator (DCO), and a Frequency Divider. The Phase Detector compares the phase of the reference clock with that of the Frequency Divider output, and the Controller adjusts the control code of the DCO according to the comparison result, the Lead/Lag signal. The DCO output passes through the Frequency Divider and is compared with the reference clock, which closes the loop. Thus, the DCO largely determines both the supported clock frequency range and the peak-to-peak jitter of the PLL.
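The loop described above can be illustrated with a small behavioral sketch. It is not the authors' controller: it replaces true phase detection with a simple comparison of the divided period against the reference period, and it assumes a hypothetical linear DCO (`base_ps` and `step_ps` are made-up values). The 50MHz reference and 1GHz output are the operating point reported later in the paper, which implies a divide-by-20 ratio.

```python
# Behavioral sketch of an all-digital PLL loop (simplified, assumptions labeled).
T_REF_PS = 20_000.0   # period of a 50 MHz reference clock
DIVIDE_BY = 20        # divider ratio implied by a 1 GHz output

def dco_period_ps(code, base_ps=979.5, step_ps=1.0):
    """Hypothetical linear DCO: each control-code step adds step_ps of period."""
    return base_ps + code * step_ps

def run_loop(code=0, ref_cycles=60, max_code=63):
    """Return the control code chosen after each reference cycle."""
    history = []
    for _ in range(ref_cycles):
        divided_period = DIVIDE_BY * dco_period_ps(code)
        lag = divided_period > T_REF_PS   # divided clock is slower than the reference
        # Assumed controller policy: shorten the period on lag, lengthen it on lead.
        code = max(0, code - 1) if lag else min(max_code, code + 1)
        history.append(code)
    return history

history = run_loop()
print(history[-4:])
```

Once the code reaches the two values that bracket the target period, the sketch alternates between them, so the achievable period error is bounded by the DCO step size. This is one way to see why the DCO resolution matters for jitter.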





The rest of this paper is organized as follows. Section II introduces the design of the new DCO. Section III presents the verification methodology and simulation results of the PLL with the new DCO and BISG. Section IV concludes.

# II. DESIGN OF THE NEW DCO

We design a new DCO which is shown in **Fig. 2**. This DCO consists of only 3 logic gates – the start-up gate and two buffer gates. All of them are varactor loaded. The coarse-tuning  $\beta$ -code controls the *latch-based varactor cells* at the outputs of the two buffer gates, while the fine-tuning  $\gamma$ -code controls the *NAND-based varactor cells* at the output of the start-up gate. In comparison, a NAND-based varactor cell provides a smaller amount of loading effect when turned on. On the other hand, a latch-based varactor cell provides a larger and more flexible loading effect.



Fig. 2: New DCO architecture.

The amount of change in delay when turning on a latch-based varactor cell depends on the **driver-to-varactor size ratio**. For example, when a driver of X4 is loaded by an X1-sized latch-based varactor, the tuning effect is relatively large. However, if we increase the driver strength to X20, then the tuning effect becomes smaller. In this work, **progressively sized latch-based varactor cells** have been designed. The comparison between same-sized and progressively sized varactor cells is shown in **Fig. 3**. **Fig. 3(a)** compares the DCO period profile of the same-sized XL cells with that of the **progressively sized** cells {X2, X2, X1, X1, XL, XL, XL, XL}, and **Fig. 3(b)** compares the DCO period profile of the same-sized X1 cells with that of the **progressively sized** cells. The key characteristics are listed in **Table 1**. It shows that the **progressively sized** design achieves a more linear DCO period profile while supporting a wider period range.
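The dependence on the driver-to-varactor size ratio can be illustrated with a first-order estimate. The sketch below assumes that gate delay scales as driver resistance times load capacitance, with resistance inversely proportional to drive strength. The constant `k_ps` is a hypothetical value, not a number from this work.

```python
def tuning_step_ps(driver_strength, varactor_size, k_ps=10.0):
    """First-order estimate of the delay change when one varactor turns on.

    Assumes delay ~ R_driver * C_load, with R_driver proportional to
    1 / driver_strength and the added capacitance proportional to
    varactor_size. k_ps is a hypothetical technology constant.
    """
    return k_ps * varactor_size / driver_strength

small_driver = tuning_step_ps(driver_strength=4, varactor_size=1)
large_driver = tuning_step_ps(driver_strength=20, varactor_size=1)
print(f"X4 driver:  {small_driver:.2f} ps per varactor")
print(f"X20 driver: {large_driver:.2f} ps per varactor")
```

Under this assumption, the same X1 varactor produces a step five times smaller on an X20 driver than on an X4 driver, which is why mixing varactor sizes gives a handle on the uniformity of the steps.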



Fig. 3: DCO period profiles of the progressively sized and same-sized varactor cells.

Table 1: Characteristics of the progressively sized and same-sized varactor cells.

| DCO Characteristic          | Same Size XL | Same Size X1 | Progressive |
|-----------------------------|--------------|--------------|-------------|
| DCO Period Range (ps)       | 306 ~ 408    | 309 ~ 495    | 316 ~ 506   |
| β-Segment Size Range (ps)   | 29 ~ 34      | 30 ~ 36      | 30 ~ 35     |
| β-Segment Overlap Ratio (%) | 79.4 ~ 89.7  | 50 ~ 83.3    | 61.3 ~ 74.2 |


The layouts of the reference DCO and the proposed DCO using a 90nm CMOS process are shown in **Fig. 4**. For the new version, we have added a path-selection part controlled by the  $\alpha$ -code so that the period can be extended.



Fig. 4: The layout of two versions of DCOs using a 90nm CMOS process.

The verification flow chart is shown in Fig. 5. We use the EDA tool VCS-XA, developed by Synopsys Inc., to simulate the DCOs, and Perl scripts to analyze the characteristics of the DCO. The key characteristics of the two DCOs are listed in Table 2. Our new version excels in two metrics in particular: the $\beta$-segment size and the resolution. Originally, the $\beta$-segment size is [45ps, 252ps]; now it is more uniform at [29ps, 32ps]. Originally, the resolution is [0.52ps, **3.85ps**]; now it is more uniform at [0.79ps, 0.96ps]. Based on these post-layout simulation results in the typical environment, the most significant advantage of the proposed DCO over the reference one is the reduction of the worst-case resolution from 3.85ps down to 0.96ps. Furthermore, both the DNL and the INL are significantly smaller than those of the reference one, which shows that the new DCO is much more linear.



Fig. 5: The verification flow chart of the DCO.

Table 2: Characteristics of the proposed DCO based on post-layout simulation in a 90nm CMOS process.

| Characteristics            | Reference DCO [13]  | Proposed DCO     |
|----------------------------|---------------------|------------------|
| 1. DCO Period Range        | 489 ~ 1247 (ps)     | 467 ~ 1946 (ps)  |
| 2. β-Segment Size          | 45 ~ 252 (ps)       | 29 ~ 32 (ps)     |
| 3. Resolution              | 0.52 ~ 3.85 (ps)    | 0.79 ~ 0.96 (ps) |
| 4. β-Segment Overlap Ratio | 46.4 ~ 68.0 (%)     | 61.3 ~ 74.2 (%)  |
| 5. Worst DNL / INL         | -72.5 / -155.7 (ps) | 3.4 / -6.9 (ps)  |
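As an illustration of how the resolution, DNL, and INL entries in Table 2 could be derived from a simulated period profile, the following sketch uses one common set of definitions. The paper does not state its exact definitions, so the end-point fit below is an assumption, and the sample profile is made up.

```python
def linearity_metrics(periods_ps):
    """Return (min step, max step, worst DNL, worst INL), all in ps.

    periods_ps[i] is the DCO period at control code i. The ideal step is
    the average step over the whole profile (an end-point fit).
    """
    n = len(periods_ps)
    steps = [b - a for a, b in zip(periods_ps, periods_ps[1:])]
    ideal = (periods_ps[-1] - periods_ps[0]) / (n - 1)
    # DNL: deviation of each step from the ideal step.
    dnl = [s - ideal for s in steps]
    # INL: deviation of each period from the ideal straight line.
    inl = [p - (periods_ps[0] + i * ideal) for i, p in enumerate(periods_ps)]
    worst = lambda xs: max(xs, key=abs)   # entry with the largest magnitude
    return min(steps), max(steps), worst(dnl), worst(inl)

# Hypothetical 5-code profile, for illustration only.
profile = [500.0, 501.0, 502.0, 504.0, 505.0]
metrics = linearity_metrics(profile)
print(metrics)
```

The minimum and maximum steps correspond to the resolution range, and a profile with nearly equal steps yields DNL and INL values close to zero.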

A robust DCO should be able to operate correctly under the most extreme process and operating conditions. We perform an extensive simulation to verify our DCO design under 5 process corners {TT, SS, FF, FS, SF} and 5 sampled temperatures in {-40°C, 0°C, 25°C, 85°C, 150°C}. The results are listed in **Table 3**. Among them, the resolution is [0.62ps, 1.25ps], meaning that the worst-case resolution is as small as 1.25ps even under the most extreme condition. Last but not least, the $\beta$-segment overlap ratio is still positive, meaning that there is no period gap under the worst process and the worst temperature.
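The link between a positive overlap ratio and the absence of a period gap can be sketched as follows. The definition used here (overlap between adjacent β segments divided by the size of the lower segment) is an assumption, since the paper does not spell out its formula, and the segment values are hypothetical.

```python
def overlap_ratios(segments_ps):
    """segments_ps: list of (shortest, longest) period per beta code, in order.

    Returns, for each adjacent pair, the overlap as a percentage of the
    lower segment's size. A negative value indicates a period gap.
    """
    ratios = []
    for (lo_a, hi_a), (lo_b, _hi_b) in zip(segments_ps, segments_ps[1:]):
        ratios.append(100.0 * (hi_a - lo_b) / (hi_a - lo_a))
    return ratios

# Hypothetical segments, for illustration only: the third one leaves a gap.
segments = [(500.0, 530.0), (510.0, 540.0), (545.0, 575.0)]
ratios = overlap_ratios(segments)
has_gap = any(r < 0 for r in ratios)
print(ratios, has_gap)
```

With this definition, any target period between the first and last segment is reachable as long as every ratio stays positive.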

Table 3: Summary of Characteristics of the proposed DCO under extreme process and temperature conditions.

| Characteristics (process corners {TT, SS, FF, FS, SF} × temperatures {-40°C, 0°C, 25°C, 85°C, 150°C}) | Proposed DCO                    |
|--------------------------------------------------------------------------------------------------------|---------------------------------|
| 1. DCO Period Range                                                                                    | 646 ~ 1493 (ps)                 |
| 2. β-Segment Size                                                                                      | 23 ~ 42 (ps)                    |
| 3. Resolution                                                                                          | 0.62 ~ 1.25 (ps)                |
| 4. β-Segment Overlap Ratio                                                                             | 23.4 ~ 73.1 (%), still positive |
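A summary such as Table 3 is the envelope of per-condition results over the 25 corner and temperature combinations. The sketch below shows that bookkeeping only. The per-condition numbers are placeholders chosen to exercise the code and are not simulation results from this work.

```python
from itertools import product

CORNERS = ["TT", "SS", "FF", "FS", "SF"]
TEMPS_C = [-40, 0, 25, 85, 150]

def envelope(results):
    """results maps (corner, temp) -> (min_step_ps, max_step_ps)."""
    lows = [lo for lo, _ in results.values()]
    highs = [hi for _, hi in results.values()]
    return min(lows), max(highs)

conditions = list(product(CORNERS, TEMPS_C))   # one simulation run per entry

# Placeholder numbers, NOT simulation results: every condition gets the
# same made-up range except two that are changed to exercise envelope().
results = {cond: (0.8, 1.0) for cond in conditions}
results[("FF", -40)] = (0.7, 0.9)
results[("SS", 150)] = (0.9, 1.2)
best, worst = envelope(results)
print(len(conditions), best, worst)
```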

# III. VERIFICATION METHODOLOGY AND SIMULATION RESULT OF THE PLL WITH THE NEW DCO AND BISG

#### **A.** Layout and Simulation Waveform of Optimized PLL



Fig. 6: The layout of PLLs using a 90nm CMOS process.



Fig. 7: Post-layout simulation waveforms of the cell-based PLL with the proposed DCO using a 90nm CMOS process (TT corner, 25°C).

The layouts of the cell-based PLLs with the reference DCO and the proposed DCO using a 90nm CMOS process are shown in **Fig. 6**. The simulation waveforms of the cell-based PLL with the proposed DCO are shown in **Fig. 7**. We use the EDA tool VCS-XA, developed by Synopsys Inc., to perform post-layout mixed-level simulation of our cell-based PLLs, and Perl scripts to measure the peak-to-peak jitter. A 10,000ns simulation takes about 16 hours to complete. The simulation shows that our PLL locks at the three-level control code $\langle \alpha, \beta, \gamma \rangle = \langle 4, 16, 20\text{-}21 \rangle$. Because the $\gamma$-code hovers between 20 and 21, the phase detector's output signal, i.e., *lead\_lag*, toggles several times within an observation window, which indicates a locked condition.
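The jitter measurement mentioned above can be sketched in a few lines. This is not the authors' Perl script. It assumes that the jitter of interest is the spread of the clock period over an observation window and that the rising-edge times have already been extracted from the waveform. The edge times below are made up.

```python
def peak_to_peak_jitter_ps(rising_edges_ps, window=2000):
    """Spread of the clock period over the last `window` cycles, in ps."""
    periods = [b - a for a, b in zip(rising_edges_ps, rising_edges_ps[1:])]
    recent = periods[-window:]
    return max(recent) - min(recent)

# Hypothetical edge times: a nominal 1000 ps clock with one long (1003 ps)
# and one short (998 ps) period.
edges = [0.0, 1000.0, 2003.0, 3001.0, 4001.0]
jitter = peak_to_peak_jitter_ps(edges)
print(jitter)
```

In practice the window would start only after the lock condition is observed, so that the acquisition phase does not inflate the result.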

Table 4: Characteristics of cell-based PLL with the proposed DCO based on post-layout simulation in a 90nm CMOS process.

| Characteristics        | The cell-based PLL with the Reference DCO [13] | The cell-based PLL with the Proposed DCO |
|------------------------|------------------------------------------------|------------------------------------------|
| 1. Area                | 21025 (µm<sup>2</sup>)                         | 20164 (µm<sup>2</sup>)                   |
| 2. Power               | 3.879 (mW)                                     | 3.008 (mW)                               |
| 3. Peak-to-Peak Jitter | 10.86 (ps)                                     | 6.07 (ps)                                |

The key characteristics of the two versions of the PLL, each producing a 1GHz clock signal from a 50MHz reference clock, are listed in **Table 4**. The area is reduced by (21025-20164)/21025 ≈ 4.1%, and the power is reduced by (3.879-3.008)/3.879 = 22.45%. The post-layout simulation reveals that the peak-to-peak jitter within an observation window of over 2000 clock cycles after locking is indeed reduced from the original 10.86ps to 6.07ps, a reduction ratio of (10.86-6.07)/10.86 = 44%.
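The reduction ratios quoted above follow directly from Table 4 and can be reproduced with a short script. Only the helper name `reduction_percent` is introduced here for illustration; the input numbers are the ones in the table.

```python
def reduction_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference

table4 = {
    "area (um^2)": (21025, 20164),
    "power (mW)": (3.879, 3.008),
    "peak-to-peak jitter (ps)": (10.86, 6.07),
}
for name, (ref, new) in table4.items():
    print(f"{name}: {reduction_percent(ref, new):.1f}% lower")
```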

#### **B.** Verification and Simulation Result of BISG

One key application of the PLL is on-chip clock generation, producing different test clock frequencies. This capability enables BISG. The flow chart of the BISG methodology is shown in **Fig. 8**. BISG performs multiple runs of BIST to approach the maximum operating speed (Fmax). However, an all-transistor-level BISG simulation takes a long time, because each run must wait for the PLL to lock and for the test patterns to be shifted in. To speed up the simulation and obtain verification results quickly, we use two methods. One is mixed-level simulation, and the other is to verify BISG by performing Built-In Self Test (BIST) runs at different test clock frequencies at the same time.



Fig. 8: The flow chart of BISG methodology.
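The idea of approaching Fmax through repeated BIST runs can be sketched as a simple sweep. The flow in Fig. 8 may differ in its details. The linear sweep below, the `run_bist` hook, and the 1050ps critical path of the stand-in circuit are all assumptions made for illustration.

```python
def grade_speed(run_bist, start_ps, stop_ps, step_ps):
    """Shorten the test clock period until BIST fails.

    run_bist(period_ps) -> bool is a hypothetical hook that programs the
    PLL, waits for lock, runs BIST, and returns the pass/fail flag.
    Returns the shortest passing period, or None if even start_ps fails.
    """
    shortest_pass = None
    period = start_ps
    while period >= stop_ps:
        if not run_bist(period):
            break
        shortest_pass = period
        period -= step_ps
    return shortest_pass

# Stand-in circuit whose critical path is 1050 ps (made-up value).
fake_bist = lambda period_ps: period_ps >= 1050
t_min = grade_speed(fake_bist, start_ps=1500, stop_ps=800, step_ps=100)
print(t_min)
```

The reciprocal of the returned period is the estimated Fmax, with an uncertainty set by the sweep step. A finer step, which a high-resolution DCO makes possible, narrows that uncertainty.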

We use the EDA tool VCS-XA, developed by Synopsys Inc., to perform post-layout mixed-level simulation. The mixed-level simulation module is shown in **Fig. 9**. The most timing-critical parts of the circuit, namely the data-path of the PLL and the circuit under test (e.g., a 16-bit adder) with its BIST components, are modeled at the transistor level to preserve precise timing information, while the controller of the PLL is modeled at the RTL. An advantage of using an RTL module for the PLL controller is that the initial control code of the DCO can be set close to the target frequency, so that the PLL locks faster and the simulation time is reduced. Furthermore, since BISG performs multiple runs of BIST to approach the maximum operating speed (Fmax), we can simulate BIST under different test clock frequencies at the same time to obtain verification results quickly. This takes less time, not only because the PLL locks faster in mixed-level simulation but also because there is no need to wait for the BIST pass/fail signal before changing the test clock frequency.



Fig. 9: The mixed-level simulation module for BIST.

The simulation waveforms of BIST for a 16-bit adder under 1ns and 1.1ns test clocks using a 90nm CMOS process are shown in **Fig. 10 and Fig. 11**, respectively. For both the 1ns and 1.1ns test clocks, it takes about 7500ns for the cell-based PLL to lock. Once the test clock is ready, the BIST process starts. The BIST process finishes once all test patterns generated by the LFSR have been shifted in. We then compare the signature compressed by the MISR with the golden signature. The pass/fail signal goes high if they are the same, which means that the CUT can operate under that test clock. For example, a 16-bit adder can operate under a 1.1ns test clock and may fail under a 1ns test clock.
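The LFSR, MISR, and signature comparison described above can be modeled behaviorally. The sketch below is not the BIST hardware used in this work: the LFSR taps, the MISR feedback polynomial, the seed, and the pattern count are assumptions, and the faulty adder simply returns one wrong sum to mimic a single timing failure.

```python
MASK17 = 0x1FFFF   # a 16-bit adder produces a 17-bit result (sum plus carry-out)

def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def misr_step(state, response, width=17, poly=0x9):
    """Shift left, fold the shifted-out bit back in via poly, XOR in the response."""
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & ((1 << width) - 1)
    if msb:
        state ^= poly
    return state ^ response

def adder16(a, b):
    return a + b

def run_bist(cut, n_patterns=1000, seed=0xACE1):
    """Apply LFSR patterns to the circuit under test and return the signature."""
    state, signature = seed, 0
    for _ in range(n_patterns):
        a = state
        state = lfsr16(state)
        b = state
        state = lfsr16(state)
        signature = misr_step(signature, cut(a, b) & MASK17)
    return signature

def make_single_error_cut(fail_at):
    """Adder that returns one wrong sum, mimicking a single timing failure."""
    count = 0
    def cut(a, b):
        nonlocal count
        count += 1
        return (a + b) ^ 1 if count == fail_at else a + b
    return cut

golden = run_bist(adder16)                       # fault-free reference signature
passed = run_bist(adder16) == golden             # pass flag at a safe clock
failed = run_bist(make_single_error_cut(500)) != golden
print(passed, failed)
```

Because the MISR update is linear and invertible, a single corrupted response always changes the final signature in this model. With many corrupted responses, aliasing becomes possible in principle, although it is unlikely.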





Fig. 10: The simulation waveforms of BIST for a 16-bit adder under a 1.1ns test clock.

The simulation results of the critical path delay (the reciprocal of $F_{max}$) for binary adders of different bit widths using a 90nm CMOS process are shown in **Fig. 12**. The results include the Design Compiler report, the gate-level simulation result, and the mixed-level simulation result. They show that these verification methods allow us to perform speed grading for different CUTs to find $F_{max}$ while reducing the simulation time.



Fig. 12: The simulation results of critical path delay (the reciprocal of Fmax) for binary adders of different bit widths.

# IV. CONCLUSION

To obtain a more accurate on-chip high-speed test clock signal, we design a new DCO architecture. We use the EDA tool VCS-XA for simulation and Perl scripts to analyze the characteristics of the DCO. This lets us quickly judge the quality of a DCO without inspecting the simulation waveforms in detail.

As a result, our design improves on existing solutions in several respects. First, our architecture does not have any "DCO period gap" even under 5 process corners and an extreme temperature range from -40°C to 150°C. This feature makes our DCO very robust. Last but not least, our DCO has a linear DCO period profile, thus contributing to a uniform step resolution of about 1ps across the supported output clock period range. Compared to a prior reference design, the worst-case resolution has been reduced from 3.85ps to 0.96ps in a typical condition, or to 1.25ps in extreme process and temperature conditions. This leads to a peak-to-peak jitter reduction of 44%, from the original 10.86ps to 6.07ps, in the PLL that uses this new DCO, while the area and the power consumption are both lower than those of the previous version.

To verify BISG quickly, we use mixed-level simulation and perform BIST at different test clock frequencies at the same time. This reduces the simulation time and successfully finds the operating speed of the circuit under test.

# ACKNOWLEDGMENTS

This work was sponsored in part by the Ministry of Science and Technology (MOST) of Taiwan under research grants MOST-111-2221-E-007-115-MY3, and in part, by Power Semiconductor Manufacturing Corp. We also acknowledge the help of Taiwan Semiconductor Research Institute (TSRI) for providing the access to the EDA tools.

# REFERENCES

- [1] M. Combes, K. Dioury, and A. Greiner, "A Portable Clock Multiplier Generator Using Digital CMOS Standard Cells," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 958-965, Jul. 1996.
- [2] T. Olsson and P. Nilsson, "A Fully Integrated Standard-Cell Digital PLL," *IEE Electron. Letters*, vol. 37, pp. 211–212, Feb. 2001.
- [3] T. Olsson and P. Nilsson, "A Digitally Controlled PLL for SoC Applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 751-760, May 2004.
- [4] C.-C. Chung and C.-Y. Lee, "An All-Digital Phase-Locked Loop for High-Speed Clock Generation," *IEEE J. Solid-State Circuits*, vol. 38, no. 3, pp. 347-351, Feb. 2003.
- [5] P.-L. Chen, C.-C. Chung, and C.-Y. Lee, "A Portable Digitally Controlled Oscillator Using Novel Varactors," *IEEE Trans. on Circuits and Systems II, Express Briefs*, vol. 52, no. 5, pp. 233-237, May 2005.

- [6] D. Sheng, C.-C. Chung, and C.-Y. Lee, "An All-Digital Phase-Locked Loop with High-Resolution for SoC Applications," *Proc. of Int'l Symp. on VLSI Design, Automation and Test* (VLSI-DAT), April 2006.
- [7] D. Sheng, C.-C. Chung, and C.-Y. Lee, "An Ultra-Low-Power and Portable Digitally Controlled Oscillator for SoC Applications," *IEEE Trans. on Circuits and Systems II, Express Briefs*, vol. 54, no. 11, pp. 954-958, Nov. 2007.
- [8] H.-J. Hsu and S.-Y. Huang, "A Low-Jitter ADPLL via a Suppressive Digital Filter and an Interpolation-Based Locking Scheme," *IEEE Trans. on VLSI Systems*, vol. 17, no. 11, 2009.
- [9] S.-K. Lee, Y.-H. Seo, Y. Suh, H.-J. Park, and J.-Y. Sim, "A 1GHz ADPLL with a 1.25ps Minimum-Resolution Sub-Exponent TDC in 0.18µm CMOS," *IEEE Int. Solid-State Circuits Conf.*, pp. 482-483, Feb. 2010.
- [10] P.-Y. Chao, C.-W. Tzeng, S.-C. Fang, C.-C. Weng and S.-Y. Huang, "Low-Jitter Code-Jumping for All-Digital PLL to Support Almost Continuous Frequency Tracking," *Proc. of IEEE Int'l Symp. on VLSI Design, Automation and Test* (VLSI-DAT), April 2011.
- [11] M.-H. Hsieh, L.-H. Chen, S.-I. Liu, and C. C.-P. Chen, "A 6.7MHz-to-1.24GHz 0.0318 mm<sup>2</sup> Fast-Locking All-Digital DLL in 90nm CMOS," *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 244-245, 2012.
- [12] P.-Y. Chao, C.-W. Tzeng, S.-Y. Huang, C.-C. Weng, and S.-C. Fang, "Process Resilient Low-Jitter All-Digital PLL via Smooth Segment Jumping," *IEEE Trans. on VLSI Systems* (TVLSI), vol. 21, no. 12, pp. 2240-2249, Dec. 2013.
- [13] C.-W. Tzeng and S.-Y. Huang, "Parameterized All-Digital PLL Architecture and its Compiler to Support Easy Process Migration," *IEEE Trans. on VLSI Systems* (TVLSI), vol. 22, no. 3, pp. 621-630, Mar. 2014.
- [14] J. G. Maneatis, "Why Synthesizable-Digital PLLs Are No Substitute for Hardened Mixed-Signal PLLs," online article, May 22, 2019.
- [15] Z.-H. Zhang, W. Chu, and S.-Y. Huang, "A Ping-Pong Methodology for Boosting the Resilience of Cell-Based Delay-Locked Loop," *IEEE Access*, vol. 7, pp. 97928-97937, Aug. 2019.