# The Bang Bang PLL as a Clock Source in Serial-De-Serializer (SERDES) Applications

by Raleigh Smith, BSc., MASc., P.Eng.

A dissertation submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Ottawa-Carleton Institute for Electrical and Computer Engineering

Department of Electronics Engineering

Carleton University

Ottawa, Ontario

February, 2021

©Copyright Raleigh Smith, 2021

## Abstract

Demands for increased wireline data throughput necessitate multi-GHz clock sources of ever-greater fidelity. At the same time, there has been resolute industry pressure for process geometry reduction, digital circuit implementation and modularization to fulfill the objectives of reduced development cost, scalability, increased functionality and decreased power dissipation. In aid of these objectives, this work demonstrates a digital bang-bang phase-locked loop that develops the 14-GHz clock for a 56-Gb/s PAM-4 transceiver. This low-jitter clock source is realized using an LC-based digitally-controlled oscillator having a frequency tuning range of 14 % and a worst-case resolution of 2.0 MHz/LSB. The major digital functions of the bang-bang phase-locked loop are consolidated in a single, fully-synthesized digital signal processing unit operated at 3.5 GHz, or ten times the reference clock frequency. Limit cycles are minimized, without the aid of a multibit time-to-digital converter, through substantial reduction of loop latency using a look-ahead digital loop filter. Various design techniques exploiting an advanced 7-nm FinFET technology are discussed, including noise reduction, frequency resolution and tank Q-enhancement. Additionally, methods of accurately modelling a digitally-controlled oscillator and of linear loop analysis of the bang-bang phase-locked loop are demonstrated.
Closed-loop phase noise performance is accurately predicted using an industry-standard digital event-driven simulator with dramatically reduced computation effort compared to analogue or mixed-mode simulations. Here, a method of faithfully calculating various noise profiles for digitally-controlled and reference oscillators is exploited. The measured RMS random jitter of the BBPLL, integrated from 1 kHz to 100 MHz, is 143 fs, and the loop shows limit-cycle-free operation, resulting in minimal spurious tone activity in the frequency spectrum. The BBPLL consumes 40 mW of power, of which the DCO consumes 14.8 mW. The RMS jitter demonstrated in this thesis is consistent with or better than that of analogue charge-pump PLLs of comparable frequency, and significantly better than that of previously reported BBPLLs, at very competitive area and power dissipation.

To my parents, members in good standing of the Greatest Generation.

# Acknowledgments

I would like to thank my supervisor, Prof. Ralph Mason, for his guidance and support, and for securing the opportunity to carry out my PhD work with TSMC. His active participation in the success of this work is earnestly appreciated. Further to this, my deepest gratitude and thanks must go to Dr. Dirk Pfaff for his technical leadership, instruction and patient coaching throughout this whole project. His expectations and persistence pushed me further than I would have thought possible. Many thanks are also due to Dr. Babak Zamanlooy and Dr. Muhammad Nummer, whose knowledge, help and advice I shamelessly exploited. Special thanks are owed to the remaining members of the Ottawa TSMC mixed-signal design team, including Robert Abbott, Dr. Xin-Jie Wang and Dr. Shahaboddin Moazzeni, for their technical contributions. Additionally, the essential work done by the layout team, Rolando Villanueva and Micheal Kozlov, as well as the test and analysis effort carried out by Tae-Young Goh and Dr. Rolando Ramirez, must be recognized.
Also, I am very grateful for the generous help given to me by friends and colleagues Jerzy Wieczorkiewicz and Dr. Augusto Lima, as well as for the many technical and non-technical discussions with fellow graduate students Xing Zhou and Dr. Nahla AbouElKheir. Many thanks are owed to the faculty and staff of the Electronics Department of Carleton University for their support throughout my graduate work. Finally, this work would not have been possible without the organizational support I received from TSMC through Cormac O'Connell.

# Table of Contents

- Abstract
- Acknowledgments
- Table of Contents
- List of Tables
- List of Figures
- Nomenclature
- 1 Introduction
    - 1.1 Motivation
    - 1.2 Objectives, Contributions and Novelty
    - 1.3 Conference and Journal Submissions
    - 1.4 Thesis Outline
- 2 Background
    - 2.1 Industry Direction
    - 2.2 FinFET Transistor Overview
    - 2.3 SERDES Applications
    - 2.4 Bang Bang Phase Locked Loop (BBPLL)
- 3 The Bang Bang Phase Locked Loop
    - 3.1 Introduction
    - 3.2 Bang Bang Phase Detector
    - 3.3 Digital Loop Filter
    - 3.4 LSB Dithering - Sigma Delta Modulator
    - 3.5 Linearization of BBPLL Loop Equations
    - 3.6 Application of Linearized Loop Equations
    - 3.7 Summary
- 4 The DCO
    - 4.1 Introduction
    - 4.2 Specification
    - 4.3 Varactor Selection
    - 4.4 Tuning Array Range and Resolution
    - 4.5 Con/Coff Optimization
    - 4.6 Row Loss Minimization - Q Optimization
    - 4.7 Class-C Oscillator Architecture
    - 4.8 DCO Model Simulation
    - 4.9 DCO Implementation
    - 4.10 Description of Frequency Tuning Array Rows
    - 4.11 28-GHz and 14-GHz Inductor Designs
    - 4.12 Summary
- 5 DCO Current Source
    - 5.1 Introduction
    - 5.2 Constant Current Source Implementation
    - 5.3 Thermal and Flicker Noise in FinFET Devices
    - 5.4 Noise Reduction from Source Degeneration
    - 5.5 Summary
- 6 BBPLL Time-Based Simulation and Measurement
    - 6.1 Introduction
    - 6.2 The Time-Based Model
    - 6.3 DCO Noise Model Generation
    - 6.4 Crystal Oscillator
    - 6.5 PLL Closed-Loop Noise Prediction
    - 6.6 Summary
- 7 Simulation and Test Results
    - 7.1 Simulated Results and Die Micrograph
    - 7.3 Phase-Locked Loops in Wireline (SERDES) Applications
- 8 Conclusions and Future Work
    - 8.1 Conclusions
    - 8.2 List of Contributions
    - 8.3 Future Work
- List of References
- Appendix A Oscillator Phase Noise
- Appendix B Array Resolution Derivation
- Appendix C DCO Model Code
    - C.1 Contents - `row_evaluation.m`
    - C.2 Contents - `DCO_tuning_range_evaluation.m`
- Appendix D Inductor Leg Parasitic Analysis

# List of Tables

- 3.1 BBPLL Loop Parameters for $A_I=1$ and RMS Jitter $\sigma_{\Delta t}=150$ fs
- 4.1 Varactor Capacitance/Finger (aF)
- 4.2 Varactor Con/Coff Ratio per Finger (aF/aF)
- 4.3 Varactor Q/Finger
- 4.4 Type II/III Row Varactor Pair Delta Capacitance
- 4.5 PVT Variation of Varactor Capacitance
- 4.6 Minimum Q Across Type I Row for A, B, C and D Layouts
- 4.7 Min/Max Type I Row Q for A, B, C and D Layouts Across Corners
- 4.8 Large Signal Simulation Capacitance Results for Slow/Fast PVT and Layout Extraction rcworst_CCworst/rcbest_CCbest
- 4.9 Tuning Range and Loss at 28.0 GHz - Layout A (Slow - rcworst_CCworst)
- 4.10 Tuning Range and Loss at 28.0 GHz - Layout B (Slow - rcworst_CCworst)
- 4.11 Tuning Range and Loss at 28.0 GHz - Layout C (Slow - rcworst_CCworst)
- 4.12 Tuning Range and Loss at 28.0 GHz - Layout D (Slow - rcworst_CCworst)
- 4.13 Large Signal Capacitance for Worst, Typical and Best Corners
- 4.14 Frequency Control Word (S) Derivation
- 4.15 Skin Depth Comparison and Current Crowding Factor
- 4.16 HF Resistance Including Skin Effect and Current Crowding
- 4.17 Inductor Field Solver Analysis at 28 GHz
- 4.18 Extracted DC Resistance by Corner
- 4.19 Inductor Field Solver Analysis at 14 GHz
- 5.1 Current Source Degeneration Results for $R_S = 420\ \Omega$
- 5.2 Noise Reduction due to Reduced $I_{ds}$ for Values of $R_S$
- 5.3 Noise Reduction Factors due to Degeneration for Values of $R_S$
- 5.4 Calculated Flicker and Thermal Noise Reduction for Values of $R_S$
- 5.5 Total Noise Reduction - Simulated vs. Calculated
- 6.1 DCO Parameter Values
- 6.2 DCO Parameter Values
- 6.3 Simulated vs. Measured Jitter (1 kHz - 100 MHz)
- 7.1 Simulated Results
- 7.2 Contemporary SERDES PLL Performance Comparison

# List of Figures

- 2.1 Planar MOSFET vs. FinFET Transistors Layout
- 2.2 Cross Section of FinFET Transistors
- 2.3 SERDES Application
- 2.4 All Digital Phase Locked Loop with LC-Tank DCO
- 2.5 Bang Bang Phase Locked Loop
- 3.1 Second-Order Digital Bang-Bang PLL Functional Block Diagram
- 3.2 a), b), c) Gaussian and d), e), f) Uniform Jitter Convolution
- 3.3 BBPLL Jitter Component vs. Loop Gain Example
- 3.4 BPD to DCO Control Path Functional Block Diagram
- 3.5 Overflow/Underflow Checker Functional Block Diagram
- 3.6 DLF Look-Ahead Structure Functional Block Diagram
- 3.7 z-domain BBPLL Model
- 3.8 z-domain Functional Block Diagram of BBPLL Model
- 3.9 BBPLL Open Loop Response - $A_P/A_I=40/1$, RMS Jitter = 150 fs
- 3.10 BBPLL Closed Loop Response - $A_P/A_I=40/1$, RMS Jitter = 150 fs
- 4.1 Classical Linear Feedback Model
- 4.2 Accumulation-Mode vs. Inversion-Mode PMOS n-Well Varactor
- 4.3 Row Type II/III $\Delta$-Capacitor Configuration
- 4.4 Typical Capacitance - Row Type I
- 4.5 Typical $\Delta$-Capacitance - Row Type II/III
- 4.6 On Capacitance (6 Fin, L = 36 nm, 1 Finger) svt, ulvt vs. Amplitude
- 4.7 On/Off Capacitance Ratio (6 Fin, L = 36 nm, 1 Finger) svt, ulvt vs. Amplitude
- 4.8 On/Off Capacitance (6 Fin, L = 36 nm, 1 Finger) Temperature Sensitivity
- 4.9 Varactor (6 Fin, L = 36 nm, 1 Finger) Voltage Sensitivity
- 4.10 Class-C Oscillator Core
- 4.11 DCO Normalized Bias Current
- 4.12 Class-C Oscillator Biasing
- 4.13 DCO Tuning from Model and Simulation - 28 GHz
- 4.14 DCO Tuning Error - Model vs. Circuit Simulation Over Corners
- 4.15 DCO Tuning from Model and Simulation - 14 GHz
- 4.16 DCO Core Functional Block Diagram
- 4.17 DCO Clock Buffer and Distribution Functional Block Diagram
- 4.18 DCO Varactor Row Control Functional Block Diagram
- 4.19 DCO Type I Varactor Row
- 4.20 DCO Type II Varactor Row
- 4.21 DCO Type III Varactor Row
- 4.22 DCO Inductor Layout for 28.0 GHz Operation
- 4.23 DCO Inductor Layout for 14 GHz Operation
- 5.1 Oscillator Current Level Calibration Block Diagram
- 5.2 Degenerated Selectable PMOS Current Source Implementation
- 5.3 Channel Noise Circuit
- 5.4 Source and Drain Interconnect Noise Circuit
- 5.5 Gate Interconnect Noise Circuit
- 5.6 Flicker ($1/f$) Noise Circuit
- 5.7 Channel Noise Degeneration Test Circuit
- 5.8 M_Source $1/f$ and Thermal Phase Noise Before and After Degeneration
- 5.9 Total M_Source Noise for Incremental Values of $R_S$
- 6.1 DCO Phase Noise - Simulated vs. Measured
- 6.2 DCO Edge Position Error with 1-fs Resolution Jitter Calculations
- 6.3 BBPLL Phase Noise and Trajectory - $A_I = 2$, $A_P = 7$
- 6.4 BBPLL Phase Noise and Trajectory - $A_I = 1$, $A_P = 40$
- 6.5 BBPLL Phase Noise and Trajectory - $A_I = 100$, $A_P = 2031$
- 7.1 BBPLL Die Micrograph - 7-nm Process
- 7.2 Measured PN (1 kHz to 100 MHz Offset) - 14-GHz Output Frequency
- 7.3 Periodic Jitter Measurement
- 7.4 Breakdown of PN Contributors
- 7.5 DCO Frequency Tuning Characteristic and Step Size
- A.1 Oscillator Phase Noise Sources
- A.2 Oscillator Phase Noise Spectrum
- A.3 Tank Resonator Intrinsic Noise Sources
- A.4 Impulse Sensitivity Function LC vs. Ring Oscillator
- A.5 Impulse Sensitivity Function for LC Oscillator
- A.6 Noise Conversion from Intrinsic Noise Sources
- A.7 Noise Conversion from the Current Source
- C.1 28-GHz Frequency Tuning Range with 20 Rows
- C.2 Parallel Array Resistance - 28 GHz
- C.3 Parallel Resistance Including Inductor and Array - 28 GHz
- C.4 Required DCO Current - 28 GHz
- C.5 Tank Amplitude from 10 mA - 28 GHz
- D.1 DCO Leg Inductance

# Nomenclature

#### Abbreviations

- **AC**: Alternating Current
- **Accumulation-Mode**: Biasing of a metal-oxide-silicon semiconductor stack so majority carriers accumulate in the silicon near the oxide-silicon interface.
- **ADPLL**: All-Digital Phase Locked Loop
- **aF**: atto-Farad
- **AF**: Flicker noise exponent
- **AHVDD**: Analogue High VDD for input/output - nominal 1.5 V.
- **AHVSS**: Analogue High VSS for input/output - ground voltage.
- **AI** or $A_I$: Digital Loop Filter Integral Gain
- **Al**: Aluminum
- **AM**: Amplitude Modulation
- **AP** (context - interconnect): metal (Al) layer used to break out signals to die bump pads, TSMC 7-nm process (thickest, $t = 2.4\ \mu m$).
- **AP** or $A_P$ (context - Digital Loop Filter): Proportional gain
- **AVDD**: Analogue DC supply voltage (VDD) - core transistor voltage - nominal 0.75 V.
- **AVSS**: Analogue DC ground reference voltage (VSS) - core transistor voltage.
- **BBPD**: Bang-Bang Phase Detector
- **BBPFD**: Bang-Bang Phase Frequency Detector
- **BBPLL**: Bang-Bang Phase Locked Loop
- **BER**: Bit Error Ratio
- **Bipolar**: Bipolar Junction Transistor (BJT)
- **BPD**: Binary Phase Detector
- **BSIM4**: Berkeley Short-channel IGFET (Insulated-Gate Field-Effect Transistor) Model, version 4 (sub-100 nm).
- **BW**: Bandwidth - contiguous frequency range.
- $C_{div}$: Total capacitance of the oscillator output buffer and divider.
- $C_{gm}$: Total capacitance of the oscillator core transistors.
- **CICC**: Custom Integrated Circuits Conference
- **Class-A Oscillator**: A harmonic oscillator where current flows continuously during the full output clock cycle.
- **Class-B Oscillator**: A differential implementation of a Class-A oscillator.
- **Class-C Oscillator**: A differential harmonic oscillator having a current conduction angle of approximately 100 to 150 degrees.
- **Class-D Oscillator**: A differential harmonic switching oscillator that produces a large output magnitude from a low VDD.
- **Class-F Oscillator**: A differential harmonic switching oscillator that employs transformer peaking to amplify a harmonic output clock.
- **CMOS**: Complementary Metal-Oxide Semiconductor
- $C_{MOS}$: Capacitance of a Metal-Oxide Semiconductor varactor
- **Coff**: Off-state capacitance of a circuit element.
- **Con**: On-state capacitance of a circuit element.
- $C_{OX}$: MOSFET (FinFET) gate capacitance per unit area ($\mathrm{F}/\mu\mathrm{m}^2$).
- $C_{para}$: Total parasitic capacitance of the frequency tuning array.
- $C_{tail}$: Capacitance between the core transistor common source of a differential oscillator and ground.
- **Cu**: Copper
- $C_{var}$: Total varactor capacitance of the frequency tuning array.
- **dB**: Decibel
- **dBc**: Decibels relative to a carrier signal level.
- **dBc/Hz**: Decibels relative to a carrier signal level, per hertz of bandwidth.
- **DC**: Direct Current
- **DCC**: Duty-Cycle Correction
- **DCD**: Duty-Cycle Distortion
- **DCO**: Digitally Controlled Oscillator
- **DDR4**: $4^{th}$ Generation Double Data Rate Synchronous Dynamic Random-Access Memory (1.2 V, 1600 - 3200 MT/s) defined by JEDEC.
- **DDR5**: $5^{th}$ Generation Double Data Rate Synchronous Dynamic Random-Access Memory (1.1 V, 3200 - 6400 MT/s) defined by JEDEC.
- **DLF**: Digital Loop Filter
- **DLL**: Delay Locked Loop
- **EF**: Flicker noise frequency exponent.
- **EM 3-D**: Electromagnetic Three-Dimensional Simulation Tool.
- $f$: Frequency (Hz)
- **FCW**: Frequency Control Word
- **FF**: Fast NMOS/Fast PMOS transistor process corner.
- **FFT**: Fast Fourier Transform
- **Fin**: The vertical silicon fin of a FinFET, which forms the channel and is wrapped by the gate.
- **FinFET**: Fin Field-Effect Transistor - a transistor whose channel is a raised, fin-shaped silicon body, with the gate wrapped around it in three dimensions.
- **Finger**: The physical subdivision of a transistor gate (dimensions W/L - Width/Length).
- **Flicker Noise**: Low-frequency noise (pink noise) having a $1/f$ profile at baseband and a $1/f^3$ phase-noise profile after up-conversion.
- Oscillator output frequency (resonant frequency).
- **fs**: femto-seconds
- **GaAs**: Gallium-Arsenide
- **Gaussian**: The normal probability distribution, typical of data having a random characteristic.
- **GIDL**: Gate-Induced Drain Leakage
- $g_m$: Transistor transconductance measured in Siemens (S).
- $I_{bias}$: DC Bias current
- **IC**: Integrated Circuit
- **Id Tank\_Left**: AC drain current through the left core transistor of a differential oscillator.
- **Id Tank\_Right**: AC drain current through the right core transistor of a differential oscillator.
- $I_{ds}$: Transistor drain to source current.
- $I_{dso}$: Transistor drain to source current with a source degeneration resistance of $0\ \Omega$.
- **I-MOS**: Inversion Mode Metal-Oxide Semiconductor varactor.
- $I_n$: Noise current.
- $I_{n1/f}^2$: Baseband transistor flicker noise power seen as drain current.
- $I_{nG}^2$: Channel thermal noise current due to gate resistance.
- **Inversion-Mode**: Biasing of a metal-oxide-silicon semiconductor stack so minority carriers accumulate in the silicon near the oxide-silicon interface.
- **IO**: Input/Output
- **ISF** or $\Gamma$: Impulse Sensitivity Function
- $I_{\omega 0}$: Effective oscillator LC-tank current.
- **Jitter**: The variation in clock edge position w.r.t. an ideal reference position.
- **JSSC**: Journal of Solid-State Circuits
- $k$ (context - noise model generation): clock edge number.
- $k$ (context - thermal noise): Boltzmann constant ($1.38064852 \times 10^{-23}\ \mathrm{J\,K^{-1}}$).
- $K$: AC signal coupling factor across a DC blocking capacitor.
- $K_C$: Current crowding factor
- **KF**: Flicker noise coefficient
- $K_{VCO}$: Voltage Controlled Oscillator conversion gain ($\Delta f/\Delta V$).
- $L$ (context - inductor): Inductor or Inductance.
- $L$ (context - transistor): gate Length.
- $\mathcal{L}(f)$: Single-sided phase noise.
- **LC**: Inductor-Capacitor
- **LC-tank**: A resonant circuit created using inductor and capacitor circuit elements.
- $L_{eff}$: Effective transistor gate length.
- **Limit-Cycle Regime**: PLL operation where output phase/frequency oscillates about a fixed phase/frequency point.
- **LMS**: Least Mean Squares - adaptive filter algorithm that converges to a minimum error.
- **Loaded-Q**: The effective Q factor of a circuit, including all parasitic resistances that would contribute to circuit loss.
- **LSB**: Least Significant Bit
- **LSL**: Logical Shift Left
- **LSR**: Logical Shift Right
- **LVDS**: Low-Voltage Differential Signal
- $m$ (context - FinFET): transistor multiplier
- **M11**: Metal (Cu) interconnect layer below M12, TSMC 7-nm process (thickness $t = 0.72\ \mu m$).
- **M12**: A top layer of metal (Cu) interconnect, TSMC 7-nm process (thickness $t = 0.72\ \mu m$).
- **MCM**: Multi-Chip Modules
- **MIM**: Metal-Insulator-Metal IC capacitor formed using metal plates of two or more layers requiring additional process steps.
- **MOM**: Metal-Oxide-Metal IC capacitor constructed using multiple inter-digitated fingers over a single or multiple metal layers.
- **MOSFET**: Metal-Oxide-Semiconductor Field-Effect Transistor
- **nm**: nanometer
- $N_{FV}$: Transistor flicker noise voltage in linear form.
- **NMOS**: N-type Metal Oxide Semiconductor.
- **NRZ**: Non-Return to Zero data encoding for transmission.
- $N_{TV}$: Transistor thermal noise voltage in linear form.
- **P**: FinFET fin Pitch
- **PAM-4**: Pulse Amplitude Modulation (4-state) data encoding for transmission.
- **PDF**: Probability Density Function
- **PFD**: Phase Frequency Detector
- **PI**: A circuit with both Proportional and Integral functions.
- **PISO**: Parallel In Serial Out
- **PLL**: Phase Locked Loop
- **PM**: Phase Margin
- **PM**: Phase Modulation
- **PMOS**: P-type Metal Oxide Semiconductor.
- **PN**: Phase Noise
- **pp**: peak-to-peak
- **PSD**: Power Spectral Density
- **PSRR**: Power Supply Rejection Ratio
- **PSS**: Periodic Steady State
- **PVT**: Process, Voltage and Temperature.
- **Q**: Resonator Quality Factor.
- $Q_{C\_worst}$: Worst case (PVT and extracted layout) Q of the DCO tuning array total capacitance.
- **QEC**: Quadrature-Error Correction
- **QED**: Quadrature-Error Detection
- $Q_{DCO\_worst}$: Worst case (PVT and extracted layout) Q of the DCO tuning array total capacitance plus inductor.
- **Quantization**: The mapping of a continuous value onto one of a finite number of levels.
- **rad**: Radians
- **rad$^2$/Hz**: Phase noise power spectral density measured in radians squared per Hz.
- **rcbest-CCbest**: Extracted layout parasitic component modelling exhibiting lowest Resistance and Capacitance.
- **rcworst-CCworst**: Extracted layout parasitic component modelling exhibiting highest Resistance and Capacitance.
- $R_D$: Transistor drain resistance (intrinsic, extrinsic and interconnect).
- $R_{DC}$: Resistance at zero frequency.
- $R_G$: Transistor gate resistance (intrinsic, extrinsic and interconnect).
- $R_{HF}$: High-Frequency Resistance.
- $R_{lumped}$: Total resistive losses in the metal interconnect between oscillator varactor array and inductor.
- **RMS**: Root Mean Square
- $r_o$: Transistor small-signal output resistance.
- **RO**: Ring Oscillator
- $R_O$: Load Resistance.
- $Row\_FF$: Frequency array row D flip-flop.
- **Rp** or $R_p$: Parallel resistance of a resonator.
- $R_{p\_HF}$: High-frequency parallel resistance of a resonator.
- **Rs** or $R_s$: Series resistance of a resonator.
- $R_S$: Transistor source resistance (intrinsic, extrinsic and interconnect).
- $R_{s\_DC}$: Series resistance of a resonator at zero frequency.
- $R_{s\_HF}$: High-frequency series resistance of a resonator.
- **RTL**: Register Transfer Level
- **Run-Time**: The total duration a central processing unit is engaged in a process from beginning to end result.
- **s**: seconds
- **S** (context - FinFET): subthreshold swing - the change in gate voltage required to change the subthreshold drain current by one decade.
- **S** (context - frequency array): a set of cardinal numbers representing the unique states of a frequency tuning array.
- $S_{noise}(\Delta f)$: Noise power spectral density at frequency offset $\Delta f$.
- **SPD**: Sampling Phase Detector
- **SDM**: Sigma-Delta Modulator
- **SERDES**: Serializer/Deserializer
- **Si**: Silicon
- $S_{id}(f)$: Noise power spectral density
- **SiGe**: Silicon-Germanium
- **Signum**: Sgn function - input quantized to output value -1, 0 or +1.
- **SiO$_2$**: Silicon Dioxide
- **SIPO**: Serial In Parallel Out
- **SNR**: Signal-to-Noise Ratio
- **SS**: Slow NMOS/Slow PMOS transistor process corner.
- **SST**: Source-Series Transmitter
- **svt**: standard voltage threshold
- $t$: Thickness of a semiconductor material or metal layer.
- $T$: Absolute Temperature (K)
- $t_A$: Aperture time, $t_A = t_{SU} + t_H$.
- **tank\_l/L**: Tank Left - the signal conductor laid out on the left side of the LC-tank.
- **tank\_r/R**: Tank Right - the signal conductor laid out on the right side of the LC-tank.
- **TDC**: Time-to-Digital Converter
- **TDM**: Time Division Multiplexing
- $t_H$: Hold Time
- **Thermal Noise**: Random noise whose power spectral density is proportional to absolute temperature.
- $T_{OX}$: Oxide thickness (nm)
- **TSMC**: Taiwan Semiconductor Manufacturing Company
- $t_{SU}$: Setup Time
- **ulsvt**: Ultra-low voltage threshold
- **V**: Volts
- **Varactor**: A transistor or diode circuit element used as a variable capacitor.
- $V_B$: Transistor Bulk Voltage
- $V_{BG}$: Voltage from transistor bulk to gate.
- $V_{bias}$: DC Bias Voltage (also $V_{bias\_L}$ and $V_{bias\_R}$).
- **VCO**: Voltage-Controlled Oscillator
- $V_D$: Transistor Drain Voltage
- $V_{DD}$: CMOS supply voltage
- **Vds**: Voltage from transistor drain to source.
- $V_{eff}$: Transistor effective (overdrive) voltage ($V_{eff} = V_{gs} - V_{th}$).
- **Vgate\_Left**: Feedback voltage applied to the left differential oscillator core transistor.
- **Vgate\_Right**: Feedback voltage applied to the right differential oscillator core transistor.
- **Vgd**: Voltage from transistor gate to drain.
- **Vgs**: Voltage from transistor gate to source.
- **Vm**: Voltage magnitude
- $V_n$: Noise voltage
- $V_{n1/f}^2$: Baseband transistor input-referred flicker noise power seen at the gate.
- $V_{nG}^2$: Channel thermal noise voltage due to gate resistance.
- **Vp**: Peak Voltage
- $V_S$: Transistor Source Voltage
- **Vsg**: Voltage from transistor source to gate.
- **Vsource**: Voltage on the common source of the core transistors of the differential oscillator.
- $V_{SS}$: CMOS supply voltage ground reference.
- $V_T$ or $V_{th}$: Transistor Threshold Voltage
- $V_{tank\_L}$: Voltage on LC-tank conductor tank\_L.
- $V_{tank\_R}$: Voltage on LC-tank conductor tank\_R.
- **W**: Transistor gate Width.
- **w**: Width of a semiconductor material or metal layer.
- **w/t**: Ratio of width to thickness of a semiconductor material or metal layer.
- **Wander**: Jitter at frequencies below 10 Hz (Red Noise).
- $W_{eff}$: Effective transistor gate width.
- **XO**: Discrete Crystal Oscillator.
- **Zeta** ($\zeta$): Damping Factor.

#### Symbols

- $\gamma$: Transistor channel thermal-noise coefficient (e.g., for short-channel devices $\gamma = 1.0$).
- $\delta$: Skin Depth.
- $\Delta f_{1/f^3}$: Up-converted flicker-noise corner frequency.
- $\zeta$ (zeta): Damping Factor.
- $\epsilon_0$: Absolute permittivity ($8.854 \times 10^{-12}$ F/m).
- $\epsilon_r$: Relative permittivity of silicon dioxide ($\approx 3.9$).
- $\epsilon_{OX}$: Oxide permittivity ($\approx 3.9 \times 8.854 \times 10^{-12}$ F/m).
- $\mu_0$: Free-space (vacuum) permeability (H/m).
- $\mu_r$: Relative Permeability.
- $\sigma$: Conductivity (S/m).
- $\Sigma\Delta$-modulator: SDM - sigma-delta modulator.
- $\omega$: Angular frequency, $\omega = 2\pi f$ (radians/s).
- $\Omega/\square$: Ohms per square (sheet resistance); resistance $R = R_\square \times \text{Length}/\text{Width}$.

## Chapter 1

### Introduction

#### 1.1 Motivation

The 7-nm FinFET process used in this work was created by TSMC with the objectives of increasing circuit integration and speed while reducing power dissipation. This process geometry is very expensive, so it is only cost-effective for high-volume commercial applications such as cell phones, laptop computers and other portable devices, as well as high-data-rate network equipment. At this level of integration, multiple processor and DSP (Digital Signal Processing) cores, RAM blocks, and the radio and I/O blocks associated with an application are fabricated on the same die. Therefore, high-speed data must be transported across the die without going through I/O structures that would limit data rates, consume valuable pin count and increase power dissipation. This is commonly achieved using SERDES circuit blocks that drive transmission lines implemented using top metal layers.

The ultimate goal of the overall project, of which this BBPLL is a critical part, is to implement a SERDES circuit function that will transport data between two circuit blocks on the same 7-nm die. Therefore, all circuit blocks, including the BBPLL, must be implemented on the 7-nm die, except for the BBPLL reference clock, which must originate from an external Crystal Oscillator (XO) [1,2] for performance reasons (i.e., stability and phase noise). The SERDES transmit encoding of four-level Pulse-Amplitude Modulation (PAM-4) and a data rate of 56 Gb/s were chosen for this project, as they are the current industry standard [3]. The goal of this thesis is to reduce the size, power dissipation and output jitter of the high-frequency SERDES clock source by implementing it as an All-Digital Phase-Locked Loop (ADPLL).
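The link between the 56-Gb/s line rate, the PAM-4 encoding and the oscillator frequency (developed in item 3 of the list that follows) is simple arithmetic, sketched below in Python. This is a minimal illustration, assuming half-rate clocking in which one symbol is launched on each rising and each falling clock edge, as the text describes for the original 28-GHz NRZ plan; the function names `bits_per_symbol` and `clock_frequency_hz` are hypothetical and are not taken from the thesis.

```python
import math


def bits_per_symbol(levels: int) -> int:
    """Bits carried by one symbol of an M-level PAM signal (M a power of two)."""
    return int(math.log2(levels))


def clock_frequency_hz(data_rate_bps: float, levels: int, edges_per_cycle: int = 2) -> float:
    """Clock frequency needed when each clock edge launches one symbol.

    Assumption: half-rate clocking, i.e. both rising and falling edges are used
    (edges_per_cycle = 2).
    """
    symbol_rate_baud = data_rate_bps / bits_per_symbol(levels)
    return symbol_rate_baud / edges_per_cycle


line_rate_bps = 56e9
nrz = clock_frequency_hz(line_rate_bps, levels=2)   # NRZ: 1 bit/symbol
pam4 = clock_frequency_hz(line_rate_bps, levels=4)  # PAM-4: 2 bits/symbol

print(f"NRZ:   {nrz / 1e9:.1f} GHz clock")
print(f"PAM-4: {pam4 / 1e9:.1f} GHz clock")
```

At a fixed line rate, halving the clock frequency requires doubling the bits per symbol, which is consistent with the move from a 28-GHz clock with NRZ data to a 14-GHz clock with PAM-4.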
The following list articulates the major challenges that were overcome during this development, and also substantiates many of the design decisions.

- 1. DCO type: based on the objectives described previously, the ideal DCO would be a ring oscillator implemented using digital library inverter elements. However, as low jitter (phase noise) was of primary importance for this design, an LC-tank oscillator was chosen. The theoretical maximum Q of a ring oscillator is approximately 1.57, while the loaded-Q of an LC-tank oscillator implemented as an integrated circuit can be in the range of 10 to 30. Since phase noise is inversely proportional to the square of the oscillator Q, as shown by Leeson's equation (A.20) [4], this difference is significant. See section 2.4 and Appendix A for details.
- 2. LC-tank inductor implementation: inductors can be relatively large structures with windings that require a large enough conductive cross-section to minimize losses and maximize Q. It was found that combining the top two metal layers provided adequate metal thickness to support sufficient inductor Q. Additionally, at 28 GHz and 14 GHz the inductor areas were small enough to make the LC-tank oscillator a reasonable option. See sections 4.2 and 4.11, and Figures 4.22 and 4.23.
- 3. DCO frequency selection: with the SERDES transmit data rate of 56 Gb/s, 28 GHz was initially selected as the oscillator resonant frequency. Both the rising and falling clock edges would be used to clock NRZ data (i.e., one bit/symbol, two states/symbol). This clock must be distributed to five transceivers and further distributed within the transceivers using digital library inverter elements. Unfortunately, the gain of these library inverter elements was insufficient for this purpose. To overcome this challenge, the original oscillator resonant frequency of 28 GHz was reduced to 14 GHz, reducing the inverter gain requirement.
This was achieved by adding a turn to the existing inductor, sufficiently increasing its inductance within the existing footprint. The transmit modulation had to change to PAM-4 (i.e., two bits/symbol, four states/symbol) to accommodate the new transmit clock rate. See sections 4.2 and 4.11 for details.
- 4. Oscillator 7-nm FinFET flicker noise: as IC geometry shrinks, flicker noise increases. Although this analysis was considered to be beyond the scope of this project, it was understood that the baseband flicker noise corner of this 7-nm process could be in the tens or even hundreds of MHz, depending on transistor size. The following measures were taken to mitigate this issue:
    - (a) A class-C oscillator architecture [5] was selected largely for its reduced conduction angle. Phase noise from the current source and core transistors enters the oscillator during the period (i.e., the conduction angle) when current is injected into the LC-tank. Reducing the conduction angle from 180° for a conventional oscillator to ≈ 120° for class-C shortens this noisy period. See section 4.7 and Figure 4.11 for details. It should be noted that the class-C oscillator current conduction angle is larger than the typical class-C power amplifier conduction angle (i.e., 90°). This is due to the stability requirements of the oscillator, which, unlike a power amplifier, requires positive feedback to sustain oscillation (see the discussion of squegging and Figures 10 to 12 in [5]).
    - (b) The class-C oscillator architecture requires a large bypass capacitor connected from the common source node of the core transistors to ground. This not only supports class-C operation, but also provides a low-impedance path for noise to ground. See section 4.7 and Figure 4.10 for details.
    - (c) Oscillator core transistor gate bias was selected to maximize the output amplitude. This maximizes the signal-to-noise ratio of the output.
Care must be taken to ensure that the transistor breakdown voltage limit is never exceeded. See section 4.7 and Figure 4.12 for details.
  - (d) The size of the DC-blocking capacitors used to isolate the gate bias voltage on the oscillator core transistors was carefully selected to minimize distortion of the oscillator feedback signal. This minimizes the amount of flicker noise that is up-converted from baseband into the skirts around the oscillator output frequency. See section 4.7 and Figure 4.10 for details.
  - (e) A method of oscillator current source calibration was implemented to ensure maximum oscillator output amplitude (i.e., maximum SNR) across PVT variations. See section 5.2 and Figure 5.1 for details.
  - (f) Source degeneration of the current source transistors was used to reduce the baseband flicker and thermal noise that is up-converted into the channel of the oscillator core transistors. The degenerated current source transistors form a cascode pair with the core transistors; therefore, the noise current in the core transistors is not significant, as it is limited by the current source transistors. See section 5.4, Figure 5.7 and (5.19) for details.
  - (g) Design priority was given to reducing loss and improving the Q of the LC tank (varactor array row loss, see section 4.6; inductor loss, see section 4.11; as well as section 4.8).
- 5. Oscillator core layout: The oscillator core transistor layout was carefully interdigitated to minimize core transistor interconnect losses and to ensure current density requirements were met. Both measures were necessary to guarantee that the oscillator core produced enough gain and current to oscillate at frequency without exceeding thermal requirements. It should be noted that the M0 layer interconnect models were revised periodically during this design work.
- 6.
DCO array tuning range: The size and architecture of the DCO frequency tuning array needed to be determined and verified to be adequate to compensate for PVT and layout variation extremes. This was achieved by creating a MATLAB® model of the DCO, which allowed accurate and timely testing of various array designs. See section 4.8 and Appendix C for details.
- 7. DCO varactor design for range: FinFET varactor element sizing was optimized for maximum Con/Coff ratio and Q, as well as for minimum array physical size and tuning resolution. This was achieved using both small-signal and large-signal simulation. Additionally, care was taken to ensure these devices were large enough to mitigate element matching issues. See sections 4.3, 4.4 and 4.5 for details.
- 8. DCO varactor design for resolution: DCO tuning array resolution was maximized by optimizing the fine-tuning varactor elements to minimize the Con/Coff ratio while maintaining Q. The difference, or $\Delta$-capacitance, between two larger varactor elements was used here to ensure frequency tuning curve monotonicity (i.e., adequate element matching). See sections 4.3, 4.4 and 4.5 for details.
- 9. Phase detector selection: Conventional ADPLL designs use a Time-to-Digital Converter (TDC) as a phase-frequency detector. This represents considerable design effort and results in a relatively large circuit requiring significant current. Its resolution is limited by the semiconductor process, and it is susceptible to PVT variations. These issues were mitigated by replacing the TDC with a Binary Phase Detector (BPD), implemented as a D flip-flop. See section 3.2 for details.
- 10. Quantization noise: BBPLL DCO frequency tuning is not continuous, and the BPD produces a scalar phase error. These two circuit functions combine to generate quantization noise that is not present in an analogue PLL with continuous VCO tuning.
Therefore, while an ADPLL does not have as many sources of large thermal noise (e.g., loop filter resistors, charge pump ...), it contributes quantization noise and associated frequency spurs. However, the DCO thermal noise can be used to randomize the quantization noise and to spread it, and its frequency spurs, across the noise floor of the frequency spectrum. See sections 3.1 and 3.2 for details.
- 11. BBPLL analysis: The nonlinearity of the Binary Phase Detector (BPD) used by the BBPLL makes loop analysis for stability, phase margin and phase noise difficult. Analysis employing bilinear transforms was carried out and yielded reasonable, but optimistic, results. See sections 3.5 and 3.6 for details. A Verilog model of the DCO was created from the MATLAB® work discussed in item 6 of this list to complete a time-domain model of the full BBPLL. This generated functionally accurate and time-efficient event-driven behavioural simulation results. However, this initial Verilog model did not include phase noise from thermal sources (i.e., the DCO and XO). See sections 6.1 and 6.2 for details.
- 12. Full event-driven BBPLL model: A complete model of the BBPLL must include phase noise from thermal sources, as it affects both jitter performance and loop functionality. This was accomplished by using curve fitting to model the phase noise profiles of the DCO and the external XO. These frequency-domain profiles were converted to time-domain jitter vectors containing clock edge variations. A novel approach was used to include these clock edge variations in the event-driven simulation. See sections 6.3, 6.4 and 6.5 for details.
- 13. DLF gain range and resolution: The jitter performance of the BBPLL depends on a balance between two extreme modes of operation (i.e., the limit-cycle regime and the random-noise regime). These modes of operation are controlled by the Digital Loop Filter (DLF) gain. Therefore, adequate DLF proportional and integral gain range and resolution must be implemented.
See sections 3.2 and 3.3 for details.
- 14. Pre-production 7-nm process: During this design, the 7-nm FinFET process was in its last stages of development (i.e., pre-production); therefore, circuit element models were periodically updated. Additionally, some parameters, e.g., FinFET channel loss, were underestimated. To overcome these challenges, simulations had to be rerun after each new model release, and any resulting problems and inaccuracies had to be accounted for in the design.
- 15. Security restrictions: Although not a development issue, these restrictions were the most significant barrier to completing this thesis. TSMC security restrictions are tight. It is absolutely forbidden to record or transport any information or data created inside the TSMC lab to an external environment. This is a well-justified policy based on specific cases ranging from inadvertent disclosure in a PhD thesis to outright industrial espionage. To write this thesis, all data, circuit diagrams and code had to be recorded by hand and then reproduced for this document. Therefore, actual plots, circuit and layout diagrams, as well as original test data, were not available. The test results included in this thesis have been reproduced from the published papers associated with this work.

Validation of the physics and process fabrication, as well as the component models, parasitic models and libraries, is fundamental to bringing a new semiconductor process to production. However, despite this work, complex circuit implementations using new processes have been known to fail to yield adequately. It needs to be demonstrated that challenges such as the high loss of the fine interconnect required by the process geometry can be overcome. At the same time, a process scaled and optimized for digital design must not only support analogue and RF circuits, but also serve as a platform for novelty and cutting-edge innovation.
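Item 12 in the list above, and the RMS jitter figure quoted in the abstract (integrated from 1 kHz to 100 MHz), both rest on the standard relation between a single-sideband phase noise profile $\mathcal{L}(f)$ and RMS jitter, $\sigma_t = \sqrt{2\int \mathcal{L}(f)\,df}\,/\,(2\pi f_0)$. The sketch below is not the curve-fitting or jitter-vector method developed in Chapter 6; it only illustrates the integration step, assuming the profile is piecewise linear in dB versus log-frequency (power-law segments). The function name and the example profile are hypothetical.

```python
import math

def rms_jitter_s(offsets_hz, l_dbc_hz, f0_hz):
    """Integrate a single-sideband phase-noise profile L(f) into RMS jitter (s).

    offsets_hz : strictly increasing offset frequencies in Hz
    l_dbc_hz   : L(f) at each offset in dBc/Hz
    f0_hz      : carrier frequency in Hz
    Between points, L(f) is assumed to follow a power law (a straight line in
    dB versus log-frequency), which is integrated exactly.
    """
    if len(offsets_hz) != len(l_dbc_hz) or len(offsets_hz) < 2:
        raise ValueError("need at least two matching (offset, L) points")
    area = 0.0  # integral of L(f) in linear units
    for i in range(1, len(offsets_hz)):
        f1, f2 = offsets_hz[i - 1], offsets_hz[i]
        s1 = 10.0 ** (l_dbc_hz[i - 1] / 10.0)
        s2 = 10.0 ** (l_dbc_hz[i] / 10.0)
        slope = math.log(s2 / s1) / math.log(f2 / f1)  # power-law exponent
        if abs(slope + 1.0) < 1e-12:
            area += s1 * f1 * math.log(f2 / f1)  # 1/f segment
        else:
            area += s1 * f1 / (slope + 1.0) * ((f2 / f1) ** (slope + 1.0) - 1.0)
    phase_rms_rad = math.sqrt(2.0 * area)  # factor 2: both sidebands
    return phase_rms_rad / (2.0 * math.pi * f0_hz)

# Hypothetical profile at 14 GHz (illustrative values only, not measured data).
profile_f = [1e3, 1e5, 1e6, 1e8]
profile_l = [-100.0, -110.0, -120.0, -150.0]
print(f"{rms_jitter_s(profile_f, profile_l, 14e9) * 1e15:.1f} fs")
```

Integrating each segment as a power law, rather than applying a trapezoidal rule on a linear frequency axis, keeps the result accurate even when the profile is specified at only a few log-spaced offsets.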
The implementation of a 56 Gb/s SERDES device was selected as an industry-relevant application to demonstrate the capabilities of the TSMC 7-nm FinFET technology, which was not yet in full production. The company's motivation for this project was to show potential customers that complex mixed-signal circuit design in this new process, with new circuit models, was not only possible but also capable of high yield, and therefore a low-risk undertaking. While bang-bang PLLs have previously been overlooked as clock sources for low-jitter, high-data-rate transceivers, recent development work has shown some promise for this type of clock source. Therefore, realization of a BBPLL with state-of-the-art performance was considered an excellent candidate for this design effort.

## 1.2 Objectives, Contributions and Novelty

The objective of this work is to implement a low-jitter (low PN with minimal spurs) 14.0-GHz Integer-N BBPLL and clock distribution for a five-lane 56-Gb/s PAM-4 SERDES transmitter in TSMC's new 7-nm FinFET technology (see Figure 2.3). The BBPLL DCO is realized as a class-C LC-tank oscillator [5] tuned using p-type FinFET transistors configured as inversion-mode varactors. FinFET transistors and fine-pitch interconnect, targeted at aggressive-geometry digital design, will be shown to be capable of attaining state-of-the-art analogue circuit performance. This is achieved by overcoming issues of high interconnect losses, gate resistance, flicker noise and low VDD levels.

The work described in this thesis makes the following contributions to the current state of the art.
- 1. Single-fin modularity is used to implement a fine-resolution $\Delta$-capacitance of 75 aF. This is made possible because the on-state capacitance of the FinFET PMOS inversion-mode varactors has a linear relationship with the number of fins.
- 2. A new closed-form solution quantifying how source degeneration can be used to reduce transistor flicker noise in oscillators is derived.
This is important in this application, as the flicker noise produced by small-geometry transistors is significantly worse than that of larger-geometry planar MOSFET transistors.
- 3. Taking advantage of the improved performance of the 7-nm process, the digital loop filter in the forward path of the BBPLL is clocked at 10 times the reference frequency and incorporates a look-ahead architecture. This new architecture reduces the loop latency that would normally be present if the digital loop filter were clocked at the reference frequency; such latency degrades jitter performance and phase margin.
- 4. A new method of efficiently incorporating reference oscillator and DCO jitter into digital time-domain event-driven simulation (i.e., a Verilog simulator) is proposed. This enables full functional and phase noise simulation of the BBPLL, while greatly reducing simulation run-time.
- 5. Digital time-domain simulator run-time is further reduced (by a factor of five) and output jitter error is improved (< 1 % for a 1 ms simulation time) by calculating jitter time-stamp vectors prior to simulation rather than during simulation.
- 6. A novel approach to large-signal circuit and 3-D EM analyses is proposed to characterize circuit elements and modules in order to create a mathematical model of the DCO. The run-time of the mathematical model is significantly shorter than that of the DCO circuit simulation, while maintaining accuracy to within $\pm$0.9 %. The shortened run-time allows various DCO array architectures and implementations to be optimized quickly and accurately.

This project includes the first design and implementation of an LC-tank class-C oscillator in TSMC's 7-nm CMOS FinFET process. The success of this implementation demonstrates that this process, optimized for digital design, can also be used to realize analogue circuits exhibiting state-of-the-art performance.
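Contribution 1 (and item 8 of the design-challenge list) concerns the frequency step produced by a very small $\Delta$-capacitance. For an ideal LC tank, $f = 1/(2\pi\sqrt{LC})$, so to first order $\Delta f/f \approx -\Delta C/(2C)$; the thesis derives the array step size in Appendix B. The sketch below evaluates both the exact and first-order step for the 75 aF $\Delta$-capacitance at 14 GHz, assuming a hypothetical 200 pH tank inductance (the actual inductance is not stated in this section) and ignoring losses and parasitics.

```python
import math

def tank_capacitance(f_hz, l_h):
    """Total tank capacitance that resonates with inductance l_h at f_hz."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * l_h)

def frequency_step(f_hz, l_h, delta_c_f):
    """Frequency change when delta_c_f is added to an ideal LC tank at f_hz.

    Returns (exact_step_hz, first_order_step_hz). Both are negative because
    extra capacitance lowers the resonant frequency; the first-order value
    uses df/f = -dC / (2C).
    """
    c = tank_capacitance(f_hz, l_h)
    f_new = 1.0 / (2.0 * math.pi * math.sqrt(l_h * (c + delta_c_f)))
    return f_new - f_hz, -f_hz * delta_c_f / (2.0 * c)

L_TANK_H = 200e-12   # hypothetical tank inductance (assumption)
F0_HZ = 14.0e9       # DCO frequency of this design
DELTA_C_F = 75e-18   # single-fin delta-capacitance from contribution 1

exact, approx = frequency_step(F0_HZ, L_TANK_H, DELTA_C_F)
print(f"C_tank = {tank_capacitance(F0_HZ, L_TANK_H) * 1e15:.1f} fF, "
      f"step = {exact / 1e3:.1f} kHz (first-order {approx / 1e3:.1f} kHz)")
```

Because $\Delta C/C$ is of order $10^{-4}$ here, the first-order expression is essentially exact; with the assumed inductance the step is roughly 0.8 MHz, and a different tank inductance scales it accordingly.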
## 1.3 Conference and Journal Submissions

The following two papers have been published as a result of the work described in this thesis. The Custom Integrated Circuits Conference (CICC) paper was selected as one of the top papers of the 2019 conference in Austin, TX. This resulted in an invitation to submit the expanded journal paper listed below, which was included in the March 2020 special issue of the IEEE Journal of Solid-State Circuits (JSSC).

1. D. Pfaff, R. Abbott, X.-J. Wang, B. Zamanlooy, S. Moazzemi, R. Smith, C.-C. Lin, "A 14-GHz Bang-Bang Digital PLL with sub-156fs Integrated Jitter for Wireline Applications in 7nm FinFET," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Austin, TX, USA, Apr. 2019, pp. 1-4.
2. D. Pfaff, R. Abbott, X.-J. Wang, S. Moazzemi, R. Mason, R.R. Smith, "A 14-GHz bang-bang digital PLL with sub-156fs integrated jitter for wireline applications in 7nm FinFET CMOS," IEEE J. Solid-State Circuits (JSSC), vol. 55, no. 3, Mar. 2020, pp. 580-591.

The order of the authors' names for the above papers is TSMC followed by Carleton University, as required by TSMC. The author of this thesis was the main author of these papers and the major contributor to the development of the BBPLL discussed in them.

## 1.4 Thesis Outline

Chapter 2 establishes the background necessary to understand the motivation and objectives of the remaining chapters of this thesis. It briefly discusses four topics: the forces governing the direction of the industry and this project, the motivation for FinFET transistor technology, the application of SERDES interfaces, and a general description of Bang Bang Phase Locked Loop (BBPLL) devices. Chapter 3 presents the requirements, architecture, concepts and design of the SERDES Bang Bang PLL. Linear loop equations are derived and used to approximate bandwidth and stability. Chapter 4 explains the DCO architecture selection and implementation.
It addresses issues such as varactor configuration and capacitance resolution; loss minimization (i.e., Q maximization) for various varactor row layouts; frequency range over PVT and extracted corners versus the number of varactor rows; and inductor design and layout. A novel method of accurately plotting frequency versus tuning-array setting that includes loss, parasitic capacitance, and linear and non-linear varactor capacitance is demonstrated. Chapter 5 discusses the DCO current source design and the techniques used to reduce DCO noise. A closed-form equation is derived for flicker noise reduction by source degeneration. Chapter 6 discusses transient simulation of the BBPLL in the time domain in detail and demonstrates a novel method of jitter (phase noise) analysis. Chapter 7 discusses the simulated and measured results and presents a die micrograph of the BBPLL implemented in 7-nm CMOS FinFET technology. Chapter 8 presents the conclusions and proposes areas of future work; a list of contributions, as well as conference and journal publications, is also included. Appendix A articulates some fundamental oscillator and oscillator noise concepts that apply directly to the class-C [5] DCO demonstrated in this work. Appendix B presents the derivation of an equation for the frequency tuning array step size resulting from a corresponding capacitance step; this is consistent with [6]. Appendix C is the MATLAB® code used to model the DCO frequency tuning array discussed in section 4.8. Appendix D is an analysis of the inductor leg interconnect parasitics.

# Chapter 2: Background

## 2.1 Industry Direction

Classical phase locked loop design has been largely an analogue-domain undertaking, with only a few digital components to implement the feedback divider and phase-frequency detector. Analogue loop filters needed to be constructed using external discrete elements (e.g., capacitors, resistors, op-amps ...) to realize the required loop bandwidth and stability.
Voltage controlled oscillators have taken forms such as relaxation, ring, LC-tank harmonic, pulled quartz crystal and rotary travelling wave oscillators, as well as transmission line, ceramic piezoelectric and surface acoustic wave resonators [7]. These phase locked loop implementations served the industry well, as they could be reliably produced at low cost to yield consistent performance. However, they suffer from large parasitics that limit their frequency of operation, large physical size that limits the form factor of their end application, and increasing cost pressures associated with discrete components, assembly and production testing. With the relentless industry drive for more sophisticated circuit functions in smaller form factors that use less power and have increased data throughput, a list of clock source requirements has emerged. That is, clock circuits must become smaller, their operating frequency must increase and their power dissipation must decrease, while output jitter amplitude must be decreased and jitter frequency controlled to improve application bit error ratio, data throughput and transmission range. Additionally, the non-recurring costs of development (i.e., initial implementation, as well as the porting of existing designs to more aggressive process nodes) and the recurring costs associated with production (i.e., wafer, mask sets, fabrication, packaging, assembly, production yield, as well as production test time, complexity and equipment) must be kept under control. All this needs to be balanced against the potential of the market (i.e., price and volume) and the cost of integration (i.e., the high cost of cutting-edge processes). One step in achieving these goals is to take a perfectly good analogue circuit operating in the s-domain and implement the same function in the digital or z-domain. This has several advantages. First, it allows many circuit functions to be integrated onto a single die.
This reduces form factor, as well as manufacturing and production testing costs. Second, it improves modularity and reuse, and reduces the cost of migrating the design to other process nodes. Of course, these advantages must be weighed against the required circuit performance and power consumption. Classic examples of research and industry toiling at the limits of integration are low noise amplifiers and power amplifiers. Specifically, there has been a great push to move these circuit functions into CMOS processes to implement a complete radio on a single chip. While this effort has been successful in many applications, CMOS noise limits low noise amplifier performance, and CMOS efficiency and output power limit power amplifier performance. Serial-De-Serializer or SERDES circuit functions are used to concentrate many parallel communication links onto a single high-speed link. This reduces physical interconnect complexity and improves reliability. SERDES wireline transceiver applications have evolved from system backplane to intra-die communications. This latter extreme is the application discussed in this thesis: an all digital phase locked loop that provides the low-jitter clock for a SERDES transceiver used to transfer high-speed data between devices on a single die (e.g., between processor cores, or between processor and memory). This objective presents an additional set of challenges. First, analogue loop filter components, specifically capacitors, present a die real estate problem, as they do not scale with the process. This is resolved by replacing the analogue filter with a digital filter, which, in addition to reduced circuit size, brings benefits such as modularity, scalability and programmability. Second, the digital loop filter does not require a charge pump current source, so this element can be removed. Third, a digital phase detector is now required.
The most intuitive approach is to create a circuit that resolves the smallest phase error possible, for a given process, to minimize quantization jitter and output spurs. The general approach is to use a time-to-digital converter that produces an output vector (magnitude and direction) measurement of phase error that can be processed by the digital loop filter. A much less intuitive choice is the binary phase detector, common in receiver clock extraction circuits. The output of this circuit is a scalar value that has two stable states, speed-up and slow-down. The advantage of a binary phase detector is that it can be implemented using a single D flip-flop, thus reducing the phase detector physical size and current consumption, and improving scalability, portability and testability, relative to a time-to-digital converter implementation. Its disadvantages are large quantization noise, significant output frequency-domain spurs (limit-cycle operation), limited locking range, non-linear effects and difficult analysis. The voltage controlled oscillator is the last circuit block to be integrated with the phase locked loop components already on a single CMOS die. A fundamental challenge that must be overcome is that CMOS is noisier (in both thermal and flicker noise) than other process options (e.g., SiGe, GaAs, bipolar ...). Therefore, great care must be taken to minimize thermal and flicker noise at each step of the design process. The first step is to replace the voltage control with digital control, i.e., a digitally controlled oscillator. A significant amount of research and development has been done recently on ring oscillators implemented using inverters. This is a solution for many of the requirements listed above; however, the low Q [8] of these oscillators makes them a difficult fit for the low-jitter digitally controlled oscillator function.
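The contrast drawn above between a time-to-digital converter (sign and magnitude) and a binary phase detector (sign only) can be made concrete with two idealized quantizer models. This is a behavioural sketch, not a circuit model: the TDC resolution `lsb_s` and word length `n_bits` are hypothetical parameters, mapping a zero error to +1 is an arbitrary modelling choice, and flip-flop metastability is ignored.

```python
def bpd(delta_t_s):
    """Binary (bang-bang) phase detector: only the sign of the timing error.

    Ideal sgn(delta_t) model. A D flip-flop must settle to one of its two
    states, so a zero error is (arbitrarily) mapped to +1 here.
    """
    return 1 if delta_t_s >= 0 else -1

def tdc(delta_t_s, lsb_s, n_bits):
    """Idealized TDC: timing error rounded to the nearest LSB and saturated
    to a signed n_bits output word (sign and magnitude information)."""
    code = round(delta_t_s / lsb_s)
    max_code = 2 ** (n_bits - 1) - 1
    return max(-max_code - 1, min(max_code, code))

# Hypothetical 1 ps, 5-bit TDC versus a BPD for a few timing errors.
for err in (0.3e-12, 3.2e-12, -7.6e-12, 40e-12):
    print(f"{err * 1e12:6.1f} ps -> TDC {tdc(err, 1e-12, 5):4d}, BPD {bpd(err):+d}")
```

The BPD reports the same +1 for a 0.3 ps error and a 40 ps error, which is why the loop applies the same correction regardless of error size and why its behaviour is nonlinear and hard to analyse.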
LC-tank oscillators provide a much better Q (i.e., improved jitter or phase noise performance) than ring oscillators [9,10], but require a large inductor. In this work, these two options were considered and the LC tank was chosen as a reasonable compromise, the top metal layers being suitable for inductor realization. The digital-to-analogue conversion required for the digital loop filter to control the oscillator frequency was implemented directly by using an array of varactor elements that provide both frequency range and resolution. This has the additional advantage that FinFET varactor arrays are generally much smaller than Metal-Oxide-Metal/Metal-Insulator-Metal (MOM/MIM) capacitor arrays. The Bang-Bang Phase Locked Loop derives its name from its phase detector, which, as mentioned above, is a bistable implementation of the signum function. The gain between its two stable states is very high (theoretically infinite); thus, when the phase locked loop is locked, the phase detector will bang back and forth between its two stable states. This work demonstrates how these challenges were overcome to produce a clock signal exhibiting stability and jitter performance consistent with the best analogue circuit implementations.

## 2.2 FinFET Transistor Overview

Since circa 1970, the rate of increase in the number of transistors per unit area that can be economically fabricated to form Integrated Circuits (ICs) has followed Moore's Law. That is, IC transistor densities in leading-edge semiconductor technologies double every 18 to 24 months [11,12]. Moore's formulation has held true to the present day, and it is an understatement to say that this consistent pace of technological advancement is a tribute to the creativity, ingenuity and ability of the scientists and engineers upon whose shoulders we now stand. During the four decades following Dr.
Moore's paper, the increased transistor density of ICs was primarily due to the aggressive scaling of planar MOSFET transistors without significant change in their basic structure. As the Gate Length $(L_g)$ shrinks, the MOSFET Drain Current to Gate Voltage Characteristics $(I_d - V_g)$ degrade in two major ways, referred to as short channel effects. First, the Subthreshold Swing $(S)$ degrades and the Threshold Voltage $(V_t)$ decreases to the extent that the Gate Voltage $(V_g)$ has diminished control over the channel. Second, $S$ and $V_t$ become increasingly sensitive to variations in $L_g$. Gate lengths in the nanometre range result in a drastic increase in Subthreshold Leakage Current $(I_{off})$. That is, the gate no longer has sufficient control to shut off the transistor at very short channel lengths, because the drain potential now has increased electrostatic influence over the channel, an effect referred to as Drain-Induced Barrier Lowering (DIBL) [13, 14].

Figure 2.1 [15] presents a top-view comparison of a planar MOSFET (left) and a FinFET (right). Both devices have two gate fingers dividing the devices into Source/Drain/Source. The FinFET has eight fins with fin pitch $P$, and $T_{SI}$ is the body or fin thickness. The transistor Length ($L$, shown as the gate vertical dimension) and Width ($W$, shown as the gate horizontal dimension) of these devices correspond. While FinFET $L$ is a function of a single dimension, the FinFET $W$ contributed by each fin is a function of $2H_{FIN} + T_{SI}$.

Figure 2.1: Planar MOSFET vs. FinFET Transistors Layout

Additionally, $H_{FIN}$ and $T_{SI}$ must remain fixed to ensure manufacturability, e.g., $H_{FIN} < 4T_{SI}$. Therefore, arbitrary transistor widths are no longer an option. Instead, $W$ quantization is imposed, which limits the total FinFET $W$ resolution to an integer number of fins. For example, the FinFET of Figure 2.2 has $W$ = 3 fins with a fin pitch of $P$ nm.
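The width quantization described above can be written directly: with each fin contributing $2H_{FIN} + T_{SI}$, a device with $N$ fins has $W_{eff} = N(2H_{FIN} + T_{SI})$, and any requested width must be rounded up to a whole number of fins. The fin dimensions below ($H_{FIN}$ = 20 nm, $T_{SI}$ = 7 nm) are hypothetical values chosen only to satisfy the $H_{FIN} < 4T_{SI}$ example constraint quoted in the text; they are not the 7-nm process values.

```python
import math

def finfet_width_nm(n_fins, h_fin_nm, t_si_nm):
    """Effective channel width: each fin contributes two sidewalls (2*H_FIN)
    plus its top surface (T_SI)."""
    return n_fins * (2.0 * h_fin_nm + t_si_nm)

def fins_for_width(target_w_nm, h_fin_nm, t_si_nm):
    """Smallest whole number of fins whose effective width meets the target."""
    per_fin = 2.0 * h_fin_nm + t_si_nm
    return max(1, math.ceil(target_w_nm / per_fin))

# Hypothetical fin geometry (satisfies H_FIN < 4*T_SI); not process data.
H_FIN_NM, T_SI_NM = 20.0, 7.0
n = fins_for_width(100.0, H_FIN_NM, T_SI_NM)
print(n, finfet_width_nm(n, H_FIN_NM, T_SI_NM))
```

With these assumed dimensions each fin adds 47 nm, so a 100 nm request becomes 3 fins and 141 nm; intermediate widths are simply unavailable, which is the property later exploited for single-fin $\Delta$-capacitance steps.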
While in theory $L$ should not be quantized in the same way as $W$, in practice, as $L$ becomes smaller, its values are also quantized.

Figure 2.2: Cross Section of FinFET Transistors

FinFET technology allows transistor scaling to continue without major process changes. This addition of a third dimension to the planar MOSFET gate facilitates the development of accurate mathematical models (i.e., SPICE or BSIM compact models) that seamlessly fit into existing circuit design tools.

## 2.3 SERDES Applications

Recent Serializer/Deserializer (SERDES) development has been driven by the demand for cost-effective, high-speed and high-density long-haul optical data transfer between servers, switches and routers used in data centres. Additionally, short-haul data transfer applications range from card-backplane-card over electrical and optical media, through inter-die in 2-D and 3-D packaging (e.g., Multi-Chip Modules, MCM), to intra-die over IC interconnect (e.g., inter-processor and DDR4/5 memory access) [16]. SERDES circuits are point-to-point bidirectional data transport systems consisting of a Parallel In Serial Out (PISO) transmitter and a Serial In Parallel Out (SIPO) receiver positioned at opposite ends of a transmission line or fibre optic link. Thus, a large number of parallel data streams are Time Division Multiplexed (TDM) onto a much smaller number of serial data streams to reduce interconnect density, cost and complexity, as well as to improve reliability. Leading industry standard data rates include, but are not limited to, 6 Gbps, 11 Gbps, 25 Gbps, 56 Gbps and 112 Gbps using NRZ or PAM-4 modulation. Framer and packet, as well as electrical and optical physical layer, interface requirements and examples can be found in [3, 16, 17]. The author of this thesis was the major design and development contributor to the circuit blocks of Figure 2.3 that are not grey.
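The link between the line rates listed above, the modulation format and the clock frequency is what drove the frequency change described in item 3 of the design-challenge list in Chapter 1: a 56 Gb/s NRZ stream launched on both clock edges needs a 28 GHz clock, whereas PAM-4 halves the symbol rate and allows a 14 GHz clock. The sketch below only captures this arithmetic, assuming one symbol is launched per clock edge; it does not model the serializer or SST driver, and the function names are hypothetical.

```python
import math

def symbol_rate_baud(bit_rate_bps, levels):
    """Symbol rate of a PAM-M line code carrying log2(M) bits per symbol."""
    return bit_rate_bps / math.log2(levels)

def tx_clock_hz(bit_rate_bps, levels, symbols_per_clock_cycle=2):
    """Clock frequency when each clock cycle launches `symbols_per_clock_cycle`
    symbols (2 = one symbol on the rising edge and one on the falling edge)."""
    return symbol_rate_baud(bit_rate_bps, levels) / symbols_per_clock_cycle

for name, levels in (("NRZ", 2), ("PAM-4", 4)):
    print(f"56 Gb/s {name}: {symbol_rate_baud(56e9, levels) / 1e9:.0f} GBd, "
          f"clock {tx_clock_hz(56e9, levels) / 1e9:.0f} GHz")
```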
Figure 2.3 is a functional block diagram of the transmit side of the four-channel SERDES device for which the ADPLL, the subject of this work, was created. The ADPLL is implemented as a Bang-Bang Phase Locked Loop (BBPLL). The 350 MHz External Clock Source [1] is commercially available and was selected for its favourable phase noise profile. The 350 MHz clock outputs, OutP/OutN, are multiplied by the BBPLL to produce 14.0 GHz differential rail-to-rail clock signals, CLKin/CLKip. These signals are buffered in the 14.0 GHz ADPLL block to drive LVDS drivers. Two LVDS drivers are used to distribute the 14.0 GHz clock to four data transmitters and a clock transmitter through 100 $\Omega$ differential transmission lines. The left-hand transmission line distributes the 14.0 GHz clock to Lanes one to three, and the right-hand transmission line to Lanes four and five. Lanes two to five are used for data transmission, and Lane one transmits the 14.0 GHz clock to the receiver in NRZ format. The transmitter circuits of each lane are identical.

Figure 2.3: SERDES Application

The 14.0 GHz differential low-voltage clock signal is tapped from the differential transmission lines by a receiver circuit within the Duty-Cycle Correction (DCC) block of each lane. Each receiver circuit consists of a series of single-ended inverters that regenerate the clock signal to its former rail-to-rail levels. The single-ended regenerated signals are also de-skewed using cross-coupled inverter pairs. Capacitively isolated inverters with resistive feedback are used to correct the clock duty cycle. The T-coils [18,19] are designed to tune out the capacitance of the protection diodes and pads so that the output impedance of the selected Source-Series Transmitter (SST) elements [17,20,21] is matched to the 100 $\Omega$ differential transmission line medium. This minimizes transmission line reflections and cross-talk, and maximizes bandwidth.
The TxP/TxN transmitter output signals terminate in substrate pads that connect, through substrate transmission lines, to the corresponding SERDES receiver.

## 2.4 Bang Bang Phase Locked Loop (BBPLL)

The purpose of a Phase Locked Loop (PLL) in this application is to multiply the frequency of an external reference clock oscillator to a higher frequency that can be used to clock the SERDES state machines and provide a timing reference for data transmission and reception. Additionally, in order to resolve transmitted data in the presence of channel impairments, the PLL must be able to transfer the frequency stability and low Phase Noise (PN) characteristics of the external oscillator to the high-frequency clock. In recent years, the ultimate goal has been to build a stable low-PN frequency multiplier around a Ring Oscillator (RO) high-frequency source. The advantages in size, power dissipation, modularity and re-usability across process geometries are major. Unfortunately, for an N-stage RO, as N goes to infinity its Q approaches a maximum of $\pi/2 \approx 1.57$ [9,10], which limits PN performance. By contrast, Inductor-Capacitor (LC) oscillator implementations have a higher Q (typically 10 to 30, depending on process losses) and a less detrimental Impulse Sensitivity Function (ISF) [22] than digital oscillators (i.e., ROs). While the size of the LC circuit elements is a definite drawback, they remain a viable compromise solution, as their PN is expected to be lower than that of an RO by approximately 20 dB [23]. An ADPLL realized completely as a digital circuit possesses all the advantages of the RO listed previously; accordingly, the DCO is the only custom-designed element in this work. A digital-to-analogue conversion function must be included with the DCO to allow the ADPLL to control the analogue LC-tank oscillator phase/frequency selection.
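The Q comparison above can be checked against Leeson's model (equation (A.20) in Appendix A), in which phase noise in the $1/f^2$ region scales as $1/Q^2$. The sketch uses the textbook form of Leeson's equation; the signal power, noise factor and offset frequency are hypothetical placeholders, and because a ring oscillator is not a true resonator, assigning it a Leeson Q is only a first-order comparison.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def leeson_dbc_hz(offset_hz, f0_hz, q_loaded, p_sig_w,
                  noise_factor=2.0, temp_k=300.0, f_corner_hz=0.0):
    """Textbook Leeson phase-noise estimate L(offset) in dBc/Hz.

    The resonator term gives the 1/f^2 region, where L scales as 1/Q^2;
    the last term adds a 1/f^3 region below the flicker corner f_corner_hz.
    """
    thermal = 2.0 * noise_factor * K_BOLTZMANN * temp_k / p_sig_w
    resonator = 1.0 + (f0_hz / (2.0 * q_loaded * offset_hz)) ** 2
    flicker = 1.0 + f_corner_hz / offset_hz
    return 10.0 * math.log10(thermal * resonator * flicker)

# Hypothetical operating point: 14 GHz, 1 mW signal power, 1 MHz offset.
ring = leeson_dbc_hz(1e6, 14e9, math.pi / 2, 1e-3)  # ring-oscillator Q limit
lc = leeson_dbc_hz(1e6, 14e9, 15.0, 1e-3)           # mid-range LC-tank Q
print(f"ring {ring:.1f} dBc/Hz, LC {lc:.1f} dBc/Hz, difference {ring - lc:.1f} dB")
```

For an assumed loaded Q of 15, the improvement over $Q = \pi/2$ is $20\log_{10}(15/1.57) \approx 19.6$ dB in the $1/f^2$ region, in line with the ~20 dB figure quoted above; the exact margin depends on the Q actually achieved.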
This is achieved using an array of voltage-controlled capacitive elements, referred to as varactors, to realize the capacitance portion of the LC tank (see Figure 2.4). In a DCO, each varactor has two capacitive states, off-capacitance and on-capacitance, associated with the low and high voltages of the controlling bit states. Large arrays of varactors are necessary to provide the frequency tuning range required to compensate for Process, Voltage and Temperature (PVT) variations [6]. This control is generated in the forward loop from the error signal, which is low-pass filtered to produce a multi-bit Frequency Control Word (FCW). The frequency tuning array design of this work is discussed in detail in section 4.4. The DCO output clock signal must be converted from an analogue clock signal to a digital clock signal. This is achieved by buffering the analogue output of the LC-tank oscillator to, first, provide current gain to increase the slope of the rising and falling clock edges; and second, voltage-limit the clock signal to produce distinct output high and low levels consistent with digital operation.

Figure 2.4: All Digital Phase Locked Loop with LC-Tank DCO

The ADPLL shown in Figure 2.4 possesses two quantization noise sources: the Phase Detector (e.g., Time-to-Digital Converters (TDCs), Binary Phase Detectors ...) and the D-to-A function of the DCO. These are not present in analogue PLLs with nearly continuous phase detectors (excluding dead zones) and linear VCO tuning curves. The finite phase/frequency steps generated by this quantized behaviour produce a limit-cycle mode of operation, or regime, in the output clock, dco\_clk, when the ADPLL approaches a phase-locked state. Because, in general, none of the discrete DCO output frequencies will be an exact multiple of the reference frequency, ref\_clk, the phase detector will force a frequency oscillation about the exact multiple.
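The limit-cycle argument above can be quantified for an ideal, uniformly stepped DCO: unless $N \cdot f_{ref}$ falls exactly on a code, the loop can only reproduce the target frequency on average, by alternating between the two codes that bracket it. The sketch assumes a hypothetical linear tuning curve; `F_MIN_HZ` is illustrative and `F_STEP_HZ` is simply set to the 2.0 MHz/LSB worst-case resolution quoted in the abstract. Real tuning curves are nonlinear, so this is only a first-order picture.

```python
import math

def bracketing_codes(f_target_hz, f_min_hz, f_step_hz):
    """For an ideal DCO with f(code) = f_min + code * f_step, return the two
    adjacent codes around f_target and the fraction of time the upper code
    must be used so that the average frequency equals f_target."""
    position = (f_target_hz - f_min_hz) / f_step_hz
    k_low = math.floor(position)
    duty_high = position - k_low   # 0 <= duty_high < 1
    return k_low, k_low + 1, duty_high

F_REF_HZ, N_DIV = 350e6, 40            # 350 MHz x 40 = 14 GHz, as in this design
F_MIN_HZ, F_STEP_HZ = 13.0007e9, 2e6   # hypothetical linear tuning curve
k_lo, k_hi, duty = bracketing_codes(N_DIV * F_REF_HZ, F_MIN_HZ, F_STEP_HZ)
print(k_lo, k_hi, round(duty, 3))
```

With these assumed values, 14 GHz lies 65 % of the way from code 499 to code 500, so an ideal two-code dither would have to spend about 65 % of its time on the upper code; the resulting periodic frequency modulation is the source of the spurs discussed next.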
The resulting oscillation, or frequency modulation, produces significant frequency-domain spurs at the dco\_clk output.

This work demonstrates that an ADPLL realized as a BBPLL (i.e., one using a Bang-Bang Binary Phase Detector, or BBPD) can be designed to meet and exceed the performance of analogue PLL implementations, in spite of the additional quantization noise sources. Compared with TDC implementations, the BBPD has the further advantages of reduced circuit complexity and lower power consumption. Quantization noise can be traded against random thermal noise by adjusting the proportional gain of the loop filter, which makes it possible to find a minimum in total output PN. Additionally, thermal noise can be used both to linearize the phase detector quantization and to redistribute the frequency-domain spur energy into the noise floor.

What is a Bang-Bang PLL, and where does the "bang-bang" come from? Doesn't every PLL operate in this way? Answering these questions starts with defining PLL limit-cycle operation. Many PLLs can operate in a limit-cycle mode or regime. This happens when the feedback control loop reaches a locked state in which the output frequency (or phase) oscillates about a fixed value. The cause is that the DCO or Phase-Frequency Detector (PFD) has finite resolution, and this resolution is periodically exceeded. For example, if the feedback frequency is too high, the phase detector tells the loop to reduce frequency. Once the phase detector evaluates this reduced frequency, the resulting error adjusts the oscillator to increase frequency. If the frequency and amplitude of these limit-cycles are small enough, this operation may be considered acceptable. In a Bang-Bang PLL, however, this is the dominant behaviour, and it is the extreme form of this behaviour that gives this PLL its name. Most phase or phase-frequency detectors (e.g., Time-to-Digital Converter, phase detector charge pump combinations, XOR ...)
provide a vector output consisting of phase error sign and amplitude. In contrast, the binary phase detector output is a scalar consisting of the phase error sign only. That is, it always tells the loop to apply a maximum correction, regardless of the size of the phase error. Once locked, the Bang-Bang PLL therefore exhibits limit-cycle behaviour unless another phenomenon, such as random phase noise, dominates. This mode of operation results in quantization noise and spurs at the output. The current state of BBPLL development has reduced the BBPD to a two-state signum error function (i.e., $sgn(\Delta t)$) realized as a single D flip-flop, and that is the detector used here.

Figure 2.5 shows the architecture of a generalized Bang-Bang Phase Locked Loop (BBPLL) [24]. The functional block diagram of Figure 2.5 is described in detail in the chapters that follow, so only a brief introduction is presented here. A highly stable, low-PN external crystal oscillator generates the $ref\_clk$ (e.g., 350 MHz) for the PLL. The Binary Phase Detector (BPD) compares the $ref\_clk$ phase to the phase of the feedback clock, $fdbk\_clk$, which is a divide-by-N version of the DCO output clock, $dco\_clk$. This comparison produces a highly quantized binary error, $sgn(\Delta t)$. That error is presented to the weighted (i.e., $A_P$ and $A_I$) proportional plus integral paths of the Digital Loop Filter (DLF). The DLF is clocked by $fdbk\_clk$. The $z^{-D}$ block represents the forward-loop delay of $D$ $fdbk\_clk$ cycles that arises from the circuit implementation.

Figure 2.5: Bang Bang Phase Locked Loop

The DLF produces the filtered DCO Frequency Control Word (FCW), which is split into higher order and lower order paths. The higher order bits control the DCO frequency selection directly, while the lower order bits feed an $n^{th}$ order $\Sigma\Delta$-modulator.
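The limit-cycle behaviour described above can be reproduced with a deliberately simplified, noise-free model. In this model the loop is proportional only, the uncorrected oscillator drifts by a fixed `offset` per reference cycle, and every BPD decision applies a full correction `step` regardless of the size of the error. The sketch is an illustrative toy under those assumptions (no integral path, no loop delay, no $\Sigma\Delta$-modulator, arbitrary time units). It is not a model of the BBPLL of Figure 2.5, and the function name is hypothetical.

```python
def simulate_bang_bang(offset: float, step: float, cycles: int, t0: float = 0.0):
    """Toy noise-free bang-bang loop with a proportional path only.

    offset: time error the uncorrected oscillator accumulates per reference cycle
    step:   correction applied by each BPD decision (0 < offset < step assumed)
    Returns (time errors, BPD decisions), one entry per reference cycle.
    """
    t = t0
    errors, decisions = [], []
    for _ in range(cycles):
        d = 1 if t > 0 else -1       # two-state sgn(); t == 0 is mapped to -1 here
        errors.append(t)
        decisions.append(d)
        t = t + offset - step * d    # full-size correction regardless of |t|
    return errors, decisions


if __name__ == "__main__":
    errors, decisions = simulate_bang_bang(offset=0.3, step=1.0, cycles=1000)
    print("max |error|:", max(abs(e) for e in errors))         # bounded by offset + step
    print("mean decision:", sum(decisions) / len(decisions))   # close to offset / step
```

With `offset=0.3` and `step=1.0`, the time error never settles. It stays bounded by `offset + step` while the decisions keep toggling, and the average decision approaches `offset/step`. The loop can therefore realize the fractional correction only on average, by alternating between maximum corrections, and this alternation is the bang-bang limit cycle.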
The $\Sigma\Delta$-modulator creates a dithered average frequency selection that improves the DCO resolution while minimizing the production of $dco\_clk$ frequency-domain spurs. In this work, significant attention was paid to ensuring that the last remaining analogue portion of the BBPLL, the DCO frequency generation, would function over PVT and extracted layout variations. Care was also taken to reduce PN, particularly flicker noise.

# Chapter 3: The Bang Bang Phase Locked Loop

### 3.1 Introduction

The following list identifies the advancements made through the development of this BBPLL. The DCO is excluded here and is discussed in Chapter 4.

1. Oversampling of the reference clock by the feedback clock at the phase detector was used to reduce the noise bandwidth of the system. That is, a feedback divisor of N=4 was used instead of the N=40 required to make the reference and feedback clock frequencies approximately equal. This limits the amount of noise that can be imposed on the loop.
2. The digital loop filter in the forward path of the BBPLL is clocked at 10 times the reference frequency and incorporates a look-ahead architecture. This eliminates the loop latency that would be present if the digital loop filter were clocked at the reference frequency; such latency deteriorates jitter performance and phase margin. The $\Sigma\Delta$-modulator is clocked at the same rate to reduce its latency.
3. A second-order $\Sigma\Delta$-modulator was used to relax the DCO resolution needed to meet jitter requirements. Its single-bit output interpolates the DCO output frequency in a manner that mitigates the effects of array element mismatch.
4. The complex divide and modulus functions required to determine the DCO frequency control word value were replaced by simple adders to further reduce loop latency.
5. Several properties of this BBPLL (detailed in section 3.5) were exploited to develop s-domain loop equations for this discrete (z-domain) circuit. These linearized equations were used to determine the unity gain frequency, phase margin, loop bandwidth, damping factor and zero frequency for various loop filter gains. The equations were not completely consistent with [24], but they produced reasonable, although somewhat optimistic, results.

The scope of the work described in this chapter is analysis as opposed to design. Other members of the design team designed the Bang-Bang Phase Detector (BBPD) or Binary Phase Detector (BPD), the Digital Loop Filter (DLF), the $\Sigma\Delta$-Modulator (SDM) and the Divide by N. The author's role was analysis and testing.

Figure 3.1 illustrates a digital Bang-Bang PLL (BBPLL) functional block diagram that is consistent with both this work and [24]. The architecture and analysis presented in the following paragraphs closely follow this reference.

Figure 3.1: Second-Order Digital Bang-Bang PLL Functional Block Diagram

Going around the loop in Figure 3.1, the BPD compares the phase of the reference clock ($ref\_clk$) with the phase of the feedback clock ($fdbk\_clk$). It produces a one-bit indication of the relative phase positions of these two inputs (i.e., the sign of the time error $\Delta t = t_r - t_d$). This time-domain $sgn(\Delta t)$ result is presented to a single-pole DLF consisting of Proportional and Integral (PI) paths. The $z^{-D}$ block accounts for any pipeline delays that the DLF paths incur in producing a consistent DCO Frequency Control Word (FCW), $\omega$. The Least Significant Bits (LSBs) of $\omega$, denoted $\omega'$, are processed by a second-order $\Sigma\Delta$-modulator. The modulator produces an averaged frequency selection for the fine-resolution portion of the DCO FCW, $\omega''$.
The Divide by N block, or prescaler, is set to four to produce a 3.5-GHz $fdbk\_clk$ from the 14-GHz $dco\_clk$ output. In this example $ref\_clk$ is 350 MHz, so the DLF and $\Sigma\Delta$-modulator operate 10 times faster than $ref\_clk$. A decimation function, not shown here, resolves the difference in BPD sampling rate between the $ref\_clk$ and $fdbk\_clk$ clocks. This oversampling of the reference clock improves jitter performance by reducing the noise bandwidth relative to the divide-by-40 case, in which $ref\_clk$ and $fdbk\_clk$ are approximately equal in frequency.

The original motivation for an all, or mostly, digital PLL was to decrease circuit size by replacing the analogue loop filter components with a DLF. Similarly, the classical Phase-Frequency Detector (PFD) was replaced by a Time-to-Digital Converter (TDC) to improve the scalability of the PLL implementation across technologies and process nodes. However, while DLF circuits are small and consume little current, the same cannot be said of the TDC. This work overcomes that drawback by borrowing the single-bit phase detector, or BPD, from the world of clock and data recovery. Although this is an innovative solution, the abrupt nonlinearity of the BPD markedly complicates the analysis of the BBPLL and introduces a significant quantization error.

Quantization errors created by both the BPD and the finite resolution of the DCO modulate the period of the output signal. When this noise is larger than the random noise present at the BPD inputs, quasi-periodic orbits [25] appear in the state-space analysis of the loop (i.e., normalized $\psi$ vs. $\Delta t$). These orbits result in frequency-domain spur tones at the DCO output. The condition can be improved by increasing the DCO frequency resolution and by exploiting random noise sources (e.g., noise with a $1/f^2$ profile) to dither, or break up, the periodic noise.
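The frequency plan quoted above can be checked with a few lines of arithmetic. The sketch below uses only figures given in this chapter: 14 GHz, 350 MHz, the divide-by-four prescaler, and the three-`dsp_clk`-cycle DLF latency quoted in section 3.3. The variable names are illustrative.

```python
F_DCO = 14e9             # DCO output frequency, Hz
F_REF = 350e6            # reference clock frequency, Hz
N_PRESCALER = 4          # feedback divide ratio used in this work
LATENCY_DSP_CYCLES = 3   # practical DLF latency in dsp_clk cycles (section 3.3)

f_fdbk = F_DCO / N_PRESCALER            # fdbk_clk, which also clocks the DLF (dsp_clk)
total_ratio = F_DCO / F_REF             # divide ratio that would make fdbk_clk ~ ref_clk
oversampling = f_fdbk / F_REF           # BPD samples per reference period
latency_ref_cycles = LATENCY_DSP_CYCLES / oversampling  # loop delay in reference periods

print(f"fdbk_clk = {f_fdbk / 1e9:.2f} GHz, total ratio = {total_ratio:.0f}")
print(f"oversampling = {oversampling:.0f}x, latency = {latency_ref_cycles:.2f} reference periods")
```

A latency of a fraction of a reference period is consistent with the decision in section 3.5 to neglect loop latency relative to the reference clock.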
The remainder of this chapter discusses the BPD, the DLF and the $\Sigma\Delta$-modulator. It also derives the open, closed and error loop equations that were used to generate loop stability and bandwidth plots. The DCO is discussed in Chapter 4.

### 3.2 Bang Bang Phase Detector

The fundamental implementation of a BPD in a BBPLL is a D flip-flop. The reference clock signal is connected to the D input, the feedback clock signal is connected to the clock input, and the Q output serves as the phase error signal. This is illustrated by the block labelled BPD in Figure 3.1. In this work a high-speed D flip-flop is clocked directly by the prescaler output. This produces an oversampled phase detector output that is decimated and sent to the DLF. The BPD output signal is therefore logically equivalent to an output clocked by a 350-MHz feedback signal, but with much reduced jitter. With the low-bandwidth (high-jitter) divide-by-10 circuit removed from the loop, jitter is added only by the high-bandwidth prescaler and D flip-flop circuits.

In the ideal case the transfer function of this phase detector can be modelled by a signum function, $\operatorname{sgn}(\Delta t)$. This function quantizes the input phase difference, or phase error ($\Delta t$), to one of two output states (i.e., -1 and 1) with a transition gain approaching infinity. The transfer function is, of course, non-linear, so its gain is not well defined.

Figure 3.2: a), b), c) Gaussian and d), e), f) Uniform Jitter Convolution

Assume positive edge triggering and ignore the effects of flip-flop metastability [26–28]. The BPD then produces a logical-1 output when the reference clock leads the feedback clock, and a logical-0 output (implemented as a -1 state) when the reference clock lags the feedback clock. The BPD therefore determines the sign of $\Delta t$, but the amplitude of the input phase error is lost.
It is well understood from [29, 30] that, as $\Delta t$ approaches zero, the gain of the BPD, $K_{bpd}$, depends on the relative jitter between the reference and feedback clocks, sometimes known as untracked jitter. Fortunately, this untracked jitter tends to linearize the BPD transfer function when observed over many reference clock cycles, so an approximate transfer function can be derived. That is, convolving the BPD transfer function in the time domain with the jitter Probability Density Function (PDF) shown in Figure 3.2 [31] yields (3.1) for Gaussian (or random) jitter

$$K_{bpd} = \sqrt{\frac{2}{\pi}} \cdot \frac{1}{\sigma_{\Delta t}} \tag{3.1}$$

and a similar equation for uniform (quantization) jitter (3.2),

$$K_{bpd} = \sqrt{\frac{1}{3}} \cdot \frac{1}{\sigma_{\Delta t}} \tag{3.2}$$

where $\sigma_{\Delta t}$ is the RMS relative jitter between the BPD inputs. Reference clock jitter was not considered here. Under this assumption, the absolute jitter, $J$, is the standard deviation of the time occurrences of the $dco\_clk$ edges w.r.t. an ideal clock. It coincides with the standard deviation of the delay, $\Delta t$, between the inputs of the BPD (3.3) [24].

$$J = \sigma_{\Delta t} \tag{3.3}$$

Work in the field of clock and data recovery also shows that the Gaussian and uniform PDF plots of Figure 3.2 are approximations [31]; in reality, these profiles may have multiple maxima. As in [24], however, these approximations were considered reasonable in this work.

In [25] the BPD of a BBPLL is described as having two operational modes or regimes, the *limit-cycle regime* and the *random-noise regime*, which are described in the following paragraphs. As a BBPLL settles to a phase-locked state, the phase error, $\Delta t$, approaches a minimum.
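Equations (3.1) and (3.2) can be checked numerically. The approach is to average the two-state BPD output over many jitter samples and measure the slope of the averaged characteristic near $\Delta t = 0$. The Monte Carlo sketch below does this for Gaussian and uniform jitter of equal RMS value $\sigma$. The sample count, the finite-difference offset `delta` and the random seed are arbitrary choices, and, as in the text, reference clock jitter is ignored.

```python
import math
import random


def sgn(x: float) -> int:
    """Two-state BPD decision; x == 0 is mapped to -1 (an arbitrary tie-break)."""
    return 1 if x > 0 else -1


def bpd_gain_estimate(jitter, delta: float) -> float:
    """Slope of E[sgn(dt + j)] at dt = 0, by a central difference at dt = +/-delta,
    reusing the same jitter samples for both points to reduce Monte Carlo noise."""
    upper = sum(sgn(delta + j) for j in jitter)
    lower = sum(sgn(-delta + j) for j in jitter)
    return (upper - lower) / len(jitter) / (2.0 * delta)


if __name__ == "__main__":
    rng = random.Random(1)          # fixed seed for repeatability
    sigma = 1.0                     # RMS relative jitter, normalized units
    n = 200_000
    delta = 0.2 * sigma
    gaussian = [rng.gauss(0.0, sigma) for _ in range(n)]
    half_width = math.sqrt(3.0) * sigma   # uniform on [-w, w] has RMS value w / sqrt(3)
    uniform = [rng.uniform(-half_width, half_width) for _ in range(n)]
    print("Gaussian:", bpd_gain_estimate(gaussian, delta), "vs (3.1):", math.sqrt(2 / math.pi) / sigma)
    print("Uniform: ", bpd_gain_estimate(uniform, delta), "vs (3.2):", math.sqrt(1 / 3) / sigma)
```

Both estimates should land within a few percent of $\sqrt{2/\pi}/\sigma \approx 0.80/\sigma$ and $1/(\sqrt{3}\sigma) \approx 0.58/\sigma$ respectively. The residual error comes from the finite `delta` and the finite sample count.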
In the absence of other noise sources (i.e., no reference clock jitter and no feedback clock jitter), the BPD output of a locked BBPLL oscillates between its two output states. The loop filter integrates this oscillating signal to produce a signal, $\omega$, whose LSBs control the DCO fine tuning. The minimum amplitude of this oscillation in $\omega$ is determined by the DCO resolution, and its maximum frequency is half the frequency of the reference clock. Reference [25] describes this behaviour as a periodic or quasi-periodic orbit in state space, also known as a limit-cycle. These limit-cycles on the DCO frequency tuning control inputs produce phase jitter on the DCO output through its time, or period, gain $K_{DCO}$. Here $K_{DCO}$ is the weight of a DCO LSB, measured in Hz/bit. Equivalently, it can be expressed as a period step in s/bit, which is the form (3.4) needs in order to yield a time. The amplitude of this uniform deterministic jitter is also directly proportional to the proportional gain of the loop filter, $A_P$, and to the feedback divide ratio, $N$. It grows with the loop delay, $D$, through the factor $(1+D)$. An expression for this limit-cycle standard deviation, or RMS jitter referenced to the DCO output, $\sigma_{\Delta t,lc}$, is given in [24,25] as (3.4),

$$J_q = \sigma_{\Delta t, lc} \approx \frac{(1+D)}{\sqrt{3}} \cdot N A_P K_{DCO} \tag{3.4}$$

where $N$ is a division factor in the frequency domain and is therefore a multiplication factor in this time-domain equation. Also, the larger the loop delay, $D$, the larger the limit-cycle variance. Loop stability requires the DLF proportional path gain to be much larger than the integral path gain (i.e., $A_P/A_I \gg 1$), and this ratio must be increased as loop latency increases [24]. Therefore, (3.4) reveals that $A_P$ must be large enough to guarantee loop stability, but not so large that it produces an excessive *limit-cycle regime* jitter amplitude. Increasing DCO resolution (i.e., making $K_{DCO}$ smaller) decreases the jitter amplitude.
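As a worked example of (3.4), the sketch below evaluates the limit-cycle jitter for a few loop delays. Because (3.4) must produce a time, $K_{DCO}$ is first converted from a frequency step to a period step using $T = 1/f$, so that $|\Delta T| \approx \Delta f / f_0^2$. This conversion is an assumption made for the example. The 2.0-MHz/LSB step and 14-GHz carrier are the worst-case resolution and output frequency reported for this design. `N = 4` and an effective proportional step of one LSB (`A_P = 1`) are hypothetical inputs, and the function names are illustrative.

```python
import math


def period_step_from_freq_step(k_dco_hz: float, f0_hz: float) -> float:
    """Convert a DCO step in Hz/LSB into a period step in s/LSB, using T = 1/f
    so that |dT| ~= df / f0**2 for small steps (an assumption of this example)."""
    return k_dco_hz / f0_hz ** 2


def limit_cycle_jitter(delay_d: int, n_div: int, a_p: float, k_t: float) -> float:
    """RMS limit-cycle jitter from (3.4): (1 + D) / sqrt(3) * N * A_P * K_T, in seconds."""
    return (1 + delay_d) / math.sqrt(3.0) * n_div * a_p * k_t


if __name__ == "__main__":
    k_t = period_step_from_freq_step(2.0e6, 14e9)   # about 10.2 fs per LSB
    for d in (0, 1, 3):                             # loop delay D in decision cycles
        j = limit_cycle_jitter(d, n_div=4, a_p=1.0, k_t=k_t)   # N and A_P are hypothetical
        print(f"D = {d}: sigma_lc ~ {j * 1e15:.1f} fs")
```

Doubling $(1+D)$ doubles the limit-cycle jitter, which is the motivation for the low-latency look-ahead DLF described in section 3.3.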
However, as $K_{DCO}$ becomes smaller, the DCO fine tuning array becomes larger while each varactor element within the array becomes smaller. This eventually leads to element-matching inconsistencies that result in tuning curve non-monotonicity. Generally speaking, $A_P$ and $A_I$ must be selected so that the DLF resolution matches the weighting of $K_{DCO}$. Otherwise, an additional source of quantization error (i.e., quantization jitter) is added to the loop. Jitter performance can nevertheless be improved, or conversely the $K_{DCO}$ resolution relaxed, by adding a quantization element between the DLF and the DCO. This element allows $K_{DCO}$ to be scaled up (i.e., coarser resolution) and the DLF resolution to be scaled down (i.e., finer resolution) by an equal amount. In this work the quantization element is a second-order $\Sigma\Delta$-modulator clocked at $10 \times ref\_clk$. It has the added advantage of producing a dithered DCO LSB control bit, derived from the least significant 8 bits of the FCW, over multiple $ref\_clk$ periods. As the order of the $\Sigma\Delta$-modulator increases, more of the quantization noise on the DCO LSB is pushed to higher frequencies, outside the bandwidth of the frequency step response of the DCO [32, 33]. Although the $\Sigma\Delta$-modulator mitigates the effect of quantization distortion in the loop, it is itself a source of quantization noise and, therefore, of jitter. Reference [24] derives an equation for the variance of the jitter due to $\Sigma\Delta$-modulator quantization, $\sigma_{\Delta t,\Delta\Sigma}^2$. This variance can be summed quadratically with (3.4) to give the total jitter due to quantization distortion in the loop (3.5).
$$J_{qn} = \sqrt{\sigma_{\Delta t, lc}^2 + \sigma_{\Delta t, \Delta \Sigma}^2} \tag{3.5}$$

When the random jitter components (i.e., thermal and flicker noise) on the time error $\Delta t$ are larger than the quantization-induced jitter on $\Delta t$, the BPD is operating in the *random-noise regime* (3.6).

$$J_{rn} = \sigma_{\Delta t, rn} \tag{3.6}$$

In this regime the analysis of the BBPLL can be linearized, as discussed in section 3.5. Additionally, when the BBPLL is operating close to its locked state (i.e., when $\Delta t$ is small), the BPD can be modelled as (3.7), where $\sigma_{\Delta t}$ is the standard deviation of the Gaussian-distributed $\Delta t$.

$$K_{bpd} = \sqrt{\frac{2}{\pi}} \cdot \frac{1}{\sigma_{\Delta t}} \tag{3.7}$$

An equation for $\sigma_{\Delta t,rn}$ is derived in [24, 25]; it shows that $\sigma_{\Delta t,rn}$ is inversely proportional to $A_P \times K_{DCO}$. Figure 3.3 was recreated from these references as an example of the jitter components that make up the total jitter of the BBPLL (3.8).

$$J_{tot} = J_{rn} + J_{qn} \tag{3.8}$$

As stated previously, the limit-cycle term is proportional to $A_P \times K_{DCO}$ and the random jitter term is inversely proportional to $A_P \times K_{DCO}$, so an optimum gain exists that minimizes the RMS output jitter. The dashed lines show $J_{rn}$ and $J_{qn}$ crossing at the minimum value of $J_{tot}$. The solid line is produced from (3.8). Starting from (b) and decreasing the gain, $\sigma_{\Delta t}$ falls to a minimum as the in-band limit-cycle term shrinks. If the gain is decreased further, $\sigma_{\Delta t}$ rises again because of the out-of-band DCO random jitter term (a). Point (c) marks the loop gain setting that minimizes BBPLL jitter.

Figure 3.3: BBPLL Jitter Component vs. Loop Gain Example

Note that the $J_{rn}$ random noise presented in Figure 3.3 [24] originates solely from the DCO, through the feedback path.
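The trade-off in Figure 3.3 can be illustrated with a toy model. The loop gain is lumped into a single variable $g = A_P K_{DCO}$, the random-noise term falls as $a/g$, and the quantization term grows as $b\,g$, following the proportionalities stated above. The coefficients `a` and `b` are hypothetical placeholders, not values from [24] or from this design. When the terms are summed as in (3.8), the minimum lies at $g^* = \sqrt{a/b}$, exactly where the two terms are equal.

```python
import math


def total_jitter(g: float, a: float, b: float) -> float:
    """Toy form of (3.8): random term a/g plus quantization term b*g."""
    return a / g + b * g


def optimum_gain(a: float, b: float) -> tuple[float, float]:
    """Closed-form minimizer of a/g + b*g and the corresponding minimum jitter."""
    g_opt = math.sqrt(a / b)
    return g_opt, 2.0 * math.sqrt(a * b)


if __name__ == "__main__":
    a, b = 4.0, 1.0                                     # hypothetical coefficients
    grid = [10 ** (k / 100) for k in range(-200, 201)]  # log-spaced gains, 0.01 to 100
    best = min(grid, key=lambda g: total_jitter(g, a, b))
    g_opt, j_min = optimum_gain(a, b)
    print(f"grid search: g = {best:.3f}; closed form: g* = {g_opt:.3f}, J_min = {j_min:.3f}")
    print(f"terms at g*: J_rn = {a / g_opt:.3f}, J_qn = {b * g_opt:.3f}")
```

The grid search and the closed form agree to within the grid spacing. In practice the coefficients are set by the DCO noise and the quantization mechanisms of (3.4) and (3.5).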
The analysis behind Figure 3.3 does not consider reference clock jitter. However, great care was taken to select a low-jitter reference clock source to lessen its impact on circuit performance. Reference clock jitter is discussed in Chapter 6.

Every D flip-flop has a set-up time and a hold time (i.e., $t_{SU}$ and $t_H$) that together define a metastability region [26–28]. In this region the rising edge of the clock is too close to either the rising or falling edge of the data signal to produce a stable output within the maximum output delay time. The result is an extended output delay (i.e., the delay from the rising clock edge to a stable Q output state, $t_{PHL}$ or $t_{PLH}$), after which the output settles to the correct state, an incorrect state, or an unstable state. This behaviour changes the limit-cycle spur locations and increases the deterministic jitter amplitude. The $t_{SU}$ and $t_H$ times of the BPD therefore need to be minimized (i.e., the BPD should be implemented with a high-gain/high-speed D flip-flop) to achieve optimal BBPLL output jitter performance.

This BPD implementation combines low noise with accurate phase detection, but it lacks frequency detection. A pull-in range extending to the frequency tuning limits of the DCO is obtained by using a Phase-Frequency Detector (PFD) to achieve initial PLL lock. During this period the DLF ignores the BPD outputs. After frequency lock is established, the PFD outputs are ignored and phase locking is maintained with the BPD.

### 3.3 Digital Loop Filter

The DLF must meet several criteria to ensure both low jitter and stable operation of the BBPLL. First, when the loop is locked, a zero steady-state phase error must be enforced to keep the BPD within its linear region of operation. This is achieved by including an ideal integrator in the filter, which makes the BBPLL a second-order type-II system. An additional zero, at frequency $f_Z$, is then needed to maintain loop stability.
The charge pump ripple found in analogue PLL implementations is not present here, so additional poles to suppress it are not required. Second, loop filter programming needs significant range, resolution and flexibility to deliver the required loop bandwidth and damping. This is essential for minimizing BBPLL output jitter despite variations in loop parameters such as the DCO gain, $K_{DCO}$, and the phase detector gain, $K_{bpd}$. More importantly, $K_{bpd}$, determined by (3.7), is a rather volatile parameter, because it is defined by the RMS jitter, $\sigma_{\Delta t}$, present at the phase detector inputs [24].

The DLF is realized as the sum of proportional and integral paths with programmable gains $A_P$ and $A_I$, as shown in Figure 3.1. While $A_P$ defines the filter gain, the zero singularity, $f_Z$, is defined by the filter gain ratio and by the reference clock ($ref\_clk$) frequency, $f_{REF}$ [24] (3.9).

$$f_Z = \frac{A_I}{A_P} \cdot \frac{f_{REF}}{2\pi} \tag{3.9}$$

The DLF output is the frequency setting control word, S.F. It is made up of an integer portion, S, and a fractional portion, F. The F subset of S.F forms the input of a $\Sigma\Delta$-modulator, discussed in section 3.4, while the S subset is used by divide-by-12 and modulo-12 blocks. These digital circuit blocks generate the values of the Frequency Control Words FCW<sub>12</sub> and FCW<sub>5,6</sub> that are applied to the DCO. The DCO FCW requirements are discussed in section 4.9.

Figure 3.4 [34,35] is a functional block diagram of the signal path from the BPD output to the DCO control input. The BPD<sub>Out</sub> state (i.e., either +1 or -1) is presented to the DLF, where it is multiplied by $A_P$ and $A_I$. The control bus, S.F, is the sum of the DLF proportional and integral paths.
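The behavioural sketch below covers the PI DLF, the reference (non-look-ahead) split of S.F into divide-by-12 and modulo-12 values, and (3.9). It rests on two assumptions. First, S.F is held as an integer in units of $2^{-8}$ LSB, consistent with the 8-bit F and the $\frac{1}{256}$ gain divisor mentioned in this chapter. Second, the integral path accumulates the current decision, $A_I/(1-z^{-1})$, as in (3.15). The gains, the initial code, and the class and function names are hypothetical. The actual DLF is a pipelined, synthesized circuit.

```python
import math

FRAC_BITS = 8  # width of F; S.F is held as an integer in units of 2**-8 DCO LSB (assumption)


class PIDigitalLoopFilter:
    """Behavioural PI loop filter driven by BPD decisions of +1 or -1 (hypothetical model)."""

    def __init__(self, a_p: int, a_i: int, init_sf: int = 0):
        self.a_p = a_p        # proportional gain, in 2**-8 LSB per decision
        self.a_i = a_i        # integral gain, in 2**-8 LSB per decision
        self.integ = init_sf  # integral-path state (starting code)

    def step(self, bpd_out: int) -> int:
        """Return S.F for one decision: A_P*e[n] + A_I*sum(e[0..n]), as in (3.15)."""
        if bpd_out not in (-1, 1):
            raise ValueError("BPD output must be +1 or -1")
        self.integ += self.a_i * bpd_out
        return self.a_p * bpd_out + self.integ


def split_sf(sf: int) -> tuple[int, int, int, int]:
    """Split S.F into S and F, then S into its divide-by-12 and modulo-12 values
    (the straightforward computation that the look-ahead DLF replaces)."""
    s = sf >> FRAC_BITS
    f = sf & ((1 << FRAC_BITS) - 1)
    return s, f, s // 12, s % 12


def zero_frequency(a_p: float, a_i: float, f_ref: float) -> float:
    """Loop-filter zero from (3.9): f_Z = (A_I / A_P) * f_REF / (2*pi)."""
    return (a_i / a_p) * f_ref / (2.0 * math.pi)


if __name__ == "__main__":
    dlf = PIDigitalLoopFilter(a_p=64, a_i=2, init_sf=100 << FRAC_BITS)
    for d in (1, 1, -1, 1, -1, -1):
        print(d, split_sf(dlf.step(d)))
    print(f"f_Z = {zero_frequency(32, 1, 350e6) / 1e6:.2f} MHz")
```

For example, with the ratio $A_P/A_I = 32$ that [24] uses as a stability guideline and $f_{REF}$ = 350 MHz, (3.9) places $f_Z$ near 1.74 MHz.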
Figure 3.4: BPD to DCO Control Path Functional Block Diagram

An important innovation of this pipelined DLF realization is that it produces all possible next-state outputs simultaneously to reduce latency. That is, the outputs $FCW_{5,6}(S)$, $FCW_{5,6}(S+1)$, $FCW_{12}(S)$ and $FCW_{12}(S+1)$ are computed by simple adders in place of the more complex divide and modulo operators. The divide-by-12 and 144-bit binary-to-thermometer decoders are replaced by a 144-bit shift register that directly controls the majority of the varactor array (i.e., $FCW_{12}$). The remaining DLF computation generates $S_{5,6}$.F, where the least significant bits of the integer portion control the much smaller $FCW_{5,6}$ varactor array elements.

After the phase detector determines the current state, $S_{5,6}$ may fall outside its valid range of 0 to 11. An overflow/underflow checker validates the range and corrects both the $S_{5,6}$ value and the shift register during the following clock cycles (see Figure 3.5 [34,35]). If $S_{5,6}$ is too large, it is reduced by 12 and a Logical Shift Right (LSR) operation is performed. If $S_{5,6}$ is negative, it is increased by 12 and a Logical Shift Left (LSL) operation is performed. This procedure is repeated until $S_{5,6}$ is valid, at which point all outputs are updated.

Figure 3.5: Overflow/Underflow Checker Functional Block Diagram

The pipelined architecture of the DLF introduces a latency of several clock cycles, D, which increases quantization jitter (3.4) and reduces phase margin. This latency is greatly reduced by the look-ahead structure illustrated in Figure 3.6 [34,35]. In this structure, two of the pipelined filter units described above operate in parallel. Together they compute the next $FCW_{12}$ and $FCW_{5,6}$ values for both the current phase detector value and its complement.
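The overflow/underflow correction can be sketched as follows. The 144-bit register is modelled as a thermometer-coded list in which the number of set bits stands for the coarse count, so every correction preserves $S = 12 \cdot \text{count} + S_{5,6}$. Two points are assumptions about bit ordering: the text's LSR is mapped to "one more element set" and its LSL to "one fewer element set". The class and function names are hypothetical. In hardware each correction takes a clock cycle, whereas the sketch simply loops.

```python
REG_BITS = 144      # width of the coarse FCW_12 shift register (from the text)
FINE_MODULUS = 12   # valid S_56 range is 0..11


class CoarseShiftRegister:
    """Thermometer-coded stand-in for the 144-bit FCW_12 register (hypothetical model).
    Bits are kept as a list with the set bits first, so the coarse count is sum(bits).
    The count saturates at 0 and at REG_BITS."""

    def __init__(self, ones: int):
        if not 0 <= ones <= REG_BITS:
            raise ValueError("initial count out of range")
        self.bits = [1] * ones + [0] * (REG_BITS - ones)

    def count(self) -> int:
        return sum(self.bits)

    def lsr(self) -> None:
        """Shift one place so one more element is set (assumed meaning of the text's LSR)."""
        self.bits = [1] + self.bits[:-1]

    def lsl(self) -> None:
        """Shift one place so one fewer element is set (assumed meaning of the text's LSL)."""
        self.bits = self.bits[1:] + [0]


def check_overflow(s56: int, reg: CoarseShiftRegister) -> int:
    """Return S_56 corrected into 0..11, shifting the register once per correction.
    Each loop iteration stands for one corrective clock cycle in hardware."""
    while s56 >= FINE_MODULUS:      # overflow: move 12 fine units into the coarse register
        s56 -= FINE_MODULUS
        reg.lsr()
    while s56 < 0:                  # underflow: borrow 12 fine units from the coarse register
        s56 += FINE_MODULUS
        reg.lsl()
    return s56


if __name__ == "__main__":
    reg = CoarseShiftRegister(ones=50)
    s56 = check_overflow(14, reg)
    print(reg.count(), s56)         # the total 12 * count + S_56 is unchanged
```

Starting from a count of 50 and $S_{5,6} = 14$, one correction gives a count of 51 and $S_{5,6} = 2$, leaving $S$ unchanged.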
In the look-ahead structure, when a new phase detector sample becomes available, the $dsp\_sync$ signal becomes active. The correct FCW values are then multiplexed to the output and the other FCW values are discarded. In theory, this approach allows the loop latency to be as short as a single 3.5-GHz $dsp\_clk$ cycle. Practical design considerations increase the latency to three $dsp\_clk$ cycles, which is still significantly shorter than the latency of a reference-clocked DLF.

Figure 3.6: DLF Look-Ahead Structure Functional Block Diagram

### 3.4 LSB Dithering - Sigma Delta Modulator

Frequency synthesizers for high-speed SERDES applications critically require fine DCO period resolution. The straightforward approach is a fine tuning array with a large number of very small frequency steps, which results in large circuit area, increased power consumption and compromised linearity. A more suitable approach is a DCO tuning array with somewhat relaxed resolution, which improves linearity, size and power consumption. The frequency resolution is then recovered by adding a quantizing element, such as a $\Sigma\Delta$-modulator, between the DLF and the DCO, as illustrated in Figure 3.1. Driven by the fractional portion, F, of the DCO FCW, the $\Sigma\Delta$-modulator [36] dithers a set of elements of the fine tuning array to produce a composite, or average, frequency output. The dithering has a degree of randomness that reduces periodicity and so minimizes frequency-domain spurs. This randomness depends on the number of input bits and the order of the $\Sigma\Delta$-modulator. Both must be weighed against circuit complexity, speed and throughput delay (i.e., loop latency). Additionally, the $\Sigma\Delta$-modulator must be clocked faster than the reference so that it can compute a new result for at least every new FCW selected by the BPD.
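A second-order $\Sigma\Delta$-modulator with an 8-bit fractional input and a single-bit output, as described for this design, can be sketched in software. The sketch uses an error-feedback structure whose noise transfer function is $(1 - z^{-1})^2$. This topology is an assumption for illustration, since the text does not specify the structure of the modulator of [36]. Integer arithmetic scaled by 256 keeps the run exact. Each output equals the input plus the second difference of the quantization error, so the running average of the output bits converges to F/256.

```python
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS   # 256: one DCO LSB expressed in units of the fractional word


def sigma_delta_2nd_order(frac_word: int, n_samples: int) -> list[int]:
    """Second-order error-feedback modulator, NTF = (1 - z^-1)^2, single-bit output.

    frac_word: fractional FCW part F, 0 <= F < 256, held constant for the run
    returns:   n_samples bits; 1 selects S + 1 and 0 selects S
    Integer arithmetic (scaled by 256) keeps the simulation exact.
    """
    if not 0 <= frac_word < SCALE:
        raise ValueError("F must fit in FRAC_BITS bits")
    e1 = e2 = 0                              # quantization errors of the previous two samples
    bits = []
    for _ in range(n_samples):
        v = frac_word - 2 * e1 + e2          # so that y = x + (1 - z^-1)^2 e overall
        y = 1 if v >= SCALE // 2 else 0      # single-bit quantizer: S or S + 1
        e = y * SCALE - v                    # this sample's quantization error
        e2, e1 = e1, e
        bits.append(y)
    return bits


if __name__ == "__main__":
    bits = sigma_delta_2nd_order(77, 4096)
    print("fraction of S+1 selections:", sum(bits) / len(bits), "target:", 77 / SCALE)
```

With F = 77, roughly 30 % of the output bits are ones. The DCO therefore spends that fraction of time at S+1, and its average code is S + 77/256. The shaped quantization error sits mostly at high frequencies, where the loop filters it.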
In this design the $\Sigma\Delta$-modulator is clocked with the 3.5-GHz $dsp\_clk$, as is the DLF. The 8-bit fractional value F from the DLF is the input of a second-order $\Sigma\Delta$-modulator with a 1-bit output. This output selects either the FCW value S or its increment, S+1, which is easily derived from S (see Figures 3.4 and 3.6). Interpolating between the two frequency settings provides the finer frequency resolution needed to lower the quantization jitter to an acceptable level. This dithering method proved less prone to varactor element mismatch than an approach that uses separate varactor elements exclusively for dithering [35].

### 3.5 Linearization of BBPLL Loop Equations

Figure 3.7 is the z-domain model of the BBPLL. From it, the Open-Loop ($G_{OL}$) (3.10), Closed-Loop ($G_{CL}$) (3.11) and Error ($G_{ERR}$) (3.12) transfer functions can be derived. The loop latency, $z^{-D}$, was ignored because the DLF clocking and architecture result in effectively zero delay w.r.t. the reference clock edges.

$$G_{OL}(z) = \sqrt{\frac{2}{\pi}} \cdot \frac{1}{\sigma} \cdot \frac{1}{256} \cdot \frac{A_P \cdot z^2 + (\frac{A_I}{A_P} - 1) \cdot z - 1}{z^2 - 2 \cdot z + 1} \tag{3.10}$$

$$G_{CL}(z) = \frac{\Phi_{OUT}}{\Phi_{IN}} = N \cdot \frac{G_{OL}(z)}{N + G_{OL}(z)} \tag{3.11}$$

Figure 3.7: z-domain BBPLL Model

$$G_{ERR}(z) = \frac{\Phi_{OUT}}{\Phi_{ERR}} = \frac{N}{N + G_{OL}(z)} \tag{3.12}$$

After some initial trials, the original z-domain approach to loop analysis was dropped in favour of a linear approach similar to that discussed in [24,37]. Linear, or s-domain, analysis of the BBPLL, which operates in the z-domain, was justified by the following assumptions:

1. The reference frequency (i.e., the sampling frequency of the system) is much larger than the loop bandwidth.
2. The BBPLL is operating in its phase-locked state, and the DCO produces enough random jitter to linearize the operation of the BPD [24].
3. Discrete circuit blocks are clocked at a frequency greater than the reference clock. The outputs of these circuit functions are therefore effectively independent of the sampling clock, and the loop latency is much less than one reference clock period (i.e., less than $z^{-D}$ with $D = 1$).
4. The DCO has a settling time of zero [37], which is reasonable when the loop is phase locked.

The remainder of this section demonstrates this method. Figure 3.8 illustrates a z-domain model of the BBPLL. The BPD is constructed using (3.7) and includes a factor of $\frac{T}{2\pi}$ to convert radians to seconds. The DCO is modelled with its gain, $K_{DCO}$, multiplied by $2\pi$ to convert Hz to radians/second. The integration function that converts radians/second to radians (i.e., $1/s$ in the continuous-time domain) is represented in the z-domain using the bilinear transform (3.13). The sampling frequency of this system comes from the reference input clock, $f_{ref}$, or equivalently the sampling time, $T_{ref}$.

$$s = \frac{2}{T} \left[ \frac{z-1}{z+1} \right], \qquad \frac{1}{s} = \frac{T}{2} \left[ \frac{z+1}{z-1} \right] \tag{3.13}$$

The z-domain representation of the DLF is also shown, where $A_P$ is the proportional gain and $A_I$ is the gain of the integration path. A $\frac{1}{256}$ divisor is used to improve the resolution of the gain elements.

Figure 3.8: z-domain Functional Block Diagram of BBPLL Model

Using the BPD model described in [24], a linear approximation of the BBPLL simplifies its analysis for small values of $\Delta t$. The first step is to convert the z-domain model of Figure 3.8 into the s-domain, where a more conventional approach can be applied. The inverse bilinear transform (3.14) was used to translate the DLF into the continuous frequency domain, as shown by (3.15) to (3.20). Note that the DLF sampling frequency, $f_S$, is different from $f_{ref}$.
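The DLF translation, whose steps appear in (3.14) to (3.20), can be verified numerically. Map any test value of $s$ to $z$ with (3.14), then evaluate the discrete filter (3.16); the result should reproduce (3.19) to within rounding error. The gains and test frequencies in the sketch are arbitrary, $f_S$ = 3.5 GHz is the DLF clock of this design, and the function names are illustrative.

```python
import math

F_S = 3.5e9  # DLF sampling frequency (dsp_clk), Hz


def bilinear_z(s: complex, f_s: float) -> complex:
    """Bilinear mapping (3.14): z = (1 + s/(2 f_S)) / (1 - s/(2 f_S))."""
    return (1 + s / (2 * f_s)) / (1 - s / (2 * f_s))


def dlf_z(z: complex, a_p: float, a_i: float) -> complex:
    """Discrete-time PI loop filter, (3.16): A_P + A_I * z / (z - 1)."""
    return a_p + a_i * z / (z - 1)


def dlf_s(s: complex, a_p: float, a_i: float, f_s: float) -> complex:
    """Continuous-time equivalent, (3.19): A_P + A_I * f_S / s + A_I / 2."""
    return a_p + a_i * f_s / s + a_i / 2


if __name__ == "__main__":
    a_p, a_i = 64.0, 2.0   # arbitrary test gains
    for s in (2j * math.pi * 1e6, 2j * math.pi * 1e8, 1e7 + 2j * math.pi * 5e6):
        lhs = dlf_z(bilinear_z(s, F_S), a_p, a_i)
        rhs = dlf_s(s, a_p, a_i, F_S)
        print(s, abs(lhs - rhs) / abs(rhs))   # relative mismatch, at rounding level
```

The agreement is exact algebraically, because under (3.14) $z - 1 = (s/f_S)/(1 - s/2f_S)$. It follows that $z/(z-1) = (1 + s/2f_S)/(s/f_S) = f_S/s + 1/2$.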
For this design, $f_S$ = 3.5 GHz (the 14-GHz DCO output clock frequency divided by four), which is 10 times $f_{ref}$. This is discussed further in section 3.3.

$$z = \frac{1 + s/2f_S}{1 - s/2f_S} \tag{3.14}$$

First, the DLF transfer function (3.15) was rearranged into (3.16).

$$f(z) = A_P + A_I \left(\frac{1}{1 - z^{-1}}\right) \tag{3.15}$$

$$f(z) = A_P + A_I \left(\frac{z}{z-1}\right) \tag{3.16}$$

Next, the inverse bilinear transform was applied to obtain (3.20).

$$z - 1 = \frac{1 + s/2f_S}{1 - s/2f_S} - 1 = \left(\frac{1 + s/2f_S}{1 - s/2f_S}\right) - \left(\frac{1 - s/2f_S}{1 - s/2f_S}\right) = \frac{s/f_S}{1 - s/2f_S} \tag{3.17}$$

$$f(s) = A_P + A_I \cdot \frac{\left(\frac{1+s/2f_S}{1-s/2f_S}\right)}{\left(\frac{s/f_S}{1-s/2f_S}\right)} = A_P + A_I \left(\frac{1+s/2f_S}{1-s/2f_S}\right) \left(\frac{1-s/2f_S}{s/f_S}\right)$$

$$f(s) = A_P + A_I \left(\frac{1+s/2f_S}{s/f_S}\right) \tag{3.18}$$

$$f(s) = \frac{A_P(s/f_S) + A_I + A_I(s/2f_S)}{s/f_S} = \left[A_P\left(\frac{s}{f_S}\right) + A_I + A_I\left(\frac{s}{2f_S}\right)\right] \frac{f_S}{s}$$

$$f(s) = A_P + \frac{A_I f_S}{s} + \frac{A_I}{2} \tag{3.19}$$

$$f(s) = \frac{sA_P + A_I f_S + sA_I/2}{s} = \frac{s(A_P + A_I/2) + A_I f_S}{s} \tag{3.20}$$

The s-domain representation of the DCO was then combined with the remaining block transfer functions in (3.21) to produce the open loop transfer function, $G_{OL}(s)$.

$$G_{OL}(s) = \left[\frac{T_{ref}}{2\pi} \cdot \sqrt{\frac{2}{\pi}} \cdot \frac{1}{\sigma} \cdot \frac{1}{256}\right] \left[\frac{s(A_P + A_I/2) + A_I f_S}{s}\right] \left[\frac{K_{DCO} \cdot 2\pi}{s}\right] \left[\frac{1}{N}\right] \tag{3.21}$$

The open loop transfer function simplifies to (3.22).

$$G_{OL}(s) = \left[ \sqrt{\frac{2}{\pi}} \cdot \frac{T_{ref} \cdot K_{DCO}}{\sigma \cdot 256 \cdot N} \right] \left[ \frac{s(A_P + A_I/2) + A_I f_S}{s^2} \right] \tag{3.22}$$

The closed loop transfer function, $G_{CL}(s)$, of the form (3.23), was derived using (3.24) to (3.27).
$$G_{CL}(s) = \frac{\Phi_{OUT}}{\Phi_{IN}} = \frac{G_{OL}(s)}{N + G_{OL}(s)} \tag{3.23}$$

If we set K equal to (3.24),

$$K = \left[ \sqrt{\frac{2}{\pi}} \cdot \frac{T_{ref} \cdot K_{DCO}}{\sigma \cdot 256 \cdot N} \right] \tag{3.24}$$

then we can express $G_{CL}(s)$ as:

$$G_{CL}(s) = \frac{\frac{K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2}}{\frac{s^2 N}{s^2} + \frac{K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2}} = \frac{\frac{K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2}}{\frac{s^2 N + K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2}}$$

$$= \left(\frac{K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2}\right) \left(\frac{s^2}{s^2 N + K \cdot [s(A_P + A_I/2) + A_I f_S]}\right)$$

$$= \left(\frac{K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2 N + K \cdot [s(A_P + A_I/2) + A_I f_S]}\right) \left(\frac{1/N}{1/N}\right)$$

$$G_{CL}(s) = \frac{s(K/N)(A_P + A_I/2) + (K/N)A_I f_S}{s^2 + s(K/N)(A_P + A_I/2) + (K/N)A_I f_S} \tag{3.25}$$

If we set K' equal to (3.26),

$$K' = (K/N) \tag{3.26}$$

then $G_{CL}(s)$ can be expressed as (3.27).

$$G_{CL}(s) = \frac{sK'(A_P + A_I/2) + K'A_I f_S}{s^2 + sK'(A_P + A_I/2) + K'A_I f_S} \tag{3.27}$$

Equation (3.27) can be represented in its canonical form (3.28) [38],

$$G_{CL}(s) = \frac{\omega_n^2 \left(\frac{2\zeta}{\omega_n} s + 1\right)}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{3.28}$$

as follows (3.29).

$$G_{CL}(s) = \frac{K' A_I f_S \left( \left( \frac{A_P + A_I/2}{A_I f_S} \right) s + 1 \right)}{s^2 + s K' (A_P + A_I/2) + K' A_I f_S} \tag{3.29}$$

This yields the following equations for the classical loop parameters $\omega_n$ (3.30), $\zeta$ (3.31) and $\omega_{3dB}$ (3.32).
$$\omega_n = \sqrt{K' A_I f_S} \tag{3.30}$$

$$\frac{2\zeta}{\omega_n} = \left(\frac{A_P + A_I/2}{A_I f_S}\right)$$

$$\zeta = \left(\frac{A_P + A_I/2}{A_I f_S}\right) \left(\frac{\omega_n}{2}\right) = \left(\frac{A_P + A_I/2}{A_I f_S}\right) \left(\frac{\sqrt{K' A_I f_S}}{2}\right)$$

$$\zeta = \left(\sqrt{\frac{K'}{A_I f_S}}\right) \left(\frac{A_P + A_I/2}{2}\right) \tag{3.31}$$

$$\omega_{3dB} = \omega_n \sqrt{1 + 2\zeta^2 + \sqrt{4\zeta^4 + 4\zeta^2 + 2}} \tag{3.32}$$

The error transfer function, $G_{ERR}(s)$, of the form (3.33) was derived as (3.34).

$$G_{ERR}(s) = \frac{\Phi_{OUT}}{\Phi_{ERR}} = \frac{N}{N + G_{OL}(s)} \tag{3.33}$$

$$G_{ERR}(s) = \frac{N}{\frac{s^2N}{s^2} + \frac{K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2}} = \frac{N}{\frac{s^2N + K \cdot [s(A_P + A_I/2) + A_I f_S]}{s^2}}$$

$$= \frac{Ns^2}{s^2N + K \cdot [s(A_P + A_I/2) + A_I f_S]} \left(\frac{1/N}{1/N}\right)$$

$$G_{ERR}(s) = \frac{s^2}{s^2 + s(K/N)(A_P + A_I/2) + (K/N)A_I f_S} \tag{3.34}$$

Reference [24] assumes that $A_P \geq 32 \cdot A_I$ to maintain stability; therefore, $A_I$ can be neglected relative to $A_P$ in the open and closed loop equations to find $f_z$ (3.35).

$$f_z = \left(\frac{A_I}{A_P}\right) \left(\frac{f_r}{2\pi}\right), \quad \text{where } f_r = 1/T_r \tag{3.35}$$

From the canonical form of the closed loop equation above, the zero frequency can be found from the numerator factor (3.36):

$$\left( \left( \frac{A_P + A_I/2}{A_I f_S} \right) s + 1 \right) \tag{3.36}$$

$$\omega_z = \frac{1}{(A_P + A_I/2) / (A_I f_S)} \tag{3.37}$$

and, for $A_P \gg A_I$,

$$\omega_z = \frac{1}{A_P/(A_I f_S)}, \qquad f_z = \left(\frac{A_I}{A_P}\right) \left(\frac{f_S}{2\pi}\right), \quad \text{where } f_r = f_S \tag{3.38}$$

The result (3.38) derived here is consistent with (3.35). Reference [24] gives (3.39) for the unity-gain frequency, $f_u$.
$$f_u = \left(\frac{K_{bpd}}{2\pi T_{DCO}}\right) (A_P K_{DCO}) \tag{3.39}$$

where $K_{bpd} = \sqrt{\frac{2}{\pi}} \frac{1}{\sigma}$, $T_{DCO} = \frac{T_{ref}}{N}$, $T_{ref} = \frac{1}{f_{ref}}$ and $K_{DCO}$ is the DCO period gain. By setting the magnitude of the open loop transfer function (3.21) equal to unity (3.40) and assuming $A_P \gg A_I$, we can solve for the unity-gain frequency, $f_u$ (3.41).

$$1 = \left(K_{bpd} \cdot \frac{T_{ref}}{256}\right) (A_P) \left(\frac{K_{DCO}}{s}\right) \left(\frac{1}{N}\right) \tag{3.40}$$

$$f_u = \left(\frac{K_{bpd}}{2\pi}\right) (A_P) \left(\frac{T_{ref}K_{DCO}}{256N}\right) \tag{3.41}$$

Comparing (3.39) from [24] with (3.41) from this work, and noting that in [24] N = 1 (i.e., $T_{DCO} = T_{ref}$) and the $\frac{1}{256}$ factor is not included in the DLF, the two equations are found not to be equivalent (3.42).

$$\left(\frac{K_{bpd}}{2\pi T_{DCO}}\right) (A_P K_{DCO}) \neq \left(\frac{K_{bpd} T_{ref}}{2\pi}\right) (A_P K_{DCO}) \tag{3.42}$$

In spite of (3.42), the units of (3.41) evaluate to Hz. That is, $K_{bpd}$ is in [bits/s], $A_P$ is unitless, $T_{ref}$ is in [s] and $K_{DCO}$ is in [Hz/bit]. Therefore, (3.41) is believed to be correct.

## 3.6 Application of Linearized Loop Equations

The s-domain open and closed loop equations developed in section 3.5 were used to calculate the BBPLL bandwidth, stability and damping response for $A_I = 1$, $A_P = 40$ and $\sigma_{\Delta t} = 150$ fs. These gain values were selected based on the earlier observation that $A_P/A_I \gg 1$ [24, 26]. Figure 3.9 shows the Bode magnitude and phase plots of the open loop transfer function. Here the Phase Margin (PM) is 85.9°, although this calculation is expected to be somewhat optimistic. Figure 3.10 shows the Bode magnitude and phase plots of the closed loop transfer function. The magnitude response reveals an over-damped system (i.e., calculated $\zeta = 1.865$) with a loop bandwidth of 20.5 MHz.
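As a numerical check of (3.41), the short Python sketch below evaluates $f_u$ for several values of $A_P$. The parameter values are assumptions that this section does not state explicitly. $f_{ref} = 350$ MHz is inferred from $f_S = 3.5$ GHz being 10 times $f_{ref}$. $N = 40$ is taken as 14 GHz / 350 MHz. $K_{DCO} = 2.0$ MHz/LSB is the worst-case DCO resolution quoted in the abstract. With these assumptions the results fall on the $f_u$ column of Table 3.1. This suggests, but does not confirm, that similar values were used there.

```python
import math

def unity_gain_freq(a_p, sigma_s, f_ref_hz, k_dco_hz, n_div, dlf_scale=256):
    """Unity-gain frequency of the linearized BBPLL per (3.41).

    Valid above the loop zero, where the proportional path dominates (A_P >> A_I).
    """
    k_bpd = math.sqrt(2.0 / math.pi) / sigma_s  # linearized BPD gain for Gaussian jitter
    t_ref = 1.0 / f_ref_hz
    return (k_bpd / (2.0 * math.pi)) * a_p * (t_ref * k_dco_hz) / (dlf_scale * n_div)

# Assumed operating point (see lead-in); not given verbatim in section 3.5.
F_REF = 350e6    # reference clock, Hz
N_DIV = 40       # feedback division ratio, 14 GHz / 350 MHz
K_DCO = 2.0e6    # DCO gain, Hz per LSB
SIGMA = 150e-15  # RMS jitter seen by the BPD, s

for a_p in (5, 20, 40, 50):
    fu = unity_gain_freq(a_p, SIGMA, F_REF, K_DCO, N_DIV)
    print(f"A_P = {a_p:2d}: f_u = {fu / 1e6:.3f} MHz")
```

Because $f_u$ is linear in both $A_P$ and $1/\sigma$, the $f_u$ column of Table 3.1 changes by a constant amount for each step in $A_P$.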
Table 3.1 lists the phase margin (PM), damping factor ($\zeta$), loop bandwidth ($f_{3dB}$), unity-gain frequency ($f_u$) and zero frequency ($f_z$) for $A_I = 1$ and $A_P$ ranging from 5 to 50. As $A_P$ increases, loop stability improves, as shown by the corresponding increase in PM. The loop bandwidth also increases, with critical damping ($\zeta = 1$) occurring between $A_P = 20$ and $A_P = 25$. The frequency $f_u$ increases with $A_P$, while $f_z$ decreases with $A_P$. Also, as $A_P$ increases, $f_u$ approaches the loop bandwidth, $f_{3dB}$, which indicates improved loop stability [24]. Table 3.1 reveals that $f_u$ and $f_z$ cross at $A_P \approx 34$. Therefore, the closed loop magnitude plot will show peaking as $A_P$ decreases below 34. This peaking will cause jitter gain through the loop, and loop instability if $A_P \ll 34$.

Figure 3.9: BBPLL Open Loop Response - $A_P/A_I = 40/1$, RMS Jitter = 150 fs

Figure 3.10: BBPLL Closed Loop Response - $A_P/A_I = 40/1$, RMS Jitter = 150 fs
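Because $K_{bpd} = \sqrt{2/\pi}/\sigma$ enters the loop gain linearly, (3.41) gives $f_u \propto 1/\sigma$. In the same way, (3.30) and (3.31) give $\omega_n \propto 1/\sqrt{\sigma}$ and $\zeta \propto 1/\sqrt{\sigma}$. The Python sketch below uses only these scaling laws and the $A_P = 40$ values at $\sigma_{\Delta t} = 150$ fs. It rescales the loop parameters to a new jitter level and then re-evaluates $f_{3dB}$ with (3.32). The helper names are illustrative, and PM is not modelled.

```python
import math

def bw_ratio(zeta):
    """omega_3dB / omega_n for the second-order loop, from (3.32)."""
    z2 = zeta * zeta
    return math.sqrt(1.0 + 2.0 * z2 + math.sqrt(4.0 * z2 * z2 + 4.0 * z2 + 2.0))

def rescale_for_jitter(sigma_old, sigma_new, zeta_old, f3db_old, fu_old):
    """Rescale zeta, f_3dB and f_u from one BPD jitter level to another."""
    k = sigma_old / sigma_new            # K_bpd, and hence K and K', scale as 1/sigma
    fn_old = f3db_old / bw_ratio(zeta_old)
    fn_new = fn_old * math.sqrt(k)       # (3.30): omega_n ~ sqrt(K')
    zeta_new = zeta_old * math.sqrt(k)   # (3.31): zeta ~ sqrt(K')
    f3db_new = fn_new * bw_ratio(zeta_new)
    fu_new = fu_old * k                  # (3.41): f_u ~ K_bpd
    return zeta_new, f3db_new, fu_new

# A_P = 40, A_I = 1 values at 150 fs, from section 3.6 and Table 3.1
zeta, f3db, fu = rescale_for_jitter(150e-15, 250e-15, 1.865, 20.50e6, 18.90e6)
print(f"zeta = {zeta:.3f}, f_3dB = {f3db / 1e6:.2f} MHz, f_u = {fu / 1e6:.2f} MHz")
```

For 250 fs the sketch gives approximately $\zeta = 1.445$, $f_{3dB} = 12.84$ MHz and $f_u = 11.34$ MHz, which agrees with the 250 fs values discussed after Table 3.1.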
**Table 3.1:** BBPLL Loop Parameters for $A_I = 1$ and RMS Jitter $\sigma_{\Delta t}$ = 150 fs

| $A_P$ | Phase Margin (°) | Zeta ($\zeta$) | $f_{3dB}$ (MHz) | $f_u$ (MHz) | $f_z$ (MHz) |
|-------|------------------|----------------|-----------------|-------------|-------------|
| 50 | 87.36 | 2.325 | 24.96 | 23.62 | 11.00 |
| 45 | 86.75 | 2.095 | 22.72 | 21.26 | 12.22 |
| 40 | 85.90 | 1.865 | 20.50 | 18.90 | 13.74 |
| 35 | 84.68 | 1.635 | 18.33 | 16.54 | 15.71 |
| 30 | 82.83 | 1.404 | 16.21 | 14.17 | 18.33 |
| 25 | 79.88 | 1.174 | 14.18 | 11.81 | 21.99 |
| 20 | 74.85 | 0.944 | 12.29 | 9.449 | 27.49 |
| 15 | 65.87 | 0.714 | 10.60 | 7.086 | 36.65 |
| 10 | 50.48 | 0.484 | 9.239 | 4.724 | 54.98 |
| 5 | 28.37 | 0.253 | 8.329 | 2.362 | 110.0 |

With $A_P = 40$ and $A_I = 1$, the RMS jitter was increased to $\sigma_{\Delta t} = 250$ fs. This slightly degraded the loop parameters (PM = 83.22° and $\zeta = 1.445$) but significantly reduced $f_{3dB}$ to 12.841 MHz and $f_u$ to 11.338 MHz. The zero frequency, $f_z = 13.744$ MHz, was unaffected because it depends only on the DLF gains and reference clock frequency, not on jitter. It can be inferred that if reference clock jitter were included in these calculations, these parameters would degrade in a similar manner.

Finally, there are several likely explanations for why the calculated PM is significantly better than the measured PM. First, the $z^{-D}$ block of Figure 3.1, which represents forward loop delay, was not included in the previous derivation. Secondly, the initial assumption that the BBPLL operates in the z-domain is an approximation. That is, the sampling time is not constant; rather, it is a dynamic parameter, as the phase error changes while the BBPLL is in a locked state.
Thirdly, PVT variation and the non-linearity of various components will contribute to loop delay.

## 3.7 Summary

This chapter documents the BBPLL architecture and the innovations implemented to ensure stability and minimize output jitter. The justification for an all-digital PLL was reviewed, and the somewhat counter-intuitive motivation for moving from a high-resolution (i.e., low quantization distortion) TDC to a low-resolution BPD was presented. This led to a discussion of how the BPD could be used to equal or improve upon the jitter performance of All-Digital PLL systems. Using a BPD, the effective noise bandwidth of the BBPLL was reduced by oversampling the reference clock and decimating the result presented to the DLF. The key concept of using Gaussian phase noise, fed back from the DCO, to linearize the BPD for small values of $\Delta t$ was presented. This has two important features. First, the random jitter can be used as a dithering source to distribute or break up the spurious tones and jitter caused by quantization noise (i.e., the *limit-cycle regime*). Additionally, a trade-off can be found between the *limit-cycle regime* and the *random-noise regime* at which the system jitter is minimized. Secondly, the random noise reduces the infinite gain of the BPD. This allows the discrete operation of the BBPLL to be mapped into the s-domain, where more conventional analysis can be performed. Minimization of loop latency, which amplifies quantization noise and reduces PM, was also addressed. As the BPD has only two output states, the DLF is limited to two possible next states. Using two parallel circuits, both candidate DLF outputs were calculated in three `dsp_clk` cycles during the slack time between active edges of the reference clock. The BPD then selects the next valid DLF output state. Additionally, simple adders were used to determine the FCW, replacing the complex divide and modulus functions.
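The look-ahead idea can be illustrated with a behavioural Python sketch. It assumes a PI loop filter of the form (3.15), a BPD output $e \in \{-1, +1\}$ and integer arithmetic, and it leaves out the $\frac{1}{256}$ scaling. The helper names are hypothetical, and the sketch does not model the three-cycle `dsp_clk` pipeline. Only two next states are possible, so both are computed ahead of time and the BPD decision simply selects one. The assertion at the end checks this against a direct implementation.

```python
A_P, A_I = 40, 1  # DLF gains used in section 3.6

def direct_dlf(bpd_bits):
    """Reference PI filter: y[n] = A_P*e[n] + A_I*acc[n], with acc[n] = acc[n-1] + e[n]."""
    acc, out = 0, []
    for e in bpd_bits:
        acc += e
        out.append(A_P * e + A_I * acc)
    return out

def next_candidates(acc):
    """Precompute (new_acc, output) for both possible BPD decisions (hypothetical helper)."""
    up_acc, dn_acc = acc + 1, acc - 1
    return (up_acc, A_P + A_I * up_acc), (dn_acc, -A_P + A_I * dn_acc)

def lookahead_dlf(bpd_bits):
    """Look-ahead filter: the BPD decision only selects a precomputed result."""
    acc, out = 0, []
    up, dn = next_candidates(acc)
    for e in bpd_bits:
        acc, y = up if e > 0 else dn   # 2:1 select driven by the BPD output
        out.append(y)
        up, dn = next_candidates(acc)  # computed in the slack before the next reference edge
    return out

bpd = [1, 1, -1, 1, -1, -1, -1, 1, 1, -1]
assert lookahead_dlf(bpd) == direct_dlf(bpd)
print(lookahead_dlf(bpd))
```

In hardware, the selection step becomes a multiplexer after the BPD, so the critical path from the BPD decision to the DCO control word no longer passes through the adders.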
A second-order $\Sigma\Delta$-modulator was used to reduce the DCO resolution needed to meet jitter requirements. Its single-bit output interpolates the DCO output frequency in a way that is less sensitive to array mismatch. Complex-frequency-domain equations were derived for the BBPLL, and typical loop parameter values were calculated as an example. These hand-calculated results, although somewhat optimistic, showed behavioural trends consistent with normal PLL operation, even though input reference jitter was ignored. Further research is needed to improve the PM accuracy of this analysis. Chapter 6 builds on this work by presenting accurate jitter simulation results generated with a time-domain digital model of the BBPLL.

# Chapter 4: The DCO

## 4.1 Introduction

The following list identifies the advancements made through the development of this DCO.

- 1. This was the first LC-tank DCO implementation using the 7-nm FinFET process from Taiwan Semiconductor Manufacturing Company, Limited (TSMC). It demonstrated that the 7-nm FinFET circuit models and fabrication process were sufficient to realize this mixed-signal circuit and to meet or exceed current industry demands (i.e., $Q \approx 10$ at 14.0 GHz).
- 2. A coarse/fine frequency tuning array was demonstrated using PMOS FinFET transistors as inversion-mode varactor tuning elements. This made the full array available for frequency tracking (i.e., coarse tuning range $\approx 2.0$ GHz). It also demonstrated that this 7-nm FinFET process could realize well-matched varactor elements of extremely fine resolution (i.e., a single fin giving $\Delta C \approx 75$ aF, which resolves a frequency step $\leq 2.0$ MHz).
- 3. A differential Class-C oscillator was implemented with approximately 3.9 dB phase noise improvement over conventional implementations.
This phase noise performance was further improved by adjusting the bias point of the oscillator core transistors to ensure low phase noise operation and to increase output amplitude (i.e., amplitude $\approx 1.0\ \text{V}_{peak}$).
- 4. Large-signal circuit analysis and 3-D EM simulation were used to characterize individual circuit elements and modules over process and extracted corners. These results formed the building blocks of a DCO mathematical model. The mathematical model ran significantly faster than the DCO circuit simulation while remaining accurate to within ±0.9 %. The shorter runtime allowed various DCO array architectures and implementations to be optimized quickly and accurately.
- 5. A modular design approach was used, from the selection of varactor elements to the layout of array rows. This made it possible to optimize the frequency-tuning array at each layer of abstraction.

The heartbeat of the PLL that provides the transmit and receive clocks for the SERDES transceiver is a frequency-tunable oscillator. Every oscillator must satisfy the two requirements of the Barkhausen criterion. First, the circuit must include a feedback path that adds a delayed version of the forward signal, the feedback signal, to the forward path, as illustrated in Figure 4.1. The feedback signal must add constructively to the forward path. Ideally, the total phase shift around the loop (i.e., through A(s) and the B(s) feedback block) must be an integer multiple of $2\pi$, $n2\pi$; however, the system will operate correctly with some degree of phase error from the ideal.

Figure 4.1: Classical Linear Feedback Model

Secondly, after the circuit has reached steady-state oscillation, its loop gain, B(s)A(s), must be equal to 1.0. It should be noted that to ensure oscillation is initiated at start-up, grows to steady state and is sustained, the open loop gain must be greater than 1.0.
Usually a gain of at least 3.0 is implemented to ensure normal operation over PVT (Process, Voltage and Temperature) variation and to maintain adequate production yield. This gain allows a very small signal perturbation (i.e., circuit noise) to be amplified, overcome circuit losses and build into an oscillating signal. Once the closed loop system reaches steady state, the circuit operates in a nonlinear fashion to suppress any gain in excess of 1.0. Mathematically, this can be shown using (4.1), the transfer function for the classical linear feedback model of Figure 4.1.

$$H(s) = \frac{Y(s)}{X(s)} = \frac{A(s)}{1 - B(s)A(s)} \tag{4.1}$$

Here, for a loop gain B(s)A(s) = 1.0, $H(s) = \infty$; therefore, the signal will be compressed into nonlinearity [39] by either current or voltage limiting, depending on the circuit implementation.

In this work the oscillation function of Figure 4.1 is realized with a Class-C Digitally Controlled Oscillator (DCO). The current conduction angle of a differential Class-C oscillator (typically 120°; see [5], section IV, "Stability of the Oscillator Amplitude") is smaller than that of a Class-A or Class-B [40,41] oscillator, which conducts current over the full oscillator output cycle (see Figures 4.11 and 4.12). This reduced angle is created by adding a large $C_{tail}$ (see Figure 4.10). The current source charges $C_{tail}$ during the portion of the output cycle when no current flows into the LC-tank (i.e., the differential non-conduction angle $\approx 60^{\circ}$). Thus, during the conduction angle, the sum of the currents from the current source and $C_{tail}$ produces a large current spike that charges the LC-tank [5]. Harmonic energy created by this non-sinusoidal current spike is heavily attenuated by the filtering of the LC-tank and, therefore, does not dissipate any significant power.
At resonance, this current spike produces a larger oscillator output voltage amplitude, $V_m$, across Rp (i.e., the resistance across the LC-tank at resonance, when the total tank reactance is zero), which results in improved efficiency and Signal-to-Noise Ratio (SNR). The reduced conduction angle gives a further phase noise improvement, because the current source and oscillator core transistors inject noise into the LC-tank only while current is flowing. Class-C oscillators are also characterized by a voltage divider in the feedback path. This divider controls the output signal amplitude by adjusting the feedback level applied to the core transistor gates. In Figure 4.10 it is implemented as a capacitive divider formed by the DC-blocking capacitor connected to the transistor gates and the transistor gate capacitance. The DC-blocking capacitors allow further adjustment of the core transistor biasing to maximize output amplitude. A detailed discussion of the Class-C oscillator is provided in section 4.7.

Class-A, B [40,41] and C [5] oscillators produce sinusoidal output clock signals. That is, they all present a resistance, Rp, at the fundamental resonant frequency, but no significant resistance at the harmonic frequencies. These oscillators operate with core transistors in saturation. This helps to keep the phase noise low, as nonlinearity is limited to that caused by transistor channel modulation. In contrast, switching oscillators (i.e., Class-D [42–44] and Class-F [44–46]) use large core transistors operated as switches. This is possible because the reduced capacitance of nanometre-scale IC processes lets these transistors move quickly between on and off states. The transistors therefore largely avoid saturation and produce a square wave output.
In the off-state a large voltage exists across the core transistor and a very small current flows through it; in the on-state the reverse is true. Thus, the power dissipated by the core transistors is small and most of the power is delivered to the load (i.e., the LC-tank). Class-D oscillators produce a relatively high output voltage amplitude from a low VDD, which improves both power dissipation and phase noise. However, this architecture is prone to power supply pushing, as the oscillator has no current source and no isolation from VDD or VSS. Therefore, this oscillator type was not considered for this design. Class-F oscillators present a finite Rp at odd harmonics, across which overtone clock signals are produced. This makes them good candidates for overtone oscillator applications. Additionally, in [45] transformer coupling is used to create voltage peaking at the third harmonic of the output clock. This also improves the Impulse Sensitivity Function (ISF), reducing the up-conversion of phase noise (see Appendix A). However, a Class-F oscillator was not considered for this work, as the transformer would make the circuit large.

At this point it is useful to discuss some of the fundamental differences between LC-tank DCO and VCO frequency tuning.

**VCO LC-tank**

The VCO LC-tank tuning capacitance takes the form of a group of switched-capacitor banks of progressively larger size, usually following a binary capacitance progression (i.e., C, 2C, 4C, 8C, 16C, ...) [47]. The banks are implemented using varactors or combinations of fixed capacitors and varactors. Each bank is tuned, using a continuous control voltage, through the linear range of a varactor or group of varactors. Thus, while frequency tuning within each bank is continuous, mismatch between banks may cause non-monotonicity across all banks.

**Pros** – Tuning is continuous across each varactor. The oscillator does not produce quantization noise in a PLL application, only flicker and thermal noise.
**Cons** – Requires switches, which degrade Q. Control voltage amplitude noise causes phase jitter on the VCO output. Mismatch between banks can contribute to non-monotonicity in the overall frequency tuning. Larger capacitance varactor banks may require common-centroid layout to reduce noise and mismatch over process [47]. More varactor banks are required to achieve adequate tuning range with the lower supply voltage of smaller geometry IC processes.

**DCO LC-tank**

The DCO LC-tank tuning capacitance is arranged in some combination of progressively finer tuning arrays of equally sized unit varactor elements [6]. Each varactor element has two states, Con and Coff, controlled by a single bit that switches between the two nonlinear capacitance extremes of the varactor. The linear range of the varactor is avoided.

**Pros** – The varactor control voltage is either VDD or VSS, so voltage amplitude noise has little effect on phase jitter. Using smaller varactors as binary elements in lower voltage processes avoids the range and noise problems associated with the reduced linear range of the varactor. Non-monotonicity is less of a problem with varactor arrays, as each varactor element is the same size and can be individually programmed to implement dithering and Dynamic Element Matching (DEM) [48].

**Cons** – Introduces quantization noise to the PLL, in addition to flicker and thermal noise.

Unfortunately, every circuit realization includes sources of output signal corruption, principally accumulated phase deviation, more commonly referred to as Phase Noise (PN). DCO output clock frequency-domain PN produces time-domain phase jitter that increases the Bit Error Ratio (BER) of a transmission link. This limits both the data rate of the link and, for a given data rate, its range. The most effective point in the circuit to minimize PN is at the clock source, the DCO.
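The monotonicity argument in the VCO/DCO comparison above can be illustrated with a toy Python model. The mismatch values are hypothetical and chosen only for illustration. The model compares a binary-weighted bank whose 8C element is 15 % small with fifteen unit elements that have ±10 % alternating mismatch and are switched on in thermometer order. Every unit element adds a positive capacitance, so the unit-element array stays monotonic by construction. The binary bank, by contrast, loses monotonicity at the major-carry transition from code 7 to code 8.

```python
def binary_bank_caps(weights):
    """Total capacitance for every code of a binary-weighted bank (bit i enables weights[i])."""
    return [
        sum(w for i, w in enumerate(weights) if (code >> i) & 1)
        for code in range(2 ** len(weights))
    ]

def thermometer_caps(units):
    """Total capacitance as 0, 1, 2, ... unit elements are switched on in order."""
    totals, c = [0.0], 0.0
    for u in units:
        c += u
        totals.append(c)
    return totals

def is_monotonic(caps):
    """True if every step up in code strictly increases the total capacitance."""
    return all(b > a for a, b in zip(caps, caps[1:]))

# Hypothetical mismatch values, for illustration only.
binary = binary_bank_caps([1.0, 2.0, 4.0, 8.0 * 0.85])                  # 8C bank 15 % small
thermo = thermometer_caps([1.0 + 0.1 * (-1) ** i for i in range(15)])  # +/-10 % unit elements

print(is_monotonic(binary), is_monotonic(thermo))  # False True
```

Mismatch still produces unequal step sizes in the unit-element array, which is the residual error that dithering and DEM address.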
A discussion of individual noise sources that degrade oscillator performance is presented in Appendix A.

## 4.2 Specification

The original requirement of this work was for a DCO to produce a rail-to-rail clock operating at 28.0 GHz. The tuning range was to be sufficient to account for Process, Voltage and Temperature (PVT) variation, layout parasitics and layout parasitic variation. The initial frequency tuning resolution requirement was determined to be 500 kHz using a MATLAB® noise model; however, this was later relaxed to 2.0 MHz. This DCO was to be the heart of a PLL and clock distribution system for a 56-Gb/s NRZ SERDES device, consistent with the requirements listed in [3].

Initial consideration was given to a ring-oscillator design. However, this was dropped in favour of an LC-tank design, as the theoretical maximum Q of ring oscillators is approximately $\pi/2$ [9,10]. This is because ring oscillators dissipate virtually all of the energy in the ring and store virtually none on every cycle of the output clock signal. That said, significant work has been done to reduce ring-oscillator accumulated jitter in Phase Locked Loops (PLL) and Delay Locked Loops (DLL) using injection-locking techniques [49–51].

The 28-GHz DCO was successfully designed. However, the digital library components used for the clock distribution circuit within the transmit block (see Figure 2.3) lacked the gain required to operate adequately at 28.0 GHz. The design was therefore changed to a 14-GHz clock driving a PAM-4 data format [3]. The DCO centre frequency was halved by increasing the LC-tank inductance from approximately 100 pH to approximately 500 pH (see section 4.11). This chapter focuses largely on the implementation of the 28-GHz oscillator design, with references made to the 14-GHz modification where appropriate. Several MATLAB® models, currently unavailable due to TSMC access limitations, were created to determine the PN requirement for this DCO.
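The timing deviation that corresponds to a single frequency step $\Delta f$ is the difference between the two clock periods, $1/f_0 - 1/(f_0 + \Delta f)$. To first order this equals $\Delta f / f_0^2$. The Python sketch below evaluates both forms for a 14-GHz clock and a 2.0 MHz step, the arithmetic used in (4.2) below. It is a minimal illustration and is not derived from the original MATLAB® models.

```python
def period_deviation(f0_hz: float, df_hz: float) -> float:
    """Exact change in clock period, 1/f0 - 1/(f0 + df), in seconds."""
    return 1.0 / f0_hz - 1.0 / (f0_hz + df_hz)

def period_deviation_approx(f0_hz: float, df_hz: float) -> float:
    """First-order approximation df / f0**2, valid when df << f0."""
    return df_hz / f0_hz ** 2

exact = period_deviation(14e9, 2e6)
approx = period_deviation_approx(14e9, 2e6)
print(f"exact = {exact * 1e15:.2f} fs, approx = {approx * 1e15:.2f} fs")
```

The two forms agree to within about 0.02 % here, because $\Delta f / f_0 \approx 1.4 \times 10^{-4}$.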
The final requirement for the 14-GHz version was a maximum frequency deviation of $\Delta f = 2.0$ MHz, or a phase deviation of $\Delta \phi \approx 10.2$ fs from (4.2).

$$\Delta \phi \approx \frac{1}{f_0} - \frac{1}{(f_0 + \Delta f)} = \frac{1}{14\ \text{GHz}} - \frac{1}{(14\ \text{GHz} + 2\ \text{MHz})} = 10.2\ \text{fs} \tag{4.2}$$

## 4.3 Varactor Selection

PMOS transistors were chosen as varactor tuning array elements because of their superior noise performance and Q-factor compared with complementary NMOS devices, although NMOS devices have a slightly better tuning range [52]. That is, PMOS varactors operate more in the depletion mode (low capacitance and low resistance), where the Q is higher, than in the accumulation mode (high capacitance and high resistance), where the Q is lower (see Figure 4.2). PN reduction is particularly important in a 7-nm process, as the flicker noise corner frequency (typically tens or even hundreds of MHz at baseband) is inversely proportional to process geometry (see section 5.3, Appendix A and (A.30)). An additional benefit is that PMOS varactor elements sit in an n-well that isolates them from substrate noise. Unfortunately, this n-well increases the varactor parasitic, or off-state, capacitance. This was not considered a problem, because a realizable array tuning solution (i.e., tuning range and resolution) was found for 28 GHz (see sections 4.4 and 4.8 and Figure 4.13).

Figure 4.2 illustrates the Con/Coff ratio (normalized to Coff) for a 600 nm planar PMOS transistor. The transistor is configured both with $V_B = V_D = V_S$ (Accumulation Mode) and as an I-MOS (Inversion Mode) varactor; see Tables 4.1 and 4.2 for AC-simulated I-MOS capacitance and Con/Coff values. This diagram was reproduced from [53], where the plots were generated using small-signal AC simulation. It closely represents the simulated Con/Coff behaviour of the 7-nm FinFET PMOS devices listed in Table 4.2. It is presented here because the 7-nm AC simulation results were not available for this document.
However, it should be noted that the slope of the $C_{mos}$ curves from moderate through strong inversion is lower, or flatter, with the 7-nm FinFET process than Figure 4.2 implies. The general implementation of a varactor using a CMOS transistor makes $V_B$, $V_D$ and $V_S$ common. The capacitance value, $C_{mos}$, then depends on the voltage between the bulk and the gate, which is the same as Vsg in Figure 4.2. This is represented by the dashed plot. When $V_{BG} > |V_T|$, where $|V_T|$ is the transistor threshold voltage, an inversion channel of holes builds up under the gate. This is illustrated by the transition from weak to moderate and then strong inversion. As $V_{BG}$ falls below $V_T$, the transistor moves through the depletion region and into the accumulation region. There the gate-to-substrate voltage is positive enough to accumulate electrons at the surface. In the strong inversion and accumulation regions, the capacitance between bulk and gate is given by (4.3). In these cases, mobile charge carriers are drawn close enough to the gate-oxide interface that the effective insulator thickness between gate and bulk is reduced to $t_{ox}$.

Figure 4.2: Accumulation-Mode vs. Inversion-Mode PMOS n-Well Varactor

$$C_{mos} = C_{ox} = \epsilon_{ox} (L_{eff} \cdot W_{eff}) / t_{ox} \tag{4.3}$$

In the moderate inversion, weak inversion and depletion regions, the value of $C_{mos}$ is reduced from $C_{ox}$. These regions produce very few mobile charge carriers at the gate-oxide interface, which effectively creates a thicker insulator between the gate and bulk. This implies that the off-state Q of the varactor is smaller than the on-state Q, as the off-state resistance, $R_{mos}$, is larger than the on-state $R_{mos}$. However, as stated in [53], this may not be observed, because other circuit effects can dominate. Specifically, the 7-nm FinFET PMOS varactors used in this work showed an increase in Q in the off-state.
The inversion-mode varactor differs from the accumulation-mode varactor in that the bulk connection is tied to the highest voltage in the circuit, as it normally would be for a PMOS transistor. Its capacitance characteristic is illustrated in Figure 4.2, marked I-MOS. This configuration shows a very distinct off-state capacitance, which is important for binary control of a varactor array. In [54] there is evidence that I-MOS varactors implemented using PMOS in n-well devices have more distinctly defined on/off states than accumulation-mode varactors. Additionally, in deep-submicrometer processes the linear range, more evident with accumulation-mode varactors, is compressed. This results in a very high Voltage Controlled Oscillator (VCO) gain ($K_{VCO} = \Delta f/\Delta V$) that would make this type of design susceptible to noise and operating point shifts. A further advantage of the I-MOS varactor becomes evident when the actual operation of the oscillator is examined more closely. Figure 4.2 was created using small-signal AC simulation, but oscillators operate in a large-signal regime. Therefore, the better-defined on/off states of the I-MOS varactor will yield more consistent (i.e., monotonic) behaviour than a similar accumulation-mode varactor implementation [53]. On this basis, it was decided that the oscillator would be a DCO with an array of PMOS varactors configured as I-MOS elements. A disadvantage of I-MOS operation is the off-state capacitance variation caused by supply voltage variation. This was overcome by using a low-noise external linear voltage regulator to supply the oscillator. This necessitates separate supply pins for the DCO, which is not unusual. Table 4.1 lists the on-state and off-state capacitance of inversion-mode varactors implemented using 7-nm PMOS svt (standard voltage threshold) core ($V_{DD} = 0.75$ V nominal) transistors.
This analysis was carried out using small-signal AC analysis on a single-finger transistor. In this process, transistor lengths are quantized to 8 nm, 11 nm, 20 nm, 36 nm, ..., and the fin count starts at two and increments by one. Scanning across the table, we see that on-state capacitance scales linearly with the number of fins for L = 11 nm (87 aF/fin), L = 20 nm (125 aF/fin) and L = 36 nm (167 aF/fin). Off-state capacitance also scales linearly with the number of fins, at approximately 38 to 39 aF/fin, and is nearly independent of length. On-state capacitance does not scale linearly with length. Table 4.2 shows the on/off capacitance ratio for the transistor lengths and fin counts listed in Table 4.1. This data reveals that the Con/Coff ratio is nearly independent of the number of fins and increases with length. In this limited data set, the Con/Coff ratio roughly doubles when the length increases by a factor of about four. In other words, it increases approximately with the square root of the increase in length.

**Table 4.1:** Varactor Capacitance/Finger (aF)

| Length (nm) | State | Fin = 2 | Fin = 3 | Fin = 4 | Fin = 5 | Fin = 6 | Fin = 7 | Fin = 8 |
|-------------|-------|---------|---------|---------|---------|---------|---------|---------|
| 8 | On | 161 | 242 | 322 | 401 | 477 | 549 | 616 |
| | Off | 77 | 115 | 154 | 192 | 230 | 268 | 304 |
| 11 | On | 177 | 266 | 354 | 442 | 529 | 613 | 694 |
| | Off | 77 | 116 | 154 | 193 | 231 | 269 | 307 |
| 20 | On | 251 | 377 | 502 | 628 | 752 | 877 | 1,000 |
| | Off | 78 | 117 | 156 | 195 | 234 | 273 | 312 |
| 36 | On | 336 | 503 | 671 | 838 | 1,005 | 1,171 | 1,338 |
| | Off | 79 | 118 | 158 | 197 | 236 | 276 | 315 |
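The per-fin slopes quoted above can be checked with an ordinary least-squares fit over the Table 4.1 data. The Python sketch below is a minimal illustration. The fitting method is a choice made here and not necessarily how the quoted values were obtained, so small differences are expected (for example, about 86 aF/fin rather than 87 aF/fin at L = 11 nm).

```python
# Table 4.1 data (aF per finger) for fin counts 2..8
FINS = [2, 3, 4, 5, 6, 7, 8]
C_ON = {
    8:  [161, 242, 322, 401, 477, 549, 616],
    11: [177, 266, 354, 442, 529, 613, 694],
    20: [251, 377, 502, 628, 752, 877, 1000],
    36: [336, 503, 671, 838, 1005, 1171, 1338],
}
C_OFF = {
    8:  [77, 115, 154, 192, 230, 268, 304],
    11: [77, 116, 154, 193, 231, 269, 307],
    20: [78, 117, 156, 195, 234, 273, 312],
    36: [79, 118, 158, 197, 236, 276, 315],
}

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

for length in sorted(C_ON):
    print(f"L = {length:2d} nm: on {slope(FINS, C_ON[length]):6.1f} aF/fin, "
          f"off {slope(FINS, C_OFF[length]):5.1f} aF/fin")
```

The off-state fits stay within roughly 38 to 39.5 aF/fin across all four lengths, while the on-state slope grows strongly with length. At L = 8 nm the on-state increments shrink at higher fin counts, which is consistent with that length being left out of the linear-scaling statement above.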
**Table 4.2:** Varactor Con/Coff Ratio per Finger (aF/aF)

| Length (nm) | Fin = 2 | Fin = 3 | Fin = 4 | Fin = 5 | Fin = 6 | Fin = 7 | Fin = 8 |
|-------------|---------|---------|---------|---------|---------|---------|---------|
| 8 | 2.09 | 2.10 | 2.09 | 2.09 | 2.07 | 2.05 | 2.02 |
| 11 | 2.29 | 2.30 | 2.30 | 2.29 | 2.29 | 2.28 | 2.26 |
| 20 | 3.22 | 3.23 | 3.22 | 3.22 | 3.22 | 3.22 | 3.21 |
| 36 | 4.26 | 4.26 | 4.26 | 4.25 | 4.25 | 4.25 | 4.25 |

Table 4.3 lists the varactor on-state and off-state Q values for the transistors of Table 4.1. Here we see that off-state Q is greater than on-state Q, and that Q increases with length. A possible reason is that at smaller lengths the gate resistance, rather than the channel loss, is the dominant loss. At small lengths (i.e., 8 and 11 nm), Q scales with $1/fin^2$. As length increases (i.e., 36 nm), Q scaling approaches $1/fin$, which is likely due to the channel loss becoming dominant. After the transistor type and varactor element design were determined, the values and trends in Tables 4.1, 4.2 and 4.3 were used to select the unit capacitance for the DCO frequency tuning array. Clearly, the 36 nm length devices gave the best Con/Coff ratio as well as the highest Q. At this point, additional consideration was given to process variation. That is, transistors with fin = 5 or 6 would yield a more stable design than transistors with fewer fins. At the same time, conserving Q in the varactor array is critical to minimizing PN. As a compromise, fin = 6 devices were selected as the array unit capacitance element, with fin = 5/6 devices used to create the $\Delta$-capacitance values required for fine frequency resolution. This is discussed in more depth in section 4.4.
**Table 4.3:** Varactor Q/Finger

## 4.4 Tuning Array Range and Resolution

The DCO frequency-tuning array architecture recommended in [6] implements three arrays:

- a coarse frequency array for PVT calibration (i.e., coarse tuning to eliminate frequency error due to PVT variation);
- a medium frequency array for acquisition;
- a fine frequency array for tracking after the PLL is locked.

During the locking process, tuning progresses through these arrays in the order presented. As the PLL state machine progresses to the next finer tuning array, the current array settings are frozen and the frequency is normalized.

The work described in this document takes a different approach to tuning array architecture and operation. This approach exploits the geometry of the 7-nm FinFET process to realize fine resolution and consistent circuit element matching. Specifically, a single array includes both coarse and fine tuning elements, and all elements are always available for tuning.

Here, the DCO frequency tuning array consists of rows of series varactor pairs connected across signals tank\_l and tank\_r of the tank resonator. There are eight varactor pairs in each row. Row loss and Q are discussed in section 4.6, and LC-tank interconnect loss is discussed in section 4.8. The state of each varactor pair is controlled through the varactor row Flip-Flop and driver control circuit illustrated in Figure 4.18.

Rows used for coarse frequency tuning, referred to as type I, were implemented with PMOS standard voltage threshold (svt) transistors of Length = 36 nm, Fin = 6 and Fingers = 2. Type I rows were optimized for frequency range per bit (i.e., maximum Con/Coff ratio), which runs counter to the fine resolution required to minimize phase noise. The remaining rows of the frequency tuning array consist of equal numbers of type II and type III rows (two of each in the final design), interleaved as in Figure 4.16.
Here, fine frequency tuning resolution is achieved through a $\Delta$-Capacitor [55,56] implementation that takes advantage of the linear relationship between FinFET capacitance and the number of transistor fins. These rows use varactors implemented with PMOS svt transistors of Length = 36 nm and Fingers = 1, with fins = 6 for type II and fins = 5 for type III.

Coincident varactor elements in adjacent type II/III rows are paired and operate with inverted states (i.e., if type II varactor n is on, type III varactor n is off, and vice versa). This is illustrated in Figure 4.3. Additionally, this modular implementation allows the type I, II and III rows to share a common layout that has been optimized for loss and Q (see sections 4.5 and 4.6). As with type I rows, the state of each $\Delta$-Capacitor is controlled through the varactor row Flip-Flop and driver control circuit illustrated in Figure 4.18.

Figure 4.3: Row Type II/III $\Delta$-Capacitor Configuration

The operation of a single type II/III varactor pair is described in Table 4.4. Here, we use typical varactor on/off capacitance values (i.e., type II Con = 493 aF/Coff = 118.7 aF; type III Con = 412 aF/Coff = 98.8 aF). These were determined using small-signal AC simulation, ignoring the parasitic capacitance of the row layout. It should be noted that these capacitance values differ from those listed in Table 4.1, which were simulated using earlier model parameters.

Because each element is a series pair of equal varactors, its capacitance is half that of a single varactor. With the $\Delta$-capacitor pair state = 0, total capacitance is 265.35 aF; with state = 1, total capacitance is 295.90 aF. Therefore, this circuit always exhibits at least 265.35 aF and increases by 30.6 aF when the control state is one. Using (B.10), this resolves 1.28 MHz with a nominal centre frequency of 28.0 GHz and a tank inductor of 96.8 pH. See Appendix B for the derivation of a closed-form equation relating frequency resolution to $\Delta$-capacitance.
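The 1.28-MHz figure can be reproduced to first order from the Table 4.4 values. This Python sketch uses the small-perturbation approximation $\Delta f \approx (f_0/2)\,\Delta C/C$, with C inferred from $f_0$ and L through the LC resonance equation. It is an assumed stand-in, not a transcription of (B.10), and, like the text, it ignores row-layout parasitics. The function names are hypothetical.

```python
import math

# Table 4.4 typical values (aF); the /2 models two equal varactors in series.
TYPE_II_ON, TYPE_II_OFF = 493.0, 118.7    # 6-fin varactor
TYPE_III_ON, TYPE_III_OFF = 412.0, 98.8   # 5-fin varactor


def delta_pair_capacitance(ctrl):
    """Capacitance (aF) of one type II/III Delta-capacitor pair for ctrl in {0, 1}."""
    if ctrl == 0:  # 5-fin on, 6-fin off
        return TYPE_III_ON / 2 + TYPE_II_OFF / 2
    return TYPE_III_OFF / 2 + TYPE_II_ON / 2  # 5-fin off, 6-fin on


def tank_capacitance(f0, inductance):
    """Total tank capacitance (F) that resonates with `inductance` (H) at f0 (Hz)."""
    return 1.0 / ((2 * math.pi * f0) ** 2 * inductance)


def frequency_step(delta_c, f0, inductance):
    """First-order frequency step (Hz): df ~= (f0 / 2) * dC / C."""
    return 0.5 * f0 * delta_c / tank_capacitance(f0, inductance)


if __name__ == "__main__":
    c0, c1 = delta_pair_capacitance(0), delta_pair_capacitance(1)
    print(f"state 0: {c0:.2f} aF, state 1: {c1:.2f} aF, delta = {c1 - c0:.2f} aF")
    print(f"C_tank = {tank_capacitance(28.0e9, 96.8e-12) * 1e15:.1f} fF")
    step = frequency_step((c1 - c0) * 1e-18, 28.0e9, 96.8e-12)
    print(f"resolution = {step / 1e6:.2f} MHz")
```

The implied tank capacitance is about 334 fF, so a 30.6-aF change is roughly 0.009 % of C. That corresponds to about 1.28 MHz at 28 GHz.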
Table 4.4: Type II/III Row Varactor Pair Delta Capacitance

| $\operatorname{ctrl}\langle n \rangle$ | 5 Fin | 6 Fin | Capacitance (aF) | Delta Capacitance (aF) |
|----------------------------------------|-------|-------|--------------------------|------------------------|
| 0 | On | Off | 412/2 + 118.7/2 = 265.35 | |
| 1 | Off | On | 98.8/2 + 493/2 = 295.90 | 295.90 - 265.35 ≈ 30.6 |

Once the varactor array resolution is established, the relationship between the type II/type III row $\Delta$-capacitance and the minimum incremental (unit) capacitance of the type I rows must be established. This is necessary to ensure that the varactor array can be controlled monotonically. That is, the total capacitance of the varactor array must increase/decrease in a linear manner as the array control word increases/decreases.

Figure 4.4 illustrates the typical capacitance of a single type I row. It shows that each row of the array has eight separate series varactor pairs, controlled by eight control signals (i.e., 1 - 8), plus a zero state in which all varactors are off. Cpara\_I = 5.545 fF is the capacitance of the row interconnect structure (i.e., the extracted layout). Coff\_tot is the capacitance contributed by the eight parallel varactor pairs in the off state. This additional 830 aF brings the parasitic capacitance of the row to 6.375 fF. As each individual varactor is turned on, the row capacitance is incremented by a unit capacitance of 374 aF.

Figure 4.4: Typical Capacitance - Row Type I

Figure 4.5 is similar to Figure 4.4, except that its parasitic and off-state capacitances represent two type II rows plus two type III rows (i.e., 2 x Cpara\_II + 2 x Cpara\_III = 2 x 5.2 fF + 2 x 5.3 fF ≈ 20.99 fF). The additional off-state capacitance, Coff\_tot, is calculated as follows:

- type II Coff/2 = 118.7 aF/2 = 59.35 aF;
- type III Con/2 = 412.0 aF/2 = 206.0 aF;
- their sum, multiplied by the 16 $\Delta$-capacitor pairs, equals 4.246 fF.

This brings the zero-state capacitance of the four rows to 25.236 fF.
The unit, or $\Delta$-, capacitance is calculated from Table 4.4. It has a range of 16 steps plus zero, where zero represents all $\Delta$-capacitor pairs in the zero state. Figure 4.5 shows that 12 x 30.6 aF = 367.2 aF, which is relatively close to the unit capacitance of a type I row, 374 aF. Therefore, 12 is used as the modulus of the varactor array.

Figure 4.5: Typical $\Delta$-Capacitance - Row Type II/III

The lowest DCO array capacitance, and thus the highest tank frequency, occurs when all varactors are off. Incrementing through the settings of the array starts by counting from zero to 11 using the type II/III $\Delta$-capacitors. The transition from 11 to 12 is realized by simultaneously turning off all the $\Delta$-capacitors and turning on the first type I varactor. This cycle is repeated until all the type I and all the type II/III varactors are turned on, which yields the lowest tank frequency.

It must be noted that the 1.82 % error between the type I row unit capacitance of 374 aF and the capacitance of 12 type II/III $\Delta$-capacitor steps, 367.2 aF, is valid only for small-signal simulation. That simulation used the typical process, nominal voltage (0.75 V), nominal temperature (27 °C) and extracted layout. Both the error and the slope of the row type II/III Con curve shown in Figure 4.5 will therefore change across PVT and extraction corners. The array must be characterized over all PVT and extracted corners to ensure monotonicity and adequate linearity. This is particularly important when frequency dithering or $\Sigma\Delta$-modulation [36,56] is used for frequency synthesis and phase noise mitigation.

Additionally, the frequency step size changes over the range of the varactor array. As the varactor array code increases, each capacitance step becomes smaller with respect to the sum of the parasitic capacitance and the array capacitance that has already been turned on. The frequency step size therefore decreases as the array code increases, and the frequency resolution around the centre frequency must be verified to be adequate.
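The counting scheme and the shrinking frequency step can be illustrated with a short model. This is a minimal sketch under stated assumptions:

- The combined code is split as `divmod(code, 12)` into type I varactors and $\Delta$-steps.
- The unit values are the typical 374 aF and 30.6 aF from Figures 4.4 and 4.5.
- The fixed capacitance `C_FIXED` (parasitics plus the zero-state array) is a hypothetical placeholder, not an extracted value.

The 96.8-pH inductance is the one used in the resolution example earlier in this section.

```python
import math

C_UNIT_I = 374.0e-18   # type I unit capacitance (F), typical, Figure 4.4
C_DELTA = 30.6e-18     # type II/III Delta step (F), typical, Table 4.4
MODULUS = 12           # Delta steps per type I unit


def array_capacitance(code, c_fixed, c_unit=C_UNIT_I, c_delta=C_DELTA, modulus=MODULUS):
    """Fixed plus switched capacitance for a combined array code (illustrative mapping)."""
    n_type_i, n_delta = divmod(code, modulus)
    return c_fixed + n_type_i * c_unit + n_delta * c_delta


def capacitance_steps(n_codes, c_fixed, **kw):
    """Capacitance increment between consecutive codes 0 .. n_codes."""
    return [array_capacitance(k + 1, c_fixed, **kw) - array_capacitance(k, c_fixed, **kw)
            for k in range(n_codes)]


def frequency(code, c_fixed, inductance, **kw):
    """LC resonance frequency (Hz) for a given code, as in (4.9)."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance * array_capacitance(code, c_fixed, **kw)))


if __name__ == "__main__":
    C_FIXED = 300e-15   # hypothetical fixed capacitance (F)
    L_TANK = 96.8e-12   # inductance used in the resolution example (H)
    steps = capacitance_steps(24, C_FIXED)
    print("monotonic:", all(s > 0 for s in steps))
    print(f"regular step {steps[0] * 1e18:.1f} aF, 11->12 rollover step {steps[11] * 1e18:.1f} aF")
    for code in (0, 1000):
        df = frequency(code, C_FIXED, L_TANK) - frequency(code + 1, C_FIXED, L_TANK)
        print(f"code {code}: frequency step = {df / 1e6:.3f} MHz")
    # Hypothetical corner where 11 Delta steps exceed one type I unit.
    bad = capacitance_steps(24, C_FIXED, c_unit=330e-18)
    print("monotonic at hypothetical corner:", all(s > 0 for s in bad))
```

With typical values, the 11-to-12 rollover step is 374 - 11 x 30.6 = 37.4 aF instead of 30.6 aF. This is a differential non-linearity of about 0.2 LSB, but the array is still monotonic. If a corner shrank the type I unit below 11 x 30.6 aF, the rollover step would turn negative. This is why the text calls for characterization over all PVT and extracted corners.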
The decrease in frequency step size with array code also affects the modulus error discussed previously. In addition, for a constant source current, $R_p$, and thus the tank amplitude $V_m$, decreases as the array code increases. Therefore, it must be verified that the oscillator has enough gain to start at all array codes, as well as at all PVT and extraction corners. Post-oscillator duty-cycle correction circuits should also be verified to operate adequately over all array codes.

## 4.5 Con/Coff Optimization

It is well understood that varactor capacitance is sensitive to PVT; however, it is also sensitive to oscillator amplitude. This is because a varactor is a voltage-dependent capacitor rather than a linear, fixed capacitor element. The general tendency is a reduction in on-state capacitance and an increase in off-state capacitance with increasing tank amplitude, which reduces the on/off capacitance ratio. This variation in varactor capacitance was determined from the ratio of the fundamental current to the tank amplitude, using large-signal Periodic Steady State (PSS) analysis rather than small-signal AC analysis.

Figure 4.6 illustrates the on-state capacitance of standard voltage threshold (svt) and ultra-low voltage threshold (ulvt) 7-nm PMOS varactors of one finger, six fins and 36 nm length, as amplitude (i.e., equivalent to tank Vp) is varied. Due to its lower threshold voltage, the ulvt device has less variation in on-state capacitance over amplitude.

Figure 4.6: On Capacitance (6 Fin, L = 36 nm, 1 Finger) svt, ulvt vs. Amplitude

The ratio of on/off-state capacitance is compared for svt and ulvt components in Figure 4.7. Here the on/off ratio of the svt device shows less variation than that of the ulvt device. This is because the svt device threshold voltage ($\approx 300$ mV) is closer to VDD/2. Thus, svt devices yield a larger on/off capacitance ratio (i.e., larger frequency step size) and ulvt devices achieve finer frequency resolution.
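The amplitude dependence can be illustrated by extracting an effective capacitance from the fundamental of the varactor current under sinusoidal drive, $C_{eff} = |I_1|/(\omega V_m)$. The sketch below is illustrative only. The smooth tanh C(V) characteristic and its parameters (`v_t`, `v_s` and the bias points) are hypothetical stand-ins for the 7-nm PMOS varactor model. The on/off end values reuse the typical type II numbers from Table 4.4, and a single-period DFT substitutes for the PSS analysis used in the text. No loss term is modelled, so the current is purely reactive.

```python
import math


def varactor_c(v, c_on=493e-18, c_off=118.7e-18, v_t=0.0, v_s=0.1):
    """Hypothetical smooth C(V): off-state well below v_t, on-state well above it."""
    return c_off + (c_on - c_off) * 0.5 * (1.0 + math.tanh((v - v_t) / v_s))


def effective_capacitance(c_of_v, v_dc, v_m, f0=28e9, n=4096):
    """C_eff = |I1| / (w * V_m) from the fundamental of i(t) = C(v) * dv/dt."""
    w = 2 * math.pi * f0
    a1 = b1 = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n
        v = v_dc + v_m * math.cos(theta)
        i = c_of_v(v) * (-v_m * w * math.sin(theta))  # displacement current
        a1 += i * math.cos(theta)
        b1 += i * math.sin(theta)
    i1 = 2.0 / n * math.hypot(a1, b1)                 # fundamental current amplitude
    return i1 / (w * v_m)


if __name__ == "__main__":
    for v_m in (0.05, 0.2, 0.5):
        c_on_eff = effective_capacitance(varactor_c, v_dc=0.5, v_m=v_m)
        c_off_eff = effective_capacitance(varactor_c, v_dc=-0.5, v_m=v_m)
        print(f"Vm = {v_m:.2f} V: Con_eff = {c_on_eff * 1e18:6.1f} aF, "
              f"Coff_eff = {c_off_eff * 1e18:6.1f} aF, ratio = {c_on_eff / c_off_eff:.2f}")
```

For a constant capacitor the function returns C exactly. For the nonlinear model, a larger swing pulls the on-state value down and the off-state value up, which compresses the on/off ratio as described above.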
In spite of their larger variation in on-state capacitance, svt devices were chosen for the DCO varactor array because of their smaller variation in on/off capacitance ratio.

**Figure 4.7:** On/Off Capacitance Ratio (6 Fin, L = 36 nm, 1 Finger) svt, ulvt vs. Amplitude

The sensitivity of varactor capacitance to PVT was characterized as follows:

- Temperature: Figure 4.8 shows the sensitivity normalized to 50 °C over -40 °C to 125 °C. From low to high temperature, Con varies from +1.9 % to -1.1 % and Coff from -0.7 % to +1.2 %.
- Voltage: Figure 4.9 shows the sensitivity over a ± 10 % variation (i.e., 0.75 V ± 75 mV), normalized to 0.75 V. Capacitance varies from -3.0 % to +2.5 %.
- Process: using slow/typical/fast component libraries, capacitance varies from -4.0 % to +4.2 %.

The total worst-case variations due to PVT are summed in Table 4.5. Although variation due to post-layout parasitic extraction is not included, these numbers are presented here as a check for the MATLAB® models demonstrated in section 4.8.

**Figure 4.8:** On/Off Capacitance (6 Fin, L = 36 nm, 1 Finger) Temperature Sensitivity

Figure 4.9: Varactor (6 Fin, L = 36 nm, 1 Finger) Voltage Sensitivity

| Variation | Con min (%) | Con max (%) | Coff min (%) | Coff max (%) |
|-------------|-------------|-------------|--------------|--------------|
| Process | -4.0 | +4.2 | -4.0 | +4.2 |
| Voltage | -3.0 | +2.5 | -3.0 | +2.5 |
| Temperature | -1.1 | +1.9 | -0.7 | +1.2 |
| Total | -8.1 | +8.6 | -7.7 | +7.9 |

Table 4.5: PVT Variation of Varactor Capacitance

## 4.6 Row Loss Minimization - Q Optimization

The minimum value of Q for the frequency tuning array rows is a critical result in the implementation of any DCO.
While the Q of individual varactor elements is quite good at 28 GHz, the loaded Q of each row is expected to be significantly degraded. Loaded Q here means the Q factor including resistive losses due to the row layout and interconnect, taken from extracted parasitics. This problem can be mitigated by minimizing the loss of the common row layout. An initial layout was created and extracted to produce an equivalent lumped-element circuit. This circuit was analysed, and various measures were taken to reduce its loss. The procedure was repeated several times until four candidate row layouts, labelled A, B, C and D, had been designed. Q results are listed by layout in Tables 4.6 and 4.7.

**Table 4.6:** Minimum Q Across Type I Row for A, B, C and D Layouts

| Layout | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| A | 97.67 | 46.32 | 31.98 | 25.30 | 21.42 | 18.89 | 17.10 | 15.76 | 14.71 |
| B | 68.20 | 35.08 | 25.15 | 20.33 | 17.44 | 15.50 | 14.11 | 13.05 | 12.21 |
| C | 63.96 | 34.22 | 24.58 | 19.84 | 17.01 | 15.12 | 13.77 | 12.75 | 11.93 |
| D | 49.52 | 24.75 | 17.55 | 14.13 | 12.14 | 10.83 | 9.89 | 9.19 | 8.45 |

Differential Q was found using s-parameter analysis and an equation set up in the simulator calculator. Table 4.6 lists the worst-case Q results, based on PVT variation and the RCworst\_CCworst extraction corner. The column headings (0 to 8) give the number of varactor elements switched on across the type I row. The table shows that the off-state Q is larger than the on-state Q, which is consistent with the analysis in section 4.3; as additional varactors are turned on, the Q of the row drops. It should be pointed out that, at this point in the development of the 7-nm process, it was unclear whether the transistor channel resistance was modelled accurately. Table 4.7 lists the type I row minimum and maximum Q values over process extremes.
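The text does not give the calculator equation used for differential Q. A common formulation, assumed here, has three steps:

1. Convert the two-port S-parameters to Y-parameters.
2. Form the differential-mode admittance $Y_{dd} = (Y_{11} - Y_{12} - Y_{21} + Y_{22})/4$.
3. Take $Q = \mathrm{Im}(Y_{dd})/\mathrm{Re}(Y_{dd})$.

The pure-Python sketch below checks this against a floating series RC, whose Q is known to be $1/(\omega RC)$. The R and C values are arbitrary test values, not extracted row data, and the helper names are hypothetical.

```python
import math

Z0 = 50.0  # reference impedance (ohm)


def _inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def _mul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def _eye_plus(m, sign):
    """Return I + sign * m for a 2x2 matrix."""
    return [[(1 if i == j else 0) + sign * m[i][j] for j in range(2)] for i in range(2)]


def s_to_y(s, z0=Z0):
    """Y = (1/z0) (I - S)(I + S)^-1 for a two-port."""
    y = _mul2(_eye_plus(s, -1), _inv2(_eye_plus(s, +1)))
    return [[v / z0 for v in row] for row in y]


def y_to_s(y, z0=Z0):
    """S = (I - z0 Y)(I + z0 Y)^-1 for a two-port."""
    a = [[z0 * v for v in row] for row in y]
    return _mul2(_eye_plus(a, -1), _inv2(_eye_plus(a, +1)))


def differential_q(s, z0=Z0):
    """Q of the differential-mode admittance seen between port 1 and port 2."""
    y = s_to_y(s, z0)
    y_dd = (y[0][0] - y[0][1] - y[1][0] + y[1][1]) / 4
    return y_dd.imag / y_dd.real


if __name__ == "__main__":
    f, r, c = 28e9, 2.0, 300e-15           # arbitrary floating series RC
    w = 2 * math.pi * f
    y_elem = 1 / (r + 1 / (1j * w * c))
    s = y_to_s([[y_elem, -y_elem], [-y_elem, y_elem]])
    print(f"Q from S-parameters = {differential_q(s):.3f}, 1/(wRC) = {1 / (w * r * c):.3f}")
```

A floating element between the two ports appears entirely in $Y_{dd}$, which is why this form suits a row connected across tank\_l and tank\_r. A real row extraction would also include single-ended parasitics to ground, which the differential combination partially cancels.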
For Table 4.7, simulations were carried out using slow/typical/fast process corners at AVDD extremes of 0.675 V and 0.825 V.

**Table 4.7:** Min/Max Type I Row Q for A, B, C and D Layouts Across Corners

| Layout | Minimum Q (RCworst\_CCworst, 125 °C) | Maximum Q (RCworst\_CCworst, 125 °C) | Minimum Q (RCbest\_CCbest, -40 °C) | Maximum Q (RCbest\_CCbest, -40 °C) |
|--------|-------|--------|-------|--------|
| A | 14.71 | 106 | 46.43 | 433.50 |
| B | 12.21 | 73.11 | 40.11 | 334.70 |
| C | 11.93 | 68.53 | 39.67 | 311.70 |
| D | 8.45 | 53.85 | 30.67 | 245.10 |

In summary, considering only Q, layout A was the best design and layout D the worst. The difference between the B and C layout Q values is marginal. However, additional considerations were brought to bear in section 4.8, using large-signal simulation results to build MATLAB® DCO models.

## 4.7 Class-C Oscillator Architecture

The Colpitts differential LC-tank oscillator architecture was selected as the core of this DCO. Reference [5] was used as a guide for this harmonic oscillator design, based largely on its PN performance. The key features of this design are listed here.

1. A large tail capacitance is a fundamental component that both provides a low-impedance noise path to ground and ensures class-C operation.
2. Core transistors are PMOS, selected to minimize noise, and operate principally in saturation with only slight excursions into weak-triode operation. This ensures LC-tank sine-wave fidelity, which minimizes the up-conversion of baseband noise.
3. DC gate biasing is used to enable a large oscillation amplitude while maintaining the core transistors in saturation. This improves the Signal-to-Noise Ratio (SNR) of the core oscillator.
The improved efficiency of class-C oscillator operation over more conventional designs results in reduced power dissipation for a given output level. However, the major objective in this design is to minimize the contribution to output PN made by the core transistors that supply the negative resistance required to establish oscillation. This is achieved by reducing the current conduction angle to approximately 120° and ensuring that the current pulse aligns with the peak of the voltage developed across the tank $R_p$ at resonance. That is, the oscillation current, together with the noise current originating from the oscillator core, is injected into the LC-tank only during the conduction angle (i.e., the portion of the oscillation period when current is flowing). Therefore, reducing the conduction angle from 180° to 120° both improves the efficiency and reduces the time during which noise is injected into the system.

To maintain or increase the LC-tank amplitude with a reduced conduction angle, an additional current source must be present to both increase the current amplitude and force class-C operation. This is achieved by installing a large tail capacitance, $C_{tail}$ of Figure 4.10. In this circuit, the current source provides a constant current to the negative resistance circuit. Consider half the differential period of operation. During the $60^{\circ}$ of $180^{\circ}$ in which the oscillator core transistor is not conducting, this constant current source charges $C_{tail}$. When the core transistor is on, current from both the constant current source and $C_{tail}$ flows, increasing $I_{ds}$ of the core transistor. This increases the LC-tank amplitude and reduces the time during which noise is injected into the circuit. Even though the current being injected into the LC-tank is not sinusoidal, the resulting voltage is sinusoidal.
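The benefit of a narrow conduction angle can be quantified for an idealized pulse shape. Assume each core transistor delivers rectangular current pulses of conduction angle $\theta$ with an average of $I_{bias}/2$. This is a simplification, since real class-C pulses are rounded. Under this assumption, the fundamental is $I_{\omega 0} = 2 I_{bias}\sin(\theta/2)/\theta$. This gives $(2/\pi) I_{bias}$ for 180° conduction and tends to $I_{bias}$ as $\theta \to 0$. The ratio of those two limits, $20\log_{10}(\pi/2) \approx 3.9$ dB, is consistent with the same-current PN improvement attributed to [5] in the following paragraph. The function names are hypothetical, and the numerical version only cross-checks the closed form.

```python
import math


def fundamental_ratio(conduction_deg):
    """I_w0 / I_bias for ideal rectangular pulses of the given conduction angle."""
    theta = math.radians(conduction_deg)
    return 2.0 * math.sin(theta / 2.0) / theta


def fundamental_ratio_numeric(conduction_deg, n=100_000):
    """Same quantity from a numerical cosine Fourier coefficient (I_bias = 1)."""
    theta = math.radians(conduction_deg)
    height = math.pi / theta  # pulse height giving an average of 1/2 per transistor
    a1 = 0.0
    for k in range(n):
        phi = -math.pi + 2.0 * math.pi * (k + 0.5) / n  # midpoint samples, one period
        if abs(phi) < theta / 2.0:
            a1 += height * math.cos(phi)
    return 2.0 * a1 / n


if __name__ == "__main__":
    for angle in (180, 120, 60, 10):
        print(f"{angle:3d} deg: I_w0/I_bias = {fundamental_ratio(angle):.3f} "
              f"(numeric {fundamental_ratio_numeric(angle):.3f})")
    print(f"narrow-pulse limit vs 180 deg: {20 * math.log10(math.pi / 2):.2f} dB")
```

At the 120° conduction angle targeted here, the rectangular-pulse model gives $I_{\omega 0} \approx 0.83\,I_{bias}$. That is already most of the way from the 180° value toward the narrow-pulse limit.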
The voltage remains sinusoidal because the filtering of the LC-tank limits the harmonic content to the fundamental component. This also ensures that the majority of the power is dissipated at the fundamental frequency and virtually no power is dissipated at harmonic frequencies. Reference [5] states that, for the same current consumption, the theoretical improvement in PN is 3.9 dB compared with more elementary differential LC-tank oscillator implementations. Additionally, $C_{tail}$ naturally filters out noise from the biasing current and does not expose this sensitive node to parasitic capacitances that could introduce large PN deterioration.

Consideration must also be given to the maximum size of $C_{tail}$, and [5] derives a closed-form equation for this value. When $C_{tail}$ is too large, the LC-tank voltage amplitude becomes unstable, an effect referred to as squegging. Generally, this appears as a periodic Amplitude Modulation (AM) of the LC-tank amplitude. This phenomenon is reproducible in simulation, and the value of $C_{tail}$ was reduced with significant margin to guard against squegging in the fabricated design.

Figure 4.10: Class-C Oscillator Core

Figure 4.10 is the schematic diagram of the class-C LC-tank oscillator core implemented in this work. Analogue High VDD (AHVDD), at the top of the diagram, is a nominal 1.5 V supply that feeds the constant current source, represented by transistor $M_{CS}$ and illustrated by Figure 5.2. This higher supply voltage was employed to ensure that both the current source and oscillator transistors remain in saturation throughout the oscillator period. Cen is the Current Enable signal of the constant current source, whose operation is discussed in section 5.1. $C_{tail}$ is connected between Analogue VSS (AVSS) and the common source of the PMOS negative resistance core transistors.
The drain nodes of these transistors are connected to the differential LC-tank nodes, $V_{tank\_L}$ and $V_{tank\_R}$. The block marked $C_{tune}$ represents the DCO frequency tuning array (see section 4.10), which forms a resonant circuit with inductor L (see section 4.11). $R_p$ represents the effective resistance of the lumped parallel LC components that make up the tank, across which the oscillator sine-wave output is generated at resonance. The centre tap of inductor L is connected to AVSS.

The voltage divider on the right-hand side of the figure creates a bias voltage that lifts the gate bias to increase the LC-tank amplitude while maintaining the core transistors in saturation. This is a simplified representation of a programmable DC bias circuit. The DC bias is capacitively isolated from the LC-tank, and these isolating capacitors form a voltage divider with the gate capacitance and associated parasitic capacitances. Ideally, following [5], the coupling factor between tank and gate should be K = 1. However, the size of these capacitors must be chosen carefully to minimize distortion of the oscillator signal. As the LC-tank voltage fed back to the opposite gate changes, the gate capacitance changes. This distorts the feedback voltage, which in turn distorts the LC-tank voltage. A significant effort was made to minimize this distortion, because it would deteriorate the Impulse Sensitivity Function (ISF), discussed in Appendix A, and increase up-converted thermal and flicker PN.

Figure 4.11 (created from [5]) shows three operating modes of the differential Colpitts oscillator. The top plot shows well-formed current pulses denoting class-C operation, which produces the tank amplitude $V_m$ given by (4.4).

$$V_m = I_{bias} \cdot R_p \tag{4.4}$$

This holds because the LC-tank fundamental current, $I_{\omega 0}$, is approximately equal to the bias current (4.5).
$$I_{\omega 0} \approx I_{bias} \tag{4.5}$$

The centre plot shows $I_{\omega 0}$ when the oscillator operates in a half-circuit class-A mode with $C_{tail}$ removed. Here the transistors conduct for half the oscillation period, resulting in an approximate square-wave current with 50 % duty cycle. Thus, the tank current is reduced to (4.6).

$$I_{\omega 0} \approx (2/\pi) \cdot I_{bias} \tag{4.6}$$

Figure 4.11: DCO Normalized Bias Current

The bottom plot shows $I_{\omega 0}$ when a class-C configuration with a large $C_{tail}$ is operated in deep-triode, where the tank current is reduced to (4.7).

$$I_{\omega 0} \approx 0.62 \cdot I_{bias} \tag{4.7}$$

In both half-circuit class-A and deep-triode operation, the reduction in $V_m$ is approximately 4.3 dB. For half-circuit class-A operation, the PN degradation is similar to this amplitude reduction. In deep-triode operation, however, the PN penalty is approximately 8.2 dB, which [5] attributes to increased transistor noise and distortion of the ISF. This information is presented here to emphasise that a large $C_{tail}$ can cause significant PN degradation when the oscillator is allowed to leave saturation and operate in deep-triode.

Figure 4.12 was re-created from [5] to illustrate the biasing used to increase $V_m$. This biasing ensures that the core transistors generally remain in saturation and are not pushed into breakdown. The signals in this figure were used in simulation, as illustrated, to adjust the oscillator biasing.

Figure 4.12: Class-C Oscillator Biasing

The left and right nomenclature refers to each side of the differential oscillator, as shown in Figure 4.10. Also, the level of the signal labelled $V_{gate\_Left}$ is the same as the drain voltage $V_{tank\_R}$, and $V_{gate\_Right}$ is the same as the drain voltage $V_{tank\_L}$.
Therefore, applying the equation from [5] (4.8) with AVDD = 0.75 V, $V_{bias} = 0.3$ V, $V_{th} = 0.25$ V and K = 1 limits the LC-tank amplitude to $V_m < 0.35$ V.

$$V_m < \frac{AVDD - V_{bias} + V_{th}}{1 + K} \tag{4.8}$$

This is an underwhelming result that rests on two limiting assumptions:

1. $V_{ds} < 0.75$ V. In Figure 4.10, AHVDD (nominal 1.5 V) is at the top of the series combination of current source and oscillator core. Therefore, $V_{source}$ is not limited to 0.75 V, but can be extended to the breakdown limit of the transistor, as long as both the current source and core transistor remain in saturation. This is achieved by adjusting the level of $V_{bias}$: as $V_{bias}$ is increased, $V_{source}$ is pushed closer to AHVDD and $V_m$ increases.
2. The core transistors remain strictly in saturation. In practice, only minimal PN deterioration occurs if they are allowed to move into moderate, but not deep, triode operation.

In this design, two catastrophic conditions must be guarded against:

1. At oscillator start-up, $V_{source}$ rises to a higher voltage than during normal operation as the oscillation builds from initial class-A operation to class-C operation. This can push the programmable constant current source transistors out of saturation and starve the core transistors of current. To overcome this condition, the gain of the core oscillator was designed to be approximately three.
2. The transistor voltages $V_{ds}$, $V_{gs}$ and $V_{gd}$ must be verified to be less than the breakdown voltage limit (i.e., < 1.0 V) across PVT.

In summary, the DCO core was implemented to generate a 28-GHz sine-wave signal of $V_m = 1$ V (2 V differential).

## 4.8 DCO Model Simulation

Tables 4.6 and 4.7 compare the minimum and maximum Q values of the four candidate row layouts discussed in section 4.6.
While this is a useful starting point in determining the optimum DCO array implementation, additional parameters such as tuning range, tuning range margin, loss, required current and array size must also be considered. A MATLAB® model was created to determine these parameter values across the DCO tuning array (see Appendix C for the model code). The model starts with the basic equation for LC-tank resonator frequency, $f_0$ (4.9). This equation was used to determine the oscillator frequency corresponding to each varactor array setting, where C is the tank capacitance for that setting and L is the required inductor value.

$$f_0 = \frac{1}{2\pi\sqrt{LC}} \tag{4.9}$$

In this initial model, only type I rows were considered. A vector of all array capacitance values, $C_{var}$, was created using (4.10), by programming the $C_{on}/C_{off}$ state of every varactor element.

$$C_{var} = m \left( C_{para} + n \times C_{off} \right) + k \times C_{on} \left( 1 - C_{off} / C_{on} \right) \tag{4.10}$$

Here m is the number of type I rows and n is the number of varactor elements in each row. k is the number of on-state varactor elements and takes the values $k = 0, 1, \dots, m \times n$, giving $m \times n + 1$ array settings.

The following capacitances were determined using large-signal PSS simulation: $C_{on}$, $C_{off}$, and the parasitic capacitances of the $pll\_dco\_gm$ block ($C_{gm}$), the $pll\_divider$ block ($C_{div}$) and the selected row layout ($C_{para}$). The simulation was run at two corners:

- worst-case PVT: the Slow NMOS/Slow PMOS (SS) process corner at low voltage and high temperature;
- best-case PVT: the Fast NMOS/Fast PMOS (FF) process corner at high voltage and low temperature.

The layouts were extracted at the RCworst\_CCworst and RCbest\_CCbest corners. Table 4.8 lists the resulting slow-worst and fast-best case capacitance values, which were used to quantify the full range of operation of the DCO.
**Table 4.8:** Large Signal Simulation Capacitance Results for Slow/Fast PVT and Layout Extraction RCworst\_CCworst/RCbest\_CCbest

| Circuit Element | Slow-Worst Capacitance (fF) | Fast-Best Capacitance (fF) |
|-----------------|-----------------------------|----------------------------|
| $C_{on}$ | 1.144 | 0.877 |
| $C_{off}$ | 0.26385 | 0.2124 |
| $C_{para}$ | 5.743 | 5.569 |
| $C_{div}$ | 21.51 | 12.86 |
| $C_{gm}$ | 120.1 | 91.5 |

A vector of total capacitance, C, for each array setting was created using (4.11).

$$C = C_{var} + C_{gm} + C_{div} \tag{4.11}$$

Using the Q values for the associated corners (e.g., Table 4.6), a row vector of series resistance elements, $R_{S\_C}$, associated with each varactor was created (4.12).

$$R_{S\_C} = \frac{1}{\omega Q C} \tag{4.12}$$

Each value of $R_{S\_C}$ was used to create a corresponding parallel resistance row vector, $R_{P\_C}$, from (4.13).

$$R_{P\_C} = \frac{CF}{(\omega C)^2 \times R_{S\_C}} \tag{4.13}$$

The correction factor $CF = 28 \times 10^9/f_0$ ensures that the resistance is calculated at the selected array frequency, $f_0$. The parallel resistance of the inductor, $R_{P\_L}$, was found from (4.14), assuming an inductor Q = 15. Combining it with the $R_{P\_C}$ vector gives a vector of total parallel resistance values, $R_{P\_total}$ (4.15).

$$R_{P\_L} = \omega L Q \tag{4.14}$$

$$R_{P\_total} = R_{P\_C} \parallel R_{P\_L} \tag{4.15}$$

The Q of every array state was then calculated using (4.16).

$$Q_{array} = 2\pi \left(28 \times 10^9 \times R_{P\_C} \times C_{var}\right) \tag{4.16}$$

The required oscillator current, $I_{req}$, across array settings was calculated using (4.17), assuming an amplitude of, e.g., $V_m = 500$ mV.

$$I_{req} = \frac{V_m \times \pi}{R_{P\_total}} \tag{4.17}$$

Conversely, $V_m$ can be computed for a specific oscillator current, $I_{req}$, across array settings using (4.18).
$$V_m = \frac{I_{req} \times R_{P\_total}}{\pi} \tag{4.18}$$

Tables 4.9, 4.10, 4.11 and 4.12 summarize the results generated by the MATLAB® model, row\_evaluation.m, included in Appendix C. Each table row lists the worst-case results of model runs using the selected number of tuning array rows. The selection of the final row layout assumed that a minimum frequency range, or margin, of ± 500 MHz around the 28-GHz centre frequency was required. That is, the tuning array needed enough range that both the slowest-worst case corner and the fastest-best case corner could select the 28-GHz centre frequency with at least 500 MHz to spare. This was achieved by selecting the inductance to give approximately equal margin around 28 GHz for both corners. The Array Setting column of the tables shows the setting required for a centre frequency of approximately 28 GHz, using typical parameters, w.r.t. the maximum array setting.

**Table 4.9:** Tuning Range and Loss at 28.0 GHz - Layout A (Slow - RCworst\_CCworst)

| Number of Rows | Inductor (pH) | Margin (± MHz) | Rp_C (Ω) | Rp (Ω) | Cvar (pF) | Ibias (mA) | Q | Array Setting |
|----|-------|-----|-------|--------|--------|------|------|---------|
| 18 | 98.5 | 350 | 424.1 | 161.16 | 0.3280 | 9.7 | 24.5 | 103/145 |
| 20 | 92.0 | 400 | 388.9 | 149.47 | 0.3513 | 10.5 | 24.0 | 112/161 |
| 22 | 86.5 | 450 | 363.9 | 140.27 | 0.3732 | 11.2 | 23.9 | 119/177 |
| 24 | 81.5 | 500 | 337.8 | 131.41 | 0.3965 | 12.0 | 23.9 | 128/193 |
| 26 | 76.6 | 600 | 309.0 | 122.20 | 0.4218 | 12.9 | 22.9 | 140/209 |

**Table 4.10:** Tuning Range and Loss at 28.0 GHz - Layout B (Slow - RCworst\_CCworst)

| Number of Rows | Inductor (pH) | Margin (± MHz) | Rp_C (Ω) | Rp (Ω) | Cvar (pF) | Ibias (mA) | Q | Array Setting |
|----|-------|-----|-------|--------|--------|------|------|---------|
| 18 | 103.0 | 400 | 368.1 | 156.35 | 0.3136 | 10.0 | 20.3 | 104/145 |
| 20 | 96.7 | 450 | 342.7 | 146.26 | 0.3339 | 10.7 | 20.1 | 111/161 |
| 22 | 91.0 | 500 | 318.3 | 136.88 | 0.3548 | 11.5 | 19.9 | 119/177 |
| 24 | 86.0 | 550 | 297.4 | 128.73 | 0.3758 | 12.2 | 19.7 | 127/193 |
| 26 | 81.5 | 600 | 280.3 | 121.70 | 0.3961 | 12.9 | 19.5 | 134/209 |

**Table 4.11:** Tuning Range and Loss at 28.0 GHz - Layout C (Slow - RCworst\_CCworst)

| Number of Rows | Inductor (pH) | Margin (± MHz) | Rp_C (Ω) | Rp (Ω) | Cvar (pF) | Ibias (mA) | Q | Array Setting |
|----|-------|-----|-------|--------|--------|------|------|---------|
| 18 | 105.7 | 420 | 375.5 | 160.05 | 0.3054 | 9.8 | 20.2 | 102/145 |
| 20 | 99.2 | 500 | 346.7 | 149.16 | 0.3256 | 10.5 | 19.9 | 110/161 |
| 22 | 93.6 | 550 | 324.2 | 140.20 | 0.3452 | 11.2 | 19.7 | 117/177 |
| 24 | 88.4 | 600 | 302.3 | 131.67 | 0.3654 | 11.9 | 19.4 | 125/193 |
| 26 | 83.9 | 650 | 285.0 | 124.60 | 0.3849 | 12.6 | 19.3 | 132/209 |

**Table 4.12:** Tuning Range and Loss at 28.0 GHz - Layout D (Slow - RCworst\_CCworst)

| Number of Rows | Inductor (pH) | Margin (± MHz) | Rp_C (Ω) | Rp (Ω) | Cvar (pF) | Ibias (mA) | Q | Array Setting |
|----|-------|-----|-------|--------|--------|------|------|---------|
| 18 | 112.1 | 500 | 282.3 | 144.44 | 0.2884 | 10.9 | 14.3 | 106/145 |
| 20 | 105.5 | 570 | 261.2 | 134.78 | 0.3064 | 11.7 | 14.1 | 114/161 |
| 22 | 99.6 | 630 | 243.0 | 126.28 | 0.3245 | 12.4 | 13.9 | 122/177 |
| 24 | 94.4 | 700 | 227.4 | 118.88 | 0.3425 | 13.2 | 13.7 | 130/193 |
| 26 | 89.8 | 750 | 214.8 | 112.67 | 0.3598 | 13.9 | 13.6 | 137/209 |

Table 4.12 shows that Layout D has the worst loss, or lowest Q, and uses the largest inductors. Layout A, Table 4.9, has the best loss, or highest Q, and uses the smallest inductors.
However, Layout A requires 24 rows and Layout B, Table 4.10, requires 22 rows to meet the $\pm$ 500 MHz tuning requirement. Therefore, Layout C, Table 4.11, was selected for the design because it requires only 20 rows and has a Q similar to that of Layout B. Its current requirement is also lower than that of Layouts B and D and similar to that of Layout A. Examples of the plots used here are presented in Appendix C.

It should be noted that the Q values listed in Tables 4.9, 4.10, 4.11 and 4.12 include the total row layout losses for the number of rows in each model, plus the losses for the nominal varactor array setting. Losses in the metal interconnect between the varactor array and the inductor were not included because they were small enough to be ignored. That is, $R_{lumped} \approx 55.2 \, m\Omega$, which is less than 2 % of the series resistance of the inductor (see Appendix D). Furthermore, this metal interconnect added 10.24 pH, or approximately 2 % of the total LC-tank inductance, which resulted in a $\pm 1\%$ resonant frequency variation in the 14-GHz implementation. Therefore, this self-inductance was ignored. Reference [47] states that parasitic inductance is generally not relevant below 20 GHz.

The 20 type I row design was modified to include two type II and two type III rows, working in concert, to implement fine tuning. Several row combinations were tested before the best compromise of 18 type I rows, two type II rows and two type III rows was selected. This selection was carried out using the MATLAB® model DCO\_tuning\_range\_evaluation.m, included in Appendix C, which plots the DCO output frequency against tuning array S values. The model includes all row types, plots multiple corners and also plots extracted circuit simulation results for comparison with the model results.

At this point the number of control codes can be determined. Eighteen type I rows of eight varactors each give a partial product of 144. The type II/III varactors further resolve these 144 codes to $144 \times 12 = 1728$ codes.
Assuming only 11 type II/III codes are used, the all-varactors-on state adds a further 11 codes at the end of the array, giving 1728 + 11 = 1739. Including the all-off code zero, the DCO tuning array has 1740 separate codes, 0 to 1739.

The large signal simulated capacitance values for $C_{on}$, $C_{off}$, $C_{para}$, $C_{div}$ and $C_{gm}$ were updated for the typical and extreme corners as listed in Table 4.13. The inductance value was tuned to 99.22 pH.

**Table 4.13:** Large Signal Capacitance for Worst, Typical and Best Corners

| Circuit Element | Slow - Worst Capacitance (fF) | Typical Capacitance (fF) | Fast - Best Capacitance (fF) |
|---|---|---|---|
| $C_{on}$ | 1.027 | 0.893 | 0.750 |
| $C_{off}$ | 0.266 | 0.257 | 0.231 |
| $C_{para}$ | 5.955 | 5.554 | 5.288 |
| $C_{div}$ | 16.09 | 15.93 | 15.6 |
| $C_{gm}$ | 111.4 | 107.8 | 101.6 |

Figure 4.13 illustrates the results the model generated for three corners, i.e., High Freq cbest\_ccbest, Typical and Low Freq cworst\_ccworst. It also shows two corners from circuit simulation, i.e., Sim High Freq cbest\_ccbest and Sim Low Freq cworst\_ccworst. A separate frequency calculation was carried out and plotted for all 1740 values of S in a very short time. By contrast, only six frequency values were evaluated from circuit simulations, which took much longer. A 28 GHz reference datum was included to improve readability. The tuning margins of 940 MHz at S = 0 and 660 MHz at S = 1720 show that the worst and best case corners meet the requirement of at least $\pm 500$ MHz.
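A behavioural sketch of the kind of calculation DCO\_tuning\_range\_evaluation.m performs is shown below. It is not that model. The sketch decomposes S with the Table 4.14 equations and evaluates the array capacitance with (4.20), (4.21) and (4.22) of section 4.10, using the Table 4.13 corner values. It then converts the result to frequency with the tuned 99.22 pH inductance. The sketch assumes that $C_{div}$ and $C_{gm}$ simply add to the array capacitance across the tank. This assumption is not taken from the thesis, so the absolute frequencies are indicative only. The function names are hypothetical.

```python
import math

# Large-signal capacitances from Table 4.13, in fF: (C_on, C_off, C_para, C_div, C_gm)
CORNERS = {
    "slow_worst": (1.027, 0.266, 5.955, 16.09, 111.4),
    "typical": (0.893, 0.257, 5.554, 15.93, 107.8),
    "fast_best": (0.750, 0.231, 5.288, 15.6, 101.6),
}
L_TANK = 99.22e-12  # tuned inductance quoted in the text (H)
S_MAX = 1739

def array_cap_ff(s, c_on, c_off, c_para):
    """Tuning-array capacitance in fF for control word S, per (4.20)-(4.22)."""
    a, b = divmod(s, 12)  # Table 4.14: A = S DIV 12, B = S MOD 12
    c = 12 - b            # Table 4.14: C = 12 - B
    dc = c_on - c_off
    c_i = a * dc + 18 * (c_para + 8 * c_off)                             # (4.20)
    c_ii = b * 0.5 * dc + 2 * (c_para + 4 * c_off)                       # (4.21)
    c_iii = c * 0.5 * (5 / 6) * dc + 2 * (c_para + 4 * (5 / 6) * c_off)  # (4.22)
    return c_i + c_ii + c_iii

def dco_freq_hz(s, corner="typical", l_tank=L_TANK):
    """Resonant frequency, ASSUMING C_div and C_gm add directly to the array C."""
    c_on, c_off, c_para, c_div, c_gm = CORNERS[corner]
    c_total = (array_cap_ff(s, c_on, c_off, c_para) + c_div + c_gm) * 1e-15
    return 1.0 / (2.0 * math.pi * math.sqrt(l_tank * c_total))

for name in CORNERS:
    f_hi, f_lo = dco_freq_hz(0, name), dco_freq_hz(S_MAX, name)
    print(f"{name}: {f_hi / 1e9:.2f} GHz at S=0, {f_lo / 1e9:.2f} GHz at S={S_MAX}")
```

One property does not depend on that assumption. Summing (4.20) to (4.22) shows that every increment of S, including the rollover from B = 11 to B = 0, changes the array capacitance by exactly $(C_{on} - C_{off})/12$. Because f varies as $1/\sqrt{C}$, the same capacitance step gives a slightly smaller frequency step at high S (low frequency). This agrees with the trend noted for Figure 4.13.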
Figure 4.13: DCO Tuning from Model and Simulation - 28 GHz

Although these tuning curves show a small reduction in frequency step at higher values of S (i.e., lower frequencies), the varactor elements demonstrate a relatively linear response over frequency. The switching between coarse and fine tuning array elements, which occurs on every $12^{th}$ step of S, does not show significant mismatch or non-monotonicity. However, little time was available to investigate this condition, and it was suspected that element mismatch may be more prominent in the fabricated circuit or in a Monte Carlo simulation. The close agreement between the modelled and simulated curves verifies that this model is useful for testing tuning array options and produces accurate results. Figure 4.14 plots the percent error w.r.t. the simulated results of the extreme corners plotted in Figure 4.13. The maximum error is approximately $\pm$ 0.9 %.

Figure 4.14: DCO Tuning Error - Model vs. Circuit Simulation Over Corners

The 14 GHz version of the DCO was designed using EM and circuit simulation of a sample of frequency points across the tuning range of S. Figure 4.15 illustrates the tuning range results at 14 GHz. These frequency tuning curves were generated in two ways, which gave almost identical results. In the first, the capacitance values of Table 4.13 were used and the inductor was tuned to produce frequency tuning curves with adequate margin. This resulted in an inductance value of L = 398 pH, somewhat different from the circuit simulation result of 500 pH. The second approach was to use L = 500 pH and tune the total capacitance to obtain the same curves. This resulted in a total capacitance of 0.795C, where C is the total capacitance of the LC-tank. This divergence between the circuit simulation and model results may be due to the large signal capacitance values being determined at 28 GHz rather than 14 GHz.
Additionally, process model parameters were evolving during this design, so model updates may also have contributed to this difference.

Figure 4.15: DCO Tuning from Model and Simulation - 14 GHz

The inductor quality factor is $Q_L=15$. The worst case varactor array quality factor is $Q_{C\_worst}\approx 20$ from Table 4.11, obtained using worst case extracted layout (i.e., using StarRC<sup>TM</sup> from Synopsys [57]). Combined in (4.19), they give an estimated worst case 28 GHz DCO $Q_{DCO\_worst} \approx 8.6$. Simulation results for the 14 GHz DCO revealed a Q of 10.

$$\frac{1}{Q_{DCO\_worst}} = \frac{1}{Q_L} + \frac{1}{Q_{C\_worst}} = \frac{1}{15} + \frac{1}{20} \approx \frac{1}{8.6} \tag{4.19}$$

As a final note on the 28 GHz oscillator, Cadence<sup>®</sup> PNOISE analysis showed that the typical flicker noise corner of the DCO occurs at a frequency offset of approximately 1 MHz, with a typical PN of -98 dBc/Hz. This simulation was not recorded for the 14-GHz DCO.

## 4.9 DCO Implementation

Figure 4.16 illustrates a functional block diagram of the DCO, excluding the LC-tank inductor. The differential LC-tank oscillates across the signals $tank\_l$ and $tank\_r$; see Appendix D for the LC-tank interconnect resistance and inductance analysis. The functional block at the top of the diagram, marked $pll\_dco\_gm$, is the oscillator core described in section 4.7. Below this is a thermometer-encoded frequency tuning array of 22 rows, each having eight varactor elements that can be switched on or off. The eight varactors of type I rows are each a single finger of six fins of length = 36 nm. The eight varactors of type III rows are each a single finger of five fins of length = 36 nm. There are 18 type I rows (numbered one to nine and 14 to 22), two type II rows (numbered 11 and 13) and two type III rows (numbered 10 and 12).
The type II and type III rows are interleaved to work in pairs, producing a $\Delta$-Capacitance for fine frequency resolution as detailed in section 4.4. These rows are placed in the physical centre of the array to reduce mismatch that could negatively affect frequency step monotonicity. The address field of each varactor of each row, labelled $array\_data < range >$, is listed on the left-hand side of the array. The addresses increase from the physical centre of the array out towards both array ends. This was done to balance mismatch in the array and to minimize frequency step size change across the programming range of the array.

Figure 4.16: DCO Core Functional Block Diagram

In section 4.4 it was determined that the type II/III varactors produce frequency steps that are $1/12^{th}$ of the type I unit frequency step size. Based on this relationship, the equations of Table 4.14 were derived to determine the tuning array Frequency Control Word (FCW), S. The 18 type I rows of eight varactors each give 144 array unit frequency settings. Each of these is divided by 12 to yield $144 \times 12 = 1728$ frequency settings. Including the type II/III array settings gives an additional 12 settings, for a total of 1728 + 12 = 1740. Therefore, S has 1740 unique frequency settings, defined as $S = \{0, 1, 2, ..., 1739\}$.

**Table 4.14:** Frequency Control Word (S) Derivation

| Row Type | Equation | Array Field | Number of Bits |
|---|---|---|---|
| I | $A = S \ DIV \ 12$ | $array\_data < 175:32 >$ | 144 |
| II | $B = S \ MOD \ 12$ | $array\_data < 31:16 >$ | 16 |
| III | $C = 12 - B$ | $array\_data < 15:0 >$ | 16 |

The equations of Table 4.14 give the varactor states required for a value of S. For example, if S = 655, then A = 655 DIV 12 = 54, B = 655 MOD 12 = 7 and C = 12 - 7 = 5. Therefore, when S = 655, 54 type I, seven type II and five type III varactors are set to an on-state; all other varactors are off.
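The Table 4.14 mapping can be expressed directly in code. The following Python sketch uses hypothetical function names. For a given S, it returns the number of on-state type I, II and III varactors. It also returns the net switched capacitance in units of one type I step, using the 1/2 series factor of (4.21) and the additional 5/6 fin factor of (4.22) from section 4.10.

```python
def fcw_to_varactors(s):
    """Return on-state counts (A, B, C) of type I, II and III varactors (Table 4.14)."""
    if not 0 <= s <= 1739:
        raise ValueError("S must lie in the range 0..1739")
    a, b = divmod(s, 12)   # A = S DIV 12, B = S MOD 12
    return a, b, 12 - b    # C = 12 - B

def switched_cap_units(s):
    """Switched capacitance in units of one type I step, (C_on - C_off).

    Type II elements count 1/2 (series pair); type III elements count
    1/2 * 5/6 (series pair of five-fin rather than six-fin devices).
    """
    a, b, c = fcw_to_varactors(s)
    return a + 0.5 * b + 0.5 * (5.0 / 6.0) * c

print(fcw_to_varactors(655))  # (54, 7, 5)
step = switched_cap_units(656) - switched_cap_units(655)
print(f"step = {step:.6f} type I steps")  # 1/12 = 0.083333
```

The sketch reproduces the S = 655 example. It also shows why each increment of S is 1/12 of a type I step: one more type II element turns on (+1/2) while one type III element turns off (-5/12).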
These equations are used in section 4.8 to develop a MATLAB® model of the DCO.

The frequency tuning array described above is divided into two sections: 144 unit varactors that determine the frequency tuning range and 12 fine tuning varactors that establish the frequency resolution. The approach recommended by [6] uses three arrays: a coarse frequency array for PVT calibration, a medium frequency array for acquisition and a fine frequency array for tracking after the PLL is locked. The tuning positions in the first two arrays are fixed and normalized as the PLL state machine progresses from PVT calibration to acquisition and then to tracking. This has the advantage of avoiding matching problems between arrays. This design instead made the array more compact by always giving the PLL state machine access to the full tuning range. The reasoning was that two factors would mitigate the mismatch between the two varactor sizes: the fine resolution achievable in the 7-nm process and the placement of the fine tuning rows in the physical centre of the array.

The functional block at the bottom of the frequency tuning array, labelled *pll\_divider*, is illustrated in Figure 4.17. This block has several functions. The Driver block isolates the LC-tank from the clock distribution circuits. Self-biasing inverter amplifiers convert the sine wave to a square wave, and series inverters provide current gain to drive the Clock Source and /4 blocks that follow. The /4 block divides the 14-GHz square wave output clock of the DCO by four to produce a 3.5-GHz feedback clock for the Bang Bang PLL (i.e., differential signals ck4 and ck4b). From this point on, only the 14-GHz implementation of the DCO for PAM-4 operation is considered, as the 28-GHz version had been abandoned. The Clock Source block has two functions.
It distributes the 14-GHz DCO output clock to two Low Voltage Differential Signal (LVDS) drivers. It also provides current gain, through a chain of inverters, so that the LVDS drivers are driven adequately over all PVT and extraction corners. The two LVDS drivers distribute the clock across differential $100-\Omega$ transmission lines to the five SERDES transceivers: three on the left-hand transmission line and two on the right-hand transmission line.

Figure 4.17: DCO Clock Buffer and Distribution Functional Block Diagram

Figure 4.18 shows the DCO varactor row control functional block diagram. The two blocks at the top of the diagram illustrate that each array row has an associated $Row\_FF$ block. This block controls the on/off state of the eight varactors of each row through the bus clk < 7:0 >. The expanded view of the $Row\_FF$ block shows that D Flip-Flops are employed to latch the row state. The $row\_clk$ signal is common to all 22 $Row\_FF$ blocks so that the states of all the varactors are updated simultaneously. The $Row\_FF$ blocks are located on the left-hand side of the frequency tuning array. All Flip-Flop outputs are buffered with an inverter from the digital library. This ensures that each varactor state change occurs promptly at the far side of each row across PVT and extracted corners. The inverter drives the common drain/source varactor node to AVDD $(0.75\ V)$ for $C_{on}$ and to AVSS $(0\ V)$ for $C_{off}$.

Figure 4.18: DCO Varactor Row Control Functional Block Diagram

A binary-to-thermometer conversion circuit, not shown here, interfaces the 22 Row\_FF blocks to the BBPLL state machine. Additionally, circuits were placed on the right-hand side of the frequency tuning varactor array to give the layout a degree of symmetry.

## 4.10 Description of Frequency Tuning Array Rows

Figure 4.19 is a schematic diagram of a type I row.
The PMOS varactor gates are connected to the LC-tank nodes $tank\_l$ and $tank\_r$, and all bulk (i.e., substrate) connections are common to AVDD. The varactor ctl < n > signals (i.e., shorted source to drain) are common across four devices. That is, two varactors of length = 36 nm, fins = 6 and fingers = 2 are connected in series across the LC-tank. The result is two times one half the capacitance of a single varactor. In Figure 4.19 each FinFET finger is represented by a separate transistor element.

Figure 4.19: DCO Type I Varactor Row

The total frequency tuning array capacitance associated with the type I rows $(C_{array\_I})$ is calculated using (4.20), which is derived from Figure 4.19.

$$C_{array\_I} = \sum array\_data [175:32] \times (C_{on} - C_{off}) + 18 \times (C_{para} + 8 \times C_{off}) \tag{4.20}$$

Here $\sum array\_data$ [175 : 32] is the A parameter of the FCW listed in Table 4.14, $C_{on}$ is the varactor on-state capacitance, $C_{off}$ is the varactor off-state capacitance, 18 is the number of type I rows, $C_{para}$ is the parasitic capacitance of the row layout and 8 is the number of varactor elements in each row.

Figure 4.20 is a schematic diagram of a type II row. The PMOS varactor gates are connected to the LC-tank nodes $tank\_l$ and $tank\_r$, and all bulk (i.e., substrate) connections are common to AVDD. The varactor ctl < n > signals are connected to the shorted source to drain signals that are common across two devices. That is, two varactors of length = 36 nm, fins = 6 and fingers = 1 are connected in series across the LC-tank, giving one half the capacitance of a single varactor. The grey varactor indicates an element that is not populated in the circuit, but the position does exist physically in the common row layout.

Figure 4.20: DCO Type II Varactor Row

The total frequency tuning array capacitance associated with the type II rows $(C_{array\_II})$ is calculated using (4.21), which is derived from Figure 4.20.
$$C_{array\_II} = \sum array\_data \left[31:16\right] \times \frac{1}{2} \times \left(C_{on} - C_{off}\right) + 2 \times \left(C_{para} + \frac{8}{2} \times C_{off}\right) \tag{4.21}$$

Here $\sum array\_data$ [31:16] is the B parameter of the FCW listed in Table 4.14. The factor 1/2 is required because the varactor capacitance is in series. $C_{on}$ is the varactor on-state capacitance, $C_{off}$ is the varactor off-state capacitance, 2 is the number of type II rows and $C_{para}$ is the parasitic capacitance of the row layout. Finally, 8/2 is the number of varactor elements in each row divided by two to account for the series connection.

Figure 4.21: DCO Type III Varactor Row

Figure 4.21 is a schematic diagram of a type III row. The PMOS varactor gates are connected to the LC-tank nodes $tank\_l$ and $tank\_r$, and all bulk (i.e., substrate) connections are common to AVDD. The varactor ctl < n > signals are connected to the shorted source to drain signals that are common across two devices. That is, two varactors of length = 36 nm, fins = 5 and fingers = 1 are connected in series across the LC-tank, giving one half the capacitance of a single varactor. The grey varactor indicates an element that is not populated in the circuit, but the position does exist physically in the common row layout. The total frequency tuning array capacitance associated with the type III rows $(C_{array\_III})$ is calculated using (4.22), which is derived from Figure 4.21.
$$C_{array\_III} = \sum array\_data \left[15:0\right] \times \frac{1}{2} \times \frac{5}{6} \times (C_{on} - C_{off}) + 2 \times \left(C_{para} + \frac{8}{2} \times \frac{5}{6} \times C_{off}\right) \tag{4.22}$$

Here $\sum array\_data$ [15:0] is the C parameter of the FCW listed in Table 4.14. The factor 1/2 is required because the varactor capacitance is in series, and 5/6 accounts for the reduced capacitance of the five fin varactor element. $C_{on}$ is the varactor on-state capacitance, $C_{off}$ is the varactor off-state capacitance, 2 is the number of type III rows and $C_{para}$ is the parasitic capacitance of the row layout. In the second term, 8/2 is the number of varactor elements in each row divided by two to account for the series connection, and the final 5/6 again accounts for the reduced capacitance of the five fin varactor element. Because the varactor elements were similar, this architecture made it possible to create a common layout for the type I, type II and type III rows. The row optimization for loss and parasitic capacitance is discussed in section 4.6. Variations of (4.20), (4.21) and (4.22) were used to calculate the LC-tank capacitance in the MATLAB® models discussed in section 4.8 and listed in Appendix C.

## 4.11 28-GHz and 14-GHz Inductor Designs

The layout of the 28-GHz inductor is shown in Figure 4.22. The inductor centre tap is located at the top of the diagram, where it extends to connect to AVSS. Progressing downwards from the centre tap, the inductor splits into two paths that form the single turn. These paths continue to the bottom of the diagram, where the two nodes, Tank L and Tank R, connect to the oscillator core transistor drains. The inductor is formed using two layers, AP (Al, thickness $t = 2.4 \mu m$) and M12 (Cu, thickness $t = 0.72 \mu m$), of width $w = 9.0 \mu m$. The AP layer is the thickest metal and is normally used to break out signals to die bump pads.
M12, the top metal layer and the next layer down from AP, is one of the thickest interconnect metal layers. It runs underneath the AP path to reduce the DC and AC, or High Frequency (HF), resistance. The area of the inductor is isolated by a 137.4 $\mu m$ x 137.4 $\mu m$ keep out region.

Figure 4.22: DCO Inductor Layout for 28.0 GHz Operation

The spiral inductor of Figure 4.22 was analysed using calculated, extracted and Electromagnetic Three-Dimensional (EM 3-D) simulation data. The EM 3-D data was generated with the PeakView EMD<sup>TM</sup> field solver software from Lorentz Solutions [58]. Initially, the inductance of the single turn spiral inductor was estimated using Wheeler's formula (4.23) [59].

$$L \approx \frac{9.4\mu \ n^2 \ a^2}{11d - 7a} \tag{4.23}$$

where $n = 1$ is the number of windings, $a = 62.8 \ \mu m$ is the average diameter and $d = 71.8 \ \mu m$ is the outer diameter, giving an inductance estimate of $L = 129 \ pH$. The field solver gave $L = 114.4 \ pH$, which is reasonably close to the original estimate. The inductor series DC resistance was calculated to be 231.7 $m\Omega$ using the kit $\Omega/\Box$ parameter values for the AP and M12 layers in the parallel combination used to create the winding. That is, the AP layer of 11 $m\Omega/\Box$ is in parallel with M12 of 22 $m\Omega/\Box$ over a length of 31.6 squares. At frequencies of 28 GHz and 14 GHz, the AC or HF resistance must be considered, including skin effect and current crowding in a rectangular conductor cross-section. The skin depth, $\delta$, of each conductor was calculated for both frequencies using (4.24).

$$\delta = \sqrt{\frac{1}{\pi f \mu_r \mu_0 \sigma}} \tag{4.24}$$

where the permeability of free space is $\mu_0 = 4\pi \times 10^{-7} \ H/m$, the relative permeability of Cu is $\mu_r = 0.999$, the relative permeability of Al is $\mu_r = 1.000$, the conductivity of Cu is $\sigma = 6.0 \times 10^7 \ S/m$ and the conductivity of Al is $\sigma = 3.5 \times 10^7 \ S/m$.
Table 4.15 lists, for each frequency and layer, the skin depth, the physical size of the smallest dimension of the conducting path (t), the ratio of conductor width to thickness (w/t) and the resulting current crowding factor ($K_C$), from [60]. Values of $K_C$ were read from graphs developed by [61,62] using the ratio of width to thickness.

**Table 4.15:** Skin Depth Comparison and Current Crowding Factor

| Frequency (width) | Layer (Material) | δ $(\mu m)$ | t $(\mu m)$ | w/t $(\mu m/\mu m)$ | $K_C$ |
|---|---|---|---|---|---|
| 28.0 GHz | AP (Al) | 0.508 | 2.4 | 3.75 | 1.3 |
| $(9.0 \ \mu m)$ | M12 (Cu) | 0.388 | 0.72 | 12.50 | 1.6 |
| 14.0 GHz | AP (Al) | 0.719 | 2.4 | 3.38 | 1.3 |
| $(8.1 \ \mu m)$ | M12 (Cu) | 0.549 | 0.72 | 11.25 | 1.6 |

Comparing $\delta$ to t in Table 4.15 shows a significant increase in HF resistance w.r.t. DC resistance in the AP conductor. The increase is much smaller in M12 at 28 GHz, and very small in M12 at 14 GHz. The ratio $R_{HF}/R_{DC}$ was found using (4.25) from [60], and the results are listed in Table 4.16.

$$R_{HF}/R_{DC} = \frac{K_C \cdot w \cdot t}{2(w+t)\delta} \tag{4.25}$$

Comparing $R_{DC}$ and $R_{HF}$ in Table 4.16 shows that at 28 GHz the AP resistance more than doubles, but the M12 resistance does not. At 14 GHz the AP resistance almost doubles and the M12 resistances are similar. This leads to the somewhat counter-intuitive result that the series resistances of the AP and M12 paths are similar at high frequency.
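The entries of Tables 4.15 and 4.16 follow from (4.24) and (4.25), plus a parallel combination of the AP and M12 paths. The Python sketch below recomputes them for the 28-GHz winding. The $K_C$ values are read from Table 4.15, because they come from the graphs of [61,62] and are not computed here. The sheet resistances and the 31.6-square length are those quoted above for the DC calculation. Function names are illustrative.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space (H/m)

def skin_depth_um(f_hz, sigma, mu_r=1.0):
    """Skin depth from (4.24), in micrometres."""
    return math.sqrt(1.0 / (math.pi * f_hz * mu_r * MU0 * sigma)) * 1e6

def rhf_over_rdc(k_c, w_um, t_um, delta_um):
    """HF-to-DC resistance ratio from (4.25); K_C is read from Table 4.15."""
    return k_c * w_um * t_um / (2.0 * (w_um + t_um) * delta_um)

def parallel(r1, r2):
    """Resistance of two paths in parallel."""
    return r1 * r2 / (r1 + r2)

# 28 GHz winding, w = 9.0 um; DC resistance = sheet resistance x 31.6 squares
d_ap = skin_depth_um(28e9, 3.5e7, 1.000)   # AP (Al)
d_m12 = skin_depth_um(28e9, 6.0e7, 0.999)  # M12 (Cu)
r_dc_ap, r_dc_m12 = 11e-3 * 31.6, 22e-3 * 31.6
r_hf_ap = rhf_over_rdc(1.3, 9.0, 2.4, d_ap) * r_dc_ap
r_hf_m12 = rhf_over_rdc(1.6, 9.0, 0.72, d_m12) * r_dc_m12

print(f"delta: AP {d_ap:.3f} um, M12 {d_m12:.3f} um")
print(f"R_S_DC = {parallel(r_dc_ap, r_dc_m12) * 1e3:.1f} mOhm")
print(f"R_S_HF = {parallel(r_hf_ap, r_hf_m12) * 1e3:.1f} mOhm")
```

The skin depths round to the 0.508 $\mu m$ and 0.388 $\mu m$ of Table 4.15. The parallel resistances come out at about 231.7 $m\Omega$ (DC) and 447.4 $m\Omega$ (HF), matching Table 4.16. With f = 14 GHz and w = 8.1 $\mu m$, the same functions reproduce the 1.67 and 0.96 ratios of the 14-GHz rows.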
The total series resistance of the parallel AP and M12 paths was calculated for both the DC $(R_{S\_DC})$ and HF $(R_{S\_HF})$ cases.

**Table 4.16:** HF Resistance Including Skin Effect and Current Crowding

| Frequency (width) | Layer (Material) | $R_{HF}/R_{DC}$ $(\Omega/\Omega)$ | $R_{DC}$ $(m\Omega)$ | $R_{HF}$ $(m\Omega)$ | $R_{S\_DC}$ $(m\Omega)$ | $R_{S\_HF}$ $(m\Omega)$ |
|---|---|---|---|---|---|---|
| 28.0 GHz | AP (Al) | 2.42 | 347.6 | 842.0 | | |
| $(9.0~\mu m)$ | M12 (Cu) | 1.37 | 695.2 | 954.4 | 231.7 | 447.4 |
| 14.0 GHz | AP (Al) | 1.67 | 386.0 | 646.1 | | |
| $(8.1~\mu m)$ | M12 (Cu) | 0.96 | 772.0 | 743.3 | 257.3 | 345.6 |

The 28 GHz $R_{S\_DC}$ and $R_{S\_HF}$ values in Table 4.16 are significantly lower than the corresponding values from 3-D EM simulation [58] listed in Table 4.17, i.e., $R_{S\_DC} = 344 \ m\Omega$ and $R_{S\_HF} = 1026 \ m\Omega$. This difference may be attributed to the fact that these calculated results include neither leakage nor proximity effect analysis.

**Table 4.17:** Inductor Field Solver Analysis at 28 GHz

| Parameter | Simulated Result |
|---|---|
| Inductance (L) | 114.4 pH |
| Series DC Resistance $(R_{s\_DC})$ | $344~m\Omega$ |
| Series HF Resistance $(R_{s\_HF})$ | $1026~m\Omega$ |
| Parallel HF Resistance $(R_{p\_HF})$ | $406~\Omega$ |
| Calculated Inductor Q | 20 |

The series parasitic resistance, $R_s$, of an inductor can be used to determine the parallel equivalent resistance, $R_p$. Setting the series and parallel Q equations equal to each other as in (4.26) gives (4.27).
$$Q = \frac{\omega L}{R_s} = \frac{R_p}{\omega L} \tag{4.26}$$

and

$$R_p = \frac{(\omega L)^2}{R_s} \tag{4.27}$$

Equations (4.26) and (4.27) were used to verify the HF resistance from simulation. They also gave the approximate Q value of 20 from the simulated inductance and series HF resistance. The HF resistance was used here because the DC resistance does not include all the losses that are present at frequency and degrade the Q value. The DC resistance of the inductor was determined over extracted corners using StarRC<sup>TM</sup> from Synopsys [57] and is listed in Table 4.18.

**Table 4.18:** Extracted DC Resistance by Corner

| Extracted Corner | DC Resistance $(m\Omega)$ |
|---|---|
| Typical 50°C | 237 |
| Typical 125°C | 298 |
| rcworst CCworst 125°C | 331 |
| rcbest CCbest 125°C | 262 |

The first two rows of Table 4.18 show that the temperature coefficient is approximately 0.3 %/°C and thus can be ignored. In additional testing, the DC resistance varied by 43 % over the temperature range -20°C to +125°C. This is consistent with a coefficient of 0.3 %/°C over that 145°C span. The last two rows show a $\pm 10$ % variation from best to worst case extracted corners. Therefore, these extremes need to be included in the overall corner analysis of the DCO.

As discussed in section 4.2, the centre-frequency requirement of the DCO was changed from 28 GHz to 14 GHz. The most direct way to implement this change was to increase the inductor to $L \approx 500~pH$, as estimated using (4.23). This was achieved by adding a second turn to the existing inductor and flipping the centre tap connections to the bottom of the layout, using M11 (Cu). The layout of the 14 GHz inductor is shown in Figure 4.23. The centre tap splits into two paths to ensure a symmetrical connection to AVSS underneath Tank\_L and Tank\_R, which connect to the oscillator core transistor drains as described previously.
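Equations (4.26) and (4.27) reduce to two lines of code. The minimal Python sketch below uses a hypothetical function name. It applies the equations to the field-solver values of Table 4.17 at 28 GHz and Table 4.19 at 14 GHz. The 14-GHz inductance is taken as exactly 500 pH, although Table 4.19 gives it only approximately.

```python
import math

def inductor_q_and_rp(l_h, r_s_ohm, f_hz):
    """Series-form Q from (4.26) and equivalent parallel resistance from (4.27)."""
    x_l = 2.0 * math.pi * f_hz * l_h   # inductive reactance (ohm)
    q = x_l / r_s_ohm                  # Q = wL / R_s
    r_p = x_l ** 2 / r_s_ohm           # R_p = (wL)^2 / R_s
    return q, r_p

q_28, _ = inductor_q_and_rp(114.4e-12, 1.026, 28e9)    # Table 4.17 (28 GHz)
q_14, rp_14 = inductor_q_and_rp(500e-12, 2.932, 14e9)  # Table 4.19 (14 GHz), L taken as 500 pH
print(f"28 GHz: Q = {q_28:.1f}")
print(f"14 GHz: Q = {q_14:.1f}, R_p = {rp_14:.1f} ohm")
```

At 28 GHz this gives Q of about 19.6, consistent with the rounded Q of 20 in Table 4.17. At 14 GHz it gives Q of about 15.0 and $R_p$ of about 659.8 $\Omega$, matching Table 4.19.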
The inductor was implemented using two layers, AP (Al) and M12 (Cu), of 8.1 $\mu m$ width. Adding a second turn to form the new inductor kept its area small and had only a small impact on the inductor Q. It also preserved the keep out area used for the 28-GHz layout.

Figure 4.23: DCO Inductor Layout for 14 GHz Operation

The initial physical size of the 14-GHz inductor was estimated to within 10 % of the required inductance value using (4.23). It was then fine-tuned using the 3-D EM field solver EMX<sup>®</sup> [63], which had recently become available to the design team. The EMX<sup>®</sup> field solver gave a Q of 15. This minimal degradation in Q from the 28-GHz design was attributed to the reduced skin effect in the AP conductor at 14 GHz, which partly compensated for the added resistance of the new turn. The parameter values for the two-turn inductor are listed in Table 4.19. The increase in $R_{p\_HF}$ over the previous design allows the oscillator current to be reduced while developing the same $V_m$ amplitude across the tank at resonance. However, reducing the current also reduces the negative resistance of the core transistors, and care must be taken that this does not make oscillator start-up marginal.

**Table 4.19:** Inductor Field Solver Analysis at 14 GHz

| Parameter | Simulated Result |
|---|---|
| Inductance (L) | ≈ 500 pH |
| Series HF Resistance $(R_{s\_HF})$ | $2.932~\Omega$ |
| Parallel HF Resistance $(R_{p\_HF})$ | $659.8~\Omega$ |
| Inductor Q | 15 |

The tank\_L and tank\_R signals run from the spiral inductor, across the pll\_dco\_gm gain block and frequency tuning array, and terminate in the $pll\_divider$ block. They use layer M12 conductor legs of width 9.0 $\mu m$ for the 28.0-GHz implementation and 8.1 $\mu m$ for the 14.0-GHz implementation. The parasitics these conductors add to the LC-tank were quantified in Appendix D.
The conclusion here was that these inductive and resistive parasitics could be safely ignored.

## 4.12 Summary

The ring-oscillator DCO design was considered initially, in spite of its smaller area it was dropped in favour of an LC-tank harmonic oscillator. The decision was motivated by the higher Q of the LC-tank design, which is expected to outperform the ring oscillator by 20 dB [23]. Furthermore, the class-C LC-tank architecture with PMOS core transistors supports several options for reducing PN, which made it an excellent choice for this first 7-nm FinFET design. I-MOS varactors implemented with PMOS transistors were evaluated and selected as the most appropriate tuning array elements, based on noise performance and binary operation. A coarse/fine (i.e., single fin) tuning array combination was implemented, because matching between varactor elements of different sizes was considered feasible in this geometry. A synopsis of the LC-tank harmonic oscillator noise sources was presented in Appendix A. Appendix B presented the derivation of a closed-form equation for the array capacitance resolution required to tune a minimum frequency step. This equation was consistent with [6] and is accurate near the oscillator operating frequency.

Conventional circuit simulation and co-simulation with EM field solvers are time consuming and resource intensive when applied to a DCO with a large frequency control word range. Therefore, it was decided to create faster-running system models of the DCO in MATLAB® to accelerate the design process. A significant amount of work was devoted to understanding the performance and parasitics of each modelled circuit block of the DCO. Large signal analysis, analysis of post layout interconnect parasitics and EM simulation were used over PVT and extracted corners to ensure accuracy.
The programmability and fast run time of these models made it simple to design a frequency tuning array with adequate margin. They also made it simple to determine the ranges of oscillator amplitude, tank loss, Q and current consumption. The fidelity ($\pm$ 0.9 %) of this modelling was demonstrated by comparing spot frequency results from circuit simulation with those generated by the DCO models. This DCO was designed to operate at 14 GHz with a nominal amplitude of 1.0 Vp from a current source of approximately 10.0 mA across corners. It has a frequency resolution of 2.0 MHz, a tuning range of approximately 2.0 GHz and a Q of approximately 10. This Class-C design is expected to yield a PN improvement of better than 3.9 dB over more elementary differential LC-tank oscillator implementations [5]. Further noise reduction through source degeneration is discussed in section 5.4.

## Chapter 5

## DCO Current Source

### 5.1 Introduction

The following list identifies the advancements made through the development of the DCO current source.

1. Current source flicker noise reduction through resistive source degeneration was previously demonstrated for planar MOSFET devices [64–67]; it was demonstrated here using FinFET devices. This refers to the degeneration of transistor gm, which sets the gain for both the input signal and the input-referred flicker noise voltage. This noise limiting function extended to the core transistors of the Class-C oscillator [5] because they formed a cascode circuit. Therefore, the current source transistors also limited the noise that can be produced by the oscillator core transistors.
2. Resistive source degeneration also reduced both flicker and thermal noise by limiting the drain current. The source degenerating resistive element contributed thermal noise to the circuit, but this noise level was too low to be of concern.
3. A closed-form solution (5.19) that quantifies this noise reduction behaviour was derived and verified through simulation.
4. PMOS transistor degeneration combined with the use of long channel devices reduced the current source flicker noise by approximately 7 dB in the 7-nm FinFET implementation discussed in this work.

Beyond the measures taken to minimize the flicker and thermal noise of the DCO current source, this chapter discusses the modularity, implementation, control and calibration of the Class-C oscillator current source. It begins with a description of the current source design and its calibration. The calibration function guarantees that the oscillator receives enough current to start and operate over all PVT and extracted corners. Additionally, care must be taken not to supply too much current to the circuit, as this will push the core transistors into breakdown.

### 5.2 Constant Current Source Implementation

Figure 5.1 is a block diagram of the DCO current source calibration circuit. Its purpose is to supply a consistent current level to the oscillator despite PVT variations. The current calibration algorithm is realized by a State Machine that starts after power up and may also run during oscillator operation to compensate for die temperature changes.

Figure 5.1: Oscillator Current Level Calibration Block Diagram

The State Machine controls the output current level of the Degenerated Selectable PMOS Current Source block through a Binary to Thermometer Decoder block. This decoder converts a nine-bit binary bus to a 21-bit thermometer-encoded bus. During calibration the output current is fed to the Clocked Comparator block, as well as to a chip PAD that is terminated with an off-chip precision resistor. When signal $dco\_cs\_cal$ is enabled and signal $dco\_ca\_en$ is disabled, this block compares the voltage developed across the external resistor with an internally developed reference voltage. The value of the external resistor mimics the DCO tank parallel equivalent resistance at resonance.
When the voltage developed across the external resistor exceeds the reference voltage, the $trigger\_out$ signal becomes active to indicate to the State Machine that the correct current setting has been found. At this point the $dco\_cs\_cal$ signal is disabled and the $dco\_ca\_en$ signal is enabled to switch the output current from the calibration loop to the $ibias\_gen$ input of the Class-C PMOS Oscillator.

Figure 5.2 illustrates the parallel current sources of the Degenerated Selectable PMOS Current Source block of Figure 5.1. The large FinFETs, m22 and m23 (i.e., 20 Fins x 40 Fingers, with m = 2 and L = 86 nm), are switches that direct current either to the calibration loop or to the sources of the $-g_m$ transistors of the DCO. FinFETs m1 - m21 are configured as degenerated constant current sources that sum at the current\_sum node. To reduce flicker noise and increase the current source output resistance, all of these devices have L = 160 nm. The least significant (smallest) current source path is m21 (4 Fins x 4 Fingers, with m = 1), which supplies 205 $\mu$A. The next four current sources, m17 - m20 (4 Fins x 8 Fingers, with m = 1), each deliver 410 $\mu$A, twice that of m21. There are 16 of the largest current sources, m1 - m16 (4 Fins x 8 Fingers, with m = 4), each supplying 1.64 mA, eight times the m21 current and four times that of each m17 - m20 path. It should be noted that this current source was over-designed to provide margin against unforeseen issues with this new process. These devices are connected to the 1.5-V AHVDD supply through source-degenerating resistors that limit the gm and thus the flicker noise, as discussed in section 5.4. This higher-than-core supply voltage was used to ensure that m1 - m21 would remain in saturation. Therefore, voltage conversion circuits (i.e., $AVDD = 0.75 \ V$ to $AHVDD = 1.5 \ V$) were added to the 21 enable lines that determine the output current value.
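The segmented current array and the calibration search described above can be sketched in a few lines of Python. This is a minimal behavioural sketch under stated assumptions, not the State Machine implementation: the mapping from a calibration code (counted in units of the 205-µA m21 current) to the enable lines of m21, m17 - m20 and m1 - m16, the upward linear search, and the example values of the external resistor (100 Ω) and reference voltage (1.0 V) are all hypothetical. Currents are kept in integer microamps so that the comparator decision is exact.

```python
# Hypothetical behavioural model of the segmented current source of Figure 5.2
# and of an upward linear calibration search (not the actual State Machine).
UNIT_UA = 205                          # m21 current (uA): one LSB unit
N_X2, N_X8 = 4, 16                     # m17-m20 (2 units each), m1-m16 (8 units each)
MAX_CODE = 1 + 2 * N_X2 + 8 * N_X8     # full scale: 137 units (about 28.1 mA)


def segments_for_code(code):
    """Hypothetical decoder: code in LSB units -> (m21 on, # of m17-m20 on, # of m1-m16 on)."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError(f"code must be in 0..{MAX_CODE}")
    n8 = min(code // 8, N_X8)          # fill the largest sources first
    rem = code - 8 * n8
    n2 = min(rem // 2, N_X2)
    n1 = rem - 2 * n2                  # always 0 or 1 for a valid code
    return n1, n2, n8


def current_ua(code):
    """Summed output current at the current_sum node, in microamps."""
    n1, n2, n8 = segments_for_code(code)
    return UNIT_UA * (n1 + 2 * n2 + 8 * n8)


def calibrate(r_ext_ohm, v_ref_uv):
    """Raise the code until I * R_ext reaches V_ref, i.e. until trigger_out would fire."""
    for code in range(MAX_CODE + 1):
        i_ua = current_ua(code)
        if i_ua * r_ext_ohm >= v_ref_uv:   # uA * ohm = uV
            return code, i_ua
    raise RuntimeError("reference voltage not reached at full-scale current")


if __name__ == "__main__":
    code, i_ua = calibrate(r_ext_ohm=100, v_ref_uv=1_000_000)
    print(code, i_ua, segments_for_code(code))
```

With these illustrative values the search stops at code 49, i.e. 10.045 mA, close to the nominal 10-mA oscillator current; a real implementation might instead use a binary (successive-approximation) search to shorten calibration time.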
Care was also taken to ensure that no transistor breakdown voltage limit (i.e., $V_{ds}$, $V_{gs}$ and $V_{gd}$ each below 1.1 V) was exceeded over PVT and extracted corners.

Figure 5.2: Degenerated Selectable PMOS Current Source Implementation

The rationale for using source degeneration to reduce flicker noise in FinFET current source transistors is twofold. First, degenerating the gm of the current source transistor also reduces its noise gain. Second, the current source transistor and the $-g_m$ transistors of the DCO core are stacked to form a cascode circuit. Therefore, the noise current of the circuit is limited by the degenerated transistor, and the noise produced by the core devices is significantly attenuated. This is demonstrated through theoretical analysis and simulation in section 5.4.

### 5.3 Thermal and Flicker Noise in FinFET Devices

The physics of noise in FinFET devices is similar to that in planar MOSFET devices. Thus, the BSIM4 compact model expressions [68] developed for planar MOSFETs have been adopted for FinFETs, with some modifications (e.g., mobility parameter values) [69]. Sources of noise that may prove significant in applications of 7-nm FinFET devices are as follows.

**1. Channel Noise.** This thermal noise (also Johnson or Nyquist noise) is a function of temperature and of the conductance (resistance) of the channel, and is caused by the random thermal motion of electrons and holes in the channel [69]. It is quantified as a noise current by equation (5.1) and illustrated in Figure 5.3.

$$I_{nch}^2 = 4kT\gamma g_m \tag{5.1}$$

where k is the Boltzmann constant (i.e., $1.38064852 \times 10^{-23} \ JK^{-1}$), T is absolute temperature in K, $\gamma$ is a transistor coefficient (e.g., $\gamma = 1.0$ for short-channel devices) and gm is the transconductance of the transistor in A/V.
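As a quick numerical illustration of equation (5.1), the sketch below evaluates the channel noise current for a hypothetical transconductance of 1/420 S (the value implied later in section 5.4 by $R_S = 1/gm = 420\ \Omega$ in the 65-nm test circuit), at an assumed temperature of 300 K with $\gamma = 1.0$. The Boltzmann constant is the value quoted above.

```python
import math

K_B = 1.38064852e-23  # Boltzmann constant (J/K), as quoted in the text


def channel_noise_psd(gm_s, temp_k=300.0, gamma=1.0):
    """Eq. (5.1): drain thermal-noise current PSD, I_nch^2 = 4 k T gamma gm (A^2/Hz)."""
    return 4.0 * K_B * temp_k * gamma * gm_s


gm = 1.0 / 420.0                    # hypothetical transconductance (S)
i2 = channel_noise_psd(gm)          # A^2/Hz
i_density = math.sqrt(i2)           # A/sqrt(Hz)
print(f"I_nch^2 = {i2:.3e} A^2/Hz  ({i_density * 1e12:.2f} pA/sqrt(Hz))")
```

This gives roughly $3.9 \times 10^{-23}$ A²/Hz, or about 6.3 pA/√Hz. Because the PSD is linear in gm, halving gm through degeneration halves it (a 3-dB reduction), which is the thermal-noise result used in section 5.4.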
Figure 5.3: Channel Noise Circuit

This noise may also be expressed as a squared noise voltage by multiplying $I_{nch}^2$ by the square of the intrinsic output resistance, $r_o^2$. It should be noted that $r_o$ is a product of the transistor characteristics and does not represent a physical element. Therefore, while a noise voltage may be expressed across it, it does not produce any intrinsic noise. FinFET devices tend to have a lower output conductance than planar devices of similar geometry. Therefore, FinFET $r_o$ values and the resulting voltage gains tend to be larger [70].

**2. Gate Resistance.** This is thermal noise due to the sheet resistance and geometry of the channel material.

**3. Source and Drain Interconnect Resistance Noise.** This noise is not normally considered for larger-geometry transistors. However, at more aggressive geometries the thermal noise components $V_{nRD}^2$ and $V_{nRS}^2$, incurred in device interconnect, may become more significant (see Figure 5.4). These parasitic components are external to the intrinsic transistor device and may be lumped together with external drain and source resistances. Therefore, in addition to being random noise sources, they are also parasitic to the analogue signal gain. Carrying this point further, $R_S$ will degenerate the transconductance, gm, which in turn reduces the signal and noise conversion from the gate to the channel [14,70]. Therefore, in some circumstances this $R_S$ may reduce the effective circuit noise. A noise or signal voltage at the intrinsic transistor drain will see a voltage divider formed by the parasitic and any intentional external drain resistance. Generally, this combination exists in parallel with $r_o$, which will further modify the output noise.

Figure 5.4: Source and Drain Interconnect Noise Circuit

**4. Gate Interconnect Resistance Noise.** This is thermal noise caused by the gate resistance and is commonly described as an input-referred noise voltage, as shown in Figure 5.5.
To achieve a desired output current, a multi-element transistor, with accompanying parasitic gate interconnect resistances, will often be necessary. When the transistor elements are structured uniformly, it is possible to model the device as a single element with resistance $R_{Gtot}$ and thermal noise $V_{nGtot}^2$. If the number of transistor elements is $n$, with $n \geq 32$, and $R_{G1} = R_{G2} = \cdots = R_{Gn} = R_G/n$, it can be shown that the total distributed gate resistance is $R_{Gtot} = R_G/3$ [71]. As with flicker noise, a noise power, $V_n^2$, at the transistor gate will generate a noise power, $I_n^2$, in the transistor channel through the square of the transistor transconductance, $g_m^2$, assuming that $g_{m1} = g_{m2} = \cdots = g_{mn} = g_m/n$. Therefore, the following equation for channel noise due to gate resistance was developed (5.2).

Figure 5.5: Gate Interconnect Noise Circuit

$$I_{nG}^2 = 4kTR_{Gtot}\,g_m^2 = (4kTR_G/3)\,g_m^2 \tag{5.2}$$

As with the channel noise shown in Figure 5.3, gate thermal noise may also be referred to the transistor output using equation (5.3) below.

$$V_{nG}^{2} = (4kTR_G/3)(g_{m}r_{o})^{2} \tag{5.3}$$

Normally, with planar transistors of larger geometries, $R_{Gtot}$ was small relative to the channel resistance, $1/g_m$, and could be ignored. However, the fine interconnect pitch used in aggressive geometries like 7 nm results in an $R_{Gtot}$ of the same order as $1/g_m$, which therefore cannot be ignored [72]. In [73] it was demonstrated that this parasitic significantly increased the output delay of ring oscillators, thus reducing their operating frequency. FinFET layout must also be considered, as gate resistance increases with the number of fins and decreases as the number of fingers increases. This gate resistance must also be considered when FinFET transistors are used as varactors. That is, the varactor Q will be affected not only by channel losses but also by gate losses. This is discussed in section 4.6.
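The $R_G/3$ result can be checked numerically. The sketch below assumes (hypothetically, for illustration) a gate line contacted at one end, n identical fingers each contributing $g_m/n$, a segment of resistance $R_G/n$ ahead of each finger, and low frequencies at which no gate current flows. Under these assumptions the finger at position i sees the summed noise voltages of the i segments between it and the contact, so the noise of segment j reaches $n-j+1$ fingers; summing these uncorrelated contributions gives the effective noise resistance $R_{Gtot}$ as a fraction of $R_G$.

```python
from fractions import Fraction


def gate_noise_resistance_ratio(n):
    """R_Gtot / R_G for n uniform fingers on a gate line contacted at one end.

    Segment j (resistance R_G/n) carries a noise voltage seen by the n - j + 1
    fingers beyond it, each with transconductance gm/n. The drain noise is then
    (gm/n)^2 * 4kT(R_G/n) * sum((n - j + 1)^2), i.e. 4kT * gm^2 * R_Gtot with
    R_Gtot / R_G = sum(m^2 for m = 1..n) / n^3.
    """
    total = sum(Fraction(m * m) for m in range(1, n + 1))
    return total / Fraction(n) ** 3


for n in (1, 4, 32, 1000):
    print(n, float(gate_noise_resistance_ratio(n)))
```

The ratio falls from 1 for a single finger to about 0.349 at n = 32 (within 5 % of 1/3) and approaches 1/3 as n grows. Other contact arrangements, such as gates contacted at both ends, give smaller factors.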
Although gate resistance noise was not characterized separately during the design of this current source, it should be considered in future optimization. That is, the modularity of the current source implementation may be traded off against potentially lower thermal noise.

**5. Flicker Noise.** Also referred to as 1/f or pink noise, flicker noise is characterized as low-frequency random noise whose amplitude diminishes as frequency increases. It is explained by two classical theories or physical models: the McWhorter model and the Hooge model. In the McWhorter model, noise is created by carrier number fluctuations in the channel caused by the trapping and release of carriers at the interface between the gate oxide (SiO<sub>2</sub>) and the silicon substrate (Si). In contrast, Hooge proposes that this noise is caused by bulk mobility fluctuations due to carrier scattering, which in turn modulate the drain current [69,74]. The simplified equation (5.4) models flicker noise as a voltage in series with the gate and is valid when the transistor is operating in saturation (see Figure 5.6).

$$V_{n1/f}^2 = \frac{K}{WLC_{ox}f} \tag{5.4}$$

Here $V_{n1/f}^2$ is the squared flicker noise voltage, K is a process-dependent constant with units of $V^2F$, W is gate width, L is gate length, $C_{ox}$ is gate capacitance per unit area and f is the noise frequency in Hz.

**Figure 5.6:** Flicker (1/f) Noise Circuit

The input-referred noise voltage, $V_{n1/f}^2$, may be transformed into a flicker noise current, $I_{n1/f}^2$, in parallel with the drain current, $I_D$, by multiplying it by the square of the transistor transconductance, gm, as shown in Figure 5.6. This yields equation (5.5).
$$I_{n1/f}^2 = \frac{K}{WLC_{ox}f}(gm)^2 \tag{5.5}$$

A unified model that accounts for both McWhorter and Hooge processes is included in BSIM4, the compact model (i.e., library of process-specific component parameters) widely used in SPICE-based simulators such as Cadence<sup>®</sup> Spectre [68]. In BSIM4, either a simplified model based on McWhorter (5.6), useful for hand calculations, or the unified model, enhanced to be continuous over all bias regions, can be selected. The unified model is discussed in [69].

$$S_{id}(f) = \frac{KF \cdot I_{ds}^{AF}}{C_{oxe} \cdot L_{eff}^2 \cdot f^{EF}} \tag{5.6}$$

In equation (5.6), KF is the flicker noise coefficient, $I_{ds}$ is the drain current, AF is the flicker noise exponent (normally equal to 1.0), $C_{oxe}$ is the effective gate capacitance per unit area, $L_{eff}$ is the effective gate length, f is the noise frequency and EF is the flicker noise frequency exponent (normally equal to 1.0) [68]. $S_{id}(f)$ is the noise power spectral density, i.e., the power delivered by the noise current into 1 $\Omega$ within a bandwidth of $\Delta f = 1$ Hz. That is, considering only flicker noise current, $S_{id}(f) = I_{n1/f}^2 \cdot (1\ \Omega)/(1\ \mathrm{Hz})$.

In the physical sense, flicker noise can be described as a measure of the quality or homogeneity of the conducting material. Also, the larger the volume of conducting material (i.e., the channel), the lower the flicker noise. Considering equation (5.4), the larger the device area (WL: Width x Length), the lower the flicker noise [75]. Furthermore, flicker noise levels are similar for planar and FinFET transistors with similar gate stacks [14, 70]. Additionally, [74] reports that the flicker noise can be almost 10 times worse for thin-gate core devices than for similar thick-gate IO devices of the same technology node. This seems to contradict equation (5.4), since $C_{ox} = \varepsilon_{ox}/t_{ox}$ indicates that flicker noise should decrease as $t_{ox}$ decreases.
However, in this case, as $t_{ox}$ decreases, the volume defect density near the surface of the channel increases by approximately 10 times. This implies an almost linear relationship between volume defect density and flicker noise.

It is generally accepted that PMOS transistors produce approximately $1/10^{th}$ the flicker noise of NMOS transistors, all else being equal. This difference is explained by a number of theories. First, an $n^+$ polysilicon gate layer is used with both types of transistors, resulting in an NMOS surface channel and a PMOS buried channel. Second, the tunnelling coefficients differ owing to the effective masses and barrier heights of holes and electrons [76]. The discussion of these theories is beyond the scope of this thesis, and at the time of this DCO design the magnitude of the 7-nm FinFET flicker noise was unknown. Therefore, PMOS transistors were chosen for the DCO current source and $-g_m$ pair based on the assumption that flicker noise would be a significant problem.

Shot noise caused by gate leakage and by leakage through the reverse-biased drain and source diodes (modelled in BSIM4) [69, 75], burst (popcorn) noise, and bulk recombination noise caused by coupling channel noise onto the gate are not significant in this 7-nm process and were therefore not considered.

### 5.4 Noise Reduction from Source Degeneration

Using the classical equation $gm = \sqrt{2\mu C_{ox}(W/L)I_{ds}}$ [77] for a transistor in saturation, we can show an equivalence between equations (5.5) and (5.6).
That is, solving for $I_{ds}$ we get the following:

$$I_{ds} = \frac{gm^2L}{2\mu C_{ox}W} \tag{5.7}$$

Substituting $I_{ds}$ into equation (5.6), with AF = EF = 1, we get:

$$S_{id}(f) = \left(\frac{KF}{C_{ox} \cdot L^2 \cdot f}\right) \left(\frac{gm^2L}{2\mu C_{ox}W}\right) \tag{5.8}$$

$$S_{id}(f) = \frac{KF \cdot gm^2}{2\mu C_{ox}^2 \cdot LW \cdot f} \tag{5.9}$$

If we let $K \approx KF/(2\mu C_{ox})$, then:

$$S_{id}(f) \approx \frac{K \cdot gm^2}{C_{ox} \cdot LW \cdot f} \tag{5.10}$$

which has the same form as equation (5.5). It should be noted that the constant KF, used in equation (5.9), differs in value and units from the constant K used in equation (5.10).

References [14,70,74] indicate that flicker noise becomes a significant challenge when designing analogue circuits with aggressive-geometry transistors such as 7-nm FinFETs. This is critical, as the Class-C oscillator will up-convert 1/f transistor flicker noise to $1/f^3$ phase noise, which degrades oscillator phase noise performance close to the frequency of oscillation. Over and above the selection of PMOS FinFET devices for current sources, transistor source degeneration was employed to reduce flicker noise production. The cascode transistor circuit of Figure 5.7 and the classical equation (5.11) for source degeneration were used to arrive at a reduced transistor transconductance, gm' [14].

$$gm' = \frac{gm}{1 + R_S \cdot gm} \tag{5.11}$$

With $R_S = 1/gm$, (5.11) becomes (5.12),

$$gm' = \frac{gm}{2} \tag{5.12}$$

and

$$(gm')^2 = \frac{gm^2}{4} \tag{5.13}$$

Substituting equation (5.13) into (5.10) and taking $10 \cdot \log_{10}$ of both sides yields (5.14).

$$10 \cdot \log_{10} \left( S_{id}(f) \right) \approx 10 \cdot \log_{10} \left( \frac{K \cdot gm^2}{C_{ox} \cdot LW \cdot f} \cdot \frac{1}{4} \right) \tag{5.14}$$

or, stated another way, (5.15).
$$10 \cdot \log_{10} \left( S_{id}(f) \right) \approx 10 \cdot \log_{10} \left( \frac{K \cdot gm^2}{C_{ox} \cdot LW \cdot f} \right) - 6 \ dB \tag{5.15}$$

Therefore, by inserting a source degeneration resistor equal to the inverse of the transistor transconductance and keeping the transistors in saturation, the flicker noise can be reduced by 6 dB.

Unfortunately, due to logistical issues it was not possible to simulate flicker noise reduction through source degeneration using 7-nm FinFET technology. Instead, simulations were carried out using planar 65-nm technology to demonstrate the concept. The circuit of Figure 5.7 was used to test channel and flicker noise reduction. Rs is the source degeneration resistor and M\_Source represents the PMOS current source transistor. M\_Cascode represents the $-g_m$ transistor of the DCO and Rp is the LC-tank parallel resistance at resonance. Both transistors were biased in saturation. Initially, two noise analyses were carried out: one with Rs = 0 $\Omega$ and one with $Rs = 420 \ \Omega$.

Figure 5.7: Channel Noise Degeneration Test Circuit

Figure 5.8 shows the total noise power spectral density at the drain of M\_Cascode. It should be noted that the thermal noise of Rs, although not significant, was turned on, and the thermal noise of Rp was turned off for this experiment. Also, this noise was identical to the total noise at the drain of M\_Source, as the noise contributed by M\_Cascode was approximately 20 dB below that of M\_Source. These results showed a significant improvement both in the flicker noise region (the $10 \ dB/decade$ slope) and in the thermal noise above the flicker noise corner, $\approx 2 \ \text{MHz}$. The predicted noise improvement was 6 dB, as $R_S = 420~\Omega$ for the degenerated M\_Source transistor, equal to 1/gm of the non-degenerated M\_Source transistor.
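The 6-dB figure for flicker noise, and the corresponding 3 dB for channel thermal noise, follow directly from equations (5.10), (5.11) and (5.1). The short sketch below evaluates both for several values of $R_S$, assuming (as in the 65-nm test) that the undegenerated transconductance is $gm = 1/420$ S. It accounts only for the degeneration of gm, not for the accompanying change in drain current.

```python
import math


def degenerated_gm(gm, rs):
    """Eq. (5.11): effective transconductance with source degeneration resistor rs."""
    return gm / (1.0 + rs * gm)


def degeneration_change_db(gm, rs):
    """(flicker_dB, thermal_dB) change due to degeneration of gm alone.

    Flicker noise scales with gm'^2 (eq. 5.10), channel noise with gm' (eq. 5.1).
    """
    ratio = degenerated_gm(gm, rs) / gm
    return 20.0 * math.log10(ratio), 10.0 * math.log10(ratio)


gm = 1.0 / 420.0   # assumed undegenerated gm (S), so R_S = 1/gm at 420 ohm
for rs in (105, 210, 315, 420):
    flicker_db, thermal_db = degeneration_change_db(gm, rs)
    print(f"R_S = {rs:3d} ohm: flicker {flicker_db:+.2f} dB, thermal {thermal_db:+.2f} dB")
```

At $R_S = 420\ \Omega$ this gives about -6.02 dB and -3.01 dB, and the intermediate values agree with the degeneration factors of Table 5.3 to within roughly 0.01 dB.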
Table 5.1 lists the total noise improvement for the degenerated case at spot frequencies from 1 kHz to 1 GHz, including the approximate noise corner frequency (2 MHz).

**Table 5.1:** Current Source Degeneration Results for $R_S = 420 \ \Omega$

| Frequency (MHz) | 0.001 | 0.01 | 0.1 | 1.0 | 2.0 | 10 | 100 | 1000 |
|---|---|---|---|---|---|---|---|---|
| $\Delta$ Noise (dB) | 9.5 | 9.4 | 9.4 | 8.6 | 8.1 | 7.0 | 6.6 | 6.6 |

Figure 5.8: M\_Source 1/f and Thermal Phase Noise Before and After Degeneration

The most conspicuous conclusion drawn from Table 5.1 was that, for frequencies below the flicker noise corner, the noise improvement was approximately 9 dB as opposed to the predicted 6 dB. Additionally, the flicker noise descended below the channel thermal noise and gradually became insignificant as frequency increased above the flicker noise corner. Equation (5.1) shows that source degeneration should improve the channel thermal noise by only 3 dB, as this noise is proportional to gm' rather than $(gm')^2$, yet Table 5.1 shows an improvement of about 6.6 dB at high frequencies. Therefore, it was fair to conclude that the simulated power spectral densities of both flicker and thermal noise were improved by approximately 3 dB more than expected.

Although gm is affected by drain current, gm does not fully account for the effect of drain current on the total noise of the MOSFET. In fact, equation (5.6) shows that the flicker noise power spectral density is proportional to $I_{ds}$. Additionally, substituting $gm = 2I_{ds}/V_{eff}$ [77] into equation (5.1) shows that the noise power spectral density due to thermal noise is also proportional to $I_{ds}$, see equation (5.16) below.

$$S_{id} = 4kT\gamma \left(\frac{2I_{ds}}{V_{eff}}\right) \tag{5.16}$$

where $V_{eff} = V_{gs} - V_{th}$. Therefore, over and above the noise reduction due to $R_S$ degenerating gm, there was additional noise reduction due to the reduced $I_{ds}$.
This was verified by first extending the previous experiment to include incremental values of $R_S$ and then calculating the noise reduction associated with the corresponding values of $I_{ds}$. The results of these simulations are illustrated in Figure 5.9. Here, as $R_S$ increased, the noise decreased. However, the incremental noise reduction for each step in $R_S$ diminished as $R_S$ approached 1/gm. This should be considered when trading off noise reduction against the required supply current level.

Figure 5.9: Total M\_Source Noise for Incremental Values of $R_S$

Table 5.2 shows the reduction in $I_{ds}$ as the value of $R_S$ was increased. Here, $I_{ds}$ with $R_S = 420\ \Omega$ was approximately one half of its value with $R_S = 0\ \Omega$, which yields the missing 3-dB noise reduction previously discussed.

**Table 5.2:** Noise Reduction due to Reduced $I_{ds}$ for Values of $R_S$

| $R_S$ ($\Omega$) | 0 | 105 | 210 | 315 | 420 |
|---|---|---|---|---|---|
| $I_{ds}$ ($\mu A$) | 403.3 | 318.8 | 266.5 | 230.5 | 204.0 |
| $I_{ds}/I_o$ | 1.000 | 0.790 | 0.661 | 0.572 | 0.506 |
| $10 \cdot \log_{10} \left( I_{ds}/I_o \right)$ (dB) | 0 | -1.03 | -1.80 | -2.43 | -2.96 |

Equation (5.15), which accounted for flicker noise, was rewritten as (5.17) to include the noise reduction due to the reduction in $I_{ds}$.

$$S_{id}(f)\ (dB) \approx 10 \cdot \log_{10} \left( \frac{K \cdot gm^2 \cdot R_O^2}{C_{ox} \cdot LW \cdot f} \right) + 20 \cdot \log_{10} \left( \frac{1}{1 + R_S \cdot gm} \right) + 10 \cdot \log_{10} \left( \frac{I_{ds}}{I_{dso}} \right) \tag{5.17}$$

where $I_{dso}$ is the drain current and gm is the MOSFET transconductance when $R_S = 0\ \Omega$, $I_{ds}$ is the drain current when $R_S > 0\ \Omega$, and $R_O$ is the load impedance seen by $I_{ds}$. Similarly, an equation for thermal noise including source degeneration was written (5.18).
$$S_{id}(f)\ (dB) \approx 10 \cdot \log_{10} \left(4kT\gamma \cdot gm \cdot R_O^2\right) + 10 \cdot \log_{10} \left(\frac{1}{1 + R_S \cdot gm}\right) + 10 \cdot \log_{10} \left(\frac{I_{ds}}{I_{dso}}\right) \tag{5.18}$$

Equations (5.17) and (5.18) were combined to produce equation (5.19).

$$S_{id}(f)\ (dB) \approx 10 \cdot \log_{10} \left\{ \left( \frac{N_{FV}}{(1 + R_S \cdot gm)^2} + \frac{N_{TV}}{1 + R_S \cdot gm} \right) \left( \frac{I_{ds}}{I_{dso}} \right) \right\} \tag{5.19}$$

where $N_{FV}$ is the linear form of the first (flicker noise voltage) term in equation (5.17) and $N_{TV}$ is the linear form of the first (thermal noise voltage) term in equation (5.18). Table 5.3 shows the calculated noise reduction factors due to the degeneration of gm alone in the 1/f noise and thermal noise regions.

**Table 5.3:** Noise Reduction Factors due to Degeneration for Values of $R_S$

| $R_S$ ($\Omega$) | 0 | 105 | 210 | 315 | 420 |
|---|---|---|---|---|---|
| Flicker: $20 \cdot \log_{10} \left[ 1/(1 + R_S \cdot gm) \right]$ (dB) | 0 | -1.93 | -3.51 | -4.85 | -6.01 |
| Thermal: $10 \cdot \log_{10} \left[ 1/(1 + R_S \cdot gm) \right]$ (dB) | 0 | -0.966 | -1.76 | -2.42 | -3.00 |

Table 5.4 lists the calculated noise reduction due to both source degeneration and current reduction in the flicker noise and thermal noise regions. These results show that including the noise reduction due to drain current reduction in these calculations yields consistent results over source degeneration resistance and frequency.
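Equation (5.19) can be evaluated directly from the drain currents of Table 5.2. In the sketch below, the ratio $N_{FV}/N_{TV}$ is a free (hypothetical) parameter: a very large ratio represents the flicker noise region and a zero ratio the thermal noise region, while intermediate ratios correspond to frequencies near the corner. The result is normalized to the $R_S = 0$ case, and the undegenerated transconductance is again assumed to be 1/420 S. The two extremes reproduce Table 5.4 to within about 0.02 dB; the ratios that would reproduce the intermediate columns of Table 5.5 are not stated in the text, so they are not attempted here.

```python
import math

# Table 5.2: simulated M_Source drain current (uA) for each R_S (ohm)
IDS_UA = {0: 403.3, 105: 318.8, 210: 266.5, 315: 230.5, 420: 204.0}
GM = 1.0 / 420.0   # assumed undegenerated transconductance (S)


def noise_change_db(rs, nfv_over_ntv):
    """Eq. (5.19) relative to R_S = 0: total noise change in dB.

    nfv_over_ntv is the (hypothetical) flicker-to-thermal ratio N_FV / N_TV
    at the frequency of interest; N_TV is normalized to 1.
    """
    x = 1.0 + rs * GM
    n_fv, n_tv = nfv_over_ntv, 1.0
    ids_ratio = IDS_UA[rs] / IDS_UA[0]
    degenerated = (n_fv / x ** 2 + n_tv / x) * ids_ratio
    return 10.0 * math.log10(degenerated / (n_fv + n_tv))


for rs in sorted(IDS_UA):
    flicker = noise_change_db(rs, 1e9)   # deep in the 1/f region
    thermal = noise_change_db(rs, 0.0)   # well above the flicker corner
    print(f"R_S = {rs:3d} ohm: flicker {flicker:6.2f} dB, thermal {thermal:6.2f} dB")
```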
**Table 5.4:** Calculated Flicker and Thermal Noise Reduction for Values of $R_S$

| $R_S$ ($\Omega$) | 0 | 105 | 210 | 315 | 420 |
|---|---|---|---|---|---|
| Flicker Region Noise Reduction (dB) | 0 | -2.95 | -5.31 | -7.28 | -8.97 |
| Thermal Region Noise Reduction (dB) | 0 | -1.99 | -3.56 | -4.85 | -5.96 |

Table 5.5 compares the simulated noise reduction over frequency with the calculated noise reduction, including both source degeneration of the transconductance and the corresponding reduction in drain current. The Table 5.5 error values (i.e., simulated value minus calculated value) range from 0.30 dB to 0.64 dB, with the calculated value always less than the simulated value. This difference was attributed to the use of the simplified model (5.6) for hand calculation, as opposed to the more comprehensive *unified model* used in simulation. This error was relatively consistent across frequency and increased slightly with source resistance value.

**Table 5.5:** Total Noise Reduction - Simulated vs. Calculated

| $R_S$ ($\Omega$) | Frequency (MHz) | 0.001 | 0.01 | 0.1 | 1.0 | 2.0 | 10 | 100 | 1000 |
|---|---|---|---|---|---|---|---|---|---|
| 105 | $\Delta N$ Sim (dB) | 3.40 | 3.30 | 3.30 | 3.10 | 2.96 | 2.50 | 2.40 | 2.40 |
| | $\Delta N$ Cal (dB) | 2.95 | 2.95 | 2.93 | 2.71 | 2.55 | 2.17 | 2.10 | 1.99 |
| | Error (dB) | 0.45 | 0.35 | 0.37 | 0.39 | 0.41 | 0.33 | 0.30 | 0.41 |
| 210 | $\Delta N$ Sim (dB) | 5.80 | 5.80 | 5.80 | 5.40 | 5.11 | 4.40 | 4.10 | 4.10 |
| | $\Delta N$ Cal (dB) | 5.31 | 5.31 | 5.26 | 4.84 | 4.55 | 3.87 | 3.59 | 3.56 |
| | Error (dB) | 0.49 | 0.49 | 0.54 | 0.56 | 0.56 | 0.53 | 0.51 | 0.54 |
| 315 | $\Delta N$ Sim (dB) | 7.80 | 7.80 | 7.70 | 7.20 | 6.77 | 5.80 | 5.50 | 5.50 |
| | $\Delta N$ Cal (dB) | 7.28 | 7.27 | 7.20 | 6.58 | 6.18 | 5.26 | 4.89 | 4.86 |
| | Error (dB) | 0.52 | 0.53 | 0.50 | 0.62 | 0.59 | 0.54 | 0.61 | 0.64 |
| 420 | $\Delta N$ Sim (dB) | 9.50 | 9.40 | 9.40 | 8.60 | 8.12 | 7.00 | 6.60 | 6.60 |
| | $\Delta N$ Cal (dB) | 8.97 | 8.96 | 8.87 | 8.06 | 7.55 | 6.44 | 6.01 | 5.97 |
| | Error (dB) | 0.53 | 0.44 | 0.53 | 0.54 | 0.57 | 0.56 | 0.59 | 0.63 |

### 5.5 Summary

This chapter described the modularity, operation and calibration of the Class-C DCO constant current source, with the calibration compensating for PVT and extracted circuit variations. Current source flicker noise reduction through resistive source degeneration, previously demonstrated for planar MOSFET devices [64–67], was demonstrated through theoretical analysis and simulation. This degeneration had two additional effects: first, thermal noise was also reduced, by approximately one half as many decibels as the flicker noise; and second, both flicker and thermal noise were further reduced by the limited drain current. Although the source-degenerating resistive element contributed thermal noise to the circuit, this noise level was too low to be of concern. A closed-form solution (5.19) for source degeneration noise reduction was derived and verified through simulation. A summary of parasitic sources of noise and loss in aggressive-geometry transistors was also presented.
Many of these parasitics can be safely ignored in larger planar transistors but must be considered when FinFET devices are used. This is not due to the FinFET structure itself; rather, it is due to shrinking geometry in two areas: gate oxide thickness and base-layer interconnect resistance. In specific analogue and high-frequency applications these parasitics may become significant and must be considered. It was found that PMOS transistor degeneration, combined with the use of long-channel devices, reduced the current source flicker noise by approximately 7 dB in the actual 7-nm FinFET design.

## Chapter 6

## BBPLL Time-Based Simulation and Measurement

### 6.1 Introduction

The following list identifies the advancements made through the development of the time-based, or event-driven, simulation model of the BBPLL.

1. Time-based simulation, with its significant reduction in run-time over traditional circuit simulation, was used to simulate a complete digital model of the BBPLL. Results were repeatable and accuracy was consistent with circuit simulation.
2. A method of accurately mimicking phase noise profiles with compound slopes, using mathematical curve fitting based on random number generation, was demonstrated. The results of these frequency-domain equations, random phase noise profiles, were transformed into time-domain jitter vectors for use in the event-driven simulation. This was a critical contribution to the BBPLL digital model, as BPD linearization and overall performance were highly dependent on accurate reference and feedback clock phase noise.
3. Run-time was further reduced (by five times in this example) by creating these jitter vectors prior to simulation rather than during simulation. This also increased the numerical accuracy of the jitter vector time stamps, as their values were not truncated during computation.
Even with the final time-stamp value quantized to the simulator accuracy of 1 fs, the jitter value error was less than 1 % for a 1-ms simulation run-time.

The majority of the BBPLL circuit operates in the digital domain and was implemented in RTL (Register Transfer Level) code. The exception was the DCO, realized as an analogue LC-tank oscillator with digital (quantized) frequency tuning. Therefore, while the underlying function of the DCO is analogue, the level of abstraction at which the DCO is controlled is digital. This made it possible to create a digital functional model of the DCO from the equations discussed in sections 4.8 and 4.9. Fine-tuning adjustments, required to correct the frequency step size across the array tuning range, were included in the model to improve its accuracy. This simple behavioural representation of the DCO FCW-to-$f_0$ tuning characteristic was the last piece required to implement a full digital model of the BBPLL. The objective here was to use event-driven (i.e., time-based) simulation to determine locking behaviour and phase noise characteristics. This eliminated the need for SPICE-based and mixed-mode simulators, which require orders of magnitude more computation time.

### 6.2 The Time-Based Model

Such a simplistic simulation model (in this case Verilog) cannot be used alone to predict the BBPLL locking behaviour, where the random processes of the DCO and XO (i.e., external crystal oscillator) determine loop operation and output jitter. To address this limitation, event-driven simulators with support for analogue and real data type modelling have been used to simulate phase-locked loops in [78–80]. However, the work described in this chapter used a different approach to model both quantization and random jitter that does not rely on real-number modelling. Additionally, the loss of accuracy due to the inherent 1-fs resolution limit of the IEEE-compliant Verilog simulator [81] was greatly reduced.
The method discussed in the remaining sections of this chapter facilitated accurate and rapid evaluation of the BBPLL architecture over a wide range of filter settings.

### 6.3 DCO Noise Model Generation

The clock edge occurrences of a free-running DCO, including jitter, can be represented by the vector t[k] (6.1).

$$t[k] = \frac{k}{f_0} + t_j[k] \tag{6.1}$$

where k = 1, 2, ..., N and N is the total number of clock edges. The random number vector $t_j[k]$, representing the DCO random output jitter, must be of zero mean to ensure that the output frequency is equal to $f_0$, and must match the complex DCO phase noise profile $\mathcal{L}(f)$ with its distinct flat, -20 and -30 dB/decade regions. This is achieved efficiently by partitioning the random jitter vector into the sum of three uncorrelated jitter vectors, as shown in (6.2).

$$t_{j}[k] = t_{j-flat}[k] + t_{j-20dB/dec}[k] + t_{j-30dB/dec}[k] \tag{6.2}$$

The jitter vector $t_{j-flat}[k]$ was obtained using a Gaussian pseudo-random number generator with standard deviation $\sigma_{j-flat}$ (6.3), as described in [82].

$$\sigma_{j-flat} = \frac{1}{2\pi} \sqrt{\frac{10^{(\mathcal{L}_0/10)}}{f_0}} \tag{6.3}$$

The -20 dB/decade component, $t_{j-20dB/dec}[k]$, was generated by integrating a vector obtained from an uncorrelated Gaussian pseudo-random number generator with standard deviation $\sigma_{j-20dB/dec}$ (6.4), as described in [82].

$$\sigma_{j-20dB/dec} = \frac{f_1}{f_0} \sqrt{\frac{10^{(\mathcal{L}_1/10)}}{f_0}} \tag{6.4}$$

Here $\mathcal{L}_0$ is the flat phase noise level and $\mathcal{L}_1$ is the phase noise at offset $f_1$ in the -20 dB/decade region (see Table 6.1).

Flicker noise increases as transistor geometry decreases (see section 5.4). Therefore, accurate modelling of the -30 dB/decade phase noise slope remains important for a DCO implemented using 7-nm FinFET technology, despite circuit techniques that greatly reduce flicker noise.
Various techniques have been employed to solve this most demanding computational problem, ranging from FIR/IIR filter combinations with large numbers of coefficients [83], to white noise filtered by multiple first-order low-pass sections [78,80,84], to the stochastic Voss-McCartney algorithm [85]. This work split the -30 dB/decade slope into a -10 dB/decade slope, approximated by eight first-order low-pass sections with logarithmically spaced cut-off frequencies between 1.0 kHz and 1.0 GHz, and a -20 dB/decade slope, represented by an integrator, analogous to the computation of $t_{j-20dB/dec}[k]$. The jitter vector $t_{j-30dB/dec}[k]$ was finally generated by filtering the output of a Gaussian pseudo-random number generator with standard deviation $\sigma_{j-30dB/dec}$ (6.5), as described in [82].

$$\sigma_{j-30dB/dec} = \sqrt{\frac{10^{(\mathcal{L}_2/10)} \cdot f_2^3}{\gamma \cdot f_{min} \cdot f_0^3}} \tag{6.5}$$

Equations (6.1) to (6.5) were incorporated in a behavioural DCO model evaluated using an event-driven digital simulator. Here, the time-stamp vector (6.1) was computed during transient simulation, and the occurrence, or time stamp, of each DCO output clock transition, t[k], was stored in a file. After simulation, this data was converted into time-domain phase error (rad), and the phase noise Power Spectral Density (PSD) (rad<sup>2</sup>/Hz) was calculated using a Fast Fourier Transform (FFT). The phase noise PSD was subsequently converted into carrier-referenced single-sideband phase noise (dBc/Hz). Figure 6.1 shows that the simulated DCO phase noise profile was a sufficient representation of the measured phase noise profile, thereby validating the applicability and accuracy of the proposed time-domain model.

Figure 6.1: DCO Phase Noise - Simulated vs. Measured

The DCO model parameters are listed in Table 6.1.
Table 6.1: DCO Parameter Values

| DCO Parameter | Parameter Value |
|---------------|-----------------|
| $\mathcal{L}_0$ | -143 dBc/Hz |
| $\mathcal{L}_1$ | -105 dBc/Hz |
| $f_1$ | 1.0 MHz |
| $\mathcal{L}_2$ | -75 dBc/Hz |
| $f_2$ | 100 kHz |
| $f_{min}$ | 1.0 kHz |
| $\gamma$ | 1.426 |

While this approach offered excellent simulation run-times (e.g., 127 s on an Intel 2.6-GHz Xeon CPU for a 1-ms transient simulation of the 14-GHz DCO output clock), a further 5x speed improvement was gained by generating the jitter vector $t_j[k]$ prior to the event-driven transient simulation. Any programming language supporting floating-point numbers can be used to compute $t_j[k]$. The floating-point vector was subsequently rounded to 1-fs resolution and saved in a look-up table. The Verilog simulator loaded the jitter vector values during execution of the RTL code, and the same table can be reused across multiple simulation runs. In addition to reducing simulation time, this approach improved accuracy because the 1-fs quantization errors imposed by the Verilog simulator no longer accumulate. Figure 6.2 shows how the absolute clock edge position error accumulated as the simulation progressed when the $t_j[k]$ time-stamp vector was computed during transient simulation. When the time-stamp vector was generated prior to transient simulation, this error was limited to 1 fs, independent of simulation length. However, this approach assumes a constant DCO frequency, which is not strictly true: when the DCO is phase locked with optimum filter settings, the FCW dithers between two or three contiguous codes, changing the instantaneous DCO frequency by approximately 2.0 MHz per code step.
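The accumulation effect can be reproduced with a short Python sketch. It compares two hypothetical ways of producing 1-fs time stamps for an ideal 14-GHz clock: scheduling each edge relative to the previous one with a period rounded to 1 fs, and rounding precomputed absolute time stamps once. Jitter is omitted so that only quantization remains; this systematic period-rounding error is simply the easiest mechanism to isolate, not necessarily the exact one behind Figure 6.2.

```python
F0 = 14.0e9          # DCO frequency (Hz)
FS_PER_S = 1.0e15    # femtoseconds per second


def ideal_edges_fs(n: int, f0: float = F0) -> list[float]:
    """Exact edge times k/f0 in fs (jitter omitted for clarity)."""
    return [k / f0 * FS_PER_S for k in range(1, n + 1)]


def accumulated_edges_fs(n: int, f0: float = F0) -> list[int]:
    """Each edge = previous edge + period rounded to 1 fs, so rounding errors add up."""
    step = round(FS_PER_S / f0)
    t, out = 0, []
    for _ in range(n):
        t += step
        out.append(t)
    return out


def precomputed_edges_fs(n: int, f0: float = F0) -> list[int]:
    """Absolute time stamps computed in floating point, rounded once to 1 fs."""
    return [round(t) for t in ideal_edges_fs(n, f0)]


n = 1000
exact = ideal_edges_fs(n)
acc_err = max(abs(a - e) for a, e in zip(accumulated_edges_fs(n), exact))
pre_err = max(abs(p - e) for p, e in zip(precomputed_edges_fs(n), exact))
print(f"accumulated: {acc_err:.1f} fs, precomputed: {pre_err:.2f} fs")
# prints: accumulated: 428.6 fs, precomputed: 0.43 fs
```

The accumulated error grows linearly with the number of edges, while the precomputed error stays below half a femtosecond however long the record is.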
Figure 6.2: DCO Edge Position Error with 1-fs Resolution Jitter Calculations

Table 6.2 compares the period, cycle-to-cycle and absolute jitter (peak-to-peak, pp, and Root Mean Square, RMS) obtained at 14.000 GHz and at 14.002 GHz, one 2.0-MHz code step apart; the differences were less than 1 % for a 1-ms transient simulation. Therefore, it was concluded that avoiding the 1-fs simulator accuracy limit improved the overall simulation accuracy in the free-running DCO, as well as in the locked BBPLL. However, larger errors can occur when the BBPLL is locked with non-optimum filter settings.

**Table 6.2:** DCO Jitter at 14.000 GHz and 14.002 GHz

| Jitter Type | Metric | 14.000 GHz (fs) | 14.002 GHz (fs) |
|----------------|-----|-----------------|-----------------|
| Period | pp | 152 | 153 |
| Period | RMS | 13.5974 | 13.5968 |
| Cycle-to-Cycle | pp | 210 | 211 |
| Cycle-to-Cycle | RMS | 19.2344 | 19.2338 |
| Absolute | pp | 89411 | 89391 |
| Absolute | RMS | 20934.4086 | 20929.6529 |

## 6.4 Crystal Oscillator

Accurate modelling of XO noise is critical to predicting the DPLL in-band phase noise. However, the phase noise profile of commercial low-cost XO modules often does not follow the classical DCO or VCO phase noise profile, e.g., -30 then -20 dB/decade as illustrated in Figure 6.1. The 350-MHz XO used in this work [1] shows a complex phase noise profile that can be modelled with two composite jitter vectors: one flat and the other with a -7 dB/decade slope. This provided relatively accurate phase noise modelling at offset frequencies as low as 1.0 kHz, which was sufficient because 1 kHz is the lower limit of the jitter integration band used in this work. The jitter vector representing the flat phase noise was computed using (6.3), as in the DCO model generation. The -7 dB/decade phase noise was generated by multiple first-order digital filter sections with appropriately selected cut-off frequencies [82]. Both jitter vectors were generated prior to simulation to reduce simulation time and enhance accuracy, as discussed previously.
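The thesis does not list the XO filter sections, so the Python sketch below assumes one plausible choice: cut-off frequencies log-spaced at an integer number per decade, with each section's low-frequency plateau proportional to $f_c^{-\alpha}$. Such a bank approximates a $1/f^{\alpha}$ spectrum, so $\alpha = 0.7$ targets -7 dB/decade ($\alpha = 1$ gives the -10 dB/decade flicker case). The PSD shape is evaluated analytically rather than by generating noise, so the slope can be checked directly; the corner range, density and function names are hypothetical.

```python
import math


def section_corners(f_lo: float, f_hi: float, per_decade: int) -> list[float]:
    """Cut-off frequencies log-spaced at an integer number per decade, f_lo to f_hi."""
    n = round(math.log10(f_hi / f_lo) * per_decade)
    return [f_lo * 10.0 ** (i / per_decade) for i in range(n + 1)]


def bank_psd(f: float, corners: list[float], alpha: float) -> float:
    """Relative PSD of independent one-pole sections with plateau ~ fc**(-alpha)."""
    return sum(fc ** (-alpha) / (1.0 + (f / fc) ** 2) for fc in corners)


def slope_db_per_decade(f: float, corners: list[float], alpha: float) -> float:
    """Change in PSD (dB) from f to 10*f; negative for a falling profile."""
    return 10.0 * math.log10(bank_psd(10.0 * f, corners, alpha) / bank_psd(f, corners, alpha))


# Hypothetical bank: 2 sections per decade from 10 Hz to 100 MHz, alpha = 0.7
corners = section_corners(10.0, 1.0e8, per_decade=2)
xo_slope = slope_db_per_decade(1.0e4, corners, alpha=0.7)   # close to -7 dB/decade
```

Measuring the slope over exactly one decade with an integer number of sections per decade cancels the periodic ripple of the sum, so the result stays close to $-10\alpha$ dB/decade away from the band edges.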
Unlike the DCO case, precomputing the XO jitter vectors introduces no additional error, because the reference frequency is constant during simulation.

## 6.5 PLL Closed-Loop Noise Prediction

A Verilog BBPLL model was created using the DLF and $\Sigma\Delta$-modulator RTL code, gate-level representations of the remaining custom logic, the behavioural DCO model, and the accompanying time-stamp vector files that describe the XO and DCO jitter. The accurate DCO tuning characteristic and phase noise profile required to produce a precise behavioural model were generated using circuit simulation and layout parasitic extraction software. The XO phase noise was derived to mimic the phase noise presented in the manufacturer's datasheet, as described in Section 6.4.

The accuracy of the BBPLL model was evaluated by using Verilog simulation to phase lock the BBPLL output clock to the reference XO. Once locked, the simulated DCO output was used to generate the phase noise profile and compute integrated RMS jitter. Additionally, trajectory data, consisting of the normalized DLF integrator value (y-axis) plotted against the phase error (x-axis) at each clock iteration, were recorded to create state-space diagrams of the BBPLL locked behaviour. These visualizations present easily recognizable patterns that can be used as fingerprints of the operating regime of the BBPLL locked state (i.e., random noise, optimum, and limit cycle) [24].

Simulated and measured phase noise plots were recorded for the random-noise and limit-cycle regimes, as well as for the minimum phase noise, or optimum, locked condition. These states were selected using the filter gain settings listed in Table 6.3, which compares simulated and measured RMS jitter (integrated from 1 kHz to 100 MHz) across DLF integral $(A_I)$ and proportional $(A_P)$ gain settings. Figure 6.3 shows a low-loop-gain, small-phase-margin setting that results in phase noise peaking and a distinct random-noise regime trajectory.
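Integrated RMS jitter, such as the values in Table 6.3, follows from the single-sideband phase noise as $\sigma_t = \sqrt{2\int_{f_a}^{f_b}\mathcal{L}(f)\,df}\,/\,(2\pi f_0)$, here with $f_a$ = 1 kHz and $f_b$ = 100 MHz. The Python sketch below is an illustration rather than the thesis post-processing: it assumes the profile is supplied as (offset, dBc/Hz) points and that $\mathcal{L}(f)$ is a straight line on a log-log plot between points, so each segment integrates in closed form. The function name is hypothetical.

```python
import math


def integrated_jitter_s(points: list[tuple[float, float]], f0: float) -> float:
    """RMS jitter (s) from SSB phase noise points [(offset_hz, dBc/Hz), ...] sorted by offset.

    Between points, L(f) is a power law (a straight line in dB versus log f).
    """
    area = 0.0                                        # integral of L(f), one sideband
    for (f1, l1_db), (f2, l2_db) in zip(points, points[1:]):
        l1 = 10.0 ** (l1_db / 10.0)
        l2 = 10.0 ** (l2_db / 10.0)
        m = math.log(l2 / l1) / math.log(f2 / f1)     # local power-law exponent
        if abs(m + 1.0) < 1e-9:                       # L ~ 1/f: logarithmic integral
            area += l1 * f1 * math.log(f2 / f1)
        else:
            area += l1 * f1 ** (-m) * (f2 ** (m + 1.0) - f1 ** (m + 1.0)) / (m + 1.0)
    phase_rms = math.sqrt(2.0 * area)                 # rad, both sidebands
    return phase_rms / (2.0 * math.pi * f0)           # convert phase to time


# Hypothetical flat -120 dBc/Hz profile from 1 kHz to 100 MHz at 14 GHz
flat = [(1.0e3, -120.0), (1.0e8, -120.0)]
print(f"{integrated_jitter_s(flat, 14.0e9) * 1e15:.1f} fs")   # about 160.8 fs
```

The closed-form segments avoid the error that a coarse trapezoidal sum on a linear frequency axis would introduce across several decades of offset.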
Figure 6.4 shows the optimum configuration, with no jitter peaking or significant spurious tones. As expected, its trajectory is narrow, indicating very low jitter. Figure 6.5 illustrates excessive loop gain causing limit-cycle trajectories. These three cases demonstrate that the measured and simulated phase noise plots align well. BBPLL jitter was predicted with an error of less than 5 % for the optimum setting, a noteworthy result given that it was achieved using digital simulation. Model fidelity and process repeatability were verified by comparing simulated and measured RMS jitter across several thousand filter settings, all accommodated without a large computing infrastructure. These settings ($A_I = 1...25$ and $A_P = 7...300$) yielded a simulated RMS jitter accuracy of $\pm 20$ %, with the largest errors occurring at settings exhibiting marginal stability. Errors increased slightly when larger filter gain settings were applied (i.e., $A_P > 300$), as shown in Table 6.3 for the limit-cycle case. Limit cycles caused instantaneous frequency deviations from the average frequency $f_0$, which degraded the accuracy of the precomputed jitter vector.

**Table 6.3:** Simulated vs. Measured Jitter (1 kHz - 100 MHz)

| Regime | $A_I$ | $A_P$ | Simulated RMS Jitter (fs) | Measured RMS Jitter (fs) | Error (%) |
|-------------|-------|-------|------|------|-------|
| Low Gain | 2 | 7 | 278 | 343 | -19.0 |
| Optimum | 1 | 40 | 147 | 143 | 2.8 |
| High Gain | 1 | 300 | 274 | 284 | -3.5 |
| Limit Cycle | 100 | 2031 | 1267 | 1026 | 23.5 |

Figure 6.3: BBPLL Phase Noise and Trajectory - $A_I=2,\,A_P=7$

Figure 6.4: BBPLL Phase Noise and Trajectory - $A_I=1,\,A_P=40$

Figure 6.5: BBPLL Phase Noise and Trajectory - $A_I=100,\,A_P=2031$

## 6.6 Summary

In this chapter, the efficacy of using time-based simulation to accurately characterize the locking, phase margin, bandwidth and output jitter behaviour of an all-digital BBPLL was demonstrated.
Mathematical phase noise models were created that accurately mimic the measured XO and DCO phase noise data. These phase noise profiles were converted into time-domain clock jitter vector files used to modify the time positions of the DCO and XO clock edges. After being processed by the loop, the resulting DCO output clock jitter was transformed into the frequency domain and presented as phase noise. Three operating regimes (random noise, optimum and limit cycle) were documented with their associated state-space trajectory plots. Time-based simulation of the complete digital BBPLL model was shown to produce accurate results with a significant reduction in run-time compared with traditional circuit simulation.

A method of accurately mimicking phase noise profiles with compound slopes, using mathematical curve fitting based on random number generation, was demonstrated. The results of these frequency-domain equations were transformed into time-domain jitter vectors that were used in the event-driven simulation. Run-time was further reduced by creating these jitter vectors prior to simulation rather than during simulation. This also increased the numerical accuracy of the jitter vector time stamps, because their values were not truncated during computation. Even with the final time-stamp values quantized to the 1-fs simulator resolution, the jitter error was less than 1 % for a 1-ms simulation run. This method of all-digital BBPLL analysis proved to be a solution to the complex analysis problem that results from replacing the TDC used in conventional all-digital PLL systems with a BPD. Not only did it result in accurate simulation of the BBPLL behaviour, but it also improved time efficiency over other simulation methods.

# Chapter 7: Simulation and Test Results

## 7.1 Simulated Results and Die Micrograph

Simulation results of the BBPLL are listed in Table 7.1. The DCO capacitive resolution is 75 aF, which results in a calculated minimum frequency step of 2.0 MHz.
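The relation between the 75-aF capacitance step and the 2.0-MHz frequency step can be checked with the LC resonance formula. For $f_0 = 1/(2\pi\sqrt{LC})$, a small change $\Delta C$ gives $\Delta f \approx (f_0/2)(\Delta C/C)$. The Python sketch below, with hypothetical function names, inverts this to estimate the total tank capacitance and inductance implied by those two numbers at 14 GHz, assuming an ideal lossless LC tank; these are back-of-envelope estimates, not values reported in this work.

```python
import math


def frequency_step_hz(f0: float, delta_c: float, c_total: float) -> float:
    """Small-signal step of an LC oscillator: df = (f0 / 2) * dC / C."""
    return 0.5 * f0 * delta_c / c_total


def implied_capacitance(f0: float, delta_c: float, delta_f: float) -> float:
    """Total tank capacitance for which a delta_c step produces delta_f at f0."""
    return 0.5 * f0 * delta_c / delta_f


def inductance_for(f0: float, c_total: float) -> float:
    """L from f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / ((2.0 * math.pi * f0) ** 2 * c_total)


c = implied_capacitance(14.0e9, 75e-18, 2.0e6)   # about 262.5 fF
l = inductance_for(14.0e9, c)                    # about 0.49 nH
```

Because $C \propto 1/f^2$ for a fixed inductor, $\Delta f$ for a fixed $\Delta C$ scales roughly as $f^3$, which is one reason the frequency step varies across the tuning range.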
The frequency tuning range of the DCO varactor array is 11 %, well centred about the required 14.0-GHz output clock frequency. This range is adequate to compensate for PVT and extracted layout variations. The PN of the oscillator was approximately -104 dBc/Hz at a 1.0-MHz offset from the 14.0-GHz output frequency (see Figure 6.1). The DCO core transistor gain was simulated to be 3.0 to 3.5, which is considered sufficient for oscillator start-up. The output clock amplitude was $V_m \approx 1.0$ V, selected to provide maximum amplitude (i.e., maximum Signal-to-Noise Ratio, SNR) while remaining below the transistor breakdown limit of the process.

Table 7.1: Simulated Results

| Parameter | Parameter Value |
|----------------------------------|-----------------|
| Capacitance Resolution (aF) | 75 |
| Frequency Resolution (MHz/LSB) | 2.0 |
| Frequency Tuning Range (%) | 11 |
| Output PN at 1.0 MHz (dBc/Hz) | -104 |
| Flicker Noise Reduction (dB) | 6 - 9 |
| Thermal Noise Reduction (dB) | 3 - 6 |

Flicker and thermal noise improvements of 6 to 9 dB and 3 to 6 dB, respectively, were estimated by calculation. A 7-dB PN improvement was expected, by design, from the innovative Class-C oscillator implementation. The Q of the oscillator was simulated to be approximately 10 at 14 GHz. The BBPLL phase margin was calculated to be PM $\approx 60^{\circ}$ for DLF gains of $A_I = 1$ and $A_P = 40$. Output PN and trajectory results are illustrated in Figures 6.3, 6.4 and 6.5.

The fabricated BBPLL occupies $0.06~mm^2$ and is shown in the die micrograph of Figure 7.1. The largest element is the inductor, $156~\mu m \times 156~\mu m$. The oscillator core transistors can be seen at the top of the Inductor box, where they interface to the varactor array. Above the Varactor Array box are the DCO buffers, the divide-by-four flip-flops for the feedback clock and the LVDS drivers used for clock distribution. The Current Source and current calibration circuits are on the left of the varactor array.
The box marked DSP outlines the synthesized BBPLL functions: the BBPD, the DLF and the $\Sigma\Delta$-modulator (SDM).

Figure 7.1: BBPLL Die Micrograph - 7-nm Process

## 7.2 Test Results

The measurements described in this section were performed using a 350-MHz commodity off-chip XO [1] as the BBPLL reference clock. Because the BBPLL is embedded at the core of a SERDES circuit and is therefore not directly accessible, its performance was assessed by observing a transmitter output configured to drive a 14-GHz clock (1010...) pattern. Phase noise and integrated jitter were measured at room temperature with the BBPLL DLF settings varied across a wide range ($A_I = 1 \dots 25$ and $A_P = 2 \dots 300$). Selected settings and the resulting PN are illustrated in Figures 6.3 to 6.5 and Table 6.3. The filter settings that yielded the lowest jitter were $A_P/A_I = 40/1$, and the corresponding PN spectrum (measured with a Keysight E5052B) is shown in Figure 7.2. The spectrum is free of spurious tones and phase noise peaking, indicating limit-cycle-free operation with a loop phase margin of approximately 60°. The RMS random jitter integrated from 1 kHz to 100 MHz was 143 fs at the same filter settings.

Figure 7.2: Measured PN (1 kHz to 100 MHz Offset) - 14-GHz Output Frequency

Measured periodic jitter was 500 fs peak-to-peak; the real-time oscilloscope plot (from a Keysight DSAZ634A) is shown in Figure 7.3. No limit cycles were observed across this filter range, even with the SDM disabled. The DLF gain had to be increased to an excessive $A_P > 2000$ to generate the limit cycles illustrated in Figure 6.5.

Figure 7.3: Periodic Jitter Measurement

Figure 7.4 provides more insight into the various phase noise contributors. In addition to the closed-loop phase noise (with and without SDM dithering), the graph shows the free-running DCO PN (measured at -104 dBc/Hz at 1-MHz offset), as well as the reference PN (scaled by the closed-loop gain).
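Inside the loop bandwidth, reference phase noise reaches the output multiplied by the feedback division ratio $N$, i.e., raised by $20\log_{10}N$ dB. With a 350-MHz reference and a 14-GHz output, $N = 40$. The Python sketch below (function name hypothetical) evaluates this gain and, for comparison, the gain for a 500-MHz reference, assuming the same reference phase noise in both cases.

```python
import math


def inband_gain_db(f_out: float, f_ref: float) -> float:
    """In-band reference-to-output phase noise gain, 20*log10(N) with N = f_out/f_ref."""
    return 20.0 * math.log10(f_out / f_ref)


gain_350 = inband_gain_db(14.0e9, 350.0e6)   # N = 40: about 32.0 dB
gain_500 = inband_gain_db(14.0e9, 500.0e6)   # N = 28: about 28.9 dB
per_halving = 20.0 * math.log10(2.0)         # about 6.0 dB per factor of two in N
```

The roughly 3-dB difference between the two references, and the approximately 6 dB per factor-of-two change in N, both follow directly from the $20\log_{10}N$ dependence.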
Outside the loop bandwidth, the BBPLL PN is 5–6 dB above the free-running DCO PN (measured at 10 MHz), indicating that quantization noise is not entirely eliminated by the SDM. With the SDM disabled, the integrated jitter rises to 270 fs. In-band BBPLL PN is determined entirely by the reference PN. The free-running DCO PN is not altered when the SDM is turned on, confirming the noise-shaping properties of the SDM.

Figure 7.4: Breakdown of PN Contributors

Figure 7.5 shows the measured frequency tuning to be both monotonic and linear from 13.7 to 15.7 GHz, a tuning range of approximately 14 %. This exceeds the simulated 11 % range intended to cover process, voltage and temperature (PVT) variation. The same plot displays the frequency resolution, which averages 1.2 MHz/LSB over the entire tuning range. The worst-case frequency step is 2.0 MHz/LSB, consistent with Table 7.1. The small frequency step variations shown in Figure 7.5 confirm the matching between the 5-fin and 6-fin varactors, as well as the negligible impact of random varactor mismatch. Excluding the wireline transmitter clock distribution, the BBPLL dissipates a total of 40 mW from two supplies (0.75 V/1.5 V), of which 63 % (25.2 mW) is consumed by the DSP section. The remaining 14.8 mW is consumed by the DCO.

Figure 7.5: DCO Frequency Tuning Characteristic and Step Size

## 7.3 Phase-Locked Loops in Wireline (SERDES) Applications

Table 7.2 compares the significant performance parameters of the BBPLL discussed in this thesis with four prominent published works describing PLLs for wireline applications. All designs used LC-tank oscillators and were integer-N implementations, except for [86]. FinFET technology was used for all but [87], which was the only other BBPLL design. Output frequencies were similar, ranging from 14 GHz to 25 GHz.
The RMS jitter of this work is comparable to or better than that of the other analogue charge-pump PLLs, and significantly better than that of the reported BBPLL, at very competitive area and power dissipation. Note that the area parameter for this thesis includes the complete BBPLL, its current source, digital control and decoupling capacitors. Reference [88] does not specify the circuits included in its area estimate, which is conspicuously small. The list that follows briefly summarizes each paper.

1. Reference [88]: J. Kim et al., "A 112Gb/s PAM-4 transmitter with 3-tap FFE in 10nm CMOS," in Int. Solid-State Circuits Conf. Tech. Dig., San Francisco, CA, Feb. 2018, pp. 102–103.

   The timing source for this 112/56-Gb/s PAM-4/NRZ transmitter is a 14-GHz LC-PLL with an injection-locked quadrature generator, operating at 1.0 V from an integrated voltage regulator supplied by 1.5 V. The clock signals undergo per-lane Duty-Cycle Detection/Correction (DCD/DCC) and Quadrature-Error Detection/Correction (QED/QEC). This circuit is implemented in a 10-nm FinFET CMOS technology.

2. Reference [86]: P. Upadhyaya et al., "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a configurable ADC in 16nm FinFET," in Int. Solid-State Circuits Conf. Tech. Dig., San Francisco, CA, Feb. 2018, pp. 108–109.

   This paper demonstrates a fully integrated and adaptive 19-to-56-Gb/s PAM-4 (9.5-to-28-Gb/s in NRZ mode) quad transceiver, with two fractional-N LC-PLLs per quad, implemented in 16-nm FinFET technology. Active-inductor clock distribution is employed with DCC.

3. Reference [89]: M. Raj et al., "A 164fsrms 9-to-18GHz sampling phase detector based PLL with in-band noise suppression and robust frequency acquisition in 16nm FinFET," in IEEE Symp. VLSI Circuits Tech. Dig., Kyoto, Japan, Jun. 2017, pp. 182–183.

   This paper describes a Sampling Phase Detector (SPD) based PLL implemented in a 16-nm FinFET process.
   The high gain of this SPD suppresses PLL in-band noise, and its programmability controls the loop bandwidth. Instead of sampling the VCO output directly, as sub-sampling PLLs do, the output of the frequency divider is sampled. This improves the capture range while maintaining in-band noise reduction. The design uses a frequency acquisition technique based on a single programmable charge pump and employs an analogue loop filter. The SPD improves the measured in-band phase noise from -90.6 dBc/Hz to -104.1 dBc/Hz at 18 GHz, with an RMS jitter of 164 fs integrated over 10 kHz to 100 MHz, while consuming 29.2 mW. The 2x frequency range of 9 to 18 GHz is covered using two LC VCOs.

4. Reference [87]: M. Hekmat et al., "A 25 GHz Fast-Lock Digital LC PLL with Multiphase Output Using a Magnetically-Coupled Loop of Oscillators," in IEEE J. Solid-State Circuits, vol. 50, no. 2, Feb. 2015, pp. 490–95.

   This paper describes a fast-wakeup integer-N bang-bang digital PLL, implemented in 40-nm CMOS technology, for SERDES applications. The oscillator generates eight output phases, using four magnetically coupled loops, to implement output clock phase adjustment. This approach provides a 2x area improvement over similar prior art. Fast lock upon wakeup is achieved by calibrating the phase of the feedback clock with respect to the reference clock using a first-order loop and on-the-fly adjustments of loop parameters. The output clock phases have less than 2° quadrature error up to 25 GHz. The measured output jitter is 392 fs integrated over 100 kHz to 100 MHz. The BBPLL consumes 64 mW of power, 23 mW of which is consumed by the DCO.

Table 7.2: Contemporary SERDES PLL Performance Comparison

| Parameter | [88] ISSCC 2018 | [86] ISSCC 2018 | [89] VLSI 2017 | [87] JSSC 2015 | This Work |
|---------------------|-------------------|-------------------|--------------------|--------------------|--------------------|
| Technology | 10-nm FinFET | 16-nm FinFET | 16-nm FinFET | 40-nm CMOS | 7-nm FinFET |
| Architecture | Analogue Integer-N | Analogue Fractional-N | Analogue Integer-N | Digital BBPLL | Digital BBPLL |
| Oscillator | LC | LC | LC | LC DCO | LC DCO |
| Reference Frequency | N/A | N/A | 450 MHz | 390 MHz | 350 MHz |
| Output Frequency | 14 GHz | 14 GHz | 18 GHz | 25 GHz | 14 GHz |
| Integrated Jitter (fs<sub>rms</sub>) | 185 (1 kHz - 100 MHz) | 180 (range N/A) | 164 (1 kHz - 100 MHz) | 392 (100 kHz - 100 MHz) | 143 (1 kHz - 100 MHz) |
| Phase Noise at 100 kHz (dBc/Hz) | N/A | N/A | -102 | ≈ -97.0 | -103.5 |
| Phase Noise at 1 MHz (dBc/Hz) | -108 | N/A | -107.3 | -102.5 | -108.7 |
| Phase Noise at 10 MHz (dBc/Hz) | -119 | N/A | -114 | -98.3 | -120.3 |
| Power (mW) | N/A | N/A | 29.2 | 64 | 40 |
| Area (mm<sup>2</sup>) | ≈ 0.023 | ≈ 0.34 | 0.39 | 0.10 | 0.06 |

# Chapter 8: Conclusions and Future Work

## 8.1 Conclusions

Until recently, BBPLLs have largely been ruled out of low-jitter applications such as wireline transceivers, because they were assumed, at first glance, to exhibit inferior behaviour due to their quantized operation, leading to pronounced limit-cycle operation, quantization jitter and frequency-domain spurs. This work demonstrates that this is not necessarily the case; in fact, the contrary can be true.
That is, the total output jitter of the BBPLL discussed in this thesis has been reduced to levels that are sufficient for high-speed wireline applications. The reported jitter is not only significantly lower than that of previously reported digital PLLs, but also rivals analogue PLL performance, while offering much improved scalability through implementation in an advanced FinFET CMOS process. This is achieved with a DCO implementation rigorously optimized for low phase noise and fine resolution, combined with innovations such as latency reduction in the fully synthesized digital section. Closed-loop phase noise analysis and budgeting, notoriously difficult due to the non-linearity of the BBPLL, has been thoroughly addressed using an accurate and efficient simulation methodology that exploits both well-understood mathematical methods and industry-standard digital simulation. With these innovations at hand, BBPLL implementations can be widely adopted in jitter-critical applications, dramatically reducing the challenges of CMOS process scaling.

## 8.2 List of Contributions

The work recounted in this thesis makes the following contributions to the current state of the art.

1. Single-fin modularity was used to implement a fine-resolution $\Delta$-capacitance of 75 aF. This was possible because the on-state capacitance of the FinFET PMOS inversion-mode varactors has a linear relationship with the number of fins.
2. A new closed-form solution was derived quantifying how source degeneration reduces transistor flicker noise in oscillators. This is important in this application, as the flicker noise produced by small-geometry transistors is significantly worse than that of larger-geometry planar MOSFETs.
3. Taking advantage of the improved performance of the 7-nm process, the digital loop filter in the forward path of the BBPLL was clocked at 10 times the reference frequency and incorporated a look-ahead architecture.
   This new architecture reduces the loop latency that would otherwise be present if the digital loop filter were clocked at the reference frequency; such latency deteriorates jitter performance and phase margin.
4. A new method of efficiently incorporating reference oscillator and DCO jitter into digital time-domain event-driven simulation (i.e., a Verilog simulator) [90] is proposed. This enabled full functional and phase noise simulation of the BBPLL while greatly reducing simulation run-time.
5. Digital time-domain simulator run-time was further reduced (by a factor of five) and output jitter error was improved (< 1 % for a 1-ms simulation time) by calculating jitter time-stamp vectors prior to simulation rather than during simulation.
6. A novel approach combining large-signal circuit and 3-D EM analyses is proposed to characterize circuit elements and modules and thereby create a mathematical model of the DCO. The run-time of the mathematical model was significantly shorter than that of the DCO circuit simulation, while maintaining accuracy to within $\pm$0.9 %. The shortened run-time allowed various DCO array architectures and implementations to be optimized quickly and accurately.

This project included the first design and implementation of an LC-tank Class-C oscillator [5] in TSMC's 7-nm CMOS FinFET process. The success of this implementation demonstrated that this process, optimized for digital design, can also be used to realize analogue circuits exhibiting state-of-the-art performance.

## 8.3 Future Work

The list below itemizes areas where future work could lead to improved performance of the BBPLL.

1. For some time, $\Sigma\Delta$-modulators have been used in DPLL forward paths to improve DCO frequency resolution. The objective is to produce an output clock signal with an effective frequency that is as close as possible to the desired clock frequency, in spite of the quantized nature of the DCO.
   $\Sigma\Delta$-modulators are useful here for three reasons: first, they can be realized as small, stable digital circuits; second, their relatively long period of pseudo-random dithering resists the creation of output noise spurs; and third, the intrinsic quantization noise produced by the $\Sigma\Delta$-modulator is pushed to higher (out-of-band) frequencies. Higher reference frequencies reduce the feedback division ratio, which reduces the amount of reference phase noise multiplied up to the DCO output. That is, for every halving of the feedback divisor, the in-band phase noise floor is reduced by approximately 6 dB. In SERDES applications the reference clock can be hundreds of MHz, which reduces the feedback divider ratio. Unfortunately, with the $\Sigma\Delta$-modulator in the forward path, the low divider ratio leaves only a small number of clock cycles for it to converge to a steady-state output. Thus, the $\Sigma\Delta$-modulator in this application may not fully converge before a new input is present, effectively increasing its intrinsic jitter. The ideal solution to this problem is a $\Sigma\Delta$-modulator architecture that generates a fully converged output in only a few high-speed clock cycles.
2. The output jitter of the BBPLL could be reduced further by designing the reference signal path to employ a higher-frequency reference source (e.g., 500 MHz), an increase from the present 350-MHz reference frequency.
3. The BBPD of the BBPLL is a single high-speed D flip-flop that possesses an aperture window, $t_A$, defined by its setup, $t_{SU}$, and hold, $t_H$, times (i.e., $t_A = t_{SU} + t_H$). An analysis carried out in [26–28] shows that BBPLL output jitter can be reduced by reducing $t_A$. Therefore, research could be carried out to determine whether an alternative D flip-flop type with higher gain and smaller $t_A$ can be used to reduce BBPLL output jitter in this application.
4. The current 7-nm BBPLL could be ported to a finer-geometry process (e.g., 5 nm). Analysis should be done to determine whether the DCO resolution would improve: a single fin at 7 nm gives ≈ 75 aF, so would a single fin at 5 nm resolve a smaller, consistent capacitance value? Additionally, the opportunity to improve this resolution by choosing a different FinFET scaling in the new process should be investigated.
5. In the present implementation, tuning the BBPLL DLF gain settings for lowest PN is a manual process. Since the output jitter must be optimized across PVT variation, it would be very useful to automate this process. One method is to use the BPD output state history to determine the DLF gain changes required for minimum jitter [91,92]. A Least Mean Squares (LMS) algorithm is a possible candidate for this function.
6. A shortcoming of the BPD used in the BBPLL is slow locking, due to its nonlinear phase correction. This can be mitigated by employing a second, more conventional PFD for the initial convergence to the locked state. Once the PLL is close to lock, the PFD is switched off and the BPD is switched on. Research should be done to extend the LMS concept discussed previously to reduce lock time and eliminate the need for the PFD [93,94].
7. Further reduction of the oscillator thermal and flicker noise, as well as of the flicker noise corner, could be achieved by replacing the Class-C oscillator [5] with a switching-mode, fundamental or overtone, LC-tank oscillator (e.g., Class-D [42–44], Class-F [44–46], ...). The current Class-C implementation was heavily modified to improve PN. Research should be carried out to determine whether some combination of these modifications could also improve switching oscillator performance.
8. Further analysis should be done to improve the accuracy of the linear equations derived from the BBPLL functional block diagram illustrated in Figure 3.8.
   Specifically, the modelling of the effect of the forward loop delay, $z^{-D}$, on PM can be improved.

# List of References

- [1] Datasheet, "Ultra Series™ Crystal Oscillator Si541 Data Sheet," 2018.
- [2] R. W. Rhea, *Oscillator Design and Computer Simulation*. Atlanta, GA: Noble Publishing Corporation, second ed., 1995.
- [3] OIF (Optical Internetworking Forum), *Common Electrical I/O (CEI) - Electrical and Jitter Interoperability agreements for 6G+ bps, 11G+ bps and 25G+ bps I/O (OIF-CEI-03.1)*. Fremont, CA: The Optical Internetworking Forum, 2017.
- [4] T. H. Lee and A. Hajimiri, "Oscillator phase noise: a tutorial," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 3, pp. 326–336, 2000.
- [5] A. Mazzanti and P. Andreani, "Class-C harmonic CMOS VCOs, with a general result on phase noise," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 12, pp. 2716–2729, 2008.
- [6] R. B. Staszewski and P. T. Balsara, "Fully Digital Control of Oscillator Frequency," in *All-Digital Frequency Synthesizer in Deep-Submicron CMOS*, ch. 2, pp. 33–47, Hoboken, New Jersey: John Wiley & Sons, Inc., first ed., 2006.
- [7] R. W. Rhea, "Oscillator Noise," in *Oscillator Design and Computer Simulation*, ch. 4, pp. 114–116, Atlanta, GA: McGraw-Hill, Inc., second ed., 1996.
- [8] J. A. Edminister, "Series and Parallel Resonance," in *Schaum's Outline of Electric Circuits*, ch. 8, pp. 81–98, New York: McGraw-Hill, Inc., first ed., 1965.
- [9] A. A. Abidi, "How Phase Noise Appears in Oscillators," in *Analog Circuit Design-RF Analog-to-Digital Converters; Sensor and Actuator Interfaces; Low-Noise Oscillators, PLLs and Synthesizers*, ch. 16, pp. 271–290, Boston: Kluwer Academic, 1997.
- [10] B. Razavi, "A study of phase noise in CMOS oscillators," *Phase-Locking in High-Performance Systems: From Devices to Architectures*, vol. 31, no. 3, pp. 176–188, 2003.
- [11] G. E. Moore, "Cramming more components onto integrated circuits," *Proceedings of the IEEE*, vol. 86, no. 1, pp. 82–85, 1998.
- [12] G. E.
Moore, "Cramming more components onto integrated circuits (Reprinted from Electronics, Volume 38, Number 8, April 19, 1965, pp. 114 ff)," *IEEE Solid-State Circuits Society Newsletter*, vol. 11, no. 3, pp. 33–36, 2006.
- [13] D. Bhattacharya and N. K. Jha, "FinFETs: From devices to architectures," *Hindawi Publishing Corporation, Advances in Electronics*, vol. 2014, pp. 1–22, 2015.
- [14] P. Wambacq, B. Verbruggen, K. Scheir, J. Borremans, M. Dehan, D. Linten, V. De Heyn, G. Van der Plas, A. Mercha, B. Parvais, C. Gustin, V. Subramanian, N. Collaert, M. Jurczak, and S. Decoutere, "The potential of FinFETs for analog and RF circuit applications," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 54, no. 11, pp. 2541–2551, 2007.
- [15] T.-J. King Liu, "FinFET History, Fundamentals and Future," in Symposium on VLSI short course, no. 3, (Berkeley, CA), pp. 1–23, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720-1770 USA, 2012.
- [16] D. Friedman, "Considerations and Implementations for High Data Rate Serial Link Design," in *IEEE Solid-State Circuits Society*, (Toronto), pp. 1–73, IBM Thomas J. Watson Research Center, 2018.
- [17] Y. Chang, *Low-Power Wireline Transmitter Design*. PhD thesis, University of California, Los Angeles, 2018.
- [18] B. Razavi, "The Bridged T-Coil [A Circuit for All Seasons]," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 4, pp. 9–13, 2015.
- [19] J. Kim, A. Balankutty, A. Elshazly, Y. Y. Huang, H. Song, K. Yu, and F. O'Mahony, "A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS," in *IEEE International Solid-State Circuits Conference*, vol. 58, (San Francisco, CA), pp. 60–61, IEEE, 2015.
- [20] C. Menolfi, J. Hertle, T. Toifl, T. Morf, D. Gardellini, M. Braendli, P. Buchmann, and M. Kossel, "A 28Gb/s source-series terminated TX in 32nm CMOS SOI," *IEEE International Solid-State Circuits Conference*, vol. 55, pp. 334–335, 2012.
- [21] K.
Suzuki, Y. Tomita, H. Yamaguchi, T. Cheung, T. Yamamoto, and H. Tamura, "A 24-Gb/s source-series terminated driver with inductor peaking in 28-nm CMOS," in *Proceedings 2012 IEEE Asian Solid-State Circuits Conference, A-SSCC*, (Kobe, Japan), pp. 137–140, IEEE, 2012. - [22] A. Hajimiri and T. H. Lee, *The Design of Low Noise Oscillators*. Norwell, MA: Kluwer Academic, first ed., 1999. - [23] M. Hossain and A. Chan Carusone, "5-10 Gb/s 70 mW burst mode AC coupled receiver in 90-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 3, pp. 524–537, 2010. - [24] G. Marucci, S. Levantino, P. Maffezzoni, and C. Samori, "Analysis and Design of Low-Jitter Digital Bang-Bang Phase-Locked Loops," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 1, pp. 26–36, 2014. - [25] M. Zanuso, D. Tasca, S. Levantino, A. Donadel, C. Samori, and A. L. Lacaita, "Noise analysis and minimization in bang-bang digital PLLs," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, pp. 835–839, nov 2009. - [26] S. Bashiri, S. Aouini, N. Ben-Hamida, and C. Plett, "Analysis and modeling of the phase detector hysteresis in bang-bang PLLs," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, pp. 347–355, feb 2015. - [27] J. Lee, K. S. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1571–1580, 2004. - [28] C. Jiang, P. Andreani, and U. D. Keil, "Detailed behavioral modeling of bangbang phase detectors," in *IEEE Asia-Pacific Conference on Circuits and Systems*, *Proceedings*, *APCCAS*, pp. 716–719, IEEE, 2006. - [29] S. Tertinek, J. P. Gleeson, and O. Feely, "Binary phase detector gain in bang-bang phase-locked loops with DCO jitter," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 57, pp. 941–945, dec 2010. - [30] N. 
Da Dalt, "Markov Chains-Based Derivation of the Phase Detector Gain in Bang-Bang PLLs," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 53, pp. 1195–1199, nov 2006. - [31] N. Da Dalt and A. Sheikholeslami, "Effect of Jitter on Bang-Bang CDR," in *Understanding Jitter and Phase Noise: A Circuit and Systems Perspective*, ch. 3, pp. 155–160, New York: Cambridge University Press, first ed., 2018. - [32] J. W. M. Rogers, C. Plett, and F. Dai, "Noise-Shaping Effect," in *Integrated Circuit Design for High Speed Frequency Synthesis*, ch. 9, pp. 306 307, Norwood, MA: Artech House, Inc., first ed., 2006. - [33] S. Pavan, R. Schreier, and G. C. Temes, "Noise-Shaping," in *Understanding Delta-Sigma Data Converters* (R. J. Baker, ed.), ch. 2, pp. 42–52, Hoboken, New Jersey: John Willey & Sons, Inc., second ed., 2017. - [34] D. Pfaff, R. Abbott, X. J. Wang, B. Zamanlooy, S. Moazzeni, R. Smith, and C. C. Lin, "A 14-GHz Bang-Bang Digital PLL with sub-150fs Integrated Jitter for Wireline Applications in 7nm FinFET," in *Proceedings of the Custom Integrated Circuits Conference*, (Austin, TX), pp. 1–4, IEEE, 2019. - [35] D. Pfaff, R. Abbott, X. J. Wang, S. Moazzeni, R. Mason, and R. R. Smith, "A 14-GHz Bang-Bang Digital PLL with Sub-150-fs Integrated Jitter for Wireline Applications in 7-nm FinFET CMOS," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 3, pp. 580–591, 2020. - [36] R. B. Staszewski and P. T. Balsara, "Implementation of Tracking Bits," in *All-Digital Frequency Synthesizer in Deep-Submicron CMOS*, ch. 3, pp. 64–73, Hoboken, New Jersey: John Willey & Sons, Inc., first ed., 2006. - [37] N. Da Dalt, "Linearized analysis of a digital bang-bang PLL and its validity limits applied to jitter transfer and jitter generation," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, pp. 3663–3675, dec 2008. - [38] J. W. M. Rogers, C. Plett, and F. 
Dai, "Continuous-Time Analysis for PLL Synthesizers," in *Integrated Circuit Design for High Speed Frequency Synthesis*, ch. 3, pp. 52–58, Norwood, MA: Artech House, Inc., first ed., 2006. - [39] A. Hajimiri and T. H. Lee, "Tuned-Tank Oscillators," in *The Design of Low Noise Oscillators*, ch. 3, pp. 17–24, Norwell, MA: Kluwer Academic, first ed., 1999. - [40] L. Fanori and P. Andreani, "Highly efficient class-C CMOS VCOs, including a comparison with class-B VCOs," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 7, pp. 1730–1740, 2013. - [41] F. Chicco, A. Pezzotta, and C. C. Enz, "Analysis of power consumption in LC oscillators based on the inversion coefficient," in *Proceedings IEEE International Symposium on Circuits and Systems*, (Neuchatel, Switzerland), pp. 1–4, EPFL, 2017. - [42] L. Fanori and P. Andreani, "A 2.5-to-3.3GHz CMOS Class-D VCO," in Proceedings of the Custom Integrated Circuits Conference, (San Jose, CA), pp. 346–348, IEEE, 2013. - [43] L. Fanori and P. Andreani, "Class-D CMOS oscillators," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 12, pp. 3105–3119, 2013. - [44] M. Shahmohammadi, M. Babaie, and R. B. Staszewski, "A 1/f Noise Upconversion Reduction Technique for Voltage-Biased RF CMOS Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 11, pp. 2610–2624, 2016. - [45] M. Babaie and R. B. Staszewski, "A class-F CMOS oscillator," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 12, pp. 3120–3133, 2013. - [46] Y. Hu, T. Siriburanon, and R. B. Staszewski, "A 30-GHz Class-F23 Oscillator in 28nm CMOS using harmonic extraction and achieving 120 kHz l/f3 Corner," in *ESSCIRC 2017 43rd IEEE European Solid State Circuits Conference*, (Leuven, Belgium), pp. 87–90, 2017. - [47] Z. Zong, G. Mangraviti, and P. Wambacq, "A 22-29 GHz Voltage-Biased LC-VCO with Suppressed Flicker Noise over Tuning Range in 22nm FD-SOI," 17th IEEE International New Circuits and Systems Conference, NEWCAS 2019, vol. 2, no. 1, pp. 19–22, 2019. 
- [48] R. B. Staszewski and P. T. Balsara, "Dynamic Element Matching of Varactors," in *All-Digital Frequency Synthesizer in Deep-Submicron CMOS*, ch. 3, pp. 70–71, Hoboken, New Jersey: John Wiley & Sons, Inc., first ed., 2006.
- [49] Z. Bai, X. Zhou, and R. Mason, "A novel Injection Locked Rotary Traveling Wave Oscillator," in *Proceedings IEEE International Symposium on Circuits and Systems*, pp. 1768–1771, IEEE, 2014.
- [50] Z. Bai, X. Zhou, R. D. Mason, and G. Allan, "A 2-GHz Pulse Injection-Locked Rotary Traveling-Wave Oscillator," *IEEE Transactions on Microwave Theory and Techniques*, vol. 64, no. 6, pp. 1854–1866, 2016.
- [51] N. T. Abou-El-Kheir, R. D. Mason, M. Li, and M. C. Yagoub, "A High-Performance Low Complexity All-Digital Fractional Clock Multiplier," in *Proceedings 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC*, pp. 73–76, IEEE, 2019.
- [52] J. Maget, M. Tiebout, and R. Kraus, "MOS varactors with n- and p-type gates and their influence on an LC-VCO in digital CMOS," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 7, pp. 1139–1147, 2003.
- [53] P. Andreani and S. Mattisson, "On the use of MOS varactors in RF VCO's," in *Phase-Locking in High-Performance Systems: From Devices to Architectures* (B. Razavi, ed.), vol. 9200, pp. 157–162, IEEE, 2003.
- [54] R. B. Staszewski, C. M. Hung, D. Leipold, and P. T. Balsara, "A first multigigahertz digitally controlled oscillator for wireless applications," *IEEE Transactions on Microwave Theory and Techniques*, vol. 51, no. 11, pp. 2154–2164, 2003.
- [55] J. Zhuang, Q. Du, and T. Kwasniewski, "A 3.3 GHz LC-based digitally controlled oscillator with 5kHz frequency resolution," in *2007 IEEE Asian Solid-State Circuits Conference, A-SSCC*, (Jeju, Korea), pp. 428–431, IEEE, 2007.
- [56] Z. Bai and R. Mason, "A 20kHz frequency resolution DCO," in *IEEE International Conference on Solid-State and Integrated Circuit Technology, Proceedings*, pp. 257–259, IEEE, 2010.
- [57] Manual, "StarRC™ Custom Parasitic Extraction for Analog Mixed-Signal and Digital IC Design," 2011.
- [58] J. Zhao, "PeakView EMD™, 3D full-wave and high-precision electromagnetic solver," 2009.
- [59] H. A. Wheeler, "Simple Inductance Formulas for Radio Coils," *Proc. IRE*, vol. 16, no. 10, pp. 1398–1400, 1928.
- [60] A. Payne, "The AC Resistance of Rectangular Conductors," 2016.
- [61] S. E. Rutherford and J. D. Cockcroft, "Skin effect in rectangular conductors at high frequencies," *Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character*, pp. 533–542, 1929.
- [62] S. J. Haefner, "Alternating-Current Resistance of Rectangular Conductors," *Proceedings of the Institute of Radio Engineers*, vol. 25, no. 4, pp. 434–447, 1937.
- [63] S. Kapur and D. E. Long, "EMX®: A commercial full-wave 3D Electromagnetic simulator," 2010.
- [64] M. D. Wei, S. F. Chang, and C. S. Chen, "A low phase-noise QVCO with integrated back-gate coupling and source resistive degeneration technique," *IEEE Microwave and Wireless Components Letters*, vol. 19, no. 6, pp. 398–400, 2009.
- [65] S. M. Moon, M. Q. Lee, and B. S. Kim, "Design of quadrature CMOS VCO using source degeneration resistor," *Digest of Papers IEEE Radio Frequency Integrated Circuits Symposium*, pp. 535–538, 2005.
- [66] H.-M. Chien, "VCO with Power Supply Rejection Enhancement Circuit," 2006.
- [67] C. G. Tan and R. Tsang, "Flicker Noise Degeneration Technique for VCO," 2009.
- [68] X. J. Xi, M. Dunga, J. He, W. Liu, K. M. Cao, X. Jin, J. J. Ou, M. Chan, A. M. Niknejad, and C. Hu, *BSIM4.3.0 MOSFET Model User's Manual*. Berkeley, CA: University of California, 2003.
- [69] Y. S. Chauhan, D. D. Lu, S. Venugopalan, S. Khandelwal, J. P. Duarte, N. Paydavosi, A. M. Niknejad, and C. Hu, *FinFET Modeling for IC Simulation and Design Using the BSIM-CMG Standard*. Waltham, MA: Academic Press Elsevier, 2015.
- [70] V. Subramanian, B. Parvais, J. Borremans, A. Mercha, D. Linten, P. Wambacq, J. Loo, M. Dehan, C. Gustin, N. Collaert, S. Kubicek, R. Lander, J. Hooker, F. Cubaynes, S. Donnay, M. Jurczak, G. Groeseneken, W. Sansen, and S. Decoutere, "Planar bulk MOSFETs versus FinFETs: An analog/RF perspective," *IEEE Transactions on Electron Devices*, vol. 53, no. 12, pp. 3071–3077, 2006.
- [71] B. Razavi, R. H. Yan, and K. F. Lee, "Impact of Distributed Gate Resistance on the Performance of MOS Devices," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, vol. 41, no. 11, pp. 750–754, 1994.
- [72] K. Miyaguchi, B. Parvais, L. Å. Ragnarsson, P. Wambacq, P. Raghavan, A. Mercha, A. Mocuta, D. Verkest, and A. Thean, "Modeling FinFET metal gate stack resistance for 14nm node and beyond," in *2015 International Conference on IC Design and Technology, ICICDT 2015*, pp. 2–5, 2015.
- [73] E. Solis Avila, J. C. Tinoco, A. G. Martinez-Lopez, M. A. Reyes-Barranca, A. Cerdeira, and J. P. Raskin, "Parasitic Gate Resistance Impact on Triple-Gate FinFET CMOS Inverter," *IEEE Transactions on Electron Devices*, vol. 63, no. 7, pp. 2635–2642, 2016.
- [74] Y. M. Ding, D. D. Misra, and P. Srinivasan, "Flicker Noise Performance on Thick and Thin Oxide FinFETs," *IEEE Transactions on Electron Devices*, vol. 64, no. 5, pp. 2321–2325, 2017.
- [75] K. R. Laker and W. M. C. Sansen, "Noise Sources in FET," in *Design of Analog Integrated Circuits and Systems*, ch. 1, pp. 74–86, New York: McGraw-Hill, Inc., 1994.
- [76] K. K. O, N. Park, and D. J. Yang, "1/f noise of NMOS and PMOS transistors and their implications to design of voltage controlled oscillators," in *IEEE Radio Frequency Integrated Circuits Symposium, RFIC, Digest of Technical Papers*, pp. 59–62, 2002.
- [77] D. A. Johns and K. W. Martin, "Device Model Summary," in *Analog Integrated Circuit Design*, ch. 1, pp. 56–61, New York: John Wiley & Sons, Inc., first ed., 1997.
- [78] R. B. Staszewski, C. Fernando, and P. T. Balsara, "Event-driven simulation and modeling of phase noise of an RF oscillator," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 52, no. 4, pp. 723–733, 2005.
- [79] S. Liao and M. Horowitz, "A verilog piecewise-linear analog behavior model for mixed-signal validation," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 8, pp. 2229–2235, 2014.
- [80] J. Park, K. Muhammad, and K. Roy, "Efficient Modeling of $1/f^{\alpha}$ Noise Using Multirate Process," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 25, no. 7, pp. 1247–1256, 2006.
- [81] IEEE Computer Society, *IEEE Standard for SystemVerilog Unified Hardware Design, Specification, and Verification Language*. New York: IEEE, 2017.
- [82] N. Da Dalt and A. Sheikholeslami, *Understanding Jitter and Phase Noise: A Circuits and Systems Perspective*. New York: Cambridge University Press, 2018.
- [83] N. J. Kasdin, "Discrete Simulation of Colored Noise and Stochastic Processes and $1/f^{\alpha}$ Power Law Noise Generation," *Proceedings of the IEEE*, vol. 83, no. 5, pp. 802–827, 1995.
- [84] K. Kundert, "The Fracpole Suite," 2008.
- [85] T. Wen and T. Kwasniewski, "Phase noise simulation and modeling of ADPLL by system verilog," in *BMAS 2008 - Proceedings of the 2008 IEEE International Behavioral Modeling and Simulation Workshop*, (San Jose, CA), pp. 29–34, 2008.
- [86] P. Upadhyaya, C. F. Poon, S. W. Lim, J. Cho, A. Roldan, W. Zhang, J. Namkoong, T. Pham, B. Xu, W. Lin, H. Zhang, N. Narang, K. H. Tan, G. Zhang, Y. Frans, and K. Chang, "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a configurable ADC in 16nm FinFET," in *Digest of Technical Papers IEEE International Solid-State Circuits Conference*, vol. 61, (San Francisco, CA), pp. 108–110, IEEE, 2018.
- [87] M. Hekmat, F. Aryanfar, J. Wei, V. Gadde, and R. Navid, "A 25 GHz fast-lock digital LC PLL with multiphase output using a magnetically-coupled loop of oscillators," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 2, pp. 490–502, 2015.
- [88] J. Kim, A. Balankutty, R. Dokania, A. Elshazly, H. S. Kim, S. Kundu, S. Weaver, K. Yu, and F. O'Mahony, "A 112Gb/s PAM-4 transmitter with 3-Tap FFE in 10nm CMOS," in *Digest of Technical Papers IEEE International Solid-State Circuits Conference*, vol. 61, (San Francisco, CA), pp. 102–104, IEEE, 2018.
- [89] M. Raj, A. Bekele, D. Turker, P. Upadhyaya, Y. Frans, and K. Chang, "A 164fs-rms 9-to-18GHz sampling phase detector based PLL with in-band noise suppression and robust frequency acquisition in 16nm FinFET," in *IEEE Symposium on VLSI Circuits, Digest of Technical Papers*, (Kyoto, Japan), pp. C182–C183, Xilinx Inc. San Jose, California, USA, 2017.
- [90] Standard, "IEEE Standard for SystemVerilog—Unified Hardware Design, Specification, and Verification Language. Standard 1800-2017."
- [91] S. Jang, S. Kim, S. H. Chu, G. S. Jeong, Y. Kim, and D. K. Jeong, "An Optimum Loop Gain Tracking All-Digital PLL Using Autocorrelation of Bang-Bang Phase-Frequency Detection," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 62, no. 9, pp. 836–840, 2015.
- [92] T. K. Kuan and S. I. Liu, "A digital bang-bang phase-locked loop with automatic loop gain control and loop latency reduction," *IEEE Symposium on VLSI Circuits, Digest of Technical Papers*, vol. 2015-August, pp. C138–C139, 2015.
- [93] L. Bertulessi, L. Grimaldi, D. Cherniak, C. Samori, and S. Levantino, "A low-phase-noise digital bang-bang PLL with fast lock over a wide lock range," in *Digest of Technical Papers IEEE International Solid-State Circuits Conference*, vol. 61, pp. 252–254, IEEE, 2018.
- [94] Q. Huang, C. Zhan, and J. Burm, "A low-complexity fast-locking digital PLL with multi-output bang-bang phase detector," in *2016 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2016*, pp. 418–420, IEEE, 2016.
- [95] Y. Chen, Y. H. Liu, Z. Zong, J. Dijkhuis, G. Dolmans, R. B. Staszewski, and M. Babaie, "A supply pushing reduction technique for LC oscillators based on ripple replication and cancellation," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 1, pp. 240–252, 2019.
- [96] R. B. Staszewski and P. T. Balsara, "Frequency Synthesis," in *All-Digital Frequency Synthesizer in Deep-Submicron CMOS*, ch. 1, pp. 1–5, Hoboken, New Jersey: John Wiley & Sons, Inc., first ed., 2006.
- [97] F. M. Gardner, "Properties of Phase-Noise Spectra," in *Phaselock Techniques*, ch. 7, pp. 153–159, Hoboken, New Jersey: John Wiley & Sons, Inc., third ed., 2005.
- [98] M. Perrott, *Noise in Voltage Controlled Oscillators (Lecture Notes)*. Cambridge, MA: MIT Open Courseware, 2005.
- [99] A. Hajimiri and T. H. Lee, "Upconversion of Low Frequency Noise," in *The Design of Low Noise Oscillators*, ch. 4, pp. 55–65, Norwell, MA: Kluwer Academic, first ed., 1999.

# Appendix A: Oscillator Phase Noise

Figure A.1: Oscillator Phase Noise Sources

The major sources of PN in a DCO are categorized in Figure A.1. These sources are:

- external noise, described as oscillator frequency-pushing;
- intrinsic noise associated with the tank resonator resistance (thermal noise) and the active negative resistance (thermal and flicker noise);
- extrinsic noise from the current source (thermal and flicker noise) and from the oscillator load, described as frequency-pulling.

Resistive elements also generate flicker noise, but this noise is not significant relative to the other noise sources. Likewise, active circuit elements produce additional types of noise (e.g., shot noise and burst or popcorn noise), but these are not considered significant in this application.

A significant source of DCO noise can originate from the external DC power supply regulator.
This noise is generally of very low frequency. It ranges from wander (jitter below 10 Hz), or red noise, up to a few tens of kHz. It tends to roll off at 20 dB/decade at baseband and at 40 dB/decade after up-conversion by the oscillator, as shown by the dashed line in Figure A.2.

The noise mechanism is a form of Amplitude Modulation (AM) to Phase Modulation (PM) conversion. Variations in supply voltage amplitude affect the bias points of the transistors (i.e., the active negative resistance transistor pair and the varactors). This in turn changes the oscillator amplitude and phase at the frequency of the supply noise variation. The effect is referred to as oscillator frequency-pushing [95]. It can be minimized by using low-noise DC regulators and external supply filtering.

An oscillator pushing conversion gain, $K_{push}$, measured in Hz/V and integrated over a noise bandwidth, $BW_n$, was developed as follows. The RMS supply noise voltage, $V_{ns}$, was expressed as (A.1), where $V_{n,peak}$ is in units of $\text{V}/\sqrt{\text{Hz}}$.

$$V_{ns} = \frac{V_{n,peak}\sqrt{BW_n}}{\sqrt{2}} \quad (\text{V}) \tag{A.1}$$

The DCO Power Supply Rejection Ratio (PSRR) was converted to linear form using (A.2).

$$PSRR_L = 10^{\frac{PSRR(\text{dB})}{20}} \quad (\text{V/V}) \tag{A.2}$$

Therefore, using (A.1) and (A.2), an expression for $K_{push}$ was developed (A.3).

$$\frac{V_{ns}}{PSRR_L}K_{push} = \frac{1}{\sigma_{ps}} \quad (\text{s}^{-1}) \tag{A.3}$$

where $\sigma_{ps}$ is the PN due to the supply. This result can be found experimentally using a Keysight Technologies E5052B Signal Source Analyser. The varactors of the frequency tuning array remain sensitive to frequency-pushing. Even so, the class-C oscillator described in this work exhibits some immunity to this PN, because the negative resistance pair operates mainly in saturation.

Figure A.2: Oscillator Phase Noise Spectrum

Figure A.2 illustrates the single-sided phase noise spectral density in units of decibels relative to the carrier per hertz (dBc/Hz) [96,97].
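Equations (A.1) to (A.3) can be evaluated numerically. The sketch below is a hedged illustration built on hypothetical inputs, none of which are measured values from this work: a supply noise density of 1 µV/√Hz over a 1-MHz noise bandwidth, 40 dB of PSRR and a pushing gain of 100 MHz/V. It rearranges (A.3) exactly as written to return $\sigma_{ps}$.

```python
import math

def supply_noise_rms(v_n_peak, bw_n):
    """(A.1): RMS supply noise voltage (V) from a noise density in V/sqrt(Hz)."""
    return v_n_peak * math.sqrt(bw_n) / math.sqrt(2.0)

def psrr_linear(psrr_db):
    """(A.2): DCO PSRR converted from dB to a linear ratio (V/V)."""
    return 10.0 ** (psrr_db / 20.0)

def sigma_ps(v_n_peak, bw_n, psrr_db, k_push):
    """(A.3) rearranged for sigma_ps: 1 / ((V_ns / PSRR_L) * K_push), in seconds."""
    freq_deviation = supply_noise_rms(v_n_peak, bw_n) / psrr_linear(psrr_db) * k_push  # Hz
    return 1.0 / freq_deviation

# Hypothetical inputs: 1 uV/sqrt(Hz) over 1 MHz, 40 dB PSRR, K_push = 100 MHz/V
v_ns = supply_noise_rms(1e-6, 1e6)
print(f"V_ns     = {v_ns:.3e} V")
print(f"PSRR_L   = {psrr_linear(40.0):.1f} V/V")
print(f"sigma_ps = {sigma_ps(1e-6, 1e6, 40.0, 100e6):.3e} s")
```

Each additional 20 dB of PSRR scales the supply-induced frequency deviation down by a factor of ten, and therefore scales $\sigma_{ps}$ up by the same factor.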
The dashed portion of the Figure A.2 profile is due to supply pushing and load pulling (i.e., extrinsic noise). As mentioned previously, the effects of load pulling are normally eliminated by isolating the oscillator output from the load with buffers. Extrinsic noise originates from circuits that are outside the intrinsic DCO but share the DCO die. These include the current source (discussed in Sections 5.3 and 5.4), the clock buffer and the PLL.

The clock buffer of Figure A.1, labelled Driver in the functional block diagram of Figure 4.17, has two main functions. First, it converts the LC-tank oscillator sine-wave output to a rail-to-rail square-wave signal capable of driving the clock distribution circuits. This requires significant additional current, as well as Duty-Cycle Distortion (DCD) correction. Second, it isolates the DCO from the clock distribution circuits by presenting a high, stable impedance to the LC-tank. This prevents oscillator frequency-pulling, in which variations in the clock distribution impedance pull the LC-tank off its selected frequency.

The PLL controls the DCO frequency through the Frequency Tuning control signal shown in Figure A.1. This is a thermometer-encoded bus that sets the on/off state of every individual varactor of the DCO frequency tuning array. Each control signal of this bus connects to a varactor through a buffer. The buffer provides a consistent control voltage level to the varactor-pair drain/source node, as illustrated in Figure 4.18 (DCO Varactor Row Control Functional Block Diagram). The binary nature of the varactor control signals provides immunity to noise sources originating at the PLL. However, these signals are also a path for frequency-pushing from the external DC power supply.

The DCO has two intrinsic noise sources: the noise due to the tank loss and the noise due to the active negative resistance. The intrinsic noise sources of Figure A.1 are presented in more detail in Figure A.3.
Here the noise sources are shown as noise current sources that combine to produce a noise voltage, or power, across the tank impedance, $Z_{tank}$ (A.4). The tank is treated as lossless, since its loss is cancelled by the active negative resistance. $Z_{tank}$ is therefore the LC impedance:

$$Z_{tank}(\omega) = \frac{j\omega L_p}{1 - \omega^2 L_p C_p} \tag{A.4}$$

Figure A.3: Tank Resonator Intrinsic Noise Sources

The derivation that follows is reproduced from [22, 98]. It is included because it adds valuable insight into the PN analysis of the oscillator described in this work. It begins with the classical equation for LC-tank resonance, to which a frequency offset from resonance, $\Delta\omega$, is added (A.5). Expanding $\omega$ to $\omega = \omega_0 + \Delta\omega$ and substituting $\omega_0 = 1/\sqrt{L_p C_p}$ gives:

$$\omega = \frac{1}{\sqrt{L_p C_p}} + \Delta \omega \tag{A.5}$$

Substituting (A.5) into (A.4) and removing negligible terms gives (A.6).

$$Z_{tank}(\Delta\omega) \approx -\frac{j\omega_0 L_p}{2\Delta\omega(\omega_0 L_p C_p)} = \frac{-j}{2} \left(\frac{1}{\omega_0 C_p}\right) \frac{\omega_0}{\Delta\omega} \tag{A.6}$$

Using:

$$Q = R_p \omega_0 C_p \implies \frac{1}{\omega_0 C_p} = \frac{R_p}{Q} \tag{A.7}$$

and substituting (A.7) into (A.6) yields the squared impedance magnitude:

$$|Z_{tank}(\Delta\omega)|^2 \approx \left(\frac{R_p f_0}{2Q \Delta f}\right)^2 \tag{A.8}$$

A reasonable assumption is that the noise currents originating from the negative resistance, $i_{nRn}$, and the tank resistance, $i_{nRp}$, are uncorrelated. The noise power per unit hertz, or noise power spectral density, can therefore be written as (A.9). Here the noise power is represented as $v_{nOut}^2$ normalized to one ohm.
$$\frac{v_{nOut}^2}{\Delta f} = \left(\frac{i_{nRp}^2}{\Delta f} + \frac{i_{nRn}^2}{\Delta f}\right) \left|Z_{tank}(\Delta \omega)\right|^2 = \frac{i_{nRp}^2}{\Delta f} \left(1 + \frac{i_{nRn}^2}{\Delta f} \cdot \frac{\Delta f}{i_{nRp}^2}\right) \left|Z_{tank}(\Delta \omega)\right|^2 \tag{A.9}$$

where the noise factor is:

$$F(\Delta f) = 1 + \frac{i_{nRn}^2}{\Delta f} \cdot \frac{\Delta f}{i_{nRp}^2} \tag{A.10}$$

The noise factor is defined as the ratio of the total tank noise at $\Delta f$ to the tank noise due to tank loss alone at $\Delta f$.

The single-sided noise spectrum due to the tank resistance, $R_p$, can be expressed as:

$$\frac{i_{nRp}^2}{\Delta f} = \frac{4kT}{R_p} \tag{A.11}$$

By substituting (A.11) and (A.8) into (A.9), the solution for $v_{nOut}^2/\Delta f$ can be written as:

$$\frac{v_{nOut}^2}{\Delta f} = \frac{4kT}{R_p} F(\Delta f) \left(\frac{R_p}{2Q} \cdot \frac{f_0}{\Delta f}\right)^2 = 4kTF(\Delta f) R_p \left(\frac{1}{2Q} \cdot \frac{f_0}{\Delta f}\right)^2 \tag{A.12}$$

The noise power spectral density, $v_{nOut}^2/\Delta f$, represents the total phase and amplitude noise of the oscillator output. However, the equipartition theorem [22] states that the noise splits evenly between amplitude and phase when the oscillating signal is a sine wave. Amplitude variations are suppressed by the oscillator feedback. Only the phase half of the noise, or PN, is therefore retained going forward, which halves (A.12) to give (A.13).
$$\frac{v_{nOut}^2}{\Delta f} = 2kTF(\Delta f)R_p \left(\frac{1}{2Q} \cdot \frac{f_0}{\Delta f}\right)^2 \tag{A.13}$$

From [96], the definition of single-sided phase noise, $\mathcal{L}(\Delta\omega)$, is as follows:

$$\mathcal{L}(\Delta\omega) = 10 \log_{10} \left( \frac{\text{noise power in a 1-Hz bandwidth at frequency } \omega_0 + \Delta\omega}{\text{carrier power}} \right) \tag{A.14}$$

Equivalently, the single-sided PN of (A.14) is half the spectral density of the combined upper and lower sideband noise, $S_{DSBnoise}(\Delta\omega)$, as shown by (A.15).

$$\mathcal{L}(\Delta\omega) = 10 \log_{10} \left( \frac{S_{DSBnoise}(\Delta\omega)}{2} \right) \tag{A.15}$$

The oscillator output signal power is referenced to the tank loss, $R_p$, using (A.16).

$$P_{signal} = \frac{V_{signal,rms}^2}{R_p} = \frac{(V_m/\sqrt{2})^2}{R_p} \tag{A.16}$$

where $V_m$ is the voltage magnitude of the tank oscillation signal. The resulting noise power spectral density is determined using (A.17).

$$S_{noise}(\Delta f) = \frac{1}{R_p} \cdot \frac{v_{nOut}^2}{\Delta f} \tag{A.17}$$

Equations (A.13), (A.16) and (A.17) are now combined to produce an expression for $\mathcal{L}(\Delta\omega)$, (A.18).

$$\mathcal{L}(\Delta\omega) = 10 \log_{10} \left( \frac{S_{noise}(\Delta f)}{P_{signal}} \right) = 10 \log_{10} \left[ \left( \frac{2kTF(\Delta f)}{P_{signal}} \right) \left( \frac{1}{2Q} \cdot \frac{f_0}{\Delta f} \right)^2 \right] \tag{A.18}$$

The two intrinsic noise sources, the negative resistance noise and the tank resistance noise, are both developed across $R_p$. These sources can therefore be considered to supply equal noise levels.
Thus, the noise factor reduces to:

$$F(\Delta f) = 1 + \frac{i_{nRn}^2}{\Delta f} \cdot \frac{\Delta f}{i_{nRp}^2} = 2 \tag{A.19}$$

Assuming $F(\Delta f)$ is constant across frequency, $\mathcal{L}(\Delta\omega)$ can be written as:

$$\mathcal{L}(\Delta\omega) = 10 \log_{10} \left[ \left( \frac{4kT}{P_{signal}} \right) \left( \frac{1}{2Q} \cdot \frac{f_0}{\Delta f} \right)^2 \right] \tag{A.20}$$

Equation (A.20) [4] is used to find $\mathcal{L}(\Delta\omega)$ for the intrinsic thermal noise. This noise is random and flat across frequency at baseband, and the oscillator up-converts it so that it rolls off at -20 dB/decade. Leeson [7,39] modified this equation to compute $\mathcal{L}(\Delta\omega)$ for the three regions of Figure A.2:

- up-converted flicker noise (-30 dB/decade);
- up-converted thermal noise (-20 dB/decade);
- baseband thermal noise (0 dB/decade).

The modified form again assumes $F(\Delta f)$ is constant over frequency (A.19).

$$\mathcal{L}(\Delta\omega) = 10 \log_{10} \left[ \left( \frac{2FkT}{P_{signal}} \right) \left( 1 + \left( \frac{1}{2Q} \cdot \frac{f_0}{\Delta f} \right)^2 \right) \left( 1 + \frac{\Delta f_{1/f^3}}{|\Delta f|} \right) \right] \tag{A.21}$$

In summary, the parameters of (A.21) are:

- $F$ is the noise factor (approximately 2);
- $k$ is Boltzmann's constant ($1.38 \times 10^{-23}$ J/K);
- $T$ is the absolute temperature;
- $P_{signal}$ is the oscillating signal power;
- $Q$ is the loaded quality factor of the LC tank;
- $f_0$ is the frequency of the oscillating signal;
- $\Delta f$ is the offset from $f_0$ at which the PN is calculated;
- $\Delta f_{1/f^3}$ is the flicker noise corner frequency.

The flicker noise corner is the transition point between the -30 dB/decade and -20 dB/decade slopes. This oscillator flicker noise corner does not coincide with the base-band flicker noise corner, which is the transition point between the -10 dB/decade and 0 dB/decade slopes [99].
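Equations (A.20), (A.21) and (A.31) can be compared numerically to show the three regions of Figure A.2. The sketch below is a hedged illustration that assumes a temperature of 300 K, $F = 2$ and hypothetical DCO parameters that are not this work's measured values: a 14-GHz carrier, a loaded Q of 10, 1 mW of signal power and a 100-kHz $1/f^3$ corner.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K (the text rounds it to 1.38e-23)

def pn_thermal(delta_f, f0, q, p_signal, temp=300.0):
    """(A.20): intrinsic thermal PN in dBc/Hz, with F = 2 folded into the 4kT term."""
    return 10.0 * math.log10((4.0 * K_B * temp / p_signal) * (f0 / (2.0 * q * delta_f)) ** 2)

def pn_leeson(delta_f, f0, q, p_signal, f_corner, noise_factor=2.0, temp=300.0):
    """(A.21): Leeson's model with 1/f^3, 1/f^2 and flat regions, in dBc/Hz."""
    floor = 2.0 * noise_factor * K_B * temp / p_signal
    shaping = 1.0 + (f0 / (2.0 * q * delta_f)) ** 2
    flicker = 1.0 + f_corner / abs(delta_f)
    return 10.0 * math.log10(floor * shaping * flicker)

def pn_floor(p_signal, noise_factor=2.0, temp=300.0):
    """(A.31): the 0 dB/decade far-out level, 2FkT / P_signal, in dBc/Hz."""
    return 10.0 * math.log10(2.0 * noise_factor * K_B * temp / p_signal)

# Hypothetical DCO: 14-GHz carrier, loaded Q of 10, 1-mW signal power, 100-kHz 1/f^3 corner
f0, q, p_sig, f_corner = 14e9, 10.0, 1e-3, 100e3
for df in (1e4, 1e5, 1e6, 1e7, 1e8):
    print(f"{df:8.0e} Hz offset: A.20 {pn_thermal(df, f0, q, p_sig):7.1f} dBc/Hz, "
          f"A.21 {pn_leeson(df, f0, q, p_sig, f_corner):7.1f} dBc/Hz")
print(f"A.31 floor: {pn_floor(p_sig):.1f} dBc/Hz")
```

Well inside the $1/f^2$ region the two models agree to within a fraction of a dB. Below the assumed corner, (A.21) approaches a 30 dB/decade slope. For offsets well beyond $f_0/(2Q)$ (700 MHz with these assumed values), it flattens toward the (A.31) level.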
The impact of noise on an oscillator output signal can be thought of in terms of a time-varying current impulse injected into the signal. An impulse applied at the output signal peak has little effect. An impulse applied at an output signal zero-crossing, however, may cause a significant phase deviation. This time-varying function is referred to as the Impulse Sensitivity Function (ISF or $\Gamma$). It is approximately proportional to the derivative of the DCO output waveform [22,98] and is therefore periodic.

Figure A.4, redrawn from [22,98], illustrates the ISF plots that result from LC-tank oscillator (right) and ring oscillator (left) output waveforms. Comparing the top and bottom plots shows a 90° shift to the right in the peak of the ISF plot. The points of maximum sensitivity to PN therefore occur at the zero-crossings of the LC-tank oscillator and at the rise/fall transitions of the ring oscillator. This is a worst-case condition for both implementations. However, a closer analysis shows that the two implementations differ.

Figure A.4: Impulse Sensitivity Function LC vs. Ring Oscillator

Figure A.5 shows that, for an ideal differential class-C LC-tank oscillator implementation, peak current is injected at the ISF nulls. The impact of noise on the PN of this oscillator is therefore less than for a ring oscillator, where peak current injection coincides with the ISF peaks.

In its simplest form, the quality factor [8], or Q, of an oscillator is described by (A.22). Energy is stored in the magnetic field of the inductor and, as charge, in the electric field of the capacitor. Current flows between these two circuit elements at a rate dependent on the amount of inductance and capacitance, which are ideally lossless. The parasitic resistance of these circuit elements dissipates energy.
Figure A.5: Impulse Sensitivity Function for LC Oscillator

In the parallel resonant circuit of the Colpitts oscillator, the inductive and capacitive reactances have equal magnitude at resonance and cancel, so only the resistance, $R_p$, is left. This resistance is not infinite but has some finite value. Energy is therefore dissipated during every cycle and must be replenished by the active negative resistance circuit to sustain oscillation.

$$Q = 2\pi \left( \frac{\text{Maximum Stored Energy}}{\text{Energy Dissipated per Cycle}} \right) \tag{A.22}$$

The Q of an oscillator [8] is normally considered at resonance, and the larger the Q, the better the frequency selectivity of the circuit. This leads to a second common definition of Q as a ratio. The numerator is the peak power frequency, or centre frequency, $f_0$. The denominator is the difference between the frequencies at the half-power points, or circuit Bandwidth (BW) (A.23).

$$Q = \left(\frac{\omega_0}{\omega_{3dB_H} - \omega_{3dB_L}}\right) = \left(\frac{f_0}{f_{3dB_H} - f_{3dB_L}}\right) = \frac{f_0}{BW} \tag{A.23}$$

Across frequency, the tank response traces out a bandpass filter characteristic whose bandwidth is set by Q. No quantitative relationship between Q and the ISF is made here. Both measures, however, describe the susceptibility, or conversely the immunity, of an oscillating resonant circuit to PN. This is addressed in the following discussion.

Many oscillators, including the Colpitts oscillator, can be described as harmonic oscillators. Their degree of non-linear operation creates harmonics of the fundamental in the output spectrum, as opposed to a single pure tone. This is illustrated by the top plot of Figure A.6. Since the ISF is periodic, it can be expanded in a Fourier series (A.24).
$$\Gamma(\omega_0 \tau) = \frac{C_0}{\sqrt{2}} + \sum_{n=1}^{\infty} C_n \cos(n\omega_0 \tau + \theta_n) \tag{A.24}$$

The top two plots of Figure A.6 show that the coefficients, $C_0, C_1, C_2 \dots C_n$, of the ISF Fourier series act as scaling factors of a down-conversion transfer function. These scaling factors determine how two noise components combine to produce spectral growth at baseband, or zero frequency:

- the 1/f noise near DC, weighted by $C_0$;
- the white noise near each harmonic, $n\omega_0$, weighted by $C_n$.

Figure A.6: Noise Conversion from Intrinsic Noise Sources

The bottom plot of Figure A.6 shows the up-conversion, or integration, of the baseband frequency deviation to Phase Modulation (PM) around the fundamental. This integration is described by (A.25).

$$\phi_{out}(t) = \int_{-\infty}^{t} \left( \frac{C_0}{\sqrt{2}} + \sum_{n=1}^{\infty} C_n \cos(n\omega_0 \tau + \theta_n) \right) \frac{i_n(\tau)}{q_{max}} d\tau \tag{A.25}$$

where $i_n(\tau)/q_{max}$ is the input noise current normalized to the maximum charge, $q_{max} = V_{max} \times C_{total}$. Here $V_{max}$ is the maximum voltage swing across $C_{total}$, the total capacitance of the LC-tank. In [22, 98] the result of (A.25) is converted to equations for the power spectral density (A.26) and the single-sided PN in dBc/Hz (A.27).

$$S_{\phi_{out}}(f) = \left(\frac{1}{2\pi\Delta f}\right)^2 \left(\sum_{n=0}^{\infty} C_n^2\right) \frac{1}{4} \left(\frac{1}{q_{max}}\right)^2 \frac{i_n^2}{\Delta f} \tag{A.26}$$

$$\mathcal{L}(\Delta f) = 10 \log_{10} \left( S_{\phi_{out}}(\Delta f) \right) \tag{A.27}$$

Figure A.7 shows a process similar to Figure A.6 for the 1/f and thermal noise that originates from the current source (details in Section 5.3). The current source provides current to both sides of the differential oscillator, so it operates at twice the resonant frequency of the oscillator. As a result, only the current-source noise around the even harmonics is converted, weighted by $C_0$ and the even Fourier coefficients.
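The coefficients in (A.24) can be estimated numerically from one uniformly sampled period of an ISF. The sketch below is a hedged illustration that uses the $C_0/\sqrt{2}$ convention of (A.24). Under that convention $\sum C_n^2$ equals twice the mean-square ISF. The two ISF shapes are hypothetical stand-ins, not the waveforms of Figures A.4 or A.5: a pure sinusoid, and a sinusoid with a small added DC and second-harmonic component. The sketch also prints $C_0^2/\sum C_n^2$, which is the ratio that (A.30) applies to $f_{1/f}$.

```python
import numpy as np

def isf_fourier_coefficients(gamma, n_max):
    """Return C_0..C_n_max for one uniformly sampled period of an ISF.

    Follows the (A.24) convention Gamma = C_0/sqrt(2) + sum_n C_n cos(n*x + theta_n),
    so C_0 = sqrt(2) * mean(Gamma) and C_n = 2 |X_n| / N for n >= 1.
    """
    gamma = np.asarray(gamma, dtype=float)
    spectrum = np.fft.rfft(gamma) / gamma.size   # normalized one-sided DFT
    coeffs = 2.0 * np.abs(spectrum[: n_max + 1])
    coeffs[0] = np.sqrt(2.0) * spectrum[0].real  # DC term uses the C_0/sqrt(2) convention
    return coeffs

x = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)
ideal_lc = -np.sin(x)                        # hypothetical ISF of an ideal sinusoidal tank
skewed = -np.sin(x) + 0.2 * np.sin(x) ** 2   # hypothetical ISF with added DC and 2nd harmonic

for name, gamma in (("ideal LC", ideal_lc), ("skewed", skewed)):
    c = isf_fourier_coefficients(gamma, n_max=5)
    total = np.sum(c ** 2)
    print(f"{name:8s}: C_0 = {c[0]:+.4f}, sum C_n^2 = {total:.4f}, "
          f"C_0^2 / sum = {c[0] ** 2 / total:.4f}")
```

For the pure sinusoid, $C_0 \approx 0$ and the ratio vanishes. The distorted case gives a small but finite ratio of about 0.019. Harmonics above `n_max` are ignored, so `n_max` must be large enough to cover the ISF's content.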
Figure A.7: Noise Conversion from the Current Source

Equations (A.28) and (A.29) were derived [22, 98] from (A.27) to compute the PN in the flicker-noise (-30 dB/decade) and thermal-noise (-20 dB/decade) regions of Figure A.2. The variable $f_{1/f}$ is the base-band transistor flicker noise corner frequency.

$$\mathcal{L}(\Delta f) \mid_{1/f^3} = 10 \cdot Log_{10} \left[ \left( \frac{1}{2\pi \Delta f} \right)^2 (C_0)^2 \frac{1}{4} \left( \frac{1}{q_{max}} \right)^2 \frac{i_n^2}{\Delta f} \left( \frac{f_{1/f}}{\Delta f} \right) \right]$$ (A.28)

Equation (A.28) shows that the contribution of intrinsic and current-source flicker noise to the output PN of the oscillator is directly proportional to $C_0^2$. This coefficient is proportional to the DC (average) value of the ISF. If the oscillator output signal is a perfect sine wave, then $C_0 = 0$. However, distortion that breaks the symmetry of the waveform, such as unequal rise and fall times, results in a non-zero $C_0$. This is particularly important because flicker noise increases as process geometries shrink.

$$\mathcal{L}(\Delta f) \mid_{1/f^2} = 10 \cdot Log_{10} \left[ \left( \frac{1}{2\pi \Delta f} \right)^2 \left( \sum_{n=0}^{\infty} C_n^2 \right) \frac{1}{4} \left( \frac{1}{q_{max}} \right)^2 \frac{i_n^2}{\Delta f} \right]$$ (A.29)

As (A.29) shows, the contribution of thermal noise to the output PN is directly proportional to the sum of all the squared Fourier coefficients. The exception is the current-source thermal noise, which is weighted only by the even-order coefficients, as explained earlier. An important consequence is that the base-band transistor flicker corner frequency differs from the oscillator flicker corner frequency. This is shown in (A.30). Here $\Delta f_{1/f^3}$ is the oscillator flicker noise corner offset frequency and $f_{1/f}$ is the base-band transistor flicker noise corner frequency. Since $C_0^2$ is only one of the terms in $\sum_{n=0}^{\infty} C_n^2$, it follows that $\Delta f_{1/f^3} < f_{1/f}$.
$$\Delta f_{1/f^3} = \left(\frac{C_0^2}{\sum_{n=0}^{\infty} C_n^2}\right) f_{1/f} \tag{A.30}$$

The 0 dB/decade single-sided PN level (the noise floor) shown in Figure A.2 is determined from (A.31). In that equation, $F$ is the noise factor, taken here as $F = 2$; $k$ is Boltzmann's constant; $T$ is the absolute temperature; and $P_{sig}$ is the signal power. The circuit is normalized to 1 $\Omega$.

$$\mathcal{L}(\Delta f) \mid_{0\ \text{dB/decade}} = 10 \cdot Log_{10} \left( \frac{2FkT}{P_{sig}} \right)$$ (A.31)

# Appendix B

# Array Resolution Derivation

This appendix derives a closed-form relationship between a step change in capacitance of the frequency tuning array, $\Delta C^T$, and the resulting step change in frequency, $\Delta f^T$. This relationship is used to determine whether the minimum $\Delta C^T$ yields a frequency resolution that meets the PN requirements of the DCO. The minimum tuning capacitance (i.e., varactor size) required of the DCO is determined using (B.9). The derivation begins with the fundamental equation for the resonant frequency of an LC-tank (B.1). This equation is then extended to include the relationship between $\Delta f$ and $\Delta C$.
$$f = \frac{1}{2\pi\sqrt{LC}} \tag{B.1}$$

$$(f - \Delta f) = \frac{1}{2\pi\sqrt{L(C + \Delta C)}}$$ (B.2)

$$\sqrt{C + \Delta C} = \frac{1}{2\pi\sqrt{L}(f - \Delta f)}$$

$$C + \Delta C = \frac{1}{4\pi^2 \cdot L \cdot (f - \Delta f)^2}$$

$$\Delta C = \frac{1}{4\pi^2 \cdot L \cdot (f - \Delta f)^2} - C$$ (B.3)

where:

$$\sqrt{C} = \frac{1}{2\pi\sqrt{L} \cdot f}$$ and $C = \frac{1}{4\pi^2 \cdot L \cdot f^2}$ (B.4)

Substituting (B.4) into (B.3) gives:

$$\Delta C = \frac{1}{4\pi^{2} \cdot L \cdot (f - \Delta f)^{2}} - \frac{1}{4\pi^{2} \cdot L \cdot f^{2}}$$

$$= \frac{4\pi^{2} \cdot L \cdot f^{2} - 4\pi^{2} \cdot L \cdot (f - \Delta f)^{2}}{16\pi^{4} \cdot L^{2} \cdot f^{2} \cdot (f - \Delta f)^{2}}$$

$$= \frac{4\pi^{2} \cdot L \cdot (f^{2} - (f - \Delta f)^{2})}{16\pi^{4} \cdot L^{2} \cdot f^{2} \cdot (f - \Delta f)^{2}}$$

$$= \frac{f^{2} - (f^{2} - 2f\Delta f + \Delta f^{2})}{4\pi^{2} \cdot L \cdot f^{2} \cdot (f - \Delta f)^{2}}$$

$$= \frac{(2f\Delta f - \Delta f^{2})}{4\pi^{2} \cdot L \cdot f^{2} \cdot (f - \Delta f)^{2}}$$

$$= \frac{(2f - \Delta f) \cdot \Delta f}{4\pi^{2} \cdot L \cdot f^{2} \cdot (f - \Delta f)^{2}}$$ (B.5)

If $f \gg \Delta f$, (B.5) can be simplified by neglecting $\Delta f$ wherever it is subtracted from $f$, resulting in

$$\Delta C \approx \frac{2f \cdot \Delta f}{4\pi^2 \cdot L \cdot f^2 \cdot f^2}$$

Finally:

$$\Delta C \approx \frac{1}{2} \cdot \frac{\Delta f}{\pi^2 \cdot L \cdot f^3}$$ (B.6)

It is equally valid to begin this derivation with (B.2) replaced by (B.7), in which the frequency increases by $\Delta f$.

$$(f + \Delta f) = \frac{1}{2\pi\sqrt{L(C + \Delta C)}}$$ (B.7)

Because an increase in frequency requires a decrease in capacitance, this replaces (B.6) with its negation (B.8).

$$\Delta C \approx -\frac{1}{2} \cdot \frac{\Delta f}{\pi^2 \cdot L \cdot f^3}$$ (B.8)

Therefore, both results can be generalized in (B.9) as the absolute value of $\Delta C$.
$$|\Delta C| \approx \frac{1}{2} \cdot \frac{\Delta f}{\pi^2 \cdot L \cdot f^3}$$ (B.9)

This result (B.9) is consistent with [6], repeated here as (B.10). That equation solves for the change in tracking frequency, $\Delta f^T(f)$, resulting from a change in tracking-bank capacitance, $\Delta C^T$.

$$\Delta f^T(f) = -2\pi^2 L \Delta C^T f^3 \tag{B.10}$$

Section 4.2 sets a frequency deviation requirement of 2.0 MHz. Using (B.9) with that requirement, the required capacitance resolution for the 14-GHz DCO was found to be approximately 74 aF (B.11).

$$\Delta C^{T}(f) \approx \frac{1}{2} \cdot \left( \frac{2.0 \times 10^{6}}{\pi^{2} \cdot 500 \times 10^{-12} \cdot (14.0 \times 10^{9})^{3}} \right) \approx 74\ aF$$ (B.11)

## Appendix C

### DCO Model Code

#### C.1 Contents - row\_evaluation.m

This section lists the MATLAB® code (row\_evaluation.m) that models the following:

- basic frequency tuning (Figure C.1);
- the parallel resistance of the varactor array (Figure C.2);
- the total parallel resistance including both the inductor and the varactor array (Figure C.3);
- the current required for a given tank amplitude (Figure C.4);
- the tank amplitude (Figure C.5).

- Frequency Calculation Slow, rc\_Cworst\_CCworst
- Frequency Calculation Fast, rc\_Cbest\_CCbest
- Parallel Resistance Correction Factor for Frequency
- Loss Calculation
- Plot Rp due to Varactor Selection
- Inductor Loss
- Plot Rp due to both Varactor Selection and Inductor
- Calculate required current
- Plot required current for each setting
- Plot Vm for each setting.
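As a cross-check that links Appendix B to this model, the Python sketch below mirrors the "Frequency Calculation" step of the MATLAB listing that follows. It uses the fast-corner (rc\_Cbest\_CCbest) values and compares the exact frequency step between adjacent codes with the closed form (B.10), taking $\Delta C^T = C_{on} - C_{off}$. It also evaluates (B.9) with the inputs of (B.11). This is a sketch under assumptions: the function names are hypothetical, and the comments on `Cpar`, `Cdiv` and `Cgm` interpret the variable names rather than restate the author's definitions. Python is used only so that the check runs without MATLAB.

```python
import math

# Fast-corner (rc_Cbest_CCbest) values taken from the row_evaluation.m listing
L_TANK = 90.5e-12               # L: tank inductance, H
M_ROWS, N_UNITS = 20, 8         # m rows of n unit varactors
C_ON = (1 - 0.081) * 0.91e-15   # Con: unit varactor, on state, F
C_OFF = (1 - 0.059) * 0.26e-15  # Coff: unit varactor, off state, F
C_PAR = 5.569e-15               # Cpar: fixed capacitance per row (assumed meaning), F
C_DIV = 12.86e-15               # Cdiv: divider load (assumed meaning), F
C_GM = 91.5e-15                 # Cgm: gain-block capacitance (assumed meaning), F


def dco_frequency(code):
    """Resonant frequency for array code K, as computed by f_dco in the listing."""
    c_var = M_ROWS * (C_PAR + N_UNITS * C_OFF) + code * (C_ON - C_OFF)
    return 1.0 / (2 * math.pi * math.sqrt(L_TANK * (c_var + C_GM + C_DIV)))


def step_b10(f):
    """|(B.10)|: frequency step when one unit switches, dC = Con - Coff."""
    return 2 * math.pi ** 2 * L_TANK * (C_ON - C_OFF) * f ** 3


def delta_c_b9(delta_f, inductance, f):
    """(B.9): capacitance step needed for a frequency step delta_f."""
    return 0.5 * delta_f / (math.pi ** 2 * inductance * f ** 3)


codes = range(1, M_ROWS * N_UNITS + 2)      # K = 1:1:m*n+1
freqs = [dco_frequency(k) for k in codes]
exact = [fa - fb for fa, fb in zip(freqs, freqs[1:])]
approx = [step_b10(fa) for fa in freqs[:-1]]
worst_err = max(abs(a - e) / e for a, e in zip(approx, exact))
dc_b11 = delta_c_b9(2.0e6, 500e-12, 14.0e9)

print(f"f_dco: {freqs[0] / 1e9:.2f} GHz (K=1) to {freqs[-1] / 1e9:.2f} GHz (K=161)")
print(f"worst-case (B.10) step error: {100 * worst_err:.3f} %")
print(f"(B.9) with the (B.11) inputs: {dc_b11 * 1e18:.1f} aF")
```

Because each code adds only about 0.2 % to the total tank capacitance, (B.10) stays within 0.2 % of the exact per-code step across the whole array. With the inputs of (B.11), (B.9) evaluates to about 73.8 aF.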
```
clc; clear all; close all;
freq = 28e9;   % design frequency, Hz
m = 20;        % number of varactor array rows
n = 8;         % unit varactors per row
```

```
K = 1:1:m*n+1;   % varactor array codes
% L = 95.6e-12;
L = 90.5e-12;    % tank inductance, H
% L = 105.7e-12;
% L = 100e-12;
QL = 15;         % inductor quality factor
Amp = 0.5;       % target single-ended tank amplitude, V (1.0 Vpp)
```

#### Frequency Calculation - Slow, rc\_Cworst\_CCworst

```
% Slow-corner values. The Fast-corner assignments that follow overwrite
% these, so comment out one block to select the corner being modelled.
Con = (1+0.086)*0.91e-15;    % unit varactor capacitance, on state, F
Coff = (1+0.060)*0.26e-15;   % unit varactor capacitance, off state, F
Cpar = 5.743e-15;
Cdiv = 21.51e-15;
Cgm = 120.1e-15;
```

#### Frequency Calculation - Fast, rc\_Cbest\_CCbest

```
Con = (1-0.081)*0.91e-15;
Coff = (1-0.059)*0.26e-15;
Cpar = 5.569e-15;
Cdiv = 12.86e-15;
Cgm = 91.5e-15;
Cvar = m*(Cpar+n*Coff)+K*Con*(1-Coff/Con);   % array capacitance for each code
% Cvar = [m*(Cpar+n*Coff) Cvar];
C = Cvar+Cgm+Cdiv;                           % total tank capacitance, F
f_dco = 1/2/pi./sqrt(C*L);                   % resonant frequency, Hz
figure(1);
plot(K,f_dco/1e9,'-','color','b');
title('\fontsize{22} Frequency Range, Extracted Corners - Layout C');
xlabel('\fontsize{22} Varactor Array Code, S');
ylabel('\fontsize{22} Frequency (GHz)');
grid on;
```

#### Parallel Resistance Correction Factor for Frequency

```
CF = (28e9./f_dco).^2;
```

#### Loss Calculation

```
% Cworst_CCworst
Qmax = [63.96, 34.22, 24.58, 19.84, 17.01, 15.12, 13.77, 12.75, 11.93];
% Cbest_CCbest
% Qmax = [291.6,126,85.8,67.65,57.31,50.62,45.93,42.43,39.67];
% Cap = [8.861, 9.618, 10.37, 11.13, 11.89, 12.64, 13.4, 14.16, 14.91]*1e-15;
p = 0:1:8;                                   % number of unit varactors on in a row
Cap = (Cpar+n*Coff)+p*Con*(1-Coff/Con);      % row capacitance for each p
Rs_min = 1./(2*pi*freq.*Cap.*Qmax);          % series loss resistance from Q
Rp_max = 1./(Rs_min.*(Cap.*2*pi*freq).^2);   % equivalent parallel resistance
% Cond1: conductance of j-1 fully-on rows in parallel
for j = 2:m
    Cond1_x(j)=1/(Rp_max(length(Rp_max))/(j-1));
end
% matCond2: conductance of the partially-on row for p = 1..8
for j = 1:m
    matCond2(j,(2:9))=1./Rp_max(2:9);
end
% Cond3: conductance of m-j fully-off rows in parallel
for j = 1:m-1
    Cond3_x(j)=1./(Rp_max(1)/(m-j));
end
% Cond1_x=[Cond1_x 0];
Cond1=Cond1_x';
Cond3_x = [Cond3_x 0];
Cond3=Cond3_x';
```

#### Plot Rp due to Varactor Selection

```
% matR5 (array parallel resistance for each code K) is used below but is
% not defined in this listing.
figure(2);
plot(K,matR5);
title('\fontsize{22} Rp Due to Varactor Array - Layout C');
xlabel('\fontsize{22} Varactor Array Code, S');
ylabel('\fontsize{22} Rp (Ohms)');
grid on;
% set(gca,'YTick',[50 100 150 200 250 300 350 400 450 500 600 ...
% 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 ...
% 2000 2200 2400 2600 2800 3000 3500 4000 4500 5000 5500 6000 ...
% 7000 8000 9000]);
```

#### Inductor Loss

```
% RLp (inductor parallel loss resistance) is not defined in this listing.
matR6 = 1./((1./matR5)+(1/RLp));   % array Rp in parallel with inductor Rp
```

#### Plot Rp due to both Varactor Selection and Inductor

```
figure(3);
plot(K,matR6);
title('\fontsize{22} Rp for Varactor Array and Inductor - Layout C');
xlabel('\fontsize{22} Varactor Array Code, S');
ylabel('\fontsize{22} Rp (Ohms)');
grid on;
% set(gca,'YTick',[120 125 130 135 140 145 150 155 160 165 170 ...
% 175 180 185 190 195 200 205 210 215 220 225 230 235 240 245]);
```

#### Calculate required current

```
Ireq = Amp*pi./matR6;   % current (A) needed for amplitude Amp across matR6
```

#### Plot required current for each setting

```
figure(4);
plot(K,Ireq*1000);
title('\fontsize{22} Current for 1.0 Vpp Tank Amplitude - Layout C');
xlabel('\fontsize{22} Varactor Array Code, S');
ylabel('\fontsize{22} Ireq (mA)');
grid on;
```

#### Plot Vm for each setting.

```
Ireq = 10;            % fixed current in mA (overwrites the vector above)
Vm = Ireq*matR6/pi;   % single-ended tank amplitude, mV
figure(5);
plot(K,Vm);
title('\fontsize{22} Single Ended Tank Voltage (Vm) - Layout C');
xlabel('\fontsize{22} Varactor Array Code, S');
ylabel('\fontsize{22} Vm (mV)');
grid on;
```

Figure C.1: 28-GHz Frequency Tuning Range with 20 Rows

Figure C.2: Parallel Array Resistance - 28 GHz

Figure C.3: Parallel Resistance Including Inductor and Array - 28 GHz

Figure C.4: Required DCO Current - 28 GHz

Figure C.5: Tank Amplitude from 10 mA - 28 GHz

#### C.2 Contents - DCO\_tuning\_range\_evaluation.m

This section lists the MATLAB® code (DCO\_tuning\_range\_evaluation.m) that models the coarse/fine tuning of the DCO. It also plots the error between the model and the circuit-simulated frequency results.
- High Frequency cbest\_ccbest
- Typical
- Low Frequency cworst\_ccworst

#### High Frequency cbest\_ccbest

```
% A, B, C, TypeI, TypeII, TypeIII, S, L, n and max_sel are used below but
% are not defined in this excerpt of the script.
Con = 0.750e-15;     %N7_SERDES56G_tb tb_pvt_row_2 adexl_0 2016/07/26
Coff = 0.231e-15;    %pss sim ss_mos/tt_res/tt_mom/1.5V/0.67V/-40C/500mVp
Cpar = 5.288e-15;
Cdiv = 15.6e-15;     %pss sim ss/0.675V/-40C/500mVp 2016/07/26
Cgm = 101.6e-15;     %pss sim ss/0.675V/-40C/500mVp 2016/07/26
% Total tank capacitance from the three array sections plus fixed loads
CarryI = A*(Con-Coff)+TypeI*(Cpar+n*Coff);
CarryII = B*(1/2)*(Con-Coff)+TypeII*(Cpar+(n/2)*Coff);
CarryIII = C*(1/2)*(5/6)*(Con-Coff)+TypeIII*(Cpar+(n/2)*(5/6)*Coff);
Ctot = CarryI+CarryII+CarryIII+Cgm+Cdiv;
f_dco_tt = 1./(2*pi.*sqrt(Ctot*L));
axes('fontsize',16);
plot(S,f_dco_tt/1e9,'--','color','r');
xlabel('\fontsize{22} Varactor Array Code');
ylabel('\fontsize{22} Frequency (GHz)');
set(gca,'YTick',[23 24 25 26 26.4 26.6 26.8 27.0 27.2 27.4 ...
    27.6 27.8 28.0 28.2 28.4 28.6 28.8 29.0 29.2 29.4 29.6 29.8 ...
    30 31 32 33 34]);
grid on;
hold on;
```

#### Typical

```
Con = 0.893e-15;     %N7_SERDES56G_tb tb_pvt_row_2 adexl_0 2016/07/26
Coff = 0.257e-15;    %pss sim tt_mos/tt_res/tt_mom/1.5V/0.75V/500mVp
Cpar = 5.554e-15;    %pss sim tt_mos/tt_res/tt_mom/1.5V/0.75V/500mVp
Cdiv = 15.93e-15;
Cgm = 107.8e-15;
CarryI = A*(Con-Coff)+TypeI*(Cpar+n*Coff);
CarryII = B*(1/2)*(Con-Coff)+TypeII*(Cpar+(n/2)*Coff);
CarryIII = C*(1/2)*(5/6)*(Con-Coff)+TypeIII*(Cpar+(n/2)*(5/6)*Coff);
Ctot = CarryI+CarryII+CarryIII+Cgm+Cdiv;
f_dco_T = 1./(2*pi.*sqrt(Ctot*L));
plot(S,f_dco_T/1e9,'-','color','m');
grid on;
```

#### Low Frequency cworst\_ccworst

```
Con = 1.027e-15;     %N7_SERDES56G_tb tb_pvt_row_2 adexl_0 2016/07/26
Coff = 0.266e-15;
Cpar = 5.955e-15;
Cdiv = 16.09e-15;
Cgm = 111.4e-15;
CarryI = A*(Con-Coff)+TypeI*(Cpar+n*Coff);
CarryII = B*(1/2)*(Con-Coff)+TypeII*(Cpar+(n/2)*Coff);
CarryIII = C*(1/2)*(5/6)*(Con-Coff)+TypeIII*(Cpar+(n/2)*(5/6)*Coff);
Ctot = CarryI+CarryII+CarryIII+Cgm+Cdiv;
f_dco_L = 1./(2*pi.*sqrt(Ctot*L));
plot(S,f_dco_L/1e9,'-.','color','b');
grid on;
```

```
Code_F = [0 340 700 1040 1400 1720];     % codes of the simulated Fast points
Code_S = [20 340 700 1020 1400 1720];    % codes of the simulated Slow points
Fast = [30.327 29.726 29.074 28.473 27.863 27.341];   % GHz
Slow = [29.125 28.234 27.300 26.537 25.689 25.020];   % GHz
plot(Code_F,Fast,'-+','color','r');
plot(Code_S,Slow,'-x','color','b');
f0 = ones(1, max_sel+1)*28;              % 28-GHz reference line
plot(S,f0,'-','color','r');
legend('\fontsize{20} High Freq cbest\_ccbest',...
    '\fontsize{20} Typical',...
    '\fontsize{20} Low Freq cworst\_ccworst',...
    '\fontsize{20} Sim High Freq cbest\_ccbest',...
    '\fontsize{20} Sim Low Freq cworst\_ccworst',...
    '\fontsize{20} 28 GHz Datum');
hold off;

% Determine error (percent) between the model and the simulated points.
% f_dco_tt_measured = [30.8e9, 30.15e9, 29.49e9, 28.85e9, ...
%     28.25e9,27.67e9];
f_dco_tt_measured = [30.327e9, 29.726e9, 29.074e9, 28.473e9, ...
    27.863e9, 27.341e9];
f_dco_tt_matlab = [f_dco_tt(1), f_dco_tt(341), f_dco_tt(701), ...
    f_dco_tt(1041), f_dco_tt(1401), f_dco_tt(1721)];
f_dco_tt_err = (f_dco_tt_matlab-f_dco_tt_measured);
f_dco_tt_err = f_dco_tt_err./f_dco_tt_measured*100;

% f_dco_T_measured = [30e9, 29.29e9, 28.57e9, 27.86e9, ...
%     27.18e9,26.52e9];
f_dco_T_measured = [30e9, 29.29e9, 28.57e9, 27.86e9, ...
    27.18e9, 26.52e9];
f_dco_T_matlab = [f_dco_T(1), f_dco_T(349), f_dco_T(697), ...
    f_dco_T(1045), f_dco_T(1393), f_dco_T(1740)];
f_dco_T_err = (f_dco_T_matlab-f_dco_T_measured);
f_dco_T_err = f_dco_T_err./f_dco_T_measured*100;

% f_dco_L_measured = [29.49e9, 28.47e9, 27.52e9, 26.63e9, ...
%     25.81e9,25.05e9];
f_dco_L_measured = [29.125e9, 28.234e9, 27.300e9, 26.537e9, ...
    25.689e9, 25.020e9];
f_dco_L_matlab = [f_dco_L(21), f_dco_L(341), f_dco_L(701), ...
    f_dco_L(1021), f_dco_L(1401), f_dco_L(1721)];
f_dco_L_err = (f_dco_L_matlab-f_dco_L_measured);
f_dco_L_err = f_dco_L_err./f_dco_L_measured*100;

figure(2);
axes('fontsize',16);
plot(Code_F,f_dco_tt_err,'--+','color','r');
xlabel('\fontsize{22} Varactor Array Code');
ylabel('\fontsize{22} Error (%)');
grid on;
hold on;
plot(Code_S,f_dco_L_err,'-.x','color','b');
legend('\fontsize{20} High Freq cbest\_ccbest',...
    '\fontsize{20} Low Freq cworst\_ccworst',...
    'Location','NorthEast');
```

# Appendix D

# Inductor Leg Parasitic Analysis

Figure D.1 shows the parasitic inductance of 1.65 pH, obtained by 3-D EM simulation (using Peakview<sup>®</sup> [58]), and the calculated resistance of 11.0 $m\Omega$ on each path from the inductor to the gain block. These values are very small; however, their relative significance must be determined before they can be confidently excluded from the MATLAB<sup>®</sup> oscillator models.

Figure D.1: DCO Leg Inductance

A parasitic inductance of 381.3 fH and resistance of 2.4 $m\Omega$ were found on the conductor path from the gain block to the first varactor row, and on each segment between successive rows up to the $pll\_divider$. This poses an interesting problem. It is tempting to find a lumped equivalent value of these parasitics by simply summing the individual parasitic elements, as if they were series components between the $pll\_dco\_gm$ block and the $pll\_divider$ block. However, this is incorrect, because each successive varactor row sees the sum of all the parasitic components in series back to the $pll\_dco\_gm$ block. Consider the parasitic resistance explicitly, where $R_{para}$ is the interconnect resistance between adjacent rows. The total parasitic resistance between the $pll\_dco\_gm$ block and the $n^{th}$ row is then $nR_{para}$ for each inductor leg. Therefore, the total parasitic resistance seen by all row elements across each inductor leg is $R_{para\_SUM}$ (D.1).

$$R_{para\_SUM} = R_{para} + 2R_{para} + 3R_{para} + 4R_{para} + \dots + nR_{para}$$ (D.1)

This can be simplified to (D.2).

$$R_{para\_SUM} = \left(\frac{n(n+1)}{2}\right) R_{para} \tag{D.2}$$

If the varactor array is considered a single element, it is proposed here that the lumped parasitic resistance seen by the array is $R_{para\_AVG}$. This is the average of the total parasitic resistances seen by all array rows (D.3).
$$R_{para\_AVG} = \frac{R_{para\_SUM}}{n} = \left(\frac{n(n+1)}{2n}\right) R_{para}$$ (D.3)

Additionally, each unit varactor element consists of two series transistors. This leads to the assumption that the total capacitance of each unit element is one half the capacitance of one transistor in both the Con and Coff states. This series-connection assumption is extended to the parasitic path, so $R_{para\_AVG}$ is doubled to account for both legs. The lumped resistance, $R_{para\_LUMPED}$, is therefore found using (D.4).

$$R_{para\_LUMPED} = \left(\frac{n(n+1)}{2n}\right) 2R_{para} = (n+1) R_{para}$$ (D.4)

For the 22 rows of the array, this lumped parasitic resistance is approximated by (D.5).

$$R_{lumped} = (n+1) R_{para} = (22+1) \times 2.4\ m\Omega \approx 55.2\ m\Omega$$ (D.5)

Table 4.16 shows that, for the 8.1 $\mu m$ M12 Cu conductor, there is little difference between $R_{DC}$ and $R_{HF}$ at 14 GHz. Proximity effects across the inductor legs are assumed to be minimal. Leakage losses may increase $R_{lumped}$ significantly above 55.2 $m\Omega$. Even so, this resistance was considered small (< 2 %) in comparison to the series resistance of the inductor. Therefore, this parasitic resistance was not considered further.

It was assumed that there is minimal mutual inductance between the parasitic inductance sections of the inductor legs. Therefore, the total self-inductance of each leg of 22 rows is $22 \times 381.3\ fH + 1.85\ pH = 10.24\ pH$, or approximately 2 % of the total inductance. The total lumped parasitic inductance was determined in the same way as the lumped parasitic resistance. That is, $R_{para}$ in (D.4) is replaced with the parasitic inductance between adjacent rows, $L_{para}$, as given in (D.6).

$$L_{lumped} = (n+1)L_{para} = (22+1) \times 381.3\ fH \approx 8.77\ pH$$ (D.6)

This value is very small compared with the main inductance, so it is not considered further.
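The row-by-row bookkeeping behind (D.1) to (D.6) can be checked with a few lines of Python. The sketch below sums the per-row parasitic explicitly, averages over the rows and doubles the result for the two legs. It then compares that result with the closed form, $(n+1)$ times the per-segment value. The function name is hypothetical; the segment values and the row count of 22 are those quoted in this appendix.

```python
def lumped_parasitic(segment_value, n_rows):
    """Lumped leg parasitic following (D.1)-(D.4).

    Row k is k segments away from the gain block, so a leg carries a total of
    n(n+1)/2 segments (D.1)-(D.2); averaging over the n rows (D.3) and doubling
    for the two series legs (D.4) leaves (n + 1) segments.
    """
    leg_sum = sum(k * segment_value for k in range(1, n_rows + 1))  # (D.1)
    leg_avg = leg_sum / n_rows                                      # (D.3)
    return 2 * leg_avg                                              # (D.4)


N_ROWS = 22          # varactor rows per leg
R_SEG = 2.4e-3       # resistance per row segment, ohm
L_SEG = 381.3e-15    # inductance per row segment, H

r_lumped = lumped_parasitic(R_SEG, N_ROWS)
l_lumped = lumped_parasitic(L_SEG, N_ROWS)
print(f"R_lumped = {r_lumped * 1e3:.1f} mOhm")   # compare with (D.5)
print(f"L_lumped = {l_lumped * 1e12:.2f} pH")    # compare with (D.6)
```

Both results reproduce (D.5) and (D.6): 55.2 mΩ and 8.77 pH. The explicit loop keeps the per-row accumulation of (D.1) visible, although the closed form of (D.4) is all that is needed in practice.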
Additional field solver simulations confirmed that the oscillator LC-tank self-inductance is mostly insensitive to process and temperature variation. That is, only a $\pm$ 1 % resonant frequency variation was associated with self-inductance variation. It should be noted that additional frequency tuning margin was considered in section 4.8.