VLSI Class Notes (Part 2) - CSDN Blog

Article link: https://blog.csdn.net/water_yellow/article/details/136360462

DESIGNING COMBINATIONAL LOGIC GATES IN CMOS

(figure)
The inverter. w is the width of the transistor; note how the widths change in the different combinational logic gates below.

NAND gate (recommended)

2-INPUT
(figure)

3-INPUT
(figure)
4-INPUT

(p=20, n=10)

NOR gate

2-INPUT
(figure)

3-INPUT
(figure)

4-INPUT
(figure)

COMPLEX GATE

For “OR (+)” logic: NMOS in parallel, PMOS in series.
For “AND ( $\cdot$ )” logic: NMOS in series, PMOS in parallel.
Then connect the two networks (pull-up and pull-down) at the output.

$\overline {AB+CD}$
(figure)

$\overline {(A+B)C}$
(figure)
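The series/parallel rule can be checked with a small sketch. The expression-tree encoding and the function names below are illustrative assumptions, not part of the notes. The pull-down network (PDN) follows the expression directly, the pull-up network (PUN) is its dual, and exactly one of them should conduct for every input combination.

```python
from itertools import product

# Hypothetical encoding: a variable name, or ("and", x, y) / ("or", x, y).
def pdn_conducts(expr, inputs):
    """NMOS network: AND -> series, OR -> parallel. An NMOS is on when its input is 1."""
    if isinstance(expr, str):
        return inputs[expr] == 1
    op, left, right = expr
    l, r = pdn_conducts(left, inputs), pdn_conducts(right, inputs)
    return (l and r) if op == "and" else (l or r)

def pun_conducts(expr, inputs):
    """PMOS network (dual): AND -> parallel, OR -> series. A PMOS is on when its input is 0."""
    if isinstance(expr, str):
        return inputs[expr] == 0
    op, left, right = expr
    l, r = pun_conducts(left, inputs), pun_conducts(right, inputs)
    return (l or r) if op == "and" else (l and r)

def gate_output(expr, inputs):
    """Output of the complementary gate: 0 if the PDN conducts, 1 if the PUN conducts."""
    down, up = pdn_conducts(expr, inputs), pun_conducts(expr, inputs)
    assert down != up, "exactly one network must conduct"
    return 0 if down else 1

# The two complex gates from the notes (before inversion).
AOI22 = ("or", ("and", "A", "B"), ("and", "C", "D"))   # output = NOT(AB + CD)
OAI21 = ("and", ("or", "A", "B"), "C")                 # output = NOT((A + B)C)

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, gate_output(OAI21, {"A": a, "B": b, "C": c}))
```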

fanout parameter

(figure)
$F=\frac {32C}C=32\\ f_{output}=\sqrt[N] F=\sqrt[5] {32}=2\\ t=Nt_{p0}(1+f/\gamma)=Nt_{p0}(1+f)=5t_{p0}(1+2)=15t_{p0}\\ E_{supply}=30CV_{DD}^2\;(30C=2C+4C+8C+16C)$
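The same numbers can be reproduced with a short sketch. The function name and its parameters are assumptions for illustration, and $\gamma=1$ is taken as in the equation above. The switched capacitance counts only the internal nodes, which is how the notes arrive at $30C$.

```python
def size_inverter_chain(c_in, c_load, n_stages, gamma=1.0):
    """Equal-fanout sizing of an inverter chain (delay in units of t_p0)."""
    overall_fanout = c_load / c_in                  # F
    stage_fanout = overall_fanout ** (1.0 / n_stages)   # f = F^(1/N)
    # Input capacitance of each stage: C, fC, f^2 C, ...
    stage_caps = [c_in * stage_fanout ** i for i in range(n_stages)]
    delay = n_stages * (1 + stage_fanout / gamma)   # t / t_p0
    # Internal nodes only (inputs of stages 2..N), matching the 30C in the notes.
    switched_cap = sum(stage_caps[1:])
    return stage_fanout, stage_caps, delay, switched_cap

f, caps, delay, switched = size_inverter_chain(c_in=1.0, c_load=32.0, n_stages=5)
print(f, caps, delay, switched)
```

For `c_in=1`, `c_load=32` and five stages this gives a stage fanout of about 2, a delay of about $15t_{p0}$ and about $30C$ of switched internal capacitance, up to floating-point rounding.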

(figure)
$t=t_{p0}(1+\frac{out}{in})\\ t=t_{p0}(1+\frac {S_2}{1})+t_{p0}(1+\frac {12}{S_2})+t_{p0}(1+\frac {S_4}{12})+t_{p0}(1+\frac {6}{S_4})+t_{p0}(1+\frac {64}{4})$

(figure)
The fanout is not 3 but (1+4+2)=7, so $t=t_{p0}(1+7)=8t_{p0}$

(figure)
The fanout is not 1 but 8, so $t=t_{p0}(1+8)=9t_{p0}$

equalized transistor size

series
make the widths (numerators) the same, then add the lengths (denominators)
(figure)

parallel
make the lengths (denominators) the same, then add the widths (numerators)
(figure)
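A minimal sketch of the two rules, written in terms of W/L ratios. The helper names are assumptions. The parallel rule assumes all parallel devices are on at the same time.

```python
from fractions import Fraction

def series_equivalent(ratios):
    """Series devices: at equal width the lengths add, so 1/(W/L)_eq = sum of 1/(W/L)_i."""
    return 1 / sum(1 / Fraction(r) for r in ratios)

def parallel_equivalent(ratios):
    """Parallel devices (all on): at equal length the widths add, so the ratios add."""
    return sum(Fraction(r) for r in ratios)

# Two series devices of W/L = 2 behave like one device of W/L = 1.
print(series_equivalent([2, 2]))
# Two parallel devices of W/L = 1 behave like one device of W/L = 2.
print(parallel_equivalent([1, 1]))
```

This is why series transistors in a NAND pull-down or a NOR pull-up are drawn wider than the inverter's devices: each one must be scaled up so the series stack matches the inverter's drive.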

Static circuit : every signal is driven by V_DD or ground(directly or indirectly)

Frequency of output flipping （0 to 1）

for 2-NAND gate
(figure)

$P_0(0\to 1)=P_0(0)*P_0(1)=(\frac 14)*(\frac 34)=\frac 3{16}$

for 2-NOR gate
(figure)

$P_0(0\to 1)=P_0(1)*P_0(0)=(\frac 14)*(\frac 34)=\frac 3{16}$
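Both results can be derived from the truth tables. The sketch below assumes independent inputs that are each 1 with probability 1/2; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_zero_to_one(gate, n_inputs, p_one=Fraction(1, 2)):
    """P(0->1) = P(out=0) * P(out=1) for independent inputs with P(input=1) = p_one."""
    p_out_one = Fraction(0)
    for bits in product((0, 1), repeat=n_inputs):
        # Probability of this input combination.
        p = Fraction(1)
        for b in bits:
            p *= p_one if b else 1 - p_one
        if gate(*bits):
            p_out_one += p
    return (1 - p_out_one) * p_out_one

def nand2(a, b):
    return 1 - (a & b)

def nor2(a, b):
    return 1 - (a | b)

print(p_zero_to_one(nand2, 2))  # 3/16
print(p_zero_to_one(nor2, 2))   # 3/16
```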

Design Techniques to Reduce Switching Activity

Logic Restructuring

Probability:

Quantity	$O_1$	$O_2$	F
$p_1$ (chain)	1/4	1/8	1/16
$p_0=1-p_1$ (chain)	3/4	7/8	15/16
$p_{0\to1}$ (chain)= $p_0\times p_1$ (chain)	3/16	7/64	15/256
$p_1$ (tree)	1/4	1/4	1/16
$p_0=1-p_1$ (tree)	3/4	3/4	15/16
$p_{0\to1}$ (tree)= $p_0\times p_1$ (tree)	3/16	3/16	15/256
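The table entries can be recomputed with a short sketch, assuming both structures are built from 2-input AND gates whose primary inputs are independent and 1 with probability 1/2. The names are illustrative.

```python
from fractions import Fraction

def and_p1(pa, pb):
    """P(output = 1) of a 2-input AND gate with independent inputs."""
    return pa * pb

def activity(p1):
    """P(0->1) = P(0) * P(1)."""
    return (1 - p1) * p1

half = Fraction(1, 2)

# Chain: O1 = A.B, O2 = O1.C, F = O2.D
c1 = and_p1(half, half)
c2 = and_p1(c1, half)
cf = and_p1(c2, half)
chain = [activity(p) for p in (c1, c2, cf)]

# Tree: O1 = A.B, O2 = C.D, F = O1.O2
t1 = and_p1(half, half)
t2 = and_p1(half, half)
tf = and_p1(t1, t2)
tree = [activity(p) for p in (t1, t2, tf)]

print(chain, sum(chain))  # total 91/256
print(tree, sum(tree))    # total 111/256
```

Under these assumptions the chain has the lower total switching activity. Glitching caused by unequal path delays is ignored here.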

Input Ordering
The probability of $Z$ is the same for both input orderings; only the activity of the intermediate node differs.
In the first circuit:
$P_{0\to 1}(intermediate \;node)=(1-0.5\times0.2)\times(0.5\times0.2)$
in the second circuit:
$P_{0\to 1}(intermediate \;node)=(1-0.2\times0.1)\times(0.2\times0.1)$
Time-multiplexing resource
Glitch Reduction by balancing signal paths

logical effort

modified basic delay equation:
$t_p=t_{p0}(p+gf/\gamma)$
$t_{p0}$ : intrinsic delay
$f$ : electrical effort, ratio between external load and input capacitance
$p$ : the ratio of the intrinsic (or unloaded) delay of the gate to that of an inverter
$g$ : logical effort, how much more input capacitance a gate presents to deliver the same output current as an inverter.

Logical efforts of common logic gates, assuming a PMOS/NMOS ratio of 2

(figure)

Path effort $H$

$H = GBF$
$G$ : path logical effort, $G=\prod g_i$
$F$ : path electrical effort, $C_L/C_{IN}$
$B$ : path branching effort, $B=\prod b_i=\prod(\frac{C_{onpath}+C_{offpath}}{C_{onpath}})$
(figure)
Best stage effort (the stage effort that minimizes the path delay): $\sqrt[N] H$
minimum delay: $D=t_{p0}(\sum_{j=1}^Np_j+\frac{N(\sqrt[N] H)}{\gamma})$
example:
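As a hedged numerical sketch of the formulas above, consider a hypothetical four-stage path (inverter, 3-input NAND, 2-input NOR, inverter) with no branching and an overall electrical effort of 5. The logical efforts $1, 5/3, 5/3, 1$ follow from a PMOS/NMOS ratio of 2; the parasitic delays and $\gamma=1$ are assumptions chosen for illustration.

```python
from math import prod

def path_min_delay(g, p, b, c_load, c_in, gamma=1.0):
    """Return (H, best stage effort, minimum delay in units of t_p0)."""
    n = len(g)
    G = prod(g)              # path logical effort
    B = prod(b)              # path branching effort
    F = c_load / c_in        # path electrical effort
    H = G * B * F            # path effort
    h = H ** (1.0 / n)       # best stage effort
    D = sum(p) + n * h / gamma
    return H, h, D

g = [1, 5 / 3, 5 / 3, 1]     # inverter, NAND3, NOR2, inverter
p = [1, 3, 2, 1]             # assumed parasitic delays
b = [1, 1, 1, 1]             # no off-path branches
H, h, D = path_min_delay(g, p, b, c_load=5.0, c_in=1.0)
print(H, h, D)
```

Here $H=125/9\approx13.9$ and the best stage effort is about 1.93; each gate is then sized so that its own $g_i f_i$ equals that stage effort.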

Ratioed logic

Differential Cascode Voltage Switch Logic (DCVSL)

A ratioed logic style that completely eliminates static currents and provides rail-to-rail swing. Such a gate combines two concepts: differential logic and positive feedback.
(figure)
XOR-XNOR DCVSL gate

AND/NAND gate in DCVSL:

pass transistor logic(not optimal)

Allows the primary inputs to drive gate terminals as well as source/drain terminals.
(figure)
B=H: the top transistor turns on and copies the input A to the output F.
B=L: the bottom pass transistor turns on and passes a 0.

Transmission Gate Logic

Robust and efficient pass-transistor design, with low static power dissipation and increased noise margins.
(figure)
An NMOS device is placed in parallel with a PMOS device.
$A=B\;\text{if}\;C=1\\ \text{cutoff}\;\text{if}\;C=0$
Transmission gate multiplexer:
$\bar{F}=(A\cdot S+B\cdot \bar{S})$

Transmission gate XOR
(figure)
if B=1, M1 and M2 work as an inverter and the transmission gate M3/M4 is cut off, so $F=\bar{A}B$
if B=0, M1 and M2 are cut off and the transmission gate M3/M4 conducts, so $F=A\bar{B}$

Dynamic logic design (with clock signal)

schmitt trigger

The voltage-transfer characteristic displays different switching thresholds for positive- and negative-going input signals, which can suppress ringing on the signal.
(figure)
CMOS Schmitt Trigger

When $V_{in}=0$ , $V_{out}=0$ , the feedback loop biases the PMOS M4 on while M3 is off.
Now M2 and M4 in parallel, together with M1, form an inverter. The effective transistor ratio of the inverter becomes $k_{M1}/(k_{M2}+k_{M4})$ , which moves the switching threshold upwards.
Once the inverter switches ( $V_{in}=1$ , $V_{out}=1$ ), the feedback loop turns off M4, and the NMOS M3 is activated. This extra pull-down device speeds up the transition.
When $V_{in}$ is initially 1, the pull-down network originally consists of M1 and M3 in parallel, while the pull-up network is formed by M2. This reduces the value of the switching threshold to $V_{M-}$ .
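The ringing suppression can be illustrated with a purely behavioral sketch. It models only the two thresholds, not the transistor circuit; the threshold values and the input samples are made up for the example.

```python
def schmitt_trigger(samples, v_m_plus, v_m_minus, initial_out=0):
    """Non-inverting hysteresis: rise above V_M+ to go high, fall below V_M- to go low."""
    out = initial_out
    result = []
    for v in samples:
        if out == 0 and v > v_m_plus:
            out = 1
        elif out == 1 and v < v_m_minus:
            out = 0
        result.append(out)
    return result

def single_threshold(samples, v_m):
    """Ordinary buffer with one switching threshold, for comparison."""
    return [1 if v > v_m else 0 for v in samples]

def count_transitions(bits):
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)

# Hypothetical noisy input (volts): a rising and a falling edge with ringing.
v_in = [0.0, 1.0, 1.4, 1.1, 1.4, 1.9, 2.5, 1.5, 1.1, 1.4, 0.6, 0.0]

hysteresis_out = schmitt_trigger(v_in, v_m_plus=1.7, v_m_minus=0.8)
plain_out = single_threshold(v_in, v_m=1.25)
print(count_transitions(hysteresis_out), count_transitions(plain_out))  # 2 6
```

With these samples the single-threshold buffer toggles six times while the hysteresis model produces one clean rising and one clean falling edge.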

BI CMOS

BICMOS = bipolar + CMOS
good driving capability, low power, small size but expensive

bipolar two input NAND
(figure)

(cascade) Domino logic

Precharge: clock signal = 0, all outputs are charged to VDD
Evaluation: clock signal = 1, $\begin{cases} \text{output discharges}&\text{ if PDN/foot device/ ground switch is on}\\\text{precharged value remains stored}&\text{ if PDN/foot device/ ground switch is off}\end{cases}$
This is called conditional discharge.

The inverter between stages keeps the input of each stage obeying the monotonicity principle (only $0\to1$ transitions).
Reason:
domino cascade

During precharge (clk=0), out1 and out2 are charged to 1. Then clk=1 and In=1, so out1 discharges ( $1\to 0$ ). out1 is in2, so in2 falls from 1 to 0 and the second stage changes from discharging to “storing”. The $\Delta V$ lost while it was discharging is unknown, so the level of the output is undefined.
So we want to guarantee that the inputs can only make a single $0\to1$ transition during the evaluation period.

Domino type
(figure)
The D2-type domino gate has no foot device, which avoids unexpected discharge and reduces transistor size, but is more dangerous.

Comparison between domino and static circuits

Domino	Static
favors down transition	balanced output(rise time=fall time)
only need pull-down network	transistor should be able to pull up as well as pull down
less gate loading, input drives only NMOS	more gate loading, input drive both PMOS and NMOS
logic threshold voltage=device threshold voltage, earlier switching point and less delay	logic threshold voltage= $\frac {V_{DD}}2$ , slower
require clock signal, increase clock power and clock loading	easy to design
dangerous	robust and safe

Issues in dynamic design:

charge leakage
charge sharing
backgate coupling
clock feedthrough

charge sharing

(figure6.59)

After precharging to VDD, during evaluation, $A: 0\to 1$
Transistor Ma turns on, and the charge originally stored on capacitor $C_L$ is redistributed over $C_L$ and $C_a$ . This causes a drop in the output voltage.
if $\Delta V_{out}<V_{TN}$
$\Delta V_{out}=-\frac{C_a}{C_L}[V_{DD}-V_{TN}(V_X)]\\ V_X=V_{DD}-V_{TN}(V_X)$
if $\Delta V_{out}>V_{TN}$
$\Delta V_{out}=-V_{DD}(\frac{C_a}{C_a+C_L})\\ V_X=V_{out}$
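The two cases can be combined into one helper. This is a sketch under a simplifying assumption: $V_{TN}$ is treated as a constant, so the body effect in $V_{TN}(V_X)$ is ignored. The capacitance and voltage values are hypothetical.

```python
def charge_sharing_drop(c_a, c_l, v_dd, v_tn):
    """Output voltage change due to charge sharing (constant V_TN assumed)."""
    # Case 1: the internal node stops at V_DD - V_TN before the voltages equalize.
    drop_case1 = (c_a / c_l) * (v_dd - v_tn)
    if drop_case1 < v_tn:
        return -drop_case1
    # Case 2: the output and the internal node reach the same final voltage.
    return -v_dd * c_a / (c_a + c_l)

# Hypothetical values: V_DD = 2.5 V, V_TN = 0.5 V, capacitances in fF.
small = charge_sharing_drop(c_a=5.0, c_l=50.0, v_dd=2.5, v_tn=0.5)
large = charge_sharing_drop(c_a=15.0, c_l=50.0, v_dd=2.5, v_tn=0.5)
print(small, large)
```

With a constant threshold, the boundary between the two cases is $C_a/C_L = V_{TN}/(V_{DD}-V_{TN})$, so a small internal capacitance falls in the first case and a large one in the second.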

Clock delay domino (CD domino)

Uses a self-timed, delay-matched clock tree for the precharge and evaluation clocks in a pipeline stage. The clock to a gate is delayed until the inputs to the gate have stabilized.

Relationship between clk and input signal for domino circuit

Successive D2 gates: a gate should start precharging only after the previous D2-type domino gate has finished precharging (this guarantees that its input is 0)
The evaluation clock edge must arrive before the inputs have evaluated: the clock should go high first, and the inputs should arrive after that

Domino block: we cannot combine static signals and domino signals unless the domino signal goes through a dynamic-to-static inverter.

Clock skew(clock misalignment) & Jitter

Clock skew: spatial variation of the clock signal, i.e. the difference in arrival time of the global clock signal at different hierarchical parts of an IC. Clock skew should be less than $0.1T_{cycle}$
Jitter: temporal variation of the clock signal; the cycle time is not constant from cycle to cycle

$\text{sequential circuit design}\begin{cases} \text{transparent latch}\begin{cases}\text{clock positive latch}\\\text{clock negative latch}\end{cases}\\ \text{flip-flop}\begin{cases}\text{edge trigger}\\\text{master slave}\end{cases} \end{cases}$

master and slave

(figure 7.18)

feedback inverter $I_2$ and $I_4$ should be “weak” (small size)
first stage:

CLK=0	Q=Data
CLK=1	Q(n)=Q(n-1)

second stage:

CLK=0	Q(n)=Q(n-1)
CLK=1	Q=Data

setup time and hold time in flip-flop

![setupandhold ](https://img-blog.csdnimg.cn/direct/497355b6a41543278955e77d8088442f.png)

setup time: the minimum amount of time before the clock’s active edge during which the data must be stable for it to be latched correctly
hold time: the minimum amount of time after the clock’s active edge during which the data must be stable

H-tree (figure)

a path for CLK signal to traverse the IC
clock buffer: the ideal clock arrival time is 0; buffers are added everywhere

clock skew causes:

different VDD
temperature variation
inter-connect mismatch
process variation
clock loading: different buffers drive different numbers of flip-flops

clock header: addresses the causes of clock skew

give all clock buffers the same orientation
thermal analysis
use wider buffers

clock grid structure (shield each CLK with DC signal between)

Power distribution:

IR drop
$L\frac{di}{dt}$ (conductor)
$\Delta V_{DD}$ : system power variation
tested guard band: making sure if one part of design fails, the chip still can continue to operate

Phase Lock Loop (PLL)

Synchronizes the local clock and the system clock
(figure)

The local clock and the system clock are compared using a phase detector, which outputs an UP (phase lag) or DOWN (phase lead) signal
The UP and DOWN signals are fed into a charge pump, which translates the digitally encoded control information into an analog voltage
The loop filter removes high-frequency components and reduces jitter
In the voltage-controlled oscillator (VCO), an UP signal speeds up the VCO, which lets the local clock catch up with the system clock. A DOWN signal slows down the VCO, eliminating the phase lead of the local clock, until the two clocks are synchronized
The VCO output goes back to the phase detector

3-stage current starved VCO (can have more stages)
(figure)

Timing: the time delay when signal propagates from flip-flop1 to flip-flop2

Sum of delay:
$T_{clock\_to\_Q}+T_{wire}+T_{logic}+T_{wire}+T_{setup}\le T_{cyc}$
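The inequality translates directly into a check. The sketch below uses made-up delays in picoseconds (integers, to avoid rounding issues); the function and parameter names are assumptions.

```python
def min_cycle_time(t_clk_to_q, t_wire_in, t_logic, t_wire_out, t_setup):
    """Smallest T_cyc that satisfies the sum-of-delays constraint."""
    return t_clk_to_q + t_wire_in + t_logic + t_wire_out + t_setup

def meets_timing(t_cyc, **delays):
    """True if the path from flip-flop 1 to flip-flop 2 fits in one cycle."""
    return min_cycle_time(**delays) <= t_cyc

# Hypothetical path delays in ps.
path = dict(t_clk_to_q=100, t_wire_in=50, t_logic=600, t_wire_out=50, t_setup=80)

t_min = min_cycle_time(**path)
print(t_min)                         # 880
print(meets_timing(1000, **path))    # True
print(meets_timing(800, **path))     # False
```

For these assumed numbers the maximum clock frequency is $1/880\,\text{ps}\approx1.14$ GHz, before any allowance for skew or jitter.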

Latch based design VS flip-flop based design

Latch based	Flip-flop based
need data path(many data in parallel)	easy to design
timing flexibility	strict timing
time borrowing
timing verification(bad)

Flip-flops and latches cannot be used in the same part of the design

time borrowing (figure)

Memory design - ROM RAM

ROM: read only memory
RAM: random access memory
ROM is non-volatile memory; RAM is volatile memory

$ROM\begin{cases} \text{Mask programmed}\\\text{Programmable ROM} \\\text{Erasable ROM(EPROM;EEPROM)}\end{cases}$

$RAM\begin{cases} \text{static RAM}\\\text{dynamic RAM(high threshold voltage with body effect for less leakage but slower)} \\\end{cases}$

speed from high to low: flip-flop and latch > register file > cache(SRAM)

4 bits × 4 bits = 16-bit ROM

The column lines are precharged (in both RAM and ROM). The rows are inputs for both ROM and RAM; ROM columns are outputs only, while RAM columns are both inputs and outputs

(figure of ROM)

Reading ROM data: an active transistor pulls the whole column down, so that column outputs 0

Memory access time: row access time+ column access time （worst case）
Row access time (worst case): RC delay, from resistor R2 to R256(end of the row)
Column(bit line) access time: (worst case) $C_{tot}=C_{wire}+(C_{drain\_body}+C_{gate\_drain})\times 128$

ROM compile: determine the number of row line and column line of the ROM
example:
32K ROM: $2^{15}=32{,}768=2^1\times2^{14}=2^2\times2^{13}=2^3\times2^{12}=...=2^7\times2^{8}$ ; any such combination is possible
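The possible organizations can be enumerated with a short sketch. Treating the first factor as the number of rows and the second as the number of columns is an assumption made here for illustration, as is the function name.

```python
def rom_organizations(total_bits):
    """All (rows, columns) splits of a power-of-two bit count into two powers of two."""
    n = total_bits.bit_length() - 1
    if total_bits <= 0 or (1 << n) != total_bits:
        raise ValueError("total_bits must be a power of two")
    return [(1 << r, 1 << (n - r)) for r in range(1, n)]

options = rom_organizations(32 * 1024)   # 32K = 2^15 bits
for rows, cols in options:
    print(rows, "x", cols)
```

A ROM compiler would pick one of these, typically a split close to square such as $2^7\times2^8$, to balance the row and column access times.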

Read and write RAM

Two basic principles:

we should be able to overwrite the existing data (if a 1 is stored, overwrite it with 0)
when we read, we should be able to read without modifying the existing data

Soft error: high-energy $\alpha$ particles from outside the IC can hit it and change the stored data. To protect the IC from $\alpha$ particles, we usually increase the cell size, which decreases the density

(figure of waveform)

(figure of TR)

store 0 and read 0:

Stored 0: M1-on M5-off M2-off M6-on. Node N1=0.
The precharge phase charges C to 1.
Read 0: M3 and M4 turn on, so C connects directly to node N1. We want to keep N1=0 (do not modify the existing data), so M1 stays on and pulls C down to 0 (successfully reading 0).
Note that M1 must be bigger (stronger) than M3 in order to pull C down

store 1 and write 0:

store 1: M1-off M5-on M2-on M6-off, N1=1
$C$ and $\tilde{C}$ are charged to 1
write 0: M3M4-off M1-on, M5-off, M2-off, M6-on, N1=0, N2=1.

data write circuit (DWC)： $W$ and $\tilde{W}$ , $\tilde{W}=1$ , not write, both M1 M2 off

sense amp: amplifies the small transition on the bit line

SRAM:

One memory cell has two tasks: input and output.
Reads go through the column line.
The memory block is shared by many ALUs and can be accessed by them at the same time.
Multiple access transistors connect to different CPUs. Multiple CPUs can read simultaneously, and can write the same data simultaneously.
The memory controller allows CPUs to write the same data at the same time; it is a device that detects what each CPU writes.

SRAM	DRAM
fast	slow
low voltage threshold	high voltage threshold for less leakage
larger and low device density	smaller and high device density(only 1 transistor and 1 capacitor per cell)

DRAM cells come in two types:
(figure)

Trench cell: bigger area and deeper
Stacked cell: capacitor above the gate

I/O

type of power source VDD:

VDDD/GNDD: digital
VDDA/GNDA: analog
VDDM/GNDM: memory

Input protection device
Noise from the power supply can cause latch-up, and ESD (electrostatic discharge) can damage the gate oxide; to prevent this, we need to use input protection devices.
gate oxide breakdown voltage: approximately 40 V to 100 V

Bonding wire and bonding pad: a bonding wire is a thick wire with large resistance, capacitance and inductance between the CPU and the motherboard.
Input pad: protects the gate oxide, which would be affected by ESD. A Schmitt trigger inside the input pad can filter the noise. An input pad with ESD protection uses a current-limiting resistor and diode clamps (large diodes)
Output pad: requires a chain of successively larger buffers.
bidirectional pad

Two types of test: functional test and electrical test
DFT: design for testability
Testing: the process by which a defect in the system can be exposed
Apply input test vectors to the device under test (DUT), then collect the response; the response has to be compared with the expected output
(figure)
BIST system: built-in self-test.

Faults:

transient fault: due to $\alpha$ particles and power supply fluctuation
intermittent fault: due to external causes such as loose connections, humidity and temperature
the faults listed below are permanent
stuck-at 0
stuck-at 1
stuck-at open: the output holds its previous value, so we have to initialize the value before the test
Delay fault
bridge fault (use an IDDQ test to check: measure the current from VDD to GND; if the internal circuit is not active (no input) but current still flows, something is wrong)
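As an illustration of how test vectors expose stuck-at faults, the sketch below injects single stuck-at faults into a 2-input NAND gate and lists the vectors whose response differs from the fault-free gate. The gate choice, the fault encoding and the function names are assumptions made for this example.

```python
from itertools import product

def nand2(a, b, fault=None):
    """2-input NAND. `fault` is None or (node, value) with node in {"a", "b", "out"}."""
    if fault is not None:
        node, value = fault
        if node == "a":
            a = value          # input a stuck at `value`
        elif node == "b":
            b = value          # input b stuck at `value`
        elif node == "out":
            return value       # output stuck at `value`
    return 1 - (a & b)

def detecting_vectors(fault):
    """Input vectors for which the faulty response differs from the expected one."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if nand2(a, b) != nand2(a, b, fault)]

faults = [(node, value) for node in ("a", "b", "out") for value in (0, 1)]
table = {fault: detecting_vectors(fault) for fault in faults}
for fault, vectors in table.items():
    print(fault, vectors)
```

For this gate the three vectors (0,1), (1,0) and (1,1) together detect all six single stuck-at faults, so the fourth vector (0,0) is not needed.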