VLSI Lecture Notes (Part 2)

Previous part: VLSI Lecture Notes (Part 1)

DESIGNING COMBINATIONAL LOGIC GATES IN CMOS

(figure)
The inverter: w is the width of the transistor. Notice how the widths change in the different combinational logic gates.

NAND gate (recommended)

2-INPUT
(figure)

3-INPUT
(figure)

4-INPUT
(figure)
(p=20, n=10)

NOR gate

2-INPUT
(figure)

3-INPUT
(figure)

4-INPUT
(figure)

COMPLEX GATE

for “OR (+)” logic: NMOS in parallel, PMOS in series
for “AND (⋅)” logic: NMOS in series, PMOS in parallel
then connect the two networks (NMOS pull-down and PMOS pull-up)

$\overline{AB+CD}$
(figure)

$\overline{(A+B)C}$
(figure)

fanout parameter

(figure)
$$F=\frac{32C}{C}=32,\qquad f=\sqrt[N]{F}=\sqrt[5]{32}=2$$
$$t=Nt_{p0}(1+f/\gamma)=Nt_{p0}(1+f)=5t_{p0}(1+2)=15t_{p0}\qquad(\gamma=1)$$
$$E_{supply}=30CV_{DD}^2\qquad(30C=2C+4C+8C+16C)$$
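The arithmetic above generalizes to any overall fanout F and stage count N. A minimal sketch, assuming identical inverter stages with equal per-stage fanout, delay in units of $t_{p0}$, γ = 1 by default, and energy counted only on the internal nodes as in the 30C sum (function names are hypothetical):

```python
def stage_fanout(F: float, N: int) -> float:
    """Equal per-stage fanout f = F**(1/N) for an N-stage inverter chain."""
    return F ** (1.0 / N)


def chain_delay(F: float, N: int, gamma: float = 1.0) -> float:
    """Total delay in units of t_p0: t = N * (1 + f / gamma)."""
    return N * (1.0 + stage_fanout(F, N) / gamma)


def internal_cap(F: float, N: int, c_in: float = 1.0) -> float:
    """Switched capacitance of the internal nodes, C*f + C*f**2 + ... + C*f**(N-1),
    in units of c_in (input and load capacitances excluded)."""
    f = stage_fanout(F, N)
    return sum(c_in * f ** i for i in range(1, N))


if __name__ == "__main__":
    F, N = 32.0, 5
    print(f"f = {stage_fanout(F, N):.3f}")           # 2.000
    print(f"t = {chain_delay(F, N):.3f} t_p0")       # 15.000
    print(f"E = {internal_cap(F, N):.3f} C*VDD^2")   # 30.000
    # For comparison, sweep the number of stages for the same F (gamma = 1)
    best_n = min(range(1, 9), key=lambda n: chain_delay(F, n))
    print("fastest N:", best_n, f"({chain_delay(F, best_n):.3f} t_p0)")
```

With γ = 1 the sweep picks N = 3 for F = 32 (stage fanout about 3.2), close to the commonly quoted optimum fanout of about 3.6 for γ = 1.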

(figure)
$$t=t_{p0}\left(1+\frac{\text{out}}{\text{in}}\right)$$
$$t=t_{p0}\left(1+\frac{S_2}{1}\right)+t_{p0}\left(1+\frac{12}{S_2}\right)+t_{p0}\left(1+\frac{S_4}{12}\right)+t_{p0}\left(1+\frac{6}{S_4}\right)+t_{p0}\left(1+\frac{64}{4}\right)$$
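Assuming $S_2$ and $S_4$ are the two free sizes in the figure (all other sizes fixed), a worked minimization of the expression above sets each partial derivative to zero. This is a sketch of the arithmetic, not necessarily the method used in class:

$$\frac{\partial t}{\partial S_2}=t_{p0}\left(1-\frac{12}{S_2^2}\right)=0\;\Rightarrow\;S_2=\sqrt{12}\approx3.46$$
$$\frac{\partial t}{\partial S_4}=t_{p0}\left(\frac{1}{12}-\frac{6}{S_4^2}\right)=0\;\Rightarrow\;S_4=\sqrt{72}\approx8.49$$
$$t_{min}=t_{p0}\left(5+2\sqrt{12}+2\sqrt{\tfrac{6}{12}}+16\right)=t_{p0}\left(21+4\sqrt{3}+\sqrt{2}\right)\approx29.3\,t_{p0}$$

At the optimum, both stages in each pair carry the same fanout ($S_2/1=12/S_2$ and $S_4/12=6/S_4$), the same equal-fanout rule used for the inverter chain.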

(figure)
The fanout is not 3 but $(1+4+2)=7$, so $t=t_{p0}(1+7)=8t_{p0}$

(figure)
The fanout is not 1 but 8, so $t=t_{p0}(1+8)=9t_{p0}$

equalized transistor size

series
make the widths (numerators) the same, then add the lengths (denominators)
(figure)

parallel
make the lengths (denominators) the same, then add the widths (numerators)
(figure)
(figure)
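A minimal sketch of the two rules, assuming each transistor is described only by its W/L ratio. The general series rule, 1/Σ(L/W), reduces to W/(L1 + L2 + ...) when the widths are equal; the parallel rule, Σ(W/L), reduces to (W1 + W2 + ...)/L when the lengths are equal. Function names are hypothetical:

```python
from fractions import Fraction


def series(*ratios: Fraction) -> Fraction:
    """Equivalent W/L of transistors in series: 1 / sum(L/W)."""
    return 1 / sum(1 / r for r in ratios)


def parallel(*ratios: Fraction) -> Fraction:
    """Equivalent W/L of transistors in parallel: sum(W/L)."""
    return sum(ratios, Fraction(0))


if __name__ == "__main__":
    print(series(Fraction(4), Fraction(4)))         # 2    (4/1 + 4/1 in series = 4/2)
    print(parallel(Fraction(4), Fraction(4)))       # 8    (8/1)
    print(series(Fraction(2, 1), Fraction(2, 3)))   # 1/2  (equal W = 2: 2/(1 + 3))
```

Exact fractions keep the results in the same W/L form used in the figures.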

Static circuit: every signal is driven by V_DD or ground (directly or indirectly) through a low-resistance path.

Frequency (probability) of a 0 → 1 output transition

for a 2-input NAND gate
(figure)

$$P_{0\to1}=P(\text{out}=0)\cdot P(\text{out}=1)=\frac{1}{4}\cdot\frac{3}{4}=\frac{3}{16}$$

for a 2-input NOR gate
(figure)

$$P_{0\to1}=P(\text{out}=1)\cdot P(\text{out}=0)=\frac{1}{4}\cdot\frac{3}{4}=\frac{3}{16}$$
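Both results follow from the truth table alone. A minimal sketch that computes $P_{0\to1}=p_0\,p_1$ for any gate, assuming independent inputs that are each 1 with probability 1/2 and no correlation between consecutive clock cycles (function names are hypothetical):

```python
from fractions import Fraction
from itertools import product
from typing import Callable


def p_zero_to_one(gate: Callable[..., int], n_inputs: int) -> Fraction:
    """P(0->1) = p0 * p1 of the gate output, from its full truth table."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    p1 = Fraction(sum(outputs), len(outputs))  # fraction of rows with output 1
    return (1 - p1) * p1


def nand2(a: int, b: int) -> int:
    return 1 - (a & b)


def nor2(a: int, b: int) -> int:
    return 1 - (a | b)


print(p_zero_to_one(nand2, 2))                 # 3/16
print(p_zero_to_one(nor2, 2))                  # 3/16
print(p_zero_to_one(lambda a, b: a ^ b, 2))    # 1/4  (XOR, for comparison)
```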

Design Techniques to Reduce Switching Activity

  1. Logic Restructuring
    (figure)
    probability:

| | $O_1$ | $O_2$ | $F$ |
|---|---|---|---|
| $p_1$ (chain) | 1/4 | 1/8 | 1/16 |
| $p_0=1-p_1$ (chain) | 3/4 | 7/8 | 15/16 |
| $p_{0\to1}=p_0\times p_1$ (chain) | 3/16 | 7/64 | 15/256 |
| $p_1$ (tree) | 1/4 | 1/4 | 1/16 |
| $p_0=1-p_1$ (tree) | 3/4 | 3/4 | 15/16 |
| $p_{0\to1}=p_0\times p_1$ (tree) | 3/16 | 3/16 | 15/256 |

  2. Input Ordering
    (figure)
    the probability of the output Z is the same for both orderings
    in the first circuit:
    $P_{0\to 1}(\text{intermediate node})=(1-0.5\times0.2)\times(0.5\times0.2)=0.09$
    in the second circuit:
    $P_{0\to 1}(\text{intermediate node})=(1-0.2\times0.1)\times(0.2\times0.1)=0.0196$
    so the intermediate node of the second circuit switches far less often.
  3. Time-multiplexing resources
  4. Glitch reduction by balancing signal paths
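The restructuring table and the input-ordering numbers can both be reproduced with two helpers. A minimal sketch, assuming independent inputs with P(1) = 1/2 for the chain and tree of 2-input AND gates, and reading the input probabilities 0.5, 0.2, 0.1 off the input-ordering expressions (helper names are hypothetical):

```python
from fractions import Fraction


def and_prob(*ps: Fraction) -> Fraction:
    """P(output = 1) of an AND gate with independent inputs."""
    result = Fraction(1)
    for p in ps:
        result *= p
    return result


def activity(p1: Fraction) -> Fraction:
    """P(0->1) = p0 * p1 for a node whose signal probability is p1."""
    return (1 - p1) * p1


half = Fraction(1, 2)

# Chain: O1 = A*B, O2 = O1*C, F = O2*D
o1 = and_prob(half, half)
o2 = and_prob(o1, half)
f_chain = and_prob(o2, half)
chain = [activity(p) for p in (o1, o2, f_chain)]

# Tree: O1 = A*B, O2 = C*D, F = O1*O2
t1 = and_prob(half, half)
t2 = and_prob(half, half)
f_tree = and_prob(t1, t2)
tree = [activity(p) for p in (t1, t2, f_tree)]

print("chain:", [str(x) for x in chain])   # ['3/16', '7/64', '15/256']
print("tree: ", [str(x) for x in tree])    # ['3/16', '3/16', '15/256']

# Input ordering: activity of the intermediate node of the first AND gate
first = activity(and_prob(Fraction(5, 10), Fraction(2, 10)))
second = activity(and_prob(Fraction(2, 10), Fraction(1, 10)))
print(float(first), float(second))         # 0.09 0.0196
```

Both structures end with the same output activity (15/256), but the tree keeps its internal nodes at 3/16 instead of letting them differ along the chain.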

logical effort

modified basic delay equation:
$$t_p=t_{p0}(p+gf/\gamma)$$
$t_{p0}$: intrinsic delay of the inverter
$f$: electrical effort, the ratio between the external load and the input capacitance
$p$: the ratio of the intrinsic (unloaded) delays of the gate and the inverter
$g$: logical effort, how much more input capacitance a gate presents than an inverter to deliver the same output current

Logical effort of common logic gates, assuming a PMOS/NMOS ratio of 2

(figure)

Path effort $H$

$$H=GBF$$
$G$: logical effort, $G=\prod g_i$
$F$: electrical effort, $C_L/C_{IN}$
$B$: path branching effort, $B=\prod b_i=\prod\left(\frac{C_{onpath}+C_{offpath}}{C_{onpath}}\right)$
(figure)
BEST stage effort (the gate effort that minimizes the path delay): $h=\sqrt[N]{H}$
minimum delay: $D=t_{p0}\left(\sum_{j=1}^{N}p_j+\frac{N\sqrt[N]{H}}{\gamma}\right)$
example:
(figure)
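The path-effort procedure can be written out directly. A minimal sketch, assuming the usual textbook logical efforts for a PMOS/NMOS ratio of 2, $g=(n+2)/3$ for an n-input NAND and $g=(2n+1)/3$ for an n-input NOR, assumed intrinsic ratios $p=n$ for NAND/NOR, and a hypothetical NAND2 → NOR2 → inverter path; gates are sized backwards from the load so that every stage carries the same effort $h$:

```python
from math import prod


def g_nand(n: int) -> float:
    """Logical effort of an n-input NAND with PMOS/NMOS ratio 2."""
    return (n + 2) / 3


def g_nor(n: int) -> float:
    """Logical effort of an n-input NOR with PMOS/NMOS ratio 2."""
    return (2 * n + 1) / 3


def optimize_path(g, p, b, c_in, c_load, gamma=1.0):
    """Return (H, h, D, caps) for a path.

    g, p, b: per-stage logical effort, intrinsic-delay ratio, branching effort.
    D is the minimum path delay in units of t_p0; caps are the stage input
    capacitances when every stage carries the same effort h."""
    n = len(g)
    H = prod(g) * prod(b) * (c_load / c_in)   # H = G * B * F
    h = H ** (1.0 / n)                        # best stage effort
    D = sum(p) + n * h / gamma                # minimum delay / t_p0
    caps = []
    c_next = c_load                           # on-path load of the last stage
    for gi, bi in zip(reversed(g), reversed(b)):
        c_next = gi * bi * c_next / h         # from h = g_i * b_i * C_next / C_in,i
        caps.append(c_next)
    caps.reverse()
    return H, h, D, caps


if __name__ == "__main__":
    # Hypothetical path: NAND2 -> NOR2 -> inverter, no branching,
    # input capacitance 1 unit, load 20 units.
    g = [g_nand(2), g_nor(2), 1.0]
    p = [2, 2, 1]
    b = [1, 1, 1]
    H, h, D, caps = optimize_path(g, p, b, c_in=1.0, c_load=20.0)
    print(f"H = {H:.2f}, h = {h:.2f}, D = {D:.2f} t_p0")
    print("stage input caps:", [round(c, 2) for c in caps])
```

A useful sanity check: with no branching, the computed input capacitance of the first stage comes back equal to `c_in`.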

Ratioed logic

Differential Cascode Voltage Switch Logic (DCVSL)

A ratioed logic style that completely eliminates static currents and provides rail-to-rail swing. Such a gate combines two concepts: differential logic and positive feedback.
(figure)
XOR-XNOR DCVSL gate
(figure)
AND/NAND gate in DCVSL:
(figure)

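pass transistor logic (not optimal)

allowing the primary inputs to drive gate terminals as well as source/drain terminals
(figure)
B = H: the top transistor turns on and copies input A to output F.
B = L: the bottom pass transistor turns on and passes a 0.
Together this gives F = AB. Note that an NMOS pass transistor passes only a degraded high level ($V_{DD}-V_{Tn}$), one reason this style is not optimal.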

Transmission Gate Logic

A robust and efficient pass-transistor design, with low static power dissipation and increased noise margins
(figure)
placing an NMOS device in parallel with a PMOS device
$$A=B\ \text{if}\ C=1,\qquad\text{cut off if}\ C=0$$
Transmission gate multiplexer:
(figure)
$$\bar{F}=(A\cdot S+B\cdot\bar{S})$$

Transmission gate XOR
(figure)
if B = 1: M1/M2 work as an inverter and M3/M4 are cut off, so $F=\bar{A}B$
if B = 0: M1/M2 are cut off and the M3/M4 transmission gate passes A, so $F=A\bar{B}$
Together: $F=\bar{A}B+A\bar{B}=A\oplus B$
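A logic-level check of the two case analyses (pass-transistor AND and transmission-gate XOR). It treats each transistor only as an on/off switch, so it deliberately ignores the degraded high level of an NMOS-only pass transistor; the function names are hypothetical:

```python
def pass_transistor_and(a: int, b: int) -> int:
    # B = 1: the top pass transistor copies A to F
    # B = 0: the bottom pass transistor passes a 0
    return a if b == 1 else 0


def tg_xor(a: int, b: int) -> int:
    if b == 1:
        # M1/M2 act as an inverter on A; the M3/M4 transmission gate is off
        return 1 - a
    # M1/M2 are cut off; the M3/M4 transmission gate passes A
    return a


for a in (0, 1):
    for b in (0, 1):
        assert pass_transistor_and(a, b) == (a & b)
        assert tg_xor(a, b) == (a ^ b)
print("case analysis matches AND and XOR")
```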

Dynamic logic design (with clock signal)

Schmitt trigger

The voltage-transfer characteristic displays different switching thresholds for positive- and negative-going input signals, which can suppress ringing on the signal.
(figure)
CMOS Schmitt Trigger
(figure)

  1. When $V_{in}=0$, $V_{out}=0$: the feedback loop biases the PMOS M4 on while M3 is off.
    Now M2 and M4 in parallel (pull-up) and M1 (pull-down) form an inverter. The effective transistor ratio of the inverter becomes $k_{M1}/(k_{M2}+k_{M4})$, which moves the switching threshold upwards (to $V_{M+}$).
  2. Once $V_{in}$ rises far enough that $V_{out}$ switches to 1, the feedback loop turns off M4, and the NMOS M3 is activated. This extra pull-down device speeds up the transition.
  3. When $V_{in}=1$ initially, the pull-down network consists of M1 and M3 in parallel, while the pull-up network is formed by M2. This reduces the value of the switching threshold to $V_{M-}$.
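The effect of the two thresholds can be seen with a purely behavioral model. A minimal sketch of a non-inverting Schmitt trigger (matching the $V_{in}=0 \Rightarrow V_{out}=0$ behavior described above), assuming hypothetical thresholds $V_{M+}=1.5$ V, $V_{M-}=1.0$ V and a hypothetical ringing input; a single-threshold buffer is included for comparison:

```python
def schmitt(samples, v_m_plus, v_m_minus, out=0):
    """Behavioral non-inverting Schmitt trigger: the output goes high only when
    the input rises above v_m_plus and goes low only when it falls below v_m_minus."""
    result = []
    for v in samples:
        if out == 0 and v > v_m_plus:
            out = 1
        elif out == 1 and v < v_m_minus:
            out = 0
        result.append(out)
    return result


def single_threshold(samples, v_m):
    """Ordinary buffer with a single switching threshold, for comparison."""
    return [1 if v > v_m else 0 for v in samples]


# Hypothetical rising input that rings around 1.25 V
vin = [0.0, 0.8, 1.3, 1.1, 1.4, 1.2, 1.6, 1.5, 2.0, 2.5]
print(single_threshold(vin, 1.25))   # [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
print(schmitt(vin, 1.5, 1.0))        # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

The single-threshold buffer toggles on every ring, while the hysteresis window turns the same input into one clean edge.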

BiCMOS

BiCMOS = bipolar + CMOS
good driving capability, low power, small size, but expensive

bipolar two-input NAND
(figure)

(cascade) Domino logic

  1. Precharge: CLK = 0, all the outputs (dynamic nodes) are charged to V_DD
  2. Evaluation: CLK = 1,
    $$\begin{cases}\text{output discharges} & \text{if the PDN (with foot device/ground switch) conducts}\\ \text{precharged value remains stored} & \text{if the PDN is off}\end{cases}$$
    this is called conditional discharge

The inverter between stages keeps the input of each stage obeying the monotonicity principle (only $0\to1$ transitions).
Reason:
(figure: domino cascade)

During precharge (CLK = 0), out1 and out2 are both charged to 1. Then CLK = 1 and In = 1, so out1 discharges ($1\to0$). Since out1 is in2, in2 is still 1 at the start of evaluation and out2 begins to discharge; when in2 then falls from 1 to 0, out2 changes from discharging to "store". The amount of discharge $\Delta V$ is not known, so the level of out2 is not known either: the output is undefined.
So we want to guarantee that the inputs can only make a single $0\to1$ transition during the evaluation period.

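Domino types
(figure)
D2-type domino has no foot device. This reduces transistor size, but it is more dangerous: the foot device is what prevents an unexpected discharge (a direct path to ground) when an input is still high during precharge.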

Comparison between domino circuits and static circuits

| Domino | Static |
|---|---|
| favors the down (falling) transition | balanced output (rise time = fall time) |
| only needs a pull-down network | transistors must be able to pull up as well as pull down |
| less gate loading: each input drives only an NMOS | more gate loading: each input drives both a PMOS and an NMOS |
| logic threshold voltage = device threshold voltage: earlier switching point and less delay | logic threshold voltage = $\frac{V_{DD}}{2}$: slower |
| requires a clock signal, which increases clock power and clock loading | easy to design |
| dangerous | robust and safe |

Issues in dynamic design:

  1. charge leakage
  2. charge sharing
  3. backgate coupling
  4. clock feedthrough

charge sharing

(figure 6.59)

After the output is precharged to V_DD, during evaluation $B=0$ and A goes $0\to1$.
Transistor Ma turns on, and the charge stored originally on capacitor $C_L$ is redistributed over $C_L$ and $C_a$. This causes a drop in the output voltage.
if $|\Delta V_{out}|<V_{TN}$:
$$\Delta V_{out}=-\frac{C_a}{C_L}\left[V_{DD}-V_{TN}(V_X)\right],\qquad V_X=V_{DD}-V_{TN}(V_X)$$
if $|\Delta V_{out}|>V_{TN}$:
$$\Delta V_{out}=-V_{DD}\left(\frac{C_a}{C_a+C_L}\right),\qquad V_X=V_{out}$$
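The two cases can be combined into one function that first tries case 1 and falls back to case 2 when its result violates the $|\Delta V_{out}|<V_{TN}$ condition. A minimal sketch, assuming $V_X$ starts at 0 V and treating $V_{TN}$ as a constant (the notes' $V_{TN}(V_X)$ includes the body effect, which this ignores); the function name and voltages are hypothetical:

```python
def charge_sharing_drop(vdd: float, vtn: float, ca: float, cl: float) -> float:
    """Output voltage change after charge sharing between C_L and C_a."""
    # Case 1: Ma cuts off once V_X reaches V_DD - V_TN
    dv = -(ca / cl) * (vdd - vtn)
    if abs(dv) < vtn:
        return dv
    # Case 2: Ma stays on, so V_out and V_X settle to the same voltage
    return -vdd * ca / (ca + cl)


if __name__ == "__main__":
    # Hypothetical values: V_DD = 2.5 V, V_TN = 0.5 V
    print(charge_sharing_drop(2.5, 0.5, ca=0.1, cl=1.0))  # -0.2  (small C_a: case 1)
    print(charge_sharing_drop(2.5, 0.5, ca=1.0, cl=1.0))  # -1.25 (large C_a: case 2)
```

Keeping $C_a/C_L$ small is what keeps the drop in the first, milder regime.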

Clock delay domino (CD domino)

Uses a self-timed, delay-matched clock tree for the precharge and evaluation clocks in a pipeline stage. The clock to each gate is delayed until the inputs to the gate have stabilized.

Relationship between clk and input signal for domino circuit

  1. For successive D2 gates, a gate should start precharging only after the previous D2 gate has finished precharging (guaranteeing that its input is 0).
  2. The evaluation clock edge must arrive before the inputs have evaluated: the clock should go high first, and only after that should the inputs arrive.

• Domino block: static signals and domino signals cannot be combined unless the domino signal goes through a dynamic-to-static inverter.

Clock skew(clock misalignment) & Jitter

Clock skew: spatial variation of the clock signal, i.e. the delay between the arrivals of the global clock signal at different hierarchical parts of an IC. Clock skew should be less than $0.1T_{cycle}$.
Jitter: temporal variation, i.e. the cycle time differs from cycle to cycle (it is not constant).

$$\text{sequential circuit design}\begin{cases}\text{transparent latch}\begin{cases}\text{clock positive latch}\\ \text{clock negative latch}\end{cases}\\ \text{flip-flop}\begin{cases}\text{edge trigger}\\ \text{master slave}\end{cases}\end{cases}$$

master and slave

(figure 7.18)

feedback inverters $I_2$ and $I_4$ should be “weak” (small size)
first stage (master):

| CLK | output |
|---|---|
| 0 | Q = Data |
| 1 | Q(n) = Q(n-1) |

second stage (slave):

| CLK | output |
|---|---|
| 0 | Q(n) = Q(n-1) |
| 1 | Q = Data |
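A behavioral sketch of the two tables above, assuming ideal latches (no delays, no setup or hold constraints): the first stage is transparent while CLK = 0 and the second while CLK = 1, so Q can only change when the clock rises. The class name is hypothetical:

```python
class MasterSlaveFF:
    """Two transparent latches in series: master transparent at CLK = 0,
    slave transparent at CLK = 1, giving a rising-edge-triggered flip-flop."""

    def __init__(self) -> None:
        self.master = 0  # output of the first stage
        self.q = 0       # output of the second stage

    def step(self, clk: int, d: int) -> int:
        if clk == 0:
            self.master = d        # first stage follows Data, second stage holds
        else:
            self.q = self.master   # first stage holds, second stage follows it
        return self.q


ff = MasterSlaveFF()
trace = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1)]  # (CLK, D) samples
print([ff.step(clk, d) for clk, d in trace])  # [0, 1, 1, 1, 0, 0]
```

Changes of D while CLK = 1 (third and last samples) do not reach Q, which is exactly the edge-triggered behavior.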

setup time and hold time in flip-flop

![setupandhold](https://img-blog.csdnimg.cn/direct/497355b6a41543278955e77d8088442f.png)

setup time: the minimum amount of time before the clock's active edge during which the data must be stable for it to be latched correctly
hold time: the minimum amount of time after the clock's active edge during which the data must remain stable
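A small checker for the two definitions, assuming a single active clock edge and a list of data-transition times (the function name and numbers are hypothetical). A transition exactly $t_{setup}$ before the edge, or exactly $t_{hold}$ after it, is treated as just meeting the requirement:

```python
def setup_hold_violations(t_edge, data_changes, t_setup, t_hold):
    """Classify data transitions that fall inside the window around one
    active clock edge where the data must be stable."""
    violations = []
    for t in data_changes:
        if t_edge - t_setup < t <= t_edge:
            violations.append((t, "setup"))
        elif t_edge < t < t_edge + t_hold:
            violations.append((t, "hold"))
    return violations


# Hypothetical numbers in ns: clock edge at 10, t_setup = 0.5, t_hold = 0.2
print(setup_hold_violations(10.0, [8.0, 9.7, 10.1, 10.5], 0.5, 0.2))
# [(9.7, 'setup'), (10.1, 'hold')]
```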

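H-tree
(figure)

a symmetric path for the CLK signal to traverse the IC, so that the wire length from the clock source to every leaf is the same
clock buffers: added everywhere along the tree so that, ideally, the clock arrives everywhere at the same time (zero skew)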

clock skew causes:

  1. different VDD
  2. temperature variation
  3. interconnect mismatch
  4. process variation
  5. clock loading: different buffers drive different numbers of flip-flops

clock header: addressing the causes of CLK skew

  1. give all clock buffers the same orientation
  2. thermal analysis
  3. use wider buffers

clock grid structure (shield each CLK line with DC signal lines in between)

Power distribution:

  1. IR drop
  2. $L\frac{di}{dt}$ noise (conductor inductance)
  3. $\Delta V_{DD}$: system power variation
  4. tested guard band: making sure that if one part of the design fails, the chip can still continue to operate

Phase-Locked Loop (PLL)

synchronizes the local clock and the system clock
(figure)

  1. The local clock and the system clock are compared by a phase detector, which outputs an UP (phase lag) or DOWN (phase lead) signal.
  2. The UP and DOWN signals are fed into a charge pump, which translates the digitally encoded control information into an analog voltage.
  3. The loop filter removes high-frequency components, reducing jitter.
  4. In the voltage-controlled oscillator (VCO), the UP signal speeds up the VCO, causing the local clock to catch up with the system clock; the DOWN signal slows down the VCO, eliminating the phase lead of the local clock, until the two clocks are synchronized.
  5. The VCO output goes to the phase detector again.
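The five steps form a feedback loop that can be simulated in discrete time. A minimal linearized sketch with hypothetical parameters: phases are in cycles, the phase detector plus charge pump are modeled as a signal proportional to the phase error (positive means UP, negative means DOWN), the loop filter as a proportional-integral (PI) filter, and the VCO as $f = f_0 + k_v v_{ctrl}$. It is not a model of any specific circuit:

```python
def simulate_pll(f_ref=1.0, f0=0.9, kv=1.0, kp=0.2, ki=0.01, dt=0.1, steps=3000):
    """Return the final VCO frequency and phase error (in cycles)."""
    phi_ref = 0.0   # phase of the system (reference) clock
    phi_vco = 0.0   # phase of the local clock
    integ = 0.0     # integral state of the loop filter
    err = 0.0
    f_vco = f0
    for _ in range(steps):
        err = phi_ref - phi_vco            # > 0: local clock lags (UP), < 0: leads (DOWN)
        integ += err * dt                  # integral path of the loop filter
        v_ctrl = kp * err + ki * integ     # control voltage
        f_vco = f0 + kv * v_ctrl           # VCO speeds up or slows down
        phi_ref += f_ref * dt
        phi_vco += f_vco * dt
    return f_vco, err


if __name__ == "__main__":
    f_final, err_final = simulate_pll()
    print(f"final VCO frequency = {f_final:.6f}, phase error = {err_final:.2e} cycles")
```

The VCO starts 10 % slow (f0 = 0.9) and is pulled to the reference frequency. The integral term is what removes the phase error completely: with `ki=0.0` the loop still locks in frequency but keeps a steady phase error of (f_ref - f0)/(kv*kp) = 0.5 cycles.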

3-stage current starved VCO (can have more stages)
(figure)
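A first-order estimate of the oscillation frequency, assuming each stage charges and discharges a total capacitance $C_{tot}$ with a constant (starved) current $I_D$: charging from 0 to the switching point $V_{SP}$ and discharging from $V_{DD}$ to $V_{SP}$ together take $C_{tot}V_{DD}/I_D$ per stage, so $f = I_D/(N\,C_{tot}V_{DD})$. The function name and component values are hypothetical:

```python
def current_starved_vco_freq(i_d: float, c_tot: float, vdd: float, n_stages: int = 3) -> float:
    """f = I_D / (N * C_tot * V_DD) for an N-stage current-starved ring oscillator."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("need an odd number (>= 3) of inverting stages")
    # C_tot*V_SP/I_D + C_tot*(V_DD - V_SP)/I_D = C_tot*V_DD/I_D per stage,
    # and the ring period is N times that
    period = n_stages * c_tot * vdd / i_d
    return 1.0 / period


if __name__ == "__main__":
    # Hypothetical: 10 uA starved current, 10 fF per stage, V_DD = 1 V
    f1 = current_starved_vco_freq(10e-6, 10e-15, 1.0)
    f2 = current_starved_vco_freq(20e-6, 10e-15, 1.0)
    print(f"{f1 / 1e6:.1f} MHz -> {f2 / 1e6:.1f} MHz when I_D doubles")
```

The control voltage sets $I_D$, so the frequency scales linearly with the starved current in this model; adding stages lowers the frequency.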

Timing: the delay for a signal to propagate from flip-flop 1 to flip-flop 2

Sum of delays:
$$T_{clock\_to\_Q}+T_{wire}+T_{logic}+T_{wire}+T_{setup}\le T_{cyc}$$
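The inequality translates directly into a slack and a maximum clock frequency. A minimal sketch with hypothetical delay values in ns; it checks only this max-delay (setup) constraint, not the hold constraint:

```python
def path_delay(t_clk_to_q, t_wire1, t_logic, t_wire2, t_setup):
    """Left-hand side of the constraint (all values in the same time unit)."""
    return t_clk_to_q + t_wire1 + t_logic + t_wire2 + t_setup


def setup_slack(t_cyc, **delays):
    """T_cyc minus the path delay: >= 0 means the constraint is met."""
    return t_cyc - path_delay(**delays)


def max_clock_frequency(**delays):
    """Highest clock frequency (1 / minimum T_cyc) for which the path still fits."""
    return 1.0 / path_delay(**delays)


if __name__ == "__main__":
    # Hypothetical delays in ns
    delays = dict(t_clk_to_q=0.10, t_wire1=0.05, t_logic=0.60, t_wire2=0.05, t_setup=0.05)
    print(f"slack at T_cyc = 1 ns: {setup_slack(1.0, **delays):.3f} ns")
    print(f"f_max: {max_clock_frequency(**delays):.3f} GHz")
```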

Latch-based design vs. flip-flop-based design

| Latch based | Flip-flop based |
|---|---|
| needs data path (many data in parallel) | easy to design |
| timing flexibility | strict timing |
| time borrowing | |
| timing verification (bad) | |

Flip-flops and latches cannot be used in the same part of the design.

time borrowing
(figure)

Memory design - ROM RAM

ROM: read only memory
RAM: random access memory
ROM is non-volatile memory; RAM (SRAM and DRAM) is volatile memory

$$ROM\begin{cases}\text{Mask programmed}\\ \text{Programmable ROM}\\ \text{Erasable ROM (EPROM; EEPROM)}\end{cases}$$

$$RAM\begin{cases}\text{static RAM}\\ \text{dynamic RAM (high threshold voltage with body effect for less leakage, but slower)}\end{cases}$$

speed from high to low: flip-flop and latch > register file > cache(SRAM)

4 bits × 4 bits = 16-bit ROM

Column lines are precharged (in both RAM and ROM). Rows are inputs in both ROM and RAM; in a ROM the columns are outputs, while in a RAM the columns are both inputs and outputs.

(figure of ROM)

reading ROM data: an active transistor pulls the whole column down, so the column outputs 0

Memory access time: row access time + column access time (worst case)
Row access time (worst case): RC delay, from resistor R2 to R256 (end of the row)
Column (bit line) access time (worst case): $C_{tot}=C_{wire}+(C_{drain\_body}+C_{gate\_drain})\times128$

ROM compile: determine the number of row lines and column lines of the ROM
example:
32K ROM: $2^{15}=32{,}768=2^1\times2^{14}=2^2\times2^{13}=2^3\times2^{12}=\dots=2^7\times2^{8}=\dots$, any possible combination
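The possible organizations can be enumerated mechanically. A minimal sketch, assuming "32K" means $2^{15}$ bits as in the example and that both dimensions are powers of two; picking the most nearly square array is shown only as one common choice (it balances word-line and bit-line lengths), not as the rule from class. The function name is hypothetical:

```python
def rom_organizations(total_bits: int):
    """All ways to arrange a power-of-two number of bits as 2**r rows x 2**c columns."""
    if total_bits <= 0 or total_bits & (total_bits - 1):
        raise ValueError("total_bits must be a positive power of two")
    k = total_bits.bit_length() - 1
    return [(1 << r, 1 << (k - r)) for r in range(k + 1)]


orgs = rom_organizations(32 * 1024)   # 32K bits = 2**15
print(len(orgs))                      # 16
print(orgs[:3])                       # [(1, 32768), (2, 16384), (4, 8192)]
most_square = min(orgs, key=lambda rc: abs(rc[0] - rc[1]))
print(most_square)                    # (128, 256)
```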

Read and write RAM

two basic principles:

  1. we should be able to overwrite the existing data (if 1 is stored, overwrite it with 0)
  2. when we read, we should be able to read without modifying the existing data

soft error: some high-energy α particles from outside the IC hit the IC and change the stored data. To protect the IC from α particles, we usually increase the cell size, which decreases the density

(figure of waveform)

(figure of TR)

store 0 and read 0:

  1. Stored 0: M1 on, M5 off, M2 off, M6 on. Node N1 = 0.
  2. The precharge phase charges C to 1.
  3. Read 0: M4 on, M3 on, so C is connected directly to node N1. We want to keep N1 = 0 (do not modify the existing data), so M1 stays on and pulls C down to 0 (0 is read successfully).
    Notice that M1 should be bigger (stronger) than M3 to pull down C.

store 1 and write 0:

  1. Store 1: M1 off, M5 on, M2 on, M6 off, N1 = 1
  2. $C$ and $\tilde{C}$ are charged to 1
  3. Write 0: with M3 and M4 on, the cell flips to M1 on, M5 off, M2 off, M6 on: N1 = 0, N2 = 1.

data write circuit (DWC): inputs $W$ and $\tilde{W}$; when $\tilde{W}=1$ there is no write, and both M1 and M2 are off

sense amplifier: amplifies the small voltage transition on the column (bit) lines

SRAM:

  1. one memory cell has two tasks: input and output.
  2. read through the column line
  3. the memory block is shared by many ALUs and can be accessed at the same time
  4. multiple access transistors connect to different CPUs, so multiple CPUs can read simultaneously and write the same data simultaneously.
  5. memory control allows CPUs to write the same data at the same time: a device that detects what each CPU writes

| SRAM | DRAM |
|---|---|
| fast | slow |
| low threshold voltage | high threshold voltage for less leakage |
| larger, low device density | smaller, high device density (only 1 transistor and 1 capacitor per cell) |

There are two types of DRAM cell:
(figure)

  1. Trench cell: capacitor built in a trench, bigger area and deeper
  2. Stacked cell: capacitor above the gate

I/O

types of power supply (VDD):

  1. VDDD/GNDD: digital
  2. VDDA/GNDA: analog
  3. VDDM/GNDM: memory

Input protection device
Noise from the power supply can cause latch-up, and ESD (electrostatic discharge) can damage the gate oxide; to prevent this, we use input protection devices.
gate oxide breakdown voltage: approximately 40 V to 100 V

  1. bonding wire and bonding pad: the bonding wire connects the die to the package and adds large resistance, capacitance, and inductance on the path between the CPU and the motherboard.
    (figure)

  2. input pad: protects the gate oxide, which can be damaged by ESD. A Schmitt trigger inside the input pad can filter the noise. An input pad with ESD protection uses a current-limiting resistor and diode clamps (large diodes).
    (figure)

  3. output pad: requires a chain of successively larger buffers.

  4. bidirectional pad
    (figure)

two types of test: functional test and electrical test
DFT: design for testability
Testing: the process by which a defect in the system can be exposed
apply test vectors to the device under test (DUT), then collect the response and compare it with the expected output
(figure)
BIST: built-in self-test.

Faults:

  1. transient fault: due to α particles and power supply fluctuation
  2. intermittent fault: due to external causes, like loose connections, humidity, and temperature
    the faults mentioned below are permanent
  3. stuck-at 0
  4. stuck-at 1
  5. stuck-open: the output holds its previous value, so we have to initialize the value before the test
    (figure)
    (figure)
    (figure)
  6. delay fault
  7. bridging fault (use an IDDQ test to check: measure the current from VDD to GND; if the internal circuit is not active (no input switching) but current still flows, something is wrong)
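Finding test vectors for stuck-at faults amounts to comparing the good circuit with each faulty copy over all inputs. A minimal fault-simulation sketch on a hypothetical circuit, n1 = A AND B and F = n1 OR C, modeling only single stuck-at-0/1 faults (stuck-open, delay, and bridging faults need other methods such as the IDDQ test above); all names are hypothetical:

```python
from itertools import product

NETS = ("A", "B", "C", "n1", "F")


def evaluate(a, b, c, fault=None):
    """Hypothetical circuit: n1 = A AND B, F = n1 OR C.

    `fault` is a single stuck-at fault given as (net_name, stuck_value),
    or None for the fault-free circuit."""
    def force(net, value):
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    a, b, c = force("A", a), force("B", b), force("C", c)
    n1 = force("n1", a & b)
    return force("F", n1 | c)


def detecting_vectors(fault):
    """All input vectors (A, B, C) whose faulty output differs from the good one."""
    return [v for v in product((0, 1), repeat=3)
            if evaluate(*v) != evaluate(*v, fault=fault)]


if __name__ == "__main__":
    for net in NETS:
        for stuck in (0, 1):
            print(f"{net} stuck-at-{stuck}: {detecting_vectors((net, stuck))}")
```

For example, A stuck-at-0 is exposed only by (A, B, C) = (1, 1, 0): the vector must set A to the opposite value and make the path through the AND and OR gates sensitive to it.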