DESIGNING COMBINATIONAL LOGIC GATES IN CMOS
Starting from the inverter (w is the width of the transistor), notice how the widths change in the different combinational logic gates.
NAND gate (recommended)
(figures: 2-input, 3-input, 4-input; p=20, n=10)
NOR gate
(figures: 2-input, 3-input, 4-input)
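The width changes in these figures follow the usual rule: a stack of k transistors in series is made k times wider so its worst-case drive matches the inverter, while parallel devices keep the inverter width. A minimal sketch, assuming a hypothetical reference inverter with NMOS width `wn=1` and PMOS width `wp=2` (these are not the widths in the figures):

```python
def nand_sizes(n_inputs, wn=1, wp=2):
    """Return (pmos_width, nmos_width) for an n-input NAND.

    NAND: NMOS in series (stack of n), PMOS in parallel.
    wn, wp are assumed reference-inverter widths.
    """
    return wp, n_inputs * wn


def nor_sizes(n_inputs, wn=1, wp=2):
    """Return (pmos_width, nmos_width) for an n-input NOR.

    NOR: PMOS in series (stack of n), NMOS in parallel.
    """
    return n_inputs * wp, wn


for n in (2, 3, 4):
    p_nand, n_nand = nand_sizes(n)
    p_nor, n_nor = nor_sizes(n)
    print(f"{n}-input NAND: p={p_nand}, n={n_nand};  {n}-input NOR: p={p_nor}, n={n_nor}")
```

In this sketch the NOR's series PMOS stack grows fastest (4-input NOR: p=8 versus p=2 for the NAND), which is consistent with NAND being the recommended choice.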
COMPLEX GATE
for OR (+) logic: NMOS in parallel, PMOS in series
for AND (·) logic: NMOS in series, PMOS in parallel
then connect the two parts (pull-down and pull-up networks)
Examples: $\overline{AB+CD}$ and $\overline{(A+B)C}$
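The series/parallel rule can be checked mechanically: build the pull-down network (PDN) from the expression, derive the pull-up network (PUN) as its dual (series and parallel swapped), and confirm that exactly one of them conducts for every input. A minimal sketch; the nested-tuple representation and all function names are hypothetical:

```python
from itertools import product

# A switch network is an input name (one transistor) or a tuple
# ("series", [...]) / ("parallel", [...]) of sub-networks.


def conducts(net, inputs, pmos=False):
    """True if the network has a conducting path for the given inputs.
    NMOS devices turn on when their input is 1, PMOS devices when it is 0."""
    if isinstance(net, str):
        return inputs[net] == 0 if pmos else inputs[net] == 1
    kind, parts = net
    results = [conducts(p, inputs, pmos) for p in parts]
    return all(results) if kind == "series" else any(results)


def dual(net):
    """PUN from PDN: swap series and parallel connections."""
    if isinstance(net, str):
        return net
    kind, parts = net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, [dual(p) for p in parts])


def evaluate_gate(pdn, names):
    """Truth table of the static CMOS gate built from `pdn`."""
    pun = dual(pdn)
    table = {}
    for bits in product((0, 1), repeat=len(names)):
        inputs = dict(zip(names, bits))
        down = conducts(pdn, inputs)
        up = conducts(pun, inputs, pmos=True)
        if down == up:  # both on = short circuit, both off = floating output
            raise ValueError("PUN and PDN must be complementary")
        table[bits] = 1 if up else 0
    return table


# PDN of (AB+CD)': AND -> NMOS series, OR -> NMOS parallel
pdn1 = ("parallel", [("series", ["A", "B"]), ("series", ["C", "D"])])
# PDN of ((A+B)C)'
pdn2 = ("series", [("parallel", ["A", "B"]), "C"])
```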
fanout parameter
$$F=\frac{32C}{C}=32,\qquad f_{output}=\sqrt[N]{F}=\sqrt[5]{32}=2$$
$$t=Nt_{p0}(1+f/\gamma)=Nt_{p0}(1+f)=5t_{p0}(1+2)=15t_{p0}\quad(\gamma=1)$$
$$E_{supply}=30CV_{DD}^2\quad(30C=2C+4C+8C+16C)$$
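The same numbers can be reproduced with a short helper for a tapered inverter chain. A minimal sketch, with $t_{p0}$ normalized to 1 and $\gamma=1$ as in the example; like the notes, the energy term sums only the internal stage inputs ($2C+4C+8C+16C$), not $C_L$:

```python
def buffer_chain(c_load, c_in, n_stages, gamma=1.0, tp0=1.0):
    """Tapered inverter chain with equal fanout per stage.

    Returns (f, delay, caps) where caps[i] is the input capacitance of
    stage i+1 and caps[-1] is (approximately) the load."""
    F = c_load / c_in                      # overall fanout
    f = F ** (1.0 / n_stages)              # equal fanout per stage
    delay = n_stages * tp0 * (1 + f / gamma)
    caps = [c_in * f ** i for i in range(n_stages + 1)]
    return f, delay, caps


f, t, caps = buffer_chain(32, 1, 5)
internal = sum(caps[1:-1])   # 2C + 4C + 8C + 16C, multiplies C*V_DD^2
print(f, t, internal)
```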
$$t=t_{p0}\left(1+\frac{out}{in}\right)$$
$$t=t_{p0}\left(1+\frac{S_2}{1}\right)+t_{p0}\left(1+\frac{12}{S_2}\right)+t_{p0}\left(1+\frac{S_4}{12}\right)+t_{p0}\left(1+\frac{6}{S_4}\right)+t_{p0}\left(1+\frac{64}{4}\right)$$
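Assuming $S_2$ and $S_4$ are the free sizes to be chosen, each appears in exactly two terms of the form $x/a+b/x$, which is minimized at $x=\sqrt{ab}$: $S_2=\sqrt{1\cdot12}\approx3.46$ and $S_4=\sqrt{12\cdot6}\approx8.49$. A quick numerical check of the expression above (delay in units of $t_{p0}$):

```python
import math


def chain_delay(s2, s4):
    """Delay in units of t_p0, term by term as in the expression above."""
    return ((1 + s2 / 1) + (1 + 12 / s2) + (1 + s4 / 12)
            + (1 + 6 / s4) + (1 + 64 / 4))


# x/a + b/x is minimized at x = sqrt(a*b)
s2_opt = math.sqrt(1 * 12)
s4_opt = math.sqrt(12 * 6)
best = chain_delay(s2_opt, s4_opt)   # about 29.34 t_p0
print(s2_opt, s4_opt, best)
```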
fanout is not 3 but (1+4+2)=7: $t=t_{p0}(1+7)=8t_{p0}$
fanout is not 1 but 8: $t=t_{p0}(1+8)=9t_{p0}$
equalized transistor size
series: make the widths (numerator) the same, then add the lengths (denominator)
parallel: make the lengths (denominator) the same, then add the widths (numerator)
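The rule above treats each $W/L$ like a conductance: in series the $L/W$ terms add, in parallel the $W/L$ terms add. A minimal sketch with exact fractions; the ratios in the usage lines are hypothetical:

```python
from fractions import Fraction as Fr


def series(*ratios):
    """Equivalent W/L of transistors in series (L/W terms add, like resistors)."""
    return 1 / sum(1 / r for r in ratios)


def parallel(*ratios):
    """Equivalent W/L of transistors in parallel (W/L terms add, like conductances)."""
    return sum(ratios)


# 2/1 in series with 4/1: rewrite 2/1 as 4/2, then add lengths -> 4/(2+1) = 4/3
print(series(Fr(2, 1), Fr(4, 1)))
# 2/1 in parallel with 3/2: rewrite 2/1 as 4/2, then add widths -> 7/2
print(parallel(Fr(2, 1), Fr(3, 2)))
```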
Static circuit: every signal is driven by $V_{DD}$ or ground (directly or indirectly).
Probability of the output flipping from 0 to 1
for 2-NAND gate
$$P_0(0\to1)=P_0(0)\cdot P_0(1)=\left(\frac14\right)\left(\frac34\right)=\frac{3}{16}$$
for 2-NOR gate
$$P_0(0\to1)=P_0(1)\cdot P_0(0)=\left(\frac14\right)\left(\frac34\right)=\frac{3}{16}$$
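Both results follow from enumerating the truth table. A minimal sketch, assuming independent inputs that are 1 with probability 1/2 and uncorrelated from one cycle to the next, so that $P_{0\to1}=P(\text{out}=0)\cdot P(\text{out}=1)$:

```python
from fractions import Fraction
from itertools import product


def p_zero_to_one(gate, n_inputs):
    """P(0->1) = P(out=0) * P(out=1) for uniformly random, independent inputs."""
    outs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    p1 = Fraction(sum(outs), len(outs))
    return (1 - p1) * p1


def nand2(a, b):
    return int(not (a and b))


def nor2(a, b):
    return int(not (a or b))


print(p_zero_to_one(nand2, 2), p_zero_to_one(nor2, 2))
```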
Design Techniques to Reduce Switching Activity
- Logic Restructuring
probability:
quantity | $O_1$ | $O_2$ | $F$ |
---|---|---|---|
$p_1$ (chain) | 1/4 | 1/8 | 1/16 |
$p_0=1-p_1$ (chain) | 3/4 | 7/8 | 15/16 |
$p_{0\to1}=p_0\times p_1$ (chain) | 3/16 | 7/64 | 15/256 |
$p_1$ (tree) | 1/4 | 1/4 | 1/16 |
$p_0=1-p_1$ (tree) | 3/4 | 3/4 | 15/16 |
$p_{0\to1}=p_0\times p_1$ (tree) | 3/16 | 3/16 | 15/256 |
- Input Ordering
the probability of $Z$ is the same in both circuits
in the first circuit:
$$P_{0\to1}(\text{intermediate node})=(1-0.5\times0.2)\times(0.5\times0.2)=0.09$$
in the second circuit:
$$P_{0\to1}(\text{intermediate node})=(1-0.2\times0.1)\times(0.2\times0.1)=0.0196$$
so the second input ordering gives lower switching activity at the intermediate node.
- Time-multiplexing resources
- Glitch Reduction by balancing signal paths
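The numbers for logic restructuring and input ordering above can be reproduced exactly. A minimal sketch, assuming the restructured circuits are four-input AND functions with independent inputs of probability 1/2 (consistent with the table's values) and that the intermediate node in the input-ordering example is the AND of the first two inputs:

```python
from fractions import Fraction as Fr


def and_p1(*p_inputs):
    """P(output = 1) of an AND gate with independent inputs."""
    result = Fr(1)
    for p in p_inputs:
        result *= p
    return result


def p01(p1):
    """0->1 transition probability: p0 * p1."""
    return (1 - p1) * p1


half = Fr(1, 2)
# Chain: O1 = AB, O2 = O1*C, F = O2*D
o1 = and_p1(half, half)
o2_chain = and_p1(o1, half)
f_chain = and_p1(o2_chain, half)
# Tree: O1 = AB, O2 = CD, F = O1*O2
o2_tree = and_p1(half, half)
f_tree = and_p1(o1, o2_tree)


def intermediate_p01(pa, pb):
    """Input ordering: activity of the node driven by the first two inputs."""
    return p01(and_p1(pa, pb))


print(intermediate_p01(Fr(1, 2), Fr(1, 5)), intermediate_p01(Fr(1, 5), Fr(1, 10)))
```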
logical effort
modified basic delay equation:
$$t_p=t_{p0}\left(p+\frac{gf}{\gamma}\right)$$
$t_{p0}$: intrinsic delay
$f$: electrical effort, ratio between external load and input capacitance
$p$: ratio of the intrinsic (unloaded) delay of the gate to that of an inverter
$g$: logical effort, how much more input capacitance a gate presents to deliver the same output current as an inverter
Logical efforts of common logic gates, assuming a PMOS/NMOS ratio of 2
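Those values follow from the sizing rule: size the gate for the same drive as the reference inverter, then divide its input capacitance per input by the inverter's. A minimal sketch, assuming the reference inverter has NMOS width 1 and PMOS width `pn_ratio` (2 here), which reproduces the standard results $(n+2)/3$ for NAND and $(2n+1)/3$ for NOR:

```python
from fractions import Fraction


def logical_effort(gate, n, pn_ratio=2):
    """g = C_in(gate input) / C_in(inverter) for equal output drive."""
    c_inv = 1 + pn_ratio
    if gate == "nand":
        c_gate = n * 1 + pn_ratio     # series NMOS made n times wider, PMOS unchanged
    elif gate == "nor":
        c_gate = 1 + n * pn_ratio     # series PMOS made n times wider, NMOS unchanged
    elif gate == "inv":
        c_gate = c_inv
    else:
        raise ValueError(f"unknown gate: {gate}")
    return Fraction(c_gate, c_inv)


for n in (2, 3, 4):
    print(n, logical_effort("nand", n), logical_effort("nor", n))
```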
Path effort $H$: $H=GBF$
$G$: logical effort, $G=\prod g_i$
$F$: electrical effort, $C_L/C_{IN}$
$B$: path branching effort, $B=\prod b_i=\prod\left(\frac{C_{onpath}+C_{offpath}}{C_{onpath}}\right)$
BEST stage effort (the gate effort that minimizes the path delay): $h=\sqrt[N]{H}$
minimum delay:
$$D=t_{p0}\left(\sum_{j=1}^{N}p_j+\frac{N\sqrt[N]{H}}{\gamma}\right)$$
example:
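The procedure above (path effort, best stage effort, minimum delay, then sizing each stage back from the load with $C_{in,i}=g_ib_iC_{out,i}/h$) can be sketched as follows. The three-stage path (inverter, 2-input NAND with $g=4/3$, inverter), $C_L=64$, $C_{IN}=1$, and intrinsic ratios $p=1,2,1$ are hypothetical illustration values:

```python
import math


def optimize_path(g, b, c_load, c_in, p, gamma=1.0, tp0=1.0):
    """Logical-effort sizing of an N-stage path.

    g, b, p: per-stage logical effort, branching effort, intrinsic-delay ratio.
    Returns (H, h, minimum delay, input capacitance of each stage)."""
    n = len(g)
    H = math.prod(g) * math.prod(b) * (c_load / c_in)   # H = G*B*F
    h = H ** (1 / n)                                    # best stage effort
    delay = tp0 * (sum(p) + n * h / gamma)
    caps = []
    c_out = c_load
    for gi, bi in zip(reversed(g), reversed(b)):        # walk back from the load
        c_i = gi * bi * c_out / h
        caps.append(c_i)
        c_out = c_i
    return H, h, delay, list(reversed(caps))


H, h, D, caps = optimize_path(g=[1, 4 / 3, 1], b=[1, 1, 1],
                              c_load=64, c_in=1, p=[1, 2, 1])
print(H, h, D, caps)
```

As a sanity check, the computed input capacitance of the first stage comes back equal to $C_{IN}$, since $C_L\,GB/H=C_L/F$.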
Ratioed logic
Differential Cascode Voltage Switch Logic (DCVSL)
A ratioed logic style that completely eliminates static currents and provides rail-to-rail swing. Such a gate combines two concepts: differential logic and positive feedback.
XOR-XNOR DCVSL gate
AND/NAND gate in DCVSL:
Pass-transistor logic (not optimal)
Allows the primary inputs to drive gate terminals as well as source/drain terminals.
B=H: the top transistor turns on and copies input A to output F.
B=L: the bottom pass transistor turns on and passes a 0.
Transmission Gate Logic
Robust and efficient pass-transistor design: low static power dissipation and increased noise margins
Place an NMOS device in parallel with a PMOS device.
$$A=B\ \text{if}\ C=1,\qquad \text{cutoff if}\ C=0$$
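One reason plain pass-transistor logic is "not optimal" is that an NMOS pass transistor passes a strong 0 but only a degraded 1 ($V_{DD}-V_{Tn}$), whereas the transmission gate's parallel PMOS restores the full level. A minimal logic-level sketch, assuming hypothetical values $V_{DD}=2.5$ V and a constant $V_{Tn}=0.5$ V (body effect and switch resistance ignored):

```python
def nmos_pass(v_in, vdd, vtn):
    """NMOS pass transistor with its gate at vdd: the output can rise only to vdd - vtn."""
    return min(v_in, vdd - vtn)


def transmission_gate(v_in, c):
    """A = B if C = 1, cut off (floating, returned as None) if C = 0.
    The PMOS passes the strong 1 and the NMOS the strong 0."""
    return v_in if c else None


def pass_and(v_a, b, vdd=2.5, vtn=0.5):
    """Pass-transistor AND from the text: B = 1 copies A to F, B = 0 passes 0."""
    if b:
        return nmos_pass(v_a, vdd, vtn)   # top transistor on
    return nmos_pass(0.0, vdd, vtn)       # bottom transistor on, passes GND


print(pass_and(2.5, 1), transmission_gate(2.5, 1))   # weak 1 versus full swing
```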
Transmission gate multiplexer:
$$\bar{F}=A\cdot S+B\cdot\bar{S}$$
Transmission gate XOR
if B=1, M1M2 work as an inverter and M3M4 are cut off: $F=\bar{A}B$
if B=0, M1M2 are cut off and M3M4 work as a transmission gate: $F=A\bar{B}$
Combining both cases: $F=\bar{A}B+A\bar{B}=A\oplus B$
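The case analysis for the multiplexer and the XOR can be checked against their truth tables at the logic level. A minimal sketch; the function names are hypothetical and only the switching behaviour described above is modelled:

```python
from itertools import product


def tg_mux_fbar(a, b, s):
    """F-bar of the transmission-gate multiplexer: A when S = 1, B when S = 0."""
    return (a & s) | (b & (1 - s))


def tg_xor(a, b):
    """Case analysis from the text:
    B = 1: M1/M2 act as an inverter on A, M3/M4 cut off -> F = not A.
    B = 0: M1/M2 cut off, M3/M4 pass A as a transmission gate -> F = A."""
    if b == 1:
        return 1 - a
    return a


for a, b in product((0, 1), repeat=2):
    print(a, b, tg_xor(a, b))
```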
Dynamic logic design (with clock signal)
Schmitt trigger
The voltage-transfer characteristic displays different switching thresholds for positive- and negative-going input signals; this hysteresis can suppress ringing on the signal.
CMOS Schmitt Trigger
- When $V_{in}=0$, $V_{out}=0$ and the feedback loop biases the PMOS M4 on while M3 is off. M2 and M4 in parallel (pull-up) together with M1 (pull-down) form an inverter. This changes the effective transistor ratio of the inverter to $k_{M1}/(k_{M2}+k_{M4})$, which moves the switching threshold upwards (to $V_{M+}$).
- Once the inverter switches ($V_{in}=1$, $V_{out}=1$), the feedback loop turns off M4 and the NMOS M3 is activated. This extra pull-down device speeds up the transition.
- When initially $V_{in}=1$, the pull-down network consists of M1 and M3 in parallel, while the pull-up network is formed by M2. This reduces the switching threshold to $V_{M-}$.
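The effect of the two thresholds can be illustrated with a behavioural model: the (non-inverting) output switches high only above $V_{M+}$ and low only below $V_{M-}$, while a single-threshold stage follows every ripple. A minimal sketch; the input samples and the thresholds $V_{M+}=1.5$ V, $V_{M-}=1.0$ V are hypothetical:

```python
def schmitt(samples, v_m_plus, v_m_minus, out=0):
    """Non-inverting Schmitt trigger with hysteresis between v_m_minus and v_m_plus."""
    outputs = []
    for v in samples:
        if out == 0 and v > v_m_plus:
            out = 1
        elif out == 1 and v < v_m_minus:
            out = 0
        outputs.append(out)
    return outputs


def single_threshold(samples, v_m):
    """Ordinary buffer with one switching threshold, for comparison."""
    return [1 if v > v_m else 0 for v in samples]


def count_transitions(bits):
    return sum(1 for x, y in zip(bits, bits[1:]) if x != y)


# Slowly rising input with ringing around mid-rail (hypothetical, V_DD = 2.5 V)
noisy = [0.0, 0.6, 1.1, 1.3, 1.2, 1.35, 1.2, 1.4, 1.6, 2.0, 2.5]
print(count_transitions(single_threshold(noisy, 1.25)),   # several false edges
      count_transitions(schmitt(noisy, 1.5, 1.0)))        # one clean edge
```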
BiCMOS
BiCMOS = bipolar + CMOS
good driving capability, low power, small size, but expensive
bipolar two-input NAND
(cascade) Domino logic
- Precharge: clock signal = 0, all the outputs (dynamic nodes) charge to VDD
- Evaluation: clock signal = 1,
$$\begin{cases}\text{output discharges} & \text{if PDN/foot device/ground switch on}\\ \text{precharged value remains stored} & \text{if PDN/foot device/ground switch off}\end{cases}$$
This is called conditional discharge.
The inverter between stages keeps the input of each stage obeying the monotonicity principle (inputs may only make $0\to1$ transitions).
Reason:
During precharge (clk=0), out1 and out2 charge to 1. Then clk=1 and In=1, so out1 discharges ($1\to0$). Since out1 is in2, in2 goes from 1 to 0, and out2 changes from "discharge" to "store". The amount $\Delta V$ already discharged is unknown, so the level of the output is uncertain $\to$ the output is undefined.
So we want to guarantee that the inputs can only make a single $0\to1$ transition during the evaluation period.
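The "Reason" above can be reproduced with a toy discrete-time model (an assumption, not a circuit simulation): two cascaded dynamic stages, each with a single NMOS in its pull-down network, both nodes precharged to 1000 mV, and the first stage's input at 1 so its node falls 50 mV per step. A node counts as logic 1 above half the supply; all names and numbers are hypothetical:

```python
def second_stage_node(use_inverter, steps=100, vdd=1000, rate=50):
    """Final voltage (mV) of the second dynamic node after evaluation."""
    n1 = vdd   # dynamic node of stage 1, precharged
    n2 = vdd   # dynamic node of stage 2, precharged
    for _ in range(steps):
        if use_inverter:
            in2 = n1 < vdd // 2      # domino: static inverter between stages
        else:
            in2 = n1 > vdd // 2      # plain cascade: node 1 drives stage 2 directly
        if in2:
            n2 = max(0, n2 - rate)   # stage 2 pull-down conducts
        n1 = max(0, n1 - rate)       # stage 1 pull-down conducts (its input is 1)
    return n2


# Without the inverter, in2 ends at 0 so node 2 should stay at 1000 mV, but it
# lost charge while in2 was still high: it is left at mid-rail (undefined).
# With the inverter, in2 only rises (0 -> 1), so node 2 discharges cleanly to 0.
print(second_stage_node(False), second_stage_node(True))
```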
Domino type
(figure)
D2-type domino has no foot device (the foot device is what avoids unexpected discharge); this reduces transistor size but is more dangerous.
Comparison between domino circuits and static circuits
Domino | Static |
---|---|
favors the down (falling) transition | balanced output (rise time = fall time) |
only needs a pull-down network | transistors must be able to pull up as well as pull down |
less gate loading, input drives only NMOS | more gate loading, input drives both PMOS and NMOS |
logic threshold voltage = device threshold voltage, earlier switching point and less delay | logic threshold voltage = $\frac{V_{DD}}{2}$, slower |
requires a clock signal, increasing clock power and clock loading | easy to design |
dangerous | robust and safe |
Issues in dynamic design:
- charge leakage
- charge sharing
- backgate coupling
- clock feedthrough
charge sharing
After precharging to VDD, during evaluation with $B=0$ and $A: 0\to1$, transistor Ma turns on and the charge stored originally on capacitor $C_L$ is redistributed over $C_L$ and $C_a$. This causes a drop in the output voltage.
if $|\Delta V_{out}|<V_{TN}$:
$$\Delta V_{out}=-\frac{C_a}{C_L}\left[V_{DD}-V_{TN}(V_X)\right],\qquad V_X=V_{DD}-V_{TN}(V_X)$$
if $|\Delta V_{out}|>V_{TN}$:
$$\Delta V_{out}=-V_{DD}\left(\frac{C_a}{C_a+C_L}\right),\qquad V_X=V_{out}$$
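The two cases can be evaluated directly: try case 1 (Ma cuts off once $V_X$ reaches $V_{DD}-V_{TN}$) and fall back to case 2 (full charge equalization) if the resulting drop exceeds $V_{TN}$. A minimal sketch, assuming $V_{TN}$ is a constant (the body-effect dependence $V_{TN}(V_X)$ is ignored) and using hypothetical capacitances in fF:

```python
def charge_sharing_drop(c_a, c_l, vdd, vtn):
    """Output voltage change when C_L (precharged to vdd) shares charge with C_a (at 0 V)."""
    # Case 1: only C_a * (vdd - vtn) of charge is drawn from C_L
    dv_case1 = -(c_a / c_l) * (vdd - vtn)
    if abs(dv_case1) < vtn:
        return dv_case1
    # Case 2: C_a and C_L equalize (V_X = V_out)
    return -vdd * c_a / (c_a + c_l)


print(charge_sharing_drop(5, 50, 2.5, 0.5))    # small C_a: case 1
print(charge_sharing_drop(50, 50, 2.5, 0.5))   # large C_a: case 2
```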
Clock delay domino (CD domino)
Uses a self-timed, delay-matched clock tree for the precharge and evaluation clocks in a pipeline stage. The clock to a gate is delayed until the inputs to the gate have stabilized.
Relationship between clk and input signal for domino circuit
- Successive D2 gates should start precharging only after the previous D2-type domino gate has finished precharging (guaranteeing that the input is 0).
- The evaluation clock edge must arrive before the inputs have evaluated: the clock evaluation edge should go high first, and the inputs should arrive after that.
- Domino block: static signals and domino signals cannot be combined unless the domino signal goes through a dynamic-to-static inverter.
Clock skew (clock misalignment) & jitter
Clock skew: spatial variation of the clock signal, i.e., the difference in arrival time of the global clock signal at different hierarchical parts of the IC. Clock skew should be less than $0.1T_{cycle}$.
Jitter: temporal variation of the clock; the cycle time differs from cycle to cycle (cycle time is not constant).
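The two definitions translate directly into measurements: skew compares the same edge at different sinks, jitter compares successive periods at one sink. A minimal sketch; reporting jitter as the spread (max minus min) of measured periods is one simple convention chosen here, and the times in picoseconds are hypothetical:

```python
def clock_skew(arrival_times):
    """Skew: spread of the arrival times of the same clock edge at different sinks."""
    return max(arrival_times) - min(arrival_times)


def clock_jitter(edge_times):
    """Jitter (simple view): spread of the measured periods at one sink."""
    periods = [t1 - t0 for t0, t1 in zip(edge_times, edge_times[1:])]
    return max(periods) - min(periods)


def skew_ok(arrival_times, t_cycle):
    """Rule of thumb from the notes: skew < 0.1 * T_cycle."""
    return clock_skew(arrival_times) < 0.1 * t_cycle


arrivals = [100, 120, 135, 110]        # same edge at four flip-flops (ps)
edges = [0, 1000, 2010, 2995, 4000]    # successive edges at one flip-flop (ps)
print(clock_skew(arrivals), clock_jitter(edges), skew_ok(arrivals, 500))
```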
$$\text{sequential circuit design}\begin{cases}\text{transparent latch}\begin{cases}\text{clock positive latch}\\\text{clock negative latch}\end{cases}\\\text{flip-flop}\begin{cases}\text{edge trigger}\\\text{master slave}\end{cases}\end{cases}$$
master and slave
feedback inverters $I_2$ and $I_4$ should be "weak" (small size)
first stage:
CLK | Q |
---|---|
0 | Q = Data |
1 | Q(n) = Q(n-1) |
second stage:
CLK | Q |
---|---|
0 | Q(n) = Q(n-1) |
1 | Q = Data |
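The two tables above describe two level-sensitive latches in series: the first is transparent when CLK=0 and the second when CLK=1, so the output only changes on the rising clock edge. A minimal discrete-step sketch (class names hypothetical; feedback inverters and timing are not modelled):

```python
class Latch:
    """Level-sensitive latch: transparent when clk == transparent_level, else holds."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0

    def update(self, clk, d):
        if clk == self.transparent_level:
            self.q = d          # transparent: Q = Data
        return self.q           # otherwise hold: Q(n) = Q(n-1)


class MasterSlaveFF:
    """First stage transparent on CLK = 0, second stage on CLK = 1."""

    def __init__(self):
        self.master = Latch(transparent_level=0)
        self.slave = Latch(transparent_level=1)

    def step(self, clk, d):
        m = self.master.update(clk, d)
        return self.slave.update(clk, m)


ff = MasterSlaveFF()
clks = [0, 1, 1, 0, 0, 1]
data = [1, 0, 0, 0, 0, 1]
# D changes while CLK = 1 are ignored; Q updates only when CLK goes 0 -> 1.
print([ff.step(c, d) for c, d in zip(clks, data)])
```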
setup time and hold time in flip-flop
setup time: the minimum amount of time before the clock's active edge during which the data must be stable for it to be latched correctly
hold time: the minimum amount of time after the clock's active edge during which the data must remain stable
H-tree
A path for the CLK signal to traverse the IC.
Clock buffer: the ideal clock arrival time is 0 everywhere; buffers are added throughout the tree.
clock skew causes:
- different VDD
- temperature variation
- interconnect mismatch
- process variation
- clock loading: different buffers drive different numbers of flip-flops
Clock header: addresses the causes of CLK skew
- give all clock buffers the same orientation
- thermal analysis
- make buffers wider
- clock grid structure (shield each CLK line with DC signals in between)
Power distribution:
- IR drop
- $L\frac{di}{dt}$ (conductor inductance)
- $\Delta V_{DD}$: system power variation
- tested guard band: making sure that if one part of the design fails, the chip can still continue to operate
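The first two items add up to the local supply droop a gate sees, $\Delta V = IR + L\,\frac{di}{dt}$. A minimal sketch, assuming a single lumped grid resistance and a single lumped package/conductor inductance; all values are hypothetical:

```python
def supply_droop(i_avg, r_grid, l_pkg, di, dt):
    """Worst-case local V_DD drop: resistive IR drop plus inductive L*di/dt."""
    return i_avg * r_grid + l_pkg * di / dt


# 1 A through 50 mOhm, plus 1 A switched in 1 ns through 0.5 nH
droop = supply_droop(i_avg=1.0, r_grid=0.05, l_pkg=0.5e-9, di=1.0, dt=1e-9)
print(f"droop = {droop:.2f} V")   # about 0.55 V
```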
Phase-Locked Loop (PLL)
Synchronizes the local clock and the system clock.
- The local clock and the system clock are compared by a phase detector, which outputs an UP (phase lag) or DOWN (phase lead) signal.
- The UP and DOWN signals are fed into a charge pump, which translates the digitally encoded control information into an analog voltage.
- The loop filter removes high-frequency components and reduces jitter.
- In the voltage-controlled oscillator (VCO), the UP signal speeds up the VCO, causing the local clock to catch up with the system clock; the DOWN signal slows down the VCO, eliminating the phase lead of the local clock, until the two clocks synchronize.
- The VCO output goes back to the phase detector.
3-stage current-starved VCO (can have more stages)
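The loop above can be sketched as a discrete-time model. Assumptions (not from the notes): phases are in cycles per step, the phase detector plus charge pump produce a signal proportional to the phase error (positive means UP), the loop filter is a proportional-plus-integral (R-C) path, and the VCO gain is folded into `kp` and `ki`. All values are normalized and hypothetical:

```python
def simulate_pll(f_ref=0.1, f0=0.08, kp=0.2, ki=0.02, steps=300):
    """Return (final phase error, final local frequency)."""
    phase_ref = 0.0
    phase_loc = 0.0
    integ = 0.0          # charge accumulated on the loop-filter capacitor
    f_loc = f0
    for _ in range(steps):
        err = phase_ref - phase_loc      # phase detector: > 0 means local lags (UP)
        integ += err                     # charge pump integrates UP/DOWN onto the cap
        vctrl = kp * err + ki * integ    # loop filter: proportional + integral terms
        f_loc = f0 + vctrl               # VCO: UP speeds up, DOWN slows down
        phase_ref += f_ref
        phase_loc += f_loc
    return phase_ref - phase_loc, f_loc


err, f_loc = simulate_pll()
print(err, f_loc)   # error near 0, local frequency near f_ref
```

With these gains the linearized loop's poles lie at radius $\sqrt{1-k_p}\approx0.89$, so the free-running frequency offset is pulled in within a few tens of steps.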
Timing: the time delay as a signal propagates from flip-flop 1 to flip-flop 2
Sum of delays:
$$T_{clock\_to\_Q}+T_{wire}+T_{logic}+T_{wire}+T_{setup}\le T_{cyc}$$
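The inequality gives both a pass/fail check (slack) and the maximum clock frequency of the path. A minimal sketch with hypothetical delays in picoseconds:

```python
def setup_slack(t_cyc, t_clk_to_q, t_wire1, t_logic, t_wire2, t_setup):
    """Slack of the flip-flop-to-flip-flop path; negative means a setup violation."""
    return t_cyc - (t_clk_to_q + t_wire1 + t_logic + t_wire2 + t_setup)


def f_max_ghz(t_clk_to_q, t_wire1, t_logic, t_wire2, t_setup):
    """Highest clock frequency (GHz, delays in ps) for which the inequality holds."""
    return 1000.0 / (t_clk_to_q + t_wire1 + t_logic + t_wire2 + t_setup)


delays = dict(t_clk_to_q=80, t_wire1=50, t_logic=600, t_wire2=70, t_setup=60)
print(setup_slack(1000, **delays), f_max_ghz(**delays))   # 140 ps slack, ~1.16 GHz
```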
Latch based design VS flip-flop based design
Latch based | Flip-flop based |
---|---|
needs data path (many data in parallel) | easy to design |
timing flexibility | strict timing |
time borrowing | |
timing verification (bad) | |
Flip-flops and latches cannot be used in the same part of the design.
time borrowing
Memory design: ROM and RAM
ROM: read only memory
RAM: random access memory
ROM is non-volatile memory; RAM is volatile memory.
$$ROM\begin{cases}\text{Mask programmed}\\\text{Programmable ROM}\\\text{Erasable ROM (EPROM; EEPROM)}\end{cases}$$
$$RAM\begin{cases}\text{static RAM}\\\text{dynamic RAM (high threshold voltage with body effect for less leakage, but slower)}\end{cases}$$
speed from high to low: flip-flop and latch > register file > cache(SRAM)
4 bits × 4 bits = 16-bit ROM
The column lines are precharged (both RAM and ROM). Row lines are inputs for both ROM and RAM; in a ROM the columns are outputs, while in a RAM the columns are both inputs and outputs.
Reading ROM data: an active transistor pulls the whole column down, so that column outputs 0.
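The read operation described above can be modelled at the logic level: precharge every column to 1, activate one row, and let each transistor on that row pull its column to 0. A minimal sketch; the 4 × 4 contents and the function name are hypothetical:

```python
def read_rom(rom_bits, row):
    """Read one row of a precharged-column ROM.

    rom_bits[r][c] == 1 means a transistor is present at (row r, column c)."""
    n_cols = len(rom_bits[0])
    columns = [1] * n_cols                 # precharge phase: all columns at 1
    for c in range(n_cols):
        if rom_bits[row][c]:               # active transistor pulls the column down
            columns[c] = 0
    return columns


# Hypothetical 4 x 4 = 16-bit ROM (transistor placement)
ROM = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
for r in range(4):
    print(r, read_rom(ROM, r))
```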
Memory access time: row access time+ column access time (worst case)
Row access time (worst case): RC delay, from resistor R2 to R256 (end of the row)
Column (bit line) access time (worst case):
$$C_{tot}=C_{wire}+(C_{drain\_body}+C_{gate\_drain})\times128$$
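Both worst-case terms can be estimated in a few lines. The row line is treated as a ladder of equal RC segments and its delay estimated with the Elmore formula (each segment's resistance times all capacitance downstream of it, which grows quadratically with row length); the bit-line capacitance follows the formula above. A minimal sketch; one segment per cell and all numeric values are assumptions:

```python
def row_elmore_delay(n_cells, r_seg, c_seg):
    """Elmore delay to the far end of a row line of n equal RC segments."""
    return sum(r_seg * (n_cells - i) * c_seg for i in range(n_cells))


def bitline_capacitance(c_wire, c_drain_body, c_gate_drain, n_cells=128):
    """C_tot = C_wire + (C_drain_body + C_gate_drain) * n_cells."""
    return c_wire + (c_drain_body + c_gate_drain) * n_cells


print(row_elmore_delay(256, 1, 1))          # = 256 * 257 / 2 in units of R*C
print(bitline_capacitance(10, 0.5, 0.25))   # fF, hypothetical per-cell values
```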
ROM compilation: determine the number of row lines and column lines of the ROM
example:
32K ROM: $2^{15}=32{,}768=2^1\times2^{14}=2^2\times2^{13}=2^3\times2^{12}=\dots=2^7\times2^8$, any possible combination
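Listing every row/column split of a power-of-two capacity is a one-liner. A minimal sketch, assuming both dimensions are powers of two and at least 2:

```python
def rom_shapes(total_bits):
    """All (rows, columns) pairs of powers of two with rows * columns == total_bits."""
    k = total_bits.bit_length() - 1
    if 1 << k != total_bits:
        raise ValueError("total_bits must be a power of two")
    return [(1 << r, 1 << (k - r)) for r in range(1, k)]


shapes = rom_shapes(2 ** 15)   # 32K ROM
print(len(shapes), shapes[:3], shapes[6])
```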
Read and write RAM
Two basic principles:
- we should be able to overwrite the existing data (if 1 is stored, overwrite it with 0)
- when we read, we should be able to read without modifying the existing data
Soft error: some high-energy α particles from outside the IC can hit the IC and change the stored data. To protect the IC from α particles, the cell size is usually increased, which decreases density.
store 0 and read 0:
- Stored 0: M1 on, M5 off, M2 off, M6 on. Node N1 = 0.
- The precharge phase charges C to 1.
- Read 0: M4 and M3 on, so C connects directly to node N1. We want to keep N1 = 0 (do not modify the existing data), so M1 stays on and pulls C down to 0 (successfully reads 0).
Notice that M1 should be bigger (stronger) than M3 to pull down C.
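Why M1 must be stronger than M3 can be seen from a crude divider model: during the read, the precharged bit line pulls N1 up through M3 while M1 pulls it down, so N1 settles at an intermediate voltage that must stay well below the trip point of the other inverter. A minimal sketch, assuming each transistor behaves as a linear resistor inversely proportional to its width (a strong simplification; the widths are hypothetical):

```python
def read_node_voltage(vdd, w_pulldown, w_access):
    """Voltage on N1 while reading a stored 0 (linear-resistor divider model).

    The bit line at vdd connects to N1 through the access transistor (M3);
    the pull-down transistor (M1) connects N1 to ground."""
    r_pulldown = 1.0 / w_pulldown
    r_access = 1.0 / w_access
    return vdd * r_pulldown / (r_pulldown + r_access)


for w1, w3 in [(2, 1), (1, 1), (1, 2)]:
    print(f"M1 width {w1}, M3 width {w3}: N1 = {read_node_voltage(1.0, w1, w3):.2f} V")
```

A stronger M1 keeps N1 closer to 0, so the read does not flip the stored value.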
store 1 and write 0:
- Store 1: M1 off, M5 on, M2 on, M6 off, N1 = 1.
- $C$ and $\tilde{C}$ are charged to 1.
- Write 0: M3M4 off, M1 on, M5 off, M2 off, M6 on, N1 = 0, N2 = 1.
Data write circuit (DWC): inputs $W$ and $\tilde{W}$; when $\tilde{W}=1$ (no write), both M1 and M2 are off.
Sense amp: amplifies the small voltage transition on the bit line.
SRAM:
- One memory cell has two tasks: input and output.
- Read through the column line.
- The memory block is shared by many ALUs and can be accessed at the same time.
- Multiple access transistors connect to different CPUs. Multiple CPUs can read simultaneously and write the same data simultaneously.
- Memory control allows CPUs to write the same data at the same time: a device that detects what each CPU writes.
SRAM | DRAM |
---|---|
fast | slow |
low threshold voltage | high threshold voltage for less leakage |
larger, low device density | smaller, high device density (only 1 transistor and 1 capacitor per cell) |
DRAM cells come in two types:
- Trench cell: bigger area and deeper
- Stack cell: capacitor above the gate
I/O
Types of power supply (VDD):
- VDDD/GNDD: digital
- VDDA/GNDA: analog
- VDDM/GNDM: memory
Input protection device
Noise from the power supply can cause latch-up, and ESD (electrostatic discharge) can damage the gate oxide; to prevent this, input protection devices are used.
Gate-oxide breakdown voltage: approximately 40 V to 100 V
- Bonding wire and bonding pad: the bonding wire connects the chip to the package and has significant resistance, capacitance, and inductance.
- Input pad: protects the gate oxide, which can be damaged by ESD. A Schmitt trigger inside the input pad can filter noise. An input pad with ESD protection uses a current-limiting resistor and diode clamps (large diodes).
- Output pad: requires a chain of successively larger buffers.
- Bidirectional pad
Two types of test: functional test and electrical test
DFT: design for testability
Testing: the process by which a defect in the system can be exposed
Apply test vectors to the device under test (DUT), collect the response, and compare it with the expected output.
BIST: built-in self-test
Faults:
- transient fault: due to α particles and power supply fluctuation
- intermittent fault: due to external causes such as loose connections, humidity, and temperature
The faults above are not permanent; permanent faults include:
- stuck-at 0
- stuck-at 1
- stuck-at open: the output holds its previous value, so the value has to be initialized before the test
- delay fault
- bridge fault (use an IDDQ test to check: measure the current from VDD to GND; if the internal circuit is not activated (no input) but current still flows, something is wrong)
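The test flow described above (apply vectors to the DUT, compare with the expected response) can be sketched as a tiny single-stuck-at fault simulator: a vector detects a fault when the faulty circuit's output differs from the fault-free output. A minimal sketch; the circuit $F=(A\cdot B)+C$ with internal net $N=A\cdot B$ is hypothetical, and only stuck-at-0/1 faults are modelled:

```python
from itertools import product

# Hypothetical circuit under test: N = A AND B, F = N OR C
NETS = ["A", "B", "C", "N", "F"]


def simulate(a, b, c, fault=None):
    """Evaluate the circuit; `fault` is (net, stuck_value) or None (fault-free)."""
    def force(net, value):
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    a, b, c = force("A", a), force("B", b), force("C", c)
    n = force("N", a & b)
    return force("F", n | c)


def detecting_vectors(fault):
    """Test vectors whose response differs from the expected (fault-free) output."""
    return [v for v in product((0, 1), repeat=3)
            if simulate(*v) != simulate(*v, fault=fault)]


print(detecting_vectors(("N", 0)))   # N stuck-at 0 needs A=B=1 and C=0
print(detecting_vectors(("C", 1)))
```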