is CVC voltage drop.
The OVD circuit with typical parameters (see Table 1) has a
threshold charge value Qth = 4.0×10^-12 C. When C1 = C2 = CL, the minimal
value of CL that allows the OVD to operate is about 1.0×10^-12 F.
The influence of the dimensions of transistors M1-M4 on the LTD delay d is
given by the approximation [17]:
[pic]
where ~ denotes proportionality, and Gn and Gp are the conductances of the
NMOS and PMOS transistors respectively (CL = C1 = C2).
Since [pic] and [pic], where W and L are the width and length of the
transistor channels of the corresponding conduction type, the LTD delay d
is proportional to [pic].
It has been obtained that for [pic], [pic], CL = 1.0 pF and Vdd-V = 5.0 V
the LTD delay is d = 7.6 ns.
When the LTD works jointly with the OVD in the speed-independent bus,
the actual LTD delay increases by 30-40 percent because the OVD's R1
reduces the effective power supply voltage.
To determine the appropriate value of R1 in the OVD circuit we must
know the threshold input current Ith that corresponds to the threshold
voltage drop Vth, for which 400 mV is recommended.
The average input current Iav in the transient state of one line is
given by the expression Iav = CL·v, where v is the average rate of
rise of the output signal of an inverter included in the LTD. For the
typical values v = 1.0×10^9 V/s and CL = 1.0 pF, Iav = 1.0 mA. Taking
Ith = 0.4 mA and Imax = 2.0 mA, we obtain R1 = 1 kΩ and rb = 100 Ω.
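The arithmetic behind these figures can be checked with a short sketch. It assumes that R1 is sized by Ohm's law so that the threshold current Ith produces exactly the threshold drop Vth across it; this relation is an inference from the quoted numbers, not a formula stated in the text. The variable names are illustrative only.

```python
# Values quoted in the text.
C_L = 1.0e-12   # load capacitance CL, F
v = 1.0e9       # average output rate of rise, V/s
V_th = 0.4      # threshold voltage drop Vth, V
I_th = 0.4e-3   # threshold input current Ith, A

# Average transient input current of one line: Iav = CL * v.
I_av = C_L * v

# Assumption: Ith flowing through R1 drops exactly Vth (Ohm's law).
R1 = V_th / I_th

print(f"Iav = {I_av * 1e3:.1f} mA")
print(f"R1  = {R1 / 1e3:.1f} kOhm")
```

The sketch does not derive rb, because the text gives no relation for it.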
Simulation has shown that in this case the OVD turn-on delay can be
approximated by the empirical expression:
ton [ns] = 8.1 + 0.1n
where n is the address bus width in bits. The total delay of recognizing
an address transition is ttot = d·g + ton, where g is the coefficient of
the LTD delay increase due to the reduced power supply voltage. As shown
above, g ≈ 1.35. It can be seen that for n = 32, ttot = 21.6 ns.
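As a quick check of the total delay, the following sketch evaluates the empirical expression with the values quoted above (d = 7.6 ns, g = 1.35). The function names are hypothetical and chosen only for this illustration.

```python
def ovd_turn_on_delay_ns(n_bits):
    """Empirical OVD turn-on delay from the text: ton[ns] = 8.1 + 0.1 n."""
    return 8.1 + 0.1 * n_bits


def total_delay_ns(n_bits, d_ns=7.6, g=1.35):
    """Total address-transition recognition delay: ttot = d*g + ton."""
    return d_ns * g + ovd_turn_on_delay_ns(n_bits)


# For a 32-bit address bus: 7.6 * 1.35 + (8.1 + 3.2) = 21.56 ns,
# which rounds to the 21.6 ns quoted in the text.
print(f"{total_delay_ns(32):.1f} ns")
```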
4.4 Speed-independent adder
The circuit we use in this Section as a CL has been a touchstone for
many speed-independent circuit designers for about four decades: the
ripple carry adder (RCA), which is a chain of one-bit full adders
(Fig.14).
Each full adder calculates two Boolean functions: the sum
si = ai ⊕ bi ⊕ ci and the output carry ci+1 = ai·bi + bi·ci + ai·ci,
where ai and bi are the summands, ci is the input carry and ⊕ stands
for the XOR operation.
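The two Boolean functions can be illustrated with a minimal behavioural sketch of a full adder and of the chain built from it. This models only the logic, not the circuit timing. The helper names and the least-significant-bit-first bit ordering are choices made for this example.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out) for bits a, b, c."""
    s = a ^ b ^ c                        # si = ai XOR bi XOR ci
    c_out = (a & b) | (b & c) | (a & c)  # ci+1 = ai*bi + bi*ci + ai*ci
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Chain full adders; bit lists are least-significant bit first."""
    carry = c0
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    """Unsigned integer -> list of n bits, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (least-significant bit first) -> unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


# Usage: add two 6-bit operands.
sums, carry_out = ripple_carry_add(to_bits(27, 6), to_bits(45, 6))
print(from_bits(sums), carry_out)
```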
In 1955 Gilchrist et al. proposed a speed-independent RCA with a carry
completion signal [18]. In the 1960s that circuit was carefully analyzed
and improved [19-21]. In 1980 Seitz used the RCA to illustrate his concept
of the equipotential region and his approach to self-timed system design [4].
Here we use the RCA as a CL to illustrate our approach to SIM design.
As shown in Section 4.2, the turn-on and turn-off delays of
the OVD circuit are proportional to the equivalent capacitance Ceq
associated with the OVD circuit input. Ceq depends linearly on the
number of gates N in the CMOS CL, so to speed up a SIM it is necessary to
reduce N. This can be achieved by structurally decomposing the CMOS CL
into subcircuits CL1, CL2, etc. Each subcircuit CLi is connected either
to its own detecting circuit OVDi or, if its transitions do not affect the
transition duration of the CL as a whole, directly to the power supply.
Each detecting circuit OVDi generates its own OV signal, which is combined
with the output signals of the other OVDs by a multi-input OR (NOR)
element. The output signal of that element serves as the OV signal of the
CMOS CL.
The computation time of a multi-bit RCA is determined by the length of
the maximal activated carry chain. Many papers have been devoted to the
analysis of carry generation and carry propagation in the RCA [19-21], and
many of them proposed their own methods for estimating or calculating the
average maximal activated carry chain. We do not intend to add another one.
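To make the notion of a maximal activated carry chain concrete, the sketch below measures it for one pair of operands. It is not an estimation method. It assumes a common convention: a chain starts at a bit where both summands are 1 (carry generated), continues through bits where exactly one summand is 1 (carry propagated), and its length is counted in stages, including the generating one. The function name is hypothetical.

```python
def longest_carry_chain(a, b, n_bits):
    """Length, in stages, of the longest activated carry chain in a + b."""
    longest = 0
    run = 0  # length of the chain currently passing through this bit
    for i in range(n_bits):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:
            run = 1    # carry generated here: a new chain starts
        elif (ai ^ bi) and run:
            run += 1   # incoming carry is propagated one more stage
        else:
            run = 0    # carry absorbed, or no carry to propagate
        longest = max(longest, run)
    return longest


# 0111 + 0001: carry generated at bit 0, propagated through bits 1 and 2.
print(longest_carry_chain(0b0111, 0b0001, 4))  # 3
```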
Let us look inside the RCA. As mentioned above, the RCA
consists of one-bit full adders, and each full adder consists of two
parts: a part forming the sum si and a part forming the carry ci+1 (Fig.16).
In a multi-bit RCA the sum-forming parts do not interact with each
other and do not affect the transition duration of the RCA. Each
carry-forming part receives the ci signal from the preceding carry-forming
part and sends the ci+1 signal to the subsequent one.
To decompose the RCA we use three heuristic tricks:
(i) We connect all sum-forming parts directly to the power supply.
(ii) We divide each carry-forming part into three subcircuits, denoted in
Fig.16 by the numbers 1, 2 and 3. We connect all subcircuits 1 directly to
the power supply because they do not have the input ci and so do not lie
on the carry propagation path.
(iii) We connect all subcircuits 2 to OVD1 and all subcircuits 3 to
OVD2. The outputs of OVD1 and OVD2 are connected to a two-input
NOR gate that forms the RCA OV signal in positive logic (Fig.17).
The input current curves I1 and I2 of OVD1 and OVD2 for a 6-bit RCA and
the longest transition duration are shown in Fig.18.
Taking Vth1,2 = 400 mV, we calculated the OVD circuit parameters:
R11 = 5 kΩ, Ith1 = 0.08 mA, R12 = 3 kΩ, Ith2 = 0.13 mA. The dependence of
the OVD1 and OVD2 delays on the number of bits in the RCA is shown in Fig.19.
4.5 Comparison of SIMs with synchronous counterparts
The transition duration in CL is a random variable. The probability of
a transition with duration D is determined by the implemented Boolean
function and the distribution of input logical combinations. The possible
values of D lie in the interval [0; Dmax], where Dmax is the length of
the critical path in CL.
Let Dme = Σ pi·Di (summed over all paths i) be the mathematical
expectation of the transition duration in CL, where Di is the length of
the i-th SPP in CL and pi is the probability that the i-th path is the
longest activated SPP.
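The expectation is a probability-weighted sum of path lengths, as the following sketch shows. The path lengths and probabilities are hypothetical and chosen only for illustration; they are not data from this work.

```python
def expected_transition_duration(paths):
    """Dme = sum of pi * Di over (pi, Di) pairs; the pi must sum to 1."""
    total_p = sum(p for p, _ in paths)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("path probabilities must sum to 1")
    return sum(p * d for p, d in paths)


# Hypothetical CL with three paths: (probability of being the longest
# activated path, path length in ns).
paths = [(0.5, 2.0), (0.3, 5.0), (0.2, 10.0)]

d_me = expected_transition_duration(paths)  # 1.0 + 1.5 + 2.0 = 4.5 ns
d_max = max(d for _, d in paths)            # critical path: 10.0 ns
print(d_me, d_max)
```

In this made-up case the expected duration is less than half the critical path length.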
When CL works in the synchronous mode, the cycle duration Ts is
chosen with regard to the maximal transition duration Dmax. A certain
margin must be added to Dmax to ensure reliable operation of CL under
variations of its parameters: Ts = k·Dmax, where k is a margin coefficient.
In a SIM the cycle duration is a random variable with expectation
Tsi = g·Dme + toff + tif, where g is the coefficient of the CL delay
increase due to the reduced power supply voltage, toff is the turn-off
delay of the OVD circuit, and tif is the interface circuitry delay.
We define the efficiency E of the speed-independent mode of CL operation
as the relative increase of SIM performance in comparison to its
synchronous counterpart: [pic].
Generally, the speed-independent mode is more efficient than the
synchronous one if Ts > Tsi or, in other words,
k·Dmax > g·Dme + toff + tif.
In the case of the RCA [pic], where tc is the delay of the carry-forming
part and n is the number of full adders in the RCA.
It has been shown [19] that in an n-bit RCA Dme ≈ tc·log2(5n/4). Then, in
the case of speed-independent operation,
Tsi = g·tc·log2(5n/4) + toff + tif.
The dependence of Ts and Tsi on the number of bits in the RCA is
shown in Fig.20. As can be seen, speed-independent operation of the RCA
is more efficient when n > 8.
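The comparison can be sketched numerically. The sketch assumes that the critical path of an n-bit RCA spans all n carry stages, so that Dmax = n·tc; this is an assumption made for the illustration. The parameter values in the usage example are hypothetical, so the crossover it finds need not coincide with the one read from Fig.20.

```python
import math


def t_sync(n, tc, k):
    """Synchronous cycle Ts = k * Dmax, assuming Dmax = n * tc."""
    return k * n * tc


def t_si(n, tc, g, t_off, t_if):
    """Expected SIM cycle Tsi = g * tc * log2(5n/4) + toff + tif."""
    return g * tc * math.log2(5 * n / 4) + t_off + t_if


def crossover_bits(tc, k, g, t_off, t_if, n_max=128):
    """Smallest n up to n_max with Ts > Tsi, or None if there is none."""
    for n in range(1, n_max + 1):
        if t_sync(n, tc, k) > t_si(n, tc, g, t_off, t_if):
            return n
    return None


# Hypothetical parameters (times in ns), for illustration only.
params = dict(tc=2.0, k=1.2, g=1.35, t_off=10.0, t_if=5.0)
n_cross = crossover_bits(**params)
print("speed-independent mode wins from n =", n_cross)
```

Ts grows linearly with n while Tsi grows only logarithmically, so the speed-independent mode wins for all n above the crossover. The fixed overhead toff + tif determines where that crossover lies.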
5. Conclusion
6. Acknowledgement
I would like to thank Igor Shagurin and Vlad Tsylyov of the Moscow
Physical Engineering Institute for helpful discussions of this work. I am
also grateful to Chris Jesshope of University of Surrey and Mark Josephs of
Oxford University who kindly provided the latest material on their research
in the area of delay-insensitive circuit design.
References
[1] Miller, R.E., Switching theory (Wiley, New York, 1965),
vol.2, Chapter 10.
[2] Unger, S.H., Asynchronous Sequential Switching Circuits
(Wiley, New York, 1969).
[3] Armstrong, D.B., A.D. Friedman, and P.R. Menon, Design of
Asynchronous Circuits Assuming Unbounded Gate Delays, IEEE
Trans. on Computers C-18 (12) (1969) 1110-1120.
[4] Seitz, C.L., System timing, in: C.A. Mead and L.A. Conway,
eds., Introduction to VLSI Systems (Addison-Wesley, New
York, 1980), Chapter 7.
[5] Izosimov, O.A., I.I. Shagurin, and V.V. Tsylyov, Physical
approach to CMOS module self-timing, Electronics Letters 26 (22)
(1990) 1835-1836.
[6] Veendrick, H.J.M., Short-circuit dissipation of static CMOS
circuit and its impact on the design of buffer circuits,
IEEE J. Solid-State Circuits SC-19 (4) (1984) 468-473.
[7] Chappell, B.A., T.I. Chappell, S.E. Schuster, H.M. Segmuller,
J.W. Allan, R.L. Franch, and P.J. Restle, Fast CMOS ECL
receivers with 100-mV worst-case sensitivity, IEEE J. Solid-State
Circuits SC-23 (1) (1988) 59-67.
[8] Chu, S.T., J. Dikken, C.D. Hartgring, F.J. List, J.G.
Raemaekers, S.A. Bell, B. Walsh, and R.H.W. Salters, A 25-ns
Low-Power Full-CMOS 1-Mbit (128K×8) SRAM, IEEE J. Solid-State
Circuits SC-23 (5) (1988) 1078-1084.
[9] Frank, E.H., and R.F. Sproull, A Self-Timed Static RAM, in:
Proc. Third Caltech VLSI Conference (Springer-Verlag,
Berlin, 1983) pp.275-285.
[10] Donoghue, W.J., and G.E. Noufer, Circuit for address transition
detection, US Patent 4563599, 1986.
[11] Huang, J.S.T., and J.W. Schrankler, Switching characteristics
of scaled CMOS circuits at 77K, IEEE Trans. on Electron
Devices ED-34 (1) (1987) 101-106.
[12] Gilchrist, B., J.H. Pomerene, and S.Y. Wong, Fast Carry Logic
for Digital Computers, IRE Trans. on Electronic Computers EC-4
(4) (1955) 133-136.
[13] Hendrickson, H.C., Fast High-Accuracy Binary Parallel
Addition, IRE Trans. on Electronic Computers EC-9 (4) (1960)
465-469.
[14] Majerski, S., and M. Wiweger, NOR-Gate Binary Adder with Carry
Completion Detection, IEEE Trans. on Electronic Computers EC-16
(1) (1967) 90-92.
[15] Reitwiesner, G.W., The determination of carry propagation
length for binary addition, IRE Trans. on Electronic Computers
EC-9 (1) (1960) 35-38.
Appendix
SPICE2G.6: MOSFET model parameters
|No.|Name  |Parameter |Units |PMOS value |NMOS value |
|1  |LEVEL |MODEL INDEX |- |3 |3 |
|2  |VTO   |ZERO-BIAS THRESHOLD VOLTAGE |V |-1.337 |1.161 |
|3  |KP    |TRANSCONDUCTANCE PARAMETER |A/V^2 |2.3×10^-5 |4.6×10^-5 |
|4  |GAMMA |BULK THRESHOLD PARAMETER |V^0.5 |0.501 |0.354 |
|5  |PHI   |SURFACE POTENTIAL |V |0.695 |0.660 |
|6  |RD    |DRAIN OHMIC RESISTANCE |OHM |333 |85 |
|7  |RS    |SOURCE OHMIC RESISTANCE |OHM |333 |85 |
|8  |CBD   |ZERO-BIAS B-D JUNCTION CAPACITANCE |F |1.98×10^-14 |6.9×10^-15 |
|9  |CBS   |ZERO-BIAS B-S JUNCTION CAPACITANCE |F | | |
|10 |IS    |BULK JUNCTION SATURATION CURRENT |A |3.47×10^-15 |9.22×10^-15 |
|11 |PB    |BULK JUNCTION POTENTIAL |V |0.8 |0.8 |
|12 |CGSO  |GATE-SOURCE OVERLAP CAPACITANCE PER METER CHANNEL WIDTH |F/M |6.70×10^-10 |3.30×10^-10 |
|13 |CGDO  |GATE-DRAIN OVERLAP CAPACITANCE PER METER CHANNEL WIDTH |F/M | | |
|14 |CGBO  |GATE-BULK OVERLAP CAPACITANCE PER METER CHANNEL LENGTH |F/M |1.90×10^-9 |2.60×10^-9 |
|15 |RSH   |DRAIN AND SOURCE DIFFUSION SHEET RESISTANCE |OHM/SQ |55 |30 |
|16 |CJ    |ZERO-BIAS BULK JUNCTION BOTTOM CAPACITANCE PER SQ METER OF JUNCTION AREA |F/M^2 |3.53×10^-4 |1.24×10^-4 |
|17 |MJ    |BULK JUNCTION BOTTOM GRADING COEFFICIENT |- |0.5 |0.5 |
|18 |CJSW  |ZERO-BIAS BULK JUNCTION SIDEWALL CAPACITANCE PER METER OF JUNCTION PERIMETER |F/M |1.71×10^-10 |3.20×10^-11 |