Statistical Analysis and Optimization of Asynchronous Digital Circuits
Tsung-Te Liu and Jan M. Rabaey, University of California, Berkeley
Outline
• Motivation
• Variability model of CMOS digital circuit
• Performance model for different timing schemes
• Performance comparison
• Conclusion
Variability Continues to Increase as Technology and Voltage Scale Down
[Figure: device variability vs. technology node; delay spread due to process variations (count vs. normalized delay): -80% to +110% at 0.3 V, -40% to +30% at 1 V]
• Higher variability with finer design rules and larger wafers
• Higher variability with lower supply voltages
[Cao, ASU]
Circuit Performance Characteristics with Different Timing Schemes
[Figure: probability vs. computation delay for the original circuit, a self-timed circuit, and a conventional synchronous circuit; A: protocol circuit delay, B: 3σ delay variation]
• A self-timed circuit acts as its own variation monitor
• Self-timing becomes advantageous when the variation is large (B > A)
• A statistical analysis framework is necessary
Statistical Analysis Framework
Circuit Variability Model
• Supply voltage
• Logic depth
• Width and length
• Body bias
Performance Model
• Computation overhead
• Communication overhead
• Delay and energy performance
[Figure: delay vs. energy trade-off across application domains: processors, communications, sensors]
Determine the optimal timing strategy in the presence of variability
Delay Model of CMOS Digital Circuit
• One unified current model across different operating regions
• Model error <2% from 0.3V to 1V
[Figure: 4-stage FO4 INV chain, delay (in FO4 delays at VDD = 1 V) vs. supply voltage [V], simulation data vs. model]
Strong inversion (velocity saturation):
$$I \propto \frac{(V_{DD}-V_{th})^2}{1+\dfrac{V_{DD}-V_{th}}{E_{sat}L}}$$
Subthreshold:
$$I \propto \exp\left(\frac{V_{DD}-V_{th}}{S}\right)$$
Unified model:
$$I \propto \frac{\left[\ln\left(1+\exp\dfrac{V_{DD}-V_{th}}{2S}\right)\right]^2}{1+\dfrac{2S\,\ln\left(1+\exp\dfrac{V_{DD}-V_{th}}{2S}\right)}{E_{sat}L}}$$
[Figure: model error [%] vs. supply voltage [V]]
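A minimal Python sketch of the three current expressions is shown below. It assumes the unified form's denominator term is 2S·ln(1+exp((VDD−Vth)/2S))/(Esat·L), chosen so the expression reduces to the strong-inversion and subthreshold forms, and it assumes delay ∝ VDD/I (a common first-order approximation, not stated on the slide) to obtain a delay normalized to VDD = 1 V. The names `VTH`, `S`, `ESAT_L`, `i_strong`, `i_sub`, `i_unified`, and `delay_norm` and all parameter values are hypothetical placeholders, not fitted values from this work.

```python
import math

# Hypothetical parameters for illustration only (not fitted to any process).
VTH = 0.35     # threshold voltage [V]
S = 0.035      # subthreshold slope factor n*kT/q [V]
ESAT_L = 1.0   # velocity-saturation voltage Esat*L [V]


def i_strong(vdd, vth=VTH, esat_l=ESAT_L):
    """Strong-inversion, velocity-saturated current (arbitrary units)."""
    vov = vdd - vth
    return vov ** 2 / (1 + vov / esat_l)


def i_sub(vdd, vth=VTH, s=S):
    """Subthreshold current (arbitrary units)."""
    return math.exp((vdd - vth) / s)


def i_unified(vdd, vth=VTH, s=S, esat_l=ESAT_L):
    """Unified current: smooth interpolation between the two regions."""
    x = (vdd - vth) / (2 * s)
    u = math.log1p(math.exp(x))  # ln(1 + exp(x))
    return u ** 2 / (1 + 2 * s * u / esat_l)


def delay_norm(vdd, vref=1.0, **params):
    """Delay relative to vref, assuming delay is proportional to VDD / I."""
    return (vdd / i_unified(vdd, **params)) / (vref / i_unified(vref, **params))


for v in (0.3, 0.5, 0.7, 1.0):
    print(f"VDD = {v:.1f} V -> normalized delay {delay_norm(v):.2f}")
```

In strong inversion ln(1+e^x) ≈ x, so the unified expression approaches `i_strong` divided by 4S²; deep in subthreshold ln(1+e^x) ≈ e^x and it approaches `i_sub`. The constant factor cancels in the normalized delay.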
Delay Variability Model
• Within-die (WID) variation: "local mismatch"
• Die-to-die (DTD) variation: "global variation"
$$\frac{\sigma_{T_d}}{\mu_{T_d}} = \sqrt{\left(S^{T_d}_{V_{th}}\right)^2\left(\frac{\sigma_{V_{th}}}{\mu_{V_{th}}}\right)^2 + \left(S^{T_d}_{K}\right)^2\left(\frac{\sigma_{K}}{\mu_{K}}\right)^2}$$
The first term is the threshold-voltage contribution and the second the geometry contribution (K: geometry parameter), with sensitivities
$$S^{T_d}_{V_{th}} = \frac{\partial T_d / T_d}{\partial V_{th} / V_{th}}, \qquad S^{T_d}_{K} = \frac{\partial T_d / T_d}{\partial K / K}$$
[Figure: delay σ/μ [%] vs. supply voltage [V] for WID and DTD variation; simulation data vs. model, with the threshold-voltage and geometry contributions shown separately]
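To make the sensitivity definition concrete, the sketch below estimates S^Td_Vth with a central finite difference and then combines the threshold-voltage and geometry terms. It uses a hypothetical stand-in delay model (delay ∝ VDD/I with a unified current expression and made-up parameters), and the σ/μ inputs passed to `sigma_over_mu` are illustrative, not measured values; all function names are hypothetical.

```python
import math

# Hypothetical model parameters (illustrative only).
VTH, S, ESAT_L = 0.35, 0.035, 1.0


def delay(vdd, vth=VTH):
    """First-order delay model: Td proportional to VDD / I (arbitrary units)."""
    x = (vdd - vth) / (2 * S)
    u = math.log1p(math.exp(x))
    current = u ** 2 / (1 + 2 * S * u / ESAT_L)
    return vdd / current


def sensitivity_vth(vdd, vth=VTH, rel_step=1e-4):
    """S = (dTd/Td) / (dVth/Vth), estimated with a central difference."""
    h = vth * rel_step
    d_td = delay(vdd, vth + h) - delay(vdd, vth - h)
    return (d_td / delay(vdd, vth)) / (2 * h / vth)


def sigma_over_mu(s_vth, cv_vth, s_k, cv_k):
    """Combine threshold-voltage and geometry terms (independent sources)."""
    return math.sqrt((s_vth * cv_vth) ** 2 + (s_k * cv_k) ** 2)


for v in (0.3, 1.0):
    print(f"VDD = {v:.1f} V: S_Vth = {sensitivity_vth(v):.2f}")
```

Under these assumed parameters the Vth sensitivity is several times larger at 0.3 V than at 1 V, which is one way to see why threshold-voltage variation weighs more heavily near threshold.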
Delay Variability Model
[Figure: total delay σ/μ [%] vs. supply voltage [V], simulation data vs. model (total, DTD, WID); model error [%] vs. supply voltage [V]]
$$\frac{\sigma_{T_d,\text{total}}}{\mu_{T_d,\text{total}}} = \sqrt{\left(\frac{\sigma_{T_d,\text{DTD}}}{\mu_{T_d,\text{DTD}}}\right)^2 + \left(\frac{\sigma_{T_d,\text{WID}}}{\mu_{T_d,\text{WID}}}\right)^2}$$
• Model error <8% from 0.3V to 1V
• Local mismatch dominates at low supply voltages
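The root-sum-square combination assumes DTD and WID variations are independent, so their variances add. A quick Monte Carlo sketch illustrates this; the Gaussian, additive-perturbation setup and the σ/μ values (5% DTD, 12% WID) are assumptions chosen for illustration, not data from this work.

```python
import math
import random
import statistics

CV_DTD = 0.05   # hypothetical die-to-die sigma/mu
CV_WID = 0.12   # hypothetical within-die sigma/mu
MU = 1.0        # nominal delay (arbitrary units)

rng = random.Random(0)
# Each sample: nominal delay perturbed by independent DTD and WID components.
samples = [
    MU * (1 + rng.gauss(0, CV_DTD) + rng.gauss(0, CV_WID))
    for _ in range(50_000)
]

cv_mc = statistics.stdev(samples) / statistics.fmean(samples)
cv_rss = math.sqrt(CV_DTD ** 2 + CV_WID ** 2)
print(f"Monte Carlo sigma/mu = {cv_mc:.4f}, root-sum-square = {cv_rss:.4f}")
```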
Delay Variability Model with Different Logic Depths
[Figure: delay σ/μ [%] vs. supply voltage [V], simulation data vs. model for n = 4, 8, 24; model error [%] vs. supply voltage [V] for n = 4, 8, 24]
$$\frac{\sigma_{T_d,\text{total},n}}{\mu_{T_d,\text{total},n}} = \sqrt{\left(\frac{\sigma_{T_d,\text{DTD},4}}{\mu_{T_d,\text{DTD},4}}\right)^2 + \frac{4}{n}\left(\frac{\sigma_{T_d,\text{WID},4}}{\mu_{T_d,\text{WID},4}}\right)^2}$$
where n is the logic depth (number of FO4 stages).
• The 4-stage inverter chain model serves as the baseline
• Model error <13% for n=8 and <15% for n=24
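The logic-depth model keeps the DTD term fixed and scales the WID variance by 4/n, so the WID σ/μ falls as 1/√n as independent stage variations partially average out along the path. A small sketch; the function name and the 4-stage baseline values are hypothetical.

```python
import math


def sigma_over_mu_total(n, cv_dtd_4, cv_wid_4):
    """Total delay sigma/mu for an n-stage path from 4-stage baseline values."""
    return math.sqrt(cv_dtd_4 ** 2 + (4 / n) * cv_wid_4 ** 2)


# Hypothetical 4-stage baseline values (not taken from the slides' data).
CV_DTD_4, CV_WID_4 = 0.05, 0.20

for n in (4, 8, 24):
    cv = sigma_over_mu_total(n, CV_DTD_4, CV_WID_4)
    print(f"n = {n:2d}: sigma/mu = {100 * cv:.1f} %")
```

With n = 4 the expression reduces to the total DTD + WID combination; as n grows it approaches the DTD term alone.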
Delay Overhead Evaluation
[Figure: probability vs. computation delay for the original circuit, dual-rail timing, and synchronous timing; A: protocol circuit delay, B: 3σ delay variation]
• Assumption: process variation follows a Gaussian distribution
• Dual-rail approach: protocol overhead only, no delay overhead
• Synchronous approach: delay overhead only
For 99.7% yield:
$$D_{\text{sync}} = \frac{3\sigma_{\text{logic,total}}}{\mu_{\text{logic,total}}}$$
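Because the clock period must cover slow dies as well as slow paths, the synchronous margin uses the total (DTD + WID) variation. The sketch below assumes a Gaussian path delay, as the slide does, and uses Python's `statistics.NormalDist` to check the fraction of delays that fit within μ + 3σ; `d_sync` and `timing_yield` are hypothetical helper names. The one-sided fraction at 3σ is about 99.87%, which meets the 99.7% target.

```python
from statistics import NormalDist


def d_sync(cv_total, k=3.0):
    """Synchronous delay overhead: clock margin of k sigma relative to the mean."""
    return k * cv_total


def timing_yield(cv_total, k=3.0):
    """Fraction of Gaussian delays at or below mu * (1 + k * cv_total)."""
    mu = 1.0
    return NormalDist(mu, cv_total * mu).cdf(mu * (1 + d_sync(cv_total, k)))


print(f"D_sync for sigma/mu = 10%: {d_sync(0.10):.2f}")
print(f"yield at 3 sigma: {timing_yield(0.10):.5f}")
```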
Bundled-Data Self-Timed Approach
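[Figure: probability vs. computation delay for the main data path and the replica delay line]
Goal:
$$P(t_{\text{logic}} \le t_{\text{delay-line}}) \approx 1$$
where
$$f_{\text{logic}}(t) = N(\mu_{\text{logic}}, \sigma_{\text{logic}}^2), \qquad f_{\text{delay-line}}(t) = N(\mu_{\text{delay-line}}, \sigma_{\text{delay-line}}^2)$$
Assume the main data path and the replica delay line exhibit similar statistics. The delay overhead is
$$D_{\text{bundled-data}} = \frac{\mu_{\text{delay-line}} - \mu_{\text{logic}}}{\mu_{\text{logic}}}$$
For 99.7% yield:
$$D_{\text{bundled-data}} = D_{\text{variation}}^2\left(0.5 + \sqrt{0.25 + \frac{2}{D_{\text{variation}}^2}}\right), \qquad D_{\text{variation}} = \frac{3\sigma_{\text{logic,WID}}}{\mu_{\text{logic,WID}}}$$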
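The closed form is stated without the intermediate steps. One derivation that reproduces it is sketched below; it assumes (not stated on the slide) that the 3σ margin is applied to the difference t_delay-line − t_logic, and that delay variance grows in proportion to mean delay, as when WID variation is independent from stage to stage. With the shorthand dl for delay-line, D = D_bundled-data, σ_logic = c·μ_logic, σ²_dl = c²·μ_logic·μ_dl, and D_variation = 3c:

$$
\begin{aligned}
\mu_{\text{dl}} - \mu_{\text{logic}} &= 3\sqrt{\sigma_{\text{logic}}^2 + \sigma_{\text{dl}}^2} \\
D\,\mu_{\text{logic}} &= 3c\,\mu_{\text{logic}}\sqrt{2 + D} \quad\Rightarrow\quad D = D_{\text{variation}}\sqrt{D + 2} \\
D^2 - D_{\text{variation}}^2\,D - 2D_{\text{variation}}^2 &= 0 \\
D &= D_{\text{variation}}^2\left(0.5 + \sqrt{0.25 + \frac{2}{D_{\text{variation}}^2}}\right)
\end{aligned}
$$

The last line is the positive root of the quadratic.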
Bundled-Data Delay Overhead
[Figure: delay overhead [%] vs. process variability [%], with O(n) and O(n²) growth regions annotated]
$$D_{\text{bundled-data}} \approx \begin{cases} \sqrt{2}\,D_{\text{variation}}, & D_{\text{variation}} \to 0 \\ D_{\text{variation}}^2, & D_{\text{variation}} \to \infty \end{cases}$$
• Delay overhead grows linearly at small process variability and quadratically at large variability
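A short sketch that evaluates the closed form and compares it with both asymptotes; `d_bundled` is a hypothetical helper name, and D_variation must be positive (the closed form divides by D_variation²).

```python
import math


def d_bundled(d_var):
    """Bundled-data delay overhead from D_variation = 3*sigma_WID/mu_WID (> 0)."""
    return d_var ** 2 * (0.5 + math.sqrt(0.25 + 2 / d_var ** 2))


for d_var in (0.05, 0.3, 1.0, 2.0):
    d = d_bundled(d_var)
    print(f"D_variation = {d_var:4.2f}: D_bundled = {d:6.3f}, "
          f"sqrt(2)*D_var = {math.sqrt(2) * d_var:6.3f}, D_var^2 = {d_var ** 2:6.3f}")
```

At D_variation = 1 the overhead is exactly 2 (100% variability doubles the delay line), which sits between the two asymptotes.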
Performance Model under Variations
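Original delay and energy model:
$$T_{\text{comp}} = T_{\text{delay}}$$
$$E_{\text{dynamic}} = \alpha C_{\text{switch}} V^2, \qquad E_{\text{leakage}} = V I_{\text{leakage}} T_{\text{delay}}$$
$$E_{\text{total}} = \alpha C_{\text{switch}} V^2 + V I_{\text{leakage}} T_{\text{delay}}$$
Statistical delay and energy model:
$$T_{\text{comp}} = T_{\text{delay}}(1+P+D)$$
$$E_{\text{dynamic}} = \alpha C_{\text{switch}}(1+P) V^2, \qquad E_{\text{leakage}} = V I_{\text{leakage}}(1+P)\,T_{\text{delay}}(1+P+D)$$
$$E_{\text{total}} = \alpha C_{\text{switch}}(1+P) V^2 + V I_{\text{leakage}}(1+P)\,T_{\text{delay}}(1+P+D)$$

Timing scheme            Synchronous    Bundled-Data      Dual-Rail
Delay overhead (D)       Dsync          Dbundled-data     0
Protocol overhead (P)    0              Pbundled-data     Pdual-rail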
• Evaluate computation delay and energy under variations
• Overhead changes with supply voltage and logic depth
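The statistical model can be evaluated directly once P and D are known for a scheme. A sketch with normalized inputs; `Scheme`, `t_comp`, and `e_total` are hypothetical names, and the P and D values in the example are placeholders, not results from this work.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    name: str
    p: float  # protocol overhead P (fraction of T_delay)
    d: float  # delay overhead D (fraction of T_delay)


def t_comp(t_delay, s):
    """Computation time under variation: T_delay * (1 + P + D)."""
    return t_delay * (1 + s.p + s.d)


def e_total(alpha, c_switch, v, i_leak, t_delay, s):
    """Dynamic energy scaled by (1+P) plus leakage over the stretched cycle."""
    e_dyn = alpha * c_switch * (1 + s.p) * v ** 2
    e_leak = v * i_leak * (1 + s.p) * t_comp(t_delay, s)
    return e_dyn + e_leak


# Hypothetical, normalized inputs: T_delay = 1, alpha*C*V^2 = 0.1, V*I_leak*T_delay = 0.05.
schemes = [
    Scheme("Synchronous", p=0.0, d=0.30),
    Scheme("Bundled-data", p=0.05, d=0.35),
    Scheme("Dual-rail", p=0.10, d=0.0),
]
for s in schemes:
    print(f"{s.name:12s} T_comp = {t_comp(1.0, s):.2f}, "
          f"E_total = {e_total(0.1, 1.0, 1.0, 0.05, 1.0, s):.4f}")
```

Setting P = D = 0 recovers the original model, which is a convenient sanity check.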
Delay Overhead Comparison
[Figure: delay overhead [%] vs. supply voltage [V] for synchronous timing and bundled-data self-timing; panels: 4-stage and 24-stage FO4 INV chains]
• Global variation affects only the synchronous approach
• Local mismatch dominates at low supply voltages
• Local mismatch has less impact on longer critical paths
Speed Performance Comparison
[Figure: normalized delay vs. supply voltage [V] for dual-rail self-timing and bundled-data self-timing; panels: 4-stage and 24-stage FO4 INV chains]
• Assumption: Pbundled-data = 1 TFO4; Pdual-rail = 2 TFO4
• The synchronous scheme is better for short critical paths at high supply voltages
• The dual-rail scheme is better for long critical paths at low supply voltages
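The protocol overheads are given in FO4 delays, while P in Tcomp = Tdelay(1+P+D) is a fraction of Tdelay. Assuming Tdelay ≈ n·TFO4 for an n-stage FO4 chain (an assumption, not stated on the slide), P = 1/n for bundled-data and 2/n for dual-rail, so protocol overhead shrinks on long paths. The sketch below also assumes D_variation for bundled-data uses the WID σ/μ of the n-stage path; `t_comp_norm` and `d_bundled` are hypothetical names and the σ/μ inputs are made up.

```python
import math

P_BUNDLED_FO4 = 1.0  # protocol overhead in FO4 delays (slide assumption)
P_DUAL_FO4 = 2.0     # protocol overhead in FO4 delays (slide assumption)


def d_bundled(d_var):
    """Bundled-data delay overhead from D_variation (> 0)."""
    return d_var ** 2 * (0.5 + math.sqrt(0.25 + 2 / d_var ** 2))


def t_comp_norm(n, cv_dtd_4, cv_wid_4):
    """Normalized computation time (1 + P + D) per scheme for an n-stage chain.

    Assumes T_delay = n * T_FO4, so an overhead of k FO4 delays gives P = k / n.
    """
    cv_wid_n = math.sqrt(4 / n) * cv_wid_4
    cv_total = math.sqrt(cv_dtd_4 ** 2 + cv_wid_n ** 2)
    return {
        "synchronous": 1 + 3 * cv_total,
        "bundled-data": 1 + P_BUNDLED_FO4 / n + d_bundled(3 * cv_wid_n),
        "dual-rail": 1 + P_DUAL_FO4 / n,
    }


for label, args in (("4-stage, low variation", (4, 0.05, 0.03)),
                    ("24-stage, high variation", (24, 0.05, 0.20))):
    result = t_comp_norm(*args)
    best = min(result, key=result.get)
    print(label, {k: round(v, 3) for k, v in result.items()}, "->", best)
```

With these made-up inputs, the short path at low variation favors synchronous timing and the long path at high variation favors dual-rail, in line with the trend stated above; the actual crossover depends on the real σ/μ data.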
Energy Performance Comparison
[Figure: 24-stage FO4 INV chain, energy [fJ] vs. supply voltage [V] and energy-delay plot for synchronous timing, dual-rail self-timing, and bundled-data self-timing at activity factors α = 0.1 and α = 0.01]
• Synchronous scheme is better for high activity at high supply voltages
• Dual-rail scheme is better for low activity at low supply voltages
• Leakage dominates for low activity at low supply voltages
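Normalizing by the original circuit's energy makes the leakage argument explicit. If λ is the leakage share of the original energy (a parameter introduced here for illustration; a low activity factor α shrinks the dynamic term and raises λ), the statistical model gives E/E_orig = (1+P)[(1−λ) + λ(1+P+D)]. A sketch with hypothetical P and D values for a long path at low supply voltage; `energy_ratio`, `SYNC`, and `DUAL` are hypothetical names.

```python
def energy_ratio(p, d, leak_frac):
    """E_scheme / E_original for protocol overhead p and delay overhead d."""
    return (1 + p) * ((1 - leak_frac) + leak_frac * (1 + p + d))


# Hypothetical overheads: synchronous pays delay overhead only,
# dual-rail pays protocol overhead only (2 FO4 on an assumed 24-FO4 path).
SYNC = dict(p=0.0, d=0.29)
DUAL = dict(p=2 / 24, d=0.0)

for lam in (0.05, 0.5, 0.9):
    print(f"leakage share {lam:.2f}: "
          f"sync {energy_ratio(leak_frac=lam, **SYNC):.3f}, "
          f"dual-rail {energy_ratio(leak_frac=lam, **DUAL):.3f}")
```

When dynamic energy dominates, dual-rail's (1+P) factor makes it costlier; when leakage dominates, the synchronous scheme's longer cycle (1+D) costs more, consistent with the bullets above.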
Conclusion
• A statistical analysis framework is proposed to evaluate the performance of CMOS digital circuits in the presence of process variations.
• Designers can efficiently determine the optimal timing strategy, pipeline depth, and supply voltage based on the proposed variability and statistical performance models.
• Under process variations, asynchronous design exhibits better energy and delay characteristics for circuits with low activity and long critical-path delay.
Acknowledgement
• Berkeley Wireless Research Center
• NSF Infrastructure Grant
• STMicroelectronics
• Multiscale System Center