May 10, 2012
University of Toronto
Fine-Grained Bandwidth Adaptivity in Networks-on-Chip Using Bidirectional Channels
Robert Hesse, Jeff Nicholls, Natalie Enright Jerger
Motivation
• NoCs are crucial for scaling CMPs
• Problem:
–NoC bandwidth resources are static
–Bandwidth requirements are highly dynamic
• Current solution:
–Over-provisioned link BW
–Average channel utilization: < 5%
• Our solution:
–Adapt link BW to demands
–Save up to 75% of BW resources
[Figure: bandwidth (BW) over time]
Motivation - Static NoC
• Static Topology
• Static Bandwidth
• Static workloads for evaluation
[Figure: 4x4 mesh of IP nodes 0-15, with per-node traffic plots (axis: nodes 0-15) for the Uniform Random and Transpose patterns]
• Specified at design time for worst-case scenario
• Static NoCs can handle temporally and spatially stable traffic well
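The two synthetic patterns named on this slide can be sketched in a few lines. This is an illustrative sketch, not the evaluation setup from the talk: it assumes row-major node numbering on the 4x4 mesh, and it assumes that uniform random traffic never targets the source node itself. The function names are hypothetical.

```python
import random

MESH_DIM = 4  # assumption: 4x4 mesh with nodes 0..15 numbered row by row


def transpose_dest(src: int, dim: int = MESH_DIM) -> int:
    """Transpose pattern: the node at (row, col) sends to the node at (col, row)."""
    row, col = divmod(src, dim)
    return col * dim + row


def uniform_random_dest(src: int, rng: random.Random, dim: int = MESH_DIM) -> int:
    """Uniform random pattern: every node except the source is equally likely."""
    dest = rng.randrange(dim * dim - 1)  # draw from the other dim*dim - 1 nodes
    return dest if dest < src else dest + 1  # skip over the source node


if __name__ == "__main__":
    rng = random.Random(0)
    for src in range(MESH_DIM * MESH_DIM):
        print(src, transpose_dest(src), uniform_random_dest(src, rng))
```

Both patterns keep the same shape for the whole run, which is why a network sized for them at design time copes well: the per-link load does not move around over time.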
Motivation - Real NoC Traffic
• Highly dynamic workloads
• Large temporal and spatial BW variance
[Figure: 4x4 mesh of IP nodes 0-15, with per-node traffic plots (axis: nodes 0-15) for Blackscholes and Streamcluster]
• Significant area and power overhead with traditional NoC implementation
• Channels are underutilized most of the time
Channel Utilization
[Figure: channel utilization (%), maximum and average, for Blackscholes, Bodytrack, Canneal, Facesim, Ferret, Fluidanimate, Raytrace, Streamcluster, Swaptions, Vips, and the average over all benchmarks; y-axis 0-45%. Highlighted value: 3.42%]
• Adjust channel width (flit width):
[Figure: router pairs (R) connected by channels of width 8, 4, and 2 bytes]
[Figure: latency (cycles) for the same benchmarks at channel widths of 8, 4, and 2 bytes; y-axis 0-80 cycles]
Reducing flit width leads to unacceptable latency increase
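One contributor to the latency increase is serialization: a packet of fixed size needs more flits when each flit is narrower. The sketch below is back-of-the-envelope arithmetic, not the simulation behind the latency figure. The 64-byte packet is a hypothetical size chosen for illustration, and the model assumes one flit per cycle on the link and ignores header overhead, router delay, and contention.

```python
import math


def flits_per_packet(packet_bytes: int, flit_bytes: int) -> int:
    """Number of flits (and link cycles, at one flit per cycle) to send one packet."""
    return math.ceil(packet_bytes / flit_bytes)


if __name__ == "__main__":
    PACKET_BYTES = 64  # assumption: hypothetical packet size, not from the slides
    for width in (8, 4, 2):  # channel widths shown on the slide, in bytes
        print(f"{width}-byte flits: {flits_per_packet(PACKET_BYTES, width)} cycles")
```

Under these assumptions, halving the flit width doubles the serialization cycles per packet, so narrowing the channel trades wires for latency on every packet, including during traffic bursts.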
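Bidirectional Channels
• Bidirectional channels to share channel resources:
[Figure: traffic A and B over time; two routers (R) with separate channels for A and B, compared with two routers sharing channels that carry A+B. Channel width stays b: keep flit size!]
• Adding flexibility with fine-grained BW adaptivity
[Figure: router pairs (R) connected by several sub-channels, each of width b/n]
Need to sub-divide flits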
Decoupling Flit Width From Channel Width
• Conventionally in NoC, flit width is coupled to channel width
[Figure: Router 1 sends a flit of width b to Router 2 over a channel of width b]
Phit-Serial Communication
• Conventionally in NoC, flit width is coupled to channel width
• Break flits (flow control units) into phits (physical transfer units) to decouple channel width from flit width
[Figure, animated: Router 1 serializes a flit of width b into four phits of width b/n, the phits travel to Router 2 over channels of width b/n, and Router 2 deserializes them back into a flit of width b. The sequence is shown twice.]
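The serialize and deserialize steps can be sketched as plain bit slicing. This is a minimal sketch under stated assumptions, not the router hardware described in the talk: the flit is modeled as an integer, the phits are sent most significant first (the slides do not state the order), and the function names are hypothetical.

```python
from typing import List


def serialize(flit: int, flit_bits: int, n: int) -> List[int]:
    """Split a flit of flit_bits bits into n phits of flit_bits // n bits each."""
    if flit_bits % n != 0:
        raise ValueError("flit width must be a multiple of the phit count")
    phit_bits = flit_bits // n
    mask = (1 << phit_bits) - 1
    # Assumption: most significant phit is sent first.
    return [(flit >> (phit_bits * (n - 1 - i))) & mask for i in range(n)]


def deserialize(phits: List[int], flit_bits: int) -> int:
    """Reassemble the flit from phits received in the order they were sent."""
    phit_bits = flit_bits // len(phits)
    flit = 0
    for phit in phits:
        flit = (flit << phit_bits) | phit
    return flit


if __name__ == "__main__":
    flit = 0xDEADBEEFCAFEF00D  # a 64-bit flit (b = 64)
    phits = serialize(flit, flit_bits=64, n=4)  # four 16-bit phits (b/n = 16)
    print([hex(p) for p in phits])
    assert deserialize(phits, flit_bits=64) == flit
```

Because the receiver rebuilds the full flit before handing it to the router, buffering and flow control can stay flit-based while the link itself moves b/n bits at a time.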
Microarchitecture
• Bandwidth-Adaptive Router (BAR): Only minimal modifications to standard VC router necessary
• Intra- & inter-router flow control is still flit-based
Bandwidth Allocation
• Pressure-based allocation of channels to directions
[Figure, animated: phits A0-A3 and D0-D3 of two flits crossing the link as channels are allocated to each direction]
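The slide does not define how pressure is measured or how ties are resolved, so the following is only a hypothetical sketch of the idea. It assumes pressure is the number of flits waiting in each direction, splits the n sub-channels roughly in proportion to that pressure, and never starves a direction that has traffic waiting. The function name and the policy details are assumptions, not the allocator from the talk.

```python
from typing import Tuple


def allocate_channels(pressure_fwd: int, pressure_rev: int, n: int) -> Tuple[int, int]:
    """Return (channels_fwd, channels_rev); the two counts always sum to n (n >= 2)."""
    total = pressure_fwd + pressure_rev
    if total == 0:
        half = n // 2
        return half, n - half  # idle link: split evenly (assumed default)
    fwd = round(n * pressure_fwd / total)  # proportional share
    if pressure_fwd > 0:
        fwd = max(fwd, 1)  # keep at least one channel for waiting forward traffic
    if pressure_rev > 0:
        fwd = min(fwd, n - 1)  # keep at least one channel for waiting reverse traffic
    return fwd, n - fwd


if __name__ == "__main__":
    for fwd, rev in [(4, 4), (4, 0), (6, 2), (1, 7), (0, 0)]:
        print((fwd, rev), "->", allocate_channels(fwd, rev, n=4))
```

With n = 4, a link with traffic in only one direction gets all four sub-channels, which matches the motivation of adapting link bandwidth to demand.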
Example
[Figure, animated: flits A, B, C, and D are transferred as phits over successive frames. Flit A is split into A0-A3 and flit D into D0-D3, then flit B into B0-B3; flit C is still whole in the last frame.]