(1)

May 10, 2012

University of Toronto

Fine-Grained Bandwidth Adaptivity in Networks-on-Chip Using Bidirectional Channels

Robert Hesse, Jeff Nicholls, Natalie Enright Jerger

(2)

Motivation

• NoCs are crucial for scaling CMPs

• Problem:

NoC bandwidth resources are static

Bandwidth requirements are highly dynamic

• Current solution:

Over-provisioned link BW

Average channel utilization: < 5%

• Our solution:

Adapt link BW to demands

Save up to 75% of BW resources

[Figure: bandwidth (BW) over time]

(13)

Motivation - Static NoC

Static Topology

Static Bandwidth

Static workloads for evaluation

Specified at design time for worst case scenario

Static NoCs can handle temporally and spatially stable traffic well

[Figure: 4x4 mesh of IP nodes 0 to 15, with traffic plots over nodes 0 to 15 for the Uniform Random and Transpose patterns]

(21)

Motivation - Real NoC Traffic

Highly dynamic workloads

Large temporal and spatial BW variance

Significant area and power overhead with traditional NoC implementation

Channels are underutilized most of the time

[Figure: 4x4 mesh of IP nodes 0 to 15, with traffic plots over nodes 0 to 15 for Blackscholes and Streamcluster]

(27)

Channel Utilization

[Figure: channel utilization (%) per benchmark (Blackscholes, Bodytrack, Canneal, Facesim, Ferret, Fluidanimate, Raytrace, Streamcluster, Swaptions, Vips, Avg.); series: Max. Utilization and Avg. Utilization; y-axis 0 to 45; highlighted value: 3.42%]

• Adjust channel width (flit width):

[Figure: router pairs (R) connected by channels of width 8, 4, and 2]

[Figure: latency (cycles) per benchmark for channel widths of 8 Bytes, 4 Bytes, and 2 Bytes; y-axis 0 to 80]

Reducing flit width leads to unacceptable latency increase
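The latency penalty of narrower flits can be illustrated with a little arithmetic. The sketch below assumes flit width equals channel width and counts only how many flits a packet needs; the 64-byte packet size and the function name are hypothetical, and router pipeline delay and contention are not modelled, so the measured latencies in the chart are not reproduced.

```python
import math

def flits_per_packet(packet_bytes: int, channel_bytes: int) -> int:
    """Number of flits a packet needs when flit width equals channel width."""
    return math.ceil(packet_bytes / channel_bytes)

PACKET_BYTES = 64  # assumed packet size (hypothetical), e.g. one cache line

for width in (8, 4, 2):  # channel widths from the slide, in bytes
    print(f"{width}-byte channel: {flits_per_packet(PACKET_BYTES, width)} flits")
```

Under these assumptions, each halving of the channel width doubles the number of flits, and with it the serialization time, of every packet.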

(33)

Bidirectional Channels

Bidirectional channels to share channel resources:

[Figure: routers (R) linked by unidirectional channels A and B, and by a shared bidirectional channel A+B; plot of A, B, and A+B over time; channel width b]

Keep flit size!

Adding flexibility with fine-grained BW adaptivity

[Figure: router pairs (R) connected by sub-channels of width b/n]

Need to sub-divide flits
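Why sharing helps can be sketched with two made-up demand traces. The code below is an illustration only: the per-interval demands are invented, and it assumes a bidirectional channel can be time-shared freely within an interval. Two unidirectional channels must each cover their own peak, while a shared channel only has to cover the peak of the combined demand A+B.

```python
# Hypothetical bandwidth demand per time interval for the two directions of
# one link (arbitrary units); these numbers are invented for illustration.
demand_a = [6, 1, 0, 2, 7, 1]
demand_b = [1, 5, 6, 0, 1, 2]

def static_capacity(a, b):
    """Two unidirectional channels, each sized for its own peak demand."""
    return max(a) + max(b)

def shared_capacity(a, b):
    """One bidirectional channel sized for the peak of the combined demand."""
    return max(x + y for x, y in zip(a, b))

print(static_capacity(demand_a, demand_b))  # 13
print(shared_capacity(demand_a, demand_b))  # 8
```

The saving comes from the two directions rarely peaking at the same time; if both peaked together, the two capacities would be equal.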

(39)

Decoupling Flit Width From Channel Width

• Conventionally in NoC, flit width is coupled to channel width

[Figure: Router 1 sends a flit of width b to Router 2 over a channel of width b]


(41)

Phit-Serial Communication

• Conventionally in NoC, flit width is coupled to channel width

• Break flits (flow control units) into phits (physical transfer units) to decouple channel width from flit width

[Animation: Router 1 serializes a flit of width b into four phits of width b/n, sends them to Router 2 over sub-channels of width b/n, and Router 2 deserializes them back into a flit of width b]
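The serialize and deserialize steps can be sketched in a few lines. This is a software illustration only, not the hardware design: the function names, the 8-byte flit, and n = 4 are assumptions chosen to match the four phits shown in the animation.

```python
def serialize(flit: bytes, n: int) -> list[bytes]:
    """Split a flit of b bytes into n phits of b/n bytes each."""
    if len(flit) % n != 0:
        raise ValueError("flit width must be a multiple of n")
    size = len(flit) // n
    return [flit[i * size:(i + 1) * size] for i in range(n)]

def deserialize(phits: list[bytes]) -> bytes:
    """Reassemble the phits, in order, into the original flit."""
    return b"".join(phits)

flit = bytes(range(8))      # assumed flit width b = 8 bytes
phits = serialize(flit, 4)  # n = 4 phits of b/n = 2 bytes each
assert deserialize(phits) == flit
```

Because the receiving router sees whole flits again after deserialization, buffering and flow control can stay flit-based while the wire width changes.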

(48)

Microarchitecture

Bandwidth-Adaptive Router (BAR): Only minimal modifications to standard VC router necessary

• Intra- & inter-router flow control is still flit-based

(49)

Bandwidth Allocation

• Pressure-based allocation of channels to directions

[Animation: phits A0 to A3 and D0 to D3 are shown as channels are allocated to directions]
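One way to picture pressure-based allocation is a proportional split of the n sub-channels of a link between its two directions. The slides do not define "pressure" or the exact policy, so everything below is an assumption: pressure is taken to be the number of flits waiting to cross the link in each direction, the split is proportional, and each direction keeps at least one sub-channel. The function name is hypothetical.

```python
def allocate_channels(n: int, pressure_fwd: int, pressure_bwd: int) -> tuple[int, int]:
    """Split n sub-channels between two directions in proportion to pressure.

    Assumption (not from the slides): each direction keeps at least one
    sub-channel so that it cannot be starved, which requires n >= 2.
    """
    if n < 2:
        raise ValueError("need at least two sub-channels")
    total = pressure_fwd + pressure_bwd
    if total == 0:
        fwd = n // 2  # no demand in either direction: split evenly
    else:
        fwd = round(n * pressure_fwd / total)
    fwd = max(1, min(n - 1, fwd))  # clamp so both directions keep a channel
    return fwd, n - fwd

print(allocate_channels(4, 9, 1))  # heavy forward traffic
print(allocate_channels(4, 3, 3))  # balanced traffic
```

A real router would also need to decide how often to re-evaluate the split and how to hand a sub-channel over safely; neither is modelled here.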

(53)

Example

[Animation: flits A, B, C, and D are progressively split into phits (A0 to A3, B0 to B3, D0 to D3) as they are transferred; Flit C is still whole in the last frame shown]
