(1)

Pablo Abad, Pablo Prieto, Lucia Menezo, Adrian Colaso, Valentin Puente, Jose-Angel Gregorio

University of Cantabria

TOPAZ: An Open-Source Interconnection Network Simulator for Chip Multiprocessors and Supercomputers

(2)

Interconnects Research: Simulation Tool

What makes a simulation tool better than others?

Flexibility
- Heterogeneous field, from supercomputers to CMPs.
- Highly configurable.

Accuracy vs. Computational Effort
- Avoid slow simulations in the first stages of the research process.
- But provide accurate enough results in the last stages.

Ease of Use
- Fast learning is essential.
- MAX: 1-day delay for user mode.

TOPAZ
• Interface to full-system simulation.
• Multithreaded simulation for a massive number of routers.
• Simple & complex models.
• Dynamic accuracy simulation.
• Many out-of-the-box components.
• Very modular code, easy to understand.

(3)

Outline

• Simulator Description
• Out-of-the-Box
• Utilization Examples
• Support & Collaboration

(4)

Main Features

• Evolution of SICOSYS
• Object-oriented Design
• Different levels of detail
• Support for parallel execution

- Implemented in C++
- 100 classes / 50,000 lines of code
- High portability (C++ standard compiler)

[REF] V. Puente, J.A. Gregorio, R. Beivide, SICOSYS: an integrated framework for studying interconnection network performance in multiprocessor systems. IEEE Comput. Soc, 2000.

(5)

Main Features: Different levels of detail

[Figure: router block diagram with Injector, Consumer, Buffer, Crossbar, Routing & Arbitration, and N/S/W ports]

SIMPLE ROUTER
- A single C++ class describes the router
- (+) Fast simulation
- (--) Accuracy

DETAILED ROUTER
- One C++ class per component
- (--) Slower simulation
- (++) Higher accuracy

[Figure: thread labels T1, T2, T3]

(6)

Using TOPAZ (Building)

> ./TPZSimul -s SIMUL_DETAILED

TPZSimul.ini

<RouterFile id="../sgm/Router.sgm" >
<NetworkFile id="../sgm/Network.sgm" >
<SimulationFile id="../sgm/Simula.sgm" >

Simula.sgm

<Simulation id="SIMUL_DETAILED">
  <Network id="TORUS">
  <SimulationCycles id=1000000>
  <DiscardTraffic id=10000>
  <TrafficPattern id="MODAL" type="RANDOM">
  <Load id=0.5>
  <PacketLength id=2>
</Simulation>

Network.sgm

<TorusNetwork id="TORUS" sizeX=8 sizeY=8 router="DETAILED" delay=1>
<MeshNetwork id="MESH" sizeX=8 sizeY=8 router="DETAILED" delay=1>

Router.sgm

<Router id="DETAILED" inputs=5 outputs=5 bufferSize=64 bufferControl=CT routingControl="ROUTING_ALG">
  <Injector id="INJ">
  <Consumer id="CONS">
  <Buffer id="BUF1" type="X+" headerDelay=2>
  <Buffer id="BUF2" type="X-" headerDelay=2>
  . . .
  <Buffer id="BUF5" type="Node" headerDelay=2>
  <Routing id="RTG1" type="X+" headerDelay=1>
  . . .
  <Routing id="RTG5" type="Node" headerDelay=1>
  <Crossbar id="XBAR" inputs="5" outputs="5" type="CT">
    <Input id=1 type="X+">
    . . .
    <Output id=5 type="Node">
  </Crossbar>
  <Connection id="C01" source="INJ" destination="BUF5">
  . . .
  <Connection id="C20" source="RTG.1" destination="XBAR.1">
  . . .
</Router>

(7)

Using TOPAZ (Printing)

Standalone, + Orion, + Gems (or Gem5)

[Figure panels:
- Throughput/Latency curves: Accepted Load (flits/cycle/router) and Total Latency (cycles) vs. Applied Load (flits/cycle/router); series RR, ABR, VCR
- Latency Histogram: Traffic fraction (%) vs. Network Latency (cycles); series 6 Turns, 1 Turn
- Injection/Consumption/Link map: Link Utilization by X Position and Y Position
- Throughput/Latency evolution: Throughput vs. Cycles simulated (Integer Sort)
- Power Breakdown: Link, Crossbar, Buffer, Arbiter]

(9)

Out of the Box

1. Configuration Parameters

Router
- Buffer Size
- Buffer Delay
- Packet Size
- # Virtual Channels
- # Physical Networks
- Message Types
- Router Pipeline
- Link Delay

Flow Control
- Virtual Cut-Through
- Bubble Flow Control
- Wormhole
- Virtual Channel Flow Control

Traffic
- Random
- Bit-Reversal
- Perfect-Shuffle
- Transpose Matrix
- Tornado
- Hot-Spot
- Local
- Trace-Based

Topology
- Ring
- Mesh (2D & 3D)
- Torus (2D & 3D)
- Midimew (2D)
- Square Midimew (2D)
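Several of the synthetic traffic patterns listed above are permutations of the node address. The following is a minimal sketch using the common textbook definitions; it is not taken from the TOPAZ sources, whose exact definitions and node numbering may differ. The function names and the 64-node (6-bit) example are assumptions chosen for illustration.

```python
import math


def bit_reversal(src: int, bits: int) -> int:
    """Destination is the source address with its bit order reversed."""
    dst = 0
    for i in range(bits):
        if src & (1 << i):
            dst |= 1 << (bits - 1 - i)
    return dst


def perfect_shuffle(src: int, bits: int) -> int:
    """Destination is the source address rotated left by one bit."""
    mask = (1 << bits) - 1
    return ((src << 1) | (src >> (bits - 1))) & mask


def transpose(src: int, bits: int) -> int:
    """Destination swaps the upper and lower address halves (bits must be even).

    With the low half as x and the high half as y, node (x, y) sends to (y, x).
    """
    half = bits // 2
    low = src & ((1 << half) - 1)
    high = src >> half
    return (low << half) | high


def tornado(x: int, k: int) -> int:
    """Per-dimension tornado on a k-node ring: ceil(k/2) - 1 hops forward."""
    return (x + math.ceil(k / 2) - 1) % k


if __name__ == "__main__":
    bits = 6  # 64 nodes, e.g. an 8x8 network
    for node in (1, 7, 33):
        print(node, bit_reversal(node, bits),
              perfect_shuffle(node, bits), transpose(node, bits))
```

Because each function is a permutation, every node sends to exactly one destination and receives from exactly one source, which is what makes these patterns useful for stressing specific links.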

(10)

Out of the Box

2. Available Routers

Router                          REF       Year  Level of Detail
Adaptive Bubble Router          [14]      2001  Complex & simple
Deterministic Bubble Router     [15]      1998  Complex & simple
Rotary Router                   [19]      2007  Complex
Bufferless Router               [21]      2010  Simple
Bidirectional Router            [22]      2009  Simple
Buffered Crossbar               [23]      1987  Complex
Deterministic with VC (Dally)   [16][17]  2001  Complex & simple
VCTM (Dally + MC Support)       [18]      2008  Complex & simple
Pipeline Optimized              [24]      2008  Complex & simple

(11)

Out of the Box

3. Integration with Full-System Simulation Tools

- Simics, Opal (processor)
- M5 (processor)
- Ruby (memory)
- TOPAZ (network)

Wisconsin Multifacet Gems: http://research.cs.wisc.edu/gems/

Gem5 simulator system: http://gem5.org/Main_Page

(13)

Increasing Full-System simulation accuracy

Main System Parameters (System and Network)

Cores     16 cores @ 4GHz, OOO, 4-wide issue, 64-entry IW, 16 outstanding mem. req.
L1        Independent I/D caches, 32KB, 4-way, 1 cycle
L2        16MB, SNUCA, Token(B) coherence protocol, 6 msg. dependence chain
L2 Bank   1MB, 16-way, 5 cycles, pseudo LRU
Memory    4GB, 320GB/s, 260 cycles
OS        Solaris 10
Topology  4x4 Mesh
Links     1 cycle, 128 bits wide

[Figure: Broadcast Coherence Protocol (Execution Time). RUBY-normalized execution time for RUBY, TOPAZ_SIMPLE and TOPAZ_COMPLEX]

(14)

Increasing Full-System simulation accuracy

[Figure: Execution Time. RUBY-normalized execution time for RUBY, TOPAZ_SIMPLE and TOPAZ_COMPLEX]

[Figure: Simulation speed (cycles/second). Normalized cycles simulated per second for RUBY, TOPAZ_SIMPLE and TOPAZ_COMPLEX]

More accuracy => slower simulations

On average, Ruby is ≈ 2X faster

(15)

Improving Simulation Speed (I)

Adaptive Interface

[Figure: Execution Time. RUBY-normalized execution time for RUBY, TOPAZ_SIMPLE, TOPAZ_COMPLEX and (AI)TOPAZ_SIMPLE]

[Figure: Simulation speed (cycles/second). Normalized cycles simulated per second for RUBY, TOPAZ_SIMPLE, TOPAZ_COMPLEX, (AI)TOPAZ_SIMPLE and (AI)TOPAZ_COMPLEX]

[Figure: Throughput vs. M cycles simulated for Integer Sort; interval labels RUBY, TOPAZ, RUBY]

(16)

Improving Simulation Speed (II)

2-Thread Simulation

[Figure: Execution Time. RUBY-normalized execution time for RUBY, TOPAZ_SIMPLE, TOPAZ_COMPLEX and (AI)TOPAZ_SIMPLE]

[Figure: Simulation speed (cycles/second). Normalized cycles simulated per second for RUBY, TOPAZ_SIMPLE, TOPAZ_COMPLEX, (AI)TOPAZ_SIMPLE, (AI)TOPAZ_COMPLEX, (P)TOPAZ_COMPLEX and (P)TOPAZ_SIMPLE]

[Figure: thread labels T1, T2]

(17)

Simulating thousand-node Networks

[Figure: Simulation Time (seconds) vs. Number of Cores for 32K, 128K, 256K, 512K and 1M routers; data labels 1.5GB, 5.5GB, 12GB, 24GB, 49GB]

- 3D Torus, Bubble Router (simple), similar to IBM Blue Gene.
- 12-core (Xeon E5645) server with 54 GB of main memory.

Multithreaded implementation takes advantage of multicore server

• Good speedup for 1 Million routers
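As a rough worked example, the slide's size labels can be turned into a per-router memory estimate. This sketch assumes that the labels 1.5GB, 5.5GB, 12GB, 24GB and 49GB are the memory footprints of the 32K, 128K, 256K, 512K and 1M router networks in that order, that K and M are binary multiples, and that GB means GiB. None of these assumptions is stated on the slide.

```python
# Router counts from the figure legend (assumed: K = 1024, M = 1024 * 1024).
ROUTERS = {
    "32K": 32 * 1024,
    "128K": 128 * 1024,
    "256K": 256 * 1024,
    "512K": 512 * 1024,
    "1M": 1024 * 1024,
}

# Size labels from the slide, assumed to be memory footprints in the same order.
MEMORY_GB = {"32K": 1.5, "128K": 5.5, "256K": 12, "512K": 24, "1M": 49}


def kib_per_router(label: str) -> float:
    """Memory per simulated router in KiB (1 GiB = 1024 * 1024 KiB)."""
    return MEMORY_GB[label] * 1024 * 1024 / ROUTERS[label]


if __name__ == "__main__":
    for label in ROUTERS:
        print(f"{label:>5} routers: {kib_per_router(label):.1f} KiB per router")
```

Under these assumptions the footprint stays between roughly 44 and 49 KiB per router across all five network sizes, so memory grows about linearly with router count and the 1M-router case fits within the 54 GB server.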

(19)

Support & Collaboration

code.google.com/p/tpzsimul

(20)

Support & Collaboration

(21)

Thanks for your attention. Questions?

http://www.atc.unican.es/galerna/index.html

(22)

GARNET

[Figure: RUBY-normalized execution time for RUBY, GARNET, TOPAZ_SIMPLE and TOPAZ_COMPLEX]

(23)

[Figure: thread labels T1, T2, T3, T4]

(24)

Using TOPAZ

BUILDING RUNNING PRINTING

(26)

Using TOPAZ (Building)

Router.sgm
- Router & Crossbar Ports
- Buffer Size
- Routing & Flow Control Policies

Network.sgm
- Network Size
- Network Topology
- Link Delay

Simula.sgm
- Traffic Pattern
- Message Size
- Simulation Cycles

(27)

Using TOPAZ (Running)

No need to re-compile

- Only need to add new configurations to the SGML files.
- Each configuration is identified by a tag in Simula.sgm. Use option -s on the command line to choose a specific configuration.

Different Execution Modes:

- Run your simulation for XXX cycles.
- Run your simulation until YYY messages reach their destination.

Command Line Options:

- Many SGML parameters can be overridden through command line options.
- Example:
- Useful for scripting.
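The workflow above (add tagged configurations to Simula.sgm, then select one with -s) lends itself to a load sweep. The following is a minimal sketch that only generates text: it builds one Simulation entry per load value in the format shown on the Building slide, plus the matching command lines. The `SWEEP_...` identifiers and the helper names are hypothetical, and the sketch uses only the `-s` option because no other TPZSimul flag appears in these slides.

```python
def simulation_entry(sim_id, network, load,
                     cycles=1000000, discard=10000, packet_length=2):
    """Return one Simula.sgm entry in the format shown on the Building slide."""
    return "\n".join([
        f'<Simulation id="{sim_id}">',
        f'<Network id="{network}">',
        f'<SimulationCycles id={cycles}>',
        f'<DiscardTraffic id={discard}>',
        '<TrafficPattern id="MODAL" type="RANDOM">',
        f'<Load id={load}>',
        f'<PacketLength id={packet_length}>',
        '</Simulation>',
    ])


def load_sweep(network, loads):
    """Build the SGML text and the command lines for a sweep over loads."""
    entries, commands = [], []
    for load in loads:
        # Hypothetical naming scheme: SWEEP_<network>_<load in percent>.
        sim_id = f"SWEEP_{network}_{int(round(load * 100)):03d}"
        entries.append(simulation_entry(sim_id, network, load))
        commands.append(["./TPZSimul", "-s", sim_id])
    return "\n\n".join(entries), commands


if __name__ == "__main__":
    sgml, commands = load_sweep("TORUS", [0.1, 0.3, 0.5])
    print(sgml)  # text to append to Simula.sgm
    for cmd in commands:
        print(" ".join(cmd))  # one simulator invocation per configuration
```

Each command list can be passed to `subprocess.run` once the generated entries have been appended to Simula.sgm; the sketch stops short of that so it can be inspected without the simulator installed.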
