A large number of extra "minor" constraints from real-life applications can be
incorporated without problems.
An upper or lower limit on the length (in cost or time) of the routes can be
modelled using additional resources. Using additional resources also makes it
possible to set a limit on the number of customers that can be serviced. It is
also possible to only allow specific vehicles to use certain arcs or service certain
customers.
Note that it is also possible to introduce time-dependent travel speed, for
example in order to take account of rush-hour traffic. If time-dependent travel
Parallel Implementation
This chapter describes the efforts undertaken, and the results thereof, with
respect to developing a parallel algorithm based upon the sequential
Branch-and-Price code described in chapter 3. First, a short introduction to the
field of parallel computing is presented, and then the description and analysis
of making the parallel program follows.
6.1 Parallel Programming
Parallel computing enjoys significant interest both in the scientific community
and in industry, due to an increasing demand for computational power and speed
and the decreasing cost/performance ratio of parallel computers.
Alas, the new possibilities do not come without a price: parallel
programming is inherently more difficult.
In contrast to sequential programming, where almost everyone can agree
on how a sequential computer is viewed, the world of parallel programming
is not so clear-cut. The sequential paradigm views the computer as a single
processor which can access a certain amount of memory. This paradigm is
in the literature also called RAM - Random Access Machine. The RAM is
equipped with an instruction set that includes basic operations for writing to
and reading from the memory and a number of arithmetic and logic operations
(depicted in figure 6.1(a)). The success of this model is due to its simplicity and
the great similarity with real-life von Neumann-type computers.
Even though modern computers allow more than one user at the same time
on the same machine, the view of the programmer is still the same; he/she views
the computer as a single-user machine.
In parallel programming the situation is unfortunately not that simple; here
there does not exist a commonly accepted model (see for example [vD96]).
There exist theoretical models, but the theoretical models and the real-world
parallel computers are very far apart from each other. The most general
model is the PRAM - Parallel Random Access Machine. This model is purely
theoretical: a number of processors communicate with each other through the shared
memory, which can be accessed in unit time, that is, just as fast as e.g. a
multiplication operation. This model is often used for evaluating parallel
algorithms with respect to their asymptotic time-complexity.
In [KLK89] Kindervater et al. give three reasons why parallel computers
have not broken through yet. The main obstacle is the lack of uniformity in
available architectures, as a number of new aspects are introduced by parallelism.
As mentioned below, this obstacle has now been removed to a large extent.
The remaining two obstacles are that more realism will be required in
theoretical models of parallel computation (we need models that include factors
like, for example, location of memory and processor topology; some current
efforts are LogP [CKP+96] and the Distributed Memory Model (DMM) described
in [MDF+95]), and finally that more formal techniques for the design and
implementation of efficient parallel algorithms are needed.
The performance and usability of the different parallel models depend on
factors like computational concurrency, processor topology, communication,
location of memory (local or shared), scheduling and synchronization. These
factors also highly influence which kinds of algorithms/programs are suitable
for a specific parallel computer.
A general introduction to parallel programming with a theoretical approach
can be found in [JaJ92], while a more practical approach can be seen in [Fos94,
CT96].
The different qualities of the parallel computers have led to several
different classification schemes for parallel computers, of which Flynn's
taxonomy [Dun90, KL86] must be regarded as the most widely used. In Flynn's
taxonomy computers are divided into four groups:
SISD or Single Instruction-stream Single Data-stream computers: This is the
normal stand-alone sequential computer, also commonly known as the von
Neumann architecture. This type is depicted in figure 6.1(a).
SIMD or Single Instruction-stream Multiple Data-stream computers: This type
of computer is sometimes also denoted vector computers. An array of
(normally fairly simple) processors is controlled by an external host via
a controller. Here only one instruction at a time is performed, but on
different data elements. Either a processor performs the given instruction
or it sits idle. This type of computer normally operates on problem
instances where p ∈ Θ(n), i.e. the number of processors grows with
the problem size. As this type of computer can be emulated by
the more general MIMD-class computer (see below), these computers are
more often seen as special components instead of independent computers.
Real-life examples are the CM-2 (Connection Machine 2) and the MasPar.
MIMD or Multiple Instruction-stream Multiple Data-stream computers:
The MIMD is the most versatile architecture. Here the operation
performed by each processor is independent of the others.
This class can be subdivided according to the character of the memory:
Shared Memory: This subclass is often also called multiprocessors. All
processors share the same memory space. As in the PRAM model,
communication between the processors is carried out via the memory.
It can easily be seen that the memory access quickly becomes
a severe bottleneck of the system as the number of processors is
increased. Therefore some shared memory systems are a hybrid
between this subclass and the distributed memory subclass (see below),
as the memory is physically distributed (as in the distributed
memory subclass) but hardware emulates a globally addressable memory
space (as in this subclass). This type of parallel computer is
sometimes called virtual shared memory and is normally placed in the
shared memory subclass, as this is the way it acts seen from the
programmer's view (depicted in figure 6.1(b)).
An example of virtual shared memory is the KSR-1 (there also
exist software libraries like TreadMarks [ACD+96] that simulate
shared memory on a distributed memory computer). Examples
of "real" shared memory machines include the Cray J90 and T90 series,
the Convex C4 series and the Silicon Graphics Power Challenge.
Distributed Memory: This subclass is sometimes also referred to as
multicomputers. Processors have their own local memory.
Accessing the memory of another processor can therefore only be done by
message-passing (depicted in figure 6.1(c)). Examples here include the
Cray T3E, the Paragon, the Meiko CS-1 and CS-2, and the Parsytec.
A network of workstations could also be classified as a MIMD Distributed
Memory architecture. In order to distinguish them from "dedicated"
parallel computer systems with a very fast, highly specialized network, the
network of workstations is usually referred to as loosely coupled and the
"dedicated" parallel computer as tightly coupled.
The physical topology of the network used to be a major issue when
designing parallel algorithms. Processors that had to communicate during
the execution of a parallel algorithm should preferably be neighbouring
processors in order to obtain good performance. The programmer had to
remember how the processors were physically connected, and spend time
considering how to set up the parallel program. But in systems of today
almost any topology can virtually be imposed, and the performance
difference between exploiting the physical topology and using the imposed
virtual topology is, if not insignificant, then very small [Ant96]. This
is mainly due to dedicated communication processors in today's parallel
computers.
A comprehensive overview of real-life parallel computers can be found in [vD96,
DS96].
As stated earlier, the different sets of communication procedures, setup
commands etc. for each parallel computer used to be a major obstacle. This made it
very costly to port a program from one parallel computer to another, even if the
underlying architecture was identical. This impediment has to a great extent
been removed by different interfaces using the message-passing paradigm, where
the different processors exchange information by sending messages around the
communication network.
Among a number of different interfaces, two have emerged as de facto
standards: MPI (Message-Passing Interface) and PVM (Parallel Virtual Machine).
Presently work is in progress to merge the two standards into one
(code name: "PVMPI"); these efforts are described in [FD96]. We have used MPI
([ABMS94, Mal96, MMHB, Pac95]) in our efforts to parallelize the sequential
code.