A large number of extra "minor" constraints from real-life applications can be
incorporated without problems.
An upper or lower limit on the length (in cost or time) of the routes can be
modelled using additional resources. Using additional resources also makes it
possible to set a limit on the number of customers that can be serviced. It is
also possible to only allow specific vehicles to use certain arcs or service certain
customers.
Note that it is also possible to introduce time-dependent travel speed, for
example in order to take account of rush-hour traffic. If time-dependent travel
Parallel Implementation
This chapter describes the efforts undertaken, and the results thereof, with
respect to developing a parallel algorithm based upon the sequential
Branch-and-Price code described in chapter 3. First, a short introduction to the
field of parallel computing is presented, and then the description and analysis
of making the parallel program follows.
6.1 Parallel Programming
Parallel computing enjoys significant interest both in the scientific community
and in industry, due to an increasing demand for computational power and speed
and the decreasing cost/performance ratio of parallel computers.
Alas, the new possibilities do not come without a price: parallel
programming is inherently more difficult.
In contrast to sequential programming, where almost everyone can agree
on how a sequential computer is viewed, the world of parallel programming
is not so clear-cut. The sequential paradigm views the computer as a single
processor which can access a certain amount of memory. This paradigm is
in the literature also called RAM - Random Access Machine. The RAM is
equipped with an instruction set that includes basic operations for writing to
and reading from the memory and a number of arithmetic and logic operations
(depicted in figure 6.1(a)). The success of this model is due to its simplicity and
the great similarity with real-life von Neumann-type computers.
Even though modern computers allow more than one user at the same time
on the same machine, the view of the programmer is still the same; he/she views
the computer as a single-user machine.
In parallel programming the situation is unfortunately not that simple; here
there does not exist a commonly accepted model (see for example [vD96]).
There exist theoretical models, but the theoretical models and the real-world
parallel computers are very far apart from each other. The most general
model is the PRAM - Parallel Random Access Machine. This model is purely
theoretical: a number of processors communicate with each other through the shared
memory, which can be accessed in unit time, that is, just as fast as e.g. a
multiplication operation. This model is often used for evaluating parallel
algorithms with respect to their asymptotic time-complexity.
In [KLK89] Kindervater et al. give three reasons why parallel computers
have not broken through yet. The main obstacle is the lack of uniformity in
available architectures, as a number of new aspects are introduced by parallelism.
As mentioned below, this obstacle has now been removed to a large extent.
The remaining two obstacles are that more realism will be required in
theoretical models of parallel computation (we need models that include factors
like, for example, location of memory and processor topology; some current
efforts are LogP [CKP+96] and the Distributed Memory Model (DMM) described
in [MDF+95]), and finally that more formal techniques for the design and
implementation of efficient parallel algorithms are needed.
The performance and usability of the different parallel models depend on
factors like computational concurrency, processor topology, communication,
location of memory (local or shared), scheduling and synchronization. These
factors also highly influence which kinds of algorithms/programs are suitable
for a specific parallel computer.
A general introduction to parallel programming with a theoretical approach
can be found in [JaJ92], while a more practical approach can be seen in [Fos94,
CT96].
The different qualities of the parallel computers have led to several
different classification schemes for parallel computers, of which Flynn's
taxonomy [Dun90, KL86] must be regarded as the most widely used. In Flynn's
taxonomy computers are divided into four groups:
SISD or Single Instruction-stream Single Data-stream computers: This is the
normal stand-alone sequential computer, also commonly known as the von
Neumann architecture. This type is depicted in figure 6.1(a).
SIMD or Single Instruction-stream Multiple Data-stream computers: This type
of computer is sometimes also denoted vector computers. An array of
(normally fairly simple) processors is controlled by an external host via
a controller. Here only one instruction at a time is performed, but on
different data elements. Either a processor performs the given instruction
or it sits idle. This type of computer normally operates on problem
instances where p ∈ Θ(n), i.e. the number of processors grows with
the problem size. As this type of computer can be emulated by
the more general MIMD-class computer (see below), these computers are
more often seen as special components instead of independent computers.
Real-life examples are the CM-2 (Connection Machine 2) and the MasPar.
MIMD or Multiple Instruction-stream Multiple Data-stream computers:
The MIMD is the most versatile architecture. Here the operation
performed by each processor is independent of the others.
This class can be subdivided according to the character of the memory:
Shared Memory: This subclass is often also called multiprocessors. All
processors share the same memory space. As in the PRAM model,
communication between the processors is carried out via the memory.
It can easily be seen that the memory access quickly becomes
a severe bottleneck of the system as the number of processors is
increased. Therefore some shared memory systems are a hybrid
between this subclass and the distributed memory subclass (see below),
as the memory is physically distributed (as in the distributed
memory subclass) but hardware emulates a globally addressable memory
space (as in this subclass). This type of parallel computer is
sometimes called virtual shared memory and is normally placed in the
shared memory subclass, as this is the way it acts seen from the
programmer's view (depicted in figure 6.1(b)).
An example of virtual shared memory is the KSR-1 (there also
exist software libraries like TreadMarks [ACD+96] that simulate
shared memory on a distributed memory computer). Examples
of "real" shared memory machines include the Cray J90 and T90 series,
the Convex C4 series and the Silicon Graphics Power Challenge.
Distributed Memory: This subclass is sometimes also referred to as
multicomputers. Processors have their own local memory.
Accessing the memory of another processor can therefore only be done by
message-passing (depicted in figure 6.1(c)). Examples here include the
Cray T3E, the Paragon, the Meiko CS-1 and CS-2, and the Parsytec.
A network of workstations could also be classified as a MIMD Distributed
Memory architecture. In order to distinguish them from "dedicated"
parallel computer systems with a very fast, highly specialized network, the
network of workstations is usually referred to as loosely coupled and the
"dedicated" parallel computer as tightly coupled.
The physical topology of the network used to be a major issue when
designing parallel algorithms. Processors that had to communicate during
the execution of a parallel algorithm should preferably be neighbouring
processors in order to obtain good performance. The programmer had to
remember how the processors were physically connected, and spend time
considering how to set up the parallel program. But in systems of today
almost any topology can virtually be imposed, and the performance
difference between exploiting the physical topology and using the imposed
virtual topology is, if not insignificant, then very small [Ant96]. This
is mainly due to dedicated communication processors in today's parallel
computers.
A comprehensive overview of real-life parallel computers can be found in [vD96,
DS96].
As stated earlier, the different sets of communication procedures, setup
commands etc. for each parallel computer used to be a major obstacle. This made it
very costly to port a program from one parallel computer to another, even if the
underlying architecture was identical. This impediment has to a great extent
been removed by different interfaces using the message-passing paradigm, where
the different processors exchange information by sending messages around the
communication network.
Among a number of different interfaces, two have emerged as de facto
standards: MPI (Message-Passing Interface) and PVM (Parallel Virtual Machine).
Presently work is in progress to merge the two standards into one
(code name: "PVMPI"); these efforts are described in [FD96]. We have used MPI
([ABMS94, Mal96, MMHB, Pac95]) in our efforts to parallelize the sequential
code.