1. Sending themostrecentlygeneratedBranch-and-Boundnodes.
2. Sending the most recentlygenerated Branch-and-Bound nodes with the
lowestlowerbounds.
3. Sending theBranch-and-Boundnodeswiththelowestlowerbounds.
4. SendingBranch-and-Boundnodeswith\low"(notnecessarilythelowest)
lowerboundsamong allBranch-and-Boundnodesintheworkpool.
5. AllnewlygeneratedBranch-and-Boundnodesaresenttootherprocessors.
As we do not have a designated master in the distributed approach the
termination detectionbecomesmorediÆcultthanin thecentralizedapproach.
Anumberofdierentterminationdetectionstrategieshavebeensuggested(see
forexample[Tel94,chapter8]).Wehaveimplementedtheterminationdetection
algorithmbyDijkstra,FeijenandvanGasteren[DFvG83],whichwillbebriey
described.
Intheterminationdetectionalgorithmoneprocessor(let'ssayprocessor0)
isresponsiblefordetectingthestablestatewereallprocessorshaveterminated
(i.e.arepassive). Oftenprocessor0isdenotedthe\master",butnotethatitis
notsupervisingthesolutionofsubspacesasthemasterprocessorinthe
master-slaveparadigmdid. Eachprocessorhasanassociatedcolour{itiseitherwhite
orblack. Additionallywehaveatokenthatalsohasacolour(againeitherwhite
or black). Initially allprocessorsarewhite, but aprocessorchangesits colour
to blackas ittransmitsoneormoremessages,therebyindicatingto thetoken
thattheprocessorhasbeenactivesinceitwaspreviouslyvisitedbythetoken.
Uponreceivingthetokenanactiveprocessorkeepsit untilitbecomespassive,
andthen itsendsitalongto thenextprocessorin thering. Iftheprocessoris
blackthetokeniscolouredblackbeforeitissentotherwisethetokenispassed
on without changing the colour. As the token is passed on, the transmitting
processorbecomeswhite.
Thedetectionschemeisinitiatedbyprocessor0. Ittransmitsawhitetoken
tothenextprocessorinthering(therebymakingitselfwhite).
Bythis scheme atokenreturningto processor0still beingwhiteindicates
that allprocessorsarepassive. Ifthetokenreturnsto processor0being black
anew\probe"isinitiated.
6.4.1 The initial phase
Intheinitialphaseonlythemasterprocessorisactive. Theremainingprocessors
arewaitingtogettheirinitialsetofunexploredsubspaces.Themasterprocessor
runs the sequential algorithm for the VRPTW. The process is stopped if the
problemissolved(asignalisthenbroadcastedfromthemastertotheremaining
processorstostopworkonallprocessors)orifkpunexploredsubspaceshave
been generated, where k is a pre-specied constant and p is the number of
processors.
Soin theinterestingcases, kproblems foreach processorare generated. It
should be fairly obviousthat it is of paramount importance that therunning
time spent in thisinitial phaseis keptassmall as possible in order to get all
processorsworkingontheproblemasquicklyaspossible. Stillassumingthatthe
problem is notsolvedthe master processorbroadcastsproblem characteristics
(numberofnodes,thegeographicpositionofthenodes,thetimewindowsetc.)
andgeneratedcutstotheotherprocessors. Hereaftereachprocessorreceivesk
unsolvedproblemsfromthemaster. Thisendstheinitialphase.
6.4.2 The parallel phase
In this part of the execution of the algorithm all processors solve dierent
Branch-and-Bound nodes. The master processor is in charge of termination
detectionandreturningtheresultattheendoftheexecutionofthealgorithm.
Otherwisethereisnodierencebetweenthemasterandtheremaining
proces-sors. Everyprocessor is running a slightlymodied version of the sequential
algorithmfortheVRPTW.
Parallelismintroduces onemajordierenceto the sequentialalgorithm for
the VRPTW. Via the message-passing interface new unexplored
Branch-and-Boundnodesarriveandnewglobal upperboundsmayalsobepassedalong.
Subspaceswithavaluehigherthanthevalueoftheglobalupperboundare
denotedpassive,whiletheremainingsubspacesarecalledactive. Astheheapof
unsolved subspacesduring computation might contain subspacesthat are not
longer necessary (as their value is greater than or equal to the global upper
bound) thenumberof nodesin the heapis notgivingacorrectpicture ofthe
workloadoneachprocessor. Thiscanbedealtwithin severalways:
1. Donothing. Wetakecarenottosendpassivesubspacestootherprocessors
butotherwiseweonlyadjustthenumberofnodeswhentheheapnolonger
containsactivesubspaces(inthiscasetheheapisemptied).
2. Every timethe global upper bound gets updatedsome subspacesin the
heap might change type from active to passive. Therefore the number
ofpassivesubspacesis countedand subtractedfrom thetotalnumberof
subspacesintheheap. Thiswillgiveanaccuratemeasureoftheworkload
oftheheap,asnonewpassivesubspaceswill beputontheheap.
3. Asan extension ofthe previousideawe canactuallydelete passive
sub-Wehaveoptedforsetupnumber1inordertospendaslittletimeaspossible
in the loadbalancing phase (see nextsection). When the heapdoes not
con-tainanymoreBranch-and-Boundnodeswithaselectionvaluesmallerthanthe
present globalupperbound itis aneasytask to deletetheremaining
Branch-and-Boundnodesandcontinuewithanemptyheap. Itisreasonabletoassume
that the number of \bad" subspaces is evenly spread throughout the
proces-sors,whichmeansthateachheaphasapproximatelythesamenumberof\bad"
subspaces.
Theloadbalancingphaseisonlyenterediftheprocessoriscurrentlyoutof
work or after aconstantnumberof subspaceshave been processed. It should
happenoftenenoughtokeeptheworkloadfairlydistributed,ensuringthatevery
processorisdoingmeaningfulwork,butnottoooften. Ifloadbalancingisdone
toooftentime thatcouldhavebeenspendbettersolvingsubspacesisused for
insignicantloadbalancingoperations.
6.4.3 The message passing/load balancing framework
This part of the parallel algorithm is identical for each processorand is only
runafter a certainnumberof subspaceshave been processed ortheprocessor
isoutofwork.Ittakescareofreceiving,handlingandsendingmessagestothe
neighbouringprocessors.
Theoveralldesignisbasedonthedesignofamessagepassing/loadbalancing
frameworkpresentedbyPerregaardandClausenin [PC98].
Eachprocessorallocatesmemoryfortwobuersforeachneighbour: onefor
incoming messagesand onefor outgoingmessages. Everymessagereceivedis
acknowledged by returning a conrmation message to thesending neighbour.
Byallocatingonebuerforincoming messagesforeach neighbourand bynot
sendingmoremessagesbefore aconrmation isreceived,theprocessorsdonot
havetositidlewhiletransferringmessages. Ifamessagehasnotbeen
acknowl-edged, theprocessorsimply ignoresthat particular neighbourwith respect to
sendingnewinformationalong.
Deadlock occurs when two processors wait for a message from each other
therebyhaltingthem. Thiscannotoccurinourimplementation. Ifaprocessor
has sent message to one of its neighbours it does not halt until an
acknowl-edgementarrives. Insteaditregularlycheckswhethertheacknowledgementhas
arrivedfromthereceivingprocessor. Ifthisisnotthecasetheprocessors
con-tinuestoworkonothertasks(loadbalancingorsolvingsubspaces). Meanwhile
nofurther messageis sent in thedirection of thereceivingprocessors. Sooner
or lateran acknowledgementwill arrive(unlessthe network breaks down, but
recovering from network/link breakdown is entirely outside the scope of this
project).
Thereare3typesofmessages: newsubspaces,updateofglobalupperbound
passes the termination message to the next processor and terminates. The
rst terminationmessageissentfromthe masterassoon asitdetectsthat all
processorsareidle.
Amessagewithanewglobalupperboundconsistsofthevalueoftheglobal
upperboundand theroutesthat constitutetheglobal upperbound. Onlythe
master problem recordsthe routes. Theremaining processorssendthe routes
along,but donotkeeptrackofthem.
Finallyamessagewithnewunsolvedsubspacescanbereceived. Themessage
might actually not contain any message. Such a message indicates that the
sending processorhas no subspacesleft to process. In order notto ood the
network withthese \outofwork"messages,aagis setsothat thenext\out
ofwork"messagecannotbesentbeforetheprocessorhasreceivedatleastone
unsolved Branch-and-Boundnode. Incase the messagecontains
Branch-and-Bound nodesthese areextracted from thebuer,checkedagainst thevalueof
theglobalupperboundandinsertedintheheap. Witheachbatchofsubspaces,
the senders heap size is included, and the local estimate on thesenders heap
size is henceonly updatedwhen the receiver received abatch of
Branch-and-Boundnodesoraspecial\outofwork"message. Thetokeninthetermination
detectionprocessisintegratedin themessageswithnewunsolvedsubspaces.
Sending unsolved Branch-and-Bound nodes to a neighbour is thus based
on anestimate of thesize ofthe neighboursheap. This estimate is thelatest
receivedsize.Letnbetheheapsizeofthesendingprocessorandn 0
theestimate
of theheapsizeof thereceivingprocessor. Ifthereceiving neighbourisoutof
work,orn>2andn n 0
>
n
2
atleastonesubspaceistransfered. Thenumber
ofsubspacestransferedisdeterminedbytheformula
maxf
n n
0
3
;1g:
This formula is also used by Perregaard and Clausen in [PC98]. Among a
number of dierent load balancing criteria this formula resulted in the best
performanceforthejob-shopschedulingproblem. Whetherthiscriteriaisalso
bestforVRPTWhasnotbeeninvestigated,butwouldbeinterestingto study.