Implementational details

1. Sending themostrecentlygeneratedBranch-and-Boundnodes.

2. Sending the most recentlygenerated Branch-and-Bound nodes with the

lowestlowerbounds.

3. Sending theBranch-and-Boundnodeswiththelowestlowerbounds.

4. SendingBranch-and-Boundnodeswith\low"(notnecessarilythelowest)

lowerboundsamong allBranch-and-Boundnodesintheworkpool.

5. AllnewlygeneratedBranch-and-Boundnodesaresenttootherprocessors.

As we do not have a designated master in the distributed approach the

termination detectionbecomesmorediÆcultthanin thecentralizedapproach.

Anumberofdierentterminationdetectionstrategieshavebeensuggested(see

forexample[Tel94,chapter8]).Wehaveimplementedtheterminationdetection

algorithmbyDijkstra,FeijenandvanGasteren[DFvG83],whichwillbebriey

described.

Intheterminationdetectionalgorithmoneprocessor(let'ssayprocessor0)

isresponsiblefordetectingthestablestatewereallprocessorshaveterminated

(i.e.arepassive). Oftenprocessor0isdenotedthe\master",butnotethatitis

notsupervisingthesolutionofsubspacesasthemasterprocessorinthe

master-slaveparadigmdid. Eachprocessorhasanassociatedcolour{itiseitherwhite

orblack. Additionallywehaveatokenthatalsohasacolour(againeitherwhite

or black). Initially allprocessorsarewhite, but aprocessorchangesits colour

to blackas ittransmitsoneormoremessages,therebyindicatingto thetoken

thattheprocessorhasbeenactivesinceitwaspreviouslyvisitedbythetoken.

Uponreceivingthetokenanactiveprocessorkeepsit untilitbecomespassive,

andthen itsendsitalongto thenextprocessorin thering. Iftheprocessoris

blackthetokeniscolouredblackbeforeitissentotherwisethetokenispassed

on without changing the colour. As the token is passed on, the transmitting

processorbecomeswhite.

Thedetectionschemeisinitiatedbyprocessor0. Ittransmitsawhitetoken

tothenextprocessorinthering(therebymakingitselfwhite).

Bythis scheme atokenreturningto processor0still beingwhiteindicates

that allprocessorsarepassive. Ifthetokenreturnsto processor0being black

anew\probe"isinitiated.

6.4.1 The initial phase

Intheinitialphaseonlythemasterprocessorisactive. Theremainingprocessors

arewaitingtogettheirinitialsetofunexploredsubspaces.Themasterprocessor

runs the sequential algorithm for the VRPTW. The process is stopped if the

problemissolved(asignalisthenbroadcastedfromthemastertotheremaining

processorstostopworkonallprocessors)orifkpunexploredsubspaceshave

been generated, where k is a pre-specied constant and p is the number of

processors.

Soin theinterestingcases, kproblems foreach processorare generated. It

should be fairly obviousthat it is of paramount importance that therunning

time spent in thisinitial phaseis keptassmall as possible in order to get all

processorsworkingontheproblemasquicklyaspossible. Stillassumingthatthe

problem is notsolvedthe master processorbroadcastsproblem characteristics

(numberofnodes,thegeographicpositionofthenodes,thetimewindowsetc.)

andgeneratedcutstotheotherprocessors. Hereaftereachprocessorreceivesk

unsolvedproblemsfromthemaster. Thisendstheinitialphase.

6.4.2 The parallel phase

In this part of the execution of the algorithm all processors solve dierent

Branch-and-Bound nodes. The master processor is in charge of termination

detectionandreturningtheresultattheendoftheexecutionofthealgorithm.

Otherwisethereisnodierencebetweenthemasterandtheremaining

proces-sors. Everyprocessor is running a slightlymodied version of the sequential

algorithmfortheVRPTW.

Parallelismintroduces onemajordierenceto the sequentialalgorithm for

the VRPTW. Via the message-passing interface new unexplored

Branch-and-Boundnodesarriveandnewglobal upperboundsmayalsobepassedalong.

Subspaceswithavaluehigherthanthevalueoftheglobalupperboundare

denotedpassive,whiletheremainingsubspacesarecalledactive. Astheheapof

unsolved subspacesduring computation might contain subspacesthat are not

longer necessary (as their value is greater than or equal to the global upper

bound) thenumberof nodesin the heapis notgivingacorrectpicture ofthe

workloadoneachprocessor. Thiscanbedealtwithin severalways:

1. Donothing. Wetakecarenottosendpassivesubspacestootherprocessors

butotherwiseweonlyadjustthenumberofnodeswhentheheapnolonger

containsactivesubspaces(inthiscasetheheapisemptied).

2. Every timethe global upper bound gets updatedsome subspacesin the

heap might change type from active to passive. Therefore the number

ofpassivesubspacesis countedand subtractedfrom thetotalnumberof

subspacesintheheap. Thiswillgiveanaccuratemeasureoftheworkload

oftheheap,asnonewpassivesubspaceswill beputontheheap.

3. Asan extension ofthe previousideawe canactuallydelete passive

sub-Wehaveoptedforsetupnumber1inordertospendaslittletimeaspossible

in the loadbalancing phase (see nextsection). When the heapdoes not

con-tainanymoreBranch-and-Boundnodeswithaselectionvaluesmallerthanthe

present globalupperbound itis aneasytask to deletetheremaining

Branch-and-Boundnodesandcontinuewithanemptyheap. Itisreasonabletoassume

that the number of \bad" subspaces is evenly spread throughout the

proces-sors,whichmeansthateachheaphasapproximatelythesamenumberof\bad"

subspaces.

Theloadbalancingphaseisonlyenterediftheprocessoriscurrentlyoutof

work or after aconstantnumberof subspaceshave been processed. It should

happenoftenenoughtokeeptheworkloadfairlydistributed,ensuringthatevery

processorisdoingmeaningfulwork,butnottoooften. Ifloadbalancingisdone

toooftentime thatcouldhavebeenspendbettersolvingsubspacesisused for

insignicantloadbalancingoperations.

6.4.3 The message passing/load balancing framework

This part of the parallel algorithm is identical for each processorand is only

runafter a certainnumberof subspaceshave been processed ortheprocessor

isoutofwork.Ittakescareofreceiving,handlingandsendingmessagestothe

neighbouringprocessors.

Theoveralldesignisbasedonthedesignofamessagepassing/loadbalancing

frameworkpresentedbyPerregaardandClausenin [PC98].

Eachprocessorallocatesmemoryfortwobuersforeachneighbour: onefor

incoming messagesand onefor outgoingmessages. Everymessagereceivedis

acknowledged by returning a conrmation message to thesending neighbour.

Byallocatingonebuerforincoming messagesforeach neighbourand bynot

sendingmoremessagesbefore aconrmation isreceived,theprocessorsdonot

havetositidlewhiletransferringmessages. Ifamessagehasnotbeen

acknowl-edged, theprocessorsimply ignoresthat particular neighbourwith respect to

sendingnewinformationalong.

Deadlock occurs when two processors wait for a message from each other

therebyhaltingthem. Thiscannotoccurinourimplementation. Ifaprocessor

has sent message to one of its neighbours it does not halt until an

acknowl-edgementarrives. Insteaditregularlycheckswhethertheacknowledgementhas

arrivedfromthereceivingprocessor. Ifthisisnotthecasetheprocessors

con-tinuestoworkonothertasks(loadbalancingorsolvingsubspaces). Meanwhile

nofurther messageis sent in thedirection of thereceivingprocessors. Sooner

or lateran acknowledgementwill arrive(unlessthe network breaks down, but

recovering from network/link breakdown is entirely outside the scope of this

project).

Thereare3typesofmessages: newsubspaces,updateofglobalupperbound

passes the termination message to the next processor and terminates. The

rst terminationmessageissentfromthe masterassoon asitdetectsthat all

processorsareidle.

Amessagewithanewglobalupperboundconsistsofthevalueoftheglobal

upperboundand theroutesthat constitutetheglobal upperbound. Onlythe

master problem recordsthe routes. Theremaining processorssendthe routes

along,but donotkeeptrackofthem.

Finallyamessagewithnewunsolvedsubspacescanbereceived. Themessage

might actually not contain any message. Such a message indicates that the

sending processorhas no subspacesleft to process. In order notto ood the

network withthese \outofwork"messages,aagis setsothat thenext\out

ofwork"messagecannotbesentbeforetheprocessorhasreceivedatleastone

unsolved Branch-and-Boundnode. Incase the messagecontains

Branch-and-Bound nodesthese areextracted from thebuer,checkedagainst thevalueof

theglobalupperboundandinsertedintheheap. Witheachbatchofsubspaces,

the senders heap size is included, and the local estimate on thesenders heap

size is henceonly updatedwhen the receiver received abatch of

Branch-and-Boundnodesoraspecial\outofwork"message. Thetokeninthetermination

detectionprocessisintegratedin themessageswithnewunsolvedsubspaces.

Sending unsolved Branch-and-Bound nodes to a neighbour is thus based

on anestimate of thesize ofthe neighboursheap. This estimate is thelatest

receivedsize.Letnbetheheapsizeofthesendingprocessorandn 0

theestimate

of theheapsizeof thereceivingprocessor. Ifthereceiving neighbourisoutof

work,orn>2andn n 0

atleastonesubspaceistransfered. Thenumber

ofsubspacestransferedisdeterminedbytheformula

maxf

n n

;1g:

This formula is also used by Perregaard and Clausen in [PC98]. Among a

number of dierent load balancing criteria this formula resulted in the best

performanceforthejob-shopschedulingproblem. Whetherthiscriteriaisalso

bestforVRPTWhasnotbeeninvestigated,butwouldbeinterestingto study.

In document aa (Sider 104-107)