Surrogate Models
by the Space Mapping Technique
Jacob Søndergaard
Ph.D. Thesis
Informatics and Mathematical Modelling
Technical University of Denmark
Kgs. Lyngby 2003
Building 321, DK-2800 Lyngby, Denmark
Phone +45 4525 3351, Fax +45 4588 2673
reception@imm.dtu.dk
www.imm.dtu.dk
IMM-PHD-2003-111
This dissertation is submitted to Informatics and Mathematical Modelling at the Technical University of Denmark in partial fulfilment of the requirements for the degree of Doctor of Philosophy.
The work has been supervised by Professor, dr.techn. Kaj Madsen, Associate Professor Hans Bruun Nielsen and Poul Erik Frandsen.
Kgs. Lyngby, January 31, 2003.
Jacob Søndergaard
Surrogate modelling and optimization techniques are intended for engineering design in the case where an expensive physical model is involved.
This thesis provides a literature overview of the field of surrogate modelling and optimization. The space mapping technique is one such method for constructing and optimizing a surrogate model based on a cheap physical model. The space mapping surrogate is the cheap model composed with a parameter mapping, the so-called space mapping, connecting similar responses of the cheap and the expensive model.
The thesis presents a theoretical study of the space mapping technique. Theoretical results are derived which characterize the space mapping under some ideal conditions. If these conditions are met, the solutions provided by the original space mapping technique are minimizers of the expensive model. However, in practice we cannot expect that these ideal conditions are satisfied. So hybrid methods, combining the space mapping technique with classical optimization methods, should be used if convergence to high accuracy is wanted.
Approximation abilities of the space mapping surrogate are compared with those of a Taylor model of the expensive model. The space mapping surrogate has a lower approximation error for long steps. For short steps, however, the Taylor model of the expensive model is best, due to exact interpolation at the model origin.
Five algorithms for space mapping optimization are presented and the numerical performance is evaluated. Three of the algorithms are hybrid algorithms. Convergence of a class of hybrid space mapping algorithms is proved.
Surrogate modelling and optimization techniques are aimed at engineering design in the case where a very expensive physical model is involved.
The thesis contains a literature survey of surrogate modelling and optimization techniques. The space mapping technique is one such method for optimizing a surrogate model which is based on a cheap physical model. The space mapping surrogate consists of the cheap model composed with a parameter mapping, the so-called space mapping, which connects the same responses of the cheap and the expensive model.
The thesis describes theoretical investigations of the space mapping technique. Theoretical results are derived which characterize the space mapping under certain ideal conditions. If these conditions are fulfilled, the solutions found by the space mapping technique are solutions to the minimization problem for the expensive model. However, these ideal conditions cannot be expected to hold in practice. Therefore hybrid methods, which combine the space mapping technique with classical optimization methods, should be used if convergence to high accuracy is desired.
The approximation properties of the space mapping surrogate are compared with a Taylor model of the expensive model. The space mapping surrogate has a lower approximation error for long steps. For short steps, on the other hand, the Taylor model of the expensive model is best, owing to exact interpolation at the expansion point.
Five algorithms for space mapping optimization are presented and their numerical properties are tested. Three of the algorithms are hybrid algorithms. Convergence of a class of hybrid space mapping algorithms is proved.
I wish to thank my main supervisor Professor, dr.techn. Kaj Madsen for giving me the opportunity to engage in this study. Thank you for the encouragement and motivation that carried me through the study. I also thank my supervisors Associate Professor Hans Bruun Nielsen and Poul Erik Frandsen, Ticra Engineering Consultants, for their guidance. Special thanks to Hans, who helped me resolve many mathematical puzzles and has always been available for exchange of ideas. On the same note I also wish to thank Ph.D. student Frank Pedersen for many rewarding discussions.
I want to thank Professor John E. Dennis and his Ph.D. student Mark Abramson for the hospitality shown to me during my stay at Rice University. Special thanks to John, who inspired me with his eager passion for unveiling mathematical questions in a constant search for the truth.
I also want to thank Professor John W. Bandler and his Ph.D. students Mohamed Bakr, José Rayas-Sanchez and Mostafa Ismail for the hospitality shown to me during my stays at McMaster University. Special thanks to John for sharing his many captivating ideas and his devotion to the space mapping technique. Also special thanks to Mohamed for his friendliness and his involvement in this study.
I would like to thank my colleagues at IMM for making the last three years a pleasant experience.
Finally, I sincerely thank my dear girlfriend Janna and the rest of my family and friends for their constant support.
1 Introduction
  1.1 Outline of the Thesis
  References
2 Surrogate Modelling & Optimization
  2.1 Introduction to Surrogate Models
    2.1.1 Surrogate Models in the Literature
    2.1.2 Our Approach
  2.2 Surrogates Based on Functional Models
    2.2.1 Regression Models
    2.2.2 Radial Functions
    2.2.3 Single Point Models
    2.2.4 Summary
  2.3 Surrogates Based on Physical Models
    2.3.1 Response Correction
    2.3.2 The Multipoint Method
    2.3.3 Space Mapping
    2.3.4 Summary
  2.4 Optimization Using Surrogate Models
    2.4.1 Response Surface Methodology
    2.4.2 Trust Region Approach
    2.4.3 Pattern Search Approach
    2.4.4 Space Mapping Optimization
    2.4.5 Summary
  2.5 Conclusion
  Symbols
  References
3 An Introduction to the Space Mapping Technique
  3.1 Introduction
  3.2 Space mapping details
  3.3 Combining with classical methods
  3.4 Examples
  3.5 Conclusions
  Acknowledgments
  References
4 Space Mapping Theory and Practice
  4.1 Space Mapping Theory
    4.1.1 Theoretical Introduction
    4.1.2 Example: Space Mapping Images
    4.1.3 Theoretical Results
    4.1.4 The Usual Space Mapping Definition
    4.1.5 Example With Scalar Functions
  4.2 Alternative Space Mapping Definitions
    4.2.1 Regularization Using z
    4.2.2 Regularization Using x
    4.2.3 Regularization Using Gradients
    4.2.4 Multiple Points
    4.2.5 Summary
  4.3 Approximation Error
    4.3.1 Summary
  4.4 Conclusion
  Symbols
  References
5 Space Mapping Optimization Algorithms
  5.1 Introduction
  5.2 Space Mapping Optimization Algorithms
    5.2.1 Original Space Mapping Algorithms
    5.2.2 Hybrid Space Mapping Algorithms
  5.3 Numerical Tests
    5.3.1 Test Runs
    5.3.2 Space Mapping Definitions
    5.3.3 Optimization Trajectories
  5.4 Conclusion
  Symbols
  References
6 Convergence of Hybrid Space Mapping Algorithms
  6.1 Introduction
    6.2.1 Details of the SMTA
    6.2.2 Summary of the SMTA
  6.3 Proof of Convergence
    6.3.1 Prerequisites
    6.3.2 Proof of Convergence
  6.4 Conclusions
  References
7 Conclusion
Complete List of References
A Space Mapping Toolbox
  A.1 Space Mapping Optimization Test Problems
    A.1.1 Interface
    A.1.2 The Test Problems
  A.2 Space Mapping Optimization Algorithms
    A.2.1 Interface
    A.2.2 Theoretical Overview
    A.2.3 The Optimization Algorithms
    A.2.4 Auxiliary Functions
  A.3 Examples
    A.3.1 Quick Run
    A.3.2 Examining a Problem
  References
Introduction
In engineering design it is often encountered that traditional optimization is not feasible because the model under investigation is too expensive to compute. Surrogate modelling techniques have been developed to address this important issue. Surrogate models are intended to take the place of the expensive model for the purpose of modelling or optimization of the latter. In optimization using surrogate models, a sequence of subproblems is solved in the search for the optimizer of the expensive model. In the optimization process, most of the model evaluations are performed with the surrogate model. The expensive model is only scarcely evaluated, in order to re-calibrate the surrogate model.
The space mapping technique is one such method for constructing and optimizing a surrogate model. The technique relies on the existence of a cheaper model, modelling the same system as the expensive model under investigation. A space mapping surrogate model is the cheaper model composed with a parameter mapping, the so-called space mapping. The space mapping connects similar responses of the cheaper model and the expensive model. Here, responses are the output returned from a model provided a given set of parameters and state variables, which e.g. is a set of sample points in the frequency or time domain.
The basic formulation of the space mapping technique is not, in general, convergent to a minimizer of the expensive model. Therefore, provably convergent hybrid methods have been developed, combining the space mapping method with a classical optimization method.
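The composition just described can be made concrete with a small sketch. The fine and coarse models below are invented one-dimensional examples, and the space mapping is extracted by brute-force grid search rather than by the algorithms studied later in the thesis; this is only an illustration of the idea, not the thesis's method.

```python
import numpy as np

# Hypothetical models of the same system: an "expensive" fine model f
# and a "cheap" coarse model c (both invented for illustration).
def f(x):
    return (x - 1.8) ** 2 + 0.1   # fine model, minimizer at x = 1.8

def c(z):
    return z ** 2                  # coarse model, minimizer at z = 0

# The space mapping p connects similar responses: p(x) is the coarse
# parameter whose response best matches the fine response at x.
z_grid = np.linspace(-4.0, 4.0, 40001)
def p(x):
    return z_grid[np.argmin(np.abs(c(z_grid) - f(x)))]

# The space mapping surrogate is the coarse model composed with the
# mapping, s(x) = c(p(x)).
def s(x):
    return c(p(x))

# Minimizing the cheap surrogate approximates minimizing the fine model.
x_grid = np.linspace(-4.0, 4.0, 2001)
x_best = x_grid[np.argmin([s(x) for x in x_grid])]
```

Note that every surrogate evaluation here only calls the cheap model; the fine model is referenced through `f(x)` when the mapping is extracted, which in a realistic setting would be done at a few iterates only.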
This thesis concerns optimization of expensive functions using surrogate models. The focal point of the study is the space mapping technique. The thesis addresses five main areas: First, the thesis presents a literature overview of surrogate modelling and optimization. Second, the thesis provides a motivation and introduction to the space mapping technique. Third, the thesis provides a theoretical study of the space mapping technique. Fourth, the thesis presents space mapping optimization algorithms and numerical tests of these algorithms. Fifth, the thesis presents a convergence proof for a class of hybrid space mapping algorithms.
The five areas mentioned are covered in separate chapters of the thesis, as described in the following outline.
1.1 Outline of the Thesis
This thesis is divided into six chapters. Each chapter is intended to be self-contained, though Chapters 4, 5 and 6 are easier to follow if the reader is familiar with Chapter 3.
Chapter 2 contains a literature overview of surrogate modelling and optimization. For a review specifically of space mapping methods, refer to [1, 3], two papers co-authored by this author.
Chapter 3 is an included paper [2], introducing and motivating space mapping methodology to the engineering and mathematical communities.
Chapter 4 treats theoretical aspects of space mapping.
Chapter 5 considers formulation of space mapping optimization algorithms and the numerical performance of these.
Chapter 6 is an included paper [4], formulating and proving convergence of hybrid space mapping algorithms.
Chapter 7 is a summary of the conclusions of the study.
Appendix A serves as a manual for a Matlab toolbox with space mapping optimization routines and test problems, developed as a part of the present work.
Each chapter contains a separate list of references; a complete list of references is provided before the appendix. We remark that the notation is not uniform throughout the thesis; lists of symbols are provided for Chapters 2, 4 and 5, right before the list of references in these chapters.
References
[1] M.H. Bakr, J.W. Bandler, K. Madsen, J. Søndergaard, Review of the Space Mapping Approach to Engineering Optimization and Modelling, Optimization and Engineering, vol. 1, no. 3, pp. 241–276, 2000.
[2] M.H. Bakr, J.W. Bandler, K. Madsen, J. Søndergaard, An Introduction to the Space Mapping Technique, Optimization and Engineering, vol. 2, no. 4, pp. 369–384, 2001.
[3] J.W. Bandler, Q. Cheng, S. Dakroury, A.S. Mohamed, M.H. Bakr, K. Madsen, J. Søndergaard, Space Mapping: The State of the Art, submitted, IEEE Trans. Microwave Theory Tech., 2004.
[4] K. Madsen, J. Søndergaard, Convergence of a Hybrid Space Mapping Algorithm, submitted, Optimization and Engineering, 2003.
Surrogate Modelling & Optimization
This chapter provides an overview of literature concerned with surrogate modelling and optimization. The chapter is divided into four sections. Section 2.1 is an introduction to the terminology and the general aspects of the field. Section 2.2 concerns the so-called functional models, which are generic and non-specific to a given problem. Section 2.3 concerns the so-called physical models, which are specific to a given problem. The last section, Section 2.4, is a presentation of algorithms for optimization using surrogate models.
2.1 Introduction to Surrogate Models
In the context of engineering modelling and optimization, a surrogate model is a mathematical or physical model which can take the place of an expensive model for the purpose of modelling or optimization of the latter. The expensive model may arise from numerical solution of large systems of e.g. integral or differential equations describing a physical system, or it could be an actual physical system. The surrogate model may be a physical or numerical simplification of the expensive model, or it could be a purely empirical construct.
In other words, surrogates are cheap approximations which are especially well suited for acting in place of expensive models in connection with modelling and optimization. A surrogate model is often so cheap that many repeated runs of the model may be conducted within the expense of running the expensive model once. In some cases the surrogate even provides a mathematically more tractable formulation, e.g. in the context of optimization, than that of the expensive model. Often a surrogate model is less accurate than the expensive model, as there tends to be a duality between expensive and cheaper models: expensive models are of higher fidelity and cheaper models of lower fidelity.
Surrogates are widely used for modelling and optimization in many engineering fields. In surrogate modelling the engineer can use the surrogate in extensive analysis of different configurations of the model parameters, and thereby gain insight into the workings of the physical system at low expense.
Similar to classical Taylor based optimization, the search for an optimizer in surrogate based optimization is most often conducted by an iterative method relying on sequentially generated surrogate models. In each iteration, the iterative method performs a search on a surrogate model, only occasionally referencing the expensive model for validation and correction of the surrogate. A classical Taylor based method requires frequent validation and correction, and thus is too expensive for the problems that surrogate based optimization is intended for.
In the following we survey methods for generating surrogate models and methods for optimization using surrogates. We start by examining the terminology of surrogate modelling and optimization; thereafter we define a set of terms and a problem formulation, from which we will develop the presentation of the methods.
2.1.1 Surrogate Models in the Literature
Surrogate models and optimization methods using surrogate models are an active area of research. For example, in the combined field of mathematicians and engineers, several conferences, schools and journal issues have been devoted to the subject in the last years, see e.g. [2, 9, 21, 43, 58, 59].
Unfortunately the wide adoption of surrogate based methods in the engineering community has led to an ambiguous terminology in the literature. In fact, metamodel is often considered a synonym for a surrogate model, see e.g. [32].
In the case of surrogates for expensive computer models, i.e. numerical models, Myers and Montgomery [42] use the term metamodel about a somewhat simplified mathematical approximation of the function calculated by design analysis. In other words, the expensive computer model is replaced by a cheaper, possibly simpler, model of the model, hence a metamodel.
Metamodels are characterized by their algebraic or empirical formulation, constructed independently of the underlying physics of the system; thus some form of calibration is needed in order to use them as surrogates.
Metamodels are commonly used as a synonym for the popular response surface models. A response surface model is the result of a linear or nonlinear regression of (usually simple) algebraic functions with data. For this reason, they belong to a broader class of models called regression models. Response surface models are based on sampling in a chosen set of experimental design points; a process called design of experiments is used for choosing these points.
For the purpose of optimization, surrogates are often managed in a model management framework, see e.g. [1, 12, 24, 27, 33, 42, 55, 63]. A model management framework enforces conditions on the models, such as adaption to the expensive model in a local region or in a more global sense. The most widely adopted of these frameworks is the Response Surface Methodology (RSM), see e.g. [42], which is a framework for optimization using sequentially generated response surface models.
Only few of the frameworks referenced above have a proven convergence property, and according to [1], some of these frameworks actually focus on convergence to the problem defined by the surrogate model, rather than the original problem.
Another term used in the context of surrogate models is variable-complexity modelling, see e.g. [15]. This term covers cases where both the inexpensive model and the expensive model are based on the underlying physics of the system under consideration. The terms variable physics and multi-fidelity physics are then used to denote that within this system there exists a range of possible physical models.
The mathematical field of constructing approximations to expensive functions has been actively researched for several decades. So, other authors have presented reviews and other retrospective contributions, in which they have partitioned the field in various ways.
The review paper on approximation concepts for optimum structural design by Barthelemy and Haftka [10] groups approximations to expensive models into three categories: local, medium-range and global approximations, each of these subdivided again into functional and problem approximation methods. Barthelemy and Haftka use the term functional approximation in cases where an alternative, explicit expression is sought for the objective function and/or the constraints of the problem. Further, they use the term problem approximation in cases where the focus is on replacing the original statement of the problem by one which is approximately equivalent but which is easier to solve.
In his thesis Serafini [50] suggested to divide engineering models into two classes: physical and functional models, where physical models are based on numerical solution of governing equations of physical systems and embody knowledge of the physical system at all points; and where the functional models are algebraic approximations of the solutions of the equations, constructed without resort to knowledge of the physical systems. Hence, functional models are purely mathematical constructs, and they embody knowledge of the behaviour of the function they are approximating only at the points for which function values are given.
Serafini points out that his distinction between physical and functional models is not absolute. Specifically, he uses Taylor models, i.e. truncated Taylor series, as an example of hybrids between physical and functional models, since they are purely a mathematical construct, but at the same time they describe the same physical system as the governing equations.
In our presentation we will adopt Serafini's classification into physical and functional models. However, we suggest to interpret the terms a bit differently. In contrast to Serafini, we have no qualms in classifying Taylor models as pure functional models, as they may be constructed without knowledge of the actual governing equations, e.g. by finite difference approximations. Even in cases where the user supplies gradient or higher derivative information, Taylor models are still considered functional models, as the models themselves can be constructed independently of the underlying physics. Thereby we discern between the act of deriving a model and that of assigning parameter values to a model.
On a side note, other references, see e.g. [13, 42], refer to physical models as mechanistic models.
Torczon and Trosset [53] distinguish between surrogate models and surrogate approximations. In their terminology, a surrogate model is physically faithful, but less computationally expensive, than the expensive model it takes the place of. Further, a surrogate model exists independently of the expensive simulation, and can provide new information about the physical system of interest without requiring additional runs of the expensive model. On the other hand, surrogate approximations are algebraic summaries obtained from previous runs of the expensive model. These approximations are typically inexpensive to evaluate; they could e.g. use radial basis functions, kriging, neural networks, (low-order) polynomials, wavelets and splines. We will review the mentioned approximation methods later on in this presentation.
We will not adopt Torczon and Trosset's distinction between models and approximations, as these terms to a great extent are used as synonyms in most other references. In fact, we will use both terms interchangeably with the same meaning.
Recalling the variable-complexity concept presented above, we observe how physical models may be available in varying fidelity. In cases where several physical models, one being more expensive than another, are used in an optimization process, the term multi-fidelity models is sometimes used.
Below we define the classification and the problem description we will use in this presentation of methods for surrogate modelling and optimization.
2.1.2 Our Approach
In our presentation we will distinguish between methods that are used to generate surrogate models, and methods (sometimes called model management frameworks) that search for the optimum using surrogate models.
We classify surrogate models in the two categories functional models and physical models, defined as follows.
Functional models are models constructed without any particular knowledge of the physical system or governing equations. They are based on algebraic expressions and empirical data; in an optimization context these data arise from the current iterate and possibly some points either visited before in the process or found by sampling the parameter space. Hence, functional models exist only in the context of sampled data obtained from a physical model.
Functional models are generic, and therefore applicable to a wide class of problems, in the sense that, under certain conditions and given enough data points, they will eventually interpolate the underlying model of the data points in some sense. In practice functional models are often very cheap to evaluate.
The methods we consider, which can generate surrogates based on functional models, are: radial basis functions, kriging, neural networks, (low-order) polynomials, wavelets and splines.
The surrogate optimization methods we consider, which can employ functional models as surrogates, are: response surface methodology, trust region approach and pattern search.
Physical models are models based on knowledge about the particular physical system in question. Determining a response from a physical model may e.g. involve numerical solution of differential or integral equations. But in the extreme case, a physical model could be actual measurements of the physical system. Ranges of physical models may exist for the same system, as in the concept of variable- or multi-fidelity physics. Physical models are not generic, as each of them is related to a specific physical system. Hence, reuse of physical models across different problems is rare. In practice physical models are often expensive to evaluate, except in cases where they are based on a significant simplification of the physical system.
The methods we consider, which can generate surrogates based on physical models, are: response correction, multipoint method and space mapping.
The surrogate optimization methods we consider, which can employ physically based surrogates, are: response surface methodology, trust region approach, pattern search and space mapping.
The physical and functional models present extremes: physical models in the one extreme, where a great deal is known about the system, and functional models in the other extreme, where the only assumption is that the response is locally smooth.
In reality a physical system is not completely determined by governing equations, so the practical physical model may contain some empirical elements, e.g. as parameters determined by regression to experimental data. Since such a model is strongly coupled to the underlying physics, we would still call it a physical model.
Problem Definition
The optimization problem we consider in surrogate optimization involves a real valued function $f : \mathbb{R}^n \to \mathbb{R}$, which represents an expensive model. We call $f$ a response function of the expensive model; it is the objective function to be minimized. We will focus on the case where $f$ stems from a physical model, though most of the optimization methods we consider can deal with the case where $f$ stems from a functional model.
We seek a point $x \in \mathbb{R}^n$ which is a solution to the problem
$$\min_x f(x). \qquad (2.1)$$
We will denote an optimizer of this problem by $x^*$. We assume that $f$ is bounded below and uniformly continuously differentiable.
In surrogate optimization $f$ is to be replaced by $s : \mathbb{R}^n \to \mathbb{R}$, a surrogate model, in the search for an optimizer of $f$. The search for $x^*$ is conducted by a method combining sequentially generated surrogates $s_k$, $k = 1, 2, \ldots$, and an iterative scheme that performs a search on $s_k$ to obtain the next iterate. We will denote the iterates by $x_k$, $k = 1, 2, \ldots$. When we say that a method is convergent we imply that the sequence $\{x_k\}$ converges to $x^*$, hence $x_k \to x^*$ for $k \to \infty$.
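The iteration just described can be sketched schematically. The made-up response function, the quadratic surrogates $s_k$, and the safeguards below are illustrative assumptions, not one of the algorithms treated in this thesis; the point is only that each iteration spends one expensive evaluation while the search itself runs on the cheap surrogate.

```python
import numpy as np

# Schematic surrogate-based optimization: minimize a stand-in "expensive"
# f by repeatedly minimizing a cheap quadratic surrogate s_k fitted to
# all f-values collected so far.
def f(x):                       # stand-in for an expensive response function
    return np.sin(x) + 0.1 * x ** 2

xs = [-3.0, 0.0, 3.0]           # initial samples
ys = [f(x) for x in xs]
for k in range(8):
    c = np.polyfit(xs, ys, 2)   # surrogate s_k: quadratic least-squares fit
    if c[0] > 0:
        x_new = -c[1] / (2 * c[0])       # exact minimizer of s_k
    else:                                # fall back if the fit is not convex
        x_new = xs[int(np.argmin(ys))] + 0.1
    x_new = float(np.clip(x_new, -4.0, 4.0))   # keep the step bounded
    xs.append(x_new)
    ys.append(f(x_new))         # the single expensive evaluation per iteration
best = xs[int(np.argmin(ys))]
```

Only eight expensive evaluations are spent beyond the initial samples; all minimization work is done on the surrogates.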
The methods for generating functional models rely on pre-sampled data in order to construct the model. We will assume that $p$ sample points $t_i \in \mathbb{R}^n$, $i = 1, \ldots, p$, have been chosen and that $f$ has been evaluated at these points. So, we have a pre-sampled data set $(t_i, y_i)$, $i = 1, \ldots, p$, where $y_i = f(t_i)$. We briefly discuss statistical strategies for placing these sample points in Section 2.2.1.
The methods for generating physically based surrogates rely on the existence of one or more user provided cheap (lower-fidelity) physical models represented by the response functions $c_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, q$. A cheap physical model may sometimes by itself act as a surrogate for the expensive model.
In the parts of this presentation about space mapping, namely Sections 2.3.3 and 2.4.4, we consider vector valued response functions, e.g. $f : \mathbb{R}^n \to \mathbb{R}^m$, $m > n$. For this purpose we introduce a convex merit function $H : \mathbb{R}^m \to \mathbb{R}$, usually a norm. So (2.1) becomes
$$\min_x H(f(x)). \qquad (2.2)$$
Such vector valued response functions arise e.g. in electrical engineering, where the design problem is defined as minimizing the residual between the signals and given design specifications in some norm, so that the residues constitute the response functions.
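A toy instance of formulation (2.2), invented here for illustration: a one-parameter "design" $x\,w$ is matched to a quadratic specification $w^2$ at $m = 5$ sample points $w_i$, and the merit $H$ is the Chebyshev norm of the residuals.

```python
import numpy as np

# Invented example of (2.2): m = 5 residuals f_i(x) = x*w_i - w_i^2
# between a linear design x*w and the specification w^2, sampled at w_i.
w = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # e.g. frequency samples

def f_vec(x):                  # vector valued response function, m > n (n = 1)
    return x * w - w ** 2

def H(r):                      # convex merit function: Chebyshev norm
    return np.max(np.abs(r))

# Minimize H(f(x)) by brute force over a grid of the single parameter x.
xs = np.linspace(-2.0, 2.0, 4001)
vals = np.array([H(f_vec(x)) for x in xs])
x_best, H_best = xs[np.argmin(vals)], vals.min()
```

For these sample points the minimax solution can be worked out by hand: the active residuals at $w = 0.5$ and $w = 1$ balance at $x = 5/6$ with merit value $1/6$, which the grid search reproduces.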
In the following we present some of the most popular approaches to surrogate modelling and optimization, and show how they relate. We aim to keep the exposition clear by using a simple consistent notation throughout. However, by committing to a simple notation we must accept that we cannot at the same time capture the details of the more specialized approaches presented in the literature.
2.2 Surrogates Based on Functional Models
There is a large number of methods for generating surrogates in the category of functional models. So, we have limited this presentation to an overview of the most commonly used approximations for expensive models, namely the regression models, radial functions and single point approximations.
The regression models cover the broadest and most widely used class of approximations. They are based on algebraic expressions, the so-called basis functions, that are fitted to the pre-sampled data. We will in particular deal with polynomials, response surface models (including methods for design of experiments) and wavelets.
The radial functions cover a class of approximations which are based on combinations of basis functions localized around the pre-sampled points. We will in particular deal with kriging (including DACE), radial basis functions, neural networks and a special class of splines.
The single point approximations are a special class of local models that includes Taylor models (i.e. truncated Taylor series). These approximations are usually not very attractive for approximating expensive functions, due to their local nature. But certain types of approximations in this class have found use in the structural engineering community. We will in particular consider the reciprocal approximations, conservative approximations and posynomial approximations.
2.2.1 Regression Models
Regression is the process of fitting some regression model, represented by the function $s(x; \beta)$, to the pre-sampled data $(t_i, y_i)$, $i = 1, \ldots, p$, of the response function $f$ from the expensive model. The parameters $\beta$ solve the regression or data fitting problem
$$\min_{\hat\beta} \sum_{i=1}^{p} \left( y_i - s(t_i; \hat\beta) \right)^2. \qquad (2.3)$$
Here the Euclidean norm is used, but other norms or merits may be used as well. Another widely used formulation, augmenting problem (2.3), is the weighted least-squares problem. For simplicity, we will only consider the unweighted problem.
A number of statistical methods have been developed to help determine if a regression model yields a good fit to the data. These methods include residual analysis, testing for lack of fit, and analysis of variance (ANOVA). To overcome problems with lack of fit, statisticians rely on methods like transformation of the variables. See [13, 42] for a thorough treatment of these statistical methods. Further, in [11, 19, 33] practical usage of these ideas is illustrated.
The regression model may take many forms. We start by discussing the linear regression models, that is, the regression models where $s(x; \beta)$ is linear in $\beta$. We write $s(x; \beta) = \beta^T v(x)$, where $v : \mathbb{R}^n \to \mathbb{R}^u$ is a vector with $u$ basis functions $v_j(x)$, $j = 1, \ldots, u$.
Response Surface Models
The most frequently used basis functions in linear least-squares regression are low order polynomial regression models. Hence, basis functions of the form
$$v(x) = \begin{cases} 1 & \\ x^{(i)} & i \in \{1, \ldots, n\} \\ x^{(i)} x^{(j)} & i, j \in \{1, \ldots, n\} \end{cases}$$
yielding approximations like
$$s(x) = \begin{cases}
\beta_0 + \beta_1^T x & \text{(i)} \\[1ex]
\beta_0 + \beta_1^T x + \displaystyle\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} \beta_2^{(i,j)} x^{(i)} x^{(j)} & \text{(ii)} \\[1ex]
\beta_0 + \beta_1^T x + x^T \beta_2 \, x & \text{(iii)}
\end{cases} \qquad (2.4)$$
where $\beta_k$, $k = 0, 1, 2$, are the model parameters. The models are called (i) the first-order model, (ii) the first-order model with interaction terms and (iii) the second-order, or quadratic, model.
It is obvious that the models in (2.4) are so simple that they generally will not interpolate the data, especially as it is usually required that the regression problem is over-determined. In fact we may only expect these models to describe a global trend in the functional behaviour.
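As a small illustration, the quadratic model (iii) can be fitted by solving (2.3) as a linear least-squares problem in the basis $v(x)$; the data below are invented and noise-free, so the fit recovers the generating coefficients exactly.

```python
import numpy as np

# Fit the second-order model s(x) = b0 + b^T x + x^T B x in n = 2 variables
# by linear least squares, using the basis v(x) = (1, x1, x2, x1*x2, x1^2, x2^2).
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(40, 2))                 # design sites t_i
y = (1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]                 # responses y_i
     + 0.5 * X[:, 0] * X[:, 1] + 3.0 * X[:, 0] ** 2)

V = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],  # rows are v(t_i)^T
                     X[:, 0] * X[:, 1], X[:, 0] ** 2, X[:, 1] ** 2])
beta, *_ = np.linalg.lstsq(V, y, rcond=None)             # solves (2.3)
```

With noisy data the same call returns the least-squares trend rather than an interpolant, which is exactly the behaviour discussed above.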
Exceptions, where the simple models may be adequate, are for instance in cases where the sampled response is very noisy, see [26] for an example with a noisy aircraft wing design problem, and in cases where the response is so smooth that even a linear model is a valid approximation for a large region of the design space.
More sophisticated regression models are often used, but choosing a well suited regression model for a particular problem requires specific knowledge about the expected behaviour of the system from which the data originate, or at least extensive analysis of the sampled data. We note that more sophisticated models may enlarge the region of acceptable approximation compared to the simple models presented above. However, a drawback of introducing more sophisticated regression models is that even though such a model may interpolate the given data, it is not necessarily good at describing the behaviour between known data points, or at extrapolating outside the region spanned by the data points. This shows, e.g., for higher order polynomial approximations.
In the statistics literature, regression models as those in (2.4), in particular the quadratic model (iii), are associated with the term response surface models. A very popular optimization framework employing response surface models is the response surface methodology, which we present in Section 2.4.1.
First we will discuss some statistical strategies on how to choose the pre-sampled points.
Design of Experiments
The functional models we have considered so far are constructed on the basis of pre-sampled data (t_i, y_i), i = 1,...,p, where t_i ∈ IR^n are called data points or design sites, and y_i = f(t_i) are the responses from the expensive model at the design sites. The process of determining where to place the design sites in the parameter space is in the statistical literature called Design of Experiments (DOE).

In design of experiments the effort is on laying out experiments, i.e. the actual placement of the t_i's, in certain optimal ways. D-optimality is a popular measure related to the determinant of the covariance matrix in question, but other measures exist. Frequently used experimental designs are factorial, central composite and Box-Behnken designs (see references below). Traditionally much of the focus in DOE has been on reducing noise in the experiment, by repeated runs with blocking of factors. However, more and more experiments are conducted using computer models, and DOE for computer experiments needs special attention since the computer models

- are deterministic, i.e. always return the same result when evaluated for the same parameters (the numerical noise is assumed to be negligible), so repeated runs are not needed,

- are unaffected in response by the ordering of the experiment, hence experimental blocking is not needed,

- often are defined over wide parameter ranges, and often in many parameters.

Regarding the last point, computer experimenters often seek to find a complex approximation extending over a wide range of the design variables, hence large regions need to be covered by the experimental design.

These characteristics call for a special type of experimental design called space-filling designs, which aim to exercise all the parameters over their entire ranges. McKay et al. [41] presented an intuitive approach, which has become very popular, called Latin hypercube sampling, from which stochastic space-filling designs are easily obtained, even when a large number of design sites is required. There also exist deterministic experimental designs with space-filling properties. Experimental designs are covered thoroughly in [13, 30, 42], where [13, 42] have special focus on response surface methodology, the topic of Section 2.4.1. A review of methods for design of optimal experiments is given in [28].
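As a concrete illustration of the idea behind Latin hypercube sampling, the routine below is a minimal sketch (our own, not an implementation from the references): each axis of [0,1]^n is split into p equal strata, and every stratum is hit by exactly one of the p design sites.

```python
import numpy as np

def latin_hypercube(p, n, rng=None):
    """Draw p design sites in [0, 1]^n by Latin hypercube sampling:
    each axis is split into p equal strata, every stratum is hit once."""
    rng = np.random.default_rng(rng)
    # One uniformly random point inside each of the p strata, per dimension.
    samples = (np.arange(p)[:, None] + rng.random((p, n))) / p
    # Shuffle the strata independently in every dimension.
    for j in range(n):
        rng.shuffle(samples[:, j])
    return samples

T = latin_hypercube(8, 2, rng=0)
# Projection onto any single axis hits each of the 8 strata exactly once.
assert all(sorted(np.floor(T[:, j] * 8).astype(int)) == list(range(8))
           for j in range(2))
```

The stratification guarantees good one-dimensional projections even for few samples, which is the space-filling property referred to above.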
Wavelets
Wavelets provide a large number of orthonormal basis functions that can be used in linear least-squares regression. The wavelet basis functions are available at different scales, and each of these basis functions has a localized region of support. Commonly used families of wavelet basis functions are the Haar, Daubechies and symmlet families. We present here the discrete wavelet formulation in one dimension, x ∈ IR, and the particularly simple Haar family of basis functions; our presentation is based on [29] and [61].

First consider the piecewise constant function (called the father wavelet or scaling function)

    v̂(x) = 1 for x ∈ [0,1],  0 otherwise,

and the functions

    v_{j,k}(x) = 2^{j/2} v̂(2^j x - k),  k, j ∈ Z_+.
The functions v_{0,k} form an orthonormal basis for functions with jumps at the integers. Let the space spanned by this basis be denoted V_0. Similarly the functions v_{1,k} form an orthonormal basis for a space V_1 ⊃ V_0 of functions piecewise constant on intervals of length 1/2. More generally we have V_2 ⊃ V_1 ⊃ V_0, where V_j is spanned by the 2^j functions v_{j,k}, k = 0, 1, ..., 2^j - 1.
We might represent a function in V_{j+1} by a component in V_j plus the component in the orthogonal complement W_j of V_j in V_{j+1}, written V_{j+1} = V_j ⊕ W_j.
From the mother wavelet

    ŵ(x) = v̂(2x) - v̂(2x - 1)

we can generate the functions

    w_{j,k}(x) = 2^{j/2} ŵ(2^j x - k),  k, j ∈ Z_+.

Then it can be shown [61] that the functions w_{j,k} form an orthonormal basis for W_j.
Notice that since these spaces are orthogonal, all the basis functions v_{j,k} and w_{j,k} are orthonormal. Now V_{j+1} = V_j ⊕ W_j = V_{j-1} ⊕ W_{j-1} ⊕ W_j = ..., so we can make a representation of the form V_j = V_0 ⊕ W_0 ⊕ W_1 ⊕ ... ⊕ W_{j-1}. Assume that we were to construct an interpolation at 2^j data points, so at most 2^j basis functions are needed for interpolation. We could use the 2^j functions in V_j, or alternatively the 2^j - 1 functions in W_0 ⊕ W_1 ⊕ ... ⊕ W_{j-1} and one in V_0.

If a non-interpolating approximation is needed, one can choose to use only a subset of the basis functions, e.g. approximation at level j starting at level i, i < j, gives the 2^j - 2^i basis functions in V_i ⊕ W_i ⊕ W_{i+1} ⊕ ... ⊕ W_{j-1}.
If we assume that the parameters x of our problem (2.1) are scaled to the interval [0,1], the resulting wavelet approximation can be written as

    s(x) = α^T ( v_{i,0}(x), ..., v_{i,2^i-1}(x), w_{i,0}(x), ..., w_{i,2^i-1}(x), ..., w_{j-1,0}(x), ..., w_{j-1,2^{j-1}-1}(x) )^T,

for the level j approximation starting at level i. The regression coefficients α are usually found by least-squares.
A common procedure is to apply a threshold to discard or filter out the smaller coefficients, and thereby reduce the number of basis functions in the final approximation. Choosing a good level of approximation is difficult in general, and out of the scope of this presentation.

We should note that in the context of engineering design, the Haar family of basis functions may not be the best suited, due to their non-differentiable form, hence one of the other, smoother families of wavelets should be chosen instead. The concept of wavelets is thoroughly dealt with in [61]; a classical reference on wavelets is [20].
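The Haar construction above is easy to sketch in code. The following is a minimal illustration (our own; the test function sin(2πx) is a stand-in for sampled responses): it builds the 2^j basis functions of V_j = V_0 ⊕ W_0 ⊕ ... ⊕ W_{j-1} and fits the coefficients by least-squares.

```python
import numpy as np

def father(x):                      # scaling function: indicator of [0, 1)
    return ((x >= 0) & (x < 1)).astype(float)

def mother(x):                      # Haar mother wavelet: v(2x) - v(2x - 1)
    return father(2 * x) - father(2 * x - 1)

def haar_design(x, j):
    """Columns: v_{0,0} and w_{l,k}(x) = 2^{l/2} w(2^l x - k) for l < j,
    i.e. the 2^j basis functions spanning V_j = V_0 + W_0 + ... + W_{j-1}."""
    cols = [father(x)]
    for l in range(j):
        for k in range(2 ** l):
            cols.append(2 ** (l / 2) * mother(2 ** l * x - k))
    return np.column_stack(cols)

x = np.linspace(0, 1, 200, endpoint=False)
y = np.sin(2 * np.pi * x)                     # stand-in for sampled responses
B = haar_design(x, 4)                         # 2^4 = 16 basis functions
alpha, *_ = np.linalg.lstsq(B, y, rcond=None)
s = B @ alpha                                 # piecewise-constant approximation
```

Since V_2 ⊂ V_4, refitting at a higher level can only decrease the least-squares residual, which is easily verified numerically.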
2.2.2 Radial Functions
A particularly successful class of interpolation methods is based on a model of the form

    s(x) = v(x)^T α + b(x)^T β,    (2.5)

where v is a vector with basis functions (as above) and α is the solution to a generalized linear least-squares regression problem, which we will introduce later. The second part is a radial model, which consists of the function b : IR^n → IR^p, a vector of radial functions

    b_j(x) = φ(||x - t_j||),  j = 1,...,p,    (2.6)

and the coefficients β ∈ IR^p.

A radial function φ depends only on the radial distance from the origin, or in our case the distance to the design points. We will use the notation φ_j(x) = φ(||x - t_j||), j = 1,...,p. Various distance measures are used in practice, but the most popular is the Euclidean norm. The nature of φ and β will become clearer to the reader as we present some radial function approximation methods in the following.
Kriging
The method of kriging [40] is very popular in the geo-statistical community. The general approximation is s(x) = λ(x)^T y, where the function λ is derived in the following, and y is the vector with pre-sampled values of f.

Kriging models decompose the function f into two parts. The first part is a linear regression model related to a global trend in the data, as the response surface models in Section 2.2.1. The second part is a function z(x), being the deviation between f and the regression model. Hence the interpolation conditions for the kriging model are

    f(t_i) = y_i = s(t_i) = λ(t_i)^T y = v(t_i)^T α + z(t_i),

for i = 1,...,p.

In statistical terms, see [49], z is a stochastic function, with mean E[z(x)] = 0 and variance E[z(x)^2] = σ^2, sampled along a suitable path. We consider z to be a residual function, which we will approximate using a radial model, i.e. the last term in (2.5).
Let V be the matrix where the ith column is the ith basis function v_i evaluated at the design sites, v_i(t_j), i = 1,...,u, j = 1,...,p. Let Z be the vector containing the residuals at the sample points, z(t_j), j = 1,...,p.

For any x, we have

    s(x) - f(x) = λ(x)^T y - f(x)
                = λ(x)^T (Vα + Z) - (v(x)^T α + z(x))
                = λ(x)^T Z - z(x) + (V^T λ(x) - v(x))^T α.
Requiring V^T λ(x) = v(x), the last term above vanishes, and we can now determine the mean squared error, which is the expected approximation variance,

    E[(s(x) - f(x))^2] = E[(λ(x)^T Z - z(x))^2]
                       = E[z(x)^2 + λ(x)^T Z Z^T λ(x) - 2 λ(x)^T Z z(x)]
                       = σ^2 (1 + λ(x)^T C λ(x) - 2 λ(x)^T b(x)),    (2.7)

where σ^2 is the process variance and C ∈ IR^{p×p} is a symmetric matrix whose (i,j)th element is φ(||t_i - t_j||), i, j = 1,...,p. In statistical terms the elements of C describe the covariance between design sites.
Now we determine the function λ(x), for fixed x, by the quadratic programming problem minimizing the expected approximation variance (2.7),

    min_{λ(x)}  (1/2) λ(x)^T C λ(x) - λ(x)^T b(x)
    s.t.  V^T λ(x) = v(x).    (2.8)

Using this formulation to derive λ(x), the approximation is called kriging with a trend. When the problem (2.8) is unconstrained, the approximation is called simple kriging and the solution is λ(x) = C^{-1} b(x). When v(x) = 1 for all x, the approximation is called ordinary kriging.
The Lagrangian function corresponding to (2.8) is

    L(λ(x), μ) = (1/2) λ(x)^T C λ(x) - λ(x)^T b(x) + μ^T (V^T λ(x) - v(x)),

where μ are the Lagrange multipliers. The necessary conditions for an optimal solution are

    ∇_{λ(x)} L(λ(x), μ) = C λ(x) - b(x) + V μ = 0
    ∇_μ L(λ(x), μ) = V^T λ(x) - v(x) = 0.
From these equations we may find the solution by solving a linear system,

    ( C    V ) ( λ(x) )   ( b(x) )
    ( V^T  0 ) (  μ   ) = ( v(x) )    (2.9)

which gives

    ( λ(x) )   ( C^{-1}(I - V U V^T C^{-1})   C^{-1} V U ) ( b(x) )
    (  μ   ) = ( U V^T C^{-1}                 -U         ) ( v(x) ),

where U = (V^T C^{-1} V)^{-1}, and assuming that C is symmetric positive definite. Hence we can write

    ( λ(x) )   ( R    W  ) ( b(x) )
    (  μ   ) = ( W^T  -U ) ( v(x) ),

where R and W are defined by the solution above.
The kriging approximation then is

    s(x) = λ(x)^T y = b(x)^T R y + v(x)^T W^T y = b(x)^T β + v(x)^T α,    (2.10)

where

    α = (V^T C^{-1} V)^{-1} V^T C^{-1} y
    β = C^{-1} (y - V α)

are independent of x. Note that α is the generalized least-squares solution to the linear regression problem. Having this formulation, we need only calculate v(x) and b(x) and the sum of two dot products for every evaluation of s(x).

We could have derived α and β in another way, namely by considering the problem

    min_β  (1/2) β^T C β - β^T y
    s.t.  V^T β = 0.    (2.11)
We should note that we have not been able to motivate the quadratic problem (2.11) in the same way as (2.8), which minimizes the expected approximation variance. The Lagrangian function corresponding to (2.11) is

    L(β, α) = (1/2) β^T C β - β^T y + α^T V^T β,

where α are the Lagrange multipliers. The necessary conditions for an optimal solution are

    ∇_β L(β, α) = C β - y + V α = 0
    ∇_α L(β, α) = V^T β = 0.

From these equations we may find the solution by solving a linear system,

    ( C    V ) ( β )   ( y )
    ( V^T  0 ) ( α ) = ( 0 )    (2.12)

    =>  ( β )   ( R    W  ) ( y )
        ( α ) = ( W^T  -U ) ( 0 ),

again assuming that C is symmetric positive definite; R, W and U are defined as above. Deriving the formulation in this way is by some called dual kriging; the simple case with V = 0 is e.g. described in [48].
When the v_i are at most first order polynomials, the constraint V^T β = 0 in (2.11) corresponds to the requirement that |s(x)| = O(|x|) [46].

For certain choices of C, the kriging approach relates to approximation using natural cubic splines; [62] shows this relation in one dimension. The relation between the thin-plate spline formulation and kriging is shown in [25], and presented in Section 2.2.2 below.

First we will discuss how statisticians have used the kriging approach in approximation of computer based models.
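The saddle-point system (2.12) can be solved directly. The following is a minimal sketch (our own, not the thesis's implementation) of ordinary kriging with a Gaussian correlation function; the parameter theta is here simply fixed, not estimated.

```python
import numpy as np

def kriging_fit(T, y, theta=10.0):
    """Dual-kriging fit: solve the saddle-point system
        [C V; V^T 0] [beta; alpha] = [y; 0]
    with Gaussian correlation phi(r) = exp(-theta r^2) and a constant
    trend v(x) = 1 (ordinary kriging)."""
    p = len(T)
    D = T[:, None, :] - T[None, :, :]
    C = np.exp(-theta * np.sum(D ** 2, axis=2))   # correlation matrix
    V = np.ones((p, 1))                            # constant regression basis
    K = np.block([[C, V], [V.T, np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([y, [0.0]]))
    beta, alpha = sol[:p], sol[p:]
    def s(x):
        b = np.exp(-theta * np.sum((T - x) ** 2, axis=1))
        return b @ beta + alpha[0]                 # b(x)^T beta + v(x)^T alpha
    return s

T = np.array([[0.0], [0.3], [0.7], [1.0]])
y = np.sin(3 * T[:, 0])
s = kriging_fit(T, y)
assert all(abs(s(t) - yi) < 1e-8 for t, yi in zip(T, y))  # interpolates
```

Interpolation at the design sites follows from Cβ + Vα = y together with V^T β = 0, exactly as in the derivation of (2.10).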
DACE  Design and Analysis of Computer Experiments (DACE), named after a seminal paper by Sacks et al. [49] in 1989, is a statistical framework for dealing with kriging approximations to (complex or expensive) computer models. Kriging, in particular the DACE framework, has gained wide acceptance in many engineering communities, e.g. in mechanical, aerospace and electrical engineering, as a method for approximating expensive functions, see e.g. [32, 33, 49, 50, 57].

In the DACE framework the kriging correlation model is often presented as a radial function of the form

    φ_j(x) = ∏_{i=1}^n φ(θ, |x^(i) - t_j^(i)|),    (2.13)

hence a product of one-dimensional radial functions or, in statistical terms, correlation functions.
Examples of frequently used kriging correlation functions are [49]

    φ(θ, |x^(i) - t_j^(i)|) =
      exp(-θ^(i) (x^(i) - t_j^(i))^2)                                      (i)
      max{0, 1 - θ^(i) |x^(i) - t_j^(i)|}                                  (ii)
      max{0, 1 - θ^(i) (x^(i) - t_j^(i))^2 + θ^(i+n) |x^(i) - t_j^(i)|^3}  (iii)
                                                                           (2.14)

Each of these is also available in an isotropic version, i.e. where θ is constant for all coordinate directions.
The DACE framework is implemented in a Matlab toolbox, see [36, 37]. This particular implementation takes great care in solving the system (2.9) in a numerically safe way. In many other implementations the matrix C is naively inverted, and since it is often ill-conditioned, numerical errors are likely to dominate the results. Further, the implementation includes a method for fitting the radial functions to data, i.e. finding θ minimizing a certain merit function, namely maximum likelihood estimation, i.e. least-squares when assuming a Gaussian process.

Often, when using maximum likelihood estimation, a Gaussian process is assumed. It is then vital for the approximation to determine basis functions v such that the residuals y - Vα follow the normal distribution; [49, 18, 36, 37] elaborate further on this subject.

From the viewpoint of Bayesian statistics the choice of correlation function corresponds to a Bayesian prior on the shape or smoothness of the function. In this view kriging, and thereby DACE, is a Bayesian method, see e.g. [49, 18].
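The product form (2.13) with the Gaussian factor (i) of (2.14) collapses to a single exponential of a θ-weighted squared distance, which the short sketch below (our own illustration) makes explicit for both the anisotropic and the isotropic case.

```python
import numpy as np

def gauss_corr(theta, x, t):
    """Anisotropic Gaussian correlation, (2.13) with factor (i) of (2.14):
    phi_j(x) = prod_i exp(-theta_i (x_i - t_i)^2)."""
    theta = np.asarray(theta, float)
    return np.prod(np.exp(-theta * (x - t) ** 2))

x = np.array([0.2, 0.9])
t = np.array([0.4, 0.5])
aniso = gauss_corr([5.0, 0.1], x, t)   # direction 1 matters far more
iso   = gauss_corr([1.0, 1.0], x, t)   # isotropic: one common theta
# The product of 1-D factors equals one exponential of the weighted
# squared distance, which makes (2.13) behave like a radial function.
assert np.isclose(aniso, np.exp(-(5.0 * 0.04 + 0.1 * 0.16)))
assert np.isclose(iso, np.exp(-0.2))
```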
Radial Basis Function Approximations
The radial basis function approximation is as in (2.5), and is thus identical to kriging. However, in the literature there is a difference in the way the functions v and b are chosen. In radial basis function approximations v is a vector of polynomials of at most order n, and b is a vector of radial (basis) functions, using the Euclidean norm as distance measure. The coefficients α and β are determined as described above. Examples of commonly used radial basis functions are

    φ_j(x) =
      ||x - t_j||_2                            (i)
      ||x - t_j||_2^3                          (ii)
      ||x - t_j||_2^2 log ||x - t_j||_2        (iii)
      sqrt(||x - t_j||_2^2 + σ^2)              (iv)
      exp(-||x - t_j||_2^2 / σ^2)              (v)
                                               (2.15)
where σ is a fixed constant, provided by the user. As in the DACE framework mentioned in the previous section, σ may be fitted to the data, provided a suitable merit function. The set of radial functions in (2.15) includes both functions which grow with distance and functions which vanish with distance, unlike the kriging correlation functions in (2.14), which all vanish with distance.
Some of these radial functions are related to certain Green's functions for partial differential equations, specifically the partial differential equations L φ_j(x) = δ(x - t_j) for the operators L = ∇^2 and L = ∇^4. In Table 2.1 the Green's functions are presented for the one to three dimensional cases. The association with the radial functions in (2.15) is evident.

          L = ∇^2                L = ∇^4
    1D    |x - t_j|              |x - t_j| (x - t_j)^2
    2D    log ||x - t_j||_2      ||x - t_j||_2^2 log ||x - t_j||_2
    3D    ||x - t_j||_2^{-1}     ||x - t_j||_2

    Table 2.1: Green's functions φ_j(x), solutions to L φ_j(x) = δ(x - t_j).

For example, the Green's function φ_j(x) = ||x - t_j||_2^2 log ||x - t_j||_2 is that spline which solves the minimal surface problem for a thin plate, with a point load at t_j, hence its name thin-plate spline. Roach [47] provides a thorough treatment of Green's functions.
Guttmann [27] presents a method for global optimization using radial basis function approximations. Guttmann states that the two main advantages of radial basis function approximations are an available measure of so-called bumpiness, and that uniqueness of an interpolant is achieved under very mild conditions on the placement of the design sites. The bumpiness measure may be used in a merit function for determining the placement of succeeding design sites.

Note that choosing the function (v) in (2.15) makes the radial basis function approximation the same as the DACE kriging approximation with the isotropic version of the correlation function (i) in (2.14). However, the constant σ is provided by the user in the radial basis function case, and often found by maximum likelihood estimation in the DACE case, as described previously.

Powell [46] provides a thorough treatment of radial basis function approximations, and contains many references to other works.
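A radial basis function interpolant of the form (2.5) can be set up in the same way as the kriging system (2.12). The sketch below (our own illustration) uses the cubic radial function (ii) of (2.15) with a linear polynomial part v(x) = (1, x) and the side condition V^T β = 0.

```python
import numpy as np

def rbf_fit(T, y):
    """RBF interpolant s(x) = v(x)^T alpha + b(x)^T beta with the cubic
    radial function phi(r) = r^3, a linear polynomial part v(x) = (1, x),
    and the side condition V^T beta = 0."""
    p, n = T.shape
    r = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    Phi = r ** 3                                   # radial part
    V = np.hstack([np.ones((p, 1)), T])            # linear polynomial part
    K = np.block([[Phi, V], [V.T, np.zeros((n + 1, n + 1))]])
    sol = np.linalg.solve(K, np.concatenate([y, np.zeros(n + 1)]))
    beta, alpha = sol[:p], sol[p:]
    def s(x):
        b = np.linalg.norm(T - x, axis=1) ** 3
        return b @ beta + alpha[0] + alpha[1:] @ x
    return s

T = np.linspace(0, 1, 6)[:, None]
y = np.cos(4 * T[:, 0])
s = rbf_fit(T, y)
assert all(abs(s(t) - yi) < 1e-7 for t, yi in zip(T, y))
```

In one dimension this construction is essentially the natural cubic spline interpolant mentioned above in connection with [62].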
Multivariate Adaptive Regression Splines
Another form of radial basis function approximation is the multivariate adaptive regression splines, see e.g. [29]. We will present the approximation here for the one dimensional case, x ∈ IR.

The approximation is almost as in (2.5). However, v is always the zero order polynomial, b is a vector function b : IR → IR^q, and β ∈ IR^q. Usually q < p, hence the approximation is not necessarily interpolating the data. The elements of b, b_i = φ_i(x), i = 1,...,q, are radial functions,

    φ_i(x) = ∏_{j ∈ M_i} ξ_j(x),   M_i ⊆ {1,...,p},

with

    ξ_j(x) ∈ { |x - t_j|_+ , |t_j - x|_+ },  j = 1,...,p,

where |·|_+ = max{·, 0}. So, each φ_i is a linear radial function or a product of two or more such functions.

The reader should note the structural similarity with the DACE kriging approximations using the radial function (ii) in (2.14). The tricky part is, as for other approximation methods, in constructing the functions φ_i, i = 1,...,q; one strategy is suggested in [29, p. 284].
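The hinge functions and their products are easy to write down. Below is a minimal illustration (our own; knot positions and the test function are arbitrary) of a tiny MARS-style basis fitted by least squares, with q = 4 < p so the fit is a regression rather than an interpolation.

```python
import numpy as np

def hinge(x, t, sign):
    """xi_j(x): |x - t|_+ for sign=+1, |t - x|_+ for sign=-1."""
    return np.maximum(sign * (x - t), 0.0)

x = np.linspace(0, 1, 50)
y = np.abs(x - 0.5) + 0.2 * x          # stand-in response
B = np.column_stack([
    np.ones_like(x),                   # zero order polynomial part v
    hinge(x, 0.5, +1),                 # |x - 0.5|_+
    hinge(x, 0.5, -1),                 # |0.5 - x|_+
    hinge(x, 0.2, +1) * hinge(x, 0.8, -1),  # product of two hinges
])
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
s = B @ beta
# y lies exactly in the span of the first three columns, so the fit is exact.
assert np.max(np.abs(s - y)) < 1e-10
```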
Neural Networks
Neural networks are nonlinear regression models [29]. The most widely used is the single hidden layer back-propagation network, also called the single layer perceptron. The approximation is

    s(x) = σ(θ_0 + θ_1 x)^T θ_2 + θ_3,

where the vector function σ : IR^n → IR^u contains the so-called activation functions. The large number of constants θ_0 ∈ IR^n, θ_1 ∈ IR^{n×n}, θ_2 ∈ IR^u and θ_3 ∈ IR are determined by nonlinear least-squares regression, as in (2.3), assuming that the data set is sufficiently large.

A popular choice of activation function is the sigmoid function

    σ(z) = 1 / (1 + e^{-z}).
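A forward evaluation of this model is a few lines of code. The sketch below is our own illustration; for the dimensions to match we take a hidden dimension u, with θ_0 ∈ IR^u and θ_1 ∈ IR^{u×n}, and random (unfitted) constants, since the fitting itself is the nonlinear regression (2.3).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(x, th0, th1, th2, th3):
    """Single hidden layer model s(x) = sigmoid(th0 + th1 x)^T th2 + th3."""
    return sigmoid(th0 + th1 @ x) @ th2 + th3

rng = np.random.default_rng(1)
n, u = 2, 5                      # input and hidden dimensions (illustrative)
th0 = rng.standard_normal(u)
th1 = rng.standard_normal((u, n))
th2 = rng.standard_normal(u)
th3 = 0.0
x = np.array([0.3, -0.7])
assert sigmoid(0.0) == 0.5       # the sigmoid is centred at zero
assert np.isfinite(network(x, th0, th1, th2, th3))
```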
Other popular choices of activation functions include the radial basis functions (2.15). The latter case is called a radial basis function neural network, and is exactly the same approximation as the formulation in (2.5), with v being the zero order polynomial. However, instead of the procedure derived in Section 2.2.2 for determining the model parameters, the model parameters of radial basis function neural networks are determined by least-squares regression [31].

For the purpose of determining the θ's by regression, i.e. solving (2.3), several well-known optimization algorithms have been reinvented in the neural network community. One of the first methods to re-appear, and probably still the most widely used, is the steepest-descent algorithm, which neural network advocates have named back-propagation learning. Of course, a range of standard optimization algorithms, see [23], can be used to solve the regression problem (2.3), and steepest-descent is not the most obvious choice in that respect.

There is no reason to believe that neural networks should be able to provide better approximations than the other methods mentioned in this work. In fact, we should stress that the frequently used viewpoint of neural networks as convenient, magic, black-box approximations may mislead the user into overlooking the possibility of posing ill-conditioned problems, where the number of parameters to be determined by regression (the θ's) is larger than the provided data set, not to mention a related problem, namely the serious risk of over-fitting.
2.2.3 Single Point Models
We will now discuss a class of models that are based on information obtained from a single point, usually the current iterate x_k in an optimization procedure. Taylor models, i.e. truncated or approximate Taylor series, are prominent members of this class. However, the low-order Taylor models that are feasible to construct in practice are only valid as approximations in a small region around x_k.

In the following we consider some alternative models, also based on information from the current iterate only. Under certain conditions they have a larger region of validity than Taylor based models, which makes them tractable as surrogates for expensive models.
Reciprocal and conservative approximations
Consider the approximation

    s_k(x) = f(x_k) + Σ_{i=1}^n ∇_{x^(i)} f(x_k) (x^(i) - x_k^(i)) r_i(x^(i), x_k^(i)),

where x = (x^(1),...,x^(n))^T. We see that enforcing the first order requirement s_k'(x_k) = f'(x_k), it follows that r_i(x_k^(i), x_k^(i)) = 1. The choice r_i(x^(i), x_k^(i)) = 1 yields the (linear) first order Taylor series approximation.
In structural optimization the alternative form r_i(x^(i), x_k^(i)) = x_k^(i)/x^(i) is often used; the approximation is then called a reciprocal approximation [10]. A significant class of constraints in structural engineering can in this way be transformed from nonlinear to linear equations (at the expense of introducing nonlinearity into the objective function). The reciprocal approximation may become unbounded if any of the variables approach zero. An alternative is the modified reciprocal approximation, r_i(x^(i), x_k^(i)) = (x_k^(i) + c^(i)) / (x^(i) + c^(i)), where the values of the c^(i)'s are typically small compared to representative values of the corresponding x^(i)'s. Another alternative is the conservative approximation, having

    r_i(x^(i), x_k^(i)) = 1                 if x_k^(i) ∇_{x^(i)} f(x_k) > 0,
                          x_k^(i) / x^(i)   otherwise.    (2.16)
Following [1], the conservative approximation has the attractive feature of leading to a convex programming problem, and thus is amenable to solution by nonlinear programming techniques that take advantage of the dual problem. We note that the reciprocal and conservative approximations destroy the linearity of the approximation, and thus the possibility of using it directly with a sequential linear programming algorithm.
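The conservative approximation is easy to sketch. The code below is a minimal illustration (function names and the toy objective are ours) of the first order model with the per-variable choice (2.16); at x_k it interpolates f, and since r_i(x_k, x_k) = 1 the slopes match there as well.

```python
import numpy as np

def conservative_surrogate(f, grad_f, xk):
    """s_k(x) = f(x_k) + sum_i g_i (x_i - x_i^k) r_i, with r_i chosen by
    (2.16): r_i = 1 when x_i^k g_i > 0, and r_i = x_i^k / x_i otherwise."""
    fk, gk = f(xk), grad_f(xk)
    def s(x):
        r = np.where(xk * gk > 0, 1.0, xk / x)
        return fk + np.sum(gk * (x - xk) * r)
    return s

f = lambda x: x[0] ** 2 / x[1]              # toy objective, valid for x > 0
grad_f = lambda x: np.array([2 * x[0] / x[1], -x[0] ** 2 / x[1] ** 2])
xk = np.array([1.0, 2.0])
s = conservative_surrogate(f, grad_f, xk)
assert np.isclose(s(xk), f(xk))             # interpolation at x_k
h = 1e-6                                     # slope check by central difference
num = (s(xk + np.array([0, h])) - s(xk - np.array([0, h]))) / (2 * h)
assert abs(num - grad_f(xk)[1]) < 1e-4
```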
Posynomial approximation
The posynomial approximation of the form

    s_k(x) = f(x_k) ∏_{i=1}^n ( x^(i) / x_k^(i) )^{γ^(i)},

where

    γ = f'(x_k) / f(x_k),

can be treated using geometric programming techniques, which actually require this form. According to [1], geometric programming techniques will, under appropriate conditions and when applied to a posynomial approximation of the original problem, converge to a stationary point of the original problem (2.1).
2.2.4 Summary
In the last sections we have reviewed methods for generating functional models that can be used as surrogate models for expensive functions. We have presented the most commonly used of these methods, namely regression models, radial functions and single point models.

The regression models consist of regression functions and parameters. The parameters are found by fitting the regression functions to pre-sampled data using least-squares. The particular methods we have discussed are the response surface models, wavelets and neural networks. The problem of positioning the pre-sampled data has only briefly been covered in this presentation. In fact, it is a discipline in itself, often referred to in the literature as design of experiments.

The radial functions are used in a class of models called kriging models or radial basis function approximations. Here, a regression model is combined with a radial model, which consists of radial functions and parameters. In the presentation we derived how to simultaneously determine the parameters of the regression model and the radial model. We also briefly discussed a related method, namely the method of multivariate adaptive regression splines.

The single point models are a class of models valid only in a local region around a single point. We have presented the reciprocal approximation, the conservative approximation and the posynomial approximation. In the literature the valid area of these approximations is claimed to be wider than that of a corresponding Taylor model. Models of this type have gained popularity in the mechanical engineering community.
2.3 Surrogates Based on Physical Models
We will now turn the attention to surrogate models which are specific to the particular physical system in question. We will assume that, besides the response function f from the expensive model, a user-provided, cheap (lower-fidelity) physical model with the response function c : IR^n → IR is available.

For some problems, the cheap model may itself act as a surrogate for the expensive model. However, we cannot in general expect that any given cheap model approximates the expensive model well. Often, the deviations between two physical models can be attributed to problems with incorrect alignment of the responses or of the parameter spaces.

In the case of incorrectly aligned response functions, we could apply a response correction g on c and obtain the surrogate g(c(x)). The response correction could e.g. be a simple scaling function. By imposing conditions on the form and behaviour of the response correction g, we can make the surrogate interpolate f and its gradients at given points. Two methods of this type are presented in the next section.

One method for performing response scaling, called the multipoint method, is presented in Section 2.3.2. Here, a number of cheap models, related to subsystems of the original physical system, are used as regression functions in a data fitting problem. The resulting regression parameters can be viewed as scaling parameters of the cheap models.

In the case of incorrectly aligned parameter spaces, we could apply a transformation of the cheap model parameter space, say a mapping P, and obtain the surrogate c(P(x)). By imposing conditions on the mapping P, we can make the surrogate approximate f and its gradients at given points. Space mapping techniques are discussed in Section 2.3.3.
2.3.1 Response Correction
Scaling of response functions is a means of correcting physical models which lack approximation abilities like interpolation. Consider surrogates defined by a response correction g : IR → IR of the response function c from the physical model, hence

    s(x) = g(c(x)).    (2.17)

Such a response correction g could be identified by imposing conditions on s, e.g. that at a given point x_k the function values and the gradients match, i.e. s(x_k) = f(x_k) and s'(x_k) = f'(x_k). Two methods of this type are considered below. The first method, the β-correlation method, performs a simple scaling of c; the second method is more general and seeks to approximate an assumed function g for which s(x) = f(x).
The β-correlation Method
We now consider the case of response correction (2.17) where g is a simple scaling of c, hence g(c(x)) = a(x) c(x), where a(x) is the scaling function. Such a response scaling method, called the β-correlation method, is presented in [16] as a generic approach to correcting a lower-fidelity model response by scaling. The method assumes the existence of a smooth function a(x) for which

    a(x) c(x) = f(x)  and  ∇(a(x) c(x)) = f'(x).

Taylor's theorem provides the following approximation,

    g(c(x + h)) ≈ ( a(x) + a'(x)^T h ) c(x + h).    (2.18)

Using (2.18), at a given point x_k we obtain the approximation

    s_k(x) = ( a(x_k) + a'(x_k)^T (x - x_k) ) c(x)
           = ( f(x_k)/c(x_k) + ( (f'(x_k) c(x_k) - c'(x_k) f(x_k)) / c(x_k)^2 )^T (x - x_k) ) c(x),

which performs a scaling of c such that s_k and s_k' interpolate f and f' at x_k. If f' is too expensive to calculate, an approximation can be used [1], e.g. obtained using a secant method. Alternatively, the derivatives a'(x_k) could be approximated directly by a secant method.

A shortcoming of the β-correlation method is that if c vanishes at a point x_k, the scaling is undefined at this point. The more general method presented next mends this shortcoming.
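The first order scaling above can be sketched in a few lines. The following is a minimal one-dimensional illustration (our own; sin and the identity are stand-ins for the expensive and cheap model) verifying that s_k interpolates both f and f' at x_k.

```python
import numpy as np

def beta_correlation(f, df, c, dc, xk):
    """Scaled surrogate s_k(x) = [a(x_k) + a'(x_k)(x - x_k)] c(x), with
    a = f/c, so s_k and s_k' interpolate f and f' at x_k (1-D sketch)."""
    a  = f(xk) / c(xk)
    da = (df(xk) * c(xk) - dc(xk) * f(xk)) / c(xk) ** 2   # quotient rule
    return lambda x: (a + da * (x - xk)) * c(x)

f,  df = np.sin, np.cos                      # stand-in expensive model
c,  dc = (lambda x: x), (lambda x: 1.0)      # stand-in cheap model
xk = 1.2
s = beta_correlation(f, df, c, dc, xk)
assert np.isclose(s(xk), f(xk))              # function values match
h = 1e-6                                      # slopes match at x_k
assert abs((s(xk + h) - s(xk - h)) / (2 * h) - df(xk)) < 1e-6
```

Note that the construction breaks down when c(xk) is zero, which is exactly the shortcoming discussed above.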
General Response Correction
A more general approach to response correction (2.17) is to assume the existence of a smooth function g for which

    f(x) = g(c(x)),    (2.19)

and then approximate this function. We will show how to make a first order approximation to g assuming knowledge of the first derivatives of f; thereafter we show how a secant method can be used instead, not requiring knowledge of f'.

Taylor's theorem provides the following approximation,

    g(c(x + h)) ≈ g(c(x)) + g'(c(x)) [c(x + h) - c(x)].    (2.20)

At a point x_k, applying the interpolation conditions s(x_k) = f(x_k) and s'(x_k) = f'(x_k) on (2.20), we obtain the surrogate

    s_k(x) = f(x_k) + g'(c(x_k)) [c(x) - c(x_k)],    (2.21)

with g'(c(x_k)) c'(x_k) = f'(x_k). In practice we only expect (2.19) to hold approximately; we can then choose g'(c(x_k)) as the solution of the linear least-squares problem

    min_{a ∈ IR}  || a c'(x_k) - f'(x_k) ||_2.

Similar to the β-correlation method described above, if f'(x_k) is too expensive to calculate, an approximation, e.g. obtained using a secant method, can be used.

An alternative approach is to sequentially approximate the gradient g'(c(x)) by scalars a_k ∈ IR, k = 0, 1, ..., which, for a given sequence of points {x_k}, obey the secant condition

    f(x_k) - f(x_{k-1}) = a_k [c(x_k) - c(x_{k-1})],    (2.22)

where a_0 ∈ IR is a user-provided initial approximation to g'(c(x_0)).

The surrogate becomes

    s_k(x) = f(x_k) + a_k [c(x) - c(x_k)],    (2.23)

where a_k is found using (2.22). The user has to provide an initial approximation a_0; the most obvious choice is to assume no scaling, hence let a_0 = 1.

The advantage of the surrogate in (2.23) over (2.21) is obvious, since for most practical problems we cannot rely on the expensive response derivatives f'(x_k) being available.
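The secant-based surrogate (2.22)-(2.23) needs no derivatives at all, only function values along the iterates. A minimal sketch (our own; the model functions are stand-ins) follows.

```python
import numpy as np

def secant_correction(f, c, xs, a0=1.0):
    """Sequence of surrogates s_k(x) = f(x_k) + a_k [c(x) - c(x_k)], with
    a_k from the secant condition (2.22) and the default start a_0 = 1."""
    a = a0
    surrogates = []
    for k, xk in enumerate(xs):
        if k > 0:
            dc = c(xk) - c(xs[k - 1])
            if abs(dc) > 1e-12:              # keep the previous a if c is flat
                a = (f(xk) - f(xs[k - 1])) / dc
        fk, ck = f(xk), c(xk)
        surrogates.append(lambda x, fk=fk, ck=ck, a=a: fk + a * (c(x) - ck))
    return surrogates

f = lambda x: np.exp(x)          # stand-in expensive model
c = lambda x: 1.0 + x            # stand-in cheap model
xs = [0.0, 0.5, 1.0]
ss = secant_correction(f, c, xs)
# Every surrogate interpolates f at its own point x_k.
assert all(np.isclose(s(xk), f(xk)) for s, xk in zip(ss, xs))
```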
2.3.2 The Multipoint Method
The multipoint method proposed by Toropov in [55], and further described in [54, 56], is a method for creating physics based regression functions for systems that can be partitioned into individual subsystems.

The multipoint method constructs an approximation based on partitioning the physical system into, say q, individual subsystems, which again are described by empirical expressions or known analytical solutions c_i, i = 1,...,q. The approximation in its simplest form is

    s(x) = β^(0) + β^(1) c_1(x) + ... + β^(q) c_q(x) + Σ_{j≥1} β^(j+q+1) x_s^(j),    (2.24)

where c_i is based on the physics of the ith subsystem. The parameters β are model parameters, which are determined by least-squares regression using pre-sampled data, as in (2.3). The vector x_s is a subset of the design variables, x_s ⊆ x, that have a global influence on the physical system. The idea is that each subsystem may depend only on a subset of the design variables, and that only a few (global) variables are related to the system behaviour as a whole.
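The linear fit (2.24) is an ordinary least-squares problem with the subsystem responses as regression functions. The sketch below is our own illustration with two hypothetical subsystem models; the recovered β's are exactly the scalings used to generate the "expensive" response.

```python
import numpy as np

# Minimal sketch of the multipoint fit (2.24): cheap subsystem responses
# c_1, c_2 act as regression functions, and the scalings beta are found
# by least squares on pre-sampled data of the expensive response f.
c1 = lambda x: x[0] ** 2                          # hypothetical subsystem models
c2 = lambda x: np.sin(x[1])
f  = lambda x: 1.0 + 2.0 * c1(x) + 0.5 * c2(x)    # "expensive" response

rng = np.random.default_rng(0)
X = rng.random((20, 2))
y = np.array([f(x) for x in X])
B = np.column_stack([np.ones(20),
                     [c1(x) for x in X],
                     [c2(x) for x in X]])
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
assert np.allclose(beta, [1.0, 2.0, 0.5])         # scalings are recovered
```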
Instead of using a simple linear combination (2.24), which we can write as

    s(x) = β^(0) + Σ_{j≥1} β^(j) v_j(x),

Toropov suggested three possible nonlinear formulations: multiplication,

    s(x) = β^(0) ∏_{j=1}^q v_j(x)^{β^(j)},