Surrogate Models
by the Space Mapping Technique
Jacob Søndergaard
Ph.D. Thesis
Informatics and Mathematical Modelling
Technical University of Denmark
Kgs. Lyngby 2003
Building 321, DK-2800 Lyngby, Denmark
Phone +45 4525 3351, Fax +45 4588 2673
reception@imm.dtu.dk
www.imm.dtu.dk
IMM-PHD-2003-111
This dissertation is submitted to Informatics and Mathematical Modelling at the Technical University of Denmark in partial fulfilment of the requirements for the degree of Doctor of Philosophy.
The work has been supervised by Professor, dr.techn. Kaj Madsen, Associate Professor Hans Bruun Nielsen and Poul Erik Frandsen.
Kgs. Lyngby, January 31, 2003.
Jacob Søndergaard
Surrogate modelling and optimization techniques are intended for engineering design in the case where an expensive physical model is involved.
This thesis provides a literature overview of the field of surrogate modelling and optimization. The space mapping technique is one such method for constructing and optimizing a surrogate model based on a cheap physical model. The space mapping surrogate is the cheap model composed with a parameter mapping, the so-called space mapping, connecting similar responses of the cheap and the expensive model.
The thesis presents a theoretical study of the space mapping technique. Theoretical results are derived which characterize the space mapping under some ideal conditions. If these conditions are met, the solutions provided by the original space mapping technique are minimizers of the expensive model. However, in practice we cannot expect that these ideal conditions are satisfied. So hybrid methods, combining the space mapping technique with classical optimization methods, should be used if convergence to high accuracy is wanted.
Approximation abilities of the space mapping surrogate are compared with those of a Taylor model of the expensive model. The space mapping surrogate has a lower approximation error for long steps. For short steps, however, the Taylor model of the expensive model is best, due to exact interpolation at the model origin.
Five algorithms for space mapping optimization are presented and the numerical performance is evaluated. Three of the algorithms are hybrid algorithms. Convergence of a class of hybrid space mapping algorithms is proved.
Surrogate modelling and optimization techniques are aimed at engineering design in the case where a very expensive physical model is involved.
The thesis contains a literature survey of surrogate modelling and optimization techniques. The space mapping technique is one such method for optimizing a surrogate model which is based on a cheap physical model. The space mapping surrogate consists of the cheap model composed with a parameter mapping, the so-called space mapping, which connects the same responses of the cheap and the expensive model.
The thesis describes theoretical investigations of the space mapping technique. Theoretical results are derived which characterize the space mapping under certain ideal conditions. If these conditions are fulfilled, the solutions found by the space mapping technique are solutions to the minimization problem for the expensive model. However, these ideal conditions cannot be expected to hold in practice. Therefore hybrid methods, which combine the space mapping technique with classical optimization methods, should be used if convergence to high accuracy is desired.
The approximation properties of the space mapping surrogate are compared with a Taylor model of the expensive model. The space mapping surrogate has a lower approximation error for long steps. For short steps, on the other hand, the Taylor model of the expensive model is best, owing to exact interpolation at the expansion point.
Five algorithms for space mapping optimization are presented and their numerical properties are tested. Three of the algorithms are hybrid algorithms. Convergence of a class of hybrid space mapping algorithms is proved.
I wish to thank my main supervisor Professor, dr.techn. Kaj Madsen for giving me the opportunity to engage in this study. Thank you for the encouragement and motivation that carried me through the study. I also thank my supervisors Associate Professor Hans Bruun Nielsen and Poul Erik Frandsen, Ticra Engineering Consultants, for their guidance. Special thanks to Hans, who helped me resolve many mathematical puzzles and has always been available for exchange of ideas. On the same note I also wish to thank Ph.D. student Frank Pedersen for many rewarding discussions.
I want to thank Professor John E. Dennis and his Ph.D. student Mark Abramson for the hospitality shown to me during my stay at Rice University. Special thanks to John, who inspired me with his eager passion for unveiling mathematical questions in a constant search for the truth.
I also want to thank Professor John W. Bandler and his Ph.D. students Mohamed Bakr, José Rayas-Sanchez and Mostafa Ismail for the hospitality shown to me during my stays at McMaster University. Special thanks to John for sharing his many captivating ideas and his devotion to the space mapping technique. Also special thanks to Mohamed for his friendliness and his involvement in this study.
I would like to thank my colleagues at IMM for making the last three years a pleasant experience.
Finally, I sincerely thank my dear girlfriend Janna and the rest of my family and friends for their constant support.
1 Introduction
  1.1 Outline of the Thesis
  References
2 Surrogate Modelling & Optimization
  2.1 Introduction to Surrogate Models
    2.1.1 Surrogate Models in the Literature
    2.1.2 Our Approach
  2.2 Surrogates Based on Functional Models
    2.2.1 Regression Models
    2.2.2 Radial Functions
    2.2.3 Single Point Models
    2.2.4 Summary
  2.3 Surrogates Based on Physical Models
    2.3.1 Response Correction
    2.3.2 The Multipoint Method
    2.3.3 Space Mapping
    2.3.4 Summary
  2.4 Optimization Using Surrogate Models
    2.4.1 Response Surface Methodology
    2.4.2 Trust Region Approach
    2.4.3 Pattern Search Approach
    2.4.4 Space Mapping Optimization
    2.4.5 Summary
  2.5 Conclusion
  Symbols
  References
3 An Introduction to the Space Mapping Technique
  3.1 Introduction
  3.2 Space mapping details
  3.3 Combining with classical methods
  3.4 Examples
  3.5 Conclusions
  Acknowledgments
  References
4 Space Mapping Theory and Practice
  4.1 Space Mapping Theory
    4.1.1 Theoretical Introduction
    4.1.2 Example: Space Mapping Images
    4.1.3 Theoretical Results
    4.1.4 The Usual Space Mapping Definition
    4.1.5 Example With Scalar Functions
  4.2 Alternative Space Mapping Definitions
    4.2.1 Regularization Using z
    4.2.2 Regularization Using x
    4.2.3 Regularization Using Gradients
    4.2.4 Multiple Points
    4.2.5 Summary
  4.3 Approximation Error
    4.3.1 Summary
  4.4 Conclusion
  Symbols
  References
5 Space Mapping Optimization Algorithms
  5.1 Introduction
  5.2 Space Mapping Optimization Algorithms
    5.2.1 Original Space Mapping Algorithms
    5.2.2 Hybrid Space Mapping Algorithms
  5.3 Numerical Tests
    5.3.1 Test Runs
    5.3.2 Space Mapping Definitions
    5.3.3 Optimization Trajectories
  5.4 Conclusion
  Symbols
  References
6 Convergence of Hybrid Space Mapping Algorithms
  6.1 Introduction
    6.2.1 Details of the SMTA
    6.2.2 Summary of the SMTA
  6.3 Proof of Convergence
    6.3.1 Prerequisites
    6.3.2 Proof of Convergence
  6.4 Conclusions
  References
7 Conclusion
Complete List of References
A Space Mapping Toolbox
  A.1 Space Mapping Optimization Test Problems
    A.1.1 Interface
    A.1.2 The Test Problems
  A.2 Space Mapping Optimization Algorithms
    A.2.1 Interface
    A.2.2 Theoretical Overview
    A.2.3 The Optimization Algorithms
    A.2.4 Auxiliary Functions
  A.3 Examples
    A.3.1 Quick Run
    A.3.2 Examining a Problem
  References
Introduction
In engineering design it is often encountered that traditional optimization is not feasible because the model under investigation is too expensive to compute. Surrogate modelling techniques have been developed to address this important issue. Surrogate models are intended to take the place of the expensive model for the purpose of modelling or optimization of the latter. In optimization using surrogate models, a sequence of subproblems is solved in the search for the optimizer of the expensive model. In the optimization process, most of the model evaluations are performed with the surrogate model. The expensive model is only scarcely evaluated, in order to re-calibrate the surrogate model.
The space mapping technique is one such method for constructing and optimizing a surrogate model. The technique relies on the existence of a cheaper model, modelling the same system as the expensive model under investigation. A space mapping surrogate model is the cheaper model composed with a parameter mapping, the so-called space mapping. The space mapping connects similar responses of the cheaper model and the expensive model. Here, responses are the output returned from a model provided a given set of parameters and state variables, which e.g. is a set of sample points in the frequency or time domain.
The basic formulation of the space mapping technique is not, in general, convergent to a minimizer of the expensive model. Therefore, provably convergent hybrid methods have been developed, combining the space mapping method with a classical optimization method.
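The composition just described can be made concrete with a small sketch. The fine and coarse models below are invented one-dimensional examples, and the space mapping is extracted by brute-force grid search rather than by the algorithms studied later in the thesis; this is only an illustration of the idea, not the thesis's method.

```python
import numpy as np

# Hypothetical models of the same system: an "expensive" fine model f
# and a "cheap" coarse model c (both invented for illustration).
def f(x):
    return (x - 1.8) ** 2 + 0.1   # fine model, minimizer at x = 1.8

def c(z):
    return z ** 2                  # coarse model, minimizer at z = 0

# The space mapping p connects similar responses: p(x) is the coarse
# parameter whose response best matches the fine response at x.
z_grid = np.linspace(-4.0, 4.0, 40001)
def p(x):
    return z_grid[np.argmin(np.abs(c(z_grid) - f(x)))]

# The space mapping surrogate is the coarse model composed with the
# mapping, s(x) = c(p(x)).
def s(x):
    return c(p(x))

# Minimizing the cheap surrogate approximates minimizing the fine model.
x_grid = np.linspace(-4.0, 4.0, 2001)
x_best = x_grid[np.argmin([s(x) for x in x_grid])]
```

Note that every surrogate evaluation here only calls the cheap model; the fine model is referenced through `f(x)` when the mapping is extracted, which in a realistic setting would be done at a few iterates only.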
This thesis concerns optimization of expensive functions using surrogate models. The focal point of the study is the space mapping technique. The thesis addresses five main areas: First, the thesis presents a literature overview of surrogate modelling and optimization. Second, the thesis provides a motivation and introduction to the space mapping technique. Third, the thesis provides a theoretical study of the space mapping technique. Fourth, the thesis presents space mapping optimization algorithms and numerical tests of these algorithms. Fifth, the thesis presents a convergence proof for a class of hybrid space mapping algorithms.
The five areas mentioned are covered in separate chapters of the thesis, as described in the following outline.
1.1 Outline of the Thesis
This thesis is divided into six chapters. Each chapter is intended to be self-contained, though Chapters 4, 5 and 6 are easier to follow if the reader is familiar with Chapter 3.
Chapter 2 contains a literature overview of surrogate modelling and optimization. For a review specifically of space mapping methods, refer to [1, 3], two papers co-authored by this author.
Chapter 3 is an included paper [2], introducing and motivating space mapping methodology to the engineering and mathematical communities.
Chapter 4 treats theoretical aspects of space mapping.
Chapter 5 considers formulation of space mapping optimization algorithms and the numerical performance of these.
Chapter 6 is an included paper [4], formulating and proving convergence of hybrid space mapping algorithms.
Chapter 7 is a summary of the conclusions of the study.
Appendix A serves as a manual for a Matlab toolbox with space mapping optimization routines and test problems, developed as a part of the present work.
Each chapter contains a separate list of references; a complete list of references is provided before the appendix. We remark that the notation is not uniform throughout the thesis; lists of symbols are provided for Chapters 2, 4 and 5, right before the list of references in these chapters.
References
[1] M.H. Bakr, J.W. Bandler, K. Madsen, J. Søndergaard, Review of the Space Mapping Approach to Engineering Optimization and Modelling, Optimization and Engineering, vol. 1, no. 3, pp. 241–276, 2000.
[2] M.H. Bakr, J.W. Bandler, K. Madsen, J. Søndergaard, An Introduction to the Space Mapping Technique, Optimization and Engineering, vol. 2, no. 4, pp. 369–384, 2001.
[3] J.W. Bandler, Q. Cheng, S. Dakroury, A.S. Mohamed, M.H. Bakr, K. Madsen, J. Søndergaard, Space Mapping: The State of the Art, submitted, IEEE Trans. Microwave Theory Tech., 2004.
[4] K. Madsen, J. Søndergaard, Convergence of a Hybrid Space Mapping Algorithm, submitted, Optimization and Engineering, 2003.
Surrogate Modelling & Optimization
This chapter provides an overview of literature concerned with surrogate modelling and optimization. The chapter is divided into four sections. Section 2.1 is an introduction to the terminology and the general aspects of the field. Section 2.2 concerns the so-called functional models, which are generic and non-specific to a given problem. Section 2.3 concerns the so-called physical models, which are specific to a given problem. The last section, Section 2.4, is a presentation of algorithms for optimization using surrogate models.
2.1 Introduction to Surrogate Models
In the context of engineering modelling and optimization, a surrogate model is a mathematical or physical model which can take the place of an expensive model for the purpose of modelling or optimization of the latter. The expensive model may arise from numerical solution of large systems of e.g. integral or differential equations describing a physical system, or it could be an actual physical system. The surrogate model may be a physical or numerical simplification of the expensive model, or it could be a purely empirical construct.
In other words, surrogates are cheap approximations which are especially well suited for acting in place of expensive models in connection with modelling and optimization. A surrogate model is often so cheap that many repeated runs of the model may be conducted within the expense of running the expensive model once. In some cases the surrogate even provides a mathematically more tractable formulation, e.g. in the context of optimization, than that of the expensive model. Often a surrogate model is less accurate than the expensive model, as there tends to be a duality between expensive and cheaper models: expensive models are of higher fidelity and cheaper models of lower fidelity.
Surrogates are widely used for modelling and optimization in many engineering fields. In surrogate modelling the engineer can use the surrogate in extensive analysis of different configurations of the model parameters, and thereby gain insight into the workings of the physical system at low expense.
Similar to classical Taylor based optimization, the search for an optimizer in surrogate based optimization is most often conducted by an iterative method relying on sequentially generated surrogate models. In each iteration, the iterative method performs a search on a surrogate model, only occasionally referencing the expensive model for validation and correction of the surrogate. A classical Taylor based method requires frequent validation and correction, and thus is too expensive for the problems that surrogate based optimization is intended for.
In the following we survey methods for generating surrogate models and methods for optimization using surrogates. We start by examining the terminology of surrogate modelling and optimization; thereafter we define a set of terms and a problem formulation, from which we will develop the presentation of the methods.
2.1.1 Surrogate Models in the Literature
Surrogate models and optimization methods using surrogate models are an active area of research. For example, in the combined field of mathematicians and engineers, several conferences, schools and journal issues have been devoted to the subject in the last years, see e.g. [2, 9, 21, 43, 58, 59].
Unfortunately the wide adoption of surrogate based methods in the engineering community has led to an ambiguous terminology in the literature. In fact, metamodel is often considered a synonym for a surrogate model, see e.g. [32].
In the case of surrogates for expensive computer models, i.e. numerical models, Myers and Montgomery [42] use the term metamodel about a somewhat simplified mathematical approximation of the function calculated by design analysis. In other words, the expensive computer model is replaced by a cheaper, possibly simpler, model of the model, hence a metamodel.
Metamodels are characterized by their algebraic or empirical formulation, constructed independently of the underlying physics of the system; thus some form of calibration is needed in order to use them as surrogates.
Metamodels are commonly used as a synonym for the popular response surface models. A response surface model is the result of a linear or nonlinear regression of (usually simple) algebraic functions with data. For this reason, they belong to a broader class of models called regression models. Response surface models are based on sampling in a chosen set of experimental design points; a process called design of experiments is used for choosing these points.
For the purpose of optimization, surrogates are often managed in a model management framework, see e.g. [1, 12, 24, 27, 33, 42, 55, 63]. A model management framework enforces conditions on the models, such as adaption to the expensive model in a local region or in a more global sense. The most widely adopted of these frameworks is the Response Surface Methodology (RSM), see e.g. [42], which is a framework for optimization using sequentially generated response surface models.
Only few of the frameworks referenced above have a proven convergence property, and according to [1], some of these frameworks actually focus on convergence to the problem defined by the surrogate model, rather than the original problem.
Another term used in the context of surrogate models is variable-complexity modelling, see e.g. [15]. This term covers cases where both the inexpensive model and the expensive model are based on the underlying physics of the system under consideration. The terms variable physics and multi-fidelity physics are then used to denote that within this system there exists a range of possible physical models.
The mathematical field of constructing approximations to expensive functions has been actively researched for several decades. So, other authors have presented reviews and other retrospective contributions, in which they have partitioned the field in various ways.
The review paper on approximation concepts for optimum structural design by Barthelemy and Haftka [10] groups approximations to expensive models into three categories: local, medium-range and global approximations, each of these subdivided again into functional and problem approximation methods. Barthelemy and Haftka use the term functional approximation in cases where an alternative, explicit expression is sought for the objective function and/or the constraints of the problem. Further, they use the term problem approximation in cases where the focus is on replacing the original statement of the problem by one which is approximately equivalent but which is easier to solve.
In his thesis Serafini [50] suggested to divide engineering models into two classes: physical and functional models, where physical models are based on numerical solution of governing equations of physical systems and embody knowledge of the physical system at all points; and where the functional models are algebraic approximations of the solutions of the equations, constructed without resort to knowledge of the physical systems. Hence, functional models are purely mathematical constructs, and they embody knowledge of the behaviour of the function they are approximating only at the points for which function values are given.
Serafini points out that his distinction between physical and functional models is not absolute. Specifically, he uses Taylor models, i.e. truncated Taylor series, as an example of hybrids between physical and functional models, since they are purely a mathematical construct, but at the same time they describe the same physical system as the governing equations.
In our presentation we will adopt Serafini's classification into physical and functional models. However, we suggest to interpret the terms a bit differently. In contrast to Serafini, we have no qualms in classifying Taylor models as pure functional models, as they may be constructed without knowledge of the actual governing equations, e.g. by finite difference approximations. Even in cases where the user supplies gradient or higher derivative information, Taylor models are still considered functional models, as the models themselves can be constructed independently of the underlying physics. Thereby we discern between the act of deriving a model and that of assigning parameter values to a model.
On a side note, other references, see e.g. [13, 42], refer to physical models as mechanistic models.
Torczon and Trosset [53] distinguish between surrogate models and surrogate approximations. In their terminology, a surrogate model is physically faithful, but less computationally expensive, than the expensive model it takes the place of. Further, a surrogate model exists independently of the expensive simulation, and can provide new information about the physical system of interest without requiring additional runs of the expensive model. On the other hand, surrogate approximations are algebraic summaries obtained from previous runs of the expensive model. These approximations are typically inexpensive to evaluate; they could e.g. use radial basis functions, kriging, neural networks, (low-order) polynomials, wavelets and splines. We will review the mentioned approximation methods later on in this presentation.
We will not adopt Torczon and Trosset's distinction between models and approximations, as these terms to a great extent are used as synonyms in most other references. In fact, we will use both terms interchangeably with the same meaning.
Recalling the variable-complexity concept presented above, we observe how physical models may be available in varying fidelity. In cases where several physical models, one being more expensive than another, are used in an optimization process, the term multi-fidelity models is sometimes used.
Below we define the classification and the problem description we will use in this presentation of methods for surrogate modelling and optimization.
2.1.2 Our Approach
In our presentation we will distinguish between methods that are used to generate surrogate models, and methods (sometimes called model management frameworks) that search for the optimum using surrogate models.
We classify surrogate models in the two categories functional models and physical models, defined as follows.
Functional models are models constructed without any particular knowledge of the physical system or governing equations. They are based on algebraic expressions and empirical data; in an optimization context these data arise from the current iterate and possibly some points either visited before in the process or found by sampling the parameter space. Hence, functional models exist only in the context of sampled data obtained from a physical model.
Functional models are generic, and therefore applicable to a wide class of problems, in the sense that, under certain conditions and given enough data points, they will eventually interpolate the underlying model of the data points in some sense. In practice functional models are often very cheap to evaluate.
The methods we consider, which can generate surrogates based on functional models, are: radial basis functions, kriging, neural networks, (low-order) polynomials, wavelets and splines.
The surrogate optimization methods we consider, which can employ functional models as surrogates, are: response surface methodology, trust region approach and pattern search.
Physical models are models based on knowledge about the particular physical system in question. Determining a response from a physical model may e.g. involve numerical solution of differential or integral equations. But in the extreme case, a physical model could be actual measurements of the physical system. Ranges of physical models may exist for the same system, as in the concept of variable- or multi-fidelity physics. Physical models are not generic, as each of them is related to a specific physical system. Hence, reuse of physical models across different problems is rare. In practice physical models are often expensive to evaluate, except in cases where they are based on a significant simplification of the physical system.
The methods we consider, which can generate surrogates based on physical models, are: response correction, multipoint method and space mapping.
The surrogate optimization methods we consider, which can employ physically based surrogates, are: response surface methodology, trust region approach, pattern search and space mapping.
The physical and functional models present extremes: physical models in the one extreme, where a great deal is known about the system, and functional models in the other extreme, where the only assumption is that the response is locally smooth.
In reality a physical system is not completely determined by governing equations, so the practical physical model may contain some empirical elements, e.g. as parameters determined by regression to experimental data. Since such a model is strongly coupled to the underlying physics, we would still call it a physical model.
Problem Definition
The optimization problem we consider in surrogate optimization involves a real valued function $f : \mathbb{R}^n \to \mathbb{R}$, which represents an expensive model. We call $f$ a response function of the expensive model; it is the objective function to be minimized. We will focus on the case where $f$ stems from a physical model, though most of the optimization methods we consider can deal with the case where $f$ stems from a functional model.
We seek a point $x \in \mathbb{R}^n$ which is a solution to the problem
$$\min_x f(x). \qquad (2.1)$$
We will denote an optimizer of this problem by $x^*$. We assume that $f$ is bounded below and uniformly continuously differentiable.
In surrogate optimization $f$ is to be replaced by $s : \mathbb{R}^n \to \mathbb{R}$, a surrogate model, in the search for an optimizer of $f$. The search for $x^*$ is conducted by a method combining sequentially generated surrogates $s_k$, $k = 1, 2, \ldots$, and an iterative scheme that performs a search on $s_k$ to obtain the next iterate. We will denote the iterates by $x_k$, $k = 1, 2, \ldots$. When we say that a method is convergent we imply that the sequence $\{x_k\}$ converges to $x^*$, hence $x_k \to x^*$ for $k \to \infty$.
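The iteration just described can be sketched schematically. The made-up response function, the quadratic surrogates $s_k$, and the safeguards below are illustrative assumptions, not one of the algorithms treated in this thesis; the point is only that each iteration spends one expensive evaluation while the search itself runs on the cheap surrogate.

```python
import numpy as np

# Schematic surrogate-based optimization: minimize a stand-in "expensive"
# f by repeatedly minimizing a cheap quadratic surrogate s_k fitted to
# all f-values collected so far.
def f(x):                       # stand-in for an expensive response function
    return np.sin(x) + 0.1 * x ** 2

xs = [-3.0, 0.0, 3.0]           # initial samples
ys = [f(x) for x in xs]
for k in range(8):
    c = np.polyfit(xs, ys, 2)   # surrogate s_k: quadratic least-squares fit
    if c[0] > 0:
        x_new = -c[1] / (2 * c[0])       # exact minimizer of s_k
    else:                                # fall back if the fit is not convex
        x_new = xs[int(np.argmin(ys))] + 0.1
    x_new = float(np.clip(x_new, -4.0, 4.0))   # keep the step bounded
    xs.append(x_new)
    ys.append(f(x_new))         # the single expensive evaluation per iteration
best = xs[int(np.argmin(ys))]
```

Only eight expensive evaluations are spent beyond the initial samples; all minimization work is done on the surrogates.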
The methods for generating functional models rely on pre-sampled data in order to construct the model. We will assume that $p$ sample points $t_i \in \mathbb{R}^n$, $i = 1, \ldots, p$, have been chosen and that $f$ has been evaluated at these points. So, we have a pre-sampled data set $(t_i, y_i)$, $i = 1, \ldots, p$, where $y_i = f(t_i)$. We briefly discuss statistical strategies for placing these sample points in Section 2.2.1.
The methods for generating physically based surrogates rely on the existence of one or more user provided cheap (lower-fidelity) physical models represented by the response functions $c_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, q$. A cheap physical model may sometimes by itself act as a surrogate for the expensive model.
In the parts of this presentation about space mapping, namely Sections 2.3.3 and 2.4.4, we consider vector valued response functions, e.g. $f : \mathbb{R}^n \to \mathbb{R}^m$, $m > n$. For this purpose we introduce a convex merit function $H : \mathbb{R}^m \to \mathbb{R}$, usually a norm. So (2.1) becomes
$$\min_x H(f(x)). \qquad (2.2)$$
Such vector valued response functions arise e.g. in electrical engineering, where the design problem is defined as minimizing the residual between the signals and given design specifications in some norm, so that the residues constitute the response functions.
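A toy instance of formulation (2.2), invented here for illustration: a one-parameter "design" $x\,w$ is matched to a quadratic specification $w^2$ at $m = 5$ sample points $w_i$, and the merit $H$ is the Chebyshev norm of the residuals.

```python
import numpy as np

# Invented example of (2.2): m = 5 residuals f_i(x) = x*w_i - w_i^2
# between a linear design x*w and the specification w^2, sampled at w_i.
w = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # e.g. frequency samples

def f_vec(x):                  # vector valued response function, m > n (n = 1)
    return x * w - w ** 2

def H(r):                      # convex merit function: Chebyshev norm
    return np.max(np.abs(r))

# Minimize H(f(x)) by brute force over a grid of the single parameter x.
xs = np.linspace(-2.0, 2.0, 4001)
vals = np.array([H(f_vec(x)) for x in xs])
x_best, H_best = xs[np.argmin(vals)], vals.min()
```

For these sample points the minimax solution can be worked out by hand: the active residuals at $w = 0.5$ and $w = 1$ balance at $x = 5/6$ with merit value $1/6$, which the grid search reproduces.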
In the following we present some of the most popular approaches to surrogate modelling and optimization, and show how they relate. We aim to keep the exposition clear by using a simple consistent notation throughout. However, by committing to a simple notation we must accept that we cannot at the same time capture the details of the more specialized approaches presented in the literature.
2.2 Surrogates Based on Functional Models
There is a large number of methods for generating surrogates in the category of functional models. So, we have limited this presentation to an overview of the most commonly used approximations for expensive models, namely the regression models, radial functions and single point approximations.
The regression models cover the broadest and most widely used class of approximations. They are based on algebraic expressions, the so-called basis functions, that are fitted to the pre-sampled data. We will in particular deal with polynomials, response surface models (including methods for design of experiments) and wavelets.
The radial functions cover a class of approximations which are based on combinations of basis functions localized around the pre-sampled points. We will in particular deal with kriging (including DACE), radial basis functions, neural networks and a special class of splines.
The single point approximations are a special class of local models that includes Taylor models (i.e. truncated Taylor series). These approximations are usually not very attractive for approximating expensive functions, due to their local nature. But certain types of approximations in this class have found use in the structural engineering community. We will in particular consider the reciprocal approximations, conservative approximations and posynomial approximations.
2.2.1 Regression Models
Regression is the process of fitting some regression model, represented by the function $s(x; \beta)$, to the pre-sampled data $(t_i, y_i)$, $i = 1, \ldots, p$, of the response function $f$ from the expensive model. The parameters $\beta$ solve the regression or data fitting problem
$$\min_{\hat\beta} \sum_{i=1}^{p} \left( y_i - s(t_i; \hat\beta) \right)^2. \qquad (2.3)$$
Here the Euclidean norm is used, but other norms or merits may be used as well. Another widely used formulation, augmenting problem (2.3), is the weighted least-squares problem. For simplicity, we will only consider the unweighted problem.
A number of statistical methods have been developed to help determine if a regression model yields a good fit to the data. These methods include residual analysis, testing for lack of fit, and analysis of variance (ANOVA). To overcome problems with lack of fit, statisticians rely on methods like transformation of the variables. See [13, 42] for a thorough treatment of these statistical methods. Further, in [11, 19, 33] practical usage of these ideas is illustrated.
The regression model may take many forms. We start by discussing the linear regression models, that is, the regression models where $s(x; \beta)$ is linear in $\beta$. We write $s(x; \beta) = \beta^T v(x)$, where $v : \mathbb{R}^n \to \mathbb{R}^u$ is a vector with $u$ basis functions $v_j(x)$, $j = 1, \ldots, u$.
Response Surface Models
The most frequently used basis functions in linear least-squares regression are low order polynomial regression models. Hence, basis functions of the form
$$v(x) = \begin{cases} 1 & \\ x^{(i)} & i \in \{1, \ldots, n\} \\ x^{(i)} x^{(j)} & i, j \in \{1, \ldots, n\} \end{cases}$$
yielding approximations like
$$s(x) = \begin{cases}
\beta_0 + \beta_1^T x & \text{(i)} \\[1ex]
\beta_0 + \beta_1^T x + \displaystyle\sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} \beta_2^{(i,j)} x^{(i)} x^{(j)} & \text{(ii)} \\[1ex]
\beta_0 + \beta_1^T x + x^T \beta_2 \, x & \text{(iii)}
\end{cases} \qquad (2.4)$$
where $\beta_k$, $k = 0, 1, 2$, are the model parameters. The models are called (i) the first-order model, (ii) the first-order model with interaction terms and (iii) the second-order, or quadratic, model.
It is obvious that the models in (2.4) are so simple that they generally will not interpolate the data, especially as it is usually required that the regression problem is over-determined. In fact we may only expect these models to describe a global trend in the functional behaviour.
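As a small illustration, the quadratic model (iii) can be fitted by solving (2.3) as a linear least-squares problem in the basis $v(x)$; the data below are invented and noise-free, so the fit recovers the generating coefficients exactly.

```python
import numpy as np

# Fit the second-order model s(x) = b0 + b^T x + x^T B x in n = 2 variables
# by linear least squares, using the basis v(x) = (1, x1, x2, x1*x2, x1^2, x2^2).
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(40, 2))                 # design sites t_i
y = (1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]                 # responses y_i
     + 0.5 * X[:, 0] * X[:, 1] + 3.0 * X[:, 0] ** 2)

V = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],  # rows are v(t_i)^T
                     X[:, 0] * X[:, 1], X[:, 0] ** 2, X[:, 1] ** 2])
beta, *_ = np.linalg.lstsq(V, y, rcond=None)             # solves (2.3)
```

With noisy data the same call returns the least-squares trend rather than an interpolant, which is exactly the behaviour discussed above.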
Exceptions, where the simple models may be adequate, are for instance in cases where the sampled response is very noisy, see [26] for an example with a noisy aircraft wing design problem, and in cases where the response is so smooth that even a linear model is a valid approximation for a large region of the design space.
More sophisticated regression models are often used, but choosing a well suited regression model for a particular problem requires specific knowledge about the expected behaviour of the system from which the data originate, or at least extensive analysis of the sampled data. We note that more sophisticated models may enlarge the region of acceptable approximation compared to the simple models presented above. However, a drawback of introducing more sophisticated regression models is that even though such a model may interpolate the given data, it is not necessarily good at describing the behaviour between known data points, or at extrapolating outside the region spanned by the data points. This shows, e.g., for higher order polynomial approximations.
In the statistics literature, regression models as those in (2.4), in particular the quadratic model (iii), are associated with the term response surface models. A very popular optimization framework employing response surface models is the response surface methodology, which we present in Section 2.4.1.
First we will discuss some statistical strategies on how to choose the pre-sampled points.
Design of Experiments
The functional models we have considered so far are constructed on the basis of pre-sampled data (t_i, y_i), i = 1,...,p, where t_i ∈ IR^n are called data points or design sites, and y_i = f(t_i) are the responses from the expensive model at the design sites. The process of determining where to place the design sites in the parameter space is in the statistical literature called Design of Experiments (DOE).

In design of experiments the effort is on laying out experiments, i.e. the actual placement of the t_i's, in certain optimal ways. D-optimality is a popular measure related to the determinant of the covariance matrix in question, but other measures exist. Frequently used experimental designs are factorial, central composite and Box-Behnken designs (see references below). Traditionally much of the focus in DOE has been on reducing noise in the experiment, by repeated runs with blocking of factors. However, more and more experiments are conducted using computer models, and DOE for computer experiments needs special attention since the computer models

- are deterministic, i.e. always return the same result when evaluated for the same parameters (the numerical noise is assumed to be negligible), so repeated runs are not needed,

- are unaffected in response by the ordering of the experiment, hence experimental blocking is not needed,

- often are defined over wide parameter ranges, and often in many parameters.

Regarding the last point, computer experimenters often seek to find a complex approximation extending over a wide range of the design variables, hence large regions need to be covered by the experimental design.

These characteristics call for a special type of experimental design called space-filling designs, which aim to exercise all the parameters over their entire ranges. McKay et al. [41] presented an intuitive approach, which has become very popular, called Latin hypercube sampling, from which stochastic space-filling designs are easily obtained, even when a large number of design sites is required. There also exist deterministic experimental designs with space-filling properties. Experimental designs are covered thoroughly in [13, 30, 42], where [13, 42] have special focus on response surface methodology, the topic of Section 2.4.1. A review of methods for design of optimal experiments is given in [28].
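As a concrete illustration of the idea behind Latin hypercube sampling, the routine below is a minimal sketch (our own, not an implementation from the references): each axis of [0,1]^n is split into p equal strata, and every stratum is hit by exactly one of the p design sites.

```python
import numpy as np

def latin_hypercube(p, n, rng=None):
    """Draw p design sites in [0, 1]^n by Latin hypercube sampling:
    each axis is split into p equal strata, every stratum is hit once."""
    rng = np.random.default_rng(rng)
    # One uniformly random point inside each of the p strata, per dimension.
    samples = (np.arange(p)[:, None] + rng.random((p, n))) / p
    # Shuffle the strata independently in every dimension.
    for j in range(n):
        rng.shuffle(samples[:, j])
    return samples

T = latin_hypercube(8, 2, rng=0)
# Projection onto any single axis hits each of the 8 strata exactly once.
assert all(sorted(np.floor(T[:, j] * 8).astype(int)) == list(range(8))
           for j in range(2))
```

The stratification guarantees good one-dimensional projections even for few samples, which is the space-filling property referred to above.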
Wavelets
Wavelets provide a large number of orthonormal basis functions that can be used in linear least-squares regression. The wavelet basis functions are available at different scales, and each of these basis functions has a localized region of support. Commonly used families of wavelet basis functions are the Haar, Daubechies and symmlet families. We present here the discrete wavelet formulation in one dimension, x ∈ IR, and the particularly simple Haar family of basis functions; our presentation is based on [29] and [61].

First consider the piecewise constant function (called the father wavelet or scaling function)

    v̂(x) = 1 for x ∈ [0,1],  0 otherwise,

and the functions

    v_{j,k}(x) = 2^{j/2} v̂(2^j x - k),  k, j ∈ Z_+.
The functions v_{0,k} form an orthonormal basis for functions with jumps at the integers. Let the space spanned by this basis be denoted V_0. Similarly the functions v_{1,k} form an orthonormal basis for a space V_1 ⊃ V_0 of functions piecewise constant on intervals of length 1/2. More generally we have V_2 ⊃ V_1 ⊃ V_0, where V_j is spanned by the 2^j functions v_{j,k}, k = 0, 1, ..., 2^j - 1.
We might represent a function in V_{j+1} by a component in V_j plus the component in the orthogonal complement W_j of V_j in V_{j+1}, written V_{j+1} = V_j ⊕ W_j.
From the mother wavelet

    ŵ(x) = v̂(2x) - v̂(2x - 1)

we can generate the functions

    w_{j,k}(x) = 2^{j/2} ŵ(2^j x - k),  k, j ∈ Z_+.

Then it can be shown [61] that the functions w_{j,k} form an orthonormal basis for W_j.
Notice that since these spaces are orthogonal, all the basis functions v_{j,k} and w_{j,k} are orthonormal. Now V_{j+1} = V_j ⊕ W_j = V_{j-1} ⊕ W_{j-1} ⊕ W_j = ..., so we can make a representation of the form V_j = V_0 ⊕ W_0 ⊕ W_1 ⊕ ... ⊕ W_{j-1}. Assume that we were to construct an interpolation at 2^j data points, so at most 2^j basis functions are needed for interpolation. We could use the 2^j functions in V_j, or alternatively the 2^j - 1 functions in W_0 ⊕ W_1 ⊕ ... ⊕ W_{j-1} and one in V_0.

If a non-interpolating approximation is needed, one can choose to use only a subset of the basis functions, e.g. approximation at level j starting at level i, i < j, gives the 2^j - 2^i basis functions in V_i ⊕ W_i ⊕ W_{i+1} ⊕ ... ⊕ W_{j-1}.
If we assume that the parameters x of our problem (2.1) are scaled to the interval [0,1], the resulting wavelet approximation can be written as

    s(x) = α^T ( v_{i,0}(x), ..., v_{i,2^i-1}(x), w_{i,0}(x), ..., w_{i,2^i-1}(x), ..., w_{j-1,0}(x), ..., w_{j-1,2^{j-1}-1}(x) )^T,

for the level j approximation starting at level i. The regression coefficients α are usually found by least-squares.
A common procedure is to apply a threshold to discard or filter out the smaller coefficients, and thereby reduce the number of basis functions in the final approximation. Choosing a good level of approximation is difficult in general, and out of the scope of this presentation.

We should note that in the context of engineering design, the Haar family of basis functions may not be the best suited, due to their non-differentiable form, hence one of the other, smoother families of wavelets should be chosen instead. The concept of wavelets is thoroughly dealt with in [61]; a classical reference on wavelets is [20].
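The Haar construction above is easy to sketch in code. The following is a minimal illustration (our own; the test function sin(2πx) is a stand-in for sampled responses): it builds the 2^j basis functions of V_j = V_0 ⊕ W_0 ⊕ ... ⊕ W_{j-1} and fits the coefficients by least-squares.

```python
import numpy as np

def father(x):                      # scaling function: indicator of [0, 1)
    return ((x >= 0) & (x < 1)).astype(float)

def mother(x):                      # Haar mother wavelet: v(2x) - v(2x - 1)
    return father(2 * x) - father(2 * x - 1)

def haar_design(x, j):
    """Columns: v_{0,0} and w_{l,k}(x) = 2^{l/2} w(2^l x - k) for l < j,
    i.e. the 2^j basis functions spanning V_j = V_0 + W_0 + ... + W_{j-1}."""
    cols = [father(x)]
    for l in range(j):
        for k in range(2 ** l):
            cols.append(2 ** (l / 2) * mother(2 ** l * x - k))
    return np.column_stack(cols)

x = np.linspace(0, 1, 200, endpoint=False)
y = np.sin(2 * np.pi * x)                     # stand-in for sampled responses
B = haar_design(x, 4)                         # 2^4 = 16 basis functions
alpha, *_ = np.linalg.lstsq(B, y, rcond=None)
s = B @ alpha                                 # piecewise-constant approximation
```

Since V_2 ⊂ V_4, refitting at a higher level can only decrease the least-squares residual, which is easily verified numerically.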
2.2.2 Radial Functions
A particularly successful class of interpolation methods is based on a model of the form

    s(x) = v(x)^T α + b(x)^T β,    (2.5)

where v is a vector with basis functions (as above) and α is the solution to a generalized linear least-squares regression problem, which we will introduce later. The second part is a radial model, which consists of the function b : IR^n → IR^p, a vector of radial functions

    b_j(x) = φ(||x - t_j||),  j = 1,...,p,    (2.6)

and the coefficients β ∈ IR^p.

A radial function φ depends only on the radial distance from the origin, or in our case the distance to the design points. We will use the notation φ_j(x) = φ(||x - t_j||), j = 1,...,p. Various distance measures are used in practice, but the most popular is the Euclidean norm. The nature of φ and β will become clearer to the reader as we present some radial function approximation methods in the following.
Kriging
The method of kriging [40] is very popular in the geo-statistical community. The general approximation is s(x) = λ(x)^T y, where the function λ is derived in the following, and y is the vector with pre-sampled values of f.

Kriging models decompose the function f into two parts. The first part is a linear regression model related to a global trend in the data, as the response surface models in Section 2.2.1. The second part is a function z(x), being the deviation between f and the regression model. Hence the interpolation conditions for the kriging model are

    f(t_i) = y_i = s(t_i) = λ(t_i)^T y = v(t_i)^T α + z(t_i),

for i = 1,...,p.

In statistical terms, see [49], z is a stochastic function, with mean E[z(x)] = 0 and variance E[z(x)^2] = σ^2, sampled along a suitable path. We consider z to be a residual function, which we will approximate using a radial model, i.e. the last term in (2.5).
Let V be the matrix where the ith column is the ith basis function v_i evaluated at the design sites, v_i(t_j), i = 1,...,u, j = 1,...,p. Let Z be the vector containing the residuals at the sample points, z(t_j), j = 1,...,p.

For any x, we have

    s(x) - f(x) = λ(x)^T y - f(x)
                = λ(x)^T (Vα + Z) - (v(x)^T α + z(x))
                = λ(x)^T Z - z(x) + (V^T λ(x) - v(x))^T α.
Requiring V^T λ(x) = v(x), the last term above vanishes, and we can now determine the mean squared error, which is the expected approximation variance,

    E[(s(x) - f(x))^2] = E[(λ(x)^T Z - z(x))^2]
                       = E[z(x)^2 + λ(x)^T Z Z^T λ(x) - 2 λ(x)^T Z z(x)]
                       = σ^2 (1 + λ(x)^T C λ(x) - 2 λ(x)^T b(x)),    (2.7)

where σ^2 is the process variance and C ∈ IR^{p×p} is a symmetric matrix whose (i,j)th element is φ(||t_i - t_j||), i, j = 1,...,p. In statistical terms the elements of C describe the covariance between design sites.
Now we determine the function λ(x), for fixed x, by the quadratic programming problem minimizing the expected approximation variance (2.7),

    min_{λ(x)}  (1/2) λ(x)^T C λ(x) - λ(x)^T b(x)
    s.t.  V^T λ(x) = v(x).    (2.8)

Using this formulation to derive λ(x), the approximation is called kriging with a trend. When the problem (2.8) is unconstrained, the approximation is called simple kriging and the solution is λ(x) = C^{-1} b(x). When v(x) = 1 for all x, the approximation is called ordinary kriging.
The Lagrangian function corresponding to (2.8) is

    L(λ(x), μ) = (1/2) λ(x)^T C λ(x) - λ(x)^T b(x) + μ^T (V^T λ(x) - v(x)),

where μ are the Lagrange multipliers. The necessary conditions for an optimal solution are

    ∇_{λ(x)} L(λ(x), μ) = C λ(x) - b(x) + V μ = 0
    ∇_μ L(λ(x), μ) = V^T λ(x) - v(x) = 0.
From these equations we may find the solution by solving a linear system,

    ( C    V ) ( λ(x) )   ( b(x) )
    ( V^T  0 ) (  μ   ) = ( v(x) )    (2.9)

which gives

    ( λ(x) )   ( C^{-1}(I - V U V^T C^{-1})   C^{-1} V U ) ( b(x) )
    (  μ   ) = ( U V^T C^{-1}                 -U         ) ( v(x) ),

where U = (V^T C^{-1} V)^{-1}, and assuming that C is symmetric positive definite. Hence we can write

    ( λ(x) )   ( R    W  ) ( b(x) )
    (  μ   ) = ( W^T  -U ) ( v(x) ),

where R and W are defined by the solution above.
The kriging approximation then is

    s(x) = λ(x)^T y = b(x)^T R y + v(x)^T W^T y = b(x)^T β + v(x)^T α,    (2.10)

where

    α = (V^T C^{-1} V)^{-1} V^T C^{-1} y
    β = C^{-1} (y - V α)

are independent of x. Note that α is the generalized least-squares solution to the linear regression problem. Having this formulation, we need only calculate v(x) and b(x) and the sum of two dot products for every evaluation of s(x).

We could have derived α and β in another way, namely by considering the problem

    min_β  (1/2) β^T C β - β^T y
    s.t.  V^T β = 0.    (2.11)
We should note that we have not been able to motivate the quadratic problem (2.11) in the same way as (2.8), which minimizes the expected approximation variance. The Lagrangian function corresponding to (2.11) is

    L(β, α) = (1/2) β^T C β - β^T y + α^T V^T β,

where α are the Lagrange multipliers. The necessary conditions for an optimal solution are

    ∇_β L(β, α) = C β - y + V α = 0
    ∇_α L(β, α) = V^T β = 0.

From these equations we may find the solution by solving a linear system,

    ( C    V ) ( β )   ( y )
    ( V^T  0 ) ( α ) = ( 0 )    (2.12)

    =>  ( β )   ( R    W  ) ( y )
        ( α ) = ( W^T  -U ) ( 0 ),

again assuming that C is symmetric positive definite; R, W and U are defined as above. Deriving the formulation in this way is by some called dual kriging; the simple case with V = 0 is e.g. described in [48].
When the v_i are at most first order polynomials, the constraint V^T β = 0 in (2.11) corresponds to the requirement that |s(x)| = O(|x|) [46].

For certain choices of C, the kriging approach relates to approximation using natural cubic splines; [62] shows this relation in one dimension. The relation between the thin-plate spline formulation and kriging is shown in [25], and presented in Section 2.2.2 below.

First we will discuss how statisticians have used the kriging approach in approximation of computer based models.
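The saddle-point system (2.12) can be solved directly. The following is a minimal sketch (our own, not the thesis's implementation) of ordinary kriging with a Gaussian correlation function; the parameter theta is here simply fixed, not estimated.

```python
import numpy as np

def kriging_fit(T, y, theta=10.0):
    """Dual-kriging fit: solve the saddle-point system
        [C V; V^T 0] [beta; alpha] = [y; 0]
    with Gaussian correlation phi(r) = exp(-theta r^2) and a constant
    trend v(x) = 1 (ordinary kriging)."""
    p = len(T)
    D = T[:, None, :] - T[None, :, :]
    C = np.exp(-theta * np.sum(D ** 2, axis=2))   # correlation matrix
    V = np.ones((p, 1))                            # constant regression basis
    K = np.block([[C, V], [V.T, np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([y, [0.0]]))
    beta, alpha = sol[:p], sol[p:]
    def s(x):
        b = np.exp(-theta * np.sum((T - x) ** 2, axis=1))
        return b @ beta + alpha[0]                 # b(x)^T beta + v(x)^T alpha
    return s

T = np.array([[0.0], [0.3], [0.7], [1.0]])
y = np.sin(3 * T[:, 0])
s = kriging_fit(T, y)
assert all(abs(s(t) - yi) < 1e-8 for t, yi in zip(T, y))  # interpolates
```

Interpolation at the design sites follows from Cβ + Vα = y together with V^T β = 0, exactly as in the derivation of (2.10).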
DACE  Design and Analysis of Computer Experiments (DACE), named after a seminal paper by Sacks et al. [49] in 1989, is a statistical framework for dealing with kriging approximations to (complex or expensive) computer models. Kriging, in particular the DACE framework, has gained wide acceptance in many engineering communities, e.g. in mechanical, aerospace and electrical engineering, as a method for approximating expensive functions, see e.g. [32, 33, 49, 50, 57].

In the DACE framework the kriging correlation model is often presented as a radial function of the form

    φ_j(x) = ∏_{i=1}^n φ(θ, |x^(i) - t_j^(i)|),    (2.13)

hence a product of one-dimensional radial functions or, in statistical terms, correlation functions.
Examples of frequently used kriging correlation functions are [49]

    φ(θ, |x^(i) - t_j^(i)|) =
      exp(-θ^(i) (x^(i) - t_j^(i))^2)                                      (i)
      max{0, 1 - θ^(i) |x^(i) - t_j^(i)|}                                  (ii)
      max{0, 1 - θ^(i) (x^(i) - t_j^(i))^2 + θ^(i+n) |x^(i) - t_j^(i)|^3}  (iii)
                                                                           (2.14)

Each of these is also available in an isotropic version, i.e. where θ is constant for all coordinate directions.
The DACE framework is implemented in a Matlab toolbox, see [36, 37]. This particular implementation takes great care in solving the system (2.9) in a numerically safe way. In many other implementations the matrix C is naively inverted, and since it is often ill-conditioned, numerical errors are likely to dominate the results. Further, the implementation includes a method for fitting the radial functions to data, i.e. finding θ minimizing a certain merit function, namely maximum likelihood estimation, i.e. least-squares when assuming a Gaussian process.

Often, when using maximum likelihood estimation, a Gaussian process is assumed. It is then vital for the approximation to determine basis functions v such that the residuals y - Vα follow the normal distribution; [49, 18, 36, 37] elaborate further on this subject.

From the viewpoint of Bayesian statistics the choice of correlation function corresponds to a Bayesian prior on the shape or smoothness of the function. In this view kriging, and thereby DACE, is a Bayesian method, see e.g. [49, 18].
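The product form (2.13) with the Gaussian factor (i) of (2.14) collapses to a single exponential of a θ-weighted squared distance, which the short sketch below (our own illustration) makes explicit for both the anisotropic and the isotropic case.

```python
import numpy as np

def gauss_corr(theta, x, t):
    """Anisotropic Gaussian correlation, (2.13) with factor (i) of (2.14):
    phi_j(x) = prod_i exp(-theta_i (x_i - t_i)^2)."""
    theta = np.asarray(theta, float)
    return np.prod(np.exp(-theta * (x - t) ** 2))

x = np.array([0.2, 0.9])
t = np.array([0.4, 0.5])
aniso = gauss_corr([5.0, 0.1], x, t)   # direction 1 matters far more
iso   = gauss_corr([1.0, 1.0], x, t)   # isotropic: one common theta
# The product of 1-D factors equals one exponential of the weighted
# squared distance, which makes (2.13) behave like a radial function.
assert np.isclose(aniso, np.exp(-(5.0 * 0.04 + 0.1 * 0.16)))
assert np.isclose(iso, np.exp(-0.2))
```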
Radial Basis Function Approximations
The radial basis function approximation is as in (2.5), and is thus identical to kriging. However, in the literature there is a difference in the way the functions v and b are chosen. In radial basis function approximations v is a vector of polynomials of at most order n, and b is a vector of radial (basis) functions, using the Euclidean norm as distance measure. The coefficients α and β are determined as described above. Examples of commonly used radial basis functions are

    φ_j(x) =
      ||x - t_j||_2                            (i)
      ||x - t_j||_2^3                          (ii)
      ||x - t_j||_2^2 log ||x - t_j||_2        (iii)
      sqrt(||x - t_j||_2^2 + σ^2)              (iv)
      exp(-||x - t_j||_2^2 / σ^2)              (v)
                                               (2.15)
where σ is a fixed constant, provided by the user. As in the DACE framework mentioned in the previous section, σ may be fitted to the data, provided a suitable merit function. The set of radial functions in (2.15) includes both functions which grow with distance and functions which vanish with distance, unlike the kriging correlation functions in (2.14), which all vanish with distance.
Some of these radial functions are related to certain Green's functions for partial differential equations, specifically the partial differential equations L φ_j(x) = δ(x - t_j) for the operators L = ∇^2 and L = ∇^4. In Table 2.1 the Green's functions are presented for the one to three dimensional cases. The association with the radial functions in (2.15) is evident.

          L = ∇^2                L = ∇^4
    1D    |x - t_j|              |x - t_j| (x - t_j)^2
    2D    log ||x - t_j||_2      ||x - t_j||_2^2 log ||x - t_j||_2
    3D    ||x - t_j||_2^{-1}     ||x - t_j||_2

    Table 2.1: Green's functions φ_j(x), solutions to L φ_j(x) = δ(x - t_j).

For example, the Green's function φ_j(x) = ||x - t_j||_2^2 log ||x - t_j||_2 is that spline which solves the minimal surface problem for a thin plate, with a point load at t_j, hence its name thin-plate spline. Roach [47] provides a thorough treatment of Green's functions.
Guttmann [27] presents a method for global optimization using radial basis function approximations. Guttmann states that the two main advantages of radial basis function approximations are an available measure of so-called bumpiness, and that uniqueness of an interpolant is achieved under very mild conditions on the placement of the design sites. The bumpiness measure may be used in a merit function for determining the placement of succeeding design sites.

Note that choosing the function (v) in (2.15) makes the radial basis function approximation the same as the DACE kriging approximation with the isotropic version of the correlation function (i) in (2.14). However, the constant σ is provided by the user in the radial basis function case, and often found by maximum likelihood estimation in the DACE case, as described previously.

Powell [46] provides a thorough treatment of radial basis function approximations, and contains many references to other works.
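A radial basis function interpolant of the form (2.5) can be set up in the same way as the kriging system (2.12). The sketch below (our own illustration) uses the cubic radial function (ii) of (2.15) with a linear polynomial part v(x) = (1, x) and the side condition V^T β = 0.

```python
import numpy as np

def rbf_fit(T, y):
    """RBF interpolant s(x) = v(x)^T alpha + b(x)^T beta with the cubic
    radial function phi(r) = r^3, a linear polynomial part v(x) = (1, x),
    and the side condition V^T beta = 0."""
    p, n = T.shape
    r = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    Phi = r ** 3                                   # radial part
    V = np.hstack([np.ones((p, 1)), T])            # linear polynomial part
    K = np.block([[Phi, V], [V.T, np.zeros((n + 1, n + 1))]])
    sol = np.linalg.solve(K, np.concatenate([y, np.zeros(n + 1)]))
    beta, alpha = sol[:p], sol[p:]
    def s(x):
        b = np.linalg.norm(T - x, axis=1) ** 3
        return b @ beta + alpha[0] + alpha[1:] @ x
    return s

T = np.linspace(0, 1, 6)[:, None]
y = np.cos(4 * T[:, 0])
s = rbf_fit(T, y)
assert all(abs(s(t) - yi) < 1e-7 for t, yi in zip(T, y))
```

In one dimension this construction is essentially the natural cubic spline interpolant mentioned above in connection with [62].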
Multivariate Adaptive Regression Splines
Another form of radial basis function approximation is the multivariate adaptive regression splines, see e.g. [29]. We will present the approximation here for the one dimensional case, x ∈ IR.

The approximation is almost as in (2.5). However, v is always the zero order polynomial, b is a vector function b : IR → IR^q, and β ∈ IR^q. Usually q < p, hence the approximation is not necessarily interpolating the data. The elements of b, b_i = φ_i(x), i = 1,...,q, are radial functions,

    φ_i(x) = ∏_{j ∈ M_i} ξ_j(x),   M_i ⊆ {1,...,p},

with

    ξ_j(x) ∈ { |x - t_j|_+ , |t_j - x|_+ },  j = 1,...,p,

where |·|_+ = max{·, 0}. So, each φ_i is a linear radial function or a product of two or more such functions.

The reader should note the structural similarity with the DACE kriging approximations using the radial function (ii) in (2.14). The tricky part is, as for other approximation methods, in constructing the functions φ_i, i = 1,...,q; one strategy is suggested in [29, p. 284].
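The hinge functions and their products are easy to write down. Below is a minimal illustration (our own; knot positions and the test function are arbitrary) of a tiny MARS-style basis fitted by least squares, with q = 4 < p so the fit is a regression rather than an interpolation.

```python
import numpy as np

def hinge(x, t, sign):
    """xi_j(x): |x - t|_+ for sign=+1, |t - x|_+ for sign=-1."""
    return np.maximum(sign * (x - t), 0.0)

x = np.linspace(0, 1, 50)
y = np.abs(x - 0.5) + 0.2 * x          # stand-in response
B = np.column_stack([
    np.ones_like(x),                   # zero order polynomial part v
    hinge(x, 0.5, +1),                 # |x - 0.5|_+
    hinge(x, 0.5, -1),                 # |0.5 - x|_+
    hinge(x, 0.2, +1) * hinge(x, 0.8, -1),  # product of two hinges
])
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
s = B @ beta
# y lies exactly in the span of the first three columns, so the fit is exact.
assert np.max(np.abs(s - y)) < 1e-10
```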
Neural Networks
Neural networks are nonlinear regression models [29]. The most widely used is the single hidden layer back-propagation network, also called the single layer perceptron. The approximation is

    s(x) = σ(θ_0 + θ_1 x)^T θ_2 + θ_3,

where the vector function σ : IR^n → IR^u contains the so-called activation functions. The large number of constants θ_0 ∈ IR^n, θ_1 ∈ IR^{n×n}, θ_2 ∈ IR^u and θ_3 ∈ IR are determined by nonlinear least-squares regression, as in (2.3), assuming that the data set is sufficiently large.

A popular choice of activation function is the sigmoid function

    σ(z) = 1 / (1 + e^{-z}).
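A forward evaluation of this model is a few lines of code. The sketch below is our own illustration; for the dimensions to match we take a hidden dimension u, with θ_0 ∈ IR^u and θ_1 ∈ IR^{u×n}, and random (unfitted) constants, since the fitting itself is the nonlinear regression (2.3).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(x, th0, th1, th2, th3):
    """Single hidden layer model s(x) = sigmoid(th0 + th1 x)^T th2 + th3."""
    return sigmoid(th0 + th1 @ x) @ th2 + th3

rng = np.random.default_rng(1)
n, u = 2, 5                      # input and hidden dimensions (illustrative)
th0 = rng.standard_normal(u)
th1 = rng.standard_normal((u, n))
th2 = rng.standard_normal(u)
th3 = 0.0
x = np.array([0.3, -0.7])
assert sigmoid(0.0) == 0.5       # the sigmoid is centred at zero
assert np.isfinite(network(x, th0, th1, th2, th3))
```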
Other popular choices of activation functions include the radial basis functions (2.15). The latter case is called a radial basis function neural network, and is exactly the same approximation as the formulation in (2.5), with v being the zero order polynomial. However, instead of the procedure derived in Section 2.2.2 for determining the model parameters, the model parameters of radial basis function neural networks are determined by least-squares regression [31].

For the purpose of determining the θ's by regression, i.e. solving (2.3), several well-known optimization algorithms have been reinvented in the neural network community. One of the first methods to re-appear, and probably still the most widely used, is the steepest-descent algorithm, which neural network advocates have named back-propagation learning. Of course, a range of standard optimization algorithms, see [23], can be used to solve the regression problem (2.3), and steepest-descent is not the most obvious choice in that respect.

There is no reason to believe that neural networks should be able to provide better approximations than the other methods mentioned in this work. In fact, we should stress that the frequently used viewpoint of neural networks as convenient, magic, black-box approximations may mislead the user into overlooking the possibility of posing ill-conditioned problems, where the number of parameters to be determined by regression (the θ's) is larger than the provided data set, not to mention a related problem, namely the serious risk of over-fitting.
2.2.3 Single Point Models
We will now discuss a class of models that are based on information obtained from a single point, usually the current iterate x_k in an optimization procedure. Taylor models, i.e. truncated or approximate Taylor series, are prominent members of this class. However, the low-order Taylor models that are feasible to construct in practice are only valid as approximations in a small region around x_k.

In the following we consider some alternative models, also based on information from the current iterate only. Under certain conditions they have a larger region of validity than Taylor based models, which makes them tractable as surrogates for expensive models.
Reciprocal and conservative approximations
Consider the approximation

    s_k(x) = f(x_k) + Σ_{i=1}^n ∇_{x^(i)} f(x_k) (x^(i) - x_k^(i)) r_i(x^(i), x_k^(i)),

where x = (x^(1),...,x^(n))^T. We see that enforcing the first order requirement s_k'(x_k) = f'(x_k), it follows that r_i(x_k^(i), x_k^(i)) = 1. The choice r_i(x^(i), x_k^(i)) = 1 yields the (linear) first order Taylor series approximation.
In structural optimization the alternative form r_i(x^(i), x_k^(i)) = x_k^(i)/x^(i) is often used; the approximation is then called a reciprocal approximation [10]. A significant class of constraints in structural engineering can in this way be transformed from nonlinear to linear equations (at the expense of introducing nonlinearity into the objective function). The reciprocal approximation may become unbounded if any of the variables approach zero. An alternative is the modified reciprocal approximation, r_i(x^(i), x_k^(i)) = (x_k^(i) + c^(i)) / (x^(i) + c^(i)), where the values of the c^(i)'s are typically small compared to representative values of the corresponding x^(i)'s. Another alternative is the conservative approximation, having

    r_i(x^(i), x_k^(i)) = 1                 if x_k^(i) ∇_{x^(i)} f(x_k) > 0,
                          x_k^(i) / x^(i)   otherwise.    (2.16)
Following [1], the conservative approximation has the attractive feature of leading to a convex programming problem, and thus is amenable to solution by nonlinear programming techniques that take advantage of the dual problem. We note that the reciprocal and conservative approximations destroy the linearity of the approximation, and thus the possibility of using it directly with a sequential linear programming algorithm.
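The conservative approximation is easy to sketch. The code below is a minimal illustration (function names and the toy objective are ours) of the first order model with the per-variable choice (2.16); at x_k it interpolates f, and since r_i(x_k, x_k) = 1 the slopes match there as well.

```python
import numpy as np

def conservative_surrogate(f, grad_f, xk):
    """s_k(x) = f(x_k) + sum_i g_i (x_i - x_i^k) r_i, with r_i chosen by
    (2.16): r_i = 1 when x_i^k g_i > 0, and r_i = x_i^k / x_i otherwise."""
    fk, gk = f(xk), grad_f(xk)
    def s(x):
        r = np.where(xk * gk > 0, 1.0, xk / x)
        return fk + np.sum(gk * (x - xk) * r)
    return s

f = lambda x: x[0] ** 2 / x[1]              # toy objective, valid for x > 0
grad_f = lambda x: np.array([2 * x[0] / x[1], -x[0] ** 2 / x[1] ** 2])
xk = np.array([1.0, 2.0])
s = conservative_surrogate(f, grad_f, xk)
assert np.isclose(s(xk), f(xk))             # interpolation at x_k
h = 1e-6                                     # slope check by central difference
num = (s(xk + np.array([0, h])) - s(xk - np.array([0, h]))) / (2 * h)
assert abs(num - grad_f(xk)[1]) < 1e-4
```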
Posynomial approximation
The posynomial approximation of the form

    s_k(x) = f(x_k) ∏_{i=1}^n ( x^(i) / x_k^(i) )^{γ^(i)},

where

    γ = f'(x_k) / f(x_k),

can be treated using geometric programming techniques, which actually require this form. According to [1], geometric programming techniques will, under appropriate conditions and when applied to a posynomial approximation of the original problem, converge to a stationary point of the original problem (2.1).
2.2.4 Summary
In the last sections we have reviewed methods for generating functional models that can be used as surrogate models for expensive functions. We have presented the most commonly used of these methods, namely regression models, radial functions and single point models.

The regression models consist of regression functions and parameters. The parameters are found by fitting the regression functions to pre-sampled data using least-squares. The particular methods we have discussed are the response surface models, wavelets and neural networks. The problem of positioning the pre-sampled data has only briefly been covered in this presentation. In fact, it is a discipline in itself, often referred to in the literature as design of experiments.

The radial functions are used in a class of models called kriging models or radial basis function approximations. Here, a regression model is combined with a radial model, which consists of radial functions and parameters. In the presentation we derived how to simultaneously determine the parameters of the regression model and the radial model. We also briefly discussed a related method, namely the method of multivariate adaptive regression splines.

The single point models are a class of models valid only in a local region around a single point. We have presented the reciprocal approximation, the conservative approximation and the posynomial approximation. In the literature the valid area of these approximations is claimed to be wider than that of a corresponding Taylor model. Models of this type have gained popularity in the mechanical engineering community.
2.3 Surrogates Based on Physical Models
We will now turn the attention to surrogate models which are specific to the particular physical system in question. We will assume that, besides the response function f from the expensive model, a user-provided, cheap (lower-fidelity) physical model with the response function c : IR^n → IR is available.

For some problems, the cheap model may itself act as a surrogate for the expensive model. However, we cannot in general expect that any given cheap model approximates the expensive model well. Often, the deviations between two physical models can be attributed to problems with incorrect alignment of the responses or of the parameter spaces.

In the case of incorrectly aligned response functions, we could apply a response correction g on c and obtain the surrogate g(c(x)). The response correction could e.g. be a simple scaling function. By imposing conditions on the form and behaviour of the response correction g, we can make the surrogate interpolate f and its gradients at given points. Two methods of this type are presented in the next section.

One method for performing response scaling, called the multipoint method, is presented in Section 2.3.2. Here, a number of cheap models, related to subsystems of the original physical system, are used as regression functions in a data fitting problem. The resulting regression parameters can be viewed as scaling parameters of the cheap models.

In the case of incorrectly aligned parameter spaces, we could apply a transformation of the cheap model parameter space, say a mapping P, and obtain the surrogate c(P(x)). By imposing conditions on the mapping P, we can make the surrogate approximate f and its gradients at given points. Space mapping techniques are discussed in Section 2.3.3.
2.3.1 Response Correction
Scaling of response functions is a means of correcting physical models which lack approximation abilities like interpolation. Consider surrogates defined by a response correction g : IR → IR of the response function c from the physical model, hence

    s(x) = g(c(x)).    (2.17)

Such a response correction g could be identified by imposing conditions on s, e.g. that at a given point x_k the function values and the gradients match, i.e. s(x_k) = f(x_k) and s'(x_k) = f'(x_k). Two methods of this type are considered below. The first method, the β-correlation method, performs a simple scaling of c; the second method is more general and seeks to approximate an assumed function g for which s(x) = f(x).
The β-correlation Method
We now consider the case of response correction (2.17) where g is a simple scaling of c, hence g(c(x)) = a(x) c(x), where a(x) is the scaling function. Such a response scaling method, called the β-correlation method, is presented in [16] as a generic approach to correcting a lower-fidelity model response by scaling. The method assumes the existence of a smooth function a(x) for which

    a(x) c(x) = f(x)  and  ∇(a(x) c(x)) = f'(x).

Taylor's theorem provides the following approximation,

    g(c(x + h)) ≈ ( a(x) + a'(x)^T h ) c(x + h).    (2.18)

Using (2.18), at a given point x_k we obtain the approximation

    s_k(x) = ( a(x_k) + a'(x_k)^T (x - x_k) ) c(x)
           = ( f(x_k)/c(x_k) + ( (f'(x_k) c(x_k) - c'(x_k) f(x_k)) / c(x_k)^2 )^T (x - x_k) ) c(x),

which performs a scaling of c such that s_k and s_k' interpolate f and f' at x_k. If f' is too expensive to calculate, an approximation can be used [1], e.g. obtained using a secant method. Alternatively, the derivatives a'(x_k) could be approximated directly by a secant method.

A shortcoming of the β-correlation method is that if c vanishes at a point x_k, the scaling is undefined at this point. The more general method presented next mends this shortcoming.
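The first order scaling above can be sketched in a few lines. The following is a minimal one-dimensional illustration (our own; sin and the identity are stand-ins for the expensive and cheap model) verifying that s_k interpolates both f and f' at x_k.

```python
import numpy as np

def beta_correlation(f, df, c, dc, xk):
    """Scaled surrogate s_k(x) = [a(x_k) + a'(x_k)(x - x_k)] c(x), with
    a = f/c, so s_k and s_k' interpolate f and f' at x_k (1-D sketch)."""
    a  = f(xk) / c(xk)
    da = (df(xk) * c(xk) - dc(xk) * f(xk)) / c(xk) ** 2   # quotient rule
    return lambda x: (a + da * (x - xk)) * c(x)

f,  df = np.sin, np.cos                      # stand-in expensive model
c,  dc = (lambda x: x), (lambda x: 1.0)      # stand-in cheap model
xk = 1.2
s = beta_correlation(f, df, c, dc, xk)
assert np.isclose(s(xk), f(xk))              # function values match
h = 1e-6                                      # slopes match at x_k
assert abs((s(xk + h) - s(xk - h)) / (2 * h) - df(xk)) < 1e-6
```

Note that the construction breaks down when c(xk) is zero, which is exactly the shortcoming discussed above.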
General Response Correction
A more general approach to response correction (2.17) is to assume the existence of a smooth function g for which

    f(x) = g(c(x)),    (2.19)

and then approximate this function. We will show how to make a first order approximation to g assuming knowledge of the first derivatives of f; thereafter we show how a secant method can be used instead, not requiring knowledge of f'.

Taylor's theorem provides the following approximation,

    g(c(x + h)) ≈ g(c(x)) + g'(c(x)) [c(x + h) - c(x)].    (2.20)

At a point x_k, applying the interpolation conditions s(x_k) = f(x_k) and s'(x_k) = f'(x_k) on (2.20), we obtain the surrogate

    s_k(x) = f(x_k) + g'(c(x_k)) [c(x) - c(x_k)],    (2.21)

with g'(c(x_k)) c'(x_k) = f'(x_k). In practice we only expect (2.19) to hold approximately; we can then choose g'(c(x_k)) as the solution of the linear least-squares problem

    min_{a ∈ IR}  || a c'(x_k) - f'(x_k) ||_2.

Similar to the β-correlation method described above, if f'(x_k) is too expensive to calculate, an approximation, e.g. obtained using a secant method, can be used.

An alternative approach is to sequentially approximate the gradient g'(c(x)) by scalars a_k ∈ IR, k = 0, 1, ..., which, for a given sequence of points {x_k}, obey the secant condition

    f(x_k) - f(x_{k-1}) = a_k [c(x_k) - c(x_{k-1})],    (2.22)

where a_0 ∈ IR is a user-provided initial approximation to g'(c(x_0)).

The surrogate becomes

    s_k(x) = f(x_k) + a_k [c(x) - c(x_k)],    (2.23)

where a_k is found using (2.22). The user has to provide an initial approximation a_0; the most obvious choice is to assume no scaling, hence let a_0 = 1.

The advantage of the surrogate in (2.23) over (2.21) is obvious, since for most practical problems we cannot rely on the expensive response derivatives f'(x_k) being available.
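The secant-based surrogate (2.22)-(2.23) needs no derivatives at all, only function values along the iterates. A minimal sketch (our own; the model functions are stand-ins) follows.

```python
import numpy as np

def secant_correction(f, c, xs, a0=1.0):
    """Sequence of surrogates s_k(x) = f(x_k) + a_k [c(x) - c(x_k)], with
    a_k from the secant condition (2.22) and the default start a_0 = 1."""
    a = a0
    surrogates = []
    for k, xk in enumerate(xs):
        if k > 0:
            dc = c(xk) - c(xs[k - 1])
            if abs(dc) > 1e-12:              # keep the previous a if c is flat
                a = (f(xk) - f(xs[k - 1])) / dc
        fk, ck = f(xk), c(xk)
        surrogates.append(lambda x, fk=fk, ck=ck, a=a: fk + a * (c(x) - ck))
    return surrogates

f = lambda x: np.exp(x)          # stand-in expensive model
c = lambda x: 1.0 + x            # stand-in cheap model
xs = [0.0, 0.5, 1.0]
ss = secant_correction(f, c, xs)
# Every surrogate interpolates f at its own point x_k.
assert all(np.isclose(s(xk), f(xk)) for s, xk in zip(ss, xs))
```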
2.3.2 The Multipoint Method
The multipoint method proposed by Toropov in [55], and further described in [54, 56], is a method for creating physics based regression functions for systems that can be partitioned into individual subsystems.

The multipoint method constructs an approximation based on partitioning the physical system into, say q, individual subsystems, which again are described by empirical expressions or known analytical solutions c_i, i = 1,...,q. The approximation in its simplest form is

    s(x) = β^(0) + β^(1) c_1(x) + ... + β^(q) c_q(x) + Σ_{j≥1} β^(j+q+1) x_s^(j),    (2.24)

where c_i is based on the physics of the ith subsystem. The parameters β are model parameters, which are determined by least-squares regression using pre-sampled data, as in (2.3). The vector x_s is a subset of the design variables, x_s ⊆ x, that have a global influence on the physical system. The idea is that each subsystem may depend only on a subset of the design variables, and that only a few (global) variables are related to the system behaviour as a whole.
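The linear fit (2.24) is an ordinary least-squares problem with the subsystem responses as regression functions. The sketch below is our own illustration with two hypothetical subsystem models; the recovered β's are exactly the scalings used to generate the "expensive" response.

```python
import numpy as np

# Minimal sketch of the multipoint fit (2.24): cheap subsystem responses
# c_1, c_2 act as regression functions, and the scalings beta are found
# by least squares on pre-sampled data of the expensive response f.
c1 = lambda x: x[0] ** 2                          # hypothetical subsystem models
c2 = lambda x: np.sin(x[1])
f  = lambda x: 1.0 + 2.0 * c1(x) + 0.5 * c2(x)    # "expensive" response

rng = np.random.default_rng(0)
X = rng.random((20, 2))
y = np.array([f(x) for x in X])
B = np.column_stack([np.ones(20),
                     [c1(x) for x in X],
                     [c2(x) for x in X]])
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
assert np.allclose(beta, [1.0, 2.0, 0.5])         # scalings are recovered
```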
Instead of using a simple linear combination (2.24), which we can write as

    s(x) = β^(0) + Σ_{j≥1} β^(j) v_j(x),

Toropov suggested three possible nonlinear formulations: multiplication,

    s(x) = β^(0) ∏_{j=1}^q v_j(x)^{β^(j)},