
Symbols

‖·‖       unspecified norm
‖·‖_2     Euclidean norm, ‖x‖_2 = (x^T x)^{1/2}
‖·‖_F     Frobenius norm

c         response from the coarse model, c : IR^n → IR^m
f         response from the fine model, f : IR^n → IR^m
g         response correction factors
H         convex function, used as merit function
m         number of response functions
n         dimensionality of the design parameter space
p         space mapping
p_λ       regularized space mapping
x         optimizable model parameters of f and c
x^*       minimizer of H(f(x))
x^*_{c∘p} minimizer of H(c(p(x)))
x^*_p     minimizer of ‖p(x) − z^*‖_2
z^*       minimizer of H(c(z))
λ         regularization parameter in space mapping definitions


Convergence of Hybrid

Space Mapping Algorithms

Included paper:

Convergence of Hybrid Space Mapping Algorithms,

Kaj Madsen and Jacob Søndergaard,

submitted for publication, Optimization and Engineering.

Convergence of Hybrid Space Mapping Algorithms

Kaj Madsen (km@imm.dtu.dk) and Jacob Søndergaard (js@imm.dtu.dk)

Informatics and Mathematical Modelling, Technical University of Denmark

Abstract. The space mapping technique is intended for optimization of engineering models which involve very expensive function evaluations. It may be considered a preprocessing method which often provides a very efficient initial phase of an optimization procedure. However, the ultimate rate of convergence may be poor, or the method may even fail to converge to a stationary point.

We consider a convex combination of the space mapping technique with a classical optimization technique. The function to be optimized has the form H ∘ f where H : IR^m → IR is convex and f : IR^n → IR^m is smooth. Experience indicates that the combined method maintains the initial efficiency of the space mapping technique.

We prove that the global convergence property of the classical technique is also maintained: The combined method provides convergence to the set of stationary points of H ∘ f.

Keywords: space mapping, global convergence

1. Introduction

The subject of this paper is to prove global convergence of an optimization method which is a convex combination of two strategies: one which is efficient in the initial phase of an iteration and another which has guaranteed global convergence. The first algorithm is the so-called space mapping technique, described and motivated in (Bakr et al., 2001); the other is a classical first-order Taylor based trust region algorithm.

The problem to be solved is the following:

min_{x ∈ IR^n} H(f(x))    (1)

where f : IR^n → IR^m is a smooth function, often with m ≫ n, and H : IR^m → IR is a convex function. It may be a norm in IR^m, typically the L_1, L_2 or the L_∞ norm. The following minimax function,

H(y) = max_{1 ≤ i ≤ m} { y_i } ,

where y = (y_1, y_2, ..., y_m)^T, is also often used, e.g., in electromagnetic

design, which has been a major application area for the space mapping

technique. Thus it is important to cover the case where H is

non-differentiable.


In the smooth least squares case, H = ‖·‖_2^2, the first-order Taylor based method is the Gauss-Newton method. In this case convergence of the combined algorithm is proved in (Vicente, 2003). For the non-differentiable choices of H mentioned above, special versions of the Taylor based method have been published in (Madsen, 1975) and (Hald & Madsen, 1985). For general convex H, convergence of the Taylor based trust region algorithm was proved in (Madsen, 1985). Related results may be found in (Fletcher, 1982) and (Womersley & Fletcher, 1986).

The space mapping technique, which was introduced in (Bandler et al., 1994), is intended for problems where f is computationally expensive. It is an optimization technique for engineering design in the following situation: f represents an accurate model of some physical system. Besides this model of primary interest (denoted the fine model), access to a cheaper (coarse) model of the same physical system is assumed. The latter may be less accurate. The main idea of the space mapping technique is to use the coarse model to gain information about the fine model, and to apply this in the search for an optimal solution of the latter. Thus the technique iteratively establishes a mapping between the parameters of the two models which relates similar model responses. Having this mapping, most of the model evaluations can be directed to the fast coarse model.

A review of the Space Mapping approach is given in (Bakr et al., 2000).

We give a description of the combined method in Section 2. The convergence is proved in Section 3.

2. Description of the Algorithms

The two algorithms which are combined are both iterative. In the descriptions below the current iterate is x_k ∈ IR^n. H ∘ f is denoted by F.

The Space Mapping Algorithm (SMA) assumes two functions available: the function f to be minimized, and a function c : IR^n → IR^m which represents the coarse model that is related to the same physical model as f. The space mapping p : IR^n → IR^n is intended to connect similar values of f and c. In the present description it satisfies the following for x ∈ IR^n:

p(x) ∈ arg min_{x̃ ∈ IR^n} ‖c(x̃) − f(x)‖    (2)

where ‖·‖ is a norm in IR^m, usually the L_2 norm. The tentative step h_k from x_k is based on the following approximation to p(x_k + h):

p_k(h) = B_k h + p(x_k) ,    (3)

where the matrix B_k is intended to approximate the gradient p′(x_k)^T of p at x_k. This approximation may be found using the gradients of f and c if they are available; otherwise a Broyden update (Broyden, 1965) or a difference approximation to p′(x_k)^T has been used. It is not important in the present paper how B_k is found.
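As an illustration, p(x) in (2) can be evaluated with an off-the-shelf least-squares solver, and B_k can be maintained by a Broyden rank-one update. The following minimal Python sketch assumes hypothetical callables `fine` and `coarse` standing in for f and c; it is a sketch under these assumptions, not the implementation used in the references.

```python
import numpy as np
from scipy.optimize import least_squares

def space_mapping(x, fine, coarse, guess):
    """Evaluate p(x) of (2): coarse parameters whose response matches f(x)."""
    fx = fine(x)
    res = least_squares(lambda xt: coarse(xt) - fx, guess)  # L_2 norm in (2)
    return res.x

def broyden_update(B, h, dp):
    """Rank-one update of B ~ p'(x_k)^T so that the secant B_new @ h = dp holds."""
    return B + np.outer(dp - B @ h, h) / (h @ h)
```

In an iteration one would call `space_mapping` at x_k + h_k and feed dp = p(x_k + h_k) − p(x_k) to `broyden_update`; as noted above, the paper does not depend on how B_k is obtained.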

The SMA finds h_k as a solution to

h_k ∈ arg min_h H(c(p_k(h))) ,

where the minimization is usually subject to a trust region.

In the Taylor-based method for minimizing f the tentative step from x_k is based on the following first-order approximation to f(x_k + h),

f_k(h) = D_k h + f(x_k) .

In the present paper D_k = f′(x_k)^T; in other implementations a Broyden update or a difference approximation to f′(x_k)^T has been used.

The tentative step h_k is found as a solution to

h_k ∈ arg min_h H(f_k(h)) ,    (4)

where the minimization is usually subject to a trust region.

In the combined algorithm (SMTA) for minimizing f the tentative step h_k from x_k is based on a convex combination of c ∘ p_k and f_k:

s_k^w(h) = w·c(p_k(h)) + (1 − w)·f_k(h) ,

where 0 ≤ w ≤ 1 is a transition parameter.

The method finds the tentative step h_k as a solution to

h_k ∈ arg min_h H(s_k^{w_k}(h)) ,

where the minimization is usually subject to a trust region.

In the algorithm w_0 = 1 and w_k is a non-increasing function of k, the principle being that if c ∘ p_k does badly in approximating f then w_k is decreased, whereby the algorithm gradually switches to using the Taylor model f_k of f.

2.1. Details of the SMTA

The trust region at x_k is the set

T_k = { x ∈ IR^n | ‖x − x_k‖ ≤ Λ_k }    (5)

where ‖·‖ is a suitable norm in IR^n and Λ_k > 0.


At the current iterate x_k, the tentative step h_k ∈ IR^n is a solution to

h_k ∈ arg min_h H(s_k^{w_k}(h))  s.t.  x_k + h ∈ T_k    (6)

The quality of a given tentative step h_k is measured by

ΔS_k(h_k) ≡ H(s_k^{w_k}(h_k)) − H(f(x_k)) .

ΔS_k(h_k) may be interpreted as a measure of the ability of the model s_k^{w_k} to predict a decrease in F. Notice, however, that ΔS_k(0) is not necessarily 0, i.e., the model does not necessarily interpolate at x_k. If h_k is acceptable then we use x_k + h_k as the next iterate, otherwise we maintain x_k as the best iterate found. The acceptance of the step is based on the following criteria:

If the predicted decrease is non-positive,

−ΔS_k(h_k) ≤ 0 ,    (7)

then the step is rejected. Otherwise the step is accepted if F decreases sufficiently:

F(x_k) − F(x_k + h_k) ≥ δ̃_1 [−ΔS_k(h_k)]    (8)

where 0 < δ̃_1 < 1.

In each iteration the local bound Λ_k and the transition parameter w_k are adjusted as follows: If (7) is true then Λ_{k+1} = Λ_k; otherwise Λ_{k+1} depends on the ratio between the actual and the predicted decrease. If

F(x_k) − F(x_k + h_k) ≤ δ̃_2 [−ΔS_k(h_k)] ,    (9)

with δ̃_1 < δ̃_2 < 1, is satisfied then Λ_{k+1} = K_1 Λ_k with 0 < K_1 < 1.

If

F(x_k) − F(x_k + h_k) ≥ δ̃_3 [−ΔS_k(h_k)] ,    (10)

with δ̃_2 < δ̃_3 < 1, is satisfied then Λ_{k+1} = K_2 Λ_k with K_2 > 1.

If neither of the conditions (9) or (10) is satisfied then we let Λ_{k+1} = Λ_k. The transition parameter w_k is chosen as follows: Initially w_0 = 1. If x_k + h_k is not accepted then we wish to move weight towards the Taylor model, and therefore we let

w_{k+1} = K_3 w_k min{ Λ_{k+1}, 1 } ,    (11)

where 0 ≤ K_3 < 1. Otherwise w_{k+1} = w_k.

In order to ensure convergence, however, we need w_k → 0 for k → ∞. Therefore we also apply (11) if it has not been used for the previous n iterations.

2.2. Summary of the SMTA

Given Λ_0, B_0 = I, w_0 = 1, k = 0

0. Find x_0 as a solution to min_x̃ H(c(x̃))
1. Evaluate f(x_0)
2. Find p(x_0) by solving (2)
3. Find h_k by solving (6)
4. Evaluate f(x_k + h_k)
5. Find p(x_k + h_k) by solving (2)
6. If (7) is false and (8) is true then let x_{k+1} = x_k + h_k, otherwise x_{k+1} = x_k
7. Update Λ, w, B and D (only if w < 1)
8. Let k = k + 1
9. If not converged then go to 3

Steps 0, 1 and 2 are an initial phase where a (local) optimizer of H(c(x)) is found and the initial translation p(x_0) in the linear approximation (3) to the space mapping is found.

Note that if (7) is true after step 3, then the algorithm can skip to step 8, letting x_{k+1}, Λ_{k+1}, B_{k+1} and D_{k+1} take the values from iteration k and updating w_k using (11). Hence we avoid the evaluation of f(x_k + h_k) and f′(x_k + h_k) in this case.
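To make the loop concrete, the following condensed Python sketch mirrors Subsection 2.2 under simplifying assumptions of ours: D_k is a finite-difference Jacobian, the trust region subproblem (6) is an L_∞ box handed to a generic bounded solver (for non-differentiable H a dedicated subproblem solver would be used), and x_0 is assumed to be the result of the initial coarse optimization (step 0). It is an illustration of the scheme, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def fd_jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian of a vector-valued function."""
    fx = np.atleast_1d(fun(x))
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        J[:, i] = (np.atleast_1d(fun(x + e)) - fx) / eps
    return J

def smta(f, c, H, p, x0, Lam=1.0, dt1=0.01, dt2=0.25, dt3=0.75,
         K1=0.5, K2=2.0, K3=0.9, kmax=50):
    x = np.asarray(x0, dtype=float)
    n = x.size
    w, B = 1.0, np.eye(n)                 # w_0 = 1, B_0 = I
    px, fx = p(x), f(x)                   # p(x_0) by solving (2), f(x_0)
    Fx = H(fx)
    stale = 0                             # iterations since w was last reduced
    for _ in range(kmax):
        D = fd_jacobian(f, x)             # D_k, standing in for f'(x_k)^T
        def model(h):                     # H(s_k^w(h)) with p_k(h) = B h + p(x_k)
            return H(w * c(px + B @ h) + (1.0 - w) * (fx + D @ h))
        # Step 3: solve (6) with an L_inf trust region ||h||_inf <= Lam.
        res = minimize(model, np.zeros(n), bounds=[(-Lam, Lam)] * n)
        h, dS = res.x, res.fun - Fx       # Delta S_k(h_k)
        if dS >= 0.0:                     # (7): non-positive predicted decrease
            w *= K3 * min(Lam, 1.0)       # (11); skip the fine-model evaluation
            stale = 0
            continue
        fnew = f(x + h); Fnew = H(fnew)   # Step 4
        if Fx - Fnew <= dt2 * (-dS):      # (9): poor agreement, shrink bound
            Lam *= K1
        elif Fx - Fnew >= dt3 * (-dS):    # (10): good agreement, expand bound
            Lam *= K2
        if Fx - Fnew >= dt1 * (-dS):      # (8): accept the step
            pnew = p(x + h)               # Step 5, solving (2)
            B += np.outer(pnew - px - B @ h, h) / (h @ h)   # Broyden update
            x, px, fx, Fx = x + h, pnew, fnew, Fnew
            stale += 1
        else:                             # rejected: move weight to Taylor model
            w *= K3 * min(Lam, 1.0)       # (11)
            stale = 0
        if stale > n:                     # apply (11) at least every n iterations
            w *= K3 * min(Lam, 1.0)
            stale = 0
    return x
```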

3. Proof of Convergence

We show that the SMTA satisfies the usual convergence condition for trust region methods. In the proof we do not need the actual updating scheme (11) of the weights {w_k}; we only need property (12) below. Similarly, we do not need any properties of the SMA; we only need that c is bounded (Assumption A2 below). Thus the proof covers a class of algorithms, including those presented in (Bakr et al., 2001) and (Pedersen, 2001). Probably other pre-processing techniques will also fit into this framework.

Throughout this section we use the notations {x_k}, {h_k}, {Λ_k} and {w_k} as they are defined in Subsection 2.1. It follows from the definition of the weights that they satisfy

w_k = min{ Λ_{k+1}, 1 } · o(1)    (12)

where o(1) → 0 for k → ∞.

We make the following assumptions

A1: For each point z IR n there exist a neighbourhood N (z) IR n such that

f(x + h) = f(x) + f 0 (x)h + o( k h k ) , x ∈ N (z) , h IR n , where o is uniform for x ∈ N (z).

f 0 is continuous on IR n .

A2: c is bounded in the region of interest.

A3: H is a convex function on IR m . A4: { x k } stays in a bounded region in IR n . 3.1. Prerequisites

The convexity of H implies that it satisfies a local Lipschitz condition. H is not necessarily differentiable. However, we can define a generalized gradient, which is a set of points (rather than always a single point). Since a function which is Lipschitz on a finite dimensional space is differentiable almost everywhere, the generalized gradient can be defined as follows, see (Clarke, 1975) or (Clarke, 1983), Theorem 2.5.1:

DEFINITION 1. The generalized gradient of H at x, denoted ∂H(x), is the convex hull of the set of all limits of the form lim H′(x + e_i), where H is differentiable at x + e_i and e_i → 0 as i → ∞,

∂H(x) ≡ conv { lim_{e_i → 0} H′(x + e_i) } .

It is easily seen ((Clarke, 1983), Proposition 2.1.1, and (Madsen, 1985), Proposition 2.1) that ∂H(x) is non-empty and compact.
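For instance, for the minimax function H(y) = max_{1 ≤ i ≤ m} y_i of Section 1, H is differentiable with H′(y) = e_i wherever a single component i attains the maximum, and Definition 1 then gives

∂H(y) = conv{ e_i | y_i = H(y) } ,

the convex hull of the unit vectors of the maximal components: a single gradient where the maximizer is unique, and a genuine set where several components are active.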

Since H is convex, the generalized gradient coincides with what is called the subdifferential in convex analysis. The convexity of H also implies the existence of a directional derivative H′_e(y) for any y, e ∈ IR^m, e ≠ 0,

H′_e(y) ≡ lim_{t→0+} ( H(y + te) − H(y) ) / t .

Now consider the composite function F = H ∘ f. Since f is smooth, ∂F is well defined. A stationary point of F is defined as follows:

DEFINITION 2. x is a stationary point of F if

0 ∈ ∂F(x).

Using the convexity of H and the smoothness of f we can obtain the following characterization of a stationary point ((Madsen, 1985), Proposition 2.15).

PROPOSITION 1. Let x ∈ IR^n. Then

0 ∈ ∂F(x)
⇕
F′_h(x) ≥ 0 for every direction h ∈ IR^n, h ≠ 0 .

Below we shall use the following first-order approximation ΔF to a change in F:

DEFINITION 3. Let x, h ∈ IR^n. Define

ΔF(x; h) ≡ H(f(x) + f′(x)h) − H(f(x)) .

We shall use the following two properties of ΔF:

PROPOSITION 2. For x, h ∈ IR^n we have

ΔF(x; th) = t F′_h(x) + o(t) , for t > 0 .

Proof. (Madsen, 1985), Proposition 2.9.

PROPOSITION 3. For x, h ∈ IR^n and 0 ≤ t ≤ 1 we have

ΔF(x; th) ≤ t ΔF(x; h) .

Proof. The result is a simple consequence of assumptions A1 and A3.
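Spelled out, the inequality follows from the convexity of H (A3), since the argument of H in ΔF(x; th) is affine in h; for 0 ≤ t ≤ 1,

ΔF(x; th) = H( (1 − t)·f(x) + t·(f(x) + f′(x)h) ) − H(f(x))
 ≤ (1 − t)·H(f(x)) + t·H(f(x) + f′(x)h) − H(f(x))
 = t·ΔF(x; h) .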

3.2. Proof of convergence

The technicalities of the convergence proof for the SMTA are contained in the following three lemmas.

LEMMA 1. Let x ∈ IR^n be a non-stationary point. Then there exist positive numbers δ_1, δ_2 and ε_1 such that for x_k ∈ IR^n

‖x_k − x‖ ≤ δ_1 and Λ_k ≤ δ_2
⇓
ΔS_k(h_k) ≤ −ε_1 Λ_k  if k ≥ k̃ ,

where k̃ is a fixed (sufficiently large) index introduced in the proof.

Proof. Since x is a non-stationary point there exists, by Proposition 1, a direction h such that F′_h(x) < 0. Then it follows from Proposition 2 that there exists a point x + d, d = th, t > 0, such that ΔF(x; d) < 0. Define d_k by x_k + d_k = x + d. If x_k → x then d_k → d. Therefore it follows from the uniform continuity of f, f′ and H that there exist a neighbourhood N(x) of x and a number δ such that

ΔF(x_k; d_k) ≤ δ < 0    (13)

when x_k ∈ N(x). Define δ_1 > 0 and δ_2 > 0 such that if ‖x_k − x‖ ≤ δ_1 then x_k ∈ N(x) and ‖d_k‖ ≥ δ_2.

Let h_k^t ∈ IR^n be a solution to (4) subject to the trust region:

h_k^t ∈ arg min_h H(f_k(h))  s.t.  x_k + h ∈ T_k    (14)

Suppose Λ_k ≤ δ_2. Let t_k = Λ_k/‖d_k‖ and q_k = t_k d_k. Then x_k + q_k ∈ T_k, and since H(f_k(h)) = ΔF(x_k; h) + F(x_k) we can use the definition of h_k^t and Proposition 3 to obtain

ΔF(x_k; h_k^t) ≤ ΔF(x_k; q_k) ≤ t_k ΔF(x_k; d_k) .    (15)

Since {d_k} is bounded away from 0 this implies the existence of ε > 0, independent of k, such that

ΔF(x_k; h_k^t) ≤ −ε Λ_k    (16)

(using (13)). Since H is locally Lipschitz, there exists a constant K such that

|ΔF(x_k; h_k^t)| ≤ K ‖h_k^t‖ ,

and thus (16) implies

‖h_k^t‖ ≥ (ε/K) Λ_k .    (17)

Now let u_k = ‖h_k^t‖/‖d_k‖. The arguments showing (15) imply

ΔF(x_k; h_k^t) ≤ u_k ΔF(x_k; d_k) .

Using (13) and the definition of ΔF, this implies

H(f_k(h_k^t)) ≤ u_k δ + H(f(x_k)) .    (18)

Because of (17), property (12), A2, and the boundedness away from 0 of {‖d_k‖}, we have the following inequalities for all sufficiently large values of k, say k ≥ k̃:

|w_k H(f(x_k))| ≤ −(δ/4) u_k ,  w_k H(c(p_k(h_k^t))) ≤ −(δ/4) u_k ,  w_k ≤ 1/4 .

Therefore, using the convexity of H and (18),

H(s_k^{w_k}(h_k^t)) ≤ w_k H(c(p_k(h_k^t))) + (1 − w_k) H(f_k(h_k^t))
 ≤ −(δ/4) u_k + (1 − w_k)( u_k δ + H(f(x_k)) )
 = −(δ/4) u_k + u_k δ + H(f(x_k)) − w_k u_k δ − w_k H(f(x_k))
 ≤ (δ/4) u_k + H(f(x_k)) .

Since h_k minimizes H ∘ s_k^{w_k} subject to the trust region, it follows that

ΔS_k(h_k) = H(s_k^{w_k}(h_k)) − H(f(x_k)) ≤ H(s_k^{w_k}(h_k^t)) − H(f(x_k))
 ≤ (δ/4) u_k = (δ/4) ‖h_k^t‖ / ‖d_k‖ ≤ (δ/4)(ε/K) Λ_k / ‖d_k‖ ,

which, since δ < 0 and {‖d_k‖} is bounded, proves Lemma 1.  □

LEMMA 2. Let x ∈ IR^n be a non-stationary point. Let δ_1 be defined as in Lemma 1. For every δ_3 > 0 there exists ε_2 > 0 such that

‖x_k − x‖ ≤ δ_1 and Λ_k ≥ δ_3
⇓
ΔS_k(h_k) ≤ −ε_2  if k ≥ k̃ .

Proof. Let δ_4 = min{ δ_2, δ_3 }, δ_2 being defined as in Lemma 1. Suppose h̃_k is generated by (6) from x_k with the trust region bound Λ̃_k = δ_4. Then it follows from Lemma 1 that, for k ≥ k̃,

ΔS_k(h̃_k) ≤ −ε_1 δ_4 .

Suppose h_k is generated from x_k by (6) with the local bound Λ_k ≥ δ_3. Since the trust region of h_k contains that of h̃_k,

ΔS_k(h_k) ≤ ΔS_k(h̃_k) ≤ −ε_1 δ_4 ,

which proves Lemma 2 with ε_2 = ε_1 δ_4.  □

LEMMA 3. If {x_k} is convergent then the limit point is stationary.

Proof. Suppose x_k → x for k → ∞ and that x is non-stationary. From Assumptions A1–A3 and (12) we obtain

F(x_k + h_k) = H(f_k(h_k) + o(‖h_k‖))
 = H(s_k^{w_k}(h_k) − w_k(c(p_k(h_k)) − f_k(h_k)) + o(‖h_k‖))
 = H(s_k^{w_k}(h_k)) + Λ_k·o(1) + o(‖h_k‖)
 = H(s_k^{w_k}(h_k)) + Λ_k·o(1) .

Therefore

F(x_k) − F(x_k + h_k) = −ΔS_k(h_k) + Λ_k·o(1) .    (19)

Let δ_2 be defined as in Lemma 1. Using Lemma 1 if Λ_k ≤ δ_2 and Lemma 2 if Λ_k > δ_2, we find from (19)

F(x_k) − F(x_k + h_k) = −ΔS_k(h_k)(1 + o(1)) .

Therefore Lemma 1 and the rules (7), (8) and (10) for accepting a new point and for adjusting Λ_k imply the existence of δ > 0, with δ ≤ δ_2, such that for sufficiently large k,

Λ_k ≤ δ  ⇒  x_{k+1} = x_k + h_k and Λ_{k+1} = K_2 ‖h_k‖    (20)

where K_2 > 1.

(20) implies that if Λ_k ≤ δ, then Λ_{k+1} > Λ_k. Hence the sequence of local bounds must be bounded away from zero,

Λ_k ≥ δ_3 > 0 for k = 0, 1, 2, 3, . . .    (21)

Therefore an infinite number of proposals (x_k + h_k) are accepted by the algorithm, because otherwise we would have Λ_k → 0 for k → ∞ (using the fact that the bound is decreased linearly when a point is not accepted, see (9)). Furthermore, when a proposal (x_k + h_k) is accepted and k is sufficiently large, then (8), Lemma 2 and (21) imply

F(x_k) − F(x_k + h_k) ≥ δ̃_1 [−ΔS_k(h_k)] ≥ δ̃_1 ε_2 > 0 .

Since the sequence of function values F(x_k) is non-increasing we obtain F(x_k) → −∞ for k → ∞. This is a contradiction since x_k → x and F is continuous at x. Thus the assumption is wrong: x is a stationary point.  □

The following theorem extends the result of Lemma 3.

THEOREM 1. Let S be the set of stationary points of (1), S = { x ∈ IR^n | 0 ∈ ∂F(x) }. Let d(x_k, S) be the distance between x_k and S. Then

d(x_k, S) → 0 for k → ∞ .

Proof. Suppose d(x_k, S) ↛ 0. Then infinitely many points x_k must be bounded away from S, and hence Assumption A4 implies that the sequence {x_k} must have a cluster point x which is not stationary. According to Lemma 3, {x_k} cannot converge to x. Thus, for every small ε > 0, infinitely many iterates x_k must have a distance less than ε from x and infinitely many must have a distance larger than 2ε from x. Let ε > 0 be chosen smaller than δ_1/2, δ_1 being the bound used in Lemma 1 and Lemma 2. Then we shall prove that if ‖x_k − x‖ < ε and ‖x_{k+p} − x‖ > 2ε we have

F(x_k) − F(x_{k+p}) ≥ δ > 0 ,    (22)

δ being independent of k and p. Since (22) holds for infinitely many values of k, and since the sequence {F(x_k)} is non-increasing, we obtain that F(x_k) → −∞ for k → ∞, which contradicts the fact that F is continuous at x and that infinitely many iterates x_k lie within distance ε of x. Therefore the result follows as a consequence of (22).

Equation (22) is proved by the following argument: Consider

F(x_k) − F(x_{k+p}) = Σ_{j=k}^{k+p−1} [ F(x_j) − F(x_{j+1}) ] ,    (23)

for k ≥ k̃, with k̃ as in Lemma 1. The terms of the sum are non-negative. Suppose that x_{j+1} ≠ x_j, i.e. the increment h_j is accepted, and suppose further that ‖x_j − x‖ ≤ 2ε, so that we can use Lemmas 1 and 2. We obtain from (8) and these lemmas

F(x_j) − F(x_{j+1}) ≥ δ̃_1 [−ΔS_j(h_j)] ≥
 δ̃_1 ε_1 Λ_j   if Λ_j ≤ δ_2 ,
 δ̃_1 ε_2       otherwise.    (24)

Equation (22) now follows from (23) using (24): Let A_k be the index set corresponding to the terms in (23) with x_j ≠ x_{j+1}. If, for all of these terms, we have Λ_j ≤ δ_2, then

Σ_{j ∈ A_k} [ F(x_j) − F(x_{j+1}) ] ≥ δ̃_1 ε_1 Σ_{j ∈ A_k} Λ_j ≥ δ̃_1 ε_1 Σ_{j ∈ A_k} ‖h_j‖    (25)

The last sum exceeds ε since h_j = x_{j+1} − x_j for j ∈ A_k, x_j = x_{j+1} for j ∉ A_k, and since ‖x_{k+p} − x_k‖ > ε. Thus the sum in (23) exceeds δ̃_1 ε_1 ε when Λ_j ≤ δ_2 for all j ∈ A_k. If the last condition is not fulfilled then there exists j ∈ A_k with

F(x_j) − F(x_{j+1}) ≥ δ̃_1 ε_2 ,

so in that case the sum exceeds a positive number which is independent of k and p. Thus we have proved (22) with

δ = min{ δ̃_1 ε_1 ε, δ̃_1 ε_2 } .

This completes the proof of Theorem 1.  □

4. Conclusion

We have considered the problem of minimizing functions of the type H ∘ f, where f : IR^n → IR^m is smooth and H : IR^m → IR is convex. It is proved that the hybrid space mapping algorithm described in (Bakr et al., 2001) has guaranteed global convergence to the set of stationary points of H ∘ f. The proof covers a class of hybrid algorithms, including those presented in (Bakr et al., 2001) and (Pedersen, 2001).

References

M.H. Bakr, J.W. Bandler, K. Madsen, J. Søndergaard, Review of the Space Mapping Approach to Engineering Optimization and Modelling, Optimization and Engineering, vol. 1, pp. 241–276, 2000.

M.H. Bakr, J.W. Bandler, K. Madsen and J. Søndergaard, An introduction to the space mapping technique, Optimization and Engineering, vol. 2, pp. 369–384, 2001.

J.W. Bandler, R.M. Biernacki, S.H. Chen, P.A. Grobelny and R.H. Hemmers, Space mapping technique for electromagnetic optimization, IEEE Trans. Microwave Theory Tech., vol. 42, pp. 2536–2544, 1994.

C.G. Broyden, A Class of Methods for Solving Non-Linear Simultaneous Equations, Math. Comp., vol. 19, pp. 577–593, 1965.

F.H. Clarke, Generalized Gradients and Applications, Trans. Amer. Math. Soc., vol. 205, pp. 247–262, 1975.

F.H. Clarke, Optimization and Nonsmooth Analysis, Canadian Mathematical Society Series of Monographs and Advanced Texts, John Wiley & Sons, pp. 1–308, 1983.

R. Fletcher, A Model Algorithm for Composite Nondifferentiable Optimization

Problems, in Nondifferentiable and Variational Techniques in Optimization,

D.C. Sorensen, R.J.-B. Wets, eds., Mathematical Programming Study 17,

North-Holland, Amsterdam, 1982.

J. Hald, K. Madsen, Combined LP and Quasi-Newton Methods for Non-Linear L1 Optimization, SIAM J. Numer. Anal., vol. 22, pp. 369–384, 1985.

K. Madsen, An Algorithm for Minimax Solution of Overdetermined Systems of Non-linear Equations, J. Inst. Math. Appl., vol. 16, pp. 321–328, 1975.

K. Madsen, Minimization of Non-Linear Approximation Functions, Dr. Techn. Thesis, Institute for Numerical Analysis, Technical University of Denmark, pp. 1–141, 1985.

F.Ø. Pedersen, Advances on the Space Mapping Optimization Method, Master Thesis, IMM-THESIS-2001-35, Informatics and Mathematical Modelling, Technical University of Denmark, 2001.

L.N. Vicente, Space Mapping: Models, Sensitivities, and Trust-region Methods, to appear in Optimization and Engineering, 2003.

R.S. Womersley, R. Fletcher, An Algorithm for Composite Nonsmooth Optimization Problems, J. Optim. Theory Appl., vol. 48, pp. 493–523, 1986.

Conclusion

The aim of this study has been to provide an overview of the field of surrogate optimization and to examine theoretical properties of the space mapping technique. This chapter summarizes the conclusions of the study.

Chapter 2 gives a literature overview of the field of surrogate modelling and optimization. The presentation divides the field into two parts: methods for generating surrogate models and methods for conducting optimization using surrogate models. The surrogate models are again divided into two categories, the functional models and the physical models. The functional models are generic mathematical models which can be constructed without any particular knowledge of the underlying physical system, whereas the physical models are system-specific models. Surrogates based on physical models are usually constructed by manipulating a cheaper model of the same physical system as the expensive model in question, so that the manipulated cheap model approximates the behaviour of the expensive model. The chapter also presents four algorithms for optimization using surrogate models. Two of these can be proved convergent.

The space mapping technique is one such method for constructing and optimizing a surrogate model based on a cheap physical model. Here we use the name coarse model to denote the cheap model, and the name fine model to denote the expensive model. The surrogate is the coarse model composed with a parameter mapping, the so-called space mapping, connecting similar responses of the coarse and the fine model. The space mapping technique is the focal point of the succeeding chapters of the thesis.

Chapter 3 provides an introduction and motivation for the space mapping technique. The basic principles of the space mapping technique are presented. It is shown how the space mapping technique can be combined with classical optimization strategies in a hybrid method. The hybrid method is illustrated by two test problems, and the space mapping surrogate is shown empirically to be a valid approximation in a larger area than a corresponding linear Taylor model. The space mapping technique is by example shown to be an efficient pre-processing technique for difficult engineering optimization problems. If the solution accuracy is not sufficient, the technique can be combined with other methods of optimization.

Chapter 4 concerns theoretical aspects of the space mapping technique. The chapter presents theoretical results which characterize the space mapping under some ideal conditions. It is shown that if these conditions are met, the solutions provided by the original space mapping technique are minimizers of the fine model. However, in practice we cannot expect the ideal conditions to be met, so the space mapping technique should be combined with classical optimization methods in order to be convergent. The theoretical results are motivated and illustrated by numerical examples. Deficiencies of the usual space mapping definition are discussed, and four alternative definitions are reviewed. The two space mapping definitions relying on gradient information and on multiple points, respectively, are identified as the most promising, but further theoretical investigations are needed in order to arrive at a firmer conclusion.

Chapter 4 also discusses the approximation abilities of the coarse model composed with the space mapping. A numerical example confirms the theoretical results: the mapped coarse model, with a Taylor approximation to the space mapping, has a lower approximation error for long steps than a Taylor model of the fine model. For short steps, however, the Taylor model of the fine model is best, due to exact interpolation at the model origin. It is also shown how a response correction may enhance the mapped coarse model approximation without compromising the small approximation error on long steps. With the response correction, the mapped coarse model approximation has the same interpolation property as the Taylor model of the fine model.
