Calculation of the Corrective Step - The First Order Corrective Step

3.2 The First Order Corrective Step

3.2.2 Calculation of the Corrective Step

We can now calculate the corrective step based on the reduced system J and f. We now formulate a problem, whose solution is the corrective step v.

min

v⁼β

2v^Tv s.t. fx h^? Jx hv βe (3.29) where f ^$ IR^t are those components of f that are active according to definition in (3.21) or (3.22) and linearly independent. βis a scalar, e ^$ IR^t is a vector of ones and v ^$ IRⁿ. The equation says that the corrective step v should be as short as possible and the active linearized functions should be equal at the new iterate. This last demand is natural because such a condition is expected to hold at the solution x⁶ . The corrective step v is minimized by using the 2 norm, to keep it as close to the basic step h as possible. By using the 2

norm, the solution to (3.29), can be found by solving a simple system of linear equations.

First we reformulate (3.29) minˆv

2ˆIˆv ^TˆIˆv s.t. f^¡J ⁰ e^¢ ˆv 0 (3.30) where f and J are short for fx h and Jx h. Furthermore we have

ˆv

β ^I ˆI

I 0

0 0 ^I I^$ IRⁿ^~ ⁿ (3.31)

By using the theory of Lagrangian multipliers L ^ˆvλ^- 1

2 ˆIˆv ^TˆIˆv^£ f ^¡J ⁰ e^¢ ˆv^¤ ^Tλ (3.32) and by using the first order Kuhn-Tucker conditions for optimality we have that the gradient of the Lagrangian should be zero

L⁴ ^ˆvλ^# ˆIˆν^¡J ⁰ e^¢ ^Tλ 0 (3.33) Further, because of the equality constraint: f ^¡J ⁰ e^¢ ˆv ^Tλ 0, is seen to be satisfied when

¡ J ⁰ e^¢ ˆv⁰ f (3.34)

To find a solution that satisfy both Kuhn-Tucker conditions, we use the following system of equations

H ˆI ˆA

Aˆ^T 0 ^I

ˆv λ ^I

0 f ^I ˆA^¦¥ J^T

0 e^T ^§ (3.35)

where ˆA^$ IR^rⁿ ¹^s~ ^t. By solving (3.35) we get the corrective step v. For a good introduction and description of the Kuhn-Tucker conditions, the reader is referred to [MNT01, Chapter 2].

In (3.29) we used the Jacobian J evaluated at x_t, however it was suggested in [JM94], that we instead could use J evaluated at x. As Figure 3.8 clearly indicate, this will give a more crude corrective step.

−2 −1 0 1 2

−1

−0.5 0 0.5 1 1.5 2

h x

f₁=f₂ tangent h FOC J(x) FOC J(x+h)

Figure 3.8: Various steps h of different length. The squares denote the corrective step v based on J^'x h⁽. The circles denote v based on J^'x⁽.

The figure shows that v based on Jx is not as good as v based on Jx_t. The latter gives a result much closer to the intersection between the two inner functions of the parabola test function. This goes in line with [JM94] stating that it would be advantageous to use (3.29) when the problem is highly non-linear and otherwise calculate v based on Jx and save the gradient evaluation at x_t.

It is interesting to notice that Figure 3.8 shows that the the step v based on Jx is perpendic-ular to h. We will investigate that in more detail in the following. The explanation assumes that the corrective step is based on Jx .

The length of the basic step h affects the corrective step v as seen in figure 3.8. This is because the length of v is proportional to the difference in the active inner functions f of F.

This is illustrated in figure 3.7. If the distance between the active inner functions are large, then the length of the corrective step will also be large and vice versa.

From the figure it is clear, that the difference in the function values of the active inner functions is proportional to the distance between x_t and ˜x_t. So the further apart the values of the active inner functions is, the further the corrective step becomes, and vice versa.

It turns out that the corrective step is only perpendicular to the basic step in certain cases. So the situation shown in figure 3.8 is a special case. The special case arises when the active set in the nonlinear problem is the same as in the linear problem. That is A ALP. We could also say that the basic step h has to be a tangent to the nonlinear kink where the active functions meet. An illustration of this is given in figure 3.9.

PSfrag replacements x

x_t h

v A B

PSfrag replacements x

x_t h

v A B

PSfrag replacements x

x_t h

v A B

Figure 3.9: The dashed lines illustrates the kink in: A, the LP landscape at x and B, the linearized landscape at x_t. Left: Only when the basic step is a tangent to the nonlinear kink, we have that h is perpendicular to v. Middle: Shows that h is not always perpendicular to v. Right: If J^'x_t⁽ is used to calculate v then the two kinks are not always parallel.

The figure shows two dashed lines, where A illustrates the kink in the LP landscape at x and B illustrates the kink in the linearized landscape at x_t. It is seen that the dashed line B moves because of the change in the active inner functions at x_t. This affects v, because it will always go to that place in the linearized landscape where the active functions at xt

becomes equal. Therefore the corrective step always goes from A to B.

When using Jx to calculate v, the dashed lines will be parallel, and hence v will be or-thogonal to A and B because it is the shortest distance between them. In general, however, it is always the case that v will be orthogonal to the dashed line B as illustrated in figure 3.9 right.

Figure 3.9 left, illustrates the case where x is situated at A, and where Jx is used to calculate v. In this case h will be perpendicular to v.

The same figure middle, shows a situation where x is not situated at A, hence h and v are not orthogonal to each other.

Figure 3.9 right, illustrates a situation where the Jacobian evaluated at xt is used. In this case, the dashed lines are not parallel to each other. Still, however, v is perpendicular to B, because that is the shortest distance from x_t to B.

The definition of the corrective step in (3.29) says that the corrective step should go to a place in the linearized landscape at x_t based on either Jx or Jx_t, where the active functions are equal. Also it says that we should keep the corrective step as short as possible.

We notice that if the active setALPis of size t 1, then the corrective step v will be equal to the null vector.

lengtht^# 1 then v 0 (3.36)

This is because the calculation of the corrective step has an equality constraint that says that all active functions should be equal. In the case where there is only one active function, every v^$ IRⁿ would satisfy the equality constraint. The cost function, however, demands that the length of v should be as short as possible, and when v ^$ IRⁿ then the null vector would be the corrective step with the shortest length. Hence v 0.

As stated in (3.29) we should only use the functions that are active in Fx to calculate the corrective step. The pseudo code for the corrective step is presented in the following.

Function 3.2.1 Corrective Step

v : corr step(fJi) 1

begin

Reduce i by e.g. QR factorization 2

if length(i) ^A 1, v := 0;

else

find ^oˆvλ^q^T by solving (3.35) returnv;

end

1 J is either evaluated at x or x_t. i is the index to the active functions of the LP problem in (3.2)

2 The active set is reduced so that the gradients of the active inner functions evaluated at x or xt are linearly independent.

In document Minimax Optimization without second order information (Sider 40-43)