

6.3 Design of Optimal Inputs

where J(·) is to be minimized with respect to the design. Classical choices of the criterion function correspond to L- and D-optimality, see (Walter & Pronzato, 1990).

where

\frac{\partial \varepsilon_t}{\partial \theta} = -G_2^{-1}(q)\left(\frac{\partial G_1(q)}{\partial \theta}\right) u_t,   (6.44)

\frac{\partial \tilde\varepsilon_t}{\partial \theta} = -G_2^{-1}(q)\left(\frac{\partial G_2(q)}{\partial \theta}\right) \varepsilon_t.   (6.45)

After substituting these expressions in (6.42) and assuming no feedback in the system ({u_t} is independent of the noise sequence {e_t}), we get:

M_F = \frac{1}{\sigma^2}\sum_{t=1}^{N}\left(\frac{\partial \varepsilon_t}{\partial \theta}\right)\left(\frac{\partial \varepsilon_t}{\partial \theta}\right)^T + M_c,   (6.46)

where Mc does not depend upon the choice of input signal.

Now make the following simplifying assumptions:

1) The experiment time (i.e. N) is large.

2) The input {u_t} is restricted to the class admitting a spectral representation with spectral distribution function F(ω), ω ∈ [−π, π].

3) The allowable input power is constrained.

Since N is large it is more convenient to work with the average information matrix, which gives the following:

\bar M_F = \lim_{N\to\infty}\frac{1}{N} M_F = \frac{1}{\pi}\int_0^{\pi}\left(\tilde M_F(\omega) + M_c\right) d\xi(\omega),   (6.47)

where ξ(ω) is defined by

d\xi(\omega) = \begin{cases} \tfrac{1}{2}\, dF(\omega), & \omega = 0,\ \omega = \pi \\ dF(\omega), & \omega \in \,]0,\pi[ \end{cases}

and

\tilde M_F(\omega) = \mathrm{Re}\left\{\frac{1}{\sigma^2}\left[\frac{\partial G_1(e^{j\omega})}{\partial \theta}\right] G_2^{-1}(e^{j\omega})\, G_2^{-1}(e^{-j\omega}) \left[\frac{\partial G_1(e^{-j\omega})}{\partial \theta}\right]^T\right\}   (6.48)

and

M_c = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\frac{\partial G_2(e^{j\omega})}{\partial \theta}\right] G_2^{-1}(e^{j\omega})\, G_2^{-1}(e^{-j\omega}) \left[\frac{\partial G_2(e^{-j\omega})}{\partial \theta}\right]^T d\omega + \frac{1}{2\sigma^2}\left(\frac{\partial \sigma}{\partial \theta}\right)\left(\frac{\partial \sigma}{\partial \theta}\right)^T.   (6.49)

It should be remarked that both M̃_F(ω) and M_c depend upon the parameters θ. This means that if a local design is used, only M̃_F(ω) needs to be considered. On the other hand, if a Bayesian design like (6.38) is used, both M̃_F(ω) and M_c must be evaluated.

The power restriction of the input signal can be formulated as

P_u = \frac{1}{\pi}\int_0^{\pi} d\xi(\omega) = 1.   (6.50)
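To make (6.46)-(6.48) more concrete, the following sketch evaluates the single-frequency information matrix and the resulting (input-dependent part of the) average information matrix for a discrete multisine design with unit total power. The first-order model G1(q) = b/(1 + a q^-1), G2(q) = 1/(1 + a q^-1), the parameter values, the noise variance and the normalisation of the design measure are illustrative assumptions, not taken from the text; the code only indicates how such an evaluation could be organised.

import numpy as np

# Minimal numerical sketch of (6.47)-(6.48) for a hypothetical first-order model
#   y_t = G1(q) u_t + G2(q) e_t,  G1(q) = b/(1 + a q^-1),  G2(q) = 1/(1 + a q^-1),
# with parameters theta = (a, b) and white-noise variance sigma2 (all assumed values).

a, b, sigma2 = -0.7, 1.0, 0.1
theta_dim = 2

def dG1_dtheta(w):
    """Gradient of G1(e^{jw}) with respect to theta = (a, b)."""
    z = np.exp(-1j * w)                    # e^{-jw}, the backward shift on the unit circle
    denom = 1.0 + a * z
    return np.array([-b * z / denom**2,    # dG1/da
                     1.0 / denom])         # dG1/db

def G2_inv(w):
    """G2^{-1}(e^{jw}) for the assumed noise model."""
    return 1.0 + a * np.exp(-1j * w)

def M_tilde(w):
    """Single-frequency information matrix, cf. (6.48)."""
    g = dG1_dtheta(w)
    scale = G2_inv(w) * G2_inv(-w)         # G2^{-1}(e^{jw}) G2^{-1}(e^{-jw}), real and nonnegative
    return np.real(np.outer(g, np.conj(g)) * scale) / sigma2

def M_bar(freqs, powers):
    """Input-dependent part of the average information matrix for a discrete
    design placing the given powers at the given frequencies, cf. (6.47);
    M_c is omitted since it does not depend on the design."""
    M = np.zeros((theta_dim, theta_dim))
    for w, p in zip(freqs, powers):
        M += p * M_tilde(w)
    return M

# Example: all power at one frequency versus power split over two frequencies.
M_single = M_bar([0.5], [1.0])
M_double = M_bar([0.5, 2.0], [0.5, 0.5])
print("log det, single sinusoid:", np.linalg.slogdet(M_single)[1])
print("log det, two sinusoids  :", np.linalg.slogdet(M_double)[1])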

We are now ready to give the following theorem, which states that it is always possible to find an optimal input comprising a finite number of sinusoids.

Theorem 6.1 For any power constrained design ξ_1(ω) with corresponding n × n average information matrix M̄_F(ξ_1), there always exists a power constrained design ξ_2(ω) which is piecewise constant with at most n(n+1)/2 + 1 discontinuities and M̄_F(ξ_2) = M̄_F(ξ_1). For the design criterion J = −log det M̄_F, optimal designs exist comprising not more than n(n+1)/2 sinusoids.

Proof: Only an outline of the proof is given. It can be shown that the set of all average information matrices corresponding to input power constrained designs is the convex hull of the set of all average information matrices corresponding to single frequency designs. Hence, from Caratheodory's theorem (see (Fedorov, 1972)) the result for the first part of the theorem follows. For a convex function of the information matrix the optimal design is a boundary point of the convex hull, therefore one less sinusoidal component is needed. For the complete proofs see (Fedorov, 1972; Goodwin & Payne, 1977).
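Theorem 6.1 also suggests a practical computational route: restrict the design measure to a finite grid of candidate frequencies and optimise the power distribution over that grid. The sketch below does this with a classical Fedorov/Wynn-type vertex-direction iteration for the D-criterion J = −log det M. The model (the same hypothetical first-order structure as in the previous sketch), the grid and the step-size rule are illustrative assumptions, and the algorithm is a standard scheme from the optimal-design literature rather than one prescribed by the text; typically the power concentrates on only a few frequencies, in agreement with the theorem.

import numpy as np

a, b, sigma2 = -0.7, 1.0, 1.0              # assumed first-order model, G2 = 1/(1 + a q^-1)

def M_tilde(w):
    """Single-frequency information matrix for theta = (a, b), cf. (6.48)."""
    z = np.exp(-1j * w)
    denom = 1.0 + a * z
    g = np.array([-b * z / denom**2, 1.0 / denom])      # dG1/dtheta at e^{jw}
    scale = (1.0 + a * z) * (1.0 + a * np.conj(z))      # |G2^{-1}(e^{jw})|^2
    return np.real(np.outer(g, np.conj(g)) * scale) / sigma2

grid = np.linspace(1e-3, np.pi - 1e-3, 200)             # candidate frequencies
cand = np.array([M_tilde(w) for w in grid])
p = np.full(len(grid), 1.0 / len(grid))                 # start from uniform power

for it in range(200):
    M = np.tensordot(p, cand, axes=1)                   # current average information matrix
    Minv = np.linalg.inv(M)
    # Directional derivative of log det M towards each single-frequency design:
    d = np.array([np.trace(Minv @ Mk) for Mk in cand])
    k = np.argmax(d)                                    # most informative candidate frequency
    alpha = 1.0 / (it + 2)                              # classical diminishing step size
    p = (1 - alpha) * p
    p[k] += alpha

support = grid[p > 1e-3]
print("frequencies carrying power:", np.round(support, 3))
print("log det M at final design :", np.linalg.slogdet(np.tensordot(p, cand, axes=1))[1])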

6.3.1 Bayesian Approach

We now turn to the Bayesian approach, considering the generalized criterion (6.38). Assume that the prior knowledge of the system is given in terms of a prior distribution of the parameters. Hence we are able to evaluate the expectation of the considered criterion with respect to the prior distribution of the parameters, cf. (6.38), instead of simply evaluating the criterion at some fixed values of the parameters. In general the resulting design will be different for the two approaches, cf. the following example.

In the Bayesian case it is possible to prove a theorem similar to Theorem 6.1.

Theorem 6.2 Using J = E_θ(−log det M̄) as criterion, optimal designs exist comprising not more than n(n+1)/2 sinusoidal components.

Proof: The criterion may be written as

J = \int_{\Theta} \left(-\log\det \bar M(\xi(\omega), \theta)\right) p(\theta)\, d\theta,   (6.51)

where Θ ⊂ R^n is assumed to be a closed and bounded interval of the parameters and p(θ) is the prior probability density of the parameters.

The mean-value theorem states that for all ξ(ω) there exists a θ̃ ∈ Θ such that

J = K\left(-\log\det \bar M(\xi(\omega), \tilde\theta)\right) p(\tilde\theta)   (6.52)

for some constant K which is independent of ξ(ω) and θ̃. It is seen that (6.52) has the form of the criterion considered in the previous Theorem. The difference is that now θ̃ depends upon ξ(ω). But since (6.47) is still valid, we conclude that the set of all average information matrices is the convex hull of all average information matrices corresponding to single frequency designs, and the proof follows immediately.

Since θ̃ in (6.52) depends upon ξ(ω), this is a more complicated optimization problem than the one previously considered. In a practical application, though, it is not necessary to actually find θ̃; one would use the result of the theorem and apply it to the criterion (6.51) directly. From the proof it is readily seen that in the general case (6.38) it is also possible to find an optimal design comprising a finite number of sinusoids, as long as the criterion is a continuous convex function of the information matrix.
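In practice the expectation in (6.51) is usually approximated numerically rather than handled via θ̃. The sketch below does so by plain Monte Carlo sampling of θ from an assumed Gaussian prior; the model (the same hypothetical first-order structure as above), the prior and the candidate designs are illustrative assumptions and only indicate how a Bayesian criterion value could be attached to a candidate design.

import numpy as np

rng = np.random.default_rng(0)

def M_tilde(w, a, b, sigma2=1.0):
    """Single-frequency information matrix for the assumed model, cf. (6.48)."""
    z = np.exp(-1j * w)
    denom = 1.0 + a * z
    g = np.array([-b * z / denom**2, 1.0 / denom])
    return np.real(np.outer(g, np.conj(g))) / sigma2

def M_bar(freqs, powers, a, b):
    """Average information matrix of a discrete (multisine) design."""
    return sum(p * M_tilde(w, a, b) for w, p in zip(freqs, powers))

def J_bayes(freqs, powers, n_samples=500):
    """Monte Carlo estimate of E_theta[-log det M_bar(xi, theta)], cf. (6.51)."""
    vals = []
    for _ in range(n_samples):
        a = np.clip(rng.normal(-0.7, 0.1), -0.95, 0.95)   # assumed prior on a (kept stable)
        b = rng.normal(1.0, 0.2)                          # assumed prior on b
        sign, logdet = np.linalg.slogdet(M_bar(freqs, powers, a, b))
        vals.append(-logdet if sign > 0 else np.inf)      # singular M gives infinite penalty
    return np.mean(vals)

# Compare two candidate designs under the prior:
print("J for one sinusoid :", J_bayes([0.5], [1.0]))
print("J for two sinusoids:", J_bayes([0.5, 2.0], [0.5, 0.5]))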

The criteria discussed so far have all been based on ML estimation of the parameters. If instead a MAP estimator is used to estimate the unknown parameters, the criteria for optimality must be changed accordingly. The following theorem establishes a relation between the posterior covariance of the parameters and Fisher's information matrix when a MAP estimator is used.

Theorem 6.3 Assume that the prior knowledge about the model parameters is embodied in a Gaussian distribution with covariance matrix Σ_pre. Also assume that a MAP estimator is used to estimate the unknown parameters based on sampled observations for a model brought into the regression form

y_t = \varphi_t^T \theta + \varepsilon_t

where {ε_t} is a sequence of Gaussian random variables with known covariance, uncorrelated with {φ_t}. Then the posterior covariance matrix Σ_post is given by

\Sigma_{post}^{-1} = \Sigma_{pre}^{-1} + M_F = \Sigma_{pre}^{-1} + N \bar M_F   (6.53)

where M_F is Fisher's information matrix, M̄_F the average information matrix, and N the length of the experiment.

Proof: see (Sadegh et al., 1994).
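The relation (6.53) is easy to verify numerically for a linear Gaussian regression. In the sketch below the posterior covariance is computed from the standard Gaussian conditioning formula (matrix-inversion-lemma form) and its inverse is compared with Σ_pre^{-1} + M_F; the regressors, prior covariance and noise level are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(1)
n_par, N, sigma2 = 3, 50, 0.5

Phi = rng.normal(size=(N, n_par))               # regressors phi_t stacked row-wise
Sigma_pre = np.diag([1.0, 4.0, 0.25])           # assumed prior covariance of theta

# Fisher information of the N observations (known noise variance):
M_F = Phi.T @ Phi / sigma2
M_bar = M_F / N                                 # average information matrix

# Posterior covariance from the standard Gaussian conditioning formula
# (matrix-inversion-lemma form), without using (6.53) directly:
S = Phi @ Sigma_pre @ Phi.T + sigma2 * np.eye(N)
Sigma_post = Sigma_pre - Sigma_pre @ Phi.T @ np.linalg.inv(S) @ Phi @ Sigma_pre

# Check (6.53): Sigma_post^{-1} = Sigma_pre^{-1} + M_F = Sigma_pre^{-1} + N * M_bar
lhs = np.linalg.inv(Sigma_post)
rhs = np.linalg.inv(Sigma_pre) + N * M_bar
print("max deviation:", np.max(np.abs(lhs - rhs)))   # numerically zero, i.e. (6.53) holds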

In the following, the concept of information is related to Lindley's measure of average information. In this way, we are able to formulate design criteria also based on MAP estimators. First some definitions are needed.

Definition 6.1 The entropy of a random variable X having probability density function p(X) is defined as

H_X = -E_X[\log p(X)].   (6.54)

Definition 6.2 Lindley's measure of the average amount of information provided by an experiment ξ with data y and parameters θ is defined as

J(\xi) = H_{\theta} - E_y[H_{\theta|y}].   (6.55)

Now the relation between Fisher's information matrix and Lindley's measure of average information can be established via the following theorem (Sadegh et al., 1994).

Theorem 6.4 With the same assumptions as in Theorem 6.3, maximizing Lindley's measure of the average amount of information, J(ξ), is equivalent to solving the optimization problem

\min_{\xi} J, \qquad J = -E_{\theta}\left[\log\det\left\{\Sigma_{pre}^{-1} + M_F\right\}\right]   (6.56)

Proof: The estimation is regarded as a means by which further information about the system parameters is provided. Since MAP is the mode of the posterior distribution, the maximum amount of information with respect to Lindley's measure is obtained by MAP. Since the prior distribution of the parameters and the distribution of the observations given θ are Gaussian, the posterior distribution of the parameters is also Gaussian. Denote the posterior mean and covariance by θ̄ and Σ_post. From (6.53) we have

\Sigma_{post}^{-1} = \Sigma_{pre}^{-1} + M_F

From Definition 6.2

J(\xi) = -E_{\theta}\{\log p(\theta)\} + E_y E_{\theta|y}\{\log p(\theta|y)\}   (6.57)
       = -E_{\theta}\{\log p(\theta)\} + E_y E_{\theta|y}\left\{-\frac{n}{2}\log(2\pi)\right\}
         - E_y E_{\theta|y}\left\{\frac{1}{2}\log\det\Sigma_{post} + \frac{1}{2}(\theta-\bar\theta)^T\Sigma_{post}^{-1}(\theta-\bar\theta)\right\}   (6.58)

As all the other terms obviously are constants, we only focus on the last two terms

E_y E_{\theta|y}\left\{\frac{1}{2}\log\det\Sigma_{post}\right\}
   = E_{\theta} E_{y|\theta}\left\{\frac{1}{2}\log\det\Sigma_{post}\right\}   (6.59)
   = E_{\theta}\left\{\frac{1}{2}\log\det\Sigma_{post}\right\}   (6.60)
   = -E_{\theta}\left\{\frac{1}{2}\log\det\Sigma_{post}^{-1}\right\}   (6.61)

The other term can be written as

E_y E_{\theta|y}\left\{\frac{1}{2}(\theta-\bar\theta)^T\Sigma_{post}^{-1}(\theta-\bar\theta)\right\}
   = E_y E_{\theta|y}\left\{\frac{1}{2}\,\mathrm{trace}\,\Sigma_{post}^{-1}(\theta-\bar\theta)(\theta-\bar\theta)^T\right\}   (6.62)
   = E_y\left\{\frac{1}{2}\,\mathrm{trace}\,\Sigma_{post}^{-1} E_{\theta|y}\left[(\theta-\bar\theta)(\theta-\bar\theta)^T\right]\right\}   (6.63)
   = E_y\left\{\frac{n}{2}\right\} = \frac{n}{2}   (6.64)

where n is the number of parameters. Now using (6.53) establishes the theorem.
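In the Gaussian setting of Theorem 6.3 the entropies of Definition 6.1 are available in closed form, H = ½ log det(2πe Σ), so for a fixed θ Lindley's measure can be evaluated directly and compared with the criterion of Theorem 6.4. The sketch below does this with arbitrary illustrative numbers; it is a check of the Gaussian special case, not a substitute for the proof above.

import numpy as np

rng = np.random.default_rng(2)
n_par, sigma2 = 3, 0.5

def gauss_entropy(Sigma):
    """Entropy of a Gaussian with covariance Sigma, cf. Definition 6.1."""
    return 0.5 * np.linalg.slogdet(2 * np.pi * np.e * Sigma)[1]

Sigma_pre = np.diag([1.0, 4.0, 0.25])                    # assumed prior covariance
Phi = rng.normal(size=(20, n_par))                       # assumed regressors
M_F = Phi.T @ Phi / sigma2                               # Fisher information
Sigma_post = np.linalg.inv(np.linalg.inv(Sigma_pre) + M_F)   # posterior covariance, (6.53)

# Lindley's measure (6.55): prior entropy minus expected posterior entropy.
# In the linear Gaussian case Sigma_post does not depend on y, so E_y drops out.
J_lindley = gauss_entropy(Sigma_pre) - gauss_entropy(Sigma_post)

# Equivalent expression: 0.5*log det(Sigma_pre^{-1} + M_F) + 0.5*log det(Sigma_pre),
# so maximizing Lindley's measure amounts to minimizing (6.56) for fixed theta.
J_equiv = 0.5 * np.linalg.slogdet(np.linalg.inv(Sigma_pre) + M_F)[1] \
          + 0.5 * np.linalg.slogdet(Sigma_pre)[1]
print(J_lindley, J_equiv)     # identical up to rounding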

An approximation of the mean value in (6.56) is obtained by setting the parameters equal to their prior mean values. This approximation, which corresponds to a local design, simplifies the computations considerably.

Thus depending on the estimation method, ML or MAP, and the choice of local or average criterion the following criteria would be of interest:

J_1 = -\left[\log\det(\bar M_F)\right]_{\theta = E\{\theta\}}
J_2 = -\left[\log\det(N\bar M_F + \Sigma_{pre}^{-1})\right]_{\theta = E\{\theta\}}
J_3 = -E_{\theta}\left[\log\det(\bar M_F)\right]
J_4 = -E_{\theta}\left[\log\det(N\bar M_F + \Sigma_{pre}^{-1})\right]
   (6.65)

These criteria demonstrate different levels of including partial prior information about parameter values. Optimization with respect to J_1 results in designs which are strongly dependent upon the prior information. The dependence is even more pronounced in J_2. It may therefore be wise to

perform a sensitivity analysis, i.e. determine the sensitivity of the design to changes in the parameters, when using these criteria. An alternative is to choose an average criterion like J_3 and J_4. Table 6.1 summarizes the relation between the choice of estimators and the optimization criterion, expressed in both a local and average form.

                     ML    MAP
  Local design       J_1   J_2
  Bayesian/average   J_3   J_4

  Table 6.1. Summary of the optimality criteria

Applications with these criteria are found in e.g. (Melgaard et al., 1993; Sadegh et al., 1994).
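As an illustration of the table, the following sketch evaluates J_1-J_4 for a single candidate multisine design. The hypothetical first-order model, the Gaussian prior, the experiment length and the Monte Carlo approximation of the expectations are all illustrative assumptions; only the bookkeeping of local versus average and ML versus MAP follows (6.65).

import numpy as np

rng = np.random.default_rng(3)
N = 200                                          # assumed experiment length
Sigma_pre = np.diag([0.1**2, 0.2**2])            # assumed Gaussian prior covariance of (a, b)
prior_mean = np.array([-0.7, 1.0])

def M_bar(freqs, powers, a, b, sigma2=1.0):
    """Average information matrix of a multisine design, cf. (6.47)-(6.48)."""
    M = np.zeros((2, 2))
    for w, p in zip(freqs, powers):
        z = np.exp(-1j * w)
        denom = 1.0 + a * z
        g = np.array([-b * z / denom**2, 1.0 / denom])
        M += p * np.real(np.outer(g, np.conj(g))) / sigma2
    return M

def criteria(freqs, powers, n_samples=300):
    """Evaluate the four criteria of (6.65) for one candidate design."""
    a0, b0 = prior_mean
    Spre_inv = np.linalg.inv(Sigma_pre)
    logdet = lambda A: np.linalg.slogdet(A)[1]
    J1 = -logdet(M_bar(freqs, powers, a0, b0))                     # local, ML
    J2 = -logdet(N * M_bar(freqs, powers, a0, b0) + Spre_inv)      # local, MAP
    J3 = J4 = 0.0                                                  # average (Bayesian) versions
    for _ in range(n_samples):
        a = np.clip(rng.normal(a0, 0.1), -0.95, 0.95)              # sample from the assumed prior
        b = rng.normal(b0, 0.2)
        J3 += -logdet(M_bar(freqs, powers, a, b)) / n_samples
        J4 += -logdet(N * M_bar(freqs, powers, a, b) + Spre_inv) / n_samples
    return J1, J2, J3, J4

print(criteria(freqs=[0.4, 1.8], powers=[0.5, 0.5]))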