
6.1.2 Nonlinear Models

Testing for structural identifiability in a nonlinear model is slightly more complicated. One method consists of linearizing the model around some equilibrium point and then applying the method for linear models. A second approach uses a series expansion of the output, either in the time domain or in the time and input domain (Walter & Pronzato, 1990). M(θ) = M(θ') then implies that the coefficients of the series should be equal, which, as in the linear case, yields a set of equations in θ parameterized by θ'.
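As a minimal sketch of the series-expansion approach, the following (assuming the toy model y(t) = a·exp(-b·t), which is not taken from the text) equates the first few Taylor coefficients of the outputs for two parameter vectors and solves the resulting equations; the only admissible solution is the original parameter vector, indicating structural identifiability.

    # Sketch of the series-expansion test for structural identifiability.
    # The model y(t) = a*exp(-b*t) is a hypothetical example, not from the text.
    import sympy as sp

    t = sp.symbols('t', nonnegative=True)
    a, b, a2, b2 = sp.symbols('a b a2 b2', positive=True)

    y1 = a  * sp.exp(-b  * t)    # output for parameters (a, b)
    y2 = a2 * sp.exp(-b2 * t)    # output for alternative parameters (a2, b2)

    # M(theta) = M(theta') means the outputs coincide, so their Taylor
    # coefficients around t = 0 must be equal.
    eqs = [sp.Eq(sp.diff(y1, t, k).subs(t, 0), sp.diff(y2, t, k).subs(t, 0))
           for k in range(3)]
    print(sp.solve(eqs, [a2, b2], dict=True))   # -> [{a2: a, b2: b}]: identifiable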

6.2 Criteria for Optimality

A lower bound on the covariance of any unbiased estimator θ̂ is given by the Cramér-Rao inequality, cov(θ̂) ≥ M_F^{-1}(θ), where M_F is the Fisher information matrix, defined by

M_F = E_{Y|θ}[ (∂ log p(Y|θ)/∂θ) (∂ log p(Y|θ)/∂θ)^T ],    (6.30)

Y is a vector of all observations, θ is the unknown parameter vector, and p(Y|θ) is the conditional probability density of Y for given θ. When the estimator is asymptotically efficient, this bound is attained asymptotically, which is the rationale for using the Fisher information matrix as a characterization of the asymptotic parameter uncertainty.
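As a concrete (assumed) special case of (6.30), consider the linear model Y = Xθ + e with e ~ N(0, σ²I); then ∂ log p(Y|θ)/∂θ = X^T(Y - Xθ)/σ² and the Fisher information matrix becomes M_F = X^T X/σ². A short numpy sketch:

    # Fisher information for an assumed linear-Gaussian model Y = X*theta + e,
    # e ~ N(0, sigma^2 I); here M_F = X^T X / sigma^2 (a standard special case
    # of (6.30), used only as an illustration).
    import numpy as np

    X = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])             # regressors determined by the experiment
    sigma2 = 0.5                           # noise variance

    M_F = X.T @ X / sigma2                 # Fisher information matrix
    cov_bound = np.linalg.inv(M_F)         # Cramer-Rao lower bound on cov(theta_hat)
    print(M_F)
    print(cov_bound)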

It should be remarked that an experiment satisfying det M_F(θ) ≠ 0 is called informative; this ensures local identifiability of the model parameters. Designing an experiment by minimizing a suitable criterion

J(ξ) = φ(M_F(θ, ξ))    (6.31)

can thus be seen as maximizing a measure of identifiability, where φ is a scalar function and ξ is the design.

6.2.1 Local Design

A number of different standard measures of information have been studied. For a review of the properties of the different measures of information, see (Pazman, 1986; Walter & Pronzato, 1990). A large number of these standard measures are particular cases of the general L_k-class of optimality criteria. The criteria belonging to this class are defined by a function of the form

φ(M_F) = [n^{-1} tr{(V M_F^{-1} V^T)^k}]^{1/k}   if det M_F ≠ 0,
φ(M_F) = ∞                                       if det M_F = 0,    (6.32)

where k > 0 and V is a nonsingular n × n matrix. The locally optimal design refers to the case where the criterion is minimized for a given value of the parameters, say θ_0. This value is often chosen as the expected mean θ_0 = E(θ) calculated from the prior distribution of the parameters p(θ).
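A plain numpy sketch of the L_k-class criterion (6.32) is given below; the information matrix is hypothetical and assumed symmetric positive definite. It also shows numerically that k = 1 gives a trace-type (A-type) criterion, while a large k approaches the maximum eigenvalue of M_F^{-1} (E-type).

    # Sketch of the L_k-class criterion (6.32); M_F and V are hypothetical.
    import numpy as np

    def lk_criterion(M_F, V=None, k=1.0):
        n = M_F.shape[0]
        V = np.eye(n) if V is None else V
        if np.linalg.det(M_F) == 0:
            return np.inf                          # non-informative experiment
        A = V @ np.linalg.inv(M_F) @ V.T
        eig = np.linalg.eigvalsh(A)                # tr(A^k) = sum of eig**k
        return (np.mean(eig**k))**(1.0 / k)        # [n^{-1} tr(A^k)]^{1/k}

    M_F = np.array([[4.0, 1.0],
                    [1.0, 2.0]])
    print(lk_criterion(M_F, k=1))     # A-type: trace of M_F^{-1} divided by n
    print(lk_criterion(M_F, k=50))    # close to the maximum eigenvalue of M_F^{-1}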

D-optimality

The most studied criterion is the D-optimality criterion, which is defined by

φ(M_F(θ)) = -log det M_F(θ).    (6.33)

The criterion is obtained for V = I and k → 0 in (6.32). Thus this design minimizes the generalized variance of the parameter estimates. The criterion has a geometrical interpretation: the asymptotic confidence regions for the maximum likelihood estimate of θ are ellipsoids, and a D-optimal experiment thus minimizes the volume of these ellipsoids (see Figure 6.1).

Figure 6.1. D-optimal design minimizes the volume of the confidence ellipsoid (axes θ_1 and θ_2).

An important property of D-optimal experiments is their independence of the scaling of the parameters, which is due to the geometrical properties of the determinant.
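The scaling argument can be checked numerically: a rescaling θ → Sθ transforms the information matrix as M_F → S^{-T} M_F S^{-1}, so -log det M_F changes by the same constant, 2 log|det S|, for every design, and the ranking of designs is unchanged. A sketch with two hypothetical information matrices:

    # Check that the D-optimality ranking is invariant under parameter rescaling.
    # M1 and M2 are hypothetical information matrices for two candidate designs.
    import numpy as np

    def d_crit(M):
        return -np.log(np.linalg.det(M))

    M1 = np.array([[4.0, 1.0], [1.0, 2.0]])
    M2 = np.array([[3.0, 0.5], [0.5, 3.0]])

    S = np.diag([10.0, 0.5])                 # rescaling of the parameters
    Sinv = np.linalg.inv(S)
    rescale = lambda M: Sinv.T @ M @ Sinv    # information matrix in the new scale

    print(d_crit(M1) - d_crit(M2))                    # difference between the designs
    print(d_crit(rescale(M1)) - d_crit(rescale(M2)))  # same difference after rescaling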

L-optimality

Criteria that are linear in the inverse Fisher information matrix (the dispersion matrix of the best linear estimator) are obtained from (6.32) by setting k = 1. The particular choices V = I and V = diag(θ_i^{-1}; i = 1, ..., n) correspond to A- and C-optimality, respectively. A-optimal experiments minimize the sum of the variances of the parameter estimates. C-optimal experiments are related to the relative precision of the estimates; in fact the criterion equals the sum of 1/t_i^2, where t_i is the t-statistic of the i-th parameter. This criterion is also independent of the parameter scale.

Figure 6.2. C-optimal design optimizes the relative precision of the individual parameters, without paying attention to the correlation structure (axes θ_1 and θ_2).

The A- and C-optimality criteria do not take the correlation between the parameters into account. From a pragmatic point of view, such criteria should be viewed with suspicion, because they can lead to designs of experiments in which the parameters are unidentifiable, det M_F = 0; see (Goodwin & Payne, 1977).
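For illustration, the A- and C-criteria can be evaluated directly for a hypothetical information matrix and nominal parameter vector; the C-criterion indeed equals the sum of 1/t_i^2.

    # A- and C-optimality criteria for a hypothetical M_F and parameter vector.
    import numpy as np

    M_F = np.array([[4.0, 1.0],
                    [1.0, 2.0]])
    theta = np.array([2.0, 0.5])            # assumed nominal parameter values

    P = np.linalg.inv(M_F)                  # asymptotic covariance (dispersion) matrix
    a_crit = np.trace(P)                    # A-optimality: sum of parameter variances

    V = np.diag(1.0 / theta)                # C-optimality weighting
    c_crit = np.trace(V @ P @ V.T)          # equals sum_i 1/t_i^2, t_i = theta_i/sigma_i

    t = theta / np.sqrt(np.diag(P))         # t-statistics of the parameters
    print(a_crit, c_crit, np.sum(1.0 / t**2))   # the last two coincide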

E-optimality

The E-optimality criterion is obtained from (6.32) by setting V = I and letting k → ∞; this corresponds to minimizing the maximum eigenvalue of M_F^{-1},

φ(M_F(θ)) = λ_max(M_F^{-1}(θ)).    (6.34)

Geometrically this can be understood as minimizing the maximum diameter of the asymptotic confidence ellipsoids for the parameters, because the semi-axes of the ellipsoids are directed along the eigenvectors of M_F^{-1}, with lengths proportional to the square roots of its eigenvalues. In other words, an E-optimal design aims at improving the most uncertain direction of the parameter space, making the confidence region as spherical as possible.

Figure 6.3. E-optimal design minimizes the maximum eigenvalue of the inverted information matrix, thus making the confidence region of the parameters as spherical as possible (axes θ_1 and θ_2).

By using V = diag(θ_i^{-1}; i = 1, ..., n) instead, the criterion becomes independent of the scaling of the parameters. The geometrical interpretation is still valid, but for the normalized inverted information matrix.
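A short sketch of the E-criterion (6.34) and its scale-normalized variant, again for a hypothetical information matrix and nominal parameter vector:

    # E-optimality criterion (6.34) and its normalized variant; the numbers
    # are hypothetical.
    import numpy as np

    M_F = np.array([[4.0, 1.0],
                    [1.0, 2.0]])
    theta = np.array([2.0, 0.5])

    P = np.linalg.inv(M_F)
    e_crit = np.max(np.linalg.eigvalsh(P))              # lambda_max(M_F^{-1})

    V = np.diag(1.0 / theta)                            # V = diag(1/theta_i)
    e_crit_norm = np.max(np.linalg.eigvalsh(V @ P @ V.T))
    print(e_crit, e_crit_norm)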

6.2.2 Physical Measures

Criteria for optimality other than the standard ones can also be specified. Specifically, when dealing with physical models, nonstandard criteria may be of interest, since the physical parameters of greatest interest might not enter directly into the system description, but rather through some transformation of the model parameters. Assume that the parameters of interest are described by the functional relation f(θ), where θ is the parameter vector of the model. It is important to take this transformation into account in the design of experiments. The asymptotic covariance of f is calculated using Gauss' formula

M_{F,f}^{-1} = (∂f/∂θ^T) M_F^{-1}(θ) (∂f/∂θ).    (6.35)

Any of the standard measures of information can now be applied to M_{F,f}. In Melgaard, Madsen, & Holst (1992b) a physical model of the thermal characteristics of a building is considered. The classical measures of information all lead to the same optimal design of the input signal, a certain PRBS (pseudo-random binary sequence). It turned out, however, that the main interest is not focused on the individual parameters of the model, but on the overall heat transmittance and the internal heat capacity of the building. These physical characteristics are given as functions of the model parameters. An optimal design based on this application-specific measure of information leads to an input signal which is a step. This design turns out to be quite different from the original design (more weight on low frequencies), but it reflects the demands of the building physicists.
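A numerical version of (6.35): propagate the parameter covariance through a (hypothetical) function f(θ) of the model parameters via its Jacobian, after which any of the standard criteria can be applied to M_{F,f}.

    # Gauss' formula (6.35): asymptotic covariance of f(theta).
    # The function f and the numbers are hypothetical, chosen for illustration.
    import numpy as np

    def f(theta):
        # e.g. two derived physical characteristics, functions of the parameters
        return np.array([theta[0] * theta[1], theta[0] + theta[1]**2])

    def jacobian(func, theta, eps=1e-6):
        # forward-difference approximation of df/dtheta^T (m x n)
        f0 = func(theta)
        J = np.zeros((f0.size, theta.size))
        for j in range(theta.size):
            dt = np.zeros_like(theta)
            dt[j] = eps
            J[:, j] = (func(theta + dt) - f0) / eps
        return J

    theta = np.array([2.0, 0.5])
    M_F = np.array([[4.0, 1.0], [1.0, 2.0]])

    J = jacobian(f, theta)
    cov_f = J @ np.linalg.inv(M_F) @ J.T        # M_{F,f}^{-1} in (6.35)
    print(np.log(np.linalg.det(cov_f)))         # = -log det M_{F,f}, a D-type criterion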

6.2.3 Bayesian Design

In general, the designs of the previous sections depend upon the unknown parameter values. Any design will, and should, depend upon the prior information available about the system to be identified. From a statistical point of view it is obvious that one way of incorporating this prior information is to use a Bayesian procedure. In this approach the prior knowledge of the parameters is expressed via their prior distribution, which describes the partial information about the system available before the experiment.

In the Bayesian approach a loss function L(θ, θ̂) is defined, which describes the consequences of obtaining the estimate θ̂ when θ is the true parameter vector. This function is then used as the basis for obtaining the optimal experiment design.

Prior to the experiment, the expected performance is

J_1 = E_{θ,Y}[L(θ, θ̂)].    (6.36)

The criterion may be optimized directly with respect to the allowable experimental conditions. By making suitable assumptions and using a truncated Taylor expansion, J_1 can be simplified to

J_1 = E_{θ,Y}[L(θ, θ̂)] = E_θ E_{Y|θ}[L(θ, θ̂)]
    ≈ E_θ[ L(θ, θ) + (1/2) tr{(∂²L/∂θ̂²) M_F^{-1}} ],    (6.37)

where M_F is given by (6.30) and it is assumed that θ̂ is an efficient unbiased estimator. Optimizing this criterion is thus equivalent to optimizing a criterion of the form E_θ(tr{W M_F^{-1}}), where W = ∂²L/∂θ̂² is a weight matrix; see (Goodwin & Payne, 1977). Generalizing this criterion yields

J_2 = E_θ(φ(M_F(θ, ξ))),    (6.38)

where the expectation is taken over the prior distribution of θ, and φ is a suitable scalar function. In general, φ can be of the form considered in the previous sections for local designs. As pointed out by Walter & Pronzato (1990), there are several average optimality criteria (6.38) corresponding to one local design (6.31). This is due to the fact that integration does not commute with nonlinear functions such as inversion and the logarithm.

This implies that maximizing φ = det M_F leads to a different design than minimizing φ = (det M_F)^{-1} in the criterion (6.38). For a local design the two functions result in the same design.
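The non-commutativity can be seen with a toy numerical example (hypothetical values): for design A the determinant of the information matrix equals 1 for every prior sample of θ, while for design B it equals 0.1 or 2.5 with equal probability. Maximizing E_θ[det M_F] prefers B, whereas minimizing E_θ[(det M_F)^{-1}] prefers A.

    # Toy illustration: averaging over the prior does not commute with inversion,
    # so the two Bayesian criteria built from det(M_F) can rank designs differently.
    # The determinant values below are hypothetical.
    import numpy as np

    det_A = np.array([1.0, 1.0])    # det M_F(theta, design A) for two prior samples
    det_B = np.array([0.1, 2.5])    # det M_F(theta, design B) for the same samples

    print(det_A.mean(), det_B.mean())              # E[det]:   1.0 vs 1.3   -> B preferred
    print((1 / det_A).mean(), (1 / det_B).mean())  # E[1/det]: 1.0 vs 5.2   -> A preferred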

6.2.4 Minimax Design

The Bayesian design presented in the previous section is based on the expected prior performance; the criterion takes the whole prior distribution of the parameters into account. This approach is preferable to the classical case of local design, where the prior distribution, and specifically the uncertainty, of the parameters is not considered.

However, optimizing this criterion does not guard against a single non-informative experiment, because the average performance does not highlight the worst-case performance. One might therefore prefer to maximize the worst possible performance of the designed experiment. Such a minimax approach relies on the knowledge of some prior admissible domain Θ of the parameters, θ ∈ Θ, without requiring any other prior information about θ. Minimax criteria can be deduced from local optimality criteria by

J(ξ) = max_{θ∈Θ} φ(M_F(θ, ξ)),    (6.39)

where J(ξ) is to be minimized with respect to ξ. Classical choices of φ correspond to L- and D-optimality; see (Walter & Pronzato, 1990).
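A minimax design can be sketched as a nested optimization: evaluate the local criterion on a grid over the admissible domain Θ, record the worst case for each candidate design, and keep the design with the smallest worst case. Everything below (model, candidate designs, grid) is hypothetical.

    # Minimax D-optimal sampling time for a hypothetical one-parameter model
    # y = exp(-theta*t) + e, observed once at time xi with unit noise variance,
    # so that M_F(theta, xi) = xi^2 * exp(-2*theta*xi) (scalar).
    import numpy as np

    def local_crit(theta, xi):
        return -np.log(xi**2 * np.exp(-2.0 * theta * xi))   # -log det M_F

    Theta = np.linspace(0.5, 2.0, 50)       # assumed admissible parameter domain
    designs = np.linspace(0.1, 3.0, 300)    # candidate sampling times xi

    # J(xi) = max over Theta of the local criterion, minimized over xi, cf. (6.39)
    J = np.array([max(local_crit(th, xi) for th in Theta) for xi in designs])
    xi_minimax = designs[np.argmin(J)]
    print(xi_minimax)    # close to 1/max(Theta) = 0.5 for this toy model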