

4.2.3 Gaussian approximation validation

In this section, a strategy to validate the Gaussian approximation from Theorem 4.2 is devised and applied to two base-case examples depicted in Figure 4.1, namely MPC5 and MPC1. For those cases the MPC estimates are far from the border of their domain, and the Gaussian approximation should be adequate to infer their statistical distributions.

For each Monte Carlo simulation, consider the estimated MPC value and denote by $\mathrm{MPC}_{MC} \in \mathbb{R}^{m \times 1}$ the vector of all the MPC estimates from all $m$ Monte Carlo simulations. From the histogram, it is straightforward to infer and compute $\mu_{MC}$ as its mean and $\sigma_{MC} = \sqrt{\mathrm{var}(\mathrm{MPC}_{MC})}$ as its standard deviation, where $\mathrm{var}$ is the variance operator. Both quantities are computed as sample estimates and correspond to the first two cumulants of the distribution. Under the Gaussian assumption, these two quantities are the only information needed to characterize the stochastic distribution of the considered MPC. Next, let $\overline{\mathrm{MPC}}_{MC}$ be the normalized $\mathrm{MPC}_{MC}$, such that

$\overline{\mathrm{MPC}}_{MC} = (\mathrm{MPC}_{MC} - \mu_{MC})/\sigma_{MC}$.   (4.6)

Based on the Monte Carlo independence assumption and the expected Gaussian properties, the vector $\overline{\mathrm{MPC}}_{MC}$ should yield a histogram of the standard Gaussian distribution $\mathcal{N}(0,1)$. Now, consider the variance computation using the aforementioned perturbation theory. Denote by $\sigma_{PT} \in \mathbb{R}^{m \times 1}$ the vector of all standard deviations computed as in Theorem 4.2, where each component of $\sigma_{PT}$ is the proposed standard deviation estimate $\sigma_{PT,j}$ based solely on the $j$-th data set. Then, for $j = 1, \dots, m$, define

$\overline{\mathrm{MPC}}_{PT,j} = (\mathrm{MPC}_{MC,j} - \mu_{MC})/\sigma_{PT,j}$   (4.7)

as the MPC estimate normalized by parameters computed with the perturbation theory.
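As an illustration of the normalizations (4.6) and (4.7), a minimal numerical sketch is given below; the arrays `mpc_mc` and `sigma_pt` are hypothetical placeholders for the $m$ Monte Carlo MPC estimates and the $m$ perturbation-theory standard deviations, not data from the actual simulations.

```python
import numpy as np

# Hypothetical placeholders for the m Monte Carlo MPC estimates and the
# m perturbation-theory standard deviations sigma_PT,j from Theorem 4.2.
rng = np.random.default_rng(0)
m = 1000
mpc_mc = 0.98 + 0.003 * rng.standard_normal(m)
sigma_pt = 0.003 * np.abs(1 + 0.1 * rng.standard_normal(m))

# Sample mean and standard deviation over the Monte Carlo runs.
mu_mc = mpc_mc.mean()
sigma_mc = mpc_mc.std(ddof=1)

# Normalization (4.6): by the Monte Carlo mean and standard deviation.
mpc_mc_norm = (mpc_mc - mu_mc) / sigma_mc

# Normalization (4.7): the j-th estimate scaled by its own sigma_PT,j.
mpc_pt_norm = (mpc_mc - mu_mc) / sigma_pt
```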

Based on the Gaussian assumption, $\overline{\mathrm{MPC}}_{PT,j}$ should be a realization of a standard normal distribution $\mathcal{N}(0,1)$. Since all MPCs are computed on independent data sets, the collection of all $\overline{\mathrm{MPC}}_{PT,j}$, namely $\overline{\mathrm{MPC}}_{PT} \in \mathbb{R}^{m \times 1}$, should yield a histogram of the standard Gaussian distribution. Such histograms, computed for MPC1 and MPC5 respectively, along with the CDFs of $\overline{\mathrm{MPC}}_{MC}$ and $\mathcal{N}(0,1)$, are presented in Figure 4.2 and Figure 4.3. As expected, the plots illustrate that the entries of $\overline{\mathrm{MPC}}_{PT}$ and $\overline{\mathrm{MPC}}_{MC}$ follow $\mathcal{N}(0,1)$ more closely for MPC5 than for MPC1; however, the Gaussian approximation of MPC1 is still visually acceptable.


Figure 4.2: CDF of $\overline{\mathrm{MPC}}_{MC}$ (left) and $\overline{\mathrm{MPC}}_{PT}$ (right) computed for MPC1.


Figure 4.3: CDF of $\overline{\mathrm{MPC}}_{MC}$ (left) and $\overline{\mathrm{MPC}}_{PT}$ (right) computed for MPC5.

The statistical uncertainties are quantified by confidence intervals; thus a scheme to compare the approximated and theoretical confidence intervals is devised. To that end, define the theoretical two-sided normal cumulative confidence interval (CCI) function as $f_{t,cci} = 2(f_{t,cdf} - 0.5)$, where $f_{t,cdf}(t)$ is the standard normal cumulative distribution function, i.e. the integral of the density from minus infinity to $t$, and let $f_{PT,cci}$ be the similarly defined cumulative function for computing the two-sided confidence interval corresponding to $\overline{\mathrm{MPC}}_{PT}$. The function $f_{t,cci}$ is purely theoretical, whereas $f_{PT,cci}$ is derived empirically from the histogram of $\overline{\mathrm{MPC}}_{PT}$. A comparison of $f_{t,cci}$ and $f_{PT,cci}$ for the base cases MPC1 and MPC5 is illustrated in Figure 4.4. As expected, the two functions coincide much more closely for MPC5 than for MPC1.
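A minimal sketch of how $f_{PT,cci}$ could be obtained empirically and compared with the theoretical $f_{t,cci}$ is shown below; `mpc_pt_norm` is a hypothetical placeholder for the normalized vector $\overline{\mathrm{MPC}}_{PT}$, not the actual simulation output.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical placeholder for the normalized vector MPC_PT.
mpc_pt_norm = np.random.default_rng(1).standard_normal(1000)

def f_t_cci(t):
    """Theoretical two-sided normal cumulative CI: 2 * (Phi(t) - 0.5)."""
    return 2.0 * (norm.cdf(t) - 0.5)

def f_pt_cci(t, samples):
    """Empirical counterpart: fraction of normalized samples within [-t, t]."""
    return np.mean(np.abs(samples) <= t)

# Evaluate both functions on a grid and report the largest discrepancy.
t_grid = np.linspace(0.0, 4.0, 81)
gap = max(abs(f_t_cci(t) - f_pt_cci(t, mpc_pt_norm)) for t in t_grid)
print(f"largest deviation between f_t,cci and f_PT,cci: {gap:.3f}")
```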


Figure 4.4: $f_{t,cci}$ and $f_{PT,cci}$ computed for MPC1 (left) and MPC5 (right).

In practice, the proposed framework is applied to assess the uncertainty of one MPC estimate, computed solely from a single data set. Until now, this section has compared the Monte Carlo histogram with the perturbation-based histogram. Both mix information from all the available simulations. By a proper normalization, it was possible to compare them to the standard normal distribution, which reveals whether or not the entries of $\overline{\mathrm{MPC}}_{PT}$ and $\overline{\mathrm{MPC}}_{MC}$ are Gaussian and illustrates the dispersion in all the estimated parameters.

A procedure that quantifies the errors in the Gaussian approximation when using just a single data set is now proposed. For each simulation $j$, assume that the computed standard deviation $\sigma_{PT,j}$ is a correct estimate of the desired $\sigma_{MC}$. Then, define a properly normalized vector $\overline{\mathrm{MPC}}^{j}_{PT}$ as the collection of the normalized $\mathrm{MPC}_{MC,k}$ such that $\overline{\mathrm{MPC}}^{j}_{PT,k} = (\mathrm{MPC}_{MC,k} - \mu_{MC})/\sigma_{PT,j}$. Under the Gaussian approximation, and assuming independence and a correct variance estimate, the histogram derived from $\overline{\mathrm{MPC}}^{j}_{PT}$ should be close to the histogram of the standard Gaussian distribution. Such closeness can be quantified by a classical Pearson goodness-of-fit test, a $\chi^2$ statistic computed between the theoretical $\mathcal{N}(0,1)$ distribution and the distribution corresponding to each simulation. The test statistic is defined as

$P_{\chi^2} = \sum_{i=1}^{b_n} \frac{(O_i - E_i)^2}{E_i}$,   (4.8)

where the $O_i$ are the observed counts of $\overline{\mathrm{MPC}}^{j}_{PT}$ within the $i$-th interval, the $E_i$ are the counts corresponding to a theoretical $\mathcal{N}(0,1)$ distribution, and $b_n$ denotes the number of intervals used. From the approximate histogram of the $P_{\chi^2}$ distribution, the median, best and worst quantiles of the Pearson statistic can then be derived. The best and worst cases are defined as the 2.5% and 97.5% quantiles of the $P_{\chi^2}$ distribution.
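Under the assumptions above, the per-simulation Pearson statistic (4.8) and its quantiles could be computed along the following lines; the equal-probability bins under $\mathcal{N}(0,1)$ and the placeholder data are assumptions of this sketch, not prescriptions of the original procedure.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical placeholders (as in the earlier sketch): m MPC estimates,
# their sample mean, and the per-simulation standard deviations sigma_PT,j.
rng = np.random.default_rng(0)
m, b_n = 1000, 20
mpc_mc = 0.98 + 0.003 * rng.standard_normal(m)
sigma_pt = 0.003 * np.abs(1 + 0.1 * rng.standard_normal(m))
mu_mc = mpc_mc.mean()

# Interior bin edges chosen so that each of the b_n intervals carries equal
# probability under N(0,1); the expected count is then E_i = m / b_n.
interior_edges = norm.ppf(np.arange(1, b_n) / b_n)
expected = m / b_n

p_chi2 = np.empty(m)
for j in range(m):
    # Normalize every estimate with the j-th standard deviation (MPC^j_PT).
    z = (mpc_mc - mu_mc) / sigma_pt[j]
    observed = np.bincount(np.searchsorted(interior_edges, z), minlength=b_n)
    p_chi2[j] = np.sum((observed - expected) ** 2 / expected)

# Best, median and worst cases as quantiles of the Pearson statistic.
best, median, worst = np.quantile(p_chi2, [0.025, 0.5, 0.975])
print(best, median, worst)
```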

The CDFs corresponding to the best, median and worst quantiles among all $\overline{\mathrm{MPC}}^{j}_{PT}$ are plotted in the left parts of Figure 4.5 and Figure 4.6. The distributions of $P_{\chi^2}$ for both MPC5 and MPC1 are displayed in the right parts of Figure 4.5 and Figure 4.6. Notice that the first quantile measures the best possible outcome: 2.5% of all realizations fit equally well or better than the plotted best-quantile case. The last quantile similarly measures the worst possible outcome: 2.5% of all realizations fit equally poorly or worse than the plotted worst-quantile case. The median measures the most central outcome.



Figure 4.5: Best, median and worst Gaussian fits for MPC1 (left). Histogram of the Pearson $\chi^2$ statistic with the corresponding cases of Gaussian fits to MPC1 (right).


Figure 4.6: Best, median and worst Gaussian fits for MPC5 (left). Histogram of the Pearson $\chi^2$ statistic with the corresponding cases of Gaussian fits to MPC5 (right).

Figure 4.5 and Figure 4.6 show how well the Gaussian approximation fits a standard Gaussian law for MPC1 and MPC5.

For MPC1 the results are dispersed: inaccurate for the worst quantile, but satisfactory for the median and the best quantiles. This means that the Gaussian approximation for MPC1 should be adequate, on average, for a small set of experiments.

The same plots for MPC5 show full agreement with the Gaussian approximation even for the worst quantile, clearly demonstrating the Gaussianity of this mode's MPC.

To conclude this section, the median, best and worst Gaussian fits to the base-case MPCs are illustrated in Figure 4.7.
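For illustration, fits like those in Figure 4.7 could be reproduced roughly as follows, reusing the hypothetical `mpc_mc`, `mu_mc`, `sigma_pt` and `p_chi2` arrays from the previous sketch; selecting representative simulations by proximity to the quantiles is an assumption of this sketch, not necessarily how the figure was generated.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Select the simulations whose Pearson statistic lies closest to the
# 0.025, 0.5 and 0.975 quantiles of p_chi2 (best, median and worst cases).
labels = ["0.025 quantile fit", "median fit", "0.975 quantile fit"]
picks = [np.argmin(np.abs(p_chi2 - q))
         for q in np.quantile(p_chi2, [0.025, 0.5, 0.975])]

# Overlay the Gaussian fits N(mu_MC, sigma_PT,j) on the empirical
# distribution of the raw MPC estimates.
x = np.linspace(mpc_mc.min(), mpc_mc.max(), 400)
plt.hist(mpc_mc, bins=40, density=True, alpha=0.4, label="MPC estimates")
for j, label in zip(picks, labels):
    plt.plot(x, norm.pdf(x, loc=mu_mc, scale=sigma_pt[j]), label=label)
plt.legend()
plt.show()
```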


Figure 4.7: Gaussian fits to the empirical probability distributions of the base-case MPCs, based on the median, 0.975 and 0.025 quantiles of the Pearson $\chi^2$ statistic.

It appears that the Gaussian approximation is good for MPC5 (left) and adequate for MPC1 (right), whereas it is inexact for MPC3 (center), which is expected from the inspection of the histograms. This is also expected because a parameter at the border of its domain cannot be well approximated by a Gaussian law. Notice that the variance of the different fits for MPC1 is still a good indicator of the dispersion of its Monte Carlo estimates.

The distributions of MPC5 and MPC1 can be approximated reasonably well by a Gaussian; however, a better approximation scheme, with deeper theoretical insight than the classical Gaussian approximation, is needed to accurately characterize MPCs very close to 1, such as MPC3. Before focusing on a proper approximation for MPC3, the behavior of MPC1 as the number of samples increases is investigated.

4.2.4 Influence of sample length on distribution of MPC: a Gaussian