
Figure 4.10: Comparing the mean residuals of normal and faulty test examples

4.2.3 Covariance structure

In the MFICA algorithm the structure of the noise covariance is assumed to be either isotropic (the same noise variance for all d samples) or diagonal (an individual noise variance for each of the d samples). While the latter gives d − 1 additional parameters, it also allows modeling that the noise variance is not constant throughout a cycle but varies as a function of angular position. Moreover, as Figure 4.10 indicates, the residuals obtained with both noise models vary considerably with the angular position. The diagonal noise variance directly causes some of the peaks in the residual to become higher (also seen in Figure 4.10), and this happens with both the normal and the faulty test examples from experiment 1.
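To make the difference between the two noise models concrete, the sketch below compares the negative log-likelihood of a residual cycle under an isotropic and a diagonal Gaussian noise model. The array names and the synthetic residual data are hypothetical; this is only a minimal illustration of the idea, not the MFICA implementation used in the thesis.

```python
import numpy as np

def gaussian_nll(residual, var):
    """Negative log-likelihood of a residual vector under zero-mean Gaussian
    noise; `var` is a scalar (isotropic) or a length-d vector (diagonal)."""
    var = np.broadcast_to(var, residual.shape)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + residual ** 2 / var)

# Hypothetical residuals: one row per normal training cycle, one column per
# angular position, with a noise level that varies over the cycle.
rng = np.random.default_rng(0)
scale = np.linspace(0.5, 2.0, 360)
residuals_train = rng.normal(scale=scale, size=(100, 360))
residual_test = rng.normal(scale=scale)

var_iso = residuals_train.var()          # one shared noise variance
var_diag = residuals_train.var(axis=0)   # one variance per angular position

print("isotropic NLL:", gaussian_nll(residual_test, var_iso))
print("diagonal  NLL:", gaussian_nll(residual_test, var_diag))
```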

In Figure 4.11 we see that the covariance structure influences the ability to detect faults: the separation of normal and faulty examples is better with the diagonal noise than with both the isotropic noise and the PCA.

4.3 Principal component analysis

The PCA is a simple (at least when compared to ICA) and yet powerful method that finds the orthogonal directions in the observations with the highest variance.

Figure 4.11: Comparing NLL values obtained with isotropic and diagonal noise covariance. The example numbers are arbitrary. The diagonal covariance allows for better performance as fewer of the normal examples exceed the threshold.

If a singular value decomposition (SVD) is applied to the observed data, i.e., $X = U D V^\top$, then the mixing matrix is $A = U$ and the source matrix is $S = D V^\top$ in Equation 4.2. The directions found in the observed data with PCA are orthogonal/uncorrelated, which is a harsher but numerically simpler constraint than the linear independence required by ICA. Additionally, the PCA mixing matrix in its full size is a unitary matrix, such that $A^{-1} = A^\top$; for the economy-sized SVD the pseudo-inverse inherits that property, so that $A^\dagger = A^\top$. This makes the PCA model easily applicable to new examples as $s_{\text{new}} = A^\top x_{\text{new}}$. When only a subset of the principal directions is used, the noise $\nu$ in Equation 4.2 is modeled by the discarded subspace.
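A minimal sketch of the SVD-based PCA model described above, assuming the observations are arranged as a $d \times n$ matrix; the economy-sized SVD, the projection of a new example, and the residual from a truncated subspace are shown. The array sizes and random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(360, 50))                 # hypothetical d x n data matrix
X = X - X.mean(axis=1, keepdims=True)          # centre each angular position

U, d, Vt = np.linalg.svd(X, full_matrices=False)  # economy-sized SVD
A = U                                          # mixing matrix, orthonormal columns
S = np.diag(d) @ Vt                            # source matrix S = D V^T

# With orthonormal columns the pseudo-inverse equals the transpose,
# so a new example is mapped to source space as s_new = A^T x_new.
x_new = rng.normal(size=360)
s_new = A.T @ x_new

# Keeping only k principal directions; the discarded subspace carries the noise.
k = 5
x_hat = A[:, :k] @ s_new[:k]
residual = x_new - x_hat
```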

The further differences between independent, orthogonal, and uncorrelated are outlined in section 4.1, and some implications are encountered for the constrained PCA in subsection 4.3.1.

There is no likelihood model connected to the SVD algorithm; however, since the PCA is based on ranked subspaces, the discarded subspaces will span the residual, and a likelihood model of that can easily be obtained. The likelihood model given in Hansen and Larsen [1996] is adapted here: the noise is modeled by a white Gaussian noise source with variance equal to the sum of the variance in the remaining subspaces.
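A sketch of how such a residual likelihood could be evaluated, assuming a white Gaussian noise model with a variance taken from the subspaces that are not kept, as described above; the data, the choice of k, and the array names are hypothetical.

```python
import numpy as np

def pca_residual_nll(x_new, A, k, sigma2):
    """NLL of the part of x_new outside the first k principal directions,
    under a white Gaussian noise model with variance sigma2."""
    d = x_new.shape[0]
    r = x_new - A[:, :k] @ (A[:, :k].T @ x_new)
    return 0.5 * ((d - k) * np.log(2 * np.pi * sigma2) + (r @ r) / sigma2)

# Hypothetical training data, d x n; the noise variance is estimated from
# the discarded subspaces, following the description in the text.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(360, 100))
X_train = X_train - X_train.mean(axis=1, keepdims=True)
U, dvals, _ = np.linalg.svd(X_train, full_matrices=False)

k = 5
eigvals = dvals ** 2 / X_train.shape[1]   # variance captured by each direction
sigma2 = eigvals[k:].sum()                # variance of the discarded subspace

print("residual NLL:", pca_residual_nll(rng.normal(size=360), U, k, sigma2))
```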

4.3.1 Positive PCA

Since the observed data is already positive, it could be interesting to try variations of PCA where the elements of the matrices are assumed non-negative. If we can specify PCA as a (constrained) optimization problem, we can add the non-negativity constraint to obtain a special case of the PCA algorithm. First we need to analyze PCA to find its normal constraints.

The principal components are found along those pairwise orthogonal directions (the columns of W) that explain the most variance in the observed data X. If we want to pursue that as an optimization problem, then

$$W_{\text{pca}} = \arg\max_W \left\{ W X X^\top W^\top \right\}, \quad \text{s.t. } W^\top W = I \qquad (4.16)$$

Without additional constraints $W_{\text{pca}}$ is equal to the eigenvectors of the sample covariance matrix $X X^\top$.
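As a quick numeric check of the last statement, using random data only for illustration: the eigenvectors of the sample covariance $X X^\top$ coincide, up to sign, with the left singular vectors of X used in the previous section.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 200))            # small d x n example

# Eigenvectors of the sample covariance X X^T, sorted by decreasing eigenvalue ...
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

# ... coincide, up to sign, with the left singular vectors of X.
U, _, _ = np.linalg.svd(X, full_matrices=False)
assert np.allclose(np.abs(eigvecs), np.abs(U), atol=1e-6)
```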

In positive PCA we also require that all elements in W are non-negative, and this is problematic! Imagine some points in a three-dimensional space (x, y, z). Having selected the first positive principal direction as (1/2, 0, 3/4), the second positive principal direction has to be (0, 1, 0), since non-zero elements in the x or z dimension would break the orthogonality.

In general, the non-negativity together with the orthogonality implies that only one principal direction can have energy in a given feature, and thus only one element in each row of W can be non-zero. Therefore, while it is possible to specify and find a set of positive principal components, the combination of the two constraints is indeed problematic. The orthogonality constraint prevents the principal components from sharing dimensions, so reasonable situations where the variance of one feature is connected to two otherwise unconnected features cannot be modeled. A positive PCA algorithm was implemented in Matlab using the constrained minimizer fmincon with the given constraints, but showed very limited usability due to the above issues.
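The argument can be restated numerically: for non-negative vectors every term of the inner product is non-negative, so orthogonality forces disjoint supports. A minimal check with the vectors from the example above:

```python
import numpy as np

# The two direction vectors from the example above.
u = np.array([0.5, 0.0, 0.75])
v = np.array([0.0, 1.0, 0.0])

# Every term u[i] * v[i] is >= 0, so the inner product can only vanish
# when each term vanishes: orthogonality forces disjoint supports.
assert np.all(u >= 0) and np.all(v >= 0)
assert np.isclose(u @ v, 0.0)
assert not np.any((u > 0) & (v > 0))   # no feature shared by both directions
```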

The non-negative matrix factorization (NMF) due to Lee and Seung [1999] estimates something that is similar to "positive" PCA, but it is not positive PCA as such, since it does not obey the orthogonality constraint. It has been much more successful than its "correct cousin" and has received much attention. Given X, the NMF estimates W L H = X, where the sizes of the three matrices are equal to those in PCA, and the columns of W and the rows of H have unit length and the elements are non-negative; however, the columns are not orthogonal. Apart from not having an associated likelihood function, Donoho and Stodden [2003] have shown that NMF has problems when some parts are repeated throughout all examples, which is problematic with the highly repetitive engine cycles. They referred to problems with the repeated torso in matchstick swimmer figures.
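For reference, a minimal sketch of the standard Lee and Seung multiplicative updates for the squared-error objective; the factorization is written as W H here, and the W L H form used above can be recovered afterwards by pulling the norms out into a diagonal L. This is only an illustration and not necessarily the exact NMF variant referred to in the text.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9):
    """Lee and Seung multiplicative updates for X ~ W @ H under the
    squared-error objective; all factors stay non-negative."""
    rng = np.random.default_rng(0)
    W = rng.uniform(size=(X.shape[0], k))
    H = rng.uniform(size=(k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical non-negative data; the W L H form used above is recovered by
# pulling the column norms of W and the row norms of H into a diagonal L.
X = np.abs(np.random.default_rng(4).normal(size=(360, 50)))
W, H = nmf(X, k=5)
L = np.diag(np.linalg.norm(W, axis=0) * np.linalg.norm(H, axis=1))
W = W / np.linalg.norm(W, axis=0)
H = H / np.linalg.norm(H, axis=1, keepdims=True)
# X is now approximated by W @ L @ H with unit-length columns and rows.
```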

Hansen et al. [2005] state that this repetition makes the components non-unique.

The data that Lee and Seung [1999] initially presented NMF with was handwritten digits 0-9. Moreover, a subdivision of the parts in the digits 0-9 does not result in one part that is always on, especially not with handwritten digits.

In Højen-Sørensen et al. [2002] the parts of handwritten examples of the digit 3 are extracted using the MFICA algorithm with the same mixing matrix constraint and source prior as used here with the diesel engine signals.

4.4 Information maximization independent component analysis