(1)

Principal component analysis

Course 27411

Biological data analysis and chemometrics

Jens C. Frisvad

Department of Systems Biology, Building 221
Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
E-mail: jcf@bio.dtu.dk

Thanks to Lars Nørgaard, CAMO, Michael Edberg and Harald Martens for some of the figures

Principal component analysis

OUTLINE

Some matrix algebra

What is PCA?

Goals of PCA

Projection

Q and R analysis

Matrix notation

Double example (continuous variables and a series of single variables)

Connection to PCR and PLSR

(2)

Scalars, Vectors and Matrices

Scalar: variable described by a single number (magnitude), e.g.
Temperature = 20 °C
Density = 1 g·cm⁻³
Image intensity (pixel value) = 2546 a.u.

Vector: variable described by magnitude and direction

Matrix: rectangular array of scalars

Column vector:  b = [1; 1; 2]

Row vector:  d = [3 4 9]

A general n-vector:  v = (v1, v2, ..., vn)

Square (3 x 3):  A = [1 2 3; 5 4 1; 6 7 4]

Rectangular (3 x 2):  C = [1 4; 2 7; 3 8]

D = [d11 d12 d13; d21 d22 d23; d31 d32 d33],  where dij is the element in the ith row, jth column

Vector Operations

Transpose operator

column → row, row → column

b = [1; 1; 2]  →  b^T = [1 1 2]

d = [3 4 9]  →  d^T = [3; 4; 9]

A = [1 2 3; 5 4 1; 6 7 4]  →  A^T = [1 5 6; 2 4 7; 3 1 4]

(3)

Vector Operations

Inner product = scalar:

x^T y = [x1 x2 x3] [y1; y2; y3] = x1y1 + x2y2 + x3y3

In general, for n-vectors:  x^T y = x1y1 + x2y2 + ... + xnyn = Σ_{i=1}^{n} xi yi

Length of a vector:

||x|| = (x1^2 + x2^2)^(1/2)          (in 2D)
||x|| = (x1^2 + x2^2 + x3^2)^(1/2)   (in 3D)

Inner product of a vector with itself = (vector length)^2:

x^T x = x1^2 + x2^2 + x3^2 = ||x||^2

[Figure: right triangle with legs x1 and x2 and hypotenuse ||x||]

Matrix Operations

• Addition (matrices of the same size)
Commutative: A + B = B + A
Associative: (A + B) + C = A + (B + C)

Example:  A = [2 2; 2 2],  B = [1 1; 1 1],  A + B = [3 3; 3 3]

(4)

Matrix Operations

• Multiplication

number of columns in first matrix = number of rows in second

– Associative: (A B) C = A (B C)
– Distributive: A (B + C) = A B + A C
– Not commutative: A B ≠ B A !!!

(A B)^T = B^T A^T

C = A B

(m x p) = (m x n) (n x p)

cij = inner product between ith row in A and jth column in B

(2 x 3) (3 x 2) = (2 x 2)

A = [1 2 3; 4 5 6],  B = [1 7; 2 8; 3 9]

A B = [1·1 + 2·2 + 3·3   1·7 + 2·8 + 3·9;  4·1 + 5·2 + 6·3   4·7 + 5·8 + 6·9] = [14 50; 32 122]
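As a quick check of these rules, here is a small NumPy sketch using the toy matrices as reconstructed above; it reproduces the product and the transpose rule:

```python
import numpy as np

# Toy matrices as reconstructed above
A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 x 3
B = np.array([[1, 7],
              [2, 8],
              [3, 9]])           # 3 x 2

C = A @ B                        # (2 x 3)(3 x 2) = (2 x 2)
print(C)                         # [[ 14  50]
                                 #  [ 32 122]]

# c_ij = inner product of the i-th row of A and the j-th column of B
print(A[0, :] @ B[:, 0])         # 1*1 + 2*2 + 3*3 = 14

# Transpose rule: (A B)^T = B^T A^T
print(np.array_equal((A @ B).T, B.T @ A.T))   # True

# Not commutative: B A is 3 x 3, not 2 x 2
print((B @ A).shape)             # (3, 3)
```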

Some Definitions …

Identity Matrix

Diagonal Matrix

Symmetric Matrix

I A = A I = A

I = [1 0 0; 0 1 0; 0 0 1]

D = [3 0 0; 0 5 0; 0 0 7]

Symmetric: S^T = S, i.e. sij = sji for all i, j

(5)

Matrix Inverse

A^-1 A = A A^-1 = I

Properties

A^-1 exists only if A is square (n x n)
If A^-1 exists, then A is non-singular (invertible)
(A B)^-1 = B^-1 A^-1, since B^-1 A^-1 A B = B^-1 B = I
(A^T)^-1 = (A^-1)^T, since (A^-1)^T A^T = (A A^-1)^T = I

Example:  D = [3 0 0; 0 5 0; 0 0 7],  D^-1 = [1/3 0 0; 0 1/5 0; 0 0 1/7],  D^-1 D = I
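A short NumPy sketch of these inverse properties; the two 2 x 2 matrices used to check (A B)^-1 = B^-1 A^-1 are arbitrary invertible examples chosen here for illustration, not taken from the slides:

```python
import numpy as np

D = np.diag([3.0, 5.0, 7.0])
D_inv = np.linalg.inv(D)                          # diag(1/3, 1/5, 1/7)
print(np.allclose(D_inv @ D, np.eye(3)))          # True: D^-1 D = I

# (A B)^-1 = B^-1 A^-1, checked on two arbitrary invertible matrices
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 4.0],
              [0.0, 2.0]])
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))     # True

# (A^T)^-1 = (A^-1)^T
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))  # True
```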

Samples as observations in a Multi/hyper-dimensional Space:

Objects are characterized by a profile of features.

Features are dimensions.

Objects are points in a multidimensional space.

Mathematical notation

Data as observations

(6)

PCA

Developed by Karl Pearson in 1901

Pearson, K. (1901) On lines and planes of closest fit to systems of points in space.

Philosophical Magazine (6) 2: 559-572.

Also called:

Singular value decomposition
Karhunen-Loève expansion
Eigenvector analysis
Latent vector analysis
Characteristic vector analysis
Hotelling transformation

Principal Component Analysis (PCA)

Projection method

Exploratory data analysis

Extract information and remove noise

Reduce dimensionality / Compression

Classification and clustering

(7)

Goals of PCA

• Simplification

• Data reduction

• Modeling

• Outlier detection

• Variable selection

• Classification (unsupervised)

• Unmixing (curve resolution)

PCA, which distance measure is used?

• PCA uses Euclidean distance!

(8)

PCA in a nutshell

• First...

... the illustrative/intuitive approach:

Projection of data

• Linear transformation

By proper rotation

Feature observations in 3D space Feature observations in reduced 2D space

(9)

Projection of data

By proper linear transformation

The PCA approach:

Rotation according to maximum variance in data.

By proper rotation

Feature observations in 3D space Feature observations in reduced 2D space


(10)

Principal Component Analysis

• For a given dataset:


Principal Component Analysis

• Calculate the centroid (=”mean in all directions”):


(11)

Principal Component Analysis

• Subtract the mean:


Principal Component Analysis

• Take this as our new coordinate system:


(12)

Principal Component Analysis

• Calculate the direction p1 in which the variance is maximal:

Principal Component Analysis

And repeat this for the next orthonormal axis (direction with second most variance):


(13)

Principal Component Analysis

Leaving us with a rotated grid:


Principal Component Analysis

• Which we can rotate to a “normal” position:


(14)

Principal Component Analysis

• Showing us maximal variance:


Principal Component Analysis

• We can also use this to reduce the complexity of the data set:


(15)

Principal Component Analysis

• By eliminating a number of axes by projection of the points:


Principal Component Analysis

• In this example moving from two…:


(16)

Principal Component Analysis

• … to one dimensional data points:

[Figure: in 3D, the 1st and 2nd PC with the projections of the data points onto them, and the variances Var(PC1) and Var(PC2)]

(17)

Principal Component Analysis

• Assuming variation equals species diversity...:


Principal Component Analysis

• … the first PC depicts this information…:


(18)

Scores and Loadings

Scores

Map of samples

Displays distribution of samples in the new space defined by the PC’s

Loadings

Map of variables

Shows how the original variables are related to the PC’s

Data as observations

There are often two matrices to compare

For each sample n = 1, 2, 3, 4 we have the covariates (a row of X) and the a priori data (a row of Y):

X (covariates, 4 x M):
x11 x12 x13 ... x1M
x21 x22 x23 ... x2M
x31 x32 x33 ... x3M
x41 x42 x43 ... x4M

Y (a priori data, 4 x P):
y1 = (y11, ..., y1P)
y2 = (y21, ..., y2P)
y3 = (y31, ..., y3P)
y4 = (y41, ..., y4P)

(19)

Data as observations

• For each sample n = 1, 2, 3, ..., N: covariates X and a priori data Y

y1 = (y11, ..., y1P)
y2 = (y21, ..., y2P)
y3 = (y31, ..., y3P)
y4 = (y41, ..., y4P)
...
yN = (yN1, ..., yNP)

Data as observations

Extensions of PCA: PCR, PLS etc.

• For each sample n = 1, 2, 3, ... we have covariates X and a priori data Y:

y1 = (y11, ..., y1P)
y2 = (y21, ..., y2P)
y3 = (y31, ..., y3P)

OBJECTIVE: Can we model Y from X?

(20)

PCA

• PCA is used for analysis of one data matrix, in the hope of distinguishing latent signals from noise in the data

X has n samples (rows, index i) and p variables (columns, index k)

Principal component analysis

X = Xavg + T P' + E

Meas. data X = Av. of all (Xavg) + Scores (T) × Loadings (P') + Errors (E)
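A minimal Python/NumPy sketch of this bilinear model, using made-up random data and the SVD to obtain scores and loadings (the slides themselves use NIPALS and software such as UNSCRAMBLER; this is only an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))        # made-up data: n = 30 samples, p = 8 variables

x_avg = X.mean(axis=0)              # "average of all" (column means)
Xc = X - x_avg                      # centered data

# SVD of the centered matrix gives scores and loadings directly
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

A = 2                               # number of principal components kept
T = U[:, :A] * s[:A]                # scores   (n x A)
P = Vt[:A, :].T                     # loadings (p x A)
E = Xc - T @ P.T                    # errors / residuals

# The bilinear model: X = Xavg + T P' + E
print(np.allclose(x_avg + T @ P.T + E, X))   # True

# Fraction of variance carried by each component
print(s**2 / np.sum(s**2))
```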

(21)

PCA depends on feature size and scale

• Variables = features = characteristics = attributes = characters

• When, in any one variable, the largest value is more than 8 times greater than the lowest value, use a log transformation: xlog1 = log(x + 1)

• Often the data are centered: xik(new) = xik − x̄k, where x̄k is the mean of variable k

Mean, variance and standard deviation:

Mean (avg):  x̄ = (1/n) Σ_{i=1}^{n} xi

Variance:  Var(x) = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)^2

Standard deviation:  std(x) = Var(x)^(1/2)

(22)

Autoscaling = standardization

• To avoid the problem of different scales the data are often centered and all values are divided by the standard deviation for each variable

Xaut = (X − Xavg) / Xstd
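A small sketch of centering and autoscaling in NumPy, on a made-up 4 x 3 data matrix; the log base in the transformation is an assumption, since the slide only writes log(x + 1):

```python
import numpy as np

def autoscale(X):
    """Center each variable (column) and divide by its standard deviation."""
    x_avg = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)        # sample std, 1/(n - 1) as in the variance formula
    return (X - x_avg) / x_std, x_avg, x_std

# Made-up 4 x 3 data matrix with variables on very different scales
X = np.array([[1.0, 200.0, 0.01],
              [2.0, 180.0, 0.03],
              [3.0, 220.0, 0.02],
              [4.0, 210.0, 0.04]])

X_aut, x_avg, x_std = autoscale(X)
print(X_aut.mean(axis=0))                # ~0 for every variable
print(X_aut.std(axis=0, ddof=1))         # 1 for every variable

# Log transformation when a variable spans a large range (base 10 assumed here)
X_log1 = np.log10(X + 1)                 # xlog1 = log(x + 1)
```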

Q versus R analysis

• Sometimes the variables could just as well be treated as the objects; in that case the matrix X can be transposed (to X')

• Objects are rarely transformed regarding the variables

• Normalization (making all variables sum to 1 or 100 %) is rarely done: one risks closure of the data

(23)

Objects and variables

In contingency tables, objects and variables may have equal status. Often correspondence analysis is used on such data

In some cases only the objects or only the variables are interesting

Example: Fluorescence or other spectrophotometric data are occasionally used just for characterization of the objects. These smooth curves are often corrected, e.g. by Multiplicative Scatter Correction, before PCA, but they are not always used in the final evaluation of the data

Spectral measurements on sugar samples from three different factories (B, E, F): 34 samples and 1023 variables (Lars Nørgaard example)

(24)

[Figure: raw data and centered data (Lars Nørgaard, Foss). Legend: raw spectrum, mean spectrum, 1st loading, 2nd loading, residual]

(25)

[Figure: two-dimensional score plot, PC1 vs. PC2, with the samples grouped by factory B, E and F (Lars Nørgaard, Foss)]

[Figure: two-dimensional score plot, PC1 vs. PC3, with the same B, E, F grouping]

(26)

Chemical/physical measurements on sugar samples from 3 different factories (B, E, F); 34 samples, 10 variables

Lars Nørgaard, Foss

[Figure: raw data and centered data for the 10 chemical/physical variables]

(27)

[Figure: score plot with the factory groups B, E and F (Lars Nørgaard, Foss)]

Principal component analysis

[Figure: raw data (scaled), mean spectrum, 1st loading, 2nd loading and residual]

(28)

[Figure: loading plot and biplot (Lars Nørgaard, Foss)]

(29)

NIPALS algorithm

Non-linear Iterative PArtial Least Squares

1. Center or autoscale X

2. Choose a (random) column in X as a start guess for t
3. Solve X = tp' + E with respect to p: "p = X/t", i.e. p = X't/(t't); normalize p to length one: p = p/||p||
4. Solve X = tp' + E with respect to t: "t = X/p", i.e. t = Xp/(p'p)
5. Repeat steps 3 & 4 until convergence
6. Orthogonalize (deflate) X: Xnew = X − tp'
7. Start again from step 2 with Xnew to extract the next factor (scores and loadings)
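A minimal Python sketch of the NIPALS loop as outlined above (for simplicity the start guess is the first column of X rather than a random one):

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """NIPALS on a centered (or autoscaled) matrix X; returns scores T and loadings P."""
    X = X.copy()
    n, p = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    for a in range(n_components):
        t = X[:, 0].copy()                        # step 2: start guess = a column of X
        for _ in range(max_iter):
            p_vec = X.T @ t / (t @ t)             # step 3: "p = X/t"
            p_vec /= np.linalg.norm(p_vec)        #          normalize p to length one
            t_new = X @ p_vec                     # step 4: "t = X/p" (p'p = 1 after normalization)
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break                             # step 5: converged
            t = t_new
        T[:, a] = t
        P[:, a] = p_vec
        X = X - np.outer(t, p_vec)                # step 6: deflate, Xnew = X - t p'
    return T, P                                   # step 7 is the loop over components
```

On a centered matrix, the columns of T and P should agree with the leading singular vectors of X up to sign.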

Principal Component Analysis

The i'th principal component of X is the projection yi = X pi

Collecting all components: Y = X P, where the columns of P = [p1 p2 ...] are the loading vectors

(30)

Principal Component Analysis

• New variables Yi that are linear combinations of the original variables (xi):

Yi = ai1 x1 + ai2 x2 + … + aip xp ,   i = 1, ..., p

• The new variables Yi are derived in decreasing order of importance.

• They are called “principal components”.

Principal Component Analysis

The direction of p1 is given by the eigenvector corresponding to the largest eigenvalue λ1 of the matrix C

The second vector, orthogonal to (uncorrelated with) the first, is the one with the second-highest variance; it is the eigenvector corresponding to the second-largest eigenvalue

And so on …

(31)

Principal Component Analysis

• Let C = cov(X)

• The eigenvalues λi of C are found by solving the equation

det(C- λI) = 0

• Eigenvectors are columns (rows) of the matrix P

Example

Let us take two variables with covariance c>0

Solving the eigenvalue problem:

C = cov(X) = [1 c; c 1]

det(C − λI) = det([1−λ  c; c  1−λ]) = (1 − λ)^2 − c^2 = 0

which gives λ1 = 1 + c and λ2 = 1 − c.

(32)

Principal Component Analysis

Before PCA:  μ = (0.0, 0.0),  Σ = [1.87 0.96; 0.96 0.99]

Eigenvalues:  λ1,2 ≈ 2.50, 0.39

Loadings:  P = [p1 p2] = [-0.85 0.53; -0.53 -0.85]

After PCA (in the PC coordinates):  μ = (0.0, 0.0),  Σ = [2.50 0.00; 0.00 0.39]
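The eigenvalues and eigenvectors quoted above can be checked with NumPy; note that with the rounded covariance entries the eigenvalues come out slightly different from the slide's 2.50 and 0.39, and the eigenvector signs are arbitrary:

```python
import numpy as np

# Covariance matrix as quoted (rounded) on the slide
Sigma = np.array([[1.87, 0.96],
                  [0.96, 0.99]])

eigvals, eigvecs = np.linalg.eigh(Sigma)      # eigen-decomposition of a symmetric matrix
order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)     # ~[2.49, 0.37] for these rounded entries (slide: 2.50, 0.39)
print(eigvecs)     # columns ~ +/-(0.85, 0.53) and +/-(0.53, -0.85)

# In the PC coordinate system the covariance becomes diagonal
print(eigvecs.T @ Sigma @ eigvecs)            # ~diag(2.49, 0.37)
```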

Principal Component Analysis

[Figure: the same example, with the principal directions p1 and p2 drawn in the (x1, x2) plane]

(33)


Principal Component Analysis

From the PCA we may extract the set of linear combinations that explain the most variation,

and thereby condense and reduce the dimensionality of the feature space.

The fraction of the total variance explained by the first k components is

q = ( Σ_{m=1}^{k} λm ) / ( Σ_{m=1}^{M} λm )

From the example before, the first component alone explains 2.50 / (2.50 + 0.39) ≈ 87 % of the variance.

(34)

Output

• Loadings: the weights

• Scores

• Plots: loadings plot, scores plot, biplot

Overview of PCA

Select objects (Sneath's rule of thumb: use 24 objects per supposed class)

Select variables (in general use many)

Examine the raw data (plot and calculate averages and standard deviations)

Consider transformations of raw data

Perform the PCA

Use cross-validation to estimate the number of components A

Other tests for the number of components can also be used (broken stick etc.)

In general it is good to have more than 75 % of the variance in the data explained

Plot scores

Plot loadings (maybe also biplots)

Look for outliers (probably representatives of a new class or, more often, an error); check both leverage and high-residual outliers

(35)

Validation

• Cross-validation (will be explained in detail in lecture on PCR)

• Jack-knifing

• Bootstrapping

• PTP analysis & Monte Carlo simulations

Cross-validation (number of PCs?)

[Figure: predicted error on Y versus the number of PCs; too few PCs underfit and too many overfit, with the optimum at the minimum of the curve]

(36)

Cross-validation (methods)

• "Leave one out" or "full validation": leaves one sample out each time

• Segmented (category) validation

• On replicates (independent replicates): leaves one replicate group (one segment) out each time

• Mixtures: leaves one mixture group (one segment) out each time

• Remember that repeated experiments are dangerous to use in cross-validation (i.e. use real biological/chemical replicates!!)

[Figure: example segmentation with samples 3 & 8 left out]
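A rough sketch of segmented cross-validation for choosing the number of PCs: loadings are computed on the training segments and the left-out samples are reconstructed from their projections. This naive row-wise scheme is only an illustration; chemometrics software typically uses more refined element-wise schemes:

```python
import numpy as np

def pca_press(X, max_pc, n_segments=7, seed=0):
    """Segmented cross-validation for PCA: total squared reconstruction error (PRESS)
    of the left-out samples, for 1..max_pc components."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    segments = np.array_split(rng.permutation(n), n_segments)
    press = np.zeros(max_pc)
    for seg in segments:
        train = np.setdiff1d(np.arange(n), seg)
        x_avg = X[train].mean(axis=0)
        Xc_train, Xc_out = X[train] - x_avg, X[seg] - x_avg
        _, _, Vt = np.linalg.svd(Xc_train, full_matrices=False)
        for a in range(1, max_pc + 1):
            P = Vt[:a].T                              # loadings from the training segments
            E = Xc_out - (Xc_out @ P) @ P.T           # residual of the left-out samples
            press[a - 1] += np.sum(E**2)
    return press                                      # plot against a and look for the optimum

# usage sketch: press = pca_press(X_aut, max_pc=10)
```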

(37)

[Figure: example segmentation with samples 3 & 7 left out and 8 & 4 as doubles]

(38)

Can we always use PCA?

(test not used in UNSCRAMBLER, but in NTSYS)

Bartlett's (1950) sphericity test (an approximate chi-squared test):

chi^2[p(p − 1)/2] = −[(n − 1) − (2p + 5)/6] · ln|R|

• ln|R| is the natural log of the determinant of the correlation matrix

• p(p − 1)/2 is the number of degrees of freedom associated with the chi-squared test statistic

• p = the number of variables

• n = the number of observations
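A small sketch of Bartlett's sphericity test in Python, using the chi-squared statistic and degrees of freedom as given above (SciPy is used only for the p-value):

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's (1950) sphericity test on the correlation matrix of an n x p data matrix X."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)                       # p x p correlation matrix
    chi2 = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2.0                                 # degrees of freedom
    p_value = stats.chi2.sf(chi2, df)
    return chi2, df, p_value

# A small p-value means the variables are correlated, so PCA is worthwhile.
```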

Bartlett's (1951a,b) test of whether all of the roots except the first are equal (in NTSYS under Eigenvectors):

chi^2 = [n − (2p + 5)/6 − 2/3] · [(p − 1) ln λ̄ − Σ_{i=2}^{p} ln λi]

where λ̄ = ( Σ_{i=2}^{p} λi ) / (p − 1)

(39)

UNSCRAMBLER EXAMPLE

• Data file: Fischerout, but add a first variable indicating the grouping (1, 2, 3)

• Task: PCA, select objects (all), select variables (2-5: Iris petal & sepal width and length)

• Use autoscaling & cross-validation
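An equivalent analysis can be sketched in Python with scikit-learn (autoscaling followed by PCA on the four Iris variables); the cross-validation step of the UNSCRAMBLER task is omitted here:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = load_iris()
X, group = iris.data, iris.target            # 150 samples x 4 variables, grouping 0/1/2

X_aut = StandardScaler().fit_transform(X)    # autoscaling (centering + unit variance)

pca = PCA(n_components=2)
scores = pca.fit_transform(X_aut)            # coordinates for the score plot
loadings = pca.components_.T                 # 4 x 2 loading matrix

print(pca.explained_variance_ratio_)         # roughly [0.73, 0.23]
```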

(40)

Colouring the groups

• You may colour each class (if you know them!). For example, run PCA on each of the classes and create a group for each class; once you have these groups, use them for sample grouping (right-click on the score plot). (You can use these classes for SIMCA later!)

• An alternative: use K-means clustering or fuzzy clustering to obtain a variable showing the groups (classes), as sketched below
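A minimal sketch of the K-means alternative, producing a grouping variable from the PCA scores (the `scores` array is assumed to come from a previous PCA, e.g. the Iris sketch above):

```python
from sklearn.cluster import KMeans

def group_by_kmeans(scores, n_groups=3, seed=0):
    """Make a grouping variable from K-means clustering of the PCA scores,
    e.g. for colouring the score plot when the classes are not known."""
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    return km.fit_predict(scores)            # one integer class label per sample

# groups = group_by_kmeans(scores)  ->  use for sample grouping / colouring
```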

(41)

Pros and Cons of PCA

Positives

• Can deal with large data sets (both in objects and variables)

• There are no special assumptions about the data, and PCA can be applied to all data sets

Negatives

• Non-linear structure is hard to model with PCA

• The meaning of the original variables may be difficult to assess directly from the latent variables (but use the loading plot, or Varimax rotation, factor analysis etc.)

Still:

• Non-linear PCA methods exist (kernel methods)

• Sparseness or supervised projections can be introduced to emphasize important features

Exercises in PCA

• The classical Iris data set (150 Iris plants characterized by 4 variables): Fischerout (Excel file)

• The wine data set (178 wines characterized by
