
(1)

PCA, ICA and beyond

Summer School on Manifold Learning in Image and Signal Analysis, August 17-21, 2009, Hven

Ole Winther

Technical University of Denmark (DTU) & University of Copenhagen (KU)

August 18, 2009

(2)

Motivation – multivariate data

Principal component analysis (PCA) – classic

Model-based approach – probabilistic PCA (pPCA)

Identifiability – independent component analysis (ICA)

InfoMax, smoothness and beyond.

(3)

Motivation – multivariate data

Data is often (but not always) represented as a matrix of d features and N samples:

size(X) = [d N]

In statistics d = p, N = n and the data matrix is transposed, X → X^T

Collaborative filtering:

X = item-user matrix

Gene expression:

X = gene-tissue matrix

Text analysis:

X = term-document matrix

(4)

Collaborative filtering

(5)

Collaborative filtering

Netflix - online movie rental (DVDs).

Collaborative filtering – predict user rating from past behavior of user.

Improve Netflix's own system by 10% to win.

training.txt – R = 10^8 ratings on a 1–5 scale for d = 17,770 movies and N = 480,189 users.

qualifying.txt – 2,817,131 movie-user pairs; (continuous) predictions are submitted to Netflix, which returns an RMSE.

The rating matrix X is mostly missing values (98.5%).

(6)

Collaborative filtering

Collaborative filtering task

Relatively large data set – 10^8 data points

Very heterogeneous - viewers and movies with few ratings

Ratings ∈ {1, 2, 3, 4, 5} are noisy (subjective use of scale, non-stationarity, ...)

Complex model needed to capture latent structure

Regularization!

(7)

Collaborative filtering

Netflix prize - some key performance numbers

Method          RMSE          % Improv.
Cinematch       0.9514        0%
“PCA”           ∼0.89–0.92    ∼5–6%
Grand prize     0.8563        10%

RMSE = root mean squared error

Two teams (Ensemble and BellKor’s Pragmatic Chaos) are above 10%,

but the prize has not been handed out yet (as of Aug 2009).

(8)

Gene Expression

DNA micro-array: d ∼ 6000 genes and N ∼ 60 cancer tissues.

[Figure: gene expression data matrix, ∼6000 genes × ∼60 tissues]

(9)

Gene Expression

“Textbook” protein signalling network – Sachs et al., Science, 2005.

(10)

Gene Expression

Single cell flow cytometry measurements of 11 phosphorylated proteins and phospholipids.

Data was generated from a series of stimulatory cues and inhibitory interventions.

Observational data: 1755 general stimulatory conditions.

Experimental data (∼80%) not used in our approach.

Not “small n, large p”!

(11)

Latent semantic analysis (LSA)

Bag of words representation – term-document matrix

(12)

Principal component analysis

Principal components (PCs): the orthogonal directions with most variance.

Empirical covariance of the (centered) data:

S = (1/N) X X^T,    size(S) = [d d]

PCs: the eigenvectors of S, S u_i = λ_i u_i

[Figure: 2-D data with the principal axes plotted as √λ_i u_i]
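
These two steps (form S, then eigendecompose) are a few lines of numpy; a minimal sketch on synthetic 2-D data (the data and sizes are made up for illustration):

```python
import numpy as np

# Toy data matrix X with d features and N samples, size(X) = [d N]
rng = np.random.default_rng(0)
d, N = 2, 500
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=N).T

# Center the data and form the empirical covariance S = (1/N) X X^T
X = X - X.mean(axis=1, keepdims=True)
S = X @ X.T / N

# PCs are the eigenvectors of S; eigh returns eigenvalues in ascending order
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]          # sort by decreasing variance
print("eigenvalues (variances along PCs):", lam)
print("first PC u_1:", U[:, 0])
```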

(13)

Principal component analysis

[Figure: typical steps in PCA]

(14)

PCA – maximum variance formulation

Project data {x_n}, n = 1,...,N, onto directions {u_i}, i = 1,...,M.

We find the directions sequentially, u_1 first.

Mean of the projected data: u_1^T x̄, with

x̄ = (1/N) Σ_{n=1}^N x_n.

Variance of the projected data:

(1/N) Σ_{n=1}^N { u_1^T (x_n − x̄) }^2 = u_1^T S u_1.

(15)

PCA – maximum variance formulation

Maximize the variance

u_1^T S u_1 with respect to u_1.

But we need a constraint to avoid u_1 → ∞:

u_1^T S u_1 + λ_1 (1 − u_1^T u_1)

λ_1 is a Lagrange multiplier.

Solution – the eigenvalue problem S u_1 = λ_1 u_1.

(16)

PCA – minimum error reconstruction

[Figure: data point x_n, its reconstruction x̃_n and direction u_1 in the (x_1, x_2) plane]

Find the best reconstructing orthonormal directions {u_i}.

(17)

PCA – minimum error reconstruction

Orthonormal directions {u_i}: u_i^T u_j = δ_ij.

Lower-dimensional subspace:

x̃_n = Σ_{i=1}^M α_ni u_i + Σ_{i=M+1}^D b_i u_i

Minimize

J({u_i}) = (1/N) Σ_{n=1}^N ||x_n − x̃_n||^2 = ... = Σ_{i=M+1}^D u_i^T S u_i = Σ_{i=M+1}^D λ_i.

(18)

PCA – minimum error reconstruction

Database of N images of d = 28 × 28 = 784 pixel values.

Mean and first four PCs


Reconstruction:

x̃_n = Σ_{i=1}^M (x_n^T u_i) u_i + Σ_{i=M+1}^D (x̄^T u_i) u_i = x̄ + Σ_{i=1}^M [ (x_n − x̄)^T u_i ] u_i
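
A sketch of this reconstruction on synthetic stand-in data (sizes and the data are arbitrary); it also checks numerically that the error J from the previous slide equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, M = 100, 500, 10                        # toy sizes; keep M PCs
X = rng.normal(size=(d, N)) * np.linspace(3.0, 0.1, d)[:, None]   # stand-in data

xbar = X.mean(axis=1, keepdims=True)
Xc = X - xbar
lam, U = np.linalg.eigh(Xc @ Xc.T / N)
lam, U = lam[::-1], U[:, ::-1]                # sort by decreasing variance

# x~_n = xbar + sum_{i<=M} [(x_n - xbar)^T u_i] u_i, for all n at once
UM = U[:, :M]
X_rec = xbar + UM @ (UM.T @ Xc)

J = np.mean(np.sum((X - X_rec) ** 2, axis=0))
print("J =", J, " sum of discarded eigenvalues =", lam[M:].sum())  # should match
```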

(19)

PCA – minimum error reconstruction

Where is the signal?

[Figure: (a) and (b) eigenvalue spectra; original image]

(20)

Singular Value Decomposition (SVD)

SVD – a simpler way to do PCA: [U, D, V] = SVD(X), X = U D V^T

U and V are d × d and N × N orthonormal matrices:

U^T U = I_d    V^T V = I_N

D is diagonal, d × N, with the singular values (≥ 0) sorted:

S = (1/N) X X^T = (1/N) U D V^T V D^T U^T = (1/N) U D D^T U^T

Columns of U are the eigenvectors of S, the PCs:

S u_i = (D_ii^2 / N) u_i and λ_i = D_ii^2 / N.

Projection of all data on the i-th PC: u_i^T X = u_i^T U D V^T = D_ii v_i^T.
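
A quick numerical check of the SVD–PCA correspondence on toy data (numpy's svd returns singular values sorted in descending order and V^T directly):

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 5, 200
X = rng.normal(size=(d, N))
X -= X.mean(axis=1, keepdims=True)

# X = U D V^T; columns of U are the PCs and lambda_i = D_ii^2 / N
U, Dvals, Vt = np.linalg.svd(X, full_matrices=False)
lam_svd = Dvals ** 2 / N

lam_eig = np.sort(np.linalg.eigvalsh(X @ X.T / N))[::-1]
print(np.allclose(lam_svd, lam_eig))           # True: SVD and eigendecomposition agree

# Projection of all data on the first PC: u_1^T X = D_11 v_1^T
print(np.allclose(U[:, 0] @ X, Dvals[0] * Vt[0]))
```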

(21)

Singular Value Decomposition (SVD)

Project onto the first M PCs, U_M: d × M:

X̃_M ≡ U_M^T X = U_M^T U D V^T = D_M V_M^T

with D_M: M × M and V_M: N × M.

Covariance of the projected data:

S̃ = (1/N) X̃_M X̃_M^T = (1/N) D_M V_M^T V_M D_M = (1/N) D_M^2

Whitening:

X̂_M ≡ √N D_M^{-1} U_M^T X = √N V_M^T,    Ŝ = (1/N) X̂_M X̂_M^T = I_M.

Lossy projection back to d-dim. space: X ≈ U_M X̃_M = U_M U_M^T X.
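
A sketch of the whitening step and the lossy back-projection via SVD, on toy data with unequal variances (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
d, N, M = 6, 1000, 3
X = rng.normal(size=(d, N)) * np.arange(1, d + 1)[:, None]   # unequal variances
X -= X.mean(axis=1, keepdims=True)

U, Dvals, Vt = np.linalg.svd(X, full_matrices=False)
UM, DM = U[:, :M], Dvals[:M]

# Whitening: Xhat_M = sqrt(N) D_M^{-1} U_M^T X  ->  covariance I_M
X_white = np.sqrt(N) * (UM.T @ X) / DM[:, None]
print(np.round(X_white @ X_white.T / N, 6))    # close to the identity matrix

# Lossy projection back to d-dim. space: U_M U_M^T X
X_back = UM @ (UM.T @ X)
```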

(22)

Continuous Latent Variable Models

Latent variable z is unobserved but can be learned from data.

Translations (2) and rotation (1) are the latent variables in the example.

The mapping (W, z_n) → x_n is in general non-linear.

We often consider simpler linear models:

x_n = W z_n + ε_n.

(23)

Continuous Latent Variable Models

Explain data by latent variables + noise:

x = W z + ε    X = W Z + E

x: d-dimensional data vector

W: d × M ‘mixing matrix’

z: M-dimensional latent variable or source vector

ε: d-dimensional noise or residual vector

[Graphical model: x_in generated from z_mn via W_im with noise ε_in of variance σ_i^2; i = 1,...,d, m = 1,...,M, n = 1,...,N]
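
A tiny generator for this model; all sizes and the Gaussian choices below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(4)
d, M, N = 10, 3, 2000
W = rng.normal(size=(d, M))                   # mixing matrix
Z = rng.normal(size=(M, N))                   # latent sources, one column per sample
sigma = 0.1
E = sigma * rng.normal(size=(d, N))           # residual noise

X = W @ Z + E                                 # X = W Z + E
print(X.shape)                                # (d, N)
```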

(24)

Continuous Latent Variable Models

Linear latent variable model for collaborative filtering

v_n: M-dimensional “taste” vector of viewer n.

u_m: M-dimensional “profile” vector of movie m.

Latent variable h_mn:

h_mn = u_m · v_n + ε_mn,    ε_mn ∼ N(ε_mn | 0, σ^2)

σ^2 is the noise level.

Learn U and V from the training data and predict r_m'n' ≈ u_m' · v_n'.
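
The slides do not prescribe how to learn U and V; one common choice is stochastic gradient descent on the squared error over the observed ratings with L2 regularization. A sketch on made-up ratings (learning rate, regularization and M are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n_movies, n_viewers, M = 50, 80, 5
# ratings: observed (movie m, viewer n, rating r) triples (toy data)
ratings = [(rng.integers(n_movies), rng.integers(n_viewers), rng.integers(1, 6))
           for _ in range(2000)]

U = 0.1 * rng.normal(size=(n_movies, M))       # movie "profile" vectors u_m
V = 0.1 * rng.normal(size=(n_viewers, M))      # viewer "taste" vectors v_n
lr, reg = 0.02, 0.05

for epoch in range(20):
    for m, n, r in ratings:
        err = r - U[m] @ V[n]                  # residual r_mn - u_m . v_n
        U[m], V[n] = (U[m] + lr * (err * V[n] - reg * U[m]),
                      V[n] + lr * (err * U[m] - reg * V[n]))

# Predict an unseen rating as u_m' . v_n'
print(U[0] @ V[0])
```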

(25)

Probabilistic PCA

Let us try to understand PCA as a latent variable model:

x = W z + ε

SVD gives a hint:

X = W Z + E = U D V^T = U_M D_M V_M^T + E

W = U_M D_M and Z = V_M^T

(26)

Probabilistic PCA

Tipping and Bishop considered a specific assumption:

P(z) = Norm(z; 0, I)    P(ε; σ^2) = Norm(ε; 0, σ^2 I)

Under this model x is Gaussian with

⟨x⟩ = W⟨z⟩ + ⟨ε⟩ = 0

⟨x x^T⟩ = W⟨z z^T⟩W^T + ⟨ε ε^T⟩ = W W^T + σ^2 I

Distribution of a datum:

P(x; W, σ^2) = Norm(x; 0, W W^T + σ^2 I)

(27)

Probabilistic PCA

The log likelihood for W and σ^2 is the joint distribution of all data:

log L(θ; X) = Σ_n log P(x_n | W, σ^2) = −(N/2) { log det 2πΣ + Tr[ Σ^{-1} S ] }

Model covariance: Σ = W W^T + σ^2 I

Empirical covariance: S = (1/N) X X^T

The solution: the M PCs with the largest eigenvalues.

The remaining variance is explained by σ^2 I.
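
The maximizer has the well-known closed form from Tipping and Bishop: σ²_ML is the average of the discarded eigenvalues and W_ML = U_M (Λ_M − σ²_ML I)^{1/2}, up to a rotation. A numpy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
d, N, M = 8, 5000, 2
W_true = rng.normal(size=(d, M))
X = W_true @ rng.normal(size=(M, N)) + 0.3 * rng.normal(size=(d, N))

S = X @ X.T / N
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]

sigma2_ml = lam[M:].mean()                       # average discarded eigenvalue
W_ml = U[:, :M] * np.sqrt(lam[:M] - sigma2_ml)   # U_M (Lambda_M - sigma^2 I)^{1/2}

# The model covariance matches S in the leading M-dimensional subspace
Sigma = W_ml @ W_ml.T + sigma2_ml * np.eye(d)
print(np.round(np.linalg.eigvalsh(Sigma)[::-1], 3))
print(np.round(lam, 3))
```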

(28)

Probabilistic PCA

Let us try to solve the cocktail party problem:

Recordings = Mixing × Speakers, or

x = W z

Use pPCA to estimate W and z.

Ignore complications of room acoustics.

(29)

Probabilistic PCA

Stop sign! Non-uniqueness of solution!

The likelihood depends on W only through Σ = W W^T + σ^2 I

Rotate W:

W → W̃ = W Ũ    (Ũ orthogonal)

This leaves the covariance unchanged:

W W^T = W Ũ Ũ^T W^T = W̃ W̃^T.

This can also be seen directly from the model:

W z = W Ũ Ũ^T z = W̃ z̃,    z̃ = Ũ^T z

The distribution is invariant.
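
A short numerical check of this invariance with a random orthogonal matrix (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
d, M = 6, 3
W = rng.normal(size=(d, M))
R, _ = np.linalg.qr(rng.normal(size=(M, M)))   # random orthogonal M x M matrix
W_rot = W @ R

print(np.allclose(W @ W.T, W_rot @ W_rot.T))   # True: the covariance is unchanged
```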

(30)

Independent component analysis (ICA)

Prior knowledge to the rescue!

Real signals are not Gaussian

Example: x = w_1 z_1 + w_2 z_2

with z_1 and z_2 independent and heavy-tailed.

We exploit this information by building it into our model!


(31)

Independent component analysis (ICA)

Allow for a more general z-distribution.

Still assume independent, identically distributed (iid) sources:

P(Z) = Π_mn P(z_mn)

Many choices are possible: heavy-tailed (positive kurtosis), uniform, discrete (think of wireless communication), positive (used to decompose spectra, images, etc.) and negative kurtosis.

Extension to temporal/spatial correlations (time series, ...).

(32)

Independent component analysis (ICA)

Summary of linear generative models: x = W z + ε.

Probabilistic PCA:

p(z, ε) = N(z; 0, I) N(ε; 0, σ^2 I)

Factor analysis:

p(z, ε) = N(z; 0, I) N(ε; 0, diag(σ_1^2, ..., σ_D^2))

Independent component analysis:

p(z, ε) = [ Π_{m=1}^M p(z_m) ] p(ε)

Encode a priori knowledge in p(z_m), e.g. heavy tails.

(33)

Independent component analysis (ICA)

Bell and Sejnowski Algorithm aka InfoMax

Assumption: square mixing and no noise, x = W z, with W: d × d.

Likelihood – one sample:

p(x | W) = ∫ dz P(x | W, z) P(z) = ∫ dz δ(x − W z) P(z)

Make the change of variables y = W z and dy = |W| dz:

p(x | W) = (1 / |W|) ∫ dy δ(x − y) P(W^{-1} y) = (1 / |W|) P(W^{-1} x)
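
For a heavy-tailed prior p(z_m) ∝ 1/cosh(z_m), maximizing this likelihood by natural-gradient ascent on the unmixing matrix B = W^{-1} gives the classic update ΔB ∝ (I − tanh(Y) Y^T / N) B. A minimal sketch on synthetic, noise-free mixtures (sources, mixing matrix, step size and iteration count are all made up):

```python
import numpy as np

rng = np.random.default_rng(8)
d, N = 2, 5000
Z = rng.laplace(size=(d, N))                   # heavy-tailed independent sources
W_mix = np.array([[1.0, 0.6], [0.4, 1.0]])     # square mixing, no noise
X = W_mix @ Z

B = np.eye(d)                                  # unmixing matrix, y = B x
lr = 0.01
for it in range(500):
    Y = B @ X
    # natural-gradient InfoMax update for p(z) ~ 1/cosh(z)
    B += lr * (np.eye(d) - np.tanh(Y) @ Y.T / N) @ B

W_est = np.linalg.inv(B)                       # estimated mixing (up to scale/permutation)
print(np.round(W_est, 2))
```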

(34)

Independent component analysis (ICA)

Non-iid data – temporal/spatial correlations: ⟨z_mn z_m'n'⟩ = δ_mm' K_m,nn'

It is easy to prove that a rotation of z, U z, no longer leaves the statistics of z unchanged

if the kernels are different, K_m ≠ K_m', for different variables m ≠ m'!

Second-order statistics alone are therefore enough for identifiability!

The Molgedey and Schuster algorithm (aka MAF) is one example using second-order statistics (see the sketch below);

the use of Gaussian processes (GPs) is another.
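
A sketch in the spirit of such second-order methods: jointly diagonalize the zero-lag and one time-lagged covariance by solving a generalized eigenvalue problem (the AR(1) sources, mixing matrix and lag τ below are illustrative, not from the slides):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(9)
d, N, tau = 2, 20000, 5

# Two sources with different temporal structure: AR(1) with different coefficients
Z = np.zeros((d, N))
for m, a in enumerate([0.95, 0.5]):
    e = rng.normal(size=N)
    for t in range(1, N):
        Z[m, t] = a * Z[m, t - 1] + e[t]

X = np.array([[1.0, 0.8], [0.3, 1.0]]) @ Z        # mixed observations
X -= X.mean(axis=1, keepdims=True)

C0 = X @ X.T / N                                  # zero-lag covariance
Ct = X[:, :-tau] @ X[:, tau:].T / (N - tau)       # lag-tau covariance
Ct = (Ct + Ct.T) / 2                              # symmetrize

# Generalized eigenproblem Ct b = lambda C0 b; the eigenvectors unmix the sources
_, B = eigh(Ct, C0)
S_est = B.T @ X                                   # recovered sources (up to scale/permutation)
print(np.round(np.corrcoef(np.vstack([S_est, Z]))[:d, d:], 2))   # ~ signed permutation
```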

(35)

Beyond PCA and ICA

Kernel PCA – component analysis in feature space x → Φ(x) (see the sketch below)

Nonlinear latent variable model x = W f(z) + ε

Fully probabilistic (and Bayesian) rather than one hammer (SVD) for all data

Sparsity, Bayesian networks and latent variable models

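
Kernel PCA is available off the shelf; a short example on toy concentric-circle data using scikit-learn (my choice of library, kernel and γ, not something the slides specify):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA, PCA

# Two concentric circles: a structure that linear PCA cannot unfold
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

Z_lin = PCA(n_components=2).fit_transform(X)
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)

# Correlation of the first component with the circle label: the RBF feature-space
# component should separate the two circles far better than the linear one
print(abs(np.corrcoef(Z_rbf[:, 0], y)[0, 1]), abs(np.corrcoef(Z_lin[:, 0], y)[0, 1]))
```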

(36)

Summary and reading

PCA and SVD

Generative model – continuous latent variables

Probabilistic PCA and ICA

Non-linear and Bayesian extensions

Books: C. Bishop, Pattern Recognition and Machine Learning, Springer; D. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press.
