
3.2 Variational Bayesian Factor Analysis

Figure 3.2: Graphical model for variational Bayesian factor analysis.

The model proposed for variational Bayesian factor analysis (VBFA) can be seen in figure 3.2. Compared to MLFA we have now placed distributions on the factor loading matrix and the noise covariance. The noise covariance matrix Ψ in MLFA is replaced by its inverse, the noise precision matrix ϕ = Ψ^{-1}. In order to do automatic latent dimensionality determination, a hierarchical prior is used on the factor loading matrix A. The α node serves as a regularization parameter that can effectively shut off unneeded factors.

The priors in the model are defined as

$$
p(A \mid \alpha) = \prod_{k=1}^{K} \mathcal{N}\!\left(a_k \mid 0,\, \alpha_k^{-1} I\right) \tag{3.15}
$$

$$
p(\alpha \mid a_\alpha, b_\alpha) = \prod_{k=1}^{K} \mathcal{G}\!\left(\alpha_k \mid a_{\alpha_k}, b_{\alpha_k}\right) \tag{3.16}
$$

$$
p(\phi \mid a_\phi, b_\phi) = \prod_{j=1}^{p} \mathcal{G}\!\left(\phi_j \mid a_{\phi_j}, b_{\phi_j}\right) \tag{3.17}
$$

where a_k denotes a column vector that corresponds to the kth column of A. We have an isotropic Gaussian on each column of A, where the hyperparameter α_k controls the precision. Since each column of A is given zero mean, if α_k goes to infinity the variance of the kth column goes to zero, and we effectively shut down the kth factor. One can think of this as an ARD prior that can switch off unneeded factors, where the factors are the inputs to the system. Both α and ϕ are precisions, so they are given factorized Gamma priors. Remember that ϕ is a diagonal matrix, and the prior is therefore only specified for the diagonal elements.
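As a concrete illustration, the following NumPy sketch draws one set of parameters from the priors (3.15)–(3.17). The dimensions p, K and the hyperparameter values a0 = b0 = 1 are arbitrary choices for the illustration, not the settings used elsewhere in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, a0, b0 = 10, 6, 1.0, 1.0   # illustrative sizes and Gamma hyperparameters

alpha = rng.gamma(shape=a0, scale=1.0 / b0, size=K)   # column precisions, cf. (3.16)
phi = rng.gamma(shape=a0, scale=1.0 / b0, size=p)     # noise precisions, cf. (3.17)

# Each column a_k ~ N(0, alpha_k^{-1} I); a large alpha_k forces the whole
# column towards zero, which is how the ARD prior shuts off unneeded factors.
A = rng.normal(size=(p, K)) / np.sqrt(alpha)          # cf. (3.15); broadcasts over columns
```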

The log marginal likelihood for the model is lower bounded as

$$
\mathcal{L} = \ln p(X) \;\geq\; \int q(S, A, \alpha, \phi)\, \ln \frac{p(X, S, A, \alpha, \phi)}{q(S, A, \alpha, \phi)}\, dS\, dA\, d\alpha\, d\phi \;=\; F_m \tag{3.18}
$$

where q(S, A, α, ϕ) is our approximation to the true posterior p(S, A, α, ϕ | X), which is not analytically tractable. To derive a VBEM algorithm we need to assume that the hidden variables and the parameters are independent, q(S, A, α, ϕ) ≈ q(S) q(A, α, ϕ). Furthermore we approximate q(A, α, ϕ) ≈ q(A) q(α) q(ϕ), so the resulting variational posterior is fully factorized. This further approximation significantly eases the derivation, since all variational posteriors then take the same form as the conjugate priors.

The complete log likelihood for the model is given by

$$
\mathcal{L}(\theta, S) = \frac{N}{2}\ln|\phi| - \frac{1}{2}\sum_{i=1}^{N}(x_i - A s_i)^{\top}\phi\,(x_i - A s_i) - \frac{1}{2}\sum_{i=1}^{N} s_i^{\top} s_i + \ln p(A \mid \alpha) + \ln p(\alpha) + \ln p(\phi) + \text{const} \tag{3.19}
$$

To derive the VBE-step we proceed by writing out the expected complete log likelihood ⟨L(θ,S)⟩_{q(θ)} with respect to the parameter posteriors and neglect terms not depending on the hidden variables,

$$
\langle \mathcal{L}(\theta, S)\rangle_{q(\theta)} = -\frac{1}{2}\sum_{i=1}^{N}\left[\, s_i^{\top}\!\left(I + \langle A^{\top}\phi A\rangle\right) s_i - 2\, s_i^{\top}\langle A\rangle^{\top}\langle\phi\rangle\, x_i \,\right] + \text{const} \tag{3.20}
$$

from which we can infer that q(S) is of the form

$$
q(S) = \prod_{i=1}^{N}\mathcal{N}\!\left(s_i \mid m_s^{(i)}, \Sigma_s\right) \tag{3.21}
$$

where

$$
\Sigma_s = \left(I + \langle A^{\top}\phi A\rangle\right)^{-1} \tag{3.22}
$$

$$
m_s^{(i)} = \Sigma_s\, \langle A\rangle^{\top}\langle\phi\rangle\, x_i \tag{3.23}
$$

To compute the expectation ⟨A^⊤ϕA⟩ we use the fact that ϕ is diagonal,

$$
\langle A^{\top}\phi A\rangle = \sum_{j=1}^{p}\langle\phi_j\rangle\,\langle a_j a_j^{\top}\rangle \tag{3.24}
$$

where a_j is a column vector corresponding to the jth row of A. Note that the VBE-step is equivalent to the E-step in MLFA, but with expected values for the parameters. This concludes the VBE-step.
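A minimal sketch of how the VBE-step could be implemented, assuming the moments of the parameter posteriors are already available. The function and argument names (vbe_step, EA for ⟨A⟩, Cov_A for the row covariances Σ_a^(j), Ephi for ⟨ϕ⟩) are hypothetical.

```python
import numpy as np

def vbe_step(X, EA, Cov_A, Ephi):
    """One VBE-step. X: (N, p) data, EA: (p, K) rows <a_j>,
    Cov_A: (p, K, K) row covariances Sigma_a^(j), Ephi: (p,) <phi_j>."""
    p, K = EA.shape
    # <A' phi A> = sum_j <phi_j> <a_j a_j'>, with <a_j a_j'> = Sigma_a^(j) + m_a^(j) m_a^(j)'  (3.24)
    EAtphiA = sum(Ephi[j] * (Cov_A[j] + np.outer(EA[j], EA[j])) for j in range(p))
    Sigma_s = np.linalg.inv(np.eye(K) + EAtphiA)   # shared posterior covariance of every s_i (3.22)
    Ms = X @ (Ephi[:, None] * EA) @ Sigma_s        # row i is m_s^(i) = Sigma_s <A>' <phi> x_i (3.23)
    return Ms, Sigma_s
```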

The VBM-step

To derive the VBM-step we need to write the expected complete log likelihood ⟨L(θ,S)⟩_{q(θ_{j≠i}) q(S)} with respect to all other variational posteriors. The derivations follow below.

The q(A) distribution

In order to estimate q(A) we start from equation (3.19) and retain only the terms that contain A. To infer the posterior we furthermore have to write the expected complete likelihood as a sum over the rows of A. Let a_j denote a column vector that corresponds to the jth row of A and let α̃ = diag[α], then

$$
\langle \mathcal{L}(\theta, S)\rangle_{q(\alpha)q(\phi)q(S)} = -\frac{1}{2}\sum_{j=1}^{p}\left[\, a_j^{\top}\!\left(\langle\tilde{\alpha}\rangle + \langle\phi_j\rangle\sum_{i=1}^{N}\langle s_i s_i^{\top}\rangle\right) a_j - 2\,\langle\phi_j\rangle\, a_j^{\top}\sum_{i=1}^{N}\langle s_i\rangle\, x_{ji} \,\right] + \text{const}
$$

from which we can infer that q(A) is of the form

$$
q(A) = \prod_{j=1}^{p}\mathcal{N}\!\left(a_j \mid m_a^{(j)}, \Sigma_a^{(j)}\right)
$$

where

$$
\Sigma_a^{(j)} = \left(\langle\tilde{\alpha}\rangle + \langle\phi_j\rangle\sum_{i=1}^{N}\langle s_i s_i^{\top}\rangle\right)^{-1}, \qquad
m_a^{(j)} = \langle\phi_j\rangle\, \Sigma_a^{(j)} \sum_{i=1}^{N}\langle s_i\rangle\, x_{ji}
$$
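A corresponding sketch of the q(A) update, assuming the VBE-step above has produced Ms (rows ⟨s_i⟩) and Ss = Σ_i ⟨s_i s_i^⊤⟩ = Ms^⊤Ms + N Σ_s; again the names are illustrative.

```python
import numpy as np

def update_qA(X, Ms, Ss, Ephi, Ealpha):
    """q(A) update. Ms: (N, K) rows <s_i>, Ss: (K, K) sum_i <s_i s_i'>,
    Ephi: (p,) <phi_j>, Ealpha: (K,) <alpha_k>."""
    p = X.shape[1]
    K = Ms.shape[1]
    MA = np.zeros((p, K))
    Cov_A = np.zeros((p, K, K))
    for j in range(p):
        # Sigma_a^(j) = ( diag(<alpha>) + <phi_j> sum_i <s_i s_i'> )^{-1}
        Cov_A[j] = np.linalg.inv(np.diag(Ealpha) + Ephi[j] * Ss)
        # m_a^(j) = <phi_j> Sigma_a^(j) sum_i <s_i> x_ji
        MA[j] = Ephi[j] * (Cov_A[j] @ (Ms.T @ X[:, j]))
    return MA, Cov_A
```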


The q(α) distribution

In order to estimate q(α) we start again from equation (3.19) and retain only the terms that contain α. The kth column of A is denoted by a_k, giving

$$
\langle \mathcal{L}(\theta, S)\rangle_{q(A)q(\phi)q(S)} = \sum_{k=1}^{K}\left[\, \frac{p}{2}\ln\alpha_k - \frac{\alpha_k}{2}\langle a_k^{\top} a_k\rangle + (a_{\alpha_k} - 1)\ln\alpha_k - b_{\alpha_k}\alpha_k \,\right] + \text{const}
$$

from which we can infer that q(α) is of the form

$$
q(\alpha) = \prod_{k=1}^{K}\mathcal{G}\!\left(\alpha_k \mid \hat{a}_{\alpha_k}, \hat{b}_{\alpha_k}\right), \qquad
\hat{a}_{\alpha_k} = a_{\alpha_k} + \frac{p}{2}, \qquad
\hat{b}_{\alpha_k} = b_{\alpha_k} + \frac{1}{2}\langle a_k^{\top} a_k\rangle
$$

We can compute the expectation ⟨a_k^⊤ a_k⟩ by

$$
\langle a_k^{\top} a_k\rangle = \sum_{j=1}^{p}\left(\left[\Sigma_a^{(j)}\right]_{kk} + \left[m_a^{(j)}\right]_k^2\right)
$$
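A sketch of the q(α) update from the q(A) moments; a_alpha and b_alpha stand for the prior parameters a_{α_k}, b_{α_k} (taken here to be shared across k for simplicity), and the helper name is hypothetical.

```python
import numpy as np

def update_qalpha(MA, Cov_A, a_alpha, b_alpha):
    """q(alpha) update from the q(A) moments MA (p, K) and Cov_A (p, K, K)."""
    p, K = MA.shape
    # <a_k' a_k> = sum_j ( [Sigma_a^(j)]_kk + [m_a^(j)]_k^2 )
    E_aa = Cov_A.diagonal(axis1=1, axis2=2).sum(axis=0) + (MA ** 2).sum(axis=0)
    a_hat = a_alpha + 0.5 * p           # same shape parameter for every column k
    b_hat = b_alpha + 0.5 * E_aa        # one rate per column k
    return a_hat, b_hat, a_hat / b_hat  # <alpha_k> = a_hat / b_hat
```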

The q(ϕ) distribution

In order to estimate q(ϕ) we start again from equation (3.19) and retain only the terms that contain ϕ,

$$
\langle \mathcal{L}(\theta, S)\rangle_{q(A)q(\alpha)q(S)} = \sum_{j=1}^{p}\left[\, \frac{N}{2}\ln\phi_j - \frac{\phi_j}{2}\sum_{i=1}^{N}\left\langle (x_{ji} - a_j^{\top} s_i)^2\right\rangle + (a_{\phi_j} - 1)\ln\phi_j - b_{\phi_j}\phi_j \,\right] + \text{const}
$$

from which we can infer that q(ϕ) is of the form

$$
q(\phi) = \prod_{j=1}^{p}\mathcal{G}\!\left(\phi_j \mid \hat{a}_{\phi_j}, \hat{b}_{\phi_j}\right), \qquad
\hat{a}_{\phi_j} = a_{\phi_j} + \frac{N}{2}, \qquad
\hat{b}_{\phi_j} = b_{\phi_j} + \frac{1}{2}\sum_{i=1}^{N}\left\langle (x_{ji} - a_j^{\top} s_i)^2\right\rangle
$$

We can compute the expectation ⟨(x_{ji} − a_j^⊤ s_i)²⟩ by

$$
\left\langle (x_{ji} - a_j^{\top} s_i)^2\right\rangle = x_{ji}^2 - 2\, x_{ji}\, \langle a_j\rangle^{\top}\langle s_i\rangle + \operatorname{tr}\!\left[\langle a_j a_j^{\top}\rangle\langle s_i s_i^{\top}\rangle\right]
$$

This concludes the VBM-step.
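A sketch of the q(ϕ) update using the expectation above; as before, the helper name and argument names are illustrative.

```python
import numpy as np

def update_qphi(X, Ms, Sigma_s, MA, Cov_A, a_phi, b_phi):
    """q(phi) update from the q(S) and q(A) moments defined in the sketches above."""
    N, p = X.shape
    Ss = Ms.T @ Ms + N * Sigma_s                   # sum_i <s_i s_i'>
    resid = np.empty(p)
    for j in range(p):
        Eaa = Cov_A[j] + np.outer(MA[j], MA[j])    # <a_j a_j'>
        # sum_i <(x_ji - a_j' s_i)^2> = sum_i x_ji^2 - 2 sum_i x_ji <a_j>'<s_i> + tr(<a_j a_j'> Ss)
        resid[j] = (X[:, j] ** 2).sum() - 2.0 * X[:, j] @ (Ms @ MA[j]) + np.trace(Eaa @ Ss)
    a_hat = a_phi + 0.5 * N
    b_hat = b_phi + 0.5 * resid
    return a_hat, b_hat, a_hat / b_hat             # <phi_j> = a_hat / b_hat
```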

Hyperparameter Optimization

The model has hyperparameters {a_{ϕ_j}, b_{ϕ_j}} and hyper-hyperparameters {a_{α_k}, b_{α_k}}. We can optimize these by writing the expected complete log likelihood as a function of only these parameters, differentiating and setting to zero. Writing the expected complete log likelihood as a function of {a_{ϕ_j}, b_{ϕ_j}} and neglecting constant terms we get

$$
\langle \mathcal{L}(a_{\phi_j}, b_{\phi_j})\rangle_{q(\theta)q(S)} = a_{\phi_j}\ln b_{\phi_j} - \ln\Gamma(a_{\phi_j}) + (a_{\phi_j} - 1)\langle\ln\phi_j\rangle - b_{\phi_j}\langle\phi_j\rangle
$$

Differentiating with respect to b_{ϕ_j} and a_{ϕ_j} and setting to zero yields the fixed point equations

$$
b_{\phi_j} = \frac{a_{\phi_j}}{\langle\phi_j\rangle} \tag{3.41}
$$

$$
\ln b_{\phi_j} - \psi(a_{\phi_j}) + \langle\ln\phi_j\rangle = 0
$$

where ψ(x) = ∂/∂x ln Γ(x) is the digamma function and ⟨ln ϕ_j⟩ = ψ(â_{ϕ_j}) − ln b̂_{ϕ_j}. Since the priors for α and ϕ are of the same type, the fixed point equations for {a_{α_k}, b_{α_k}} are identical.

We can solve the fixed point equations by Newton-Raphson, but we must assure that the Newton-Raphson iterations yield a solution where both a_{ϕ_j} and b_{ϕ_j} remain positive. We can overcome this by performing the iterations in the exponential domain; in this way subtraction becomes multiplication and we are assured that we find a solution with positive values. Substituting (3.41) into the condition for a_{ϕ_j}, the iterations are given by

$$
a_{\phi_j,\text{new}} = a_{\phi_j}\exp\!\left(-\,\frac{\ln a_{\phi_j} - \ln\langle\phi_j\rangle - \psi(a_{\phi_j}) + \langle\ln\phi_j\rangle}{1 - a_{\phi_j}\,\psi'(a_{\phi_j})}\right) \tag{3.42}
$$

where ψ′(·) is the derivative of the digamma function. When a_{ϕ_j,new} is estimated we can simply insert it in equation (3.41) to find the update for b_{ϕ_j,new}. If we propose that the model should have equivalent priors, meaning that all a_{ϕ_j} = a_{ϕ_{j′}} and b_{ϕ_j} = b_{ϕ_{j′}}, we simply replace ⟨ln ϕ_j⟩ by (1/p) Σ_{j=1}^{p} ⟨ln ϕ_j⟩ and ⟨ϕ_j⟩ by (1/p) Σ_{j=1}^{p} ⟨ϕ_j⟩ in equation (3.42). In chapter 4 the effect of hyperparameter optimization is investigated for both types of priors.
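One way to implement the exponential-domain Newton iteration sketched above, assuming the moments ⟨ϕ_j⟩ and ⟨ln ϕ_j⟩ are available from q(ϕ); for equivalent priors the averaged moments can be passed instead. The function name and the fixed iteration count are illustrative choices.

```python
import numpy as np
from scipy.special import digamma, polygamma

def update_hyper(E_phi, E_lnphi, a_init=1.0, n_iter=20):
    """Solve the fixed point equations for (a, b) of one Gamma prior by
    Newton-Raphson on ln(a), so that a and b = a / <phi> stay positive."""
    a = a_init
    for _ in range(n_iter):
        # stationarity condition in a after substituting b = a / <phi>
        g = np.log(a) - np.log(E_phi) - digamma(a) + E_lnphi
        # derivative of g with respect to ln(a): 1 - a * psi'(a) (always negative)
        h = 1.0 - a * polygamma(1, a)
        a = a * np.exp(-g / h)        # Newton step in the exponential domain, cf. (3.42)
    b = a / E_phi                     # fixed point for b, cf. (3.41)
    return a, b
```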

Calculation of Fm

We need to calculate the lower bound Fm if we want to use it as a guide for model selection and to monitor convergence. The functional can be written

$$
F_m = \left\langle \ln p(X \mid S, A, \alpha, \phi)\right\rangle_{q(S)q(A)q(\alpha)q(\phi)}
- \operatorname{KL}\!\left(q(S)\,\|\,p(S)\right)
- \left\langle \operatorname{KL}\!\left(q(A)\,\|\,p(A \mid \alpha)\right)\right\rangle_{q(\alpha)}
- \operatorname{KL}\!\left(q(\alpha)\,\|\,p(\alpha \mid a_\alpha, b_\alpha)\right)
- \operatorname{KL}\!\left(q(\phi)\,\|\,p(\phi \mid a_\phi, b_\phi)\right) \tag{3.43}
$$

where the individual terms can be calculated by

$$
\left\langle \ln p(X \mid S, A, \alpha, \phi)\right\rangle_{q(S)q(A)q(\alpha)q(\phi)}
= -\frac{Np}{2}\ln 2\pi + \frac{N}{2}\sum_{j=1}^{p}\left(\psi(\hat{a}_{\phi_j}) - \ln\hat{b}_{\phi_j}\right)
- \frac{1}{2}\sum_{j=1}^{p}\langle\phi_j\rangle\sum_{i=1}^{N}\left\langle (x_{ji} - a_j^{\top} s_i)^2\right\rangle
$$

$$
\left\langle \operatorname{KL}\!\left(q(A)\,\|\,p(A \mid \alpha)\right)\right\rangle_{q(\alpha)}
= -\frac{p}{2}\sum_{k=1}^{K}\left(\psi(\hat{a}_{\alpha_k}) - \ln\hat{b}_{\alpha_k}\right)
- \frac{1}{2}\sum_{j=1}^{p}\left(\ln\left|\Sigma_a^{(j)}\right| + \operatorname{tr}\!\left[\, I - \left(\Sigma_a^{(j)} + m_a^{(j)} m_a^{(j)\top}\right)\langle\tilde{\alpha}\rangle\right]\right) \tag{3.46}
$$

$$
\operatorname{KL}\!\left(q(\alpha)\,\|\,p(\alpha \mid a_\alpha, b_\alpha)\right)
= \sum_{k=1}^{K}\left(\hat{a}_{\alpha_k}\ln\hat{b}_{\alpha_k} - a_{\alpha_k}\ln b_{\alpha_k} - \ln\frac{\Gamma(\hat{a}_{\alpha_k})}{\Gamma(a_{\alpha_k})}
+ b_{\alpha_k}\langle\alpha_k\rangle - \hat{a}_{\alpha_k} + (\hat{a}_{\alpha_k} - a_{\alpha_k})\left(\psi(\hat{a}_{\alpha_k}) - \ln\hat{b}_{\alpha_k}\right)\right) \tag{3.47}
$$

$$
\operatorname{KL}\!\left(q(\phi)\,\|\,p(\phi \mid a_\phi, b_\phi)\right)
= \sum_{j=1}^{p}\left(\hat{a}_{\phi_j}\ln\hat{b}_{\phi_j} - a_{\phi_j}\ln b_{\phi_j} - \ln\frac{\Gamma(\hat{a}_{\phi_j})}{\Gamma(a_{\phi_j})}
+ b_{\phi_j}\langle\phi_j\rangle - \hat{a}_{\phi_j} + (\hat{a}_{\phi_j} - a_{\phi_j})\left(\psi(\hat{a}_{\phi_j}) - \ln\hat{b}_{\phi_j}\right)\right) \tag{3.48}
$$
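Since (3.47) and (3.48) share the same form, a single helper can evaluate both KL terms. The sketch below assumes rate-parametrized Gamma distributions with vector-valued parameters; the function name is illustrative.

```python
import numpy as np
from scipy.special import gammaln, digamma

def gamma_kl(a_hat, b_hat, a0, b0):
    """KL( Gamma(a_hat, b_hat) || Gamma(a0, b0) ), rate parametrization,
    summed over components; this is the summand of (3.47) / (3.48)."""
    kl = (a_hat * np.log(b_hat) - a0 * np.log(b0)
          - (gammaln(a_hat) - gammaln(a0))
          + b0 * (a_hat / b_hat)      # b <x> with <x> = a_hat / b_hat
          - a_hat
          + (a_hat - a0) * (digamma(a_hat) - np.log(b_hat)))
    return float(np.sum(kl))
```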


Figure 3.3: Upper left shows the reference factor loading matrix A and noise precisions Φ. Bottom left shows the learnt factor loading matrix A_vb and noise precisions Φ_vb. Upper right shows the log of the change in F_m, log[∂F/∂t], during 10³ iterations; plotting F_m in this way makes it clearer when the bound increases. Bottom right shows the inverse precisions (variances) 1/α on the columns of A. Notice that F_m increases faster when a column is on its way to being shut down.

VBFA in Action

To demonstrate the model I created a reference factor analysis model with precisions drawn from Gamma distributions and a factor loading matrix drawn from a Gaussian distribution controlled by the precisions. From this model I generated a p = 10 dimensional training set of size N = 10³. The reference latent dimensionality was K = 3. No hyperparameter optimization was invoked. The hyperparameters were set as a_{α_k} = b_{α_k} = a_{ϕ_j} = b_{ϕ_j} = 10⁻³, corresponding to non-informative priors for all practical purposes. The algorithm was run for 10³ iterations with a maximum dimensionality of k_max = 6 and a random initialization. Figure 3.3 summarizes the results. The learnt latent dimensionality is K = 3, since only three columns remain. Notice also that the columns in the learnt factor loading matrix resemble the reference model (no rotation indeterminacy). In chapter 4 we will discuss this further.
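A minimal sketch of how such a reference data set could be generated; the exact Gamma and Gaussian parameters and the random seed are not specified in the text, so the values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
p, K, N = 10, 3, 1000                                   # dimensions from the text

phi = rng.gamma(shape=2.0, scale=1.0, size=p)           # reference noise precisions
A = rng.normal(size=(p, K))                             # reference factor loadings
S = rng.normal(size=(N, K))                             # latent factors, s_i ~ N(0, I)
X = S @ A.T + rng.normal(size=(N, p)) / np.sqrt(phi)    # x_i = A s_i + noise, noise_j ~ N(0, phi_j^{-1})
```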

3.3 Maximum Likelihood Extended Factor